scripts.data.preprocessing.generateDataset.dataGen

scripts.data.preprocessing.generateDataset.dataGen(file, out_folder, cutoff, coords_only, dummy_terms)

Wrapper function for parallelization that handles paths and other arguments.

Parameters:

file (str) – The .red.pdb file for the protein to featurize.
out_folder (str) – Path to the output folder
cutoff (int) – Max number of TERMs to featurize
coords_only (bool) – Whether to use only backbone-derived features
dummy_terms (str or None) – Method by which to incorporate dummy TERMs. Options are 'replace', which replaces TERM features with those derived from a dummy TERM, and 'include', which adds the dummy TERM to the mined TERM matches.
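
Since this function is described as a wrapper for parallelization, a minimal sketch of how such a wrapper might be fanned out over many files may help. Everything here is an assumption for illustration: `data_gen_stub` is a hypothetical stand-in that only mirrors the documented signature, the output naming and the default `cutoff` value are invented, and a thread pool is used for simplicity even though the project may use a process pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Options named in the parameter list above.
VALID_DUMMY_TERMS = (None, "replace", "include")


def data_gen_stub(file, out_folder, cutoff, coords_only, dummy_terms):
    """Hypothetical stand-in with the same signature as dataGen.

    It only validates arguments and returns an assumed output path;
    the real function featurizes the protein.
    """
    if dummy_terms not in VALID_DUMMY_TERMS:
        raise ValueError(f"unsupported dummy_terms: {dummy_terms!r}")
    # Assumption: the output is named after the input file's base name.
    name = os.path.basename(file)
    if name.endswith(".red.pdb"):
        name = name[: -len(".red.pdb")]
    return os.path.join(out_folder, name)


def run_all(files, out_folder, cutoff=50, coords_only=False,
            dummy_terms=None, workers=2):
    """Apply the wrapper to every file, fixing the shared arguments once."""
    worker = partial(
        data_gen_stub,
        out_folder=out_folder,
        cutoff=cutoff,  # 50 is an illustrative value, not a documented default
        coords_only=coords_only,
        dummy_terms=dummy_terms,
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input files.
        return list(pool.map(worker, files))
```

Binding the shared arguments with `functools.partial` leaves `file` as the only per-task argument, which is the shape that pool-style `map` calls expect.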