scripts.data.preprocessing.generateDataset.dataGen
- scripts.data.preprocessing.generateDataset.dataGen(file, out_folder, cutoff, coords_only, dummy_terms)
Wrapper function for parallelization that handles paths and other arguments.
- Parameters:
file (str) – The .red.pdb file for the protein to featurize.
out_folder (str) – Path to the output folder
cutoff (int) – Max number of TERMs to featurize
coords_only (bool) – Whether to use only backbone-derived features
dummy_terms (str or None) – Method by which to incorporate dummy TERMs. Options are 'replace', which replaces TERM features with those derived from a dummy TERM, and 'include', which adds the dummy TERM to the mined TERM matches.
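The parameters above suggest a per-file worker that a pool can map over a list of .red.pdb files. The following is a minimal sketch of that pattern, not the project's actual driver code. `data_gen_stub` is a hypothetical stand-in that only mirrors the documented signature; its return value, the `.features` output name, the default `cutoff=50`, and the use of a thread pool (rather than whatever pool the project uses) are all assumptions made for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def data_gen_stub(file, out_folder, cutoff, coords_only, dummy_terms):
    """Hypothetical stand-in with the same signature as dataGen.

    It validates dummy_terms and returns the path it would write to.
    The real function's return value and output naming are not documented
    here, so both are assumptions.
    """
    if dummy_terms not in (None, "replace", "include"):
        raise ValueError(f"unsupported dummy_terms: {dummy_terms!r}")
    name = os.path.basename(file)
    if name.endswith(".red.pdb"):
        name = name[: -len(".red.pdb")]
    return os.path.join(out_folder, name + ".features")  # assumed naming


def run_all(files, out_folder, cutoff=50, coords_only=False,
            dummy_terms=None, workers=4):
    """Apply the per-file worker to every input file using a pool."""
    # Bind the arguments shared by every file so the pool only has to
    # supply the file path.
    worker = partial(
        data_gen_stub,
        out_folder=out_folder,
        cutoff=cutoff,
        coords_only=coords_only,
        dummy_terms=dummy_terms,
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return list(pool.map(worker, files))
```

Binding the shared arguments with `functools.partial` keeps the mapped callable single-argument, which is the shape most pool APIs expect. Swapping in the real `dataGen` would presumably only require replacing the stub, provided its signature matches the one documented above.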