scripts.data.preprocessing.generateDataset
Generate feature files for TERMinator.
- Usage:
    python generateDataset.py \
        --in_folder <input_folder> \
        --out_folder <output_folder> \
        [--cutoff <matches_cutoff>] \
        [-n <num_processes>] \
        [-u] \ # update existing files
        [--coords_only] \
        [--dummy_terms [None, 'replace', 'include']]
- --in_folder <input_folder> should be structured as <input_folder>/<pdb_id>/<pdb_id>.<ext>. For full feature generation, <ext> must include .dat and .red.pdb; when running with --coords_only, only .red.pdb is required. If you use scripts/data/preprocessing/cleanStructs.py, this structure is built automatically.
- --out_folder <output_folder> will be structured as <output_folder>/<pdb_id>/<pdb_id>.<ext>, where <ext> includes .features, which specifies protein and TERM features, and .length, which contains two integers. The first integer specifies the number of TERM residues in the protein, and the second specifies the sequence length of the protein.
- --cutoff <matches_cutoff> restricts the number of matches featurized to the top <matches_cutoff>, ranked by increasing RMSD. Defaults to 50.
- -n <num_processes> specifies how many processes to use while processing. Defaults to 1.
- -u is an optional flag which, if specified, forces rewriting of existing feature files.
- --coords_only is an optional flag which, if specified, generates only backbone-derived features. Running in this mode does not require prior TERM mining, but does require that you clean the backbone using scripts/data/preprocessing/cleanStructs.py.
- --dummy_terms specifies how dummy TERMs are incorporated into features. Dummy TERMs are constructs where there is one TERM match with a degenerate X sequence and structural features derived from the target structure. By default, it is set to None, meaning no dummy TERMs. If set to 'replace', only the dummy TERM is included. If set to 'include', the first match is set to the dummy TERM match and the remaining TERMs are those parsed from the .dat file.
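The folder layout and the .length format described above can be checked with a short script before and after a run. The following is a minimal sketch, not part of TERMinator: the helper names missing_inputs and read_length_file are hypothetical, and it assumes the two integers in a .length file are separated by whitespace (a space or a newline), which the description above does not specify.

```python
from pathlib import Path

# Extensions required per the option descriptions above.
FULL_EXTS = (".dat", ".red.pdb")
COORDS_ONLY_EXTS = (".red.pdb",)


def missing_inputs(in_folder, coords_only=False):
    """Return {pdb_id: [missing file names]} for incomplete entries.

    Hypothetical helper: expects <in_folder>/<pdb_id>/<pdb_id>.<ext>.
    """
    required = COORDS_ONLY_EXTS if coords_only else FULL_EXTS
    problems = {}
    for pdb_dir in sorted(Path(in_folder).iterdir()):
        if not pdb_dir.is_dir():
            continue
        pdb_id = pdb_dir.name
        missing = [
            pdb_id + ext
            for ext in required
            if not (pdb_dir / (pdb_id + ext)).is_file()
        ]
        if missing:
            problems[pdb_id] = missing
    return problems


def read_length_file(path):
    """Parse a .length file into (num_term_residues, sequence_length).

    Assumption: the two integers are whitespace-separated.
    """
    fields = Path(path).read_text().split()
    if len(fields) != 2:
        raise ValueError(f"expected two integers in {path}, found {len(fields)}")
    num_term_residues, sequence_length = (int(field) for field in fields)
    return num_term_residues, sequence_length
```

Calling missing_inputs(<input_folder>, coords_only=True) only looks for .red.pdb files, mirroring the relaxed requirement of --coords_only.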
See python generateDataset.py --help for more info.
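When the script is driven from another Python program, the argument list can be assembled from the documented flags. This is a sketch under stated assumptions: build_command is a hypothetical helper, the script is assumed to be invoked from its own directory, and optional flags are simply omitted so that the script's documented defaults apply. How the script parses a literal None for --dummy_terms is not stated here, so the sketch leaves that flag out when no dummy TERMs are wanted.

```python
def build_command(in_folder, out_folder, cutoff=None, num_processes=None,
                  update=False, coords_only=False, dummy_terms=None):
    """Build an argv list for generateDataset.py (hypothetical helper)."""
    if dummy_terms not in (None, "replace", "include"):
        raise ValueError("dummy_terms must be None, 'replace', or 'include'")

    cmd = [
        "python", "generateDataset.py",
        "--in_folder", str(in_folder),
        "--out_folder", str(out_folder),
    ]
    if cutoff is not None:
        cmd += ["--cutoff", str(cutoff)]
    if num_processes is not None:
        cmd += ["-n", str(num_processes)]
    if update:
        cmd.append("-u")  # force rewriting of existing feature files
    if coords_only:
        cmd.append("--coords_only")  # backbone-derived features only
    if dummy_terms is not None:
        cmd += ["--dummy_terms", dummy_terms]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True); passing a list rather than a single string avoids shell quoting issues with folder names.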
Functions
- Wrapper function for parallelization which deals with paths and other args.
- Parallelize
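The general pattern behind a path-handling wrapper plus a parallel driver can be sketched with the standard library. This is an illustration only and does not reproduce the module's actual functions: build_jobs, featurize_one, and run_parallel are hypothetical names, and featurize_one is a placeholder that creates the output directory without generating any features.

```python
from multiprocessing import Pool
from pathlib import Path


def build_jobs(in_folder, out_folder):
    """Return one (input_prefix, output_prefix) pair per <pdb_id> directory."""
    jobs = []
    for pdb_dir in sorted(Path(in_folder).iterdir()):
        if not pdb_dir.is_dir():
            continue
        pdb_id = pdb_dir.name
        # Prefixes follow the <folder>/<pdb_id>/<pdb_id> convention;
        # extensions such as .dat or .features are appended by the worker.
        jobs.append((str(pdb_dir / pdb_id),
                     str(Path(out_folder) / pdb_id / pdb_id)))
    return jobs


def featurize_one(job):
    """Placeholder worker: resolves paths for one protein.

    A real worker would read <input_prefix>.dat / .red.pdb and write
    <output_prefix>.features and <output_prefix>.length.
    """
    input_prefix, output_prefix = job
    Path(output_prefix).parent.mkdir(parents=True, exist_ok=True)
    return output_prefix


def run_parallel(jobs, num_processes=1):
    """Run featurize_one over all jobs, optionally in a process pool."""
    if num_processes <= 1:
        return [featurize_one(job) for job in jobs]
    with Pool(num_processes) as pool:
        return pool.map(featurize_one, jobs)
```

Packing each job into a single tuple keeps the worker compatible with Pool.map, which passes exactly one argument per call. When run as a script with num_processes greater than 1, the call to run_parallel should sit under an if __name__ == "__main__": guard so that worker processes can import the module safely.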