scripts.data.preprocessing.generateDataset.dataGen

scripts.data.preprocessing.generateDataset.dataGen(file, out_folder, cutoff, coords_only, dummy_terms)

Wrapper function for parallelization that handles paths and other arguments.

Parameters:
  • file (str) – The .red.pdb file for the protein to featurize.

  • out_folder (str) – Path to the output folder.

  • cutoff (int) – Maximum number of TERMs to featurize.

  • coords_only (bool) – Whether to use only backbone-derived features.

  • dummy_terms (str or None) – Method by which to incorporate dummy TERMs. Options include 'replace', which replaces TERM features with those derived from a dummy TERM, and 'include', which adds the dummy TERM to the mined TERM matches.
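
A wrapper with this parameter list can be mapped over many input files by binding the shared arguments once. The sketch below is illustrative only: `data_gen_stub` is a hypothetical stand-in that does no featurization, the `.features` output naming and the default `cutoff=50` are assumptions, and a thread pool is used purely to keep the example self-contained (the real project may use a process pool).

```python
import os
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Assumption: only the values named in the parameter docs are accepted.
VALID_DUMMY_TERMS = (None, "replace", "include")


def data_gen_stub(file, out_folder, cutoff, coords_only, dummy_terms):
    """Hypothetical stand-in with the same parameters as dataGen.

    It performs no featurization. It validates dummy_terms and returns
    the output path a wrapper like this might write to (assumed naming).
    """
    if dummy_terms not in VALID_DUMMY_TERMS:
        raise ValueError(f"dummy_terms must be one of {VALID_DUMMY_TERMS}")
    name = os.path.basename(file)
    if name.endswith(".red.pdb"):
        name = name[: -len(".red.pdb")]
    return os.path.join(out_folder, name + ".features")


def run_all(files, out_folder, cutoff=50, coords_only=False,
            dummy_terms=None, workers=2):
    """Apply the wrapper to every file, binding the shared arguments once."""
    worker = partial(
        data_gen_stub,
        out_folder=out_folder,
        cutoff=cutoff,
        coords_only=coords_only,
        dummy_terms=dummy_terms,
    )
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input files.
        return list(pool.map(worker, files))


if __name__ == "__main__":
    paths = run_all(["data/1abc.red.pdb", "data/2xyz.red.pdb"], "out",
                    dummy_terms="replace")
    print(paths)
```

Binding the shared arguments with `functools.partial` leaves `file` as the only per-task input, which is the shape most pool `map` calls expect.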