CoreFilter
- class chemFilters.core.CoreFilter(rdkit_filter=True, pep_filter=True, silly_filter=True, bloom_filter=True, rdfilter_subset='ALL', rdfilter_output='string', std_mols=True, std_method='chembl', n_jobs=1, parallel_chunk_size=None)
Bases: object
Class implementation to run all filters on a list of SMILES, with the option of setting a chunk size to process the input in batches. The dataset is filtered as follows:
>>> from chemFilters.core import CoreFilter
>>> core_filter = CoreFilter()  # all filters enabled by default
>>> filtered_df = core_filter(smiles, chunksize=100)
- Parameters:
rdkit_filter (bool) – toggle applying rdkit filters to smiles. Defaults to True.
pep_filter (bool) – toggle applying peptide filters to smiles. Defaults to True.
silly_filter (bool) – toggle applying silly filters to smiles. Defaults to True.
bloom_filter (bool) – toggle applying bloom filters to smiles. Defaults to True.
rdfilter_subset (str) – subset of the rdkit filters to be applied. For the available filters, see RdkitFilters.available_filters. Defaults to “ALL”.
rdfilter_output (str) – output format of the rdkit filters. Available: ‘bool’ and ‘string’. Defaults to “string”.
std_mols (bool) – whether to standardize the mols. Defaults to True.
std_method (str) – standardization method to be used. Defaults to “chembl”.
n_jobs (int) – number of jobs to run in parallel. Defaults to 1.
parallel_chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.
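The batching idea mentioned in the class description can be illustrated with a small standalone sketch. It does not reproduce how CoreFilter or ParallelApplier split their input; `iter_chunks` is a hypothetical helper, and treating `None` as "one single batch" is an assumption made only for this illustration (the library's auto-calculation rule is not documented here).

```python
from typing import Iterator, List, Optional, Sequence


def iter_chunks(
    items: Sequence[str], chunk_size: Optional[int] = None
) -> Iterator[List[str]]:
    """Yield consecutive batches holding at most ``chunk_size`` items.

    Hypothetical helper: ``None`` is treated as "everything in one batch",
    which is an assumption and not the library's auto-calculation.
    """
    if chunk_size is None:
        chunk_size = max(len(items), 1)
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    # Step through the input in strides of chunk_size; the last batch
    # may be shorter than the others.
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "C1CCCCC1"]
batches = list(iter_chunks(smiles, chunk_size=2))
```

Each batch could then be passed to a filter call independently, which keeps memory use bounded for large inputs.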
- __init__(rdkit_filter=True, pep_filter=True, silly_filter=True, bloom_filter=True, rdfilter_subset='ALL', rdfilter_output='string', std_mols=True, std_method='chembl', n_jobs=1, parallel_chunk_size=None)
Initialize the CoreFilter class. The filters are initialized with the default parameters.
- Parameters:
rdkit_filter (bool) – toggle applying rdkit filters to smiles. Defaults to True.
pep_filter (bool) – toggle applying peptide filters to smiles. Defaults to True.
silly_filter (bool) – toggle applying silly filters to smiles. Defaults to True.
bloom_filter (bool) – toggle applying bloom filters to smiles. Defaults to True.
rdfilter_subset (str) – subset of the rdkit filters to be applied. For the available filters, see RdkitFilters.available_filters. Defaults to “ALL”.
rdfilter_output (str) – output format of the rdkit filters. Available: ‘bool’ and ‘string’. Defaults to “string”.
std_mols (bool) – whether to standardize the mols. Defaults to True.
std_method (str) – standardization method to be used. Defaults to “chembl”.
n_jobs (int) – number of jobs to run in parallel. Defaults to 1.
parallel_chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.
- Return type:
None
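As a rough sketch of how the documented options might be assembled before constructing the class, the snippet below collects the defaults from the signature above and checks the one option whose allowed values are listed (`rdfilter_output`). `build_core_filter_kwargs` and both constants are hypothetical names introduced for illustration; they are not part of chemFilters, and the library may validate its arguments differently.

```python
from typing import Any, Dict

# Defaults copied from the documented __init__ signature.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "rdkit_filter": True,
    "pep_filter": True,
    "silly_filter": True,
    "bloom_filter": True,
    "rdfilter_subset": "ALL",
    "rdfilter_output": "string",
    "std_mols": True,
    "std_method": "chembl",
    "n_jobs": 1,
    "parallel_chunk_size": None,
}

# Output formats listed in the parameter description.
ALLOWED_RDFILTER_OUTPUT = {"bool", "string"}


def build_core_filter_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the documented defaults (hypothetical helper)."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown options: {sorted(unknown)}")
    kwargs = {**DOCUMENTED_DEFAULTS, **overrides}
    if kwargs["rdfilter_output"] not in ALLOWED_RDFILTER_OUTPUT:
        raise ValueError("rdfilter_output must be 'bool' or 'string'")
    return kwargs


# Disable the peptide filter, request boolean output, use four jobs.
kwargs = build_core_filter_kwargs(
    pep_filter=False, rdfilter_output="bool", n_jobs=4
)
# With chemFilters installed, these could be passed as CoreFilter(**kwargs).
```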