Filters¶
RdkitFilters¶
- class chemFilters.filters.rdkit_filters.RdkitFilters(filter_type='ALL', n_jobs=1, from_smi=False, chunk_size=None)[source]¶
Bases:
MoleculeHandler
- __init__(filter_type='ALL', n_jobs=1, from_smi=False, chunk_size=None)[source]¶
Initialize the RdkitFilters object.
- Parameters:
filter_type – type of filter from RDKit FilterCatalogs. Defaults to “ALL”.
n_jobs – number of jobs to run in parallel. Defaults to 1.
from_smi (bool) – if True, will do the conversion from SMILES to RDKit Mol object. Defaults to False.
chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.
- Return type:
None
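The interplay between n_jobs and chunk_size can be pictured with a small, library-independent sketch. The helper name split_into_chunks is hypothetical, and the rule used when chunk_size is None (one roughly equal chunk per job) is an assumption made for illustration only; the rule that ParallelApplier actually applies is not documented here.

```python
from math import ceil


def split_into_chunks(items, chunk_size=None, n_jobs=1):
    """Split a list into consecutive chunks (hypothetical helper)."""
    if chunk_size is None:
        # Assumption: aim for one roughly equal chunk per job.
        chunk_size = max(1, ceil(len(items) / n_jobs))
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "CCC"]

# Explicit chunk size: chunks of at most two molecules each.
print(split_into_chunks(smiles, chunk_size=2))

# No chunk size given: derived from the number of jobs (assumed rule).
print(split_into_chunks(smiles, n_jobs=2))
```

Smaller chunks tend to balance work better across jobs, while larger chunks reduce scheduling overhead.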
- property available_filters¶
List of available filters from RDKit FilterCatalogs.
- filter_mols(stdin, match_type='string')[source]¶
Filter molecules using RDKit FilterCatalogs.
- Parameters:
stdin (List[Mol | str]) – list of RDKit Mol objects or SMILES strings if self._from_smi is True
match_type (str) – values within the flagging dataframe. If bool, will spare retrieving substructures and descriptions. If string, will have the description of the filter that was matched. Defaults to string.
- Returns:
filter_names: list of filter names that were matched.
descriptions: list of filter descriptions that were matched.
substructs: list of substructures that were matched.
- get_flagging_df(stdin, match_type='string', save_matches=False)[source]¶
Flag molecules using the defined RDKit FilterCatalogs and return a dataframe with all the detected filters as columns and the molecules as rows. Items within the dataframe will be the description of the molecular filter that was caught. If save_matches is True, the filter names, descriptions, and substructures are also saved as attributes.
- Parameters:
stdin (List[Mol | str]) – list of RDKit Mol objects or SMILES strings if self._from_smi is True
match_type (str) – values within the flagging dataframe. If bool, will spare retrieving substructures and descriptions. If string, will have the description of the filter that was matched. Defaults to string.
save_matches (bool) – if True, will save the filter names, descriptions, and substructures as attributes. Defaults to False.
- Returns:
dataframe with columns as filter types and rows as molecules.
- Return type:
pd.DataFrame
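A minimal usage sketch, built only from the signatures documented above. The wrapper function flag_smiles and the example SMILES list are hypothetical names introduced for illustration. The import is done inside the function so that the snippet can be loaded even where chemFilters is not installed.

```python
def flag_smiles(smiles, match_type="string"):
    """Return the flagging dataframe for a list of SMILES (hypothetical wrapper)."""
    # Imported lazily; requires chemFilters and RDKit to be installed.
    from chemFilters.filters.rdkit_filters import RdkitFilters

    # from_smi=True asks the class to convert SMILES to RDKit Mol objects.
    rdkit_filter = RdkitFilters(filter_type="ALL", n_jobs=1, from_smi=True)
    return rdkit_filter.get_flagging_df(smiles, match_type=match_type)


# Illustrative inputs: ethanol, benzene, nitrobenzene.
EXAMPLE_SMILES = ["CCO", "c1ccccc1", "O=[N+]([O-])c1ccccc1"]

# With chemFilters installed, the call would look like:
# flags = flag_smiles(EXAMPLE_SMILES, match_type="bool")
```

Passing match_type="bool" should be the cheaper option when only the presence of a flag matters, since the documentation states that it spares retrieving substructures and descriptions.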
MolbloomFilters¶
Note
Requires the allfilters extra: pip install 'chem-filters[allfilters]'
- class chemFilters.filters.bloom_filters.MolbloomFilters(from_smi=False, standardize=False, std_method='chembl', n_jobs=1, chunk_size=None, **kwargs)[source]¶
Bases:
MoleculeHandler
Wrapper class for molbloom. Requires molbloom to be installed.
- Parameters:
from_smi (bool) – treats standard inputs (stdin) as smiles. Defaults to False.
standardize (bool) – whether to standardize stdin or not. Defaults to False.
std_method (str) – SMILES/mol standardization method. Available: canon, chembl, papyrus. Defaults to “chembl”.
n_jobs – number of jobs to run in parallel. Defaults to 1.
chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.
- __init__(from_smi=False, standardize=False, std_method='chembl', n_jobs=1, chunk_size=None, **kwargs)[source]¶
Initialize the MolbloomFilters class.
- Parameters:
from_smi (bool) – treats standard inputs (stdin) as smiles. Defaults to False.
standardize (bool) – whether to standardize stdin or not. Defaults to False.
std_method (str) – SMILES/mol standardization method. Available: canon, chembl, papyrus. Defaults to “chembl”.
n_jobs – number of jobs to run in parallel. Defaults to 1.
chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.
- buy_smi(smi, catalog='zinc-instock')[source]¶
Wrapper of molbloom.buy. Returns True if the SMILES is probably in the catalog, False if it is definitely not.
- get_flagging_df(stdin)[source]¶
Returns a dataframe with the flagging results for each catalog. Flags will be the results from molbloom.buy, where True means the SMILES is probably in the catalog and False means it is definitely not. For more information, see the original repo: https://github.com/whitead/molbloom
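The asymmetry between "probably in the catalog" and "definitely not" is a general property of Bloom filters. The toy class below is a sketch of that data structure in pure Python, not molbloom's implementation; the class name TinyBloomFilter, its sizes, and its hashing scheme are assumptions chosen for readability.

```python
import hashlib


class TinyBloomFilter:
    """Toy Bloom filter over strings (illustrative, not molbloom's code)."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive n_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # Any unset bit proves the item was never added (no false negatives).
        # All bits set only means "probably added" (false positives possible).
        return all(self.bits[pos] for pos in self._positions(item))


catalog = TinyBloomFilter()
for smi in ["CCO", "c1ccccc1", "CC(=O)O"]:
    catalog.add(smi)

print("CCO" in catalog)  # True: added items are always reported as present
print("CCCCCCCCN" in catalog)  # almost certainly False, but a false positive is possible
```

Because membership is tested on the exact string, the same molecule written as a different SMILES string would hash to different positions.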
PeptideFilters¶
Note
Requires the allfilters extra: pip install 'chem-filters[allfilters]'
- class chemFilters.filters.pep_filters.PeptideFilters(filter_type='all', from_smi=False, n_jobs=1, chunk_size=None)[source]¶
Bases:
MoleculeHandler
Wrapper class for PepSift, a tool for identifying peptides and their derivatives from small molecule datasets. For the original repo, see: https://github.com/OlivierBeq/PepSift/tree/master
- __init__(filter_type='all', from_smi=False, n_jobs=1, chunk_size=None)[source]¶
Initialize the PeptideFilters class.
- Parameters:
filter_type (str | int) – filter type to initialize a PepSift object. See the available filters in self.available_filters. Defaults to “all”.
from_smi (bool) – treats standard inputs (stdin) as smiles. Defaults to False.
n_jobs (int) – number of jobs to run in parallel. Defaults to 1.
chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.
- Return type:
None
- property available_filters¶
List of available filters on pepsift.
- filter_mols(stdin)[source]¶
Filter molecules using the designated pepsift filter. If sift_level is None (the default), it is loaded from self.filter.
SillyMolSpotterFilter¶
Note
Requires the allfilters extra: pip install 'chem-filters[allfilters]'
- class chemFilters.filters.silly_filters.SillyMolSpotterFilter(from_smi=False, standardize=False, std_method='chembl', n_jobs=1, chunk_size=None, **kwargs)[source]¶
Bases:
MoleculeHandler
Wrapper class for molspotter, a tool based on Pat Walters’ silly walks filter. It helps find unusual molecules in a dataset based on the detection of unusual bits on a hashed ECFP fingerprint. For more information, see the original repo: https://github.com/OlivierBeq/molspotter
- Parameters:
from_smi – treats standard inputs (stdin) as smiles. Defaults to False.
standardize – whether to standardize stdin or not. Defaults to False.
std_method – SMILES/mol standardization method. Available: canon, chembl, papyrus. Defaults to “chembl”.
n_jobs – number of jobs to run in parallel. Defaults to 1.
chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.
- __init__(from_smi=False, standardize=False, std_method='chembl', n_jobs=1, chunk_size=None, **kwargs)[source]¶
Initialize the SillyMolSpotterFilter class.
- Parameters:
from_smi – treats standard inputs (stdin) as smiles. Defaults to False.
standardize – whether to standardize stdin or not. Defaults to False.
std_method – SMILES/mol standardization method. Available: canon, chembl, papyrus. Defaults to “chembl”.
n_jobs – number of jobs to run in parallel. Defaults to 1.
chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.
- score_smi(smi, spotter_name='chembl')[source]¶
Score a SMILES string with a pretrained spotter, indicating how silly the processed molecule is.
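The idea behind a "silly" score can be sketched without any chemistry toolkit: treat a fingerprint as a set of on-bit indices and measure the fraction of a molecule's bits that never occur in a reference collection. The function silly_score and the toy bit sets below are hypothetical; molspotter's pretrained spotters and its exact scoring may differ.

```python
def silly_score(mol_bits, reference_bits):
    """Fraction of a molecule's fingerprint bits unseen in the reference set."""
    mol_bits = set(mol_bits)
    if not mol_bits:
        return 0.0  # no bits set, nothing unusual to report
    unseen = mol_bits - set(reference_bits)
    return len(unseen) / len(mol_bits)


# Toy reference: bit indices observed across a reference collection.
reference_bits = {1, 5, 9, 42, 77}

print(silly_score({1, 5, 9}, reference_bits))         # 0.0
print(silly_score({1, 5, 300, 999}, reference_bits))  # 0.5
```

Under this sketch, a score of 0.0 means every bit of the molecule was seen in the reference set, and values closer to 1.0 indicate a more unusual molecule.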