Filters

RdkitFilters

class chemFilters.filters.rdkit_filters.RdkitFilters(filter_type='ALL', n_jobs=1, from_smi=False, chunk_size=None)

Bases: MoleculeHandler

__init__(filter_type='ALL', n_jobs=1, from_smi=False, chunk_size=None)

Initialize the RdkitFilters object.

Parameters:
  • filter_type – type of filter from RDKit FilterCatalogs. Defaults to “ALL”.

  • n_jobs – number of jobs to run in parallel. Defaults to 1.

  • from_smi (bool) – if True, inputs are converted from SMILES strings to RDKit Mol objects. Defaults to False.

  • chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.

Return type:

None
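The chunk_size and n_jobs parameters control how the input is split up for ParallelApplier. The stdlib-only sketch below illustrates chunked parallel application in general; it is not the ParallelApplier implementation. The helpers `auto_chunk_size` and `apply_in_chunks` are hypothetical, and the auto-calculation rule (about four chunks per worker) is an assumption, since the real heuristic is not documented here.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def auto_chunk_size(n_items: int, n_jobs: int) -> int:
    """Hypothetical heuristic: about four chunks per worker, at least 1 item each."""
    return max(1, math.ceil(n_items / (n_jobs * 4)))


def apply_in_chunks(
    func: Callable[[T], R],
    items: List[T],
    n_jobs: int = 1,
    chunk_size: Optional[int] = None,
) -> List[R]:
    """Apply func to every item, processing the input in chunks across n_jobs workers."""
    if chunk_size is None:
        chunk_size = auto_chunk_size(len(items), n_jobs)
    chunks = [items[i : i + chunk_size] for i in range(0, len(items), chunk_size)]

    def run_chunk(chunk: List[T]) -> List[R]:
        return [func(item) for item in chunk]

    if n_jobs == 1:
        results = [run_chunk(c) for c in chunks]
    else:
        # Threads avoid pickling the nested run_chunk function; CPU-bound
        # RDKit work would usually be distributed over processes instead.
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            results = list(pool.map(run_chunk, chunks))  # map keeps chunk order
    # Flatten the per-chunk results back into one list in input order.
    return [r for chunk_result in results for r in chunk_result]


smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "O"]
print(apply_in_chunks(len, smiles, n_jobs=2, chunk_size=2))  # [3, 8, 7, 3, 1]
```

Whatever the chunk size, the output order matches the input order, so results can be aligned with the original molecules.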

property available_filters

List of available filters from RDKit FilterCatalogs.

filter_mols(stdin, match_type='string')

Filter molecules using RDKit FilterCatalogs.

Parameters:
  • stdin (List[Mol | str]) – list of RDKit Mol objects, or SMILES strings if self._from_smi is True.

  • match_type (str) – format of the match values. If “bool”, retrieving substructures and descriptions is skipped. If “string”, the description of the matched filter is included. Defaults to “string”.

Returns:

  • filter_names – list of filter names that were matched.

  • descriptions – list of filter descriptions that were matched.

  • substructs – list of substructures that were matched.

Return type:

tuple of lists (filter_names, descriptions, substructs)

get_flagging_df(stdin, match_type='string', save_matches=False)

Flag molecules using the defined RDKit FilterCatalogs and return a dataframe with the detected filters as columns and the molecules as rows. With match_type=“string”, each cell holds the description of the molecular filter that was matched. If save_matches is True, the filter names, descriptions, and substructures are also saved as attributes.

Parameters:
  • stdin (List[Mol | str]) – list of RDKit Mol objects or SMILES strings if self._from_smi is True

  • match_type (str) – type of values within the flagging dataframe. If “bool”, retrieving substructures and descriptions is skipped. If “string”, each cell holds the description of the matched filter. Defaults to “string”.

  • save_matches (bool) – if True, will save the filter names, descriptions, and substructures as attributes. Defaults to False.

Returns:

dataframe with columns as filter types and rows as molecules.

Return type:

pd.DataFrame
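The table layout described above (one row per molecule, one column per detected filter, cells holding either the matched filter's description or a boolean) can be illustrated without RDKit or pandas. In this sketch the per-molecule dictionaries stand in for FilterCatalog matches; the helper `build_flag_table`, the filter names, and the descriptions are hypothetical, and using None/False for unmatched cells is an assumption rather than documented RdkitFilters behavior.

```python
from typing import Dict, List, Union

Cell = Union[str, bool, None]


def build_flag_table(
    matches: List[Dict[str, str]], match_type: str = "string"
) -> List[Dict[str, Cell]]:
    """Turn per-molecule {filter_name: description} matches into table rows.

    Columns are the union of the filters detected in any molecule, so a
    filter that never matches gets no column.
    """
    if match_type not in ("string", "bool"):
        raise ValueError("match_type must be 'string' or 'bool'")
    columns = sorted({name for mol_matches in matches for name in mol_matches})
    rows = []
    for mol_matches in matches:
        if match_type == "bool":
            # Only record whether each filter fired; descriptions are skipped.
            row = {col: col in mol_matches for col in columns}
        else:
            # Record the matched filter's description, None where nothing matched.
            row = {col: mol_matches.get(col) for col in columns}
        rows.append(row)
    return rows


# Made-up matches for three molecules.
example = [
    {"filter_a": "desc_a"},
    {},
    {"filter_a": "desc_a", "filter_b": "desc_b"},
]
for row in build_flag_table(example, match_type="bool"):
    print(row)
```

The rows can be passed straight to pd.DataFrame to obtain the molecules-by-filters layout.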

MolbloomFilters

Note

Requires the allfilters extra: pip install 'chem-filters[allfilters]'

class chemFilters.filters.bloom_filters.MolbloomFilters(from_smi=False, standardize=False, std_method='chembl', n_jobs=1, chunk_size=None, **kwargs)

Bases: MoleculeHandler

Wrapper class for molbloom. Requires molbloom to be installed.

Parameters:
  • from_smi (bool) – treats standard inputs (stdin) as smiles. Defaults to False.

  • standardize (bool) – whether to standardize stdin or not. Defaults to False.

  • std_method (str) – SMILES/mol standardization method. Available: canon, chembl, papyrus. Defaults to “chembl”.

  • n_jobs – number of jobs to run in parallel. Defaults to 1.

  • chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.

__init__(from_smi=False, standardize=False, std_method='chembl', n_jobs=1, chunk_size=None, **kwargs)

Initialize the MolbloomFilters class.

Parameters:
  • from_smi (bool) – treats standard inputs (stdin) as smiles. Defaults to False.

  • standardize (bool) – whether to standardize stdin or not. Defaults to False.

  • std_method (str) – SMILES/mol standardization method. Available: canon, chembl, papyrus. Defaults to “chembl”.

  • n_jobs – number of jobs to run in parallel. Defaults to 1.

  • chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.

get_catalogs()

Available catalogs in molbloom.

buy_smi(smi, catalog='zinc-instock')

Wrapper of molbloom.buy. Returns True if the SMILES is probably in the catalog, False if it is definitely not.

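Parameters:
  • smi (str) – SMILES string to query.

  • catalog (str) – name of the molbloom catalog to query. Defaults to “zinc-instock”.
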
get_flagging_df(stdin)

Returns a dataframe with the flagging results for each catalog. Flags are the results from molbloom.buy, where True means the SMILES is probably in the catalog and False means it is definitely not. For more information, see the original repo: https://github.com/whitead/molbloom

Parameters:

stdin (List[Mol | str]) – standard input; a list of SMILES strings or rdkit.Chem.Mol objects depending on the value of self._from_smi.

Returns:

dataframe with the flagging results for each catalog.

Return type:

pd.DataFrame
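The “probably in / definitely not in” semantics come from the Bloom filter data structure that molbloom is built on. The stdlib-only sketch below shows why a Bloom filter can give false positives but never false negatives; it is a generic illustration, not molbloom's implementation, and the `BloomFilter` class, its bit-array size, number of hashes, and the example SMILES are all assumptions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter: k hashed positions per item in an m-bit array."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 3) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)  # one byte per bit, for simplicity

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True: every position is set, so the item is *probably* present
        # (other items may have set the same bits).
        # False: some position is unset, so the item was *definitely* never added.
        return all(self.bits[pos] for pos in self._positions(item))


catalog = BloomFilter()
for smi in ["CCO", "c1ccccc1", "CC(=O)O"]:
    catalog.add(smi)

print("CCO" in catalog)  # True: added items always test positive
```

Because membership is tested on exact strings, the same molecule written as a different SMILES would not be found unless both sides are canonicalized consistently, which is one reason to standardize inputs before querying.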

PeptideFilters

Note

Requires the allfilters extra: pip install 'chem-filters[allfilters]'

class chemFilters.filters.pep_filters.PeptideFilters(filter_type='all', from_smi=False, n_jobs=1, chunk_size=None)

Bases: MoleculeHandler

Wrapper class for PepSift, a tool for identifying peptides and their derivatives from small molecule datasets. For the original repo, see: https://github.com/OlivierBeq/PepSift/tree/master

Parameters:
  • filter_type (str | int) – filter type to initialize a PepSift object. See the available filters in self.available_filters. Defaults to “all”.

  • from_smi (bool) – treats standard inputs (stdin) as smiles. Defaults to False.

  • n_jobs (int) – number of jobs to run in parallel. Defaults to 1.

  • chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.

__init__(filter_type='all', from_smi=False, n_jobs=1, chunk_size=None)

Initialize the PeptideFilters class.

Parameters:
  • filter_type (str | int) – filter type to initialize a PepSift object. See the available filters in self.available_filters. Defaults to “all”.

  • from_smi (bool) – treats standard inputs (stdin) as smiles. Defaults to False.

  • n_jobs (int) – number of jobs to run in parallel. Defaults to 1.

  • chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.

Return type:

None

property available_filters

List of available filters on pepsift.

filter_mols(stdin)

Filter molecules using the designated pepsift filter. The sift level is taken from self.filter, as set at initialization.

Parameters:

stdin (List[Mol | str]) – standard input; a list of SMILES strings or rdkit.Chem.Mol objects depending on the value of self._from_smi.

Returns:

a list of booleans indicating whether each molecule is a peptide according to the filter level initialized on self.filter.

Return type:

List[bool]
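Since filter_mols returns one boolean per molecule, the result can serve as a mask to split a dataset into peptides and non-peptides. A minimal sketch, assuming the booleans come back in input order; the `split_by_mask` helper is hypothetical and the mask values are made up to stand in for a filter_mols result.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_mask(items: Sequence[T], mask: Sequence[bool]) -> Tuple[List[T], List[T]]:
    """Split items into (flagged, unflagged) using a parallel boolean mask."""
    if len(items) != len(mask):
        raise ValueError("items and mask must have the same length")
    flagged = [item for item, flag in zip(items, mask) if flag]
    unflagged = [item for item, flag in zip(items, mask) if not flag]
    return flagged, unflagged


smiles = ["NCC(=O)NCC(=O)O", "c1ccccc1", "CCO"]
is_peptide = [True, False, False]  # hypothetical filter_mols output
peptides, others = split_by_mask(smiles, is_peptide)
print(peptides)  # ['NCC(=O)NCC(=O)O']
```

The length check guards against silently misaligning molecules and flags if the two lists ever diverge.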

get_flagging_df(stdin)

Flag the molecules according to all filter types available in pepsift.

Parameters:

stdin (List[Mol | str]) – standard input; a list of SMILES strings or rdkit.Chem.Mol objects depending on the value of self._from_smi.

Returns:

dataframe with the flags for each filter type.

Return type:

pd.DataFrame

SillyMolSpotterFilter

Note

Requires the allfilters extra: pip install 'chem-filters[allfilters]'

class chemFilters.filters.silly_filters.SillyMolSpotterFilter(from_smi=False, standardize=False, std_method='chembl', n_jobs=1, chunk_size=None, **kwargs)

Bases: MoleculeHandler

Wrapper class for molspotter, a tool based on Pat Walters’ silly walks filter. It helps find unusual molecules in a dataset by detecting unusual bits in a hashed ECFP fingerprint. For more information, see the original repo: https://github.com/OlivierBeq/molspotter

Parameters:
  • from_smi – treats standard inputs (stdin) as smiles. Defaults to False.

  • standardize – whether to standardize stdin or not. Defaults to False.

  • std_method – SMILES/mol standardization method. Available: canon, chembl, papyrus. Defaults to “chembl”.

  • n_jobs – number of jobs to run in parallel. Defaults to 1.

  • chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.

__init__(from_smi=False, standardize=False, std_method='chembl', n_jobs=1, chunk_size=None, **kwargs)

Initialize the SillyMolSpotterFilter class.

Parameters:
  • from_smi – treats standard inputs (stdin) as smiles. Defaults to False.

  • standardize – whether to standardize stdin or not. Defaults to False.

  • std_method – SMILES/mol standardization method. Available: canon, chembl, papyrus. Defaults to “chembl”.

  • n_jobs – number of jobs to run in parallel. Defaults to 1.

  • chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.

score_smi(smi, spotter_name='chembl')

Score a SMILES string with a pretrained spotter, indicating how silly the processed molecule is.

Parameters:
  • smi (str) – SMILES string to score.

  • spotter_name (str) – name of the pretrained spotter to use. Defaults to “chembl”.
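The silly-walks idea behind molspotter can be sketched with plain sets: collect every fingerprint bit seen in a reference set, then score a query by the fraction of its on-bits that never occur in the reference. This sketch assumes fingerprints are already available as sets of on-bit indices (in practice they would come from a hashed ECFP fingerprint); the helper names and bit values are made up, and molspotter's exact scoring may differ.

```python
from typing import Iterable, Set


def build_reference_bits(fingerprints: Iterable[Set[int]]) -> Set[int]:
    """Union of all on-bits observed in the reference fingerprints."""
    seen: Set[int] = set()
    for fp in fingerprints:
        seen |= fp
    return seen


def silly_score(query_bits: Set[int], reference_bits: Set[int]) -> float:
    """Fraction of the query's on-bits never seen in the reference set.

    0.0 means every bit is familiar; values closer to 1.0 mean more unusual.
    An empty fingerprint is scored 0.0 by convention in this sketch.
    """
    if not query_bits:
        return 0.0
    unseen = query_bits - reference_bits
    return len(unseen) / len(query_bits)


# Made-up on-bit sets standing in for hashed ECFP fingerprints.
reference = build_reference_bits([{1, 5, 9, 42}, {5, 9, 17}, {1, 17, 300}])
print(silly_score({1, 5, 9}, reference))         # 0.0
print(silly_score({1, 5, 777, 888}, reference))  # 0.5
```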

get_scoring_df(stdin)

Get a dataframe with the scoring results for each spotter.

Parameters:

stdin (List[Mol | str]) – standard input; a list of SMILES strings or rdkit.Chem.Mol objects depending on the value of self._from_smi.

Returns:

a dataframe with the scoring results for each spotter.

Return type:

pd.DataFrame