Chemistry utilities

ChemStandardizer

class chemFilters.chem.standardizers.ChemStandardizer(method='chembl', n_jobs=5, isomeric=True, progress=False, rdkit_loglevel='warning', from_smi=False, return_smi=True, chunk_size=None, **kwargs)[source]

Bases: MoleculeHandler

A class to standardize molecules/SMILES strings. The standardizer's settings are selected at initialization. The object can then be called on an iterable containing molecules/SMILES strings to apply the standardization.

__init__(method='chembl', n_jobs=5, isomeric=True, progress=False, rdkit_loglevel='warning', from_smi=False, return_smi=True, chunk_size=None, **kwargs)[source]

Initializes the ChemStandardizer class.

Parameters:
  • method (str | Callable) – standardization pipeline to use. Currently supports “canon”, “chembl”, “papyrus”, “molvs”, or a callable. “canon” is RDKit’s SMILES canonicalization. If a callable is given, ensure it takes rdkit.Mol objects as input. Defaults to “chembl”.

  • n_jobs (int) – number of jobs running in parallel. Defaults to 5.

  • isomeric (bool) – output smiles with isomeric information. Defaults to True.

  • progress (bool) – display a progress bar with tqdm. Defaults to False.

  • rdkit_loglevel (str) – one of debug, info, warning, error, critical. Defaults to “warning”.

  • from_smi (bool) – if True, the standardizer will expect SMILES strings as input. Defaults to False.

  • chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.

  • kwargs – additional keyword arguments to pass to the standardizer.

  • return_smi (bool)

Raises:
  • ImportError – if method is “papyrus” but the optional dependency is not installed.

  • ValueError – when an invalid method is passed.

Return type:

None
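
As a usage illustration, the sketch below builds a standardizer from the constructor signature documented above and calls it on a list of SMILES strings. The choice of method="canon" and n_jobs=1, and the example SMILES, are assumptions made for illustration. The import is guarded so the snippet still runs where chemFilters is not installed, in which case standardized stays None.

```python
# Example inputs (illustrative): ethanol and benzoic acid as SMILES strings.
smiles = ["OCC", "OC(=O)c1ccccc1"]

try:
    from chemFilters.chem.standardizers import ChemStandardizer
except ImportError:
    # chemFilters (or one of its dependencies, such as RDKit) is missing.
    ChemStandardizer = None

standardized = None
if ChemStandardizer is not None:
    # from_smi=True tells the standardizer to expect SMILES strings as input;
    # "canon" selects RDKit's SMILES canonicalization.
    standardizer = ChemStandardizer(method="canon", n_jobs=1, from_smi=True)
    # The configured object is called directly on an iterable of inputs.
    standardized = standardizer(smiles)

print(standardized)
```

Passing a callable as method instead of a name should work the same way, provided the callable accepts rdkit.Mol objects as stated in the parameter list.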

papyrusStandardizer(stdin, **kwargs)[source]

Uses the Papyrus standardizer to standardize a SMILES string. By default, this standardization pipeline removes stereocenters, so beware of the isomeric flag. Accepts extra keyword arguments that will be passed to the standardizer.

For more information: https://github.com/OlivierBeq/Papyrus_structure_pipeline

Parameters:
  • stdin (str | Mol) – standard input; single SMILES strings or single rdkit.Chem.Mol object depending on the value of self._from_smi.

  • isomeric – output isomeric SMILES. Defaults to True.

  • kwargs – additional keyword arguments to pass to the standardizer.

Returns:

standardized smiles string

Return type:

str

chemblStandardizer(stdin, neutralize=True, **kwargs)[source]

Uses the ChEMBL standardizer to standardize a SMILES string. Accepts extra keyword arguments that will be passed to the standardizer.

Parameters:
  • stdin (str | Mol) – standard input; single SMILES strings or single rdkit.Chem.Mol object depending on the value of self._from_smi.

  • isomeric – output isomeric smiles. Defaults to True.

  • neutralize (bool) – configure get_parent_mol to neutralize the molecule. Defaults to True.

  • kwargs – keyword arguments to pass to the get_parent_mol and the standardize_mol functions.

Returns:

standardized smiles string

Return type:

str

molvsStandardizer(stdin, **kwargs)[source]

Uses molvs to standardize a SMILES string. By default, this standardization pipeline applies the functions canonicalize_tautomer and standardize implemented in the package.

For more information, see the docs: https://molvs.readthedocs.io/en/latest/

Parameters:
  • stdin (str | Mol) – standard input; single SMILES strings or single rdkit.Chem.Mol object depending on the value of self._from_smi.

  • isomeric – output isomeric SMILES. Defaults to True.

  • kwargs – additional keyword arguments to pass to the molvs.Standardizer object.

Returns:

standardized smiles string

Return type:

str

InchiHandling

class chemFilters.chem.standardizers.InchiHandling(convert_to, n_jobs=5, progress=False, rdkit_loglevel='warning', from_smi=False, chunk_size=None)[source]

Bases: MoleculeHandler

Obtain a list of InChIs, InChIKeys or connectivities from a list of SMILES. The settings are selected at initialization. The object can then be called on an iterable containing SMILES strings to obtain the desired identifier.

Parameters:
  • convert_to (str)

  • n_jobs (int)

  • progress (bool)

  • rdkit_loglevel (str)

  • from_smi (bool)

  • chunk_size (int)

__init__(convert_to, n_jobs=5, progress=False, rdkit_loglevel='warning', from_smi=False, chunk_size=None)[source]

Initialize the InchiHandling class.

Parameters:
  • convert_to (str) – identifier to convert the SMILES to. Can be “inchi”, “inchikey” or “connectivity”.

  • n_jobs (int) – Number of jobs for processing in parallel. Defaults to 5.

  • progress (bool) – whether to show the progress bar. Defaults to False.

  • rdkit_loglevel (str) – one of debug, info, warning, error, critical. Defaults to “warning”.

  • from_smi (bool) – if True, the standardizer will expect SMILES strings as input. Defaults to False.

  • chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.

Raises:

ValueError – if the convert_to argument is not one of the three options.

Return type:

None

MoleculeHandler

class chemFilters.chem.interface.MoleculeHandler(from_smi=False, isomeric=True)[source]

Bases: object

Interface class for handling molecules. Implemented so that the from_smi functionality can be reused by other classes in the package.

pmap(n_jobs, progress, stdin, func, pickable=True, custom_desc=None, chunk_size=None)[source]

Helper function to map a function to an iterable using ParallelApplier.

Parameters:
  • n_jobs (int) – number of jobs for parallel processing.

  • progress (bool) – display progress bar with tqdm.

  • stdin (Iterable) – iterable to map the function to.

  • func (Callable) – function to be mapped to the variables.

  • pickable (bool) – bool indicating whether the function can be parallelized. Defaults to True.

  • custom_desc (str) – custom description for the progress bar. Defaults to None.

  • chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.

Returns:

A list of the results of the function mapped to the iterable.
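
To make the role of n_jobs and chunk_size concrete, here is a minimal standard-library sketch of a chunked parallel map. It is not the ParallelApplier implementation: the name chunked_map, the use of a thread pool, and the even-split rule used when chunk_size is None are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil
from typing import Callable, Iterable, List, Optional


def chunked_map(
    func: Callable,
    items: Iterable,
    n_jobs: int = 2,
    chunk_size: Optional[int] = None,
) -> List:
    """Apply func to every item, one chunk of items per worker task."""
    items = list(items)
    if not items:
        return []
    if chunk_size is None:
        # Assumed heuristic: split the input evenly across the workers.
        chunk_size = max(1, ceil(len(items) / n_jobs))
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

    def apply_to_chunk(chunk):
        return [func(item) for item in chunk]

    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map yields results in submission order, so the
        # flattened output lines up with the input.
        mapped = pool.map(apply_to_chunk, chunks)
        return [result for chunk_result in mapped for result in chunk_result]


lengths = chunked_map(len, ["C", "CC", "CCO"], n_jobs=2)
print(lengths)
```

Larger chunks reduce scheduling overhead per item, while smaller chunks balance the load better when some items take much longer than others.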

Utility functions

Utility functions to be used by the chemFilters.chem subpackage.

chemFilters.chem.utils.rdkit_log_controller(level)[source]

Context manager for controlling the RDKit logger level.

Parameters:

level – desired logging level. One of debug, info, warning, error, critical.
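
The general pattern behind a level-controlling context manager can be sketched with the standard library alone. In the snippet below, Python's logging module stands in for RDKit's logger, and log_level_controller is a hypothetical name; this is not the source of rdkit_log_controller, only an illustration of setting a level on entry and restoring it on exit.

```python
import logging
from contextlib import contextmanager

VALID_LEVELS = ("debug", "info", "warning", "error", "critical")


@contextmanager
def log_level_controller(logger: logging.Logger, level: str):
    """Temporarily set the logger's level, restoring the old one on exit."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")
    previous = logger.level
    logger.setLevel(getattr(logging, level.upper()))
    try:
        yield logger
    finally:
        # Runs even if the body raises, so the level never leaks out.
        logger.setLevel(previous)


demo_logger = logging.getLogger("demo")
demo_logger.setLevel(logging.INFO)

with log_level_controller(demo_logger, "error"):
    level_inside = demo_logger.level

level_after = demo_logger.level
```

Restoring the level in a finally clause is the key design choice: a failure while parsing a molecule inside the block does not leave the logger silenced for the rest of the program.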

chemFilters.chem.utils.molToConnectivity(mol)[source]

Converts an RDKit molecule to a connectivity string.

Parameters:

mol (Mol)
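
The source does not define "connectivity". One plausible reading, assumed here, is the first 14-character block of the InChIKey, which hashes the molecular skeleton and is shared by stereoisomers. The sketch below works on InChIKey strings directly, so it needs no RDKit; connectivity_block is a hypothetical helper, not part of chemFilters.

```python
def connectivity_block(inchikey: str) -> str:
    """Return the first hyphen-separated block of an InChIKey."""
    first_block, _, _ = inchikey.partition("-")
    if len(first_block) != 14:
        raise ValueError(f"not a valid InChIKey: {inchikey!r}")
    return first_block


# InChIKeys of the two enantiomers of alanine.
l_alanine = "QNAYBMKLOCPYGJ-REOHZXIVSA-N"
d_alanine = "QNAYBMKLOCPYGJ-UWTATZPHSA-N"

# The stereo information lives in the second block, so both
# enantiomers map to the same first block.
same_skeleton = connectivity_block(l_alanine) == connectivity_block(d_alanine)
print(connectivity_block(l_alanine), same_skeleton)
```

Under this reading, grouping by connectivity is a way to deduplicate a dataset while ignoring stereochemistry.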

chemFilters.chem.utils.molToInchiKey(mol)[source]

Converts an RDKit molecule to an InChIKey string.

Parameters:

mol (Mol)

chemFilters.chem.utils.molToInchi(mol)[source]

Converts an RDKit molecule to an InChI string.

Parameters:

mol (Mol)

chemFilters.chem.utils.molToCanon(mol, isomeric=True)[source]

Parameters:

isomeric (bool)