Chemistry utilities
ChemStandardizer
- class chemFilters.chem.standardizers.ChemStandardizer(method='chembl', n_jobs=5, isomeric=True, progress=False, rdkit_loglevel='warning', from_smi=False, return_smi=True, chunk_size=None, **kwargs)
Bases: MoleculeHandler
A class to standardize molecules/SMILES strings. Initialization selects the settings of the standardizer. The object can then be called on an iterable containing molecules/SMILES strings to apply the standardization.
- Parameters:
- __init__(method='chembl', n_jobs=5, isomeric=True, progress=False, rdkit_loglevel='warning', from_smi=False, return_smi=True, chunk_size=None, **kwargs)
Initializes the ChemStandardizer class.
- Parameters:
method (str | Callable) – standardization pipeline to use. Currently supports “canon”, “chembl”, “papyrus”, “molvs”, or a callable. If a callable is passed, ensure it takes rdkit.Mol objects as input. Defaults to “chembl”. “canon” is RDKit’s SMILES canonicalization.
n_jobs (int) – number of jobs running in parallel. Defaults to 5.
isomeric (bool) – output smiles with isomeric information. Defaults to True.
progress (bool) – display a progress bar with tqdm. Defaults to False.
rdkit_loglevel (str) – one of debug, info, warning, error, critical. Defaults to “warning”.
from_smi (bool) – if True, the standardizer will expect SMILES strings as input. Defaults to False.
chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.
kwargs – additional keyword arguments to pass to the standardizer.
return_smi (bool)
- Raises:
ImportError – if method is “papyrus” but the optional dependency is not installed.
ValueError – when an invalid method is passed.
- Return type:
None
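A minimal usage sketch based on the signature above. The wrapper name `standardize_smiles` is hypothetical, and the import is deferred so that the snippet loads even where chemFilters is not installed. This reference does not state the exact type of the returned collection, so the sketch passes it through unchanged.

```python
def standardize_smiles(smiles, method="chembl", n_jobs=1):
    """Standardize an iterable of SMILES strings (hypothetical helper)."""
    # Deferred import: chemFilters (and RDKit) are only needed at call time.
    from chemFilters.chem.standardizers import ChemStandardizer

    standardizer = ChemStandardizer(
        method=method,    # "canon", "chembl", "papyrus", "molvs" or a callable
        n_jobs=n_jobs,
        isomeric=True,    # output SMILES with isomeric information
        from_smi=True,    # inputs are SMILES strings, not rdkit Mol objects
        return_smi=True,
    )
    # The configured instance is called directly on the iterable.
    return standardizer(smiles)
```

With chemFilters installed, a call such as `standardize_smiles(["CCO", "c1ccccc1O"])` would run the ChEMBL pipeline on both inputs.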
- papyrusStandardizer(stdin, **kwargs)
Uses the Papyrus standardizer to standardize a SMILES string. By default, this standardization pipeline removes stereocenters, so beware of the isomeric flag. Accepts extra keyword arguments that will be passed to the standardizer.
For more information: https://github.com/OlivierBeq/Papyrus_structure_pipeline
- Parameters:
- Returns:
standardized smiles string
- Return type:
- chemblStandardizer(stdin, neutralize=True, **kwargs)
Uses the ChEMBL standardizer to standardize a SMILES string. Accepts extra keyword arguments that will be passed to the standardizer.
- Parameters:
stdin (str | Mol) – standard input; a single SMILES string or a single rdkit.Chem.Mol object, depending on the value of self._from_smi.
isomeric – output isomeric smiles. Defaults to True.
neutralize (bool) – configure get_parent_mol to neutralize the molecule. Defaults to True.
kwargs – keyword arguments to pass to the get_parent_mol and the standardize_mol functions.
- Returns:
standardized smiles string
- Return type:
- molvsStandardizer(stdin, **kwargs)
Uses molvs to standardize a SMILES string. By default, this standardization pipeline applies the functions canonicalize_tautomer and standardize implemented in the package.
For more information, see the docs: https://molvs.readthedocs.io/en/latest/
InchiHandling
- class chemFilters.chem.standardizers.InchiHandling(convert_to, n_jobs=5, progress=False, rdkit_loglevel='warning', from_smi=False, chunk_size=None)
Bases: MoleculeHandler
Obtain a list of InChIs, InChIKeys or connectivities from a list of SMILES. Initialization selects the settings. The object can then be called on an iterable containing SMILES strings to obtain the desired identifier.
- Parameters:
- __init__(convert_to, n_jobs=5, progress=False, rdkit_loglevel='warning', from_smi=False, chunk_size=None)
Initialize the InchiHandling class.
- Parameters:
convert_to (str) – what to convert the smiles to. Can be “inchi”, “inchikey” or “connectivity”.
n_jobs (int) – Number of jobs for processing in parallel. Defaults to 5.
progress (bool) – whether to show the progress bar. Defaults to False.
rdkit_loglevel (str) – one of debug, info, warning, error, critical. Defaults to “warning”.
from_smi (bool) – if True, the handler will expect SMILES strings as input. Defaults to False.
chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.
- Raises:
ValueError – if the convert_to argument is not one of the three options.
- Return type:
None
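A minimal usage sketch for the class above. The helper name `smiles_to_identifiers` is hypothetical, and the import is deferred so the snippet loads without chemFilters installed. The return value is whatever the handler produces when called on the iterable.

```python
def smiles_to_identifiers(smiles, convert_to="inchikey", n_jobs=1):
    """Convert an iterable of SMILES strings to identifiers (hypothetical helper)."""
    # Deferred import: chemFilters (and RDKit) are only needed at call time.
    from chemFilters.chem.standardizers import InchiHandling

    handler = InchiHandling(
        convert_to=convert_to,  # "inchi", "inchikey" or "connectivity"
        n_jobs=n_jobs,
        from_smi=True,          # inputs are SMILES strings
    )
    # Any other convert_to value is documented to raise ValueError at init.
    return handler(smiles)
```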
MoleculeHandler
- class chemFilters.chem.interface.MoleculeHandler(from_smi=False, isomeric=True)
Bases: object
Interface class for handling molecules. It exists so that the from_smi functionality can be reused by other classes in the package.
- pmap(n_jobs, progress, stdin, func, pickable=True, custom_desc=None, chunk_size=None)
Helper function to map a function to an iterable using ParallelApplier.
- Parameters:
n_jobs (int) – number of jobs for parallel processing.
progress (bool) – display progress bar with tqdm.
stdin (Iterable) – iterable to map the function to.
func (Callable) – function to be mapped to the variables.
pickable (bool) – bool indicating whether the function can be parallelized. Defaults to True.
custom_desc (str) – custom description for the progress bar. Defaults to None.
chunk_size (int) – size of chunks for ParallelApplier. If None, auto-calculated. Defaults to None.
- Returns:
A list of the results of the function mapped to the iterable.
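The general idea of a chunked parallel map can be sketched with the standard library alone. This is not the ParallelApplier implementation: `chunked_map` is a hypothetical stand-in, it uses threads rather than processes, and the "one chunk per job" default is an assumption about how an automatic chunk size might be chosen.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil


def chunked_map(n_jobs, stdin, func, pickable=True, chunk_size=None):
    """Apply func to every item of stdin and return results in input order."""
    items = list(stdin)
    if not items:
        return []
    # Serial fallback when the function cannot be handed to workers.
    if not pickable or n_jobs <= 1:
        return [func(item) for item in items]
    # Assumed auto-calculation: roughly one equal-sized chunk per job.
    if chunk_size is None:
        chunk_size = ceil(len(items) / n_jobs)
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

    def run_chunk(chunk):
        return [func(item) for item in chunk]

    # Executor.map yields chunk results in submission order.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = pool.map(run_chunk, chunks)
        return [value for chunk_result in results for value in chunk_result]
```

Working in chunks rather than single items keeps the number of scheduled tasks small, which matters most when each call to `func` is cheap.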
Utility functions
Utility functions to be used by the chemFilters.chem subpackage.
- chemFilters.chem.utils.rdkit_log_controller(level)
Context manager for controlling the RDKit logger level.
- Parameters:
level – desired logging level. One of debug, info, warning, error, critical.
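The pattern behind such a context manager can be illustrated with Python's own `logging` module. This is an analogy only: `log_level` is a hypothetical helper that acts on a standard-library logger, not on RDKit's logger, and it does not reproduce how rdkit_log_controller is implemented.

```python
import logging
from contextlib import contextmanager


@contextmanager
def log_level(logger, level):
    """Temporarily set logger to level, restoring the previous level on exit."""
    previous = logger.level
    # Accept the same lowercase names listed above, e.g. "warning".
    logger.setLevel(getattr(logging, level.upper()))
    try:
        yield logger
    finally:
        # Restore the old level even if the body raised an exception.
        logger.setLevel(previous)
```

Restoring the level in a `finally` clause is what makes the change safe to scope to a single `with` block.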
- chemFilters.chem.utils.molToConnectivity(mol)
Converts an RDKit Mol object to a connectivity string.
- Parameters:
mol (Mol)
- chemFilters.chem.utils.molToInchiKey(mol)
Converts an RDKit Mol object to an InChIKey string.
- Parameters:
mol (Mol)
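This reference does not define "connectivity". One common convention, assumed here and not confirmed by the source, is to use the first 14-character block of a standard InChIKey, which hashes the molecular skeleton. The helper `inchikey_connectivity` below is hypothetical and works on plain strings, so it needs neither RDKit nor chemFilters.

```python
def inchikey_connectivity(inchikey):
    """Return the first (14-character) block of a standard InChIKey."""
    blocks = inchikey.strip().split("-")
    # A standard InChIKey has three hyphen-separated blocks; the first has 14 letters.
    if len(blocks) != 3 or len(blocks[0]) != 14:
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return blocks[0]


# InChIKey of aspirin, used purely as sample input.
print(inchikey_connectivity("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))  # BSYNRYMUTXBXSQ
```

Under this convention, stereoisomers of the same skeleton share a connectivity string, since stereochemistry is encoded in the second block.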