# Filtering compounds

chemFilters provides several filter classes, each wrapping a different filtering tool. All filters follow a similar API: initialize the filter, then call `get_flagging_df` (or `get_scoring_df`) on a list of molecules to get a DataFrame of results.

## RDKit structural alert filters

`RdkitFilters` is always available with the base install. It wraps RDKit's `FilterCatalog` system.

```python
from chemFilters import RdkitFilters
from rdkit import Chem

mols = [
    Chem.MolFromSmiles("CCC1=[O+][Cu-3]2([O+]=C(CC)C1)[O+]=C(CC)CC(CC)=[O+]2"),
    Chem.MolFromSmiles("CC1=C2C(=COC(C)C2C)C(O)=C(C(=O)O)C1=O"),
    Chem.MolFromSmiles("CCOP(=O)(Nc1cccc(Cl)c1)OCC"),
    Chem.MolFromSmiles("Nc1ccc(C=Cc2ccc(N)cc2S(=O)(=O)O)c(S(=O)(=O)O)c1"),
]

rdkit_filter = RdkitFilters(filter_type="ALL", from_smi=False)
flagging_df = rdkit_filter.get_flagging_df(mols)
```

The `filter_type` parameter selects which filter catalog to use. See `RdkitFilters.available_filters` for the full list. Common choices include `"ALL"`, `"PAINS"`, `"CHEMBL"`, and `"BRENK"`.

You can also retrieve substructure matches directly:

```python
filter_names, descriptions, substructs = rdkit_filter.filter_mols(mols)
```

## Purchasability filters (molbloom)

:::{note}
Requires the `allfilters` extra: `pip install 'chem-filters[allfilters]'`
:::

```python
from chemFilters import MolbloomFilters

bloom_filter = MolbloomFilters(from_smi=False, standardize=False)
bloom_filter.get_flagging_df(mols)
```

Results indicate whether a molecule is *probably* in a given catalog (`True`) or *definitely not* (`False`).

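
The "probably" versus "definitely not" wording comes from how Bloom filters work in general. The toy class below is a minimal sketch for illustration only: `TinyBloomFilter` is a hypothetical name, and this is not molbloom's implementation or the catalogs it ships. It shows why a Bloom filter never misses an item that was added, yet can report an item that was never added.

```python
import hashlib
from typing import Iterator


class TinyBloomFilter:
    """Toy Bloom filter (hypothetical, for illustration): a bit array plus salted hashes."""

    def __init__(self, n_bits: int = 1024, n_hashes: int = 3) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = [False] * n_bits

    def _positions(self, item: str) -> Iterator[int]:
        # Derive one bit position per hash by salting the item with the hash index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item: str) -> bool:
        # All bits set: "probably present". Any bit unset: "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


# Items that were added are always reported as present (no false negatives).
catalog = TinyBloomFilter()
for smi in ["CCO", "c1ccccc1", "CC(=O)O"]:
    catalog.add(smi)
found = "CCO" in catalog

# An empty filter has no bits set, so every query is a definite "no".
empty_says_no = "CCO" not in TinyBloomFilter()

# A deliberately saturated one-bit filter answers "yes" to everything,
# including "N#N", which was never added: a false positive.
saturated = TinyBloomFilter(n_bits=1, n_hashes=1)
saturated.add("CCO")
false_positive = "N#N" in saturated
```

In practice the bit array is sized so that false positives are rare, which is why a `True` result is best read as "likely purchasable, worth confirming" rather than as a guarantee.
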
## Peptide filters (PepSift)

:::{note}
Requires the `allfilters` extra: `pip install 'chem-filters[allfilters]'`
:::

```python
from chemFilters import PeptideFilters

pep_filter = PeptideFilters(from_smi=False)
pep_filter.get_flagging_df(mols)
```

## Silly molecule filters (molspotter)

:::{note}
Requires the `allfilters` extra: `pip install 'chem-filters[allfilters]'`
:::

```python
from chemFilters import SillyMolSpotterFilter

silly_filter = SillyMolSpotterFilter(from_smi=False)
silly_filter.get_scoring_df(mols)
```

Scores indicate how "unusual" a molecule is based on detection of rare bits in hashed ECFP fingerprints.

## Running all filters at once (CoreFilter)

`CoreFilter` combines all available filters into a single callable. This is also what the CLI uses under the hood.

```python
from chemFilters.core import CoreFilter

smiles = [
    "CCC1=[O+][Cu-3]2([O+]=C(CC)C1)[O+]=C(CC)CC(CC)=[O+]2",
    "CC1=C2C(=COC(C)C2C)C(O)=C(C(=O)O)C1=O",
    "CCOP(=O)(Nc1cccc(Cl)c1)OCC",
    "Nc1ccc(C=Cc2ccc(N)cc2S(=O)(=O)O)c(S(=O)(=O)O)c1",
]

core_filter = CoreFilter()
filtered_df = core_filter(smiles)
```

Individual filters can be toggled on/off via constructor arguments (e.g., `pep_filter=False`). Data can be processed in chunks with the `chunksize` parameter.
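
To make the idea of chunked processing concrete, here is a minimal, library-independent sketch. How `CoreFilter` applies `chunksize` internally is not described here, so `iter_chunks`, `process_in_chunks`, and `smiles_lengths` are hypothetical helpers that only illustrate the general pattern: split the input into fixed-size slices, process each slice, and concatenate the results in order.

```python
from typing import Callable, Iterator, List, Sequence


def iter_chunks(items: Sequence[str], chunksize: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `chunksize` items."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    for start in range(0, len(items), chunksize):
        yield items[start:start + chunksize]


def process_in_chunks(
    items: Sequence[str],
    chunksize: int,
    process_chunk: Callable[[Sequence[str]], List[int]],
) -> List[int]:
    """Apply `process_chunk` to each slice and concatenate the results in input order."""
    results: List[int] = []
    for chunk in iter_chunks(items, chunksize):
        results.extend(process_chunk(chunk))
    return results


def smiles_lengths(chunk: Sequence[str]) -> List[int]:
    # Hypothetical stand-in for a real filter call: one value per input SMILES.
    return [len(smi) for smi in chunk]


example_smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]

# Four inputs with chunksize=3 give one chunk of three and one of one.
chunks = [list(chunk) for chunk in iter_chunks(example_smiles, chunksize=3)]

# Chunked processing returns the same values as processing everything at once.
lengths = process_in_chunks(example_smiles, 3, smiles_lengths)
```

Chunking bounds how much data is in flight at any one time, which matters mainly for inputs too large to process comfortably in a single pass.
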