pytoda.preprocessing.crawlers module
Summary
Functions:
Uses the PubChem database to retrieve the SMILES of a drug given by name (default) or by PubChem ID.
Checks whether a given SMILES is in PubChem.
Queries PubChem for a given SMILES.
Function for removing PubChem molecules from an iterable of SMILES.
Reference
get_smiles_from_pubchem(drug, query_type='name', use_isomeric=True, kekulize=False, sanitize=True)
Uses the PubChem database to retrieve the SMILES of a drug given by name (default) or by PubChem ID.
- Parameters
drug (str) – a drug name, or a PubChem ID given as a string.
query_type (str) – either ‘name’ or ‘cid’. Indicates whether the value passed as drug is a name (e.g. ‘Tacrine’) or a PubChem ID (e.g. ‘1935’). Defaults to ‘name’.
use_isomeric (bool, optional) – whether to return the isomeric SMILES rather than the canonical one. Defaults to True.
kekulize (bool, optional) – whether to return the SMILES in kekulized form. PubChem returns kekulized SMILES by default, so setting this to True performs no operation on the retrieved SMILES. NOTE: setting it to False converts aromatic atoms to lowercase characters and introduces an RDKit dependency. Defaults to False.
sanitize (bool, optional) – whether to sanitize the SMILES. Defaults to True.
- Returns
The SMILES string of the queried drug.
- Return type
smiles (str)
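As an illustration of the parameters above, the following is a minimal sketch of looking up several drugs in one pass. `collect_smiles` is a hypothetical helper, not part of pytoda; it takes the lookup function as an argument so the sketch can be run without network access. The assumption is only that the function passed in accepts the keyword arguments shown in the signature above.

```python
from typing import Callable, Dict, Iterable


def collect_smiles(
    drugs: Iterable[str],
    fetch: Callable[..., str],
    query_type: str = "name",
) -> Dict[str, str]:
    """Look up each drug with `fetch` and return a drug -> SMILES mapping.

    Hypothetical helper for illustration; `fetch` is expected to accept the
    keyword arguments documented for get_smiles_from_pubchem.
    """
    return {
        drug: fetch(
            drug,
            query_type=query_type,  # 'name' or 'cid'
            use_isomeric=True,      # the documented default
            kekulize=False,         # the documented default (implies RDKit)
            sanitize=True,          # the documented default
        )
        for drug in drugs
    }


# Intended use (needs pytoda and network access, so it is left commented out):
# from pytoda.preprocessing.crawlers import get_smiles_from_pubchem
# by_name = collect_smiles(["Tacrine"], get_smiles_from_pubchem)
# by_cid = collect_smiles(["1935"], get_smiles_from_pubchem, query_type="cid")
```

Passing the lookup function in, rather than importing it inside the helper, keeps the sketch testable with a stub and makes the PubChem dependency explicit at the call site.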
remove_pubchem_smiles(smiles_list)
Function for removing PubChem molecules from an iterable of SMILES.
- Parameters
smiles_list (Iterable[str]) – many SMILES strings.
- Returns
Filtered list of SMILES; all SMILES pointing to PubChem molecules are removed.
- Return type
List[str]
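The filtering described above can be sketched as follows. This is not the library's implementation: `drop_known_smiles` is a hypothetical name, and the sketch assumes a lookup callable that returns a `(known, cid)` tuple in the style of `query_pubchem`. How the real function treats failed lookups is not stated here; in this sketch, any SMILES whose lookup reports `known` as False is kept.

```python
from typing import Callable, Iterable, List, Tuple


def drop_known_smiles(
    smiles_list: Iterable[str],
    query: Callable[[str], Tuple[bool, int]],
) -> List[str]:
    """Return the SMILES for which `query` reports no PubChem match.

    Hypothetical helper for illustration. `query` is assumed to return
    (known, cid), where `known` is True if PubChem has the molecule.
    """
    kept = []
    for smiles in smiles_list:
        known, _cid = query(smiles)
        if not known:
            # Keep only molecules that the lookup did not find.
            kept.append(smiles)
    return kept
```

The input order is preserved, and the input may be any iterable because it is traversed only once.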
query_pubchem(smiles)
Queries PubChem for a given SMILES.
- Parameters
smiles (str) – A SMILES string.
- Returns
bool: whether the SMILES is known to PubChem.
int: PubChem ID of the matched SMILES; -1 if the SMILES was not found, -2 if the PubChem query itself failed.
- Return type
Tuple[bool, int]
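Because the integer in the returned tuple doubles as a status code, callers may want to separate "not found" from "query failed" before using it as an ID. The following is a minimal sketch of that check; `describe_query_result` and the two constants are hypothetical names introduced for illustration, and only the values -1 and -2 come from the description above.

```python
from typing import Tuple

NOT_FOUND = -1    # documented: SMILES was not found
QUERY_ERROR = -2  # documented: error in the PubChem query


def describe_query_result(result: Tuple[bool, int]) -> str:
    """Turn a (known, cid) tuple into a short human-readable status."""
    known, cid = result
    if cid == QUERY_ERROR:
        return "query failed"
    if cid == NOT_FOUND:
        return "not in PubChem"
    if known:
        return f"known to PubChem (CID {cid})"
    # Combination not covered by the documented return values.
    return "unexpected result"
```

Checking the sentinel values first avoids treating -1 or -2 as a real PubChem ID, and a failed query can then be retried instead of being counted as a miss.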