pytoda.preprocessing.crawlers module

Summary

Functions:

`get_smiles_from_pubchem`	Uses the PubChem database to retrieve the SMILES of a drug name given as string (default) or a PubChem ID.
`is_pubchem`	Checks whether a given SMILES is in PubChem.
`query_pubchem`	Queries PubChem for a given SMILES.
`remove_pubchem_smiles`	Function for removing PubChem molecules from an iterable of SMILES.

Reference

get_smiles_from_pubchem(drug, query_type='name', use_isomeric=True, kekulize=False, sanitize=True)

Uses the PubChem database to retrieve the SMILES of a drug name given as string (default) or a PubChem ID.

Parameters

drug (str) – string with a drug name (or a PubChem ID as a string).
query_type (str) – Either ‘name’ or ‘cid’. Identifies whether the argument provided as drug is a name (e.g. ‘Tacrine’) or a PubChem ID (e.g. ‘1935’). Defaults to ‘name’.
use_isomeric (bool, optional) – whether to retrieve the isomeric SMILES, not the canonical one. Defaults to True.
kekulize (bool, optional) – whether kekulization is used. PubChem uses kekulization per default, so setting this to ‘True’ will not perform any operation on the retrieved SMILES. NOTE: Setting it to ‘False’ will convert aromatic atoms to lower-case characters and induces an RDKit dependency. Defaults to False.
sanitize (bool, optional) – whether to sanitize the SMILES. Defaults to True.

Returns

The SMILES string of the drug.

Return type

str
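As a usage sketch, the helper below resolves a mix of drug names and PubChem IDs by choosing `query_type` per identifier. `lookup_smiles`, `fake_fetch`, and the rule that all-digit strings are PubChem IDs are hypothetical illustrations and not part of pytoda. The documentation above does not state how failures are reported, so the sketch simply treats an exception or an empty result as a miss.

```python
from typing import Callable, Dict, Iterable, Optional


def lookup_smiles(
    identifiers: Iterable[str],
    fetch: Callable[..., str],
) -> Dict[str, Optional[str]]:
    """Map each identifier to a SMILES string, or None if nothing was retrieved.

    `fetch` is expected to accept the documented signature:
    fetch(drug, query_type=..., use_isomeric=..., kekulize=..., sanitize=...).
    """
    results: Dict[str, Optional[str]] = {}
    for identifier in identifiers:
        # Assumption: purely numeric strings are PubChem IDs, all else is a name.
        query_type = "cid" if identifier.isdigit() else "name"
        try:
            smiles = fetch(
                identifier,
                query_type=query_type,
                use_isomeric=True,
                kekulize=False,
                sanitize=True,
            )
        except Exception:
            # Assumption: any failure is recorded as a miss rather than re-raised.
            smiles = None
        results[identifier] = smiles or None
    return results


def fake_fetch(drug: str, query_type: str = "name", **kwargs) -> str:
    """Offline stand-in for get_smiles_from_pubchem (illustrative values only)."""
    table = {("name", "ethanol"): "CCO", ("cid", "702"): "CCO"}
    return table[(query_type, drug)]


if __name__ == "__main__":
    print(lookup_smiles(["ethanol", "702", "not-a-drug"], fake_fetch))
```

With pytoda installed and network access available, passing `get_smiles_from_pubchem` (imported from `pytoda.preprocessing.crawlers`) as `fetch` in place of `fake_fetch` should work the same way.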

remove_pubchem_smiles(smiles_list)

Function for removing PubChem molecules from an iterable of SMILES.

Parameters

smiles_list (Iterable[str]) – many SMILES strings.

Returns

Filtered list of SMILES; all SMILES that point to PubChem molecules are removed.

Return type

List[str]
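The filtering behavior described above can be sketched offline as follows. This is not pytoda's implementation: `drop_known_smiles` and the `is_known` predicate are hypothetical names, and a small in-memory set stands in for the PubChem lookup.

```python
from typing import Callable, Iterable, List


def drop_known_smiles(
    smiles_list: Iterable[str],
    is_known: Callable[[str], bool],
) -> List[str]:
    """Keep only the SMILES for which `is_known` returns False, in input order."""
    return [smiles for smiles in smiles_list if not is_known(smiles)]


# Hypothetical stand-in for a PubChem membership check.
KNOWN_TO_DATABASE = {"CCO", "C"}


def offline_is_known(smiles: str) -> bool:
    """Report whether the SMILES is in the stand-in set."""
    return smiles in KNOWN_TO_DATABASE


if __name__ == "__main__":
    candidates = ["CCO", "novel-1", "C", "novel-2"]
    print(drop_known_smiles(candidates, offline_is_known))
```

Passing the predicate as an argument keeps the sketch testable without network access; in real use, `is_pubchem` from this module would be the natural candidate for `is_known`.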

query_pubchem(smiles)

Queries PubChem for a given SMILES.

Parameters

smiles (str) – A SMILES string.

Returns

bool: Whether or not the SMILES is known to PubChem.

int: PubChem ID of the matched SMILES; -1 if the SMILES was not found, -2 if an error occurred in the PubChem query.

Return type

Tuple[bool, int]
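Because the integer doubles as a status code, callers may want to separate the two sentinel values from real PubChem IDs. A minimal sketch, assuming only the return convention documented above; `describe_pubchem_result` and the constant names are hypothetical and not part of pytoda.

```python
from typing import Tuple

# Sentinel values taken from the documented return convention.
NOT_FOUND = -1
QUERY_ERROR = -2


def describe_pubchem_result(result: Tuple[bool, int]) -> str:
    """Turn a (found, cid) tuple into a human-readable status."""
    found, cid = result
    if cid == QUERY_ERROR:
        # The query itself failed, so nothing is known about the molecule.
        return "query error"
    if not found or cid == NOT_FOUND:
        return "not in PubChem"
    return f"PubChem ID {cid}"


if __name__ == "__main__":
    for example in [(True, 1935), (False, -1), (False, -2)]:
        print(example, "->", describe_pubchem_result(example))
```

Checking for -2 first matters: a failed query says nothing about whether the molecule is in PubChem, so it may be worth retrying rather than treating the SMILES as novel.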

is_pubchem(smiles)

Checks whether a given SMILES is in PubChem.

Parameters

smiles (str) – A SMILES string.

Returns

Whether or not the SMILES is known to PubChem.

Return type

bool