pytoda.preprocessing.crawlers module

Summary

Functions:

get_smiles_from_pubchem

Uses the PubChem database to retrieve the SMILES of a drug name given as string (default) or a PubChem ID.

is_pubchem

Whether a given SMILES is in PubChem.

query_pubchem

Queries PubChem for a given SMILES.

remove_pubchem_smiles

Function for removing PubChem molecules from an iterable of SMILES.

Reference

get_smiles_from_pubchem(drug, query_type='name', use_isomeric=True, kekulize=False, sanitize=True)[source]

Uses the PubChem database to retrieve the SMILES of a drug name given as string (default) or a PubChem ID.

Parameters
  • drug (str) – string with a drug name (or a PubChem ID as a string).

  • query_type (str) – Either ‘name’ or ‘cid’. Identifies whether the argument provided as drug is a name (e.g. ‘Tacrine’) or a PubChem ID (e.g. ‘1935’). Defaults to ‘name’.

  • use_isomeric (bool, optional) – whether to retrieve the isomeric SMILES rather than the canonical one. Defaults to True.

  • kekulize (bool, optional) – whether kekulization is used. PubChem uses kekulization by default, so setting this to ‘True’ performs no operation on the retrieved SMILES. NOTE: Setting it to ‘False’ converts aromatic atoms to lowercase characters and introduces an RDKit dependency. Defaults to False.

  • sanitize (bool, optional) – whether to sanitize the SMILES. Defaults to True.

Returns

The SMILES string of the drug.

Return type

smiles (str)
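
A minimal usage sketch, assuming only the signature documented above. The helper lookup_smiles, its fetch parameter, and the stand-in fake_fetch are hypothetical names introduced so the example runs without network access; the digit test used to pick query_type is likewise an assumption, not library behavior.

```python
from typing import Callable


def lookup_smiles(drug: str, fetch: Callable[..., str]) -> str:
    """Pick query_type from the shape of the identifier, then delegate to fetch.

    `fetch` is expected to follow the documented signature
    get_smiles_from_pubchem(drug, query_type=..., ...).
    """
    drug = drug.strip()
    # Assumption: an all-digit string is a PubChem ID, anything else is a name.
    query_type = "cid" if drug.isdigit() else "name"
    return fetch(drug, query_type=query_type)


def fake_fetch(drug: str, query_type: str = "name", **kwargs) -> str:
    """Offline stand-in for get_smiles_from_pubchem; it only echoes its arguments."""
    return f"<{query_type}:{drug}>"


print(lookup_smiles("Tacrine", fake_fetch))
print(lookup_smiles("1935", fake_fetch))
```

With pytoda installed and network access available, get_smiles_from_pubchem itself would presumably be passed as fetch in place of the stand-in.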

remove_pubchem_smiles(smiles_list)[source]

Function for removing PubChem molecules from an iterable of SMILES.

Parameters

smiles_list (Iterable[str]) – many SMILES strings.

Returns

Filtered list of SMILES; all SMILES that match PubChem molecules are removed.

Return type

List[str]
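
The filtering semantics described above can be sketched as follows. This is not the library's implementation: drop_known and the KNOWN set are hypothetical, and the membership test is a stand-in for a real PubChem lookup such as is_pubchem. Whether remove_pubchem_smiles preserves order or removes duplicates is not stated here, so the sketch simply keeps input order.

```python
from typing import Callable, Iterable, List


def drop_known(smiles_list: Iterable[str], is_known: Callable[[str], bool]) -> List[str]:
    """Keep only the SMILES for which is_known returns False, preserving order."""
    return [smiles for smiles in smiles_list if not is_known(smiles)]


# Offline stand-in for a PubChem lookup: a fixed set plays the role of the database.
KNOWN = {"CCO", "c1ccccc1"}

novel = drop_known(["CCO", "CC(C)N", "c1ccccc1"], KNOWN.__contains__)
print(novel)
```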

query_pubchem(smiles)[source]

Queries PubChem for a given SMILES.

Parameters

smiles (str) – A SMILES string.

Returns

bool: Whether or not SMILES is known to PubChem.

int: PubChem ID of the matched SMILES; -1 if the SMILES was not found, -2 if the PubChem query failed with an error.

Return type

Tuple[bool, int]
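
Because the returned integer doubles as a status code, callers may want to separate "not found" from "query failed". A small sketch of one way to do that, where describe_result is a hypothetical helper and the sample tuples are made up for illustration (the value of the bool on error is not documented, so the error code is checked first):

```python
from typing import Tuple


def describe_result(result: Tuple[bool, int]) -> str:
    """Turn a (found, cid) pair, as documented for query_pubchem, into a message."""
    found, cid = result
    if cid == -2:
        # Documented error code: the query itself failed.
        return "query failed"
    if cid == -1 or not found:
        # Documented code for a SMILES that PubChem does not know.
        return "not in PubChem"
    return f"found, CID {cid}"


# Illustrative tuples only; real values would come from query_pubchem(smiles).
for sample in [(True, 1935), (False, -1), (False, -2)]:
    print(describe_result(sample))
```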

is_pubchem(smiles)[source]

Whether a given SMILES is in PubChem.

Parameters

smiles (str) – A SMILES string.

Returns

Whether or not SMILES is known to PubChem.

Return type

bool