pytoda.smiles.transforms module

SMILES transforms.

Summary

Classes:

Augment

Augment a SMILES string, according to Bjerrum (2017).

AugmentTensor

Augment a SMILES (represented as a Tensor) according to Bjerrum (2017).

Canonicalization

Convert any SMILES to RDKit-canonical SMILES.

Kekulize

Transform SMILES to Kekule version.

NotKekulize

Transform SMILES without explicitly converting to Kekule version.

RemoveIsomery

Remove isomery (isotopic and chiral specifications) from SMILES.

SMILESToMorganFingerprints

Get fingerprints starting from SMILES.

SMILESToTokenIndexes

Transform SMILES to token indexes using SMILES language.

Selfies

Convert a molecule from SMILES to SELFIES.

Functions:

compose_encoding_transforms

Set up a composition of token indexes to token indexes transformations.

compose_smiles_transforms

Set up a composition of SMILES to SMILES (or SELFIES) transformations.

Reference

compose_smiles_transforms(canonical=False, augment=False, kekulize=False, all_bonds_explicit=False, all_hs_explicit=False, remove_bonddir=False, remove_chirality=False, selfies=False, sanitize=True, device=None)[source]

Set up a composition of SMILES to SMILES (or SELFIES) transformations.

Parameters
  • canonical (bool, optional) – canonicalizes the SMILES (one unique string per molecule). If True, the other transformations (augment etc., see below) do not apply. Defaults to False.

  • augment (bool, optional) – perform SMILES augmentation. Defaults to False.

  • kekulize (bool, optional) – kekulizes SMILES (implicit aromaticity only). Defaults to False.

  • all_bonds_explicit (bool, optional) – makes all bonds explicit. Defaults to False, only applies if kekulize is True.

  • all_hs_explicit (bool, optional) – makes all hydrogens explicit. Defaults to False, only applies if kekulize is True.

  • remove_bonddir (bool, optional) – remove directional info of bonds. Defaults to False.

  • remove_chirality (bool, optional) – remove chirality information. Defaults to False.

  • selfies (bool, optional) – whether SELFIES is used instead of SMILES. Defaults to False.

  • sanitize (bool, optional) – RDKit sanitization of the molecule. Defaults to True.

  • device (torch.device) – DEPRECATED

Returns

A Callable that applies composition of SMILES transforms.

Return type

Compose
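The flag-driven composition described above can be illustrated with a small, dependency-free sketch. It is not pytoda's implementation: the Compose class, compose_sketch, and both string-level transforms below are hypothetical stand-ins (the real transforms operate on molecules via RDKit), and only two of the flags are mirrored.

```python
from typing import Callable, List


class Compose:
    """Hypothetical stand-in: chain callables so each output feeds the next."""

    def __init__(self, transforms: List[Callable[[str], str]]) -> None:
        self.transforms = transforms

    def __call__(self, smiles: str) -> str:
        for transform in self.transforms:
            smiles = transform(smiles)
        return smiles


def strip_bond_direction(smiles: str) -> str:
    # Crude text-level stand-in: drop the '/' and '\' bond-direction markers.
    return smiles.replace('/', '').replace('\\', '')


def strip_chirality(smiles: str) -> str:
    # Crude text-level stand-in: drop the '@' chirality markers.
    return smiles.replace('@', '')


def compose_sketch(
    remove_bonddir: bool = False, remove_chirality: bool = False
) -> Compose:
    """Build the chain from boolean flags; unset flags add no transform."""
    transforms: List[Callable[[str], str]] = []
    if remove_bonddir:
        transforms.append(strip_bond_direction)
    if remove_chirality:
        transforms.append(strip_chirality)
    return Compose(transforms)


transform = compose_sketch(remove_bonddir=True, remove_chirality=True)
result = transform('C/C=C/[C@H](N)O')
print(result)  # CC=C[CH](N)O
```

With all flags left at False, the sketch returns an empty chain that passes the input through unchanged.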

compose_encoding_transforms(randomize=False, add_start_and_stop=False, start_index=2, stop_index=3, padding=False, padding_length=None, padding_index=0)[source]

Set up a composition of token indexes to token indexes transformations.

Parameters
  • randomize (bool, optional) – perform a true randomization of token indexes. Defaults to False.

  • add_start_and_stop (bool, optional) – add start and stop token indexes. Defaults to False.

  • start_index (int, optional) – index of start token in vocabulary. Defaults to 2.

  • stop_index (int, optional) – index of stop token in vocabulary. Defaults to 3.

  • padding (bool, optional) – pad sequences to given padding_length. Defaults to False.

  • padding_length (int, optional) – manually sets number of applied paddings, applies only if padding is True. Defaults to None, but must be passed in case of padding.

  • padding_index (int, optional) – index of padding token in vocabulary. Defaults to 0.

Returns

A Callable that applies composition of transforms on token indexes.

Return type

Compose

Note

Transformations can change the number of tokens.
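To make the note concrete, here is a minimal sketch of how start/stop tokens and padding change the length of a token-index sequence. The function encode_sketch is hypothetical and is not pytoda's code. It assumes that start/stop tokens are added before padding, that padding is applied on the left, and that sequences already longer than padding_length are left unchanged. The randomize option is omitted.

```python
from typing import List, Optional


def encode_sketch(
    token_indexes: List[int],
    add_start_and_stop: bool = False,
    start_index: int = 2,
    stop_index: int = 3,
    padding: bool = False,
    padding_length: Optional[int] = None,
    padding_index: int = 0,
) -> List[int]:
    """Illustrative only: wrap with start/stop, then pad to a fixed length."""
    indexes = list(token_indexes)
    if add_start_and_stop:
        indexes = [start_index] + indexes + [stop_index]
    if padding:
        if padding_length is None:
            raise ValueError('padding_length must be passed when padding is True')
        # Assumption: pad on the left; longer sequences are left unchanged.
        missing = max(padding_length - len(indexes), 0)
        indexes = [padding_index] * missing + indexes
    return indexes


encoded = encode_sketch(
    [10, 11, 12], add_start_and_stop=True, padding=True, padding_length=8
)
print(encoded)  # [0, 0, 0, 2, 10, 11, 12, 3]
```

Three input tokens become eight output tokens here, which is why downstream code should not assume the sequence length is preserved.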

class SMILESToTokenIndexes(smiles_language)[source]

Bases: pytoda.transforms.Transform

Transform SMILES to token indexes using SMILES language.

__init__(smiles_language)[source]

Initialize a SMILES to token indexes object.

Parameters

smiles_language (SMILESLanguage) – a SMILES language. NOTE: No typing used to prevent circular import.
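As a rough illustration of what mapping SMILES to token indexes involves, the sketch below tokenizes a SMILES string with a simplified regular expression and looks each token up in a toy vocabulary. Everything in it is an assumption made for illustration: the pattern, the vocabulary, the special-token names, and the unknown-token index of 1 are hypothetical, whereas a real SMILESLanguage object manages its own tokenizer and vocabulary.

```python
import re
from typing import Dict, List

# Simplified tokenizer: bracket atoms first, then two-letter halogens,
# then any single character. Real tokenizers cover more cases.
TOKEN_PATTERN = re.compile(r'\[[^\]]+\]|Br|Cl|.')

# Hypothetical toy vocabulary; indexes 0-3 are reserved for special tokens.
TOKEN_TO_INDEX: Dict[str, int] = {
    '<PAD>': 0, '<UNK>': 1, '<START>': 2, '<STOP>': 3,
    'C': 4, 'O': 5, '(': 6, ')': 7, '=': 8, 'Cl': 9,
}


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens using the simplified pattern."""
    return TOKEN_PATTERN.findall(smiles)


def smiles_to_token_indexes(smiles: str) -> List[int]:
    """Map each token to its index; unseen tokens map to '<UNK>'."""
    unknown = TOKEN_TO_INDEX['<UNK>']
    return [TOKEN_TO_INDEX.get(token, unknown) for token in tokenize(smiles)]


indexes = smiles_to_token_indexes('CC(=O)Cl')
print(indexes)  # [4, 4, 6, 8, 5, 7, 9]
```

Note that 'Cl' is kept as a single token instead of being split into 'C' and 'l', which is the main reason a regular expression is used in place of plain character splitting.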

class RemoveIsomery(bonddir=True, chirality=True, sanitize=True)[source]

Bases: pytoda.transforms.Transform

Remove isomery (isotopic and chiral specifications) from SMILES.

__init__(bonddir=True, chirality=True, sanitize=True)[source]

Initialize isomery removal.

Parameters
  • bonddir (bool) – whether bond direction information should be removed. Defaults to True.

  • chirality (bool) – whether chirality information should be removed. Defaults to True.

class Kekulize(all_bonds_explicit=False, all_hs_explicit=False, sanitize=True)[source]

Bases: pytoda.transforms.Transform

Transform SMILES to Kekule version.

class NotKekulize(all_bonds_explicit=False, all_hs_explicit=False, sanitize=True)[source]

Bases: pytoda.transforms.Transform

Transform SMILES without explicitly converting to Kekule version.

class Augment(kekule_smiles=False, all_bonds_explicit=False, all_hs_explicit=False, sanitize=True, seed=- 1)[source]

Bases: pytoda.transforms.Transform

Augment a SMILES string, according to Bjerrum (2017).

__init__(kekule_smiles=False, all_bonds_explicit=False, all_hs_explicit=False, sanitize=True, seed=- 1)[source]

NOTE: These parameters need to be passed down to the enumerator.

class AugmentTensor(smiles_language, kekule_smiles=False, all_bonds_explicit=False, all_hs_explicit=False, sanitize=True)[source]

Bases: pytoda.transforms.Transform

Augment a SMILES (represented as a Tensor) according to Bjerrum (2017).

__init__(smiles_language, kekule_smiles=False, all_bonds_explicit=False, all_hs_explicit=False, sanitize=True)[source]

NOTE: These parameters need to be passed down to the enumerator.

update_smiles_language(smiles_language)[source]

class Selfies[source]

Bases: pytoda.transforms.Transform

Convert a molecule from SMILES to SELFIES.

class Canonicalization(sanitize=True)[source]

Bases: pytoda.transforms.Transform

Convert any SMILES to RDKit-canonical SMILES.

Example:

from pytoda.smiles.transforms import Canonicalization

smiles = 'CN2C(=O)N(C)C(=O)C1=C2N=CN1C'
c = Canonicalization()
c(smiles)

Result is: 'Cn1c(=O)c2c(ncn2C)n(C)c1=O'

__init__(sanitize=True)[source]

Initialize a canonicalizer.

Parameters

sanitize (bool, optional) – Whether molecule is sanitized. Defaults to True.

class SMILESToMorganFingerprints(radius=2, bits=512, chirality=True, sanitize=False)[source]

Bases: pytoda.transforms.Transform

Get fingerprints starting from SMILES.

__init__(radius=2, bits=512, chirality=True, sanitize=False)[source]

Initialize a SMILES to fingerprints object.

Parameters
  • radius (int) – radius of the fingerprints.

  • bits (int) – bits used to represent the fingerprints.