pytoda.smiles.transforms module

SMILES transforms.

Summary

Classes:

`Augment`	Augment a SMILES string, according to Bjerrum (2017).
`AugmentTensor`	Augment a SMILES (represented as a Tensor) according to Bjerrum (2017).
`Canonicalization`	Convert any SMILES to RDKit-canonical SMILES.
`Kekulize`	Transform SMILES to Kekule version.
`NotKekulize`	Transform SMILES without explicitly converting to Kekule version.
`RemoveIsomery`	Remove isomery (bond direction and chiral specifications) from SMILES.
`SMILESToMorganFingerprints`	Get fingerprints starting from SMILES.
`SMILESToTokenIndexes`	Transform SMILES to token indexes using SMILES language.
`Selfies`	Convert a molecule from SMILES to SELFIES.

Functions:

`compose_encoding_transforms`	Setup a composition of token indexes to token indexes transformations.
`compose_smiles_transforms`	Setup a composition of SMILES to SMILES (or SELFIES) transformations.

Reference

compose_smiles_transforms(canonical=False, augment=False, kekulize=False, all_bonds_explicit=False, all_hs_explicit=False, remove_bonddir=False, remove_chirality=False, selfies=False, sanitize=True, device=None)

Setup a composition of SMILES to SMILES (or SELFIES) transformations.

Parameters

canonical (bool, optional) – canonicalizes the SMILES (one unique string per molecule). If True, the other transformations (augment etc., see below) do not apply. Defaults to False.
augment (bool, optional) – perform SMILES augmentation. Defaults to False.
kekulize (bool, optional) – kekulizes SMILES (implicit aromaticity only). Defaults to False.
all_bonds_explicit (bool, optional) – makes all bonds explicit. Only applies if kekulize is True. Defaults to False.
all_hs_explicit (bool, optional) – makes all hydrogens explicit. Only applies if kekulize is True. Defaults to False.
remove_bonddir (bool, optional) – remove directional info of bonds. Defaults to False.
remove_chirality (bool, optional) – remove chirality information. Defaults to False.
selfies (bool, optional) – whether selfies is used instead of smiles. Defaults to False.
sanitize (bool, optional) – RDKit sanitization of the molecule. Defaults to True.
device (torch.device) – DEPRECATED

Returns

A Callable that applies composition of SMILES transforms.

Return type

Compose
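The returned object is a single callable that runs the selected transforms one after another. A minimal sketch of that composition idea follows. `ComposeSketch` and the two toy transforms are hypothetical stand-ins written for illustration. They are not the pytoda `Compose` class or its RDKit-based transforms, and the purely textual bond-direction removal is only a placeholder.

```python
from typing import Callable, Sequence


class ComposeSketch:
    """Toy stand-in for a Compose-like callable (not the pytoda class)."""

    def __init__(self, transforms: Sequence[Callable[[str], str]]) -> None:
        self.transforms = list(transforms)

    def __call__(self, smiles: str) -> str:
        # Apply each transform in order, feeding each output into the next.
        for transform in self.transforms:
            smiles = transform(smiles)
        return smiles


def strip_whitespace(smiles: str) -> str:
    """Hypothetical transform: trim surrounding whitespace."""
    return smiles.strip()


def drop_bond_direction(smiles: str) -> str:
    """Hypothetical, purely textual removal of '/' and '\\' bond markers."""
    return smiles.replace("/", "").replace("\\", "")


pipeline = ComposeSketch([strip_whitespace, drop_bond_direction])
result = pipeline("  F/C=C/F \n")
print(result)  # FC=CF
```

An empty list of transforms leaves the input unchanged, which mirrors calling the function with every flag left at False under this sketch's assumptions.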

compose_encoding_transforms(randomize=False, add_start_and_stop=False, start_index=2, stop_index=3, padding=False, padding_length=None, padding_index=0)

Setup a composition of token indexes to token indexes transformations.

Parameters

randomize (bool, optional) – perform a true randomization of token indexes. Defaults to False.
add_start_and_stop (bool, optional) – add start and stop token indexes. Defaults to False.
start_index (int, optional) – index of start token in vocabulary. Defaults to 2.
stop_index (int, optional) – index of stop token in vocabulary. Defaults to 3.
padding (bool, optional) – pad sequences to given padding_length. Defaults to False.
padding_length (int, optional) – length to pad sequences to; applies only if padding is True. Defaults to None, but must be passed when padding is True.
padding_index (int, optional) – index of padding token in vocabulary. Defaults to 0.

Returns

A Callable that applies a composition of transforms on token indexes.

Return type

Compose

Note

Transformations can change the number of tokens.
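To illustrate how the start/stop and padding options change the number of tokens, here is a minimal pure-Python sketch. `encode_sketch` is a hypothetical helper, not the library function. It assumes that start/stop tokens are added before padding, that padding is applied on the left, and that a missing `padding_length` raises `ValueError`. None of these details is confirmed by the reference above. Randomization is omitted.

```python
from typing import List, Optional


def encode_sketch(
    token_indexes: List[int],
    add_start_and_stop: bool = False,
    start_index: int = 2,
    stop_index: int = 3,
    padding: bool = False,
    padding_length: Optional[int] = None,
    padding_index: int = 0,
) -> List[int]:
    """Hypothetical illustration of start/stop insertion followed by padding."""
    out = list(token_indexes)
    if add_start_and_stop:
        # Wrap the sequence in start and stop token indexes.
        out = [start_index] + out + [stop_index]
    if padding:
        if padding_length is None:
            # Assumption: the exception type is chosen for this sketch only.
            raise ValueError("padding_length must be given when padding=True")
        # Assumption: pad on the left; sequences already long enough are untouched.
        missing = max(0, padding_length - len(out))
        out = [padding_index] * missing + out
    return out


tokens = [7, 8, 9]
padded = encode_sketch(
    tokens, add_start_and_stop=True, padding=True, padding_length=8
)
print(padded)  # [0, 0, 0, 2, 7, 8, 9, 3]
```

Three input tokens become eight output tokens here: two from the start and stop indexes and three from padding.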

class SMILESToTokenIndexes(smiles_language)

Bases: pytoda.transforms.Transform

Transform SMILES to token indexes using SMILES language.

__init__(smiles_language)

Initialize a SMILES to token indexes object.

Parameters: smiles_language (SMILESLanguage) – a SMILES language. NOTE: No typing used to prevent circular import.
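As a rough picture of what mapping SMILES to token indexes involves, the following sketch tokenizes a SMILES string with a small regular expression and looks each token up in a toy vocabulary. Everything here is hypothetical: the regex, the special token names, and the helper functions are illustrative and do not reproduce the `SMILESLanguage` tokenizer or vocabulary. Indexes 0, 2 and 3 mirror the padding, start and stop defaults documented above; index 1 for unknown tokens is an assumption.

```python
import re
from typing import Dict, Iterable, List

# Simplified pattern: bracket atoms, two-letter halogens, then any single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|.")

# Hypothetical special tokens; only the indexes 0, 2, 3 echo documented defaults.
SPECIAL_TOKENS = {"<PAD>": 0, "<UNK>": 1, "<START>": 2, "<STOP>": 3}


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens using the simplified pattern."""
    return TOKEN_PATTERN.findall(smiles)


def build_vocab(smiles_list: Iterable[str]) -> Dict[str, int]:
    """Assign the next free index to each token in order of first appearance."""
    vocab = dict(SPECIAL_TOKENS)
    for smiles in smiles_list:
        for token in tokenize(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab


def to_indexes(smiles: str, vocab: Dict[str, int]) -> List[int]:
    """Map tokens to indexes, falling back to the unknown-token index."""
    return [vocab.get(token, vocab["<UNK>"]) for token in tokenize(smiles)]


vocab = build_vocab(["CC(=O)Cl", "C[NH3+]"])
indexes = to_indexes("CCl", vocab)
print(indexes)  # [4, 9]
```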

class RemoveIsomery(bonddir=True, chirality=True, sanitize=True)

Bases: pytoda.transforms.Transform

Remove isomery (bond direction and chiral specifications) from SMILES.

__init__(bonddir=True, chirality=True, sanitize=True)

Initialize isomery removal.

Parameters

bonddir (bool) – whether bond direction information should be removed. Defaults to True.
chirality (bool) – whether chirality information should be removed. Defaults to True.

class Kekulize(all_bonds_explicit=False, all_hs_explicit=False, sanitize=True)

Bases: pytoda.transforms.Transform

Transform SMILES to Kekule version.

class NotKekulize(all_bonds_explicit=False, all_hs_explicit=False, sanitize=True)

Bases: pytoda.transforms.Transform

Transform SMILES without explicitly converting to Kekule version.

class Augment(kekule_smiles=False, all_bonds_explicit=False, all_hs_explicit=False, sanitize=True, seed=-1)

Bases: pytoda.transforms.Transform

Augment a SMILES string, according to Bjerrum (2017).

__init__(kekule_smiles=False, all_bonds_explicit=False, all_hs_explicit=False, sanitize=True, seed=-1)

NOTE: These parameters need to be passed down to the enumerator.

class AugmentTensor(smiles_language, kekule_smiles=False, all_bonds_explicit=False, all_hs_explicit=False, sanitize=True)

Bases: pytoda.transforms.Transform

Augment a SMILES (represented as a Tensor) according to Bjerrum (2017).

__init__(smiles_language, kekule_smiles=False, all_bonds_explicit=False, all_hs_explicit=False, sanitize=True)

NOTE: These parameters need to be passed down to the enumerator.

update_smiles_language(smiles_language)

class Selfies

Bases: pytoda.transforms.Transform

Convert a molecule from SMILES to SELFIES.

class Canonicalization(sanitize=True)

Bases: pytoda.transforms.Transform

Convert any SMILES to RDKit-canonical SMILES.

Example:

from pytoda.smiles.transforms import Canonicalization

smiles = 'CN2C(=O)N(C)C(=O)C1=C2N=CN1C'
c = Canonicalization()
c(smiles)

Result is: 'Cn1c(=O)c2c(ncn2C)n(C)c1=O'

__init__(sanitize=True)

Initialize a canonicalizer.

Parameters: sanitize (bool, optional) – Whether molecule is sanitized. Defaults to True.

class SMILESToMorganFingerprints(radius=2, bits=512, chirality=True, sanitize=False)

Bases: pytoda.transforms.Transform

Get fingerprints starting from SMILES.

__init__(radius=2, bits=512, chirality=True, sanitize=False)

Initialize a SMILES to fingerprints object.

Parameters

radius (int) – radius of the fingerprints.
bits (int) – bits used to represent the fingerprints.