pytoda.smiles.transforms module
SMILES transforms.
Summary
Classes:
Augment | Augment a SMILES string, according to Bjerrum (2017).
AugmentTensor | Augment a SMILES (represented as a Tensor) according to Bjerrum (2017).
Canonicalization | Convert any SMILES to RDKit-canonical SMILES.
Kekulize | Transform SMILES to Kekule version.
NotKekulize | Transform SMILES without explicitly converting to Kekule version.
RemoveIsomery | Remove isomery (isotopic and chiral specifications) from SMILES.
SMILESToMorganFingerprints | Get fingerprints starting from SMILES.
SMILESToTokenIndexes | Transform SMILES to token indexes using SMILES language.
Selfies | Convert a molecule from SMILES to SELFIES.
Functions:
compose_encoding_transforms | Set up a composition of token indexes to token indexes transformations.
compose_smiles_transforms | Set up a composition of SMILES to SMILES (or SELFIES) transformations.
Reference
compose_smiles_transforms(canonical=False, augment=False, kekulize=False, all_bonds_explicit=False, all_hs_explicit=False, remove_bonddir=False, remove_chirality=False, selfies=False, sanitize=True, device=None)
Set up a composition of SMILES to SMILES (or SELFIES) transformations.
- Parameters
canonical (bool, optional) – performs canonicalization of SMILES (one unique string per molecule). If True, the other transformations (augment etc., see below) do not apply. Defaults to False.
augment (bool, optional) – perform SMILES augmentation. Defaults to False.
kekulize (bool, optional) – kekulizes SMILES (implicit aromaticity only). Defaults to False.
all_bonds_explicit (bool, optional) – makes all bonds explicit. Only applies if kekulize is True. Defaults to False.
all_hs_explicit (bool, optional) – makes all hydrogens explicit. Only applies if kekulize is True. Defaults to False.
remove_bonddir (bool, optional) – remove directional info of bonds. Defaults to False.
remove_chirality (bool, optional) – remove chirality information. Defaults to False.
selfies (bool, optional) – whether SELFIES is used instead of SMILES. Defaults to False.
sanitize (bool, optional) – RDKit sanitization of the molecule. Defaults to True.
device (torch.device) – DEPRECATED
- Returns
A Callable that applies composition of SMILES transforms.
- Return type
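As a rough illustration of how boolean flags can select and chain string-to-string transforms, here is a minimal, dependency-free sketch. It does not use pytoda or RDKit: `toy_compose_smiles_transforms`, `compose`, `strip_bond_direction` and `strip_chirality` are hypothetical names, and the two text-level helpers are crude stand-ins for the library's molecule-aware transforms.

```python
from typing import Callable, List

Transform = Callable[[str], str]


def strip_bond_direction(smiles: str) -> str:
    # Crude text-level stand-in: drop the '/' and '\' bond-direction marks.
    return smiles.replace('/', '').replace('\\', '')


def strip_chirality(smiles: str) -> str:
    # Crude text-level stand-in: drop the '@' chirality marks.
    return smiles.replace('@', '')


def compose(transforms: List[Transform]) -> Transform:
    # Return a callable that applies the transforms in list order.
    def apply(smiles: str) -> str:
        for transform in transforms:
            smiles = transform(smiles)
        return smiles

    return apply


def toy_compose_smiles_transforms(
    remove_bonddir: bool = False, remove_chirality: bool = False
) -> Transform:
    # Hypothetical helper: each flag appends one transform to the chain.
    transforms: List[Transform] = []
    if remove_bonddir:
        transforms.append(strip_bond_direction)
    if remove_chirality:
        transforms.append(strip_chirality)
    return compose(transforms)


pipeline = toy_compose_smiles_transforms(remove_bonddir=True, remove_chirality=True)
result = pipeline('C[C@H](N)/C=C/C')
```

With no flags set, the composed callable in this sketch returns its input unchanged, which mirrors the all-False defaults in the signature above.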
compose_encoding_transforms(randomize=False, add_start_and_stop=False, start_index=2, stop_index=3, padding=False, padding_length=None, padding_index=0)
Set up a composition of token indexes to token indexes transformations.
- Parameters
randomize (bool, optional) – perform a true randomization of token indexes. Defaults to False.
add_start_and_stop (bool, optional) – add start and stop token indexes. Defaults to False.
start_index (int, optional) – index of start token in vocabulary. Defaults to 2.
stop_index (int, optional) – index of stop token in vocabulary. Defaults to 3.
padding (bool, optional) – pad sequences to given padding_length. Defaults to False.
padding_length (int, optional) – length that sequences are padded to; applies only if padding is True. Defaults to None, but must be passed if padding is True.
padding_index (int, optional) – index of padding token in vocabulary. Defaults to 0.
- Returns
A Callable that applies a composition of transforms on token indexes.
- Return type
Note
Transformations can change the number of tokens.
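The parameters above can be pictured with a small pure-Python sketch on plain lists. This is not the library's implementation: `toy_encode` and its `seed` argument are hypothetical, and the sketch assumes that randomization shuffles token order, that steps run in the order randomize, start/stop, padding, and that padding is added on the left without truncation.

```python
import random
from typing import List, Optional, Sequence


def toy_encode(
    token_indexes: Sequence[int],
    randomize: bool = False,
    add_start_and_stop: bool = False,
    start_index: int = 2,
    stop_index: int = 3,
    padding: bool = False,
    padding_length: Optional[int] = None,
    padding_index: int = 0,
    seed: Optional[int] = None,  # hypothetical, only for reproducible shuffles
) -> List[int]:
    indexes = list(token_indexes)
    if randomize:
        # Assumption: randomization means shuffling the token order.
        random.Random(seed).shuffle(indexes)
    if add_start_and_stop:
        indexes = [start_index] + indexes + [stop_index]
    if padding:
        if padding_length is None:
            raise ValueError('padding_length must be passed if padding is True')
        # Assumption: pad on the left; longer sequences are left untouched.
        indexes = [padding_index] * (padding_length - len(indexes)) + indexes
    return indexes


encoded = toy_encode([5, 6, 7], add_start_and_stop=True, padding=True, padding_length=8)
```

In this sketch three input tokens become eight, which is the kind of length change the note above refers to.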
class SMILESToTokenIndexes(smiles_language)
Bases: pytoda.transforms.Transform
Transform SMILES to token indexes using SMILES language.
__init__(smiles_language)
Initialize a SMILES to token indexes object.
- Parameters
smiles_language (SMILESLanguage) – a SMILES language. NOTE: No typing used to prevent circular import.
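To make the idea of mapping a SMILES string to token indexes concrete, here is a minimal sketch that stands in for a SMILES language with a toy vocabulary. Everything in it is an assumption for illustration: the regular expression, the helper names `tokenize`, `build_vocabulary` and `to_token_indexes`, the reserved indexes 0 to 3, and the unknown-token index 1 are not taken from pytoda.

```python
import re
from typing import Dict, List

# Bracket atoms first, then two-letter halogens, then any single character.
TOKEN_PATTERN = re.compile(r'\[[^\]]+\]|Br|Cl|.')


def tokenize(smiles: str) -> List[str]:
    return TOKEN_PATTERN.findall(smiles)


def build_vocabulary(corpus: List[str], first_index: int = 4) -> Dict[str, int]:
    # Assumption: indexes below first_index are reserved for special tokens.
    vocabulary: Dict[str, int] = {}
    for smiles in corpus:
        for token in tokenize(smiles):
            if token not in vocabulary:
                vocabulary[token] = first_index + len(vocabulary)
    return vocabulary


def to_token_indexes(
    smiles: str, vocabulary: Dict[str, int], unknown_index: int = 1
) -> List[int]:
    # Tokens missing from the vocabulary map to the (hypothetical) unknown index.
    return [vocabulary.get(token, unknown_index) for token in tokenize(smiles)]


vocabulary = build_vocabulary(['CCO', 'C[C@H](N)Cl'])
indexes = to_token_indexes('CCCl', vocabulary)
```

Matching `Cl` and bracket atoms before single characters keeps multi-character tokens intact, so `CCCl` yields three tokens in this sketch, not four.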
class RemoveIsomery(bonddir=True, chirality=True, sanitize=True)
Bases: pytoda.transforms.Transform
Remove isomery (isotopic and chiral specifications) from SMILES.
class Kekulize(all_bonds_explicit=False, all_hs_explicit=False, sanitize=True)
Bases: pytoda.transforms.Transform
Transform SMILES to Kekule version.
class NotKekulize(all_bonds_explicit=False, all_hs_explicit=False, sanitize=True)
Bases: pytoda.transforms.Transform
Transform SMILES without explicitly converting to Kekule version.
class Augment(kekule_smiles=False, all_bonds_explicit=False, all_hs_explicit=False, sanitize=True, seed=-1)
Bases: pytoda.transforms.Transform
Augment a SMILES string, according to Bjerrum (2017).
class AugmentTensor(smiles_language, kekule_smiles=False, all_bonds_explicit=False, all_hs_explicit=False, sanitize=True)
Bases: pytoda.transforms.Transform
Augment a SMILES (represented as a Tensor) according to Bjerrum (2017).
class Selfies
Bases: pytoda.transforms.Transform
Convert a molecule from SMILES to SELFIES.
class Canonicalization(sanitize=True)
Bases: pytoda.transforms.Transform
Convert any SMILES to RDKit-canonical SMILES.
Example:
    smiles = 'CN2C(=O)N(C)C(=O)C1=C2N=CN1C'
    c = Canonicalization()
    c(smiles)
Result is: 'Cn1c(=O)c2c(ncn2C)n(C)c1=O'
class SMILESToMorganFingerprints(radius=2, bits=512, chirality=True, sanitize=False)
Bases: pytoda.transforms.Transform
Get fingerprints starting from SMILES.