pytoda.smiles.polymer_language module

Polymer language handling.

Summary

Classes:

PolymerTokenizer – PolymerTokenizer class.

Reference

class PolymerTokenizer(entity_names, name='polymer-language', add_start_and_stop=True, **kwargs)

Bases: pytoda.smiles.smiles_language.SMILESTokenizer

PolymerTokenizer class.

PolymerTokenizer is an extension of SMILESTokenizer adding special start and stop tokens per entity. A polymer language is usually shared across several SMILES datasets (e.g. different entity sources).
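
As a rough illustration of the idea, and not pytoda's actual implementation, the sketch below builds one token-to-index vocabulary shared by all entities and appends a start and a stop token for each entity. The function name build_shared_vocab and the <ENTITY_START> / <ENTITY_STOP> token spellings are assumptions made for this example.

```python
from typing import Dict, Sequence


def build_shared_vocab(
    base_tokens: Sequence[str], entity_names: Sequence[str]
) -> Dict[str, int]:
    """Return a token-to-index map shared by all entities (hypothetical helper)."""
    vocab: Dict[str, int] = {}
    # Regular SMILES tokens come first; duplicates keep their first index.
    for token in base_tokens:
        vocab.setdefault(token, len(vocab))
    # One start and one stop token per entity (token spelling is assumed).
    for entity in entity_names:
        for suffix in ("START", "STOP"):
            vocab.setdefault(f"<{entity.upper()}_{suffix}>", len(vocab))
    return vocab


vocab = build_shared_vocab(["C", "O", "N", "=", "(", ")"], ["Monomer", "Catalyst"])
```

Because every entity draws on the same map, a token such as C gets the same index whichever dataset it comes from, and only the start and stop tokens differ per entity.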

__init__(entity_names, name='polymer-language', add_start_and_stop=True, **kwargs)

Initialize Polymer language able to encode different entities.

Parameters
  • entity_names (Sequence[str]) – A list of entity names that the polymer language can distinguish.

  • name (str) – name of the PolymerTokenizer.

  • add_start_and_stop (bool) – add start and stop token indexes. Defaults to True.

  • kwargs (dict) – additional parameters passed to SMILESTokenizer.

Note

See set_smiles_transforms and set_encoding_transforms to change the transforms temporarily and reset with reset_initial_transforms. Assignment of class attributes in the parameter list will trigger such a reset.

update_entity(entity)

Update the current entity and the default transforms (used e.g. in add_dataset) of the Polymer language object.

Parameters

entity (str) – a chemical entity (e.g. ‘Monomer’).

Return type

None

smiles_to_token_indexes(smiles, entity=None)

Transform character-level SMILES into a sequence of token indexes.

If add_start_and_stop is set, entity-specific start and stop tokens are inserted.

Parameters
  • smiles (str) – a SMILES (or SELFIES) representation.

  • entity (str) – a chemical entity (e.g. ‘Monomer’). Defaults to None, in which case the current entity is used (initially the SMILESTokenizer default).

Returns

indexes representation for the SMILES/SELFIES provided.

Return type

Union[Indexes, Tensor]
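
The entity fallback described above can be sketched with a small stand-in class. This is a hypothetical toy and not the pytoda class: ToyEntityEncoder, its encode method, and the token spellings are assumptions, and as a simplification it adds no start or stop tokens when no entity has been set, whereas the documented tokenizer starts from the SMILESTokenizer default.

```python
from typing import Dict, List, Optional


class ToyEntityEncoder:
    """Hypothetical stand-in illustrating the entity argument."""

    def __init__(self, vocab: Dict[str, int], add_start_and_stop: bool = True):
        self.vocab = vocab
        self.add_start_and_stop = add_start_and_stop
        self.current_entity: Optional[str] = None

    def update_entity(self, entity: str) -> None:
        # Later encode() calls without an entity use this one.
        self.current_entity = entity

    def encode(self, smiles: str, entity: Optional[str] = None) -> List[int]:
        # Fall back to the current entity when none is passed.
        entity = self.current_entity if entity is None else entity
        # Character-level tokenization: one token per character.
        indexes = [self.vocab[char] for char in smiles]
        if self.add_start_and_stop and entity is not None:
            start = self.vocab[f"<{entity.upper()}_START>"]
            stop = self.vocab[f"<{entity.upper()}_STOP>"]
            indexes = [start] + indexes + [stop]
        return indexes


tokens = ["C", "O", "=", "<MONOMER_START>", "<MONOMER_STOP>",
          "<CATALYST_START>", "<CATALYST_STOP>"]
vocab = {token: index for index, token in enumerate(tokens)}
encoder = ToyEntityEncoder(vocab)

explicit = encoder.encode("CC=O", entity="Monomer")  # entity given per call
encoder.update_entity("Catalyst")
implicit = encoder.encode("CO")  # uses the current entity
```

Passing entity explicitly affects only that call in this sketch; update_entity changes what later calls use by default.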

reset_initial_transforms()

Reset the SMILES and token index transforms to their state at initialization, including entity-specific transforms.

set_smiles_transforms(entity, canonical=None, augment=None, kekulize=None, all_bonds_explicit=None, all_hs_explicit=None, remove_bonddir=None, remove_chirality=None, selfies=None, sanitize=None)

Helper function to reversibly change the SMILES transforms per entity.

set_encoding_transforms(entity, randomize=None, add_start_and_stop=None, padding=None, padding_length=None)

Helper function to reversibly change the encoding transforms per entity. Addresses entity-specific start and stop tokens.
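
The set-then-reset pattern behind these helpers can be sketched as follows. This is a minimal toy under stated assumptions, not the library's code: ToyTransformSettings and set_transforms are hypothetical names, and treating None as "leave this setting unchanged" is an assumption suggested by the None defaults in the signatures above.

```python
import copy
from typing import Any, Dict, Sequence


class ToyTransformSettings:
    """Hypothetical per-entity settings that can be changed and restored."""

    def __init__(self, entity_names: Sequence[str], **defaults: Any):
        # Remember the settings as they were at initialization.
        self._initial: Dict[str, Dict[str, Any]] = {
            name: dict(defaults) for name in entity_names
        }
        self.settings = copy.deepcopy(self._initial)

    def set_transforms(self, entity: str, **changes: Any) -> None:
        # Assumption: a value of None means "keep the current value".
        for key, value in changes.items():
            if value is not None:
                self.settings[entity][key] = value

    def reset_initial_transforms(self) -> None:
        # Discard all temporary changes for every entity.
        self.settings = copy.deepcopy(self._initial)


settings = ToyTransformSettings(
    ["Monomer", "Catalyst"], canonical=False, augment=False
)
settings.set_transforms("Monomer", augment=True, canonical=None)
changed = copy.deepcopy(settings.settings)
settings.reset_initial_transforms()
restored = settings.settings
```

Keeping an untouched copy of the initial settings is what makes the change reversible in this sketch: the reset replaces the working copy rather than undoing individual edits.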