pytoda.smiles.polymer_language module
Polymer language handling.
Reference
class PolymerTokenizer(entity_names, name='polymer-language', add_start_and_stop=True, **kwargs)
Bases: pytoda.smiles.smiles_language.SMILESTokenizer
PolymerTokenizer class.
PolymerTokenizer is an extension of SMILESTokenizer adding special start and stop tokens per entity. A polymer language is usually shared across several SMILES datasets (e.g. different entity sources).
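The idea of per-entity start and stop tokens in a shared vocabulary can be sketched in plain Python. This is a minimal illustration and not pytoda's implementation: the helper `build_entity_vocab`, the base token list, and the `<ENTITY_START>` / `<ENTITY_STOP>` token format are all assumptions made for this example.

```python
from typing import Dict, Sequence


def build_entity_vocab(
    entity_names: Sequence[str], base_tokens: Sequence[str]
) -> Dict[str, int]:
    """Build a toy vocabulary: shared tokens plus one start/stop pair per entity."""
    vocab: Dict[str, int] = {}
    # Shared tokens are common to every entity and every dataset.
    for token in base_tokens:
        vocab.setdefault(token, len(vocab))
    # Each entity gets its own pair of special tokens.
    # The token format below is hypothetical, chosen for illustration only.
    for entity in entity_names:
        for suffix in ("START", "STOP"):
            vocab.setdefault(f"<{entity.upper()}_{suffix}>", len(vocab))
    return vocab


vocab = build_entity_vocab(
    ["Monomer", "Catalyst"],
    ["<PAD>", "<UNK>", "C", "O", "=", "(", ")"],
)
print(vocab["<MONOMER_START>"], vocab["<CATALYST_STOP>"])
```

Because the chemistry tokens are shared and only the special tokens differ, one vocabulary of this shape can serve several datasets, each tagged with its own entity.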
__init__(entity_names, name='polymer-language', add_start_and_stop=True, **kwargs)
Initialize Polymer language able to encode different entities.
- Parameters
entity_names (Sequence[str]) – A list of entity names that the polymer language can distinguish.
name (str) – name of the PolymerTokenizer.
add_start_and_stop (bool) – add start and stop token indexes. Defaults to True.
kwargs (dict) – additional parameters passed to SMILESTokenizer.
Note
See set_smiles_transforms and set_encoding_transforms to change the transforms temporarily and reset with reset_initial_transforms. Assignment of class attributes in the parameter list will trigger such a reset.
update_entity(entity)
Update the current entity and the default transforms (used e.g. in add_dataset) of the Polymer language object.
- Parameters
entity (str) – a chemical entity (e.g. ‘Monomer’).
- Return type
None
smiles_to_token_indexes(smiles, entity=None)
Transform character-level SMILES into a sequence of token indexes.
If add_start_and_stop is set, entity-specific start and stop tokens are inserted.
- Parameters
smiles (str) – a SMILES (or SELFIES) representation.
entity (str) – a chemical entity (e.g. ‘Monomer’). Defaults to None, where the current entity is used (initially the SMILESTokenizer default).
- Returns
indexes representation for the SMILES/SELFIES provided.
- Return type
Union[Indexes, Tensor]
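The interplay between the current entity and the entity argument can be illustrated with a toy stand-in. This is a sketch under stated assumptions, not pytoda's code: `ToyPolymerTokenizer`, its one-character-per-token scheme, its on-the-fly vocabulary growth, and the special token format are hypothetical. It also assumes that passing an entity updates the current entity for later calls.

```python
from typing import Dict, List, Optional, Sequence


class ToyPolymerTokenizer:
    """Illustrative stand-in, NOT pytoda's PolymerTokenizer."""

    def __init__(
        self, entity_names: Sequence[str], add_start_and_stop: bool = True
    ) -> None:
        self.add_start_and_stop = add_start_and_stop
        # No entity is selected initially; the toy then adds no entity tokens.
        self.current_entity: Optional[str] = None
        self.token_to_index: Dict[str, int] = {"<UNK>": 0}
        for entity in entity_names:
            for suffix in ("START", "STOP"):
                # Hypothetical special token format.
                self._add(f"<{entity.upper()}_{suffix}>")

    def _add(self, token: str) -> int:
        """Return the index of a token, registering it on first sight."""
        return self.token_to_index.setdefault(token, len(self.token_to_index))

    def update_entity(self, entity: str) -> None:
        if f"<{entity.upper()}_START>" not in self.token_to_index:
            raise ValueError(f"Unknown entity: {entity}")
        self.current_entity = entity

    def smiles_to_token_indexes(
        self, smiles: str, entity: Optional[str] = None
    ) -> List[int]:
        # Assumption: an explicit entity also becomes the current entity.
        if entity is not None:
            self.update_entity(entity)
        # Toy tokenization: one token per character.
        indexes = [self._add(char) for char in smiles]
        if self.add_start_and_stop and self.current_entity is not None:
            name = self.current_entity.upper()
            start = self.token_to_index[f"<{name}_START>"]
            stop = self.token_to_index[f"<{name}_STOP>"]
            indexes = [start, *indexes, stop]
        return indexes


tokenizer = ToyPolymerTokenizer(["Monomer", "Catalyst"])
monomer_ids = tokenizer.smiles_to_token_indexes("CCO", entity="Monomer")
catalyst_ids = tokenizer.smiles_to_token_indexes("CCO", entity="Catalyst")
sticky_ids = tokenizer.smiles_to_token_indexes("CCO")  # reuses "Catalyst"
print(monomer_ids, catalyst_ids, sticky_ids)
```

The same SMILES string yields the same inner indexes for both entities; only the first and last index differ, which is what lets a downstream model tell the entities apart.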
reset_initial_transforms()
Reset the SMILES and token index transforms to their state at initialization, including entity-specific transforms.