pytoda.proteins.protein_feature_language module

Protein language handling.

Summary

Exceptions:

IndexesToSequenceError

Classes:

ProteinFeatureLanguage

ProteinFeatureLanguage class.

Functions:

token_indexes_to_sequence_raise

Monkey patch that raises an error.

Reference

exception IndexesToSequenceError

Bases: Exception

token_indexes_to_sequence_raise(token_indexes)

Monkey patch that raises an error.

Return type: str
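The summary describes a monkey-patching pattern: a method that normally returns a string is swapped for a function that raises instead. The following is a minimal sketch of that pattern, not pytoda's code. ToyLanguage is a hypothetical stand-in, and the assumption that the patch raises IndexesToSequenceError is inferred from the names on this page rather than confirmed by it.

```python
class IndexesToSequenceError(Exception):
    """Stand-in for the module's exception (assumed to be what the patch raises)."""


def token_indexes_to_sequence_raise(token_indexes):
    """Hypothetical re-creation of the patch: always refuse to decode."""
    raise IndexesToSequenceError(
        f"cannot decode {len(token_indexes)} feature tuples back to a sequence"
    )


class ToyLanguage:
    """Hypothetical language object whose decoder returns a string."""

    def token_indexes_to_sequence(self, token_indexes):
        return "".join(token_indexes)


language = ToyLanguage()
# Monkey patch this instance only: the plain function stored in the instance
# __dict__ shadows the class method, and it is called without `self`.
language.token_indexes_to_sequence = token_indexes_to_sequence_raise

try:
    language.token_indexes_to_sequence([(1.0, 0.0)])
except IndexesToSequenceError as error:
    print("decoding disabled:", error)
```

Patching the instance rather than the class keeps other language objects unaffected; the declared return type stays str even though the patched call never returns.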

class ProteinFeatureLanguage(name='protein-feature-language', features='blosum', tokenizer=<class 'list'>, add_start_and_stop=True)

Bases: pytoda.proteins.protein_language.ProteinLanguage

ProteinFeatureLanguage class.

ProteinFeatureLanguage handles protein data and translates amino acid sequences from text into feature space.

__init__(name='protein-feature-language', features='blosum', tokenizer=<class 'list'>, add_start_and_stop=True)

Initialize the protein feature language.

Parameters

name (str) – name of the ProteinFeatureLanguage.
features (str) – Feature alphabet choice. Defaults to 'blosum'; alternatives are 'binary_features', 'float_features' and 'blosum_norm'.
tokenizer (Tokenizer) – Function used to tokenize the amino acid sequences. The default, list, splits a sequence character by character.
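To make "text to feature space" concrete: the default tokenizer is the built-in list, so list("MKT") yields ['M', 'K', 'T'], and each token is then looked up in a feature alphabet. The sketch below assumes a plain dict lookup; TOY_FEATURES and sequence_to_features are hypothetical names, and the values are made up (they are not BLOSUM or any of pytoda's feature alphabets).

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, ...]

# Hypothetical toy feature table, NOT pytoda's 'blosum' or 'binary_features'.
# Every amino acid maps to a tuple of the same length (here 3).
TOY_FEATURES: Dict[str, Features] = {
    "M": (1.0, 0.0, 0.0),
    "K": (0.0, 1.0, 0.0),
    "T": (0.0, 0.0, 1.0),
}


def sequence_to_features(
    sequence: str,
    features: Dict[str, Features] = TOY_FEATURES,
    tokenizer: Callable[[str], Sequence[str]] = list,
) -> List[Features]:
    """Tokenize a sequence, then replace each token with its feature tuple."""
    tokens = tokenizer(sequence)  # with list: "MKT" -> ['M', 'K', 'T']
    return [features[token] for token in tokens]


print(sequence_to_features("MKT"))
# [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
```

Passing the tokenizer as a callable means any function from string to tokens can replace list without changing the lookup step.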

token_indexes_to_sequence(token_indexes)

Transform a list of tuples of token indexes into amino acid sequence.

Parameters: token_indexes (list) – a list of tuples, one tuple per amino acid (AA), each of length self.number_of_features.
Returns: an amino acid sequence representation.
Return type: str
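Decoding goes the other way: each feature tuple is mapped back to its amino acid. A minimal sketch follows, assuming the feature table can be inverted into a dict, which only works if no two amino acids share a tuple. The table values and the policy of skipping unknown tuples (for example padding or start/stop vectors) are assumptions of this sketch, not documented pytoda behavior.

```python
from typing import Dict, List, Tuple

Features = Tuple[float, ...]

# Hypothetical toy feature table with made-up values.
TOY_FEATURES: Dict[str, Features] = {
    "M": (1.0, 0.0, 0.0),
    "K": (0.0, 1.0, 0.0),
    "T": (0.0, 0.0, 1.0),
}
# Invert the table so each feature tuple points back to its amino acid.
FEATURES_TO_AA: Dict[Features, str] = {v: k for k, v in TOY_FEATURES.items()}


def token_indexes_to_sequence(
    token_indexes: List[Features], number_of_features: int = 3
) -> str:
    """Decode one feature tuple per amino acid into a sequence string."""
    residues = []
    for features in token_indexes:
        if len(features) != number_of_features:
            raise ValueError(
                f"expected {number_of_features} features, got {len(features)}"
            )
        # Unknown tuples are skipped (an assumption of this sketch).
        residues.append(FEATURES_TO_AA.get(tuple(features), ""))
    return "".join(residues)


print(token_indexes_to_sequence([(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]))  # KM
```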

property method

A string denoting the language's encoding method.

Return type: str