pytoda.proteins.protein_feature_language module
Protein language handling.
Reference
- token_indexes_to_sequence_raise(token_indexes)
Monkey patch to raise Error.
- Return type
str
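The monkey-patching pattern named above can be sketched as follows. This is a minimal illustration only: ToyLanguage is a hypothetical stand-in class, and the choice of NotImplementedError is an assumption, since the reference does not state which error pytoda raises.

```python
class ToyLanguage:
    """Hypothetical stand-in for a language class; not part of pytoda."""

    def token_indexes_to_sequence(self, token_indexes):
        # Toy behaviour: concatenate the indexes as text.
        return ''.join(str(index) for index in token_indexes)


def token_indexes_to_sequence_raise(token_indexes):
    # Assumption: the error type is chosen for illustration only.
    raise NotImplementedError(
        'Mapping token indexes back to a sequence is not supported.'
    )


language = ToyLanguage()
# Assigning on the instance shadows the class method, so the
# replacement function is called without `self`.
language.token_indexes_to_sequence = token_indexes_to_sequence_raise

try:
    language.token_indexes_to_sequence([1, 2, 3])
    raised = False
except NotImplementedError:
    raised = True
```

Patching the instance rather than the class leaves other instances of the class unaffected.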
- class ProteinFeatureLanguage(name='protein-feature-language', features='blosum', tokenizer=<class 'list'>, add_start_and_stop=True)
Bases: pytoda.proteins.protein_language.ProteinLanguage
ProteinFeatureLanguage class.
ProteinFeatureLanguage handles protein data and translates it from text to feature space.
- __init__(name='protein-feature-language', features='blosum', tokenizer=<class 'list'>, add_start_and_stop=True)
Initialize Protein feature language.
- Parameters
name (str) – name of the ProteinFeatureLanguage.
features (str) – Feature alphabet choice. Defaults to ‘blosum’, alternatives are ‘binary_features’, ‘float_features’ and ‘blosum_norm’.
tokenizer (Tokenizer) – a function used to tokenize the amino acid sequences. The default is list, which splits the sequence character by character.
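As a rough illustration of the tokenizer parameter, the sketch below splits a sequence with the built-in list and looks each token up in a small feature table. TOY_FEATURES and sequence_to_features are hypothetical names, and the feature values are invented; they are not pytoda's BLOSUM matrix or API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical two-dimensional feature table; values are invented.
TOY_FEATURES: Dict[str, Tuple[int, ...]] = {
    'A': (4, -1),
    'C': (0, 9),
    'G': (0, -3),
}


def sequence_to_features(
    sequence: str,
    tokenizer: Callable[[str], List[str]] = list,
) -> List[Tuple[int, ...]]:
    """Tokenize a sequence and map each token to its feature tuple."""
    tokens = tokenizer(sequence)  # list('ACG') gives ['A', 'C', 'G']
    return [TOY_FEATURES[token] for token in tokens]


features = sequence_to_features('ACG')
```

Any callable that takes a string and returns a list of tokens could be passed in place of list in this sketch.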
- token_indexes_to_sequence(token_indexes)
Transform a list of tuples of token indexes into an amino acid sequence.
- Parameters
token_indexes (list) – a list of tuples, one tuple per amino acid (AA), each of length self.number_of_features.
- Returns
an amino acid sequence representation.
- Return type
str
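One way such a reverse mapping could work is to invert a token-to-features table and look up each tuple. The sketch below assumes this approach; TOY_FEATURES and features_to_sequence are hypothetical, the values are invented, and pytoda's actual implementation may differ.

```python
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical feature table with invented values (two features per token).
TOY_FEATURES: Dict[str, Tuple[int, ...]] = {
    'A': (4, -1),
    'C': (0, 9),
    'G': (0, -3),
}

# Invert the table; this only works if every feature tuple is unique.
FEATURES_TO_TOKEN: Dict[Tuple[int, ...], str] = {
    features: token for token, features in TOY_FEATURES.items()
}


def features_to_sequence(token_indexes: Iterable[Sequence[int]]) -> str:
    """Map each feature tuple back to its token and join the tokens."""
    return ''.join(
        FEATURES_TO_TOKEN[tuple(features)] for features in token_indexes
    )


sequence = features_to_sequence([(4, -1), (0, 9), (0, -3)])
```

A tuple that is not present in the table raises KeyError in this sketch, which makes unknown feature vectors visible instead of silently skipping them.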
- property method
A string denoting the language encoding method.
- Return type
str
-