pytoda.proteins.protein_feature_language module

Protein language handling.

Summary

Exceptions:

IndexesToSequenceError

Classes:

ProteinFeatureLanguage

ProteinFeatureLanguage class.

Functions:

token_indexes_to_sequence_raise

Monkey patch that raises an error.

Reference

exception IndexesToSequenceError[source]

Bases: Exception

token_indexes_to_sequence_raise(token_indexes)[source]

Monkey patch that raises an error.

Return type

str
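The monkey-patching idea can be illustrated with a minimal, self-contained sketch. It does not reproduce pytoda's code: ToyLanguage, ReverseMappingError, indexes_to_sequence_raise and the invertible flag are hypothetical names chosen for illustration. The assumption is that a decoding method gets replaced on an instance when the chosen encoding cannot be mapped back to a sequence.

```python
class ReverseMappingError(Exception):
    """Hypothetical stand-in for IndexesToSequenceError."""


def indexes_to_sequence_raise(token_indexes):
    """Replacement function that refuses to decode."""
    raise ReverseMappingError('decoding is not supported for this encoding')


class ToyLanguage:
    """Hypothetical minimal language; not pytoda's class."""

    def __init__(self, invertible=True):
        if not invertible:
            # Instance-level patch: the plain function shadows the class
            # method on this object only, so it is called without `self`.
            self.token_indexes_to_sequence = indexes_to_sequence_raise

    def token_indexes_to_sequence(self, token_indexes):
        # Toy decoding: treat each index as a character code.
        return ''.join(chr(index) for index in token_indexes)


ok = ToyLanguage(invertible=True)
decoded = ok.token_indexes_to_sequence([77, 75])

patched = ToyLanguage(invertible=False)
try:
    patched.token_indexes_to_sequence([77, 75])
    raised = False
except ReverseMappingError:
    raised = True
```

Patching the instance rather than the class leaves other instances able to decode as usual.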

class ProteinFeatureLanguage(name='protein-feature-language', features='blosum', tokenizer=<class 'list'>, add_start_and_stop=True)[source]

Bases: pytoda.proteins.protein_language.ProteinLanguage

ProteinFeatureLanguage class.

ProteinFeatureLanguage handles protein data and translates it from text to feature space.

__init__(name='protein-feature-language', features='blosum', tokenizer=<class 'list'>, add_start_and_stop=True)[source]

Initialize Protein feature language.

Parameters
  • name (str) – name of the ProteinFeatureLanguage.

  • features (str) – feature alphabet choice. Defaults to ‘blosum’; alternatives are ‘binary_features’, ‘float_features’ and ‘blosum_norm’.

  • tokenizer (Tokenizer) – function used to tokenize the amino acid sequences. Defaults to list, which splits the sequence character by character.

  • add_start_and_stop (bool) – whether to add start and stop tokens to the sequence. Defaults to True.
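To make the translation from text to feature space concrete, here is a minimal sketch that does not depend on pytoda. The table TOY_FEATURES, its three-dimensional toy values, the '<START>' and '<STOP>' token names and the function sequence_to_features are all assumptions made for illustration; the actual feature alphabets are longer and are not reproduced here.

```python
# Hypothetical 3-dimensional feature table with toy values.
TOY_FEATURES = {
    '<START>': (1, 0, 0),
    '<STOP>': (0, 0, 1),
    'A': (4, -1, -2),
    'R': (-1, 5, 0),
    'N': (-2, 0, 6),
}


def sequence_to_features(sequence, tokenizer=list, add_start_and_stop=True):
    """Map each token of a sequence to its feature tuple."""
    # The default tokenizer, list, splits the string character by character.
    tokens = list(tokenizer(sequence))
    if add_start_and_stop:
        tokens = ['<START>'] + tokens + ['<STOP>']
    return [TOY_FEATURES[token] for token in tokens]


encoded = sequence_to_features('ARN')
plain = sequence_to_features('AR', add_start_and_stop=False)
```

Each amino acid becomes one tuple, so a sequence of n residues yields n tuples, plus two more when start and stop tokens are added.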

token_indexes_to_sequence(token_indexes)[source]

Transform a list of tuples of token indexes into an amino acid sequence.

Parameters

token_indexes (list) – a list of tuples, one tuple per amino acid (AA), each of length self.number_of_features.

Returns

an amino acid sequence representation.

Return type

str
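The reverse direction can be sketched as a lookup from feature tuples back to residues. This is an illustration under stated assumptions, not pytoda's implementation: TOY_FEATURES, FEATURES_TO_TOKEN and the standalone function below are hypothetical, and the sketch only works because every toy feature tuple is unique.

```python
# Hypothetical 3-dimensional feature table with toy values.
TOY_FEATURES = {
    'A': (4, -1, -2),
    'R': (-1, 5, 0),
    'N': (-2, 0, 6),
}

# Invert the table; this requires every feature tuple to be unique.
FEATURES_TO_TOKEN = {
    features: token for token, features in TOY_FEATURES.items()
}


def token_indexes_to_sequence(token_indexes):
    """Join the residues whose feature tuples match the given entries."""
    # tuple() lets the lookup accept lists as well as tuples.
    return ''.join(
        FEATURES_TO_TOKEN[tuple(features)] for features in token_indexes
    )


sequence = token_indexes_to_sequence([(4, -1, -2), (-1, 5, 0), (-2, 0, 6)])
```

If two residues shared the same feature tuple, the inverted table would keep only one of them and the decoded sequence would be ambiguous.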

property method

A string denoting the language encoding method.

Return type

str