pytoda.preprocessing.smi module

Processing utilities for .smi files.

Summary

Functions:

filter_invalid_smi

Filter invalid SMILES out of a .smi file, processing the file in chunks.

find_undesired_smiles

Check whether a given SMILES is contained in a list of SMILES, respecting canonicalization.

find_undesired_smiles_files

Find undesired SMILES in a list of existing SMILES.

Reference

filter_invalid_smi(input_filepath, output_filepath, chunk_size=100000)

Filter invalid SMILES out of a .smi file, processing the file in chunks.

Parameters
  • input_filepath (str) – Path to the .smi file to process.

  • output_filepath (str) – Path where the filtered .smi file is stored.

  • chunk_size (int) – Number of SMILES processed per chunk. Defaults to 100000.
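
The chunked filtering idea can be illustrated with a minimal, standard-library sketch. This is not pytoda's implementation: the names filter_smi_in_chunks, iter_chunks and toy_is_valid are hypothetical, the input is assumed to hold one record per line with the SMILES as the first whitespace-separated field, and the validity check is a toy stand-in (balanced parentheses only). A real check would typically rely on a cheminformatics toolkit, for example testing whether RDKit's Chem.MolFromSmiles returns None.

```python
from itertools import islice


def iter_chunks(lines, chunk_size):
    """Yield successive lists of at most chunk_size lines."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def toy_is_valid(smiles):
    """Toy stand-in for a real SMILES parser: only checks parenthesis counts."""
    return smiles.count("(") == smiles.count(")")


def filter_smi_in_chunks(input_filepath, output_filepath,
                         is_valid=toy_is_valid, chunk_size=100000):
    """Copy lines whose SMILES passes is_valid; return the number kept."""
    kept = 0
    with open(input_filepath) as source, open(output_filepath, "w") as sink:
        for chunk in iter_chunks(source, chunk_size):
            # Assumption: the SMILES is the first whitespace-separated field.
            valid = [
                line for line in chunk
                if line.strip() and is_valid(line.split()[0])
            ]
            sink.writelines(valid)
            kept += len(valid)
    return kept
```

Only one chunk of lines is held in memory at a time, which is the reason to expose a chunk size for large .smi files.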

find_undesired_smiles_files(undesired_filepath, data_filepath, save_matches=False, file=sys.stdout, **smi_kwargs)

Find undesired SMILES in a list of existing SMILES.

Parameters
  • undesired_filepath (str) – Path to a .smi file with a header in the first row.

  • data_filepath (str) – Path to a .csv file with a ‘SMILES’ column.

  • save_matches (bool, optional) – Whether found matches should be plotted and saved. Defaults to False.
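
As a rough illustration of the file-based lookup, the sketch below reads the undesired SMILES from a .smi file with a header row and scans the ‘SMILES’ column of a .csv file for matches. It is a simplified stand-in, not pytoda's code: the names load_undesired and report_undesired are hypothetical, the .smi file is assumed to be tab-separated with the SMILES in the first column, strings are compared exactly without canonicalization, and writing matches to the file argument is an assumption about how that parameter is used. Plotting and saving of matches is left out.

```python
import csv
import sys


def load_undesired(undesired_filepath):
    """Read the first column of a tab-separated .smi file, skipping the header."""
    with open(undesired_filepath, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # assumption: the first row is a header
        return {row[0] for row in reader if row}


def report_undesired(undesired_filepath, data_filepath, file=sys.stdout):
    """Return (row index, SMILES) pairs from the .csv found in the undesired set."""
    undesired = load_undesired(undesired_filepath)
    matches = []
    with open(data_filepath, newline="") as handle:
        for row_index, row in enumerate(csv.DictReader(handle)):
            # Exact string comparison; no canonicalization in this sketch.
            if row["SMILES"] in undesired:
                matches.append((row_index, row["SMILES"]))
                print(f"row {row_index}: {row['SMILES']}", file=file)
    return matches
```

Passing an open file handle or an io.StringIO object as file redirects the report away from standard output.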

find_undesired_smiles(smiles, undesired_smiles, canonical=False)

Check whether a given SMILES is contained in a list of SMILES, respecting canonicalization.

Parameters
  • smiles (str) – Seed SMILES.

  • undesired_smiles (List) – List of SMILES for comparison.

  • canonical (bool, optional) – Whether the comparison list is already canonicalized. Defaults to False.

Returns

True if the SMILES is present in undesired_smiles, False otherwise.

Return type

bool
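
The role of the canonical flag can be sketched as follows. This is an illustration under stated assumptions, not pytoda's implementation: contains_smiles and toy_canonicalize are hypothetical names, the canonicalizer is a toy lookup table rather than a real cheminformatics routine, and canonicalizing the seed SMILES before the comparison is an assumption about the intended behavior.

```python
# Toy lookup table standing in for a real canonicalizer (assumption):
# "OCC" and "CCO" both denote ethanol, and "CCO" is treated as canonical.
TOY_CANONICAL = {"OCC": "CCO"}


def toy_canonicalize(smiles):
    """Map a SMILES to its toy canonical form; unknown strings pass through."""
    return TOY_CANONICAL.get(smiles, smiles)


def contains_smiles(smiles, undesired_smiles, canonical=False,
                    canonicalize=toy_canonicalize):
    """Return True if smiles matches an entry of undesired_smiles."""
    if not canonical:
        # The comparison list is not canonical yet, so normalize it first.
        undesired_smiles = [canonicalize(s) for s in undesired_smiles]
    return canonicalize(smiles) in undesired_smiles


print(contains_smiles("CCO", ["OCC"]))                  # True
print(contains_smiles("CCO", ["OCC"], canonical=True))  # False
```

The second call returns False because canonical=True tells the function to trust the list as given, so the non-canonical entry "OCC" is never normalized.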