pytoda.preprocessing.smi module

Processing utilities for .smi files.

Summary

Functions:

`filter_invalid_smi`	Filter invalid SMILES out of a .smi file, processing it in chunks.
`find_undesired_smiles`	Whether or not a given SMILES is contained in a list of SMILES, respecting canonicalization.
`find_undesired_smiles_files`	Find undesired SMILES in a list of existing SMILES.

Reference

filter_invalid_smi(input_filepath, output_filepath, chunk_size=100000)

Filter invalid SMILES out of a .smi file, processing it in chunks.

Parameters

input_filepath (str) – Path to the .smi file to process.
output_filepath (str) – Path where the filtered .smi file is stored.
chunk_size (int) – Size of each SMILES chunk. Defaults to 100000.
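
The chunked filtering described above can be sketched with the standard library alone. This is a minimal illustration, not pytoda's implementation: `filter_smi_sketch`, `read_chunks`, and `balanced_brackets` are hypothetical names, the .smi layout (SMILES in the first whitespace-separated field) is an assumption, and the validity check is a toy stand-in. With RDKit installed, a realistic check would be `Chem.MolFromSmiles(smiles) is not None`.

```python
import itertools
import os
import tempfile
from typing import Callable, Iterable, Iterator, List


def read_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most chunk_size items."""
    iterator = iter(lines)
    while True:
        chunk = list(itertools.islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def balanced_brackets(smiles: str) -> bool:
    """Toy validity check (hypothetical): parentheses must be balanced."""
    depth = 0
    for char in smiles:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def filter_smi_sketch(
    input_filepath: str,
    output_filepath: str,
    is_valid: Callable[[str], bool] = balanced_brackets,
    chunk_size: int = 100000,
) -> int:
    """Copy lines whose first field passes is_valid; return how many were kept."""
    kept = 0
    with open(input_filepath) as source, open(output_filepath, "w") as target:
        # Only one chunk of lines is held in memory at a time.
        for chunk in read_chunks(source, chunk_size):
            for line in chunk:
                fields = line.split()
                if fields and is_valid(fields[0]):
                    target.write(line.rstrip("\n") + "\n")
                    kept += 1
    return kept


# Usage on a small temporary file (assumed layout: SMILES, tab, identifier).
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, "input.smi")
output_path = os.path.join(workdir, "filtered.smi")
with open(input_path, "w") as handle:
    handle.write("CCO\tethanol\nC(C\tbroken\nc1ccccc1\tbenzene\n")

kept = filter_smi_sketch(input_path, output_path, chunk_size=2)
with open(output_path) as handle:
    filtered_lines = handle.read().splitlines()
```

Processing in chunks keeps memory use bounded by `chunk_size` regardless of file size, which is presumably why the parameter exists.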

find_undesired_smiles_files(undesired_filepath, data_filepath, save_matches=False, file=sys.stdout, **smi_kwargs)

Find undesired SMILES in a list of existing SMILES.

Parameters

undesired_filepath (str) – Path to a .smi file whose first row is a header.
data_filepath (str) – Path to a .csv file with a ‘SMILES’ column.
save_matches (bool, optional) – Whether found matches should be plotted and saved. Defaults to False.
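
A rough sketch of this kind of file-to-file lookup, using only the standard library. It is not pytoda's implementation: `find_matches_sketch` and `read_smi_column` are hypothetical names, the .smi file is assumed to be tab-separated with SMILES in the first column, matching is done on exact strings without canonicalization, the returned list is an assumption, and `file` is assumed to be the stream the report is printed to. Plotting and saving of matches is omitted.

```python
import csv
import io
import os
import sys
import tempfile
from typing import List, TextIO


def read_smi_column(smi_filepath: str) -> List[str]:
    """Read SMILES from the first column of a .smi file, skipping the header row."""
    with open(smi_filepath, newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        next(rows, None)  # the first row is a header
        return [row[0] for row in rows if row]


def find_matches_sketch(
    undesired_filepath: str, data_filepath: str, file: TextIO = sys.stdout
) -> List[str]:
    """Return the entries of the 'SMILES' column that also occur in the .smi file."""
    undesired = set(read_smi_column(undesired_filepath))
    matches = []
    with open(data_filepath, newline="") as handle:
        for row in csv.DictReader(handle):
            if row["SMILES"] in undesired:  # exact string match (assumption)
                matches.append(row["SMILES"])
    print(f"Found {len(matches)} undesired SMILES.", file=file)
    return matches


# Usage with two small temporary files.
workdir = tempfile.mkdtemp()
undesired_path = os.path.join(workdir, "undesired.smi")
data_path = os.path.join(workdir, "data.csv")
with open(undesired_path, "w") as handle:
    handle.write("SMILES\tname\nCCO\tethanol\nC=O\tformaldehyde\n")
with open(data_path, "w") as handle:
    handle.write("drug,SMILES\na,CCO\nb,CCN\nc,C=O\n")

report = io.StringIO()  # capture the report instead of printing to stdout
matches = find_matches_sketch(undesired_path, data_path, file=report)
```

Passing a stream through `file` lets the caller redirect the report to a log file or an in-memory buffer.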

find_undesired_smiles(smiles, undesired_smiles, canonical=False)

Whether or not a given SMILES is contained in a list of SMILES, respecting canonicalization.

Parameters

smiles (str) – Seed SMILES.
undesired_smiles (List) – List of SMILES for comparison.
canonical (bool, optional) – Whether the comparison list was canonicalized. Defaults to False.

Returns

Whether the SMILES was present in undesired_smiles.

Return type

bool
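
To illustrate what the `canonical` flag might control, here is a minimal sketch of a canonicalization-aware membership test. It is an assumption about the behaviour, not pytoda's code: `contains_smiles_sketch` and `toy_canonicalize` are hypothetical names, and the canonicalizer is a lookup table covering only the strings in this example. With RDKit installed, `Chem.MolToSmiles(Chem.MolFromSmiles(smiles))` would be a realistic replacement.

```python
from typing import Callable, Dict, List

# Toy lookup table (hypothetical): alternative spellings of ethanol.
TOY_CANONICAL_FORMS: Dict[str, str] = {"OCC": "CCO", "C(O)C": "CCO"}


def toy_canonicalize(smiles: str) -> str:
    """Map known alternative spellings to one form; leave others unchanged."""
    return TOY_CANONICAL_FORMS.get(smiles, smiles)


def contains_smiles_sketch(
    smiles: str,
    undesired_smiles: List[str],
    canonical: bool = False,
    canonicalize: Callable[[str], str] = toy_canonicalize,
) -> bool:
    """Check membership after bringing both sides to canonical form."""
    if not canonical:
        # The comparison list is not canonical yet, so canonicalize it first.
        undesired_smiles = [canonicalize(s) for s in undesired_smiles]
    return canonicalize(smiles) in undesired_smiles


# Different spellings of the same molecule match once both are canonicalized.
hit_raw_list = contains_smiles_sketch("OCC", ["C(O)C", "CCN"])
# A list that is already canonical can skip the extra pass.
hit_canonical_list = contains_smiles_sketch("OCC", ["CCO"], canonical=True)
# Claiming a non-canonical list is canonical leads to a missed match.
missed = contains_smiles_sketch("OCC", ["C(O)C"], canonical=True)
```

Setting `canonical=True` saves a pass over the comparison list, but only gives correct results if that list really is in canonical form.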