pytoda.preprocessing.smi module
Processing utilities for .smi files.
Summary
Functions:
filter_invalid_smi – Execute chunked invalid SMILES filtering in a .smi file.
find_undesired_smiles – Whether or not a given SMILES is contained in a list of SMILES, respecting canonicalization.
find_undesired_smiles_files – Method to find undesired SMILES in a list of existing SMILES.
Reference
-
filter_invalid_smi(input_filepath, output_filepath, chunk_size=100000)
Execute chunked invalid SMILES filtering in a .smi file.
- Parameters
input_filepath (str) – path to the .smi file to process.
output_filepath (str) – path where the filtered .smi file is stored.
chunk_size (int) – number of SMILES processed per chunk. Defaults to 100000.
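The chunked filtering idea can be sketched with the standard library alone. This is not pytoda's implementation: `filter_smi_in_chunks`, `read_chunks`, and `looks_valid` are hypothetical names, and `looks_valid` is only a toy stand-in (balanced parentheses) for a real validity check, which would normally parse each SMILES with a cheminformatics toolkit. The sketch also assumes that the first whitespace-separated field of each line is the SMILES.

```python
from itertools import islice
from typing import Callable, Iterator, List, TextIO


def looks_valid(smiles: str) -> bool:
    """Toy placeholder check (hypothetical): non-empty, balanced parentheses."""
    depth = 0
    for char in smiles:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


def read_chunks(handle: TextIO, chunk_size: int) -> Iterator[List[str]]:
    """Yield lists of at most chunk_size lines until the file is exhausted."""
    while True:
        chunk = list(islice(handle, chunk_size))
        if not chunk:
            return
        yield chunk


def filter_smi_in_chunks(
    input_filepath: str,
    output_filepath: str,
    chunk_size: int = 100000,
    is_valid: Callable[[str], bool] = looks_valid,
) -> int:
    """Copy lines with a valid SMILES to output_filepath; return lines kept."""
    kept = 0
    with open(input_filepath) as source, open(output_filepath, "w") as target:
        for chunk in read_chunks(source, chunk_size):
            for line in chunk:
                fields = line.split()
                # Assumption: the first field on each line is the SMILES.
                if fields and is_valid(fields[0]):
                    target.write(line if line.endswith("\n") else line + "\n")
                    kept += 1
    return kept
```

Reading a bounded number of lines at a time keeps memory use roughly proportional to `chunk_size` rather than to the size of the input file.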
-
find_undesired_smiles_files(undesired_filepath, data_filepath, save_matches=False, file=sys.stdout, **smi_kwargs)
Method to find undesired SMILES in a list of existing SMILES.
- Parameters
undesired_filepath (str) – Path to a .smi file with a header in the first row.
data_filepath (str) – Path to a .csv file with a ‘SMILES’ column.
save_matches (bool, optional) – Whether found matches should be plotted and saved. Defaults to False.
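To make the two input formats concrete, here is a minimal sketch that cross-checks a .smi file (header in the first row) against a .csv file with a ‘SMILES’ column. It is an illustration under stated assumptions, not the library's code: `report_undesired` and `read_smi_column` are hypothetical names, strings are compared exactly with no canonicalization, and the plotting and saving controlled by `save_matches` as well as `**smi_kwargs` are left out.

```python
import csv
import sys
from typing import List, TextIO


def read_smi_column(smi_filepath: str) -> List[str]:
    """Return the SMILES from a .smi file, skipping its header row."""
    with open(smi_filepath) as handle:
        next(handle, None)  # discard the header in the first row
        # Assumption: the first whitespace-separated field is the SMILES.
        return [line.split()[0] for line in handle if line.strip()]


def report_undesired(
    undesired_filepath: str,
    data_filepath: str,
    file: TextIO = sys.stdout,
) -> List[str]:
    """Print and return SMILES from the .csv that occur in the .smi file."""
    undesired = set(read_smi_column(undesired_filepath))
    matches = []
    with open(data_filepath, newline="") as handle:
        for row in csv.DictReader(handle):
            smiles = row["SMILES"]
            if smiles in undesired:  # exact string match only
                matches.append(smiles)
                print(f"Found undesired SMILES: {smiles}", file=file)
    return matches
```

Passing an open file handle or an `io.StringIO` object as `file` redirects the report away from standard output.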
-
find_undesired_smiles(smiles, undesired_smiles, canonical=False)
Whether or not a given SMILES is contained in a list of SMILES, respecting canonicalization.
- Parameters
smiles (str) – Seed SMILES.
undesired_smiles (List) – List of SMILES for comparison.
canonical (bool, optional) – Whether comparison list was canonicalized. Defaults to False.
- Returns
Whether SMILES was present in undesired_smiles.
- Return type
bool
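One plausible reading of the `canonical` flag is sketched below: the seed SMILES is always canonicalized, and the comparison list is canonicalized only when it is not already in canonical form. This interpretation is an assumption, as are the names `is_undesired`, `toy_canonicalize`, and `TOY_CANONICAL_FORMS`. The toy lookup table stands in for a real canonicalizer, for example a round trip through RDKit's `Chem.MolFromSmiles` and `Chem.MolToSmiles`.

```python
from typing import Callable, Iterable

# Hypothetical lookup table standing in for a real canonicalizer.
TOY_CANONICAL_FORMS = {"OCC": "CCO"}


def toy_canonicalize(smiles: str) -> str:
    """Map known alternative spellings to one form; leave others unchanged."""
    return TOY_CANONICAL_FORMS.get(smiles, smiles)


def is_undesired(
    smiles: str,
    undesired_smiles: Iterable[str],
    canonical: bool = False,
    canonicalize: Callable[[str], str] = toy_canonicalize,
) -> bool:
    """Return True if smiles matches an entry of undesired_smiles."""
    query = canonicalize(smiles)
    if canonical:
        # The comparison list is trusted to be canonical already.
        reference = set(undesired_smiles)
    else:
        # Canonicalize the comparison list so both sides use the same form.
        reference = {canonicalize(entry) for entry in undesired_smiles}
    return query in reference


# "OCC" and "CCO" both denote ethanol, so they match after canonicalization.
print(is_undesired("OCC", ["CCO"], canonical=True))
print(is_undesired("CCO", ["OCC"], canonical=False))
```

Under this reading, passing `canonical=True` with a list that is not actually canonical can miss matches, because the list entries are compared as written.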