pytoda.files module

Utilities for file handling.

Summary

Functions:

`count_file_lines`	Count lines in a file without persisting it in memory.
`read_smi`	Read a .smi file (or a .csv file with tab-separated values) into a pd.DataFrame.

count_file_lines(filepath, buffer_size=1048576)

Count lines in a file without persisting it in memory.

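Parameters

filepath (str) – path to the file.
buffer_size (int) – size of the buffer used when reading the file. Defaults to 1048576 (1024 * 1024).
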
Returns

Number of lines in the file.

Return type

int
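Counting lines without keeping the whole file in memory can be done by reading fixed-size chunks and counting the newline characters in each one. The following is a minimal sketch under that assumption; `count_lines_buffered` is a hypothetical helper and not necessarily how `count_file_lines` is implemented.

```python
import os
import tempfile


def count_lines_buffered(filepath: str, buffer_size: int = 1024 * 1024) -> int:
    """Hypothetical sketch: count newline bytes one chunk at a time.

    At most `buffer_size` bytes of the file are held in memory at once.
    """
    number_of_lines = 0
    # Binary mode skips text decoding and lets us count b'\n' directly.
    with open(filepath, 'rb') as fp:
        # iter() with a b'' sentinel keeps calling read() until EOF.
        for chunk in iter(lambda: fp.read(buffer_size), b''):
            number_of_lines += chunk.count(b'\n')
    return number_of_lines


if __name__ == '__main__':
    # Write a small throwaway file and use a tiny buffer so that the
    # loop has to process several chunks.
    with tempfile.NamedTemporaryFile('w', suffix='.smi', delete=False) as tmp:
        tmp.write('CCO\tethanol\nc1ccccc1\tbenzene\nCC(=O)O\tacetic_acid\n')
        path = tmp.name
    print(count_lines_buffered(path, buffer_size=4))  # 3
    os.remove(path)
```

Because this sketch counts newline characters, a final line without a trailing newline is not counted; a smaller buffer_size lowers peak memory at the cost of more read calls.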

read_smi(filepath, chunk_size=None, index_col=1, names=['SMILES'], header=None, *args, **kwargs)

Read a .smi file (or a .csv file with tab-separated values) into a pd.DataFrame.

Parameters

filepath (str) – path to a .smi file.
chunk_size (int) – size of each chunk. Defaults to None, i.e. no chunking.
index_col (int) – position of the data column used as the index. Defaults to 1.
names (Sequence[str]) – user-assigned names given to the columns. Defaults to ['SMILES'].
header (int) – row number to use as column names. Defaults to None.
*args – optional positional arguments for pd.read_csv.
**kwargs – optional keyword arguments for pd.read_csv.
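A possible way to consume a .smi file in pieces is sketched below. It assumes, without confirmation from this page, that chunk_size is handed to pd.read_csv as its chunksize argument, in which case pandas returns an iterator of DataFrames rather than a single DataFrame. `iter_smi_chunks` is a hypothetical helper, and it labels both columns explicitly instead of relying on the default names=['SMILES'].

```python
import io

import pandas as pd

# A tiny in-memory .smi file: SMILES string, a tab, then a molecule identifier.
SMI_TEXT = (
    'CCO\tethanol\n'
    'c1ccccc1\tbenzene\n'
    'CC(=O)O\tacetic_acid\n'
)


def iter_smi_chunks(buffer, chunk_size):
    """Hypothetical helper: yield the rows as DataFrames of <= chunk_size rows."""
    reader = pd.read_csv(
        buffer,
        sep='\t',                # .smi files are tab-separated
        header=None,             # there is no header row
        names=['SMILES', 'ID'],  # label both columns explicitly
        index_col='ID',          # index each chunk by molecule identifier
        chunksize=chunk_size,    # makes read_csv return an iterator
    )
    for chunk in reader:
        yield chunk


if __name__ == '__main__':
    for chunk in iter_smi_chunks(io.StringIO(SMI_TEXT), chunk_size=2):
        print(len(chunk), list(chunk.index))
    # 2 ['ethanol', 'benzene']
    # 1 ['acetic_acid']
```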

Returns

A pd.DataFrame containing the data of the .smi file, indexed by the index_col column.

Return type

pd.DataFrame
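The parameters above map naturally onto pd.read_csv. The sketch below approximates a read_smi call, assuming a tab separator and no header row; `read_smi_sketch` is a hypothetical name, and it labels both columns (names=('SMILES', 'ID')) so that the column picked by index_col=1 has a name of its own.

```python
import io
from typing import Sequence

import pandas as pd


def read_smi_sketch(filepath_or_buffer, index_col: int = 1,
                    names: Sequence[str] = ('SMILES', 'ID'),
                    **kwargs) -> pd.DataFrame:
    """Hypothetical approximation of read_smi: tab-separated, no header row."""
    return pd.read_csv(
        filepath_or_buffer,
        sep='\t',             # SMILES and identifier are separated by a tab
        header=None,          # .smi files have no header line
        names=list(names),    # column labels chosen by the caller
        index_col=index_col,  # position of the column used as the index
        **kwargs,             # anything else goes straight to pd.read_csv
    )


if __name__ == '__main__':
    smi = io.StringIO('CCO\tethanol\nc1ccccc1\tbenzene\n')
    df = read_smi_sketch(smi)
    print(df.loc['benzene', 'SMILES'])  # c1ccccc1
```

Any extra keyword argument, such as nrows=1, is forwarded to pd.read_csv unchanged, mirroring the keyword-argument pass-through described in the parameter list.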