pytoda.files module

Utilities for file handling.

Summary

Functions:

count_file_lines

Count lines in a file without persisting it in memory.

read_smi

Read a .smi (or .csv file with tab-separated values) into a pd.DataFrame.

Reference

count_file_lines(filepath, buffer_size=1048576)

Count lines in a file without persisting it in memory.

Parameters
  • filepath (str) – path to the file.

  • buffer_size (int) – size of the buffer used when reading the file. Defaults to 1048576.

Returns

Number of lines in the file.

Return type

int
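The buffered counting idea described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `count_lines_buffered` is a hypothetical helper, and it assumes that a "line" is delimited by a newline byte and that the buffer size is given in bytes.

```python
import os
import tempfile


def count_lines_buffered(path, buffer_size=1024 * 1024):
    """Count newline bytes by reading the file in fixed-size chunks.

    Hypothetical helper for illustration; only one chunk is held in
    memory at a time, so the file size does not matter.
    """
    count = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(buffer_size)
            if not chunk:  # empty bytes object means end of file
                break
            count += chunk.count(b"\n")
    return count


# Write a small three-line example file to a temporary location.
with tempfile.NamedTemporaryFile("w", suffix=".smi", delete=False) as tmp:
    tmp.write("CCO\tethanol\nc1ccccc1\tbenzene\nCC(=O)O\tacetic_acid\n")
    tmp_path = tmp.name

# A tiny buffer forces several reads, which exercises the chunking logic.
n_lines = count_lines_buffered(tmp_path, buffer_size=8)
os.remove(tmp_path)
```

Note that in this sketch a final line without a trailing newline is not counted, since only newline bytes are tallied.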

read_smi(filepath, chunk_size=None, index_col=1, names=['SMILES'], header=None, *args, **kwargs)

Read a .smi (or .csv file with tab-separated values) into a pd.DataFrame.

Parameters
  • filepath (str) – path to a .smi file.

  • chunk_size (int) – size of the chunk. Defaults to None, meaning no chunking.

  • index_col (int) – Data column used for indexing, defaults to 1.

  • names (Sequence[str]) – User-assigned names given to the columns.

  • header (int) – Row number to use as column names. Defaults to None.

  • *args – Optional positional arguments for pd.read_csv.

  • **kwargs – Optional keyword arguments for pd.read_csv.

Returns

A pd.DataFrame containing the data of the .smi file, where the index is the index_col column.

Return type

pd.DataFrame
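To make the parameters above more concrete, here is a minimal sketch of a comparable direct call to pd.read_csv on tab-separated SMILES data. It does not call read_smi itself. The in-memory sample data, the explicit second column name "ID", and the mapping of chunk_size to pandas' chunksize argument are assumptions made for illustration.

```python
import io

import pandas as pd

# Two tab-separated records: a SMILES string followed by an identifier.
smi_text = "CCO\tethanol\nc1ccccc1\tbenzene\n"

# Assumption: both columns are named explicitly, and the second column
# (position 1) is used as the index, mirroring index_col=1.
df = pd.read_csv(
    io.StringIO(smi_text),
    sep="\t",
    header=None,
    names=["SMILES", "ID"],
    index_col=1,
)

# Chunked reading: with chunksize set, pd.read_csv returns an iterator
# that yields DataFrames of at most that many rows.
reader = pd.read_csv(
    io.StringIO(smi_text),
    sep="\t",
    header=None,
    names=["SMILES", "ID"],
    index_col=1,
    chunksize=1,
)
chunk_lengths = [len(chunk) for chunk in reader]
```

Here `df` is indexed by the identifier column, so a lookup such as `df.loc["ethanol", "SMILES"]` retrieves the SMILES string for that entry.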