pytoda.files module
Utilities for file handling.
Summary
Functions:
count_file_lines | Count lines in a file without loading the whole file into memory.
read_smi | Read a .smi file (or a .csv file with tab-separated values) into a pd.DataFrame.
Reference
count_file_lines(filepath, buffer_size=1048576)
Count lines in a file without loading the whole file into memory.
- Parameters
filepath (str) – path to the file.
buffer_size (int) – size of the read buffer. Defaults to 1048576 (1024 * 1024).
- Returns
Number of lines in the file.
- Return type
int
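The buffered line-counting idea can be sketched as follows. This is a minimal illustration, not pytoda's source: `count_lines_sketch` is a hypothetical name, and the sketch assumes that "lines" means newline characters, so a final line without a trailing newline is not counted (the same convention as `wc -l`). Opening the file in binary mode makes `buffer_size` a byte count and avoids text decoding.

```python
import os
import tempfile


def count_lines_sketch(filepath: str, buffer_size: int = 1024 * 1024) -> int:
    """Count newline characters in a file, one fixed-size buffer at a time.

    At most ``buffer_size`` bytes are held in memory at once, so memory use
    stays constant regardless of the file size.
    """
    number_of_lines = 0
    # Binary mode: buffer_size is a byte count and no text decoding happens.
    with open(filepath, "rb") as handle:
        buffer = handle.read(buffer_size)
        while buffer:
            number_of_lines += buffer.count(b"\n")
            buffer = handle.read(buffer_size)
    return number_of_lines


# Usage: write a small two-line .smi file and count its lines.
with tempfile.NamedTemporaryFile("wb", suffix=".smi", delete=False) as tmp:
    tmp.write(b"CCO\tethanol\nc1ccccc1\tbenzene\n")
    path = tmp.name

print(count_lines_sketch(path, buffer_size=4))  # 2 (small buffer forces many reads)
os.remove(path)
```

Counting newline bytes chunk by chunk is safe even when a line straddles two buffers, because each newline byte lands in exactly one chunk.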
read_smi(filepath, chunk_size=None, index_col=1, names=['SMILES'], header=None, *args, **kwargs)
Read a .smi file (or a .csv file with tab-separated values) into a pd.DataFrame.
- Parameters
filepath (str) – path to a .smi file.
chunk_size (int) – chunk size. Defaults to None, i.e. no chunking.
index_col (int) – column used as the index. Defaults to 1.
names (Sequence[str]) – names assigned to the columns. Defaults to ['SMILES'].
header (int) – Row number to use as column names. Defaults to None.
*args – optional positional arguments for pd.read_csv.
**kwargs – optional keyword arguments for pd.read_csv.
- Returns
A pd.DataFrame containing the data of the .smi file, indexed by the index_col column.
- Return type
pd.DataFrame
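For comparison, the plain-pandas sketch below reads a tab-separated .smi layout into a DataFrame indexed by molecule identifier, optionally in chunks. It is an illustration under stated assumptions, not the source of read_smi: the helper `read_smi_sketch` and the sample data are hypothetical, it names both columns explicitly (`["SMILES", "ID"]`) instead of relying on the documented defaults `names=['SMILES']` and `index_col=1`, and it assumes that `chunk_size` corresponds to pandas' `chunksize`, which makes `pd.read_csv` return an iterator of DataFrames rather than a single DataFrame.

```python
import io

import pandas as pd

# Hypothetical .smi content: SMILES <tab> identifier, no header row.
SMI_TEXT = "CCO\tethanol\nc1ccccc1\tbenzene\nCC(=O)O\tacetic_acid\n"


def read_smi_sketch(source, chunk_size=None):
    """Read tab-separated SMILES/ID rows into a DataFrame indexed by ID.

    With ``chunk_size`` set, pandas returns an iterator that yields one
    DataFrame per ``chunk_size`` rows instead of a single DataFrame.
    """
    return pd.read_csv(
        source,
        sep="\t",                # .smi files are tab-separated
        header=None,             # the file has no header row
        names=["SMILES", "ID"],  # explicit names for both columns (assumption)
        index_col="ID",          # use the identifier column as the index
        chunksize=chunk_size,
    )


df = read_smi_sketch(io.StringIO(SMI_TEXT))
print(df.loc["benzene", "SMILES"])  # c1ccccc1

for chunk in read_smi_sketch(io.StringIO(SMI_TEXT), chunk_size=2):
    print(len(chunk))  # 2, then 1
```

Passing an `io.StringIO` keeps the example self-contained; a file path works the same way.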