pytoda.datasets.utils.utils module
Utils for the dataset module.
Summary
Functions:
concatenate_file_based_datasets | Concatenate file-based datasets into a single one, with the ability to get the source dataset of items.
indexed | Returns a mutated shallow copy of the passed dataset instance, whose indexing behavior is changed to additionally return the index.
keyed | Returns a mutated shallow copy of the passed dataset instance, whose indexing behavior is changed to additionally return the key.
pad_item | Padding function for a single item of a batch.
sizeof_fmt | Human readable file size.
Reference
- sizeof_fmt(num, suffix='B')
Human readable file size. Source: https://stackoverflow.com/a/1094933
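The following is a minimal sketch of the formatting approach described in the linked Stack Overflow answer, using binary (1024-based) unit prefixes. The name `sizeof_fmt_sketch` is hypothetical, and the pytoda implementation may differ in detail.

```python
def sizeof_fmt_sketch(num, suffix="B"):
    """Format a byte count with binary (1024-based) unit prefixes."""
    for unit in ("", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi"):
        # Stop at the first unit in which the value drops below 1024.
        if abs(num) < 1024.0:
            return f"{num:3.1f}{unit}{suffix}"
        num /= 1024.0
    # Anything larger than the listed units is reported in yobi-units.
    return f"{num:.1f}Yi{suffix}"


print(sizeof_fmt_sketch(0))             # 0.0B
print(sizeof_fmt_sketch(1024))          # 1.0KiB
print(sizeof_fmt_sketch(168963795964))  # 157.4GiB
```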
- concatenate_file_based_datasets(filepaths, dataset_class, **kwargs)
Concatenate file-based datasets into a single one, with the ability to get the source dataset of items.
- Parameters
filepaths (Files) – list of filepaths.
dataset_class (type) – dataset class reading from file. Supports KeyDataset and DatasetDelegator. For a plain torch.utils.data.Dataset, the returned instance can still be used like a pytoda.datasets.TransparentConcatDataset, but methods depending on key lookup will fail.
kwargs (dict) – additional keyword arguments for dataset_class.__init__(filepath, **kwargs).
- Returns
the concatenated dataset.
- Return type
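
To illustrate the idea of concatenating one dataset per file while keeping track of each item's source, here is a pure-Python sketch. It is not the pytoda implementation: `ConcatWithSourceSketch`, `concatenate_sketch`, `get_index_pair` and the toy `LineDataset` are hypothetical names chosen for this example, and the real function returns pytoda dataset types with key lookup.

```python
import os
import tempfile
from bisect import bisect_right
from itertools import accumulate


class LineDataset:
    """Hypothetical file-based dataset: one item per line of a text file."""

    def __init__(self, filepath, strip=True):
        with open(filepath) as handle:
            lines = handle.readlines()
        self.lines = [line.strip() for line in lines] if strip else lines

    def __len__(self):
        return len(self.lines)

    def __getitem__(self, index):
        return self.lines[index]


class ConcatWithSourceSketch:
    """Hypothetical concatenation that can report an item's source dataset."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # Running totals of lengths, e.g. [2, 5] for lengths 2 and 3.
        self.cumulative_sizes = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def get_index_pair(self, index):
        """Map a global index to (dataset position, index within it)."""
        if not 0 <= index < len(self):
            raise IndexError(index)
        dataset_index = bisect_right(self.cumulative_sizes, index)
        offset = self.cumulative_sizes[dataset_index - 1] if dataset_index else 0
        return dataset_index, index - offset

    def __getitem__(self, index):
        dataset_index, sample_index = self.get_index_pair(index)
        return self.datasets[dataset_index][sample_index]


def concatenate_sketch(filepaths, dataset_class, **kwargs):
    # One dataset per file; kwargs are forwarded to every constructor call.
    return ConcatWithSourceSketch(dataset_class(fp, **kwargs) for fp in filepaths)


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, content in (("a.txt", "a0\na1\n"), ("b.txt", "b0\nb1\nb2\n")):
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(content)
        paths.append(path)
    combined = concatenate_sketch(paths, LineDataset, strip=True)

print(len(combined))               # 5
print(combined[3])                 # b1
print(combined.get_index_pair(3))  # (1, 1)
```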
- indexed(dataset)
Returns a mutated shallow copy of the passed dataset instance, whose indexing behavior is changed to additionally return the index.
- Return type
Union[KeyDataset, DatasetDelegator, ConcatKeyDataset]
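
One way to obtain a "mutated shallow copy" with changed indexing is to copy the instance and swap its class for a dynamically created subclass. The sketch below shows that pattern on a toy dataset. `ToyDataset` and `indexed_sketch` are hypothetical names, and the order of the returned pair `(item, index)` is an assumption rather than something the description above states.

```python
from copy import copy


class ToyDataset:
    """Hypothetical minimal dataset backed by a list."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        return self.items[index]


def indexed_sketch(dataset):
    # Bound method of the original instance, captured before any change.
    original_getitem = dataset.__getitem__

    def getitem_with_index(self, index):
        # Assumed return order: the item first, then its index.
        return original_getitem(index), index

    clone = copy(dataset)  # shallow copy: underlying data is shared
    # Swap the clone's class for a subclass that overrides __getitem__.
    clone.__class__ = type(
        f"Indexed{type(dataset).__name__}",
        (type(dataset),),
        {"__getitem__": getitem_with_index},
    )
    return clone


plain = ToyDataset(["a", "b", "c"])
with_index = indexed_sketch(plain)

print(plain[1])                         # b
print(with_index[1])                    # ('b', 1)
print(with_index.items is plain.items)  # True
```

The original instance keeps its behavior, while the copy still passes `isinstance` checks against the original class.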
- keyed(dataset)
Returns a mutated shallow copy of the passed dataset instance, whose indexing behavior is changed to additionally return the key.
- Return type
Union[KeyDataset, DatasetDelegator, ConcatKeyDataset]
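
A short sketch of the key-returning variant, assuming a dataset that exposes a `get_key(index)` method. `ToyKeyDataset` and `keyed_sketch` are hypothetical names, and the return order `(item, key)` is an assumption.

```python
from copy import copy


class ToyKeyDataset:
    """Hypothetical dataset whose items can be looked up by index or key."""

    def __init__(self, mapping):
        self.keys = list(mapping)
        self.values = list(mapping.values())

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, index):
        return self.values[index]

    def get_key(self, index):
        return self.keys[index]


def keyed_sketch(dataset):
    base = type(dataset)

    class Keyed(base):
        def __getitem__(self, index):
            # Assumed return order: the item first, then its key.
            return base.__getitem__(self, index), self.get_key(index)

    clone = copy(dataset)  # shallow copy sharing the underlying data
    clone.__class__ = Keyed
    return clone


dataset = ToyKeyDataset({"mol_a": 1.5, "mol_b": 2.5})
with_key = keyed_sketch(dataset)

print(dataset[1])   # 2.5
print(with_key[1])  # (2.5, 'mol_b')
```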
- pad_item(item, padding_modes, padding_values, max_length)
Padding function for a single item of a batch.
- Parameters
item (Tuple) – Tuple returned by the __getitem__ function of a Dataset class.
padding_modes (List[str]) – The type of padding to perform for each datum in item. Options are ‘constant’ for constant value padding, and ‘range’ to fill the tensor with a range of values.
padding_values (List) – The values with which to fill the background tensor for padding. Can be a constant value or a range depending on the datum to pad in item.
max_length (int) – The maximum length to which the datum should be padded.
- Returns
Tuple of tensors padded according to the given specifications.
- Return type
Tuple
- NOTE: pad_item passes the trailing dimensions of the datum as the repetitions argument of range_tensor(), since the ‘length’ dimension is already covered by the value_range. For example, if a tensor of shape (5,) is required for padding mode ‘range’, the shape () is passed to range_tensor(), which repeats range(5) exactly once and yields a (5,) tensor.
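
The two padding modes can be illustrated with a simplified pure-Python sketch that works on 1-D lists instead of tensors. `pad_item_sketch` is a hypothetical name. Placing the datum at the start of the background (padding on the right) is an assumption, and trailing dimensions are not handled here.

```python
def pad_item_sketch(item, padding_modes, padding_values, max_length):
    """Pad every 1-D datum in `item` to `max_length` (lists, not tensors)."""
    padded_item = []
    for datum, mode, value in zip(item, padding_modes, padding_values):
        if len(datum) > max_length:
            raise ValueError("datum is longer than max_length")
        if mode == "constant":
            # Background filled with one constant value.
            background = [value] * max_length
        elif mode == "range":
            # Background filled with a range of values; the range itself
            # determines the length, so it must match max_length.
            background = list(value)
            if len(background) != max_length:
                raise ValueError("range length must equal max_length")
        else:
            raise ValueError(f"unknown padding mode: {mode!r}")
        # Assumption: the datum overwrites the start of the background.
        background[: len(datum)] = datum
        padded_item.append(background)
    return tuple(padded_item)


padded = pad_item_sketch(
    item=([1, 2, 3], [7, 8]),
    padding_modes=["constant", "range"],
    padding_values=[0, range(5)],
    max_length=5,
)
print(padded)  # ([1, 2, 3, 0, 0], [7, 8, 2, 3, 4])
```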