pytoda.datasets.base_dataset module
Implementation of base classes working with datasets.
Summary
Classes:
ConcatKeyDataset | Extension of ConcatDataset with transparent indexing, supporting KeyDataset.
DatasetDelegator | Base class for KeyDataset attribute accesses from self.dataset.
KeyDataset | Base class for datasets with both integer index and item identifier key.
TransparentConcatDataset | Extension of ConcatDataset with transparent indexing.
Reference
class KeyDataset(*args, **kwds)
Bases: Generic[torch.utils.data.dataset.T_co]
Base class for datasets with both integer index and item identifier key.
Implicit abstract methods are:
- __len__(self), see https://github.com/pytorch/pytorch/blob/66a20c259b3b2063e59102ab23f3fb34fc819455/torch/utils/data/sampler.py#L23
- __getitem__(self, index: int), which is inherited
Default implementations for indexing by key and for getting all keys are provided, but they should be overridden when possible, as calls to get_item and get_key might be expensive.
The keys are expected to be unique. Check has_duplicate_keys to make sure. If there are duplicate keys, a lookup will generally use the first one found, but there are no guarantees.
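The trade-off between the default lookup and an overridden one can be sketched as follows. This is a minimal, pure-Python illustration and not pytoda's implementation: the class names SketchKeyDataset and DictKeyDataset, and the method names get_index, keys and get_item_from_key, are assumptions made for this example only.

```python
from typing import Dict, Hashable, Iterator, Sequence


class SketchKeyDataset:
    """Illustrative stand-in (hypothetical, not pytoda's KeyDataset)."""

    def __init__(self, items: Sequence[str], keys: Sequence[Hashable]) -> None:
        self._items = list(items)
        self._keys = list(keys)

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> str:
        return self._items[index]

    def get_key(self, index: int) -> Hashable:
        return self._keys[index]

    # Generic defaults: they rely only on __len__ and get_key.
    def keys(self) -> Iterator[Hashable]:
        return (self.get_key(i) for i in range(len(self)))

    def get_index(self, key: Hashable) -> int:
        # Linear scan, so the first matching key wins. Cost is O(n) get_key calls.
        for index, candidate in enumerate(self.keys()):
            if candidate == key:
                return index
        raise KeyError(key)

    def get_item_from_key(self, key: Hashable) -> str:
        return self[self.get_index(key)]


class DictKeyDataset(SketchKeyDataset):
    """Overrides the default lookup with a precomputed dictionary."""

    def __init__(self, items: Sequence[str], keys: Sequence[Hashable]) -> None:
        super().__init__(items, keys)
        self._index_of: Dict[Hashable, int] = {}
        for index, key in enumerate(self._keys):
            # setdefault keeps the first index seen for a duplicated key.
            self._index_of.setdefault(key, index)

    def get_index(self, key: Hashable) -> int:
        return self._index_of[key]


slow = SketchKeyDataset(["a", "b", "c"], ["k1", "k2", "k1"])
fast = DictKeyDataset(["a", "b", "c"], ["k1", "k2", "k1"])
print(slow.get_item_from_key("k1"), fast.get_item_from_key("k1"))
```

Both variants resolve the duplicated key "k1" to the first matching item here, but only because this sketch chooses to; the documentation above gives no such guarantee.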
property has_duplicate_keys
Check whether each key is unique.
Return type: bool
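One way such a check could work is a single pass over the keys with a set of already-seen values. The standalone function below is a sketch under that assumption; it takes a plain iterable of keys and is not the property's actual implementation.

```python
from typing import Hashable, Iterable


def has_duplicate_keys(keys: Iterable[Hashable]) -> bool:
    """Return True as soon as any key is seen a second time."""
    seen = set()
    for key in keys:
        if key in seen:
            return True
        seen.add(key)
    return False


print(has_duplicate_keys(["drug_1", "drug_2", "drug_1"]))
print(has_duplicate_keys(["drug_1", "drug_2"]))
```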
-
property
class DatasetDelegator(*args, **kwds)
Bases: Generic[torch.utils.data.dataset.T_co]
Base class for KeyDataset attribute accesses from self.dataset.
The attributes/methods to delegate are stored to allow explicit filtering and addition to class documentation.
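The general idea of delegating through an explicit, stored list of names can be sketched with `__getattr__`. Everything here is hypothetical: the class SketchDelegator, the `delegatable` tuple and the Inner dataset are illustrative names, and pytoda's actual mechanism may differ.

```python
from typing import Any, List, Tuple


class Inner:
    """Toy wrapped dataset (hypothetical)."""

    def __init__(self) -> None:
        self.data: List[str] = ["x", "y"]

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> str:
        return self.data[index]

    def get_key(self, index: int) -> str:
        return f"key-{index}"

    def secret(self) -> str:
        return "not delegated"


class SketchDelegator:
    # Explicit allow-list: only these names fall through to self.dataset.
    # Storing the list makes it easy to filter it or to print it in docs.
    delegatable: Tuple[str, ...] = ("get_key",)

    def __init__(self, dataset: Any) -> None:
        self.dataset = dataset

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, index: int) -> Any:
        return self.dataset[index]

    def __getattr__(self, name: str) -> Any:
        # Python calls __getattr__ only after normal attribute lookup fails.
        if name in type(self).delegatable:
            return getattr(self.dataset, name)
        raise AttributeError(f"{type(self).__name__!r} has no attribute {name!r}")


wrapped = SketchDelegator(Inner())
print(wrapped.get_key(1), hasattr(wrapped, "secret"))
```

Because the allow-list is a class attribute, a subclass can narrow or extend it without touching the lookup logic.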
class TransparentConcatDataset(datasets)
Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]
Extension of ConcatDataset with transparent indexing.
get_index_pair(idx)
Get dataset and sample indexes.
Parameters: idx (int) – index in the concatenated dataset.
Returns: dataset and sample index.
Return type: Tuple[int, int]
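The mapping from a global index to a (dataset, sample) pair can be sketched with a binary search over cumulative sizes, which is the usual approach for concatenated datasets. The standalone functions below are an illustration that works on plain lists of lengths; they are assumptions made for this example and not pytoda's code.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence, Tuple


def build_cumulative_sizes(lengths: Sequence[int]) -> List[int]:
    """Running totals of the dataset lengths, e.g. [3, 2, 4] -> [3, 5, 9]."""
    return list(accumulate(lengths))


def get_index_pair(idx: int, cumulative_sizes: Sequence[int]) -> Tuple[int, int]:
    """Map an index in the concatenation to (dataset index, sample index)."""
    total = cumulative_sizes[-1]
    if not 0 <= idx < total:
        raise IndexError(f"index {idx} out of range for length {total}")
    # The first cumulative size strictly greater than idx marks the dataset.
    dataset_idx = bisect_right(cumulative_sizes, idx)
    if dataset_idx == 0:
        return 0, idx
    # Subtract the number of samples in all preceding datasets.
    return dataset_idx, idx - cumulative_sizes[dataset_idx - 1]


sizes = build_cumulative_sizes([3, 2, 4])
print(get_index_pair(4, sizes))
```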
datasets: List[torch.utils.data.dataset.Dataset[T_co]]
cumulative_sizes: List[int]
class ConcatKeyDataset(datasets)
Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]
Extension of ConcatDataset with transparent indexing, supporting KeyDataset.
The keys are expected to be unique. If there are duplicate keys, on lookup the first one found will be used by default.
__init__(datasets)
Initialize the ConcatKeyDataset.
Parameters: datasets (List[AnyBaseDataset]) – a list of datasets.
get_key_pair(index)
Get the dataset index and the key for an integer index.
Return type: Tuple[int, Hashable]
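A rough sketch of how an integer index could resolve to a (dataset index, key) pair, together with a first-match lookup in the other direction. Each dataset is represented here only by its list of keys; the function get_index and that representation are assumptions made for illustration, not pytoda's implementation.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Hashable, List, Sequence, Tuple


def get_key_pair(
    index: int, key_lists: Sequence[Sequence[Hashable]]
) -> Tuple[int, Hashable]:
    """Return (dataset index, key) for an index in the concatenation."""
    sizes: List[int] = list(accumulate(len(keys) for keys in key_lists))
    if not 0 <= index < sizes[-1]:
        raise IndexError(index)
    dataset_idx = bisect_right(sizes, index)
    offset = 0 if dataset_idx == 0 else sizes[dataset_idx - 1]
    return dataset_idx, key_lists[dataset_idx][index - offset]


def get_index(key: Hashable, key_lists: Sequence[Sequence[Hashable]]) -> int:
    """Return the global index of the first dataset entry holding the key."""
    offset = 0
    for keys in key_lists:
        if key in keys:
            return offset + list(keys).index(key)
        offset += len(keys)
    raise KeyError(key)


# "a" appears in both datasets to show the first-match behaviour.
key_lists = [["a", "b"], ["c", "a", "d"]]
print(get_key_pair(3, key_lists), get_index("a", key_lists))
```

Note that global index 3 carries the key "a", yet looking up "a" returns index 0: with duplicate keys the round trip is not guaranteed to return the index you started from.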
datasets: List[torch.utils.data.dataset.Dataset[T_co]]
cumulative_sizes: List[int]