pytoda.datasets.base_dataset module

Implementation of base classes working with datasets.

Summary

Classes:

ConcatKeyDataset

Extension of ConcatDataset with transparent indexing, supporting KeyDataset.

DatasetDelegator

Base class delegating KeyDataset attribute accesses to self.dataset.

KeyDataset

Base class for datasets with both an integer index and an item identifier key.

TransparentConcatDataset

Extension of ConcatDataset with transparent indexing.

Reference

class KeyDataset(*args, **kwds)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

Base class for datasets with both an integer index and an item identifier key.

Implicit abstract methods are __len__(self) (see https://github.com/pytorch/pytorch/blob/66a20c259b3b2063e59102ab23f3fb34fc819455/torch/utils/data/sampler.py#L23) and __getitem__(self, index: int), which is inherited.

Default implementations for indexing by key and for retrieving all keys are provided, but they should be overridden where possible, as the fallback calls to get_item and get_key can be expensive.

The keys are expected to be unique; call has_duplicate_keys to make sure. If there are duplicate keys, lookups will generally return the first match, but there are no guarantees.
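
A minimal sketch of a concrete subclass (the DictKeyDataset name and its dict backing are illustrative, not part of the library; it is assumed the base class needs no explicit initialization), overriding the key helpers as recommended above:

    from typing import Any, Hashable, Iterator

    from pytoda.datasets.base_dataset import KeyDataset

    class DictKeyDataset(KeyDataset):
        """Hypothetical KeyDataset backed by a dict mapping key -> datum."""

        def __init__(self, data: dict):
            self.data = data
            self._keys = list(data.keys())  # insertion order defines the integer index

        def __len__(self) -> int:
            return len(self._keys)

        def __getitem__(self, index: int) -> Any:
            return self.data[self._keys[index]]

        # Overload the defaults to avoid linear scans over the dataset.
        def get_key(self, index: int) -> Hashable:
            return self._keys[index]

        def get_index(self, key: Hashable) -> int:
            return self._keys.index(key)

        def keys(self) -> Iterator:
            return iter(self._keys)

    ds = DictKeyDataset({'a': 1.0, 'b': 2.0})
    assert not ds.has_duplicate_keys
    assert ds.get_item_from_key('b') == 2.0  # default lookup: get_index then __getitem__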

get_key(index)[source]

Get key from integer index.

Return type

Hashable

get_index(key)[source]

Get index for first datum mapping to the given key.

Return type

int

get_item_from_key(key)[source]

Get item via key.

Return type

Any

keys()[source]

Default iterator of keys by iterating over dataset indexes.

Return type

Iterator

property has_duplicate_keys

Check whether each key is unique.

Return type

bool

class DatasetDelegator(*args, **kwds)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

Base class delegating KeyDataset attribute accesses to self.dataset.

The attributes/methods to delegate are stored to allow explicit filtering and addition to class documentation.

Source: https://www.fast.ai/2019/08/06/delegation/
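
A sketch of the intended usage, assuming (as the description above suggests) that attribute lookups a subclass does not define fall through to self.dataset; the ScaledDataset wrapper and its factor argument are made up for illustration:

    from pytoda.datasets.base_dataset import DatasetDelegator

    class ScaledDataset(DatasetDelegator):
        """Hypothetical wrapper that scales items and delegates the rest."""

        def __init__(self, dataset, factor: float = 2.0):
            self.dataset = dataset  # delegation target
            self.factor = factor

        def __getitem__(self, index: int):
            return self.factor * self.dataset[index]

    # `ds` is the DictKeyDataset instance from the KeyDataset sketch above.
    wrapped = ScaledDataset(ds)
    wrapped[1]                      # 4.0, from the overriding __getitem__
    wrapped.get_key(1)              # 'b', assuming get_key is delegated to ds
    wrapped.get_item_from_key('b')  # key-based lookup provided by the base class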

get_item_from_key(key)[source]

Get datum mapping to the given key.

Return type

Any

class TransparentConcatDataset(datasets)[source]

Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]

Extension of ConcatDataset with transparent indexing.

get_index_pair(idx)[source]

Get dataset and sample indexes.

Parameters

idx (int) – index in the concatenated dataset.

Returns

dataset index and sample index.

Return type

Tuple[int, int]

datasets: List[torch.utils.data.dataset.Dataset[T_co]]
cumulative_sizes: List[int]
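
A usage sketch of the transparent indexing, built on plain torch TensorDatasets (the concrete datasets and sizes are arbitrary):

    import torch
    from torch.utils.data import TensorDataset

    from pytoda.datasets.base_dataset import TransparentConcatDataset

    first = TensorDataset(torch.zeros(3, 2))
    second = TensorDataset(torch.ones(5, 2))
    concat = TransparentConcatDataset([first, second])

    len(concat)               # 8, as with a plain ConcatDataset
    concat.get_index_pair(4)  # (1, 1): dataset index 1 (`second`), sample index 1
    concat[4]                 # the same datum as second[1]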
class ConcatKeyDataset(datasets)[source]

Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]

Extension of ConcatDataset with transparent indexing, supporting KeyDataset.

The keys are expected to be unique. If there are duplicate keys, on lookup the first one found will be used by default.

__init__(datasets)[source]

Initialize the ConcatKeyDataset.

Parameters

datasets (List[AnyBaseDataset]) – a list of datasets.

get_key_pair(index)[source]

Get dataset index and key from integer index.

Return type

Tuple[int, Hashable]

get_key(index)[source]

Get key from integer index.

Return type

Hashable

get_index(key)[source]

Get index for first datum mapping to the given key.

Return type

int

get_item_from_key(key)[source]

Get datum mapping to the given key.

Return type

Any

keys()[source]

Default generator of keys by iterating over the datasets.

Return type

Iterator

datasets: List[torch.utils.data.dataset.Dataset[T_co]]
cumulative_sizes: List[int]
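
A usage sketch of the key-aware indexing; DictKeyDataset is the hypothetical KeyDataset subclass sketched further up, and the keys and values are arbitrary:

    from pytoda.datasets.base_dataset import ConcatKeyDataset

    left = DictKeyDataset({'a': 1.0, 'b': 2.0})
    right = DictKeyDataset({'c': 3.0})
    concat = ConcatKeyDataset([left, right])

    concat.get_key(2)              # 'c'
    concat.get_key_pair(2)         # (1, 'c'): dataset index and key
    concat.get_index('b')          # 1
    concat.get_item_from_key('c')  # 3.0
    list(concat.keys())            # ['a', 'b', 'c']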