pytoda.datasets.base_dataset module

Implementation of base classes working with datasets.

Summary

Classes:

ConcatKeyDataset

Extension of ConcatDataset with transparent indexing, supporting KeyDataset.

DatasetDelegator

Base class delegating KeyDataset attribute accesses to self.dataset.

KeyDataset

Base class for datasets with both an integer index and an item identifier key.

TransparentConcatDataset

Extension of ConcatDataset with transparent indexing.

Reference

class KeyDataset(*args, **kwds)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

Base class for datasets with both an integer index and an item identifier key.

Implicit abstract methods are __len__(self) (see https://github.com/pytorch/pytorch/blob/66a20c259b3b2063e59102ab23f3fb34fc819455/torch/utils/data/sampler.py#L23) and __getitem__(self, index: int), which is inherited.

Default implementations for indexing by key and for retrieving all keys are provided, but they should be overridden where possible, as the fallback calls to get_item and get_key can be expensive.

The keys are expected to be unique; call has_duplicate_keys to make sure. If there are duplicate keys, lookups will generally return the first match, but there are no guarantees.
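
A minimal sketch of a concrete subclass (the DictKeyDataset name and its dict backing are illustrative, not part of the library; it is assumed the base class needs no explicit initialization), overriding the key helpers as recommended above:

    from typing import Any, Hashable, Iterator

    from pytoda.datasets.base_dataset import KeyDataset

    class DictKeyDataset(KeyDataset):
        """Hypothetical KeyDataset backed by a dict mapping key -> datum."""

        def __init__(self, data: dict):
            self.data = data
            self._keys = list(data.keys())  # insertion order defines the integer index

        def __len__(self) -> int:
            return len(self._keys)

        def __getitem__(self, index: int) -> Any:
            return self.data[self._keys[index]]

        # Overload the defaults to avoid linear scans over the dataset.
        def get_key(self, index: int) -> Hashable:
            return self._keys[index]

        def get_index(self, key: Hashable) -> int:
            return self._keys.index(key)

        def keys(self) -> Iterator:
            return iter(self._keys)

    ds = DictKeyDataset({'a': 1.0, 'b': 2.0})
    assert not ds.has_duplicate_keys
    assert ds.get_item_from_key('b') == 2.0  # default lookup: get_index then __getitem__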

get_key(index)[source]

Get key from integer index.

Return type

Hashable

get_index(key)[source]

Get index for first datum mapping to the given key.

Return type

int

get_item_from_key(key)[source]

Get item via key.

Return type

Any

keys()[source]

Default iterator of keys by iterating over dataset indexes.

Return type

Iterator

property has_duplicate_keys

Check whether each key is unique.

Return type

bool

class DatasetDelegator(*args, **kwds)[source]

Bases: Generic[torch.utils.data.dataset.T_co]

Base class delegating KeyDataset attribute accesses to self.dataset.

The attributes/methods to delegate are stored to allow explicit filtering and addition to class documentation.

Source: https://www.fast.ai/2019/08/06/delegation/
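
A sketch of the intended usage, assuming (as the description above suggests) that attribute lookups a subclass does not define fall through to self.dataset; the ScaledDataset wrapper and its factor argument are made up for illustration:

    from pytoda.datasets.base_dataset import DatasetDelegator

    class ScaledDataset(DatasetDelegator):
        """Hypothetical wrapper that scales items and delegates the rest."""

        def __init__(self, dataset, factor: float = 2.0):
            self.dataset = dataset  # delegation target
            self.factor = factor

        def __getitem__(self, index: int):
            return self.factor * self.dataset[index]

    # `ds` is the DictKeyDataset instance from the KeyDataset sketch above.
    wrapped = ScaledDataset(ds)
    wrapped[1]                      # 4.0, from the overriding __getitem__
    wrapped.get_key(1)              # 'b', assuming get_key is delegated to ds
    wrapped.get_item_from_key('b')  # key-based lookup provided by the base class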

get_item_from_key(key)[source]

Get datum mapping to the given key.

Return type

Any

class TransparentConcatDataset(datasets)[source]

Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]

Extension of ConcatDataset with transparent indexing.

get_index_pair(idx)[source]

Get dataset and sample indexes.

Parameters

idx (int) – index in the concatenated dataset.

Returns

dataset index and sample index.

Return type

Tuple[int, int]

datasets: List[torch.utils.data.dataset.Dataset[T_co]]
cumulative_sizes: List[int]
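
A usage sketch of the transparent indexing, built on plain torch TensorDatasets (the concrete datasets and sizes are arbitrary):

    import torch
    from torch.utils.data import TensorDataset

    from pytoda.datasets.base_dataset import TransparentConcatDataset

    first = TensorDataset(torch.zeros(3, 2))
    second = TensorDataset(torch.ones(5, 2))
    concat = TransparentConcatDataset([first, second])

    len(concat)               # 8, as with a plain ConcatDataset
    concat.get_index_pair(4)  # (1, 1): dataset index 1 (`second`), sample index 1
    concat[4]                 # the same datum as second[1]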
class ConcatKeyDataset(datasets)[source]

Bases: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]

Extension of ConcatDataset with transparent indexing, supporting KeyDataset.

The keys are expected to be unique. If there are duplicate keys, on lookup the first one found will be used by default.

__init__(datasets)[source]

Initialize the ConcatKeyDataset.

Parameters

datasets (List[AnyBaseDataset]) – a list of datasets.

get_key_pair(index)[source]

Get dataset index and key from integer index.

Return type

Tuple[int, Hashable]

get_key(index)[source]

Get key from integer index.

Return type

Hashable

get_index(key)[source]

Get index for first datum mapping to the given key.

Return type

int

get_item_from_key(key)[source]

Get datum mapping to the given key.

Return type

Any

keys()[source]

Default generator of keys by iterating over the datasets.

Return type

Iterator

datasets: List[torch.utils.data.dataset.Dataset[T_co]]
cumulative_sizes: List[int]
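
A usage sketch of the key-aware indexing; DictKeyDataset is the hypothetical KeyDataset subclass sketched further up, and the keys and values are arbitrary:

    from pytoda.datasets.base_dataset import ConcatKeyDataset

    left = DictKeyDataset({'a': 1.0, 'b': 2.0})
    right = DictKeyDataset({'c': 3.0})
    concat = ConcatKeyDataset([left, right])

    concat.get_key(2)              # 'c'
    concat.get_key_pair(2)         # (1, 'c'): dataset index and key
    concat.get_index('b')          # 1
    concat.get_item_from_key('c')  # 3.0
    list(concat.keys())            # ['a', 'b', 'c']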