Extraction (`pymia.data.extraction` package)¶

Datasource (`pymia.data.extraction.datasource` module)¶

class pymia.data.extraction.datasource.PymiaDatasource(dataset_path: str, indexing_strategy: IndexingStrategy | None = None, extractor: Extractor | None = None, transform: Transform | None = None, subject_subset: list | None = None, init_reader_once: bool = True)[source]¶

Bases: object

Provides convenient and adaptable reading of the data from a created dataset.

Parameters:

dataset_path (str) – The path to the dataset to be read from.
indexing_strategy (.IndexingStrategy) – Strategy defining how the data is indexed for reading.
extractor (.Extractor) – Extractor or multiple extractors (ComposeExtractor) extracting the desired data from the dataset.
transform (.Transform) – Transformation(s) to be applied to the extracted data.
subject_subset (list) – A list of subject identifiers defining a subset of subjects to be processed.
init_reader_once (bool) – Whether the reader is initialized once or for every retrieval (default: True)

Examples

The class mainly allows two modes of operation. The first mode extracts the data by index.

>>> ds = PymiaDatasource(...)
>>> for i in range(len(ds)):
...     sample = ds[i]

The second mode of operation extracts data directly.

>>> ds = PymiaDatasource(...)
>>> # Different from ds[index] since the extractor and transform override the ones in ds
>>> sample = ds.direct_extract(extractor, index, transform=transform)

Typically, the first mode is used to loop over the entire dataset as fast as possible, extracting only the necessary information, such as data chunks (e.g., slice, patch, sub-volume). Less critical information (e.g., image shape, orientation) that is not required with every chunk of data can be extracted independently with the second mode of operation.
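
The two modes can be pictured with a small stand-in class. This is only a sketch: `ToyDatasource`, its dictionary-backed storage, and the two extractor functions are hypothetical and do not reproduce pymia's implementation. Only the method names `__len__`, `__getitem__` and `direct_extract` mirror the interface documented here.

```python
class ToyDatasource:
    """Hypothetical stand-in for PymiaDatasource (not pymia code)."""

    def __init__(self, data, indices, extractor, transform=None):
        self.data = data            # toy storage: subject_index -> list of values
        self.indices = indices      # item i -> (subject_index, index_expression)
        self.extractor = extractor
        self.transform = transform

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, item):
        # Mode 1: use the extractor and transform stored on the instance.
        subject_index, index_expr = self.indices[item]
        return self.direct_extract(self.extractor, subject_index, index_expr, self.transform)

    def direct_extract(self, extractor, subject_index, index_expr=None, transform=None):
        # Mode 2: the caller supplies the extractor and transform explicitly.
        params = {'subject_index': subject_index, 'index_expr': index_expr}
        extracted = {}
        extractor(self.data, params, extracted)
        if transform is not None:
            extracted = transform(extracted)
        return extracted


def chunk_extractor(data, params, extracted):
    # Toy extractor: one value per index expression.
    extracted['images'] = data[params['subject_index']][params['index_expr']]


def shape_extractor(data, params, extracted):
    # Toy extractor: per-subject information, no index expression needed.
    extracted['shape'] = (len(data[params['subject_index']]),)


data = {0: [10, 11, 12], 1: [20, 21]}
indices = [(s, i) for s, values in data.items() for i in range(len(values))]
ds = ToyDatasource(data, indices, chunk_extractor)

samples = [ds[i] for i in range(len(ds))]       # mode 1: loop over all chunks
shape = ds.direct_extract(shape_extractor, 1)   # mode 2: occasional extra information
```

In this sketch the per-chunk loop never touches `shape_extractor`, which is the point of keeping rarely needed information out of the first mode.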

close_reader()[source]¶: Close the reader.

direct_extract(extractor: Extractor, subject_index: int, index_expr: IndexExpression | None = None, transform: Transform | None = None)[source]¶

Extract data directly, bypassing the extractors and transforms of the instance.

This method enables the extraction of data that is not required for every data chunk (e.g., slice, patch, sub-volume) but only occasionally, such as the image shape or origin.

Parameters:

extractor (.Extractor) – Extractor or multiple extractors (ComposeExtractor) extracting the desired data from the dataset.
subject_index (int) – Index of the subject to be extracted.
index_expr (.IndexExpression) – The index expression selecting the chunk of data to extract. Not required if only image-related information (e.g., image shape, origin) is extracted. Required when extracting a chunk of data (e.g., slice, patch, sub-volume).
transform (.Transform) – Transformation(s) to be applied to the extracted data.

Returns:

Extracted data in a dictionary. Keys are defined by the used Extractor.

Return type:

dict

get_subjects()[source]¶

Get all the subjects in the dataset.

Returns:: All subject identifiers in the dataset.
Return type:: list

indices¶

A list containing all sample indices. This is a mapping from item i to tuple (subject_index, index_expression).

Type:: list
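
As an illustration of this mapping, the following sketch flattens per-subject index expressions into a single list. The function `build_indices`, the toy strategy `first_axis_slices`, and the subject names are assumptions made for this example and are not part of pymia.

```python
def build_indices(shapes, strategy, subject_subset=None):
    """Flatten per-subject index expressions into one list (toy sketch).

    shapes: subject identifier -> image shape (hypothetical input).
    strategy: callable mapping a shape to a list of index expressions.
    """
    indices = []
    for subject_index, (subject, shape) in enumerate(shapes.items()):
        if subject_subset is not None and subject not in subject_subset:
            continue
        indices.extend((subject_index, expr) for expr in strategy(shape))
    return indices


def first_axis_slices(shape):
    # Toy strategy: one integer index per position along axis 0.
    return list(range(shape[0]))


shapes = {'Subject_1': (2, 4, 4), 'Subject_2': (3, 4, 4)}
indices = build_indices(shapes, first_axis_slices)
subset = build_indices(shapes, first_axis_slices, subject_subset=['Subject_2'])
```

Item `i` of the resulting list then identifies both the subject and the chunk within that subject, so a single integer is enough to address any sample.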

set_extractor(extractor: Extractor)[source]¶

Set the extractor(s).

Parameters:: extractor (.Extractor) – Extractor or multiple extractors (ComposeExtractor) extracting the desired data from the dataset.

set_indexing_strategy(indexing_strategy: IndexingStrategy, subject_subset: list | None = None)[source]¶

Set (or modify) the indexing strategy.

Parameters:

indexing_strategy (.IndexingStrategy) – Strategy defining how the data is indexed for reading.
subject_subset (list) – A list of subject identifiers defining a subset of subjects to be processed.

set_transform(transform: Transform)[source]¶

Set the transform.

Parameters:: transform (.Transform) – Transformation(s) to be applied to the extracted data.

Extractor (`pymia.data.extraction.extractor` module)¶

class pymia.data.extraction.extractor.ComposeExtractor(extractors: list)[source]¶

Bases: Extractor

Composes many Extractor instances and behaves like a single Extractor instance.

Parameters:: extractors (list) – A list of Extractor instances.

extract(reader: Reader, params: dict, extracted: dict) → None[source]¶: see Extractor.extract()
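
The composition pattern can be sketched with toy classes. Everything prefixed with `Toy`, the dictionary used as a reader, and the key names are hypothetical; the sketch only assumes what the signature above states, namely that `extract` receives a reader, a parameter dictionary, and a dictionary to fill.

```python
from abc import ABC, abstractmethod


class ToyExtractor(ABC):
    @abstractmethod
    def extract(self, reader, params, extracted):
        """Write results into the `extracted` dictionary; return nothing."""


class ToySubjectExtractor(ToyExtractor):
    def extract(self, reader, params, extracted):
        extracted['subject'] = reader['subjects'][params['subject_index']]


class ToyShapeExtractor(ToyExtractor):
    def extract(self, reader, params, extracted):
        extracted['shape'] = reader['shapes'][params['subject_index']]


class ToyComposeExtractor(ToyExtractor):
    def __init__(self, extractors):
        self.extractors = extractors

    def extract(self, reader, params, extracted):
        # Each extractor adds its own keys to the shared dictionary.
        for extractor in self.extractors:
            extractor.extract(reader, params, extracted)


# A plain dictionary plays the role of the reader in this sketch.
reader = {'subjects': ['Subject_1', 'Subject_2'], 'shapes': [(2, 4, 4), (3, 4, 4)]}
extracted = {}
composed = ToyComposeExtractor([ToySubjectExtractor(), ToyShapeExtractor()])
composed.extract(reader, {'subject_index': 1}, extracted)
```

Because the composed object exposes the same `extract` method, it can be passed wherever a single extractor is expected.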

class pymia.data.extraction.extractor.DataExtractor(categories=('images',), ignore_indexing: bool = False)[source]¶

Bases: Extractor

Extracts data of a given category.

Adds category as key to extracted.

Parameters:

categories (tuple) – Categories for which to extract the data.
ignore_indexing (bool) – Whether to ignore the indexing in params. This is useful when extracting entire images.

extract(reader: Reader, params: dict, extracted: dict) → None[source]¶: see Extractor.extract()

class pymia.data.extraction.extractor.Extractor[source]¶

Bases: ABC

Interface unifying the extraction of data from a dataset.

abstract extract(reader: Reader, params: dict, extracted: dict) → None[source]¶

Extract data from the dataset.

Parameters:

reader (.Reader) – Reader instance that can read from dataset.
params (dict) – Extraction parameters containing information such as subject index and index expression.
extracted (dict) – The dictionary to put the extracted data in.

class pymia.data.extraction.extractor.FilesExtractor(cache: bool = True, categories=('images', 'labels'))[source]¶

Bases: Extractor

Extracts the file paths.

Added key to extracted:

pymia.data.definition.KEY_FILE_ROOT with str content
pymia.data.definition.KEY_PLACEHOLDER_FILES with str content

Parameters:

cache (bool) – Whether to cache the results. If True, the dataset is only accessed once. True is often preferred since the file name entries are typically unique in the dataset (i.e. independent of data chunks).
categories (tuple) – Categories for which to extract the file names.

extract(reader: Reader, params: dict, extracted: dict) → None[source]¶: see Extractor.extract()

class pymia.data.extraction.extractor.FilesystemDataExtractor(categories=('images',), load_fn=None, ignore_indexing: bool = False, override_file_root=None)[source]¶

Bases: Extractor

Extracts data of a given category.

Adds category as key to extracted.

Parameters:

categories (tuple) – Categories for which to extract the data.
load_fn (callable) – Callable that loads a file given the file path and the category, and returns a numpy.ndarray.
ignore_indexing (bool) – Whether to ignore the indexing in params. This is useful when extracting entire images.

extract(reader: Reader, params: dict, extracted: dict) → None[source]¶: see Extractor.extract()

class pymia.data.extraction.extractor.ImagePropertiesExtractor(do_pickle: bool = False)[source]¶

Bases: Extractor

Extracts the image properties.

Added key to extracted:

pymia.data.definition.KEY_PROPERTIES with ImageProperties content (or bytes if do_pickle)

Parameters:: do_pickle (bool) – Whether to pickle the extracted ImageProperties instance. This allows usage in a multiprocessing environment.

extract(reader: Reader, params: dict, extracted: dict) → None[source]¶: see Extractor.extract()

class pymia.data.extraction.extractor.ImagePropertyShapeExtractor(numpy_format: bool = True)[source]¶

Bases: Extractor

Extracts the shape image property of an image.

Added key to extracted:

pymia.data.definition.KEY_SHAPE with tuple content

Parameters:: numpy_format (bool) – Whether the shape is numpy or ITK format (first and last dimension are swapped).

extract(reader: Reader, params: dict, extracted: dict) → None[source]¶: see Extractor.extract()
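
The difference between the two formats can be illustrated with a short helper. `to_numpy_shape` is a hypothetical function written for this example, and the (x, y, z) ordering of the ITK-style shape is an assumption; the helper simply swaps the first and last entries as described above.

```python
def to_numpy_shape(itk_shape):
    """Swap the first and last entries of a shape (toy helper, not pymia code)."""
    shape = list(itk_shape)
    if len(shape) > 1:
        shape[0], shape[-1] = shape[-1], shape[0]
    return tuple(shape)


itk_shape = (181, 217, 160)              # assumed (x, y, z) ordering
numpy_shape = to_numpy_shape(itk_shape)  # (z, y, x) ordering for a 3-D image
```

Applying the swap twice returns the original shape, so the same helper converts in both directions.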

class pymia.data.extraction.extractor.IndexingExtractor(do_pickle: bool = False)[source]¶

Bases: Extractor

Extracts the index expression.

Added key to extracted:

pymia.data.definition.KEY_SUBJECT_INDEX with int content
pymia.data.definition.KEY_INDEX_EXPR with IndexExpression content

Parameters:: do_pickle (bool) – Whether to pickle the extracted IndexExpression instance. This is useful when used with the PyTorch DataLoader since it prevents automatic conversion to torch.Tensor.

extract(reader: Reader, params: dict, extracted: dict) → None[source]¶: see Extractor.extract()

class pymia.data.extraction.extractor.NamesExtractor(cache: bool = True, categories=('images', 'labels'))[source]¶

Bases: Extractor

Extracts the names of the entries within a category (e.g. “Flair”, “T1” for the category “images”).

Added key to extracted:

pymia.data.definition.KEY_PLACEHOLDER_NAMES with str content

Parameters:

cache (bool) – Whether to cache the results. If True, the dataset is only accessed once. True is often preferred since the name entries are typically unique in the dataset.
categories (tuple) – Categories for which to extract the names.

extract(reader: Reader, params: dict, extracted: dict) → None[source]¶: see Extractor.extract()

class pymia.data.extraction.extractor.PadDataExtractor(padding: tuple | List[tuple], extractor: Extractor, pad_fn=None)[source]¶

Bases: Extractor

Pads the data extracted by extractor.

Parameters:

padding (tuple, list) – The length of the tuple or the list must be equal to the number of dimensions of the extracted data. If tuple, the values are considered as symmetric padding in each dimension. If list, each entry must consist of a tuple indicating (left, right) padding for one dimension.
extractor (.Extractor) – The extractor performing the extraction of the data to be padded.
pad_fn (callable, optional) – Optional function performing the padding. Default is PadDataExtractor.zero_pad().

extract(reader: Reader, params: dict, extracted: dict) → None[source]¶: see Extractor.extract()
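
The two accepted forms of `padding` can be sketched as follows. `normalize_padding` and `padded_shape` are hypothetical helpers written for this example, not pymia functions; they only compute the resulting shape and do not pad any data.

```python
def normalize_padding(padding):
    """Return a list of (left, right) pairs, one per dimension."""
    if isinstance(padding, tuple):
        # Tuple: each value pads both sides of its dimension equally.
        return [(p, p) for p in padding]
    # List: each entry is already a (left, right) tuple.
    return [tuple(p) for p in padding]


def padded_shape(shape, padding):
    pairs = normalize_padding(padding)
    if len(pairs) != len(shape):
        raise ValueError('padding must have one entry per data dimension')
    return tuple(size + left + right for size, (left, right) in zip(shape, pairs))


symmetric = padded_shape((10, 10, 4), (2, 2, 0))
asymmetric = padded_shape((10, 10, 4), [(1, 3), (0, 0), (2, 2)])
```

The length check reflects the requirement that the padding specification has one entry per dimension of the extracted data.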

class pymia.data.extraction.extractor.RandomDataExtractor(selection=None, category: str = 'labels')[source]¶

Bases: Extractor

Extracts data of a given category randomly.

Adds category as key to extracted, as well as

pymia.data.definition.KEY_PLACEHOLDER_NAMES_SELECTED with selection content

Parameters:

selection (str, tuple) – Entries (e.g., “T1”, “T2”) within the category to select an entry randomly from. If selection is None, an entry from all entries is randomly selected.
category (str) – The category (e.g. “images”) to extract data from.

extract(reader: Reader, params: dict, extracted: dict) → None[source]¶: see Extractor.extract()

class pymia.data.extraction.extractor.SelectiveDataExtractor(selection=None, category: str = 'labels')[source]¶

Bases: Extractor

Extracts data of a given category selectively.

Adds category as key to extracted, as well as

pymia.data.definition.KEY_PLACEHOLDER_NAMES_SELECTED with selection content

Parameters:

selection (str, tuple) – Entries (e.g., “T1”, “T2”) within the category to select. If selection is None, the class has the same behaviour as the DataExtractor and selects all entries.
category (str) – The category (e.g. “images”) to extract data from.

extract(reader: Reader, params: dict, extracted: dict) → None[source]¶: see Extractor.extract()

class pymia.data.extraction.extractor.SubjectExtractor[source]¶

Bases: Extractor

Extracts the subject’s identification.

Added key to extracted:

pymia.data.definition.KEY_SUBJECT_INDEX with int content
pymia.data.definition.KEY_SUBJECT with str content

extract(reader: Reader, params: dict, extracted: dict) → None[source]¶: see Extractor.extract()

Indexing (`pymia.data.extraction.indexing` module)¶

class pymia.data.extraction.indexing.EmptyIndexing[source]¶

Bases: IndexingStrategy

An empty indexing strategy. This is useful when a strategy is required but entire images should be extracted.

class pymia.data.extraction.indexing.IndexingStrategy[source]¶

Bases: ABC

Interface for indexing strategies that can be applied to images.

abstract __call__(shape: tuple) → List[IndexExpression][source]¶

Calculate the indexes for a given shape.

Parameters:: shape (tuple) – The shape to determine the indexes for.
Returns:: The list of IndexExpression instances defining the indexes for an image shape.
Return type:: list

__repr__() → str[source]¶

Returns:: Representation of the strategy. Should include attributes such that it uniquely defines the strategy.
Return type:: str

class pymia.data.extraction.indexing.PatchWiseIndexing(patch_shape: tuple, ignore_incomplete=True)[source]¶

Bases: IndexingStrategy

Strategy to generate indices for patches (sub-volumes) of an image.

Parameters:

patch_shape (tuple) – The patch shape.
ignore_incomplete (bool) – Whether to ignore incomplete patches at the image boundary. Incomplete patches occur when the image shape is not evenly divisible by the patch shape.
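
The effect of `ignore_incomplete` can be illustrated with a toy patch grid. `patch_indices` is a hypothetical function, tuples of `slice` objects stand in for IndexExpression instances, and the patch shape is assumed to have one entry per image dimension. Clipping border patches to the image extent is also an assumption of this sketch; how pymia treats them when they are kept is not stated here.

```python
from itertools import product


def patch_indices(shape, patch_shape, ignore_incomplete=True):
    """Return tuples of slices covering `shape` with patches of `patch_shape`."""
    starts_per_axis = []
    for size, patch in zip(shape, patch_shape):
        starts = list(range(0, size, patch))
        if ignore_incomplete:
            # Drop start positions whose patch would extend past the border.
            starts = [s for s in starts if s + patch <= size]
        starts_per_axis.append(starts)
    return [
        tuple(slice(s, min(s + p, size)) for s, p, size in zip(start, patch_shape, shape))
        for start in product(*starts_per_axis)
    ]


# A 5 x 4 image with 2 x 2 patches leaves one incomplete row of patches.
complete = patch_indices((5, 4), (2, 2), ignore_incomplete=True)
with_border = patch_indices((5, 4), (2, 2), ignore_incomplete=False)
```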

class pymia.data.extraction.indexing.SliceIndexing(slice_axis: int | tuple = 0)[source]¶

Bases: IndexingStrategy

Strategy to generate a slice-wise indexing.

Parameters:: slice_axis (int, tuple) – The axis to be sliced. Multi-axis slicing can be achieved by providing a tuple of axes.
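
A slice-wise strategy can be sketched in a few lines. `slice_indices` is a hypothetical function and plain tuples stand in for IndexExpression instances. The sketch interprets multi-axis slicing as producing the slices along each given axis in turn, which is an assumption rather than documented behaviour.

```python
def slice_indices(shape, slice_axis=0):
    """One index expression per slice along each requested axis (toy sketch)."""
    axes = (slice_axis,) if isinstance(slice_axis, int) else tuple(slice_axis)
    indices = []
    for axis in axes:
        for position in range(shape[axis]):
            expr = [slice(None)] * len(shape)
            expr[axis] = position   # fix one coordinate, keep the other axes whole
            indices.append(tuple(expr))
    return indices


single = slice_indices((2, 3, 4), slice_axis=0)
multi = slice_indices((2, 3, 4), slice_axis=(0, 2))
```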

class pymia.data.extraction.indexing.VoxelWiseIndexing(image_dimension: int = 3)[source]¶

Bases: IndexingStrategy

Strategy to generate indices for every voxel of an image.

Parameters:: image_dimension (int) – The image dimension without the dimension of the voxel itself.

Reader (`pymia.data.extraction.reader` module)¶

class pymia.data.extraction.reader.Hdf5Reader(file_path: str, category='images')[source]¶

Bases: Reader

Represents the dataset reader for HDF5 files.

Initializes a new instance.

Parameters:

file_path (str) – The path to the dataset file.
category (str) – The category of an entry that defines the shape request.

close()[source]¶: see Reader.close()

get_shape(subject_index: int) → list[source]¶: see Reader.get_shape()

get_subject_entries() → list[source]¶: see Reader.get_subject_entries()

get_subjects() → list[source]¶: see Reader.get_subjects()

has(entry: str) → bool[source]¶: see Reader.has()

open()[source]¶: see Reader.open()

read(entry: str, index: IndexExpression | None = None)[source]¶: see Reader.read()

class pymia.data.extraction.reader.Reader(file_path: str)[source]¶

Bases: ABC

Abstract dataset reader.

Parameters:: file_path (str) – The path to the dataset file.

abstract close()[source]¶: Close the reader.

abstract get_shape(subject_index: int) → list[source]¶

Get the shape from an entry.

Parameters:: subject_index (int) – The index of the subject.
Returns:: The shape of each dimension.
Return type:: list

abstract get_subject_entries() → list[source]¶

Get the dataset entries holding the subject’s data.

Returns:: The list of subject entry strings.
Return type:: list

abstract get_subjects() → list[source]¶

Get the subject names in the dataset.

Returns:: The list of subject names.
Return type:: list

abstract has(entry: str) → bool[source]¶

Check whether a dataset entry exists.

Parameters:: entry (str) – The dataset entry.
Returns:: Whether the entry exists.
Return type:: bool

abstract open()[source]¶: Open the reader.

abstract read(entry: str, index: IndexExpression | None = None)[source]¶

Read a dataset entry.

Parameters:

entry (str) – The dataset entry.
index (expr.IndexExpression) – The slicing expression.

Returns:

The read data.
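
To make the reader interface more concrete, the sketch below follows the method names `open`, `close`, `has` and `read` with an in-memory dictionary. `ToyDictReader` and the entry name are hypothetical, and the class deliberately does not inherit from pymia's Reader, so it only illustrates the calling pattern.

```python
class ToyDictReader:
    """In-memory stand-in following the Reader method names (not pymia code)."""

    def __init__(self, entries):
        self._entries = entries
        self._opened = False

    def open(self):
        self._opened = True

    def close(self):
        self._opened = False

    def has(self, entry):
        return entry in self._entries

    def read(self, entry, index=None):
        if not self._opened:
            raise RuntimeError('reader is not open')
        data = self._entries[entry]
        # Without an index the whole entry is returned, otherwise only the chunk.
        return data if index is None else data[index]


reader = ToyDictReader({'images/Subject_1': [10, 11, 12, 13]})  # hypothetical entry name
reader.open()
whole = reader.read('images/Subject_1')
chunk = reader.read('images/Subject_1', slice(1, 3))
reader.close()
```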

pymia.data.extraction.reader.get_reader(file_path: str, direct_open: bool = False) → Reader[source]¶

Get the dataset reader corresponding to the file extension.

Parameters:

file_path (str) – The path to the dataset file.
direct_open (bool) – Whether the file should directly be opened.

Returns:

Reader corresponding to dataset file extension.

Return type:

Reader

pymia.data.extraction.reader.reader_registry = {'.h5': <class 'pymia.data.extraction.reader.Hdf5Reader'>, '.hdf5': <class 'pymia.data.extraction.reader.Hdf5Reader'>}¶: Registry defining the mapping between file extension and Reader class. Alternative readers need to be added to this registry in order to use get_reader().
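
The registry lookup can be sketched with stand-in classes. `toy_registry`, `toy_get_reader`, the `Toy*Reader` classes and the `.zarr` extension are all hypothetical and chosen for illustration; the sketch assumes that the lookup is based on the file extension, as described above, and does not reproduce pymia's code.

```python
import os


class ToyReader:
    def __init__(self, file_path):
        self.file_path = file_path
        self.is_open = False

    def open(self):
        self.is_open = True   # a real reader would open the file here


class ToyHdf5Reader(ToyReader):
    pass


class ToyZarrReader(ToyReader):
    pass


toy_registry = {'.h5': ToyHdf5Reader, '.hdf5': ToyHdf5Reader}


def toy_get_reader(file_path, direct_open=False):
    extension = os.path.splitext(file_path)[1]
    if extension not in toy_registry:
        raise ValueError('unknown dataset file extension "{}"'.format(extension))
    reader = toy_registry[extension](file_path)
    if direct_open:
        reader.open()
    return reader


toy_registry['.zarr'] = ToyZarrReader   # register an alternative reader
reader = toy_get_reader('dataset.zarr', direct_open=True)
```

Registering a class under a new extension is the only step needed before the lookup function can return it.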

Selection (`pymia.data.extraction.selection` module)¶

class pymia.data.extraction.selection.ComposeSelection(strategies)[source]¶: Bases: SelectionStrategy

class pymia.data.extraction.selection.NonBlackSelection(black_value: float = 0.0)[source]¶: Bases: SelectionStrategy

class pymia.data.extraction.selection.NonConstantSelection(loop_axis=None)[source]¶: Bases: SelectionStrategy

class pymia.data.extraction.selection.PercentileSelection(percentile: float)[source]¶: Bases: SelectionStrategy

class pymia.data.extraction.selection.SelectionStrategy[source]¶

Bases: ABC

Interface for selecting indices according to some rule.

abstract __call__(sample: dict) → bool[source]¶

Parameters:: sample (dict) – A sample extracted from PymiaDatasource.
Returns:: Whether or not the sample should be considered.
Return type:: bool

__repr__() → str[source]¶

Returns:: Representation of the strategy. Should include attributes such that it uniquely defines the strategy.
Return type:: str
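
A selection strategy is essentially a callable that maps a sample dictionary to a boolean. The sketch below shows two toy strategies and how they could be combined to filter sample indices. The class names, the `'labels'` key holding a plain list, and the rule that every strategy must accept a sample are assumptions made for this example.

```python
class ToyNonEmptyLabels:
    """Keep samples whose 'labels' entry contains a non-zero value (hypothetical rule)."""

    def __call__(self, sample):
        return any(value != 0 for value in sample['labels'])

    def __repr__(self):
        return 'ToyNonEmptyLabels()'


class ToyMinLength:
    """Keep samples with at least `min_length` label values (hypothetical rule)."""

    def __init__(self, min_length):
        self.min_length = min_length

    def __call__(self, sample):
        return len(sample['labels']) >= self.min_length

    def __repr__(self):
        # Include the attributes so that the representation identifies the strategy.
        return 'ToyMinLength(min_length={})'.format(self.min_length)


def select_indices(samples, strategies):
    # Assumption: a sample is kept only if every strategy accepts it.
    return [i for i, sample in enumerate(samples)
            if all(strategy(sample) for strategy in strategies)]


samples = [{'labels': [0, 0, 0]}, {'labels': [0, 1, 0]}, {'labels': [2]}]
selected = select_indices(samples, [ToyNonEmptyLabels(), ToyMinLength(2)])
```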

class pymia.data.extraction.selection.SubjectSelection(subjects)[source]¶

Bases: SelectionStrategy

Select subjects by their name or index.

class pymia.data.extraction.selection.WithForegroundSelection[source]¶: Bases: SelectionStrategy
