Extraction (pymia.data.extraction package)

Datasource (pymia.data.extraction.datasource module)

class pymia.data.extraction.datasource.PymiaDatasource(dataset_path: str, indexing_strategy: Optional[pymia.data.extraction.indexing.IndexingStrategy] = None, extractor: Optional[pymia.data.extraction.extractor.Extractor] = None, transform: Optional[pymia.data.transformation.Transform] = None, subject_subset: Optional[list] = None, init_reader_once: bool = True)[source]

Bases: object

Provides convenient and adaptable reading of the data from a created dataset.

Parameters
  • dataset_path (str) – The path to the dataset to be read from.

  • indexing_strategy (.IndexingStrategy) – Strategy defining how the data is indexed for reading.

  • extractor (.Extractor) – Extractor or multiple extractors (ComposeExtractor) extracting the desired data from the dataset.

  • transform (.Transform) – Transformation(s) to be applied to the extracted data.

  • subject_subset (list) – A list of subject identifiers defining a subset of subjects to be processed.

  • init_reader_once (bool) – Whether the reader is initialized once or for every retrieval (default: True)

Examples

The class mainly allows two modes of operation. The first mode extracts the data by index.

>>> ds = PymiaDatasource(...)
>>> for i in range(len(ds)):
...     sample = ds[i]

The second mode of operation extracts data directly.

>>> ds = PymiaDatasource(...)
>>> # Different from ds[index] since the extractor and transform override the ones in ds
>>> sample = ds.direct_extract(extractor, index, transform=transform)

Typically, the first mode is used to loop over the entire dataset as fast as possible, extracting just the necessary information, such as data chunks (e.g., slice, patch, sub-volume). Less critical information (e.g., image shape, orientation) that is not required with every chunk of data can be extracted independently with the second mode of operation.
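The two modes can be illustrated with a minimal, self-contained sketch. It is a stand-in written for this page and not pymia's implementation: ToyDatasource, chunk_extractor and subject_extractor are hypothetical names, and the extractors are plain functions rather than Extractor instances.

```python
# Illustrative stand-in, NOT pymia code: all names below are hypothetical.
class ToyDatasource:
    def __init__(self, subjects, chunks_per_subject, extractor):
        self.subjects = subjects
        self.extractor = extractor
        # Flat list mapping item i to (subject_index, chunk_index).
        self.indices = [(s, c)
                        for s in range(len(subjects))
                        for c in range(chunks_per_subject)]

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, item):
        # Mode 1: index-based access with the extractor stored on the instance.
        subject_index, chunk = self.indices[item]
        return self.direct_extract(self.extractor, subject_index, chunk)

    def direct_extract(self, extractor, subject_index, index_expr=None):
        # Mode 2: the caller supplies the extractor; the instance's one is bypassed.
        params = {'subject_index': subject_index,
                  'index_expr': index_expr,
                  'subjects': self.subjects}
        extracted = {}
        extractor(params, extracted)
        return extracted


def chunk_extractor(params, extracted):
    # Cheap per-chunk information, needed for every sample.
    extracted['chunk'] = (params['subject_index'], params['index_expr'])


def subject_extractor(params, extracted):
    # Per-subject information, needed only occasionally.
    extracted['subject'] = params['subjects'][params['subject_index']]


ds = ToyDatasource(['Subject_1', 'Subject_2'], chunks_per_subject=3,
                   extractor=chunk_extractor)
samples = [ds[i] for i in range(len(ds))]       # first mode
meta = ds.direct_extract(subject_extractor, 1)  # second mode
```

In this sketch the loop touches only chunk data, while the subject name is fetched once on demand.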

close_reader()[source]

Close the reader.

direct_extract(extractor: pymia.data.extraction.extractor.Extractor, subject_index: int, index_expr: Optional[pymia.data.indexexpression.IndexExpression] = None, transform: Optional[pymia.data.transformation.Transform] = None)[source]

Extract data directly, bypassing the extractors and transforms of the instance.

The purpose of this method is to enable extraction of data that is not required for every data chunk (e.g., slice, patch, sub-volume) but only occasionally (e.g., image shape, origin).

Parameters
  • extractor (.Extractor) – Extractor or multiple extractors (ComposeExtractor) extracting the desired data from the dataset.

  • subject_index (int) – Index of the subject to be extracted.

  • index_expr (.IndexExpression) – The indexing used to extract only a chunk of data. Not required if only image-related information (e.g., image shape, origin) is to be extracted; needed when a chunk of data (e.g., slice, patch, sub-volume) is desired.

  • transform (.Transform) – Transformation(s) to be applied to the extracted data.

Returns

Extracted data in a dictionary. Keys are defined by the used Extractor.

Return type

dict

get_subjects()[source]

Get all the subjects in the dataset.

Returns

All subject identifiers in the dataset.

Return type

list

indices

A list containing all sample indices. This is a mapping from item i to tuple (subject_index, index_expression).

Type

list
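How such a flat index list could be assembled is sketched below. This is an assumption-laden illustration, not pymia's code: build_indices and slice_first_axis are hypothetical helpers, and plain tuples stand in for IndexExpression instances.

```python
# Illustrative stand-in, NOT pymia code: names and data are hypothetical.
def build_indices(shapes, indexing):
    """Return a list mapping item i to (subject_index, index_expression)."""
    indices = []
    for subject_index, shape in enumerate(shapes):
        for index_expression in indexing(shape):
            indices.append((subject_index, index_expression))
    return indices


def slice_first_axis(shape):
    # One index expression per position along the first axis.
    return [(i,) for i in range(shape[0])]


# Two subjects with 2 and 3 slices, respectively.
indices = build_indices([(2, 4, 4), (3, 4, 4)], slice_first_axis)
```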

set_extractor(extractor: pymia.data.extraction.extractor.Extractor)[source]

Set the extractor(s).

Parameters

extractor (.Extractor) – Extractor or multiple extractors (ComposeExtractor) extracting the desired data from the dataset.

set_indexing_strategy(indexing_strategy: pymia.data.extraction.indexing.IndexingStrategy, subject_subset: Optional[list] = None)[source]

Set (or modify) the indexing strategy.

Parameters
  • indexing_strategy (.IndexingStrategy) – Strategy defining how the data is indexed for reading.

  • subject_subset (list) – A list of subject identifiers defining a subset of subjects to be processed.

set_transform(transform: pymia.data.transformation.Transform)[source]

Set the transform.

Parameters

transform (.Transform) – Transformation(s) to be applied to the extracted data.

Extractor (pymia.data.extraction.extractor module)

class pymia.data.extraction.extractor.ComposeExtractor(extractors: list)[source]

Bases: pymia.data.extraction.extractor.Extractor

Composes many Extractor instances and behaves like a single Extractor instance.

Parameters

extractors (list) – A list of Extractor instances.

extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) → None[source]

see Extractor.extract()
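The composition pattern can be sketched as follows. The classes are simplified stand-ins defined for this example (not imported from pymia), and a plain dictionary plays the role of the reader.

```python
# Illustrative stand-in, NOT pymia code: all classes here are hypothetical.
from abc import ABC, abstractmethod


class ToyExtractor(ABC):
    @abstractmethod
    def extract(self, reader, params, extracted):
        """Write extracted data into the shared `extracted` dictionary."""


class ToySubjectExtractor(ToyExtractor):
    def extract(self, reader, params, extracted):
        extracted['subject'] = reader['subjects'][params['subject_index']]


class ToyShapeExtractor(ToyExtractor):
    def extract(self, reader, params, extracted):
        extracted['shape'] = reader['shapes'][params['subject_index']]


class ToyCompose(ToyExtractor):
    def __init__(self, extractors):
        self.extractors = extractors

    def extract(self, reader, params, extracted):
        # Every extractor adds its own keys to the same dictionary.
        for extractor in self.extractors:
            extractor.extract(reader, params, extracted)


reader = {'subjects': ['Subject_1', 'Subject_2'],
          'shapes': [(10, 64, 64), (12, 64, 64)]}
extracted = {}
ToyCompose([ToySubjectExtractor(), ToyShapeExtractor()]).extract(
    reader, {'subject_index': 1}, extracted)
```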

class pymia.data.extraction.extractor.DataExtractor(categories=('images',), ignore_indexing: bool = False)[source]

Bases: pymia.data.extraction.extractor.Extractor

Extracts data of a given category.

Adds category as key to extracted.

Parameters
  • categories (tuple) – Categories for which to extract the data.

  • ignore_indexing (bool) – Whether to ignore the indexing in params. This is useful when extracting entire images.

extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) → None[source]

see Extractor.extract()

class pymia.data.extraction.extractor.Extractor[source]

Bases: abc.ABC

Interface unifying the extraction of data from a dataset.

abstract extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) → None[source]

Extract data from the dataset.

Parameters
  • reader (.Reader) – Reader instance that can read from dataset.

  • params (dict) – Extraction parameters containing information such as subject index and index expression.

  • extracted (dict) – The dictionary to put the extracted data in.

class pymia.data.extraction.extractor.FilesExtractor(cache: bool = True, categories=('images', 'labels'))[source]

Bases: pymia.data.extraction.extractor.Extractor

Extracts the file paths.

Added key to extracted:

Parameters
  • cache (bool) – Whether to cache the results. If True, the dataset is only accessed once. True is often preferred since the file name entries are typically unique in the dataset (i.e. independent of data chunks).

  • categories (tuple) – Categories for which to extract the file names.

extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) → None[source]

see Extractor.extract()
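The effect of the cache flag can be illustrated with a small sketch. Everything here is hypothetical: CountingReader and CachedFilesExtractor are toy classes, the file paths are made up, and the '<category>_files' key name is an assumption chosen for the example.

```python
# Illustrative stand-in, NOT pymia code: names, paths, and keys are hypothetical.
class CountingReader:
    def __init__(self, files):
        self.files = files  # {(subject_index, category): tuple of paths}
        self.reads = 0

    def read(self, subject_index, category):
        self.reads += 1  # count every access to the "dataset"
        return self.files[(subject_index, category)]


class CachedFilesExtractor:
    def __init__(self, cache=True, categories=('images', 'labels')):
        self.cache = cache
        self.categories = categories
        self._cached = {}

    def extract(self, reader, params, extracted):
        subject_index = params['subject_index']
        for category in self.categories:
            key = (subject_index, category)
            if self.cache and key in self._cached:
                files = self._cached[key]  # no dataset access needed
            else:
                files = reader.read(subject_index, category)
                if self.cache:
                    self._cached[key] = files
            extracted[category + '_files'] = files


files = {(0, 'images'): ('s0/T1.mha', 's0/T2.mha'),
         (0, 'labels'): ('s0/GT.mha',)}
cached_reader, uncached_reader = CountingReader(files), CountingReader(files)
cached = CachedFilesExtractor(cache=True)
uncached = CachedFilesExtractor(cache=False)

# Three chunks of the same subject: file paths do not depend on the chunk.
for _ in range(3):
    out_cached, out_uncached = {}, {}
    cached.extract(cached_reader, {'subject_index': 0}, out_cached)
    uncached.extract(uncached_reader, {'subject_index': 0}, out_uncached)
```

With caching, the toy reader is accessed once per category; without it, once per category and chunk.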

class pymia.data.extraction.extractor.FilesystemDataExtractor(categories=('images',), load_fn=None, ignore_indexing: bool = False, override_file_root=None)[source]

Bases: pymia.data.extraction.extractor.Extractor

Extracts data of a given category.

Adds category as key to extracted.

Parameters
  • categories (tuple) – Categories for which to extract the data.

  • load_fn (callable) – Callable that loads a file given the file path and the category, and returns a numpy.ndarray.

  • ignore_indexing (bool) – Whether to ignore the indexing in params. This is useful when extracting entire images.

extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) → None[source]

see Extractor.extract()

class pymia.data.extraction.extractor.ImagePropertiesExtractor(do_pickle: bool = False)[source]

Bases: pymia.data.extraction.extractor.Extractor

Extracts the image properties.

Added key to extracted:

Parameters

do_pickle (bool) – Whether to pickle the extracted ImageProperties instance. This allows usage in a multiprocessing environment.

extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) → None[source]

see Extractor.extract()

class pymia.data.extraction.extractor.ImagePropertyShapeExtractor(numpy_format: bool = True)[source]

Bases: pymia.data.extraction.extractor.Extractor

Extracts the shape image property of an image.

Added key to extracted:

Parameters

numpy_format (bool) – Whether the shape is numpy or ITK format (first and last dimension are swapped).

extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) → None[source]

see Extractor.extract()
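The relation between the two shape formats, as described for numpy_format, can be sketched with a small helper. swap_first_last is a hypothetical function written for this example; it only mirrors the documented "first and last dimension are swapped" rule.

```python
# Illustrative helper, NOT part of pymia: the name is hypothetical.
def swap_first_last(shape):
    """Swap the first and last entry of a shape, e.g. (x, y, z) <-> (z, y, x)."""
    shape = list(shape)
    if len(shape) > 1:
        shape[0], shape[-1] = shape[-1], shape[0]
    return tuple(shape)


itk_shape = (256, 192, 160)                # (x, y, z)
numpy_shape = swap_first_last(itk_shape)   # (z, y, x)
```

Applying the helper twice returns the original shape, so the same function converts in both directions.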

class pymia.data.extraction.extractor.IndexingExtractor(do_pickle: bool = False)[source]

Bases: pymia.data.extraction.extractor.Extractor

Extracts the index expression.

Added key to extracted:

Parameters

do_pickle (bool) – Whether to pickle the extracted IndexExpression instance. This is useful when used with the PyTorch DataLoader since it prevents automatic conversion to torch.Tensor.

extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) → None[source]

see Extractor.extract()
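What pickling does to an index can be shown with the standard library alone. In this sketch a tuple of slices stands in for an index expression (an assumption for illustration); the pickled form is an opaque bytes object that can be restored unchanged later.

```python
# Illustrative sketch using only the standard library; the tuple of slices
# is a stand-in for an index expression, not a pymia object.
import pickle

index_expression = (slice(0, 32), slice(16, 48), 5)

pickled = pickle.dumps(index_expression)  # opaque bytes
restored = pickle.loads(pickled)          # identical index after unpickling
```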

class pymia.data.extraction.extractor.NamesExtractor(cache: bool = True, categories=('images', 'labels'))[source]

Bases: pymia.data.extraction.extractor.Extractor

Extracts the names of the entries within a category (e.g. “Flair”, “T1” for the category “images”).

Added key to extracted:

Parameters
  • cache (bool) – Whether to cache the results. If True, the dataset is only accessed once. True is often preferred since the name entries are typically unique in the dataset.

  • categories (tuple) – Categories for which to extract the names.

extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) → None[source]

see Extractor.extract()

class pymia.data.extraction.extractor.PadDataExtractor(padding: Union[tuple, List[tuple]], extractor: pymia.data.extraction.extractor.Extractor, pad_fn=None)[source]

Bases: pymia.data.extraction.extractor.Extractor

Pads the data extracted by extractor.

Parameters
  • padding (tuple, list) – The length of the tuple or list must equal the number of dimensions of the extracted data. If a tuple, the values are treated as symmetric padding in each dimension. If a list, each entry must be a tuple indicating the (left, right) padding for one dimension.

  • extractor (.Extractor) – The extractor performing the extraction of the data to be padded.

  • pad_fn (callable, optional) – Optional function performing the padding. Default is PadDataExtractor.zero_pad().

extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) → None[source]

see Extractor.extract()
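The two accepted forms of padding can be illustrated with a stdlib-only sketch. It only demonstrates how a tuple (symmetric) and a list of (left, right) pairs could be interpreted, using nested lists as data; normalize_padding and zero_pad are hypothetical helpers and do not reproduce how PadDataExtractor works internally.

```python
# Illustrative stand-in, NOT pymia code: helpers are hypothetical and
# operate on non-empty nested lists instead of numpy arrays.
def normalize_padding(padding):
    """Turn symmetric (tuple) or explicit (list) padding into (left, right) pairs."""
    if isinstance(padding, tuple):
        return [(p, p) for p in padding]
    return [tuple(p) for p in padding]


def shape_of(data):
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0]
    return shape


def zeros(shape):
    if len(shape) == 1:
        return [0] * shape[0]
    return [zeros(shape[1:]) for _ in range(shape[0])]


def zero_pad(data, pairs):
    """Zero-pad a nested list with one (left, right) pair per dimension."""
    (left, right), rest = pairs[0], pairs[1:]
    if not rest:
        return [0] * left + list(data) + [0] * right
    inner = [zero_pad(row, rest) for row in data]  # pad trailing dimensions first
    filler = shape_of(inner[0])
    return ([zeros(filler) for _ in range(left)] + inner +
            [zeros(filler) for _ in range(right)])


data = [[1, 2], [3, 4]]
symmetric = zero_pad(data, normalize_padding((1, 1)))
asymmetric = zero_pad(data, normalize_padding([(0, 1), (2, 0)]))
```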

class pymia.data.extraction.extractor.RandomDataExtractor(selection=None, category: str = 'labels')[source]

Bases: pymia.data.extraction.extractor.Extractor

Extracts data of a given category randomly.

Adds category as key to extracted.

Parameters
  • selection (str, tuple) – Entries (e.g., “T1”, “T2”) within the category to select an entry randomly from. If selection is None, an entry from all entries is randomly selected.

  • category (str) – The category (e.g. “images”) to extract data from.

extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) → None[source]

see Extractor.extract()
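The selection rule described above can be sketched as a small function. pick_entry is a hypothetical helper, not a pymia function; it assumes the names of all entries in the category are known and picks one of the permitted entries at random.

```python
# Illustrative stand-in, NOT pymia code: the helper name is hypothetical.
import random


def pick_entry(names, selection=None, rng=random):
    """Randomly pick one entry name; `selection` restricts the candidates."""
    if selection is None:
        candidates = list(names)      # choose among all entries
    elif isinstance(selection, str):
        candidates = [selection]      # a single permitted entry
    else:
        candidates = list(selection)  # a tuple of permitted entries
    missing = [c for c in candidates if c not in names]
    if missing:
        raise ValueError('entries not in category: {}'.format(missing))
    return rng.choice(candidates)


names = ('T1', 'T2', 'Flair')
rng = random.Random(42)  # seeded generator for reproducibility
picked_any = pick_entry(names, rng=rng)
picked_subset = pick_entry(names, selection=('T1', 'T2'), rng=rng)
picked_single = pick_entry(names, selection='Flair', rng=rng)
```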

class pymia.data.extraction.extractor.SelectiveDataExtractor(selection=None, category: str = 'labels')[source]

Bases: pymia.data.extraction.extractor.Extractor

Extracts data of a given category selectively.

Adds category as key to extracted, as well as

Parameters
  • selection (str, tuple) – Entries (e.g., “T1”, “T2”) within the category to select. If selection is None, the class has the same behaviour as the DataExtractor and selects all entries.

  • category (str) – The category (e.g. “images”) to extract data from.

extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) → None[source]

see Extractor.extract()

class pymia.data.extraction.extractor.SubjectExtractor[source]

Bases: pymia.data.extraction.extractor.Extractor

Extracts the subject’s identification.

Added key to extracted:

extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) → None[source]

see Extractor.extract()

Indexing (pymia.data.extraction.indexing module)

class pymia.data.extraction.indexing.EmptyIndexing[source]

Bases: pymia.data.extraction.indexing.IndexingStrategy

An empty indexing strategy. This is useful when a strategy is required but entire images should be extracted.

class pymia.data.extraction.indexing.IndexingStrategy[source]

Bases: abc.ABC

Interface for indexing strategies that can be applied to images.

abstract __call__(shape: tuple) → List[pymia.data.indexexpression.IndexExpression][source]

Calculate the indexes for a given shape.

Parameters

shape (tuple) – The shape to determine the indexes for.

Returns

The list of IndexExpression instances defining the indexes for an image shape.

Return type

list

__repr__() → str[source]

Returns

Representation of the strategy. Should include attributes such that it uniquely defines the strategy.

Return type

str

class pymia.data.extraction.indexing.PatchWiseIndexing(patch_shape: tuple, ignore_incomplete=True)[source]

Bases: pymia.data.extraction.indexing.IndexingStrategy

Strategy to generate indices for patches (sub-volumes) of an image.

Parameters
  • patch_shape (tuple) – The patch shape.

  • ignore_incomplete (bool) – Whether to ignore incomplete patches at the image boundary, which occur when the image shape is not evenly divisible by the patch shape.
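A minimal sketch of non-overlapping patch indexing is given below. patch_indices is a hypothetical function, tuples of slices stand in for IndexExpression instances, and clipping incomplete patches to the image boundary is an assumption made for this example.

```python
# Illustrative stand-in, NOT pymia code: the function name is hypothetical
# and the clipping of incomplete patches is an assumption.
from itertools import product


def patch_indices(image_shape, patch_shape, ignore_incomplete=True):
    """Return one tuple of slices per non-overlapping patch."""
    starts_per_axis = []
    for size, patch in zip(image_shape, patch_shape):
        # With ignore_incomplete, only starts that leave room for a full patch.
        stop = size - patch + 1 if ignore_incomplete else size
        starts_per_axis.append(range(0, max(stop, 0), patch))
    return [tuple(slice(start, min(start + patch, size))
                  for start, patch, size in zip(starts, patch_shape, image_shape))
            for starts in product(*starts_per_axis)]


complete_only = patch_indices((5, 4), (2, 2), ignore_incomplete=True)
with_incomplete = patch_indices((5, 4), (2, 2), ignore_incomplete=False)
```

For a 5 x 4 image and 2 x 2 patches, the first axis is not evenly divisible, so two additional boundary patches appear when incomplete patches are kept.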

class pymia.data.extraction.indexing.SliceIndexing(slice_axis: Union[int, tuple] = 0)[source]

Bases: pymia.data.extraction.indexing.IndexingStrategy

Strategy to generate a slice-wise indexing.

Parameters

slice_axis (int, tuple) – The axis to be sliced. Multi-axis slicing can be achieved by providing a tuple of axes.
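Slice-wise indexing can be sketched as follows. slice_indices is a hypothetical function, and the sketch assumes that a tuple of axes means the image is sliced along each given axis in turn; tuples of integers and slices stand in for IndexExpression instances.

```python
# Illustrative stand-in, NOT pymia code: the function name is hypothetical and
# the handling of multiple axes (one axis after the other) is an assumption.
def slice_indices(shape, slice_axis=0):
    """Return one index per slice along each requested axis."""
    axes = (slice_axis,) if isinstance(slice_axis, int) else tuple(slice_axis)
    indices = []
    for axis in axes:
        for position in range(shape[axis]):
            index = [slice(None)] * len(shape)  # take everything ...
            index[axis] = position              # ... except along the sliced axis
            indices.append(tuple(index))
    return indices


single_axis = slice_indices((3, 4, 5))                   # 3 slices along axis 0
multi_axis = slice_indices((3, 4, 5), slice_axis=(0, 2)) # 3 + 5 slices
```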

class pymia.data.extraction.indexing.VoxelWiseIndexing(image_dimension: int = 3)[source]

Bases: pymia.data.extraction.indexing.IndexingStrategy

Strategy to generate indices for every voxel of an image.

Parameters

image_dimension (int) – The image dimension without the dimension of the voxels themselves.
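Voxel-wise indexing can be sketched with itertools. voxel_indices is a hypothetical function; it assumes that any dimensions beyond image_dimension (e.g., a trailing channel dimension) belong to the voxel itself and are therefore not indexed.

```python
# Illustrative stand-in, NOT pymia code: the function name is hypothetical.
from itertools import product


def voxel_indices(shape, image_dimension=3):
    """Return one index tuple per voxel over the first `image_dimension` axes."""
    return list(product(*(range(n) for n in shape[:image_dimension])))


# A 2 x 3 x 4 image with 5 values per voxel: the last axis is not indexed.
indices_3d = voxel_indices((2, 3, 4, 5))
indices_2d = voxel_indices((2, 2), image_dimension=2)
```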

Reader (pymia.data.extraction.reader module)

class pymia.data.extraction.reader.Hdf5Reader(file_path: str, category='images')[source]

Bases: pymia.data.extraction.reader.Reader

Represents the dataset reader for HDF5 files.

Initializes a new instance.

Parameters
  • file_path (str) – The path to the dataset file.

  • category (str) – The category of an entry that defines the shape request.

close()[source]

see Reader.close()

get_shape(subject_index: int) → list[source]

see Reader.get_shape()

get_subject_entries() → list[source]

see Reader.get_subject_entries()

get_subjects() → list[source]

see Reader.get_subjects()

has(entry: str) → bool[source]

see Reader.has()

open()[source]

see Reader.open()

read(entry: str, index: Optional[pymia.data.indexexpression.IndexExpression] = None)[source]

see Reader.read()

class pymia.data.extraction.reader.Reader(file_path: str)[source]

Bases: abc.ABC

Abstract dataset reader.

Parameters

file_path (str) – The path to the dataset file.

abstract close()[source]

Close the reader.

abstract get_shape(subject_index: int) → list[source]

Get the shape from an entry.

Parameters

subject_index (int) – The index of the subject.

Returns

The shape of each dimension.

Return type

list

abstract get_subject_entries() → list[source]

Get the dataset entries holding the subject’s data.

Returns

The list of subject entry strings.

Return type

list

abstract get_subjects() → list[source]

Get the subject names in the dataset.

Returns

The list of subject names.

Return type

list

abstract has(entry: str) → bool[source]

Check whether a dataset entry exists.

Parameters

entry (str) – The dataset entry.

Returns

Whether the entry exists.

Return type

bool

abstract open()[source]

Open the reader.

abstract read(entry: str, index: Optional[pymia.data.indexexpression.IndexExpression] = None)[source]

Read a dataset entry.

Parameters
  • entry (str) – The dataset entry.

  • index (expr.IndexExpression) – The slicing expression.

Returns

The read data.

pymia.data.extraction.reader.get_reader(file_path: str, direct_open: bool = False) → pymia.data.extraction.reader.Reader[source]

Get the dataset reader corresponding to the file extension.

Parameters
  • file_path (str) – The path to the dataset file.

  • direct_open (bool) – Whether the file should directly be opened.

Returns

Reader corresponding to dataset file extension.

Return type

Reader

pymia.data.extraction.reader.reader_registry = {'.h5': <class 'pymia.data.extraction.reader.Hdf5Reader'>, '.hdf5': <class 'pymia.data.extraction.reader.Hdf5Reader'>}

Registry defining the mapping between file extension and Reader class. Alternative readers need to be added to this registry in order to use get_reader().
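The registry lookup can be sketched with toy classes. None of the names below come from pymia: ToyReader, ToyHdf5Reader, ToyJsonReader, toy_registry and toy_get_reader are hypothetical stand-ins that only mimic an extension-to-class mapping, and the '.json' entry is invented for the example.

```python
# Illustrative stand-in, NOT pymia code: all names here are hypothetical.
import os


class ToyReader:
    def __init__(self, file_path):
        self.file_path = file_path
        self.is_open = False

    def open(self):
        self.is_open = True


class ToyHdf5Reader(ToyReader):
    pass


class ToyJsonReader(ToyReader):
    pass


toy_registry = {'.h5': ToyHdf5Reader, '.hdf5': ToyHdf5Reader}


def toy_get_reader(file_path, direct_open=False):
    """Look up the reader class by file extension and instantiate it."""
    extension = os.path.splitext(file_path)[1]
    if extension not in toy_registry:
        raise ValueError('unknown dataset file extension "{}"'.format(extension))
    reader = toy_registry[extension](file_path)
    if direct_open:
        reader.open()
    return reader


# An alternative reader becomes available once it is registered.
toy_registry['.json'] = ToyJsonReader
hdf5_reader = toy_get_reader('dataset.h5')
json_reader = toy_get_reader('dataset.json', direct_open=True)
```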

Selection (pymia.data.extraction.selection module)

class pymia.data.extraction.selection.ComposeSelection(strategies)[source]

Bases: pymia.data.extraction.selection.SelectionStrategy

class pymia.data.extraction.selection.NonBlackSelection(black_value: float = 0.0)[source]

Bases: pymia.data.extraction.selection.SelectionStrategy

class pymia.data.extraction.selection.NonConstantSelection(loop_axis=None)[source]

Bases: pymia.data.extraction.selection.SelectionStrategy

class pymia.data.extraction.selection.PercentileSelection(percentile: float)[source]

Bases: pymia.data.extraction.selection.SelectionStrategy

class pymia.data.extraction.selection.SelectionStrategy[source]

Bases: abc.ABC

Interface for selecting indices according to some rule.

abstract __call__(sample: dict) → bool[source]

Parameters

sample (dict) – A sample extracted from PymiaDatasource.

Returns

Whether or not the sample should be considered.

Return type

bool

__repr__() → str[source]

Returns

Representation of the strategy. Should include attributes such that it uniquely defines the strategy.

Return type

str
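A custom strategy following this interface might look like the sketch below. The classes are stand-ins defined for this example rather than pymia classes, the sample is assumed to hold a flat list of intensities under the 'images' key, and the representation includes the parameter so that differently configured strategies can be told apart.

```python
# Illustrative stand-in, NOT pymia code: classes and sample layout are hypothetical.
from abc import ABC, abstractmethod


class ToySelectionStrategy(ABC):
    @abstractmethod
    def __call__(self, sample):
        """Return True if the sample should be considered."""

    def __repr__(self):
        return self.__class__.__name__


class ToyNonBlackSelection(ToySelectionStrategy):
    def __init__(self, black_value=0.0):
        self.black_value = black_value

    def __call__(self, sample):
        # Keep the sample if any intensity exceeds the black value.
        return any(value > self.black_value for value in sample['images'])

    def __repr__(self):
        # Include the parameter so the representation identifies the strategy.
        return '{} ({})'.format(self.__class__.__name__, self.black_value)


strategy = ToyNonBlackSelection()
samples = [{'images': [0.0, 0.0, 0.0]}, {'images': [0.0, 0.3, 0.0]}]
selected = [i for i, sample in enumerate(samples) if strategy(sample)]
```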

class pymia.data.extraction.selection.SubjectSelection(subjects)[source]

Bases: pymia.data.extraction.selection.SelectionStrategy

Select subjects by their name or index.

class pymia.data.extraction.selection.WithForegroundSelection[source]

Bases: pymia.data.extraction.selection.SelectionStrategy