Extraction (pymia.data.extraction package)
Datasource (pymia.data.extraction.datasource module)
- class pymia.data.extraction.datasource.PymiaDatasource(dataset_path: str, indexing_strategy: IndexingStrategy | None = None, extractor: Extractor | None = None, transform: Transform | None = None, subject_subset: list | None = None, init_reader_once: bool = True)
Bases: object
Provides convenient and adaptable reading of the data from a created dataset.
- Parameters:
dataset_path (str) – The path to the dataset to be read from.
indexing_strategy (.IndexingStrategy) – Strategy defining how the data is indexed for reading.
extractor (.Extractor) – Extractor or multiple extractors (ComposeExtractor) extracting the desired data from the dataset.
transform (.Transform) – Transformation(s) to be applied to the extracted data.
subject_subset (list) – A list of subject identifiers defining a subset of subjects to be processed.
init_reader_once (bool) – Whether the reader is initialized once or for every retrieval (default: True).
Examples
The class mainly allows two modes of operation. The first mode extracts the data by index.
>>> ds = PymiaDatasource(...)
>>> for i in range(len(ds)):
...     sample = ds[i]
The second mode of operation extracts data directly.
>>> ds = PymiaDatasource(...)
>>> # Different from ds[index] since the extractor and transform override the ones in ds
>>> sample = ds.direct_extract(extractor, index, transform=transform)
Typically, the first mode is used to loop over the entire dataset as fast as possible, extracting just the necessary information, such as data chunks (e.g., slice, patch, sub-volume). Less critical information (e.g., image shape, orientation) that is not required with every chunk of data can be extracted independently with the second mode of operation.
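The two modes can be illustrated without a real dataset. The following is a minimal sketch, not pymia's implementation: ToyDatasource, chunk_extractor and count_extractor are hypothetical names, and the in-memory dictionary stands in for the dataset file. It only shows how indexed access uses the extractor stored on the instance, while direct extraction takes an extractor supplied by the caller.

```python
class ToyDatasource:
    """Hypothetical stand-in for illustration; not the pymia class."""

    def __init__(self, data, indices, extractor):
        self.data = data            # subject_index -> list of data chunks
        self.indices = indices      # item i -> (subject_index, index_expr)
        self.extractor = extractor  # extractor used by ds[i]

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, item):
        # Mode 1: look up the (subject_index, index_expr) pair of the item
        # and extract with the extractor stored on the instance.
        subject_index, index_expr = self.indices[item]
        return self.direct_extract(self.extractor, subject_index, index_expr)

    def direct_extract(self, extractor, subject_index, index_expr=None):
        # Mode 2: the caller provides the extractor explicitly.
        params = {'subject_index': subject_index, 'index_expr': index_expr}
        extracted = {}
        extractor(self.data, params, extracted)
        return extracted


def chunk_extractor(data, params, extracted):
    # Extracts one chunk of a subject.
    extracted['images'] = data[params['subject_index']][params['index_expr']]


def count_extractor(data, params, extracted):
    # Extracts per-subject information that is not needed for every chunk.
    extracted['n_chunks'] = len(data[params['subject_index']])


data = {0: [[1, 2], [3, 4]], 1: [[5, 6]]}
indices = [(0, 0), (0, 1), (1, 0)]
ds = ToyDatasource(data, indices, chunk_extractor)

samples = [ds[i] for i in range(len(ds))]        # first mode
info = ds.direct_extract(count_extractor, 0)     # second mode
```

In this sketch, samples holds one dictionary per chunk, while info is retrieved once per subject.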
- direct_extract(extractor: Extractor, subject_index: int, index_expr: IndexExpression | None = None, transform: Transform | None = None)
Extract data directly, bypassing the extractors and transforms of the instance.
The purpose of this method is to enable extraction of data that is not required for every data chunk (e.g., slice, patch, sub-volume) but only occasionally (e.g., image shape, origin).
- Parameters:
extractor (.Extractor) – Extractor or multiple extractors (ComposeExtractor) extracting the desired data from the dataset.
subject_index (int) – Index of the subject to be extracted.
index_expr (.IndexExpression) – The index expression selecting a chunk of data (e.g., slice, patch, sub-volume). Not required if only image-related information (e.g., image shape, origin) is to be extracted.
transform (.Transform) – Transformation(s) to be applied to the extracted data.
- Returns:
Extracted data in a dictionary. Keys are defined by the used Extractor.
- Return type:
dict
- get_subjects()
Get all the subjects in the dataset.
- Returns:
All subject identifiers in the dataset.
- Return type:
list
- indices
A list containing all sample indices. This is a mapping from item i to tuple (subject_index, index_expression).
- Type:
list
- set_extractor(extractor: Extractor)
Set the extractor(s).
- Parameters:
extractor (.Extractor) – Extractor or multiple extractors (ComposeExtractor) extracting the desired data from the dataset.
- set_indexing_strategy(indexing_strategy: IndexingStrategy, subject_subset: list | None = None)
Set (or modify) the indexing strategy.
- Parameters:
indexing_strategy (.IndexingStrategy) – Strategy defining how the data is indexed for reading.
subject_subset (list) – A list of subject identifiers defining a subset of subjects to be processed.
Extractor (pymia.data.extraction.extractor module)
- class pymia.data.extraction.extractor.ComposeExtractor(extractors: list)
Bases: Extractor
Composes many Extractor instances and behaves like a single Extractor instance.
- Parameters:
extractors (list) – A list of Extractor instances.
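The composition idea can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the Sketch classes are hypothetical, the reader is replaced by a dictionary, and only the extract(reader, params, extracted) calling convention is taken from the Extractor interface documented on this page.

```python
from abc import ABC, abstractmethod


class SketchExtractor(ABC):
    """Hypothetical interface mirroring extract(reader, params, extracted)."""

    @abstractmethod
    def extract(self, reader, params, extracted):
        ...


class SubjectNameSketch(SketchExtractor):
    def extract(self, reader, params, extracted):
        extracted['subject'] = reader['subjects'][params['subject_index']]


class ShapeSketch(SketchExtractor):
    def extract(self, reader, params, extracted):
        extracted['shape'] = reader['shapes'][params['subject_index']]


class ComposeSketch(SketchExtractor):
    """Runs several extractors in order on the same output dictionary."""

    def __init__(self, extractors):
        self.extractors = extractors

    def extract(self, reader, params, extracted):
        for extractor in self.extractors:
            extractor.extract(reader, params, extracted)


# A dictionary stands in for a dataset reader (assumption for this sketch).
reader = {'subjects': ['Subject_1', 'Subject_2'],
          'shapes': [(10, 20, 30), (5, 6, 7)]}

extracted = {}
composed = ComposeSketch([SubjectNameSketch(), ShapeSketch()])
composed.extract(reader, {'subject_index': 1}, extracted)
```

Because every extractor writes into the same dictionary, the composed object can be passed wherever a single extractor is expected.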
- class pymia.data.extraction.extractor.DataExtractor(categories=('images',), ignore_indexing: bool = False)
Bases: Extractor
Extracts data of a given category.
Adds category as key to extracted.
- Parameters:
categories (tuple) – Categories for which to extract the data.
ignore_indexing (bool) – Whether to ignore the indexing in params. This is useful when extracting entire images.
- class pymia.data.extraction.extractor.Extractor
Bases: ABC
Interface unifying the extraction of data from a dataset.
- abstract extract(reader: Reader, params: dict, extracted: dict) -> None
Extract data from the dataset.
- Parameters:
reader (.Reader) – Reader instance that can read from the dataset.
params (dict) – Extraction parameters containing information such as subject index and index expression.
extracted (dict) – The dictionary to put the extracted data in.
- class pymia.data.extraction.extractor.FilesExtractor(cache: bool = True, categories=('images', 'labels'))
Bases: Extractor
Extracts the file paths.
Added keys to extracted:
pymia.data.definition.KEY_FILE_ROOT with str content
pymia.data.definition.KEY_PLACEHOLDER_FILES with str content
- Parameters:
cache (bool) – Whether to cache the results. If True, the dataset is only accessed once. True is often preferred since the file name entries are typically unique in the dataset (i.e., independent of data chunks).
categories (tuple) – Categories for which to extract the file names.
- class pymia.data.extraction.extractor.FilesystemDataExtractor(categories=('images',), load_fn=None, ignore_indexing: bool = False, override_file_root=None)
Bases: Extractor
Extracts data of a given category.
Adds category as key to extracted.
- Parameters:
categories (tuple) – Categories for which to extract the data.
load_fn (callable) – Callable that loads a file given the file path and the category, and returns a numpy.ndarray.
ignore_indexing (bool) – Whether to ignore the indexing in params. This is useful when extracting entire images.
- class pymia.data.extraction.extractor.ImagePropertiesExtractor(do_pickle: bool = False)
Bases: Extractor
Extracts the image properties.
Added key to extracted:
pymia.data.definition.KEY_PROPERTIES with ImageProperties content (or bytes if do_pickle)
- Parameters:
do_pickle (bool) – Whether to pickle the extracted ImageProperties instance. This allows usage in a multiprocessing environment.
- class pymia.data.extraction.extractor.ImagePropertyShapeExtractor(numpy_format: bool = True)
Bases: Extractor
Extracts the shape image property of an image.
Added key to extracted:
pymia.data.definition.KEY_SHAPE with tuple content
- Parameters:
numpy_format (bool) – Whether the shape is in numpy or ITK format (the first and last dimensions are swapped between the two).
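The swap between the two shape conventions can be written in a few lines. This is only a sketch of the operation described above; swap_first_last is a hypothetical helper, and the example shape is made up.

```python
def swap_first_last(shape):
    """Return the shape with its first and last entries exchanged."""
    shape = tuple(shape)
    if len(shape) < 2:
        return shape  # nothing to swap
    return (shape[-1],) + shape[1:-1] + (shape[0],)


one_format = (256, 192, 64)
other_format = swap_first_last(one_format)
```

Applying the helper twice returns the original shape, so the same function converts in both directions.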
- class pymia.data.extraction.extractor.IndexingExtractor(do_pickle: bool = False)
Bases: Extractor
Extracts the index expression.
Added keys to extracted:
pymia.data.definition.KEY_SUBJECT_INDEX with int content
pymia.data.definition.KEY_INDEX_EXPR with IndexExpression content
- Parameters:
do_pickle (bool) – Whether to pickle the extracted IndexExpression instance. This is useful when applied with the PyTorch DataLoader since it prevents automatic conversion to torch.Tensor.
- class pymia.data.extraction.extractor.NamesExtractor(cache: bool = True, categories=('images', 'labels'))
Bases: Extractor
Extracts the names of the entries within a category (e.g. “Flair”, “T1” for the category “images”).
Added key to extracted:
pymia.data.definition.KEY_PLACEHOLDER_NAMES with str content
- Parameters:
cache (bool) – Whether to cache the results. If True, the dataset is only accessed once. True is often preferred since the name entries are typically unique in the dataset.
categories (tuple) – Categories for which to extract the names.
- class pymia.data.extraction.extractor.PadDataExtractor(padding: tuple | List[tuple], extractor: Extractor, pad_fn=None)
Bases: Extractor
Pads the data extracted by extractor.
- Parameters:
padding (tuple, list) – The length of the tuple or the list must be equal to the number of dimensions of the extracted data. If a tuple, the values are treated as symmetric padding in each dimension. If a list, each entry must be a tuple indicating the (left, right) padding for one dimension.
extractor (.Extractor) – The extractor performing the extraction of the data to be padded.
pad_fn (callable, optional) – Optional function performing the padding. Default is PadDataExtractor.zero_pad().
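The two accepted forms of padding can be made concrete with a small sketch. It does not reproduce PadDataExtractor; normalize_padding and zero_pad_1d are hypothetical helpers that only show how a tuple is read as symmetric padding and a list as per-dimension (left, right) pairs, with zero padding applied to a plain one-dimensional list.

```python
def normalize_padding(padding):
    """Convert either accepted form into a list of (left, right) pairs."""
    if isinstance(padding, tuple):
        # Tuple: one value per dimension, applied on both sides.
        return [(p, p) for p in padding]
    # List: one (left, right) tuple per dimension.
    return [tuple(p) for p in padding]


def zero_pad_1d(values, left, right):
    """Zero-pad a one-dimensional list on both ends."""
    return [0] * left + list(values) + [0] * right


symmetric = normalize_padding((2, 1))
asymmetric = normalize_padding([(1, 0), (0, 3)])

left, right = asymmetric[0]
padded = zero_pad_1d([7, 8], left, right)
```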
- class pymia.data.extraction.extractor.RandomDataExtractor(selection=None, category: str = 'labels')
Bases: Extractor
Extracts data of a given category randomly.
Adds category as key to extracted.
pymia.data.definition.KEY_PLACEHOLDER_NAMES_SELECTED with selection content
- Parameters:
selection (str, tuple) – Entries (e.g., “T1”, “T2”) within the category to select an entry randomly from. If selection is None, an entry from all entries is randomly selected.
category (str) – The category (e.g. “images”) to extract data from.
- class pymia.data.extraction.extractor.SelectiveDataExtractor(selection=None, category: str = 'labels')
Bases: Extractor
Extracts data of a given category selectively.
Adds category as key to extracted, as well as pymia.data.definition.KEY_PLACEHOLDER_NAMES_SELECTED with selection content.
- Parameters:
selection (str, tuple) – Entries (e.g., “T1”, “T2”) within the category to select. If selection is None, the class has the same behaviour as the DataExtractor and selects all entries.
category (str) – The category (e.g. “images”) to extract data from.
- class pymia.data.extraction.extractor.SubjectExtractor
Bases: Extractor
Extracts the subject’s identification.
Added keys to extracted:
pymia.data.definition.KEY_SUBJECT_INDEX with int content
pymia.data.definition.KEY_SUBJECT with str content
Indexing (pymia.data.extraction.indexing module)
- class pymia.data.extraction.indexing.EmptyIndexing
Bases: IndexingStrategy
An empty indexing strategy. This is useful when a strategy is required but entire images should be extracted.
- class pymia.data.extraction.indexing.IndexingStrategy
Bases: ABC
Interface for indexing strategies that can be applied to images.
- abstract __call__(shape: tuple) -> List[IndexExpression]
Calculate the indexes for a given shape.
- Parameters:
shape (tuple) – The shape to determine the indexes for.
- Returns:
The list of IndexExpression instances defining the indexes for an image shape.
- Return type:
list
- class pymia.data.extraction.indexing.PatchWiseIndexing(patch_shape: tuple, ignore_incomplete=True)
Bases: IndexingStrategy
Strategy to generate indices for patches (sub-volumes) of an image.
- Parameters:
patch_shape (tuple) – The patch shape.
ignore_incomplete (bool) – Boundary condition. If True, incomplete patches are ignored when the image shape is not evenly divisible by the patch shape.
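The effect of the boundary condition can be sketched with plain slice objects. This is an illustration only, assuming non-overlapping patches laid out on a regular grid; patch_slices is a hypothetical function and returns tuples of slices rather than IndexExpression instances.

```python
from itertools import product


def patch_slices(image_shape, patch_shape, ignore_incomplete=True):
    """List one tuple of slices per patch on a non-overlapping grid."""
    starts_per_dim = []
    for size, patch in zip(image_shape, patch_shape):
        # When incomplete patches are ignored, the last start must leave
        # room for a full patch; otherwise any start inside the image counts.
        stop = size - patch + 1 if ignore_incomplete else size
        starts_per_dim.append(range(0, max(stop, 0), patch))

    patches = []
    for starts in product(*starts_per_dim):
        patches.append(tuple(
            slice(start, min(start + patch, size))
            for start, patch, size in zip(starts, patch_shape, image_shape)
        ))
    return patches


complete_only = patch_slices((5, 4), (2, 2), ignore_incomplete=True)
with_border = patch_slices((5, 4), (2, 2), ignore_incomplete=False)
```

For a 5 x 4 image and 2 x 2 patches, the first call yields only the four full patches, while the second also includes the two clipped patches along the last row.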
- class pymia.data.extraction.indexing.SliceIndexing(slice_axis: int | tuple = 0)
Bases: IndexingStrategy
Strategy to generate a slice-wise indexing.
- Parameters:
slice_axis (int, tuple) – The axis to be sliced. Multi-axis slicing can be achieved by providing a tuple of axes.
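Slice-wise indexing can be sketched as one index tuple per slice position. The following assumes that multi-axis slicing means slicing along each given axis in turn; slice_indices is a hypothetical function and uses plain tuples instead of IndexExpression instances.

```python
def slice_indices(shape, slice_axis=0):
    """List one index tuple per slice along the given axis or axes."""
    axes = (slice_axis,) if isinstance(slice_axis, int) else tuple(slice_axis)
    indices = []
    for axis in axes:
        for position in range(shape[axis]):
            # Take everything along all axes except the sliced one.
            expr = [slice(None)] * len(shape)
            expr[axis] = position
            indices.append(tuple(expr))
    return indices


single_axis = slice_indices((3, 4, 5), slice_axis=0)
multi_axis = slice_indices((3, 4, 5), slice_axis=(0, 2))
```

Each resulting tuple can be used directly to index an array of the given shape, for example volume[single_axis[0]] for the first slice along axis 0.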
- class pymia.data.extraction.indexing.VoxelWiseIndexing(image_dimension: int = 3)
Bases: IndexingStrategy
Strategy to generate indices for every voxel of an image.
- Parameters:
image_dimension (int) – The number of image dimensions, excluding the dimension of the voxels themselves.
Reader (pymia.data.extraction.reader module)
- class pymia.data.extraction.reader.Hdf5Reader(file_path: str, category='images')
Bases: Reader
Represents the dataset reader for HDF5 files.
Initializes a new instance.
- Parameters:
file_path (str) – The path to the dataset file.
category (str) – The category of an entry that defines the shape request.
- close()
See Reader.close().
- has(entry: str) -> bool
See Reader.has().
- open()
See Reader.open().
- read(entry: str, index: IndexExpression | None = None)
See Reader.read().
- class pymia.data.extraction.reader.Reader(file_path: str)
Bases: ABC
Abstract dataset reader.
- Parameters:
file_path (str) – The path to the dataset file.
- abstract get_shape(subject_index: int) -> list
Get the shape from an entry.
- Parameters:
subject_index (int) – The index of the subject.
- Returns:
The shape of each dimension.
- Return type:
list
- abstract get_subject_entries() -> list
Get the dataset entries holding the subject’s data.
- Returns:
The list of subject entry strings.
- Return type:
list
- abstract get_subjects() -> list
Get the subject names in the dataset.
- Returns:
The list of subject names.
- Return type:
list
- abstract has(entry: str) -> bool
Check whether a dataset entry exists.
- Parameters:
entry (str) – The dataset entry.
- Returns:
Whether the entry exists.
- Return type:
bool
- abstract read(entry: str, index: IndexExpression | None = None)
Read a dataset entry.
- Parameters:
entry (str) – The dataset entry.
index (expr.IndexExpression) – The slicing expression.
- Returns:
The read data.
- pymia.data.extraction.reader.get_reader(file_path: str, direct_open: bool = False) -> Reader
Get the dataset reader corresponding to the file extension.
- Parameters:
file_path (str) – The path to the dataset file.
direct_open (bool) – Whether the file should directly be opened.
- Returns:
Reader corresponding to dataset file extension.
- Return type:
Reader
- pymia.data.extraction.reader.reader_registry = {'.h5': <class 'pymia.data.extraction.reader.Hdf5Reader'>, '.hdf5': <class 'pymia.data.extraction.reader.Hdf5Reader'>}
Registry defining the mapping between file extension and Reader class. Alternative readers need to be added to this registry in order to use get_reader().
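The registry lookup can be sketched as a dictionary keyed by file extension. Everything below is hypothetical and independent of pymia: the Sketch classes, sketch_registry, get_sketch_reader and the '.zarr' extension are made-up names used only to show how registering an additional class makes it reachable through the lookup function.

```python
import os


class SketchReader:
    """Hypothetical minimal reader; not the pymia Reader class."""

    def __init__(self, file_path):
        self.file_path = file_path
        self.is_open = False

    def open(self):
        self.is_open = True


class SketchHdf5Reader(SketchReader):
    pass


class SketchZarrReader(SketchReader):
    pass


# Mapping from file extension to reader class.
sketch_registry = {'.h5': SketchHdf5Reader, '.hdf5': SketchHdf5Reader}


def get_sketch_reader(file_path, direct_open=False):
    """Return a reader instance chosen by the file extension."""
    extension = os.path.splitext(file_path)[1]
    if extension not in sketch_registry:
        raise ValueError('unknown file extension "{}"'.format(extension))
    reader = sketch_registry[extension](file_path)
    if direct_open:
        reader.open()
    return reader


# Registering an alternative reader makes it available to the lookup.
sketch_registry['.zarr'] = SketchZarrReader

hdf5_reader = get_sketch_reader('dataset.h5')
zarr_reader = get_sketch_reader('dataset.zarr', direct_open=True)
```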
Selection (pymia.data.extraction.selection module)
- class pymia.data.extraction.selection.ComposeSelection(strategies)
Bases: SelectionStrategy
- class pymia.data.extraction.selection.NonBlackSelection(black_value: float = 0.0)
Bases: SelectionStrategy
- class pymia.data.extraction.selection.NonConstantSelection(loop_axis=None)
Bases: SelectionStrategy
- class pymia.data.extraction.selection.PercentileSelection(percentile: float)
Bases: SelectionStrategy
- class pymia.data.extraction.selection.SelectionStrategy
Bases: ABC
Interface for selecting indices according to some rule.
- abstract __call__(sample: dict) -> bool
- Parameters:
sample (dict) – A sample extracted from PymiaDatasource.
- Returns:
Whether or not the sample should be considered.
- Return type:
bool
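A selection strategy is a callable that receives a sample dictionary and returns a boolean. The sketch below is not the library's code: the class names are hypothetical, samples are plain dictionaries with a made-up 'images' key, and the composition rule (a sample is kept only if every strategy accepts it) is an assumption of this example.

```python
class NonZeroSketch:
    """Accepts samples whose 'images' entry contains a non-zero value."""

    def __call__(self, sample):
        return any(value != 0 for value in sample['images'])


class MinLengthSketch:
    """Accepts samples whose 'images' entry has at least min_length values."""

    def __init__(self, min_length):
        self.min_length = min_length

    def __call__(self, sample):
        return len(sample['images']) >= self.min_length


class ComposeSelectionSketch:
    """Accepts a sample only if all strategies accept it (assumed rule)."""

    def __init__(self, strategies):
        self.strategies = strategies

    def __call__(self, sample):
        return all(strategy(sample) for strategy in self.strategies)


samples = [{'images': [0, 0, 0]}, {'images': [0, 3, 1]}, {'images': [5]}]
select = ComposeSelectionSketch([NonZeroSketch(), MinLengthSketch(2)])

# Indices of the samples that pass the composed selection.
selected = [i for i, sample in enumerate(samples) if select(sample)]
```

The resulting index list can then be used to restrict iteration to the accepted samples.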
- class pymia.data.extraction.selection.SubjectSelection(subjects)
Bases: SelectionStrategy
Select subjects by their name or index.
- class pymia.data.extraction.selection.WithForegroundSelection
Bases: SelectionStrategy