Extraction (pymia.data.extraction package)
Datasource (pymia.data.extraction.datasource module)
- class pymia.data.extraction.datasource.PymiaDatasource(dataset_path: str, indexing_strategy: Optional[pymia.data.extraction.indexing.IndexingStrategy] = None, extractor: Optional[pymia.data.extraction.extractor.Extractor] = None, transform: Optional[pymia.data.transformation.Transform] = None, subject_subset: Optional[list] = None, init_reader_once: bool = True)
Bases: object
Provides convenient and adaptable reading of the data from a created dataset.
- Parameters
dataset_path (str) – The path to the dataset to be read from.
indexing_strategy (.IndexingStrategy) – Strategy defining how the data is indexed for reading.
extractor (.Extractor) – Extractor or multiple extractors (ComposeExtractor) extracting the desired data from the dataset.
transform (.Transform) – Transformation(s) to be applied to the extracted data.
subject_subset (list) – A list of subject identifiers defining a subset of subjects to be processed.
init_reader_once (bool) – Whether the reader is initialized once or for every retrieval (default: True).
Examples
The class mainly allows two modes of operation. The first mode extracts the data by index.
>>> ds = PymiaDatasource(...)
>>> for i in range(len(ds)):
...     sample = ds[i]
The second mode of operation is by directly extracting data.
>>> ds = PymiaDatasource(...)
>>> # Different from ds[index] since the extractor and transform override the ones in ds
>>> sample = ds.direct_extract(extractor, index, transform=transform)
Typically, the first mode is used to loop over the entire dataset as fast as possible, extracting just the necessary information, such as data chunks (e.g., slice, patch, sub-volume). Less critical information (e.g., image shape, orientation) that is not required with every chunk of data can be extracted independently with the second mode of operation.
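The two modes can be illustrated without a dataset file. The following is a minimal sketch, not the pymia implementation: ToyDatasource, chunk_extractor and shape_extractor are hypothetical stand-ins, and extractors are simplified to plain functions taking (params, extracted).

```python
class ToyDatasource:
    """Hypothetical stand-in for PymiaDatasource (illustration only)."""

    def __init__(self, subjects, chunks_per_subject, extractor):
        self.subjects = subjects
        self.extractor = extractor
        # Mapping from item i to (subject_index, index_expression), like `indices`.
        self.indices = [(s, c)
                        for s in range(len(subjects))
                        for c in range(chunks_per_subject)]

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, item):
        # Mode 1: extraction by index with the extractor stored on the instance.
        subject_index, index_expr = self.indices[item]
        return self.direct_extract(self.extractor, subject_index, index_expr)

    def direct_extract(self, extractor, subject_index, index_expr=None, transform=None):
        # Mode 2: the extractor and transform passed in are used instead of
        # the ones stored on the instance.
        params = {'subject_index': subject_index,
                  'subject': self.subjects[subject_index],
                  'index_expr': index_expr}
        extracted = {}
        extractor(params, extracted)
        if transform is not None:
            extracted = transform(extracted)
        return extracted


def chunk_extractor(params, extracted):
    # Needed for every chunk of data.
    extracted['chunk'] = (params['subject'], params['index_expr'])


def shape_extractor(params, extracted):
    # Needed only occasionally; the shape is a made-up value.
    extracted['shape'] = (4, 4, 2)


ds = ToyDatasource(['Subject_1', 'Subject_2'], chunks_per_subject=2,
                   extractor=chunk_extractor)
samples = [ds[i] for i in range(len(ds))]                   # first mode
info = ds.direct_extract(shape_extractor, subject_index=0)  # second mode
```

Here the loop touches only the chunk data, while the shape is fetched once per subject through direct_extract.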
- direct_extract(extractor: pymia.data.extraction.extractor.Extractor, subject_index: int, index_expr: Optional[pymia.data.indexexpression.IndexExpression] = None, transform: Optional[pymia.data.transformation.Transform] = None)
Extract data directly, bypassing the extractors and transforms of the instance.
The purpose of this method is to enable extraction of data that is not required for every data chunk (e.g., slice, patch, sub-volume) but only occasionally (e.g., image shape, origin).
- Parameters
extractor (.Extractor) – Extractor or multiple extractors (ComposeExtractor) extracting the desired data from the dataset.
subject_index (int) – Index of the subject to be extracted.
index_expr (.IndexExpression) – The indexing to extract a chunk of data only. Not required if only image related information (e.g., image shape, origin) should be extracted. Needed when desiring a chunk of data (e.g., slice, patch, sub-volume).
transform (.Transform) – Transformation(s) to be applied to the extracted data.
- Returns
Extracted data in a dictionary. Keys are defined by the used Extractor.
- Return type
dict
- get_subjects()
Get all the subjects in the dataset.
- Returns
All subject identifiers in the dataset.
- Return type
list
- indices
A list containing all sample indices. This is a mapping from item i to tuple (subject_index, index_expression).
- Type
list
- set_extractor(extractor: pymia.data.extraction.extractor.Extractor)
Set the extractor(s).
- Parameters
extractor (.Extractor) – Extractor or multiple extractors (ComposeExtractor) extracting the desired data from the dataset.
- set_indexing_strategy(indexing_strategy: pymia.data.extraction.indexing.IndexingStrategy, subject_subset: Optional[list] = None)
Set (or modify) the indexing strategy.
- Parameters
indexing_strategy (.IndexingStrategy) – Strategy defining how the data is indexed for reading.
subject_subset (list) – A list of subject identifiers defining a subset of subjects to be processed.
- set_transform(transform: pymia.data.transformation.Transform)
Set the transform.
- Parameters
transform (.Transform) – Transformation(s) to be applied to the extracted data.
Extractor (pymia.data.extraction.extractor module)
- class pymia.data.extraction.extractor.ComposeExtractor(extractors: list)
Bases: pymia.data.extraction.extractor.Extractor
Composes many Extractor instances and behaves like a single Extractor instance.
- Parameters
extractors (list) – A list of Extractor instances.
- extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) -> None
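The composition pattern can be sketched as follows, assuming that each composed extractor is called in order and writes into the same extracted dictionary. All classes below are local stand-ins rather than the pymia classes, the reader is replaced by a plain dict, and the parameter key names are made up for illustration.

```python
import abc


class Extractor(abc.ABC):
    """Local stand-in mirroring the documented extract() signature."""

    @abc.abstractmethod
    def extract(self, reader, params: dict, extracted: dict) -> None:
        ...


class SubjectNameExtractor(Extractor):  # hypothetical extractor
    def extract(self, reader, params, extracted):
        extracted['subject'] = reader['subjects'][params['subject_index']]


class ChunkExtractor(Extractor):  # hypothetical extractor
    def extract(self, reader, params, extracted):
        start, stop = params['index_expr']
        extracted['images'] = reader['images'][params['subject_index']][start:stop]


class Compose(Extractor):
    """Behaves like a single extractor by delegating to each one in turn."""

    def __init__(self, extractors: list):
        self.extractors = extractors

    def extract(self, reader, params, extracted):
        for extractor in self.extractors:
            extractor.extract(reader, params, extracted)


# A plain dict plays the role of the reader here.
reader = {'subjects': ['Subject_1', 'Subject_2'],
          'images': [[0, 1, 2, 3], [10, 11, 12, 13]]}
extracted = {}
Compose([SubjectNameExtractor(), ChunkExtractor()]).extract(
    reader, {'subject_index': 1, 'index_expr': (1, 3)}, extracted)
```

Because every extractor writes under its own key, the composed result is the union of the individual results.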
- class pymia.data.extraction.extractor.DataExtractor(categories=('images',), ignore_indexing: bool = False)
Bases: pymia.data.extraction.extractor.Extractor
Extracts data of a given category.
Adds category as key to extracted.
- Parameters
categories (tuple) – Categories for which to extract the names.
ignore_indexing (bool) – Whether to ignore the indexing in params. This is useful when extracting entire images.
- extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) -> None
- class pymia.data.extraction.extractor.Extractor
Bases: abc.ABC
Interface unifying the extraction of data from a dataset.
- abstract extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) -> None
Extract data from the dataset.
- Parameters
reader (.Reader) – Reader instance that can read from dataset.
params (dict) – Extraction parameters containing information such as subject index and index expression.
extracted (dict) – The dictionary to put the extracted data in.
- class pymia.data.extraction.extractor.FilesExtractor(cache: bool = True, categories=('images', 'labels'))
Bases: pymia.data.extraction.extractor.Extractor
Extracts the file paths.
Added key to extracted:
pymia.data.definition.KEY_FILE_ROOT with str content
pymia.data.definition.KEY_PLACEHOLDER_FILES with str content
- Parameters
cache (bool) – Whether to cache the results. If True, the dataset is only accessed once. True is often preferred since the file name entries are typically unique in the dataset (i.e. independent of data chunks).
categories (tuple) – Categories for which to extract the file names.
- extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) -> None
- class pymia.data.extraction.extractor.FilesystemDataExtractor(categories=('images',), load_fn=None, ignore_indexing: bool = False, override_file_root=None)
Bases: pymia.data.extraction.extractor.Extractor
Extracts data of a given category.
Adds category as key to extracted.
- Parameters
categories (tuple) – Categories for which to extract the names.
load_fn (callable) – Callable that loads a file given the file path and the category, and returns a numpy.ndarray.
ignore_indexing (bool) – Whether to ignore the indexing in params. This is useful when extracting entire images.
- extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) -> None
- class pymia.data.extraction.extractor.ImagePropertiesExtractor(do_pickle: bool = False)
Bases: pymia.data.extraction.extractor.Extractor
Extracts the image properties.
Added key to extracted:
pymia.data.definition.KEY_PROPERTIES with ImageProperties content (or byte if do_pickle)
- Parameters
do_pickle (bool) – Whether to pickle the extracted ImageProperties instance. This allows usage in a multiprocessing environment.
- extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) -> None
- class pymia.data.extraction.extractor.ImagePropertyShapeExtractor(numpy_format: bool = True)
Bases: pymia.data.extraction.extractor.Extractor
Extracts the shape image property of an image.
Added key to extracted:
pymia.data.definition.KEY_SHAPE with tuple content
- Parameters
numpy_format (bool) – Whether the shape is in numpy or ITK format (first and last dimension are swapped).
- extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) -> None
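The relation between the two shape formats, as described for numpy_format, can be sketched as a swap of the first and last dimension. The helper swap_first_last and the example shape are hypothetical and only illustrate the stated convention.

```python
def swap_first_last(shape: tuple) -> tuple:
    """Swap the first and last entry of a shape tuple."""
    if len(shape) < 2:
        return tuple(shape)
    return (shape[-1],) + tuple(shape[1:-1]) + (shape[0],)


numpy_shape = (155, 240, 200)             # e.g. (z, y, x)
itk_shape = swap_first_last(numpy_shape)  # e.g. (x, y, z)
```

Applying the swap twice returns the original shape, so the same helper converts in both directions.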
- class pymia.data.extraction.extractor.IndexingExtractor(do_pickle: bool = False)
Bases: pymia.data.extraction.extractor.Extractor
Extracts the index expression.
Added key to extracted:
pymia.data.definition.KEY_SUBJECT_INDEX with int content
pymia.data.definition.KEY_INDEX_EXPR with IndexExpression content
- Parameters
do_pickle (bool) – Whether to pickle the extracted IndexExpression instance. This is useful when used with the PyTorch DataLoader since it prevents automatic conversion to torch.Tensor.
- extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) -> None
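The effect of do_pickle can be sketched with the standard pickle module: the index expression is stored as opaque bytes and restored by the consumer. The function extract_indexing, the key names, and the tuple of slices standing in for an IndexExpression are assumptions made for illustration.

```python
import pickle


def extract_indexing(params: dict, extracted: dict, do_pickle: bool = False) -> None:
    """Copy the subject index and the (optionally pickled) index expression."""
    extracted['subject_index'] = params['subject_index']
    index_expr = params['index_expr']
    # Pickled bytes are passed around as-is instead of being converted.
    extracted['index_expr'] = pickle.dumps(index_expr) if do_pickle else index_expr


# A tuple of slices plays the role of the index expression.
params = {'subject_index': 3, 'index_expr': (slice(2, 5), slice(None))}
extracted = {}
extract_indexing(params, extracted, do_pickle=True)

# The consumer restores the original object when it is needed.
restored = pickle.loads(extracted['index_expr'])
```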
- class pymia.data.extraction.extractor.NamesExtractor(cache: bool = True, categories=('images', 'labels'))
Bases: pymia.data.extraction.extractor.Extractor
Extracts the names of the entries within a category (e.g. “Flair”, “T1” for the category “images”).
Added key to extracted:
pymia.data.definition.KEY_PLACEHOLDER_NAMES with str content
- Parameters
cache (bool) – Whether to cache the results. If True, the dataset is only accessed once. True is often preferred since the name entries are typically unique in the dataset.
categories (tuple) – Categories for which to extract the names.
- extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) -> None
- class pymia.data.extraction.extractor.PadDataExtractor(padding: Union[tuple, List[tuple]], extractor: pymia.data.extraction.extractor.Extractor, pad_fn=None)
Bases: pymia.data.extraction.extractor.Extractor
Pads the data extracted by extractor.
- Parameters
padding (tuple, list) – The length of the tuple or list must equal the number of dimensions of the extracted data. If a tuple, the values are treated as symmetric padding in each dimension. If a list, each entry must be a tuple indicating the (left, right) padding for one dimension.
extractor (.Extractor) – The extractor performing the extraction of the data to be padded.
pad_fn (callable, optional) – Optional function performing the padding. Default is PadDataExtractor.zero_pad().
- extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) -> None
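The two accepted forms of padding can be sketched in plain Python. The helpers normalize_padding and zero_pad_2d are hypothetical and only illustrate how a symmetric tuple and a list of (left, right) tuples might be interpreted; they operate on nested lists and do not reproduce what PadDataExtractor does internally.

```python
from typing import List, Tuple, Union


def normalize_padding(padding: Union[tuple, List[tuple]]) -> List[Tuple[int, int]]:
    """Return one (before, after) pair per dimension."""
    if isinstance(padding, tuple):
        # Tuple: the same amount on both sides of each dimension.
        return [(p, p) for p in padding]
    # List: explicit (before, after) amounts per dimension.
    return [tuple(p) for p in padding]


def zero_pad_2d(rows, padding):
    """Zero-pad a 2-D nested list according to the padding specification."""
    (top, bottom), (left, right) = normalize_padding(padding)
    width = len(rows[0]) + left + right
    padded = [[0] * width for _ in range(top)]
    padded += [[0] * left + list(row) + [0] * right for row in rows]
    padded += [[0] * width for _ in range(bottom)]
    return padded


data = [[1, 2], [3, 4]]
symmetric = zero_pad_2d(data, (1, 1))              # one element on every side
asymmetric = zero_pad_2d(data, [(0, 1), (2, 0)])   # explicit per-side amounts
```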
- class pymia.data.extraction.extractor.RandomDataExtractor(selection=None, category: str = 'labels')
Bases: pymia.data.extraction.extractor.Extractor
Extracts data of a given category randomly.
Adds category as key to extracted.
pymia.data.definition.KEY_PLACEHOLDER_NAMES_SELECTED with selection content
- Parameters
selection (str, tuple) – Entries (e.g., “T1”, “T2”) within the category to select an entry randomly from. If selection is None, an entry from all entries is randomly selected.
category (str) – The category (e.g. “images”) to extract data from.
- extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) -> None
- class pymia.data.extraction.extractor.SelectiveDataExtractor(selection=None, category: str = 'labels')
Bases: pymia.data.extraction.extractor.Extractor
Extracts data of a given category selectively.
Adds category as key to extracted, as well as pymia.data.definition.KEY_PLACEHOLDER_NAMES_SELECTED with selection content
- Parameters
selection (str, tuple) – Entries (e.g., “T1”, “T2”) within the category to select. If selection is None, the class has the same behaviour as the DataExtractor and selects all entries.
category (str) – The category (e.g. “images”) to extract data from.
- extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) -> None
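The selection parameter of the selective and random extractors can be sketched as a lookup of entry names. This is an illustration under assumptions: select_entries and pick_random_entry are hypothetical helpers, and the entries of a category are modelled as one value per name.

```python
import random


def select_entries(names, selection=None):
    """Return the positions of the selected entries (all if selection is None)."""
    if selection is None:
        return list(range(len(names)))
    if isinstance(selection, str):
        selection = (selection,)
    return [names.index(s) for s in selection]


def pick_random_entry(names, selection=None, rng=random):
    """Return the position of one randomly chosen entry among the selected ones."""
    return rng.choice(select_entries(names, selection))


names = ['T1', 'T2', 'Flair']
values = [0.1, 0.2, 0.3]  # made-up data, one value per entry

selected = [values[i] for i in select_entries(names, ('T1', 'Flair'))]
everything = [values[i] for i in select_entries(names)]
random_index = pick_random_entry(names, ('T1', 'T2'), rng=random.Random(0))
```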
- class pymia.data.extraction.extractor.SubjectExtractor
Bases: pymia.data.extraction.extractor.Extractor
Extracts the subject’s identification.
Added key to extracted:
pymia.data.definition.KEY_SUBJECT_INDEX with int content
pymia.data.definition.KEY_SUBJECT with str content
- extract(reader: pymia.data.extraction.reader.Reader, params: dict, extracted: dict) -> None
Indexing (pymia.data.extraction.indexing module)
- class pymia.data.extraction.indexing.EmptyIndexing
Bases: pymia.data.extraction.indexing.IndexingStrategy
An empty indexing strategy. This is useful when a strategy is required but entire images should be extracted.
- class pymia.data.extraction.indexing.IndexingStrategy
Bases: abc.ABC
Interface for indexing strategies that can be applied to images.
- abstract __call__(shape: tuple) -> List[pymia.data.indexexpression.IndexExpression]
Calculate the indexes for a given shape.
- Parameters
shape (tuple) – The shape to determine the indexes for.
- Returns
The list of IndexExpression instances defining the indexes for an image shape.
- Return type
list
- class pymia.data.extraction.indexing.PatchWiseIndexing(patch_shape: tuple, ignore_incomplete=True)
Bases: pymia.data.extraction.indexing.IndexingStrategy
Strategy to generate indices for patches (sub-volumes) of an image.
- Parameters
patch_shape (tuple) – The patch shape.
ignore_incomplete (bool) – Whether to ignore incomplete patches at the image boundary, which occur when the image shape is not evenly divisible by the patch shape.
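Patch-wise indexing with the boundary condition can be sketched as follows. This is an assumption-based illustration using tuples of slices in place of IndexExpression instances; patch_indices is a hypothetical helper and may differ from the library's behaviour in details such as patch ordering.

```python
import itertools


def patch_indices(shape, patch_shape, ignore_incomplete=True):
    """Return one tuple of slices per non-overlapping patch."""
    per_axis = []
    for size, patch in zip(shape, patch_shape):
        slices = []
        for start in range(0, size, patch):
            if ignore_incomplete and start + patch > size:
                continue  # skip the patch that would cross the image boundary
            slices.append(slice(start, min(start + patch, size)))
        per_axis.append(slices)
    # Combine the per-axis slices into patches.
    return list(itertools.product(*per_axis))


complete = patch_indices((4, 5), (2, 2))
with_border = patch_indices((4, 5), (2, 2), ignore_incomplete=False)
```

With a 4 x 5 image and 2 x 2 patches, the last column does not fill a whole patch, so it is dropped in the first call and kept as a narrower patch in the second.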
- class pymia.data.extraction.indexing.SliceIndexing(slice_axis: Union[int, tuple] = 0)
Bases: pymia.data.extraction.indexing.IndexingStrategy
Strategy to generate a slice-wise indexing.
- Parameters
slice_axis (int, tuple) – The axis to be sliced. Multi-axis slicing can be achieved by providing a tuple of axes.
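Slice-wise indexing can be sketched in a similar way. The helper slice_indices is hypothetical, index expressions are modelled as plain tuples, and a tuple of axes is assumed to mean that slices are generated along each listed axis in turn.

```python
def slice_indices(shape, slice_axis=0):
    """Return one index tuple per slice along the given axis or axes."""
    axes = (slice_axis,) if isinstance(slice_axis, int) else tuple(slice_axis)
    indices = []
    for axis in axes:
        for position in range(shape[axis]):
            # Fix one position on the sliced axis, keep all other axes whole.
            index = [slice(None)] * len(shape)
            index[axis] = position
            indices.append(tuple(index))
    return indices


axial = slice_indices((3, 4, 5), slice_axis=0)
multi = slice_indices((3, 4, 5), slice_axis=(0, 2))
```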
- class pymia.data.extraction.indexing.VoxelWiseIndexing(image_dimension: int = 3)
Bases: pymia.data.extraction.indexing.IndexingStrategy
Strategy to generate indices for every voxel of an image.
- Parameters
image_dimension (int) – The image dimension, excluding the dimension of the voxels themselves.
Reader (pymia.data.extraction.reader module)
- class pymia.data.extraction.reader.Hdf5Reader(file_path: str, category='images')
Bases: pymia.data.extraction.reader.Reader
Represents the dataset reader for HDF5 files.
Initializes a new instance.
- Parameters
file_path (str) – The path to the dataset file.
category (str) – The category of an entry that defines the shape request.
- close()
See Reader.close().
- has(entry: str) -> bool
See Reader.has().
- open()
See Reader.open().
- read(entry: str, index: Optional[pymia.data.indexexpression.IndexExpression] = None)
See Reader.read().
- class pymia.data.extraction.reader.Reader(file_path: str)
Bases: abc.ABC
Abstract dataset reader.
- Parameters
file_path (str) – The path to the dataset file.
- abstract get_shape(subject_index: int) -> list
Get the shape from an entry.
- Parameters
subject_index (int) – The index of the subject.
- Returns
The shape of each dimension.
- Return type
list
- abstract get_subject_entries() -> list
Get the dataset entries holding the subject’s data.
- Returns
The list of subject entry strings.
- Return type
list
- abstract get_subjects() -> list
Get the subject names in the dataset.
- Returns
The list of subject names.
- Return type
list
- abstract has(entry: str) -> bool
Check whether a dataset entry exists.
- Parameters
entry (str) – The dataset entry.
- Returns
Whether the entry exists.
- Return type
bool
- abstract read(entry: str, index: Optional[pymia.data.indexexpression.IndexExpression] = None)
Read a dataset entry.
- Parameters
entry (str) – The dataset entry.
index (expr.IndexExpression) – The slicing expression.
- Returns
The read data.
- pymia.data.extraction.reader.get_reader(file_path: str, direct_open: bool = False) -> pymia.data.extraction.reader.Reader
Get the dataset reader corresponding to the file extension.
- Parameters
file_path (str) – The path to the dataset file.
direct_open (bool) – Whether the file should directly be opened.
- Returns
Reader corresponding to dataset file extension.
- Return type
Reader
- pymia.data.extraction.reader.reader_registry = {'.h5': <class 'pymia.data.extraction.reader.Hdf5Reader'>, '.hdf5': <class 'pymia.data.extraction.reader.Hdf5Reader'>}
Registry defining the mapping between file extension and Reader class. Alternative readers need to be added to this registry in order to use get_reader().
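The registry lookup can be sketched as follows. This is a minimal illustration with local stand-ins: the classes, the registry dict and get_reader below are defined in the snippet itself and are not the pymia objects, and JsonReader is a purely hypothetical alternative reader.

```python
import os


class Reader:
    """Local stand-in for a dataset reader."""

    def __init__(self, file_path: str):
        self.file_path = file_path
        self.is_open = False

    def open(self):
        self.is_open = True


class Hdf5LikeReader(Reader):
    pass


class JsonReader(Reader):  # hypothetical alternative reader
    pass


# Mapping between file extension and reader class.
registry = {'.h5': Hdf5LikeReader, '.hdf5': Hdf5LikeReader}


def get_reader(file_path: str, direct_open: bool = False) -> Reader:
    """Look up the reader class by file extension and instantiate it."""
    extension = os.path.splitext(file_path)[1]
    if extension not in registry:
        raise ValueError('unknown dataset file extension "{}"'.format(extension))
    reader = registry[extension](file_path)
    if direct_open:
        reader.open()
    return reader


# Registering an alternative reader makes it available through get_reader().
registry['.json'] = JsonReader
reader = get_reader('dataset.json', direct_open=True)
```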
Selection (pymia.data.extraction.selection module)
- class pymia.data.extraction.selection.SelectionStrategy
Bases: abc.ABC
Interface for selecting indices according to some rule.
- abstract __call__(sample: dict) -> bool
- Parameters
sample (dict) – A sample extracted from PymiaDatasource.
- Returns
Whether or not the sample should be considered.
- Return type
bool
- class pymia.data.extraction.selection.SubjectSelection(subjects)
Bases: pymia.data.extraction.selection.SelectionStrategy
Select subjects by their name or index.
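A selection strategy is a callable that decides, per extracted sample, whether the sample is kept. The following is a sketch under assumptions: the classes are local stand-ins, SubjectNameSelection is hypothetical, and the sample key 'subject' is a made-up name rather than a pymia constant.

```python
import abc


class SelectionStrategy(abc.ABC):
    """Local stand-in mirroring the documented __call__ signature."""

    @abc.abstractmethod
    def __call__(self, sample: dict) -> bool:
        ...


class SubjectNameSelection(SelectionStrategy):  # hypothetical strategy
    def __init__(self, subjects):
        self.subjects = set(subjects)

    def __call__(self, sample: dict) -> bool:
        # Keep the sample only if it belongs to one of the chosen subjects.
        return sample['subject'] in self.subjects


samples = [{'subject': 'Subject_1', 'slice': 0},
           {'subject': 'Subject_2', 'slice': 0},
           {'subject': 'Subject_2', 'slice': 1}]

keep = SubjectNameSelection(['Subject_2'])
selected_indices = [i for i, sample in enumerate(samples) if keep(sample)]
```

The resulting index list can then be used to restrict iteration to the selected samples.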