mednet.data.segment.cxr8

ChestX-ray8: Hospital-scale Chest X-ray Database.

The database contains a total of 112’120 images. Each X-ray image is 1024 x 1024 pixels. One set of automatically generated mask annotations is available for all images.

  • Database references:

Important

Raw data organization

The CXR8 base datadir, which you should configure following the Setup instructions, must contain at least the following directories:

  • images/ (directory containing the CXR images, in PNG format)

  • segmentations/ (must contain masks downloaded from CXR8-Annotations)

When the flag idiap_folder_structure is set, the loader looks for a file such as images/00030621_006.png under images/00030/00030621_006.png instead, that is, inside a subdirectory named after the leading characters of the filename. This applies to both images and segmentation masks.
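The path rewriting described above can be sketched as follows. This is a minimal illustration, not the loader's actual code: the helper name `to_idiap_layout` is hypothetical, and the prefix length of 5 is an assumption inferred from the single documented example.

```python
from pathlib import PurePosixPath


def to_idiap_layout(relative_path: str, prefix_length: int = 5) -> str:
    """Insert a subdirectory named after the first characters of the filename.

    Hypothetical helper: the name and the default prefix length are
    assumptions inferred from the documented example, not mednet API.
    """
    path = PurePosixPath(relative_path)
    # e.g. "images" / "00030" / "00030621_006.png"
    return str(path.parent / path.name[:prefix_length] / path.name)


print(to_idiap_layout("images/00030621_006.png"))
# -> images/00030/00030621_006.png
print(to_idiap_layout("segmentations/00030621_006.png"))
# -> segmentations/00030/00030621_006.png
```

The same mapping is applied to both directories, which matches the statement that the flag affects images and segmentation masks alike.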

  • Raw data input (on disk):

    • PNG RGB 8-bit depth images

    • Resolution: 1024 x 1024 pixels

    • Total samples available: 112’120

  • Output image:

    • Transforms:

      • Load raw PNG with PIL, with auto-conversion to RGB, convert to tensor

      • Labels for each of the lungs are read from the provided GIF files and merged into a single output image.
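The merging step in the last bullet might look like the sketch below. It assumes that the two per-lung masks have already been decoded into arrays of identical shape, and that "merged" means the pixel-wise union of both masks; neither detail is confirmed by this page. The function name `merge_lung_masks` is hypothetical.

```python
import numpy as np


def merge_lung_masks(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Combine two per-lung masks into one binary mask.

    Assumption: any non-zero pixel marks lung tissue, and the merged
    mask is the union (logical OR) of both inputs.
    """
    if left.shape != right.shape:
        raise ValueError("both masks must have the same shape")
    return np.logical_or(left > 0, right > 0).astype(np.float32)


# Tiny 2 x 4 stand-ins for the real 1024 x 1024 masks.
left_lung = np.array([[255, 0, 0, 0], [255, 255, 0, 0]], dtype=np.uint8)
right_lung = np.array([[0, 0, 0, 255], [0, 0, 255, 255]], dtype=np.uint8)

merged = merge_lung_masks(left_lung, right_lung)
print(merged)
```

A float mask with values 0 and 1 is a common target format for binary segmentation losses, which is why the sketch casts the result to `float32`.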

The default split contains 78’484 images for training, 11’212 images for validation, and 22’424 images for testing.
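As a quick sanity check on the figures above, the three subsets add up to the full database and correspond to a 70/10/20 partition. The dictionary keys below are illustrative labels, not necessarily the names used in the split files.

```python
TOTAL_IMAGES = 112_120

# Subset sizes as quoted for the default split; key names are assumptions.
split_sizes = {"train": 78_484, "validation": 11_212, "test": 22_424}

# The subsets cover the whole database.
assert sum(split_sizes.values()) == TOTAL_IMAGES

fractions = {name: size / TOTAL_IMAGES for name, size in split_sizes.items()}
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
# train: 70.0%
# validation: 10.0%
# test: 20.0%
```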

This module contains the base declaration of common data modules and raw-data loaders for this database. All configured splits inherit from this definition.

Module Attributes

DATABASE_SLUG

Pythonic name to refer to this database.

CONFIGURATION_KEY_DATADIR

Key to search for in the configuration file for the root directory of this database.

CONFIGURATION_KEY_IDIAP_FILESTRUCTURE

Key to search for in the configuration file indicating whether the loader should use the standard or the Idiap-based file organisation structure.

Classes

DataModule(split_path)

ChestX-ray8: Hospital-scale Chest X-ray Database.

RawDataLoader()

A specialized raw-data-loader for the cxr8 dataset.

mednet.data.segment.cxr8.DATABASE_SLUG = 'cxr8'

Pythonic name to refer to this database.

mednet.data.segment.cxr8.CONFIGURATION_KEY_DATADIR = 'datadir.cxr8'

Key to search for in the configuration file for the root directory of this database.

mednet.data.segment.cxr8.CONFIGURATION_KEY_IDIAP_FILESTRUCTURE = 'cxr8.idiap_folder_structure'

Key to search for in the configuration file indicating whether the loader should use the standard or the Idiap-based file organisation structure.

When enabled, the internal loader searches for files in a slightly different folder structure, adapted to Idiap’s requirement that each folder contain fewer than 10k files.
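To illustrate how the two configuration keys could be used, the sketch below resolves them against a nested mapping, as one would obtain from parsing a sectioned configuration file. The layout of `config`, the placeholder path, and the `lookup` helper are all assumptions made for illustration; only the key strings themselves come from this module.

```python
from functools import reduce
from typing import Any, Mapping

# Key strings as documented for this module.
CONFIGURATION_KEY_DATADIR = "datadir.cxr8"
CONFIGURATION_KEY_IDIAP_FILESTRUCTURE = "cxr8.idiap_folder_structure"


def lookup(config: Mapping[str, Any], dotted_key: str, default: Any = None) -> Any:
    """Resolve a dotted key in a nested mapping (hypothetical helper)."""
    try:
        return reduce(lambda node, part: node[part], dotted_key.split("."), config)
    except (KeyError, TypeError):
        # Missing section or key: fall back to the default.
        return default


# Assumed shape of a parsed configuration file; the path is a placeholder.
config = {
    "datadir": {"cxr8": "/path/to/cxr8"},
    "cxr8": {"idiap_folder_structure": True},
}

datadir = lookup(config, CONFIGURATION_KEY_DATADIR)
use_idiap_layout = lookup(config, CONFIGURATION_KEY_IDIAP_FILESTRUCTURE, default=False)
print(datadir, use_idiap_layout)
# -> /path/to/cxr8 True
```

Defaulting the folder-structure flag to `False` in the sketch reflects the idea that the standard layout applies unless the Idiap one is requested explicitly; the real default is not stated here.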

class mednet.data.segment.cxr8.RawDataLoader

Bases: RawDataLoader

A specialized raw-data-loader for the cxr8 dataset.

datadir: Path

This variable contains the base directory where the database raw data is stored.

sample(sample)

Load a single image sample from the disk.

Parameters:

sample (Any) – A tuple containing path suffixes to the sample image, target, and mask to be loaded, within the dataset root folder.

Return type:

Mapping[str, Any]

Returns:

The sample representation.

target(sample)

Load only the sample target from its raw representation.

Parameters:

sample (Any) – A tuple containing the path suffix, within the dataset root folder, where to find the image to be loaded, and an integer, representing the sample target.

Return type:

Tensor

Returns:

The label corresponding to the specified sample, encapsulated as a torch float tensor.

class mednet.data.segment.cxr8.DataModule(split_path)

Bases: CachingDataModule

ChestX-ray8: Hospital-scale Chest X-ray Database.

Parameters:

split_path (Path | Traversable) – Path or traversable (resource) with the JSON split description to load.