mednet.data.classify.nih_cxr14

NIH CXR14 (relabeled) DataModule for computer-aided diagnosis.

This dataset was extracted from the clinical PACS database at the National Institutes of Health Clinical Center (USA) and represents 60% of all their radiographs. It contains labels for 14 common radiological signs in this order: cardiomegaly, emphysema, effusion, hernia, infiltration, mass, nodule, atelectasis, pneumothorax, pleural thickening, pneumonia, fibrosis, edema and consolidation. This is the relabeled version created in the CheXNeXt study.

Important

Raw data organization

The NIH_CXR14_re base datadir, which you should configure following the Setup instructions, must contain at least the directory “images/” with all the images of the database.

The labels from [CHEXNEXT-2018] are already incorporated in this library and do not need to be re-downloaded.

When the flag idiap_folder_structure is set, the loader looks for a file listed as, e.g., images/00030621_006.png under images/00030/00030621_006.png instead.
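The remapping described above can be sketched as follows. This is only an illustration, not the library's code: the helper name `to_idiap_layout` is hypothetical, and the rule that the subfolder is named after the first five characters of the file name is an assumption inferred from the single example given.

```python
from pathlib import PurePosixPath


def to_idiap_layout(relative_path: str, prefix_length: int = 5) -> str:
    """Insert a subfolder named after the first characters of the file name.

    Hypothetical helper: the five-character prefix rule is an assumption
    inferred from the example
    ``images/00030621_006.png`` -> ``images/00030/00030621_006.png``.
    """
    path = PurePosixPath(relative_path)
    subfolder = path.name[:prefix_length]
    return str(path.parent / subfolder / path.name)


remapped = to_idiap_layout("images/00030621_006.png")
print(remapped)
```

Keeping the prefix length as a parameter makes the assumption explicit and easy to change if the actual layout differs.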

  • Raw data input (on disk):

    • PNG RGB 8-bit depth images

    • Resolution: 1024 x 1024 pixels

    • Total samples available: 109’041

  • Output image:

    • Transforms:

      • Load raw PNG with PIL, with auto-conversion to grayscale

      • Convert to torch tensor

    • Final specifications:

      • RGB, encoded as a 3-plane tensor, 32-bit floats, square (1024x1024 px)

      • Labels in order:

        • cardiomegaly

        • emphysema

        • effusion

        • hernia

        • infiltration

        • mass

        • nodule

        • atelectasis

        • pneumothorax

        • pleural thickening

        • pneumonia

        • fibrosis

        • edema

        • consolidation
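Because the label order is fixed, a target vector can be decoded back into sign names by position. The sketch below assumes that targets are multi-hot lists with one 0/1 entry per label, which this page does not state explicitly; the names `LABELS` and `positive_findings` are hypothetical and not part of the library.

```python
# Label names in the order listed above.
LABELS = (
    "cardiomegaly",
    "emphysema",
    "effusion",
    "hernia",
    "infiltration",
    "mass",
    "nodule",
    "atelectasis",
    "pneumothorax",
    "pleural thickening",
    "pneumonia",
    "fibrosis",
    "edema",
    "consolidation",
)


def positive_findings(target: list[int]) -> list[str]:
    """Return the names of the labels flagged in a multi-hot target.

    Assumption: ``target`` holds one 0/1 entry per label, in ``LABELS`` order.
    """
    if len(target) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} entries, got {len(target)}")
    return [name for name, flag in zip(LABELS, target) if flag]


example = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(positive_findings(example))  # ['cardiomegaly', 'effusion']
```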

This module contains the base declaration of common data modules and raw-data loaders for this database. All configured splits inherit from this definition.

Module Attributes

DATABASE_SLUG

Pythonic name of this database.

CONFIGURATION_KEY_DATADIR

Key to search for in the configuration file for the root directory of this database.

CONFIGURATION_KEY_IDIAP_FILESTRUCTURE

Key to search for in the configuration file indicating whether the loader should use the standard or the Idiap-based file organisation.

Classes

DataModule(split_path)

NIH CXR14 (relabeled) DataModule for computer-aided diagnosis.

RawDataLoader()

A specialized raw-data-loader for the NIH CXR-14 dataset.

mednet.data.classify.nih_cxr14.DATABASE_SLUG = 'nih_cxr14'

Pythonic name of this database.

mednet.data.classify.nih_cxr14.CONFIGURATION_KEY_DATADIR = 'datadir.nih_cxr14'

Key to search for in the configuration file for the root directory of this database.

mednet.data.classify.nih_cxr14.CONFIGURATION_KEY_IDIAP_FILESTRUCTURE = 'nih_cxr14.idiap_folder_structure'

Key to search for in the configuration file indicating whether the loader should use the standard or the Idiap-based file organisation.

When set, the internal loader searches for files in a slightly different folder structure, which was adapted to Idiap’s requirements (fewer than 10k files per folder).
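As an illustration of how the two keys could be resolved, the sketch below assumes that the configuration file has already been parsed into nested dictionaries and that each dot in a key separates one nesting level. Both are assumptions, as are the `lookup` helper and the placeholder path; the library's own configuration handling may differ.

```python
from collections.abc import Mapping
from typing import Any

# Values copied from the module attributes documented above.
CONFIGURATION_KEY_DATADIR = "datadir.nih_cxr14"
CONFIGURATION_KEY_IDIAP_FILESTRUCTURE = "nih_cxr14.idiap_folder_structure"


def lookup(config: Mapping[str, Any], dotted_key: str, default: Any = None) -> Any:
    """Walk a nested mapping following the parts of a dotted key.

    Hypothetical helper: returns ``default`` if any part is missing.
    """
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    return node


# Hypothetical parsed configuration; the path is a placeholder.
config = {
    "datadir": {"nih_cxr14": "/path/to/NIH_CXR14_re"},
    "nih_cxr14": {"idiap_folder_structure": True},
}

datadir = lookup(config, CONFIGURATION_KEY_DATADIR)
use_idiap = lookup(config, CONFIGURATION_KEY_IDIAP_FILESTRUCTURE, default=False)
print(datadir, use_idiap)
```

Defaulting the flag to `False` mirrors the idea that the standard layout applies unless the Idiap-specific option is set.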

class mednet.data.classify.nih_cxr14.RawDataLoader

Bases: ClassificationRawDataLoader

A specialized raw-data-loader for the NIH CXR-14 dataset.

datadir: Path

This variable contains the base directory where the database raw data is stored.

idiap_file_organisation: bool

Whether to use Idiap’s filesystem organisation when looking up data.

This variable is True if the user has set the configuration parameter nih_cxr14.idiap_folder_structure in the global configuration file. It causes the internal loader to search for files in a slightly different folder structure, which was adapted to Idiap’s requirements (fewer than 10k files per folder).

sample(sample)

Load a single image sample from the disk.

Parameters:

sample (tuple[str, int, Any | None]) – A tuple containing the path of the image to load, relative to the dataset root folder, and an integer representing the sample target.

Return type:

tuple[Tensor, Mapping[str, Any]]

Returns:

The sample representation.

target(k)

Load a single image sample target from the disk.

Parameters:

k (Any) – A tuple containing the path of the image to load, relative to the dataset root folder, and an integer representing the sample target.

Returns:

The integer targets associated with the sample.

Return type:

list[int]

class mednet.data.classify.nih_cxr14.DataModule(split_path)

Bases: CachingDataModule

NIH CXR14 (relabeled) DataModule for computer-aided diagnosis.

Parameters:

split_path (Path | Traversable) – Path or traversable (resource) with the JSON split description to load.