mednet.data.classify.nih_cxr14
NIH CXR14 (relabeled) DataModule for computer-aided diagnosis.
This dataset was extracted from the clinical PACS database at the National Institutes of Health Clinical Center (USA) and represents 60% of all their radiographs. It contains labels for 14 common radiological signs in this order: cardiomegaly, emphysema, effusion, hernia, infiltration, mass, nodule, atelectasis, pneumothorax, pleural thickening, pneumonia, fibrosis, edema and consolidation. This is the relabeled version created in the CheXNeXt study.
Database references:
Original data: [NIH-CXR14-2017]
Labels and split references: [CHEXNEXT-2018]
Important
Raw data organization
The NIH_CXR14_re base datadir, which you should configure following the Setup instructions, must contain at least the directory “images/” with all the images of the database.
The labels from [CHEXNEXT-2018] are already incorporated in this library and do not need to be re-downloaded.
The flag idiap_folder_structure makes the loader search for files named, e.g., images/00030621_006.png as images/00030/00030621_006.png.
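The path rewriting described above can be sketched as follows. This is only an illustration, not the library's own loader code: the helper name `idiap_relative_path` is hypothetical, and the rule that the subdirectory is the first five characters of the file name is an assumption inferred from the single example given.

```python
from pathlib import PurePosixPath


def idiap_relative_path(filename: str, prefix_length: int = 5) -> str:
    """Map a flat image path onto a layout with one subdirectory per prefix.

    Hypothetical helper: the prefix length (5) is inferred from the example
    in the text and is an assumption, not a documented rule.
    """
    path = PurePosixPath(filename)
    # Insert a subdirectory named after the leading characters of the file name.
    return str(path.parent / path.name[:prefix_length] / path.name)


print(idiap_relative_path("images/00030621_006.png"))
# images/00030/00030621_006.png
```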
Raw data input (on disk):
PNG RGB 8-bit depth images
Resolution: 1024 x 1024 pixels
Total samples available: 109’041
Output image:
Transforms:
Load raw PNG with PIL, with auto-conversion to grayscale
Convert to torch tensor
Final specifications:
RGB, encoded as a 3-plane tensor, 32-bit floats, square (1024x1024 px)
Labels in order:
cardiomegaly
emphysema
effusion
hernia
infiltration
mass
nodule
atelectasis
pneumothorax
pleural thickening
pneumonia
fibrosis
edema
consolidation
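Because the label order is fixed, a target vector can be mapped back to sign names by position. The sketch below assumes targets are 14-element multi-hot vectors (one entry per sign, in the order listed above); the `LABELS` tuple and the `decode_labels` helper are hypothetical names used for illustration and are not part of the documented API.

```python
# Label names in the order documented above.
LABELS = (
    "cardiomegaly", "emphysema", "effusion", "hernia", "infiltration",
    "mass", "nodule", "atelectasis", "pneumothorax", "pleural thickening",
    "pneumonia", "fibrosis", "edema", "consolidation",
)


def decode_labels(target, threshold=0.5):
    """Return the names of all signs whose entry reaches the threshold.

    Assumes ``target`` is a sequence of 14 numbers (multi-hot encoding).
    """
    if len(target) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} entries, got {len(target)}")
    return [name for name, value in zip(LABELS, target) if value >= threshold]


# Example: a sample annotated with cardiomegaly and pneumothorax.
target = [0] * len(LABELS)
target[0] = 1
target[8] = 1
print(decode_labels(target))
# ['cardiomegaly', 'pneumothorax']
```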
This module contains the base declaration of common data modules and raw-data loaders for this database. All configured splits inherit from this definition.
Module Attributes
DATABASE_SLUG | Pythonic name of this database.
CONFIGURATION_KEY_DATADIR | Key to search for in the configuration file for the root directory of this database.
CONFIGURATION_KEY_IDIAP_FILESTRUCTURE | Key to search for in the configuration file indicating if the loader should use standard or idiap-based file organisation structure.
Classes
DataModule(split_path) | NIH CXR14 (relabeled) DataModule for computer-aided diagnosis.
RawDataLoader | A specialized raw-data-loader for the NIH CXR-14 dataset.
- mednet.data.classify.nih_cxr14.DATABASE_SLUG = 'nih_cxr14'
Pythonic name of this database.
- mednet.data.classify.nih_cxr14.CONFIGURATION_KEY_DATADIR = 'datadir.nih_cxr14'
Key to search for in the configuration file for the root directory of this database.
- mednet.data.classify.nih_cxr14.CONFIGURATION_KEY_IDIAP_FILESTRUCTURE = 'nih_cxr14.idiap_folder_structure'
Key to search for in the configuration file indicating if the loader should use standard or idiap-based file organisation structure.
When set, the internal loader searches for files in a slightly different folder structure, which was adapted to Idiap’s requirements (fewer than 10k files per folder).
- class mednet.data.classify.nih_cxr14.RawDataLoader
Bases: ClassificationRawDataLoader
A specialized raw-data-loader for the NIH CXR-14 dataset.
- idiap_file_organisation: bool
Whether to use Idiap’s filesystem organisation when looking up data.
This variable will be True if the user has set the configuration parameter nih_cxr14.idiap_folder_structure in the global configuration file. It causes the internal loader to search for files in a slightly different folder structure, which was adapted to Idiap’s requirements (fewer than 10k files per folder).
- class mednet.data.classify.nih_cxr14.DataModule(split_path)
Bases: CachingDataModule
NIH CXR14 (relabeled) DataModule for computer-aided diagnosis.
- Parameters:
split_path (Path | Traversable) – Path or traversable (resource) with the JSON split description to load.