mednet.data.classify.nih_cxr14
NIH CXR14 (relabeled) DataModule for computer-aided diagnosis.
This dataset was extracted from the clinical PACS database at the National Institutes of Health Clinical Center (USA) and represents 60% of all their radiographs. It contains labels for 14 common radiological signs in this order: cardiomegaly, emphysema, effusion, hernia, infiltration, mass, nodule, atelectasis, pneumothorax, pleural thickening, pneumonia, fibrosis, edema and consolidation. Training and validation data come from the relabeled version created in the [RIB+18] study. Test data uses the original annotations available with [WPL+17].
Database references:
Original data: [WPL+17], which contains 112’120 chest X-ray images with up to 14 associated radiological findings.
Labels and split references: We use the train and validation splits published by [RIB+18], which are available here. These differ from the file lists provided with the original [WPL+17] study (train/val set: 86’523 samples; test set: 25’595 samples; plus 2 missing samples that are not listed, for a total of 112’120 samples). The [RIB+18] splits, which we copied into this library, contain 104’987 relabeled samples, divided into a training set of 98’637 samples and a validation set of 6’350 samples. Note that the relabeling work of [RIB+18] does not provide test set annotations (only training and validation). Our test set therefore consists of all CXR8 samples that were not relabeled, for which we reuse the original CXR8 annotations (7’133 samples).
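The sample counts quoted above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the numbers in the text; the variable names are illustrative and are not part of the library.

```python
# Sample counts quoted in the text above (variable names are illustrative).
TOTAL = 112_120              # all images in the original CXR8 release
ORIGINAL_TRAIN_VAL = 86_523  # original train/val file list
ORIGINAL_TEST = 25_595       # original test file list
ORIGINAL_UNLISTED = 2        # samples missing from the original file lists

RELABELED_TRAIN = 98_637
RELABELED_VALIDATION = 6_350

# The relabeled training and validation sets together.
relabeled_total = RELABELED_TRAIN + RELABELED_VALIDATION

# Everything that was not relabeled ends up in the test set.
test_total = TOTAL - relabeled_total

# The original file lists plus the two unlisted samples cover the database.
assert ORIGINAL_TRAIN_VAL + ORIGINAL_TEST + ORIGINAL_UNLISTED == TOTAL

print(relabeled_total)  # 104987
print(test_total)       # 7133
```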
Important
Raw data organization
The CXR8 base datadir, which you should configure following the Setup instructions, must contain at least the directory “images/” with all the images of the database.
The labels from [RIB+18] (available here) are already incorporated in this library and do not need to be re-downloaded.
The flag idiap_folder_structure makes the loader search for files named, e.g., images/00030621_006.png as images/00030/00030621_006.png.
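As an illustration of the path rewriting described above, the sketch below assumes that the subfolder name is the first five characters of the image file name, which is what the example paths suggest. The helper `to_idiap_path` is hypothetical and is not part of mednet.

```python
import pathlib


def to_idiap_path(relative_path: str) -> pathlib.PurePosixPath:
    """Insert a subfolder named after the first 5 characters of the file name.

    Assumption: the 5-character prefix rule is inferred from the example
    ``images/00030621_006.png`` -> ``images/00030/00030621_006.png``.
    """
    path = pathlib.PurePosixPath(relative_path)
    return path.parent / path.name[:5] / path.name


print(to_idiap_path("images/00030621_006.png"))
# images/00030/00030621_006.png
```

Grouping files by a short file name prefix spreads them over many subfolders, which keeps the number of files in each folder small.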
Raw data input (on disk):
PNG RGB 8-bit depth images
Original resolution: 1024 x 1024 pixels
Non-exclusive labels organized in a (compact) string list encoded as such:
car: cardiomegaly
emp: emphysema
eff: effusion
her: hernia
inf: infiltration
mas: mass
nod: nodule
ate: atelectasis
pnt: pneumothorax
plt: pleural thickening
pne: pneumonia
fib: fibrosis
ede: edema
con: consolidation
Patient age (integer)
Patient gender (“M” or “F”)
Total samples available: 112’120
Output image:
Transforms:
Load raw PNG with PIL, with auto-conversion to grayscale
Convert to torch tensor
Final specifications:
RGB, encoded as a 3-plane tensor, 32-bit floats, square (1024x1024 px)
This decoder loads this description and converts it to a binary multi-label representation.
This module contains the base declaration of common data modules and raw-data loaders for this database. All configured splits inherit from this definition.
Module Attributes
DATABASE_SLUG | Pythonic name of this database.
CONFIGURATION_KEY_DATADIR | Key to search for in the configuration file for the root directory of this database.
CONFIGURATION_KEY_IDIAP_FILESTRUCTURE | Key to search for in the configuration file indicating if the loader should use standard or idiap-based file organisation structure.
RADIOLOGICAL_FINDINGS | List of radiological findings (abbreviations) supported on this database.
Functions
binarize_findings(lst) | Binarize the input list of radiological findings.
Classes
DataModule(split_path) | NIH CXR14 (relabeled) DataModule for computer-aided diagnosis.
RawDataLoader() | A specialized raw-data-loader for the NIH CXR-14 dataset.
- mednet.data.classify.nih_cxr14.DATABASE_SLUG = 'nih_cxr14'
Pythonic name of this database.
- mednet.data.classify.nih_cxr14.CONFIGURATION_KEY_DATADIR = 'datadir.cxr8'
Key to search for in the configuration file for the root directory of this database.
- mednet.data.classify.nih_cxr14.CONFIGURATION_KEY_IDIAP_FILESTRUCTURE = 'cxr8.idiap_folder_structure'
Key to search for in the configuration file indicating if the loader should use standard or idiap-based file organisation structure.
It causes the internal loader to search for files in a slightly different folder structure, adapted to Idiap’s requirements (fewer than 10k files per folder).
- mednet.data.classify.nih_cxr14.RADIOLOGICAL_FINDINGS = ['car', 'emp', 'eff', 'her', 'inf', 'mas', 'nod', 'ate', 'pnt', 'plt', 'pne', 'fib', 'ede', 'con']
List of radiological findings (abbreviations) supported on this database.
- mednet.data.classify.nih_cxr14.binarize_findings(lst)
Binarize the input list of radiological findings.
The output list contains zeros and ones, respecting the findings order in RADIOLOGICAL_FINDINGS.
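A minimal sketch of this binarization follows, assuming the input is a list of the abbreviations defined in RADIOLOGICAL_FINDINGS. The function `binarize_findings_sketch` is a hypothetical re-implementation for illustration only; the library's own `binarize_findings` may differ in details such as how it handles unknown labels.

```python
RADIOLOGICAL_FINDINGS = [
    "car", "emp", "eff", "her", "inf", "mas", "nod",
    "ate", "pnt", "plt", "pne", "fib", "ede", "con",
]


def binarize_findings_sketch(lst: list[str]) -> list[int]:
    """Return one 0/1 flag per entry of RADIOLOGICAL_FINDINGS, in that order.

    Assumption: labels not listed in RADIOLOGICAL_FINDINGS are silently
    ignored in this sketch.
    """
    present = set(lst)
    return [1 if finding in present else 0 for finding in RADIOLOGICAL_FINDINGS]


print(binarize_findings_sketch(["car", "eff"]))
# [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

An empty input list yields 14 zeros, which corresponds to a sample with no recorded findings.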
- class mednet.data.classify.nih_cxr14.RawDataLoader
Bases: RawDataLoader
A specialized raw-data-loader for the NIH CXR-14 dataset.
- idiap_file_organisation: bool
Whether to use Idiap’s filesystem organisation when looking up data.
This variable will be True if the user has set the configuration parameter nih_cxr14.idiap_file_organisation in the global configuration file. It causes the internal loader to search for files in a slightly different folder structure, adapted to Idiap’s requirements (fewer than 10k files per folder).
- target(sample)
Load only sample target from its raw representation.
The raw representation contains zero to many (unique) instances of radiological findings listed at RADIOLOGICAL_FINDINGS. This list is binarized (into 14 binary positions) before it is returned.
- Parameters:
sample (Any) – A tuple containing the path suffix, within the dataset root folder, where to find the image to be loaded, and an integer, representing the sample target.
- Return type:
Tensor
- Returns:
The labels corresponding to all radiological signs present in the specified sample, encapsulated as a 1D torch float tensor.
- class mednet.data.classify.nih_cxr14.DataModule(split_path)
Bases: CachingDataModule
NIH CXR14 (relabeled) DataModule for computer-aided diagnosis.
- Parameters:
split_path (Path | Traversable) – Path or traversable (resource) with the JSON split description to load.