mednet.data.classify.shenzhen

Shenzhen DataModule for computer-aided diagnosis.

The standard digital image database for Tuberculosis was created by the National Library of Medicine, Maryland, USA, in collaboration with Shenzhen No.3 People’s Hospital, Guangdong Medical College, Shenzhen, China. The chest X-rays come from out-patient clinics and were captured as part of the daily routine using Philips DR Digital Diagnose systems.

  • Database reference: [JCA+14]

Important

Raw data organization

The Shenzhen base datadir, which you should configure following the Setup instructions, must contain the following subdirectory:

  • CXR_png/ (directory containing the CXR images)

Data specifications:

  • Raw data input (on disk):

    • PNG 8-bit RGB images issued from digital radiography machines (the content is grayscale, but it is encoded as RGB with an “inverted” grayscale that requires special treatment).

    • Original resolution: variable, with width and height of 3000 x 3000 pixels or less

    • Samples: 662 images and associated labels

  • Output image:

    • Transforms:

      • Load raw data with PIL with auto-conversion to grayscale

      • Remove (completely) black borders

      • Convert to torch tensor

    • Final specifications:

      • Grayscale, encoded as a single plane tensor, 32-bit floats, square with varying resolutions, depending on the input image

      • Labels: 0 (healthy), 1 (active tuberculosis)
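The border-removal and tensor-conversion steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual implementation: `crop_black_borders` and `to_single_plane_float` are hypothetical helpers, a NumPy array stands in for both the PIL image and the torch tensor, and the scaling of pixel values to the range [0, 1] is assumed rather than documented here.

```python
import numpy as np


def crop_black_borders(image: np.ndarray, threshold: int = 0) -> np.ndarray:
    """Crop edge rows and columns whose pixels are all <= threshold.

    Hypothetical helper: it keeps the bounding box of the non-black
    region of a 2D grayscale image.
    """
    mask = image > threshold
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0 or cols.size == 0:
        return image  # entirely black image: nothing sensible to crop
    return image[rows[0] : rows[-1] + 1, cols[0] : cols[-1] + 1]


def to_single_plane_float(image: np.ndarray) -> np.ndarray:
    """Convert 8-bit grayscale to a (1, H, W) float32 array.

    Assumption: values are scaled to [0, 1], as common image-to-tensor
    conversions do.
    """
    return (image.astype(np.float32) / 255.0)[np.newaxis, ...]


# Toy 8-bit "radiograph": a bright 3x4 region inside a black frame.
raw = np.zeros((6, 8), dtype=np.uint8)
raw[1:4, 2:6] = 200

cropped = crop_black_borders(raw)
plane = to_single_plane_float(cropped)
```

In a real pipeline the input array would come from the image loaded with PIL and converted to grayscale, and the last step would produce a torch tensor instead of a NumPy array.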

This module contains the base declaration of common data modules and raw-data loaders for this database. All configured splits inherit from this definition.

Module Attributes

DATABASE_SLUG

Pythonic name of this database.

CONFIGURATION_KEY_DATADIR

Key to search for in the configuration file for the root directory of this database.

Classes

DataModule(split_path)

Shenzhen DataModule for computer-aided diagnosis.

RawDataLoader([config_variable])

A specialized raw-data-loader for the Shenzhen dataset.

mednet.data.classify.shenzhen.DATABASE_SLUG = 'shenzhen'

Pythonic name of this database.

mednet.data.classify.shenzhen.CONFIGURATION_KEY_DATADIR = 'datadir.shenzhen'

Key to search for in the configuration file for the root directory of this database.
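As an illustration of how a dotted key such as `datadir.shenzhen` might map onto a configuration file, the sketch below resolves it against a nested mapping. The nested layout, the `lookup` helper and the example path are assumptions made for illustration only; the actual configuration file format and lookup logic are described in the Setup instructions.

```python
from functools import reduce
from typing import Any, Mapping

CONFIGURATION_KEY_DATADIR = "datadir.shenzhen"


def lookup(config: Mapping[str, Any], dotted_key: str) -> Any:
    """Walk a nested mapping following the parts of a dotted key (hypothetical helper)."""
    return reduce(lambda node, part: node[part], dotted_key.split("."), config)


# Assumed in-memory form of a parsed configuration file; the path is a placeholder.
config = {"datadir": {"shenzhen": "/path/to/shenzhen"}}

root = lookup(config, CONFIGURATION_KEY_DATADIR)
```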

class mednet.data.classify.shenzhen.RawDataLoader(config_variable='datadir.shenzhen')

Bases: RawDataLoader

A specialized raw-data-loader for the Shenzhen dataset.

Parameters:

config_variable (str) – Key to search for in the configuration file for the root directory of this database.

datadir: Path

This variable contains the base directory where the database raw data is stored.

sample(sample)

Load a single image sample from the disk.

Parameters:

sample (Any) – A tuple containing the path suffix of the image to load, relative to the dataset root folder, and an integer representing the sample target.

Return type:

Mapping[str, Any]

Returns:

The sample representation.
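The input tuple described above can be pictured with the following sketch, which only shows how the path suffix and the target would be separated and how the suffix combines with the base directory. The directory, the file name and the variable names are hypothetical placeholders; no image is read and the actual loader is not called.

```python
from pathlib import Path

# Hypothetical base directory; in practice this is the loader's `datadir`.
datadir = Path("/path/to/shenzhen")

# A raw sample: (path suffix within the dataset root, integer target).
# The file name is a placeholder, not a real entry of the database.
sample = ("CXR_png/example_0001.png", 0)

path_suffix, target = sample
image_path = datadir / path_suffix  # full path to the image on disk
is_tuberculosis = target == 1       # labels: 0 (healthy), 1 (active tuberculosis)
```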

target(sample)

Load only the sample target from its raw representation.

Parameters:

sample (Any) – A tuple containing the path suffix of the image to load, relative to the dataset root folder, and an integer representing the sample target.

Return type:

Tensor

Returns:

The label corresponding to the specified sample, encapsulated as a 1D torch float tensor.

class mednet.data.classify.shenzhen.DataModule(split_path)

Bases: CachingDataModule

Shenzhen DataModule for computer-aided diagnosis.

Parameters:

split_path (Path | Traversable) – Path or traversable (resource) with the JSON split description to load.
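To give an idea of what a JSON split description could look like, the sketch below parses a small example with the standard library. The layout (subset names mapping to lists of `[path suffix, label]` pairs), the subset names and the file names are all assumptions made for illustration; consult the split files shipped with the package for the authoritative format.

```python
import json

# Assumed layout of a split description; all entries are placeholders.
split_text = """
{
  "train": [["CXR_png/example_0001.png", 0], ["CXR_png/example_0002.png", 1]],
  "test": [["CXR_png/example_0003.png", 1]]
}
"""

split = json.loads(split_text)

# Turn each entry into a (path suffix, integer target) tuple.
samples = {
    name: [(path, int(label)) for path, label in entries]
    for name, entries in split.items()
}
counts = {name: len(entries) for name, entries in samples.items()}
```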