mednet.data.classify.montgomery

Montgomery DataModule for TB detection.

The standard digital image database for Tuberculosis was created by the National Library of Medicine, Maryland, USA in collaboration with the Department of Health and Human Services, Montgomery County, Maryland, USA.

  • Database references: [JCA+14]

Data specifications:

  • Raw data input (on disk):

    • 8-bit grayscale PNG images produced by digital radiography machines

    • Original resolution (height x width or width x height): 4020x4892 px or 4892x4020 px

    • Samples: 138 images and associated labels

  • Output image:

    • Transforms:

      • Load raw PNG with PIL

      • Remove black borders

      • Convert to torch tensor

    • Final specifications

      • Grayscale, encoded as a single-plane tensor of 32-bit floats, square, at most 4020 x 4020 pixels

      • Binary labels: 0 (healthy), 1 (active tuberculosis), encoded as a 1D torch float tensor.
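The transforms listed above can be sketched roughly as follows. This is a minimal, dependency-free illustration that operates on nested lists instead of PIL images and torch tensors. The helper names `remove_black_borders` and `to_unit_floats`, the zero threshold, and the division by 255 are assumptions made for the example, not mednet's actual implementation.

```python
def remove_black_borders(image, threshold=0):
    """Crop away outer rows and columns that contain only pixels <= threshold."""
    rows = [i for i, row in enumerate(image) if any(p > threshold for p in row)]
    cols = [
        j
        for j in range(len(image[0]))
        if any(row[j] > threshold for row in image)
    ]
    if not rows or not cols:
        return image  # entirely black image: nothing sensible to crop
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left : right + 1] for row in image[top : bottom + 1]]


def to_unit_floats(image):
    """Scale 8-bit values to floats (assumed range [0, 1]) and add a plane axis."""
    return [[[p / 255.0 for p in row] for row in image]]


# Tiny stand-in for an 8-bit grayscale radiograph with a black frame.
raw = [
    [0, 0, 0, 0, 0],
    [0, 10, 200, 0, 0],
    [0, 255, 30, 0, 0],
    [0, 0, 0, 0, 0],
]
cropped = remove_black_borders(raw)
planes = to_unit_floats(cropped)
```

In a real pipeline the same two steps would run on the decoded PNG, and the result would be stored as a 32-bit float tensor with a single plane.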

This module contains the base declaration of common data modules and raw-data loaders for this database. All configured splits inherit from this definition.

Module Attributes

DATABASE_SLUG

Pythonic name of this database.

CONFIGURATION_KEY_DATADIR

Key to search for in the configuration file for the root directory of this database.

Classes

DataModule(split_path[, multiclass])

Montgomery DataModule for TB detection.

RawDataLoader([config_variable, multiclass])

A specialized raw-data-loader for the Montgomery dataset.

mednet.data.classify.montgomery.DATABASE_SLUG = 'montgomery'

Pythonic name of this database.

mednet.data.classify.montgomery.CONFIGURATION_KEY_DATADIR = 'datadir.montgomery'

Key to search for in the configuration file for the root directory of this database.

class mednet.data.classify.montgomery.RawDataLoader(config_variable='datadir.montgomery', multiclass=False)[source]

Bases: RawDataLoader

A specialized raw-data-loader for the Montgomery dataset.

Parameters:
  • config_variable (str) – Key to search for in the configuration file for the root directory of this database.

  • multiclass (bool) – Set to True if the targets should be output as 2 distinct classes instead of a single (0/1) output.

datadir: Path

This variable contains the base directory where the database raw data is stored.
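As an illustration of how a dotted key such as `datadir.montgomery` could map to the base directory, the sketch below resolves the key against an already-parsed configuration. The nested-dictionary layout, the `lookup` helper, and the placeholder path are assumptions for the example; the on-disk format of the configuration file is not specified here.

```python
from pathlib import Path

# Hypothetical parsed configuration; the path is a placeholder.
config = {"datadir": {"montgomery": "/path/to/montgomery"}}


def lookup(mapping, dotted_key):
    """Follow a dotted key through nested mappings, one component at a time."""
    node = mapping
    for part in dotted_key.split("."):
        node = node[part]  # raises KeyError if the key is not configured
    return node


datadir = Path(lookup(config, "datadir.montgomery"))
```

A missing key raises `KeyError` in this sketch, which makes an unconfigured database directory easy to spot.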

sample(sample)[source]

Load a single image sample from the disk.

Parameters:

sample (tuple[str, int, Any | None]) – A tuple containing the path suffix of the image to load, relative to the dataset root folder, and an integer representing the sample target.

Return type:

Mapping[str, Any]

Returns:

The sample representation as a dictionary.
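The shape of the input tuple can be illustrated with a small sketch that only resolves the path and collects the target, without loading any image. The function name `resolve_sample`, the dictionary keys, the file name, and the root folder are all hypothetical; the real `sample()` also loads and transforms the image, and its dictionary keys may differ.

```python
from pathlib import Path

datadir = Path("/path/to/montgomery")  # placeholder dataset root folder

# (path suffix, integer target, optional extra entry); values are illustrative.
raw_sample = ("CXR_png/MCUCXR_0001_0.png", 0, None)


def resolve_sample(sample, root):
    """Join the path suffix to the root folder and keep the integer target."""
    suffix, target, _extra = sample
    return {"path": root / suffix, "target": target}


resolved = resolve_sample(raw_sample, datadir)
```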

target(sample)[source]

Load only the sample target from its raw representation.

Parameters:

sample (Any) – A tuple containing the path suffix of the image to load, relative to the dataset root folder, and an integer representing the sample target.

Return type:

Tensor

Returns:

The label corresponding to the specified sample, encapsulated as a 1D torch float tensor.
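One plausible reading of the two target encodings is sketched below, with plain Python lists of floats standing in for the 1D torch float tensor. The `encode_target` helper and the one-hot layout used for `multiclass=True` (index 0 for healthy, index 1 for active tuberculosis) are assumptions, not confirmed behaviour of the library.

```python
def encode_target(label, multiclass=False):
    """Encode a binary label as a 1D list of floats.

    With multiclass=False the result has one element (0.0 or 1.0).
    With multiclass=True the result is an assumed one-hot pair.
    """
    if label not in (0, 1):
        raise ValueError(f"expected label 0 or 1, got {label!r}")
    if multiclass:
        # Assumed layout: [healthy, active tuberculosis]
        return [float(label == 0), float(label == 1)]
    return [float(label)]


single = encode_target(1)
pair = encode_target(1, multiclass=True)
```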

class mednet.data.classify.montgomery.DataModule(split_path, multiclass=False)[source]

Bases: CachingDataModule

Montgomery DataModule for TB detection.

Parameters:
  • split_path (Path | Traversable) – Path or traversable (resource) with the JSON split description to load.

  • multiclass (bool) – Set to True if the targets should be output as 2 distinct classes instead of a single (0/1) output.
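To give a feel for what a JSON split description might contain, the sketch below parses a small in-memory example and counts labels per subset. The subset names, the `[path suffix, label]` entry layout, and the file names are assumptions chosen to match the sample tuples described for `RawDataLoader`; the actual split files may be organized differently.

```python
import json
from collections import Counter

# Hypothetical split description; entries are [path suffix, integer label].
SPLIT_JSON = """
{
  "train": [["CXR_png/MCUCXR_0001_0.png", 0], ["CXR_png/MCUCXR_0104_1.png", 1]],
  "validation": [["CXR_png/MCUCXR_0002_0.png", 0]],
  "test": [["CXR_png/MCUCXR_0108_1.png", 1]]
}
"""

split = json.loads(SPLIT_JSON)

# Count how many samples of each label appear in every subset.
summary = {
    name: Counter(label for _suffix, label in entries)
    for name, entries in split.items()
}
```

A summary like this is a quick way to check class balance across subsets before training.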