mednet.data.classify.tbpoc

TB-POC dataset for computer-aided diagnosis.

This database contains only the final tuberculosis diagnosis (0 or 1) and comes from HIV-infected patients.

Important

Raw data organization

The TB-POC base datadir, which you should configure following the Setup instructions, must contain at least the directory TBPOC_CXR with all JPEG images.

Data specifications:

  • Raw data input (on disk):

    • JPEG 8-bit Grayscale images

    • Original resolution (height x width or width x height): 2048 x 2500 pixels or 2500 x 2048 pixels

    • Total samples: 407

  • Output image:

    • Transforms:

      • Load raw grayscale JPEG with PIL

      • Remove black borders

      • Convert to torch tensor

      • Center-crop with torch to obtain a square image

    • Final specifications:

      • Grayscale, encoded as a single-plane tensor of 32-bit floats; square, with a resolution that varies (2048 x 2048 at most) depending on the size of the black borders in the input image.

      • Labels: 0 (healthy), 1 (active tuberculosis)
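
The border-removal and center-cropping steps listed above can be illustrated with a minimal sketch. It is a pure-Python stand-in operating on nested lists, not the PIL/torch code used by this module; the function names and the `threshold` parameter are assumptions made for illustration only.

```python
def remove_black_borders(image, threshold=0):
    """Crop away outer rows and columns whose pixels are all <= threshold.

    ``image`` is a list of rows, each row a list of pixel intensities.
    """
    rows = [i for i, row in enumerate(image) if any(p > threshold for p in row)]
    cols = [
        j for j in range(len(image[0]))
        if any(row[j] > threshold for row in image)
    ]
    if not rows or not cols:
        return image  # entirely black input: leave it untouched
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]


def center_crop_square(image):
    """Keep the largest centered square region of the image."""
    height, width = len(image), len(image[0])
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return [row[left:left + side] for row in image[top:top + side]]


# Toy 4 x 6 image with a one-pixel black frame around a 2 x 4 content area.
toy = [
    [0, 0, 0, 0, 0, 0],
    [0, 5, 6, 7, 8, 0],
    [0, 1, 2, 3, 4, 0],
    [0, 0, 0, 0, 0, 0],
]
content = remove_black_borders(toy)
square = center_crop_square(content)
```

Because the square side is the smaller of the two remaining dimensions, a 2048 x 2500 input with no borders would crop to 2048 x 2048, and any removed border shrinks the result further.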

This module contains the base declaration of common data modules and raw-data loaders for this database. All configured splits inherit from this definition.

Module Attributes

DATABASE_SLUG

Pythonic name of this database.

CONFIGURATION_KEY_DATADIR

Key to search for in the configuration file for the root directory of this database.

Classes

DataModule(split_path)

TB-POC dataset for computer-aided diagnosis.

RawDataLoader()

A specialized raw-data-loader for the TB-POC dataset.

mednet.data.classify.tbpoc.DATABASE_SLUG = 'tbpoc'

Pythonic name of this database.

mednet.data.classify.tbpoc.CONFIGURATION_KEY_DATADIR = 'datadir.tbpoc'

Key to search for in the configuration file for the root directory of this database.
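
As an illustration of how a dotted key such as this one might be resolved, the sketch below walks a nested mapping and derives the expected image directory. It assumes the configuration file parses into nested tables (a `datadir` table with a `tbpoc` entry); the `lookup` helper and the `/path/to/tbpoc` value are hypothetical and not part of this module.

```python
from pathlib import Path

CONFIGURATION_KEY_DATADIR = "datadir.tbpoc"


def lookup(config, dotted_key):
    """Follow a dot-separated key through a nested mapping."""
    value = config
    for part in dotted_key.split("."):
        value = value[part]
    return value


# Hypothetical parsed configuration; the real file location and format
# are described in the Setup instructions.
config = {"datadir": {"tbpoc": "/path/to/tbpoc"}}

datadir = Path(lookup(config, CONFIGURATION_KEY_DATADIR))
image_dir = datadir / "TBPOC_CXR"  # directory expected to hold the JPEG images
```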

class mednet.data.classify.tbpoc.RawDataLoader[source]

Bases: RawDataLoader

A specialized raw-data-loader for the TB-POC dataset.

datadir: Path

This variable contains the base directory where the database raw data is stored.

sample(sample)[source]

Load a single image sample from the disk.

Parameters:

sample (Any) – A tuple containing the path of the image to load, relative to the dataset root folder, and an integer representing the sample target.

Return type:

Mapping[str, Any]

Returns:

The sample representation.

target(sample)[source]

Load only the sample target from its raw representation.

Parameters:

sample (Any) – A tuple containing the path of the image to load, relative to the dataset root folder, and an integer representing the sample target.

Return type:

Tensor

Returns:

The label corresponding to the specified sample, encapsulated as a 1D torch float tensor.
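
To make the shape of the `sample` argument concrete, here is a small sketch of how such a tuple relates to a file on disk and to its label. The file name and base directory are hypothetical, and a one-element list of floats stands in for the 1D torch float tensor returned by `target()`.

```python
from pathlib import Path

datadir = Path("/path/to/tbpoc")         # hypothetical base directory
sample = ("TBPOC_CXR/example.jpeg", 1)   # hypothetical (path suffix, target)


def image_path(datadir, sample):
    """Join the base directory with the sample's path suffix."""
    return datadir / sample[0]


def target_values(sample):
    """Return the label as a one-element float list (tensor stand-in)."""
    return [float(sample[1])]


full_path = image_path(datadir, sample)
label = target_values(sample)
```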

class mednet.data.classify.tbpoc.DataModule(split_path)[source]

Bases: CachingDataModule

TB-POC dataset for computer-aided diagnosis.

Parameters:

split_path (Path | Traversable) – Path or traversable (resource) with the JSON split description to load.
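
The sketch below shows one way a JSON split description could be read into per-subset sample tuples. The layout (subset names mapping to lists of path/label pairs), the subset names, and the file names are all assumptions for illustration; consult the configured splits for the actual structure.

```python
import json

# Hypothetical split description; real splits ship with the package.
split_text = """
{
  "train": [["TBPOC_CXR/a.jpeg", 0], ["TBPOC_CXR/b.jpeg", 1]],
  "test": [["TBPOC_CXR/c.jpeg", 1]]
}
"""

split = json.loads(split_text)

# Convert each [path, label] pair into a (path suffix, integer target) tuple.
samples = {
    name: [(path, int(label)) for path, label in entries]
    for name, entries in split.items()
}
counts = {name: len(entries) for name, entries in samples.items()}
```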