mednet.data.classify.hivtb

HIV-TB dataset for computer-aided diagnosis (only BMP files).

This database contains only the final tuberculosis diagnosis (0 or 1) and comes from HIV-infected patients.

Important

Raw data organization

The HIV-TB base datadir, which you should configure following the Setup instructions, must contain at least the directory HIV-TB/HIV-TB_Algorithm_study_X-rays with all BMP and JPEG images.
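As a rough illustration of the layout requirement above, the following sketch checks that a candidate datadir contains the expected sub-directory and lists the image files in it. The helper name `list_raw_images` and the accepted set of file extensions are assumptions made for this example; they are not part of mednet.

```python
from pathlib import Path

# Sub-directory named in the text above.
IMAGE_SUBDIR = Path("HIV-TB") / "HIV-TB_Algorithm_study_X-rays"
# Assumed set of extensions for BMP and JPEG files (hypothetical).
IMAGE_SUFFIXES = {".bmp", ".jpg", ".jpeg"}


def list_raw_images(datadir: Path) -> list[Path]:
    """Return the sorted image files found under the expected sub-directory.

    Raises FileNotFoundError if the expected directory is missing.
    """
    image_dir = datadir / IMAGE_SUBDIR
    if not image_dir.is_dir():
        raise FileNotFoundError(f"expected directory not found: {image_dir}")
    return sorted(
        p
        for p in image_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Calling `list_raw_images(Path("/path/to/datadir"))` before training gives an early, readable failure if the raw data is not where the configuration says it is.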

Data specifications:

  • Raw data input (on disk):

    • BMP (BMP3) and JPEG grayscale images encoded as 8-bit RGB, with varying resolution (most images are 2048 x 2500 or 2500 x 2048 pixels).

    • Total samples: 243

  • Output image:

    • Transforms:

      • Load raw BMP or JPEG with PIL, with auto-conversion to grayscale

      • Remove black borders

      • Convert to torch tensor

  • Final specifications:

    • Grayscale, encoded as a single plane tensor, 32-bit floats, with varying resolution depending on input.

    • Binary labels: 0 (healthy), 1 (active tuberculosis), encoded as a 1D torch float tensor.
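To make the "remove black borders" step in the list above more concrete, here is a minimal pure-Python sketch that crops a grayscale image to the bounding box of its non-black pixels. It operates on a plain 2D list rather than a PIL image or torch tensor, and the function name `crop_black_borders` and the `threshold` parameter are assumptions for illustration; the actual mednet transform may behave differently.

```python
def crop_black_borders(pixels, threshold=0):
    """Crop a 2D list of grayscale values to the bounding box of the
    pixels whose value is strictly greater than `threshold`.

    Returns an empty list if no pixel exceeds the threshold.
    """
    # Indices of rows that contain at least one non-black pixel.
    rows = [i for i, row in enumerate(pixels) if any(v > threshold for v in row)]
    if not rows:
        return []  # the image is entirely black
    # Indices of columns that contain at least one non-black pixel.
    cols = [
        j
        for j in range(len(pixels[0]))
        if any(row[j] > threshold for row in pixels)
    ]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    # Keep everything inside the bounding box, including interior black pixels.
    return [row[left : right + 1] for row in pixels[top : bottom + 1]]
```

Because only the outer frame is removed, the output resolution still depends on the input, which is consistent with the varying resolution noted in the final specifications.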

This module contains the base declaration of common data modules and raw-data loaders for this database. All configured splits inherit from this definition.

Module Attributes

DATABASE_SLUG

Pythonic name of this database.

CONFIGURATION_KEY_DATADIR

Key to search for in the configuration file for the root directory of this database.

Classes

DataModule(split_path)

HIV-TB dataset for computer-aided diagnosis (only BMP files).

RawDataLoader()

A specialized raw-data-loader for the HIV-TB dataset.

mednet.data.classify.hivtb.DATABASE_SLUG = 'hivtb'

Pythonic name of this database.

mednet.data.classify.hivtb.CONFIGURATION_KEY_DATADIR = 'datadir.hivtb'

Key to search for in the configuration file for the root directory of this database.
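The sketch below shows one way such a dotted key could be resolved, assuming the configuration file parses into nested tables so that `datadir.hivtb` refers to the entry `hivtb` inside a table named `datadir`. That interpretation, the helper `lookup_dotted`, and the example path are assumptions for illustration, not mednet's own lookup code.

```python
from collections.abc import Mapping
from typing import Any

CONFIGURATION_KEY_DATADIR = "datadir.hivtb"  # value documented above


def lookup_dotted(config: Mapping[str, Any], key: str, default: Any = None) -> Any:
    """Walk nested mappings following the dot-separated parts of `key`.

    Returns `default` as soon as a part is missing.
    """
    node: Any = config
    for part in key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    return node


# Hypothetical parsed configuration; the path is a placeholder.
example_config = {"datadir": {"hivtb": "/path/to/hivtb-root"}}
datadir = lookup_dotted(example_config, CONFIGURATION_KEY_DATADIR)
```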

class mednet.data.classify.hivtb.RawDataLoader

Bases: RawDataLoader

A specialized raw-data-loader for the HIV-TB dataset.

datadir: Path

This variable contains the base directory where the database raw data is stored.

sample(sample)

Load a single image sample from the disk.

Parameters:

sample (Any) – A tuple containing the path suffix of the image to load, relative to the dataset root folder, and an integer representing the sample target.

Return type:

Mapping[str, Any]

Returns:

The sample representation.

target(sample)

Load only the sample target from its raw representation.

Parameters:

sample (Any) – A tuple containing the path suffix of the image to load, relative to the dataset root folder, and an integer representing the sample target.

Return type:

Tensor

Returns:

The label corresponding to the specified sample, encapsulated as a 1D torch float tensor.

class mednet.data.classify.hivtb.DataModule(split_path)

Bases: CachingDataModule

HIV-TB dataset for computer-aided diagnosis (only BMP files).

Parameters:

split_path (Path | Traversable) – Path or traversable (resource) with the JSON split description to load.
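For orientation, the sketch below parses a split description into the (path suffix, integer target) tuples that the raw-data loader expects. The JSON layout shown here (subset names mapping to lists of two-element entries), the subset names, and the file names are all hypothetical; consult the split files shipped with mednet for the real structure.

```python
import json


def load_split(text: str) -> dict[str, list[tuple[str, int]]]:
    """Parse a split description into {subset name: [(path suffix, target), ...]}."""
    raw = json.loads(text)
    return {
        name: [(str(path), int(target)) for path, target in samples]
        for name, samples in raw.items()
    }


# Hypothetical split description; subset and file names are illustrative only.
EXAMPLE_SPLIT = """
{
  "train": [["HIV-TB/HIV-TB_Algorithm_study_X-rays/example-001.BMP", 0]],
  "test": [["HIV-TB/HIV-TB_Algorithm_study_X-rays/example-002.BMP", 1]]
}
"""

split = load_split(EXAMPLE_SPLIT)
```

Each tuple in `split["train"]` has the shape described for the `sample` parameter of `RawDataLoader.sample` and `RawDataLoader.target`.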