mednet.engine.classify.evaluator

Defines functionality for the evaluation of classification predictions.

Functions

eer_threshold(predictions)

Calculate the (approximate) threshold leading to the equal error rate.

make_plots(results)

Create plots for all curves and score distributions in results.

make_table(data, fmt)

Tabulate summaries from multiple splits.

maxf1_threshold(predictions)

Calculate the threshold leading to the maximum F1-score on a precision-recall curve.

run(name, predictions, binning, rng[, ...])

Run inference and calculate measures for binary or multilabel classification.

mednet.engine.classify.evaluator.eer_threshold(predictions)[source]

Calculate the (approximate) threshold leading to the equal error rate.

For multi-label problems, calculate the EER threshold in the “micro” sense by first rasterizing all scores and labels (with numpy.ravel()), and then treating this (large) 1D vector as the output of a binary classifier.

Parameters:

predictions (Iterable[tuple[str, Sequence[int], Sequence[float]]]) – An iterable of models.classify.typing.Prediction objects.

Returns:

The EER threshold value.

Return type:

float
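
A minimal usage sketch (the sample names, labels, and scores below are hypothetical); each prediction follows the (name, labels, scores) tuple structure documented above:

    from mednet.engine.classify.evaluator import eer_threshold

    # Hypothetical binary predictions: (sample name, labels, scores) tuples.
    predictions = [
        ("sample-1", [0], [0.12]),
        ("sample-2", [1], [0.87]),
        ("sample-3", [1], [0.64]),
        ("sample-4", [0], [0.33]),
    ]

    threshold = eer_threshold(predictions)
    print(f"EER threshold: {threshold:.3f}")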

mednet.engine.classify.evaluator.maxf1_threshold(predictions)[source]

Calculate the threshold leading to the maximum F1-score on a precision-recall curve.

For multi-label problems, calculate the maximum F1-score threshold in the “micro” sense by first rasterizing all scores and labels (with numpy.ravel()), and then treating this (large) 1D vector as the output of a binary classifier.

Parameters:

predictions (Iterable[tuple[str, Sequence[int], Sequence[float]]]) – An iterable of models.classify.typing.Prediction objects.

Returns:

The threshold value leading to the maximum F1-score on the provided set of predictions.

Return type:

float
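
The “micro” rasterization described above can be sketched directly with NumPy; this illustrates the idea only and is not the library’s internal code:

    import numpy as np

    # Hypothetical multi-label targets and scores (2 samples, 3 labels each).
    labels = np.array([[1, 0, 1], [0, 0, 1]])
    scores = np.array([[0.9, 0.2, 0.7], [0.1, 0.3, 0.8]])

    # Flatten ("rasterize") into single 1D vectors and treat the result as a
    # binary problem, e.g. to scan a precision-recall curve for the maximum
    # F1-score.
    flat_labels = np.ravel(labels)   # shape (6,)
    flat_scores = np.ravel(scores)   # shape (6,)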

mednet.engine.classify.evaluator.run(name, predictions, binning, rng, threshold_a_priori=None, credible_regions=False)[source]

Run inference and calculate measures for binary or multilabel classification.

For multi-label problems, calculate the metrics in the “micro” sense by first rasterizing all scores and labels (with numpy.ravel()), and then treating this (large) 1D vector as the output of a binary classifier.

Parameters:
  • name (str) – The name of the subset to load.

  • predictions (Sequence[tuple[str, Sequence[int], Sequence[float]]]) – A list of predictions to consider for measurement.

  • binning (str | int) – The binning algorithm to use for computing the bin widths and distribution for histograms. Choose from algorithms supported by numpy.histogram().

  • rng (Generator) – An initialized numpy random number generator.

  • threshold_a_priori (float | None) – A threshold to use, evaluated a priori, if single-value measures must be reported. If this value is not provided, an a posteriori threshold is calculated from the input scores (note this is a biased estimator).

  • credible_regions (bool) – If set to True, also return credible intervals via credible.bayesian.metrics. Note that evaluating the ROC-AUC and Average Precision confidence margins can be rather slow for larger datasets.

Returns:

A dictionary containing the performance summary on the specified threshold, general performance curves (under the key curves), and score histograms (under the key score-histograms).

Return type:

dict[str, Any]
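
A usage sketch with a hypothetical subset name and predictions; the binning value may be any algorithm accepted by numpy.histogram():

    import numpy as np

    from mednet.engine.classify.evaluator import run

    # Hypothetical predictions for a "validation" subset.
    predictions = [
        ("sample-1", [0], [0.12]),
        ("sample-2", [1], [0.87]),
    ]
    rng = np.random.default_rng(42)

    summary = run(
        name="validation",       # hypothetical subset name
        predictions=predictions,
        binning="auto",          # any algorithm accepted by numpy.histogram()
        rng=rng,
        threshold_a_priori=0.5,  # skip the (biased) a posteriori estimate
    )
    print(sorted(summary))       # includes "curves" and "score-histograms"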

mednet.engine.classify.evaluator.make_table(data, fmt)[source]

Tabulate summaries from multiple splits.

This function can properly tabulate the various summaries produced for all the splits in a prediction database.

Parameters:

  • data – Summaries to tabulate, one per split of the prediction database (as produced by run()).

  • fmt – The output format to use for the table.

Returns:

A string containing the tabulated information.

Return type:

str
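
A sketch tabulating two splits; it assumes data maps split names to the dictionaries returned by run() and that fmt accepts a tabulate-style format name such as "rst" (neither assumption is stated by the signature above):

    import numpy as np

    from mednet.engine.classify.evaluator import make_table, run

    rng = np.random.default_rng(0)
    predictions = [("sample-1", [0], [0.2]), ("sample-2", [1], [0.9])]  # hypothetical

    # Assumption: one run() summary per split name.
    data = {
        "validation": run("validation", predictions, "auto", rng),
        "test": run("test", predictions, "auto", rng),
    }
    print(make_table(data, fmt="rst"))  # "rst" assumed to be a valid format name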

mednet.engine.classify.evaluator.make_plots(results)[source]

Create plots for all curves and score distributions in results.

Parameters:

results (dict[str, dict[str, Any]]) – Evaluation data as returned by run().

Returns:

A list of figures to record to file.

Return type:

list
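
A sketch that renders the evaluation figures and writes them to a single PDF with matplotlib; the results mapping and output file name are hypothetical:

    import numpy as np
    from matplotlib.backends.backend_pdf import PdfPages

    from mednet.engine.classify.evaluator import make_plots, run

    rng = np.random.default_rng(0)
    predictions = [("sample-1", [0], [0.2]), ("sample-2", [1], [0.9])]  # hypothetical
    results = {"test": run("test", predictions, "auto", rng)}

    figures = make_plots(results)
    with PdfPages("evaluation.pdf") as pdf:  # hypothetical output file
        for figure in figures:
            pdf.savefig(figure)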