mednet.engine.classify.evaluator

Defines functionality for the evaluation of predictions.

Functions

`eer_threshold`(predictions)	Calculate the (approximate) threshold leading to the equal error rate.
`make_plots`(results)	Create plots for all curves and score distributions in `results`.
`make_table`(data, fmt)	Tabulate summaries from multiple splits.
`maxf1_threshold`(predictions)	Calculate the threshold leading to the maximum F1-score on a precision-recall curve.
`run`(name, predictions, binning[, ...])	Run inference and calculate measures for binary or multilabel classification.

mednet.engine.classify.evaluator.eer_threshold(predictions)

Calculate the (approximate) threshold leading to the equal error rate.

For multi-label problems, calculate the EER threshold in the “micro” sense by first flattening all scores and labels (with numpy.ravel()), and then treating the resulting (large) 1D vector as the output of a binary classifier.

Parameters: predictions (Iterable[tuple[str, Sequence[int], Sequence[float]]]) – An iterable of models.classify.typing.Prediction objects.
Returns: The EER threshold value.
Return type: float
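
The “micro” procedure described above can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the helper name micro_eer_threshold, the exhaustive scan over candidate thresholds, and the toy predictions are all assumptions made for illustration. It flattens labels and scores, then picks the threshold at which the false positive rate and the false negative rate are closest.

```python
import numpy as np


def micro_eer_threshold(predictions):
    """Approximate the EER threshold over flattened scores (illustrative only)."""
    # Flatten all labels and scores into 1D vectors ("micro" sense).
    labels = np.ravel([p[1] for p in predictions]).astype(bool)
    scores = np.ravel([p[2] for p in predictions]).astype(float)

    best_threshold, best_gap = None, np.inf
    # Hypothetical strategy: try every observed score as a candidate threshold.
    for t in np.unique(scores):
        predicted_positive = scores >= t
        fpr = np.mean(predicted_positive[~labels])  # false positive rate
        fnr = np.mean(~predicted_positive[labels])  # false negative rate
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_threshold, best_gap = float(t), gap
    return best_threshold


# Toy multi-label predictions: (sample name, labels, scores).
predictions = [
    ("sample-0", [0, 0], [0.1, 0.4]),
    ("sample-1", [1, 0], [0.8, 0.35]),
    ("sample-2", [1, 1], [0.7, 0.2]),
]
eer_threshold_value = micro_eer_threshold(predictions)
```

The sketch assumes every prediction carries the same number of labels and that both classes are present after flattening.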

mednet.engine.classify.evaluator.maxf1_threshold(predictions)

Calculate the threshold leading to the maximum F1-score on a precision-recall curve.

For multi-label problems, calculate the maximum F1-score threshold in the “micro” sense by first flattening all scores and labels (with numpy.ravel()), and then treating the resulting (large) 1D vector as the output of a binary classifier.

Parameters: predictions (Iterable[tuple[str, Sequence[int], Sequence[float]]]) – An iterable of models.classify.typing.Prediction objects.
Returns: The threshold value leading to the maximum F1-score on the provided set of predictions.
Return type: float
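
As a rough illustration of the idea above, the following NumPy sketch scans candidate thresholds over the flattened scores and keeps the one with the highest F1-score. The helper name micro_maxf1_threshold, the brute-force scan, and the toy data are assumptions for this example, not the library's actual code.

```python
import numpy as np


def micro_maxf1_threshold(predictions):
    """Return (threshold, F1) maximizing F1 over flattened scores (illustrative)."""
    # Flatten all labels and scores into 1D vectors ("micro" sense).
    labels = np.ravel([p[1] for p in predictions]).astype(bool)
    scores = np.ravel([p[2] for p in predictions]).astype(float)

    best_threshold, best_f1 = None, -1.0
    # Hypothetical strategy: try every observed score as a candidate threshold.
    for t in np.unique(scores):
        predicted_positive = scores >= t
        tp = np.sum(predicted_positive & labels)
        fp = np.sum(predicted_positive & ~labels)
        fn = np.sum(~predicted_positive & labels)
        denominator = 2 * tp + fp + fn
        f1 = 2 * tp / denominator if denominator > 0 else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = float(t), float(f1)
    return best_threshold, best_f1


# Toy multi-label predictions: (sample name, labels, scores).
predictions = [
    ("sample-0", [0, 0], [0.1, 0.4]),
    ("sample-1", [1, 0], [0.8, 0.35]),
    ("sample-2", [1, 1], [0.7, 0.2]),
]
maxf1_threshold_value, maxf1_value = micro_maxf1_threshold(predictions)
```

On ties, this sketch keeps the lowest threshold that reaches the maximum; the library may resolve ties differently.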

mednet.engine.classify.evaluator.run(name, predictions, binning, threshold_a_priori=None)

Run inference and calculate measures for binary or multilabel classification.

For multi-label problems, calculate the metrics in the “micro” sense by first flattening all scores and labels (with numpy.ravel()), and then treating the resulting (large) 1D vector as the output of a binary classifier.

Parameters:

name (str) – The name of the subset to load.
predictions (Iterable[tuple[str, Sequence[int], Sequence[float]]]) – A list of predictions to consider for measurement.
binning (str | int) – The binning algorithm to use for computing the bin widths and distribution for histograms. Choose from algorithms supported by numpy.histogram().
threshold_a_priori (float | None) – A threshold chosen a priori, used when single-valued measures must be reported. If this value is not provided, an a posteriori threshold is calculated on the input scores, which yields a biased estimate.

Returns:

A summary dictionary containing the performance summary at the specified threshold, general performance curves (under the key curves), and score histograms (under the key score-histograms).

Return type:

dict[str, Any]
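
To make the binning parameter more concrete, the sketch below shows the two kinds of values that numpy.histogram() accepts for its bins argument: an integer bin count and the name of a bin-width estimator. The random scores and the fixed range of 0.0 to 1.0 are assumptions for illustration; how run() passes the value through internally is not shown here.

```python
import numpy as np

# Hypothetical scores in [0, 1), standing in for classifier outputs.
rng = np.random.default_rng(seed=0)
scores = rng.uniform(0.0, 1.0, size=200)

# An integer requests that many equal-width bins.
counts_fixed, edges_fixed = np.histogram(scores, bins=10, range=(0.0, 1.0))

# A string selects one of NumPy's bin-width estimators (e.g. "auto", "fd").
counts_auto, edges_auto = np.histogram(scores, bins="auto", range=(0.0, 1.0))

print(len(edges_fixed) - 1, "fixed bins;", len(edges_auto) - 1, "automatic bins")
```

In both cases the returned edge array has one more element than the count array.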

mednet.engine.classify.evaluator.make_table(data, fmt)

Tabulate summaries from multiple splits.

This function can properly tabulate the various summaries produced for all the splits in a prediction database.

Parameters:

data (Mapping[str, Mapping[str, Any]]) – A mapping containing all summary data collected.
fmt (str) – One of the formats supported by python-tabulate.

Returns:

A string containing the tabulated information.

Return type:

str

mednet.engine.classify.evaluator.make_plots(results)

Create plots for all curves and score distributions in results.

Parameters: results (dict[str, dict[str, Any]]) – Evaluation data as returned by run().
Returns: A list of figures to record to file.
Return type: list