mednet.engine.classify.evaluator

Defines functionality for the evaluation of classification predictions.

Functions

eer_threshold(predictions)

Calculate the (approximate) threshold leading to the equal error rate.

make_plots(results)

Create plots for all curves and score distributions in results.

make_table(data, fmt)

Tabulate summaries from multiple splits.

maxf1_threshold(predictions)

Calculate the threshold leading to the maximum F1-score on a precision-recall curve.

run(name, predictions, binning, rng[, ...])

Run inference and calculate measures for binary or multilabel classification.

mednet.engine.classify.evaluator.eer_threshold(predictions)[source]

Calculate the (approximate) threshold leading to the equal error rate.

For multi-label problems, calculate the EER threshold in the “micro” sense by first rasterizing all scores and labels (with numpy.ravel()), and then treating this (large) 1D vector as the output of a binary classifier.

Parameters:

predictions (Iterable[tuple[str, Sequence[int], Sequence[float]]]) – An iterable of models.classify.typing.Prediction objects.

Returns:

The EER threshold value.

Return type:

float
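
A minimal usage sketch (the sample names, labels, and scores below are hypothetical); each prediction follows the (name, labels, scores) tuple structure documented above:

    from mednet.engine.classify.evaluator import eer_threshold

    # Hypothetical binary predictions: (sample name, labels, scores) tuples.
    predictions = [
        ("sample-1", [0], [0.12]),
        ("sample-2", [1], [0.87]),
        ("sample-3", [1], [0.64]),
        ("sample-4", [0], [0.33]),
    ]

    threshold = eer_threshold(predictions)
    print(f"EER threshold: {threshold:.3f}")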

mednet.engine.classify.evaluator.maxf1_threshold(predictions)[source]

Calculate the threshold leading to the maximum F1-score on a precision-recall curve.

For multi-label problems, calculate the maximum F1-score threshold in the “micro” sense by first rasterizing all scores and labels (with numpy.ravel()), and then treating this (large) 1D vector as the output of a binary classifier.

Parameters:

predictions (Iterable[tuple[str, Sequence[int], Sequence[float]]]) – An iterable of models.classify.typing.Prediction objects.

Returns:

The threshold value leading to the maximum F1-score on the provided set of predictions.

Return type:

float
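
The “micro” rasterization described above can be sketched directly with NumPy; this illustrates the idea only and is not the library’s internal code:

    import numpy as np

    # Hypothetical multi-label targets and scores (2 samples, 3 labels each).
    labels = np.array([[1, 0, 1], [0, 0, 1]])
    scores = np.array([[0.9, 0.2, 0.7], [0.1, 0.3, 0.8]])

    # Flatten ("rasterize") into single 1D vectors and treat the result as a
    # binary problem, e.g. to scan a precision-recall curve for the maximum
    # F1-score.
    flat_labels = np.ravel(labels)   # shape (6,)
    flat_scores = np.ravel(scores)   # shape (6,)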

mednet.engine.classify.evaluator.run(name, predictions, binning, rng, threshold_a_priori=None, credible_regions=False)[source]

Run inference and calculate measures for binary or multilabel classification.

For multi-label problems, calculate the metrics in the “micro” sense by first rasterizing all scores and labels (with numpy.ravel()), and then treating this (large) 1D vector as the output of a binary classifier.

Parameters:
  • name (str) – The name of the subset to load.

  • predictions (Sequence[tuple[str, Sequence[int], Sequence[float]]]) – A list of predictions to consider for measurement.

  • binning (str | int) – The binning algorithm to use for computing the bin widths and distribution for histograms. Choose from algorithms supported by numpy.histogram().

  • rng (Generator) – An initialized numpy random number generator.

  • threshold_a_priori (float | None) – A threshold to use, evaluated a priori, if single-value measures must be reported. If this value is not provided, an a posteriori threshold is calculated from the input scores (note this is a biased estimator).

  • credible_regions (bool) – If set to True, also return credible intervals via credible.bayesian.metrics. Note that evaluating the ROC-AUC and Average Precision confidence margins can be rather slow for larger datasets.

Returns:

A dictionary containing the performance summary on the specified threshold, general performance curves (under the key curves), and score histograms (under the key score-histograms).

Return type:

dict[str, Any]
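
A usage sketch with a hypothetical subset name and predictions; the binning value may be any algorithm accepted by numpy.histogram():

    import numpy as np

    from mednet.engine.classify.evaluator import run

    # Hypothetical predictions for a "validation" subset.
    predictions = [
        ("sample-1", [0], [0.12]),
        ("sample-2", [1], [0.87]),
    ]
    rng = np.random.default_rng(42)

    summary = run(
        name="validation",       # hypothetical subset name
        predictions=predictions,
        binning="auto",          # any algorithm accepted by numpy.histogram()
        rng=rng,
        threshold_a_priori=0.5,  # skip the (biased) a posteriori estimate
    )
    print(sorted(summary))       # includes "curves" and "score-histograms"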

mednet.engine.classify.evaluator.make_table(data, fmt)[source]

Tabulate summaries from multiple splits.

This function can properly tabulate the various summaries produced for all the splits in a prediction database.

Parameters:

  • data – Summaries to tabulate, one per split of the prediction database (as produced by run()).

  • fmt – The output format to use for the table.

Returns:

A string containing the tabulated information.

Return type:

str
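
A sketch tabulating two splits; it assumes data maps split names to the dictionaries returned by run() and that fmt accepts a tabulate-style format name such as "rst" (neither assumption is stated by the signature above):

    import numpy as np

    from mednet.engine.classify.evaluator import make_table, run

    rng = np.random.default_rng(0)
    predictions = [("sample-1", [0], [0.2]), ("sample-2", [1], [0.9])]  # hypothetical

    # Assumption: one run() summary per split name.
    data = {
        "validation": run("validation", predictions, "auto", rng),
        "test": run("test", predictions, "auto", rng),
    }
    print(make_table(data, fmt="rst"))  # "rst" assumed to be a valid format name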

mednet.engine.classify.evaluator.make_plots(results)[source]

Create plots for all curves and score distributions in results.

Parameters:

results (dict[str, dict[str, Any]]) – Evaluation data as returned by run().

Returns:

A list of figures to record to file.

Return type:

list
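
A sketch that renders the evaluation figures and writes them to a single PDF with matplotlib; the results mapping and output file name are hypothetical:

    import numpy as np
    from matplotlib.backends.backend_pdf import PdfPages

    from mednet.engine.classify.evaluator import make_plots, run

    rng = np.random.default_rng(0)
    predictions = [("sample-1", [0], [0.2]), ("sample-2", [1], [0.9])]  # hypothetical
    results = {"test": run("test", predictions, "auto", rng)}

    figures = make_plots(results)
    with PdfPages("evaluation.pdf") as pdf:  # hypothetical output file
        for figure in figures:
            pdf.savefig(figure)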