How to Compute a Confusion Matrix with sklearn

When we evaluate how well a binary classification model predicts, we usually need to look at its confusion matrix.

So how do you compute a confusion matrix in Python with the sklearn library?

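Once we have the predicted values and the actual values of a binary variable y, we can compute the confusion matrix. Here we just make up a few values for demonstration: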

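# Import the metrics submodule explicitly; a bare `import sklearn` may not expose sklearn.metrics
import sklearn.metrics

Y_real = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
Y_predict = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1]

# Compute the confusion matrix (rows: actual class, columns: predicted class)
confusion_matrix_1 = sklearn.metrics.confusion_matrix(Y_real, Y_predict)
print("Confusion matrix:", confusion_matrix_1, sep="\n")

# Get the classification report (precision, recall, F1 and support per class)
r_1 = sklearn.metrics.classification_report(Y_real, Y_predict)
print("Classification report:", r_1, sep="\n")

The output is:

Confusion matrix:
[[4 2]
 [3 1]]
Classification report:
              precision    recall  f1-score   support

           0       0.57      0.67      0.62         6
           1       0.33      0.25      0.29         4

    accuracy                           0.50        10
   macro avg       0.45      0.46      0.45        10
weighted avg       0.48      0.50      0.48        10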
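To see where the class-1 row of the report comes from, the sketch below unpacks the 2×2 matrix into its four counts and recomputes the metrics by hand. It relies on the layout stated in the help text below (TN at C[0,0], FP at C[0,1], FN at C[1,0], TP at C[1,1]), so `ravel()` returns the counts in the order TN, FP, FN, TP; this assumes the labels are exactly 0 and 1. The cross-check against `precision_score`, `recall_score` and `f1_score` assumes their default `pos_label=1`.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

Y_real = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
Y_predict = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1]

# With labels {0, 1}, the matrix is [[TN, FP], [FN, TP]]; ravel() flattens it row by row
tn, fp, fn, tp = confusion_matrix(Y_real, Y_predict).ravel()
print(tn, fp, fn, tp)  # 4 2 3 1

# Metrics for the positive class 1, computed directly from the counts
precision_1 = float(tp / (tp + fp))   # 1 / 3
recall_1 = float(tp / (tp + fn))      # 1 / 4
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy = float((tp + tn) / (tn + fp + fn + tp))

print(round(precision_1, 2), round(recall_1, 2), round(f1_1, 2), accuracy)  # 0.33 0.25 0.29 0.5

# sklearn's single-metric functions (default pos_label=1) for comparison
sk_precision = precision_score(Y_real, Y_predict)
sk_recall = recall_score(Y_real, Y_predict)
sk_f1 = f1_score(Y_real, Y_predict)
print(sk_precision, sk_recall, sk_f1)
```

Swapping TP with TN and FP with FN in the same formulas gives the class-0 row: precision 4/7 ≈ 0.57 and recall 4/6 ≈ 0.67.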

You can also read the help text for the confusion matrix function:

In [11]: help(sklearn.metrics.confusion_matrix)
Help on function confusion_matrix in module sklearn.metrics._classification:

confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None)
    Compute confusion matrix to evaluate the accuracy of a classification.

    By definition a confusion matrix :math:`C` is such that :math:`C_{i, j}`
    is equal to the number of observations known to be in group :math:`i` and
    predicted to be in group :math:`j`.

    Thus in binary classification, the count of true negatives is
    :math:`C_{0,0}`, false negatives is :math:`C_{1,0}`, true positives is
    :math:`C_{1,1}` and false positives is :math:`C_{0,1}`.

    Read more in the :ref:`User Guide <confusion_matrix>`.

    Parameters
    ----------
    y_true : array-like of shape (n_samples,)
        Ground truth (correct) target values.

    y_pred : array-like of shape (n_samples,)
        Estimated targets as returned by a classifier.

    labels : array-like of shape (n_classes), default=None
        List of labels to index the matrix. This may be used to reorder
        or select a subset of labels.
        If ``None`` is given, those that appear at least once
        in ``y_true`` or ``y_pred`` are used in sorted order.

    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights.

        .. versionadded:: 0.18

    normalize : {'true', 'pred', 'all'}, default=None
        Normalizes confusion matrix over the true (rows), predicted (columns)
        conditions or all the population. If None, confusion matrix will not be
        normalized.

    Returns
    -------
    C : ndarray of shape (n_classes, n_classes)
        Confusion matrix whose i-th row and j-th
        column entry indicates the number of
        samples with true label being i-th class
        and predicted label being j-th class.
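The `labels` and `normalize` parameters from the help text can be tried on the same data. This is a minimal sketch, assuming a scikit-learn version that has `normalize` (it appears in the signature above). Passing `labels=[1, 0]` puts the positive class in the first row and column, and `normalize="true"` divides each row by the number of samples in that actual class, so the diagonal holds each class's recall (0.67 and 0.25, matching the report).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

Y_real = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
Y_predict = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1]

# labels=[1, 0] reorders rows and columns: index 0 now refers to class 1
cm_reordered = confusion_matrix(Y_real, Y_predict, labels=[1, 0])
print(cm_reordered)
# [[1 3]
#  [2 4]]

# normalize="true" divides each row by its row total (the count of that actual class)
cm_row_norm = confusion_matrix(Y_real, Y_predict, normalize="true")
print(np.round(cm_row_norm, 2))
# [[0.67 0.33]
#  [0.75 0.25]]
```

Row normalization answers "of the samples that really belong to class i, what share was predicted as each class"; `normalize="pred"` would instead normalize each column.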