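When we evaluate the predictions of a binary classification model, we usually look at the confusion matrix.
So how do we compute a confusion matrix in Python with the sklearn library?
Once we have the predicted and actual values of a binary target variable y, we can compute the confusion matrix. Here we simply make up a few data points for the demonstration: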
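import sklearn.metrics  # import the submodule explicitly; a bare "import sklearn" may not load sklearn.metrics

# Actual labels and predicted labels (made-up demo data)
Y_real = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
Y_predict = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1]

# Compute the confusion matrix (rows: actual class, columns: predicted class)
confusion_matrix_1 = sklearn.metrics.confusion_matrix(Y_real, Y_predict)
print("Confusion matrix:", confusion_matrix_1, sep="\n")

# Get the classification report
r_1 = sklearn.metrics.classification_report(Y_real, Y_predict)
print("Classification report:", r_1, sep="\n")

The output is as follows:

Confusion matrix:
[[4 2]
 [3 1]]
Classification report:
              precision    recall  f1-score   support

           0       0.57      0.67      0.62         6
           1       0.33      0.25      0.29         4

    accuracy                           0.50        10
   macro avg       0.45      0.46      0.45        10
weighted avg       0.48      0.50      0.48        10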
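
To see where the numbers in the report come from, here is a minimal sketch that unpacks the four cells of the binary confusion matrix and recomputes precision, recall, F1 and accuracy by hand. The names tn, fp, fn, tp, precision_1, recall_1 and f1_1 are chosen here for illustration only, and the unpacking order assumes the labels are sorted as [0, 1], with 1 treated as the positive class.

```python
import sklearn.metrics

Y_real = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
Y_predict = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1]

# For labels sorted as [0, 1], ravel() flattens the 2x2 matrix row by row,
# which yields the cells in the order TN, FP, FN, TP.
tn, fp, fn, tp = sklearn.metrics.confusion_matrix(Y_real, Y_predict).ravel()

# Metrics for class 1 (assumed here to be the positive class)
precision_1 = tp / (tp + fp)   # share of predicted 1s that are really 1
recall_1 = tp / (tp + fn)      # share of real 1s that were predicted as 1
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(tn, fp, fn, tp)
print(round(precision_1, 2), round(recall_1, 2), round(f1_1, 2), round(accuracy, 2))
```

The rounded values should agree with the row for class 1 and the accuracy line in the classification report above.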
You can also take a look at the help text of the confusion matrix function:
In [11]: help(sklearn.metrics.confusion_matrix)
Help on function confusion_matrix in module sklearn.metrics._classification:

confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None)
    Compute confusion matrix to evaluate the accuracy of a classification.

    By definition a confusion matrix :math:`C` is such that :math:`C_{i, j}`
    is equal to the number of observations known to be in group :math:`i` and
    predicted to be in group :math:`j`.

    Thus in binary classification, the count of true negatives is
    :math:`C_{0,0}`, false negatives is :math:`C_{1,0}`, true positives is
    :math:`C_{1,1}` and false positives is :math:`C_{0,1}`.

    Read more in the :ref:`User Guide <confusion_matrix>`.

    Parameters
    ----------
    y_true : array-like of shape (n_samples,)
        Ground truth (correct) target values.

    y_pred : array-like of shape (n_samples,)
        Estimated targets as returned by a classifier.

    labels : array-like of shape (n_classes), default=None
        List of labels to index the matrix. This may be used to reorder
        or select a subset of labels.
        If ``None`` is given, those that appear at least once
        in ``y_true`` or ``y_pred`` are used in sorted order.

    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights.

        .. versionadded:: 0.18

    normalize : {'true', 'pred', 'all'}, default=None
        Normalizes confusion matrix over the true (rows), predicted (columns)
        conditions or all the population. If None, confusion matrix will not be
        normalized.

    Returns
    -------
    C : ndarray of shape (n_classes, n_classes)
        Confusion matrix whose i-th row and j-th
        column entry indicates the number of
        samples with true label being i-th class
        and predicted label being j-th class.
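
The labels and normalize parameters described in the help text can be tried on the same demo data. The following is a small sketch, not the only way to use them; the variable names cm_reordered and cm_row_norm are made up for this example, and normalize requires a scikit-learn version whose signature includes it, as the one shown above does.

```python
import sklearn.metrics

Y_real = [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]
Y_predict = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1]

# labels=[1, 0] reorders rows and columns so that class 1 comes first
cm_reordered = sklearn.metrics.confusion_matrix(Y_real, Y_predict, labels=[1, 0])
print(cm_reordered)

# normalize="true" divides every row by the number of samples
# of that actual class, so each row sums to 1
cm_row_norm = sklearn.metrics.confusion_matrix(Y_real, Y_predict, normalize="true")
print(cm_row_norm)
```

With normalize="true", the diagonal of the matrix holds the per-class recall, so it can be compared directly with the recall column of the classification report.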