Cohen's Kappa Statistic
Cohen's Kappa score, also known as the Kappa coefficient, is a statistic used to measure the performance of machine learning classification models. Its formula builds on the traditional 2x2 confusion matrix used in machine learning and statistics to evaluate binary classifiers.
The Cohen’s Kappa Coefficient is named after statistician Jacob Cohen, who developed the metric in 1960.
Cohen's Kappa score (κ) measures inter-rater (and intra-rater) agreement for categorical items. It is generally considered a more robust measure than a simple percent-agreement calculation because it accounts for the possibility of the agreement occurring by chance.
It is generally used when there are two raters, but it can be adapted to more than two.
For binary classification models in artificial intelligence and machine learning, one rater is the classification model and the other is a real-world observer who knows the true category of each record in the dataset.
Cohen’s Kappa considers the number of agreements (True Positives and True Negatives) and the number of disagreements between the raters (False Positives and False Negatives).
Cohen's Kappa then expresses the overall agreement after chance agreement has been taken into account.
Cohen's Kappa score can therefore be defined as a metric that measures the performance of a classification model by comparing the observed agreement between the two raters (the real-world observer and the classification model) with the agreement expected by chance.
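As a concrete illustration of this definition, here is a minimal sketch (not from the original article; the counts are hypothetical) that computes κ = (observed agreement − chance agreement) / (1 − chance agreement) directly from the four cells of a binary confusion matrix.

```python
# Minimal sketch: Cohen's Kappa from a 2x2 confusion matrix.
# The counts below are hypothetical example values.
tp, fp, fn, tn = 45, 5, 10, 40   # true/false positives and negatives
n = tp + fp + fn + tn

# Observed agreement: proportion of cases where model and observer agree.
p_o = (tp + tn) / n

# Chance agreement: probability that both raters say "positive"
# plus the probability that both say "negative".
p_positive = ((tp + fp) / n) * ((tp + fn) / n)
p_negative = ((fn + tn) / n) * ((fp + tn) / n)
p_e = p_positive + p_negative

# Kappa: agreement beyond chance, scaled by the maximum achievable
# agreement beyond chance.
kappa = (p_o - p_e) / (1 - p_e)
print(kappa)  # approximately 0.7 for these counts
```

A Kappa of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance.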
Cohen's Kappa score can be used to compare the agreement of two or more raters on various tasks, including diagnostic tasks, assessment tasks, and behavior-rating tasks. It is a reliable and valid measure of inter-rater agreement.
Cohen's Kappa can be calculated from raw data or from confusion-matrix values. When it is calculated from raw data, each row in the data represents a single observation and each column represents a rater's classification of that observation. It can also be calculated from a confusion matrix containing the counts of true positives, false positives, true negatives, and false negatives for each class. We will examine how the Kappa score can be calculated from the confusion matrix.
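As one hedged illustration (assuming scikit-learn is installed; the labels below are made up for the example), cohen_kappa_score computes the Kappa directly from raw rater labels, and confusion_matrix exposes the 2x2 counts behind the by-hand calculation above.

```python
# Minimal sketch assuming scikit-learn; the labels are illustrative only.
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# "Raw data" form: one label per observation from each rater
# (the real-world observer vs. the classification model).
observer = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
model    = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]

print(cohen_kappa_score(observer, model))  # Kappa from raw labels
print(confusion_matrix(observer, model))   # the 2x2 counts behind it
```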
In binary classification tasks, Cohen's Kappa is often used as a quality measure for data annotations, which is inconsistent with its original purpose as an inter-annotator consistency measure. The analytic relationship between Kappa and commonly used classification metrics (e.g., sensitivity and specificity) is nonlinear, so it is difficult to interpret classification performance from the kappa value alone.
Cohen's Kappa score can also be used to assess the performance of multi-class classification models.
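For the multi-class case, another minimal sketch (again assuming scikit-learn, with hypothetical three-class labels): cohen_kappa_score works unchanged for any number of classes, and its optional weights argument ("linear" or "quadratic") penalizes larger disagreements more heavily when the classes are ordinal.

```python
# Minimal sketch assuming scikit-learn; the three-class labels are hypothetical.
from sklearn.metrics import cohen_kappa_score

observer = [0, 1, 2, 2, 0, 1, 2, 0, 1, 2]
model    = [0, 1, 2, 1, 0, 2, 2, 0, 1, 1]

print(cohen_kappa_score(observer, model))                       # unweighted
print(cohen_kappa_score(observer, model, weights="quadratic"))  # weighted for ordinal classes
```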
Credit: The AI Vanguard Newsletter | Danny Butvinik. Share & Translate: Chinou Gea (秦陇纪) @2023 DSS-SDS, IFS-AHSC. Data Simplicity Community Facebook Group. https://m.facebook.com/groups/290760182638656/ #DataSimp #DataScience #computing #AI #ArtificialIntelligence #MachineLearning #ML.