Figure 5 simulates a screening test in a low-prevalence setting, where ground truth negatives are far more common than ground truth positives. One example is cervical cancer screening by Pap smear cytology, where substantial false positive rates (cell abnormalities of undetermined significance) can be expected and where a positive test result does not necessarily give high confidence that high-grade disease is present [28,29].
The positive predictive value (PPV) is the number of true positives divided by the total number of positive predictions (true positives plus false positives), where a "true positive" is the event that the test makes a positive prediction and the subject has a positive result under the gold standard, and a "false positive" is the event that the test makes a positive prediction and the subject has a negative result under the gold standard. The ideal PPV for a perfect test is 1 (100%), and the worst possible value is zero.
Consider the ratings of two raters (or experts, judges, diagnostic procedures, etc.) summarized in Table 1. To avoid confusion, we recommend always using the terms positive percent agreement (PPA) and negative percent agreement (NPA) to describe the agreement of such tests.
We now consider the results of two raters who make polytomous ratings (ordered-category or purely nominal). Let C denote the number of categories or rating levels. The results for both raters can be summarized as a C × C table, e.g. Table 2.
Ground truth: the actual positive or negative state of a subject in a binary classification scheme.
In addition, Cohen's (1960) criticism of po can be considered: po can be high even for hypothetical raters who guess at random on each case, with probabilities matching the observed base rates.
In this example, if both raters simply guessed "positive" the vast majority of the time, they would usually agree on the diagnosis. Cohen proposed to remedy this by comparing po with a corresponding quantity, pc, the proportion of agreement expected from raters who guess at random. As described on the Kappa coefficients page, this logic is debatable; in particular, it is not clear what is gained by comparing an actual level of agreement, po, with a hypothetical value, pc, that would arise under a manifestly unrealistic model.
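To make the comparison of po with pc concrete, the following minimal Python sketch computes both quantities from a C × C table of counts, together with Cohen's kappa, (po - pc) / (1 - pc). The function name `agreement_stats` and the example counts are hypothetical and chosen only for illustration; the counts are what one would expect if two raters each guessed "positive" independently 90% of the time on 100 cases.

```python
import math


def agreement_stats(table):
    """Return (po, pc, kappa) for a square table of counts.

    table[i][j] is the number of cases that rater A placed in
    category i and rater B placed in category j.
    """
    c = len(table)
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table contains no cases")

    # Observed proportion of agreement: share of cases on the diagonal.
    po = sum(table[i][i] for i in range(c)) / n

    # Marginal proportions (base rates) for each rater.
    row_rates = [sum(table[i]) / n for i in range(c)]
    col_rates = [sum(table[i][j] for i in range(c)) / n for j in range(c)]

    # Agreement expected if the raters guessed independently
    # with probabilities equal to their observed base rates.
    pc = sum(row_rates[i] * col_rates[i] for i in range(c))

    # Kappa is undefined when chance agreement is already 1.
    kappa = math.nan if pc == 1 else (po - pc) / (1 - pc)
    return po, pc, kappa


# Hypothetical counts (illustration only): rows are rater A,
# columns are rater B, in the order [positive, negative].
table = [[81, 9],
         [9, 1]]
po, pc, kappa = agreement_stats(table)
print(f"po = {po:.2f}, pc = {pc:.2f}, kappa = {kappa:.2f}")
```

With these counts po and pc are both 0.82, so kappa is zero up to floating-point rounding: the raters agree on 82% of cases, yet no more often than base-rate guessing alone would produce.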