Open Access
5 July 2023 Roe and Metz identical-test simulation model for validating multi-reader methods of analysis for comparing different radiologic imaging modalities
Abstract

Purpose

The most frequently used model for simulating multi-reader multi-case (MRMC) data that emulate confidence-of-disease ratings from diagnostic imaging studies has been the Roe and Metz model, proposed by Roe and Metz in 1997 and later generalized by Hillis (2012), Abbey et al. (2013), and Gallas and Hillis (2014). These models have been used for evaluating MRMC analysis and sample size methods. The models suggested in these papers for assessing type I error have been null models, where the expected area under the receiver-operating-characteristic curve across readers is the same for each test. However, for these null models, there are other differences that would not exist if the two tests were identical. None of the papers mentioned above discuss how to formulate a null model that is also an identical-test model, where the two tests are identical in all respects. The purpose of this paper is to show how to formulate a Roe and Metz identical-test model and to show its usefulness for validating the error covariance constraints employed by the Obuchowski-Rockette (1995) method.

Approach

For a given Roe-and-Metz model, the corresponding Roe-and-Metz identical-test model is derived by modifying the Roe and Metz null model under the assumption that the two tests are identical.

Results

The importance of the Obuchowski-Rockette model constraints for avoiding negative variance estimates is established using data simulated from the Roe and Metz identical-test model. It is also shown that negative variance estimates can occur at nontrivial rates when the two tests are not identical but somewhat “close” to being identical.

Conclusions

The findings of this paper are important because it has recently been shown (Hillis, 2022) that the commonly used MRMC method proposed by Gallas (2006) and Gallas et al. (2009) uses the same test statistic as the unconstrained Obuchowski-Rockette method.

1.

Introduction

For the typical diagnostic radiology study, several readers (usually 4 to 10 radiologists) assign confidence-of-disease ratings to each case (i.e., subject) based on one or more corresponding radiologic images, using one or more tests (typically imaging modalities), with the numbers of diseased and nondiseased cases each typically between 25 and 100. The resulting data are called multi-reader multi-case (MRMC) data. These studies are typically used to compare different imaging modalities with respect to reader performance. Often measures of reader performance are functions of the estimated receiver-operating-characteristic (ROC) curve, such as the area under the ROC curve (AUC). Throughout we assume AUC is the reader performance metric of interest. Two commonly used methods for analyzing reader performance outcomes that allow conclusions to generalize to both the reader and case populations are the method proposed by Obuchowski and Rockette1 and later modified by Hillis,2 which will be referred to as the “OR” method, and the method proposed by Gallas3 and Gallas et al.,4 which will be referred to as the “Gallas” method.

The most frequently used model for simulating MRMC data has been the model first proposed by Roe and Metz5 and later generalized by Hillis,6 Abbey et al.,7 and Gallas and Hillis.8 We will refer to each of these models as a “Roe and Metz” or “RM” model when there is no need to distinguish between them. These RM models have been used for evaluating MRMC analysis and sample size methods. As discussed by Hillis,9 these RM models generate continuous confidence-of-disease ratings based on an underlying binormal model for each reader, with the separation between the normal and abnormal rating distributions varying across readers.

The parameter settings included in the original RM paper5 result in RM “null” models, where the mean AUC across readers is the same for each test. These null models are useful for evaluating the performance of MRMC methods with respect to type I error for the hypothesis of equal test AUCs. However, these null models can result in correlations for the simulated ratings that would be different if the two tests were identical. For example, it will be shown that between-test correlations of case ratings generated from the RM null model are less than or equal to corresponding correlations when the two tests are identical.

An RM null model where the two tests are identical will be referred to as an “identical-test” model. Although there is no reason to compare two tests that are known to be identical, sometimes it is of interest to compare two tests that are quite similar in most respects, e.g., when the two tests are the same imaging modality but used with slightly different radiation doses. For this situation a researcher likely would want to test if the lower-dose modality is noninferior or equivalent to the higher-dose modality. For such situations, it is important to know that the MRMC analysis method being used performs well when the tests are close to being identical, not only in terms of AUC, but in other ways.

A discussion of how to determine parameter settings that result in an identical-test model is not provided in the original RM paper or in any of the previously mentioned papers that generalize the original RM model. The purpose of this paper is to show how to formulate an RM identical-test model and to show its usefulness for validating the need for the error covariance constraints employed by the OR method. A summary of the paper is as follows: a review of the various RM models is provided in Sec. 2, the definition and derivation of an RM identical-test model are provided in Sec. 3 with illustrative examples in Sec. 4, a brief review of the conventional OR, unconstrained OR and Gallas methods is provided in Sec. 5 with simulation studies comparing the methods in Sec. 6, a discussion of how a negative OR variance can occur is presented in Sec. 7 with illustrative simulation studies in Sec. 8, followed by a summary and discussion in Sec. 9.

2.

Roe and Metz Null Models: Original, Constrained, and Unconstrained Unequal-Variance

2.1.

Original RM Null Model

Let X denote a confidence-of-disease rating assigned by a reader to a case; X is often called a decision variable (DV). The original RM simulation model proposed by Roe and Metz5 is a mixed four-factor (test, reader, case, and truth) ANOVA model for X with case nested within truth; test, reader, and truth crossed; test and truth treated as fixed factors; and reader and case treated as random factors.

Using their notation, their null model is given as

Eq. (1)

$$X_{ijkt} = \mu_{\mathrm{pos}} I\{t=+\} + R_{jt} + C_{kt} + (\tau R)_{ijt} + (\tau C)_{ikt} + (RC)_{jkt} + (\tau RC)_{ijkt} + E_{ijkt},$$
where $X_{ijkt}$ denotes the confidence-of-disease rating assigned to case $k$ of truth state $t$ by reader $j$ when reading under test $i$, with $t = -$ indicating a nondiseased case and $t = +$ indicating a diseased case. Here $\mu_{\mathrm{pos}}$ is the expected difference in the means for the diseased and nondiseased DV distributions, $I\{t=+\}$ is an indicator function that takes the value 1 when $t = +$ and 0 when $t = -$, $R_{jt}$ is the interaction effect of reader $j$ and truth state $t$, $C_{kt}$ is the effect of case $k$ nested within truth state $t$, the multiple symbols in parentheses denote interactions, and $E_{ijkt}$ is the error term. By comparison, the nonnull model given by Roe and Metz is the same as Eq. (1) except that it also includes a test-by-truth interaction term, denoted by $\tau_{it}$, which is implicitly set to zero in the null model Eq. (1).

All effects in Eq. (1) are random except for $\mu_{\mathrm{pos}}$. The random effects are mutually independent and normally distributed with zero means. Roe and Metz denote the corresponding variance components by $\sigma_R^2$, $\sigma_C^2$, $\sigma_{\tau R}^2$, $\sigma_{\tau C}^2$, $\sigma_{RC}^2$, $\sigma_{\tau RC}^2$, and $\sigma_E^2$. They note that $\sigma_{\tau RC}^2$ and $\sigma_E^2$ cannot be estimated separately for this model with no replications, and hence define

Eq. (2)

$$\sigma_\varepsilon^2 \equiv \sigma_{\tau RC}^2 + \sigma_E^2.$$

Although not mentioned by Roe and Metz, the omission of effects that do not depend on truth is justified by the invariance of the ROC curve to location shifts; thus, inclusion of these terms would not change the ROC curve for a given reader. Note that interactions with truth are denoted only by a t subscript in Eq. (1). Roe and Metz constrain the sum of the error variance and variance components involving case to be equal to one:

Eq. (3)

$$\sigma_C^2 + \sigma_{\tau C}^2 + \sigma_{RC}^2 + \sigma_\varepsilon^2 = 1.$$

It can be shown (e.g., Hillis9) that the reader nondiseased and diseased DV distributions have unit variances (and hence their ROC curves are symmetric about the negative 45 deg diagonal), with the reader true AUCs varying across the reader population and having the same expectation for each test. Furthermore, a randomly selected reader has the same ROC curve under each test.
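As an illustration of how Eq. (1) generates data, the following is a minimal sketch of a simulator for the original RM null model with two tests. It is not code from the RM paper; the function names, argument names, and dictionary keys (`"R"`, `"tR"`, `"C"`, `"tC"`, `"RC"`, `"eps"`) are assumptions made for this example, and only the Python standard library is used.

```python
import random

def simulate_truth_state(mean, var, n_readers, n_cases, rng, n_tests=2):
    """Ratings x[i][j][k] for one truth state under Eq. (1).

    var: variance components keyed "R", "tR", "C", "tC", "RC", "eps"
    (hypothetical key names chosen for this sketch).
    """
    sd = {name: v ** 0.5 for name, v in var.items()}
    tests, readers, cases = range(n_tests), range(n_readers), range(n_cases)
    R = [rng.gauss(0, sd["R"]) for _ in readers]                     # reader-by-truth
    C = [rng.gauss(0, sd["C"]) for _ in cases]                       # case within truth
    tR = [[rng.gauss(0, sd["tR"]) for _ in readers] for _ in tests]  # test-by-reader
    tC = [[rng.gauss(0, sd["tC"]) for _ in cases] for _ in tests]    # test-by-case
    RC = [[rng.gauss(0, sd["RC"]) for _ in cases] for _ in readers]  # reader-by-case
    return [[[mean + R[j] + C[k] + tR[i][j] + tC[i][k] + RC[j][k]
              + rng.gauss(0, sd["eps"])   # (tau RC) + E pooled, as in Eq. (2)
              for k in cases] for j in readers] for i in tests]

def simulate_rm_null(mu_pos, var, n_readers, n_neg, n_pos, seed=0):
    """Return (nondiseased, diseased) ratings for the original RM null model."""
    if abs(var["C"] + var["tC"] + var["RC"] + var["eps"] - 1.0) > 1e-9:
        raise ValueError("case-related components must sum to 1, Eq. (3)")
    rng = random.Random(seed)
    # Effects are drawn separately for each truth state (the t subscript).
    neg = simulate_truth_state(0.0, var, n_readers, n_neg, rng)
    pos = simulate_truth_state(mu_pos, var, n_readers, n_pos, rng)
    return neg, pos

# Example with equal variance components for both truth states.
hl = {"R": 0.0055, "tR": 0.0055, "C": 0.3, "tC": 0.3, "RC": 0.2, "eps": 0.2}
neg, pos = simulate_rm_null(1.5, hl, n_readers=5, n_neg=25, n_pos=25)
```

Effects that do not depend on truth are omitted, in line with the location-shift argument above.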

2.2.

Constrained and Unconstrained Unequal-Variance RM Null Models

In practice, estimated binormal-model nondiseased and diseased DV variances for a fixed reader are often different, with diseased subjects typically having more variable case ratings. To better emulate real data, Hillis6 modified the original RM model by allowing the error variance and variance components involving case to depend on truth, with variance components involving diseased cases set equal to those involving normal cases multiplied by the factor $1/b^2$, $b>0$. Specifically, the null model is given by Eq. (1) with variance components (using an obvious notation) denoted as

Eq. (4)

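$$\sigma_R^2,\ \sigma_{\tau R}^2,\ \sigma_{C(-)}^2,\ \sigma_{\tau C(-)}^2,\ \sigma_{RC(-)}^2,\ \sigma_{\varepsilon(-)}^2,\ \sigma_{C(+)}^2,\ \sigma_{\tau C(+)}^2,\ \sigma_{RC(+)}^2,\ \sigma_{\varepsilon(+)}^2,$$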
with

Eq. (5)

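$$\sigma_{C(+)}^2 = b^{-2}\sigma_{C(-)}^2,\quad \sigma_{\tau C(+)}^2 = b^{-2}\sigma_{\tau C(-)}^2,\quad \sigma_{RC(+)}^2 = b^{-2}\sigma_{RC(-)}^2,\quad \sigma_{\varepsilon(+)}^2 = b^{-2}\sigma_{\varepsilon(-)}^2.$$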

Similar to Eq. (3), the constraint

Eq. (6)

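$$\sigma_{C(-)}^2 + \sigma_{\tau C(-)}^2 + \sigma_{RC(-)}^2 + \sigma_{\varepsilon(-)}^2 = 1,$$
is imposed. It follows that
$$\sigma_{C(+)}^2 + \sigma_{\tau C(+)}^2 + \sigma_{RC(+)}^2 + \sigma_{\varepsilon(+)}^2 = b^{-2}.$$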

Following Hillis,6 we refer to this as the “constrained unequal-variance RM null model.” It follows6 from Eq. (5) that setting b=1 results in the original RM model and that b is the conventional binormal-model slope coefficient for each reader’s ROC curve.
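The scaling in Eq. (5) can be sketched in a few lines. The function name and dictionary keys below are assumptions made for illustration; the numerical values are those of the constrained unequal-variance example given later in Sec. 4 (b = 0.711).

```python
def diseased_components(nondiseased, b):
    """Eq. (5): each diseased-case component is the nondiseased one times 1 / b**2."""
    if b <= 0:
        raise ValueError("b must be positive")
    return {name: value / b ** 2 for name, value in nondiseased.items()}

# Nondiseased components sum to 1, as required by Eq. (6).
neg = {"C": 0.3, "tC": 0.3, "RC": 0.2, "eps": 0.2}
pos = diseased_components(neg, b=0.711)

# The diseased components then sum to 1 / b**2.
total_pos = sum(pos.values())
```

Setting `b=1` returns the nondiseased values unchanged, which corresponds to the original RM model.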

A more general RM null model, called the “unconstrained unequal-variance RM null model” by Hillis,9 results if the variance components $\sigma_{C(+)}^2$, $\sigma_{\tau C(+)}^2$, $\sigma_{RC(+)}^2$, and $\sigma_{\varepsilon(+)}^2$ are not constrained to satisfy any particular relationship with $\sigma_{C(-)}^2$, $\sigma_{\tau C(-)}^2$, $\sigma_{RC(-)}^2$, and $\sigma_{\varepsilon(-)}^2$. This model includes the original and constrained unequal-variance RM null models as special applications.

2.3.

Comparison of the RM Null Models

The original RM null model and the constrained and unconstrained unequal-variance RM null models all have the same mixed linear model formulation, given by Eq. (1); all of them also constrain the sum of the variance components corresponding to effects involving nondiseased cases to be equal to 1, as given by Eq. (6). The null models differ only with respect to their constraints on the variance components corresponding to effects involving diseased cases, with the original RM model requiring that the variance components be the same as those for the nondiseased cases, the constrained unequal-variance model requiring that they differ by a factor of $1/b^2$ from those for the nondiseased cases, and the unconstrained unequal-variance model not placing any constraints on them.

3.

Proposed RM Identical-Test Model

3.1.

Definition of Identical-Test Model

I define two tests to be “identical” if they are the same in all respects. I will derive an RM identical-test model by applying this definition to an unconstrained unequal-variance RM null model; since this model includes the original and constrained unequal-variance RM null models as specific applications, the derivation can also be applied to those models. Recall that the unconstrained unequal-variance RM null model is defined by mixed linear model Eq. (1) with variance components given by Eq. (4) subject only to constraint Eq. (6).

3.2.

Derivation of an RM Identical-Test Model

In this section I derive the RM identical-test model by modifying the unconstrained unequal-variance RM null model. The definition of identical tests implies that model effects (excluding the error term) cannot differ by test in an RM identical-test model. Thus, if tests i=1 and i=2 are identical, it follows that model effects in Eq. (1) that include test do not depend on the value of the test subscript i. Specifically, (τR)ijt, (τC)ikt, and (τRC)ijkt in Eq. (1) cannot depend on the value of subscript i; hence (τR)1jt=(τR)2jt, (τC)1kt=(τC)2kt, and (τRC)1jkt=(τRC)2jkt.

Thus I can derive the RM identical-test model from the unequal-variance RM null model using the following result.

Result 1. Setting test subscript values (i) in Eq. (1) equal to 1 for model effects (excluding the error term) results in the corresponding RM identical-test model.

Applying Result 1 results in none of the model effects that include test depending on the value of the test subscript, since it will be the same for all of these effects.

Applying Result 1 to the unequal-variance RM null model given by mixed linear model Eq. (1) with variance components Eq. (4) subject only to constraint Eq. (6) results in the identical-test RM null model

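$$\tilde{X}_{ijkt} = \mu_{\mathrm{pos}} I\{t=+\} + R_{jt} + C_{kt} + (\tau R)_{1jt} + (\tau C)_{1kt} + (RC)_{jkt} + (\tau RC)_{1jkt} + E_{ijkt},$$
where $\tilde{X}_{ijkt}$ is the identical-test model DV. Consolidating random effects results in the equivalent model

Eq. (7)

$$\tilde{X}_{ijkt} = \mu_{\mathrm{pos}} I\{t=+\} + \tilde{R}_{jt} + \tilde{C}_{kt} + (\widetilde{RC})_{jkt} + \tilde{E}_{ijkt},$$
where
$$\tilde{R}_{jt} = R_{jt} + (\tau R)_{1jt},\quad \tilde{C}_{kt} = C_{kt} + (\tau C)_{1kt},\quad (\widetilde{RC})_{jkt} = (RC)_{jkt} + (\tau RC)_{1jkt},\quad \tilde{E}_{ijkt} = E_{ijkt}.$$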

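Corresponding variance components for $\tilde{R}_{jt}$, $\tilde{C}_{kt}$, $(\widetilde{RC})_{jkt}$, and $\tilde{E}_{ijkt}$ are given as

Eq. (8)

$$\sigma_{\tilde{R}}^2 = \sigma_R^2 + \sigma_{\tau R}^2,$$

Eq. (9)

$$\sigma_{\tilde{C}(t)}^2 = \sigma_{C(t)}^2 + \sigma_{\tau C(t)}^2,$$

Eq. (10)

$$\sigma_{\widetilde{RC}(t)}^2 = \sigma_{RC(t)}^2 + \sigma_{\tau RC(t)}^2,$$

Eq. (11)

$$\sigma_{\tilde{E}(t)}^2 = \sigma_{\varepsilon(t)}^2 - \sigma_{\tau RC(t)}^2.$$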

It follows from Eqs. (6) and (9)–(11) that

Eq. (12)

$$\sigma_{\tilde{C}(-)}^2 + \sigma_{\widetilde{RC}(-)}^2 + \sigma_{\tilde{E}(-)}^2 = 1.$$

In summary, the RM identical-test model derived from the unconstrained unequal-variance null RM model is given by model Eq. (7) with variance components Eqs. (8)–(11) and constraint Eq. (12).

Because the original, constrained unequal-variance, and unconstrained unequal-variance RM null models specify values for $\sigma_{\varepsilon(t)}^2 = \sigma_{\tau RC(t)}^2 + \sigma_{E(t)}^2$ without specifying values for either $\sigma_{E(t)}^2$ or $\sigma_{\tau RC(t)}^2$, values must be assigned to $\sigma_{\tau RC(t)}^2$, $t = -, +$, for the null model in order to determine values for $\sigma_{\widetilde{RC}(t)}^2$ and $\sigma_{\tilde{E}(t)}^2$ in the identical-test model using Eqs. (10) and (11).

For simplicity, for the remainder of this paper I will assume

Eq. (13)

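$$\sigma_{\tau RC(-)}^2 = \sigma_{\tau RC(+)}^2 = 0$$
in the unconstrained unequal variance RM model, resulting in

Eq. (14)

$$\sigma_{\widetilde{RC}(t)}^2 = \sigma_{RC(t)}^2,$$

Eq. (15)

$$\sigma_{\tilde{E}(t)}^2 = \sigma_{\varepsilon(t)}^2,$$
in the identical-test model. On the other hand, if the values for $\sigma_{\tau RC(-)}^2$ or $\sigma_{\tau RC(+)}^2$ are specified, then the values for $\sigma_{\widetilde{RC}(t)}^2$ and $\sigma_{\tilde{E}(t)}^2$ can be computed using Eqs. (10) and (11).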

When using the identical-test model for simulations, ratings X1jkt and X2jkt are simulated, corresponding to tests 1 and 2, respectively. But since the only term on the right of Eq. (7) that depends on test is the error term, it follows that for a given reader, case, and truth status, the ratings for the two tests will differ only because their error term values will not be the same.
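The point that the two tests differ only through the error term can be made concrete with a small sketch of Eq. (7) for one truth state. The function name and the keys `"R"`, `"C"`, `"RC"`, `"E"` for the consolidated variance components are assumptions made for this example.

```python
import random

def simulate_identical_test(mean, var, n_readers, n_cases, seed=0):
    """Ratings x[i][j][k] for tests i = 0, 1 under Eq. (7), one truth state.

    var holds the consolidated components keyed "R", "C", "RC", "E"
    (hypothetical key names).
    """
    rng = random.Random(seed)
    sd = {name: v ** 0.5 for name, v in var.items()}
    R = [rng.gauss(0, sd["R"]) for _ in range(n_readers)]
    C = [rng.gauss(0, sd["C"]) for _ in range(n_cases)]
    # Everything except the error term is shared by the two tests.
    shared = [[mean + R[j] + C[k] + rng.gauss(0, sd["RC"])
               for k in range(n_cases)] for j in range(n_readers)]
    # Only the error term is redrawn for each test.
    return [[[shared[j][k] + rng.gauss(0, sd["E"]) for k in range(n_cases)]
             for j in range(n_readers)] for _ in range(2)]

# Nondiseased ratings with components summing to 1, as in Eq. (12).
x = simulate_identical_test(0.0, {"R": 0.011, "C": 0.6, "RC": 0.2, "E": 0.2},
                            n_readers=5, n_cases=25)
```

With the error variance set to zero, the two tests would produce exactly the same ratings in this sketch.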

Note that because the derivation was based on an RM null model, the resulting RM identical-test model is also an RM null model and is a specific application of the unconstrained unequal-variance RM null model.

3.3.

Comparison of the RM Null Model and the Corresponding RM Identical-Test Model

The following relationships for the ratings generated from an unconstrained unequal-variance RM null model and its corresponding RM identical-test model Eqs. (7)–(15) can be shown.

  • 1. Conditional on disease status, the RM null model and the corresponding RM identical-test model result in the same rating distributions for both tests. Specifically, for either test 1 ($i=1$) or test 2 ($i=2$), $X_{ijk-}$ and $\tilde{X}_{ijk-}$ have $N(0, \sigma_R^2+\sigma_{\tau R}^2+1)$ distributions and $X_{ijk+}$ and $\tilde{X}_{ijk+}$ have $N(\mu_{\mathrm{pos}}, \sigma_R^2+\sigma_{\tau R}^2+\sigma_{C(+)}^2+\sigma_{\tau C(+)}^2+\sigma_{RC(+)}^2+\sigma_{\varepsilon(+)}^2)$ distributions, where “$N(\mu,\sigma^2)$” indicates a normal distribution with mean $\mu$ and variance $\sigma^2$.

  • 2. Within-test rating covariances are the same for both models. Specifically, for either test 1 ($i=1$) or test 2 ($i=2$), $\mathrm{cov}(X_{ijkt}, X_{ij'k't'}) = \mathrm{cov}(\tilde{X}_{ijkt}, \tilde{X}_{ij'k't'})$. (Note: here and for relationship 3 below we do not assume $j \ne j'$, $k \ne k'$, or $t \ne t'$.) For example, the covariance between ratings for two different nondiseased cases for the same reader and test is given by $\mathrm{cov}_{k \ne k'}(X_{ijk-}, X_{ijk'-}) = \sigma_R^2+\sigma_{\tau R}^2 = \sigma_{\tilde{R}}^2 = \mathrm{cov}_{k \ne k'}(\tilde{X}_{ijk-}, \tilde{X}_{ijk'-})$, and hence is the same for both models.

  • 3. For the RM null model, between-test covariances are the same or less than corresponding RM identical-test model between-test covariances. That is, $\mathrm{cov}(X_{1jkt}, X_{2j'k't'}) \le \mathrm{cov}(\tilde{X}_{1jkt}, \tilde{X}_{2j'k't'})$. For example, the covariance between ratings for two different nondiseased cases for the same reader but for different tests is given as $\mathrm{cov}_{k \ne k'}(X_{1jk-}, X_{2jk'-}) = \sigma_R^2 \le \sigma_R^2+\sigma_{\tau R}^2 = \sigma_{\tilde{R}}^2 = \mathrm{cov}_{k \ne k'}(\tilde{X}_{1jk-}, \tilde{X}_{2jk'-})$.

In summary, we see that the only difference between the rating distributions for the two models is that the between-test covariances for the unconstrained unequal-variance RM model can be less than those for the RM identical-test model.

3.4.

RM Identical-Test Model Expressed in Terms of a Null Unconstrained Unequal-Variance RM Model with Altered Variance Components

It follows from Eqs. (7)–(15) that the RM identical-test model can be expressed in terms of an unconstrained unequal-variance RM null model, with RM identical-test variance components (indicated by an overline) defined in terms of the unconstrained unequal-variance RM null model variance components as follows, with $t = -, +$:

Eq. (16)

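$$\overline{\sigma_R^2} = \sigma_R^2 + \sigma_{\tau R}^2,$$

Eq. (17)

$$\overline{\sigma_{\tau R}^2} = 0,$$

Eq. (18)

$$\overline{\sigma_{C(t)}^2} = \sigma_{C(t)}^2 + \sigma_{\tau C(t)}^2,$$

Eq. (19)

$$\overline{\sigma_{\tau C(t)}^2} = 0,$$

Eq. (20)

$$\overline{\sigma_{RC(t)}^2} = \sigma_{RC(t)}^2,$$

Eq. (21)

$$\overline{\sigma_{\tau RC(t)}^2} = 0,$$

Eq. (22)

$$\overline{\sigma_{E(t)}^2} = \sigma_{\varepsilon(t)}^2.$$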

The advantage of this approach is that for simulations, an unconstrained unequal-variance RM null model that is already programmed can be easily modified to produce identical-test simulations by altering the values of the null model variance components using Eqs. (16)–(22).
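A minimal sketch of this modification follows. It maps a dictionary of null-model variance components to the identical-test values of Eqs. (16)–(22), under assumption Eq. (13). The function name and the key naming scheme (`"C-"`, `"tC+"`, and so on) are assumptions made for this example, not part of any existing RM simulation program.

```python
def to_identical_test(null):
    """Identical-test variance components from null-model ones, Eqs. (16)-(22).

    Keys (hypothetical): "R", "tR", and for each truth state s in "-", "+":
    "C" + s, "tC" + s, "RC" + s, "eps" + s.
    Assumes the test-by-reader-by-case variance is zero, Eq. (13).
    """
    out = {"R": null["R"] + null["tR"],   # Eq. (16)
           "tR": 0.0}                     # Eq. (17)
    for s in ("-", "+"):
        out["C" + s] = null["C" + s] + null["tC" + s]   # Eq. (18)
        out["tC" + s] = 0.0                             # Eq. (19)
        out["RC" + s] = null["RC" + s]                  # Eq. (20)
        out["eps" + s] = null["eps" + s]                # Eq. (22), with Eq. (21)
    return out

# Example: an unconstrained unequal-variance null model.
null = {"R": 0.007, "tR": 0.004,
        "C-": 0.3, "tC-": 0.3, "RC-": 0.2, "eps-": 0.2,
        "C+": 0.4, "tC+": 0.45, "RC+": 0.25, "eps+": 0.35}
identical = to_identical_test(null)
```

The output can be passed to an existing null-model simulator unchanged, since it is itself a set of null-model variance components.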

3.5.

General Definition of an RM Identical-Test Model

It follows from Eqs. (12) and (16)–(22) that an unconstrained unequal-variance null RM is an RM identical-test model if it can be expressed by mixed linear model Eq. (1) with

Eq. (23)

$$\sigma_{\tau R}^2 = \sigma_{\tau C(+)}^2 = \sigma_{\tau C(-)}^2 = \sigma_{\tau RC(+)}^2 = \sigma_{\tau RC(-)}^2 = 0$$
and

Eq. (24)

$$\sigma_{C(-)}^2 + \sigma_{RC(-)}^2 + \sigma_{E(-)}^2 = 1.$$

This result can also be applied to original RM null or constrained unequal-variance RM null models, since they are specific applications of the unconstrained unequal-variance RM null model. In particular, it follows from Eqs. (5), (23), and (24) that a constrained unequal-variance null RM or an original RM null model is an RM identical-test model if it can be expressed by mixed linear model Eq. (1) with

Eq. (25)

$$\sigma_{\tau R}^2 = \sigma_{\tau C(-)}^2 = \sigma_{\tau RC(-)}^2 = 0$$
and constraint Eq. (24).
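The general definition can be turned into a simple check. The sketch below tests conditions Eq. (23) and Eq. (24) for a dictionary of variance components; the function name, key names, and tolerance are assumptions made for this example.

```python
def is_identical_test(vc, tol=1e-9):
    """True if the components satisfy Eq. (23) and Eq. (24).

    Keys (hypothetical): "tR", "tC-", "tC+", "tRC-", "tRC+" for the
    components involving test, and "C-", "RC-", "E-" for the nondiseased
    case and error components. Missing test-related keys count as zero.
    """
    test_keys = ("tR", "tC-", "tC+", "tRC-", "tRC+")
    no_test_effects = all(abs(vc.get(k, 0.0)) <= tol for k in test_keys)  # Eq. (23)
    unit_sum = abs(vc["C-"] + vc["RC-"] + vc["E-"] - 1.0) <= tol          # Eq. (24)
    return no_test_effects and unit_sum

# A model with no test-related variance components and unit nondiseased sum.
ok = is_identical_test({"tR": 0.0, "tC-": 0.0, "tC+": 0.0,
                        "C-": 0.6, "RC-": 0.2, "E-": 0.2})
```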

4.

Examples of RM Null and RM Identical-Test Models

Table 1 illustrates the derivation of several RM identical-test models from RM null models using Eqs. (16)–(22). In row 1 of the table are the parameter values for one of the RM null models proposed by Roe and Metz.5 In row 2 are the variance components for the corresponding RM identical-test model, computed using Eqs. (16)–(22).

Table 1

Examples of RM null models and corresponding RM identical-test models.

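| Row | RM model | Model type | $\mu_+$ ($A_z$)^a | $\sigma_{C(-)}^2$ | $\sigma_{\tau C(-)}^2$ | $\sigma_{RC(-)}^2$ | $\sigma_{\varepsilon(-)}^2$ | $\sigma_{C(+)}^2$ | $\sigma_{\tau C(+)}^2$ | $\sigma_{RC(+)}^2$ | $\sigma_{\varepsilon(+)}^2$ | $\sigma_R^2$ | $\sigma_{\tau R}^2$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (a) Original (Ref. 5) | Null | 1.50 (0.856) | 0.3 | 0.3 | 0.2 | 0.2 | 0.3 | 0.3 | 0.2 | 0.2 | 0.0055 | 0.0055 |
| 2 | | Identical-test | 1.50 (0.856) | 0.6 | 0.0 | 0.2 | 0.2 | 0.6 | 0.0 | 0.2 | 0.2 | 0.0110 | 0.0000 |
| 3 | (b) Const. unequal var. (Ref. 6) | Null | 1.831 (0.856) | 0.3 | 0.3 | 0.2 | 0.2 | 0.593 | 0.593 | 0.40 | 0.40 | 0.0082 | 0.0082 |
| 4 | | Identical-test | 1.831 (0.856) | 0.6 | 0.0 | 0.2 | 0.2 | 1.186 | 0.0 | 0.40 | 0.40 | 0.0164 | 0.0000 |
| 5 | (c) Unconst. unequal var. (Ref. 9) | Null | 1.50 (0.826) | 0.3 | 0.3 | 0.2 | 0.2 | 0.4 | 0.45 | 0.25 | 0.35 | 0.007 | 0.004 |
| 6 | | Identical-test | 1.50 (0.826) | 0.6 | 0.0 | 0.2 | 0.2 | 0.85 | 0.00 | 0.25 | 0.35 | 0.011 | 0.000 |

Notes: “Const.” = constrained; “Unconst.” = unconstrained; and “var.” = variance; b = 0.711 for the constrained unequal variance RM model, RM model (b).

^a $A_z$ is equal to the median AUC across the reader population; the purpose of the parentheses is to indicate that it is not an RM model parameter used for simulating data, but rather is included to provide additional information about the model. It is computed using $A_z = \Phi\left(\mu_+ / \sqrt{\sigma_-^2 + \sigma_+^2}\right)$, where $\sigma_-^2 = \sigma_{C(-)}^2 + \sigma_{\tau C(-)}^2 + \sigma_{RC(-)}^2 + \sigma_{\varepsilon(-)}^2 = 1$ and $\sigma_+^2 = \sigma_{C(+)}^2 + \sigma_{\tau C(+)}^2 + \sigma_{RC(+)}^2 + \sigma_{\varepsilon(+)}^2$.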
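The median AUC in the footnote can be checked numerically. The sketch below evaluates the standard normal distribution function through `math.erf`; the function names are assumptions made for this example.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal distribution function, written in terms of erf."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def median_auc(mu_pos, var_neg, var_pos):
    """Median reader AUC: Phi(mu_pos / sqrt(var_neg + var_pos)).

    var_neg and var_pos are the sums of the case-related and error
    variance components for nondiseased and diseased cases.
    """
    return normal_cdf(mu_pos / sqrt(var_neg + var_pos))

# Original RM model: both sums equal 1.
az_original = median_auc(1.5, 1.0, 1.0)

# Constrained unequal-variance model: diseased sum equals 1 / b**2 with b = 0.711.
az_constrained = median_auc(1.831, 1.0, 1.0 / 0.711 ** 2)
```

Both values agree with the tabulated 0.856 to three decimal places, which is why the separation of 1.831 is paired with b = 0.711.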

Similarly, in row 3 are the parameter values for a constrained unequal variance null RM model given by Hillis,6 which has the same median AUC and the same variance components for random effects involving nondiseased cases as the original RM null model in row 1, but sets b=0.711 so that the median mean-to-sigma ratio10 will be 4.50. In row 4 are the corresponding identical-test parameter values, derived using Eqs. (16)–(22).

Finally, in row 5 is an unconstrained unequal variance null RM model6 with the corresponding identical-test model variance components, again derived using Eqs. (16)–(22), given in row 6.

5.

Review and Comparison of Conventional OR, Unconstrained OR, and Gallas MRMC Methods

5.1.

OR Method

The OR method assumes a test × reader factorial ANOVA model for AUC estimates and other reader performance measure estimates resulting from an MRMC study, with each AUC estimate corresponding to one reader using one of several tests (typically an imaging modality). Here we are assuming the study design discussed in the first paragraph of Sec. 1. Unlike a conventional ANOVA model, the errors are assumed to be correlated to account for correlation due to each reader evaluating the same cases.

The OR model is given as

Eq. (26)

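$$\hat{\theta}_{ij} = \mu_{\mathrm{OR}} + \tau_{i:\mathrm{OR}} + R_{j:\mathrm{OR}} + (\tau R)_{ij:\mathrm{OR}} + \varepsilon_{ij:\mathrm{OR}},$$
where $\mu_{\mathrm{OR}}$ is the fixed intercept term, $\tau_{i:\mathrm{OR}}$ denotes the fixed effect of test $i$, $R_{j:\mathrm{OR}}$ denotes the random effect of reader $j$, $(\tau R)_{ij:\mathrm{OR}}$ denotes the random test × reader interaction, and $\varepsilon_{ij:\mathrm{OR}}$ is the error term. The $R_{j:\mathrm{OR}}$ and $(\tau R)_{ij:\mathrm{OR}}$ are assumed to be mutually independent and normally distributed with zero means and respective variances $\sigma_{R:\mathrm{OR}}^2$ and $\sigma_{\tau R:\mathrm{OR}}^2$. (We include “OR” in effect and variance component subscripts to distinguish OR effects and variance components from similarly notated RM-model quantities.) The $\varepsilon_{ij:\mathrm{OR}}$ are assumed to be normally distributed with mean zero and variance $\sigma_{\varepsilon:\mathrm{OR}}^2$ and are assumed uncorrelated with the $R_{j:\mathrm{OR}}$ and $(\tau R)_{ij:\mathrm{OR}}$. Three possible error covariances are assumed:
$$\mathrm{Cov}(\varepsilon_{ij:\mathrm{OR}}, \varepsilon_{i'j':\mathrm{OR}}) = \begin{cases} \mathrm{Cov}_1 & i \ne i',\ j = j' \quad \text{(different test, same reader)} \\ \mathrm{Cov}_2 & i = i',\ j \ne j' \quad \text{(same test, different reader)} \\ \mathrm{Cov}_3 & i \ne i',\ j \ne j' \quad \text{(different test, different reader).} \end{cases}$$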
The OR model assumes11

Eq. (27)

$$\mathrm{Cov}_1 \ge \mathrm{Cov}_3,\quad \mathrm{Cov}_2 \ge \mathrm{Cov}_3,\quad \mathrm{Cov}_3 \ge 0.$$
The OR model can alternatively be described with population correlations

Eq. (28)

$$r_i = \mathrm{Cov}_i / \sigma_\varepsilon^2,\quad i = 1, 2, 3,$$
instead of the covariances, i.e., with $\mathrm{Cov}_i$ replaced by $r_i \sigma_\varepsilon^2$, $i = 1, 2, 3$.

These error variance-covariance parameters are typically estimated by averaging corresponding fixed-reader estimates computed using the jackknife,12–14 bootstrap,14,15 or the method proposed by DeLong et al.16 (DeLong), with DeLong only for empirical AUC estimates. These three estimation methods are consistent but are not unbiased. An unbiased error covariance method, which we will refer to as the “unbiased” method, was recently proposed by Hillis17 for use when empirical AUC is the outcome. This method utilizes the unbiased fixed-reader method discussed by Gallas (Ref. 3, p. 362) for estimating the error variance [which Gallas notes is equivalent to the expressions given by Bamber (Ref. 18, p. 402)] and extensions of it for estimating the error covariances. OR analysis using this method is included in the freely available R software package MRMCaov.19

5.2.

Conventional OR Test Statistic and Variance Estimate

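The conventional OR test statistic for testing the null hypothesis of no test effect ($H_0: \tau_1 = \cdots = \tau_{N_T}$) is given as

Eq. (29)

$$F_{\mathrm{OR}} = \frac{MS(T)}{MS(T*R) + \max[N_R(\widehat{\mathrm{Cov}}_2 - \widehat{\mathrm{Cov}}_3), 0]},$$
where $MS(T*R) = \frac{1}{(N_T-1)(N_R-1)} \sum_{i=1}^{N_T} \sum_{j=1}^{N_R} (\hat{\theta}_{ij} - \hat{\theta}_{i\cdot} - \hat{\theta}_{\cdot j} + \hat{\theta}_{\cdot\cdot})^2$, $MS(T) = \frac{N_R}{N_T-1} \sum_{i=1}^{N_T} (\hat{\theta}_{i\cdot} - \hat{\theta}_{\cdot\cdot})^2$, $N_T \ge 2$ is the number of tests, $N_R$ is the number of readers, and $\widehat{\mathrm{Cov}}_2$ and $\widehat{\mathrm{Cov}}_3$ are the $\mathrm{Cov}_2$ and $\mathrm{Cov}_3$ estimates. Here a subscript replaced by a dot indicates the average across the corresponding levels; e.g., $\hat{\theta}_{i\cdot} = \sum_{j=1}^{N_R} \hat{\theta}_{ij} / N_R$. Under $H_0$, $F_{\mathrm{OR}}$ has an approximate F distribution with numerator degrees of freedom $N_T - 1$ and denominator degrees of freedom2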

Eq. (30)

$$\mathrm{ddf}_H \equiv \frac{\{MS(T*R) + \max[N_R(\widehat{\mathrm{Cov}}_2 - \widehat{\mathrm{Cov}}_3), 0]\}^2}{[MS(T*R)]^2 / [(N_T-1)(N_R-1)]}.$$

For NT=2 tests, Eq. (29) can be written in the form

Eq. (31)

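$$F_{\mathrm{OR}} = \frac{(\hat{\theta}_{1\cdot} - \hat{\theta}_{2\cdot})^2}{\widehat{\mathrm{var}}_{\mathrm{OR}}(\hat{\theta}_{1\cdot} - \hat{\theta}_{2\cdot})},$$
where

Eq. (32)

$$\widehat{\mathrm{var}}_{\mathrm{OR}}(\hat{\theta}_{1\cdot} - \hat{\theta}_{2\cdot}) = \frac{2}{N_R}\{MS(T*R) + \max[N_R(\widehat{\mathrm{Cov}}_2 - \widehat{\mathrm{Cov}}_3), 0]\},$$
is the OR estimate for the variance of $\hat{\theta}_{1\cdot} - \hat{\theta}_{2\cdot}$.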

Note that Eqs. (29)–(32) incorporate the error-covariance constraints given in Eq. (27). We will sometimes refer to these as the “conventional OR” F statistics, denominator degrees of freedom estimate and variance estimate, to distinguish them from the unconstrained versions of these statistics discussed below.
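To make Eqs. (29) and (30) concrete, the following sketch computes the conventional OR F statistic and denominator degrees of freedom from a table of AUC estimates and given covariance estimates. The function name and argument layout are assumptions made for this example, and the covariance estimates are taken as inputs rather than estimated; the AUC values and covariances in the usage lines are made-up illustrative numbers.

```python
def or_f_statistic(theta, cov2, cov3):
    """Conventional OR F statistic and ddf_H, Eqs. (29)-(30).

    theta[i][j] is the AUC estimate for test i and reader j;
    cov2 and cov3 are the estimated error covariances.
    Requires MS(T*R) > 0 for the degrees of freedom to be defined.
    """
    n_t, n_r = len(theta), len(theta[0])
    test_mean = [sum(row) / n_r for row in theta]
    reader_mean = [sum(theta[i][j] for i in range(n_t)) / n_t for j in range(n_r)]
    grand = sum(test_mean) / n_t
    ms_t = n_r / (n_t - 1) * sum((m - grand) ** 2 for m in test_mean)
    ms_tr = sum((theta[i][j] - test_mean[i] - reader_mean[j] + grand) ** 2
                for i in range(n_t) for j in range(n_r)) / ((n_t - 1) * (n_r - 1))
    # The max(..., 0) term imposes the constraint Cov2 >= Cov3 from Eq. (27).
    denom = ms_tr + max(n_r * (cov2 - cov3), 0.0)
    f_stat = ms_t / denom
    ddf = denom ** 2 / (ms_tr ** 2 / ((n_t - 1) * (n_r - 1)))
    return f_stat, ddf

# Illustrative (made-up) AUC estimates: 2 tests by 3 readers.
theta = [[0.80, 0.85, 0.90],
         [0.78, 0.88, 0.86]]
f_stat, ddf = or_f_statistic(theta, cov2=0.0004, cov3=0.0003)
```

For two tests, the returned F statistic equals the squared difference of the test means divided by the variance estimate in Eq. (32).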

5.3.

Unconstrained OR Test Statistics and Variance Estimate

The importance of the OR constraints given in Eq. (27) will be demonstrated by simulations in Sec. 6. In this section the OR test statistics and variance estimate are defined without constraints Eq. (27) imposed. Use of these unconstrained test statistics in place of Eqs. (29)–(32) will be called the “unconstrained OR” method.

The unconstrained OR test statistics, denominator degrees of freedom and variance are given as

Eq. (33)

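$$F_{\mathrm{OR;unconstrained}} = \frac{MS(T)}{MS(T*R) + N_R(\widehat{\mathrm{Cov}}_2 - \widehat{\mathrm{Cov}}_3)},$$

Eq. (34)

$$\mathrm{ddf}_{H;\mathrm{unconstrained}} = \frac{\{MS(T*R) + N_R(\widehat{\mathrm{Cov}}_2 - \widehat{\mathrm{Cov}}_3)\}^2}{[MS(T*R)]^2 / [(N_T-1)(N_R-1)]},$$
and

Eq. (35)

$$F_{\mathrm{OR;unconstrained}} = \frac{(\hat{\theta}_{1\cdot} - \hat{\theta}_{2\cdot})^2}{\widehat{\mathrm{var}}_{\mathrm{OR;unconstrained}}(\hat{\theta}_{1\cdot} - \hat{\theta}_{2\cdot})},$$
where

Eq. (36)

$$\widehat{\mathrm{var}}_{\mathrm{OR;unconstrained}}(\hat{\theta}_{1\cdot} - \hat{\theta}_{2\cdot}) = \frac{2}{N_R}\{MS(T*R) + N_R(\widehat{\mathrm{Cov}}_2 - \widehat{\mathrm{Cov}}_3)\}.$$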

Note that

Eq. (37)

$$F_{\mathrm{OR}} = F_{\mathrm{OR;unconstrained}} \quad \text{if} \quad \widehat{\mathrm{Cov}}_2 - \widehat{\mathrm{Cov}}_3 \ge 0,$$
and that Eq. (35) is not defined if Eq. (36) is not positive.
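The difference between Eq. (32) and Eq. (36) can be seen in a short sketch. The function name and the numerical inputs are assumptions chosen only to illustrate how a negative covariance difference can drive the unconstrained estimate below zero.

```python
def or_variance_two_tests(ms_tr, cov2, cov3, n_readers, constrained=True):
    """Variance estimate for the difference of two test AUC means.

    constrained=True gives Eq. (32); constrained=False gives Eq. (36).
    """
    adjustment = n_readers * (cov2 - cov3)
    if constrained:
        adjustment = max(adjustment, 0.0)   # constraint Cov2 >= Cov3, Eq. (27)
    return 2.0 / n_readers * (ms_tr + adjustment)

# Hypothetical inputs with cov2 < cov3 and a small interaction mean square.
conventional = or_variance_two_tests(0.0002, 0.0005, 0.0006, n_readers=5)
unconstrained = or_variance_two_tests(0.0002, 0.0005, 0.0006, n_readers=5,
                                      constrained=False)
# Here the conventional estimate is positive while the unconstrained one is
# negative, so the F statistic in Eq. (35) would be undefined.
```

When the covariance difference is nonnegative the two estimates coincide, in agreement with Eq. (37).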

5.4.

Equivalence of Gallas and Unconstrained OR F Statistics

When the outcome is the empirical AUC and there are two tests, Hillis17 has shown that the Gallas method F statistic for testing the null hypothesis of no difference in test AUCs is equivalent to the unconstrained OR method F statistic Eq. (35) when the unbiased covariance estimation method is used to compute $\widehat{\mathrm{Cov}}_2$ and $\widehat{\mathrm{Cov}}_3$. However, the Gallas denominator degrees of freedom estimate differs from the conventional and unconstrained OR denominator degrees of freedom estimates.

5.5.

Relationship of OR Model and RM Identical-Test Model

Hillis9 derived the OR parameters for the distribution of empirical AUC estimates simulated using the unconstrained unequal-variance RM model. I show in Appendix A that it follows from these results that for data simulated from the unconstrained unequal-variance RM identical-test model

Eq. (38)

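$$\mu_{\mathrm{OR}} + \tau_{1:\mathrm{OR}} = \mu_{\mathrm{OR}} + \tau_{2:\mathrm{OR}},$$

Eq. (39)

$$\mathrm{Cov}_2 = \mathrm{Cov}_3,$$

Eq. (40)

$$\sigma_{\tau R:\mathrm{OR}}^2 = 0.$$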

These results are intuitive. The first result states that the expected AUCs (as given by $\mu_{\mathrm{OR}} + \tau_{i:\mathrm{OR}}$ for test $i$) must be the same for each test and the second result states that $\mathrm{Cov}_2$ and $\mathrm{Cov}_3$ must be equal, which makes sense since for identical tests the two covariances have the same definition. To understand the third result, we note that it can be shown [Ref. 9, p. 2069] that $\sigma_{\tau R:\mathrm{OR}}^2$ is equal to half of the variance of the within-reader differences of the expected AUCs; under the assumption of the identical-test RM model, these differences are zero, and hence $\sigma_{\tau R:\mathrm{OR}}^2 = 0$.

6.

Simulation Studies Comparing Conventional and Unconstrained OR Based on RM Null and Identical-Test Models

6.1.

Simulation Study Using Tables 1(a) and 1(b) RM Null and Identical-Test Models

Multi-reader rating data for five readers, each reading the same cases under two tests, were simulated based on the original RM null and corresponding RM identical-test models, and on the constrained unequal variance RM null and corresponding RM identical-test models, given in Tables 1(a) and 1(b), respectively. (Results based on the Table 1(c) model are omitted from Table 2 for brevity and because the Table 1(c) RM null model parameter values, unlike the other two RM null model parameter values, have not been previously suggested in the literature.) For each model, 5000 simulated MRMC samples were generated for case sample sizes of 25/25 and 50/50 each, where “25/25” indicates 25 nondiseased and 25 diseased cases. The empirical AUC was computed for each simulated MRMC sample with OR error covariances estimated using the unbiased error-covariance method. The null hypothesis of equal test AUCs, versus the two-sided alternative hypothesis, was tested at the 0.05 significance level using both the conventional and unconstrained OR test statistics, given by Eqs. (31) and (35). Results of the simulations, presented in Table 2, include the empirical type I error rate; the proportion of samples having negative variance estimates, as defined by Eq. (32) or Eq. (36); and the proportion of negative values for $\widehat{\mathrm{Cov}}_2 - \widehat{\mathrm{Cov}}_3$.
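For reference, the empirical AUC used as the outcome here is the Mann-Whitney statistic: the proportion of diseased and nondiseased rating pairs in which the diseased case has the higher rating, with ties counted as one half. A minimal sketch for one reader and one test follows; the function name is an assumption, and the unbiased error-covariance estimation itself is not shown.

```python
def empirical_auc(nondiseased, diseased):
    """Empirical (Mann-Whitney) AUC for one reader under one test.

    Counts 1 for each pair where the diseased rating is higher and
    0.5 for each tied pair, then divides by the number of pairs.
    """
    score = sum((d > n) + 0.5 * (d == n) for d in diseased for n in nondiseased)
    return score / (len(diseased) * len(nondiseased))

# Two nondiseased and two diseased ratings, with one tied pair.
auc = empirical_auc([0.0, 1.0], [1.0, 2.0])   # 3.5 of 4 pairs, i.e. 0.875
```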

Table 2

OR analysis results using the unbiased covariance method, for 5 readers reading the same cases under both tests, with 5000 MRMC samples simulated from Table 1 RM null models (a) and (b) and their corresponding identical-test models for each case size combination.

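| No. of cases | RM model | Model type | OR F statistic | Rate of $\widehat{\mathrm{var}}(\hat{\theta}_{1\cdot} - \hat{\theta}_{2\cdot}) < 0$ | Type I error rate | Rate of $\widehat{\mathrm{Cov}}_2 - \widehat{\mathrm{Cov}}_3 < 0$ |
|---|---|---|---|---|---|---|
| 25/25 | (a) Original | Null | Conventional | 0 | 0.050 | 0 |
| 25/25 | (a) Original | Null | Unconstrained | 0 | 0.050 | 0 |
| 25/25 | (a) Original | Identical-test | Conventional | 0 | 0.043 | 0.52 |
| 25/25 | (a) Original | Identical-test | Unconstrained | 0.039 | N/A | 0.52 |
| 25/25 | (b) Const. unequal var. | Null | Conventional | 0 | 0.048 | 0 |
| 25/25 | (b) Const. unequal var. | Null | Unconstrained | 0 | 0.048 | 0 |
| 25/25 | (b) Const. unequal var. | Identical-test | Conventional | 0 | 0.033 | 0.53 |
| 25/25 | (b) Const. unequal var. | Identical-test | Unconstrained | 0.049 | N/A | 0.53 |
| 50/50 | (a) Original | Null | Conventional | 0 | 0.053 | 0 |
| 50/50 | (a) Original | Null | Unconstrained | 0 | 0.053 | 0 |
| 50/50 | (a) Original | Identical-test | Conventional | 0 | 0.046 | 0.52 |
| 50/50 | (a) Original | Identical-test | Unconstrained | 0.026 | N/A | 0.52 |
| 50/50 | (b) Const. unequal var. | Null | Conventional | 0 | 0.051 | 0 |
| 50/50 | (b) Const. unequal var. | Null | Unconstrained | 0 | 0.051 | 0 |
| 50/50 | (b) Const. unequal var. | Identical-test | Conventional | 0 | 0.046 | 0.53 |
| 50/50 | (b) Const. unequal var. | Identical-test | Unconstrained | 0.031 | N/A | 0.53 |

Notes: see Table 1 for definitions of “RM model” and “Model type”; OR F statistic = “Conventional” if the OR constrained F statistic Eq. (31) is used and = “Unconstrained” if the unconstrained F statistic Eq. (35) is used; “25/25” indicates 25 nondiseased and 25 diseased cases; “type I error” is the proportion of samples where the null hypothesis of no test effect is rejected; “N/A” stands for “not applicable” and indicates that the empirical type I error rate could not be computed because the variance estimate in the denominator of the test statistic, computed using Eq. (36), was negative for some samples. Note that although $\widehat{\mathrm{Cov}}_2 - \widehat{\mathrm{Cov}}_3$ is not constrained in this table, in the computation of the conventional OR variance it is constrained to be nonnegative.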

If the variance estimate was negative for any sample, the type I error rate could not be computed because the test statistic Eq. (31) or Eq. (35), which is required for deciding whether to accept or reject the null hypothesis of equal test AUCs, was not defined for all the simulated samples; this situation is indicated in Table 2 by “N/A” (not applicable).

6.1.1.

RM null model results

We see from Table 2 that when model type = “null,” the empirical type I error rates are the same for the conventional and unconstrained OR methods, with the type I error rates varying between 0.048 and 0.053. That these rates are the same can be explained by Eq. (37) and by the nonnegativity of $\widehat{\mathrm{Cov}}_2 - \widehat{\mathrm{Cov}}_3$ for all the samples (as indicated in the last column).

6.1.2.

Identical-test model results

In contrast to the null model results reported above, we see in Table 2 that the identical-test model type I error rates depend on whether the conventional or unconstrained OR method was used.

Conventional OR results

For the identical-test models the conventional OR type I error rates vary between 0.033 and 0.046 with no negative variance estimates.

Unconstrained OR results

For the identical-test models, all of the unconstrained OR type I error rates were undefined (as indicated by “N/A” in Table 2) because of negative variance estimates. For the original RM identical-test and constrained unequal variance RM identical-test models, respective negative unconstrained OR variance rates were 0.039 and 0.049 for 25/25 samples and 0.026 and 0.031 for 50/50 samples. Note that these negative variance estimate rates apply also to the Gallas F statistic, since it is the same as the unconstrained OR F statistic, as discussed in Sec. 5.4.

6.2.

Simulation Study Using Original Roe and Metz Null Models and Corresponding Identical-Test Null Models

In the original Roe and Metz5 paper, four different variance component “structures,” denoted as “HL,” “LL,” “HH,” and “LH,” are given for μ+=0.75, 1.5, and 2.50, resulting in twelve different parameter combinations. In the upper half of Table 3 are the four variance component structures for μ+=1.5. In the lower half of Table 3 are the corresponding RM identical-test model structures that result from application of Eqs. (16)–(22) to the structures in the upper half of Table 3. (See Table 8 in Appendix B for a similar table that includes all twelve parameter combinations and corresponding RM identical-test parameter specifications.)

Table 3

Subset of original 12 sets of Roe and Metz5 (RM) null simulation model parameter values and corresponding RM identical-test model parameter values. The Table 4 simulation results are based on these parameter values. The complete set of 12 sets of parameter values is included in Table 8.

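Structure | μ+ | Az | σC2 | στC2 | σRC2 | σε2 | σR2 | στR2
(a) Original RM model parameter values
HL | 1.5 | 0.856 | 0.3 | 0.3 | 0.2 | 0.2 | 0.0055 | 0.0055
LL | 1.5 | 0.856 | 0.1 | 0.1 | 0.2 | 0.6 | 0.0055 | 0.0055
HH | 1.5 | 0.856 | 0.3 | 0.3 | 0.2 | 0.2 | 0.0300 | 0.0300
LH | 1.5 | 0.856 | 0.1 | 0.1 | 0.2 | 0.6 | 0.0300 | 0.0300
(b) Corresponding RM identical-test model parameter values
HL | 1.5 | 0.856 | 0.6 | 0.0 | 0.2 | 0.2 | 0.0110 | 0.0000
LL | 1.5 | 0.856 | 0.2 | 0.0 | 0.2 | 0.6 | 0.0110 | 0.0000
HH | 1.5 | 0.856 | 0.6 | 0.0 | 0.2 | 0.2 | 0.0600 | 0.0000
LH | 1.5 | 0.856 | 0.2 | 0.0 | 0.2 | 0.6 | 0.0600 | 0.0000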
Notes: μ+ is the median and mean separation of the normal and abnormal DV distributions across the reader population, and Az=Φ(μ+/√2) is the median reader-specific true area under the ROC curve.
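
The relationship between parts (a) and (b) of Table 3 can be sketched in a few lines of Python. This is only an illustration inferred from the pattern visible in the table (the test-by-case and test-by-reader variance components are added to the case and reader components and then set to zero); it is not a transcription of Eqs. (16)–(22), and the function names and dictionary keys are hypothetical. The second function assumes Az=Φ(μ+/√2), which reproduces the tabulated value of 0.856 for μ+=1.5.

```python
from math import erf, sqrt

def identical_test_params(p):
    """Fold test-dependent variance components into their test-independent
    counterparts, following the pattern seen in Table 3 (an assumption)."""
    return {
        "mu": p["mu"],
        "var_C": p["var_C"] + p["var_tC"],   # case + test-by-case
        "var_tC": 0.0,
        "var_RC": p["var_RC"],               # unchanged in Table 3
        "var_eps": p["var_eps"],             # unchanged in Table 3
        "var_R": p["var_R"] + p["var_tR"],   # reader + test-by-reader
        "var_tR": 0.0,
    }

def median_reader_auc(mu):
    """Phi(mu / sqrt(2)): assumed form of Az for the original RM model."""
    x = mu / sqrt(2.0)
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))  # standard normal CDF at x

# HL structure from Table 3(a)
hl_null = {"mu": 1.5, "var_C": 0.3, "var_tC": 0.3, "var_RC": 0.2,
           "var_eps": 0.2, "var_R": 0.0055, "var_tR": 0.0055}
hl_identical = identical_test_params(hl_null)
print(hl_identical)
print(round(median_reader_auc(1.5), 3))
```

Applied to the HL row of part (a), the sketch returns the HL row of part (b) up to floating-point rounding.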

For each parameter combination in Table 3, 2000 MRMC samples were simulated for each of 6 combinations of 3 reader levels (3, 5, and 10 readers) and 2 sample size levels (25/25 and 50/50). Each set of 2000 samples was analyzed using the conventional and unconstrained OR methods, using both unbiased and DeLong error-covariance estimates.

For each error covariance method and model type (null or identical-test), Table 4 presents the analysis results for each reader and sample size combination, averaged across the four structures in Table 3. For example, the type I error of 0.061 in the first row of Table 4 is the average of four empirical type I error rates, corresponding to the four original RM null model structures, resulting from performing a conventional OR analysis using the DeLong covariance method on each of 2000 simulated MRMC samples for each structure, with each simulated MRMC sample containing rating data from 3 readers reading 25 nondiseased and 25 diseased cases. (For brevity, averages of the four empirical type I error rates are reported rather than the rates for each separate structure, since the averages are sufficient to reveal the problem of negative variances with the unconstrained OR method.)

Table 4

Conventional and unconstrained OR analysis results using the DeLong and unbiased error covariance methods, for MRMC samples simulated from the original RM null model and the corresponding RM identical-test model parameter values given in Table 3. Readers read the same cases under both tests. For each combination of structure (HL, LL, HH, or LH), error covariance method (DeLong or unbiased), readers (3, 5, or 10) and case sample sizes (25/25 or 50/50), 2000 MRMC samples were simulated and analyzed using both conventional and unconstrained OR with empirical AUC being the outcome. This table presents those analysis results averaged across the four structures. For example, the conventional OR type I error of 0.061 in the first line is the average of the four conventional OR empirical type I error statistics computed for each of the four parameter structures in Table 3, based on 2000 simulated MRMC samples for each structure for 3 readers each reading 25 nondiseased and 25 diseased cases.

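Row | Covariance | RM model | Cases | Readers | N | Conventional OR: var < 0 | Conventional OR: Type I | Unconstrained OR: var < 0 | Unconstrained OR: Type I | Cov2 | Cov3 | στR:OR2 | AUC1 | AUC2
1 | DeLong | Null | 25/25 | 3 | 4 | 0.0000% | 0.061 | 0.8625% | NA | 0.0009 | 0.0004 | 0.0008 | 0.852 | 0.852
2 | DeLong | Null | 25/25 | 5 | 4 | 0.0000% | 0.049 | 0.0250% | NA | 0.0010 | 0.0004 | 0.0009 | 0.851 | 0.851
3 | DeLong | Null | 25/25 | 10 | 4 | 0.0000% | 0.051 | 0.0000% | 0.051 | 0.0010 | 0.0004 | 0.0009 | 0.851 | 0.851
4 | DeLong | Null | 50/50 | 3 | 4 | 0.0000% | 0.064 | 0.1875% | NA | 0.0005 | 0.0002 | 0.0010 | 0.851 | 0.852
5 | DeLong | Null | 50/50 | 5 | 4 | 0.0000% | 0.049 | 0.0250% | NA | 0.0005 | 0.0002 | 0.0009 | 0.851 | 0.852
6 | DeLong | Null | 50/50 | 10 | 4 | 0.0000% | 0.045 | 0.0000% | 0.045 | 0.0005 | 0.0002 | 0.0010 | 0.851 | 0.851
7 | DeLong | Identical-test | 25/25 | 3 | 4 | 0.0000% | 0.056 | 10.3630% | NA | 0.0009 | 0.0009 | -0.0001 | 0.852 | 0.852
8 | DeLong | Identical-test | 25/25 | 5 | 4 | 0.0000% | 0.043 | 4.6500% | NA | 0.0010 | 0.0010 | -0.0001 | 0.851 | 0.851
9 | DeLong | Identical-test | 25/25 | 10 | 4 | 0.0000% | 0.041 | 1.0630% | NA | 0.0010 | 0.0010 | -0.0001 | 0.851 | 0.851
10 | DeLong | Identical-test | 50/50 | 3 | 4 | 0.0000% | 0.053 | 7.1630% | NA | 0.0005 | 0.0005 | 0.0000 | 0.852 | 0.852
11 | DeLong | Identical-test | 50/50 | 5 | 4 | 0.0000% | 0.043 | 2.3000% | NA | 0.0005 | 0.0005 | 0.0000 | 0.851 | 0.851
12 | DeLong | Identical-test | 50/50 | 10 | 4 | 0.0000% | 0.042 | 0.3000% | NA | 0.0005 | 0.0005 | 0.0000 | 0.852 | 0.852
13 | Unbiased | Null | 25/25 | 3 | 4 | 0.0000% | 0.061 | 0.8375% | NA | 0.0009 | 0.0004 | 0.0009 | 0.852 | 0.852
14 | Unbiased | Null | 25/25 | 5 | 4 | 0.0000% | 0.049 | 0.0250% | NA | 0.0009 | 0.0004 | 0.0010 | 0.851 | 0.851
15 | Unbiased | Null | 25/25 | 10 | 4 | 0.0000% | 0.052 | 0.0000% | 0.052 | 0.0009 | 0.0004 | 0.0010 | 0.851 | 0.851
16 | Unbiased | Null | 50/50 | 3 | 4 | 0.0000% | 0.064 | 0.1875% | NA | 0.0005 | 0.0002 | 0.0010 | 0.851 | 0.852
17 | Unbiased | Null | 50/50 | 5 | 4 | 0.0000% | 0.049 | 0.0250% | NA | 0.0005 | 0.0002 | 0.0009 | 0.851 | 0.852
18 | Unbiased | Null | 50/50 | 10 | 4 | 0.0000% | 0.046 | 0.0000% | 0.046 | 0.0005 | 0.0002 | 0.0010 | 0.851 | 0.851
19 | Unbiased | Identical-test | 25/25 | 3 | 4 | 0.0000% | 0.056 | 10.3380% | NA | 0.0009 | 0.0009 | 0.0000 | 0.852 | 0.852
20 | Unbiased | Identical-test | 25/25 | 5 | 4 | 0.0000% | 0.044 | 4.5750% | NA | 0.0010 | 0.0010 | 0.0000 | 0.851 | 0.851
21 | Unbiased | Identical-test | 25/25 | 10 | 4 | 0.0000% | 0.041 | 1.0380% | NA | 0.0010 | 0.0010 | 0.0000 | 0.851 | 0.851
22 | Unbiased | Identical-test | 50/50 | 3 | 4 | 0.0000% | 0.053 | 7.1630% | NA | 0.0005 | 0.0005 | 0.0000 | 0.852 | 0.852
23 | Unbiased | Identical-test | 50/50 | 5 | 4 | 0.0000% | 0.043 | 2.3000% | NA | 0.0005 | 0.0005 | 0.0000 | 0.851 | 0.851
24 | Unbiased | Identical-test | 50/50 | 10 | 4 | 0.0000% | 0.042 | 0.2880% | NA | 0.0005 | 0.0005 | 0.0000 | 0.852 | 0.852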
Notes: “covariance” = error covariance method used with the OR method; “N” = number of parameter structures in Table 3 that results are averaged across; “25/25” indicates 25 nondiseased and 25 diseased cases; “type I” is the empirical type I error rate; “var < 0” is the proportion of samples where the variance estimate for the difference of the reader-averaged test AUCs is negative, and hence the OR F statistic is not defined; Cov2, Cov3, and στR:OR2 are OR parameter estimates (which are the same for the conventional and unconstrained OR methods); AUC1 and AUC2 are the empirical AUCs for tests 1 and 2, respectively.

Results of the simulations, presented in Table 4, include the empirical type I error rate and the negative-variance rate. The negative-variance rate is the proportion of samples having a negative variance estimate, as defined by Eq. (32) for the conventional OR method and by Eq. (36) for the unconstrained OR method. As in Table 2, a value of “NA” for the type I rate indicates that at least one sample had a negative variance estimate, and hence that the type I error rate is undefined. Table 4 also includes the averages of the empirical AUC estimates for tests 1 and 2 and the averages of the OR estimates for Cov2, Cov3, and στR:OR2; these last three estimates depend on the OR error covariance method but not on the use of conventional or unconstrained OR.
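
The reporting rule just described (a type I error rate is reported only when no sample has a negative variance estimate) can be sketched as follows. This is a minimal illustration, not the code used for the simulations; the function name and its inputs are hypothetical.

```python
def summarize(var_estimates, rejected):
    """Return (negative-variance rate, type I error rate or None).

    var_estimates: one variance estimate per simulated sample.
    rejected: one boolean per sample, True if the null hypothesis was rejected
              (ignored when any variance estimate is negative, because the
              F statistic is then undefined for at least one sample).
    """
    n = len(var_estimates)
    n_negative = sum(v < 0 for v in var_estimates)
    neg_rate = n_negative / n
    if n_negative > 0:
        return neg_rate, None          # reported as "NA" in Tables 2 and 4
    return neg_rate, sum(rejected) / n

# Toy inputs (not simulation output): one of four variances is negative.
print(summarize([0.001, 0.002, -0.0001, 0.003], [True, False, False, False]))
print(summarize([0.001, 0.002, 0.0001, 0.003], [True, False, False, False]))
```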

From Table 4, I make the following remarks.

  • 1. OR results are similar for DeLong and unbiased covariance methods. For the original RM null model, a comparison of rows 1 to 6 with 13 to 18 shows only slight differences between the DeLong and corresponding unbiased covariance method results. Similarly, there are only slight differences between the DeLong and unbiased covariance method results for the RM identical-test model, as can be seen from comparing rows 7 to 12 with 19 to 24.

  • 2. Conventional OR has acceptable type I rates and no negative variances. For the conventional OR method, type I error averages (across the four structures) are between 0.041 and 0.064, with the overall average type I error rate (not shown) equal to 0.050 for both the unbiased and DeLong methods. As shown in the “var < 0” column, none of the conventional OR variance estimates, computed using Eq. (32), were negative; this is as expected, since it is impossible for Eq. (32) to be negative.

  • 3. Unconstrained OR type I errors are undefined for most parameter combinations because of negative variances. For the unconstrained OR method using the DeLong covariance estimation method, 10 of the 12 sets of 8000 samples (2000 samples × 4 structures) included negative variance estimates (see rows 1-12, “Unconstrained OR” columns). As a result, type I error was not defined for any of the 6 identical-test parameter combinations, and was defined for only 2 of the 6 null model combinations (those with 10 readers, rows 3 and 6). For the identical-test model, the negative-variance (“var < 0”) rates for the unconstrained OR method range from 0.3% to 10.4%, with rates being higher for smaller numbers of cases and readers. For the null model the rates were much smaller, with the highest negative-variance rate equal to 0.9%; again, rates were higher for smaller numbers of cases and readers. The above comments also apply to the results for the OR method using the unbiased covariance estimation method in rows 13 to 24.

  • 4. OR parameter relationships for an identical-test model are validated. For the identical-test models, the Cov2 and Cov3 estimates are approximately equal and the OR test-by-reader variance component (στR:OR2) estimates are approximately zero, regardless of which covariance method is used. Also, the AUC estimates are approximately equal for each test. These empirical results validate the OR parameter relationships given by Eqs. (38)–(40) in Sec. 5.5 for identical-test models.

7.

Understanding How a Negative Variance Occurs

We can rewrite Eq. (36) in the form

Eq. (41)

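$$\widehat{\operatorname{var}}_{\mathrm{OR;unconstrained}}\left(\hat{\theta}_{1\bullet}-\hat{\theta}_{2\bullet}\right)=A+B,$$
where

Eq. (42)

$$A=\frac{2}{N_R}\,\mathrm{MS}(T{*}R),\qquad B=2\left(\widehat{\mathrm{Cov}}_2-\widehat{\mathrm{Cov}}_3\right).$$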

The A term will never be negative because MS(T*R) cannot be negative. Thus var^OR;unconstrained(θ^1•−θ^2•) can be negative only if Cov^2−Cov^3 is sufficiently negative to result in B<−A. For the unconstrained unequal-variance RM identical-test model, Cov^2 and Cov^3 have the same distributions; thus E(Cov^2−Cov^3)=0 and Cov^2−Cov^3 has a symmetric distribution about 0. It follows that B will be negative with probability 0.5, which is in agreement with the results in Table 2 where the negative Cov^2−Cov^3 rates are 50%.
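
The decomposition into A and B can be illustrated numerically. The sketch below computes the unconstrained variance estimate as A+B and, for comparison, a conventional estimate in which Cov^2−Cov^3 is replaced by zero when it is negative, as described in the Table 2 notes. The function name and the input values are hypothetical and chosen only to show a case with B<−A; they are not taken from the simulations.

```python
def or_variance_estimates(ms_tr, cov2_hat, cov3_hat, n_readers):
    """Return (unconstrained, conventional) OR variance estimates for the
    difference of the reader-averaged AUCs, written as A + B."""
    a = 2.0 * ms_tr / n_readers           # A >= 0 because MS(T*R) >= 0
    b = 2.0 * (cov2_hat - cov3_hat)       # B can be negative
    unconstrained = a + b                 # negative whenever B < -A
    conventional = a + max(b, 0.0)        # Cov2 - Cov3 constrained to be >= 0
    return unconstrained, conventional

# Hypothetical estimates with Cov2 < Cov3 by a wide margin.
u, c = or_variance_estimates(ms_tr=0.0004, cov2_hat=0.0005,
                             cov3_hat=0.0009, n_readers=4)
print(u, c)  # u is negative, c equals A
```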

It has been shown by Hillis,17 under the assumption of the unconstrained unequal-variance RM model, that

Eq. (43)

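$$E(A)=\frac{2}{N_R}\left[\left(\sigma^2_{\varepsilon:\mathrm{OR}}-\mathrm{Cov}_1\right)+\sigma^2_{\tau R:\mathrm{OR}}-\left(\mathrm{Cov}_2-\mathrm{Cov}_3\right)\right]=\frac{2\sigma^2_{\varepsilon:\mathrm{OR}}}{N_R}\left[(1-r_1)+\sigma^2_{\tau R:\mathrm{OR}}/\sigma^2_{\varepsilon:\mathrm{OR}}-(r_2-r_3)\right].$$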

Here and elsewhere in this section I often express Covi as riσε:OR2, i=1,2,3, where the ri are the OR correlations defined by Eq. (28), because it has been shown9 that these correlations remain approximately constant for a given RM model across different reader sample sizes and case sample sizes, making them easy to interpret.

To simplify the discussion, I now assume that the estimates Cov^2−Cov^3 are unbiased, which is the case when the unbiased error-covariance method is used with OR. Making this assumption, it follows from Eqs. (41)–(43) that

Eq. (44)

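$$E\left[\widehat{\operatorname{var}}_{\mathrm{OR;unconstrained}}\left(\hat{\theta}_{1\bullet}-\hat{\theta}_{2\bullet}\right)\right]=\frac{2\sigma^2_{\varepsilon:\mathrm{OR}}}{N_R}\left[(1-r_1)+\sigma^2_{\tau R:\mathrm{OR}}/\sigma^2_{\varepsilon:\mathrm{OR}}+(N_R-1)(r_2-r_3)\right].$$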
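
For readers who want the intermediate step, the algebra leading from Eqs. (41)–(43) to Eq. (44) can be written out as below. It uses only the unbiasedness assumption stated above, so that E(B)=2(Cov2−Cov3)=2σε:OR2(r2−r3).

```latex
\begin{aligned}
E\left[\widehat{\operatorname{var}}_{\mathrm{OR;unconstrained}}\right]
 &= E(A) + E(B) \\
 &= \frac{2\sigma^2_{\varepsilon:\mathrm{OR}}}{N_R}
    \left[(1-r_1) + \frac{\sigma^2_{\tau R:\mathrm{OR}}}{\sigma^2_{\varepsilon:\mathrm{OR}}} - (r_2-r_3)\right]
    + 2\sigma^2_{\varepsilon:\mathrm{OR}}\,(r_2-r_3) \\
 &= \frac{2\sigma^2_{\varepsilon:\mathrm{OR}}}{N_R}
    \left[(1-r_1) + \frac{\sigma^2_{\tau R:\mathrm{OR}}}{\sigma^2_{\varepsilon:\mathrm{OR}}} + (N_R-1)(r_2-r_3)\right].
\end{aligned}
```

The last line follows because 2σε:OR2(r2−r3) equals (2σε:OR2/NR)·NR(r2−r3), and NR(r2−r3)−(r2−r3)=(NR−1)(r2−r3).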

Although Eq. (44) assumes unbiased Cov2 and Cov3 estimates, typically we expect the right side of Eq. (44) to approximate the left side when a reasonable alternative error covariance estimation method is used, such as the jackknife or DeLong method.

From Eq. (44) it follows that E[var^OR;unconstrained(θ^1•−θ^2•)] increases as (r2−r3) and στR:OR2 increase, assuming all other parameters in the model remain the same. Thus, recalling that for an RM identical-test model (r2−r3)=0 and στR:OR2=0, it seems likely that the probability of a negative variance will decrease as (r2−r3) or στR:OR2 increase. On the other hand, because Eq. (44) does not depend on the difference of the AUCs, as shown by the omission of τ1:OR and τ2:OR, there is no indication that the probability of a negative variance will decrease or increase as the magnitude of the AUC difference increases.
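
To make the monotonicity argument concrete, the right side of Eq. (44) can be evaluated directly. The sketch below is an illustration only; the function name is hypothetical, and the inputs (σε:OR2=0.00215, r1=0.510, 4 readers) are borrowed from the Kundel-based parameter values in Table 5(b).

```python
def expected_unconstrained_variance(var_eps, r1, r2_minus_r3, var_tr, n_readers):
    """Right side of Eq. (44): expected unconstrained OR variance estimate,
    assuming unbiased Cov2 and Cov3 estimates."""
    return (2.0 * var_eps / n_readers) * (
        (1.0 - r1) + var_tr / var_eps + (n_readers - 1) * r2_minus_r3
    )

base = dict(var_eps=0.00215, r1=0.510, n_readers=4)
for d in (0.00, 0.01, 0.04):             # levels of r2 - r3
    for v in (0.0000, 0.0002):           # levels of the test-by-reader variance
        value = expected_unconstrained_variance(r2_minus_r3=d, var_tr=v, **base)
        print(d, v, value)
```

The expected value grows with either r2−r3 or στR:OR2, which is the sense in which a negative estimate becomes less likely as these parameters move away from zero; the AUC difference does not enter the calculation.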

8.

Simulation Studies for Examining Effects of AUC1−AUC2, r2−r3, and στR:OR2 on Negative Variance Rates

8.1.

Purpose

The simulations in Sec. 6 established the usefulness of the identical-test RM model for detecting the negative variance problem inherent in using the unconstrained OR procedure. A natural follow-up question to ask is, “to what extent does the unconstrained OR procedure have this problem when the conditions of the identical-test RM model are not exactly satisfied?” The purpose of this section is to empirically address this question by simulating data from RM simulation models that are not identical-test RM models.

As discussed in Sec. 5.5, data simulated from an identical-test RM model result in AUC estimates such that three conditions are true: (1) the tests have equal expected AUCs; (2) the OR Cov2 and Cov3 parameters are equal, or equivalently, r2−r3=0 where r2 and r3 are the OR correlations defined by Eq. (28); and (3) the OR test-by-reader interaction variance component στR:OR2 is zero. These conditions are implied by Eqs. (38)–(40). In this section I simulate data from RM models that have been formulated such that not all of these conditions are true, and thus none of the simulation models are identical-test RM models. The results of these simulations will allow us to answer the question posed in the previous paragraph, as well as to provide support for the conjectures in Sec. 7 regarding the associations between each of the three conditions and the negative variance rate.

8.2.

Simulations

8.2.1.

Overview

Data are simulated that result in OR distributions with parameter values similar to those estimated for two real datasets that are analyzed by the unconstrained OR method with empirical AUC as the outcome and using the unbiased error-covariance method. In each of the 2 examples, 10,000 MRMC samples are simulated from each of 8 different constrained unequal-variance RM models, with each resulting empirical AUC distribution corresponding to one of the eight possible combinations of 2 levels each for r2−r3, στR:OR2, and AUC1−AUC2. The two levels are 0.01 and 0.04 for r2−r3, 0.0000 and 0.0002 for στR:OR2, and 0.00 and 0.04 for the magnitude of AUC1−AUC2 (note that AUCi=μOR+τi:OR). All of these values are representative of real datasets. The case and reader sample sizes for the simulated MRMC samples are the same as for the original datasets. Although r^2−r^3<0 for both of the original datasets, a negative value for r2−r3 is not included as one of the study design parameters because the OR model assumes r2−r3≥0.
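
The 2×2×2 layout of the design can be enumerated with a short sketch. The variable names are hypothetical; only the factor levels come from the text above, and the order of enumeration is not meant to match the row order of the tables that follow.

```python
from itertools import product

R2_MINUS_R3_LEVELS = (0.01, 0.04)
VAR_TR_OR_LEVELS = (0.0000, 0.0002)      # OR test-by-reader variance component
AUC_DIFF_MAGNITUDES = (0.00, 0.04)       # magnitude of AUC1 - AUC2

# One dictionary per simulated OR parameter combination (8 in total).
design = [
    {"r2_minus_r3": d, "var_tr_or": v, "auc_diff_magnitude": a}
    for d, v, a in product(R2_MINUS_R3_LEVELS, VAR_TR_OR_LEVELS, AUC_DIFF_MAGNITUDES)
]
for row in design:
    print(row)
```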

8.2.2.

Example: simulations based on Kundel dataset

Kundel et al.20 compared reader AUCs for hard-copy and soft-copy computed radiograph chest images selected randomly from a medical intensive care unit. Four radiologists blindly read both types of images obtained from the same patients. Six months separated the end of the hard-copy readings and the start of the soft-copy readings. A five-point ordinal scale was used to rate the likelihood of the presence of the condition (which we will consider to be the disease) implied by the reason for requesting the corresponding examination. Ninety-five images, consisting of 29 diseased and 66 nondiseased images, were read under each test condition. The difference of the empirical AUC estimates was 0.0375 (p=0.0916) and r^2−r^3 was 0.38−0.40=−0.02, computed from a conventional OR analysis using the unbiased covariance method. The unconstrained variance estimate was not negative.

The OR parameter estimates for the original data, using the unbiased covariance method, are shown in Table 5(a). In Table 5(b) are eight sets of parameter values similar to the original data estimates, corresponding to the eight possible combinations of the levels of r2−r3, στR:OR2, and AUC1−AUC2. Table 5(c) presents constrained unequal-variance RM model parameter values that result in simulated data that can be described by the OR parameters in Table 5(b); these were computed using the algorithm developed by Hillis et al.21 Because some of these RM models are not null models, μpos in Eq. (1) is replaced by μi, i=1,2; thus μi is the expected difference in the means for the diseased and nondiseased DV distributions for test i.

Table 5

Simulations based on Kundel20 dataset showing effects of r2−r3, στR:OR2 and AUC1−AUC2 on negative variance rates. In this study, 4 readers read the same 29 diseased and 66 nondiseased cases.

(a) OR estimates for original Kundel20 data set computed using the unbiased covariance method with empirical AUC as the outcome. The study included 4 readers and 29 diseased and 66 nondiseased cases.
AUC^1 | AUC^2 | σ^R:OR2 | σ^τR:OR2 | σ^ε:OR2 | r^1 | r^2 | r^3
0.804 | 0.841 | 0.000734 | 0.00000 | 0.00215 | 0.508 | 0.384 | 0.404
(b) OR parameter values describing simulated data sets.
AUC1 | AUC2 | σR:OR2 | στR:OR2 | σε:OR2 | r1 | r2 | r3
0.800 | 0.840 | 0.000734 | 0.00000 | 0.00215 | 0.510 | 0.390 | 0.380
0.820 | 0.820 | 0.000734 | 0.00000 | 0.00215 | 0.510 | 0.390 | 0.380
0.800 | 0.840 | 0.000734 | 0.00000 | 0.00215 | 0.510 | 0.405 | 0.365
0.820 | 0.820 | 0.000734 | 0.00000 | 0.00215 | 0.510 | 0.405 | 0.365
0.800 | 0.840 | 0.000734 | 0.00020 | 0.00215 | 0.510 | 0.390 | 0.380
0.820 | 0.820 | 0.000734 | 0.00020 | 0.00215 | 0.510 | 0.390 | 0.380
0.800 | 0.840 | 0.000734 | 0.00020 | 0.00215 | 0.510 | 0.405 | 0.365
0.820 | 0.820 | 0.000734 | 0.00020 | 0.00215 | 0.510 | 0.405 | 0.365
(c) Parameter values for constrained unequal-variance RM models used to simulate MRMC data sets corresponding to OR parameters in (b). Because some of these are not null models, μpos in Eq. (1) has been replaced by μi, i=1,2.
μ1 | μ2 | σR2 | στR2 | σC2 | στC2 | σRC2 | σε2 | b
1.209354 | 1.428970 | 0.011075 | 0 | 0.442032 | 0.006772 | 0.1303248 | 0.4208705 | 0.979343
1.312015 | 1.312015 | 0.010902 | 0 | 0.438347 | 0.010547 | 0.1293156 | 0.4217907 | 0.984081
1.209354 | 1.428970 | 0.011075 | 0 | 0.425999 | 0.038545 | 0.1463586 | 0.3890982 | 0.979343
1.312015 | 1.312015 | 0.010902 | 0 | 0.422441 | 0.042189 | 0.1452217 | 0.3901491 | 0.984081
1.212481 | 1.432665 | 0.011132 | 0.002879 | 0.442880 | 0.006785 | 0.1305745 | 0.4197602 | 0.977088
1.315516 | 1.315516 | 0.010960 | 0.002970 | 0.439219 | 0.010568 | 0.1295731 | 0.4206395 | 0.981743
1.212481 | 1.432665 | 0.011132 | 0.002879 | 0.426816 | 0.038619 | 0.1466390 | 0.3879270 | 0.977088
1.315516 | 1.315516 | 0.010960 | 0.002970 | 0.423282 | 0.042273 | 0.1455109 | 0.3889350 | 0.981743
(d) Estimates corresponding to OR parameters in (b) resulting from OR analysis of AUC data simulated using the RM models in (c)
AUC^1 | AUC^2 | σ^R:OR2 | σ^τR:OR2 | σ^ε:OR2 | r^1 | r^2 | r^3 | var < 0 rate | r2 < r3 rate
0.800 | 0.840 | 0.000722 | 0.00002 | 0.00215 | 0.510 | 0.390 | 0.380 | 4.2% | 40.4%
0.820 | 0.820 | 0.000728 | 0.00001 | 0.00215 | 0.510 | 0.389 | 0.379 | 3.3% | 40.2%
0.800 | 0.840 | 0.00074 | 0.00001 | 0.00214 | 0.511 | 0.405 | 0.364 | 1.0% | 13.6%
0.820 | 0.820 | 0.000736 | -0.00001 | 0.00216 | 0.509 | 0.405 | 0.365 | 1.0% | 15.2%
0.799 | 0.839 | 0.000761 | 0.00018 | 0.00216 | 0.510 | 0.390 | 0.380 | 3.2% | 40.5%
0.820 | 0.820 | 0.000743 | 0.00021 | 0.00215 | 0.510 | 0.391 | 0.381 | 3.1% | 39.7%
0.800 | 0.840 | 0.000755 | 0.00019 | 0.00215 | 0.511 | 0.404 | 0.365 | 0.8% | 14.4%
0.820 | 0.819 | 0.000745 | 0.00021 | 0.00216 | 0.510 | 0.406 | 0.365 | 0.8% | 13.7%
Note: “(var < 0) rate” = proportion of samples where var^OR;unconstrained(θ^1•−θ^2•)<0.

Table 5(d) presents the OR parameter estimates, the negative variance rates, and the negative r^2−r^3 rates computed from the simulated data, based on the RM models in Table 5(c). The excellent agreement between Tables 5(b) and 5(d) confirms that the RM model parameter values in Table 5(c) were appropriately chosen. In Table 5(d) negative variance rates range between 0.8% and 4.2% and negative r^2−r^3 rates range between 14% and 41%.

Figure 1 displays a plot of the negative variance rate for each combination of the true values of r2−r3, στR:OR2, and AUC1−AUC2. The labels on the x-axis indicate the levels of r2−r3 and στR:OR2, with “LL” indicating both at the lowest level, “LH” indicating the low level of r2−r3 and the high level of στR:OR2, etc. From Fig. 1, we see that higher negative variance rates are associated with lower levels of both r2−r3 and στR:OR2, in agreement with the conjectures in the previous section. The effect of AUC1−AUC2 is minimal except when both (r2−r3) and στR:OR2 are at their lowest levels, as shown by the first pair of points; for this situation the negative variance rate is higher for the larger magnitude of AUC1−AUC2. As noted in the previous section, there was no indication from Eq. (44) as to whether there would be any effect from AUC1−AUC2.

Fig. 1

Negative unconstrained OR AUC-difference variance rates, as defined by Eq. (36), for different combinations of OR parameter values. RM model parameter values used for simulating the data are similar to estimates from an OR analysis of the Kundel et al.20 data. There were 10,000 simulated MRMC samples for each set of RM parameter values, with each sample corresponding to 4 readers reading the same 29 diseased and 66 nondiseased images.

8.2.3.

Example: simulations based on Franken dataset

Franken et al.22 compared the diagnostic accuracy of interpreting clinical neonatal radiographs using a picture archiving and communication system workstation versus plain film. The case sample consisted of 100 chest or abdominal radiographs (67 abnormal and 33 normal). The readers were four radiologists with considerable experience in interpreting neonatal examinations. The readers indicated whether each patient had normal or abnormal findings and their degree of confidence in this judgment using a 5-point ordinal scale. The difference of the empirical AUC estimates was 0.0109 (p=0.1188) and r^2−r^3 was 0.32−0.34=−0.02, computed from a conventional OR analysis using the unbiased covariance method. The unconstrained variance estimate was negative.

Table 6 gives results for this dataset in the same format as Table 5. Similar to Table 5, there is excellent agreement between Tables 6(b) and 6(d) that confirms that the RM model parameter values in Table 6(c) were appropriately chosen. In Table 6(d), negative variance rates range between 0.3% and 2.2% and negative r^2−r^3 rates range between 7% and 36%.

Table 6

Simulations based on Franken et al.22 dataset showing effects of r2−r3, στR:OR2, and AUC1−AUC2 on negative variance rates. In this study, 4 readers read the same 67 diseased and 33 nondiseased cases.

(a) OR estimates for original Franken et al.22 data set computed using the unbiased covariance method with empirical AUC as the outcome.
AUC^1 | AUC^2 | σ^R:OR2 | σ^τR:OR2 | σ^ε:OR2 | r^1 | r^2 | r^3
0.848 | 0.837 | 0.0000433 | 0.00000 | 0.00150 | 0.521 | 0.320 | 0.339
(b) OR parameter values describing simulated data sets.
AUC1 | AUC2 | σR:OR2 | στR:OR2 | σε:OR2 | r1 | r2 | r3
0.860 | 0.820 | 0.0000433 | 0.00000 | 0.00152 | 0.520 | 0.335 | 0.325
0.840 | 0.840 | 0.0000433 | 0.00000 | 0.00152 | 0.510 | 0.335 | 0.325
0.860 | 0.820 | 0.0000433 | 0.00000 | 0.00152 | 0.510 | 0.350 | 0.310
0.840 | 0.840 | 0.0000433 | 0.00000 | 0.00152 | 0.510 | 0.350 | 0.310
0.860 | 0.820 | 0.0000433 | 0.00020 | 0.00152 | 0.510 | 0.335 | 0.325
0.840 | 0.840 | 0.0000433 | 0.00020 | 0.00152 | 0.510 | 0.335 | 0.325
0.860 | 0.820 | 0.0000433 | 0.00020 | 0.00152 | 0.510 | 0.350 | 0.310
0.840 | 0.840 | 0.0000433 | 0.00020 | 0.00152 | 0.510 | 0.350 | 0.310
(c) Parameter values for constrained unequal-variance RM models used to simulate data corresponding to OR parameters in (b).
μ1 | μ2 | σR2 | στR2 | σC2 | στC2 | σRC2 | σε2 | b
2.019568 | 1.711200 | 0.001295 | 0 | 0.395881 | 0.006392 | 0.204401 | 0.3933260 | 0.633453
1.883984 | 1.883984 | 0.001312 | 0 | 0.392750 | 0.010942 | 0.192627 | 0.4036811 | 0.621797
2.019568 | 1.711200 | 0.001295 | 0 | 0.379211 | 0.039340 | 0.211097 | 0.3703523 | 0.633453
1.883984 | 1.883984 | 0.001312 | 0 | 0.376222 | 0.043770 | 0.209155 | 0.3708535 | 0.621797
2.010514 | 1.703528 | 0.001283 | 0.005820 | 0.396346 | 0.006403 | 0.194848 | 0.4024022 | 0.638974
1.874190 | 1.874190 | 0.001298 | 0.005985 | 0.393143 | 0.010961 | 0.193039 | 0.4028567 | 0.627792
2.010514 | 1.703528 | 0.001283 | 0.005820 | 0.379646 | 0.039412 | 0.211548 | 0.3693941 | 0.638974
1.874190 | 1.874190 | 0.001298 | 0.005985 | 0.376588 | 0.043846 | 0.209595 | 0.3699721 | 0.627792
(d) Estimates corresponding to OR parameters in (b) resulting from OR analysis of AUC data simulated using the RM models in (c)
AUC^1 | AUC^2 | σ^R:OR2 | σ^τR:OR2 | σ^ε:OR2 | r^1 | r^2 | r^3 | var < 0 rate | r2 < r3 rate
0.860 | 0.820 | 0.000041688 | 0.00000 | 0.00152 | 0.520 | 0.334 | 0.325 | 2.24% | 36%
0.840 | 0.840 | 0.000043018 | 0.00001 | 0.00152 | 0.511 | 0.335 | 0.325 | 2.16% | 36%
0.860 | 0.820 | 0.000046678 | 0.00000 | 0.00152 | 0.509 | 0.349 | 0.309 | 0.31% | 8%
0.841 | 0.840 | 0.000037359 | 0.00001 | 0.00152 | 0.509 | 0.349 | 0.309 | 0.29% | 8%
0.860 | 0.819 | 0.000049993 | 0.00019 | 0.00152 | 0.510 | 0.336 | 0.326 | 1.55% | 36%
0.840 | 0.840 | 0.000049150 | 0.00020 | 0.00152 | 0.510 | 0.336 | 0.325 | 1.48% | 36%
0.860 | 0.820 | 0.000024317 | 0.00020 | 0.00152 | 0.510 | 0.350 | 0.310 | 0.25% | 8%
0.841 | 0.840 | 0.000052278 | 0.00020 | 0.00152 | 0.509 | 0.350 | 0.310 | 0.27% | 7%
Note: “(var < 0) rate” = proportion of samples where var^OR;unconstrained(θ^1•−θ^2•)<0.

Figure 2 displays a plot of the negative variance rate for each combination of the true values of r2−r3, στR:OR2, and AUC1−AUC2. Similar to Fig. 1, higher negative variance rates are associated with lower levels of both r2−r3 and στR:OR2, in agreement with the conjectures in the previous section. The effect of AUC1−AUC2 is minimal, with the negative variance rate slightly higher for the larger magnitude of AUC1−AUC2 when r2−r3 is at its low level, as shown by the first two pairs of points.

Fig. 2

Negative unconstrained OR AUC-difference variance rates, as defined by Eq. (36), for different combinations of OR parameter values. RM model parameter values used for simulating the data are similar to estimates from an OR analysis of the Franken et al.22 data. There were 10,000 simulated MRMC samples for each set of RM parameter values, with each sample corresponding to 4 readers reading the same 67 abnormal and 33 normal chest or abdominal radiographs.

8.2.4.

Summary of simulation results

From the results of the simulations in Secs. 8.2.2 and 8.2.3, we saw that the negative variance problem for the unconstrained OR method is present even when the conditions in Eqs. (38)–(40), which are implied by the identical-test RM model, do not hold. Moreover, the simulations supported the conjectures given in Sec. 7. Specifically, we saw that while higher negative variance rates were associated with lower levels of both r2−r3 and στR:OR2, there was little association with the magnitude of AUC1−AUC2. Surprisingly, where the magnitude of the AUC difference had its largest effect, shown by the first pair of points in Figs. 1 and 2, the negative variance rate was higher for the larger AUC difference magnitude of 0.04.

In summary, these results suggest that negative variance estimates can be a problem for the unconstrained OR procedure when r2−r3 and στR:OR2 are small, regardless of the difference in the AUCs, with the negative variance rate diminishing with increasing numbers of readers and cases. However, we caution that these findings are based on only two simulation studies and will need to be confirmed by additional studies.

9.

Summary and Discussion

Sometimes it is of interest to compare two tests that may be similar in most respects, such as when noninferiority or equivalence testing is appropriate. For this situation it is important to be able to assess how well a particular MRMC analysis method performs, and hence there is a need for simulation models that emulate this situation. This need was the motivation for developing the RM identical-test model, where the two tests are exactly the same.

The derivation of the RM identical-test model from a particular RM null model was straightforward: simply change the test subscript for all of the RM null model effects to 1, which results in none of the test effects depending on test. This derivation was illustrated for the unconstrained unequal-variance RM model,9 which includes the constrained equal-variance6 and original5 RM null models as special cases. It was shown that the null and corresponding identical-test model rating distributions are the same and the within-test covariances are the same, but the between-test covariances for the null model can be less than those for the identical-test model. In terms of the reader empirical AUCs computed from ratings generated from the identical-test model, it was shown that the expected test empirical AUC estimates are equal, Cov2 and Cov3 are equal, and στR:OR2=0.

The RM identical-test simulations showed how the performance of the unconstrained OR method is unacceptable because of a nontrivial percentage of negative variance estimates. Because negative estimates can occur, the significance level cannot be estimated unless the action to be taken when a negative estimate occurs has been specified in advance of the analysis and is incorporated into the simulation study. In contrast, the conventional OR method did not have the negative variance problem because its variance estimate can never be negative, and it had an acceptable type I error rate.

The original RM null model5 simulations also revealed that the unconstrained OR variance estimate could be negative, but the rates were much smaller than for the identical-test model.

Of course, in practice we would rarely expect two tests to be identical. But if an analysis method does not perform satisfactorily when two tests are exactly the same, then it seems likely that the performance will also not be acceptable when the tests are “close” to being identical. This situation was illustrated in the simulations in Sec. 8, where RM models were created to result in OR distributions somewhat similar to those for two real datasets. In both of those examples, there were nontrivial rates of negative variance estimates (3.2% and 1.55%) with moderate deviations from an identical-test model with respect to two categories (AUC1−AUC2=0.04 and στR:OR2=0.0002) and a slight deviation with respect to the other category (r2−r3=0.01). Furthermore, the results of the two examples in Sec. 8 suggest that increasing the AUC difference does not reduce the negative-variance rate; if future research shows this relationship to hold in general, then this result implies that negative variance rates can be nontrivial even when the AUC difference is substantial.

Although there has never been any suggestion in the literature that the unconstrained version of OR should be used instead of the conventional version, the findings of this paper are relevant because of the relationship between the unconstrained OR method and the often-used Gallas analysis method. As discussed in Sec. 5, recently17 it has been shown that the Gallas test statistic for comparing two tests is equivalent to the unconstrained OR test statistic when empirical AUC is the outcome and the unbiased error-covariance method is used. Thus we recommend that the Gallas method not be used. For the Gallas method to be a statistically acceptable method, there would have to be a defined follow-up analysis procedure to use if the Gallas variance is negative, as well as simulation studies validating the performance of this two-step approach.

In my opinion, it is much easier to interpret an RM model in terms of the OR parameter values describing the resulting empirical AUC distribution based on data simulated from the model, as opposed to interpreting the RM parameter values in terms of the distribution of the confidence-of-disease ratings. It was shown in Sec. 5.5 that an unconstrained unequal-variance RM identical-test model will have an empirical AUC distribution with Cov2−Cov3=0 (or equivalently, r2−r3=0), no reader-by-test interaction, and equal expected test AUC values. These three OR relationships are intuitively obvious for identical tests and they can be thought of as the criteria for two tests to be considered identical in terms of their empirical AUC distributions.

In contrast, it has been shown [Ref. 9, Tables 4 and 6] that the original5 12 sets of RM model parameter values lead to OR distributions with identical expected test AUC values, but with 0.05<r2−r3<0.29, and for 10 of the sets, 0.000287<στR:OR2<0.001629; for the other 2 sets, στR:OR2=0.00004. To obtain some perspective on the size of the interaction variance component, we note that στR:OR2=0.0002 implies that the middle 95% probability range for the true AUC1−AUC2 difference for a randomly selected reader has a width of about 0.08, as discussed in Hillis and Schartz;23 for this reason, we consider στR:OR2=0.0002 to be at least moderate test-by-reader interaction. Thus, in terms of the 3 OR identical-test criteria, 10 of the 12 original RM parameter sets describe tests that are similar with respect only to the OR equal-test AUC criterion, with the other 2 sets describing tests also approximately similar with respect to the OR στR:OR2=0 criterion. But none of them describes tests that are approximately similar with respect to all three OR criteria.
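
The 0.08 figure quoted above can be checked with a back-of-the-envelope calculation. The sketch below assumes, as an illustration rather than a restatement of the Hillis and Schartz derivation, that the reader-specific true AUC difference is normally distributed with variance 2στR:OR2 (the variance of the difference of two independent test-by-reader effects); the function name is hypothetical.

```python
from math import sqrt

def middle_95_range_width(var_tr_or):
    """Width of the middle 95% range of reader-specific true AUC differences,
    assuming a normal distribution with variance 2 * var_tr_or."""
    sd_diff = sqrt(2.0 * var_tr_or)
    return 2.0 * 1.96 * sd_diff

print(round(middle_95_range_width(0.0002), 4))
```

Under this assumption the standard deviation of the difference is 0.02, so the range extends about 0.04 on either side of the mean difference.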

In summary, the RM identical-test model is useful because it allows for assessment of an MRMC analysis method for the situation where the two tests are identical and it is easy to derive from a previously formulated RM null model. Ideally, an MRMC analysis method would be assessed with respect to a wide range of underlying rating models. Thus the RM identical-test model should typically be used in conjunction with other RM models.

For brevity, results of the simulation studies in this paper have been limited to the minimum needed to accomplish the two purposes of the paper: to show how to formulate an identical-test RM model and to show its usefulness for validating the need for the OR error covariance constraints. For example, a more extensive analysis might include estimating the type I error, not just for the two-sided nonequivalence set of hypotheses, but also for the noninferiority and equivalence sets of hypotheses; reporting results in Tables 4 and 5 for each structure instead of averaging across the four structures; and reporting results for more combinations of RM parameter values.

Finally, for future research, I recommend creating a new set of RM model parameter sets that correspond to real datasets, as was done in Sec. 8. Doing this will allow for a better understanding of what types of studies are emulated by the simulated data. An algorithm has recently been developed [Ref. 21] that maps OR parameter estimates obtained from real datasets to constrained unequal-variance RM model parameter values; this algorithm can be easily implemented using the R function OR_to_RMH, available in the R package MRMCaov [Ref. 19]. This algorithm was utilized in Sec. 8 to create the RM parameter values corresponding to the two real datasets.

10. Appendix A: Derivation of the Relationships Between the OR Model and Unconstrained Unequal-Variance RM Identical-Test Model Given in Sec. 5.5

The OR parameters that describe the distribution of the empirical AUC estimates computed from MRMC data simulated from the unconstrained unequal-variance RM model have been expressed as functions of the RM model parameters by Hillis [Ref. 9]. Table 7 presents the relationships for the three OR parameters given in Sec. 5.5.

Table 7

The OR Cov2 and Cov3 parameters corresponding to empirical AUC estimates computed from MRMC data simulated from the unconstrained unequal-variance RM model, expressed as functions of the RM model parameters.

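$$\mu_{\mathrm{OR}} + \tau_{i:\mathrm{OR}} = \Phi\!\left(\frac{\delta_i}{\sqrt{V}}\right)$$

$$\mathrm{Cov}_2 = \frac{1}{2}\sum_{i=1}^{2}\sum_{m=1}^{4} c_m\, F_{\mathrm{BVN}}\!\left(\frac{\delta_i}{\sqrt{V}}, \frac{\delta_i}{\sqrt{V}};\ \frac{\rho_m\left(\sigma^2_{\mathrm{fixed}(-)} + \sigma^2_{\mathrm{fixed}(+)}\right)}{V}\right)$$

where $\rho_1 = \dfrac{\sigma^2_{\tau C(-)} + \sigma^2_{C(-)} + \sigma^2_{\tau C(+)} + \sigma^2_{C(+)}}{\sigma^2_{\mathrm{fixed}(-)} + \sigma^2_{\mathrm{fixed}(+)}}$,  $\rho_2 = \dfrac{\sigma^2_{\tau C(-)} + \sigma^2_{C(-)}}{\sigma^2_{\mathrm{fixed}(-)} + \sigma^2_{\mathrm{fixed}(+)}}$,  $\rho_3 = \dfrac{\sigma^2_{\tau C(+)} + \sigma^2_{C(+)}}{\sigma^2_{\mathrm{fixed}(-)} + \sigma^2_{\mathrm{fixed}(+)}}$,  $\rho_4 = 0$

$$\mathrm{Cov}_3 = \sum_{m=1}^{4} c_m\, F_{\mathrm{BVN}}\!\left(\frac{\delta_1}{\sqrt{V}}, \frac{\delta_2}{\sqrt{V}};\ \frac{\rho_m\left(\sigma^2_{\mathrm{fixed}(-)} + \sigma^2_{\mathrm{fixed}(+)}\right)}{V}\right)$$

where $\rho_1 = \dfrac{\sigma^2_{C(-)} + \sigma^2_{C(+)}}{\sigma^2_{\mathrm{fixed}(-)} + \sigma^2_{\mathrm{fixed}(+)}}$,  $\rho_2 = \dfrac{\sigma^2_{C(-)}}{\sigma^2_{\mathrm{fixed}(-)} + \sigma^2_{\mathrm{fixed}(+)}}$,  $\rho_3 = \dfrac{\sigma^2_{C(+)}}{\sigma^2_{\mathrm{fixed}(-)} + \sigma^2_{\mathrm{fixed}(+)}}$,  $\rho_4 = 0$

$$\sigma^2_{R:\mathrm{OR}} = F_{\mathrm{BVN}}\!\left(\frac{\delta_1}{\sqrt{V}}, \frac{\delta_2}{\sqrt{V}};\ \frac{2\sigma^2_R}{V}\right) - \Phi\!\left(\frac{\delta_1}{\sqrt{V}}\right)\Phi\!\left(\frac{\delta_2}{\sqrt{V}}\right)$$

$$\sigma^2_{\tau R:\mathrm{OR}} = .5\sum_{i=1}^{2}\left\{F_{\mathrm{BVN}}\!\left(\frac{\delta_i}{\sqrt{V}}, \frac{\delta_i}{\sqrt{V}};\ \frac{2\left(\sigma^2_R + \sigma^2_{\tau R}\right)}{V}\right) - \left[\Phi\!\left(\frac{\delta_i}{\sqrt{V}}\right)\right]^2\right\} - \sigma^2_{R:\mathrm{OR}}$$

Notes: these results are taken from Table 3 in Hillis [Ref. 9]. $F_{\mathrm{BVN}}(\cdot,\cdot;\rho)$ is the standardized bivariate normal distribution function with correlation $\rho$; $\delta_i = \mu_+ + \tau_{i+}$; $V = \sigma^2_{\mathrm{fixed}(-)} + \sigma^2_{\mathrm{fixed}(+)} + 2(\sigma^2_R + \sigma^2_{\tau R})$, where $\sigma^2_{\mathrm{fixed}(-)} = \sigma^2_{C(-)} + \sigma^2_{\tau C(-)} + \sigma^2_{RC(-)} + \sigma^2_{\varepsilon(-)}$ and $\sigma^2_{\mathrm{fixed}(+)} = \sigma^2_{C(+)} + \sigma^2_{\tau C(+)} + \sigma^2_{RC(+)} + \sigma^2_{\varepsilon(+)}$; $c_1 = 1/(n_0 n_1)$; $c_2 = (n_1 - 1)/(n_0 n_1)$; $c_3 = (n_0 - 1)/(n_0 n_1)$; and $c_4 = (1 - n_0 - n_1)/(n_0 n_1)$.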

The unconstrained unequal-variance RM model assumed in Table 7 is the same as the unconstrained unequal-variance RM null model discussed in Sec. 5.5, but with the addition of the test-by-truth interaction effect $\tau_{it}$ to the mixed linear model Eq. (1). It follows that the expected difference between the nondiseased and diseased decision-variable distributions is $\delta_1 = \mu_+ + \tau_{1+}$ for test 1 and $\delta_2 = \mu_+ + \tau_{2+}$ for test 2.

Relationships Eqs. (38)–(40) in Sec. 5.5 are for the unconstrained unequal-variance RM identical-test model, which is the same as the model assumed in Table 7 with the following constraints imposed:

$$\sigma^2_{\tau R} = \sigma^2_{\tau C(+)} = \sigma^2_{\tau C(-)} = \sigma^2_{\tau RC(+)} = \sigma^2_{\tau RC(-)} = 0, \tag{45}$$

and

$$\delta_1 = \delta_2. \tag{46}$$

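Relationships Eqs. (38)–(40) follow directly from the results in Table 7 when constraints Eqs. (45) and (46) are imposed. Specifically,

$$\mathrm{Cov}_2 = \mathrm{Cov}_3 = \sum_{m=1}^{4} c_m\, F_{\mathrm{BVN}}\!\left(\frac{\delta_1}{\sqrt{V}}, \frac{\delta_1}{\sqrt{V}};\ \frac{\rho_m\left(\sigma^2_{\mathrm{fixed}(-)} + \sigma^2_{\mathrm{fixed}(+)}\right)}{V}\right),$$

where

$$\delta_1 = \mu_+ + \tau_{1+},\quad \rho_1 = \frac{\sigma^2_{C(-)} + \sigma^2_{C(+)}}{\sigma^2_{\mathrm{fixed}(-)} + \sigma^2_{\mathrm{fixed}(+)}},\quad \rho_2 = \frac{\sigma^2_{C(-)}}{\sigma^2_{\mathrm{fixed}(-)} + \sigma^2_{\mathrm{fixed}(+)}},\quad \rho_3 = \frac{\sigma^2_{C(+)}}{\sigma^2_{\mathrm{fixed}(-)} + \sigma^2_{\mathrm{fixed}(+)}},\quad \rho_4 = 0, \tag{47}$$

and

$$\mu_{\mathrm{OR}} + \tau_{1:\mathrm{OR}} = \mu_{\mathrm{OR}} + \tau_{2:\mathrm{OR}} = \Phi\!\left(\frac{\delta_1}{\sqrt{V}}\right),$$

where

$$V = \sigma^2_{\mathrm{fixed}(-)} + \sigma^2_{\mathrm{fixed}(+)} + 2(\sigma^2_R + \sigma^2_{\tau R}) = \sigma^2_{C(-)} + \sigma^2_{RC(-)} + \sigma^2_{\varepsilon(-)} + \sigma^2_{C(+)} + \sigma^2_{RC(+)} + \sigma^2_{\varepsilon(+)} + 2(\sigma^2_R + \sigma^2_{\tau R}).$$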

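Table 8

The 12 original sets of Roe and Metz (RM) [Ref. 5] null simulation model parameter values and the corresponding RM identical-test model parameter values. Table 3 is a subset of this table.

(a) Original RM null model parameter values

| Line | Structure | $\mu_+$ | $A_z$ | $\sigma^2_C$ | $\sigma^2_{\tau C}$ | $\sigma^2_{RC}$ | $\sigma^2_\varepsilon$ | $\sigma^2_R$ | $\sigma^2_{\tau R}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | HL | 0.75 | 0.702 | 0.3 | 0.3 | 0.2 | 0.2 | 0.0055 | 0.0055 |
| 2 | LL | 0.75 | 0.702 | 0.1 | 0.1 | 0.2 | 0.6 | 0.0055 | 0.0055 |
| 3 | HH | 0.75 | 0.702 | 0.3 | 0.3 | 0.2 | 0.2 | 0.0110 | 0.0110 |
| 4 | LH | 0.75 | 0.702 | 0.1 | 0.1 | 0.2 | 0.6 | 0.0110 | 0.0110 |
| 5 | HL | 1.5 | 0.856 | 0.3 | 0.3 | 0.2 | 0.2 | 0.0055 | 0.0055 |
| 6 | LL | 1.5 | 0.856 | 0.1 | 0.1 | 0.2 | 0.6 | 0.0055 | 0.0055 |
| 7 | HH | 1.5 | 0.856 | 0.3 | 0.3 | 0.2 | 0.2 | 0.0300 | 0.0300 |
| 8 | LH | 1.5 | 0.856 | 0.1 | 0.1 | 0.2 | 0.6 | 0.0300 | 0.0300 |
| 9 | HL | 2.5 | 0.961 | 0.3 | 0.3 | 0.2 | 0.2 | 0.0055 | 0.0055 |
| 10 | LL | 2.5 | 0.961 | 0.1 | 0.1 | 0.2 | 0.6 | 0.0055 | 0.0055 |
| 11 | HH | 2.5 | 0.961 | 0.3 | 0.3 | 0.2 | 0.2 | 0.0560 | 0.0560 |
| 12 | LH | 2.5 | 0.961 | 0.1 | 0.1 | 0.2 | 0.6 | 0.0560 | 0.0560 |

(b) Corresponding RM identical-test model parameter values

| Line | Structure | $\mu_+$ | $A_z$ | $\sigma^2_C$ | $\sigma^2_{\tau C}$ | $\sigma^2_{RC}$ | $\sigma^2_\varepsilon$ | $\sigma^2_R$ | $\sigma^2_{\tau R}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | HL | 0.75 | 0.702 | 0.6 | 0 | 0.2 | 0.2 | 0.0110 | 0.0000 |
| 2 | LL | 0.75 | 0.702 | 0.2 | 0 | 0.2 | 0.6 | 0.0110 | 0.0000 |
| 3 | HH | 0.75 | 0.702 | 0.6 | 0 | 0.2 | 0.2 | 0.0220 | 0.0000 |
| 4 | LH | 0.75 | 0.702 | 0.2 | 0 | 0.2 | 0.6 | 0.0220 | 0.0000 |
| 5 | HL | 1.5 | 0.856 | 0.6 | 0 | 0.2 | 0.2 | 0.0110 | 0.0000 |
| 6 | LL | 1.5 | 0.856 | 0.2 | 0 | 0.2 | 0.6 | 0.0110 | 0.0000 |
| 7 | HH | 1.5 | 0.856 | 0.6 | 0 | 0.2 | 0.2 | 0.0600 | 0.0000 |
| 8 | LH | 1.5 | 0.856 | 0.2 | 0 | 0.2 | 0.6 | 0.0600 | 0.0000 |
| 9 | HL | 2.5 | 0.961 | 0.6 | 0 | 0.2 | 0.2 | 0.0110 | 0.0000 |
| 10 | LL | 2.5 | 0.961 | 0.2 | 0 | 0.2 | 0.6 | 0.0110 | 0.0000 |
| 11 | HH | 2.5 | 0.961 | 0.6 | 0 | 0.2 | 0.2 | 0.1120 | 0.0000 |
| 12 | LH | 2.5 | 0.961 | 0.2 | 0 | 0.2 | 0.6 | 0.1120 | 0.0000 |

Notes: $\mu_+$ is the median and mean separation of the normal and abnormal DV distributions across the reader population, and $A_z = \Phi(\mu_+/\sqrt{2})$ is the median reader-specific true area under the ROC curve.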

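In addition, we can write

$$\sigma^2_{R:\mathrm{OR}} = F_{\mathrm{BVN}}\!\left(\frac{\delta_1}{\sqrt{V}}, \frac{\delta_2}{\sqrt{V}};\ \frac{2\sigma^2_R}{V}\right) - \Phi\!\left(\frac{\delta_1}{\sqrt{V}}\right)\Phi\!\left(\frac{\delta_2}{\sqrt{V}}\right) = F_{\mathrm{BVN}}\!\left(\frac{\delta_1}{\sqrt{V}}, \frac{\delta_1}{\sqrt{V}};\ \frac{2\sigma^2_R}{V}\right) - \left[\Phi\!\left(\frac{\delta_1}{\sqrt{V}}\right)\right]^2,$$

and the first term on the right in the Table 7 expression for $\sigma^2_{\tau R:\mathrm{OR}}$ can be expressed in the form

$$.5\sum_{i=1}^{2}\left\{F_{\mathrm{BVN}}\!\left(\frac{\delta_i}{\sqrt{V}}, \frac{\delta_i}{\sqrt{V}};\ \frac{2(\sigma^2_R + \sigma^2_{\tau R})}{V}\right) - \left[\Phi\!\left(\frac{\delta_i}{\sqrt{V}}\right)\right]^2\right\} = F_{\mathrm{BVN}}\!\left(\frac{\delta_1}{\sqrt{V}}, \frac{\delta_1}{\sqrt{V}};\ \frac{2(\sigma^2_R + \sigma^2_{\tau R})}{V}\right) - \left[\Phi\!\left(\frac{\delta_1}{\sqrt{V}}\right)\right]^2 = \sigma^2_{R:\mathrm{OR}}$$

(using $\delta_1 = \delta_2$, the previous result, and $\sigma^2_{\tau R} = 0$ from Eq. (45)).

Replacing the first term on the right in the Table 7 expression for $\sigma^2_{\tau R:\mathrm{OR}}$ by $\sigma^2_{R:\mathrm{OR}}$ yields

$$\sigma^2_{\tau R:\mathrm{OR}} = \sigma^2_{R:\mathrm{OR}} - \sigma^2_{R:\mathrm{OR}} = 0.$$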
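The Table 7 relationships and the identical-test results derived in this appendix can be checked numerically. The following Python sketch is illustrative only and is not the author's code: the function names (`norm_cdf`, `bvn_cdf`, `or_params`) and the case sample sizes `n0 = n1 = 50` are hypothetical choices, the bivariate normal distribution function is approximated with Plackett's identity and Simpson's rule rather than a library routine, and the Table 8 variance components are assumed to apply equally to nondiseased and diseased cases (the equal-variance case).

```python
import math

def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, n=400):
    """Standardized bivariate normal CDF F_BVN(a, b; rho), for |rho| < 1.

    Plackett's identity: F(a, b; rho) = Phi(a)Phi(b) + int_0^rho phi2(a, b; r) dr,
    with the integral approximated by Simpson's rule on n (even) intervals.
    """
    def dens(r):
        q = 1.0 - r * r
        return math.exp(-(a * a - 2.0 * a * b * r + b * b) / (2.0 * q)) / (
            2.0 * math.pi * math.sqrt(q))
    h = rho / n
    total = dens(0.0) + dens(rho)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * dens(k * h)
    return norm_cdf(a) * norm_cdf(b) + total * h / 3.0

def or_params(delta, C, tC, RC, eps, R, tR, n0, n1):
    """OR parameters implied by RM parameters, following Table 7.

    delta = (delta_1, delta_2); C, tC, RC, eps are (nondiseased, diseased)
    pairs of variance components; R and tR are the reader and test-by-reader
    variances; n0 and n1 are the numbers of nondiseased and diseased cases.
    """
    fixed = sum(C) + sum(tC) + sum(RC) + sum(eps)  # sigma2_fixed(-) + sigma2_fixed(+)
    V = fixed + 2.0 * (R + tR)
    a = [d / math.sqrt(V) for d in delta]
    c = [1.0 / (n0 * n1), (n1 - 1.0) / (n0 * n1),
         (n0 - 1.0) / (n0 * n1), (1.0 - n0 - n1) / (n0 * n1)]
    # rho_m * fixed / V equals the shared variance divided by V.
    shared2 = [C[0] + tC[0] + C[1] + tC[1], C[0] + tC[0], C[1] + tC[1], 0.0]
    shared3 = [C[0] + C[1], C[0], C[1], 0.0]
    cov2 = 0.5 * sum(cm * bvn_cdf(ai, ai, s / V)
                     for ai in a for cm, s in zip(c, shared2))
    cov3 = sum(cm * bvn_cdf(a[0], a[1], s / V) for cm, s in zip(c, shared3))
    var_R = bvn_cdf(a[0], a[1], 2.0 * R / V) - norm_cdf(a[0]) * norm_cdf(a[1])
    var_tR = 0.5 * sum(bvn_cdf(ai, ai, 2.0 * (R + tR) / V)
                       - norm_cdf(ai) * norm_cdf(ai) for ai in a) - var_R
    return {"auc": [norm_cdf(ai) for ai in a], "cov2": cov2, "cov3": cov3,
            "var_R": var_R, "var_tR": var_tR}

n0 = n1 = 50  # hypothetical case sample sizes
# Table 8, line 1: (a) original null model and (b) identical-test model.
null = or_params((0.75, 0.75), (0.3, 0.3), (0.3, 0.3), (0.2, 0.2), (0.2, 0.2),
                 0.0055, 0.0055, n0, n1)
ident = or_params((0.75, 0.75), (0.6, 0.6), (0.0, 0.0), (0.2, 0.2), (0.2, 0.2),
                  0.0110, 0.0, n0, n1)
print("null:      Cov2 - Cov3 =", null["cov2"] - null["cov3"],
      " var_tR:OR =", null["var_tR"])
print("identical: Cov2 - Cov3 =", ident["cov2"] - ident["cov3"],
      " var_tR:OR =", ident["var_tR"])
```

Under these assumptions, the null-model line should give a positive Cov2 − Cov3 and a positive test-by-reader variance, whereas the identical-test line should give zero for both, up to floating-point error.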

11. Appendix B: Original Roe and Metz Null Simulation Model Parameter Values and Corresponding RM Identical-Test Model Parameter Values

For completeness, Table 8 lists the 12 sets of the original Roe and Metz (RM) [Ref. 5] null simulation model parameter values and the corresponding RM identical-test model parameter values. Table 3 is a subset of this table.
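The pattern linking parts (a) and (b) of Table 8 can be written as a small helper. The sketch below is a reading of the table rather than code from the paper: the function names and dictionary keys are hypothetical, and the rule it encodes (add each test-dependent variance to its test-independent counterpart, then set the test-dependent variance to zero) is inferred from the tabulated values.

```python
import math

def az_from_mu(mu_pos):
    """Median reader-specific true AUC, A_z = Phi(mu_+ / sqrt(2)).

    Uses Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), so Phi(mu / sqrt(2)) = 0.5 * (1 + erf(mu / 2)).
    """
    return 0.5 * (1.0 + math.erf(mu_pos / 2.0))

def to_identical_test(p):
    """Map RM null-model parameters to identical-test parameters (Table 8 pattern).

    The test-by-case variance is folded into the case variance and the
    test-by-reader variance into the reader variance; other values are unchanged.
    """
    q = dict(p)
    q["var_C"] = p["var_C"] + p["var_tC"]
    q["var_tC"] = 0.0
    q["var_R"] = p["var_R"] + p["var_tR"]
    q["var_tR"] = 0.0
    return q

# Table 8(a), line 7 (structure HH, mu_+ = 1.5).
line7_null = {"mu_pos": 1.5, "var_C": 0.3, "var_tC": 0.3, "var_RC": 0.2,
              "var_eps": 0.2, "var_R": 0.03, "var_tR": 0.03}
line7_ident = to_identical_test(line7_null)
print(line7_ident)
print(round(az_from_mu(line7_null["mu_pos"]), 3))
```

For line 7, this should reproduce the Table 8(b) entries (case variance 0.6, reader variance 0.06, both interaction variances 0), and the A_z helper should agree with the tabulated value to three decimals.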

Disclosures

No conflicts of interest, financial or otherwise, are declared by the author.

Code, Data, and Materials Availability

The two datasets (VanDyke and Kundel) analyzed in this article are publicly available as part of the R package MRMCaov [Ref. 19]. Code for performing the conventional OR analysis using the unbiased, DeLong, or jackknife error covariance method is also included in the MRMCaov package. Although code for performing the unconstrained OR analysis is not included in MRMCaov, one can perform the unconstrained OR analysis based on the MRMCaov conventional OR analysis output.

Acknowledgments

This research was supported by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) of the National Institutes of Health (Grant No. R01EB025174). Some of the information presented in this paper was presented in a prior SPIE proceedings paper by the author [Ref. 24]. I thank two reviewers and the editor for their very helpful comments and suggestions. This content is solely the responsibility of the author and does not necessarily represent the official views of the National Institutes of Health.

References

1. N. A. Obuchowski and H. E. Rockette, “Hypothesis testing of diagnostic accuracy for multiple readers and multiple tests: an ANOVA approach with dependent observations,” Commun. Stat. - Simul. Comput. 24(2), 285–308 (1995). https://doi.org/10.1080/03610919508813243

2. S. L. Hillis, “A comparison of denominator degrees of freedom methods for multiple observer ROC analysis,” Stat. Med. 26(3), 596–619 (2007). https://doi.org/10.1002/sim.2532

3. B. D. Gallas, “One-shot estimate of MRMC variance: AUC,” Acad. Radiol. 13(3), 353–362 (2006). https://doi.org/10.1016/j.acra.2005.11.030

4. B. D. Gallas et al., “A framework for random-effects ROC analysis: biases with the bootstrap and other variance estimators,” Commun. Stat. - Theory Methods 38(15), 2586–2603 (2009). https://doi.org/10.1080/03610920802610084

5. C. A. Roe and C. E. Metz, “Dorfman-Berbaum-Metz method for statistical analysis of multireader, multimodality receiver operating characteristic data: validation with computer simulation,” Acad. Radiol. 4(4), 298–303 (1997). https://doi.org/10.1016/S1076-6332(97)80032-3

6. S. L. Hillis, “Simulation of unequal-variance binormal multireader ROC decision data: an extension of the Roe and Metz simulation model,” Acad. Radiol. 19(12), 1518–1528 (2012). https://doi.org/10.1016/j.acra.2012.09.011

7. C. K. Abbey, F. W. Samuelson and B. D. Gallas, “Statistical power considerations for a utility endpoint in observer performance studies,” Acad. Radiol. 20(7), 798–806 (2013). https://doi.org/10.1016/j.acra.2013.02.008

8. B. D. Gallas and S. L. Hillis, “Generalized Roe and Metz receiver operating characteristic model: analytic link between simulated decision scores and empirical AUC variances and covariances,” J. Med. Imaging 1(3), 031006 (2014). https://doi.org/10.1117/1.JMI.1.3.031006

9. S. L. Hillis, “Relationship between Roe and Metz simulation model for multireader diagnostic data and Obuchowski-Rockette model parameters,” Stat. Med. 37(13), 2067–2093 (2018). https://doi.org/10.1002/sim.7616

10. S. L. Hillis and K. S. Berbaum, “Using the mean-to-sigma ratio as a measure of the improperness of binormal ROC curves,” Acad. Radiol. 18(2), 143–154 (2011). https://doi.org/10.1016/j.acra.2010.09.002

11. S. L. Hillis, “A marginal-mean ANOVA approach for analyzing multireader multicase radiological imaging data,” Stat. Med. 33(2), 330–360 (2014). https://doi.org/10.1002/sim.5926

12. M. Quenouille, “Approximate tests of correlation in time series,” J. R. Stat. Soc. Ser. B 11, 68–84 (1949). https://doi.org/10.1111/j.2517-6161.1949.tb00023.x

13. J. Shao and D. Tu, The Jackknife and Bootstrap, Springer-Verlag, New York (1995).

14. B. Efron, The Jackknife, the Bootstrap and Other Resampling Plans, SIAM (1982).

15. B. Efron and R. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, New York (1993).

16. E. R. DeLong, D. M. DeLong and D. L. Clarke-Pearson, “Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach,” Biometrics 44(3), 837–845 (1988). https://doi.org/10.2307/2531595

17. S. L. Hillis, “Relationship between Obuchowski–Rockette–Hillis and Gallas methods for analyzing multi-reader diagnostic imaging data with empirical AUC as the reader performance measure,” Biostat. Epidemiol., 1–38 (2022). https://doi.org/10.1080/24709360.2022.2062115

18. D. Bamber, “Area above ordinal dominance graph and area below receiver operating characteristic graph,” J. Math. Psychol. 12(4), 387–415 (1975). https://doi.org/10.1016/0022-2496(75)90001-2

19. B. J. Smith, S. L. Hillis and L. L. Pesce, “MRMCaov: multi-reader multi-case analysis of variance,” (2023). https://cran.r-project.org/package=MRMCaov

20. H. Kundel et al., “Accuracy of bedside chest hard-copy screen-film versus hard-and soft-copy computed radiographs in a medical intensive care unit: receiver operating characteristic analysis,” Radiology 205(3), 859–863 (1997). https://doi.org/10.1148/radiology.205.3.9393548

21. S. L. Hillis, B. J. Smith and W. Chen, “Determining Roe and Metz model parameters for simulating multireader multicase confidence-of-disease rating data based on real-data or conjectured Obuchowski–Rockette parameter estimates,” J. Med. Imaging 9(4), 045501 (2022). https://doi.org/10.1117/1.JMI.9.4.045501

22. E. A. Franken, Jr. et al., “Evaluation of a digital workstation for interpreting neonatal examinations: a receiver operating characteristic study,” Invest. Radiol. 27(9), 732–737 (1992). https://doi.org/10.1097/00004424-199209000-00016

23. S. L. Hillis and K. M. Schartz, “Multireader sample size program for diagnostic studies: demonstration and methodology,” J. Med. Imaging 5(4), 045503 (2018). https://doi.org/10.1117/1.JMI.5.4.045503

24. S. L. Hillis, “Identical-test Roe and Metz simulation model for validating multi-reader methods of analysis for comparing different radiologic imaging modalities,” Proc. SPIE 12035, 120350E (2022). https://doi.org/10.1117/12.2612691

Biography

Stephen L. Hillis received his PhD in statistics in 1987 and an MFA in music in 1978, both from the University of Iowa. Currently, he is working as a research professor in the Departments of Radiology and Biostatistics at the University of Iowa. He is the author of more than 100 peer-reviewed journal articles and four book chapters. Since 1998, his research has focused on methodology for multireader diagnostic radiologic imaging studies.

CC BY: © The Author. Published by SPIE under a Creative Commons Attribution 4.0 International License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Stephen L. Hillis "Roe and Metz identical-test simulation model for validating multi-reader methods of analysis for comparing different radiologic imaging modalities," Journal of Medical Imaging 10(S1), S11916 (5 July 2023). https://doi.org/10.1117/1.JMI.10.S1.S11916
Received: 3 January 2023; Accepted: 13 June 2023; Published: 5 July 2023
KEYWORDS
Data modeling

Error analysis

Covariance

Statistical analysis

Statistical modeling

Computer simulations

Sampling rates
