KEYWORDS: Artificial intelligence, Medical imaging, Clinical practice, Pathology, Lung, Imaging devices, Cardiovascular magnetic resonance imaging, Cancer detection, Breast cancer, 3D image enhancement
Background: The policy of the NHS Breast Cancer Screening Programme is for each woman’s mammograms to be examined by two separate readers, working independently. In practice, sometimes the second reader (reader 2) can see the decision of the first reader (reader 1). The National Breast Screening Service (NBSS) computer software automatically records whether the second reader can see the decision of the first reader or whether they are ‘blinded’. This study aimed to determine the effect of blinding the second reader on the recall rate and cancer detection rate of reader 2. Methods: Data were from eight screening centers based in the Midlands area in England participating in the 'Changing Case Order to Optimize Patterns of Performance in Screening (CO-OPS)' clinical trial. A three-level Markov Chain Monte Carlo multilevel model was fitted to determine the effect of blinding reader 2 on recall rate and cancer detection. Results: 207,595 women were included in the analysis, of whom 1,796 had cancer detected. Reader 2 was blinded to reader 1’s decisions in 54.5% (113,029/207,595) of cases. There is a high probability that a blinded reader 2 is more likely to recall a prevalent case, but less likely to recall an incident case, than a reader 2 who is not blinded. The interaction effects on reader 2’s cancer detection rate were not significant. Conclusion: If the second reader is not blinded to the decision of the first reader, they appear to be influenced by the first reader’s decision, suggesting that reading is not independent.
Background: The vigilance decrement and prevalence effect both describe changes to speed and accuracy with time on task. Whilst there is much laboratory-based research on these effects, little is known about whether they occur in real-world mammography practice. Methods: The Changing Case Order to Optimise Patterns of Performance in Screening (CO-OPS) trial randomised 37,724 batches containing 1.2 million women attending breast screening to intervention or control (222,208 from the Midlands of England). In the control arm the batch was examined in the same order by both readers; in the intervention arm it was examined in a different order by both readers. Time taken, recall decision by both readers, and cancers detected were recorded for each case, and used to examine patterns of performance with time on task. Results: 49,575 women were recalled and 10,484 had cancer detected. Median time taken to examine each case was 35 seconds (out of cases where time taken was 10 minutes or less). The intervention did not affect overall cancer detection rates or recall rates. A more detailed analysis of the Midlands data indicates cancer detection rate did not change when reading up to 60 cases in a batch, but recall rate reduced. Time taken per case reduced with time on task, from a median 41 seconds when examining the second case in the batch to 28.5 seconds examining the 60th case. Conclusion: Reader behavior and performance change systematically with time on task in breast screening.
KEYWORDS: Cancer, Mammography, Breast cancer, Breast, Radiology, Image analysis, Medical imaging, Image compression, Current controlled current source, Visualization
Background: The interpretation of screening mammograms is influenced by factors such as reader experience and their annual interpretative volume. There is some evidence that time of day can also have an effect, with better diagnostic accuracy for readings conducted early in the day. This is not a consistent finding, however. The aim of our study is to provide further evidence on whether there is an effect of time of day on recall and breast cancer detection rates. Method: We analysed breast screening data from 222,577 women from the Midlands of England. Data were split into three eight-hour periods: 0900-1700, 1700-0100, 0100-0900. Differences in recall and cancer detection rates were analysed using multilevel logistic regression models. Results: Recall rates were lowest for mammograms read during the 1700-0100 time period. Cancer detection rates were lowest during the 0100-0900 time period. Conclusions: Our findings suggest that there are fluctuations in recall and cancer detection rates over the course of the day.
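As an illustration of the three eight-hour periods described in the abstract above, the following is a minimal sketch of how a reading time might be assigned to a period. The function name `reading_period` is hypothetical, and the boundary handling (each period includes its start time and excludes its end time) is an assumption; the abstract does not say how boundary times were treated.

```python
from datetime import time


def reading_period(t: time) -> str:
    """Map a reading time to one of three eight-hour periods.

    Assumption: each period includes its start and excludes its end,
    e.g. 17:00 exactly falls in "1700-0100".
    """
    h = t.hour
    if 9 <= h < 17:
        return "0900-1700"
    if h >= 17 or h < 1:
        # This period wraps past midnight, so it covers 17:00-23:59 and 00:00-00:59.
        return "1700-0100"
    return "0100-0900"
```

A call such as `reading_period(time(0, 30))` falls in the period that wraps past midnight.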
KEYWORDS: Cancer, Breast cancer, Breast, Data modeling, Medical research, Statistical analysis, Statistical modeling, Modeling, Systems modeling, Tumor growth modeling
It is well known that socio-economic status is a strong predictor of screening attendance, with women of higher socio-economic status more likely to attend breast cancer screening. We investigated whether socio-economic status was related to the detection of cancer at breast screening centres. In two separate projects we combined UK data from the population census, the screening information systems, and the cancer registry. Five years of data from all 81 screening centres in the UK were collected. Only women who had previously attended screening were included. The study was given ethical approval by the University of Warwick Biomedical Research Ethics committee, reference SDR-232-07-2012. Generalised linear models with a log-normal link function were fitted to investigate the relationship between predictors and the age corrected cancer detection rate at each centre. We found that screening centres serving areas with lower average socio-economic status had lower cancer detection rates, even after correcting for the age distribution of the population. This may be because of a correlation between higher socio-economic status and some risk factors for breast cancer such as nulliparity (never bearing children). When applying adjustment for age, ethnicity and socio-economic status of the population screened (rather than simply age) we found that SDR can change by up to 0.11.
The radiologist’s task of reviewing many cases successively is highly repetitive and requires a high level of concentration. Fatigue effects have, for example, been shown in studies comparing performance at different times of day. However, little is known about changes in performance during an individual reading session. During a session reading an enriched case set, performance may be affected by both fatigue (i.e. decreasing performance) and training (i.e. increasing performance) effects. In this paper, we reanalyze 3 datasets from 4 studies for changes in radiologist performance during a reading session. Studies feature 8-20 radiologists reading and assessing 27-60 cases in single, uninterrupted sessions. As the studies were not designed for this analysis, study setups range from bone fractures to mammograms and randomization varies between studies. Thus, they are analyzed separately using mixed-effects models. There is some indication that, as time goes on, specificity increases (shown with p<0.05 for 2 out of 3 datasets, no significant difference for the other) while sensitivity may also increase (p<0.05 for 1 out of 3 datasets). The difficulty of ‘normal’ (healthy / non-malignant) and ‘abnormal’ (unhealthy / malignant) cases differs (p<0.05 for 3 out of 3 datasets) and the reader’s experience may also be relevant (p<0.05 for 1 out of 3 datasets). These results suggest that careful planning of breaks and session length may help optimize reader performance. Note that the overall results are still inconclusive and a targeted study to investigate fatigue and training effects within a reading session is recommended.
The purpose of this study was to measure how mammography readers' performance varies with time of day and time
spent reading. This was investigated in screening practice and when reading an enriched case set. In screening practice,
records of the time and date that each case was read, along with outcome (whether the woman was recalled for further tests,
and biopsy results where performed), were extracted from records from one breast screening centre in the UK (4 readers).
Patterns of performance with time spent reading were also measured using an enriched test set (160 cases, 41% malignant,
read three times by eight radiologists). Recall rates varied with time of day, with different patterns for each reader. Recall
rates decreased as the reading session progressed both when reading the enriched test set and in screening practice.
Further work is needed to expand this work to a greater number of breast screening centres, and to determine whether
these patterns of performance over time can be used to optimize overall performance.
Receiver Operating Characteristic analysis provides a reliable and cost-effective performance measurement tool, without
using full clinical trials. However, when ROC analysis shows that performance is statistically superior in one condition
compared with another, it is difficult to relate this result to effects in practice, or even to determine whether it is clinically
significant. In this paper we present two concurrent analyses, using ROC methods alongside single threshold recall rate
data, and suggest that reporting both provides complementary data. Four mammographers read 160 difficult cases (41%
malignant) twice, with and without prior mammograms. Lesion location and probability of malignancy were reported for
each case and analyzed using JAFROC. Concurrently each participant chose recall or return to screen for each case.
JAFROC analysis showed that the presence of prior mammograms improved performance (p<.05). Single threshold data
showed a trend towards a 26% increase in the number of false positive recalls without prior mammograms (p=.056). If
this trend were present throughout the NHS Breast Screening Programme then discarding prior mammograms would
correspond to an increase in recall rate from 4.6% to 5.3%, and 12,414 extra women recalled annually for assessment.
Whilst ROC methods account for all possible thresholds of recall and have higher power, providing a single threshold
example of false positive, false negative, and recall rates when reporting results could be more influential for clinicians.
This paper discusses whether this is a useful additional method of presenting data, or whether it is misleading and
inaccurate.
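The extrapolation in the abstract above (a recall rate rising from 4.6% to 5.3%, with 12,414 extra women recalled annually) can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and because the published rates are rounded to one decimal place, the screened population it back-calculates (roughly 1.77 million women per year) is an approximation, not a figure stated in the abstract.

```python
def extra_recalls(n_screened: float, rate_before: float, rate_after: float) -> float:
    """Extra women recalled when the recall rate moves from rate_before to rate_after."""
    return n_screened * (rate_after - rate_before)


def implied_screened(extra: float, rate_before: float, rate_after: float) -> float:
    """Back-calculate the screened population implied by a number of extra recalls.

    Assumption: the rates are exact. The abstract's rates are rounded,
    so the result is only approximate.
    """
    return extra / (rate_after - rate_before)


# Figures quoted in the abstract: 4.6% -> 5.3%, 12,414 extra recalls.
population = implied_screened(12_414, 0.046, 0.053)
```

Presenting the result this way ties a single-threshold rate difference to an absolute number of women, which is the kind of figure the abstract suggests may be more persuasive to clinicians.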
After the introduction of digital mammography the film mammograms from the previous screening round (the prior
mammograms) can be displayed in a variety of ways. This paper investigates the performance of radiologists reading
digital screening mammograms with the prior mammograms displayed either as film or in digitised format. A set of 162
cases was assembled, each with two-view digital mammograms and two-view film prior mammograms. Of these cases 66
were malignant as proven by biopsy, and the others were normal or benign. The film prior mammograms were digitised
at 75μm. Eight participants, with four to seventeen years' experience of reading screening mammograms, each read the
mammograms twice; once with the digitised prior mammograms displayed on the digital workstation, and once with the
film prior mammograms displayed on an adjacent multi-viewer. The two viewings were at least one month apart.
Participants marked the location of abnormalities on a paper copy of the mammograms and rated the probability of
malignancy of each abnormality. Participants were video-taped whilst reading the cases to enable analysis of gross eye
movements for information regarding the level of use of the prior mammograms. JAFROC analysis showed no
difference in performance between the conditions.
In the UK Breast Screening Programme there is a growing transition from film to digital mammography, and
consequently a change in mammography workstation ergonomics. This paper investigates the effect of the change for
radiologists, including their comfort, likelihood of developing musculoskeletal disorders (MSDs), and work practices.
Three workstation types were investigated: one with all film mammograms; one with digital mammograms alongside
film mammograms from the previous screening round, and one with digital mammograms alongside digitised film
mammograms from the previous screening round. Mammographers were video-taped whilst conducting work sessions at
each of the workstations. Event-based Rapid Upper Limb Assessment (RULA) postural analysis showed no overall
increase in MSD risk level in the switch from the film to digital workstation. Average number of visual glances at the
prior mammograms per case measured by analysis of recorded video footage showed an increase if the prior
mammograms were digitised, rather than displayed on a multi-viewer (p<.05). This finding has potential implications for
mammographer performance in the transition to digital mammography in the UK.