Mucus aggregation on the vocal folds, a common complaint amongst persons with voice disorders, has been visually
rated on four parameters: type, pooling, thickness, and location. Rater training is used to improve the reliability and
accuracy of these ratings. The goal of this study was to evaluate the effect of training on rater reliability, accuracy, and
reaction time.
Two raters scored mucus aggregation from 120 stroboscopic exams after a brief introductory session and again after a
thorough training session. Reliability and accuracy were calculated as percent agreement. Two-tailed paired t-tests were
used to assess differences in reaction time for ratings before and after training.
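The two statistics named above can be illustrated with a minimal sketch. It assumes that percent agreement means the share of exams on which two sets of ratings match exactly, and that the paired t-test is computed on per-exam differences in reaction time. The function names and the sample values are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def percent_agreement(ratings_a, ratings_b):
    """Percentage of items on which two rating sequences match exactly."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for a paired t-test on before - after."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    # Standard error of the mean difference uses the sample standard deviation.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical ratings of one parameter on four exams (not study data).
rater_1 = ["absent", "present", "present", "absent"]
rater_2 = ["absent", "present", "absent", "absent"]
agreement = percent_agreement(rater_1, rater_2)

# Hypothetical reaction times in seconds, before and after training.
t_value, dof = paired_t_statistic([5.0, 6.0, 7.0, 8.0], [4.0, 4.0, 6.0, 6.0])
```

The two-tailed p-value follows from the t distribution with n - 1 degrees of freedom. A statistics library such as SciPy (scipy.stats.ttest_rel) returns the statistic and p-value together.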
Inter-rater reliability improved from 79% pre-training to 92% post-training. Intra-rater reliability improved from 77% to
91% for Rater 1 and 80% to 88% for Rater 2 following training. Accuracy improved from 80% to 96% for Rater 1 and
76% to 95% for Rater 2 from pre- to post-training. Reaction time decreased for both raters (p=0.025).
These findings further our understanding of observer performance in judgments of laryngeal mucus and
suggest that rater training increases reliability and accuracy while decreasing reaction time. Future studies should assess
the relationship between these judgments and voice changes.