Assessing Inter- and Intra-Observer Variability in Pediatric Testicular Ultrasound Volumes: A Prospective Study
Diana Cardona-Grau, MD, Charles Welliver, MD, Barry Kogan, MD.
Albany Medical Center, Albany, NY, USA.
BACKGROUND: Testicular volume (TV) is an important factor in evaluating testicular health and pubertal development, and it serves as a metric of testicular function in infertile males. Because important clinical decisions are often based on TV, it is essential to understand the reliability and reproducibility of TV measurements obtained by scrotal ultrasound (SU). We sought to determine the intra- and inter-observer variability of TVs obtained by SU in pediatric patients.
METHODS: Patients aged 10-19 years who had an indication for SU, or who were undergoing ultrasound for another reason, were recruited. Children with recent orchitis, epididymitis, or torsion of the appendix testis or epididymis, as identified by history or scrotal ultrasound, were excluded. Each patient underwent SU by two different technicians (A and B) to assess inter-observer variability. Within 60 days of the initial measurement by technician A (A1), patients underwent a second SU by technician A (A2) to assess intra-observer variability. The technicians were blinded to all other ultrasound results. TV was calculated using the Lambert formula (L × W × H × 0.71), where L, W, and H are testicular length, width, and height.
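As an illustration of the volume calculation described above, the Lambert formula can be sketched in Python. The function name and the example dimensions are hypothetical and are not study data. Dimensions are assumed to be in centimetres, giving a volume in cm³.

```python
def lambert_volume(length_cm: float, width_cm: float, height_cm: float) -> float:
    """Testicular volume (cm^3) by the Lambert formula: L x W x H x 0.71."""
    # Reject non-physical inputs rather than returning a misleading volume.
    for name, value in (("length", length_cm), ("width", width_cm), ("height", height_cm)):
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    return length_cm * width_cm * height_cm * 0.71


# Hypothetical dimensions for illustration only (not study data).
volume = lambert_volume(4.0, 2.5, 2.0)
print(f"{volume:.2f} cm^3")  # 4.0 x 2.5 x 2.0 x 0.71 = 14.20
```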
Inter- and intra-observer variability was analyzed using Bland-Altman analysis, which does not assume a gold-standard measurement (neither technician was assumed to be more accurate). Bland-Altman analysis was used to assess the accuracy (bias, i.e., the mean difference) and the precision (standard deviation of the differences) of TV measurements. Inter- and intra-observer agreement for TV was measured using kappa (κ), the fraction of agreement beyond chance. Kappa was interpreted using standard thresholds: poor (κ < 0.20), fair (κ = 0.21-0.40), moderate (κ = 0.41-0.60), good (κ = 0.61-0.80), and very good agreement (κ = 0.81-1.00).
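The two agreement statistics described above can be sketched in Python under stated assumptions. Differences are taken as the second measurement minus the first. The limits of agreement use the conventional bias ± 1.96 SD. Kappa is computed as Cohen's kappa on categorical labels. The abstract does not say how continuous TVs were categorized for kappa, so the labels in any usage are hypothetical, and all function names are illustrative.

```python
import statistics
from collections import Counter


def bland_altman(first, second):
    """Return (bias, sd, lower_loa, upper_loa) for paired measurements.

    Differences are second - first; limits of agreement are bias +/- 1.96 SD.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [b - a for a, b in zip(first, second)]
    bias = statistics.mean(diffs)   # accuracy: mean difference
    sd = statistics.stdev(diffs)    # precision: sample SD of the differences
    return bias, sd, bias - 1.96 * sd, bias + 1.96 * sd


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items (assumed variant)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need equal-length, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each rater's marginal proportions, summed.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa):
    """Map kappa to the bands in the abstract; values in gaps go to the lower band."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical paired volumes (cm^3), not study data.
print(bland_altman([10.0, 12.0, 15.0, 20.0], [11.0, 12.5, 14.0, 22.0]))
```

With these hypothetical inputs, the bias is 0.625 cm³ and the SD of the differences is 1.25 cm³.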
RESULTS: Thus far, nine patients have enrolled, providing 18 testes for statistical analysis. The mean (SD) age was 14.6 (± 2.6) years, mean BMI was 22.7 (± 3) kg/m², and mean Tanner stage was 3.7 (± 1.3). Mean TV measurements were A1: 15.6 ± 11.1 cm³, A2: 16.1 ± 10.0 cm³, and B: 16.7 ± 10.6 cm³.
On Bland-Altman analysis, the limits of agreement were wider for inter-observer than for intra-observer TV measurements (inter-observer: -5.9 to 7.8 cm³; intra-observer: -4.11 to 5.35 cm³). In practical terms, a difference of more than 20% in TV would be reported 44% of the time in studies performed by different technicians (κ = 0.1). Intra-observer discrepancies were less common, but a difference of more than 20% in TV would still be reported 22% of the time in studies performed by the same technician (κ = 0.5).
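The ">20% difference" rate reported above can be sketched as the fraction of paired testes whose volumes differ by more than a threshold. The abstract does not state the denominator of the percentage difference, so this sketch assumes the difference is taken relative to the mean of the two measurements. The function names are hypothetical. With 18 testes, the reported 44% and 22% are consistent with 8 and 4 testes, respectively.

```python
def relative_difference(a, b):
    """|a - b| as a fraction of the pair mean (assumed definition, not from the abstract)."""
    mean = (a + b) / 2
    if mean <= 0:
        raise ValueError("volumes must be positive")
    return abs(a - b) / mean


def discrepancy_rate(first, second, threshold=0.20):
    """Fraction of paired testes whose volumes differ by more than `threshold`."""
    if len(first) != len(second) or not first:
        raise ValueError("need equal-length, non-empty measurement lists")
    flagged = sum(relative_difference(a, b) > threshold for a, b in zip(first, second))
    return flagged / len(first)


# Hypothetical paired volumes (cm^3), not study data: only the first pair exceeds 20%.
print(discrepancy_rate([10.0, 12.0, 15.0, 20.0], [13.0, 12.5, 14.0, 22.0]))
```

Using the pair mean as the denominator treats the two measurements symmetrically, which matches the Bland-Altman premise that neither technician is the reference.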
CONCLUSION: Substantial and surprising variability exists in both inter-observer and intra-observer TV measurements. However, this variability is reduced when the same technician performs the measurements. These differences in TV measurements should be taken into account in clinical decision making, particularly when studies are performed by different technicians.
2016 Fall Congress