BACKGROUND: Our objective was to test whether natural language processing models (NLMs) can accurately detect the finding of hydronephrosis from renal ultrasound radiology impressions. We hypothesized that two existing NLMs, both bidirectional encoder representations from transformers (BERT) models, would detect this finding accurately.
METHODS: Impressions from 2,392 renal ultrasounds of spina bifida patients performed at the Children’s Hospital of Philadelphia (CHOP) were manually reviewed for the finding of any degree of hydronephrosis in either kidney. Overall, 391 ultrasounds (16.3%) were found to have hydronephrosis present. The raw impressions and the outcome (presence or absence of hydronephrosis) were provided to two BERT models for training on our dataset. The first model, BERT-base, is an NLM pre-trained on two general language corpora (Wikipedia and BooksCorpus). The second model, BioclinicalBERT, is an expanded version of BERT-base with additional pre-training on PubMed abstracts, PubMed Central full-text articles, and the MIMIC-III database of clinical notes from critical care patients, intended to improve performance in clinical and biomedical contexts. We trained both models on the CHOP dataset using an 80:20 training:testing split. The models were then validated using 3-fold cross-validation. Model performance was externally validated retrospectively using a set of 110 renal ultrasound impressions from the Hospital for Sick Children (SickKids).
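For illustration, the 80:20 split and 3-fold cross-validation described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the study's code: the helper names `train_test_split_indices` and `k_fold_indices`, the fixed random seed, the unstratified shuffling, and the choice to build the folds from the training portion are all assumptions, since the abstract does not specify these details.

```python
import random

def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test) lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold_indices(indices, k=3):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    # Deal the indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield rest, held_out

# 2,392 impressions, as in the CHOP dataset described above.
train_idx, test_idx = train_test_split_indices(2392)
folds = list(k_fold_indices(train_idx, k=3))
```

With 2,392 impressions, this sketch holds out 478 for testing and leaves 1,914 for training, which divide into three validation folds of 638 each. Because only 391 impressions are positive, a stratified split that preserves the positive rate in each subset would be a reasonable alternative.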
RESULTS: On the external validation set of 110 ultrasound impressions from SickKids, the BERT-base model had a sensitivity of 88.9% and a specificity of 89.7% for detecting the presence of hydronephrosis in impressions. Its receiver operating characteristic (ROC) curve had an area under the curve (AUC) of 0.87, with a 95% confidence interval (CI) of 0.84-0.90 (Figure 1). The BioclinicalBERT model had a sensitivity of 92.1% and a specificity of 85.3%, and its ROC curve had an AUC of 0.93, with a 95% CI of 0.91-0.96 (Figure 1).
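As a reminder of how the metrics reported above are defined, the following minimal sketch computes sensitivity, specificity, and AUC from labels and model scores. The function names, the toy data, and the 0.5 decision threshold are illustrative assumptions and are not taken from the study. The AUC is computed as the probability that a randomly chosen positive impression receives a higher score than a randomly chosen negative one, which equals the area under the ROC curve.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = hydronephrosis."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical, for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

On this toy data, two of the three positives are detected (sensitivity 2/3), four of the five negatives are correctly ruled out (specificity 0.8), and 14 of the 15 positive/negative pairs are ranked correctly (AUC of about 0.93). Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes performance across all thresholds.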
CONCLUSIONS: The BioclinicalBERT model was superior to the BERT-base model in detecting the presence of hydronephrosis in radiology impressions of renal ultrasounds in our population. Our results suggest that BioclinicalBERT could enable automated, large-scale extraction of ultrasound findings from the electronic medical record.