Objectives: Electronic health records (EHRs) can allow for the generation of large cohorts of individuals with given diseases for clinical and genomic research. A rate-limiting step is the development of electronic phenotype selection algorithms to find such cohorts. This study evaluated the portability of a published phenotype algorithm to identify rheumatoid arthritis (RA) patients from EHR records at three institutions with different EHR systems.
Materials and Methods: Physicians reviewed charts from three institutions to identify patients with RA. Each institution compiled attributes from various sources in the EHR, including codified data and clinical narratives, which were searched using one of two natural language processing (NLP) systems. The performance of the published model was compared with that of locally retrained models.
Results: Applying the previously published model from Partners Healthcare to datasets from Northwestern and Vanderbilt Universities, the area under the receiver operating characteristic curve was found to be 92% for Northwestern and 95% for Vanderbilt, compared with 97% at Partners. Retraining the model improved the average sensitivity at a specificity of 97% to 72% from the original 65%. Both the original logistic regression models and locally retrained models were superior to simple billing code count thresholds.
Discussion: These results show that a previously published algorithm for RA is portable to two external hospitals using different EHR systems, different NLP systems, and different target NLP vocabularies. Retraining the algorithm primarily increased the sensitivity at each site.
Conclusion: Electronic phenotype algorithms allow rapid identification of case populations in multiple sites with little retraining.