Application of Automated Data Extraction Technologies in Ophthalmic Electronic Health Records

Yinghai Yu

doi:10.54097/p6k4fz34

Authors

Yinghai Yu, School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China

DOI:

https://doi.org/10.54097/p6k4fz34

Keywords:

Unstructured data, Data extraction, Electronic health records, Ophthalmology, Optical character recognition, Natural language processing, Large language model

Abstract

As healthcare becomes increasingly digital, ophthalmic electronic health records accumulate large amounts of valuable unstructured data, yet processing these data manually is time-consuming and error-prone. To address this problem, this paper reviews the application of optical character recognition and natural language processing to ophthalmic electronic health records, with particular attention to recently prominent large language models. We searched the literature through Google Scholar and, after screening, selected 30 high-quality articles for review. The results show that each technology has its own strengths and weaknesses. Optical character recognition achieves very high accuracy on standardized device reports but performs poorly on handwriting. Natural language processing can extract visual acuity data and disease characteristics from clinical note text, but its performance declines when note formats are not uniform. Large language models can process text and images directly and may reshape the entire extraction workflow, but they raise concerns about data security, high operating costs, and possible hallucinations. Overall, these automated technologies have deepened the use of data in ophthalmic electronic health records. Future research could focus on better integration of different data types and on developing ophthalmology-specific language models that are smaller and can be deployed securely on local infrastructure.
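To make the natural language processing task concrete, the following is a minimal rule-based sketch of extracting Snellen visual acuity values from free-text notes. It is an illustration only and is not the method of any study listed in the references; the pattern, the function name extract_visual_acuity, and the sample note are all assumptions introduced here.

```python
import re
from typing import Dict, List

# Hypothetical pattern: a Snellen fraction such as "20/40" or "6/12",
# optionally preceded by an eye label (OD = right, OS = left, OU = both).
SNELLEN = re.compile(
    r"\b(?:(?P<eye>OD|OS|OU)\s*[:\-]?\s*)?(?P<num>20|6)/(?P<den>\d{1,3})\b",
    re.IGNORECASE,
)


def extract_visual_acuity(note: str) -> List[Dict]:
    """Return one record per Snellen fraction found in the note text."""
    results = []
    for match in SNELLEN.finditer(note):
        eye = match.group("eye")
        numerator = int(match.group("num"))
        denominator = int(match.group("den"))
        results.append(
            {
                "eye": eye.upper() if eye else None,
                "snellen": f"{numerator}/{denominator}",
                # Decimal acuity is the Snellen fraction evaluated as a ratio.
                "decimal": round(numerator / denominator, 3),
            }
        )
    return results


if __name__ == "__main__":
    sample = "VA OD: 20/40, OS 20/25. Prior exam 6/12."
    for record in extract_visual_acuity(sample):
        print(record)
```

A fixed pattern like this only works when notes follow the expected notation, which is consistent with the abstract's observation that extraction performance declines when record formats are not uniform.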

References

[1] Mun Y, Kim J, Noh K J, et al. An innovative strategy for standardized, structured, and interoperable results in ophthalmic examinations [J]. BMC Medical Informatics and Decision Making, 2021, 21(1): 9.

[2] Park C H, Lee S H, Lee D Y, et al. Analysis of Retinal Thickness in Patients With Chronic Diseases Using Standardized Optical Coherence Tomography Data: Database Study Based on the Radiology Common Data Model [J]. JMIR Medical Informatics, 2025, 13: e64422.

[3] Automated Glaucoma Detection using Structural Optical Coherence Tomography With Data Mining [J].

[4] Kumar A, Singh P, Lata K. Comparative study of different optical character recognition models on handwritten and printed medical reports [C]//2023 International Conference on Innovative Data Communication Technologies and Application (ICIDCA). IEEE, 2023: 581-586.

[5] Luaibi Z H, Alkhayyat A N. Detection and Analysis of Diabetic Macular Edema (DME) Using Artificial Intelligence Techniques [J]. Journal of Madenat Al-Elem University College/Magallaẗ Kulliyyaẗ Madīnaẗ Al-ʿAlam Al-Ğāmi'aẗ, 2025, 17(2).

[6] Rasmussen L V, Peissig P L, McCarty C A, et al. Development of an optical character recognition pipeline for handwritten form fields from an electronic health record [J]. Journal of the American Medical Informatics Association, 2012, 19(e1): e90-e95.

[7] Mendes I, Miranda V, Salazar M, et al. Enhancing Amblyopia Screening with Machine Learning: Challenges and Solutions in Data Preparation [J]. Procedia Computer Science, 2025, 257: 1092-1097.

[8] Majid I, Zhang Y V, Chang R, et al. Extraction of Text from Optic Nerve Optical Coherence Tomography Reports [J]. arXiv preprint arXiv:2308.10790, 2023.

[9] Peissig P L, Rasmussen L V, Berg R L, et al. Importance of multi-modal approaches to effectively identify cataract cases from electronic health records [J]. Journal of the American Medical Informatics Association, 2012, 19(2): 225-234.

[10] Hua C, Shi Y, Hu M, et al. Intelligent data extraction system for RNFL examination reports [C]//CAAI International Conference on Artificial Intelligence. Cham: Springer Nature Switzerland, 2022: 537-542.

[11] Macri C Z, Teoh S C, Bacchi S, et al. A case study in applying artificial intelligence-based named entity recognition to develop an automated ophthalmic disease registry [J]. Graefe's Archive for Clinical and Experimental Ophthalmology, 2023, 261(11): 3335-3344.

[12] Wang S Y, Pershing S, Tran E, et al. Automated extraction of ophthalmic surgery outcomes from the electronic health record [J]. International Journal of Medical Informatics, 2020, 133: 104007.

[13] Bernstein I A, Koornwinder A, Hwang H H, et al. Automated recognition of visual acuity measurements in ophthalmology clinical notes using deep learning [J]. Ophthalmology Science, 2024, 4(2): 100371.

[14] Valdes Sanz N, Garcia-Layana A, Colas T, et al. Clinical characterization of inpatients with acute conjunctivitis: a retrospective analysis by natural language processing and machine learning [J]. Applied Sciences, 2022, 12(23): 12352.

[15] Mbagwu M, French D D, Gill M, et al. Creation of an accurate algorithm to detect Snellen best documented visual acuity from ophthalmology electronic health record notes [J]. JMIR Medical Informatics, 2016, 4(2): e14.

[16] Chen J S, Lin W C, Yang S, et al. Development of an open-source annotated glaucoma medication dataset from clinical notes in the electronic health record [J]. Translational Vision Science & Technology, 2022, 11(11): 20.

[17] Wang B, Lai J, Liu M, et al. Electronic source data transcription for electronic case report forms in China: validation of the electronic source record tool in a real-world ophthalmology study [J]. JMIR Formative Research, 2022, 6(12): e43229.

[18] Lin W C, Chen J S, Kaluzny J, et al. Extraction of active medications and adherence using natural language processing for glaucoma patients [C]//AMIA Annual Symposium Proceedings. 2022, 2021: 773.

[19] Wang S Y, Huang J, Hwang H, et al. Leveraging weak supervision to perform named entity recognition in electronic health records progress notes to identify the ophthalmology exam [J]. International Journal of Medical Informatics, 2022, 167: 104864.

[20] Wang S Y, Singh S, Njie Jr S. Looking for Low Vision in Electronic Health Records [J].

[21] Gui H, Tseng B, Hu W, et al. Looking for low vision: predicting visual prognosis by fusing structured and free-text data from electronic health records [J]. International Journal of Medical Informatics, 2022, 159: 104678.

[22] Mao X, Li F, Duan Y, et al. Named entity recognition of electronic medical record in ophthalmology based on CRF model [C]//2017 International Conference on Computer Technology, Electronics and Communication (ICCTEC). IEEE, 2017: 785-788.

[23] Baughman D M, Su G L, Tsui I, et al. Validation of the total visual acuity extraction algorithm (TOVA) for automated extraction of visual acuity data from free text, unstructured clinical records [J]. Translational Vision Science & Technology, 2017, 6(2): 2.

[24] Luo M J, Bi S, Pang J, et al. A large language model digital patient system enhances ophthalmology history taking skills [J]. NPJ Digital Medicine, 2025, 8(1): 502.

[25] Tan J C K, Coroneo M T. Diagnostic interpretation of corneal tomography using a multimodal large language model (ChatGPT) [J]. American Journal of Ophthalmology Case Reports, 2025: 102441.

[26] Majid I, Mishra V, Ravindranath R, et al. Evaluating the performance of large language models for named entity recognition in ophthalmology clinical free-text notes [C]//AMIA Annual Symposium Proceedings. 2025, 2024: 778.

[27] Ruan F Y, Lam J W, Esmaeilkhanian H, et al. Leveraging Large Language Models with Sequential Prompting to Extract Eye Examination Findings from Free-Text Ophthalmology Notes [J]. Ophthalmology Science, 2025: 100944.

[28] Salvi A, Arnal L, Ly K, et al. Ocular Biometry OCR: a machine learning algorithm leveraging optical character recognition to extract intra ocular lens biometry measurements [J]. Frontiers in Artificial Intelligence, 2025, 7: 1428716.

[29] Chen K M, Chen K W, Diaconita V, et al. OphthoACR (Ophthalmology Automated Chart Review): An AI-Powered Tool for Complete Automation of Ophthalmology Chart Reviews and Cohort Data Analysis [J]. Translational Vision Science & Technology, 2025, 14(10): 8.

[30] Satheakeerthy S, Jesudason D, Bahrami B, et al. Zero-shot LLM-based visual acuity extraction: a pilot study [J]. BMC Ophthalmology, 2025, 25(1): 359.
