Use of artificial intelligence in ophthalmology: a narrative review

ABSTRACT
BACKGROUND: Artificial intelligence (AI) deals with development of algorithms that seek to perceive one’s environment and perform actions that maximize one’s chance of successfully reaching one’s predetermined goals.
OBJECTIVE: To provide an overview of the basic principles of AI and its main studies in the fields of glaucoma, retinopathy of prematurity, age-related macular degeneration and diabetic retinopathy. From this perspective, the limitations and potential challenges that have accompanied the implementation and development of this new technology within ophthalmology are presented.
DESIGN AND SETTING: Narrative review developed by a research group at the Universidade Federal de São Paulo (UNIFESP), São Paulo (SP), Brazil.
METHODS: We searched the literature on the main applications of AI within ophthalmology, using the keywords “artificial intelligence”, “diabetic retinopathy”, “macular degeneration age-related”, “glaucoma” and “retinopathy of prematurity,” covering the period from January 1, 2007, to May 3, 2021. We used the MEDLINE database (via PubMed) and the LILACS database (via Virtual Health Library) to identify relevant articles.
RESULTS: We retrieved 457 references, of which 47 were considered eligible for intensive review and critical analysis.
CONCLUSION: Use of technology, as embodied in AI algorithms, is a way of providing an increasingly accurate service and enhancing scientific research. This forms a source of complement and innovation in relation to the daily skills of ophthalmologists. Thus, AI adds technology to human expertise.

When examining the performance results from an algorithm, it is important to evaluate the methodology and the way in which it was developed. For example, an algorithm developed for analysis of fundus retinography may perform poorly if applied to a retinal photograph with a larger field. [4][5][6][7][8][9]

OBJECTIVE
The purpose of this article was to provide an overview of the basic principles of AI and its main studies in the fields of glaucoma, retinopathy of prematurity, age-related macular degeneration and diabetic retinopathy. From this perspective, the limitations and potential challenges that have accompanied implementation and development of this new technology within ophthalmology are presented.

METHODS
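We searched the literature on the main applications of artificial intelligence within ophthalmology, using the keywords "artificial intelligence", "diabetic retinopathy", "macular degeneration age-related", "glaucoma" and "retinopathy of prematurity", covering the period from January 1, 2007, to May 3, 2021. We used the MEDLINE database (via PubMed) and the LILACS database (via Virtual Health Library) to identify relevant articles (Table 1).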

RESULTS
From the search in the databases, one clinical trial, four meta-analyses, four randomized controlled trials, 47 reviews and four systematic reviews were identified. After screening the titles and abstracts, removing duplicates and screening the citations, 47 studies were considered eligible for critical analysis. The article selection process is detailed in Figure 1.

Diabetic retinopathy
Diabetes is the leading cause of blindness in adulthood, affecting more than 415 million people worldwide. 10,11 Recent studies on the use of AI for monitoring diabetic retinopathy have demonstrated that it has high precision for detecting this disease. 10,12,13 In 2018, IDx-DR, which is an AI diagnostic system that autonomously diagnoses patients with diabetic retinopathy (including macular edema), was approved by the United States Food and Drug Administration (FDA) for classifying diabetic retinopathy. This was the first artificial intelligence device approved by that institution.
Ting et al. 12  for referable retinopathy (maculopathy, proliferative or pre-proliferative diabetic retinopathy). Retmarker, a system that has been used for screening diabetic retinopathy, had sensitivity of 73% for any retinopathy, 85% for referable retinopathy and 97.9% for proliferative retinopathy. 10 The AI models were trained to detect microaneurysms, retinal hemorrhages and hard or soft exudates. Abràmoff et al. 14 evaluated an algorithm for automatic detection of diabetic retinopathy, with specificity of 59.4% and sensitivity of 96.8%. Gulshan et al. 15 developed an algorithm for screening diabetic retinopathy using 128,175 color fundus images, which had specificity of 90.3% and sensitivity of 98.1%, and reached an area under the receiver operating characteristic (ROC) curve of 0.99 for detecting referable diabetic retinopathy.
Subsequently, Gulshan et al. 16 investigated use of an algorithm in 10 primary care centers for six months, which resulted in sensitivity of 87.2% and specificity of 90.8% for detecting clinically significant macular edema in at least one eye. This follow-up study emphasized the importance of testing an artificial intelligence algorithm in the real world.
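The sensitivity and specificity figures quoted above are proportions derived from a comparison of algorithm output against a reference standard. As a minimal sketch of that calculation (the function name and the toy labels below are hypothetical and are not taken from any of the cited studies), assuming binary labels in which 1 denotes referable disease:

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = disease, 0 = no disease)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diseased, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: reference-standard grades versus algorithm output.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
algorithm = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(reference, algorithm)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a screening setting, a high sensitivity with a lower specificity means that few diseased patients are missed, at the cost of more false referrals.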
Li et al. 13 developed an artificial intelligence-based model for detecting diabetic retinopathy from color fundus photographs. Using more than 100,000 images, it achieved sensitivity of 97.0% and specificity of 91.4%. It reached an area under the ROC curve of 0.99 in validation and 0.955 in external validation using an independent multiethnic data set.
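The area under (below) the ROC curve reported for these algorithms can be read as the probability that a randomly chosen diseased image receives a higher score than a randomly chosen non-diseased image. The sketch below computes it directly from that pairwise definition; the function name and the toy scores are hypothetical, and the cited studies may have used other estimation procedures.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # diseased image ranked above healthy image
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = referable disease, scores are algorithm outputs.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

Unlike sensitivity and specificity, this measure does not depend on a single decision threshold, which is one reason it is often reported alongside them.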
Gargeya et al. 17   systems only evaluate the central 45 degrees, close to the macula, and do not assess diabetic retinopathy lesions that may be occurring on the periphery of the patient's retina. 16 Because of the great variation in the reference standards between different studies, it is difficult to compare the performance of the algorithms. To solve this challenge, algorithms could be tested on an independent data set with a single reference standard.

Age-related macular degeneration
Diabetic retinopathy and age-related macular degeneration (AMD) are the leading causes of blindness among adults over the age of 50 years in the United States. As with diabetic retinopathy, this has stimulated development of algorithms for diagnosing and monitoring AMD. AMD cases normally need to be referred to a tertiary-level eye service for clinical evaluation by experts. [18][19][20]

Algorithms evaluating color fundus photos
Thus, the main algorithms that have been developed are useful for detecting and segmenting lesions, estimating the risk of progression to advanced stages or evaluating the risk of conversion of dry AMD to an exudative form.

Glaucoma
Glaucoma is an important cause of loss of vision worldwide. In evaluating optic neuropathy, the optic disc cup needs to be characterized: its size and shape vary between people. However, defining the cup alone is insufficient for diagnosing glaucoma, because of the large anatomical variation of the optic disc. Optical coherence tomography (OCT) measurements of retinal nerve fiber layer thickness and the ganglion cell complex can be used for diagnosing glaucoma. Visual field examination is inexpensive and can be used to assess functional loss. However, its diagnostic sensitivity and specificity are lower than when a combination of visual field and OCT data is used.
Use of anatomical and functional data together is superior to anatomical data in isolation for diagnosing glaucoma. Artificial intelligence algorithms can combine these factors to aid in making diagnoses. In a study using a database of 125,189 fundus retinography images, Ting et al. 12 reported sensitivity of 96.4% and specificity of 87.2% for glaucoma (Table 2). Thus, studies using longitudinal data are needed in order to correctly identify patients who will develop glaucoma.
In patients with severe glaucoma, disease identification by means of algorithms usually has better results. However, caution needs to be exercised due to the great anatomical variability of optic nerves in populations, especially among patients with a high degree of myopia.

Retinopathy of prematurity
Retinopathy of prematurity (ROP), which has a prevalence of 6%-18%, is one of the main causes of loss of vision in childhood worldwide. 38 This disease, in its third epidemic, resulted in irreversible blindness in more than 50,000 premature newborns because of a shortage of trained specialists. 39,40 Experts usually disagree about the clinical classification of ROP.
In the cryotherapy (CRYO)-ROP study, the second examiner disagreed with the first regarding the diagnosis of threshold disease in 12% of the cases. 41 Also, in a multicenter telemedicine study on diagnosing ROP, almost 25% of the tests did not align with one of the three criteria for clinically significant ROP. 42 The initial approaches to automated image analysis have been based on quantification of vascular tortuosity and vascular dilation.
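Disagreement between examiners, as described above, is often summarized with a chance-corrected agreement statistic. The sketch below uses Cohen's kappa for two graders; this is an illustrative choice and not a measure reported in the studies cited here, and the grades shown are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must grade the same non-empty set of cases")
    # Observed proportion of cases on which the two raters agree.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Agreement expected by chance, from each rater's marginal grade frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters assigned the same single grade to every case
    return (observed - expected) / (1.0 - expected)


# Hypothetical grades: 1 = threshold disease, 0 = no threshold disease.
examiner_1 = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
examiner_2 = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
print(f"kappa = {cohen_kappa(examiner_1, examiner_2):.2f}")
```

In this toy example the examiners agree on 8 of 10 cases, yet kappa is only about 0.52, because much of the raw agreement is expected by chance when most cases are negative.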
These systems were developed and validated for wide-angle RetCam images. They were evaluated based on the diagnoses of specialists but did not have any application in the real world because they are only semi-automated, thus requiring manual identification. 43 The initial computational approaches for detecting this pathological condition focused on the vascular tortuosity of retinopathy of prematurity-plus (ROP-plus). 44  Current methods for detecting ROP can distinguish between mild and severe cases of ROP but are still unable to identify the stage of the disease. 50 Campbell et al. 51 demonstrated that automated diagnosis of ROP (i-ROP) had an accuracy of 95%, while the average accuracy of 11 specialists was 87%. Thus, algorithms with performance comparable to that of retinal specialists already exist.

DISCUSSION
Development of algorithms for diagnosing ophthalmic diseases requires many images in order to achieve a classification. When an algorithm is designed, the following need to be considered: the population in which it will be applied, whether it is aligned with current clinical evidence and whether use of the algorithm applies only to diagnosing the disease. 52 Because diseases such as glaucoma, macular degeneration, diabetic retinopathy and retinopathy of prematurity have relatively high prevalence, this favors creation of algorithms, given the large amount of data that has been documented. Rare diseases, with limited data, still present a challenge with regard to development of artificial intelligence programs. Among the topics selected for the present review, retinopathy of prematurity is one for which the fewest algorithms have been developed. This is thought to be due  Table 2). 10,12,[14][15][16][21][22][23][28][29][30][32][33][34]36,46,49,51 It is important to highlight the reliability of the ground truth labels, which in ophthalmological studies are evaluations by specialists, who may nevertheless have divergent opinions. It is important that the sample used for training the algorithms should be specified.
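Because specialists may diverge, one common way of deriving a ground truth label is to combine several graders' opinions. The sketch below assumes a strict-majority rule and flags images without a majority for adjudication; the rule, the function name and the grades are hypothetical illustrations, not the labeling protocol of any study reviewed here.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(grades: Sequence[str]) -> Optional[str]:
    """Return the grade chosen by a strict majority of graders, or None if there is none."""
    if not grades:
        return None
    label, votes = Counter(grades).most_common(1)[0]
    return label if votes > len(grades) / 2 else None


# Hypothetical grades from three specialists for three images.
image_grades = {
    "image_001": ["referable", "referable", "non-referable"],
    "image_002": ["non-referable", "non-referable", "non-referable"],
    "image_003": ["mild", "moderate", "severe"],
}
for image_id, grades in image_grades.items():
    label = consensus_label(grades)
    print(image_id, label if label is not None else "no majority: send for adjudication")
```

Reporting how many images required adjudication is one way of making the reliability of the training labels explicit.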
Incorporation of machine learning technology within ophthalmology can improve medical care for the population in regions with limited medical resources, thus reducing some social inequalities.

Future directions, strengths and limitations
Large longitudinal studies are needed in order to assess the real-world safety and effectiveness of the artificial intelligence systems that have been developed. Narrative reviews contribute towards providing

Table 2. Comparison of accuracy, sensitivity, specificity and number and type of images analyzed
Authors | Pathological condition | Number and type of images analyzed | Performance
Ting et al. 12 | Diabetic retinopathy | 76,370 retinal photographs | Sensitivity of 90.5% and specificity of 91.6%
Tufail et al. 10 | Diabetic retinopathy | 20,258 retinal photographs | EyeArt (sensitivity of 93.8%) and Retmarker (sensitivity of 97.9%)
Abràmoff et al. 14 | Diabetic retinopathy | 1,748 retinal photographs | Sensitivity of 96.8% and specificity of 59.4%
Gulshan et al. 15 | Diabetic retinopathy | 9,963 retinal photographs | Sensitivity of 98.1% and specificity of 90.3%
Gulshan et al. 16 | Diabetic retinopathy | 103,634 retinal photographs | Sensitivity of 87.2% and specificity of 90.8%
Ting et al. 12 | AMD | 72,610 retinal photographs | Sensitivity of 93.2% and specificity of 88.2%
Burlina et al. 21 | AMD | 130,000 retinal photographs | Accuracy of 91.6%
Grassmann et al. 22 | AMD | 120,656 retinal photographs | Accuracy of 84.2%
Venhuizen et al. 28 | AMD | 3,265 OCT images | Sensitivity of 98.2% and specificity of 91.2%
Peng et al. 23 | AMD | 58,402 retinal photographs | Accuracy of 97.0%
Lee et al. 29 | AMD | 48,312 OCT images | Sensitivity of 84.6% and specificity of 91.5%
Ting et al. 12 | Glaucoma | 125,189 retinal photographs | Sensitivity of 96.4% and specificity of 87.2%
Li et al. 30 | Glaucoma | 48,116 retinal photographs | Sensitivity of 95.6% and specificity of 92%
Kim et al. 32 | Glaucoma | 399 visual field images | Sensitivity of 98.3% and specificity of 97.5%
Ahn et al. 33 | Glaucoma | 1,542 retinal photographs | Accuracy of 92.2%
Asaoka et al. 34 | Glaucoma | 171 visual field images | Accuracy of 92.6%
Masumoto et al. 36 | Glaucoma | 982 visual field images | Sensitivity of 81.3% and specificity of 80.2%
Brown et al. 46 | ROP | 5,511 retinal photographs | Sensitivity of 93% and specificity of 94%
Ataer-Cansizoglu et al. 49 | ROP | 77 retinal photographs | Accuracy of 95%
Campbell et al. 51 | ROP | 77 retinal photographs | Accuracy of 95%
AMD = age-related macular degeneration; OCT = optical coherence tomography; ROP = retinopathy of prematurity.

The articles included in this review generated heterogeneous data because of the diversity in the design of the studies. The main limitation of this review was the lack of tools for methodological assessment of the reviews. In addition, this narrative review does not provide quantitative answers to specific questions about studying artificial intelligence.

CONCLUSION
Use of technology, as embodied in artificial intelligence algorithms, is a way of providing an increasingly accurate service and enhancing scientific research. This forms a source of complement and innovation in relation to the daily skills of ophthalmologists.
Thus, artificial intelligence adds technology to human expertise.