Introduction
Breast cancer is the most common type of cancer in women, and it is also a common cause of cancer-related mortality, both in developing and developed countries. Approximately 1.4 million new cases occurred in 2008 worldwide, representing 23% of all cancers ^{1} ^{,} ^{2}. Fortunately, the early detection of breast cancer can improve the chance of successful treatment and recovery ^{3}.
In recent decades, artificial intelligence has become widely accepted in medical applications ^{4}. One such application is Bayesian networks, which are increasingly used to represent knowledge domains in the presence of uncertainty arising from randomness. Bayesian networks can be used as an analysis and decision aid in interpreting diagnostic test results (e.g., mammography), signs, and symptoms when uncertainty is known to be a dominant factor ^{5}.
A Bayesian network is a graphical model that represents probabilistic relationships among variables of interest ^{6}. Such networks consist of a qualitative component (i.e., the structural model), which provides a visual representation of the interactions among variables, and a quantitative component (i.e., a set of local probability distributions), which permits probabilistic inference. Together, these components determine the unique joint probability distribution over the variables in a specific problem ^{7}. In clinical medical practice, professionals can calculate probabilities using Bayes' theorem without a computer for a specific diagnosis with limited parameters (i.e., a few conditional probabilities). If the factors that modify the probability of disease have interactions, however, the complexity of such calculations can increase exponentially, making it difficult to solve without computational support. In this case, Bayesian networks may be useful ^{8}.
In Bayesian networks, the nodes represent uncertain variables. Typically, there is a primary or "root" node that represents the variable of interest, and the other nodes affect the probability of that primary node ^{6} ^{,} ^{7}. For example, in medical diagnosis, estimating the probability of an event, such as the malignancy of a breast mass ("root" node), given a set of evidence (i.e., demographics, image characteristics, etc.) is a problem that can be solved with Bayesian networks ^{8}. Patient risk factors, signs, symptoms, and the results of diagnostic tests are the inputs of the system ^{8}.
Each node necessarily contains mutually exclusive and collectively exhaustive instances ^{6} ^{,} ^{7}. Mutually exclusive instances refer to events that cannot occur at the same time. For example, if a coin toss is the variable of interest, heads and tails are the mutually exclusive instances of that variable because they cannot occur simultaneously ^{8}.
When both the structure and probabilities are established, the Bayesian network can be used to determine the probability of one node based upon the available information about other conditionally dependent nodes using an inference algorithm. Inference is the reasoning process used to draw conclusions from available evidence based on the principles of probabilistic reasoning and Bayes' theorem ^{6} ^{,} ^{7} ^{,} ^{8}.
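As an illustration of the inference step described above, the following minimal Python sketch computes the posterior probability of malignancy in a toy network with one root node and two finding nodes. The network structure, the node names (`mass`, `calc`), and every probability value are hypothetical and chosen only for illustration; none of them is taken from the studies discussed in this review.

```python
from itertools import product

# Hypothetical parameters for illustration only; none come from the included studies.
P_MALIGNANT = 0.40                   # prior probability of malignancy (root node)
P_MASS = {True: 0.80, False: 0.30}   # P(suspicious mass | malignant), P(suspicious mass | benign)
P_CALC = {True: 0.60, False: 0.20}   # P(microcalcifications | malignant), P(... | benign)


def joint(malignant, mass, calc):
    """Joint probability from the network factorization P(M) * P(mass|M) * P(calc|M)."""
    p = P_MALIGNANT if malignant else 1.0 - P_MALIGNANT
    p *= P_MASS[malignant] if mass else 1.0 - P_MASS[malignant]
    p *= P_CALC[malignant] if calc else 1.0 - P_CALC[malignant]
    return p


def posterior_malignant(evidence):
    """P(malignant | evidence); unobserved findings are summed out by enumeration."""
    totals = {True: 0.0, False: 0.0}
    for malignant, mass, calc in product([True, False], repeat=3):
        state = {"mass": mass, "calc": calc}
        if all(state[name] == value for name, value in evidence.items()):
            totals[malignant] += joint(malignant, mass, calc)
    return totals[True] / (totals[True] + totals[False])


def classify(evidence):
    """Report the outcome with the highest a posteriori probability."""
    return "malignant" if posterior_malignant(evidence) > 0.5 else "benign"


if __name__ == "__main__":
    print(posterior_malignant({}))                            # prior, no evidence
    print(posterior_malignant({"mass": True, "calc": True}))  # both findings present
    print(classify({"mass": False, "calc": False}))           # both findings absent
```

Exhaustive enumeration is only practical for very small networks; dedicated software uses more efficient inference algorithms, which is one reason computational support becomes necessary as the number of interacting variables grows.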
This theory may be used to differentiate between benign and malignant breast diseases using radiologists' descriptions of breast imaging findings, helping to define which patients should be referred for biopsy and which should not ^{9}. These systems can also perform more complex reasoning tasks, such as mammography-histology correlation, and detect sampling error better than radiologists ^{10}. This technique can also help the general practitioner determine which breast lesions identified on physical examination should be referred for mammography or to a mastologist. It can also assist the pathologist in making the pathological diagnosis.
The purpose of this systematic review was to assess the accuracy of Bayesian networks in patients with breast lesions.
Methods
Search strategy
We searched the following databases for studies published between January 1990 and March 2014: Medical Literature Analysis and Retrieval System Online (MEDLINE) through PubMed, Cancer Literature (CancerLit), Latin American and Caribbean Health Sciences (LILACS), Excerpta Medica Database (Embase), SciVerse Scopus (Scopus), Cochrane Central Register of Controlled Trials (Cochrane), Spanish Bibliographic Index of Health Sciences (IBECS), Biological Abstracts (BIOSIS), and Web of Science.
We also searched papers in grey literature (which includes Google Scholar, published papers from conferences, government technical reports, and other materials that are not controlled by scientific publishers).
The databases were searched using the keywords included in the Medical Subject Headings (MeSH) and their synonyms, including the following terms: "breast neoplasms", "breast lesions", "breast cancer", "breast tumor", "mammary neoplasms", "mammary cancer", and "mammary tumor". The papers with these terms were associated with the evaluated test, "Bayesian network", and with the terms "sensitivity" and "specificity". The full search strategy will be available from the authors upon request.
The "*" symbol was used to allow the recovery of all variations of the original words with suffixes. The above terms were combined using the Boolean operators "AND", "OR", and "NOT".
The search was limited to studies in human females. There was no language restriction. The reference lists from all retrieved primary studies were checked. The references cited in meta-analyses, guidelines, and comments identified in the previously mentioned databases were also checked. We contacted the authors of the studies published with incomplete information; however, we received no responses to the emails sent.
Study screening and eligibility
The initial analysis of the abstracts and titles identified by the search strategy in the aforementioned databases was independently performed by three researchers (M.I.R., P.W.S., and G.D.S.); the assessment of English articles was performed by M.I.R., P.W.S., and C.S.S., and articles in other languages were assessed by different reviewers, E.P.W. and G.D.S., with translations performed as necessary. Disagreements about the inclusion or exclusion of each study were initially resolved by consensus; when consensus was not possible, differences were arbitrated by L.R.M.
Concordance between the reviewers was computed in primary studies using the Kappa Coefficient of Agreement (κ) ^{11} ^{,} ^{12} ^{,} ^{13}. We used the categories proposed by Altman in 1991 ^{14}.
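The agreement statistic mentioned above can be computed from a 2x2 table of two reviewers' decisions. The following sketch is a minimal illustration; the counts are hypothetical (the reviewers' actual screening counts are not reported here) and were chosen only so that the observed agreement is 84%.

```python
def cohen_kappa(both_include, only_a, only_b, both_exclude):
    """Return (observed agreement, Cohen's kappa) for two raters and two categories."""
    n = both_include + only_a + only_b + both_exclude
    observed = (both_include + both_exclude) / n
    a_include = both_include + only_a   # inclusions by reviewer A
    b_include = both_include + only_b   # inclusions by reviewer B
    # Agreement expected by chance from each reviewer's marginal totals.
    expected = (a_include * b_include + (n - a_include) * (n - b_include)) / n ** 2
    return observed, (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical screening decisions for 100 citations.
    observed, kappa = cohen_kappa(both_include=33, only_a=8, only_b=8, both_exclude=51)
    print(f"observed agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than the raw percentage agreement because it discounts the agreement that two reviewers would reach by chance alone.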
We included cross-sectional studies that used Bayesian networks (i.e., the evaluation test) as a diagnostic test to assess breast lesions (i.e., the target conditions). The diagnostic test evaluation consisted of the results provided by the Bayesian networks (i.e., positive or negative).
We analyzed the studies involving women with breast tumors (benign or malignant) who had undergone surgery or core biopsy followed by histological analysis. The reference diagnostic test was the result of histological analysis of paraffin-embedded sections; a breast cancer result by a Bayesian network was considered to be correct if it did not differ from the histological analysis.
For inclusion in our systematic review, the final histological diagnosis of breast lesions had to characterize the lesion as benign or malignant. The diagnosis considered was the result of the Bayesian network, identified in each study as the diagnosis with the highest a posteriori probability, i.e., no matter how small the difference between the probabilities of a diagnosis of cancer and a diagnosis of benign lesion or how large the final probability of cancer; thus, a patient with a 51% probability of malignancy against a 49% probability of a benign lesion would be classified by the Bayesian network as having breast cancer. However, this method has been used successfully to handle false positives and false negatives ^{8}.
Borderline lesions were included, allowing the consideration of the applicability of Bayesian networks to the pattern classification boundary ^{6} ^{,} ^{7}.
In this systematic review, we excluded studies in which the reference standard consisted exclusively of benign lesions or exclusively of breast cancer. Thus, the primary outcome analyzed was the accuracy of the Bayesian network result for breast lesions.
Data collection and quality evaluation methodology
From the studies, we extracted the data (i.e., design, location, year, sample, average age, and prevalence), outcomes, and Bayesian network architectures, including the qualitative part (i.e., the description of the input and output nodes), the quantitative part (i.e., how to obtain the conditional probabilities), and the software used to build the Bayesian network and perform inference, in addition to the patient characteristics.
M.I.R. and P.W.S. independently abstracted data on the prevalence of benign and malignant breast lesions, study (year), design and setting, number of lesions, age in years, Bayesian network architecture (description of outcomes and inputs) including its design, cutoff, conditional and unconditional probabilities, and software used. Other reviewers (E.P.W. and L.R.M.) independently extracted data from the articles published in languages other than English, and translations were performed when necessary. Disagreements regarding the data extraction were initially resolved by consensus, and, when this was not possible, the differences were resolved by S.M.N.
Each reviewer also calculated the pretest probability (i.e., prevalence), sensitivity, specificity, positive and negative likelihood ratios, and posttest probabilities from the primary studies that used Bayesian networks for breast cancer diagnoses. As previously detailed, the studies that did not contain the necessary data to construct a 2x2 contingency table were excluded.
The methodological quality assessment was conducted using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) criteria modified by Cochrane, which consists of 11 study characteristics that have the potential to introduce bias. These items were classified as positive (without bias), negative (potential bias), or insufficient information ^{12} ^{,} ^{13}.
Summary of data and statistical analysis
To assess agreement between the reviewers on each study's eligibility and methodological quality, and the consistency between the Bayesian network results and the histological results of paraffin sections, we calculated the observed percentage agreement and the κ coefficient ^{11} ^{,} ^{12}.
A 2x2 contingency table was constructed for each selected study, and all biopsies were classified as benign or malignant. We calculated the sensitivity, specificity, likelihood ratios, and posttest probability.
For studies in which one cell of the 2x2 contingency table contained a value of zero, 0.5 was added. However, the studies in which zero occurred in two or more cells were excluded from the analysis ^{13} ^{,} ^{14}.
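The calculations described above can be sketched in a few lines of Python. The function name and the return format are illustrative, and one assumption is made about the continuity correction: when any cell is zero, 0.5 is added to all four cells. Under that assumption, the counts reported in Table 2 for Kahn Jr. et al. and Hamilton et al. give the positive likelihood ratios listed in Table 4 (7.97 and 38.50).

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, and likelihood ratios from a 2x2 table.

    Assumption: if any cell is zero, 0.5 is added to all four cells
    (continuity correction) before the measures are computed.
    """
    if 0 in (tp, fn, fp, tn):
        tp, fn, fp, tn = (cell + 0.5 for cell in (tp, fn, fp, tn))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


if __name__ == "__main__":
    kahn = diagnostic_metrics(tp=23, fn=2, fp=6, tn=46)       # counts from Table 2
    hamilton = diagnostic_metrics(tp=17, fn=2, fp=0, tn=21)   # FP = 0, so corrected
    print(round(kahn["lr_positive"], 2), round(hamilton["lr_positive"], 2))
```

In this sketch the correction affects every returned measure, whereas the sensitivity and specificity in Table 2 appear to be reported without correction (e.g., a specificity of 100.0% for Hamilton et al.).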
We aimed to produce pooled estimates of sensitivity and specificity using a meta-analysis, which was performed using the MetaDisc software version 1.4 (developed by the Clinical Biostatistics Unit, Hospital Ramón y Cajal, Madrid, Spain) ^{15} and Review Manager (RevMan) version 5.0.21 (developed at The Nordic Cochrane Centre, Copenhagen, Denmark) ^{16}.
A bivariate analysis was used to calculate the pooled estimates of sensitivity, specificity, and likelihood ratios, along with 95% confidence intervals (95%CI), to estimate the values presented in the summarized meta-analysis ^{17} ^{,} ^{18} ^{,} ^{19}. All summary measures were calculated using the random-effects model of DerSimonian & Laird ^{20}, which accounts for heterogeneity between studies, and all pooled estimates were weighted and reported with 95%CI.
The heterogeneity of sensitivity and specificity across studies was analyzed using the χ^{2} distribution with n-1 degrees of freedom and the inconsistency index (I^{2}) ^{21} ^{,} ^{22}. The heterogeneity of the positive and negative likelihood ratios across studies was examined with the Cochran Q test (weighting the studies by the inverse of the variance), which follows the χ^{2} distribution with n-1 degrees of freedom; the inconsistency (I^{2}) and τ^{2} were used to estimate the variation between the studies ^{21} ^{,} ^{22}.
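The pooling and heterogeneity statistics described above can be sketched as follows for the positive likelihood ratio. This is a simplified, univariate illustration on the log scale and not a reimplementation of the software used in the review; the function names, the standard large-sample variance formula for the log likelihood ratio, and the 0.5 continuity correction applied to all cells of a table containing a zero are assumptions of the sketch. With the counts from Table 2, it should give a pooled value close to the 13.55 reported in Table 4.

```python
import math


def log_lr_positive(tp, fn, fp, tn):
    """Log positive likelihood ratio and its approximate variance."""
    if 0 in (tp, fn, fp, tn):  # assumed continuity correction
        tp, fn, fp, tn = (cell + 0.5 for cell in (tp, fn, fp, tn))
    estimate = math.log((tp / (tp + fn)) / (fp / (fp + tn)))
    variance = 1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn)
    return estimate, variance


def dersimonian_laird(effects):
    """Random-effects pooling of (estimate, variance) pairs.

    Returns the pooled estimate, its standard error, Cochran's Q,
    I^2 (as a percentage), and the between-study variance tau^2.
    """
    estimates = [e for e, _ in effects]
    weights = [1 / v for _, v in effects]            # inverse-variance weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    re_weights = [1 / (v + tau2) for _, v in effects]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    return pooled, math.sqrt(1 / sum(re_weights)), q, i2, tau2


# (TP, FN, FP, TN) for the four included studies, as reported in Table 2.
TABLES = [(23, 2, 6, 46), (17, 2, 0, 21), (381, 42, 36, 555), (23, 6, 4, 59)]

if __name__ == "__main__":
    pooled, se, q, i2, tau2 = dersimonian_laird([log_lr_positive(*t) for t in TABLES])
    low, high = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
    print(f"pooled LR+ = {math.exp(pooled):.2f} ({low:.2f}-{high:.2f})")
    print(f"Q = {q:.2f}, I2 = {i2:.1f}%, tau2 = {tau2:.4f}")
```

When Q does not exceed its degrees of freedom, both τ^{2} and I^{2} are truncated at zero, and the random-effects estimate coincides with the inverse-variance fixed-effect estimate.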
As there was no heterogeneity, a summary receiver operating characteristic (SROC) curve was generated considering the data from all thresholds using the method of Littenberg and Moses ^{22}. This curve can change, depending on the threshold and the ROC curve used to define an abnormal test, thereby resulting in an expected oscillation between sensitivity and specificity.
The SROC curve can be considered an excellent summary graph; however, for the purpose of comparison, we calculated the Q* point as an additional statistic ^{18} ^{,} ^{22}. The Q* point is the one at which the sensitivity and specificity are equal in the SROC, and indicates when a test approximates the desired performance of 100% sensitivity and specificity, similar to the area under the curve (AUC) for a receiver operating characteristic (ROC) curve. Higher Q* values are associated with better performance in diagnostic tests ^{18} ^{,} ^{22}. The AUC and Q* considered in the SROC curve were estimated using the trapezoidal numerical integration method in MetaDisc ^{15}.
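To make the Q* statistic concrete, the sketch below fits a Moses-Littenberg-style regression, D = a + bS, where D is the difference and S the sum of the logits of the true-positive and false-positive rates, and then evaluates the curve at the point where sensitivity equals specificity (S = 0). The unweighted least-squares fit and the 0.5 continuity correction are assumptions of this sketch, so its output need not match the Q* value reported by the software used in the review.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def moses_littenberg(tables):
    """Fit D = a + b*S by unweighted least squares and return (a, b, Q*)."""
    d_values, s_values = [], []
    for tp, fn, fp, tn in tables:
        if 0 in (tp, fn, fp, tn):  # assumed continuity correction
            tp, fn, fp, tn = (cell + 0.5 for cell in (tp, fn, fp, tn))
        tpr, fpr = tp / (tp + fn), fp / (fp + tn)
        d_values.append(logit(tpr) - logit(fpr))   # log diagnostic odds ratio
        s_values.append(logit(tpr) + logit(fpr))   # proxy for the test threshold
    mean_s = sum(s_values) / len(s_values)
    mean_d = sum(d_values) / len(d_values)
    slope = (sum((s - mean_s) * (d - mean_d) for s, d in zip(s_values, d_values))
             / sum((s - mean_s) ** 2 for s in s_values))
    intercept = mean_d - slope * mean_s
    # At Q*, sensitivity = specificity, so S = 0 and logit(sensitivity) = a / 2.
    q_star = 1 / (1 + math.exp(-intercept / 2))
    return intercept, slope, q_star


if __name__ == "__main__":
    # (TP, FN, FP, TN) for the four included studies, as reported in Table 2.
    tables = [(23, 2, 6, 46), (17, 2, 0, 21), (381, 42, 36, 555), (23, 6, 4, 59)]
    a, b, q_star = moses_littenberg(tables)
    print(f"a = {a:.3f}, b = {b:.3f}, Q* = {q_star:.3f}")
```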
Results
Identification of studies and eligibility
The process of study selection is shown in Figure 1. We identified 100 citations in the searched databases. After the initial review of titles and abstracts, 24 full papers were recovered, four of which were considered to be eligible for our systematic review.
Four primary studies, which included breast lesions from 1,223 women, met the inclusion criteria and were analyzed (as shown in Table 1) ^{10} ^{,} ^{23} ^{,} ^{24} ^{,} ^{25}. The overall concordance between the eligibility and methodological quality of the studies was 84% (κ = 0.67), indicating good agreement ^{12}. There was disagreement, which was resolved by consensus, between the reviewers on the inclusion and exclusion criteria in these four studies.
Characteristic  Kahn Jr. et al. ^{24} (1997)  Hamilton et al. ^{23} (1994)  Cruz-Ramírez et al. ^{25} (2007)  Burnside et al. ^{10} (2004)  Total
Design and setting  Cross-sectional; USA  Cross-sectional; Northern Ireland  Cross-sectional; England  Cross-sectional; USA
N *  77  40  1,014  92  1,223
Age in years (SD/range)  NA  NA  NA  58.2 (±10.5) (range, 37-84 years)
Breast cancer (prevalence %)  25 (32.47)  19 (47.50)  423 (41.72)  29 (31.52)  496 (40.56)
Architecture
Qualitative (nodes)
Outcomes  Benign/Malignant  Benign/Malignant  Benign/Malignant  Benign/Malignant
Inputs  Five patient history items **; Two physical findings ***; 15 mammographic findings  10 cytological features  Age; 10 cytological features  25 hierarchical descriptors of BI-RADS
Quantitative
Conditional and unconditional probabilities  Peer-reviewed medical literature ^{#}; Census data; Health statistics reports  One cytopathologist ^{##}  Two databases ^{###}  Peer-reviewed medical literature ^{#}
Software  BNG ^{26} (learning); IDEAL ^{27} (inference)  Developed in the Optical Sciences Center, University of Arizona, USA  Power constructor (CBL2learning) ^{31,32}; Netica (inference) ^{33}  GeNIe Modeling Environment developed by the Decision Systems Laboratory of the University of Pittsburgh, USA ^{36}
BI-RADS: Breast Imaging-Reporting and Data System; NA: not available.
^{*} Number of lesions (malignant, benign and normal tissue);
^{**} Age at menarche, age at first live birth, number of first-degree relatives with breast cancer, previous biopsy;
^{***} Nipple discharge, architectural distortion;
^{#} The authors of the original references used data from the medical literature to fill the parameters of conditional probabilities (dependent relationship between the variable and the outcome) and unconditional probabilities (prevalence of breast cancer) of the architecture of Bayesian network used;
^{##} A medical specialist who informed, through interviews, conditional probabilities (dependent relationship between the variable and the outcome) and unconditional probabilities (prevalence of breast cancer) of the architecture of Bayesian network used;
^{###} These databases come from the field of pathology, regarding the cytodiagnosis of breast cancer using fine needle aspiration cytology (FNAC) of the breast lesion; the first database, collected retrospectively by a single observer with 10 years' experience of reporting FNAC, during 1992-1993. The second database, collected prospectively by 19 observers with 5-20 years' experience of reporting FNAC, contains 322 consecutive adequate specimens, during 1996-1997. Reference test: Histology.
• Study descriptions
Table 1 provides detailed descriptions of the studies, tests, and standards used. The mean participant age was reported in only one study ^{10}. Two studies were performed in the United States ^{10} ^{,} ^{24}, one in Northern Ireland ^{23}, and another in England ^{25}.
Breast cancer was found in 496 cases (prevalence of 40.56%, i.e., the number of positive cases among all individuals included in the studies, considering the positive and negative cases), and 727 (59.44%) patients had benign lesions. Table 2 summarizes the results for all studies included in the meta-analysis.
Study (year)  TP  FN  FP  TN  Sensitivity, % (95%CI)  Specificity, % (95%CI)
Kahn Jr. et al. ^{24} (1997)  23  2  6  46  92.0 (74.0-99.0)  88.5 (76.6-95.6)
Hamilton et al. ^{23} (1994)  17  2  0  21  89.5 (66.9-98.7)  100.0 (83.9-100.0)
Cruz-Ramírez et al. ^{25} (2007)  381  42  36  555  90.1 (86.8-92.7)  93.9 (91.7-95.7)
Burnside et al. ^{10} (2004)  23  6  4  59  79.3 (60.3-92.0)  93.7 (84.5-98.2)
Total  444  52  46  681  89.5 (86.5-92.1)  93.7 (91.7-95.3)
Counts are Bayesian network results by biopsy result: TP and FN, biopsy positive; FP and TN, biopsy negative.
FN: false negative; FP: false positive; TN: true negative; TP: true positive.
The Bayesian networks documented in these studies had the same possible outcomes (i.e., benign and malignant); however, the nodes representing the input variables covered different characteristics relevant to breast cancer diagnosis.
The Bayesian network in Kahn et al. ^{24} used information from the patients' histories, physical examinations, and mammography results, and the conditional probabilities were obtained by reviewing the indexed medical literature, census data, and statistical health reports. Specifically, conditional probabilities for architectural distortion and previous biopsy at the same site were estimated; values for demographic variables were derived from published epidemiological data; and statistical studies published in peer-reviewed radiology journals provided most of the data for the knowledge base, such as the conditional probabilities of mammographic findings for breast cancer. The software used for learning conditional probabilities was BNG ^{26}, and IDEAL was used for inference ^{27}.
The Bayesian network documented by Hamilton et al. ^{23} used 10 cytological characteristics as input information obtained by fine needle aspiration cytology (FNAC), and the conditional probability matrices relating each of the diagnostic clues and their outcomes to diagnosis (benign, malignant) were defined by a cytopathologist. The software for generating these Bayesian networks was developed at the University of Arizona (USA).
The Bayesian network developed by Cruz-Ramírez et al. ^{25} considered the patients' age and 10 cytological features obtained by FNAC; the conditional probabilities were obtained in an automated way from two databases ^{28} ^{,} ^{29} ^{,} ^{30}. These databases come from the field of pathology and concern the cytodiagnosis of breast cancer using FNAC of the breast lesion. The first database was collected retrospectively by a single observer with 10 years' experience of reporting FNAC during 1992-1993. The second database, collected prospectively by 19 observers with 5-20 years' experience of reporting FNAC during 1996-1997, contains 322 consecutive adequate specimens. The software used for learning conditional probabilities was the Power Constructor (CBL2Learning) ^{31} ^{,} ^{32}, and Netica software was used for inference ^{33}.
The Bayesian network presented by Burnside et al. ^{10} consisted of 25 hierarchical descriptors of the BI-RADS classification ^{34} ^{,} ^{35}, and the conditional probabilities were obtained from indexed medical literature, i.e., the BI-RADS descriptors were linked to diseases of the breast using probabilities derived from the literature. The software used in the development of these Bayesian networks was the GeNIe Modeling Environment developed at the University of Pittsburgh (USA) ^{36}.
This systematic review seeks to present the use of Bayesian networks as a method of decision support in the diagnosis of breast cancer. Despite differences in their input variables, all Bayesian networks in the studies included in the meta-analysis had the same outcome, were used for decision support and diagnosis, and used Bayes' theorem for inference and to calculate a posteriori probabilities.
• Methodological quality assessment
Methodological quality assessments of the studies were performed according to a modified version of QUADAS ^{11} ^{,} ^{12} and are illustrated in Table 3.
The reviewers disagreed on 3 of the 11 items; this disagreement was resolved by consensus. The presence of withdrawals from the sample was not clarified in any of the studies; additionally, it was not possible to determine whether data interpretation was conducted in a blinded manner in the included studies. Fifty percent of the studies did not describe whether the clinical information available was that used in clinical practice, and the remainder showed bias on that item. Additionally, 25% of the studies did not provide sufficient information to enable an assessment of whether the sample was representative, that is, whether they considered the patients receiving routine tests.
Three studies performed well and received a positive rating on at least 8 of the 11 items ^{10} ^{,} ^{24} ^{,} ^{25}. The interobserver agreement in the analysis of methodological quality with QUADAS was 94% (κ = 0.86), indicating good agreement ^{14}. As described above, all disagreements were resolved by consensus.
• Summary of diagnostic performance
The overall agreement between the Bayesian networks and the paraffin examination was 92.4% (95%CI: 90.9%-93.9%) (κ = 0.84), indicating good agreement ^{14}. The overall sensitivity was 89.5% (95%CI: 86.5%-92.1%), and the specificity was 93.7% (95%CI: 91.7%-95.3%), as shown in Table 2.
The sensitivity values shown in Table 2 showed no heterogeneity (χ^{2}, p = 0.41) and no inconsistency (I^{2} = 0%) ^{37}. The specificity values shown in Table 2 also suggested no heterogeneity, as assessed by χ^{2} (p = 0.19), and the inconsistency (I^{2} = 36.8%) was intermediate ^{37}.
The pooled positive likelihood ratio (Table 4) was 13.55 (95%CI: 10.25-17.92), meaning that a positive result from the Bayesian networks increased the odds that the patient had breast cancer (i.e., a true positive, TP) by 13.55 times. We did not observe heterogeneity in the Cochran Q test (p = 0.4154), τ^{2} was 0.0, and there was no inconsistency (I^{2} = 0.0%).
Characteristic  Kahn Jr. et al. ^{24} (1997)  Hamilton et al. ^{23} (1994)  Cruz-Ramírez et al. ^{25} (2007)  Burnside et al. ^{10} (2004)
Representative spectrum  +  +  +  
Acceptable reference standard  +  +  +  + 
Acceptable delay between tests  +  +  +  + 
Partial verification avoided  +  +  +  + 
Differential verification avoided  +  +  +  + 
Incorporation avoided  +  +  +  + 
Reference standard results blinded  +  +  +  + 
Index test results blinded  
Relevant clinical information      
Uninterpretable results reported  +  +  +  + 
Withdrawals explained 
+: without bias; -: potential bias; blank: insufficient information.
Study (year)  Positive likelihood ratio (95%CI)  Negative likelihood ratio (95%CI)  Positive posttest probability *, % (95%CI)  Negative posttest probability *, % (95%CI)
Kahn Jr. et al. ^{24} (1997)  7.97 (3.72-17.07)  0.09 (0.02-0.34)  79.37 (78.34-80.40)  4.59 (4.06-5.12)
Hamilton et al. ^{23} (1994)  38.50 (2.47-599.32)  0.13 (0.04-0.41)  90.89 (89.48-92.30)  6.44 (5.24-7.64)
Cruz-Ramírez et al. ^{25} (2007)  14.79 (10.76-20.33)  0.11 (0.08-0.14)  91.38 (91.33-91.43)  6.68 (6.63-6.73)
Burnside et al. ^{10} (2004)  12.49 (4.75-32.83)  0.22 (0.11-0.45)  85.18 (84.43-85.94)  9.23 (8.62-9.85)
Total  13.55 (10.25-17.92)  0.12 (0.09-0.18)  90.24 (90.19-90.29)  7.80 (7.76-7.84)
95%CI: 95% confidence interval.
^{*} Values of the calculated posttest probability: Pretest probability = prevalence = 40.56%, Pretest odds = prevalence/(1 - prevalence), Posttest odds = likelihood ratio x pretest odds, Posttest probability = posttest odds/(1 + posttest odds).
The pooled negative likelihood ratio shown in Table 4 was 0.12 (95%CI: 0.09-0.18), a good result because it is close to zero and indicates that a negative result from the Bayesian networks multiplies the odds of a malignant breast lesion by a factor of 0.12. We found no heterogeneity with the Cochran Q test (p = 0.2941), a small τ^{2} (0.0337), and low inconsistency (I^{2} = 19.2%).
A positive result from the Bayesian network increased the probability of cancer from the pretest probability (i.e., prevalence) of 40.56% to 90.24% (95%CI: 90.19%-90.29%), and a negative result decreased the probability of cancer from 40.56% to 7.8% (95%CI: 7.76%-7.84%) (Table 4).
In the analysis of breast cancer versus benign lesions, the area under the SROC curve shown in Figure 2 was high (0.97) ^{38}, supporting the use of Bayesian networks in breast cancer diagnosis; the Q* point was 0.92. In Figure 2, each point represents a single study, the middle line is the main curve, and the other curves (first and third) represent the confidence interval.
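The conversion from pretest to posttest probability given in the footnote of Table 4 can be written as a short function. The sketch below applies it to the pooled prevalence and the pooled likelihood ratios as printed (rounded to two decimals); the function name is illustrative.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = likelihood_ratio * pretest_odds
    return posttest_odds / (1 + posttest_odds)


if __name__ == "__main__":
    prevalence = 0.4056  # pooled pretest probability
    after_positive = posttest_probability(prevalence, 13.55)  # pooled LR+
    after_negative = posttest_probability(prevalence, 0.12)   # pooled LR-, rounded
    print(f"after a positive result: {after_positive:.2%}")
    print(f"after a negative result: {after_negative:.2%}")
```

With the rounded negative likelihood ratio of 0.12, the sketch gives a value slightly below the 7.80% in Table 4, which was presumably computed from the unrounded ratio.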
Discussion
This systematic review is the first to examine the accuracy of Bayesian networks in supporting breast cancer diagnoses. Our results demonstrated that this computational model can represent a noninvasive and accurate method that can be used to support breast cancer diagnosis.
Dozens of free tools or demo versions are available for developing and modeling Bayesian networks. However, although dozens of Bayesian networks applied to supporting the diagnosis of breast cancer have been presented in the scientific literature, including those in our systematic review ^{10} ^{,} ^{23} ^{,} ^{24} ^{,} ^{25}, these are made available by contacting the researchers who developed them and are used in the research centers that developed them.
In our search, we found no cost-effectiveness studies on the use of this technology. Computational intelligence methods, such as Bayesian networks, have been introduced into clinical practice with the primary aims of assisting physicians in the diagnostic process by preparing therapeutic decisions and predicting various outcomes ^{39}.
The mathematical formalism for Bayesian analysis originated in the theorem proposed by Thomas Bayes in 1763. As previously detailed, the conditional probabilities of a Bayesian network can be obtained through two approaches: (a) using information derived from expert knowledge or the literature, and (b) learning the probabilities and structure from large databases by Bayesian learning ^{7}. In the past, studies typically used the first approach ^{9}. Our systematic review included studies with both approaches; the specific approach used may relate to the period in which the study was performed.
Research using this theory to support breast cancer diagnosis began in the 1990s with a study in Northern Ireland that aimed to determine the diagnosis with a Bayesian network using information from fine needle aspiration ^{23}. That study ^{23}, which was included in our systematic review and published in 1994, pioneered the use of this computational model; although it did not use an automated technique to compute the Bayesian network, it reported high accuracy and favorable results that support its use in clinical practice.
Moreover, among the methods used in breast cancer diagnosis, mammography is generally considered to be the best method available for breast cancer screening. However, some cancers visible on mammograms are missed by radiologists. Systems based on Bayesian networks applied to mammography seek to reduce false negatives by highlighting suspected areas for radiologists ^{40}. This feature was also noted in the study by Burnside et al. ^{10}, which is included in our systematic review.
A study performed by Laming & Warren ^{41} in 2000 reinforces this statement because while mammography is the primary method used for breast cancer screening, approximately 16% to 31% of cancers detected in mammograms can be missed when interpreted by a single radiologist. In this regard, a systematic review performed by Taylor & Potts in 2008 ^{42} revealed that the dual analysis in which two radiologists assessed the image increased the rate of cancer detection by 311 per 100,000 women screened. Thus, systems based on Bayesian networks can assist the analysis of suspicious areas that deserve review in those cases when it is not possible for two radiologists to analyze the mammograms ^{42}.
Our systematic review allowed the extraction and reconstruction of diagnostic data from cross-sectional studies. The methodological quality of the included studies was high, although some QUADAS items had negative evaluations or insufficient information, namely whether the sample was representative, blinding in the use of Bayesian networks, relevant clinical information, and the subjects who were removed from the sample.
Although an extensive and detailed search strategy was employed, which enabled retrieval of publications regardless of language, the terms used may have contributed to the failure to locate certain publications that could be relevant to our systematic review. The bivariate analysis used preserves the two-dimensional nature of the diagnostic data and considers the measurement variability within and between studies ^{19}. We used the most current guidelines for the preparation of systematic reviews as described in the Cochrane Handbook for Systematic Reviews of Diagnostic Accuracy ^{11} ^{,} ^{12} ^{,} ^{17} ^{,} ^{18} ^{,} ^{19} ^{,} ^{21} ^{,} ^{22}.
The studies included in our systematic review presented a dichotomous outcome for breast cancer diagnoses; however, there were some differences in the composition of the input nodes. The study by Kahn Jr. et al. ^{24} considered the patients' histories and physical findings and some mammographic data. Hamilton et al. ^{23} considered several cytological characteristics. Cruz-Ramírez et al. ^{25} used the age of the patient and the cytological characteristics, and Burnside et al. ^{10} considered 25 descriptors to produce hierarchical classifications in BI-RADS. Despite these differences, all considered issues were relevant to the breast cancer diagnosis.
Regarding economics, some studies have shown that the women treated in public institutions have more advanced stages of the disease, less access to modern therapies, and a lower survival rate than the patients treated in private institutions ^{43} ^{,} ^{44}. The triple test (physical examination, mammography, and cytology by fine needle aspiration) has been employed as a method to accurately diagnose palpable breast lesions. When the three diagnostic methods are consistent, the elimination of biopsy as a confirmatory test is usually recommended and can result in reduced spending ^{43}. In this context, artificial intelligence techniques, such as Bayesian networks using the above characteristics, now represent a reality and may decrease the uncertainty present in biopsies derived from suspicious nodules, enabling the reduction of public health costs ^{43} ^{,} ^{44}.
Bayesian networks have the potential to enhance the diagnostic process by instilling consistency and repeatability. The use of this system, together with preinterpreted diagnostic information, could also provide an effective computer-based training system for breast cancer diagnosis ^{45}. This does not imply that the Bayesian network could replace the specialist, but it may indicate that the technology can calculate diagnostic probabilities across many variables, incorporate complex dependencies among variables, and aid, for example, the radiologists' interpretations ^{45}.
Our meta-analysis showed that a positive result from a Bayesian network increased the probability of a breast cancer diagnosis by 49.68 percentage points (from 40.56% to 90.24%), suggesting that this type of tool can be useful in evaluating suspicious lesions. Our findings also indicate that a negative result from a Bayesian network decreased the probability of breast cancer by 32.76 percentage points (from 40.56% to 7.80%), supporting their utility in evaluating lesions that are deemed to be most likely benign.
In conclusion, probabilistic computer models like Bayesian networks represent a noninvasive method that may substantially aid physicians attempting to diagnose breast cancer in a timely and accurate manner.