QUADAS and STARD: Evaluating the Quality of Diagnostic Accuracy Studies

OBJECTIVE: To compare the performance of two approaches, one based on the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) and the other on the Standards for Reporting Studies of Diagnostic Accuracy (STARD), in evaluating the quality of studies validating the OptiMal® rapid malaria diagnostic test. METHODS: The Medline/PubMed database was searched for articles validating the rapid test published up to 2007. This search retrieved 13 articles. A combination of 12 QUADAS criteria and three STARD criteria was compared with the 12 QUADAS criteria alone. Articles that fulfilled at least 50% of QUADAS criteria were considered to be of regular to good quality. RESULTS: Of the 13 articles retrieved, 12 fulfilled at least 50% of QUADAS criteria, and only two fulfilled the combined STARD/QUADAS criteria. Under the combined criteria (≥ 6 QUADAS and all 3 STARD), two studies (15.4%) showed good methodological quality. Article selection using the proposed combination yielded two to eight articles, depending on the number of items taken as the cutoff point. CONCLUSIONS: The STARD/QUADAS combination has the potential to provide greater rigor when evaluating the quality of studies validating malaria diagnostic tests, given that it incorporates relevant information not covered by the QUADAS criteria alone.


INTRODUCTION
New technologies, especially those related to disease diagnosis, must be validated by means of accuracy evaluations. This involves comparing the new test with an established one that is regarded as the gold standard. Such evaluation is essential to guide the use of a given diagnostic test, especially in the context of widespread use by public health services. The quality and methodological rigor of evaluation studies, as well as the quality of the data obtained, depend on factors that must also be measured and considered.
The technique traditionally used for malaria diagnosis is microscopy. Though inexpensive, this technique requires trained and experienced professionals. Beginning in the 1990s, rapid tests (RTs) were introduced as an alternative to microscopy for malaria diagnosis. Different diagnostic tests are currently on the market.21 RTs rely on immunochromatographic methods and can be administered in about 15 minutes by persons with minimal technical training, using kits that do not require electricity or special equipment.12,21 RTs are an effective alternative for malaria diagnosis: in addition to being easy to implement, their accuracy can be similar to that of microscopy in a number of settings.21 The high initial cost of RTs is one of the major impediments to their widespread adoption.21 OptiMal® is one of the RTs registered and validated in Brazil. Numerous validation studies of OptiMal® have been published. These studies deal with populations of endemic and susceptible areas, travelers, symptomatic and asymptomatic populations, and different clinical aspects of Plasmodium falciparum malaria. Determining the quality of these studies using standardized methodology is fundamental to inform any decisions regarding the use of the test in Brazil.
Two instruments are widely used in the scientific literature for evaluating the quality of studies validating diagnostic tests: the Standards for Reporting Studies of Diagnostic Accuracy (STARD),3 comprising 25 criteria, and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS),19 comprising 14 criteria. A number of criteria are common to the two instruments.
STARD is an instrument aimed at researchers and editors. It was devised by a group of editors with the purpose of evaluating the quality of articles by simply checking each item on the checklist, and of guiding authors when preparing scientific reports.3 QUADAS is intended as an instrument for assessing the quality of previously published studies, especially in the context of systematic literature reviews. This instrument was commissioned by the United Kingdom's NHS R&D Health Technology Assessment Programme (HTA).19 QUADAS and STARD were thus created with different aims and applications. Researchers have discussed the need to introduce modifications or combinations of parameters to enhance the use of these instruments and to improve the evaluation of validation studies.2,20 Although the aim of STARD is not to evaluate studies included in systematic reviews, the introduction of three of its criteria into such evaluations has been suggested; these three items can provide essential information when evaluating epidemiological studies and methods, and are absent from QUADAS. We believe that QUADAS, an instrument that has been validated, is considered easy to use,20 and is widely employed in systematic reviews of validation studies, could be improved by the addition of items pertaining to sampling, estimate precision, and the characteristics of study populations. This is important given that validation studies must be representative, precise, and have good external validity to the population of interest. These issues were discussed during the elaboration of QUADAS, but the corresponding criteria were not included in the final document.19
The present study aimed to compare two different approaches based on the QUADAS and STARD criteria in their ability to assess the quality of validation studies of the rapid malaria test, irrespective of the estimates of test accuracy reported by each of the studies evaluated.

METHODS
Validation studies of the OptiMal® RT were obtained from the scientific literature. The bibliographic survey was carried out in December 2007.
We surveyed the available literature using the following inclusion criteria: 1) studies must use microscopy as the gold standard; and 2) the study population must comprise symptomatic patients with clinical signs of malaria living in endemic areas, regardless of age group. The first inclusion criterion was used to decide whether the article would be read in full, and the second to decide whether it would be definitively included in the study. Studies evaluating the accuracy of OptiMal® were selected from the Medline database using the PubMed search engine. The following key words were used in the search: "evaluation" and "malaria" and "rapid tests" and "diagnosis" (first search), and "OptiMal®" and "malaria" and "diagnosis" (second search). Secondary searches were also carried out in the SciELO and Lilacs databases using the same terms. There were no limitations as to year of publication.
We excluded studies carried out exclusively with patients in specific population subgroups, such as pregnant women, children, or severe malaria patients.
The selected articles were read and analyzed according to a combination of 12 criteria from QUADAS and three from STARD. QUADAS comprises 14 criteria, 12 of which were considered:
1) Was the spectrum of patients representative of the patients who will receive the test in practice?
2) Were selection criteria clearly described?
3) Is the time period between reference standard and index test short enough to be reasonably sure that the target condition did not change between the two tests?
4) Did the whole sample, or a random selection of the sample, receive verification using a reference standard of diagnosis?
5) Did patients receive the same reference standard regardless of the index test result?
6) Was the execution of the index test described in sufficient detail to permit replication of the test?
7) Was the execution of the reference standard described in sufficient detail to permit its replication?
8) Were the index test results interpreted without knowledge of the results of the reference standard?
9) Were the reference standard results interpreted without knowledge of the results of the index test?
10) Were the same clinical data available when test results were interpreted as would be available when the test is used in practice?
11) Were uninterpretable/intermediate test results reported?
12) Were withdrawals from the study explained?
Each item must be answered as "yes," "no," or "unclear"; the latter should be used when the available information is deemed insufficient to make a yes/no call. The instrument can be used in whole or in part; the researcher should select the items considered relevant to the index test.19
We did not consider the criterion "does the gold standard correctly classify the disease?", since one of the criteria for inclusion of articles in our study was use of the thick smear as the gold standard, and maintaining this item in the QUADAS scale would thus be redundant. For the same reason, we did not consider the criterion "is the gold standard independent of the index test?", since we knew beforehand that the tests are distinct technologies, and there was therefore no reason to assess this item. The QUADAS instrument does not determine a priori the scores for defining quality; it is up to the researcher to decide which cutoff point to use. We therefore took the median (50%) cutoff point, that is, the fulfillment of six to eight criteria ("yes" answers), as defining studies of regular to good quality, and the 75% cutoff point (at least nine criteria) as defining a good-quality study.
Of the 25 STARD criteria, three were selected because they are absent from QUADAS and pertain to the representativeness and precision of the study sample, both of which are fundamental to evaluating the quality of epidemiological studies. The remaining STARD items are already included, directly or indirectly, in QUADAS. The three criteria considered were: Item 5, the sampling process is described; Item 21, sensitivity and specificity results are reported with their respective confidence intervals (CI); and Item 15, clinical and demographic characteristics of patients are reported. The answers to these three items were dichotomous (yes/no). Good-quality studies should fulfill all three STARD criteria.
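As an illustration of what STARD Item 21 asks authors to report, the following minimal sketch computes sensitivity and specificity with 95% confidence intervals from a 2x2 table. The Wilson score interval is used here as an assumption; the studies reviewed may have used other interval methods, and the function names and the counts in the usage lines are hypothetical.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


def accuracy_with_ci(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity and specificity of the index test against the gold standard."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts, invented for illustration only (not taken from any study).
result = accuracy_with_ci(tp=90, fp=5, fn=10, tn=95)
for name, (estimate, (low, high)) in result.items():
    print(f"{name}: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```

A study would satisfy Item 21 in the sense used here when both estimates are accompanied by intervals of this kind, whatever method was used to obtain them.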
Since some of the QUADAS criteria could be interpreted differently by different researchers, we defined parameters to be considered when evaluating the following three criteria: 1) Were selection criteria (of cases) clearly described? The sample was considered well defined when the methodology section reported the criteria used for the inclusion of cases (for example: patient with suspected malaria, presenting with acute febrile syndrome) and the provenance and recruitment of cases. 2) Was the execution of the index test described in enough detail to allow for its replication? We considered a description adequate when it included the techniques used for administering and reading the RT. 3) Was the execution of the reference standard described in enough detail to allow for its replication? We considered a description adequate when the article described the techniques used for staining and reading the thick smear.
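The cutoff rules described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: the function names and the label "below cutoff" are hypothetical, while the thresholds (six and nine of 12 QUADAS criteria, all three STARD criteria) are those given in the text.

```python
QUADAS_ITEMS = 12  # QUADAS criteria retained in this study
STARD_ITEMS = 3    # STARD items 5, 15 and 21


def quadas_category(quadas_yes: int) -> str:
    """Classify a study by its number of "yes" answers to the 12 QUADAS criteria."""
    if not 0 <= quadas_yes <= QUADAS_ITEMS:
        raise ValueError("quadas_yes must be between 0 and 12")
    if quadas_yes >= 9:   # 75% cutoff
        return "good"
    if quadas_yes >= 6:   # median (50%) cutoff
        return "regular to good"
    return "below cutoff"  # hypothetical label; the text does not name this group


def meets_combined_rule(quadas_yes: int, stard_yes: int,
                        min_quadas: int = 6, min_stard: int = 3) -> bool:
    """Combined rule: at least min_quadas QUADAS and min_stard STARD criteria."""
    if not 0 <= stard_yes <= STARD_ITEMS:
        raise ValueError("stard_yes must be between 0 and 3")
    return quadas_yes >= min_quadas and stard_yes >= min_stard


# A study with eight QUADAS and three STARD "yes" answers passes both rules.
print(quadas_category(8), meets_combined_rule(8, 3))
```

Keeping the two cutoffs as parameters makes it straightforward to examine how the selection changes when a less demanding STARD threshold is adopted.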

RESULTS
Our literature search retrieved a total of 254 references, 11 of which were duplicates. All abstracts were read, and 30 articles were selected for full examination, all of which validated the OptiMal® test using microscopy as the gold standard (first inclusion criterion). Of these, 29 were read in full; we were unable to obtain the full article for one of the 30 abstracts.
Thirteen studies fulfilled all requirements of the second inclusion criterion (Table 1).1,4,5,7-10,13-18 Tables 2 and 3 present the results of the evaluation of the selected articles according to the selected QUADAS and STARD criteria. Four QUADAS criteria were fulfilled by all studies evaluated: 1) representative spectrum of patients; 2) clear description of selection criteria; 3) entire sample or subsample diagnosed by the gold standard; and 4) patients received the same test as the gold standard, regardless of the result of the index test. A smaller number of articles fulfilled the selected STARD criteria; the criterion regarding confidence intervals was the one most frequently fulfilled (seven of 13 articles; Table 2). None of the articles fulfilled nine or more of the 12 QUADAS criteria, and 12 of the 13 articles were categorized as positive in at least 50% of criteria. Five studies fulfilled none of the three STARD criteria (Table 3).
Using a cutoff of six positive responses to the 12 QUADAS criteria and all three STARD criteria, two studies (15.4%) were considered to be of good methodological quality, regardless of the estimated accuracy reported. Both studies, carried out in Colombia, fulfilled eight QUADAS criteria (67% fulfillment) and the three STARD criteria.
The number of articles selected using the proposed combination of criteria ranged from two to eight, depending on the number of STARD criteria required at the cutoff point, even when the median cutoff of 50% of QUADAS criteria was maintained (Table 3).
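The dependence of the selection on the STARD cutoff can be illustrated with a short sketch. The score pairs below are hypothetical values invented for illustration and are not the data of Table 3; the function name is likewise an assumption.

```python
from typing import Iterable, Tuple


def count_selected(scores: Iterable[Tuple[int, int]],
                   min_quadas: int = 6, min_stard: int = 3) -> int:
    """Count studies meeting both the QUADAS and the STARD cutoff."""
    return sum(1 for quadas_yes, stard_yes in scores
               if quadas_yes >= min_quadas and stard_yes >= min_stard)


# HYPOTHETICAL (QUADAS "yes", STARD "yes") pairs; NOT the values in Table 3.
hypothetical_scores = [(8, 3), (7, 2), (6, 1), (7, 0), (5, 3)]

# Hold the QUADAS cutoff at six and vary the number of STARD criteria required.
selected_by_stard_cutoff = {
    k: count_selected(hypothetical_scores, min_quadas=6, min_stard=k)
    for k in (1, 2, 3)
}
print(selected_by_stard_cutoff)  # {1: 3, 2: 2, 3: 1}
```

With these invented scores, relaxing the STARD requirement from three criteria to one triples the number of studies retained, which is the kind of sensitivity to the cutoff described above.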

DISCUSSION
The two instruments, QUADAS and STARD, represent an advance in scientific knowledge in that they allow for a systematic evaluation of published validation studies.
QUADAS is a flexible instrument that allows for the exclusion of any of its criteria.19,20 The criterion "were all losses from the study explained?" did not add discriminatory capacity to the evaluation: losses were observed in only two studies, and were all explained. The "does not apply" category does not exist in QUADAS, and should be added specifically for this item. Similar problems were encountered for the criterion "were uninterpretable/intermediate test results reported?", which would be useful for results expressed on a continuous scale or which could be classified as uninterpretable. It is likely that many of the studies for which this criterion was classified as "no" are cases in which it is not applicable. Similar considerations regarding the absence of an adequate response category for these two items were reported in a study that evaluated and validated QUADAS.20 These two criteria also showed the lowest agreement in the QUADAS validation study,20 as well as in a review of psychometric instruments,11 perhaps reflecting difficulties with the administration of the questionnaire.
Parameters should be established for the evaluation of criteria judging the selection of study subjects for validation studies and the description of both index and gold-standard tests. The researcher must define a priori which information will be sufficient to obtain a "yes" response in these items. Likewise, the criterion "detailed description of diagnostic tests" can mean different things to different evaluators, and again a priori standardization will be necessary, especially in the case of multiple reviewers.
We expect that articles following STARD criteria will be better classified according to the QUADAS instrument, given that the former provides guidelines and information that are useful for the publication of validation findings. Adding items from outside QUADAS that complement this instrument by responding to specific questions is a strategy recommended in the QUADAS validation study itself.20 Clearer knowledge of what is to be evaluated, and of the purpose of the information obtained, will lead to a better evaluation of the studies under review.
A systematic review by Fontela et al,6 focusing on the diagnosis of malaria, tuberculosis and HIV, highlighted the complementarity of the two instruments in determining the quality of published articles. Whereas STARD allows one to check the information that ideally should be contained in published validation articles, QUADAS allows one to evaluate the quality of the published information.
The use of instruments to assess the quality of published articles is an increasingly encouraged and useful practice for evidence analysis, especially in the context of systematic reviews and meta-analyses. The use of such instruments, however, does not substitute for a careful and judicious qualitative analysis of the concepts and methods in the study. This is a key task of the researcher when carrying out a literature review.
In conclusion, the QUADAS and STARD instruments are important means to support and substantiate clinical and public health decision-making regarding the use of diagnostic tests. Their combined use has the potential to confer greater rigor on the evaluation of the quality of published articles validating malaria diagnostic tests, because it incorporates relevant information not covered by the use of QUADAS alone. The flexibility of both instruments allows them to be adapted to the purpose of each study.

Table 1. Selected OptiMal® validation studies, according to first author, year of publication, study site, and number of patients included. 2007.

Table 2. Combination of QUADAS and STARD criteria and classification of accuracy studies of the OptiMal® RT selected from the literature, according to number of studies in each response category. 2007.

Table 3. Selected OptiMal® validation studies, according to first author, year of publication, study site, and number of "yes" answers to 12 QUADAS and three STARD criteria. 2007.