Print version ISSN 0021-7557
J. Pediatr. (Rio J.) vol.83 no.5 Porto Alegre Sept./Oct. 2007
Augusto Sola (I); Fernando Dominguez Dieppa (II); Marta R. Rogido (III)
I. MANA and Atlantic Neonatal Research Institute, Division of Neonatology, Atlantic Health System, Morristown, NJ, USA
II. MD. Gonzalez Coro University Hospital, La Habana, Cuba
III. MD. MANA and Atlantic Neonatal Research Institute, Division of Neonatology, Atlantic Health System, Morristown, NJ, USA
OBJECTIVE: To provide valuable elements and some humor in this so-called era of "evidence-based practice" with the aim of helping clinicians make better choices in the care they deliver based on evidence, not simply or exclusively based on a randomized clinical trial (RCT) or meta-analysis (which may not be evidence).
SOURCES: Books and peer-reviewed articles are quoted and listed in the bibliography. Evidence of life, learning from our own mistakes and many other evident facts that support this review are not quoted.
SUMMARY OF THE FINDINGS: 1) "Absence of evidence is not evidence of absence" and "lack of evidence of effect does not mean evidence of no effect." 2) RCTs with "negative" results and those with "positive" results, but without outcomes that matter, often cannot conclude what they conclude. 3) Non-randomized clinical trials and practical trials may be important. 4) Research to prove is different than research to improve. 5) Clinical choice must assess effects on outcomes that matter to patients and their parents. 6) Quantifying adverse outcomes, number needed to damage and to treat is not that simple.
CONCLUSIONS: Significant challenges inherent to health service research must be correlated to possible clinical applications using tools to have a more "evident view of evidence-based practice" in perinatal medicine, recalling that absence of evidence is not evidence of absence.
Keywords: Evidence-based medicine, number needed to treat, randomized trials, outcome variables, treatment effects, critical reading, statistical significance.
The non-equivalence of statistical significance and clinical importance has long been recognized, yet this error of interpretation remains common. A significant result may sometimes not be clinically important. Much worse is the misinterpretation of "nonsignificant" findings. Other common misinterpretations are the confusion of "evidence of no effect" with "no evidence of effect" and of "absence of evidence" with "evidence of absence." All these factors have an impact on the application of results from clinical research into clinical practice. Hence, this review is important for the practice of neonatology and pediatrics.
Several people have educated and inspired us in many or all of the concepts we will share in this review.
Our aim is to summarize some "evident concepts of evidence based practice" in a simple and user-friendly way, with some humor every so often, to remind us that "one of the first symptoms of an approaching nervous breakdown is the belief that one's work is terribly important" (Bertrand Russell). Our hope is that some of the concepts covered in this review will become of value to clinicians for daily practice and delivery of care and, therefore, be of worth for one infant, one day, some place.
Basic concepts of statistical inference
There are many significant flaws in selecting and reporting denominators and statistical tests in the medical literature. Two books describe this with great humor and easy reading: "Bare Essentials of Biostatistics"1 and "Biomedical Bestiary".2
One modern "intellectual disease" in the literature is the inappropriate utilization of statistical significance. Many authors, peer reviewers and editors "die for" statistical significance and for "p < 0.05." However, some of them neither quite understand its real meaning nor fully comprehend if the appropriate statistical method has been applied correctly. W. Castle has said: "Most researchers use statistics the way a drunkard uses a lamppost: More for support than for illumination."
Clinicians and statistical inference are sometimes at odds. Inferential statistics are used to determine the probability of a conclusion based on data analysis being true and to quantify the degree of inaccuracy of the estimate. And then, "we play a game." It is accepted that if a difference as large as the one observed could arise by chance alone more than 5 times in 100 when no true difference exists, chance is too plausible an explanation. On the other hand, if such a difference would arise by chance less than 5% of the time, we say that the difference "demonstrates" that the two samples are actually different. Now you know what p < 0.05 means, or almost. If not, read again, and keep reading, please. In summary, a p value < 0.05 means that, if there were truly no difference, results at least as extreme as those found would occur by chance less than 5% of the time. It is not the probability that the result itself is due to chance. If the likelihood of chance producing the results is less than 1 in 20, then one might regard the result as significant, as Sir R. Fisher said.
For type I and type II errors and the power of a test, see Table 1 before reading further. When we use a p level of 0.05, we accept that 5% of the time we can be making a type I error. Since experiments are usually done to demonstrate differences, statisticians are often interested in the probability of detecting a true difference. This 5%, however, is not some absolute criterion of truth. If, for example, an effect exists at exactly p = 0.049, it does not suddenly disappear at p = 0.051. Rosnow eloquently said: "Surely, God loves the 0.06 nearly as much as the 0.05."
A frequent error in the quest for a p < 0.05 is using the Student's t test when analyzing repeated measures or variables. If the outcome were blood pressure changes over time and four values are reported, making statistical comparisons within or between groups at those time points, repeating t tests for each of the comparisons increases the chance of showing a statistically significant difference when in reality there is none. Imagine flipping a coin one time after another. Each toss has a 50% chance of tails, whatever came before (you surely agree). The chance of seeing at least one tails somewhere in the series, however, rises quickly with the number of tosses: 87.5% by 3 tosses, about 94% by 4 and about 97% by 5 (don't waste time with mathematics; just believe us on this one). Similarly, when repeating t test after t test at p < 0.05, the probability of getting at least one significant p just by chance is about 23-26% by the 5th-6th test and passes 30% by the 7th, even when no true difference exists (assuming independent comparisons). The t test is valid when the means of two groups are compared, but when dealing with more than two groups or with repeated measurements in two or more groups, the t test is incorrect and not valid statistically. Be aware of this, and do not accept the "p value," as low as it may be, when you see this in the literature. The correct analysis should include the analysis of variance (ANOVA; one way or factorial) with one of several post hoc comparisons, but that is for an article on biostatistics, sorry!
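For readers who would rather check than believe, the arithmetic behind the coin and the repeated t tests can be sketched in a few lines. This is only an illustration: the function name is ours, and it assumes that each toss or test is independent of the others, which repeated measurements on the same infants are not (one more reason to prefer ANOVA).

```python
def chance_of_at_least_one(per_trial_probability, n_trials):
    """Probability that an event happens at least once in n independent trials.

    Assumption (ours): the trials are independent and share the same
    per-trial probability.
    """
    return 1.0 - (1.0 - per_trial_probability) ** n_trials


if __name__ == "__main__":
    # Coin: chance of at least one tails in 3, 4 or 5 tosses.
    for tosses in (3, 4, 5):
        print(tosses, "tosses:", round(chance_of_at_least_one(0.5, tosses), 3))

    # Repeated t tests at p < 0.05 when no true difference exists:
    # chance of at least one falsely "significant" result.
    for tests in (1, 5, 6, 7):
        print(tests, "tests:", round(chance_of_at_least_one(0.05, tests), 3))
```

The same function serves both examples because the question is the same: how likely is at least one "hit" when you keep trying?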
Statistical significance and clinical importance
Significance which is insignificant
RCTs that show a significant difference between the treatments compared are often called "positive." Statistical significance is a probabilistic term (it reflects the probability of rejecting a null hypothesis when it is in fact correct; it speaks to whether the observed difference is distinguishable from zero, not to how large it is). Many clinicians tend to equate a low p (statistical significance) with something that is of clinical importance or significance. However, "p < 0.0001" has nothing to do with the magnitude or importance of a difference or an effect. Such magnitude is called clinical importance or significance.
The industry advertises a "breakthrough" for hypoglycemia. The authors studied 2,000 hypoglycemic newborns in a randomized, prospective, multicenter, double masked controlled trial (RPMDMCT). One group gets the treatment, the other gets placebo; both receive supplemental glucose. Glycemia increases in the treatment group from a baseline (mean ± SD) of 25±8 mg/dL to 37±6, 46±6 and 52±4 at 30, 60 and 90 minutes. The placebo group goes from 26±9 mg/dL to 35±4 to 43±4 to 50±3. The authors use repeated Student t tests (evidently incorrect), reporting a difference in both groups compared to baseline (p < 0.0001) and a significantly better response in the treated group compared to placebo at 60 and 90 minutes (p < 0.001). Before using this treatment in your patients, and after having read the preceding paragraphs, you would write to the journal and authors asking for ANOVA and exact p values, right? The authors thank you (not very happily though, discussing among themselves who you are to mention their error publicly) and, holding grudges, publish an erratum with the ANOVA. They report p = 0.048. Now you feel comfortable using this treatment in your patients. Wait! Before doing so, spend 2 minutes in Table 2 and also ask yourself if the authors measured serum, plasma or whole blood glucose. Did they tell us how they handled the samples and which method they used for actual measurements? With Table 2 and these unanswered questions most sensible clinicians will not expose hypoglycemic infants to the new "breakthrough" treatment despite this large, "evident," RPMDMCT. In addition, this treatment may be costly and it may be found later that it produces infrequent, but important adverse effects, not fully analyzed in the study. For "clinical outcomes that matter" and infrequent but important adverse events, keep reading, please!
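To see how a very small p can sit on top of a very small difference, here is a minimal sketch using the 90-minute values of the imaginary trial above (52±4 vs. 50±3 mg/dL). The group sizes are our assumption (1,000 per group, since the example only states 2,000 infants in total), and the calculation is a simple large-sample normal approximation at a single time point. It is not the repeated-measures analysis the example calls for.

```python
import math


def compare_means(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Large-sample comparison of two means from summary statistics.

    Returns the difference, its approximate 95% CI and a two-sided p value
    based on the normal approximation (an illustrative shortcut).
    """
    difference = mean_a - mean_b
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = difference / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail area
    ci_95 = (difference - 1.96 * standard_error,
             difference + 1.96 * standard_error)
    return difference, ci_95, p_value


if __name__ == "__main__":
    # Assumed group sizes: 1,000 treated and 1,000 placebo infants.
    difference, ci_95, p_value = compare_means(52, 4, 1000, 50, 3, 1000)
    print("difference (mg/dL):", difference)
    print("95% CI:", tuple(round(x, 2) for x in ci_95))
    print("p value:", p_value)
```

The p value is minuscule, yet the whole confidence interval for the difference stays close to 2 mg/dL. Whether 2 mg/dL matters to a hypoglycemic infant is a clinical question that no p value answers.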
Nonsignificance which may be significant
By now you know: p value greater than 5% or p > 0.05 is "not statistically significant." This means that, if the samples were truly not different, a difference at least as large as the one observed would arise by chance more than 5% of the time. It does not mean that the samples have been shown to be the same. See Table 3 for related concepts and questions a clinician should ask himself or herself in such cases. If still interested in this issue, read later about the right denominators and important outcomes.
Imagine now that the authors of the example above followed 300 of the 2,000 hypoglycemic infants to 5 years of age. By using masked, detailed neurodevelopmental evaluations and careful analyses of potential confounders with logistic regression, they found that some of the developmental and intelligence tests favor the treated group by 7-10 points and that the incidence of cerebral palsy (CP) is half in the treated infants (3% vs. 5.8%). The authors, helped by your previous question and suggestion from a few years before, have now performed excellent statistical analyses. They report no statistical difference, providing the exact p value of 0.059 and an odds ratio (OR) for CP of 0.75 with confidence interval (CI) of 0.67-1.03 which shows no statistical difference, since it crosses 1.0 (Figure 1). Furthermore, they show no adverse effects of the therapy. The question should be: Is this nonstatistical significance clinically significant? You'll have to decide! (Table 3). We personally would not give up 7-10 points of our IQ nor would we like to have an almost twofold greater risk for CP. One thing that may be happening here is that there had been no sample size calculations for the effect size on these outcomes and that the sample size at 5 years is not large enough to reach statistical significance for the effect size found (Type II error; see Table 1).
However, as clinicians, we have to decide if the findings are clinically significant and, if so, offer treatment to the patients that entrust their care to us.
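The suspicion of a type II error in the follow-up example can be made concrete with a rough power calculation. The sketch below is ours and rests on assumptions not stated in the example: the 300 infants are split evenly (150 per group), the true CP rates are exactly 5.8% and 3%, and a normal approximation for two proportions is good enough for illustration.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_power_two_proportions(p_control, p_treated, n_per_group,
                                 z_alpha=1.96):
    """Approximate power of a two-sided test comparing two proportions.

    Normal approximation; the negligible contribution of the opposite
    tail is ignored.
    """
    standard_error = math.sqrt(
        p_control * (1.0 - p_control) / n_per_group
        + p_treated * (1.0 - p_treated) / n_per_group
    )
    z_effect = abs(p_control - p_treated) / standard_error
    return normal_cdf(z_effect - z_alpha)


if __name__ == "__main__":
    # Assumed: 150 infants per group at the 5-year follow-up.
    for n in (150, 500, 1500):
        power = approx_power_two_proportions(0.058, 0.03, n)
        print(n, "per group: power about", round(power, 2))
```

Under these assumptions the follow-up has only a modest chance of detecting a halving of CP, so a "nonsignificant" p is close to what one would expect even if the treatment truly worked.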
Would you agree that, differently from law and justice, a publication is "guilty" until proven innocent? Even the manuscript you are reading now! When assessing and criticizing a scientific publication with healthy skepticism and rational criticism, one is criticizing the publication and NOT the authors. Maybe you have other reasons to criticize the authors, but that doesn't matter in this review.
The selection of the denominator is crucial for all studies. Choosing an inadequate denominator takes away the validity of the results partially or completely, regardless of how prospective, randomized, controlled or blinded the study was and regardless of how fancy the statistics were. Denominators are essential for incidence or rates of diseases, risk factors and impact or effect size of an intervention. The choice of a correct denominator is among the most important factors in the hands of the authors. Of course, the reviewers and editors should be the "first line of protection" when authors choose denominators wrongly or compare some of them erroneously. However, clinicians have the obligation to look carefully at the denominator chosen and at the denominators used in all comparisons and decide what they mean, if anything. Unfortunately, many manuscripts do not make valid comparisons, "flipping" among denominators and/or using incorrect denominators. See the following example.
The rate of prostate cancer decreases over the years in one community as opposed to many other communities in which there has been an increase. Lo and behold, the denominator used in that community was the entire population (children, young men and women!). Also, during the previous 5 years a proportion of men over 60 moved out of the community because of retirement and weather reasons!
If you look carefully in the manuscripts you read, you will sometimes find significant errors in denominators such as this one (hopefully not as flagrant). Ask yourself: "What is the denominator? What should it be? What is the total population at risk?" In the case of prostate cancer, definitely not children, women or young adults. And also be cautious and attentive, since sometimes the authors "change denominators on you." The issue is then to clearly identify who they are talking about and compare this to the real population at risk. For example, in studies of bronchopulmonary dysplasia (BPD) and retinopathy of prematurity (ROP) or severe intraventricular hemorrhage (IVH), do they use all live births < 1,500 g as denominator? Doing so may be similar to the example of prostate cancer above. If many infants die before 6 weeks of age, or are not submitted to detailed ophthalmologic examination or head ultrasounds, the denominator is wrong and rates will be falsely low. When correct denominators are not used, the incidence of the problem is likely to be "better" than it really is! Denominators are of extreme importance when making comparisons or when deciding to change treatments.
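A small numerical sketch may help to see how the same numerator gives two different "incidences." All figures below are hypothetical and chosen by us only for illustration: 200 live births < 1,500 g, of whom 160 survive to be examined by the ophthalmologist, and 32 cases of severe ROP.

```python
def rate_percent(cases, denominator):
    """Rate of an outcome, in percent, for a given denominator."""
    if denominator <= 0:
        raise ValueError("the denominator must be a positive number")
    return 100.0 * cases / denominator


if __name__ == "__main__":
    # Hypothetical unit, invented for illustration only.
    live_births_under_1500_g = 200
    infants_actually_examined = 160   # survivors who had an eye examination
    severe_rop_cases = 32

    print("All live births as denominator:",
          rate_percent(severe_rop_cases, live_births_under_1500_g), "%")
    print("Infants at risk and examined as denominator:",
          rate_percent(severe_rop_cases, infants_actually_examined), "%")
```

Nothing changed in the nursery between the two lines of output; only the denominator did.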
You also need to identify how the outcome ("numerator") was defined. We hope you agree that it is not the same defining BPD as O2 for > 28 days as O2 at home, or need for ventilation for > 4 months.
Number needed to treat (NNT), number needed to harm (NNH), absolute risk reduction (ARR) and relative risk reduction (RRR)
NNT is a popular way of expressing the number of patients who should be treated to prevent one outcome event. It is calculated by obtaining the reciprocal of the ARR (Table 4). For instance, if in a trial of BPD prevention 13% of those treated and 18% of controls developed BPD, the ARR would be 5% (18% minus 13%). NNT would be 20 (1/0.05, or 100/5). Knowing that you have to treat 20 infants to prevent one case of BPD seems more useful than an odds ratio of 0.68 or an RRR of 28% ([18% minus 13%]/18%).
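The BPD arithmetic above can be reproduced with a short sketch, useful when an article gives the event rates but not the NNT. The function and key names are ours, and the sketch assumes that the treatment lowers the risk (ARR greater than zero).

```python
def risk_summary(risk_control, risk_treated):
    """ARR, RRR, NNT and odds ratio from two event rates (as proportions).

    Assumes the treated group has the lower risk, so the ARR is positive.
    """
    arr = risk_control - risk_treated
    if arr <= 0:
        raise ValueError("no risk reduction: NNT is not defined here")
    odds_control = risk_control / (1.0 - risk_control)
    odds_treated = risk_treated / (1.0 - risk_treated)
    return {
        "ARR": arr,
        "RRR": arr / risk_control,
        "NNT": 1.0 / arr,
        "OR": odds_treated / odds_control,
    }


if __name__ == "__main__":
    # BPD example from the text: 18% of controls vs. 13% of treated infants.
    summary = risk_summary(0.18, 0.13)
    for name, value in summary.items():
        print(name, round(value, 2))
```

With these rates the sketch returns the figures quoted in the text: an ARR of 5%, an NNT of 20, an RRR of about 28% and an odds ratio of about 0.68.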
Thanks to the NNT, you and the parents of your patients may better understand that a treatment may not benefit the individual infant, because for many treatments the NNT is > 10. Clinicians should also wonder about the number needed to harm (NNH) (Table 4). In the ideal world, we would love "certainty," i.e.: NNT of 1, infinite NNH. Since this does not happen, we have to make decisions based on the efficacy of the intervention in terms of risk reduction (Table 4) and possible adverse consequences. If the NNT were relatively low (say 6-10), the most important question to ask is "for what outcome?" If the outcome is important, then an NNT of 10 is always better than an NNT of 100. If you treat 100 infants with an NNT of 10, the probability of observing no net benefit is more than 1,000 times lower (i.e.: better) than if the NNT were 100. (Calculated with standard binomial distribution.) Don't worry! We will not add formulas for you to learn; we are just trying to clarify a concept. You can choose not to believe this. If you do, please read references in this regard.3-8 We cannot expand on the NNT further, except to say that it focuses only on the treatment arm and neglects the type of placebo or control; it usually overestimates the real value in placebo-controlled trials. Moreover, NNTs are commonly presented as a single value without CI or standard error (Table 4, Figure 1) and this may be insufficient to get "the whole evidence." We suggest you look at an article you have recently read and see if the authors report the NNT. If not, by using Table 4, you may be able to do it yourself if the data presented in the results clearly describe the incidence in the control and treated groups. If you can do it, you will get a more "evident picture." If you cannot, be wary.
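For those who choose not to believe the binomial claim, a minimal sketch follows. It rests on our reading of "no net benefit" as "not one of the 100 treated infants benefits," and on the simplifying assumption that each infant benefits independently with probability 1/NNT.

```python
def prob_nobody_benefits(nnt, n_treated):
    """Binomial probability of zero infants benefiting among n_treated.

    Assumption (ours): each treated infant benefits independently with
    probability 1/NNT.
    """
    return (1.0 - 1.0 / nnt) ** n_treated


if __name__ == "__main__":
    p_nnt_10 = prob_nobody_benefits(10, 100)
    p_nnt_100 = prob_nobody_benefits(100, 100)
    print("NNT 10 :", p_nnt_10)
    print("NNT 100:", p_nnt_100)
    print("ratio  :", round(p_nnt_100 / p_nnt_10))
```

Under these assumptions the ratio comfortably exceeds 1,000, in line with the statement in the text.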
What doesn't work and how to show it
As Alderson wrote,9 Cochrane posed three key questions about any healthcare intervention: "Can it work?" "Does it work in practice?" and "Is it worth it?" It would be great if the answers were always positive, but in real life, the possible answers might be "yes", "not sure", and "no". The rules for deciding for "yes" are relatively clear and well known, but less has been written about deciding that something doesn't work or does not cause adverse effects, even if infrequent. We will look at issues concerning interventions and dilemmas of trying to decide between an answer of "not sure" and "no," the assessment of what "important outcomes" are and what to do when we are not sure.
Trying to convince the public that a factor has no effect or poses no risk is very hard because this involves "proving a negative".10 Even with all the "evidence" we can amass, we are many times uncertain about the right treatment choice.11 It is almost never possible, and in many cases, it is incorrect, to claim that there is no difference in the effects of treatments. One can truly have evidence of absence of an effect or of a difference only if enough large and well designed studies were all to show that a medical treatment, an exposure or a non-treatment was unassociated with an outcome.12 One RPMDMCT, as good as it may be, cannot show evidence of absence. In your daily practice you have to consider "other levels of evidence" and not solely RPMDMCT (Table 5). There will always be uncertainty surrounding estimates of treatment effects; small but important differences can never be excluded in one study.13 Claims of no effect or no difference may lead clinicians to deny their patients interventions with important beneficial effects or to expose them to interventions with serious harmful effects.14 Therefore, claims of no effect should be very infrequent, and when they are made we should be skeptical. Reviews or studies making "claims of no effect" or of "evidence of no effect" are erroneous most of the time. Phrases such as "did not reduce," "has no effect" and "is not effective" are usually not justified and should not be allowed by reviewers or editors, because what was shown was "no evidence of effect" as opposed to "evidence of no effect." Acceptable phrases in manuscripts would be: "no significant differences were detected" or "there is insufficient evidence either to support or to refute."
Absence of evidence is not evidence of absence
There is an "evident" misconception about these terms, which are not interchangeable. Misinterpretation of "non-significant" findings can become a great problem (Table 3). In general, studies with a p value > 0.05 usually can only show absence of evidence of a difference or absence of evidence of negative effects. They cannot be considered as "evidence of absence," which wrongly implies that the study has shown there is no difference. To interpret these trials as providing evidence of the ineffectiveness of a treatment or evidence of no adverse events is "clearly wrong and foolhardy."13 We could use even somewhat harsher terms, but we may be accused of being "uncontrolled" and therefore "not evident." It suffices to say that there are dangers in the misinterpretation of nonsignificant results (Table 3). A dramatic example is the fibrinolytic treatment for reinfarction prevention after myocardial infarction (MI), which one of us may need or have already needed. Nineteen of 24 RPMDMCT had "shown no difference" ("p > 0.05"), leading to a "statistically significant delay" before the true value of streptokinase, which actually existed in the real world, was appreciated. Our apologies! We got confused and used the term erroneously! The studies did not show "no difference" as we mistakenly wrote; they only showed absence of evidence of a difference. Later, a meta-analysis showed a highly significant reduction in mortality (22%). Thank God, serious clinician scientists realized this; some of us will not die due to reinfarction despite "gold standard studies"!
In summary, when issues of public health are of concern, we must be skeptical about whether the absence of evidence is a valid justification for action or inaction. Where risks are small, or the sample size is small, or the sample size is large but the correct denominator for the outcome in question is not used in the study, the "negative" p values are likely to be misleading.
In studies of this sort (wrongly called "negative studies") wide confidence intervals (CI) many times tell the story or illuminate the absence of evidence.15 In cases like the streptokinase one, CIs are likely to be wide, indicating "evident uncertainty." Therefore, in cases of absence of evidence in RPMDMCT, we must know the CI and make detailed assessments of important outcomes in that trial. In addition, we need to assess the size of effect and what is important in what situation, observing if authors describe limits of equivalence decided in advance. As shown in Figure 1, if the CI lies entirely within those limits of equivalence, the effect is designated as being too small to be important. Please observe Figure 1 where we try to make these concepts clear. Believe us; it is not easy. Just as it is not easy to clearly assess how important a reduction is in the incidence of severe meconium aspiration syndrome (MAS) leading to death or in severe patent ductus arteriosus (PDA) leading to BPD. It is also hard to say who decides when such reductions are important. Of course, if the parents of one unnecessarily seriously affected baby were to decide, they would say that the "difference of 1" was big enough.
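The reading of a CI against limits of equivalence can be sketched as follows. This is our own simplified illustration, not a reproduction of Figure 1: the trial counts are invented, the equivalence margin (2 percentage points) is an arbitrary choice that in real life must be decided in advance, and the CI uses a simple normal approximation for a difference in proportions.

```python
import math


def risk_difference_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Approximate 95% CI for the difference in event rates (group A - B)."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    standard_error = math.sqrt(rate_a * (1.0 - rate_a) / n_a
                               + rate_b * (1.0 - rate_b) / n_b)
    difference = rate_a - rate_b
    return difference - z * standard_error, difference + z * standard_error


def interpret(ci_low, ci_high, equivalence_margin):
    """Classify a CI against prespecified limits of equivalence."""
    if -equivalence_margin < ci_low and ci_high < equivalence_margin:
        return "effect too small to be important"
    if ci_low > 0 or ci_high < 0:
        return "difference detected"
    return "absence of evidence, not evidence of absence"


if __name__ == "__main__":
    # Hypothetical small trial: 6/50 events vs. 10/50 events.
    small = risk_difference_ci(6, 50, 10, 50)
    # Hypothetical very large trial: 1,000/10,000 vs. 1,010/10,000.
    large = risk_difference_ci(1000, 10000, 1010, 10000)
    for label, ci in (("small trial", small), ("large trial", large)):
        print(label, tuple(round(x, 3) for x in ci), "->",
              interpret(ci[0], ci[1], equivalence_margin=0.02))
```

Both invented trials are "nonsignificant," but only the large one has a CI narrow enough to sit inside the limits of equivalence; the small one simply cannot tell.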
Other issues "complicating evidence" are how widespread the exposure is and what the evidence is from previous or subsequent case control or epidemiological studies, which cannot be ignored. Other factors that increase uncertainty of results in RPMDMCT (RCTs, for short) are failure to follow the protocol and non-random loss to follow up. We, as clinicians, are usually not good at understanding all the statistical jargon that has been just mentioned. Authors and journals need to report uncertain results clearly and we should increase our skepticism, trying to incorporate into our daily practices some of the concepts we describe, so we are not misled when the article leaves the impression that it proved that no effect or no difference exists, when evidently it did not.
Important outcomes and outcomes that matter
"The more important things cannot be at the mercy of less important things." (Goethe)
How to define "important outcomes"? Choice of treatment should be determined by effects on outcomes that matter to patients and their parents. Some would even say that they should also matter to society as a whole. Important outcomes are, fortunately, "frequently infrequent," like death, stroke, long-term effects on brain development, vision, hearing, and others. Therefore, to show a statistical difference, there is a need for a large sample size or a correct denominator of the population really at risk or both. This is why "important" outcomes are commonly poorly studied. So, when reading "evident" manuscripts, try to find out what the authors chose as the main outcome variable. Is it of clinical importance? Is it biologically credible? Are rare and infrequent but serious adverse events (like death) well analyzed and reported?
In summary, be wary of "positive" outcomes of "little" clinical importance, even if p < 0.0001, and of "negative" findings of a treatment if nothing is clearly shown about important outcomes (mortality, severe morbidity). The complex issue of "composite outcomes" is in Table 6.16
Research to prove is different than to improve: one RCT is not evidence
The gold standard for evidence-based practice is the RCT. But sometimes gold in some RCTs doesn't shine or is of very low karat. As Jorge Luis Borges said, "rational systems extended to the extremes of their rationality turn into nightmares." Of course he wasn't specifically referring to RCTs; as far as we know, none of his stories were randomized.
There are things we know we know and are evident without RCTs. Is it "evident" to you that penicillin cures strep throat? Well, no RCT proves that. Is it evident that if a person jumps from a plane from 1,000 meters and the parachute does not open he will not be healthy after hitting the ground? (No RCT has made this "evident" either.) Is it evident to you that the risk of having an accident with harm to oneself or others is higher when driving a car when the brakes do not function or function very inadequately? If you do not know that you know this and would like to do an RCT, be our guest.
RCTs are essential for evaluating the efficacy of clinical interventions if the causal chain between the agent and the outcome is relatively short and simple and where results may be safely extrapolated to other settings. However, it should be no news to anyone that the "first paper is intriguing, with the next three there is growing concern, maybe even a bit of confusion and, after that, what one really would like to know is the real answer." We must be cautious about accepting the results of a single experiment or RCT, by using more extensive exploration under different conditions, in other places and at other times.
However well RCTs might be able to prove or disprove therapeutic claims, however strong their credentials when it comes to seeking evidence, they have their limits when it comes to assuring good care.17-30 Additionally, RCTs are usually expensive, and always artificial, performed in a selected and restricted group, with exclusion criteria. In brief, RCTs can never be perfect, because they are conducted by humans, for humans; and they are performed in humans affected by medical conditions and pathologies that are inevitably heterogeneous. RCTs are like good French perfumes: good to smell but not to drink; or like Argentinean wines (Malbec?): great to taste and drink a bit, but not to get intoxicated with.
Some healthcare researchers are advocating to carefully consider "adding karats to the gold standard" with the non-randomized clinical trial (NRCT) and the practical clinical trial (PCT) (Table 5).19 In RCTs, results are subject to effect modification in different populations. Therefore, both the internal and external validity of RCT findings can be greatly enhanced by observational studies using adequacy or plausibility designs. Additionally, in public health and large-scale interventions, studies with plausibility designs are often the only feasible option and may provide valid evidence of impact when RCTs cannot or are plainly not appropriate.20 There is also a pressing need for PCTs that are relevant to clinicians and decision-makers. Tunis et al. have addressed very well how to assess their value21 and Glasgow et al. provided recommendations and examples of how PCTs can be conducted to enhance external validity without sacrificing internal validity.22
In summary, developing an evidence base for making public health and practice decisions requires using data from evaluation studies with randomized and nonrandomized designs. Individual studies and studies in quantitative research syntheses require transparent reporting of the study, with sufficient detail and clarity to readily see differences and similarities among studies in the same area. The Consolidated Standards of Reporting Trials (CONSORT) statement provides guidelines for transparent reporting of RCTs. There is also the Transparent Reporting of Evaluations with Nonrandomized Designs (TREND). These guidelines emphasize the reporting of theories used, research design and descriptions of intervention and comparison conditions, and methods of adjusting for possible biases in studies that use nonrandomized designs.19-22 Many times, NRCT and PCT are far superior to RCTs.
It should be evident to all clinicians that some RCTs don't meet expectations despite a large sample size and that many times they aren't enough to provide "evident evidence" for daily practice. Furthermore, some of them lead to unnecessary confusion and to the production of unnecessary human damage, as infrequent or rare as this may be. We are the ones responsible for the care we deliver, for the care our patients receive, not an RCT or a "superior authority" in a pediatric society. Practical intervention options, alternative research designs and representativeness at the patient level are important to address clinical and policy implications to help reduce the gap between research and practice.
Real life examples
"Evident" RCTs may lead to changes in practice because of hasty analyses and "turning our backs" on previous evidence. We recommend you not skip this section and spend some time on the examples and tables as a form of self reflection, identifying related facts in this summary put forth by many in relation to evidence-based practice. Many show pervasive extremism in neonatology. We, our children, teachers, mentors, friends, dogs and cats and most of our enemies consider that one may gain more insight into life (and for the delivery of care) from "evident" examples, learning from them and from one's own mistakes. But this is not evident since there has been no RCT on the issue! As Popper said: "If we respect the truth, we must learn from our own mistakes through rational criticism and self criticism."
This is an extremely important goal! Table 723 is used for this example. Remember the analogy with the Malbec wine. Then you decide with "more evidence" what to do as a concerned clinician.
This is a well known, sad story for many babies.24 Many issues have been unveiled in the last decade. Table 8 shows the risks of generalizing the administration of systemic therapies and miraculously improving short-term effects on one organ in RCTs, but without complete analyses of important outcomes that matter to patient and family.
Carbon dioxide: the good, the bad and the ugly
Table 9 briefly summarizes some of the fascinating stories of neonatal CO2. Extremism was initially to the low side, then to the high side. A long time ago, based on emerging evidence, we chose to try to prevent both hypocarbia and hypercarbia; it has now been shown that not being CO2 "extremists" may be good for infants.44 Table 9 should be fun.
Issues are summarized in Table 10. This topic could fill a manuscript by itself. A few references are offered for those interested.27,28,45-47 Clinical practice is the science of the particular, of individuals, and philosophy is the science of populations' issues. We consider that in practice it is fundamental to evaluate each infant according to his/her needs and not to generalize, even if the concepts seem "very logical" or "evident" or are repeated by "neonatal gurus." But we have not done an RCT.
Evidently (?), clearing the airway is the first step in resuscitation. Table 11 addresses an RCT which found "absence of evidence" of a beneficial effect of clearing the upper airway in meconium-stained neonates.48 Before changing practice universally it may be of value to ask some important questions (Table 11). Caution is needed in most (inadequately called) "negative" trials before universalizing practices. As mentioned, the "first paper is intriguing, with the next three there is growing concern, and, after that, what one really would like to know is the real answer." So, as it has been recommended, delivery of care should not change based on only one RCT.17-22
Iron and oxidation
Table 12 presents a recent RCT, an editorial and a commentary on that study,29,49,50 all in the same volume of a journal in July 2007. Braekke et al. reported an (inadequately called) "negative study." The study does not assess any clinical short- or long-term outcomes, but the effects on clinical outcomes that matter to patients and parents are the effects that should guide practice choices. Basic scientific evidence shows that excess neonatal iron is very damaging30,51,52 (Table 12). Preventing iron deficiency does not mean that we should use therapies that may induce "iron excess." Evident extremes in neonatology are generally not good (Table 12).
PDA affects tiny, preterm infants. Word of mouth and no RCTs "suggest" no intervention. If you decide not to treat, good luck to some of the untreated infants until data on outcomes that matter become available. See Table 13 for related issues.
A public health issue "affecting" all newborns (see before on RCTs for public health issues). Just for fun, seriously, we "compare" some brief issues of this intervention to no intervention in PDA (Table 13). You have to decide on your own on these two. Stay tuned for rare but important adverse outcomes!
Can you find evident benefits for neonates? Midazolam is "evidently" a neonatal poison, with serious nervous system side effects, more intracranial hemorrhage and death or neurodevelopmental disability, leading neurons to "commit suicide."54-56 If you continue to administer midazolam, you are probably not among neonatal care providers that try, with all the uncertainty and ignorance we have, to practice based on evidence to improve infants' outcomes, one baby at a time.
Frequently heard arguments about using or not using some practices are still, unfortunately and evidently, too simplistic. Table 14 shows their possible real meaning with a bit of humor.
Final comments and conclusion
We have summarized important issues related to clinical research findings and their incorporation into our daily encounter with patients. We hope we have shed some light on some significant challenges inherent to health service research. We have correlated the main ideas to many possible clinical applications and used real life examples to emphasize some points. We have provided tools to have a more "evident view of evidence-based practice" and stressed the fact that absence of evidence is not evidence of absence. Misinterpreting a trial that found no significant effect as if "there is no effect" is one of several problems that arise within the more general topic of application of evidence from clinical research in evidence-based care. Most trials in neonatology are far too small to rule out effects of a size that could be clinically important and can also fail to show real evidence of adverse effects. Furthermore, statistical differences may be of a magnitude with no clinical significance or in outcomes that really do not matter much.
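The point about small trials can be illustrated with a little arithmetic. The sketch below is only an illustration under stated assumptions: the trial counts are hypothetical (not taken from any study cited here), the function name `risk_difference_ci` is ours, and the simple Wald approximation is used for the 95% confidence interval of the absolute risk difference, which is itself a simplification for small samples.

```python
import math


def risk_difference_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Absolute risk reduction (control risk minus treated risk) with a Wald CI.

    Hypothetical helper for illustration; z=1.96 gives an approximate 95% interval.
    """
    p_t = events_treated / n_treated
    p_c = events_control / n_control
    diff = p_c - p_t  # positive values favor the treatment
    se = math.sqrt(p_t * (1 - p_t) / n_treated + p_c * (1 - p_c) / n_control)
    return diff, diff - z * se, diff + z * se


# Hypothetical small trial: 40 infants per arm, 12 vs 16 adverse outcomes.
arr_small, low_small, high_small = risk_difference_ci(12, 40, 16, 40)

# Same event rates (30% vs 40%), but 400 infants per arm.
arr_large, low_large, high_large = risk_difference_ci(120, 400, 160, 400)

for label, (arr, low, high) in {
    "40 per arm": (arr_small, low_small, high_small),
    "400 per arm": (arr_large, low_large, high_large),
}.items():
    print(f"{label}: risk reduction {arr:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

With these made-up numbers, the small trial's interval runs from roughly -0.11 to +0.31. It includes zero, so the trial would be called "negative," yet it is also compatible with a 30-point absolute reduction (a number needed to treat of about 3) and with an 11-point increase in harm. With the same event rates and ten times as many infants, the interval no longer includes zero. The observed effect is identical in both cases; only the uncertainty differs.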
It is not easy to find evidence that some therapy or intervention is, indeed, for the better, that it is effective and has (only) the desired effects.57-70 It is harder to find evidence that some intervention is indeed not necessary in any case, ineffective for all patients, and that not doing it has no undesired important effects, as rare as they may be. One focus of the Cochrane Collaboration Effective Practice and Organization of Care is to include other types of studies beyond RCTs and to optimize validity, generalizability and evidence of "what works" without causing any unnecessary adverse outcome that matters: to improve professional practice and delivery of effective health services.
The question of what good care is cannot be answered with the use of RCTs alone, however impressive their evidence. The answer depends on the character of the desired effect, on what is more important.18 If there are different goods and bads at stake, a value judgment is called for and this requires careful evaluation, listing and balancing the pros and cons. The task of researchers might still be to provide the evidence that forms the backdrop against which choices may take shape.
In medical practice, it is we who are responsible for each of our patients, not science, RCTs, authors, guideline developers or what an expert says or does. Caregivers have to decide what to do using evidence, including the uncertainties and definitely the important outcomes that really matter to patients, family and society. What makes each of us more or less responsible in the care we deliver is not only what we decide and/or accept to do, but also what we refuse to do. It is our responsibility as clinicians caring for patients to practice according to this understanding.
It is necessary to create a culture that is comfortable with estimating and discussing uncertainty. We are hopeful that in the future changes in practice will occur with studies using the correct denominators and accuracy in interpretation and language. With this and with the incorporation of all available research (not just one or the latest RCT) uncertainty can be reduced. As this happens more and more, fewer guidelines that are developed and implemented "universally" will be proven at a later time to be wrong, thereby improving important outcomes that matter and the well being of more infants.
We hope that, after reading (and re-reading) this manuscript, you'll feel more empowered to make the right choices for your patients and that it becomes evident that some expert opinions (and not opinion of experts) can become yours too. To end, our best wishes for the delivery of care you provide your patients with, one baby at a time, in this complex era of "evidence-based practice."
Acknowledgements with evident gratitude
Drs. B. Griffin and F. Bednarek, University of Massachusetts, Mentors "par excellence," positively influenced our approach to caring intensely.
Drs. R. Phibbs, G. Gregory and J.A. Kitterman, Neonatology, CVRI at University of California, San Francisco, Mentoring, teaching and caring in daily practice positively influenced our approach to practice.
Drs. J. Comroe, J. Clements, A. Rudolph and S. Glantz, CVRI at University of California, San Francisco, Contributed to our education on care and statistics through their advice, lectures and publications.
Dr. E. Dueñas, Havana, Cuba, Inspirational in organization of care and outcome improvement.
Dr. J. Sinclair, McMaster University, Canada, Workshops, lectures, publications. Inspirational.
Dr. A. Brann, Emory University, Untiring clinician and information seeker.
Drs. G.R. Norman and D. Streiner, Canada, Workshop and book Biostatistics: the bare essentials.
Dr. P. Alderson, UK Cochrane Centre, His writings as model.
Others, Many, Writings, lectures, approach to practice, friendship.
If you have read slowly and attentively, it should all be evident!
1. Norman GR, Streiner DL. Biostatistics: the bare essentials. 2nd ed. Hamilton: B.C. Decker; 2000.
2. Michael M, Boyce TW, Wilcox AJ. Biomedical bestiary: an epidemiologic guide to flaws and fallacies in the medical literature. Boston: Little, Brown; 1984.
3. Alderson P, Deeks JJ. Concerning: 'The number needed to treat needs an associated odds estimation'. J Public Health (Oxf). 2004;26:400. Author reply in: J Public Health (Oxf). 2004;26:401.
4. Aino H, Yanagisawa S, Kamae I. The number needed to treat needs an associated odds estimation. J Public Health (Oxf). 2004;26:84-7. Erratum in: J Public Health (Oxf). 2004;26:218.
5. Artalejo FR, Banegas JR, Artalejo AR, Guallar-Castillo P. Number-needed-to treat to prevent one death. Lancet. 1998;351:1365.
6. Cook RJ, Sackett DL. The number needed to treat: a clinically useful measure of treatment effect. BMJ. 1995;310:452-4.
7. Saver JL. Deriving number-needed-to-treat and number-needed-to-harm from the Saint I trial results. Stroke. 2007;38:257.
8. de Craen AJ, Vickers AJ, Tijssen JG, Kleijnen J. Number-needed-to-treat and placebo-controlled trials. Lancet. 1998;351:310.
9. Alderson P, Groves T. What doesn't work and how to show it. BMJ. 2004;328:473.
10. Hunt A, Millar R. AS science for public understanding. Oxford: Heinemann; 2000.
11. Chalmers I. Well informed uncertainties about the effects of treatments. BMJ. 2004;328:475-6.
12. Joffe M. "Evidence of absence" can be important. BMJ. 2003;326:1267.
13. Altman DG, Bland JM. Absence of evidence is not evidence of absence. BMJ. 1995;311:485.
14. Alderson P, Chalmers I. Survey of claims of no effect in abstracts of Cochrane reviews. BMJ. 2003;326:475.
15. Altman DG, Bland JM. Confidence intervals illuminate the absence of evidence. BMJ. 2004;328:1016-7.
16. Shankaran S, Laptook AR, Ehrenkranz RA, Tyson JE, McDonald SA, Donovan EF, et al. Whole-body hypothermia for neonates with hypoxic-ischemic encephalopathy. N Engl J Med. 2005;353:1574-84.
17. Sinclair JC. Weighing risks and benefits in treating the individual patient. Clin Perinatol. 2003;30:251-68.
18. Mol A. Proving or improving: on health care research as a form of self-reflection. Qual Health Res. 2006;16:405-14.
19. Des Jarlais DC, Lyles C, Crepaz N; TREND Group. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: the TREND statement. Am J Public Health. 2004;94:361-6.
20. Victora CG, Habicht JP, Bryce J. Evidence-based public health: moving beyond randomized trials. Am J Public Health. 2004;94:400-5.
21. Tunis SR, Stryer DB, Clancy CM. Practical clinical trials: increasing the value of clinical research for decision making in clinical and health policy. JAMA. 2003;290:1624-32.
22. Glasgow RE, Magid DJ, Beck A, Ritzwoller D, Estabrooks PA. Practical clinical trials for translating research to practice: design and measurement recommendations. Med Care. 2005;43:551-7.
23. Meis PJ, Klebanoff M, Thom E, Dombrowski MP, Sibai B, Moawad AH, et al. Prevention of recurrent preterm delivery by 17 alpha-hydroxyprogesterone caproate. N Engl J Med. 2003;348:2379-85.
24. Finer NN, Craft A, Vaucher YE, Clark RH, Sola A. Postnatal steroids: short-term gain, long-term pain? J Pediatr. 2000;137:9-13.
25. Mariani G, Cifuentes J, Carlo WA. Randomized trial of permissive hypercapnia in preterm infants. Pediatrics. 1999;104:1082-8.
26. Carlo WA, Stark AR, Wright LL, Tyson JE, Papile LA, Shankaran S, et al. Minimal ventilation to prevent bronchopulmonary dysplasia in extremely-low-birth-weight infants. J Pediatr. 2002;141:370-4.
27. Verder H, Robertson B, Greisen G, Ebbesen F, Albertsen P, Lundstrom K, et al. Surfactant therapy and nasal continuous positive airway pressure for newborns with respiratory distress syndrome. Danish-Swedish Multicenter Study Group. N Engl J Med. 1994;331:1051-5.
28. Vain NE, Szyld EG, Prudent LM, Wiswell TE, Aguilar AM, Vivas NI. Oropharyngeal and nasopharyngeal suctioning of meconium-stained neonates before delivery of their shoulders: multicentre, randomised controlled trial. Lancet. 2004;364:597-602.
29. Braekke K, Bechensteen AG, Halvorsen BL, Blomhoff R, Haaland K, Staff AC. Oxidative stress markers and antioxidant status after oral iron supplementation to very low birth weight infants. J Pediatr. 2007;151:23-8.
30. Hutton EK, Hassan ES. Late vs early clamping of the umbilical cord in full-term neonates: systematic review and meta-analysis of controlled trials. JAMA. 2007;297:1241-52.
31. Howard E. Reductions in size and total DNA of cerebrum and cerebellum in adult mice after corticosterone treatment in infancy. Exp Neurol. 1968;22:191-208.
32. Regev R, de Vries LS, Noble-Jamieson CM, Silverman M. Dexamethasone and increased intracranial echogenicity. Lancet. 1987;1:632-3.
33. Noble-Jamieson CM, Regev R, Silverman M. Dexamethasone in neonatal chronic lung disease: pulmonary effects and intracranial complications. Eur J Pediatr. 1989;148:365-7.
34. Stark AR, Carlo WA, Tyson JE, Papile LA, Wright LL, Shankaran S, et al. Adverse effects of early dexamethasone in extremely-low-birth-weight infants. National Institute of Child Health and Human Development Neonatal Research Network. N Engl J Med. 2001;344:95-101.
35. Murphy BP, Inder TE, Huppi PS, Warfield S, Zientara GP, Kikinis R, et al. Impaired cerebral cortical gray matter growth after treatment with dexamethasone for neonatal chronic lung disease. Pediatrics. 2001;107:217-21.
36. Barrington KJ. The adverse neuro-developmental effects of postnatal steroids in the preterm infant: a systematic review of RCTs. BMC Pediatr. 2001;1:1.
37. Gu XQ, Xue J, Haddad GG. Effect of chronically elevated CO2 on CA1 neuronal excitability. Am J Physiol Cell Physiol. 2004;287:C691-7.
38. Martin C, Jones M, Martindale J, Mayhew J. Haemodynamic and neural responses to hypercapnia in the awake rat. Eur J Neurosci. 2006;24:2601-10.
39. Fritz KI, Zubrow A, Mishra OP, Delivoria-Papadopoulos M. Hypercapnia-induced modifications of neuronal function in the cerebral cortex of newborn piglets. Pediatr Res. 2005;57:299-304.
40. Kaiser JR, Gauss CH, Pont MM, Williams DK. Hypercapnia during the first 3 days of life is associated with severe intraventricular hemorrhage in very low birth weight infants. J Perinatol. 2006;26:279-85.
41. Thome UH, Carroll W, Wu TJ, Johnson RB, Roan C, Young D, et al. Outcome of extremely preterm infants randomized at birth to different PaCO2 targets during the first seven days of life. Biol Neonate. 2006;90:218-25.
42. Sola A, Spitzer A, Morin FC 3rd, Schlueter M, Phibbs RH. Effects of arterial carbon dioxide tension on newborn lamb's cardiovascular response to rapid hemorrhage. Pediatr Res. 1983;17:70-6.
43. Fabres J, Carlo WA, Phillips V, Howard G, Ambalavanan N. Both extremes of arterial carbon dioxide pressure and the magnitude of fluctuations in arterial carbon dioxide pressure are associated with severe intraventricular hemorrhage in preterm infants. Pediatrics. 2007;119:299-305.
44. Gregory GA, Kitterman JA, Phibbs RH, Tooley WH, Hamilton WK. Treatment of the idiopathic respiratory-distress syndrome with continuous positive airway pressure. N Engl J Med. 1971;284:1333-40.
45. Finer NN, Carlo WA, Duara S, Fanaroff AA, Donovan EF, Wright LL, et al. Delivery room continuous positive airway pressure/positive end-expiratory pressure in extremely low birth weight infants: a feasibility trial. Pediatrics. 2004;114:651-7.
46. Sandri F, Ancora G, Lanzoni A, Tagliabue P, Colnaghi M, Ventura ML, et al. Prophylactic nasal continuous positive airways pressure in newborns of 28-31 weeks gestation: multicentre randomised controlled clinical trial. Arch Dis Child Fetal Neonatal Ed. 2004;89:F394-8.
47. Morley C; COIN Trial Investigators. COIN Trial Abstract at PAS Meeting, Toronto, Canada, May 2007.
48. Jobe AH. Iron for preterm infants. J Pediatr. 2007;151:A3.
49. Kling PJ. Iron supplementation in prematurity: how much is too much? J Pediatr. 2007;151:3-4.
50. Zecca L, Youdim MB, Riederer P, Connor JR, Crichton RR. Iron, brain ageing and neurodegenerative disorders. Nat Rev Neurosci. 2004;5:863-73.
51. Kaur D, Peng J, Chinta SJ, Rajagopalan S, Di Monte DA, Cherny RA, et al. Increased murine neonatal iron intake results in Parkinson-like neurodegeneration with age. Neurobiol Aging. 2007;28:907-13.
52. Peng J, Peng Li, Stevenson FF, Doctrow SR, Andersen JK. Iron and paraquat as synergistic environmental risk factors in sporadic Parkinson's disease accelerate age-related neurodegeneration. J Neurosci. 2007;27:6914-92.
53. Chaparro CM, Neufeld LM, Tena Alavez G, Eguia-Líz Cedillo R, Dewey KG. Effect of timing of umbilical cord clamping on iron status in Mexican infants: a randomised controlled trial. Lancet. 2006;367:1997-2004.
54. Anand KJ, Barton BA, McIntosh N, Lagercrantz H, Pelausa E, Young TE, et al. Analgesia and sedation in preterm neonates who require ventilatory support: results from the NOPAIN trial. Neonatal Outcome and Prolonged Analgesia in Neonates. Arch Pediatr Adolesc Med. 1999;153:331-8.
55. Waisman D, Weintraub Z, Rotschild A, Bental Y. Myoclonic movements in very low birth weight premature infants associated with midazolam intravenous bolus administration. Pediatrics. 1999;104:579.
56. Olney JW, Young C, Wozniak DF, Jevtovic-Todorovic V, Ikonomidou C. Do pediatric drugs cause developing neurons to commit suicide? Trends Pharmacol Sci. 2004;25:135-9.
57. Sinclair JC. Evidence-based therapy in neonatology: distilling the evidence and applying it in practice. Acta Paediatr. 2004;93:1146-52.
58. Sinclair JC, Haughton DE, Bracken MB, Horbar JD, Soll RF. Cochrane neonatal systematic reviews: a survey of the evidence for neonatal therapies. Clin Perinatol. 2003;30:285-304.
59. Sackett DL, Rosenberg WM, Gray JA, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn't. 1996. Clin Orthop Relat Res. 2007;455:3-5.
60. Sackett DL. Patients and therapies: getting the two together. N Engl J Med. 1978;298:278-9.
61. Sackett DL. Superiority trials, noninferiority trials, and prisoners of the 2-sided null hypothesis. ACP J Club. 2004;140:A11.
62. Chalmers I. Proposal to outlaw the term "negative trial". BMJ. 1985;290:1002.
63. Chalmers I. Well informed uncertainties about the effects of treatment. BMJ. 2004;328:475-6.
64. Fretheim A, Schunemann HJ, Oxman AD. Improving the use of research evidence in guideline development: 15. Disseminating and implementing guidelines. Health Res Policy Syst. 2006;4:27.
65. Mainland D. Is the difference statistically significant? An innocent request for quack treatment. Clin Pharmacol Ther. 1969;10:436-8.
66. Mainland D. Statistical ward rounds--14. "How many patients would we need?" Clin Pharmacol Ther. 1969;10:272-81.
67. Mainland D. Medical statistics-thinking vs arithmetic. J Chronic Dis. 1982;35:413-7.
68. Mainland D. Statistical ritual in clinical journals: is there a cure?--II. Br Med J (Clin Res Ed). 1984;288:920-2.
69. Devereaux PJ, Bhandari M, Clarke M, Montori VM, Cook DJ, Yusuf S, et al. Need for expertise based randomised controlled trials. BMJ. 2005;330:88.
70. Alderson P. Absence of evidence is not evidence of absence. BMJ. 2004;328:476-7.
Augusto Sola, MD
MANA and Atlantic Neonatal Research Institute, Division of Neonatology, Atlantic Health System
100 Madison Ave
07960 - Morristown, NJ - USA
Tel.: (+1) 973-971-8985
Manuscript received Jul 06 2007, accepted for publication Jul 26 2007.