Comments on the paper “c-erbB2 expression and nuclear pleomorphism in canine mammary tumors”

In their interesting study on canine mammary tumors published in the Brazilian Journal of Medical and Biological Research, Dutra et al. (1) evaluated the influence of c-erbB-2 expression and nuclear pleomorphism on survival using a semiquantitative scoring system. The relationship with survival was assessed using the Spearman rank correlation test. They obtained a correlation coefficient of r = 0.441 for c-erbB-2 expression and of r = -0.295 for nuclear pleomorphism, with P > 0.05 in both cases. Therefore, the authors concluded that neither c-erbB-2 nor nuclear pleomorphism could be considered to be prognostic factors.
We would like to discuss some of the methodological aspects of the study.
1. In survival studies some individuals will still be alive at the end. Others may have been lost to follow-up some time earlier, and thus survival times will be unknown for a subset of the study group. There is a consensus that these so-called "censored" cases must be included in the analysis. Most survival analyses use Kaplan-Meier plots, log rank tests or the Cox proportional hazard regression (2). A Spearman rank order correlation test, as carried out by Dutra et al. (1), can only be used if all the animals under observation die (2).
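To make the role of censored cases concrete, the following minimal sketch computes a Kaplan-Meier estimate in plain Python. The follow-up times and event indicators are invented for illustration and are not data from Dutra et al. (1); the function name `kaplan_meier` is also hypothetical. A censored animal stays in the risk set until its last follow-up and then leaves it without being counted as a death, which a rank correlation on survival times cannot represent.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  -- follow-up time of each animal
    events -- 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every animal whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored animals both leave the risk set.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up (months); 0 marks a censored animal.
TIMES = [2, 3, 3, 5, 8, 8, 12]
EVENTS = [1, 1, 0, 1, 0, 1, 0]

curve = kaplan_meier(TIMES, EVENTS)
for t, s in curve:
    print(t, round(s, 3))
```

In this toy data set three of seven animals are censored. They lower the number at risk in later steps but never reduce the survival estimate themselves.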
2. Both variables examined, c-erbB-2 expression and nuclear pleomorphism, were evaluated using a semiquantitative scoring system. Categorizing variables that are measured on a continuum is a frequently used procedure in medicine, especially in surgical pathology. By looking, for instance, at recently published issues of this Journal, we can see considerable variation in the scoring systems applied in histopathology, not only between different groups but also within the same laboratory (1,3-6).
It is well known that categorization may be useful for clinical decision-making, treatment recommendations, determining study eligibility, or for illustration purposes (7), but we should be well aware that this procedure is dangerous for several reasons.
When the probability of an event associated with a quantitative prognostic factor varies monotonically, the selection of a cutoff point is always arbitrary. This may create scientifically unrealistic models, because the procedure discards potentially important information: all values between two cutoff levels have an equal effect on the model. Therefore, categorization considerably lowers the power of statistical tests and can be seen as an introduction of measurement errors (8-14).
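The loss of information can be sketched in a few lines. The cut-offs below (10% and 50% stained cells) are purely hypothetical and are not the HercepTest criteria or the scoring used by Dutra et al. (1); the helper `to_score` is likewise an illustrative name.

```python
from bisect import bisect_right

# Hypothetical cut-offs in percent of stained cells (assumption, see text).
CUTOFFS = [10, 50]


def to_score(percent):
    """Map a continuous percentage to an ordinal score 0, 1 or 2."""
    return bisect_right(CUTOFFS, percent)


percentages = [2, 9, 11, 49, 51, 98]
scores = [to_score(p) for p in percentages]
print(list(zip(percentages, scores)))
```

Under these assumed cut-offs, tumors with 11% and 49% stained cells receive the same score, whereas tumors with 9% and 11% are placed in different categories. A model fitted to the scores treats the first pair as identical and the second pair as different, regardless of the underlying measurements.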
This can be easily illustrated with a simple experiment. In Table 1 we simulated a possible situation similar to that of the study of Dutra et al. (1): let us assume that 10 of 17 dogs with neoplasia died. The c-erbB-2-positive cells in the tumors had been counted as percentage values and were then transformed by the HercepTest scoring system, assuming that none of the cells showed faint staining. When analyzing the scores by the Kaplan-Meier survival plot followed by the log rank test we cannot show any significant influence of c-erbB-2 on overall survival (P = 0.131; SPSS 10.0 software). Similarly, c-erbB-2 scores analyzed by Cox regression were not significant, either when simply entering the categories or when using the quotient calculated between observed and expected events in the log rank test of the Kaplan-Meier model, as suggested recently, thus considering the natural order between the categories (15). But when we include the continuous percentage values in the Cox proportional hazard model, we discover a significant influence of c-erbB-2 expression on overall survival (P = 0.017, B = 0.0834; SPSS 10.0). Thus, our example shows that even in studies with small sample size it is possible to detect prognostic factors with statistical significance when using continuous data.
Choosing the cut-off points of a continuously measured variable is in most cases arbitrary or opportunistic (16). The so-called minimum P-value approach involves multiple testing, which inflates the type I error rates and is therefore not recommended (8). Variations in the choice of cut-off points between different studies produce a bias called the 'Will Rogers phenomenon', which describes an apparent improvement of prognosis in subgroups due to classification or categorization changes, but which is not accompanied by an overall improvement of survival (11). Therefore, the results of studies using different cut-off points may not be comparable at all.
In this context the use of a previously defined classification system, in our case the HercepTest scoring system, is certainly an advantage. But this classification system was created in order to identify patients who would probably benefit from a specific therapy. This is not the topic of the present study on prognostic variables, and therefore possible relevant cut-off points may be completely different for this purpose.
In summary, the accuracy and reliability of a study may be compromised by categorization of continuous data or qualitative interpretations. Categorization of a variable which can be continuously measured should be done only exceptionally, for instance when precise measurement is difficult, the distribution of the variable is highly skewed or its relation with another variable is nonlinear (8,10,17).
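The Will Rogers phenomenon mentioned above can be made concrete with a toy calculation. The six dogs below are entirely hypothetical (staining percentage and survival in months, with all deaths assumed to be observed so that plain means are legitimate), and the two cut-offs of 30% and 20% are arbitrary choices made only for this sketch.

```python
from statistics import mean

# Hypothetical dogs: (percentage of stained cells, survival in months).
DOGS = [(5, 30), (15, 28), (25, 20), (35, 14), (45, 10), (60, 6)]


def group_means(dogs, cutoff):
    """Mean survival of the 'low' (< cutoff) and 'high' (>= cutoff) groups."""
    low = [months for pct, months in dogs if pct < cutoff]
    high = [months for pct, months in dogs if pct >= cutoff]
    return mean(low), mean(high)


overall = mean(months for _, months in DOGS)
low_30, high_30 = group_means(DOGS, 30)  # cut-off used by a first study
low_20, high_20 = group_means(DOGS, 20)  # cut-off used by a second study

print("overall mean survival:", overall)
print("cut-off 30%: low", low_30, "high", high_30)
print("cut-off 20%: low", low_20, "high", high_20)
```

Lowering the cut-off from 30% to 20% moves one dog from the low group to the high group. Because this dog survived for a shorter time than the other low-group dogs but longer than the high-group dogs, the mean survival of both groups rises although no animal lived a day longer and the overall mean is unchanged.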
Therefore, we suggest that immunohistochemical expression should be quantified as done for the MIB-1 antibody, i.e., by counting the percentage of antibody-stained cells. One may argue that this method will not take into consideration differences in staining intensity between the cells. In this case the immunohistochemical reaction product may be easily quantified with interactive software applied to digitized images. Nuclear pleomorphism can also be easily and precisely quantified using texture analysis of digitized images (18-21). Therefore, we believe that a re-evaluation of the material

Table 1. Simulation of survival data in order to show the effect of categorization.