H-index of Collective Health Professors in Brazil

OBJECTIVE: To estimate reference values and the hierarchy function of professors engaged in Collective Health in Brazil by analyzing the distribution of the h-index. METHODS: From the Portal of Coordination for the Improvement of Higher Education Personnel (Portal da Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), 934 authors were identified in 2008, of whom 819 were analyzed. The h-index of each professor was obtained through the Web of Science (WoS) using search algorithms controlling for namesakes and alternative spellings of their names. For each Brazilian region and for the country as a whole, we fitted an exponential probability density function to provide the population parameters and rate of decline by region. Ranking measures were identified using the complement of the cumulative probability function and the hierarchy function among authors according to the h-index by region. RESULTS: Among the professors analyzed, 29.8% had no citation record in WoS (h=0). The mean h for the country was 3.1, and the region with the greatest mean was the southern region (h=4.7). The median h for the country was 3.1, and the greatest median was for the southern region (3.2). Standardizing populations to one hundred, the first rank in the country was h=16, but stratification by region shows that, within the northeastern, southeastern and southern regions, a greater value is necessary for achieving the first rank. In the southern region, the index needed to achieve the first rank was h=24. CONCLUSIONS: [...] on the basis of the WoS h-index, did not exceed h=5. Regional differences exist, with the southeastern and northeastern regions being similar and the southern region being outstanding.


INTRODUCTION
The h-index has attracted wide interest in the academic community since its introduction by Hirsch in 2005.⁶ Its attractiveness arises from the possibility of sorting scientists on the basis of a single number. This yields an advantage over other indexes that are based on citations, such as those based on the total number of publications, total number of citations or the number of citations per publication.² Bibliographic databases such as the Web of Science (Thomson Reuters) and Scopus (Elsevier B.V.) have incorporated this calculation for use in evaluating an author's scientific production. The h-index has become an item on the curriculum vitae (CV) of researchers, as is shown by its adoption by the Lattes Platform of the Conselho Nacional de Desenvolvimento Científico e Tecnológico (National Council for Scientific and Technological Development).
The h-index quantifies the cumulative production of an author,⁶ incorporating information about his/her publication record and evaluation by the corresponding scientific community (the impact of citations).⁵,¹² According to Hirsch's definition,⁶ "A scientist has index h if h of his or her Np papers have at least h citations each and the other (Np − h) papers have ≤ h citations each." Therefore, the index is the largest number h such that the author has h articles with at least h citations each; e.g., an author who has ten articles published, of which five have at least five citations and the remaining five have no more than five, has an h-index of 5. Despite this interest, the h-value of a given author lacks meaning in isolation and does not help in the judgment of merit; this can only be done by comparison with reference values in each field of knowledge. In fact, to contribute semantic content to values of h, Hirsch's original article describes the h-index of notable authors in his field, which is Physics. In Brazil, at least three initiatives for the identification of h-reference values exist.¹,⁸,¹⁰ In 2006, Batista et al¹ [...] of five institutions of higher education based on the institutional h-index, irrespective of the field of knowledge. In fact, Van Raan¹² found an association not only between different numerical indicators but also with the judgments of peers in research groups in Chemistry.
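Hirsch's definition quoted above can be illustrated with a minimal sketch. The function name `h_index` and the citation counts are invented for illustration; they are not data from this study.

```python
def h_index(citations):
    """Return the h-index for a list of per-paper citation counts."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        # The paper at this position still has at least `position` citations.
        if count >= position:
            h = position
        else:
            break
    return h


# Invented example: ten papers, five of which have at least five citations.
example = [12, 9, 7, 5, 5, 4, 2, 1, 0, 0]
print(h_index(example))  # 5
```

An author with no cited papers, or with no papers at all, gets h=0 under this sketch, which matches the use of h=0 for professors without a citation record.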
This study aims to estimate the reference values and hierarchy function of graduate researchers in Collective Health based on an analysis of the distribution parameters of the h-index.

METHODS
The size of the scientific production in Collective Health cannot be determined accurately, because it is not identifiable either by institutional affiliation or by publishing vehicle. We therefore examined the set of all graduate researchers in Collective Health in the country to obtain a sample of authors. The names and affiliations of the graduate programs were accessed through the records of the Coordination for the Improvement of Higher Level Personnel (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) in the public domain on the internet.ᵃ The following options were selected: 1) Registration of students; 2) Book of indicators and 3) Collective Health for the year 2008, resulting in the sampling of all Higher Education Institutions (HEIs) and their programs in Collective Health in Brazil. For each HEI, we selected the Faculty option, resulting in the assembly of a list of all professors in Collective Health with information regarding their institutional affiliation, field and academic title. These data formed the database on faculty in Collective Health in Brazil.
Publications of professors were sorted based on the number of "times cited" obtained from the WoS database. The h-index obtained on the "citation report" page was recorded. For each name, we considered different versions of name spelling identified in the citations of CV Lattes and in the "author index" of WoS. The main difficulties of this phase were the presence of homonyms and different name formats used in bibliographic citations. Homonym cases were solved by considering institutional affiliation, recognizing the group by co-authors, consistency of the investigation field and comparison with the Lattes database. For the different bibliographic citation formats, we included the possible names by using an asterisk at the end of capital letters, aiming for a more sensitive search. For example, if the fictitious name "João Adalberto Gonçalvez Silva" were registered as Silva J, Silva JA or Silva JAG on CV Lattes, the name would be queried in WoS as Silva J*, and the information used for solving homonyms would be included in the filter page of WoS for searching the author's h-index. In the case of different authors having the same name in citations, such publications were excluded, and the h-index was automatically recalculated. Publications were compared with those identified in CV Lattes to ensure the validity of the information obtained.
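The wildcard step in the example above can be sketched as follows. This is only an illustration of how the citation forms collapse into one sensitive query; the helper `wos_author_query` is hypothetical and not part of any WoS interface, and it assumes each citation form is written as "Surname INITIALS".

```python
import os


def wos_author_query(citation_names):
    """Collapse citation forms such as 'Silva JAG' into one wildcard query."""
    surnames = set()
    initials = []
    for name in citation_names:
        # Split at the last space: surname on the left, initials on the right.
        surname, _, inits = name.strip().rpartition(" ")
        surnames.add(surname)
        initials.append(inits)
    if len(surnames) != 1 or "" in surnames:
        raise ValueError("expected one shared surname followed by initials")
    # Longest run of initials shared by every citation form.
    prefix = os.path.commonprefix(initials)
    if not prefix:
        raise ValueError("citation forms share no leading initial")
    return f"{surnames.pop()} {prefix}*"


# Fictitious name taken from the example in the text.
print(wos_author_query(["Silva J", "Silva JA", "Silva JAG"]))  # Silva J*
```

A query this broad also returns homonyms, which is why the filtering by affiliation, co-authors, field and CV Lattes described in the text still has to follow.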
Search algorithm and validation strategies were tested for each professor from March to November 2008. After query standardization on WoS, we proceeded with the collection of updated data in November 2009.
Figure 1 shows the frequency distribution of h-values based on region and suggests a methodological strategy for analysis. The dotted line describes an exponential decay curve, a Lotka characteristic⁴ (Lotka's Law⁷) of the h distribution. The theoretical exponential probability distribution and the Pareto distribution are both able to generalize this type of frequency distribution; we chose the first because it can be fitted to events from h=0. The exponential probability density function and cumulative distribution function are described as follows:

$$f(h) = \lambda e^{-\lambda h}, \qquad F(h) = 1 - e^{-\lambda h}, \qquad h \ge 0$$

where λ is the rate of decline. With the assistance of the SPSS statistical package, we fitted the density functions to the frequency data of each region of Brazil. The quality of fit of each function was described by the complement of the residual variance divided by the total variance (adjusted R²), and estimates of the decline rate (λ) were assessed based on the 95% confidence interval (95% CI) and the descriptive level obtained using Anova.
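A minimal sketch of the exponential model may help fix ideas. It is not the SPSS procedure used in the study: as a stand-in it estimates λ by maximum likelihood (the reciprocal of the sample mean) and treats the integer h-values as if they were continuous. The function names and the toy h-values are invented for illustration.

```python
import math


def exp_pdf(h, lam):
    """Exponential probability density at h for decline rate lam."""
    return lam * math.exp(-lam * h)


def exp_cdf(h, lam):
    """Exponential cumulative distribution at h for decline rate lam."""
    return 1.0 - math.exp(-lam * h)


def mle_rate(h_values):
    """Maximum-likelihood estimate of lam: the reciprocal of the sample mean."""
    mean = sum(h_values) / len(h_values)
    if mean <= 0:
        raise ValueError("the mean h must be positive to estimate a rate")
    return 1.0 / mean


# Invented h-values, not data from the study.
toy_h = [0, 0, 1, 1, 2, 3, 4, 5, 8, 12]
lam = mle_rate(toy_h)
print(f"estimated lambda = {lam:.3f}")
print(f"model median     = {math.log(2) / lam:.2f}")
print(f"P(h <= 5)        = {exp_cdf(5, lam):.3f}")
```

Under this model the mean is 1/λ and the median is ln(2)/λ, so a larger λ means a faster drop in the probability of high h-values.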
To define a hierarchy function of h, we resorted to the complement of the cumulative distribution, expressed in percentiles and rounded to the nearest integer:

$$R(h) = \mathrm{round}\left(100\,[1 - F(h)]\right) = \mathrm{round}\left(100\,e^{-\lambda h}\right)$$

where F(h) is the fitted cumulative distribution function and λ is the rate of decline. Null h values (zero percentile) corresponded to the last position of a hypothetical set of 100 discrete and ordered values of h. Values between the 98.5 and 99.49 percentiles indicated first place (both extreme values were rounded to 99, and 100-99=1), and percentiles beyond 99.49 yielded a rank of zero and were considered hors concours: very rare occurrences of 0.5% or less. Statistically, this suggests a strange element in the set, albeit in the sense of positively highlighting high performance. Second place corresponded to percentile values between 97.5 and 98.49 (rounded to 98), and so on. In this way, we obtained the order positions that authors in a given set would occupy if the total number of authors were reduced to 100. This strategy seeks to balance the hierarchy of exceptional authors and authors with h = 1, providing a distance between the latter and last place, as that position should be reserved for those who do not have any cited articles.
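The percentile-to-rank rule described above can be sketched in a few lines. The function name `rank_from_percentile` is an assumption made for illustration. Rounding is done half-up on purpose, because Python's built-in `round` rounds ties to the nearest even number and would send 98.5 to 98 rather than 99.

```python
import math


def rank_from_percentile(percentile):
    """Position in a standardized set of 100 authors; 0 means hors concours."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    # Round half-up, then take the complement to 100.
    rounded = math.floor(percentile + 0.5)
    return 100 - rounded


for p in (0, 97.5, 98.49, 98.5, 99.49, 99.6):
    print(p, rank_from_percentile(p))
```

Percentiles from 98.5 to 99.49 map to first place, anything beyond that maps to zero (hors concours), and a percentile of zero (h=0) maps to position 100, the last place.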

RESULTS
The h-indices of the 934 authors, distributed by region, HEI and program, are described in Table 1.
In Table 2, we recorded the results of the analysis of the h-index distribution by region and for the country as a whole. For all regions, we obtained a satisfactory fit to the exponential probability density function with parameter λ, with statistical significance. When fitting the function to the data of each region, repeated records of authors from more than one program were ignored. The first line of Table 2 reports the number of authors' records contributed by each region.

DISCUSSION
The S and SE regions have the lowest proportion of h-indices equaling zero. However, the SE region has a definite shortcoming, having the greatest rate of decline (28% on average for every unit increase in the value of h). A greater rate of decline indicates a larger drop of probability density from h=0 and, consequently, a reduction of the probability of occurrence of higher values of h. Thus, if h=19 places the author at rank 1 in the SE region, this position would require h=24 in the S region.
After fitting the exponential probability density function, the regions of greatest similarity are the SE and NE regions: the λ parameters of their density functions are similar, with a large overlap of confidence intervals. As a corollary, their means and medians are similar, as are the hierarchy positions for a given h for these two regions.
The hierarchy function in each region (Table 2) aids in the assessment of the position in a given region and for a particular value of h. For example, for h-index = 10 for a hypothetical author in the SE region, where the rate of decline is λ = 0.28, we have the following calculation:

$$R(10) = \mathrm{round}\left(100\,e^{-0.28 \times 10}\right) = \mathrm{round}(6.08) = 6$$

This means that if there were 100 authors in Collective Health in the SE region, this specific author would be ranked sixth. In this region, h=10 corresponds to the 93.92 percentile, whose complement 6.08 yields 6 when rounded. In the CW region, whose average h-index is 2, h=10 corresponds to first place, tied with the authors with h=9 (in both cases, the rank function yields 1 as the result). In the NE region, this author would be in fifth place, and in the S region, this author would be in eleventh place. Again, there are similarities between the SE and NE regions.
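The worked example above can be checked with a short sketch. The SE rate of 0.28 is the decline rate quoted in the text. The CW rate is an assumption: it is taken as the reciprocal of the average h-index of 2 mentioned above, not as the fitted value from Table 2. The function name `rank_from_h` is invented for illustration.

```python
import math


def rank_from_h(h, lam):
    """Rank of an h-value in a standardized set of 100 authors.

    Returns 0 for hors concours values (percentile beyond 99.49).
    """
    # Percentile of h under the exponential cumulative distribution.
    percentile = 100.0 * (1.0 - math.exp(-lam * h))
    # Round half-up, then take the complement to 100.
    return 100 - math.floor(percentile + 0.5)


LAMBDA_SE = 0.28      # decline rate quoted in the text for the SE region
LAMBDA_CW = 1 / 2.0   # assumption: reciprocal of the CW mean h-index of 2

print(rank_from_h(10, LAMBDA_SE))  # 6
print(rank_from_h(10, LAMBDA_CW))  # 1
print(rank_from_h(9, LAMBDA_CW))   # 1
```

The sketch reproduces the sixth place for h=10 in the SE region and the tie at first place for h=9 and h=10 in the CW region, under the stated assumption about the CW rate.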
In previous studies,¹¹ more similarities between the NE and S regions were found. These regions registered the highest annual growth rates of publications and citations, less dispersion of research interests (i.e., [...]
The value of the h-index from the 'citation report' page underestimates the real value of h of authors whose works are not part of the publication records of WoS. The estimate of h can be refined via a 'cited references search', which will also be limited to citations of published articles that are registered in WoS. Any inaccuracy of this metric does not compromise comparisons of measurements taken under the same assumption. The h-index can also be obtained from Scopus and Google Scholar, resulting in different values. It is thus inappropriate to compare values of h from different sources.
The h-index has limitations that are the basis for a critical interpretation of the scientific production of an author. Examples are its dependence on the number of years of scientific activity,⁶ which hinders comparisons of the h-index of young researchers with that of older researchers, an excessive use of self-citation (which can inflate the value of the h-index)¹³ and the possibility of underestimating the production of "selective authors", i.e., authors who publish fewer papers but ones that have remarkable international impact and receive many citations.³ Moreover, evaluation of the productivity of scientific researchers cannot be restricted to the use of a single indicator. A single number cannot provide more than a rough approximation of an individual's multifaceted profile, and many other factors should be considered in combination when evaluating a researcher.⁶
The h-index is a tool to evaluate scientific researchers. The previous¹¹ and present studies agree in concluding that the NE region has equaled the "Sul maravilha" ("southern wonder"), a phrase coined by Henfil (Henrique de Souza Filho, 1944-1988). If he were still alive, maybe his character Graúna would acknowledge a "Nordeste maravilha" ("northeastern wonder"), at least in Collective Health.
As a bibliometric indicator, the h-index has attracted the attention of Scientometric academics, who have analyzed the advantages and disadvantages of the index and studied new opportunities for scientific production modeling. Since 2005, articles analyzing and modeling the index have accumulated in specialized journals: Scientometrics logs 55 of these articles, 23 of which were published in 2009 [search algorithm on WoS: Publication Name=(scientometrics) AND Topic=(H index)]. Journals from various fields of knowledge have devoted editorials to the h-index, and the first editorial was encountered in 2005; 26 editorials in 22 journals were found in WoS [Topic=(H index) AND Year Published=(2008) Refined by: Document Type=(editorial material)].
[...]¹ studied Brazilian scientific publications registered by the WoS from 1970 to 2004 for Physics, Chemistry, Mathematics and Biomedical and Life Sciences and determined the highest values of h found in each area. Batista et al.¹ proposed a new indicator, in which the h-index is weighted by the number of co-authors, which attracted wide attention from Scientometric researchers.
ᵃ Ministério da Educação. CAPES - Caderno de Avaliação. Brasília; 2007 [cited 2008 Mar]. Available from: http://conteudoweb.capes.gov.br/conteudoweb/CadernoAvaliacaoServlet?acao=filtraArquivo&ano=2008&codigo_ies=&area=22
Figures 2 and 3 show that the southeastern (SE) and northeastern (NE) regions have more programs and professors: we found an average of 35 professors per program in the SE and 22 in the NE regions. The southern (S) region, although having a smaller number of programs and professors, showed an average number of professors per program (15/program) that was more similar to the NE than to the SE. There is only one program, with 21 authors, in the central-western (CW) region. In the northern (N) region, there is a master's degree program in Collective Health at the Federal University of Acre (approved by the National Board of Education [CNE], Ministry of Education and Culture [MEC] ordinance 458, DOU 04-11-2008 - Endorsed CES/CNE 28/2008, 04-10-2008), but there is no "book of indicators" that allows the identification of authors.

Table 1. Graduate school professors in Collective Health according to region, higher education institution and program.
ᵃ Professional Master's degree

Table 2. Characteristics of the h-index distribution of graduate degree professors in Collective Health. Brazil, 2008.
Distribution of graduate programs in Collective Health by region. Brazil, November 2008.