Brazilian academic search filter: application to the scientific literature on physical activity

OBJECTIVE: To develop a search filter in order to retrieve scientific publications on physical activity from Brazilian academic institutions. METHODS: The academic search filter consisted of the descriptor “exercise” associated, through the term AND, with the names of the respective academic institutions, which were connected by the term OR. The MEDLINE search was performed with PubMed on 11/16/2008. The institutions were selected according to the classification from the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for interuniversity agreements. RESULTS: A total of 407 references were retrieved, corresponding to about 0.9% of all articles about physical activity and 0.5% of the Brazilian academic publications indexed in MEDLINE on the search date. When compared with the manual search undertaken, the search filter (descriptor + institutional filter) showed a sensitivity of 99% and a specificity of 100%. CONCLUSIONS: The institutional search filter showed high sensitivity and specificity, and is applicable to other areas of knowledge in health sciences. It is desirable that every Brazilian academic institution establish its “standard name/brand” in order to efficiently retrieve its scientific literature. DESCRIPTORS: Information Storage and Retrieval. Publications for Science Diffusion. Bibliography as Topic. Exercise. Motor Activity. Bibliometrics.


INTRODUCTION
The search for solid and relevant scientific literature has become a necessity for any investigator in the scientific sphere. Having knowledge of existing bodies of work and their content is a pre-condition for resolving any informational problems that arise in the course of professional activity. It is necessary, though, to be aware of logical procedures that allow us to satisfactorily obtain these references in order to actually use them.
Quantitative analysis, used to discover and evaluate scientific work in a research area, is becoming very important. It is part of the social study of science, and one of its main uses is in the area of science policy, where it provides tools for evaluating the results of investigation. Therefore, considering the impact these measures have upon the allocation of research funds and upon the professional promotion and accreditation of investigators, it is necessary to understand the particular details and the limitations that their use implies.
Being familiar with the scientific work of universities and research centers is very important for performing these evaluations. In every case, it is indispensable that the indicators of scientific production, as well as other indicators for Science and Technology, are reproducible with a set and generally accepted methodology, so that results are comparable. Indices, quotients, obsolescence and other data can open or close access to public or private financial resources, and also generate classifications which are of extreme importance for managers and evaluators of policy for science and technology. 13,3 In this sense, the evaluation and accreditation agencies, which judge the merit of students and investigators, value the importance of a publication according to the prestige of the journal where it was published, 22 which is generally measured through bibliometric indicators.
Nonetheless, when evaluating the scientific literature, one should remember there are cases that cannot be resolved with one or more Medical Subject Headings (MeSH) alone, such as the retrieval of the existing scientific work on a given subject produced by specific institutions and countries. 12 This demonstrates the need to create geographical or academic search filters that ensure efficient access to this scientific literature. 15,19 Therefore, the objective of this study was to develop a search equation to retrieve the Brazilian academic scientific work pertaining to the theme of physical activity.

METHODS
The filter was created according to the methodology developed and tested by Valderas et al. 19 for the creation of geographical search filters, incorporating corrections previously proposed by Sanz-Valero et al. 15 The distinct academic institutions and possible acronyms were associated with the "OR" connector, and different languages were employed for their identification in a pilot search. The academic/institutional search filter was corrected by incorporating the occurrences observed in this study.
The final search filter consists of the Boolean association between four equations:
• Equation 1: name of institution in Portuguese and in the main languages used in MEDLINE.
• Equation 3: name of the universities and corresponding name of the Brazilian cities, excluding those that could create confusion with cities in other countries.
• Equation 4: names of institutions that could not be included in the previous equations.
The final equation was structured in the following manner:

Exercise[Mesh] AND (((equation 1 OR equation 2) AND ("Brasil"[ad] OR "Brazil"[ad])) OR (equation 1 AND equation 3) OR equation 4)
The searches were performed from the first day available until November 16, 2008 (the last day that the obtained references were verified).
The project was carried out on MEDLINE, which allows searches mediated by Tags. Tags are field classifiers of a bibliographic database, identified by a label of two or more letters that can be appended to each term in brackets. The Tag [ad] (address) was used to retrieve the institutional affiliation of at least the primary author of the article (e.g., "Universidade Estadual"[ad] AND "Rio de Janeiro"[ad] is equivalent to having Universidade Estadual and Rio de Janeiro in the field for institutional address). The MEDLINE search engine, through PubMed, was used because of its free and permanent access and because it is the most consulted biomedical database. 18 The final search filter can be used directly by copying and pasting it into the PubMed search window, or by performing separate searches and combining them into the equation using the portal history. Any part of the final equation can be updated by adding new descriptors, changing a part, or eliminating an undesired segment.
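The assembly of the final equation from Tagged terms can be sketched in a few lines of Python. This is only an illustration of the Boolean structure described above: the helper names (`tag`, `any_of`, `build_filter`) and the example terms placed in each equation are hypothetical, since the full term lists of the filter are not reproduced in this text.

```python
def tag(term: str, field: str = "ad") -> str:
    """Quote a phrase and attach a PubMed field Tag, e.g. "Brasil"[ad]."""
    return f'"{term}"[{field}]'


def any_of(tagged_terms: list[str]) -> str:
    """Join already-tagged terms with OR inside one pair of parentheses."""
    return "(" + " OR ".join(tagged_terms) + ")"


def build_filter(eq1: str, eq2: str, eq3: str, eq4: str,
                 descriptor: str = "Exercise[Mesh]") -> str:
    """Assemble the final equation in the form given in the text."""
    country = any_of([tag("Brasil"), tag("Brazil")])
    return (
        f"{descriptor} AND ((({eq1} OR {eq2}) AND {country})"
        f" OR ({eq1} AND {eq3}) OR {eq4})"
    )


# Hypothetical placeholder contents for the four equations (assumptions,
# not the terms actually used in the study).
eq1 = any_of([tag("Universidade Estadual"), tag("State University")])
eq2 = any_of([tag("UEXEMPLO")])
eq3 = any_of([tag("Rio de Janeiro")])
eq4 = any_of([tag("Fundacao Exemplo")])

print(build_filter(eq1, eq2, eq3, eq4))
```

Keeping each equation as a separate string mirrors the modular use of the filter: any one part can be replaced or extended without touching the others, and the printed result can be pasted into the PubMed search window.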
The filter was evaluated by manually reviewing the references obtained through the proposed search filter, while considering and scoring the previous studies. The institutional affiliation of the article (based on the first author) should belong to Brazilian academic centers or institutions. Depending on whether they met this requirement, articles were categorized as "without incidence" or as "with incidence". Incidences among the articles retrieved by the search equation were documented for later review and correction. The congruence of the retrieved articles with the area studied (exercise = physical activity) allowed for the classification of articles as "pertinent" or "not pertinent". Since a gold standard does not exist, the pertinence of the results was evaluated by comparison with the manual review, followed by calculation of the sensitivity and specificity of the search equation. 15,19
For the bibliometric evaluation the following variables were considered: document number and type, publication in electronic format (epub), number of authors, institutional affiliation, presence of institutional name in affiliation, language in which the country name appears, language in which the article was written, journal, year of publication, presence of a link on PubMed to the full text (visibility), open access availability of the article and access to the text on Scientific Electronic Library Online (SciELO).
The bibliometric indicators utilized were:
• Lotka's productivity index: based on the citations of scientific publications, it allows for classification of the authors in three levels according to their productivity: large producers, with ten or more published works; medium producers, who have published between two and nine works; and small producers, who have published one work.
• Transience Index: frequency and percentage of authors or institutions that have published only one work about the subject.
• Burton and Kebler half-life: refers to the obsolescence of the works studied and is measured by the median age.
• Price Index: percentage of references five years old or less.
• Bradford Law: indicator of the dispersion of scientific information, which holds that if the journals in a thematic area are divided into groups, then the number of journals in each group would be proportional to 1:n:n², where the main nucleus represents the network of journals of most pertinence to an area of knowledge.
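The first four indicators in the list above reduce to simple counts and medians, as the following Python sketch suggests. The function names and the toy data are assumptions made for illustration only; the threshold of ten works for large producers follows the "ten or more works" wording used for high production centers in the Results.

```python
from statistics import median


def lotka_level(n_works: int) -> str:
    """Classify a producer by number of published works."""
    if n_works >= 10:
        return "large"
    if n_works >= 2:
        return "medium"
    return "small"


def transience_index(works_per_producer: dict[str, int]) -> float:
    """Percentage of producers (authors or institutions) with exactly one work."""
    single = sum(1 for n in works_per_producer.values() if n == 1)
    return 100 * single / len(works_per_producer)


def half_life(pub_years: list[int], ref_year: int) -> float:
    """Burton and Kebler half-life: median age of the works at ref_year."""
    return median(ref_year - year for year in pub_years)


def price_index(pub_years: list[int], ref_year: int, window: int = 5) -> float:
    """Percentage of works whose age is `window` years or less."""
    recent = sum(1 for year in pub_years if ref_year - year <= window)
    return 100 * recent / len(pub_years)


# Toy data, not taken from the study.
years = [2008, 2007, 2005, 2000, 1998]
producers = {"A": 1, "B": 3, "C": 1, "D": 12}

print(half_life(years, 2008))        # median of the ages 0, 1, 3, 8, 10
print(price_index(years, 2008))      # share of works aged five years or less
print(transience_index(producers))   # share of producers with a single work
print({name: lotka_level(n) for name, n in producers.items()})
```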

RESULTS
The use of the proposed Boolean search equation retrieved 407 references, corresponding to 0.9% of the references on exercise and 0.5% of the Brazilian academic scientific production indexed in MEDLINE at the search date.
The manual review of the retrieved references confirmed that all 407 (100%) articles were pertinent to the theme of "exercise" (physical activity).
With regard to the institutional affiliation, 377 (92.6%) were in Portuguese or Spanish and 30 (7.4%) were not. The retrieval errors were as follows: on 19 (4.7%) occasions the Southern Cross University of Australia was retrieved, due to the translation of Universidade Cruzeiro do Sul; three (0.7%) times Santa Cruz de Tenerife (Spain) and one time the University of California at Santa Cruz (USA) were retrieved, due to the use of the term "Santa Cruz" for the Universities of Santa Cruz do Sul and Estadual de Santa Cruz; and on three other occasions the Catholic University of the Sacred Heart (Italy) was retrieved through the translation of Universidade do Sagrado Coração. The other four (1.0%) incidences were due to confusion in abbreviations and word fragments that are difficult to predict and correct.
After corrections for the observed incidences, the equation retrieved 381 references, of which 377 had a Brazilian affiliation. Compared with the manual review of the pertinence of the retrieved articles, the search equation (descriptor + institutional filter) showed a sensitivity of 99.0% and a specificity of 100.0%.
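For readers less familiar with these two measures, the sketch below shows how they are usually computed from the counts of a comparison against a manual review. The counts used in the example are purely illustrative assumptions; the text does not report the full classification table behind the 99.0% and 100.0% figures.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of the desired references that the filter retrieved."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of the undesired references that the filter excluded."""
    return true_neg / (true_neg + false_pos)


# Illustrative counts only (assumptions, not data from the study).
print(f"{sensitivity(true_pos=99, false_neg=1):.1%}")
print(f"{specificity(true_neg=50, false_pos=0):.1%}")
```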
The institutional affiliation was present in 194 (51.5%) Portuguese publications, in 182 (48.3%) English publications and in one (0.3%) Spanish publication. On 12 (2.8%) occasions there was difficulty in recognizing the institution; seven (1.9%) of these were identified by referring to dependent institutions, and six (1.6%) were identified by the acronym.
In regard to the nomenclature of the country, it was written in English (Brazil) on 265 (70.3%) occasions and in Portuguese (Brasil) on 69 (18.3%) occasions; in 43 cases (11.4%), the country did not appear.
In regard to the affiliation of the scientific work, 54 academic institutions were identified, of which three were in the first tertile of productivity: Universidade de São Paulo (USP), Universidade Federal de São Paulo (UNIFESP) and Universidade Estadual de Campinas (UNICAMP) (Table 1). The classification of productivity by institution according to Lotka's Index resulted in three levels of performance: 26 low production centers, with only one work (48.2%); 18 medium production centers (between two and nine works) (33.3%); and ten high production centers (ten or more works) (18.5%).
Of the 377 articles studied, 44 (11.7%) were reviews with a mean of 5.05 (SD = 0.13) authors per publication (95% CI: 4.80; 5.30), with a minimum of one and a
The retrieved references were from 156 journals. The dispersion study of the retrieved scientific literature found a concentration of 127 (33.7%) articles in seven (4.5%) journals (Table 2); these journals constitute the main Bradford nucleus (Figure), which together with the other tertiles constitutes the dispersion of the publications.
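A Bradford-style division of journals into a nucleus and two further zones can be sketched as follows. This is one common way to operationalize the law (journals ranked by number of articles, then cut where the cumulative count passes each third of the total); it is an assumption for illustration and not necessarily the exact procedure used in this study. The function name and the toy data are hypothetical.

```python
def bradford_zones(articles_per_journal: dict[str, int],
                   n_zones: int = 3) -> list[list[str]]:
    """Split journals, ranked by productivity, into zones of roughly equal article counts."""
    ranked = sorted(articles_per_journal.items(),
                    key=lambda item: item[1], reverse=True)
    total = sum(articles_per_journal.values())
    zones: list[list[str]] = [[] for _ in range(n_zones)]
    cumulative = 0
    zone = 0
    for journal, count in ranked:
        zones[zone].append(journal)
        cumulative += count
        # Advance once the current zone has reached its share of the articles.
        while zone < n_zones - 1 and cumulative >= total * (zone + 1) / n_zones:
            zone += 1
    return zones


# Toy data (not from the study): 30 articles spread over 14 journals.
toy = {"A": 10, "B": 4, "C": 3, "D": 3}
toy.update({f"J{i}": 1 for i in range(10)})

zones = bradford_zones(toy)
print([len(z) for z in zones])  # number of journals per zone
```

In the toy data the zones hold 1, 3 and 10 journals, close to the 1:n:n² pattern with n near 3. In the study itself, the nucleus of seven journals holds 127 of the 377 articles, which is about one third of the total.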

DISCUSSION
The institutional search filter allowed for the retrieval of Brazilian academic publications in the MEDLINE database. The evaluation demonstrated a very good sensitivity (capability of retrieving the desired publications) and an adequate percentage of pertinent articles, after correction for the observed incidents. The retrieval of very pertinent publications may cause a large decrease in specificity, making the search less exhaustive. This fact is explained by the correction of the search filter and also by the use of MeSH descriptors, but in any case, this situation favors the pertinence of retrieved publications.
Wildcards (for example, univers*) were not used in the search equations if they were not recognized by Tags, since the results would have been affected in a way contrary to the goal of the equation.
A gold standard does not exist for comparison, but an evaluation can be done with already utilized techniques. 15,19 In comparison to previous publications, the performance obtained in this study shows a sensitivity similar or superior to that of other recent studies 6,9,15,17,19,23,24 about search methodology. The same holds for specificity. 6,9,11,23,24
The progressive appearance of journals with electronic publication (epub), especially since the year 2000 in the case of Brazil, coincides with the progressive establishment of SciELO. 2,10,21 The contribution of SciELO to the visibility of Brazilian and Latin American scientific literature is also notable, 14 since it provides links to full text articles found through MEDLINE searches, greatly increasing the visibility of these documents. 1,5,7
The proposed filter can be improved by utilizing it and identifying new incidences not covered in the corrected version proposed here. A similar situation has successfully happened in the case of a Spanish geographic filter, 20,16 which is of a modular structure and can be easily modified through the addition or subtraction of any of its parts.
On many occasions the scientific documents for a given country can be retrieved without such elaborate search strategies, but this depends on the characteristics of the study approach and the amount of error that can be accepted. Users should be aware that the filter currently generates the message "Quoted phrase not found", which is not an error message and does not interfere at all with the search process. The message is due to the non-recognition of some terms, such as "Municipal de Sao Caetano do Sul", because no reference includes them in the Address field. Nonetheless, it was decided to include these terms to keep the filter up to date and to cover new publications by investigators from the affected institutions.
The percentage of retrieved documents will depend on the thematic area studied, independent of utilizing the proposed filter. Also, the institutions included in the Brazilian academic filter will have varying importance, depending on the document content considered.
The search equation included the academic and research institutions that are part of the classification of the Coordination for Training of Graduate Education (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, CAPES) for interuniversity agreements.
The results of the bibliometric analysis of scientific publications on exercise showed similar data to those presented by previous studies about health sciences, 4,8,22 except in the case of obsolescence. This is due to the current situation of the theme studied and to the dispersion, since a pair of key journals, where the majority of authors want to publish, does not exist.
It can be concluded that the institutional filter is suitable, with high sensitivity and specificity, for retrieving Brazilian academic scientific publications in MEDLINE, and the filter can be applied to any thematic area related to health sciences.
When using the entire institutional name together with the corresponding acronym, a great ease of identification was observed, especially in cases where the institutional name had been translated to a language other than Portuguese. Therefore, it is recommended that each Brazilian academic institution establish a normalized name in order to facilitate the efficient retrieval of its scientific work in the different bibliographic databases. This recommendation is valid for any academic institution, independent of country.
In conclusion, this study offers an institutional filter to efficiently and easily retrieve the scientific work of Brazilian academic institutions and is applicable to science policy studies.

Table 1. Universities with more than ten publications on physical activity, indexed in MEDLINE when using the Brazilian academic search filter.

Table 2. Journals belonging to the main Bradford nucleus, where articles about exercise (physical activity) were published and obtained by utilizing the Brazilian academic search filter.
* FIF = Impact factor, data obtained from 2007 JCR Science Edition Database of the ISI Web of Knowledge, Thomson

Figure. Bradford nucleus of dispersion of journals and articles on physical activity published by Brazilian institutions and indexed in MEDLINE.