Show me the image: a systematic analysis on how results are represented in publications from different fields of biomedical and biological research

Abstract

Figures are essential to convey the main results of scientific articles. Different biomedical research fields have different methodologies and therefore different forms of data representation. To understand whether there are distinct patterns of data representation, we analyzed how results are displayed in scientific publications from six fields: Biochemistry and Cell Biology, Bioinformatics and Computational Biology, Clinical Sciences, Oncology and Carcinogenesis, Pharmacology and Pharmaceutical Sciences, and Zoology. Our results show that Graphics were the most frequent type of representation, followed by Schemes and diagrams. Microscopy was the third most used type of image in most fields, except in Pharmacology and Pharmaceutical Sciences, where Molecules and chemical reactions were the third most frequent. Interestingly, each research field has a characteristic pattern of image use. We further classified the image types as primary or secondary data, according to the level of human interference in their construction. Each field has a particular proportion of primary and secondary images. We also analyzed the frequency of words and observed a remarkable vocabulary difference between fields. The most frequent word of each field correlates well with the unique type of figures used. Specific fields might gain more visibility for their data by using diverse approaches in image representation.

Key words
image; data representation; research field; vocabulary comparison

INTRODUCTION

The phrase “An image is worth a thousand words”, attributed to Fred R. Barnard (1927), precisely reflects the importance of images in modern society, particularly in advertising and in scientific research. In biomedical research, data are usually presented in scientific meetings and published in scientific journals in the form of figures and tables. Images may be the direct result of a technique, such as microscopy, or may be constructed as a visual aid to convey data obtained from other methodologies, such as colorimetric analysis of enzyme activity. Another common phrase is “Show me the data” (Rossner et al. 2007), which refers to the importance of showing results instead of only describing them. Interestingly, it has been found that scientific journal-article components, such as tables and figures, are often among the first parts of an article read by researchers (Sandusky & Tenopir 2008). Although there is a consensus that images are essential for science, images vary considerably in quality and type and can raise ethical concerns (Cromey 2013), not to mention the common lack of a detailed description of the images (Marques et al. 2020) and of data on their legibility and interpretability (Jambor et al. 2021). The use of images in articles is so complex that in some highly ranked biomedical journals, figures included in publications are often composed of multiple subfigures or panels, each describing different methodologies or results (Lopez et al. 2013a). Furthermore, we previously showed that different biomedical research fields have different image methodologies, and therefore different forms of data representation are expected (Reigoto et al. 2021). Here we aimed to analyze how different research fields use images and display results in scientific publications.
To achieve this goal, we classified article results from six different research fields (such as Biochemistry and Cell Biology, and Clinical Sciences) into ten categories (such as macroscopic photography, schemes and diagrams, and gel, blot and PCR). Our results show striking differences in how different scientific fields generate and represent their data. We discuss these results on the premise that specific fields, by using new approaches in image representation, might gain more visibility for their data and improve transdisciplinarity, contributing to imaging literacy.

MATERIALS AND METHODS

The following six fields of research were selected from the Dimensions database (https://www.dimensions.ai/): Biochemistry and Cell Biology (3101), Bioinformatics and Computational Biology (3102), Clinical Sciences (3202), Oncology and Carcinogenesis (3211), Pharmacology and Pharmaceutical Sciences (3214), and Zoology (3109). The numbers in parentheses are the Fields of Research (FoR) codes of the 2020 Australian and New Zealand Standard Research Classification (ANZSRC) system (https://dimensions.freshdesk.com/support/solutions/articles/23000018826-what-is-the-background-behind-the-fields-of-research-for-classification-system). Our intention was to select very different biological and biomedical fields of research in terms of their main objects of study and, consequently, their possible representation of results in scientific articles. We used three criteria in the Dimensions database to select articles: (i) specific fields of research, (ii) only articles (excluding books, book chapters and other unrelated texts), and (iii) inclusion in the PubMed database. These three criteria are filter options available in Dimensions. The results were sorted by the number of citations (from the highest to the lowest) and the first 500 articles of each research field were exported to an Excel datasheet. Then, the database was manually analyzed to exclude articles that were reviews, commentaries, strictly methodological articles, case reports and meta-analyses. The one hundred most cited articles from each research field were then manually classified according to how they represented their results. We classified each article result into the following ten categories: macroscopic photography, microscopic image, schemes and diagrams, graphics, gel, blot and PCR, machine output, genomic sequencing, molecules and chemical reactions, anatomic drawing and maps.
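The selection steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` type, its `kind` label and the `select_most_cited` helper are hypothetical names introduced here, and in the study the exclusion step was performed manually rather than from a pre-assigned label.

```python
from dataclasses import dataclass

# Article kinds excluded during screening, as listed in the text above.
EXCLUDED_KINDS = {"review", "commentary", "methodological",
                  "case report", "meta-analysis"}


@dataclass
class Record:
    """Hypothetical record for one exported article."""
    title: str
    citations: int
    kind: str  # assumed label; in the study this judgment was made by hand


def select_most_cited(records, export_size=500, final_size=100):
    """Sort by citations, keep the first `export_size`, screen, keep `final_size`."""
    # Highest-cited articles first, then keep only the exported slice.
    exported = sorted(records, key=lambda r: r.citations, reverse=True)[:export_size]
    # Drop the article kinds that were excluded from the analysis.
    screened = [r for r in exported if r.kind not in EXCLUDED_KINDS]
    # The screened list is still ordered by citations, so slicing keeps the top ones.
    return screened[:final_size]
```

With the default arguments the function mirrors the numbers given in the text (500 exported, 100 retained per field); smaller values are convenient for trying it on a toy list.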

The most recent literature search conducted for our study was performed on June 03, 2024. In our work we followed the PRISMA 2020 statement (https://www.prisma-statement.org/prisma-2020) for systematic reviews of studies (Figure 1).

Figure 1
PRISMA flow diagram. The diagram depicts the flow of information through the different phases of the systematic analysis mapping out the number of records identified, included and excluded.

We calculated the average percentage value for each method across all fields, and we compared the percentage value of each field with this average: (percentage of images using a given method in a given field minus the average percentage across all fields) divided by the standard deviation across all fields. We called this factor the “uniqueness index”.
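As a minimal sketch, the “uniqueness index” described above is a standardized score and can be computed as follows. The function name `uniqueness_index` and the example values are illustrative, not taken from the study, and the use of the sample standard deviation is an assumption, since the text does not say whether the sample or the population formula was used.

```python
from statistics import mean, stdev


def uniqueness_index(percent_by_field):
    """Return (field percentage - mean of all fields) / standard deviation.

    `percent_by_field` maps a field name to the percentage of its figures
    that belong to one image type. At least two fields are required.
    """
    values = list(percent_by_field.values())
    avg = mean(values)
    sd = stdev(values)  # assumption: sample standard deviation
    if sd == 0:
        # All fields use this image type equally, so no field stands out.
        return {field: 0.0 for field in percent_by_field}
    return {field: (pct - avg) / sd for field, pct in percent_by_field.items()}


# Made-up percentages for one image type in three hypothetical fields.
example = uniqueness_index({"Field A": 10.0, "Field B": 20.0, "Field C": 30.0})
```

A positive value means that the image type is used more often in that field than the average of all fields, and a large positive value marks the type as characteristic of the field.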

We also analyzed the frequency of words that appear in the abstract of articles, assuming that they would provide a concise representation of the whole article. Wordle software (created by Feinberg and co-workers (Viegas et al. 2009) and freely available at http://www.wordle.net) was used to generate a list of words with their relative frequencies, and to generate “word clouds”. The clouds give greater prominence to words that appear more frequently in the source text, i.e., more frequent words appear with larger letters and in a colored gradient. We used the following parameters to generate word clouds: remove common English words and numbers, make all words lower-case, Telephoto font type, rounded edges, kindled color, horizontal layout. The list of words was manually edited to remove plural words, different spellings for some words, and symbols.
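The word counting behind a word cloud can be approximated with the Python standard library. The sketch below is not the Wordle implementation: the stop-word set is a small illustrative subset, not Wordle's own list, and the function name `word_frequencies` is hypothetical. It only mirrors the parameters named above (lower-casing, removal of common English words and of numbers).

```python
import re
from collections import Counter

# Small illustrative stop-word set; Wordle's own list is not reproduced here.
STOPWORDS = {"the", "of", "and", "in", "a", "to", "were", "was", "with", "for"}


def word_frequencies(abstracts):
    """Return relative word frequencies, most frequent word first."""
    counts = Counter()
    for text in abstracts:
        # Lower-case the text and keep alphabetic runs only, which drops numbers.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    total = sum(counts.values())
    # most_common() orders words by decreasing count.
    return {word: n / total for word, n in counts.most_common()}
```

Merging plural forms and variant spellings, which was done manually in the study, is not attempted in this sketch.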

RESULTS AND DISCUSSION

To study whether there are distinct patterns of data representation in different biological and biomedical fields, we analyzed how results are displayed in scientific publications from six biomedical fields: 1) Biochemistry and Cell Biology, 2) Bioinformatics and Computational Biology, 3) Clinical Sciences, 4) Oncology and Carcinogenesis, 5) Pharmacology and Pharmaceutical Sciences, and 6) Zoology. Results from the most cited articles from each research field were classified in tables (a non-imagetic form of data representation) and ten categories of figures: a) macroscopic photography, b) microscopic image, c) schemes and diagrams, d) graphics, e) gel, blot and PCR, f) machine output, g) genomic sequencing, h) molecules and chemical reactions, i) anatomic drawing and j) maps. Examples of these types of images are shown in Figure 2.

Figure 2
Examples of the ten categories of figures used in this study. a) macroscopic photography, b) microscopic image, c) schemes and diagrams, d) graphics, e) gel, blot and PCR, f) machine output, g) genomic sequencing, h) molecules and chemical reactions, i) anatomic drawing and j) maps.

Most of the highly cited articles we analyzed had several figures and tables. Articles had between zero and 34 figures, most frequently three figures per article (Figure 3a). Interestingly, there were several articles with more than 14 figures. We also analyzed the number of tables per article and found that most of the articles had no tables, but a few articles had several (up to 42) tables (Figure 3b). We then analyzed how the number of figures and tables in articles from the different fields changed over the years. Beginning in 1980, there was a clear increase in the number of figures in articles in all the fields (Figure 3c).

Figure 3
The evolution of use of figures and tables in articles from different biomedical and biological fields over the years. Articles from six different research fields were analyzed and the number of figures and tables were quantified.

We then asked which types of figures were most frequently used in each research field. We calculated the percentage of each type of figure in relation to the total number of figures for every field (Figure 4a). Graphics were the most frequent type of result representation in articles in five fields (all except Zoology), ranging from 27.9% in Zoology to 67.8% in Clinical Sciences (Figure 4b-g). Schemes and diagrams were the second most frequently used type of image, varying from 13.5% in Clinical Sciences to 39.7% in Zoology. Unexpectedly, microscopy was the third most used type of image in five of the six fields; the exception was Pharmacology and Pharmaceutical Sciences, where molecules and chemical reactions were the third most frequent. Interestingly, each research field has a characteristic pattern of image type frequency. Furthermore, we noticed that each field seemed to have a few types of figures with a uniquely high frequency, different from the other fields. To test whether this was the case, we calculated a “uniqueness index” by comparing the frequency of each image type with the average frequency in all research fields (Table I). These results showed that in Pharmacology and Pharmaceutical Sciences, the figure type “molecules and chemical reactions” is much more frequent than in all other fields (Figure 4b-g). In Bioinformatics and Computational Biology, it is “machine output”; in Zoology, “anatomic drawing”; in Oncology and Carcinogenesis, “genomic sequencing” and “maps”; in Clinical Sciences, “graphics”; and in Biochemistry and Cell Biology, “gel, blot and PCR”. While diversity in the distribution of image types among the areas should be expected, some discrepancies can be pointed out. For instance, we observed that, contrary to what could be expected, Zoology does not frequently use macroscopic photography for the characterization of animal species. Likewise, we could expect more biochemical results, such as “gel, blot and PCR”, in Clinical Sciences, since most diseases are currently understood at a molecular level.

Figure 4
Different biomedical and biological fields differ in the mode of representing scientific results. The use of ten different types of data representation was analyzed in articles from six different biomedical and biological fields.
Table I
Comparison of the “uniqueness index” across different biomedical and biological research fields. This table shows the values of the frequencies of each image type in relation to the average frequency in all research fields. The numbers in bold and in dark green are the “unique signature” for each research field.

To understand the importance of these field patterns, we looked for a way to measure the relative importance of non-imagetic concepts in each field and to relate these concepts to different image types. Therefore, we analyzed the frequency of words used in articles from the six biomedical fields using word clouds. The abstracts of the articles were compared using the Wordle software, and a remarkable vocabulary difference between the six fields was observed (Figure 5a-f). “Patients” was the most frequent word in Clinical Sciences, “cell” in Biochemistry and Cell Biology, “data” in Bioinformatics and Computational Biology, “cancer” in Oncology and Carcinogenesis, “drug” in Pharmacology and Pharmaceutical Sciences, and “species” in Zoology (Figure 5a-f). Curiously, while “cancer”, “cell” and “patient” were much more frequent than all other words in their respective fields, “species”, “drug” and “data” were not much larger than the other words in their word clouds. These results suggest that Biochemistry and Cell Biology, Oncology and Carcinogenesis, and Clinical Sciences are homogeneous fields in terms of vocabulary, while Bioinformatics and Computational Biology, Pharmacology and Pharmaceutical Sciences, and Zoology are diverse and heterogeneous fields. Furthermore, the word “patient” was highly frequent in Oncology and Carcinogenesis, Clinical Sciences, and Pharmacology and Pharmaceutical Sciences, suggesting that these three research fields share common interests in human diseases and their treatment.

Figure 5
Vocabulary comparison between articles from different biomedical fields using word clouds. The abstracts of articles from six research fields (Biochemistry and Cell Biology, Bioinformatics and Computational Biology, Clinical Sciences, Oncology and Carcinogenesis, Pharmacology and Pharmaceutical Sciences, and Zoology) were compared using the Wordle software. A remarkable vocabulary difference between the six fields can be observed.

The most frequent word of each field correlates well with the unique type of figures used in each field (compare Figures 4b-g and 5). For example, in Bioinformatics and Computational Biology the relative frequency of “machine output” is unique, while the most frequent word is “data”. In Pharmacology and Pharmaceutical Sciences, “molecules and chemical reactions” is much more frequent than in other fields, and the most frequent word is “drug”. This could suggest that the different uses of images between fields are directly related to their innate characteristics, justifying the existence of these patterns.

An important aspect of our work was to establish an image taxonomy for classifying biomedical and biological images. Even though several image taxonomy methods were previously proposed (De Herrera et al. 2016, Shatkay et al. 2006, Lopez et al. 2013b), none of them covered all the image categories that we observed in biomedical articles. Therefore, we classified images into ten categories: a) macroscopic photography, b) microscopic image, c) schemes and diagrams, d) graphics, e) gel, blot and PCR, f) machine output, g) genomic sequencing, h) molecules and chemical reactions, i) anatomic drawing and j) maps (Figure 2). Several differences can be found when comparing our classification with other studies. For example, Li et al. (2021) classified article images into four top-level categories: Graphics, Molecular Structure, Experimental and Other Images. At the second level, Graphics were subdivided into Histogram, Line Chart and Other Diagrams. Molecular Structure Images were classified into Macromolecule Sequence and 3D Structure Images. Experimental images were further classified into Fluorescence Microscopy, Light Microscopy, Whole Mount, Gel and Plate Images. Furthermore, we classified images into primary data and secondary data, according to the way they were created. Primary data are all the images that are directly obtained by the researcher using some device, such as a photograph of an animal or an organ, a microscopic image, a gel or blot, or a genomic sequencing result. In contrast, secondary data are images artificially constructed by the researcher, such as a graphic, a scheme, an anatomic drawing or the design of a map. Using this classification, we compared how different research areas use primary versus secondary data. Table II shows that all six research fields publish more secondary than primary data, although the balance between these two types of images varies widely among the fields.
For example, while Biochemistry and Cell Biology has an almost equal use of primary and secondary data (45% versus 55%), Pharmacology and Pharmaceutical Sciences showed a much higher frequency of secondary data than of primary data (89% versus 11%). This difference is due to the highly frequent use of graphics in Pharmacology and to the highly frequent use of microscopic images in Biochemistry and Cell Biology. We had already pointed out the limited use of microscopy in Pharmacology, despite its intensive use of cell cultures (Reigoto et al. 2021). Although some diversity of image methods among the research areas could be expected, we did not expect the large difference between primary and secondary image types among fields. Primary images are the data itself and are presumably more objective than secondary images, which are more prone to human interpretation and bias. We are aware that this classification has limitations, in the sense that photography can be by itself art and graphics can show undisputed numeric values with varying bias (using different scales, for instance). Does this difference between usage of primary and secondary data mean that some areas are more subjective than others? We can speculate that the use of supplementary information in articles could be an attempt to emphasize the original, primary data. For example, authors are usually asked to submit the complete and original electrophoresis gels as supplementary material, and only selected and cropped protein bands are shown in the article’s main figures. Another interesting example is the Omics field, which includes genomics, transcriptomics, proteomics and metabolomics, among other big data omics. In Omics, the results are published as enormous lists of molecules, which are primary data, and researchers analyze these data in multiple types of charts, which are secondary data (Hasin et al. 2017, Vasaikar et al. 2023).
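The primary versus secondary comparison can be illustrated with a short Python sketch. It rests on several assumptions: only the image types that the text explicitly names as primary or secondary are mapped, “machine output” and “molecules and chemical reactions” are left unassigned because the text does not state where they belong, and the function name and the counts in the example are invented for illustration.

```python
# Image types named in the text as directly obtained with a device.
PRIMARY = {"macroscopic photography", "microscopic image",
           "gel, blot and PCR", "genomic sequencing"}
# Image types named in the text as constructed by the researcher.
SECONDARY = {"graphics", "schemes and diagrams", "anatomic drawing", "maps"}


def primary_secondary_share(counts):
    """Return (primary %, secondary %) over the figures that could be classified.

    `counts` maps an image type to its number of figures in one field.
    Image types outside both sets are ignored (an assumption of this sketch).
    """
    primary = sum(n for kind, n in counts.items() if kind in PRIMARY)
    secondary = sum(n for kind, n in counts.items() if kind in SECONDARY)
    classified = primary + secondary
    if classified == 0:
        raise ValueError("no figures of a classified type")
    return 100 * primary / classified, 100 * secondary / classified


# Invented counts for a hypothetical field.
share = primary_secondary_share(
    {"microscopic image": 3, "graphics": 1, "machine output": 7}
)
```

In this invented example the seven “machine output” figures are skipped, so the shares are computed over the four remaining figures.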

Table II
Comparison of how different biomedical and biological research areas use primary versus secondary data.

Finally, if we assume that integration among areas is beneficial for science and innovation, should we expect a more uniform distribution of image types among fields? We have previously described the difference in vocabulary between cell biology and medicine, and we pointed out that translational medicine was bridging the gap in vocabulary (Azevedo et al. 2021). We intend to expand the number and type of research fields in our study to include translational areas, seeking the integration of image methods and vocabulary in these innovative, field-spanning areas.

We observed that different biological and biomedical research fields have a characteristic pattern in relation to the use of certain types of images in articles, which nicely correlates with the most frequent words used in the text of these studies (Table III). We expect that our study will initiate a discussion on how specific research fields could use new approaches in image representation in articles.

Table III
Comparison of the unique use of images and the most frequent words across different biomedical and biological research fields.

Our study shows differences in how observations, results and conclusions are presented in different areas of research and points to possible problems with data when a study from one discipline is used by other disciplines. Since these differences could be obstacles to an accurate interpretation of relevance or significance, we decided to provide specific guidelines on how to interpret, understand and use images from different areas of expertise (Table IV). Given that people are more prone to make mistakes or to over-interpret data when analyzing work from subject areas other than their own, we believe that these guidelines are an effort to improve cross-disciplinary values (and assessments) of observations and conclusions. Different actions could improve transdisciplinary communication and exchange, such as (i) guidelines on how to interpret, understand and use images from different areas of expertise, (ii) the inclusion of imagetic graphical abstracts in articles from all scientific journals, which generally helps the understanding of the paper’s objectives and findings, (iii) the inclusion of detailed legends for all the figures and tables in research articles, since legends are often incomplete and/or contain errors, and (iv) the inclusion of schemes and diagrams (secondary data) in articles, which facilitates the understanding of the article’s experimental design, results and conclusions. These actions could benefit both readers from outside an area and authors trying to work on transdisciplinary studies.

Table IV
Suggested guidelines on specific image types.

The variety of image approaches observed in our study reinforces the importance of image literacy and highlights the study of image by itself in education, maybe as an academic discipline. Furthermore, the data presented here could help the organization of transdisciplinary curricula in universities and in research-oriented and technical-oriented institutes, since differences in how results are represented in different fields may create a challenge to transdisciplinary efforts. Interestingly, a study conducted by Wiles (2016) showed that undergraduate students reported a preference for learning by figure analysis over traditional lecture of the whole scientific articles, supporting the idea that images/figures are an important tool for scientific learning and for improving transdisciplinary education. As mentioned in that study, skills required to develop visual literacy are often overlooked in education (Wiles 2016). Another interesting study, conducted by Sandusky et al. (2008), analyzed how scientists utilize specific journal article components (tables, maps, photographs, and graphs) to support both their teaching and research. They found that scientists use specific figures and tables from articles to (i) create material for educational purposes, (ii) create documents to support performative activities, (iii) make comparisons between a scientist’s own work and the work of other researchers, and (iv) create other information forms and objects (Sandusky et al. 2008). These results point to new trends in the use of images/figures retrieved from articles, such as in the preparation of research projects and articles, in the creation of teaching material, in the study of specific methodologies and techniques, in the evaluation of performance in research careers, in the preparation of lectures for the academic public and for extramural activities, and for the advancement of challenging transdisciplinary curricula.

ACKNOWLEDGMENTS

This work was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, funding number 302115/2017-0 to C.M., 308192/2021-4 to M.L.C.) and Fundação de Apoio à Pesquisa do Estado do Rio de Janeiro (FAPERJ, funding number E-26/202.920/2019 to C.M., E-26/201.085/2021 to M.L.C.).

REFERENCES

  • AZEVEDO S, SEIXAS MR, JURBERG AD, MERMELSTEIN C & COSTA ML. 2021. Do medicine and cell biology talk to each other? A study of vocabulary similarities between fields. Braz J Med Biol Res 54: e11728. doi: 10.1590/1414-431X2021e11728.
  • CROMEY DW. 2013. Digital images are data: and should be treated as such. Methods Mol Biol 931: 1-27. doi: 10.1007/978-1-62703-056-4_1.
  • DE HERRERA AGS, SCHAER R, BROMURI S & MULLER H. 2016. Overview of the ImageCLEF 2016 medical tasks. In: Working Notes of ImageCLEF, p. 219-232. Evora, Portugal.
  • HASIN Y, SELDIN M & LUSIS A. 2017. Multi-omics approaches to disease. Genome Biol 18: 83. doi: 10.1186/s13059-017-1215-1.
  • JAMBOR H ET AL. 2021. Creating clear and informative image-based figures for scientific publications. PLoS Biol 19: e3001161. doi: 10.1371/journal.pbio.3001161.
  • LI P, JIANG X, ZHANG G, TRABUCCO JT, RACITI D, SMITH C, RINGWALD M, MARAI GE, ARIGHI C & SHATKAY H. 2021. Utilizing image and caption information for biomedical document classification. Bioinformatics 37 (Suppl1): i468-i476. doi: 10.1093/bioinformatics/btab331.
  • LOPEZ LD, YU J, ARIGHI C, TUDOR CO, TORII M, HUANG H, VIJAY-SHANKER K & WU C. 2013a. A framework for biomedical figure segmentation towards image-based document retrieval. BMC Syst Biol 7 (Suppl 4): S8. doi: 10.1186/1752-0509-7-S4-S8.
  • LOPEZ LD, YU J, ARIGHI CN & WU CH. 2013b. An image-text approach for extracting experimental evidence of protein-protein interactions in the biomedical literature. In Proc ACM BCB, p. 412-418. Washington, DC, USA.
  • MARQUES G, PENGO T & SANDERS MA. 2020. Imaging methods are vastly underreported in biomedical research. Elife 9: e55133. doi: 10.7554/eLife.55133.
  • REIGOTO AM, ANDRADE SA, SEIXAS MCRR, COSTA ML & MERMELSTEIN C. 2021. A comparative study on the use of microscopy in pharmacology and cell biology research. PLoS One 16: e0245795. doi: 10.1371/journal.pone.0245795.
  • ROSSNER M, VAN EPPS H & HILL E. 2007. Show me the data. J Cell Biol 179: 1091-1092. doi: 10.1083/jcb.200711140.
  • SANDUSKY RJ & TENOPIR C. 2008. Finding and using journal-article components: impacts of disaggregation on teaching and research practice. J Am Soc Inf Sci 59: 970-982. doi: 10.1002/asi.20804.
  • SANDUSKY RJ, TENOPIR C & CASADO MM. 2008. Uses of figures and tables from scholarly journal articles in teaching and research. Proc Am Soc Info Sci Tech 44: 1-13. doi: 10.1002/meet.1450440389.
  • SHATKAY H, CHEN N & BLOSTEIN D. 2006. Integrating image data into biomedical text categorization. Bioinformatics 22: e446-53. doi: 10.1093/bioinformatics/btl235.
  • VASAIKAR SV ET AL. 2023. A comprehensive platform for analyzing longitudinal multi-omics data. Nat Commun 14: 1684. doi: 10.1038/s41467-023-37432-w.
  • VIEGAS FB, WATTENBERG M & FEINBERG J. 2009. Participatory visualization with Wordle. IEEE Trans Vis Comput Graph 15: 1137-1144. doi: 10.1109/TVCG.2009.171.
  • WALTMAN L, VAN ECK NJ & NOYONS ECM. 2010. A unified approach to mapping and clustering of bibliometric networks. J Informetr 4: 629-635. doi: 10.48550/arXiv.1006.1032.
  • WILES AM. 2016. Figure analysis: A teaching technique to promote visual literacy and active learning. Biochem Mol Biol Educ 44: 336-344. doi: 10.1002/bmb.20953.

Publication Dates

  • Publication in this collection
    17 Mar 2025
  • Date of issue
    2025

History

  • Received
    13 Sept 2024
  • Accepted
    9 Dec 2024
Academia Brasileira de Ciências Rua Anfilófio de Carvalho, 29, 3º andar, 20030-060 Rio de Janeiro RJ Brasil, Tel: +55 21 3907-8100 - Rio de Janeiro - RJ - Brazil
E-mail: aabc@abc.org.br