EXPLORING THE CO-AUTHORSHIP NETWORK AMONG CNPQ’S PRODUCTIVITY FELLOWS IN THE AREA OF INDUSTRIAL ENGINEERING

ABSTRACT

In this article, we build a co-authorship network among researchers holding a CNPq research productivity (PQ) grant in the area of Industrial Engineering and analyze which Social Network Analysis (SNA) metrics impact their productivity level. Unlike other studies, which mostly analyze unweighted networks, ours explores the network more broadly, since the metrics were calculated in three ways: unweighted, including the edge weights, and including both the edge weights and the nodes’ attributes. The results are thus more precise and detailed, since more information is used. We take the h-index of the researchers as the nodes’ attribute and measure the impact using the Kendall correlation. We show that geographical distance is still a barrier to collaboration among PQs in this area and that collaboration with researchers holding different grant levels has the greatest impact on the level of the grant a researcher has.

Keywords:
weighted co-authorship network; nodes’ attributes; scientific productivity

1 INTRODUCTION

Co-authorship, the development of a publication by two or more authors, is a form of collaboration. For (Hudson 1996), co-authorship is the most formal expression of intellectual collaboration in scientific research, and the biggest gain of collaboration is that it enables an efficient division of tasks, through complementary skills or synergy (the joint creation of new ideas not achieved individually). The result of this efficient division of tasks is scientific production of higher quality and/or quantity. These results had already been reported by (Barnett et al. 1988) as the reason that leads researchers to work together. Other works, such as (Eaton et al. 1999) and (Lee & Bozeman 2005), also point to productivity as a result of collaboration. (Hart 2000) shows that collaboration improves the quality of publications.

Co-authoring, writing the same paper with other authors, is a form of collaboration that implies a temporal and academic relationship in which authors share ideas and resources. One of the most famous co-authorship networks is that of the mathematician Paul Erdos, who had more than 500 co-authors and more than 1,400 published works. The role of Erdos as a collaborator was so significant in the field of mathematics that the Erdos number was defined to measure the proximity to Erdos in the co-authorship network (Liu et al., 2014). Anyone who has published with Erdos has an Erdos number equal to 1, those who have published with a co-author of Erdos have an Erdos number equal to 2, and so on (Newman, 2001c).

For (Kumar 2015), studies on co-authorship gained new interest after (Newman 2001a, b, c, 2004) used methods of social network analysis to investigate the characteristics and interesting patterns of academic communities. (Kempe et al. 2005) also reported the emergence of many studies of co-authorship network analysis that try to identify the most influential authors in the network. The works of (Huang et al. 2013) and (Liu et al. 2014) likewise studied the co-authorship network to assess the status of an author in a particular field, so that, by identifying the most influential researchers, an author can strengthen relations and get closer to the community core.

There is increasing interest in studying the influence of social structure on the behavior and performance of researchers through social network analysis (SNA). Many of these studies seek to correlate the key centrality metrics of the network with measures based on the number of citations, such as the h-index (“A researcher has an h-index of k if k of his N works have at least k citations each, and the other (N - k) papers have at most k citations each”, Hirsch, 2005). Such measures, among other factors, can be used to assess the quality of publications.
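To make the h-index definition quoted above concrete, the following minimal Python sketch computes it from a list of citation counts. The function name `h_index` and the sample data are hypothetical illustrations, not part of the original study.

```python
def h_index(citations):
    """Largest k such that k papers have at least k citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for the papers of one researcher.
print(h_index([10, 8, 5, 4, 3]))  # four papers have at least 4 citations -> 4
```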

The work of (Yan & Ding 2009) correlated four centrality metrics (degree, closeness, betweenness and PageRank) with the number of citations received by the publications of the authors of a co-authorship network. These metrics had significant correlations with the citation counts, especially the betweenness centrality.

In the study of (Abbasi & Altmann 2011), the normalized metrics of degree centrality, betweenness centrality and closeness centrality, together with the weighted degree centrality and efficiency (the ratio between the total number of distinct groups of directly connected nodes that are joined by a single central node and the degree of this node), were correlated with the h-index. The results showed that the h-index of the researchers had significant positive correlations only with the degree centrality and efficiency. In the same year, (Abbasi et al. 2011) published a paper that analyzed the influence of six SNA metrics on the g-index: degree centrality, closeness centrality, betweenness centrality and eigenvector centrality (all normalized), the average link strength, and efficiency. The g-index is defined as follows: “Given a set of papers ranked in decreasing order of the number of citations, the g-index is the highest value of g such that the first g articles together receive at least g² citations” (Egghe, 2006). The authors concluded that only the normalized degree centrality, efficiency, and the average link strength had significant influences on the g-index.
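The g-index definition quoted above can be sketched in the same way. This is a minimal illustration that assumes g cannot exceed the number of papers; the function name `g_index` and the data are hypothetical.

```python
def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts: the top 2 papers have 6 >= 4 citations,
# but the top 3 have only 6 < 9, so the g-index is 2.
print(g_index([5, 1, 0, 0]))
```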

Another work that correlates social network analysis metrics with the h-index was published by (Wanderley et al. 2014). The authors created a co-authorship network among researchers of Computer Science and calculated the normalized metrics of degree centrality, closeness centrality and betweenness centrality, as well as the weighted degree centrality and authority (calculated by adding the number of hubs, i.e., nodes with many links, to which a node is connected). Only the betweenness centrality and the weighted degree centrality had significant positive correlations. Authority also showed a significant, but negative, correlation.

(Souza et al. 2016), in a co-authorship network among researchers holding a CNPq research productivity grant in the area of Statistics, showed that the most productive fellows are also the most central in the network and that the degree centrality and closeness centrality metrics had the highest impact on the number of articles published by a fellow.

According to (CNPq 2015), the research productivity fellowship (PQ) is organized in levels, in ascending order: 2, 1D, 1C, 1B, 1A. The PQ is awarded to researchers from all areas of knowledge in Brazil, based not only on the quality of a submitted project, but mainly on the “quality” of the researcher (Wainer & Vieira, 2013).

The work of (Fonseca & Digiampietri 2016) builds two kinds of classifiers that use SNA metrics and other bibliometric measures as attributes. The first kind identifies which researchers in the area of Computer Science are fellowship holders, and the other identifies the fellowship level of a given researcher. Other studies analyzing the impact of co-authorship networks on the performance of researchers were presented in (Andrade & Rêgo 2015a, 2015b).

In these previous works, weighted metrics, i.e., metrics calculated considering the frequency of collaboration, were not used, with the exception of the weighted degree centrality. To the best of our knowledge, there are few works exploring such metrics in SNA. (Liu et al. 2015) proposed a method that inserts the importance of researchers (based on citations) into the co-authorship network structure by redefining the weights of the edges. This weight is used in the calculation of PageRank, applied to the Erdos network, to identify the most influential authors. (Andrade 2016) also developed a metric that inserts the importance of nodes into the network structure; in that work, the weight of a given edge is equal to the average of the attributes of the nodes connected by this edge times the original weight of the edge. That work addressed the impact of the nodes’ attributes on different SNA metrics.

The objective of this work is to identify the researchers holding a CNPq research productivity grant in the area of Industrial Engineering in Brazil, to analyze their academic achievements in terms of published papers, to construct a co-authorship network among these researchers, to analyze the characteristics of the network, and to verify which SNA metrics have the most impact on their productivity level. The SNA metrics are calculated in three ways: unweighted, weighted by the edge weights, and weighted by both the edge weights and the nodes’ attributes.

Our third analysis of social network metrics is in the context of non-homogeneous nodes. Thus, being in equivalent positions in the network may have different impacts on the SNA metrics if the nodes’ attributes are taken into consideration. The h-index is a node attribute that was freely available and is clearly related to the prestige of a researcher. Therefore, it was chosen to be applied in this context.

This work is structured as follows: this first section reviews the studies that analyzed the relationship between SNA metrics in co-authorship networks and the performance of researchers; Section 2 briefly presents the SNA metrics used in this work; Section 3 describes the methodology used to create the co-authorship network; Section 4 presents the co-authorship network and the impact of individual SNA metrics on the level of productivity. Finally, Section 5 presents the final considerations of the study and proposals for future work.

2 SOCIAL NETWORK ANALYSIS METRICS

A weighted network can be defined as a set of nodes, $V(G)$, a set of edges, $E(G)$, which consists of ordered pairs of nodes, and a weighted adjacency matrix, $W(G)$, where $w(v_i, v_j)$ represents the weight associated with the edge connecting the pair of vertices $v_i$ and $v_j$. We assume that $w(v_i, v_j) = w(v_j, v_i)$, since co-authorship is a symmetric relation.

(Liu et al. 2015) and (Andrade 2016) developed methods to include nodes’ attributes in SNA. Thus, it is possible to classify networks as unweighted, weighted by edges, and weighted by edges and nodes. Since the method of Liu et al. transforms the network into an asymmetric relation, we do not view it as appropriate for studying co-authorship. Therefore, we focus here on Andrade’s method, which maintains the symmetric nature of co-authorship.

The metric proposed by (Andrade 2016) to take into consideration the importance of the nodes in the network context is given by:

$$Z(v_i, v_j) = w(v_i, v_j) \times \left( \frac{s(v_i) + s(v_j)}{2} \right), \tag{1}$$

where $Z(v_i, v_j)$ is the edge weight $w(v_i, v_j)$ between vertices $v_i$ and $v_j$ combined with the attributes of these vertices, $s(v_i)$ and $s(v_j)$, respectively. With the incorporation of the nodes’ attributes into the network, $Z(G)$ becomes the new weighted adjacency matrix and $Z(v_i, v_j)$ the new edge weight between vertices $v_i$ and $v_j$. The attributes of the vertices are measurable characteristics associated with the type of relationship that connects them.
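As an illustration of Equation (1), the sketch below reweights the edges of a small co-authorship network using the h-index as the node attribute. The function name `node_weighted_edges`, the author labels and the numbers are hypothetical and serve only to show the arithmetic.

```python
def node_weighted_edges(weights, attrs):
    """Apply Equation (1): Z = w * (s_i + s_j) / 2 for every edge."""
    return {
        (u, v): w * (attrs[u] + attrs[v]) / 2
        for (u, v), w in weights.items()
    }


# Hypothetical data: number of joint papers per pair and h-index per author.
weights = {("A", "B"): 3, ("B", "C"): 1}
h = {"A": 10, "B": 4, "C": 6}

z = node_weighted_edges(weights, h)
print(z[("A", "B")])  # 3 * (10 + 4) / 2 = 21.0
print(z[("B", "C")])  # 1 * (4 + 6) / 2 = 5.0
```

Because the average of the two attributes is symmetric in $v_i$ and $v_j$, the reweighted network remains symmetric, which is the property that motivates the choice of this method.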

A binary network with $n$ vertices is represented by an adjacency matrix $A(G)$ with $n \times n$ elements, where

$$a(v_i, v_j) = \begin{cases} 1 & \text{if } (v_i, v_j) \in E(G), \text{ i.e., if } v_i \text{ and } v_j \text{ are connected}, \\ 0 & \text{otherwise}. \end{cases} \tag{2}$$

The SNA metrics can be divided into global, describing the characteristics of the whole graph, and individual, which are related to the analysis of individual properties of network actors (nodes or vertices).

The number of edges, as the name suggests, refers to the cardinality of the set of edges, E(G), denoted by #E(G).

A path between two vertices, $v_i$ and $v_j$, is a sequence of vertices $c = (v_0, v_1, v_2, \ldots, v_k)$ such that $v_0 = v_i$, $v_k = v_j$, $v_l$ is adjacent to $v_{l+1}$ for $l = 0, 1, \ldots, k - 1$, and no vertex appears more than once in the sequence. A set of vertices is said to be connected if there exists a path between any two vertices in the set. A graph is connected if there is a path between any two of its vertices and is complete if every vertex is adjacent to every other vertex.

The density measures how close the graph is to being complete, that is, the ratio between the number of connections in the graph and the number of connections it would have if all vertices were connected to each other. For an undirected graph with $n$ nodes, the density is defined as:

$$\mathrm{Dens}(G) = \frac{2 \times \#E(G)}{n \times (n - 1)} \tag{3}$$
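A minimal sketch of Equation (3) follows, assuming an undirected simple graph whose edges are stored as unordered pairs. The function name `density` and the example graph are hypothetical.

```python
def density(n_nodes, edges):
    """Equation (3): fraction of the n(n-1)/2 possible edges that exist."""
    if n_nodes < 2:
        return 0.0  # convention assumed here for graphs with fewer than 2 nodes
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))


# Hypothetical path graph A-B-C-D: 3 edges out of 6 possible ones.
edges = {frozenset(e) for e in [("A", "B"), ("B", "C"), ("C", "D")]}
print(density(4, edges))  # 2 * 3 / (4 * 3) = 0.5
```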

A geodesic path, or shortest path, is the shortest path between two vertices (Newman 2004). The geodesic path length, $d(v_i, v_j)$, also called geodesic distance or shortest distance, is thus the shortest distance in the network between these two vertices. Given a path $c = (v_0, v_1, v_2, \ldots, v_k)$ between vertices $v_i$ and $v_j$, the length of this path is denoted by $d_c$. Let $C(v_i, v_j)$ be the set of all paths between vertices $v_i$ and $v_j$; then the geodesic distance is defined by:

$$d(v_i, v_j) = \min \{ d_c : c \in C(v_i, v_j) \}. \tag{4}$$

In the case of weighted networks, the length of a path $c = (v_0, v_1, v_2, \ldots, v_k)$ between vertices $v_i$ and $v_j$, as used in Dijkstra’s algorithm, can be formally defined as the sum of the inverses of the edge weights along the path (Newman 2001c) and (Brandes 2001):

$$d_c^w = \frac{1}{w(v_0, v_1)} + \frac{1}{w(v_1, v_2)} + \cdots + \frac{1}{w(v_{k-1}, v_k)}. \tag{5}$$

And the weighted geodesic distance is given by:

$$d^w(v_i, v_j) = \min \{ d_c^w : c \in C(v_i, v_j) \}. \tag{6}$$
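Equations (5) and (6) can be sketched with a standard Dijkstra search in which each edge costs the inverse of its weight, so that frequent collaborators are "closer". The function name `weighted_geodesic`, the dictionary-of-dictionaries graph representation and the example weights are assumptions made for illustration.

```python
import heapq


def weighted_geodesic(graph, source):
    """Shortest weighted distances from source, with edge cost 1 / w (Eq. 5)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph[u].items():
            candidate = d + 1.0 / w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Hypothetical network: edge weights are numbers of joint papers.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2},
    "C": {"A": 1, "B": 2},
}
dist = weighted_geodesic(graph, "A")
print(dist["C"])  # via B: 1/4 + 1/2 = 0.75, shorter than the direct edge (1/1)
```

In this example the direct edge between A and C is not the weighted geodesic, because the two-step path through B runs along stronger collaborations.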

The largest geodesic distance between any pair of vertices is called the diameter of the graph. In a binary network, it can vary from a minimum of 1, if the graph is complete, to a maximum of $n - 1$, where $n$ is the number of vertices in the graph. Formally, the diameter of the connected graph $G$ is given by:

$$\mathrm{Dim}(G) = \max_{v_i, v_j \in V(G)} d(v_i, v_j) \tag{7}$$

In the case of weighted networks, the weighted diameter is calculated using the weighted geodesic distance, $d^w(v_i, v_j)$.

The size of the largest connected component, also known as the giant component, refers to the cardinality of the connected component with the highest number of nodes.

Given a vertex $v_i$, its eccentricity, $e(v_i)$, is the maximum distance from it to any other vertex of the graph. The smaller the eccentricity, the better the position of the vertex relative to the other vertices. The eccentricity of $v_i$ is given by:

$$e(v_i) = \max_{v_j \in V(G)} d(v_i, v_j) \tag{8}$$

The diameter, as defined above, is equal to the maximum eccentricity, while the minimum eccentricity is the radius. In the case of weighted networks, the eccentricity may be calculated using the weighted geodesic distance, $d^w(v_i, v_j)$.

The degree centrality, proposed by (Freeman 1979), is calculated in terms of the number of adjacent vertices: the degree centrality of vertex $v_i$, denoted by $C_d(v_i)$, is the number of vertices adjacent to $v_i$. Formally, the degree centrality is defined by:

$$C_d(v_i) = \sum_{j=1}^{n} a(v_i, v_j) \tag{9}$$

If the network is weighted, the degree centrality of vertex $v_i$ is equal to the sum of the weights of the edges that are connected to $v_i$. For (Newman 2004) and (Barrat et al. 2004), the weighted degree centrality is defined by:

$$C_d^w(v_i) = \sum_{j=1}^{n} w(v_i, v_j). \tag{10}$$

The degree centrality is the simplest and easiest way to measure the influence of a node (Abbasi et al., 2012; Liu et al., 2005). In a co-authorship network, this metric identifies the most active and popular authors (Abbasi et al., 2011; Anastasios et al., 2012; Freeman, 1979).

Another metric for analyzing a node in the network stems from the theory of “strong ties” of (Krackhardt 1992). The average link strength of vertex $v_i$ is defined as the ratio between the weighted degree, $C_d^w(v_i)$, and the degree centrality, $C_d(v_i)$:

$$LS(i) = \frac{C_d^w(v_i)}{C_d(v_i)} \tag{11}$$

Therefore, $LS(i)$ represents the average weight of the links of node $v_i$.
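Equations (9) to (11) can be illustrated together. The sketch below assumes a dictionary-of-dictionaries graph whose values are edge weights, and it assigns a link strength of 0 to isolated nodes, a convention that the text does not specify. All names and numbers are hypothetical.

```python
def degree(graph, v):
    """Equation (9): number of co-authors of v."""
    return len(graph[v])


def weighted_degree(graph, v):
    """Equation (10): total number of co-authored papers of v."""
    return sum(graph[v].values())


def link_strength(graph, v):
    """Equation (11): average weight of the links of v (0 if isolated, assumed)."""
    k = degree(graph, v)
    return weighted_degree(graph, v) / k if k else 0.0


# Hypothetical network: A has 3 papers with B and 1 with C; D is isolated.
graph = {
    "A": {"B": 3, "C": 1},
    "B": {"A": 3},
    "C": {"A": 1},
    "D": {},
}
print(degree(graph, "A"), weighted_degree(graph, "A"), link_strength(graph, "A"))
```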

A metric that takes into consideration the geodesic distance from a given initial node to all other nodes of the network is the closeness centrality. (Freeman 1978) asserted that the closeness centrality of vertex $v_i$, denoted by $C_c(v_i)$, is given by:

$$C_c(v_i) = \frac{1}{\sum_j d(v_i, v_j)} \tag{12}$$

The most central vertices in a network according to this metric are those that have a smaller distance to the other vertices. In weighted networks, the weighted closeness centrality is given by:

$$C_c^w(v_i) = \frac{1}{\sum_j d^w(v_i, v_j)} \tag{13}$$

A node that is, on average, closer to the other nodes can get information more efficiently, i.e., the closeness metric is related to independence and efficient communication with other nodes (Freeman, 1979).
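A minimal sketch of Equation (12) for an unweighted network, using breadth-first search to obtain the geodesic distances. It assumes a connected graph; on a disconnected graph this version sums only over the reachable nodes, which is a simplification not discussed in the text. The function names and the example are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Unweighted geodesic distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(graph, v):
    """Equation (12): inverse of the sum of distances from v."""
    total = sum(bfs_distances(graph, v).values())
    return 1 / total if total else 0.0


# Hypothetical path graph A-B-C: B is the closest node to all others.
graph = {"A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1}}
print(closeness(graph, "B"))  # 1 / (1 + 1) = 0.5
```

The weighted version in Equation (13) follows the same pattern, with the breadth-first search replaced by a Dijkstra search over the inverse edge weights.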

The betweenness centrality of vertex $v_i$ is the sum, over every pair of nodes different from $v_i$, of the ratio between the number of shortest paths between the given pair of nodes that go through $v_i$ and the total number of shortest paths between the given pair of nodes (Freeman, 1979; Wasserman & Faust, 1994). The betweenness centrality, $C_b(v_i)$, of vertex $v_i$ is given by:

$$C_b(v_i) = \sum_{j,k} \frac{g(v_j, v_i, v_k)}{g(v_j, v_k)}, \quad j \neq k \neq i, \tag{14}$$

where $g(v_j, v_k)$ is the number of shortest paths between vertex $v_j$ and vertex $v_k$ and $g(v_j, v_i, v_k)$ is the number of shortest paths between vertex $v_j$ and vertex $v_k$ going through $v_i$.

In a weighted network, the betweenness centrality is given by:

$$C_b^w(v_i) = \sum_{j,k} \frac{g^w(v_j, v_i, v_k)}{g^w(v_j, v_k)}, \tag{15}$$

where $g^w(v_j, v_k)$ is the number of weighted shortest paths between vertex $v_j$ and vertex $v_k$ and $g^w(v_j, v_i, v_k)$ is the number of weighted shortest paths between vertex $v_j$ and vertex $v_k$ going through $v_i$, considering the weighted distance, $d^w(v_i, v_j)$.

The betweenness is an indicator of the potential of a node to play the role of “mediator” or “gatekeeper” (Freeman, 1979; Abbasi et al., 2012), being able to control more often the flow of information on the network.
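The unweighted betweenness of Equation (14) can be sketched by counting shortest paths with breadth-first search. This brute-force version assumes that each unordered pair of nodes is counted once, and it is meant for small illustrative graphs rather than as an efficient implementation. The function names and the example are hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(graph, source):
    """Distances and numbers of shortest paths from source (BFS)."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma


def betweenness(graph, v):
    """Equation (14), counting each unordered pair (j, k) once (assumed)."""
    info = {s: shortest_path_counts(graph, s) for s in graph}
    dist_v, sigma_v = info[v]
    total = 0.0
    for s, t in combinations([u for u in graph if u != v], 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s or v not in dist_s or t not in dist_v:
            continue  # s and t (or v) lie in different components
        if dist_s[v] + dist_v[t] == dist_s[t]:
            # Paths through v = (paths s -> v) * (paths v -> t).
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total


# Hypothetical 4-cycle A-B-C-D-A: A and C are joined by two shortest paths,
# one of which passes through B.
square = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C"},
}
print(betweenness(square, "B"))  # 0.5
```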

The eigenvector centrality, a metric of the importance of a vertex in the network based on its connections, rests on the idea that a particular node will have high centrality if it is connected to vertices with central positions in the network (Bonacich, 1987). In other words, the centrality of a vertex depends not only on the number of adjacent vertices but also on the centrality of these vertices. Let $\lambda$ be a constant; then the eigenvector centrality of $v_i$, $C_e(v_i)$, is given by:

$$C_e(v_i) = \frac{1}{\lambda} \sum_{j=1}^{n} a(v_i, v_j)\, C_e(v_j) \tag{16}$$

Using vector notation, let $X = (C_e(1), C_e(2), \ldots, C_e(n))$ be the vector of eigenvector centralities; we can then rewrite Equation (16) as $\lambda X = AX$. By requiring that the eigenvector centrality take only non-negative values (and using the Perron-Frobenius theorem), it can be shown that $\lambda$ is the largest eigenvalue of the adjacency matrix and $X$ is the corresponding eigenvector (Jackson, 2008).

In the case of weighted networks, the elements of the adjacency matrix are the weights of the edges, $w(v_i, v_j)$ (Newman, 2004), and the eigenvector centrality is defined by:

$$C_e^w(v_i) = \frac{1}{\lambda} \sum_{j=1}^{n} w(v_i, v_j)\, C_e^w(v_j) \tag{17}$$
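One common way to approximate the leading eigenvector in Equations (16) and (17) is power iteration. The sketch below is an illustration under stated assumptions, not the procedure used in the study: it assumes a connected, non-empty graph with positive weights, iterates with the matrix plus the identity (which has the same eigenvectors but avoids oscillation on bipartite graphs), and scales the result so that the largest centrality is 1. The function name and example are hypothetical.

```python
def eigenvector_centrality(graph, iterations=200, tol=1e-10):
    """Power iteration for Equation (17); unit weights give Equation (16)."""
    x = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # Multiply by (W + I): same eigenvectors as W, better convergence.
        new = {
            v: x[v] + sum(w * x[u] for u, w in graph[v].items())
            for v in graph
        }
        scale = max(new.values())
        new = {v: value / scale for v, value in new.items()}
        if max(abs(new[v] - x[v]) for v in graph) < tol:
            return new
        x = new
    return x


# Hypothetical network: a triangle A-B-C with a pendant node D attached to C.
graph = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "C": 1},
    "C": {"A": 1, "B": 1, "D": 1},
    "D": {"C": 1},
}
centrality = eigenvector_centrality(graph)
print(max(centrality, key=centrality.get))  # C, the best-connected node
```

In this example A and B obtain the same centrality by symmetry, and D, although adjacent to the most central node, scores lowest because it has a single neighbor.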

The local clustering coefficient indicates how connected the nodes adjacent to a given node are and, together with the average shortest path length, can identify a “small-world” effect (networks with a large clustering coefficient and relatively short distances between the nodes) (Watts & Strogatz 1998). The clustering coefficient of a vertex $v_i$ is the ratio between the number of triangles that contain vertex $v_i$ and the number of possible edges between its neighboring vertices. Let $NT(v_i)$ be the number of triangles (sets of three nodes connected by three links) containing vertex $v_i$. For (Onnela et al. 2005), the local clustering coefficient is defined as:

$$CCL(v_i) = \frac{2\, NT(v_i)}{C_d(v_i)\,(C_d(v_i) - 1)} \tag{18}$$

The weighted local clustering coefficient was proposed by (Onnela et al. 2005) and is given by:

$$CCL^w(v_i) = \frac{2}{C_d(v_i)\,(C_d(v_i) - 1)} \sum_{j,k} \left( \hat{w}(v_i, v_j)\, \hat{w}(v_i, v_k)\, \hat{w}(v_j, v_k) \right)^{1/3}, \tag{19}$$

where the weights of the edges are normalized by the maximum weight of the network, $\hat{w}(v_i, v_j) = w(v_i, v_j) / \max_{i,j \in V(G)} w(v_i, v_j)$, and the contribution of each triangle depends on all the weights of its edges.
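Equations (18) and (19) can be sketched as follows. The weighted version assumes that the sum runs over unordered pairs of neighbors, so that each triangle is counted once and the formula reduces to Equation (18) when all weights are equal; nodes with fewer than two neighbors are given a coefficient of 0 by convention. Names and data are hypothetical.

```python
from itertools import combinations


def local_clustering(graph, v):
    """Equation (18): triangles through v over possible neighbor pairs."""
    k = len(graph[v])
    if k < 2:
        return 0.0
    triangles = sum(1 for a, b in combinations(graph[v], 2) if b in graph[a])
    return 2 * triangles / (k * (k - 1))


def weighted_local_clustering(graph, v):
    """Equation (19), summing over unordered neighbor pairs (assumed)."""
    k = len(graph[v])
    if k < 2:
        return 0.0
    w_max = max(w for nbrs in graph.values() for w in nbrs.values())
    total = 0.0
    for a, b in combinations(graph[v], 2):
        if b in graph[a]:
            # Product of the three normalized weights of the triangle.
            product = graph[v][a] * graph[v][b] * graph[a][b] / w_max ** 3
            total += product ** (1 / 3)
    return 2 * total / (k * (k - 1))


# Hypothetical network: triangle A-B-C plus a pendant node D attached to A.
graph = {
    "A": {"B": 2, "C": 1, "D": 1},
    "B": {"A": 2, "C": 1},
    "C": {"A": 1, "B": 1},
    "D": {"A": 1},
}
print(local_clustering(graph, "A"))           # 1 triangle out of 3 pairs
print(weighted_local_clustering(graph, "A"))  # lower, since two edges are weak
```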

The average clustering coefficient is the average value of the individual or local coefficients and is given by:

$$CL(G) = \frac{1}{n} \sum_i CCL(v_i) \tag{20}$$

The clustering coefficient, $CL(G)$, for the co-authorship network refers to the probability that any two collaborators of a researcher have collaborated with each other (Onel et al., 2011). In the individual case, the clustering coefficient of a particular author indicates the extent to which his collaborators work together.

PageRank is a method of ranking web pages that effectively measures the interest of users and the attention devoted to them (Page et al. 1999). PageRank considers the number and quality of links to a web page in order to determine how influential it is (Liu et al., 2014). Let $T_A$ be a web page and $T_i$ one of the web pages that link to $T_A$. (Brin & Page 1998) defined PageRank as follows:

$$PR(T_A) = (1 - \delta) + \delta \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right), \tag{21}$$

where $PR(T_A)$ is the PageRank of page $T_A$, $PR(T_i)$ is the PageRank of page $T_i$, $C(T_i)$ is the number of outbound links on page $T_i$ and $\delta$ is a damping factor, which can be set between 0 and 1 (assuming that a person randomly clicks on pages and eventually stops clicking, $\delta$ is the probability that, at any given moment, the person will continue to click).
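Equation (21) defines PageRank implicitly, since each score depends on the others. A simple way to approximate the solution is to apply the equation repeatedly, as in the sketch below. It assumes that every page has at least one outbound link; the damping value 0.85, the number of iterations, the function name and the example are assumptions chosen for illustration.

```python
def pagerank(out_links, delta=0.85, iterations=100):
    """Repeatedly apply Equation (21) starting from PR = 1 for every page."""
    pr = {page: 1.0 for page in out_links}
    for _ in range(iterations):
        new = {}
        for page in out_links:
            # Sum PR(T_i) / C(T_i) over the pages T_i that link to this page.
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in out_links.items()
                if page in targets
            )
            new[page] = (1 - delta) + delta * incoming
        pr = new
    return pr


# Hypothetical web graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pr = pagerank(links)
print(sorted(pr, key=pr.get, reverse=True))
```

With this form of the equation the scores sum to the number of pages rather than to 1. In a co-authorship network, each undirected edge can be treated as a pair of links in both directions.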

In the study of (Santos 2014) on co-authorship networks, a metric was proposed to evaluate the benefit, or utility, for a given author of belonging to a certain network structure. According to this metric, each author has a finite amount of time to devote to scientific collaborations, and each author receives from an adjacent author a utility equal to the proportion of papers that the co-author has with him plus a synergy term, which is given by the product of the dedication of each author to the collaboration. Formally, the utility $U^w(v_i)$ of a given author $v_i$ in a given graph $G$ is given by:

$$U^w(v_i) = \sum_j \left( \frac{w(v_i, v_j)}{C_d^w(v_i)} + \frac{w(v_i, v_j)}{C_d^w(v_j)} + \frac{w(v_i, v_j)^2}{C_d^w(v_i)\, C_d^w(v_j)} \right), \tag{22}$$

where $w(v_i, v_j)$ is the total number of works between authors $v_i$ and $v_j$, and $C_d^w(v_i)$ and $C_d^w(v_j)$ are the weighted degrees of these authors, respectively.

The utility developed by (Santos 2014) was based on the original utility model of (Jackson & Wolinsky 1996). This model takes into account only whether or not the author is connected to another author, disregarding the number of works done together. Thus, the utility of a particular author $v_i$ in a given graph $G$ is given by:

$$U(v_i) = \sum_j \left( \frac{1}{C_d(v_i)} + \frac{1}{C_d(v_j)} + \frac{1}{C_d(v_i)\, C_d(v_j)} \right), \tag{23}$$

where $C_d(v_i)$ and $C_d(v_j)$ are the degree centralities of vertices $v_i$ and $v_j$, respectively.
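Equations (22) and (23) can be compared on a small example. The sketch assumes that the sums run over the co-authors of the given author, as the description of utility received "from an adjacent author" suggests, and that an isolated author has utility 0. Function names and data are hypothetical.

```python
def weighted_utility(graph, v):
    """Equation (22), summed over the co-authors of v."""
    cv = sum(graph[v].values())
    total = 0.0
    for u, w in graph[v].items():
        cu = sum(graph[u].values())
        total += w / cv + w / cu + w * w / (cv * cu)
    return total


def unweighted_utility(graph, v):
    """Equation (23), summed over the co-authors of v."""
    kv = len(graph[v])
    return sum(
        1 / kv + 1 / len(graph[u]) + 1 / (kv * len(graph[u]))
        for u in graph[v]
    )


# Hypothetical network: A has 3 papers with B and 1 paper with C.
graph = {"A": {"B": 3, "C": 1}, "B": {"A": 3}, "C": {"A": 1}}
print(weighted_utility(graph, "B"))    # 3/3 + 3/4 + 9/12 = 2.5
print(unweighted_utility(graph, "B"))  # 1/1 + 1/2 + 1/2 = 2.0
```

For author B the weighted utility is higher than the unweighted one, because A devotes most of his collaborations to B, which the binary model cannot capture.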

To analyze the degree of externality and internality of relations (heterophilia and homophilia, respectively) in a network where the actors are labeled or partitioned by one or several of their features, (Krackhardt & Stern 198823 KRACKHARDT D & STERN R. 1988. Informal networks and organizational crises: An experimental simulation. Social Psychology Quarterly, 51: 123-140.) proposed a metric called E-I index that assesses the trends of connections between members of the partition cells, comparing the number of connections within and outside the partition cells (Hanneman & Riddle, 200516 HANNEMAN RA & RIDDLE M. 2005. Introduction to social network methods. Riverside: University of Califórnia.).

$$\text{E-I index} = \frac{E_L - E_I}{E_L + E_I}, \tag{24}$$

where EL is the number of external relations and EI is the number of internal relations.

The E-I index ranges from -1 to +1. Values close to +1 indicate a stronger tendency toward relationships between actors in different cells of the partition (heterophily), while values close to -1 reveal a propensity of actors to relate internally to other actors in the same cell of the partition (homophily). If the links are equally divided, the E-I index is equal to zero. We also assume that isolated nodes have an E-I index equal to zero, since they favor neither external nor internal links.

In a weighted network, the E-I index can be calculated using the edge weights: EL is then the sum of the weights of the edges that connect different cells of the partition, and EI is the sum of the weights of the edges that connect actors in the same cell of the partition.
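The per-node E-I index of Equation (24), in its unweighted and edge-weighted forms, can be sketched as follows. The function name `ei_index`, the adjacency dictionary and the labels are hypothetical; the variant that also uses node attributes is not covered by this sketch.

```python
def ei_index(adj, label, v, weighted=False):
    """E-I index of node v for the partition given by `label`.

    adj maps each node to a dict {neighbour: edge weight};
    label maps each node to its partition cell (e.g. fellowship level).
    """
    external = internal = 0.0
    for u, w in adj.get(v, {}).items():
        amount = w if weighted else 1  # count ties or sum their weights
        if label[u] == label[v]:
            internal += amount
        else:
            external += amount
    if external + internal == 0:
        return 0.0  # convention stated in the text for isolated nodes
    return (external - internal) / (external + internal)

# Hypothetical toy data: a and b share a cell, c is in another, d is isolated.
adj = {"a": {"b": 3, "c": 1}, "b": {"a": 3}, "c": {"a": 1}, "d": {}}
label = {"a": "2", "b": "2", "c": "1A", "d": "1D"}

ei_a_unweighted = ei_index(adj, label, "a")               # (1 - 1) / 2 = 0.0
ei_a_weighted = ei_index(adj, label, "a", weighted=True)  # (1 - 3) / 4 = -0.5
```

Node a has one internal and one external tie, so its unweighted index is zero, but the internal tie is three times stronger, which pulls the weighted index toward homophily.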

3 METHODOLOGY

In this section, we describe the methodology used to collect data, build the network, calculate SNA metrics and the statistical methods applied.

3.1 Obtaining data and building the co-authorship network

For the construction of a co-authorship network among researchers with a CNPq research productivity grant in the area of Industrial Engineering in Brazil, the only data source considered was the list of articles published in journals and those accepted for publication between 2005 and 2014. The following steps were taken to build the network: identification of the researchers and their fellowship levels; identification of the researchers' Lattes curricula (the “Lattes Curriculum” presents a history of the scientific, academic and professional activities of the researchers registered in the Lattes Platform (lattes.cnpq.br)); identification of the h-index; extraction of the publications; identification of the publications in co-authorship; construction of the co-authorship network; calculation of SNA metrics.

The list of researchers in the area of Industrial Engineering in Brazil was obtained from the CNPq website. On March 2, 2015, there were 145 of them in total, and these were used in the network construction.

The academic data presented in this study were obtained from the Lattes Platform, which reflects the experience of CNPq in integrating curricula databases. The identification of the Lattes curricula was carried out in parallel with the identification of the fellows, because at the moment the fellows were identified, their Lattes IDs (the 16-digit code that CNPq uses as an identifier of each Lattes CV) were also recorded.

The h-index of each fellow was obtained from the “Indicators of Production” tab on the CNPq site, reached by using the search engine for Lattes curricula and clicking on the name of the researcher. This tab provides the h-index calculated by both the Web of Science and Scopus. The Scopus h-index was used because the Scopus database is larger than that of the Web of Science and thus includes more of the papers listed in the Lattes Curriculum.

To extract the publications of the fellows and to analyze the co-authorship relations, scriptLattes (Mena-Chalco & Cesar Jr, 2009) was used. With the co-authorship relations found by scriptLattes, a network was built and its metrics were calculated using the software NetworkX in three ways: unweighted; weighted by edges; and weighted by edges and nodes. The metrics applied in this work were: E-I index, Degree centrality, Closeness centrality, Betweenness centrality, Eigenvector centrality, PageRank, Local clustering coefficient, Eccentricity and Utility, all described in Section 2.

3.2 Analysis of the influences of SNA metrics at the fellowships level

The effect of SNA metrics on researchers’ fellowship levels is examined by the following means: (i) tables ranking the top 10 researchers; (ii) Kendall correlations between fellowship levels and SNA metrics; (iii) box plots that compare the distributions of the metrics at the different fellowship levels; and (iv) a logistic regression model.
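As an illustration of item (ii), the following sketch computes a Kendall correlation between fellowship levels and a metric in plain Python. Two assumptions are made that are not stated in the text: the tau-b variant is used, since fellowship levels produce many ties, and the levels are encoded ordinally as 2 < 1D < 1C < 1B < 1A. The names `LEVEL_RANK` and `kendall_tau_b` and the sample data are hypothetical, and no significance test is included.

```python
from math import sqrt

# Assumed ordinal encoding of the fellowship levels (lowest to highest).
LEVEL_RANK = {"2": 0, "1D": 1, "1C": 2, "1B": 3, "1A": 4}

def kendall_tau_b(x, y):
    """Kendall tau-b between two equally long numeric sequences."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied in both variables: ignored
            elif dx == 0:
                ties_x += 1         # tied only in x
            elif dy == 0:
                ties_y += 1         # tied only in y
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x)
                 * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")

# Hypothetical data: four fellows, their levels and some SNA metric.
levels = ["2", "2", "1D", "1A"]
metric = [1, 2, 2, 5]
tau = kendall_tau_b([LEVEL_RANK[l] for l in levels], metric)  # 4 / 5 = 0.8
```

The pairwise loop is quadratic in the number of researchers, which is unproblematic for a network of 145 nodes.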

The effect of the SNA metrics on the fellowship level of the researchers was estimated by logistic regression in three ways: considering only the unweighted metrics, then the weighted metrics, and finally the metrics that incorporate the nodes’ attributes. The regression method applied was backward stepwise, which starts with all variables in the model and then eliminates one variable per step. Each step removes the least significant variable, and the process ends when all variables in the model have p-values less than or equal to the specified significance level (α); here we adopt α equal to 0.1. Before applying the backward stepwise method, we check for multicollinearity in the model using the Variance Inflation Factor (VIF), in order to avoid fitting or imprecision problems. Multicollinearity exists when there is an exact or approximate linear dependence among the covariates of the model and, generally, a VIF greater than 10 is indicative of multicollinearity problems (Hair 2009). To eliminate the effects of multicollinearity, we first calculate the VIF for each variable, considering all of them in the model. Then we eliminate the one with the highest VIF and repeat the process until all VIFs are less than 10.
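For reference, the VIF of a covariate k is conventionally defined from the coefficient of determination obtained by regressing that covariate on all the other covariates. The symbols below are introduced here only for illustration and do not appear in the original text:

$$\mathrm{VIF}_k = \frac{1}{1 - R_k^{2}}$$

Under this definition, the cutoff VIF > 10 corresponds to $R_k^{2} > 0.9$, that is, more than 90% of the variance of covariate k is explained linearly by the remaining covariates.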

4 PRESENTATION OF CO-AUTHORSHIP NETWORK AMONG RESEARCHERS WITH CNPQ GRANT IN RESEARCH PRODUCTIVITY AND IMPACT OF SNA METRICS IN FELLOWSHIP LEVEL

The co-authorship network among researchers with a CNPq research productivity grant in the area of Industrial Engineering was built using bibliometric data from the period between 2005 and 2014. A total of 3,796 full papers published in journals and 89 accepted for publication were analyzed in the period, totaling 3,885 papers. Distributed among the 145 productivity fellows, this gives an average of 2.679 papers per fellow per year. Of these papers, 1,026 were written in co-authorship. Table 1 presents an overview of the network at the macro level. In a similar work, (Souza et al. 2016) found a total of 935 papers published by 68 CNPq productivity research fellows in the area of Probability and Statistics from 2009 to 2013, which gives an average of 2.75 papers per fellow per year.

Table 1
Overview of the macro level network.

The network is divided into 33 components. The giant component consists of 92 vertices, representing approximately 63.45% of the network vertices; the second largest component has 8 vertices (5.52%); and 21 researchers are isolated in the network, in other words, about 14.48% of the fellows do not have collaborators in the network. In the work of (Souza et al. 2016), the giant component corresponded to 70.59% of the network vertices and isolated nodes corresponded to 13.24%, indicating that the Probability and Statistics community seems to be more connected than the Industrial Engineering one.

The network contains 161 edges, which gives an average centrality degree of 2.221 and a density equal to 0.015, that is, only 1.5% of the possible connections in the network occur. Thus, on average, a researcher collaborated with a little over 2 other researchers holding a CNPq research productivity grant during this 10-year period. A low density of 4.7%, with an average centrality degree of 3.147, was also found by (Souza et al. 2016). Their values are higher than the ones presented in this work, even though Souza et al. only considered papers published or accepted for publication in the period from 2009 to 2013, half of the length of time considered here. However, these low densities can be justified by the fact that the network is formed from only a small part of the researchers’ production (only papers published in journals and papers accepted for publication in a certain period of time) and only captures collaboration among CNPq research productivity fellows in the same area, not taking into account collaboration with other researchers. Moreover, since Industrial Engineering encompasses a diverse number of sub-areas, this result suggests that the fellows are dispersed across different sub-areas, which reduces the chance of collaboration among them.
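As a check on these figures, the average degree and the density follow directly from the number of vertices, n = 145, and the number of edges, m = 161 (the symbols n, m and $\bar{k}$ are introduced here only for this illustration):

$$\bar{k} = \frac{2m}{n} = \frac{322}{145} \approx 2.221, \qquad \text{density} = \frac{2m}{n(n-1)} = \frac{322}{145 \cdot 144} \approx 0.0154.$$

Since the density equals $\bar{k}/(n-1)$, the values reported by Souza et al. are consistent in the same way: $3.147/67 \approx 0.047$ for their 68 fellows.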

The network diameter is equal to 13 and the radius is 0, representing the maximum and the minimum eccentricity, respectively; the radius of the giant component is equal to 7. The average clustering coefficient is equal to 0.293. Since this coefficient varies from 0 to 1, just under a third of the possible co-authorships among the co-authors of a given author are present in the network. (Souza et al. 2016) found an average clustering coefficient of 0.31 in their network, suggesting that the Probability and Statistics community seems to be more cohesive than the Industrial Engineering one.

The average distance of a path between a pair of vertices is approximately 6.00. This value refers to the giant component and means that, on average, 6.00 connections separate two researchers in that component. The number of shortest paths is 8,464.

Figure 1 illustrates the co-authorship network of the fellows generated by the software Gephi, where the thickness of the edges is proportional to their weights (total papers co-authored), and the diameter of the vertex is proportional to its centrality degree.

Figure 1
Co-authorship network among fellows.

The SNA metrics of this co-authorship network were calculated in three ways: non-weighted (NP), weighted by edges (W) and weighted by edges and nodes (Z). The results are shown next.

4.1 E-I Index

To analyze the degree of externality and internality of the researchers, labeled by their fellowship levels, the E-I indexl metric was used, that is, the E-I index with the partition given by fellowship level. There is a significant correlation between the E-I indexl and the fellowship level, equal to 0.406 (at a significance level of 0.01). Thus, researchers who establish relationships with researchers at different fellowship levels tend to have higher fellowship levels. If the weights of the edges are taken into account (E-I indexl _W), the correlation with the fellowship level decreases to 0.343 (at a significance level of 0.01). Considering both the edge weights and the nodes’ attributes (E-I indexl _Z), the correlation is equal to 0.302 (at a significance level of 0.01).

However, this result is somewhat misleading, since one has to take into account the distribution of the fellowship levels among researchers. Of the 145 researchers, 86 (59.31%) had fellowship level 2; 28 (19.31%) had level 1D; 12 (8.28%) had level 1C; 7 (4.83%) had level 1B; and 12 (8.28%) had level 1A. Thus, assuming that co-authorships form at random, researchers from lower levels are more likely to collaborate with researchers of the same level, and researchers of higher levels are more likely to collaborate with researchers from different levels. Table 2 compares the actual value with the theoretical one, that is, the value expected if all nodes were connected. From this comparison, one can conclude that researchers with fellowship levels 2 and 1C tend to collaborate more with researchers of different fellowship levels than would be expected if collaborators were chosen at random. On the other hand, researchers with fellowship levels 1D, 1B and 1A show the opposite behavior.
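The effect of the level distribution can be made concrete with a short sketch. It assumes that "all nodes were connected" means a complete graph, in which each researcher is linked to every other one; under that reading the E-I index of a researcher depends only on the size of his fellowship level. The function name `theoretical_ei` is hypothetical, and the resulting values are not claimed to reproduce those of Table 2.

```python
# Number of fellows per level, as reported in the text.
counts = {"2": 86, "1D": 28, "1C": 12, "1B": 7, "1A": 12}

def theoretical_ei(counts):
    """E-I index of a node of each level in a complete graph (assumption)."""
    n = sum(counts.values())
    result = {}
    for level, size in counts.items():
        internal = size - 1   # all other fellows of the same level
        external = n - size   # all fellows of the other levels
        result[level] = (external - internal) / (external + internal)
    return result

expected = theoretical_ei(counts)
# level 2:  (59 - 85) / 144  = -0.181 (approx.)
# level 1D: (117 - 27) / 144 =  0.625
# level 1C and 1A: (133 - 11) / 144 = 0.847 (approx.)
# level 1B: (138 - 6) / 144  =  0.917 (approx.)
```

Under this assumption only level 2 has a negative expected index, simply because it contains the majority of the fellows, which is why raw E-I values have to be compared against a baseline.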

Table 2
Comparison between the real and theoretical E-I index.

Figure 2 shows the distribution of the E-I indexl , E-I indexl _W and E-I indexl _Z, respectively, at the different fellowship levels. The largest variations are presented by levels 2 and 1D. Level 2 shows the smallest median.

Figure 2
Box plots for the E-I indexl metrics for fellowship level differences.

Level 1C has the smallest variation: almost all researchers at that level have an E-I index equal to 1, meaning that their relationships are strictly external. All level 1B researchers have more external than internal links. One can also observe that including either the weights of the links or the nodes’ attributes maintains the main characteristics of the E-I indexl across the different fellowship levels.

We also examine the external and internal relations of the researchers with respect to their region of activity, that is, the geographical region of the institution where they work. The result is presented in Figure 3. The center-west region has only two researchers: one has no relationships and the other therefore has only external relationships. Researchers in the northeast, southeast, and south regions have a predominance of internal over external relations. It is also observed that the external relations of the researchers of the southeast, as well as those of the south, are less intense, that is, they collaborate infrequently with the same researchers from other regions. Therefore, the results show that geographical distance is still a major barrier to be overcome by the researchers in this community. There is no significant correlation between the fellowship level and the E-I indexr , the E-I indexr _W or the E-I indexr _Z for geographical regions.

Figure 3
Box plots for the E-I indexr metrics for regional differences.

4.2 Degree Centrality

Table 3 ranks the 10 researchers with the highest degree centrality, calculated in three ways: unweighted degree centrality - UDC; W-weighted degree centrality - WDC; and Z-weighted degree centrality - ZDC. This table shows how the number of links, the frequencies of the links and the combination of the frequency of links with the weights of the nodes change the positions of researchers. For example, researcher PQ124 appears in the first position in UDC, but when the weight of the links and the importance of the node are considered, this researcher does not appear in the top ten. In this case, researcher PQ124 has the greatest number of co-authors, but with lower frequencies of collaboration compared, for example, with researcher PQ62, who was second in UDC and takes the first position in WDC and ZDC. Despite being such a central node in the network, PQ62 holds only a level 1D productivity grant.

Table 3
The 10 researchers best positioned according to degree centrality.

Researchers of the lower productivity levels (2 and 1D) occupy positions among the top 10, mainly when considering WDC and ZDC. Surprisingly, UNISINOS is the institution with the most PQs in the top 10 according to WDC and ZDC, all of them at level 2.

Table 4 shows the correlations of the three degree centrality metrics with the fellowship level. It is observed that only WDC does not have a significant correlation with the fellowship level. The unweighted degree centrality has the highest correlation with the fellowship level. Thus, collaborating with more authors or collaborating frequently with authors of higher performance (h-index) impacts the fellowship level.

Table 4
Correlations of the degree centrality metrics with fellowship levels.

Figure 4 shows the box plots used to evaluate and compare the distributions of degree centrality among the fellowship levels. Level 1A has the highest variability and the highest median, and level 1B has the smallest variability, whether or not the weights of the edges or the nodes’ attributes are considered. A larger number of outliers is observed in the lower levels when considering the weights W or Z, revealing that the high values of WDC or ZDC obtained by some researchers are atypical (rare) for these fellowship levels.

Figure 4
Box plots of degree centrality.

4.3 Average Link Strength

The 10 best-positioned authors according to the average link strength are shown in Table 5. The W-average link strength - WLS is the ratio between WDC and UDC, and the Z-average link strength - ZLS is the ratio between ZDC and UDC. Note that level 2 researchers predominate in the top 10 positions according to WLS; when the importance of nodes is also considered, this participation decreases but is still high. This indicates that fellows at lower levels tend to concentrate their work with a few other fellows, while fellows at the highest level tend to diversify their collaborations further. UNISINOS and UFF had the greatest number of PQs among the top 10 average link strengths.
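The W-average link strength can be illustrated with a minimal sketch in which WDC is taken as the sum of a node's edge weights and UDC as its number of co-authors. The function name `average_link_strength`, the toy data and the value 0.0 for isolated nodes are assumptions of this sketch, not definitions taken from the text.

```python
def average_link_strength(adj, v):
    """Ratio between weighted degree and unweighted degree of node v."""
    degree = len(adj[v])                   # number of co-authors (UDC)
    if degree == 0:
        return 0.0                         # assumption for isolated nodes
    return sum(adj[v].values()) / degree   # WDC / UDC

# Hypothetical toy data: p has one frequent co-author,
# q has four occasional ones.
adj = {
    "p": {"x": 6},
    "q": {"a": 1, "b": 1, "c": 1, "d": 1},
    "r": {},
}

wls_p = average_link_strength(adj, "p")  # 6 / 1 = 6.0
wls_q = average_link_strength(adj, "q")  # 4 / 4 = 1.0
```

Researcher p has fewer co-authors than q but a much higher average link strength, which is the pattern of concentrated collaboration described for the lower fellowship levels.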

Table 5
The 10 researchers best positioned according to the average link strength.

Table 6 shows the correlations between the two average link strength metrics and the level of productivity. The average link strength has little impact on the fellowship level when the importance of the node is considered, and the impact is almost zero and not significant at the 0.05 level when it is not.

Table 6
Correlations of metrics of the average link strength with the fellowship levels.

Figure 5 shows the box plots used to evaluate and compare the distributions of average link strength among the fellowship levels. Level 1D shows the highest variability and the highest median in both metrics. Levels 2 and 1D have outliers.

Figure 5
Box plots of the average link strength.

4.4 Closeness Centrality

Table 7 shows the 10 best-positioned authors according to the unweighted closeness centrality - UCC; W-weighted closeness centrality - WCC; and Z-weighted closeness centrality - ZCC.

Table 7
The 10 researchers best positioned according to closeness centrality.

To illustrate how the positions of the nodes change across the three closeness centrality metrics, consider researchers PQ32 and PQ60. By UCC, researcher PQ32 is the second closest to the other nodes; that is, the sum of the distances from PQ32 to the other nodes is smaller than the sum of the distances from PQ60 to the other nodes. However, considering WCC, the paths that connect researcher PQ60 to other nodes are formed by more frequent connections than the paths that connect researcher PQ32 to other nodes. As the frequency of connections shortens the paths, researcher PQ60 obtained a better position according to WCC. This researcher also remained in second place according to ZCC.

Regarding the fellowship level, researchers with higher levels predominate among the top 10 positions according to the three closeness centrality metrics. Level 2 researchers take on average three positions in this table. Researchers working at USP and UFSCar are predominant in these rankings, implying that PQs at these institutions have easier access to other PQs in the network. PQ124 from UFSCar obtained the highest value according to the three methods, which shows a prominent position in the network. Table 8 shows the correlations of the three closeness centrality metrics with the fellowship level.

Table 8
Correlations of the closeness centrality metrics with the fellowship levels.

The three closeness centrality metrics showed significant positive correlations with the fellowship level. The one with the highest correlation was ZCC, followed by UCC. Thus, because they have greater possibilities of establishing publication partnerships, researchers with greater closeness centralities also tend to have a higher fellowship level. Furthermore, researchers who are closest to the leading researchers tend to have a higher fellowship level.

Figure 6 presents the box plots to evaluate and compare the distributions of closeness centralities among the fellowship levels. In the first two graphs, the highest variability is obtained by level 1C, and the highest median and smallest variability are obtained by level 1B. In the third graph, level 2 has the highest variability and level 1B maintains the highest median.

Figure 6
Box plots of the closeness centrality.

4.5 Betweenness Centrality

Table 9 shows the results of the top 10 authors according to the unweighted betweenness centrality - UBC; W-weighted betweenness centrality - WBC; and Z-weighted betweenness centrality - ZBC.

Table 9
The top 10 researchers according to the betweenness centrality.

Researcher PQ124 is the most central according to the three betweenness centralities, implying that he is the researcher with the greatest ability to connect different pairs of PQs in the Industrial Engineering area. Most researchers ranked among the top 10 in UBC are also ranked in the other two betweenness centralities, with small variations in position. Thus, using the weights of the edges or vertices does not significantly change the betweenness centrality of researchers in the network. Moreover, levels 1A and 1B are the majority in this table, and level 2 appears on average in 3 positions in each metric. USP also has the greatest number of PQs occupying the top 10 positions according to the three methods. Table 10 shows the correlations of the three betweenness centrality metrics with the fellowship level.

Table 10
Correlations of the betweenness centrality metrics with the fellowship levels.

The three betweenness centrality metrics showed significant positive correlations with the fellowship level. The one with the highest correlation with the fellowship level was ZBC, that is, the one considering the importance of nodes, followed by WBC. Thus, researchers who assume the role of “intermediary”, controlling the frequency of information flow, tend to have higher levels of productivity; moreover, those who intermediate paths whose connections are more frequent and/or contain the most important nodes have higher fellowship levels.

Figure 7 shows the center, the dispersion, the departure from symmetry and the observations considered atypical. In these three graphs, levels 1A and 1B show the highest variability and level 1B the highest medians. Level 2 has the smallest variation and the highest number of atypical points.

Figure 7
Box plot of the betweenness centrality.

4.6 Eigenvector Centrality

Table 11 shows the top authors according to the eigenvector centrality: unweighted eigenvector centrality - UEC; W-weighted eigenvector centrality - WEC; and Z-weighted eigenvector centrality - ZEC.

Table 11
The top 10 researchers according to the eigenvector centrality.

It is evident that the top 10 positions according to the three eigenvector centrality metrics are occupied by different researchers; only researcher PQ14, tenth in UEC, appears twice in the table, also holding position 7 in ZEC. As for the researchers' fellowship levels, the majority are at levels 2 and 1D. UFPE, UNISINOS and UFF have the PQs with the highest UEC, WEC and ZEC values, respectively. The correlations between the eigenvector centrality metrics and the fellowship level are presented in Table 12.

Table 12
Correlations of the eigenvector centrality metrics with the fellowship levels.

For the eigenvector centrality, researchers connected to researchers who are more central in terms of degree have higher centrality. Thus, according to UEC (resp., WEC or ZEC), a researcher will have a higher centrality if he is connected to researchers with greater UEC (resp., WEC or ZEC). The correlations of these metrics with the fellowship level were significant, especially for UEC and ZEC.

Figure 8 presents the box plots used to evaluate and compare the distributions of the eigenvector centralities among the fellowship levels. For a clearer view, the isolated nodes, which have eigenvector centrality equal to zero, were disregarded in these graphs, and a logarithmic scale was used. Level 1A has the greatest variability.

Figure 8
Box plots of the eigenvector centrality.

4.7 PageRank

Table 13 presents the top 10 authors according to the unweighted PageRank - UPR; W-weighted PageRank - WPR; and Z-weighted PageRank - ZPR. The most influential researchers have fellowship level 1A. They are present mainly in WPR and ZPR. Note also that the ranking of the researchers remains almost unchanged across these three metrics. PQ124 and PQ82 obtained the highest PageRank values, implying that their importance in the network is related to having collaborated with other prominent PQs.

Table 13
The top 10 researchers according to the PageRank.

Table 14 shows that the correlations of the PageRank with the fellowship level are significant. Among the three metrics, the UPR value of the researcher has the greatest impact on the fellowship level.

Table 14
Correlations of PageRank with the fellowship levels.

Figure 9 presents the box plots to evaluate and compare the distributions of PageRank metrics among the fellowship levels. Level 1A has the greatest variability and the largest median according to the three PageRank metrics.

Figure 9
Box plots of the PageRank.

4.8 Local clustering coefficient

Table 15 presents the rankings of the top 10 researchers according to the unweighted local clustering coefficient - ULC; W-weighted local clustering coefficient - WLC; and Z-weighted local clustering coefficient - ZLC. This table is composed basically of level 2 researchers, and their positions according to the three metrics are almost the same.

Table 15
The top 10 researchers according to the Local Clustering Coefficient.

The clustering coefficient of a given researcher indicates to what extent his collaborators are collaborating with each other. However, these metrics have no impact on the fellowship level of the researcher, as indicated by the correlations in Table 16. A high clustering coefficient may imply that the researcher does not have a very diverse group of collaborators. UFPE, UFRJ and UTFPR have the highest number of PQs among the 27 greatest values of ULC, while UNISINOS and UTFPR have PQs with high WLC and ZLC values.

Table 16
Correlations of the Local Clustering Coefficient with the fellowship levels.

The box plots in Figure 10 were used to assess the distributions of the local clustering coefficients of the researchers for different fellowship levels. The highest median and variability are obtained by level 1D according to all local clustering coefficient metrics.

Figure 10
Box plots of the local clustering coefficient.

4.9 Eccentricity

Table 17 displays the 10 researchers with the lowest eccentricity values, which correspond to the most central ones. These values were obtained in three distinct forms: unweighted eccentricity - UE; W-weighted eccentricity - WE; and Z-weighted eccentricity - ZE. Researchers of different levels are listed in this table. These are the researchers with the smallest maximum distances to any other researcher in the giant component of the network. Many of the researchers in the top 10 positions of UE do not appear in the top 10 according to WE or ZE. In fact, as the weights of the edges, alone or combined with the nodes’ attributes, shorten the paths, other researchers were prioritized. UNESP-BAU, UFSCar and USP are frequent among the institutions.

Table 17
The 10 researchers best positioned in eccentricity.

The smaller the eccentricity, the better the relationship of a researcher with other researchers. However, none of the eccentricity metrics significantly impacts the fellowship level, as shown in Table 18.

Table 18
Correlations of the eccentricity with the fellowship levels.

The box plots in Figure 11 were used to assess the eccentricity distribution of the researchers for different fellowship levels. In the first two graphs, level 1A features the largest variations and levels 2 and 1D the highest medians. In the last graph, the largest variation is obtained by level 1C. Moreover, levels 2, 1D and 1C exhibit approximately the same median.

Figure 11
Box plots of the eccentricity.

4.10 Utility

The top 10 researchers according to their utility, or benefit of belonging to the network structure, are shown in Table 19. The utility was obtained in three different ways: unweighted utility - UU; W-weighted utility - WU; and Z-weighted utility - ZU. The researchers from the lower levels 2 and 1D are the ones that have the greatest benefits in UU, and the level 1A researchers occupy 50% of the top 10 positions according to WU and ZU. All researchers in the WU list also appear in the ZU list. Once more, PQ124 has a prominent position in the rankings according to all methods.

Table 19
The 10 researchers best positioned according to the utility.

The benefit of a researcher belonging to the network impacts significantly and in a moderate way his fellowship level, as shown in Table 20. The highest correlation is obtained with ZU followed by UU.

Table 20
Correlations of the utility with the fellowship levels.

Figure 12 presents the box plots of the utility metrics, which show the central position and the dispersion for different fellowship levels. Level 1A presents the largest variability and the highest median.

Figure 12
Box plots of the utility.

4.11 Logistic regression

Our Kendall correlation analysis shows only the existence of an association between SNA metrics and the level of productivity, but not the effect of the independent variables on the dependent variable. To assess that effect, we use a multivariate logistic regression analysis, where the SNA metrics are the independent variables and the level of the research productivity grant is the dependent variable. Since there are few researchers at each level 1 fellowship, we grouped all level 1 fellowships into a single group (coded 1), and level 2 fellowships received the value 0. In this analysis, we consider only the researchers that belong to the main component of the network, since the values of some metrics for nodes in different components may not be comparable.

In the first model, a logistic regression analysis was performed to determine the effects of the unweighted metrics (U) on the fellowship level of researchers. The metrics considered in this regression were: E-I indexl , E-I indexr , UDC, UCC, UBC, UEC, ULC, UE and UU. The UPR was excluded to avoid the effect of multicollinearity. The logistic regression model was statistically significant, χ2 = 37.859, p<0.0005. The model explained 45.30% (Nagelkerke R2) of the variation of the fellowship level and correctly classified 77.17% of the cases. Of the nine predictors, only two are statistically significant; the result is shown in Table 21. Thus, we conclude that the unweighted metrics E-I indexl and Utility positively influence the researchers’ fellowship level.

Table 21
Model summary.

In the second model, a logistic regression was performed to determine the effects of the weighted metrics (W) on the fellowship level of researchers. The metrics considered in this regression were: E-I indexl _W, E-I indexr , WDC, WCC, WBC, WEC, WLC, WE, WU and WLS. The WPR was excluded to avoid the effect of multicollinearity. The logistic regression model was statistically significant, χ2 = 32.543, p<0.0005. The model explained 39.78% (Nagelkerke R2) of the variation of the level of productivity and correctly classified 75.00% of the cases. Of the ten predictor variables, four are statistically significant; the result is shown in Table 22 below. According to the result, we conclude that the weighted metrics E-I indexl , Betweenness centrality, Utility and Eccentricity positively influence the researchers’ fellowship level.

Table 22
Model summary.

In the third model, the effects of the weighted metrics with the insertion of the nodes’ attributes (Z) on the fellowship level of researchers were also analyzed by logistic regression. The metrics considered in this regression were: E-I indexl_Z, E-I indexr_Z, ZCC, ZBC, ZEC, ZLC, ZE, ZU and ZLS. The ZDC and ZPR were excluded to avoid multicollinearity. The logistic regression model was statistically significant, χ2 = 35.182, p<0.0005. The model explained 42.43% (Nagelkerke R2) of the variation in the level of productivity and correctly classified 77.2% of the cases. Of the nine predictor variables, five are statistically significant; the result is summarized in Table 23. We find that five weighted metrics with the insertion of the nodes’ attributes (E-I indexl_Z, Betweenness centrality, Eigenvector centrality, Utility and Average of the strong links) positively influence the researchers’ fellowship level.

Table 23
Model summary.
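In each model, the PageRank (and, in the third model, also the degree centrality) was dropped to avoid multicollinearity. The text does not say how the collinear predictors were identified; one common screen, sketched below as an assumption, is to flag pairs of predictors whose absolute Pearson correlation exceeds a threshold. The 0.9 threshold, the toy values and the helper name `collinear_pairs` are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def collinear_pairs(metrics, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(metrics.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical values of three unweighted metrics for six fellows.
metrics = {
    "UDC": [1, 2, 3, 4, 5, 6],
    "UPR": [0.11, 0.19, 0.32, 0.41, 0.49, 0.61],
    "ULC": [0.9, 0.1, 0.5, 0.3, 0.8, 0.2],
}
flagged = collinear_pairs(metrics)
print(flagged)
```

In this toy data only the pair (UDC, UPR) is flagged, so one of the two would be dropped before fitting. Variance inflation factors are a common alternative screen.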

Comparing the three models, we see evidence that SNA metrics that incorporate the edge weights and the authors’ attributes provide more information about the ways a researcher can achieve higher productivity. Moreover, the E-I indexl and the Utility are the metrics that most influenced the fellowship level: they are significant in all three models. Thus, researchers who collaborate with other researchers who devote most of their attention to their mutual projects, and who seek partnerships with researchers from different fellowship levels, are more likely to hold a level 1 fellowship. The betweenness centrality also contributes positively to the fellowship level, but only in the second and third models; thus, researchers who act as “intermediaries” on paths whose connections are more frequent and that pass through the most important nodes tend to hold a level 1 fellowship.

(Souza et al. 2016) also developed a logistic regression model to verify the influence of unweighted metrics (Degree Centrality, Betweenness Centrality, Closeness Centrality, Eigenvector Centrality, Eccentricity, Cluster Coefficient and Utility) on the fellowship level of researchers in the Probability and Statistic area. In their results, the degree centrality had a positive effect and the average distance (which was defined as closeness centrality in Souza et al. (2016)) had a negative effect on the fellowship level.

5 CONCLUSIONS

A co-authorship network of 145 researchers with a CNPq research productivity grant in the area of Industrial Engineering in Brazil was built, and a total of 32 SNA metrics were calculated. These metrics were divided into unweighted, weighted by edges, and weighted by edges and nodes. The metrics analyzed were: the E-I index, the Degree centrality, the Average link strength, the Closeness centrality, the Betweenness centrality, the Eigenvector centrality, the PageRank, the Eccentricity, the Local clustering coefficient and the Utility.

The unweighted metrics that showed the greatest association with the fellowship level were: the E-I indexl (0.406), the utility (0.251), the degree centrality (0.244), the betweenness centrality (0.241) and the PageRank (0.240). The metrics weighted by edges that presented the highest association with the fellowship level were: the betweenness centrality (0.307), the E-I indexl (0.343) and the utility (0.228). The metrics weighted by edges and nodes that presented the highest association with the fellowship level were: the betweenness centrality (0.367), the E-I indexl (0.302), the utility (0.257) and the closeness centrality (0.229).
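The association values above are Kendall correlations between a metric and the fellowship level. As a reminder of how such a coefficient is obtained, the sketch below computes Kendall's tau-b, the variant that corrects for ties. Whether the authors used tau-b or another variant is not stated here, so that choice is an assumption, as are the toy data and the ordinal coding of the levels.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty))."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue          # tied on both variables: counted nowhere
            if dx == 0:
                ties_x += 1       # tied only on x
            elif dy == 0:
                ties_y += 1       # tied only on y
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denominator = math.sqrt((concordant + discordant + ties_x)
                            * (concordant + discordant + ties_y))
    return (concordant - discordant) / denominator

# Hypothetical data: a centrality score and an ordinal fellowship code
# (0 = level 2, 1 = any level 1), which produces many ties on y.
centrality = [1, 2, 3, 4]
level_code = [0, 0, 1, 1]
tau = kendall_tau_b(centrality, level_code)
print(round(tau, 4))
```

Because fellowship levels take only a few distinct values, ties are frequent, which is why a tie-corrected variant is used in this sketch.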

Thus, the major conclusions of this paper are:

  • Compared with the co-authorship network of PQs in the Probability and Statistic area, the Industrial Engineering community is larger, but the collaboration among its PQs is not as strong;

  • The E-I indexl analysis shows that geographical distance is still a major barrier to collaboration among PQs in the Industrial Engineering community;

  • researchers who assume the role of mediator (greater betweenness centrality), controlling the flow of information, tend to have higher fellowship levels, especially those lying on paths that are formed by more frequent connections or that feature more important researchers;

  • researchers ranked higher by the unweighted PageRank also have higher fellowship levels. If the PageRank is weighted by edges and nodes, the impact on the fellowship level is lower;

  • researchers with the highest unweighted degree centralities, namely, the greatest numbers of co-authors, tend to have higher fellowship levels. If the degree centrality is weighted, this trend weakens;

  • researchers with greater closeness centralities have greater possibilities for establishing publication partnerships and hold higher fellowship levels, especially if the partners are researchers with the highest h-index;

  • researchers who benefit most from belonging to the network (greater Utility) also have the highest fellowship levels, especially those who have a high h-index or collaborate with researchers with the highest h-index;

  • researchers with more heterogeneous co-authoring relationships tend to have higher fellowship levels. When the edge weights are taken into account, the correlation with the fellowship level decreases slightly; when both the edge weights and the nodes’ attributes are considered, the correlation rises again, but not to the level of the unweighted one;

  • UNISINOS, UFF, USP, UFScar, UFPE and UTFPR are the institutions that hold the highest numbers of PQs among the top 10 according to some SNA metrics;

  • PQ124, from UFScar, was the most important researcher in the network according to the greatest number of metrics;

  • Finally, according to the logistic regression analysis, the only metrics that, in all three models, influence whether a fellowship is of level 1 rather than level 2 are the E-I index and the Utility. This implies that level 2 PQs who wish to obtain a level 1 fellowship should both collaborate with level 1 PQs and concentrate their collaboration on PQs who devote much of their collaborative effort to that relationship.
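Several of the conclusions above rest on the E-I index of Krackhardt & Stern (1988), which compares a node's external ties (E, to nodes of a different group) with its internal ties (I, to nodes of the same group) as (E - I)/(E + I). The sketch below is a hedged illustration of a per-node, optionally weighted version. The grouping attribute, the toy edges, the use of the number of joint papers as edge weight and the convention of returning 0 for an isolated node are assumptions made for the example, not details taken from this article.

```python
def ei_index(node, edges, group):
    """E-I index of one node: (E - I) / (E + I), ranging from -1 to +1."""
    external = internal = 0.0
    for u, v, weight in edges:
        if node not in (u, v):
            continue
        other = v if u == node else u
        if group[other] == group[node]:
            internal += weight   # tie inside the node's own group
        else:
            external += weight   # tie to a different group
    total = external + internal
    # Assumed convention: an isolated node gets 0 rather than raising an error.
    return (external - internal) / total if total else 0.0

# Hypothetical network: (fellow, fellow, number of joint papers).
edges = [("A", "B", 1), ("A", "C", 1), ("A", "D", 1), ("B", "C", 1)]
# Hypothetical grouping attribute, here the fellowship level.
group = {"A": "2", "B": "1", "C": "1", "D": "2"}

print(ei_index("A", edges, group))  # two external ties, one internal tie
print(ei_index("D", edges, group))  # -1.0: only internal ties
```

A value close to +1 means that a fellow collaborates mostly outside his or her own group, and a value close to -1 means mostly inside it. Setting every weight to 1 gives the unweighted index.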

It is important to note that fellowships are granted for a period ranging from 3 years (level 2) up to 5 years (level 1A). Thus, researchers compete only with those in the same time cycle, which can cause some discrepancies between different cycles. However, in August 2013, all fellowship levels were reclassified (Figueiredo 2013) to reduce those discrepancies. Since our data were collected in March 2014, this time-cycle problem was mitigated.

It is worth mentioning that the network was formed only among researchers with a CNPq research productivity grant in the area of Industrial Engineering in Brazil. Thus, the network does not include the co-authorship relations between them and non-fellow authors or PQs from other areas of knowledge. In future work, one can build a network that includes, beyond the relations among the fellows, their other co-authorship relations, and compare the resulting correlations of the network metrics with those studied in this work. Other types of author weights, besides the h-index, may also be studied to evaluate how they would change the results shown here.

REFERENCES

  • 1
    ABBASI A & ALTMANN J. 2011. On the correlation between research performance and social network analysis measures applied to research collaboration networks. In: Hawaii International Conference on System Sciences, Proceedings of the 41st Annual. Waikoloa, HI: IEEE.
  • 2
    ABBASI A, ALTMANN J & HOSSAIN L. 2011. Identifying the effects of co-authorship networks on the performance of scholars: A correlation and regression analysis of performance measures and social network analysis measures. Journal of Informetrics, 5: 594-607.
  • 3
    ABBASI A, HOSSAIN L & LEYDESDORFF L. 2012. Betweenness centrality as a driver of preferential attachment in the evolution of research collaboration networks. Journal of Informetrics, 6: 403-412.
  • 4
    ANASTASIOS T, SGOIROPOULOU C, PAPAGEORGIOU E, TERRAZ O & MIAOULIS G. 2012. Co-authorship networks in academic research communities: the role of network strength. In: 16th Panhellenic Conference on Informatics.
  • 5
    ANDRADE RL. 2016. A Influência das Redes de Coautoria na Performance dos Bolsistas de Produtividade e nos Programas de Pós-Graduação em Engenharia de Produção. Recife. 2016. 144 p. Mestrado - Programa de Pós-graduação em Engenharia de Produção/UFPE.
  • 6
    ANDRADE RL & RÊGO LC. 2015a. Conhecendo a rede de coautoria dos bolsistas de produtividade em pesquisa da área de engenharia de produção e a sua influência no nível de produtividade. In: XLVII Simpósio Brasileiro de Pesquisa Operacional - SBPO.
  • 7
    ANDRADE RL & RÊGO LC. 2015b. A influência da rede de coautoria no nível das bolsas de produtividade da área de engenharia de produção. In: XXXV Congresso da Sociedade Brasileira de Computação - CSBC.
  • 8
    BARNETT AH, AULT RW & KASERMAN DL. 1988. The Rising Incidence of Co-authorship in Economics: Further Evidence. Review of Economics and Statistics, 70: 539-543.
  • 9
    BONACICH P. 1987. Power and centrality: a family of measures. The American Journal of Sociology, 92: 1170-1182.
  • 10
    CN. 2015. Critérios de Julgamento. Available at: <http://www.cn.br/web/guest/criterios-de-julgamento>. Accessed on: 28 January 2015.
  • 11
    EATON JP, WARD JC & KUMAR A. 1999. Structural Analysis of Co-Author Relationships and Author Productivity in Selected Outlets for Consumer Behavior Research. Journal of Consumer Psychology, 8: 39-59.
  • 12
    EGGHE L. 2006. Theory and practise of the g-index. Scientometrics, 69: 131-152.
  • 13
    FIGUEIREDO RW DE. 2013. CNPq reclassifica pesquisadores e 39 docentes da UFC são promovidos. Available at: <http://www.ufc.br/noticias/noticias-de-2013/4013-cnpq-reclassifica-pesquisadores-e-39-docentes-da-ufc-sao-promovidos>. Accessed on: 30 May 2017.
  • 14
    FREEMAN LC. 1979. Centrality in Social Networks Conceptual Clarification. Social Networks, 1: 215-239.
  • 15
    HAIR JF, BLACK WC, BABIN BJ, ANDERSON RE & TATHAM RL. 2009. Análise multivariada de dados. Bookman Editora.
  • 16
    HANNEMAN RA & RIDDLE M. 2005. Introduction to social network methods. Riverside: University of California.
  • 17
    HIRSCH JE. 2005. An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences, 102: 16569-16572.
  • 18
    HUANG P-Y, LIU H-Y, CHEN C-H & CHENG P-J. 2013. The impact of social diversity and dynamic influence propagation for identifying influencers in social networks, in: Web Intelligence WI and Intelligent Agent Technologies IAT, 2013 IEEE/WIC/ACM International Joint Conferences on, vol. 1, p. 410-416.
  • 19
    HUDSON J. 1996. Trends in Multi-Authored Papers in Economics. Journal of Economic Perspectives, 10: 153-158.
  • 20
    JACKSON MO. 2008. Social and Economic Networks. Princeton University Press. Stanford University, February 2008.
  • 21
    JACKSON MO & WOLINSKY A. 1996. A Strategic Model of Social and Economic Networks. Journal of economic theory, 71: 44-74.
  • 22
    KEMPE D, KLEINBERG J & TARDOS E. 2005. Influential nodes in a diffusion model for social networks. In: Automata, Languages and Programming, Springer, vol. 3580, p. 1127-1138.
  • 23
    KRACKHARDT D & STERN R. 1988. Informal networks and organizational crises: An experimental simulation. Social Psychology Quarterly, 51: 123-140.
  • 24
    KRACKHARDT D. 1992. The strength of strong ties: The importance of philos in organizations. In Networks and Organizations: Structure, Form, and Action, p. 216-239.
  • 25
    LEE S & BOZEMAN B. 2005. The impact of research collaboration on scientific productivity. Social Studies of Science, 35: 673-702.
  • 26
    LIU J, LI Y, RUAN Z, FU G, CHEN X, SADIQ R & DENG Y. 2014. A new method to construct co-author networks. Physica A, 419: 29-39.
  • 27
    LIU X, BOLLEN J, NELSON ML & SOMPEL H VAN DE. 2005. Co-authorship networks in the digital library research community. Information Processing and Management, 41: 1461-1480.
  • 28
    MENA-CHALCO JP & CESAR JR RM. 2009. ScriptLattes: An open-source knowledge extraction system from the Lattes platform. Journal of the Brazilian Computer Society, 15: 31-39.
  • 29
    NEWMAN MEJ. 2001a. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98: 404-409.
  • 30
    NEWMAN MEJ. 2001b. Scientific collaboration networks. I. Network constructions and fundamental results. Physical Review E, vol. 64.
  • 31
    NEWMAN MEJ. 2001c. Who is the best connected scientist? A study of scientific coauthorship networks. Complex Networks, 650: 337-370.
  • 32
    NEWMAN MEJ. 2004. Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences, 101: 5200-5204.
  • 33
    ONEL S, ZEID A & KAMARTHI S. 2011. The structure and analysis of nanotechnology co-author and citation networks. Scientometrics, 89: 119-138.
  • 34
    ONNELA J-P, SARAMÄKI J, KERTÉSZ J & KASKI K. 2015. Intensity and coherence of motifs in weighted complex networks. Physical Review E, 716: 4.
  • 35
    SANTOS AM DOS. 2014. Aplicações de modelos de grafos na análise de conceitos e de redes sociais. Recife. 2014. 162 p. Doutorado - Programa de Pós-graduação em Estatística/UFPE.
  • 36
    SOUZA FC DE, AMORIM RM & RÊGO LC. 2016. A Co-authorship network analysis of CNPq’s productivity research fellows in the probability and statistic area. Perspectivas em Ciência da Informação, 21(4): 29-47.
  • 37
    WANDERLEY AJ, DUARTE AN, BRITO AV DE, PRESTES MAS & FRAGOSO FC. 2014. Identificando correlações entre métricas de Análise de Redes Sociais e o h-index de pesquisadores de Ciência da Computação. In: XXXIV Congresso da Sociedade Brasileira de Computação - CSBC.
  • 38
    WAINER J & VIEIRA P. 2013. Correlation between bibliometrics and peer evaluation for all disciplines: the evaluation of Brazilian scientists. Scientometrics online, 96: 395-410.
  • 39
    WASSERMAN S & FAUST K. 1994. Social Network Analysis: Methods and Applications. Cambridge University Press. Structural analysis in the social sciences series, vol. 8.
  • 40
    YAN E & DING Y. 2009. Applying centrality measures to impact analysis: A coauthorship network analysis. Journal of the American Society for Information Science and Technology, 60: 2107-2118.

Publication Dates

  • Publication in this collection
    May-Aug 2017

History

  • Received
    16 Aug 2016
  • Accepted
    22 June 2017
Sociedade Brasileira de Pesquisa Operacional Rua Mayrink Veiga, 32 - sala 601 - Centro, 20090-050 Rio de Janeiro RJ - Brasil, Tel.: +55 21 2263-0499, Fax: +55 21 2263-0501 - Rio de Janeiro - RJ - Brazil
E-mail: sobrapo@sobrapo.org.br