EXPLORING THE CO-AUTHORSHIP NETWORK AMONG CNPQ’S PRODUCTIVITY FELLOWS IN THE AREA OF INDUSTRIAL ENGINEERING

Andrade, Ricardo Lopes de; Rêgo, Leandro Chaves

doi:10.1590/0101-7438.2017.037.02.0277

ABSTRACT

In this article, we build a co-authorship network among researchers holding a CNPq research productivity (PQ) grant in the area of Industrial Engineering and analyze which Social Network Analysis metrics impact their productivity level. Unlike other studies, which mostly analyze unweighted networks, ours explores the network more broadly, since the metrics were calculated in three ways: unweighted, including the edge weights, and including both the edge weights and the nodes’ attributes. Thus, the generated results are more precise and detailed, since more information is obtained. We consider the h-index of the researchers as the nodes’ attribute and measure the impact using Kendall correlation. We show that geographical distance is still a barrier to collaboration among PQs in this area and that collaboration with researchers with different grant levels has the greatest impact on the level of the grant a researcher has.

Keywords:
weighted co-authorship network; nodes’ attributes; scientific productivity

1 INTRODUCTION

Co-authorship, the development of a publication by two or more authors, is a form of collaboration. For (Hudson 1996), co-authorship is the most formal expression of intellectual collaboration in scientific research, and the biggest gain of collaboration is to enable an efficient division of tasks, through complementary skills or synergy (joint creation of new ideas, not achieved individually). The result of this efficient division of tasks is scientific production of higher quality and/or quantity. These results had already been reported by (Barnett et al. 1988) as the reason that leads researchers to work together. Other works, such as (Eaton et al. 1999) and (Lee & Bozeman 2005), also point to productivity as a result of collaboration. (Hart 2000) shows that collaboration improves the quality of publications.

Co-authoring, writing the same paper with other authors, is a form of collaboration which implies a temporal and academic relationship, where authors share ideas and resources. One of the most famous co-authorship networks is that of the mathematician Paul Erdős, who had more than 500 co-authors and more than 1,400 published works. The role of Erdős as a collaborator was so significant in the field of mathematics that the Erdős number was defined to measure the proximity to Erdős through the co-authorship network (Liu et al., 2014). Anyone who has published with Erdős has an Erdős number equal to 1, those who have published with a co-author of Erdős have an Erdős number equal to 2, and so on (Newman, 2001c).

For (Kumar 2015), studies on co-authorship gained new interest after (Newman 2001a, 2001b, 2001c, 2004) used methods of social network analysis to investigate the characteristics and interesting patterns of academic communities. (Kempe & Kleinberg 2005) also reported the emergence of many studies of co-authorship network analysis that try to identify the most influential authors in the network. The works of (Huang et al. 2013) and (Liu et al. 2014) studied the co-authorship network to assess the status of an author in a particular field, so that relations can be strengthened to get closer to the community core by identifying the most influential researchers.

There is increasing interest in studying the influence of social structure on the behavior and performance of researchers through social network analysis (SNA). Many of these studies seek to correlate the key centrality metrics of the network with measures based on the number of citations, such as the h-index (“A researcher has an h-index of k, if k of his N works have at least k citations each, and the other (N - k) papers have at most k citations each”, Hirsch, 2005). Such measures, among other factors, can be used to determine the quality of publications.

The work of (Yan & Ding 2009) correlated four centrality metrics (degree, closeness, betweenness and PageRank) with the number of citations of the publications of the authors in a co-authorship network. These metrics had significant correlations with the citation counts, especially the betweenness centrality.

In the study of (Abbasi & Altmann 2011), the normalized metrics of degree centrality, betweenness centrality and closeness centrality, together with the weighted degree centrality and efficiency (the ratio between the total number of distinct groups, whose nodes are directly connected, connected by a single central node and the degree of this node), were correlated with the h-index. The results showed that the h-index of the researchers had significant positive correlations only with the degree centrality and efficiency. In the same year, (Abbasi et al. 2011) published a paper that analyzed the influence of six SNA metrics on the g-index: degree centrality, closeness centrality, betweenness centrality, eigenvector centrality (all normalized), average link strength and efficiency. The g-index is defined as follows: “Given a set of ordered papers in a decreasing way with respect to the number of citations, the g index is the highest value of g in that the first g articles receive together at least $g^{2}$ citations” (Egghe, 2006). The authors concluded that only the normalized degree centrality, efficiency, and the average link strength had significant influences on the g-index.
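As an illustration of the two citation-based indices quoted in this section, the following minimal Python sketch computes the h-index and the g-index from a list of citation counts. The function names and the citation list are hypothetical, and the g-index is capped at the number of papers, which is one common convention and an assumption here.

```python
def h_index(citations):
    """Largest k such that k papers have at least k citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the g most cited papers total at least g^2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts of one researcher's papers.
citations = [10, 8, 5, 4, 3]
print(h_index(citations), g_index(citations))
```

For this list, four papers have at least four citations each, and the five papers together have 30 citations, which is at least 25.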

Another work that correlates SNA metrics with the h-index was published by (Wanderley et al. 2014). The authors created a co-authorship network among researchers of Computer Science and calculated the normalized metrics of degree centrality, closeness centrality and betweenness centrality, as well as the weighted degree centrality and authority (calculated by adding the number of hubs, nodes with many links, with which a node is connected). Only the betweenness centrality and the weighted degree centrality had significant positive correlations. The authority also showed a significant, but negative, correlation.

(Souza et al. 2016), in a co-authorship network among researchers with a CNPq research productivity grant in the area of Statistics, showed that the most productive fellows are also the most central in the network and that the degree centrality and closeness centrality metrics had a higher impact on the number of articles published by a fellow.

According to (CNPq 2015), the research productivity fellowship (PQ) is organized in levels, in ascending order: 2, 1D, 1C, 1B, 1A. The PQ is awarded to researchers from all areas of knowledge in Brazil, based not only on the quality of a submitted project, but mainly on the “quality” of the researcher (Wainer & Vieira, 2013).

The work of (Fonseca & Digiampietri 2016) builds two kinds of classifiers using SNA metrics and other bibliometric measures as attributes. The first kind identifies which researchers in the area of Computer Science are fellowship holders, and the other kind identifies the fellowship level of a given researcher. Other studies analyzing the impact of co-authorship networks on the performance of researchers were presented in (Andrade & Rêgo 2015a, 2015b).

In these previous works, weighted metrics, i.e., metrics calculated considering the frequency of collaboration, were not used, with the exception of the weighted degree centrality. To the best of our knowledge, there are few works exploring such metrics in SNA. (Liu et al. 2015) proposed a method that inserts the importance (based on citations) of researchers into co-authorship network structures by redefining the weight of the edges. This weight is used in the calculation of PageRank, applied to the Erdős network, to identify the most influential authors. (Andrade 2016) also developed a metric that inserts the importance of nodes into the network structure: in that work, the weight of a given edge is equal to the average of the attributes of the nodes connected by this edge times the original weight of the edge. That work addressed the impact of the nodes’ attributes on different SNA metrics.

The objective of this work is to identify the researchers with the CNPq grant in research productivity in the area of Industrial Engineering in Brazil, to analyze their academic achievements in terms of published papers, to construct a co-authorship network among such researchers, to analyze the characteristics of the network and to verify which SNA metrics have more impact on their productivity level. SNA metrics will be calculated in three ways: unweighted, weighted by the edge weights, and weighted by both the edge weights and the nodes’ attributes.

Our third analysis of social network metrics is in the context of non-homogeneous nodes. Thus, being in equivalent positions in the network may have different impacts on the SNA metrics if nodes’ attributes are taken into consideration. The h-index is a node attribute that is freely available and is clearly related to the prestige of a researcher. Therefore, it was chosen to be applied in this context.

This work is structured as follows: in this first section we present a review of the studies that analyzed the relationship between SNA metrics, co-authorship networks and the performance of researchers; Section 2 briefly presents the SNA metrics used in this work; Section 3 describes the methodology used to create the co-authorship network; the co-authorship network and the impact of individual SNA metrics on the level of productivity are presented in Section 4. Finally, Section 5 presents the final considerations of the study and proposals for future work.

2 SOCIAL NETWORK ANALYSIS METRICS

A weighted network can be defined as a set of nodes, V(G), a set of edges, E(G), which consists of ordered pairs of nodes, and a weighted adjacency matrix, W(G), where $w(v_i, v_j)$ represents the weight associated with the edge connecting the pair of vertices $v_i$ and $v_j$. We assume that $w(v_i, v_j) = w(v_j, v_i)$, since co-authorship is a symmetric relation.

(Liu et al. 2015) and (Andrade 2016) developed methods to include nodes’ attributes in SNA. Thus, it is possible to classify networks as unweighted, weighted by edges, and weighted by edges and nodes. Since Liu et al.’s method transforms the network into an asymmetric relation, we do not view it as appropriate to study co-authorship. Therefore, we focus here on Andrade’s method, which maintains the symmetric nature of co-authorships.

The metric proposed by (Andrade 2016) as a way to take into consideration the importance of the node in the network context is given by:

Z (v_{i}, v_{j}) = w (v_{i}, v_{j}) \times (\frac{s (v_{i}) + s (v_{j})}{2}),

(1)

where $Z(v_i, v_j)$ equals the edge weight $w(v_i, v_j)$ between vertices $v_i$ and $v_j$, combined with the attributes of these vertices, $s(v_i)$ and $s(v_j)$, respectively. With the incorporation of the attributes of the nodes in the network, Z(G) shall be the new weighted adjacency matrix and $Z(v_i, v_j)$ the new edge weight between vertices $v_i$ and $v_j$. The attributes of the vertices are measurable characteristics associated with the type of relationship that connects them.
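Equation (1) can be illustrated with a short Python sketch. The edge dictionary, the researcher labels and the h-index values below are hypothetical; the only part taken from the text is the formula itself, with the h-index playing the role of the node attribute s.

```python
def attribute_weighted_edges(weights, attributes):
    """Apply Z(v_i, v_j) = w(v_i, v_j) * (s(v_i) + s(v_j)) / 2 to every edge."""
    return {
        (u, v): w * (attributes[u] + attributes[v]) / 2
        for (u, v), w in weights.items()
    }


# Hypothetical data: number of joint papers per pair and h-index per researcher.
weights = {("A", "B"): 3, ("B", "C"): 1}
h = {"A": 10, "B": 6, "C": 2}

z = attribute_weighted_edges(weights, h)
print(z)
```

Because the average of the two attributes is symmetric in its arguments, the new weights keep the symmetric nature of co-authorship.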

A binary network with n vertices is represented by an adjacency matrix A(G) with n × n elements, where

a(v_{i}, v_{j}) = \begin{cases} 1 & \text{if } (v_{i}, v_{j}) \in E(G), \text{ i.e., if } v_{i} \text{ and } v_{j} \text{ are connected}, \\ 0 & \text{otherwise.} \end{cases}

(2)

The SNA metrics can be divided into global, describing the characteristics of the whole graph, and individual, which are related to the analysis of individual properties of network actors (nodes or vertices).

The number of edges, as the name suggests, refers to the cardinality of the set of edges, E(G), denoted by #E(G).

A path between two vertices, $v_i$ and $v_j$, is a sequence of vertices $c = (v_0, v_1, v_2, \dots, v_k)$ such that $v_0 = v_i$, $v_k = v_j$, $v_l$ is adjacent to $v_{l+1}$ for $l = 0, 1, \dots, k - 1$, and no vertex appears more than once in the sequence. A set of vertices is said to be connected if there exists a path between any two vertices in the set. A graph is connected if there is a path between any two vertices and is complete if every vertex is adjacent to every other vertex.

The density measures how close the graph is to being complete, that is, the ratio between the number of connections in the graph and the number of connections there would be if all vertices were connected to each other. For an undirected graph with n nodes, the density is defined as:

Dens(G) = \frac{2 \times \#E(G)}{n \times (n - 1)}

(3)

A geodesic path or shortest path is the shortest path between two vertices (Newman 2004). The geodesic path length, $d(v_i, v_j)$, also called geodesic distance or shortest distance, is thus the shortest distance in the network between these two vertices. Given a path $c = (v_0, v_1, v_2, \dots, v_k)$ between vertices $v_i$ and $v_j$, the length of this path is denoted by $d_c$. Let $C(v_i, v_j)$ be the set of all paths between vertices $v_i$ and $v_j$; then the geodesic distance is defined by:

d(v_{i}, v_{j}) = \min \{d_{c} : c \in C(v_{i}, v_{j})\}.

(4)

In the case of weighted networks, the length of a path $c = (v_0, v_1, v_2, \dots, v_k)$ between vertices $v_i$ and $v_j$ is defined as the sum of the inverses of the edge weights along the path, so that shortest paths can be found with Dijkstra's algorithm (Newman 2001c) and (Brandes 2001):

d_{c}^{w} = (\frac{1}{w (v_{0}, v_{1})} + \frac{1}{w (v_{1}, v_{2})} + \dots + \frac{1}{w (v_{(k - 1)}, v_{k})}) .

(5)

And the weighted geodesic distance is given by:

d^{w}(v_{i}, v_{j}) = \min \{d_{c}^{w} : c \in C(v_{i}, v_{j})\}.

(6)
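The weighted geodesic distance of Equations (5) and (6) can be sketched in Python with Dijkstra's algorithm, using 1/w as the cost of each edge. The adjacency dictionary and its weights are hypothetical, and the function name is an assumption made for illustration.

```python
import heapq


def weighted_geodesic(adj, source):
    """Dijkstra's algorithm where crossing an edge of weight w costs 1 / w."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # outdated queue entry
        for v, w in adj[u].items():
            candidate = d + 1.0 / w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Hypothetical symmetric network: adj[u][v] is the number of joint papers.
adj = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2},
    "C": {"A": 1, "B": 2},
}
dist = weighted_geodesic(adj, "A")
print(dist)
```

In this example the direct edge from A to C has length 1/1 = 1, while the route through B has length 1/4 + 1/2 = 0.75, so frequent collaborators end up "closer" even when the route has more edges.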

The largest geodesic distance between any pair of vertices is called the diameter of a graph and in a binary network, it can vary from a minimum of 1, if the graph is complete, to a maximum of n - 1, where n is the number of vertices in the graph. Formally, the diameter of the connected graph G is given by:

Dim(G) = \max_{v_{i}, v_{j} \in V(G)} d(v_{i}, v_{j})

(7)

In the case of weighted networks, the weighted diameter is calculated using the weighted geodesic distance, $d^{w}(v_i, v_j)$.

The size of the largest connected component, also known as the giant component, is the cardinality of the connected component with the highest number of nodes.

Given a vertex $v_i$, the eccentricity, $e(v_i)$, is the maximum distance from it to any other vertex of the graph. The smaller the eccentricity, the better the relationship of a vertex to the other vertices. The eccentricity of $v_i$ is given by:

e (v_{i}) = \max_{v_{j} \in V (G)} d (v_{i}, v_{j})

(8)

The diameter, as defined above, is equal to the maximum eccentricity, while the minimum eccentricity is the radius. In the case of weighted networks, the eccentricity may be calculated using the weighted geodesic distance, $d^{w}(v_i, v_j)$.
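The relation between eccentricity, diameter and radius can be checked with a small Python sketch for the unweighted case. It assumes a connected graph stored as a hypothetical adjacency dictionary and uses breadth-first search to obtain the geodesic distances.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted geodesic distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eccentricities(adj):
    """Eccentricity of every vertex, as in Equation (8); assumes a connected graph."""
    return {v: max(bfs_distances(adj, v).values()) for v in adj}


# Hypothetical path graph A - B - C - D.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
ecc = eccentricities(adj)
diameter = max(ecc.values())
radius = min(ecc.values())
print(ecc, diameter, radius)
```

The path graph reaches the maximum diameter of n - 1 = 3 mentioned above, while its two middle vertices give the radius of 2.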

The degree centrality, proposed by (Freeman 1979), is calculated in terms of the number of adjacent vertices: the degree centrality of vertex $v_i$, denoted by $C_d(v_i)$, is the number of vertices adjacent to $v_i$. Formally, the degree centrality is defined by:

C_{d} (v_{i}) = \sum_{j = 1}^{n} a (v_{i}, v_{j})

(9)

If the network is weighted, the degree centrality of vertex $v_i$ is equal to the sum of the weights of the edges that are connected to $v_i$. For (Newman 2004) and (Barrat et al. 2004), the weighted degree centrality is defined by:

C_{d}^{w} (v_{i}) = \sum_{j = 1}^{n} w (v_{i}, v_{j}) .

(10)

The degree centrality is the simplest and easiest way to measure the influence of a node (Abbasi et al., 2012; Liu et al., 2005). In a co-authorship network, this metric identifies the most active and popular authors (Abbasi et al., 2011; Anastasios et al., 2012; Freeman, 1979).

Another metric to analyze a node in the network started from the theory of “the strong links” of (Krackhardt 1992). The average link strength of vertex $v_i$ is defined as the ratio between the weighted degree, $C_{d}^{w}(v_i)$, and the degree centrality, $C_d(v_i)$:

LS(i) = \frac{C_{d}^{w}(v_{i})}{C_{d}(v_{i})}

(11)

Therefore, LS(i) represents the average weight of the links of node $v_i$.
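Equations (9) to (11) are simple enough to sketch together in Python. The network below is hypothetical, and returning 0 for the link strength of an isolated node is an assumption of this sketch, since the ratio is undefined when the degree is zero.

```python
def degree(adj, v):
    """Number of vertices adjacent to v, as in Equation (9)."""
    return len(adj[v])


def weighted_degree(adj, v):
    """Sum of the weights of the edges incident to v, as in Equation (10)."""
    return sum(adj[v].values())


def link_strength(adj, v):
    """Average weight of the links of v, as in Equation (11)."""
    d = degree(adj, v)
    return weighted_degree(adj, v) / d if d else 0.0  # assumed value for isolated nodes


# Hypothetical network: adj[u][v] is the number of papers co-authored by u and v.
adj = {"A": {"B": 3, "C": 1}, "B": {"A": 3}, "C": {"A": 1}, "D": {}}
print(degree(adj, "A"), weighted_degree(adj, "A"), link_strength(adj, "A"))
```

Author A has two co-authors and four joint papers in total, so on average two papers per collaborator.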

A metric that takes into consideration the geodesic distance from a given initial node to all other nodes of the network is the closeness centrality. (Freeman 1978) asserted that the closeness centrality of vertex $v_i$, denoted by $C_c(v_i)$, is given by:

C_{c} (v_{i}) = \frac{1}{\sum_{j} d (v_{i}, v_{j})}

(12)

The most central vertices in a network according to this metric are those that have a smaller distance to the other vertices. In weighted networks, the weighted closeness centrality is given by:

C_{c}^{w} (v_{i}) = \frac{1}{\sum_{j} d^{w} (v_{i}, v_{j})}

(13)

A node that is on average in a position closer to the other nodes can get information more efficiently, i.e., the closeness metric is related to independence and efficient communication with other nodes (Freeman, 1979).

The betweenness centrality of vertex $v_i$ is the sum, for every pair of nodes different from $v_i$, of the ratio between the number of shortest paths between the given pair of nodes that go through $v_i$ and the total number of shortest paths between the given pair of nodes (Freeman, 1979; Wasserman & Faust, 1994). The betweenness centrality, $C_b(v_i)$, of vertex $v_i$ is given by:

C_{b} (v_{i}) = \sum_{j, k} \frac{g (v_{j}, v_{i}, v_{k})}{g (v_{j}, v_{k})}, j \neq k \neq i,

(14)

where $g(v_j, v_k)$ is the number of shortest paths between vertex $v_j$ and vertex $v_k$ and $g(v_j, v_i, v_k)$ is the number of shortest paths between vertex $v_j$ and vertex $v_k$ going through $v_i$.

In a weighted network the betweenness centrality is given by:

C_{b}^{w} (v_{i}) = \sum_{j, k} \frac{g^{w} (v_{j}, v_{i}, v_{k})}{g^{w} (v_{j}, v_{k})},

(15)

where $g^{w}(v_j, v_k)$ is the number of weighted shortest paths between vertex $v_j$ and vertex $v_k$ and $g^{w}(v_j, v_i, v_k)$ is the number of weighted shortest paths between vertex $v_j$ and vertex $v_k$ going through $v_i$, considering the weighted distance, $d^{w}(v_i, v_j)$.

The betweenness is an indicator of the potential of a node to play the role of “mediator” or “gatekeeper” (Freeman, 1979; Abbasi et al., 2012), being able to control the flow of information on the network more often.
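The unweighted betweenness of Equation (14) can be sketched in Python by counting shortest paths with breadth-first search. This is a direct, unoptimized illustration rather than the faster algorithm of (Brandes 2001); it assumes that each unordered pair of vertices is counted once, and the example graph is hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): distances and numbers of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma


def betweenness(adj):
    """Sum over unordered pairs (j, k) of g(j, i, k) / g(j, k) for each vertex i."""
    info = {v: bfs_counts(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for j, k in combinations(adj, 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue  # j and k are not connected
        for i in adj:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, sigma_i = info[i]
            # i lies on a shortest j-k path when the two legs add up exactly
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                score[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return score


# Hypothetical network: a square A-B-D-C-A with a pendant vertex E attached to D.
adj = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
score = betweenness(adj)
print(score)
```

Vertex D, which every path to E must cross, obtains the highest score, matching the "gatekeeper" interpretation.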

As a metric of the importance of a vertex based on its connections, the eigenvector centrality is supported by the idea that a particular node will have high centrality if it is connected to vertices with central positions in the network (Bonacich, 1987). In other words, the centrality of the vertex does not depend only on the number of adjacent vertices but also on the centrality of these vertices. Let λ be a constant; then the eigenvector centrality of $v_i$, $C_e(v_i)$, is given by:

C_{e}(v_{i}) = \frac{1}{\lambda} \sum_{j = 1}^{n} a(v_{i}, v_{j}) C_{e}(v_{j})

(16)

Using vector notation, let $X = (C_e(1), C_e(2), \dots, C_e(n))$ be the vector of eigenvector centralities; we can then rewrite Equation (16) as λX = AX. By assuming that the eigenvector centrality takes only non-negative values (using the Perron-Frobenius theorem), it can be shown that λ is the largest eigenvalue of the adjacency matrix and X is the corresponding eigenvector (Jackson, 2008).

In the case of weighted networks, the elements of the adjacency matrix are the weights of the edges, $w(v_i, v_j)$ (Newman, 2004), and the eigenvector centrality is defined by:

C_{e}^{w}(v_{i}) = \frac{1}{\lambda} \sum_{j = 1}^{n} w(v_{i}, v_{j}) C_{e}^{w}(v_{j})

(17)
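One common way to solve λX = AX numerically is power iteration, sketched below in Python for a weighted adjacency matrix. This is an illustrative choice, not necessarily the procedure used by the authors; it assumes a connected, non-bipartite network with at least one edge, and the matrix W is hypothetical.

```python
def eigenvector_centrality(matrix, iterations=1000, tol=1e-12):
    """Power iteration for lambda * X = A X; returns (lambda, X) with max(X) == 1."""
    n = len(matrix)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = max(y)                      # estimate of the largest eigenvalue
        y = [value / lam for value in y]  # rescale so the largest entry is 1
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    return lam, x


# Hypothetical weighted co-authorship matrix for three authors.
W = [
    [0, 3, 1],
    [3, 0, 1],
    [1, 1, 0],
]
lam, x = eigenvector_centrality(W)
print(lam, x)
```

The first two authors, who share the strongest tie, receive the same and highest centrality, and replacing W by a 0/1 matrix gives the unweighted version of Equation (16).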

The local clustering coefficient indicates how connected the nodes adjacent to a given node are and, together with the average value of the shortest path, can identify a “small-world” effect (networks with a large clustering coefficient and relatively short distances between the nodes), Watts & Strogatz (1998). The clustering coefficient of a vertex $v_i$ is the ratio between the number of triangles that contain vertex $v_i$ and the number of possible edges between its neighboring vertices. Let $NT(v_i)$ be the number of triangles (sets of three nodes connected by three links) containing vertex $v_i$. For (Onnela et al. 2005), the local clustering coefficient is defined as:

CCL(v_{i}) = \frac{2 NT(v_{i})}{C_{d}(v_{i}) (C_{d}(v_{i}) - 1)}

(18)

The weighted local clustering coefficient was proposed by (Onnela et al. 2005) and is given by:

CCL^{w}(v_{i}) = \frac{2}{C_{d}(v_{i}) (C_{d}(v_{i}) - 1)} \sum_{j, k} \left( \hat{w}(v_{i}, v_{j}) \hat{w}(v_{i}, v_{k}) \hat{w}(v_{j}, v_{k}) \right)^{1 / 3},

(19)

where the weights of the edges are normalized by the maximum weight of the network, $\hat{w}(v_i, v_j) = w(v_i, v_j) / \max_{i, j \in V(G)} w(v_i, v_j)$, and the contribution of each triangle depends on all the weights of the edges.

The average clustering coefficient is the average value of the individual or local coefficients and is given by:

CL(G) = \frac{1}{n} \sum_{i} CCL(v_{i})

(20)

The clustering coefficient, CL(G), for the co-authorship network refers to the probability that any two collaborators of a researcher have collaborated with each other (Onel et al., 2011). In the individual case, the clustering coefficient of a particular author indicates how his collaborators are working together.
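The unweighted coefficients of Equations (18) and (20) can be illustrated with the following Python sketch. The network is hypothetical, and assigning a coefficient of 0 to vertices with fewer than two neighbors is a convention assumed here, since the ratio is undefined in that case.

```python
from itertools import combinations


def local_clustering(adj, v):
    """2 * NT(v) / (C_d(v) * (C_d(v) - 1)), as in Equation (18)."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention for vertices with fewer than two neighbors
    triangles = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2 * triangles / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local coefficients, as in Equation (20)."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical network: a triangle A-B-C plus a collaborator D linked only to A.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(local_clustering(adj, "A"), average_clustering(adj))
```

Only one of the three possible pairs of A's collaborators (B and C) has worked together, so the coefficient of A is 1/3.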

PageRank is a method of ranking web pages that effectively measures the interest of browsers and the attention devoted to them (Page et al. 1999). PageRank considers the number and quality of links to a web page in order to determine how influential it is (Liu et al., 2014). Let $T_A$ be a web page and $T_i$ one of the web pages that link to $T_A$. (Brin & Page 1998) defined PageRank as follows:

PR(T_{A}) = (1 - \delta) + \delta \left( \frac{PR(T_{1})}{C(T_{1})} + \dots + \frac{PR(T_{n})}{C(T_{n})} \right),

(21)

where $PR(T_A)$ is the PageRank of page $T_A$, $PR(T_i)$ is the PageRank of page $T_i$, $C(T_i)$ is the number of outbound links on page $T_i$ and δ is a damping factor, which can be set between 0 and 1 (assuming that a person randomly clicks on pages and eventually stops clicking, δ is the probability that, at any given moment, the person will continue to click).
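Equation (21) can be evaluated by repeated substitution, as in the Python sketch below. The link structure is hypothetical, every page is assumed to have at least one outbound link, and δ = 0.85 is a commonly used value rather than one stated in the text. For an undirected co-authorship network, each edge is treated as a link in both directions.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Iterate PR(T_A) = (1 - d) + d * sum(PR(T_i) / C(T_i)) over pages linking to T_A."""
    pages = list(out_links)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new = {}
        for p in pages:
            incoming = sum(
                pr[q] / len(out_links[q]) for q in pages if p in out_links[q]
            )
            new[p] = (1 - damping) + damping * incoming
        pr = new
    return pr


# Hypothetical undirected path A - B - C, written as links in both directions.
out_links = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
pr = pagerank(out_links)
print(pr)
```

With this form of the formula the scores add up to the number of pages, and the middle vertex B, which receives the full score of both neighbors, ranks highest.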

In the study of (Santos 2014) on co-authorship networks, a metric was proposed to evaluate the benefit or utility for a given author of belonging to a certain network structure. According to this metric, each author has a finite amount of time to devote to scientific collaborations, and each author receives from an adjacent author a utility that is equal to the proportion of papers that the co-author has with him plus the formation of a synergy, which is given by the product of the dedication of each author to the collaboration. Formally, the utility $U^{w}(v_i)$ of a given author $v_i$ in a given graph G is given by:

U^{w} (v_{i}) = \sum_{j} (\frac{w (v_{i}, v_{j})}{C_{d}^{w} (v_{i})} + \frac{w (v_{i}, v_{j})}{C_{d}^{w} (v_{j})} + \frac{w {(v_{i}, v_{j})}^{2}}{C_{d}^{w} (v_{i}) C_{d}^{w} (v_{j})}),

(22)

where $w(v_i, v_j)$ is the total number of works between authors $v_i$ and $v_j$, and $C_{d}^{w}(v_i)$ and $C_{d}^{w}(v_j)$ are the weighted degrees of these authors, respectively.

The utility developed by (Santos 2014) was based on the original utility model of (Jackson & Wolinsky 1996). This model takes into account only whether or not the author is connected to another author, disregarding the number of works done together. Thus the utility of a particular author $v_i$ in a given graph G is given by:

U (v_{i}) = \sum_{j} (\frac{1}{C_{d} (v_{i})} + \frac{1}{C_{d} (v_{j})} + \frac{1}{C_{d} (v_{i}) C_{d} (v_{j})}),

(23)

where $C_d(v_i)$ and $C_d(v_j)$ are the degree centralities of vertices $v_i$ and $v_j$, respectively.
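A small worked case may help to read Equation (23). Assume a hypothetical network of three authors on a path, a - b - c, so that $C_d(a) = C_d(c) = 1$ and $C_d(b) = 2$, and assume that the sum runs over the co-authors of $v_i$ and that the degree is the unnormalized count of co-authors:

```latex
U(a) = \frac{1}{1} + \frac{1}{2} + \frac{1}{1 \cdot 2} = 2
U(b) = 2 \left( \frac{1}{2} + \frac{1}{1} + \frac{1}{2 \cdot 1} \right) = 4
```

By symmetry $U(c) = U(a) = 2$. Under these assumptions the first term of Equation (23) always adds up to 1 for a non-isolated author, so the differences between authors come from the degrees of their co-authors.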

To analyze the degree of externality and internality of relations (heterophily and homophily, respectively) in a network where the actors are labeled or partitioned by one or several of their features, (^{Krackhardt & Stern 1988}23 KRACKHARDT D & STERN R. 1988. Informal networks and organizational crises: An experimental simulation. Social Psychology Quarterly, 51: 123-140.) proposed a metric called the E-I index, which assesses the tendency of connections between members of the partition cells by comparing the number of connections within and outside the partition cells (^{Hanneman & Riddle, 2005}16 HANNEMAN RA & RIDDLE M. 2005. Introduction to social network methods. Riverside: University of Califórnia.).

\text{E-I index} = \frac{EL - EI}{EL + EI},

(24)

where EL is the number of external relations and EI is the number of internal relations.

The E-I index takes values ranging from -1 to +1. Values close to +1 indicate a stronger tendency toward relationships between actors of different cells of the partition (heterophily), while values close to -1 reveal a propensity of actors to relate internally to other actors in the same cell of the partition (homophily). If the links are equally divided, the E-I index is equal to zero. We also assume that isolated nodes have an E-I index equal to zero, since they favor neither external nor internal links.

In a weighted network, the E-I index can be calculated using the weights of the edges. In this case, EL is the sum of the weights of the edges that connect different cells of the partition, and EI is the sum of the weights of the edges that connect actors of the same cell of the partition.
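The definitions above can be illustrated with a short per-author computation. This is a sketch under stated assumptions: the nested-dictionary representation, the function name `ei_index`, the toy edges and the use of the fellowship level as the partition are hypothetical choices made only for illustration.

```python
def ei_index(w, group, v, weighted=True):
    """E-I index of node v.

    w     -- symmetric nested dict, w[u][v] = edge weight (number of joint papers)
    group -- dict mapping each node to its partition cell
    """
    external = internal = 0.0
    for u, weight in w[v].items():
        amount = weight if weighted else 1.0
        if group[u] == group[v]:
            internal += amount
        else:
            external += amount
    if external + internal == 0:
        return 0.0  # convention stated in the text for isolated nodes
    return (external - internal) / (external + internal)


# Hypothetical toy network: a-b (3 papers), a-d (1 paper), b-c (2 papers), e isolated.
w = {
    "a": {"b": 3, "d": 1},
    "b": {"a": 3, "c": 2},
    "c": {"b": 2},
    "d": {"a": 1},
    "e": {},
}
group = {"a": "1A", "b": "2", "c": "2", "d": "1A", "e": "2"}

weighted_ei = {v: ei_index(w, group, v) for v in w}
unweighted_ei = {v: ei_index(w, group, v, weighted=False) for v in w}
```

Author "a" has one external and one internal co-author, so the unweighted index is 0, whereas the weighted index is (3 - 1)/(3 + 1) = 0.5 because most of the joint papers are with an author of another level.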

3 METHODOLOGY

In this section, we describe the methodology used to collect the data, build the network, calculate the SNA metrics and perform the statistical analysis.

3.1 Obtaining data and building the co-authorship network

To build the co-authorship network among researchers with a CNPq research productivity grant in the area of Industrial Engineering in Brazil, the only data source considered was the list of articles published in journals, or accepted for publication, between 2005 and 2014. The following steps were taken: identification of the researchers and their fellowship levels; identification of the researchers’ Lattes curricula (the “Lattes Curriculum” records the scientific, academic and professional history of the researchers registered in the Lattes Platform, lattes.cnpq.br); identification of the h-index; extraction of the publications; identification of the co-authored publications; construction of the co-authorship network; and calculation of the SNA metrics.

The list of researchers in the area of Industrial Engineering in Brazil was obtained from the CNPq website. On March 2, 2015, there were 145 of them in total, and all of them were used in the network construction.

The academic data presented in this study were obtained from the Lattes Platform, which reflects the experience of CNPq in integrating curricula databases. The Lattes curricula were identified in parallel with the fellows: when each fellow was identified, his or her Lattes ID (a 16-digit code that CNPq uses to identify each Lattes CV) was also recorded.

The h-index of each fellow was obtained from the “Indicators of Production” tab on the CNPq site, which is reached by searching for the researcher in the Lattes curricula search engine and clicking on his or her name. This tab shows the h-index calculated by the Web of Science and by Scopus. The Scopus h-index was used because the Scopus database is larger than that of the Web of Science and thus includes more of the papers listed in the Lattes Curriculum.

To extract the publications of the fellows and to analyze the co-authorship relations, scriptLattes (Mena-Chalco & Cesar Jr, 2009) was used. From the co-authorship relations found by scriptLattes, a network was built, and its metrics were calculated with the NetworkX software in three ways: unweighted; weighted by edges; and weighted by edges and nodes. The metrics applied in this work, all described in Section 2, were: E-I index, Degree centrality, Closeness centrality, Betweenness centrality, Eigenvector centrality, PageRank, Local clustering coefficient, Eccentricity and Utility.
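The step from a list of publications to a weighted network can be sketched as follows. This is not the procedure implemented by scriptLattes; it is a minimal illustration that assumes each paper is already available as a list of author identifiers. The function name, the identifiers and the toy papers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthorship_weights(papers, fellows):
    """Count joint papers for every pair of fellows.

    papers  -- iterable of author-id lists, one list per paper
    fellows -- set of author ids that belong to the network
    """
    weights = Counter()
    for authors in papers:
        # Keep only fellows; co-authors outside the list are ignored.
        present = sorted(set(authors) & fellows)
        for pair in combinations(present, 2):
            weights[pair] += 1
    return weights


# Hypothetical ids and papers, for illustration only.
fellows = {0, 62, 74, 124}
papers = [[0, 62, 999], [62, 0], [62, 74], [124]]
weights = coauthorship_weights(papers, fellows)

# Unweighted degree counts co-authors; weighted degree counts joint papers.
unweighted_degree = Counter()
weighted_degree = Counter()
for (u, v), w in weights.items():
    for node in (u, v):
        unweighted_degree[node] += 1
        weighted_degree[node] += w
```

The resulting pair counts could then be loaded into a NetworkX graph, for instance with `Graph.add_weighted_edges_from`, so that the unweighted and edge-weighted versions of each metric are computed on the same structure.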

3.2 Analysis of the influences of SNA metrics at the fellowships level

The effect of SNA metrics on the researchers’ fellowship level is examined by the following means: (i) tables ranking the top 10 researchers; (ii) Kendall correlations between fellowship levels and SNA metrics; (iii) boxplots that compare the distributions of the metrics across the different fellowship levels; and (iv) a logistic regression model.
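Item (ii) can be illustrated with a direct computation of the Kendall coefficient. The sketch below assumes the tau-b variant, which corrects for ties (fellowship levels are heavily tied), and an ordinal coding of the levels from 2 (lowest) to 1A (highest). The variant, the coding, the function name and the toy data are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

# Assumed ordinal coding of the fellowship levels, from lowest to highest.
LEVEL_ORDER = {"2": 0, "1D": 1, "1C": 2, "1B": 3, "1A": 4}


def kendall_tau_b(x, y):
    """Kendall tau-b by direct comparison of all pairs of observations."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from both factors
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else 0.0


# Hypothetical fellows: fellowship level and an SNA metric value for each.
levels = ["2", "2", "1D", "1C", "1A"]
metric = [1, 2, 2, 5, 7]
tau = kendall_tau_b([LEVEL_ORDER[lv] for lv in levels], metric)
```

In the toy data eight pairs are concordant, none is discordant, one pair is tied only on the level and one only on the metric, which gives a coefficient of 8/9.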

The logistic regression analysis of the effect of the SNA metrics on the fellowship level of the researchers was carried out in three ways: considering only the unweighted metrics, then the weighted metrics, and finally the metrics that incorporate the nodes’ attributes. The regression method applied was backward stepwise selection, which starts with all variables in the model and then eliminates at most one variable per step. Each step removes the least significant variable, and the process ends when all variables in the model have p-values less than or equal to the specified significance level (α); here we adopt α equal to 0.1. To check for multicollinearity, and thus avoid fitting or imprecision problems, we use the Variance Inflation Factor (VIF) before applying the backward stepwise method. Multicollinearity exists when there is an exact or approximate linear dependence between the covariates of the model and, generally, a VIF greater than 10 indicates multicollinearity problems (^{Hair 2009}15 HAIR JF, BLACK WC, BABIN BJ, ANDERSON RE & TATHAM RL. 2009. Análise multivariada de dados. Bookman Editora.). To eliminate the effects of multicollinearity, we first calculate the VIF for each variable, considering all of them in the model. Then we eliminate the one with the highest VIF and repeat the process until all VIFs are less than 10.
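The VIF screening described above can be sketched without external libraries by using the fact that the VIF of each covariate is the corresponding diagonal entry of the inverse of the covariates' correlation matrix. The sketch covers only the screening step, not the backward stepwise regression. The function names, variable names and toy values are hypothetical.

```python
from math import sqrt


def correlation_matrix(cols):
    """Pearson correlation matrix of a list of equally long, non-constant columns."""
    n = len(cols[0])
    scaled = []
    for col in cols:
        mean = sum(col) / n
        dev = [x - mean for x in col]
        norm = sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fails on an exactly singular matrix)."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(data):
    """VIF of every covariate in data, a dict {name: list of values}."""
    names = list(data)
    inv = invert(correlation_matrix([data[k] for k in names]))
    return {k: inv[i][i] for i, k in enumerate(names)}


def drop_high_vif(data, threshold=10.0):
    """Remove the covariate with the highest VIF until all VIFs are below threshold."""
    data = dict(data)
    while len(data) > 1:
        scores = vif(data)
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        del data[worst]
    return list(data)


# Hypothetical covariates: wdc is almost exactly twice udc, ubc is unrelated.
toy = {
    "udc": [1, 2, 3, 4, 5, 6],
    "wdc": [2.1, 3.9, 5.9, 8.1, 10.1, 11.9],
    "ubc": [2, 5, 1, 6, 3, 4],
}
kept = drop_high_vif(toy)
```

In the toy data one of the two nearly collinear covariates is removed in the first pass, after which the remaining VIFs are close to 1 and the loop stops.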

4 PRESENTATION OF CO-AUTHORSHIP NETWORK AMONG RESEARCHERS WITH CNPQ GRANT IN RESEARCH PRODUCTIVITY AND IMPACT OF SNA METRICS IN FELLOWSHIP LEVEL

The co-authorship network among researchers with a CNPq research productivity grant in the area of Industrial Engineering was built using bibliometric data from the period between 2005 and 2014. A total of 3,796 full papers published in journals and 89 accepted for publication were analyzed in the period, totaling 3,885 papers. Distributed among the 145 productivity fellows, this gives an average of 2.679 papers per fellow per year. Of these papers, 1,026 were written in co-authorship. Table 1 presents an overview of the network at the macro level. In a similar work, (^{Souza et al. 2016}36 SOUZA FC DE, AMORIM RM & RÊGO LC. 2016. A Co-authorship network analysis of CNPq’s productivity research fellows in the probability and statistic area. Perspectivas em Ciência da Informação, 21(4), 29-47.) found a total of 935 papers published by 68 CNPq productivity research fellows in the area of Probability and Statistics from 2009 to 2013, which gives an average of 2.75 papers per fellow per year.

Table 1: Overview of the network at the macro level.
Measure	Value
Number of authors	145
Number of papers	3,885
Papers/authors	26.79
Authors/papers	0.037
Number of edges	161
Number of components	33
Number of authors in the main component (%)	63.45
Average clustering coefficient	0.293
Density	0.015
Diameter*	13
Radius*	7
Average distance*	6.00
Number of shortest paths*	8,464
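Several of these figures can be checked by hand from the counts reported in the text. The density line assumes the usual definition for an undirected network without loops, 2m / (n(n - 1)), with m edges and n nodes:

```latex
3796 + 89 = 3885
\frac{3885}{145} \approx 26.79 \quad \text{(papers per author over the ten years from 2005 to 2014, about 2.679 per year)}
\frac{145}{3885} \approx 0.037
\text{Density} = \frac{2 \times 161}{145 \times 144} = \frac{322}{20880} \approx 0.015
```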

	E-I index^{l}
Level	Real	Theoretical
2	0.078	-0.179
1D	0.559	0.621
1C	0.862	0.841
1B	0.826	0.910
1A	0.750	0.841

	UDC				WDC				ZDC
Rank	PQ	Institution	Value	Level	PQ	Institution	Value	Level	PQ	Institution	Value	Level
1	124	UFSCar	10	1A	62	UFF	95	1D	62	UFF	1247	1D
2	0	UFPE	7	1A	108	UNISINOS	87	2	74	UFF	764	2
3	62	UFF	7	1D	56	UNISINOS	86	2	41	EMBRAPA	740	1D
4	65	UFRGS	6	1B	107	UNISINOS	80	2	82	INPE	490	1A
5	111	UFRJ	6	1A	74	UFF	65	2	0	UFPE	461	1A
6	84	PUC-RIO	6	1D	41	EMBRAPA	61	1D	12	PUC-RIO	396	1A
7	41	EMBRAPA	5	1D	82	INPE	57	1A	108	UNISINOS	368	2
8	114	UFSC	5	1C	9	UNIFEI	53	2	56	UNISINOS	364	2
9	85	IBMEC	5	1D	117	UNIFEI	50	1D	107	UNISINOS	362	2
10	82	INPE	5	1A	0	UFPE	43	1A	9	UNIFEI	323	2

	WLS				ZLS
Rank	PQ	Institution	Value	Level	PQ	Institution	Value	Level
1	108	UNISINOS	43.50	2	74	UFF	254.67	2
2	56	UNISINOS	43.00	2	108	UNISINOS	184.00	2
3	107	UNISINOS	26.67	2	56	UNISINOS	182.00	2
4	81	UFF	26.00	2	62	UFF	178.14	1D
5	74	UFF	21.67	2	41	EMBRAPA	148.00	1D
6	39	PUCPR	18.50	2	81	UFF	130.00	2
7	49	UFSCar	17.00	2	107	UNISINOS	120.67	2
8	110	UFSCar	17.00	1D	116	CNEN	102.50	1D
9	42	UFRN	14.00	2	82	INPE	98.00	1A
10	92	UFRN	14.00	2	39	PUCPR	95.00	2

	UE				WE				ZE
Rank	PQ	Institution	Value	Level	PQ	Institution	Value	Level	PQ	Institution	Value	Level
1	47	UNESP/BAU	7	2	32	USP	4.67	1B	61	USP	0.70	2
2	124	UFSCar	7	1A	61	USP	4.97	2	76	USP	0.78	1B
3	14	ITA	8	1C	47	UNESP/BAU	5.32	2	12	PUC-RIO	0.78	1A
4	23	USP	8	2	76	USP	5.45	1B	102	USP	0.80	2
5	32	USP	8	1B	14	ITA	5.47	1C	86	UNESP/BAU	0.80	2
6	33	UFSCar	8	2	102	USP	5.62	2	104	IBGE	0.81	2
7	36	UFSCar	8	2	124	UFSCar	5.67	1A	32	USP	0.82	1B
8	60	USP	8	1A	12	PUC-RIO	5.70	1A	45	PUC-RIO	0.82	1C
9	61	USP	8	2	48	USP	5.72	2	48	USP	0.82	2
10	86	UNESP/BAU	8	2	86	UNESP/BAU	5.76	2	114	UFSC	0.84	1C

Variable	Estimate	Standard error	Wald	p-value
Intercept	-0.898	0.345	6.779	0.009
E-I index	1.788	0.445	16.120	0.000
UU	1.244	0.390	10.166	0.001

Variable	Estimate	Standard error	Wald	p-value
Intercept	-0.749	0.302	6.131	0.013
E-I index_W	1.246	0.339	13.509	0.000
WBC	0.803	0.349	5.302	0.021
WU	0.618	0.309	3.997	0.046
WE	0.637	0.363	3.081	0.079

Variable	Estimate	Standard error	Wald	p-value
Intercept	-0.609	0.309	3.888	0.049
E-I index_Z	1.248	0.387	10.374	0.001
ZBC	0.564	0.297	3.602	0.058
ZEC	0.705	0.393	3.213	0.073
ZU	1.070	0.412	6.736	0.009
ZLS	-0.871	0.499	3.046	0.081

	Correlations
	UDC	WDC	ZDC
Fellowship Level	0.244**	0.090	0.205**

	Correlations
	WLS	ZLS
Fellowship Level	0.003	0.131*

	Correlations
	UCC	WCC	ZCC
Fellowship Level	0.211**	0.182**	0.229**

	Correlations
	UBC	WBC	ZBC
Fellowship Level	0.241**	0.307**	0.367**
