ABSTRACT
In this article, we build a coauthorship network among researchers holding a CNPq research productivity (PQ) grant in the area of Industrial Engineering and analyze which Social Network Analysis metrics impact their productivity level. Unlike other studies, which mostly analyze unweighted networks, ours explores the network more broadly, since the metrics were calculated in three ways: unweighted, including the edge weights, and including both the edge weights and the nodes’ attributes. The results are thus more precise and detailed, since more information is used. We take the h-index of the researchers as the nodes’ attribute and measure the impact using the Kendall correlation. We show that geographical distance is still a barrier to collaboration among PQs in this area and that collaboration with researchers holding different grant levels has the greatest impact on the level of the grant a researcher has.
Keywords:
weighted coauthorship network; nodes’ attributes; scientific productivity
1 INTRODUCTION
Coauthorship, the development of a publication by two or more authors, is a form of collaboration. For (Hudson 1996), coauthorship is the most formal expression of intellectual collaboration in scientific research, and the biggest gain from collaboration is that it enables an efficient division of tasks, through complementary skills or synergy (the joint creation of new ideas that would not be achieved individually). The result of this efficient division of tasks is scientific production of higher quality and/or quantity. These results had already been reported by (Barnett et al. 1988) as the reason that leads researchers to work together. Other works, such as (Eaton et al. 1999) and (Lee & Bozeman 2005), also point to productivity as a result of collaboration. (Hart 2000) shows that collaboration improves the quality of publications.
Coauthoring, writing the same paper with other authors, is a form of collaboration that implies a temporal and academic relationship in which authors share ideas and resources. One of the most famous coauthorship networks is that of the mathematician Paul Erdos, who has more than 500 coauthors and more than 1,400 published works. The role of Erdos as a collaborator was so significant in the field of mathematics that the Erdos number was created to measure the proximity to Erdos through the coauthorship network (Liu et al., 2014). Anyone who has published with Erdos has an Erdos number equal to 1, those who have published with a coauthor of Erdos have an Erdos number equal to 2, and so on (Newman, 2001c).
For (Kumar 2015), studies on coauthorship gained new interest after (Newman 2001a, b, c, 2004) used methods of social network analysis to investigate the characteristics and interesting patterns of academic communities. (Kempe & Kleinberg 2005) also reported the emergence of many studies of coauthorship network analysis that try to identify the most influential authors in the network. The works of (Huang et al. 2013) and (Liu et al. 2014) likewise studied the coauthorship network to assess the status of an author in a particular field, and thus to help authors strengthen their relations and get closer to the community core by identifying the most influential researchers.
There is increasing interest in studying the influence of social structure on the behavior and performance of researchers through social network analysis (SNA). Many of these studies seek to correlate the key centrality metrics of the network with measures based on the number of citations, such as the h-index (“A researcher has an h-index of k if k of his N works have at least k citations each, and the other (N − k) papers have at most k citations each”, Hirsch, 2005). Such measures, among other factors, can be used to assess the quality of publications.
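As an illustration of the h-index definition quoted above, a minimal sketch follows. The function name `h_index` and the citation counts are hypothetical and serve only to show the arithmetic.

```python
def h_index(citations):
    """Largest k such that k papers have at least k citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # counts only decrease from here on
    return h


# Hypothetical citation counts for the five papers of one researcher.
example = [10, 8, 5, 4, 3]
print(h_index(example))
```

In this example four papers have at least four citations each, while the fifth has only three, so the index stops at four.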
The work of (Yan & Ding 2009) correlated four centrality metrics (degree, closeness, betweenness and PageRank) with the number of citations of the publications of the authors of a coauthorship network. These metrics had significant correlations with the citation counts, especially the betweenness centrality.
In the study of (Abbasi & Altmann 2011), the normalized metrics of degree centrality, betweenness centrality and closeness centrality, as well as the weighted degree centrality and efficiency (the ratio between the total number of distinct groups, whose nodes are directly connected, connected by a single central node and the degree of this node), were correlated with the h-index. The results showed that the h-index of the researchers had significant positive correlations only with the degree centrality and efficiency. In the same year, (Abbasi et al. 2011) published a paper that analyzed the influence of six SNA metrics (degree centrality, closeness centrality, betweenness centrality and eigenvector centrality, all normalized, plus the average link strength and efficiency) on the g-index, defined as follows: “Given a set of papers ordered decreasingly by number of citations, the g-index is the highest value of g such that the first g articles together receive at least $g^2$ citations” (Egghe, 2006). The authors concluded that only the normalized degree centrality, efficiency, and the average link strength had significant influence on the g-index.
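The g-index definition quoted above can be sketched in the same way. The name `g_index` and the data are hypothetical, and the sketch assumes that g cannot exceed the number of papers, as the quoted definition implies.

```python
from itertools import accumulate


def g_index(citations):
    """Highest g such that the g most cited papers total at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    g = 0
    # accumulate() yields the running total of citations of the top `rank` papers.
    for rank, total in enumerate(accumulate(ranked), start=1):
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts, the same as would give an h-index of 4.
example = [10, 8, 5, 4, 3]
print(g_index(example))
```

Because highly cited papers keep contributing to the running total, the g-index of a researcher is never smaller than the h-index.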
Another work that correlates social network analysis metrics with the h-index was published by (Wanderley et al. 2014). The authors created a coauthorship network among Computer Science researchers and calculated the normalized degree, closeness and betweenness centralities, the weighted degree centrality, and authority (calculated by adding the number of hubs, i.e., nodes with many links, to which a node is connected). Only the betweenness centrality and the weighted degree centrality had significant positive correlations. Authority also showed a significant, but negative, correlation.
(Souza et al. 2016), in a coauthorship network among researchers with a CNPq research productivity grant in the area of Statistics, showed that the most productive fellows are also the most central in the network and that the degree centrality and closeness centrality metrics had a higher impact on the number of articles published by a fellow.
According to (CNPq 2015), the research productivity fellowship (PQ) is organized in levels, in ascending order: 2, 1D, 1C, 1B, 1A. The PQ is awarded to researchers from all areas of knowledge in Brazil, based not only on the quality of a submitted project but mainly on the “quality” of the researcher (Wainer & Vieira, 2013).
The work of (Fonseca & Digiampietri 2016) builds two kinds of classifiers that use SNA metrics and other bibliometric measures as attributes. The first kind identifies which researchers in the area of Computer Science are fellowship holders, and the second identifies the fellowship level of a given researcher. Other studies analyzing the impact of coauthorship networks on the performance of researchers were presented in (Andrade & Rêgo 2015a, 2015b).
In these previous works, weighted metrics, i.e., metrics calculated considering the frequency of collaboration, were not used, with the exception of the weighted degree centrality. To the best of our knowledge, few works explore such metrics in SNA. (Liu et al. 2015) proposed a method that inserts the importance of researchers (based on citations) into the coauthorship network structure by redefining the weight of the edges. This weight is used in the calculation of PageRank, applied to the Erdos network, to identify the most influential authors. (Andrade 2016) also developed a metric that inserts the importance of nodes into the network structure; in that work, the weight of a given edge is equal to the average of the attributes of the nodes connected by this edge times the original weight of the edge. That work addressed the impact of the nodes’ attributes on different SNA metrics.
The objective of this work is to identify the researchers with a CNPq research productivity grant in the area of Industrial Engineering in Brazil, to analyze their academic achievements in terms of published papers, to construct a coauthorship network among these researchers, to analyze the characteristics of the network, and to verify which SNA metrics have the most impact on their productivity level. SNA metrics will be calculated in three ways: unweighted, weighted by the edge weights, and weighted by both the edge weights and the nodes’ attributes.
Our third analysis of social network metrics is set in the context of non-homogeneous nodes: being in equivalent positions in the network may have different impacts on the SNA metrics if the nodes’ attributes are taken into consideration. The h-index is a node attribute that is freely available and clearly related to the prestige of a researcher. Therefore, it was chosen for use in this context.
This work is structured as follows: this first section presents a review of the studies that analyzed the relationship between SNA metrics, coauthorship networks and the performance of researchers; Section 2 briefly presents the SNA metrics used in this work; Section 3 describes the methodology used to create the coauthorship network; Section 4 presents the coauthorship network and the impact of individual SNA metrics on the productivity level. Finally, Section 5 presents the final considerations of the study and proposals for future work.
2 SOCIAL NETWORK ANALYSIS METRICS
A weighted network can be defined as a set of nodes, $V(G)$, a set of edges, $E(G)$, which consists of ordered pairs of nodes, and a weighted adjacency matrix, $W(G)$, where $w(v_i, v_j)$ represents the weight associated with the edge connecting the pair of vertices $v_i$ and $v_j$. We assume that $w(v_i, v_j) = w(v_j, v_i)$, since coauthorship is a symmetric relation.
(Liu et al. 2015) and (Andrade 2016) developed methods to include nodes’ attributes in SNA. Thus, it is possible to classify networks as unweighted, weighted by edges, and weighted by edges and nodes. Since the method of Liu et al. turns the network into an asymmetric relation, we do not view it as appropriate for studying coauthorship. Therefore, we focus here on Andrade’s method, which maintains the symmetric nature of coauthorship.
The metric proposed by (Andrade 2016) as a way to take into consideration the importance of the node in the network context is given by:
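Following the verbal definition given in the Introduction (the original edge weight times the average of the attributes of the two nodes that the edge connects), this weight can be written as:

$$Z(v_i, v_j) = w(v_i, v_j)\,\frac{s(v_i) + s(v_j)}{2}$$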
where $Z(v_i, v_j)$ equals the edge weight $w(v_i, v_j)$ between vertices $v_i$ and $v_j$, combined with the attributes of these vertices, $s(v_i)$ and $s(v_j)$, respectively. With the incorporation of the attributes of the nodes in the network, $Z(G)$ becomes the new weighted adjacency matrix and $Z(v_i, v_j)$ the new edge weight between vertices $v_i$ and $v_j$. The attributes of the vertices are measurable characteristics associated with the type of relationship that connects them.
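A minimal sketch of this reweighting is given below, assuming the form described in the Introduction (the edge weight multiplied by the average of the attributes of its two endpoints). The function name `node_weighted_edges` and the data are hypothetical; here the edge weight stands for the number of joint papers and the attribute for the h-index.

```python
def node_weighted_edges(edge_weights, node_attr):
    """Return Z(vi, vj) = w(vi, vj) * (s(vi) + s(vj)) / 2 for every edge."""
    return {
        (u, v): w * (node_attr[u] + node_attr[v]) / 2
        for (u, v), w in edge_weights.items()
    }


# Hypothetical data: joint papers per pair and h-index per researcher.
w = {("A", "B"): 3, ("B", "C"): 1}
s = {"A": 4, "B": 6, "C": 2}
z = node_weighted_edges(w, s)
print(z)
```

Since the average of the two attributes is symmetric in its arguments, the new weights keep the symmetry of the coauthorship relation.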
A binary network with n vertices is represented by an adjacency matrix A(G) with n × n elements, where
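In the usual convention, writing $a_{ij}$ for the elements of $A(G)$ (a symbol assumed here), the elements are:

$$a_{ij} = \begin{cases} 1, & \text{if } v_i \text{ and } v_j \text{ are adjacent} \\ 0, & \text{otherwise} \end{cases}$$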
The SNA metrics can be divided into global, describing the characteristics of the whole graph, and individual, which are related to the analysis of individual properties of network actors (nodes or vertices).
The number of edges, as the name suggests, refers to the cardinality of the set of edges, E(G), denoted by #E(G).
A path between two vertices, $v_i$ and $v_j$, is a sequence of vertices $c = (v_0, v_1, v_2, \ldots, v_k)$ such that $v_0 = v_i$, $v_k = v_j$, $v_l$ is adjacent to $v_{l+1}$ for $l = 0, 1, \ldots, k - 1$, and no vertex appears more than once in the sequence. A set of vertices is said to be connected if there exists a path between any two vertices in the set. A graph is connected if there is a path between any two vertices, and is complete if every vertex is adjacent to every other vertex.
The density measures how close the graph is to being complete, that is, the ratio between the number of connections in the graph and the number of connections it would have if all vertices were connected to each other. For an undirected graph with n nodes, the density is defined as:
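Consistent with this description, and writing $D(G)$ for the density (a symbol assumed here), the standard form for an undirected graph is the number of edges divided by the $n(n-1)/2$ possible edges:

$$D(G) = \frac{2\,\#E(G)}{n(n-1)}$$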
A geodesic path, or shortest path, is the shortest path between two vertices (Newman 2004). The geodesic path length, $d(v_i, v_j)$, also called geodesic distance or shortest distance, is thus the shortest distance in the network between these two vertices. Given a path $c = (v_0, v_1, v_2, \ldots, v_k)$ between vertices $v_i$ and $v_j$, the length of this path is given by $d_c$. Let $C(v_i, v_j)$ be the set of all paths between vertices $v_i$ and $v_j$; then the geodesic distance is defined by:
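In symbols, using the notation just introduced (and assuming that, in the unweighted case, $d_c = k$, the number of edges of the path):

$$d(v_i, v_j) = \min_{c \in C(v_i, v_j)} d_c$$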
In the case of weighted networks, the length of a path $c = (v_0, v_1, v_2, \ldots, v_k)$ between vertices $v_i$ and $v_j$ can be formally defined following Dijkstra’s algorithm (Newman 2001c) and (Brandes 2001):
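A common choice in this line of work, assumed here, is to treat the inverse of the edge weight as a cost, so that frequent collaborators are closer to each other. The symbol $d_c^w$ for the weighted length of path $c$ is also an assumption:

$$d_c^w = \sum_{l=0}^{k-1} \frac{1}{w(v_l, v_{l+1})}$$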
And the weighted geodesic distance is given by:
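By analogy with the unweighted case, and writing $d_c^w$ for the weighted length of path $c$ (a symbol assumed here):

$$d_w(v_i, v_j) = \min_{c \in C(v_i, v_j)} d_c^w$$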
The largest geodesic distance between any pair of vertices is called the diameter of a graph. In a binary network, it can vary from a minimum of 1, if the graph is complete, to a maximum of $n - 1$, where n is the number of vertices in the graph. Formally, the diameter of the connected graph G is given by:
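Writing $\mathrm{diam}(G)$ for the diameter (a symbol assumed here), the verbal definition above corresponds to:

$$\mathrm{diam}(G) = \max_{v_i, v_j \in V(G)} d(v_i, v_j)$$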
In the case of weighted networks, the weighted diameter is calculated using the weighted geodesic distance, $d_w(v_i, v_j)$.
The size of the largest connected component, also known as the giant component, is the cardinality of the connected component with the highest number of nodes.
Given a vertex $v_i$, its eccentricity, $e(v_i)$, is the maximum distance from it to any other vertex of the graph. The smaller the eccentricity, the better the relationship of a vertex to the other vertices. The eccentricity of $v_i$ is given by:
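In symbols, for a connected graph, the maximum distance described above reads:

$$e(v_i) = \max_{v_j \in V(G)} d(v_i, v_j)$$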
The diameter, as defined above, is equal to the maximum eccentricity, while the minimum eccentricity is the radius. In the case of weighted networks, the eccentricity may be calculated using the weighted geodesic distance, $d_w(v_i, v_j)$.
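The relation between geodesic distance, eccentricity, diameter and radius can be sketched for an unweighted, connected network with a breadth-first search. This is an illustrative, dependency-free sketch rather than the computation used in the study; the function names and the small network are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted geodesic distances from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eccentricities(adj):
    """Eccentricity of every vertex; assumes the graph is connected."""
    return {v: max(bfs_distances(adj, v).values()) for v in adj}


# Hypothetical network: triangle B-C-D plus a pendant vertex A attached to B.
adj = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B", "D"}, "D": {"B", "C"}}
ecc = eccentricities(adj)
diameter = max(ecc.values())   # maximum eccentricity
radius = min(ecc.values())     # minimum eccentricity
print(ecc, diameter, radius)
```

In this small example vertex B reaches every other vertex in one step, so it has the smallest eccentricity and determines the radius.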
The degree centrality, proposed by (Freeman 1979), is calculated in terms of the number of adjacent vertices; namely, the degree centrality of vertex $v_i$, denoted by $C_d(v_i)$, is the number of vertices adjacent to $v_i$. Formally, the degree centrality is defined by:
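Writing $a_{ij}$ for the elements of the binary adjacency matrix $A(G)$ (a symbol assumed here), counting the adjacent vertices amounts to:

$$C_d(v_i) = \sum_{j=1}^{n} a_{ij}$$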
If the network is weighted, the degree centrality of vertex $v_i$ is equal to the sum of the weights of the edges connected to $v_i$. For (Newman 2004) and (Barrat et al. 2004), the weighted degree centrality is defined by:
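With the convention, assumed here, that $w(v_i, v_j) = 0$ when there is no edge between $v_i$ and $v_j$, the sum described above is:

$$C_d^w(v_i) = \sum_{j=1}^{n} w(v_i, v_j)$$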
The degree centrality is the simplest and easiest way to measure the influence of a node (Abbasi et al., 2012; Liu et al., 2005). In a coauthorship network, this metric identifies the most active and popular authors (Abbasi et al., 2011; Anastasios et al., 2012; Freeman, 1979).
Another metric for analyzing a node in the network originated in the theory of “the strong links” of (Krackhardt 1992). The average link strength of vertex $v_i$ is defined as the ratio between the weighted degree, $C_d^w(v_i)$, and the degree centrality, $C_d(v_i)$:
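Written out, and assuming the argument of $LS$ is the vertex $v_i$ with $C_d(v_i) > 0$, the ratio is:

$$LS(v_i) = \frac{C_d^w(v_i)}{C_d(v_i)}$$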
Therefore, $LS(v_i)$ represents the average weight of the links of node $v_i$.
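The three quantities just defined (degree, weighted degree and average link strength) can be computed together from a list of weighted edges. The following is a minimal sketch with hypothetical names and data, in which each undirected edge is listed once and isolated nodes are simply absent.

```python
def degree_metrics(weights):
    """Return (degree, weighted degree, average link strength) per node.

    `weights` maps an undirected edge (u, v) to its weight; each edge
    appears once.
    """
    degree, strength = {}, {}
    for (u, v), w in weights.items():
        for node in (u, v):
            degree[node] = degree.get(node, 0) + 1        # one more neighbor
            strength[node] = strength.get(node, 0) + w    # add the edge weight
    link_strength = {n: strength[n] / degree[n] for n in degree}
    return degree, strength, link_strength


# Hypothetical number of joint papers for three researchers.
w = {("A", "B"): 3, ("A", "C"): 1, ("B", "C"): 2}
deg, wdeg, ls = degree_metrics(w)
print(deg, wdeg, ls)
```

All three researchers have the same degree here, yet their weighted degrees and link strengths differ, which is the extra information the weighted metrics carry.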
A metric that takes into consideration the geodesic distance from a given initial node to all other nodes of the network is the closeness centrality. (Freeman 1978) asserted that the closeness centrality of vertex $v_i$, denoted by $C_c(v_i)$, is given by:
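Freeman’s definition is usually written as the inverse of the sum of the geodesic distances to all other vertices. A normalized variant multiplies this value by $n - 1$; which variant is intended here is an assumption:

$$C_c(v_i) = \frac{1}{\sum_{j \neq i} d(v_i, v_j)}$$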
The most central vertices in a network according to this metric are those that have a smaller distance to the other vertices. In weighted networks, the weighted closeness centrality is given by:
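Assuming the same form as in the unweighted case, with the weighted geodesic distance $d_w$ in place of $d$ and $C_c^w$ as an assumed symbol:

$$C_c^w(v_i) = \frac{1}{\sum_{j \neq i} d_w(v_i, v_j)}$$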
A node that is, on average, closer to the other nodes can get information more efficiently, i.e., the closeness metric is related to independence and efficient communication with other nodes (Freeman, 1979).
The betweenness centrality of vertex $v_i$ is the sum, over every pair of nodes different from $v_i$, of the ratio between the number of shortest paths between the given pair of nodes that go through $v_i$ and the total number of shortest paths between the given pair of nodes (Freeman, 1979; Wasserman & Faust, 1994). The betweenness centrality, $C_b(v_i)$, of vertex $v_i$ is given by:
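In symbols, summing over unordered pairs of vertices other than $v_i$ that are connected by at least one path (an assumption about the summation range):

$$C_b(v_i) = \sum_{\substack{j < k \\ j, k \neq i}} \frac{g(v_j, v_i, v_k)}{g(v_j, v_k)}$$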
where $g(v_j, v_k)$ is the number of shortest paths between vertex $v_j$ and vertex $v_k$, and $g(v_j, v_i, v_k)$ is the number of shortest paths between vertex $v_j$ and vertex $v_k$ going through $v_i$.
In a weighted network the betweenness centrality is given by:
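Assuming the same form as in the unweighted case, with $C_b^w$ as an assumed symbol for the weighted betweenness centrality:

$$C_b^w(v_i) = \sum_{\substack{j < k \\ j, k \neq i}} \frac{g_w(v_j, v_i, v_k)}{g_w(v_j, v_k)}$$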
where $g_w(v_j, v_k)$ is the number of weighted shortest paths between vertex $v_j$ and vertex $v_k$, and $g_w(v_j, v_i, v_k)$ is the number of weighted shortest paths between vertex $v_j$ and vertex $v_k$ going through $v_i$, considering the weighted distance, $d_w(v_i, v_j)$.
The betweenness is an indicator of the potential of a node to play the role of “mediator” or “gatekeeper” (Freeman, 1979; Abbasi et al., 2012), being able to control the flow of information on the network more often.
A metric of the importance of a vertex in the network based on its connections, the eigenvector centrality rests on the idea that a particular node will have high centrality if it is connected to vertices with central positions in the network (Bonacich, 1987). In other words, the centrality of a vertex depends not only on the number of adjacent vertices but also on the centrality of these vertices. Let λ be a constant; then the eigenvector centrality of $v_i$, $C_e(v_i)$, is given by:
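Writing $a_{ij}$ for the elements of the adjacency matrix (a symbol assumed here), the form consistent with the matrix relation $\lambda X = AX$ used in the text is:

$$C_e(v_i) = \frac{1}{\lambda} \sum_{j=1}^{n} a_{ij}\, C_e(v_j)$$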
Using vector notation, let $X = (C_e(1), C_e(2), \ldots, C_e(n))$ be the vector of eigenvector centralities; we can then rewrite Equation (14) as $\lambda X = AX$. Assuming that the eigenvector centrality takes only nonnegative values (using the Perron-Frobenius theorem), it can be shown that λ is the largest eigenvalue of the adjacency matrix and X is the corresponding eigenvector (Jackson, 2008).
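One simple way to approximate the pair (λ, X) is power iteration: repeatedly replace each centrality by the sum of the centralities of its neighbors and rescale. The sketch below is an illustration of that technique, not the routine used in the study; it assumes a connected, non-bipartite graph with at least one edge, and its names and data are hypothetical.

```python
def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Approximate (lambda, X) with lambda * X = A X by power iteration.

    X is scaled so that its largest entry equals 1.
    """
    nodes = list(adj)
    x = {v: 1.0 for v in nodes}
    lam = 0.0
    for _ in range(iterations):
        # One multiplication by the adjacency matrix.
        new = {v: sum(x[u] for u in adj[v]) for v in nodes}
        lam = max(new.values())
        new = {v: val / lam for v, val in new.items()}
        converged = max(abs(new[v] - x[v]) for v in nodes) < tol
        x = new
        if converged:
            break
    return lam, x


# Hypothetical network: triangle A-B-C plus a pendant vertex D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
lam, x = eigenvector_centrality(adj)
print(lam, x)
```

In this example C is the most central vertex because it has the most neighbors, and D is the least central although it is attached to C, since it has no other connection.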
In the case of weighted networks, the elements of the adjacency matrix are the weights of the edges, $w(v_i, v_j)$ (Newman, 2004), and the eigenvector centrality is defined by:
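Replacing the binary entries by the edge weights, and writing $C_e^w$ for the weighted eigenvector centrality (a symbol assumed here):

$$C_e^w(v_i) = \frac{1}{\lambda} \sum_{j=1}^{n} w(v_i, v_j)\, C_e^w(v_j)$$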
The local clustering coefficient indicates how connected the nodes adjacent to a given node are and, together with the average shortest path length, can identify a “small-world” effect (networks with a large clustering coefficient and relatively short distances between nodes) (Watts & Strogatz 1998). The clustering coefficient of a vertex $v_i$ is the ratio between the number of triangles that contain $v_i$ and the number of possible edges between its neighboring vertices. Let $NT(v_i)$ be the number of triangles (three nodes connected by three links) containing vertex $v_i$. For (Onnela et al. 2005), the local clustering coefficient is defined as:
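Since a vertex with $C_d(v_i)$ neighbors admits $C_d(v_i)(C_d(v_i) - 1)/2$ possible edges between them, the ratio described above can be written as follows, with $CL(v_i)$ as an assumed symbol and $C_d(v_i) \geq 2$:

$$CL(v_i) = \frac{2\,NT(v_i)}{C_d(v_i)\,\bigl(C_d(v_i) - 1\bigr)}$$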
The weighted local clustering coefficient was proposed by (Onnela et al. 2005) and is given by:
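The form usually attributed to Onnela et al. replaces each triangle count by the geometric mean of its three normalized edge weights. The symbol $CL_w$ and the convention that the sum runs over the pairs of neighbors of $v_i$ that close a triangle are assumptions:

$$CL_w(v_i) = \frac{2}{C_d(v_i)\,\bigl(C_d(v_i) - 1\bigr)} \sum_{j < k} \left( \widehat{w}(v_i, v_j)\, \widehat{w}(v_i, v_k)\, \widehat{w}(v_j, v_k) \right)^{1/3}$$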
where the weights of the edges are normalized by the maximum weight in the network, $\widehat{w}(v_i, v_j) = w(v_i, v_j) / \max_{v_i, v_j \in V(G)} w(v_i, v_j)$, and the contribution of each triangle depends on all the weights of its edges.
The average clustering coefficient is the average value of the individual or local coefficients and is given by:
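Writing $CL(v_i)$ for the local coefficient of vertex $v_i$ (a symbol assumed here) and $n$ for the number of vertices:

$$CL(G) = \frac{1}{n} \sum_{i=1}^{n} CL(v_i)$$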
The clustering coefficient, $CL(G)$, for the coauthorship network refers to the probability that any two collaborators of a researcher have collaborated with each other (Onel et al., 2011). In the individual case, the clustering coefficient of a particular author indicates how much his collaborators work together.
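The unweighted local and average clustering coefficients can be sketched by counting, for each vertex, the pairs of neighbors that are themselves connected. The names and the network below are hypothetical, and vertices with fewer than two neighbors are assigned a coefficient of zero, which is an assumption.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Share of pairs of neighbors of `v` that are connected to each other."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0   # assumption: no pair of neighbors, coefficient set to 0
    triangles = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * triangles / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical network: triangle A-B-C plus a pendant vertex D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print({v: local_clustering(adj, v) for v in adj}, average_clustering(adj))
```

Only one of the three pairs of collaborators of C has worked together, so its coefficient is one third, while A and B have fully connected neighborhoods.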
PageRank is a method of ranking web pages that effectively measures the interest of browsers and the attention devoted to them (Page et al. 1999). PageRank considers the number and quality of links to a web page in order to determine how influential it is (Liu et al., 2014). Let $T_A$ be a web page and $T_i$ one of the web pages that link to $T_A$. (Brin & Page 1998) defined PageRank as follows:
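In the notation of this section, with the sum running over the pages $T_i$ that link to $T_A$, the definition of Brin & Page is commonly written as:

$$PR(T_A) = (1 - \delta) + \delta \sum_{i} \frac{PR(T_i)}{C(T_i)}$$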
where $PR(T_A)$ is the PageRank of page $T_A$, $PR(T_i)$ is the PageRank of page $T_i$, $C(T_i)$ is the number of outbound links on page $T_i$, and δ is a damping factor, which can be set between 0 and 1 (assuming that a person randomly clicks on pages and eventually stops clicking, δ is the probability that, at any given moment, the person will continue to click).
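A minimal iterative sketch of this definition follows, in the form $PR = (1 - \delta) + \delta \sum PR(T_i)/C(T_i)$. It assumes that every page has at least one outbound link and, for a coauthorship network, that each undirected edge is treated as a pair of links in both directions; the function name, the default δ = 0.85 and the data are illustrative assumptions.

```python
def pagerank(out_links, delta=0.85, iterations=100):
    """Iterate PR(p) = (1 - delta) + delta * sum(PR(q) / C(q)) over pages q linking to p."""
    pages = list(out_links)
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # The comprehension reads the previous values of `pr` throughout.
        pr = {
            p: (1 - delta) + delta * sum(
                pr[q] / len(out_links[q]) for q in pages if p in out_links[q]
            )
            for p in pages
        }
    return pr


# Hypothetical chain A - B - C, each undirected edge given as two links.
out_links = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
pr = pagerank(out_links)
print(pr)
```

In this unnormalized form the values add up to the number of pages rather than to one, and the middle vertex B receives the highest score.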
In the study of (Santos 2014) on coauthorship networks, a metric was proposed to evaluate the benefit or utility for a given author of belonging to a certain network structure. According to this metric, each author has a finite amount of time to devote to scientific collaborations, and each author receives from an adjacent author a utility equal to the proportion of papers that the coauthor has with him plus a synergy term, given by the product of the dedication of each author to the collaboration. Formally, the utility $U_w(v_i)$ of a given author $v_i$ in a given graph G is given by:
where $w(v_i, v_j)$ is the total number of works between authors $v_i$ and $v_j$, and $C_d^w(v_i)$ and $C_d^w(v_j)$ are the weighted degrees of these authors, respectively.
The utility developed by (Santos 2014) was based on the original utility model of (Jackson & Wolinsky 1996). This model takes into account only whether or not an author is connected to another author, disregarding the number of works done together. Thus, the utility of a particular author $v_i$ in a given graph G is given by:
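The co-author model of Jackson & Wolinsky is usually stated as the sum, over the coauthors of $v_i$, of the time each party devotes to the collaboration plus a synergy term. The symbol $U(v_i)$ and the assumption that the article uses this standard form are not confirmed by the text:

$$U(v_i) = \sum_{v_j \text{ adjacent to } v_i} \left[ \frac{1}{C_d(v_i)} + \frac{1}{C_d(v_j)} + \frac{1}{C_d(v_i)\, C_d(v_j)} \right]$$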
where $C_d(v_i)$ and $C_d(v_j)$ are the degree centralities of vertices $v_i$ and $v_j$, respectively.
To analyze the degree of externality and internality of relations (heterophily and homophily, respectively) in a network where the actors are labeled or partitioned by one or several of their features, (Krackhardt & Stern 1988) proposed a metric called the E-I index, which assesses the trends of connections between members of the partition cells by comparing the number of connections within and outside the partition cells (Hanneman & Riddle, 2005).
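In its usual form, and using the symbols EL and EI that the text introduces for the numbers of external and internal relations, the index is:

$$\text{E-I index} = \frac{EL - EI}{EL + EI}$$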
where EL is the number of external relations and EI is the number of internal relations.
The E-I index has values ranging from −1 to +1. Values close to +1 indicate a higher tendency for relationships between actors of different cells of the partition (heterophily), while values closer to −1 reveal a propensity of actors to relate internally to other actors in the same cell of the partition (homophily). If the links are equally divided, the E-I index is equal to zero. We also assume that isolated nodes have an E-I index equal to zero, since they favor neither external nor internal links.
In a weighted network, the E-I index can be calculated using the weights of the edges: EL is then the sum of the weights of the edges that connect different cells of the partition, and EI is the sum of the weights of the edges that connect actors of the same cell of the partition.
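The index can be sketched per node as follows, assuming the usual form (external minus internal, divided by their sum) and the convention stated above that isolated nodes receive zero. The function name, the grouping by fellowship level and the data are hypothetical; setting every weight to 1 gives the unweighted version.

```python
def ei_index(weights, group, node):
    """Weighted E-I index of `node`: (external - internal) / (external + internal)."""
    external = internal = 0
    for (u, v), w in weights.items():
        if node in (u, v):
            other = v if u == node else u
            if group[other] == group[node]:
                internal += w     # link inside the same cell of the partition
            else:
                external += w     # link to a different cell of the partition
    total = external + internal
    return 0.0 if total == 0 else (external - internal) / total


# Hypothetical joint papers and fellowship levels; D has no coauthors.
w = {("A", "B"): 3, ("A", "C"): 1, ("B", "C"): 2}
level = {"A": "1A", "B": "1A", "C": "2", "D": "2"}
print({n: ei_index(w, level, n) for n in level})
```

Researcher C collaborates only with fellows of another level and therefore reaches +1, whereas A and B, who mostly publish with each other, obtain negative values.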
3 METHODOLOGY
In this section, we describe the methodology used to collect data, build the network, calculate SNA metrics and the statistical methods applied.
3.1 Obtaining data and building the coauthorship network
To construct the coauthorship network among researchers holding a CNPq research productivity grant in the area of Industrial Engineering in Brazil, the only data source considered was the list of articles published in journals, together with those accepted for publication, between 2005 and 2014. The following steps were taken to build the network: identification of the researchers and their fellowship levels; identification of the researchers’ Lattes curricula (the “Lattes Curriculum” presents a history of the scientific, academic and professional activities of the researchers registered in the Lattes Platform (lattes.cnpq.br)); identification of the h-index; extraction of the publications; identification of the publications in coauthorship; production of the coauthorship network; calculation of SNA metrics.
The researchers in the area of Industrial Engineering in Brazil were identified from the CNPq website. On March 2, 2015, there were 145 of them in total, and all were used in the network construction.
The academic data presented in this study were obtained from the Lattes Platform, which reflects the experience of CNPq in integrating curricula databases. The Lattes curricula were identified in parallel with the fellows: as each fellow was identified, the corresponding Lattes ID (the 16-digit code that CNPq uses to identify each Lattes CV) was also recorded.
The h-index of each fellow was obtained from the “Indicators of Production” tab on the CNPq site, reached by searching for the researcher in the Lattes curricula search engine and clicking on his or her name. This tab provides the h-index calculated by the Web of Science and by Scopus. The Scopus h-index was adopted because the Scopus database is larger than that of the Web of Science and thus includes more of the papers listed in the Lattes Curriculum.
To extract the publications of the fellows and to analyze the coauthorship relations, scriptLattes (Mena-Chalco & Cesar Jr, 2009) was used. With the coauthorship relations found by scriptLattes, a network was built, and its metrics were calculated with the NetworkX software in three ways: unweighted; weighted by edges; and weighted by edges and nodes. The metrics applied in this work were: EI index, Degree centrality, Closeness centrality, Betweenness centrality, Eigenvector centrality, PageRank, Local clustering coefficient, Eccentricity and Utility, all described in Section 2.
3.2 Analysis of the influences of SNA metrics at the fellowships level
The effect of SNA metrics on researchers’ fellowship level is evidenced by the following means: (i) tables ranking the top 10 researchers; (ii) Kendall correlations between fellowship levels and SNA metrics; (iii) boxplots that compare the distributions of the metrics at the different fellowship levels; and (iv) a logistic regression model.
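For item (ii), a Kendall correlation between an ordinal fellowship level and a network metric can be sketched as below. The sketch implements the tau-b variant, which corrects for ties; which variant was actually used is not stated here, so this choice is an assumption, as are the integer coding of the levels and the toy values.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b between two equally long sequences (handles ties)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                # tied on both variables: ignored
            elif dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1         # pair ordered the same way
            else:
                discordant += 1         # pair ordered in opposite ways
    # Pairs untied in x times pairs untied in y (tau-b denominator).
    untied_x = concordant + discordant + tied_y_only
    untied_y = concordant + discordant + tied_x_only
    denominator = sqrt(untied_x * untied_y)
    if denominator == 0:
        return float("nan")
    return (concordant - discordant) / denominator


# Hypothetical coding: level 2 -> 0, 1D -> 1, 1C -> 2, 1B -> 3, 1A -> 4.
levels = [0, 0, 1, 2, 4]
metric = [1, 2, 2, 3, 5]   # made-up metric values, one per researcher
tau = kendall_tau_b(levels, metric)
```

Because the fellowship level takes only five values, ties are unavoidable, which is why a tie-corrected coefficient is a natural choice for this kind of data.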
The effect of the SNA metrics on the fellowship level of the researchers was estimated by logistic regression in three ways: considering only the unweighted metrics, then the weighted metrics, and finally the metrics that incorporate the nodes’ attributes. The regression method applied was backward stepwise selection, which starts with all variables in the model and then eliminates one variable per step. Each step removes the least significant variable, and the process ends when all variables in the model have p-values less than or equal to the specified significance level (α); here we adopt α = 0.1. Before applying backward stepwise selection, we check for multicollinearity using the Variance Inflation Factor (VIF), in order to avoid fitting or imprecision problems. Multicollinearity exists when there is an exact or approximate linear dependence between the covariates of the model and, generally, VIF > 10 is taken as indicative of it (Hair et al. 2009). To eliminate the effects of multicollinearity, we first calculate the VIF for each variable, considering all of them in the model. Then we eliminate the variable with the highest VIF and repeat the process until all VIFs are less than 10.
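The VIF screening loop described above can be sketched in plain Python. The sketch relies on the identity that the VIF of a covariate equals the corresponding diagonal entry of the inverse of the covariates’ correlation matrix. All function names and the toy data are hypothetical, and a statistics package would normally be used instead.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    centered = [[x - sum(col) / n for x in col] for col in columns]
    norms = [sqrt(sum(x * x for x in col)) for col in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_scores(data):
    """VIF of each covariate: diagonal of the inverse correlation matrix."""
    names = list(data)
    inverse = invert(correlation_matrix([data[name] for name in names]))
    return {name: inverse[i][i] for i, name in enumerate(names)}


def drop_collinear(data, threshold=10.0):
    """Drop the covariate with the highest VIF until all are below threshold."""
    data = dict(data)
    dropped = []
    while len(data) > 1:
        scores = vif_scores(data)
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        dropped.append(worst)
        del data[worst]
    return list(data), dropped


# Made-up covariates: x2 is almost exactly twice x1, x3 is unrelated.
toy = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
    "x3": [5, 1, 4, 2, 6, 3],
}
kept, dropped = drop_collinear(toy)
```

In the toy data one of the two nearly proportional covariates is removed and the unrelated one is kept, which mirrors the exclusion of a metric such as PageRank when it is nearly a linear function of the others.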
4 PRESENTATION OF COAUTHORSHIP NETWORK AMONG RESEARCHERS WITH CNPQ GRANT IN RESEARCH PRODUCTIVITY AND IMPACT OF SNA METRICS IN FELLOWSHIP LEVEL
The coauthorship network among researchers with a CNPq research productivity grant in the area of Industrial Engineering was built using bibliometric data from the period between 2005 and 2014. A total of 3,796 full papers published in journals and 89 accepted for publication were analyzed, totaling 3,885 papers. Distributed among the 145 productivity fellows, this gives an average of 2.679 papers per fellow per year. Of these papers, 1,026 were written in coauthorship. Table 1 presents an overview of the network at the macro level. In a similar work, (Souza et al. 2016) found a total of 935 papers published by 68 CNPq research productivity fellows in the area of Probability and Statistics from 2009 to 2013, which gives an average of 2.75 papers per fellow per year.
The network is divided into 33 components. The giant component consists of 92 vertices, representing approximately 63.45% of the network vertices; the second largest component has 8 vertices (5.52%); and 21 researchers are isolated, in other words, about 14.48% of the fellows have no collaborators in the network. In the work of (Souza et al. 2016), the giant component corresponded to 70.59% of the network vertices and isolated nodes corresponded to 13.24%, suggesting that the Probability and Statistics community is more connected than the Industrial Engineering one.
The network contains 161 edges, which gives an average degree centrality of 2.221 and a density equal to 0.015, that is, only 1.5% of the possible connections in the network occur. Thus, on average, a researcher collaborated with a little over 2 other researchers holding a CNPq research productivity grant during this 10-year period. A low density of 4.7%, with an average degree centrality of 3.147, was also found by (Souza et al. 2016). Both values are higher than those found in this work, even though Souza et al. only considered papers published or accepted for publication from 2009 to 2013, half of the length of time considered here. However, these low densities can be justified by the fact that the network is formed from only a small part of the researchers’ production (only papers published in journals and papers accepted for publication in a certain period of time) and only captures collaboration among CNPq research productivity fellows in the same area, not taking into account collaboration with other researchers. Moreover, since Industrial Engineering encompasses a diverse number of subareas, this result suggests that the fellowships are dispersed across different subareas, which reduces the chance of collaboration among those researchers.
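The macro-level figures reported above follow directly from the numbers of nodes, edges and papers. The short check below recomputes them using the standard definitions for an undirected simple graph; the variable names are illustrative only.

```python
n_nodes = 145        # productivity fellows
n_edges = 161        # coauthorship links among them
n_papers = 3885      # papers published or accepted, 2005 to 2014
n_years = 10

# Each edge contributes to the degree of two nodes.
average_degree = 2 * n_edges / n_nodes

# Share of the n(n-1)/2 possible links that are actually present.
density = 2 * n_edges / (n_nodes * (n_nodes - 1))

papers_per_fellow_year = n_papers / n_nodes / n_years

print(round(average_degree, 3))           # 2.221
print(round(density, 3))                  # 0.015
print(round(papers_per_fellow_year, 3))   # 2.679
```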
The network diameter is equal to 13 and the radius is equal to 0, representing the maximum and the minimum eccentricity, respectively (the minimum of 0 is attained by the isolated nodes); the radius of the giant component is equal to 7. The average clustering coefficient is equal to 0.293. Since this coefficient varies from 0 to 1, just under a third of the possible coauthorships among the coauthors of a given author are present in the network. (Souza et al. 2016) found an average clustering coefficient of 0.31 in their network, suggesting that the Probability and Statistics community is more cohesive than the Industrial Engineering one.
The average distance of a path between a pair of vertices is approximately 6.00. This value refers to the giant component and means that, on average, 6.00 connections separate two researchers in that component. The number of shortest paths is 8,464.
Figure 1 illustrates the coauthorship network of the fellows generated by the software Gephi, where the thickness of the edges is proportional to their weights (total papers coauthored), and the diameter of the vertex is proportional to its centrality degree.
The SNA metrics of this coauthorship network were calculated in three ways: non-weighted (NP), weighted by edges (W) and weighted by edges and nodes (Z). The results are shown next.
4.1 EI Index
To analyze the degree of externality and internality of the researchers, labeled by their fellowship levels, the EI index^{l} metric was used. There is a significant correlation between the EI index^{l} and the fellowship level, equal to 0.406 (at a significance level of 0.01). Thus, researchers who establish relationships with researchers of different fellowship levels tend to have higher fellowship levels. If the weights of the edges are taken into account (EI index^{l} _W), the correlation with the fellowship level decreases to 0.343 (at a significance level of 0.01). Considering both the edges and the nodes’ attributes (EI index^{l} _Z), the correlation is equal to 0.302 (at a significance level of 0.01).
However, this result is somewhat misleading, since one has to take into account the distribution of the fellowship levels among researchers. Of the 145 researchers, 86 (59.31%) had fellowship level 2; 28 (19.31%) had level 1D; 12 (8.28%) had level 1C; 7 (4.83%) had level 1B; and 12 (8.28%) had level 1A. Thus, assuming that coauthorships form at random, researchers from lower levels are more likely to collaborate with researchers of the same level, and researchers from higher levels are more likely to collaborate with researchers from different levels. Table 2 compares the actual value with the theoretical one, that is, the value expected if all nodes were connected. From this comparison, one can conclude that researchers with fellowship levels 2 and 1C tend to collaborate more with researchers of different fellowship levels than would be expected if collaborators were chosen at random. On the other hand, researchers with fellowship levels 1D, 1B and 1A show the opposite behavior.
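One way to read the theoretical benchmark is to assume a complete graph, in which every researcher is linked to all 144 others, and to compute the EI index that each level would obtain from the level sizes alone. The sketch below follows that reading; it is an assumption about how the theoretical values are defined, and it does not reproduce the entries of Table 2.

```python
# Number of fellows at each level, as reported in the text.
level_counts = {"2": 86, "1D": 28, "1C": 12, "1B": 7, "1A": 12}
total = sum(level_counts.values())


def complete_graph_ei(count, total):
    """EI index of a node linked to everyone else, given its level size."""
    internal = count - 1       # other fellows at the same level
    external = total - count   # fellows at any other level
    return (external - internal) / (external + internal)


expected = {level: complete_graph_ei(count, total)
            for level, count in level_counts.items()}

for level, value in expected.items():
    print(level, round(value, 3))
```

Under this assumption only level 2, which holds more than half of the fellows, has a negative benchmark, which illustrates why raw EI values have to be compared with the level distribution before being interpreted.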
Figure 2 shows the distribution of the EI index^{l}, EI index^{l} _W and EI index^{l} _Z, respectively, at the different fellowship levels. The largest variations are presented by levels 2 and 1D. Level 2 shows the smallest median.
Level 1C has the smallest variation: almost all researchers at that level have an EI index equal to 1, meaning that their relationships are strictly external. All level 1B researchers have more external than internal links. One can also observe that including either the weights of the links or the nodes’ attributes maintains the main characteristics of the EI index^{l} across the different fellowship levels.
We also examine the external and internal relations of the researchers with respect to the region where they work, that is, the geographical region of their institution. The result is presented in Figure 3. The Center-West region has only two researchers: one has no relationship and the other, therefore, has an external relationship. Researchers in the Northeast, Southeast and South regions have a predominance of internal over external relations. It is also observed that the external relations of the researchers of the Southeast, as well as those of the South, are less intense, that is, they seldom collaborate repeatedly with the same researchers from other regions. Therefore, the results show that geographical distance is still a major barrier to be overcome by the researchers in this community. There is no significant correlation between the fellowship level and the EI index^{r}, the EI index^{r} _W or the EI index^{r} _Z for geographical regions.
4.2 Degree Centrality
Table 3 ranks the 10 researchers with the highest degree centrality, calculated in three ways: unweighted degree centrality (UDC), W-weighted degree centrality (WDC) and Z-weighted degree centrality (ZDC). The table shows how the number of links, the frequencies of the links, and the combination of link frequencies with node weights change the positions of the researchers. For example, researcher PQ124 appears in the first position in UDC, but when the weights of the links and the importance of the nodes are considered, this researcher does not appear in the top ten. In this case, researcher PQ124 has the greatest number of coauthors, but with lower frequencies of collaboration compared, for example, with researcher PQ62, who was second in UDC and takes the first position in WDC and ZDC. Despite being such a central node in the network, PQ62 holds only a level 1D productivity grant.
Researchers of the lower levels (2 and 1D) also occupy positions among the top 10, mainly in WDC and ZDC. Surprisingly, UNISINOS is the institution with the most PQs in the top 10 according to WDC and ZDC, all of them at level 2.
Table 4 shows the correlations of the three degree centrality metrics with the fellowship level. Only WDC does not have a significant correlation with the fellowship level. The unweighted degree centrality has the highest correlation with the fellowship level. Thus, collaborating with more authors, or collaborating frequently with authors of higher performance (h-index), impacts the fellowship level.
Figure 4 shows the boxplots used to evaluate and compare the distributions of degree centrality among the fellowship levels. Level 1A has the highest variability and the highest median, and level 1B has the smallest variability, whether or not the weights of the edges or the nodes’ attributes are considered. A larger number of outliers is observed in the lower levels when the weights W or Z are considered, revealing that the high values of WDC or ZDC obtained by some researchers are atypical (rare) for these fellowship levels.
4.3 Average Link Strength
Table 5 shows the 10 best-positioned authors according to the average link strength. The W-average link strength (WLS) is the ratio between WDC and UDC, and the Z-average link strength (ZLS) is the ratio between ZDC and UDC. Note that level 2 researchers predominate in the top 10 positions according to WLS; when the importance of nodes is considered, this participation decreases but is still high. This indicates that fellows at lower levels tend to concentrate their work on a few other fellows, while fellows at the highest levels tend to diversify their collaborations further. UNISINOS and UFF had the greatest number of PQs among the top 10 average link strengths.
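The WLS ratio can be illustrated with a minimal sketch. It assumes that WDC is the sum of the weights of a researcher’s links (the number of joint papers) and that UDC is the number of distinct coauthors; returning zero for isolated researchers is also an assumption, and the data are made up.

```python
def average_link_strength(coauthor_weights):
    """WLS = WDC / UDC for one researcher.

    coauthor_weights maps each coauthor to the number of joint papers.
    """
    udc = len(coauthor_weights)            # number of distinct coauthors
    wdc = sum(coauthor_weights.values())   # total number of joint papers
    # Assumed convention: an isolated researcher has link strength zero.
    return wdc / udc if udc else 0.0


# Hypothetical researchers: many occasional partners versus two frequent ones.
diversified = {"a": 1, "b": 1, "c": 2, "d": 1}
concentrated = {"x": 6, "y": 4}

wls_diversified = average_link_strength(diversified)     # 5 papers / 4 coauthors
wls_concentrated = average_link_strength(concentrated)   # 10 papers / 2 coauthors
```

The researcher with fewer but more frequent partners obtains the higher WLS, which corresponds to the pattern of concentrated collaboration described above for the lower fellowship levels.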
Table 6 shows the correlations between the two average link strength metrics and the level of productivity. The average link strength has little impact on the fellowship level when the importance of the node is considered; when it is not considered, the impact is almost zero and not significant at the 0.05 level.
Figure 5 shows the boxplots used to evaluate and compare the distributions of average link strength among the fellowship levels. Level 1D shows the highest variability and the highest median in both metrics. Levels 2 and 1D have outliers.
4.4 Closeness Centrality
Table 7 shows the 10 best-positioned authors according to the unweighted closeness centrality (UCC), the W-weighted closeness centrality (WCC) and the Z-weighted closeness centrality (ZCC).
To illustrate how the positions of the nodes change across the three closeness centrality metrics, consider researchers PQ32 and PQ60. Researcher PQ32 is the second closest to the other nodes according to UCC; that is, the sum of the distances from PQ32 to the other nodes is smaller than the corresponding sum for PQ60. However, considering WCC, the paths that connect researcher PQ60 to other nodes are formed by more frequent connections than the paths that connect researcher PQ32 to other nodes. As the frequency of connections shortens the paths, researcher PQ60 obtained a better position according to WCC. This researcher also remained in second place according to ZCC.
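The reversal described above can be reproduced on a small made-up graph. The sketch assumes that the length of an edge is the inverse of its weight, a common convention that may not match the exact definition given in Section 2, and it computes closeness as the number of reachable nodes divided by the sum of the shortest-path lengths to them. Node names and weights are hypothetical.

```python
import heapq


def closeness(node, edges, weighted=False):
    """Closeness of one node; edges is a list of (u, v, weight) tuples."""
    graph = {}
    for u, v, weight in edges:
        # Assumption: more joint papers means a shorter edge (length 1/weight).
        length = 1.0 / weight if weighted else 1.0
        graph.setdefault(u, []).append((v, length))
        graph.setdefault(v, []).append((u, length))

    # Dijkstra's algorithm from the source node.
    dist = {node: 0.0}
    queue = [(0.0, node)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist[u]:
            continue  # stale queue entry
        for v, length in graph.get(u, []):
            candidate = d + length
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(queue, (candidate, v))

    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


# "A" has several weak links; "B" sits on a part of the graph with strong links.
edges = [
    ("A", "a1", 1), ("A", "a2", 1), ("A", "M", 1),
    ("M", "B", 5), ("B", "b1", 5), ("M", "m1", 5), ("M", "m2", 5),
]

ucc_a, ucc_b = closeness("A", edges), closeness("B", edges)
wcc_a = closeness("A", edges, weighted=True)
wcc_b = closeness("B", edges, weighted=True)
```

Node A is closer to the rest of the graph when only hop counts matter, while node B overtakes it once frequent collaborations shorten the paths, the same kind of change observed between PQ32 and PQ60.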
Regarding the fellowship level, researchers at higher levels predominate among the top 10 positions according to the three closeness centrality metrics. Level 2 researchers take on average three positions in this table. Researchers working at USP and UFSCar are predominant in these rankings, implying that PQs at these institutions have easier access to other PQs in the network. PQ124 from UFSCar obtained the highest value according to the three methods, which shows a prominent position in the network. Table 8 shows the correlations of the three closeness centrality metrics with the fellowship level.
The three closeness centrality metrics showed significant positive correlations with the fellowship level. The one with the highest correlation was ZCC, followed by UCC. Thus, because they have more possibilities of establishing publication partnerships, researchers with greater closeness centralities also tend to have a higher fellowship level. Furthermore, the researcher who is closest to the leading researchers tends to have a higher fellowship level.
Figure 6 presents the box plots to evaluate and compare the distributions of closeness centralities among the fellowship levels. In the first two graphs, the highest variability is obtained by level 1C, and the highest median and smallest variability are obtained by level 1B. In the third graph, level 2 has the highest variability and level 1B maintains the highest median.
4.5 Betweenness Centrality
Table 9 shows the top 10 authors according to the unweighted betweenness centrality (UBC), the W-weighted betweenness centrality (WBC) and the Z-weighted betweenness centrality (ZBC).
Researcher PQ124 is the most central according to the three betweenness centralities, implying that he is the researcher with the greatest ability to connect different pairs of PQs in the Industrial Engineering area. Most researchers ranked among the top 10 in UBC are also ranked in the other two betweenness centralities, with small variations in position. Thus, using the weights of the edges or vertices does not significantly change the betweenness centrality of researchers in the network. Moreover, levels 1A and 1B are the majority in this table, and level 2 appears on average in 3 positions in each metric. USP also has the greatest number of PQs occupying the top 10 positions according to the three methods. Table 10 shows the correlations of the three betweenness centrality metrics with the fellowship level.
The three betweenness centrality metrics showed significant positive correlations with the fellowship level. The one with the highest correlation with the fellowship level was ZBC, that is, the one considering the importance of nodes, followed by WBC. Thus, researchers who assume the role of “intermediary”, controlling the flow of information, tend to have higher levels of productivity; this tendency is stronger for those who intermediate paths whose connections are more frequent and/or contain the most important nodes.
Figure 7 shows the center, the dispersion, the departure from symmetry and the observations considered atypical. In these three graphs, levels 1A and 1B show the highest variability and level 1B the highest medians. Level 2 has the smallest variation and the highest number of atypical points.
4.6 Eigenvector Centrality
Table 11 shows the top authors according to the eigenvector centrality: unweighted eigenvector centrality (UEC), W-weighted eigenvector centrality (WEC) and Z-weighted eigenvector centrality (ZEC).
The top 10 positions according to the three eigenvector centrality metrics are clearly occupied by different researchers: only researcher PQ14, placed tenth in UEC, appears twice in the table, also taking position 7 in ZEC. As for the researchers’ fellowship levels, the majority are at levels 2 and 1D. UFPE, UNISINOS and UFF have the PQs with the highest UEC, WEC and ZEC values, respectively. The correlations between the eigenvector centrality metrics and the fellowship level are presented in Table 12.
Under eigenvector centrality, researchers connected to more central researchers have higher centrality. Thus, according to UEC (resp., WEC or ZEC), a researcher has a higher centrality if he is connected to researchers with greater UEC (resp., WEC or ZEC). The correlations of these metrics with the fellowship level were significant, especially for UEC and ZEC.
Figure 8 presents the box plots used to evaluate and compare the distributions of the eigenvector centralities among the fellowship levels. For a better view, the isolated nodes, which have eigenvector centrality equal to zero, were disregarded in these graphs, and a logarithmic scale was used. Level 1A has the greatest variability.
4.7 PageRank
Table 13 presents the top 10 authors according to the unweighted PageRank (UPR), the W-weighted PageRank (WPR) and the Z-weighted PageRank (ZPR). The most influential researchers have fellowship level 1A. They are present mainly in WPR and ZPR. Note also that the ranking of the researchers remains almost unchanged across these three metrics. PQ124 and PQ82 obtained the highest PageRank values, implying that their importance in the network is related to having collaborated with other prominent PQs.
Table 14 shows that the correlations of the PageRank metrics with the fellowship level are significant. Among them, UPR has the greatest impact on the fellowship level.
Figure 9 presents the box plots to evaluate and compare the distributions of PageRank metrics among the fellowship levels. Level 1A has the greatest variability and the largest median according to the three PageRank metrics.
4.8 Local clustering coefficient
Table 15 presents the rankings of the top 10 researchers according to the unweighted local clustering coefficient (ULC), the W-weighted local clustering coefficient (WLC) and the Z-weighted local clustering coefficient (ZLC). This table is composed basically of level 2 researchers, and their positions according to the three metrics are almost the same.
The clustering coefficient of a given researcher indicates how many of his collaborators are collaborating with each other. However, these metrics have no impact on the fellowship level of the researcher, as indicated by the correlations in Table 16. A high clustering coefficient may imply that the researcher does not have a very diverse group of collaborators. UFPE, UFRJ and UTFPR have the highest number of PQs among the 27 highest values of ULC, while UNISINOS and UTFPR have PQs with high WLC and ZLC values.
The box plots in Figure 10 were used to assess the distributions of the local clustering coefficients of the researchers for different fellowship levels. The highest median and variability are obtained by level 1D according to all local clustering coefficient metrics.
4.9 Eccentricity
Table 17 displays the 10 researchers with the lowest eccentricity values, which correspond to the most central ones. These values were obtained in three distinct ways: unweighted eccentricity (UE), W-weighted eccentricity (WE) and Z-weighted eccentricity (ZE). Researchers of different levels are listed in this table. These are the researchers with the smallest maximum distances to any other researcher in the giant component of the network. Many of the researchers in the top 10 positions of UE do not appear in the top 10 according to WE or ZE. In fact, as the weights of the edges, alone or combined with the nodes’ attributes, shorten the paths, other researchers were prioritized. UNESPBAU, UFSCar and USP are the most frequent institutions.
The smaller the eccentricity, the better the relationship of a researcher with the other researchers. However, none of the eccentricity metrics significantly impacts the fellowship level, as shown in Table 18.
The box plots in Figure 11 were used to assess the eccentricity distribution of the researchers for different fellowship levels. In the first two graphs, level 1A features the largest variations and levels 2 and 1D the highest medians. In the last graph, the biggest variation is obtained by level 1C. Moreover, levels 2, 1D and 1C exhibit approximately the same median.
4.10 Utility
The top 10 researchers according to their utility, or benefit of belonging to the network structure, are shown in Table 19. The utility was obtained in three different ways: unweighted utility (UU), W-weighted utility (WU) and Z-weighted utility (ZU). Researchers from the lower levels 2 and 1D are the ones with the greatest benefits in UU, while level 1A researchers occupy 50% of the top 10 positions according to WU and ZU. All researchers in the WU list also appear in the ZU list. Once more, PQ124 has a prominent position in the rankings according to all methods.
The benefit of a researcher belonging to the network impacts his fellowship level significantly and moderately, as shown in Table 20. The highest correlation is obtained with ZU, followed by UU.
Figure 12 presents the box plots of the utility metrics, which show the central position and the dispersion for the different fellowship levels. Level 1A presents the largest variability and the highest median.
4.11 Logistic regression
Our Kendall correlation analysis shows only the existence of an association between SNA metrics and the level of productivity, not the effect of the independent variables on the dependent variable. To assess that effect, we use a multivariate logistic regression analysis, where the SNA metrics are the independent variables and the level of the research productivity grant is the dependent variable. Since there are few researchers at each level 1 fellowship, we grouped all level 1 fellowships into a single group (coded 1), while level 2 fellowships were coded 0. In this analysis, we consider only the researchers that belong to the main component of the network, since the values of some metrics for nodes in different components may not be comparable.
In the first model, a logistic regression analysis was performed to determine the effects of the unweighted metrics (U) on the fellowship level of researchers. The metrics considered in this regression were: EI index^{l}, EI index^{r}, UDC, UCC, UBC, UEC, ULC, UE and UU. The UPR was excluded to avoid the effect of multicollinearity. The logistic regression model was statistically significant, χ^{2} = 37.859, p<0.0005. The model explained 45.30% (Nagelkerke R^{2}) of the variation of the fellowship level and correctly classified 77.17% of the cases. Of the nine predictors, only two are statistically significant; the result is shown in Table 21. Thus, we conclude that the unweighted metrics EI index^{l} and Utility positively influence the researchers’ fellowship level.
In the second model, a logistic regression analysis was performed to determine the effects of the weighted metrics (W) on the fellowship level of researchers. The metrics considered in this regression were: EI index^{l} _W, EI index^{r}, WDC, WCC, WBC, WEC, WLC, WE, WU and WLS. The WPR was excluded to avoid the effect of multicollinearity. The logistic regression model was statistically significant, χ^{2} = 32.543, p<0.0005. The model explained 39.78% (Nagelkerke R^{2}) of the variation of the level of productivity and correctly classified 75.00% of the cases. Of the ten predictor variables, four are statistically significant; the result is shown in Table 22. According to the result, we conclude that the weighted metrics EI index^{l}, Betweenness centrality, Utility and Eccentricity positively influence the researchers’ fellowship level.
In the third model, the effects of the weighted metrics with the insertion of the nodes’ attributes (Z) on the fellowship level of researchers were also analyzed by logistic regression. The metrics considered in this regression were: EI index^{l} _Z, EI index^{r} _Z, ZCC, ZBC, ZEC, ZLC, ZE, ZU and ZLS. The ZDC and ZPR were excluded to avoid the effect of multicollinearity. The logistic regression model was statistically significant, χ^{2} = 35.182, p<0.0005. The model explained 42.43% (Nagelkerke R^{2}) of the variation of the level of productivity and correctly classified 77.2% of the cases. Of the nine predictor variables, five are statistically significant; the result is summarized in Table 23. We find that the weighted metrics with the insertion of the nodes’ attributes (EI index^{l} _Z, Betweenness centrality, Eigenvector centrality, Utility and Average link strength) positively influence the researchers’ fellowship level.
Comparing the three models, there is evidence that SNA metrics involving the weights of edges and the authors’ attributes provide more information about the ways a researcher can achieve better productivity. Moreover, the EI index^{l} and Utility were the metrics that most influenced the fellowship level, being present in all three models. Thus, researchers who collaborate with other researchers who devote most of their attention to their mutual project, and who seek partnerships with researchers from different fellowship levels, are more likely to hold a level 1 fellowship. The betweenness centrality also contributes positively to the fellowship level, but only in the second and third models, so researchers who assume the role of “intermediary” in paths whose connections are more frequent and contain the most important nodes tend to have a level 1 fellowship.
(Souza et al. 2016) also developed a logistic regression model to verify the influence of the unweighted metrics (Degree Centrality, Betweenness Centrality, Closeness Centrality, Eigenvector Centrality, Eccentricity, Cluster Coefficient and Utility) on the fellowship level of researchers in the Probability and Statistics area. As a result, the degree centrality had a positive effect and the average distance (which was defined as closeness centrality in Souza et al. (2016)) had a negative effect on the fellowship level.
5 CONCLUSIONS
A coauthorship network with 145 reasearchers with CNPq grant in research productivity in the area of Industrial Engineering in Brazil was built and a total of 32 SNA metrics were calculated. Such metrics were divided among unweighted, weighted by edges, and weighted by edges and nodes. The metrics analyzed were: the EI index, the Degree centrality, the Average link strength, the Closeness centrality, the Betweenness centrality, the Eigenvector centrality, the PageRank, the Eccentricity, the Local clustering coefficient and the Utility.
The unweighted metrics that showed the greatest association with the fellowship level were: the EI index^{l} (0.406), the utility (0.251), the degree centrality (0.244), the betweenness centrality (0.241) and the PageRank (0.240). The metrics weighted by edges that presented the highest association with the fellowship level were: the betweenness centrality (0.307), the EI index^{l} (0.343) and the utility (0.228). The metrics weighted by edges and nodes that presented the highest association with the fellowship level were: the betweenness centrality (0.367), the EI index^{l} (0.302), the utility (0.257) and the closeness centrality (0.229).
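Since the association between each metric and the fellowship level is measured with the Kendall correlation, a small sketch of how such a coefficient can be computed may be helpful. The function below implements the tau-b variant, which corrects for ties; whether tau-b or another variant was used for the values above is an assumption. The metric values and the ordinal coding of the fellowship levels are hypothetical.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long sequences (handles ties)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied in both variables: ignored
            elif dx == 0:
                ties_x += 1         # tied only in x
            elif dy == 0:
                ties_y += 1         # tied only in y
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) *
                      (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")

# Hypothetical example: a centrality metric and an ordinal coding of the
# fellowship level (0 = level 2, higher values = higher level 1 categories).
metric = [0.10, 0.40, 0.35, 0.80, 0.60, 0.15]
level = [0, 1, 0, 3, 2, 0]
print(round(kendall_tau_b(metric, level), 3))
```

Because fellowship levels take only a few distinct values, many pairs of researchers are tied on the level, which is why a tie-corrected coefficient is a natural choice in this setting.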
Thus, the major conclusions of this paper are:

Compared with the coauthorship network of PQs in the Probability and Statistics area, the Industrial Engineering community is larger, but the collaboration among its PQs is not as strong;

The EI index^{l} analysis shows that geographical distance is still a major barrier to collaboration among PQs in the Industrial Engineering community;

Researchers who assume the role of mediator (greater betweenness centrality), controlling the flow of information, tend to have higher fellowship levels, especially when the paths they lie on are formed by more frequent connections or involve more important researchers;

Researchers ranked higher by the unweighted PageRank also have higher fellowship levels. If the PageRank is weighted by edges and nodes, the impact on the fellowship level is lower;

Researchers with the highest unweighted degree centralities, namely, greater numbers of coauthors, tend to have higher fellowship levels. If the degree centrality is weighted, this trend weakens;

Researchers with greater possibilities for establishing publication partnerships are those with greater closeness centralities and higher fellowship levels, especially if the partners are researchers with the highest h-index;

Researchers who benefit most from belonging to the network (greater Utility) also have the highest fellowship levels, especially those who have a high h-index or collaborate with researchers with the highest h-index;

Researchers with more heterogeneous coauthoring relationships tend to have higher fellowship levels. If the relations take the edge weights into account, the correlation with the fellowship level decreases slightly, whereas with both the edge weights and the nodes’ attributes the correlation rises again, although not as high as in the unweighted case;

UNISINOS, UFF, USP, UFSCar, UFPE and UTFPR are the institutions with the largest numbers of PQs among the top 10 according to some SNA metrics;

PQ124, from UFSCar, was the most important researcher in the network according to the greatest number of metrics.
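Several of the conclusions above rest on the EI index, so a minimal sketch of it may help. Following the usual definition of Krackhardt and Stern, the index of a node is (E - I) / (E + I), where E counts ties to nodes in other groups and I counts ties to nodes in the same group; it ranges from -1 (only internal ties) to +1 (only external ties). The grouping attribute (here the fellowship level), the edge-weighted variant that sums joint papers instead of counting ties, and all labels and numbers are assumptions made for illustration only.

```python
def ei_index(node, edges, group, weighted=False):
    """E-I index of `node`: (external - internal) / (external + internal).

    edges: iterable of (u, v, weight) for an undirected network.
    group: dict mapping each node to its group label.
    weighted: if True, sum edge weights instead of counting ties.
    Returns None for a node without any tie.
    """
    external = internal = 0
    for u, v, w in edges:
        if node not in (u, v):
            continue
        other = v if u == node else u
        amount = w if weighted else 1
        if group[other] == group[node]:
            internal += amount
        else:
            external += amount
    total = external + internal
    return (external - internal) / total if total else None

# Hypothetical example: PQ_A has one intense tie inside its own group
# and two weaker ties to researchers in another group.
edges = [("PQ_A", "PQ_B", 3), ("PQ_A", "PQ_C", 1), ("PQ_A", "PQ_D", 1)]
group = {"PQ_A": "level 1", "PQ_B": "level 1",
         "PQ_C": "level 2", "PQ_D": "level 2"}

print(ei_index("PQ_A", edges, group))                 # unweighted
print(ei_index("PQ_A", edges, group, weighted=True))  # weighted by edges
```

In this toy case the unweighted index is positive while the edge-weighted one is negative, which illustrates how the weighting can change the picture of how outward-looking a researcher's collaborations are.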

Finally, the logistic regression analysis shows that the only metrics that, according to all three methods, influence whether a fellowship is of level 1 as opposed to level 2 are the EI index and the Utility. This implies that level 2 PQs wishing to obtain a level 1 fellowship should both collaborate with level 1 PQs and concentrate their collaboration on PQs who devote much of their own collaboration effort to that relationship.
It is important to note that fellowships are granted for a period ranging from 3 years (level 2) up to 5 years (level 1A). Thus, researchers only compete with those who are in the same time cycle, which can cause some discrepancies between different cycles. However, in August 2013 all fellowship levels were reclassified (Figueiredo 2013) to reduce those discrepancies. Since our data were collected in March 2014, this time-cycle problem was mitigated.
It is worth mentioning that the network was formed only among researchers holding a CNPq research productivity grant in the area of Industrial Engineering in Brazil. Thus, the coauthorship relations between them and non-fellow authors, or PQs in other areas of knowledge, were not considered. Future work could build a network that includes, beyond the relations among the fellows, their other coauthorship relations, and compare the resulting correlations of the network metrics with those studied in this work. Other types of author weights, besides the h-index, may also be studied to evaluate how they would change the results shown here.
REFERENCES

^{1}ABBASI A & ALTMANN J. 2011. “On the correlation between research performance and social network analysis measures applied to research collaboration networks”. In: Hawaii International Conference on System Sciences, Proceedings of the 41st Annual. Waikoloa, HI: IEEE.

^{2}ABBASI A, ALTMANN J & HOSSAIN L. 2011. Identifying the effects of coauthorship networks on the performance of scholars: A correlation and regression analysis of performance measures and social network analysis measures. Journal of Informetrics, 5: 94607.

^{3}ABBASI A, HOSSAIN L & LEYDESDORFF L. 2012. Betweenness centrality as a driver of preferential attachment in the evolution of research collaboration networks. Journal of Informetrics, 6: 403-412.

^{4}ANASTASIOS T, SGOIROPOULOU C, PAPAGEORGIOU E, TERRAZ O & MIAOULIS, G. 2012. Coauthorship networks in academic research communities: the role of network strength. 16th Panhellenic Conference on Informatics.

^{5}ANDRADE RL. 2016. A Influência das Redes de Coautoria na Performance dos Bolsistas de Produtividade e nos Programas de Pós-Graduação em Engenharia de Produção. Recife. 2016. 144 p. Mestrado - Programa de Pós-graduação em Engenharia de Produção/UFPE.

^{6}ANDRADE RL & RÊGO LC. 2015a. Conhecendo a rede de coautoria dos bolsistas de produtividade em pesquisa da área de engenharia de produção e a sua influência no nível de produtividade. In: XLVII Simpósio Brasileiro de Pesquisa Operacional - SBPO.

^{7}ANDRADE RL & RÊGO LC. 2015b. A influência da rede de coautoria no nível das bolsas de produtividade da área de engenharia de produção. In: XXXV Congresso da Sociedade Brasileira de Computação - CSBC.

^{8}BARNETT AH, AULT RW & KASERMAN DL. 1988. The Rising Incidence of Coauthorship in Economics: Further Evidence. Review of Economics and Statistics, 70: 539-543.

^{9}BONACICH P. 1987. Power and centrality: a family of measures. The American Journal of Sociology, 92: 1170-1182.

^{10}CN. 2015. Critérios de Julgamento. Available in: <http://www.cn.br/web/guest/criteriosdejulgamento>. Access on: 28 January 2015.

^{11}EATON JP, WARD JC & KUMAR A. 1999. Structural Analysis of Co-Author Relationships and Author Productivity in Selected Outlets for Consumer Behavior Research. Journal of Consumer Psychology, 8: 39-59.

^{12}EGGHE L. 2006. Theory and practise of the g-index. Scientometrics, 69: 131-152.

^{13}FIGUEIREDO RW DE. 2013. CNPq reclassifica pesquisadores e 39 docentes da UFC são promovidos. Available in: <http://www.ufc.br/noticias/noticiasde2013/4013cnpqreclassificapesquisadorese39docentesdaufcsaopromovidos>. Access on: 30 May 2017.

^{14}FREEMAN LC. 1979. Centrality in Social Networks Conceptual Clarification. Social Networks, 1: 215-239.

^{15}HAIR JF, BLACK WC, BABIN BJ, ANDERSON RE & TATHAM RL. 2009. Análise multivariada de dados. Bookman Editora.

^{16}HANNEMAN RA & RIDDLE M. 2005. Introduction to social network methods. Riverside: University of California.

^{17}HIRSCH JE. 2005. An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences, 102: 16569-16572.

^{18}HUANG PY, LIU HY, CHEN CH & CHENG PJ. 2013. The impact of social diversity and dynamic influence propagation for identifying influencers in social networks. In: Web Intelligence WI and Intelligent Agent Technologies IAT, 2013 IEEE/WIC/ACM International Joint Conferences on, vol. 1, p. 410-416.

^{19}HUDSON J. 1996. Trends in Multi-Authored Papers in Economics. Journal of Economic Perspectives, 10: 153-158.

^{20}JACKSON MO. 2008. Social and Economic Networks. Princeton University Press. Stanford University, February 2008.

^{21}JACKSON MO & WOLINSKY A. 1996. A Strategic Model of Social and Economic Networks. Journal of Economic Theory, 71: 44-74.

^{22}KEMPE D, KLEINBERG J & TARDOS E. 2005. Influential nodes in a diffusion model for social networks. In: Automata, Languages and Programming, Springer, vol. 3580, p. 1127-1138.

^{23}KRACKHARDT D & STERN R. 1988. Informal networks and organizational crises: An experimental simulation. Social Psychology Quarterly, 51: 123-140.

^{24}KRACKHARDT D. 1992. The strength of strong ties: The importance of philos in organizations. In: Networks and Organizations: Structure, Form, and Action, p. 216-239.

^{25}LEE S & BOZEMAN B. 2005. The impact of research collaboration on scientific productivity. Social Studies of Science, 35: 673-702.

^{26}LIU J, LI Y, RUAN Z, FU G, CHEN X, SADIQ R & DENG Y. 2014. A new method to construct coauthor networks. Physica A, 419: 29-39.

^{27}LIU X, BOLLEN J, NELSON ML & SOMPEL H VAN DE. 2005. Coauthorship networks in the digital library research community. Information Processing and Management, 41: 1461-1480.

^{28}MENA-CHALCO JP & CESAR JR RM. 2009. ScriptLattes: An open-source knowledge extraction system from the Lattes platform. Journal of the Brazilian Computer Society, 15: 31-39.

^{29}NEWMAN MEJ. 2001a. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98: 404-409.

^{30}NEWMAN MEJ. 2001b. Scientific collaboration networks. I. Network constructions and fundamental results. Physical Review E, vol. 64.

^{31}NEWMAN MEJ. 2001c. Who is the best connected scientist? A study of scientific coauthorship networks. Complex Networks, 650: 337-370.

^{32}NEWMAN MEJ. 2004. Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences, 101: 5200-5204.

^{33}ONEL S, ZEID A & KAMARTHI S. 2011. The structure and analysis of nanotechnology coauthor and citation networks. Scientometrics, 89: 119-138.

^{34}ONNELA JP, SARAMÄKI J, KERTÉSZ J & KASKI K. 2015. Intensity and coherence of motifs in weighted complex networks. Physical Review E, 716: 4.

^{35}SANTOS AM DOS. 2014. Aplicações de modelos de grafos na análise de conceitos e de redes sociais. Recife. 2014. 162 p. Doutorado - Programa de Pós-graduação em Estatística/UFPE.

^{36}SOUZA FC DE, AMORIM RM & RÊGO LC. 2016. A Coauthorship network analysis of CNPq’s productivity research fellows in the probability and statistic area. Perspectivas em Ciência da Informação, 21(4): 29-47.

^{37}WANDERLEY AJ, DUARTE AN, BRITO AV DE, PRESTES MAS & FRAGOSO FC. 2014. Identificando correlações entre métricas de Análise de Redes Sociais e o h-index de pesquisadores de Ciência da Computação. In: XXXIV Congresso da Sociedade Brasileira de Computação - CSBC, 2014.

^{38}WAINER J & VIEIRA P. 2013. Correlation between bibliometrics and peer evaluation for all disciplines: the evaluation of Brazilian scientists. Scientometrics online, 96: 395-410.

^{39}WASSERMAN S & FAUST K. 1994. Social Networks Analysis: Methods and Applications. Cambridge University Press. Structural analysis in the social sciences series, vol. 8.

^{40}YAN E & DING Y. 2009. Applying centrality measures to impact analysis: A coauthorship network analysis. Journal of the American Society for Information Science and Technology, 60: 2107-2118.
Publication Dates
Publication in this collection: May-Aug 2017
History
Received: 16 Aug 2016
Accepted: 22 June 2017