Using network metrics to investigate football team players' connections: A pilot study

The aim of this pilot study was to propose a set of network methods to measure the specific properties of football teams. These metrics were organized on “meso” and “micro” analysis levels. Five official matches of the same team in the First Portuguese Football League were analyzed, comprising a total of 577 offensive plays. From the adjacency matrices built for each offensive play, the scaled connectivity, the clustering coefficient, and the centroid significance and centroid conformity were computed. Results showed that the highest values of scaled connectivity were found in lateral defenders, central defenders, and midfielders, and the lowest values were found in the striker and goalkeeper. The highest values of clustering coefficient were generally found in midfielders and forwards. In addition, the centroid results showed that lateral and central defenders tend to be the centroid players in the attacking process. In sum, this study showed that network metrics can be a powerful tool to help coaches understand a team's specific properties, thus supporting decision-making and improving sports training based on match analysis.


Introduction
The opposition and coordination between two teams is the essence of invasion sports, wherein each team tries to recover, maintain, and move the ball toward the scoring zone to score a goal (Gréhaigne & Godbout, 1995). Thus, Metzler (1987) describes the essence of a football team as the possibility to solve, in action, an unpredictable set of problems with the highest efficacy possible. This problem occurs simultaneously in both offensive and defensive phases, depending on which team possesses the ball. Therefore, an invasion team sport constitutes a complex and dynamic system that persists throughout the match, adapting to the contextual constraints (Clemente, Couceiro, Martins, & Mendes, 2013; Gréhaigne, Bouthier, & David, 1997; McGarry, 2005).
To overcome the opposition, a strong collective organization should be undertaken to improve the possibilities of individual success. At the team organizational level, the numerous interrelations between players within the team make up what one might call a competency network (Gréhaigne, 1992). The competency network is based on each player's recognized strengths and weaknesses with reference to the practice of the sport and on the group's dynamism (Gréhaigne, Richard, & Griffin, 2005). Therefore, the team's functional performance is assured by a complex network of interpersonal relationships among players (Passos et al., 2011), in which the competency network is more of a dynamic concept than a static one (Gréhaigne, Godbout, & Bouthier, 1999). Any network analysis needs to consider the regular and variable interactions between players. For the study of the competency network, some works have been undertaken to improve the knowledge of the team's collective behavior (Grunz, Memmert, & Perl, 2009; Memmert & Perl, 2009).
Some works have suggested the use of graph theory (a network method) in sports (Bourbousson, Poizat, Saury, & Seve, 2010; Duarte, Araújo, Correia, & Davids, 2012; Passos et al., 2011). Bourbousson, Poizat, Saury, and Seve (2010) used graph theory to analyze the connectivity between basketball players in each unit of attack, crossing this quantitative analysis with a qualitative one to explain the social interactions. Their main finding was the rise of a specific network for each team. These results suggest that a network's coordination was built on local interactions that do not necessarily require all players to achieve the team's goal. In the case of water polo, it was shown that the most successful collective system behavior requires a high probability of each player interacting with other players in a team (Passos et al., 2011). More specifically, in the case of a football game, researchers proposed to analyze the attacking plays that result in shots and identify the main players that contribute to the process of building the attack (Duch, Waitzman, & Amaral, 2010). Using a centrality approach, they found the player with the most influence on each analyzed team. Such an approach was compared with an observational analysis by experts and showed strong correspondence. Recently, Malta and Travassos (2014) characterized the attacking transition using a network approach, revealing that the team opted for a style of play based on circulation and direct play.
Despite those studies that used a network approach to identify team properties, the use of network metrics is still limited.
In fact, the network (graph) on its own cannot provide a powerful quantitative analysis. Using the network representation alone does not allow one to identify the centroid player, the level of heterogeneity of the team, or clusters inside the team. In that sense, additional metrics should be included in sports analysis for a further understanding of a team's behavior.
Therefore, this pilot study aimed to introduce a set of network metrics from the social sciences literature that can help in obtaining robust quantitative information about a team's process, mainly trying to characterize how the network approach can contribute to better understanding the teammates' interactions throughout the match. To identify the team's properties, the teammates' interactions were classified into two main levels of analysis: i) 'meso' analysis, exploring the clusters that emerged from the team's organization (Clustering Coefficient) and the connectivity level between players (Scaled Connectivity); and ii) 'micro' analysis, identifying the centroid players and how these centroids may help teammates connect to each other (Centroid Player).

Sample
Five official matches of the same team in the First Portuguese Football League were analyzed. The team won four matches and drew one. Over all the matches, 21 players were analyzed. Each player was assigned a code to identify individual characteristics, and the same code was kept for all matches.
Despite the different playing times per player, this study aimed at keeping the real characteristics of an official football game, thus respecting the substitutions and the different options for each match. In order to overcome this ecological constraint, a network was built for each half of a match and for each overall match, resulting in 15 different networks. This solution was chosen so as to provide a useful and easy reference from a practical point of view. It takes into account that one player may not play with another due to substitutions. Nevertheless, this is a natural constraint of real and ecological data collection. The same strategic distribution (1-4-2-3-1) was observed in all matches. This strategic distribution was classified based on the routines and actions performed by individual players during the match (see Table 1). The players were classified based on their tactical region and movements.

Data collection
An adjacency matrix was computed for each match. The adjacency matrix was used to build a finite n×n network where the entries represent the individual participation in the offensive play (i.e., the network is developed considering the number of consecutive passes until the ball is lost). The offensive play considers all the passes from the same offensive sequence without losing ball possession. This option was based on Bourbousson et al. (2010) and Passos et al. (2011), who defined each 'unit of attack' (for the football case, the offensive play) as starting at the moment a team gained ball possession and lasting until the ball was recovered by the opposing team. A total of 577 offensive plays were analyzed from the 5 matches.

Developing the adjacency matrix
A MATLAB script denoted as wgPlot, developed by Michael Wu (Wu, 2009), plots graphs similarly to gPlot, a MATLAB function that plots n nodes connected by links representing a given adjacency matrix defined by:

$$A = [a_{ij}] \in \mathbb{R}^{n \times n}, \qquad a_{ij} = \begin{cases} 1, & \text{if vertex } i \text{ is connected to vertex } j \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

It is noteworthy that in the football situation, in which each adjacency matrix represents a successful pass, the diagonal elements (i.e., when i = j) are set equal to 1 to identify player i as one of the players that participated in the offensive play. As an example, consider the herein presented sequence of passes in which the first player corresponds to the first vertex and so on. The team under study has 11 players, i.e., n = 11, but the last five players did not contribute to this offensive play. The adjacency matrix of this offensive play is represented in Table 2. The script wgPlot from Michael Wu (2009) allows the user to input an adjacency matrix with weighted edges and/or weighted vertices, denoted as the edge-weighted edge-adjacency matrix $A_w$, introduced by Estrada (1995).
The weighted matrix $A_w$ can be easily defined as the sum of all adjacency matrices, each one generated by a single offensive play. To allow a graphical representation of the players' cooperation, the script presented by Michael Wu (2009), denoted as wgPlot, was further extended with the following features: a) the size of vertex i (i.e., player), given by the diagonal entry i = j, is proportional to the number of offensive plays player i participates in; b) the thickness $w_{ij}$ of each edge (i.e., cooperation between players) and the colormap of the network are proportional to the number of offensive plays in which players i and j, i ≠ j, participate together; c) the script receives as input a binary database (e.g., an Excel file) in which each line corresponds to an offensive play and each column to a player, i.e., each line corresponds to an adjacency matrix A; and d) besides returning the network from $A_w$, it also returns the clusters, i.e., sub-communities, of the team based on Hespanha's work (2004), extensively used in Lim, Bohacek, Hespanha, and Obraczka (2005). This last point is further explained in the next section.
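As an illustration only, the construction of $A_w$ from the binary database can be sketched in plain Python (this is not the authors' MATLAB script). The sketch assumes that, within one offensive play, every pair of participating players is linked and each participant has a diagonal entry of 1; the function names and the four-player toy data are hypothetical.

```python
def play_adjacency(participation):
    """Adjacency matrix of one offensive play.

    `participation` is one line of the binary database: entry p is 1 if
    player p took part in the play, 0 otherwise. Assumption: all pairs of
    participants are linked, and the diagonal marks participation.
    """
    n = len(participation)
    return [[1 if participation[i] and participation[j] else 0
             for j in range(n)]
            for i in range(n)]


def weighted_adjacency(plays):
    """Sum the per-play adjacency matrices into the weighted matrix A_w."""
    n = len(plays[0])
    a_w = [[0] * n for _ in range(n)]
    for row in plays:
        a = play_adjacency(row)
        for i in range(n):
            for j in range(n):
                a_w[i][j] += a[i][j]
    return a_w


# Toy database (hypothetical): 3 offensive plays, 4 players.
plays = [
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 0],
]
A_w = weighted_adjacency(plays)
```

Under these assumptions, the diagonal of `A_w` counts the plays each player took part in, and each off-diagonal entry counts the plays two players shared.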

Seeking for clusters within a team
In order to detect groups among players, graph theory offers specific methodologies to build partitions. Uniform graph partition consists of dividing a graph into components, such that the components are of about the same size and there are few connections between the components. One of the functionalities of the graph partition is to generate communities (Couceiro, Clemente, & Martins, 2013). Communities, also called clusters or modules, are groups of vertices which probably share common properties and/or play similar roles within the graph (Fortunato, 2010).
The uniform graph partition has gained importance due to its application for clustering and detection of groups in social, pathological, or biological networks (Fiduccia & Mattheyses, 1982). Commonly, the graph is defined by G = (V, E), where V is the set of vertices and E is the set of edges, such that it is possible to partition G into smaller components with specific properties. A k-partition of V is a collection P = {V_1, V_2, ..., V_k} of k disjoint subsets of V, whose union equals V (Hespanha, 2004).
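The definition of a k-partition and the idea of "few connections between the components" can be made concrete with a short Python sketch. It only checks a candidate partition and measures the weight of the links it cuts; it does not implement a partitioning algorithm. The function names and the toy matrix are hypothetical.

```python
def is_k_partition(vertices, parts):
    """True if `parts` are pairwise disjoint and their union equals `vertices`."""
    seen = set()
    for part in parts:
        part = set(part)
        if seen & part:          # overlap with an earlier subset
            return False
        seen |= part
    return seen == set(vertices)


def cut_weight(a_w, parts):
    """Total weight of the links joining players placed in different subsets."""
    label = {v: idx for idx, part in enumerate(parts) for v in part}
    n = len(a_w)
    return sum(a_w[i][j]
               for i in range(n) for j in range(i + 1, n)
               if label[i] != label[j])


# Toy weighted matrix (hypothetical), players indexed 0..3.
A_w = [[2, 2, 1, 0],
       [2, 3, 2, 1],
       [1, 2, 2, 1],
       [0, 1, 1, 1]]
parts = [{0, 1}, {2, 3}]
valid = is_k_partition(range(4), parts)
cut = cut_weight(A_w, parts)
```

A good uniform partition, in the sense described above, keeps the subsets of similar size while making `cut` small.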

Table 2. Example of adjacency matrix.
The MATLAB function grPartition described in the technical report of Hespanha (2004) is able to perform a fast partition of large graphs. This function implements a graph partitioning algorithm based on spectral factorization. The herein proposed MATLAB script then merges the wgPlot and grPartition functions, with a few adaptations as previously presented, to understand players' cooperation patterns within a given team, such as the number of appearances in offensive plays, how many players they exchange passes with, and the existence of sub-communities among them.
Therefore, running the script with the previously described example (see Developing the adjacency matrix) would then return the following players' network, thus identifying the players' cooperation.

Using network metrics for understanding football
Many kinds of networks (e.g., biological, sociological) share some topological properties. To identify and describe such properties, the most potentially useful network concepts are known from graph theory (Couceiro et al., 2013). In the context of football, one can divide network concepts into: a) intra-player network concepts (i.e., network properties of a node); b) inter-player network concepts (i.e., network relationships between two or more vertices); and c) group network concepts (i.e., whole network concepts).
To allow the use of the network concepts, one can create a new relative weighted adjacency matrix $A_r$, defined as:

$$A_r = [r_{ij}] \in \mathbb{R}^{n \times n}, \qquad r_{ij} = \begin{cases} w_{ij}, & i = j \\ \dfrac{w_{ij}}{\max_{i \neq j} A_w}, & i \neq j \end{cases} \tag{2}$$

where $w_{ij}$ are the entries of $A_w$ and $0 \le r_{ij} \le 1$ for i ≠ j, with i, j = 1, ..., n. The denominator $\max_{i \neq j} A_w$ corresponds to the largest inter-player connectivity (i.e., the players that participated most together in the same offensive plays).
It is noteworthy that the diagonal entries of $A_r$ represent the number of offensive plays in which a given player participated. However, this value is not considered in computing the network concepts herein presented.
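A minimal Python sketch of this normalization is given below, assuming that off-diagonal entries are divided by the largest off-diagonal entry of $A_w$ while the diagonal keeps the raw participation counts. The function name and toy matrix are hypothetical.

```python
def relative_adjacency(a_w):
    """Relative weighted adjacency matrix A_r built from A_w.

    Off-diagonal entries are scaled into [0, 1] by the largest off-diagonal
    entry; diagonal entries (participation counts) are left untouched.
    """
    n = len(a_w)
    max_off = max(a_w[i][j] for i in range(n) for j in range(n) if i != j)
    if max_off == 0:
        raise ValueError("no pair of players shares an offensive play")
    return [[a_w[i][j] if i == j else a_w[i][j] / max_off
             for j in range(n)]
            for i in range(n)]


# Toy weighted matrix (hypothetical), 4 players.
A_w = [[2, 2, 1, 0],
       [2, 3, 2, 1],
       [1, 2, 2, 1],
       [0, 1, 1, 1]]
A_r = relative_adjacency(A_w)
```

Here the pairs sharing the most plays get a relative weight of 1, and every other pair is expressed as a fraction of that maximum.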
Based on the weighted matrix, it was possible to compute a set of metrics organized on two levels of analysis (meso and micro). Each metric is a statistical method exclusively dedicated to network analysis. Therefore, more than just a visual representation, such values represent the individual contribution of each player in a given field of analysis. The different results from player to player can increase the understanding of the individual's contribution to the team's network.

Network contents for the "meso" analysis of a football team
For the football case, the offensive process can be developed in many ways. Therefore, it is important to understand how the team breaks its homogeneity level. Moreover, it is also important to understand the connectivity levels between teammates. Bearing these ideas in mind, two metrics are suggested for the football analysis: a) scaled connectivity; and b) clustering coefficient.

Scaled connectivity
The first concept, and one of the most widely used in the literature for distinguishing a vertex of a network (Horvath, 2011), is the connectivity (also known as degree).
In the situation herein presented, i.e., players' networks, the connectivity $k_i$ equals the sum of connection weights between player i and the other players:

$$k_i = \sum_{j \neq i} r_{ij} \tag{3}$$

The most cooperative player, or players, can be found by finding the index/indices of the maximum connectivity. Therefore, one can define a relative connectivity, known as scaled connectivity, of player i as:

$$S_i = \frac{k_i}{\max(k)} \tag{4}$$

such that $S = [S_i] \in \mathbb{R}^{1 \times n}$ is the vector of the relative connectivity of players.
In the football context, one could interpret the scaled connectivity as a measure of the cooperation level of a given player, in which high values of $S_i$ (i.e., as $S_i$ tends to 1) indicate that the i-th player participates with most of the other players from the group.
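The connectivity and scaled connectivity can be sketched in Python as follows. The sketch assumes that the connectivity sums the off-diagonal relative weights of a player and that the scaled connectivity divides it by the team maximum; function names and the toy matrix are hypothetical.

```python
def connectivity(r):
    """Connectivity k_i: sum of relative weights between player i and the others."""
    n = len(r)
    return [sum(r[i][j] for j in range(n) if j != i) for i in range(n)]


def scaled_connectivity(r):
    """Scaled connectivity S_i = k_i / max(k), so the top player scores 1."""
    k = connectivity(r)
    k_max = max(k)
    if k_max == 0:
        raise ValueError("network has no connections")
    return [k_i / k_max for k_i in k]


# Toy relative matrix (hypothetical); the diagonal is ignored by both functions.
A_r = [[2, 1.0, 0.5, 0.0],
       [1.0, 3, 1.0, 0.5],
       [0.5, 1.0, 2, 0.5],
       [0.0, 0.5, 0.5, 1]]
k = connectivity(A_r)
S = scaled_connectivity(A_r)
most_cooperative = [i for i, s in enumerate(S) if s == max(S)]
```

In this toy network the second player (index 1) is the most cooperative one, and the other values express each player's connectivity as a fraction of his.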

Clustering coefficient
The clustering coefficient of player i offers a measure of the degree of interconnectivity in the neighborhood of player i, being defined as:

$$C_i = \frac{\sum_{j \neq i} \sum_{l \neq i,j} r_{ij}\, r_{jl}\, r_{li}}{\left( \sum_{j \neq i} r_{ij} \right)^2 - \sum_{j \neq i} r_{ij}^2} \tag{5}$$

such that $C = [C_i] \in \mathbb{R}^{1 \times n}$ is the vector of the clustering coefficient of players and i, j, l = 1, ..., n.
The higher the clustering coefficient of a player, the higher the cooperation among his teammates. If the clustering coefficient tends to zero, then the teammates do not cooperate much with each other.
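A hedged Python sketch of a weighted clustering coefficient follows. It assumes the standard weighted form used for weighted networks (closed weighted triangles around player i divided by the maximum attainable given his own link weights); whether this matches the authors' exact expression is an assumption. Function names and the toy matrix are hypothetical.

```python
def clustering_coefficient(r):
    """Weighted clustering coefficient of every player.

    Numerator: weighted triangles i-j-l-i over ordered pairs of distinct
    teammates j, l. Denominator: (sum of i's weights)^2 minus the sum of
    their squares. Players with a zero denominator get 0.0.
    """
    n = len(r)
    coeffs = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        numerator = sum(r[i][j] * r[j][l] * r[l][i]
                        for j in others for l in others if l != j)
        k_i = sum(r[i][j] for j in others)
        denominator = k_i ** 2 - sum(r[i][j] ** 2 for j in others)
        coeffs.append(numerator / denominator if denominator > 0 else 0.0)
    return coeffs


# Toy relative matrix (hypothetical); diagonal entries are not used.
A_r = [[2, 1.0, 0.5, 0.0],
       [1.0, 3, 1.0, 0.5],
       [0.5, 1.0, 2, 0.5],
       [0.0, 0.5, 0.5, 1]]
C = clustering_coefficient(A_r)
```

In the toy data the first and last players score highest because the two teammates each of them plays with are themselves strongly linked, whereas the most connected player scores lowest since not all of his teammates cooperate with one another.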

Network contents for the "micro" analysis of a football team
To further understand teams' performance, one should be able to characterize the individual contribution of each player. Moreover, it is quite important to identify the players that contribute the most to the team's process and how players cooperate with each other.

Centroid significance and centroid conformity
The network centroid can define the centrally located node (Horvath, 2011). For the football case, the centroid can be defined as one of the most highly connected node(s) in the network. The first 'micro' concept arises from the centroid player(s), whose connectivity strength to all other teammates can be expressed as:

$$CC_{i,\text{centroid}} = r_{i,\,i_{\text{centroid}}} \tag{6}$$

This inter-player concept is denoted as centroid conformity and corresponds to the adjacency between the centroid player and the i-th player, such that $CC = [CC_{i,\text{centroid}}] \in \mathbb{R}^{1 \times n}$ is the vector of the centroid conformity of players. In other words, $CC_{i,\text{centroid}}$ presents the cooperation level of the i-th player with the top-ranked player.

Topological overlap measure and the topological inter-dependency
The second 'micro' analysis concept is based on the topological overlap presented in several works, such as Ravasz, Somera, Mongru, Oltvai, and Barabasi (2002) and Horvath (2011), which represents the pairs of players that cooperate with the same players. This measure may also represent the overlap between two players even if they do not participate in the same offensive plays with one another. In other words, the topological overlap between the i-th player and the j-th player depends on the number of offensive plays with the same "shared" players, but it does not take into account the number of offensive plays between them. Moreover, the topological overlap is represented by a symmetric matrix, thus presenting the overlap between players but neglecting the most independent player of the pair. Therefore, by using the concepts inherent to the clustering coefficient (equation 5), one should consider not only the "shared" offensive plays but also the influence of the joint offensive plays among players i and j.
In other words, if two players participate in offensive plays with the same other players, then the cooperation between both of them allows building triangular relations between the other players. However, the i-th player may be more dependent on the j-th player if he only participates in offensive plays with the same players as the j-th player, who, in turn, is able to participate in offensive plays with other players. As a result, similarly to Ravasz et al. (2002) and Horvath (2011), one can define a topological dependency $td_{ij}$ (equation 7), with i, j, l = 1, 2, ..., n.
As a consequence, two players have a high topological dependency, i.e., $td_{ij} = 1$, if they participate in offensive plays with the same player and with one another. In other words, the more players are "shared" between two players that highly participate in offensive plays with one another, the stronger their cooperation is and the more likely they are to form a small cluster.
However, $T_d$ corresponds to a square matrix with size equal to the number of players and, contrary to the adjacency matrix or the topological overlap (Horvath, 2011), $T_d$ is not symmetric, i.e., $td_{ij} \neq td_{ji}$, thus making it difficult to compare the $td_{ij}$ and $td_{ji}$ pairs. To complement the previous concept, a new 'micro' metric denoted as topological inter-dependency is introduced as:

$$T_i = T_d - T_d^{T} \tag{8}$$

wherein $T_d^{T}$ is the transpose of matrix $T_d$ and $T_i$ corresponds to an antisymmetric square matrix, i.e., $ti_{ij} = -ti_{ji}$. In players' networks, one can easily observe dependencies between players such that if $ti_{ij} > 0$ then the i-th player depends on the j-th player to play with his teammates.

'Meso' analysis
The connectivity level between players is one of the most important concepts for identifying a team's properties. Therefore, the scaled connectivity was computed for all matches (see Table 3).
In an overall analysis, the scaled connectivity values range between .564 and .680, which suggests a tendency toward generalized cooperation in the attacking process. Nevertheless, on an individual scale, it is also possible to describe the results for each player during the matches. The players with the highest scaled connectivity values were player 12 (right defender) in the 1st match, player 5 (left defender) in the 2nd match, player 19 (left defender) in the 3rd match, player 7 (midfielder) in the 4th match, and player 3 (central defender) in the 5th match. Furthermore, over all matches, the highest mean values belong to the defenders and midfielders. On the other hand, the players with the lowest mean values of scaled connectivity are the striker (player 20), a central defender (player 18), and the goalkeeper (player 1). Therefore, as a rule, the defensive and midfield players are the ones that connect most with the other players overall.
The clustering coefficient for each player (see Table 4) was computed to analyze whether one player can involve all teammates in the offensive play (i.e., enabling a global cooperation).
In an overall analysis, the clustering coefficient values range between .443 and .538, therefore revealing the emergence of clusters within the team. Once again, on an individual scale, it is also possible to describe the results for each player during the matches. The highest clustering coefficient values were .6053 (player 16, forward) and .5967 (player 7, midfielder) in the 1st match; .6403 (player 13, left midfielder) and .6006 (player 10, forward) in the 2nd match; .5244 (player 6, midfielder) and .4559 (player 1, goalkeeper) in the 3rd match; .5546 (player 16, forward) and .5395 (player 18, central defender) in the 4th match; and .6129 (player 1, goalkeeper) and .6065 (player 13, left midfielder) in the 5th match.

'Micro' analysis
The network centroid can be defined by its central location in a network (Horvath, 2011). The centroid was defined as one of the most highly connected nodes in the network. The centroid values can be seen in Table 5.
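As a rough illustration of how the centroid player(s) and the centroid conformity could be obtained from a relative weighted adjacency matrix, consider the Python sketch below. It assumes that the centroid is the player (or tied players) with the highest connectivity and that the conformity of a teammate is simply his relative weight with that centroid. Function names and the toy matrix are hypothetical, and no values from Table 5 are reproduced.

```python
def centroid_players(r):
    """Indices of the player(s) with the highest connectivity (ties allowed)."""
    n = len(r)
    k = [sum(r[i][j] for j in range(n) if j != i) for i in range(n)]
    k_max = max(k)
    return [i for i, k_i in enumerate(k) if k_i == k_max]


def centroid_conformity(r, centroid):
    """Relative weight between each teammate and the given centroid player."""
    return {i: r[i][centroid] for i in range(len(r)) if i != centroid}


# Toy relative matrix (hypothetical), 4 players; diagonal entries are ignored.
A_r = [[2, 1.0, 0.5, 0.0],
       [1.0, 3, 1.0, 0.5],
       [0.5, 1.0, 2, 0.5],
       [0.0, 0.5, 0.5, 1]]
centroids = centroid_players(A_r)
conformity = centroid_conformity(A_r, centroids[0])
```

A teammate with a conformity close to 1 cooperates with the centroid about as often as the most frequent pair in the team does.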
The centroid players in the 1st match were player 12 (right defender) and player 3 (central defender). Players 5 (left defender) and 9 (right defender) were the centroids in the 2nd match. In the 3rd match, the centroid players were player 19 (left defender) and player 4 (central defender). The centroid players in the 4th match were player 7 (midfielder) and player 12 (right defender). Lastly, players 3 (central defender) and 12 (right defender) were the centroids in the 5th match.
Despite the importance of the centroid player(s), it is also important to understand the dependency between players. Therefore, the topological inter-dependency metric was computed. Considering the high volume of results, only the most important are presented (Table 6).
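The way dependencies can be read from an inter-dependency matrix may be sketched in Python as follows. The sketch assumes that the inter-dependency matrix is the difference between the (non-symmetric) topological dependency matrix and its transpose, which is consistent with the antisymmetry described for this metric, and that a positive entry in row i, column j means player i depends on player j. The dependency values below are invented for illustration and do not come from the study.

```python
def inter_dependency(t_d):
    """Antisymmetric matrix T_i = T_d - transpose(T_d)."""
    n = len(t_d)
    return [[t_d[i][j] - t_d[j][i] for j in range(n)] for i in range(n)]


def depends_on(t_i, player):
    """Teammates j on whom `player` depends, i.e., those with ti[player][j] > 0."""
    return [j for j, value in enumerate(t_i[player]) if value > 0]


# Hypothetical, made-up topological dependency matrix for 3 players.
T_d = [[0.0, 0.8, 0.2],
       [0.5, 0.0, 0.4],
       [0.6, 0.1, 0.0]]
T_i = inter_dependency(T_d)
dependencies = {p: depends_on(T_i, p) for p in range(len(T_d))}
```

Counting the entries of `dependencies` for each player gives the kind of summary reported per match (how many teammates a player depends on).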
In the 1st match, the players with the least dependency were the goalkeeper (1), depending on only three players, and the midfielder (7) and striker (20), each depending on four players. The players with the most dependency were the right defender (12), depending on all players, and the central defender (4). The topological inter-dependency computed for the 2nd match showed that the least dependent players were the goalkeeper (1), depending on just two players, and the midfielder (14), depending on three. The most dependent players were the left defender (5), depending on all players, and the striker (11), depending on all but one.
The results from the 3rd match showed that the forward (10) and goalkeeper (1) were the least dependent players. On the other hand, the left defender (19) was the player with the highest dependency on the other players.
In the 4th match, the most dependent players were the midfielder (7) and the left midfielder (13). The least dependent players were the central defender (18) and the left midfielder (8).
Lastly, in the 5th match, the least dependent players were the midfielder (14) and the left midfielder (13). On the other hand, the most dependent players were the central defender (3) and the striker (11).

Discussion
This paper aimed to analyze the network properties of a football team by applying a set of network metrics. Those metrics were proposed for two main levels of analysis: i) 'meso' level; and ii) 'micro' level. At the meso level of analysis, two methods were used: i) the scaled connectivity and ii) the clustering coefficient. Both metrics lie between the 'macro' and 'micro' levels of analysis, thus providing information about each player's global position in relation to the team.
The scaled connectivity was computed to analyze how each player interacts and connects with his teammates. The results suggested that the defenders (lateral and central) and midfielders are the players with the most connectivity with their teammates. Thus, it can be argued that the offensive building style of the team is based on support play, mainly in the first half. Therefore, it is normal and understandable that the defenders and midfielders connect more with each other. As a rule, the least-pressed area is the team's own half of the field (Fonseca, Milho, Travassos, & Araújo, 2012). Thus, it is easier to interact with higher frequency with the players that belong to this half (i.e., the defenders and the midfielders). Moreover, the midfielders and the lateral defenders, who often support the offensive actions on the flanks, allowing the lateral midfielders to occupy more central areas, act as links between the defense and the attack (Bloomfield, Polman, & O'Donoghue, 2007; Reilly & Thomas, 1976). Nevertheless, the connectivity analyzed above is not homogeneous, which suggests a tendency for clusters to emerge inside the team. Therefore, a clustering coefficient analysis was performed in order to identify the players that contribute most to the generation of clusters.
The players with the highest clustering coefficients were the goalkeeper, wing midfielders, and the forwards. As previously described, the greatest values of clustering coefficient mean the greatest cooperation among teammates around a specific player. Thus, the results suggest that those players (goalkeeper, wing midfielders, and forwards) participate in more attacking plays that involve a large number of teammates. Another interesting result showed that the majority of the players with higher clustering coefficient values had low connectivity values. Therefore, the results suggest that, despite the reduction in global connectivity, these players participate in offensive plays with teammates who also have a higher level of interaction with each other. This is also important because it can generate a higher participation of all players in the offensive play, increasing the possibilities for new solutions and reducing the clusters inside the team. In the specific case of the goalkeeper, it can be argued that the attacking plays involving this player were more likely to include more players, mainly in the defensive region, where the opponent's defensive pressing is not too great. Such a process may increase the participation of more teammates, thus justifying the high clustering coefficient of the goalkeeper.
The team's analysis is not complete until individual participation and interaction have been explored at the 'micro' level. Therefore, the players that contribute the most to the offensive plays and cooperation were analyzed, and the dependency between teammates was also explored. The results suggested that the lateral defenders (mainly the right defender), central defenders, and the midfielders are the centroid players. These results confirm the higher connectivity of these positions and their preponderance in building the offensive plays. Moreover, the results also confirm the lower values for the lateral midfielders, forwards, and strikers. Thus, the defenders and central midfielders are the players that generate offensive plays with more frequency. This can be explained by two main factors. The first is related to the type of defensive strategy. If the team opts to press closer to its own goal, the ball will be recovered by the defenders or midfielders in the defensive zone, thus increasing their participation in the offensive plays. The second explanation is related to the team's own offensive strategy. If the team opts to build the offensive play around its defenders in order to 'attract' the opponents out of their defensive zone, it would be expected that the higher centralization of play would be with the defenders and midfielders. Nevertheless, the centroid players are not the only important aspect to consider in order to understand how players connect with each other. The centroid provides useful information for understanding who the most prominent players are in building the offensive plays. This information can help the opponent team understand the main players that generate the attack. However, the dependency between players can also be important to improve the understanding of intra-team relationships. Thus, the topological inter-dependency metric was calculated.
As a rule, the most independent players during all matches analyzed were the midfielders. These results suggest that midfielders are the players that can connect with any other player on the team most easily. Observing the behavior of defenders, it is possible to understand that the offensive play needs to pass through the midfielders in order to reach the offensive zone. Thus, it is normal that the defenders are not the most independent players. Moreover, the forwards alone cannot be the most independent players because the players that most usually recover balls during the matches are the defenders and midfielders. Therefore, forwards need someone (i.e., teammates) to recover the ball and generate the offensive plays. Thus, the dependency can be a useful tool to understand how to 'block' the offensive plays of the opponent team. Moreover, when associated with other network concepts (e.g., centroid player), the relative topological dependency allows for the identification of possible dependencies between players and even hierarchical relations. As a result, the herein proposed script returns the centroid conformity as well as the topological overlap of a given player's network.
This pilot study has a set of limitations inherent to its design. One of the main limitations was the sample. In fact, five games may be too small a sample to generalize the results. Moreover, these metrics had not been applied until now in other studies that used the network approach. Thus, the specific results of this pilot study cannot be compared with other network studies on football. Therefore, it is quite important to increase the sample and the application of such metrics in further studies in order to establish comparisons between studies and to determine the best methods to apply in a sports context.
Despite the study's limitations, a new solution for match analysis emerges from the network. In fact, using a simple observational method that records the interaction between teammates can create a great opportunity to quickly identify the properties of the team's style of play. Such analysis does not replace the traditional notational analysis. In fact, because of its own properties, network analysis is per se a new vision for football analysis, thus increasing the range of observation.
Nevertheless, a great research investment must be made to consolidate the network as a match analysis method. Thus, new studies using similar approaches should be undertaken in the future to compare results and to assess their efficacy. Moreover, software dedicated to analyzing the match and recording the interactions between teammates would be a very useful solution for generalizing this method. In fact, until now, only general statistical solutions based on a network have been developed. Thus, specific software dedicated to network analysis in sports could be the next step forward to consolidate this method. Using such an analysis, it could be possible in the future to reduce the time spent on observational and paper-and-pencil analysis, thus providing new solutions and possibilities for coaches and sports analysts. As future work, it would be interesting to identify how the team's networks are built based on the specific region of the field. Moreover, it would be interesting to analyze only the networks that resulted in goals or shots conceded and scored. Another interesting study would be to identify the defensive coverage between teammates and even the defensive coverage against the opponents.

Conclusion
Two main kinds of network analysis were performed in this study: i) 'meso' analysis; and ii) 'micro' analysis. The scaled connectivity and the clustering coefficient were applied for the 'meso' analysis. Using such methods, it was possible to identify that the defenders (lateral and central) and midfielders are the players with the most connectivity with their teammates. Lastly, the centroid significance and conformity and the topological overlap were applied to the 'micro' analysis. The results reveal that the lateral defenders (mainly the right defender), central defenders, and midfielders are the centroid players of the team. As a rule, the most independent players during all matches analyzed were the midfielders. Such results suggest that while the midfielders are the most independent players during the match, they are also the players that interact most with the remaining teammates. In sum, this pilot study proposed a set of network metrics that can increase the range of match analysis, complementing the traditional notational analysis and giving new solutions to understanding the teammates' interactions during a match.

Table 1. Strategic position of each player.

Table 3. Scaled connectivity values for all matches.
*Lowest value and **Highest value

Table 4. Clustering coefficient of each player for all matches.

Table 5. Centroid values of each player for all matches.
*Lowest value and **Highest value

Table 6. Least and most dependency of players from the teammates.