Node influence metric

In graph theory and network analysis, '''node influence metrics''' are measures that rank or quantify the influence of every node (also called vertex) within a graph. They are related to centrality indices. Applications include measuring the influence of each person in a social network, understanding the role of infrastructure nodes in transportation networks, the Internet, or urban networks, and the participation of a given node in disease dynamics.

== Origin and development == The traditional approach to understanding node importance is via centrality indicators. Centrality indices are designed to produce a ranking which accurately identifies the most influential nodes. Since the mid 2000s, however, social scientists and network physicists have begun to question the suitability of centrality indices for understanding node influence. Centralities may indicate the most influential nodes, but they are rather less informative for the vast majority of nodes which are not highly influential.

Borgatti and Everett's 2006 review article<ref name="Borgatti2006"> {{cite journal | last1=Borgatti | first1=Steve |last2=Everett|first2=Martin|title=A graph-theoretic perspective on centrality|journal=Social Networks| year=2006|volume=28| issue=4 |pages=466–484 | doi=10.1016/j.socnet.2005.11.005}} </ref> showed that the accuracy of centrality indices is highly dependent on network topology. This finding has been repeatedly observed since then. (e.g.<ref name="daSilva2012">{{cite journal | last1=da Silva|first1=Renato |last2=Viana|first2=Matheus|last3=da F. Costa |first3=Luciano| title=Predicting epidemic outbreak from individual features of the spreaders| journal=J. Stat. Mech.: Theory Exp. | year=2012|volume=2012|article-number=P07005|number=7 | doi=10.1088/1742-5468/2012/07/p07005|arxiv=1202.0024|bibcode=2012JSMTE..07..005A|s2cid=2530998 }}</ref><ref name="Lawyer2015"> {{cite journal |last1= Lawyer |first1= Glenn |year= 2015 |title= Understanding the spreading power of all nodes in a network: a continuous-time perspective |journal=Sci Rep |volume=5|page=8665|doi=10.1038/srep08665 |pmid=25727453 |pmc=4345333|arxiv=1405.6707|bibcode=2015NatSR...5.8665L}}</ref>). In 2012, Bauer and colleagues reminded us that centrality indices only rank nodes but do not quantify the difference between them.<ref name="Bauer2012"> {{cite journal | last1=Bauer|first1=Frank | last2=Lizier|first2=Joseph|title=Identifying influential spreaders and efficiently estimating infection numbers in epidemic models: A walk counting approach| journal=Europhys Lett | year=2012| volume=99| article-number=68007|number=6 | doi=10.1209/0295-5075/99/68007|arxiv=1203.0502|bibcode=2012EL.....9968007B|s2cid=9728486 }} </ref> In 2013, Sikic and colleagues presented strong evidence that centrality indices considerably underestimate the power of non-hub nodes.<ref name="Sikic2013"> {{ cite journal| last1= Sikic| first1=Mile|last2=Lancic|first2=Alen|last3=Antulov-Fantulin|first3=Nino|last4=Stefanic|first4=Hrvoje| title = Epidemic centrality -- is there an underestimated epidemic impact of network peripheral nodes? |journal = The European Physical Journal B |volume=86 |number=10 |pages=1–13 |year=2013 | doi=10.1140/epjb/e2013-31025-5|arxiv=1110.2558 | bibcode=2013EPJB...86..440S| s2cid=12052238}} </ref> The reason is quite clear. The accuracy of a centrality measure depends on network topology, but complex networks have heterogeneous topology. Hence a centrality measure which is appropriate for identifying highly influential nodes will most likely be inappropriate for the remainder of the network.<ref name="Lawyer2015"/>

This has inspired the development of novel methods designed to measure the influence of all network nodes. The most general of these are the '''accessibility''', which uses the diversity of random walks to measure how accessible the rest of the network is from a given start node,<ref name="Travencolo2008">{{ cite journal| last1=Travencolo|first1=B. a. N.|last2=da F. Costa| first2 =Luciano| title=Accessibility in complex networks| journal=Phys Lett A| year=2008|volume=373|number=1|pages=89–95| doi=10.1016/j.physleta.2008.10.069| bibcode=2008PhLA..373...89T}}</ref> and the '''expected force''', derived from the expected value of the force of infection generated by a node.<ref name="Lawyer2015"/> Both of these measures can be meaningfully computed from the structure of the network alone.

== Accessibility == The '''Accessibility''' is derived from the theory of random walks. It measures the diversity of self-avoiding walks which start from a given node. A walk on a network is a sequence of adjacent vertices; a self-avoiding walk visits (lists) each vertex at most once. The original work used simulated walks of length 60 to characterize the network of urban streets in a Brazilian city.<ref name="Travencolo2008"/> It was later formalized as a modified form of hierarchical degree which controls for both transmission probabilities and the diversity of walks of a given fixed length.<ref name='Viana2012'>{{ cite journal| last1=Viana|first1=Matheus|last2=Batista|first2=Joao| last3=da F. Costa|first3=Luciano| title=Effective number of accessed nodes in complex networks | journal =Phys Rev E|year=2012 | volume=85| number = 3 pt 2|article-number=036105|arxiv=1101.5379|doi=10.1103/PhysRevE.85.036105|pmid=22587147|bibcode=2012PhRvE..85c6105V |s2cid=643417}} </ref>

=== Definition === The hierarchical degree measures the number of nodes reachable from a start node by performing walks of length <math>h</math>. For a fixed <math>h</math> and walk type, each of these neighbors is reached with a (potentially different) probability <math>p_j^{(h)}</math>. Given a vector of such probabilities, the accessibility of node <math>i</math> at scale <math>h</math> is defined

:<math>\kappa_i^{(h)} = \exp \left( - \sum_j p_j^{(h)} \log p_j^{(h)} \right) </math>

The probabilities can be based on uniform-probability random walks, or additionally modulated by edge weights and/or explicit (per edge) transmission probabilities.<ref name='Viana2012'/>

=== Applications === The accessibility has been shown to reveal community structure in urban networks,<ref name="Travencolo2008"/> corresponds to the number of nodes which can be visited in a defined time period,<ref name="Viana2012"/> and is predictive of the outcome of epidemiological SIR model spreading processes on networks with large diameter and low density.<ref name="daSilva2012"/>

== Expected force ==

The '''expected force''' measures node influence from an epidemiological perspective. It is the expected value of the force of infection generated by the node after two transmissions.

=== Definition ===

The expected force of a node <math>i</math> is given by

:<math>\kappa_i = - \sum_{j=1}^J d_j \log(d_j)</math>

where the sum is taken over the set <math>J</math> of all possible transmission clusters resulting from two transmissions starting from <math>i</math>. That is, node <math>i</math> and two of its neighbors or <math>i</math>, one of its neighbors (called infected) and a neighbor of the infected neighbor. <math>J</math> contains all possible orderings of the transmission events, so two clusters may contain the same nodes if they got infected in a different order. <math>d_j</math> is the normalized cluster degree of cluster <math>j \in J</math>, that is, the number of edges with exactly one endpoint in cluster <math>j</math>.

The definition naturally extends to directed networks by limiting the enumeration <math>J</math> by edge direction. Likewise, extension to weighted networks, or networks with heterogeneous transmission probabilities, is a matter of adjusting the normalization of <math>d_j</math> to include the probability that that cluster forms. It is also possible to use more than two transmissions to define the set <math>J</math>.<ref name="Lawyer2015"/>

=== Applications ===

The expected force has been shown to strongly correlate with SI, SIS, and SIR epidemic outcomes over a broad range of network topologies, both simulated and empirical.<ref name="Lawyer2015"/><ref name="Lawyer2014tr">{{cite arXiv |last1= Lawyer |first1= Glenn |year= 2014 |title= Technical Report: Performance of the Expected Force on AS-level Internet topologies |class= cs.NI | eprint=1406.4785 }}</ref> It has also been used to measure the pandemic potential of world airports,<ref>{{cite journal|last1=Lawyer|first1=Glenn|title=Measuring the potential of individual airports for pandemic spread over the world airline network|journal=BMC Infectious Diseases|date=2016|volume=16|article-number=70|doi=10.1186/s12879-016-1350-4|pmid=26861206|pmc=4746766 |doi-access=free }}</ref> and mentioned in the context of digital payments,<ref>{{cite journal|last1=Milkau|first1=Udo|last2=Bott|first2=Jürgen|title=Digitalisation in payments: From interoperability to centralised models?|journal=Journal of Payments Strategy & Systems|date=2015|volume=9|issue=3|page=321 |doi=10.69554/KUAW4429 |url=http://www.ingentaconnect.com/content/hsp/jpss/2015/00000009/00000003/art00008}}</ref> ecology,<ref>{{cite journal|last1=Jordan|first1=Lyndon|last2=Maguire|first2=Sean|last3=Hofmann|first3=Hans|last4=Kohda|first4=Masanori|title=The social and ecological costs of an 'over-extended' phenotype|journal=Proceedings of the Royal Society B|date=2016|volume=283|issue=1822|doi=10.1098/rspb.2015.2359|pmid=26740619|pmc=4721094|article-number=20152359}}</ref> fitness,<ref>{{cite journal|last1=Pereira|first1=Vanessa|last2=Gama|first2=Maria|last3=Sousa|first3=Filipe|last4=Lewis|first4=Theodore|last5=Gobatto|first5=Claudio|last6=Manchado-Gobatto|first6=Fúlvia|title=Complex network models reveal correlations among network metrics, exercise intensity and role of body changes in the fatigue process|journal=Scientific Reports|date=2015|volume=5|article-number=10489|doi=10.1038/srep10489|pmid=25994386|pmc=4440209|bibcode=2015NatSR...510489P}}</ref> and project management.<ref>{{cite journal|last1=Ellinas|first1=Christos|last2=Allan|first2=Neil|last3=Durugbo|first3=Christopher|last4=Johansson|first4=Anders|title=How Robust Is Your Project? From Local Failures to Global Catastrophes: A Complex Networks Approach to Project Systemic Risk|journal=PLOS ONE|date=2015|doi=10.1371/journal.pone.0142469|pmid=26606518|volume=10|issue=11|pmc=4659599|article-number=e0142469|bibcode=2015PLoSO..1042469E|doi-access=free}}</ref>

== Other approaches ==

Others suggest metrics which explicitly encode the dynamics of a specified process unfolding on the network. The '''dynamic influence''' is the proportion of infinite walks starting from each node, where walk steps are scaled such that the linear dynamics of the system are expected to converge to a non-null steady state.<ref name="Klemm2012"> {{cite journal | last1=Klemm|first1=Konstantin|last2=Serrano|first2=M Ángeles|author2-link=M. Ángeles Serrano|last3=Eguiluz|first3=Victor|last4=Miguel|first4=Maxi San |title= A measure of individual role in collective dynamics| journal =Sci Rep|year=2012|volume=2|article-number=292|doi=10.1038/srep00292|pmid=22379597|pmc=3289910|bibcode=2012NatSR...2..292K|arxiv=1002.4042}}</ref> The '''Impact''' sums, over increasing walk lengths, the probability of transmission to the end node of the walk and that the end node has not been previously visited by a shorter walk.<ref name="Bauer2012"/> While both measures well predict the outcome of the dynamical systems they encode, in each case the authors admit that results from one dynamic do not translate to other dynamics.

==References== {{reflist}}

* * * *

Category:Network analysis Category:Graph theory