Louvain method

{{short description|Clustering and community detection algorithm}} {{Original research|date=April 2025}} {{Network Science}}

The '''Louvain method for community detection''' is a greedy optimization method intended to extract non-overlapping communities from large networks created by Blondel ''et al''.<ref name = "Louvain">{{cite journal|last1=Blondel|first1=Vincent D|last2=Guillaume|first2=Jean-Loup|last3=Lambiotte|first3=Renaud|last4=Lefebvre|first4=Etienne|s2cid=334423|title=Fast unfolding of communities in large networks|journal=Journal of Statistical Mechanics: Theory and Experiment|date=9 October 2008|volume=2008|issue=10|article-number=10008|doi=10.1088/1742-5468/2008/10/P10008|arxiv=0803.0476|bibcode=2008JSMTE..10..008B}}</ref> from the University of Louvain (the source of this method's name).

==Modularity optimization== The inspiration for this method of community detection is the optimization of modularity as the algorithm progresses. Modularity is a scale value between −1 (non-modular clustering) and 1 (fully modular clustering) that measures the relative density of edges inside communities with respect to edges outside communities. Optimizing this value theoretically results in the best possible grouping of the nodes of a given network. But because going through all possible configurations of the nodes into groups is impractical, heuristic algorithms are used.

In the Louvain Method of community detection, first small communities are found by optimizing modularity locally on all nodes, then each small community is grouped into one node and the first step is repeated. The method is similar to the earlier method by Clauset, Newman and Moore<ref name="Clauset">{{Cite journal|last1=Clauset|first1=Aaron|last2=Newman|first2=M. E. J.|last3=Moore|first3=Cristopher|s2cid=8977721|date=2004-12-06|title=Finding community structure in very large networks|arxiv=cond-mat/0408187|journal=Physical Review E|volume=70|issue=6|article-number=066111|doi=10.1103/PhysRevE.70.066111|pmid=15697438|issn=1539-3755|bibcode=2004PhRvE..70f6111C}}</ref> that connects communities whose amalgamation produces the largest increase in modularity. Although the Louvain algorithm can correctly identify the community structure when its evidence is sufficiently strong in artificial networks, in particular those sampled from the assortative stochastic block model.<ref name="Cohen-Addad">{{Cite conference|last1=Cohen-Addad|first1=Vincent|last2=Kosowski|first2=Adrian|last3=Mallmann-Trenn|first3=Frederik|last4=Saulpic|first4=David | contribution = On the Power of Louvain in the Stochastic Block Model | pages = 4055–4066 | publisher = Curran Associates, Inc. | title = Advances in Neural Information Processing Systems (Neurips 2020) | year = 2020 }}</ref>, it is prone to finding spurious communities in random graphs<ref>{{Cite journal| doi = 10.1103/PhysRevE.70.025101| volume = 70| issue = 2| article-number = 025101| last1 = Guimerà| first1 = Roger| last2 = Sales-Pardo| first2 = Marta| last3 = Amaral| first3 = Luís A. Nunes| title = Modularity from fluctuations in random graphs and complex networks| journal = Physical Review E| access-date = 2013-10-08| date = 2004-08-19| url = http://link.aps.org/doi/10.1103/PhysRevE.70.025101| pmc = 2441765}}</ref> and has been shown to systematically overfit empirical data <ref>{{Cite journal| doi = 10.1109/TKDE.2019.2911585| issn = 2326-3865| pages = 1–1| last1 = Ghasemian| first1 = Amir| last2 = Hosseinmardi| first2 = Homa| last3 = Clauset| first3 = Aaron| title = Evaluating Overfit and Underfit in Models of Network Community Structure| journal = IEEE Transactions on Knowledge and Data Engineering| date = 2019| arxiv = 1802.10582}} </ref><ref>{{Cite journal| publisher = American Physical Society| doi = 10.1103/PhysRevE.108.024309| volume = 108| issue = 2| article-number = 024309| last1 = Peixoto| first1 = Tiago P.| last2 = Kirkley| first2 = Alec| title = Implicit models, latent compression, intrinsic biases, and cheap lunches in community detection| journal = Physical Review E| access-date = 2024-03-18| date = 2023-08-23| url = https://link.aps.org/doi/10.1103/PhysRevE.108.024309| url-access = subscription}} </ref>.

==Algorithm description==

=== Modularity === The value to be optimized is modularity, defined as a value in the range <math>[-1,1]</math> that measures the density of links inside communities compared to links between communities.<ref name="Louvain" /> For a weighted graph, modularity is defined as:

<math display="block">Q = \frac{1}{2m}\sum_{i=1}^N\sum_{j=1}^N \bigg[A_{ij} - \frac{k_{i} k_{j}}{2m}\bigg]\delta (c_{i}, c_{j}), </math>

where:

* <math>A_{ij}</math> represents the edge weight between nodes {{mvar|i}} and {{mvar|j}}; see Adjacency matrix; * {{tmath|k_i}} and {{tmath|k_j}} are the sum of the weights of the edges attached to nodes {{mvar|i}} and {{mvar|j}}, respectively; * {{mvar|m}} is the sum of all of the edge weights in the graph; * {{mvar|N}} is the total number of nodes in the graph; * {{tmath|c_i}} and {{tmath|c_j}} are the communities to which the nodes {{mvar|i}} and {{mvar|j}} belong; and * <math>\delta</math> is Kronecker delta function: <math display="block"> \begin{align} \delta(c_i, c_j) &= \begin{cases} 1 & \text{if } c_i \text{ and } c_j \text{ are the same cluster} \\ 0 & \text{otherwise} \end{cases} \end{align} </math>

Based on the above equation, the modularity of a community {{mvar|c}} can be calculated as:<ref>{{cite conference | last1 = Ghosh | first1 = Sayan | last2 = Halappanavar | first2 = Mahantesh | last3 = Tumeo | first3 = Antonino | last4 = Kalyanaraman | first4 = Ananth | last5 = Lu | first5 = Hao | last6 = Chavarría-Miranda | first6 = Daniel G. | last7 = Khan | first7 = Arif | last8 = Gebremedhin | first8 = Assefaw Hadish | contribution = Distributed Louvain Algorithm for Graph Community Detection | contribution-url = https://eecs.wsu.edu/~ananth/papers/Ghosh_IPDPS18.pdf | doi = 10.1109/IPDPS.2018.00098 | pages = 885–895 | publisher = IEEE Computer Society | title = 2018 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018, Vancouver, BC, Canada, May 21–25, 2018 | year = 2018| isbn = 978-1-5386-4368-6 }}</ref>

<math display="block"> \begin{align} Q_c &= \dfrac{1}{2m}\sum_{i}\sum_{j}A_{ij}\mathbf{1}\left\{c_{i}=c_{j}=c \right\} - \left(\sum_{i} \dfrac{k_i}{2m} \mathbf{1}\left\{c_{i}=c \right\} \right)^2 \\ &= \frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^2 \end{align} </math>

where * <math>\Sigma_{in}</math> is the sum of edge weights between nodes within the community {{mvar|c}} (each edge is considered twice); and * <math>\Sigma_{tot}</math> is the sum of all edge weights for nodes within the community (including edges which link to other communities).

As nodes in different communities do not contribute to the modularity {{mvar|Q}}, it can be written as:

=== The Louvain method algorithm === The Louvain method works by repeating two phases.<ref name="Louvain" /> In phase one, nodes are sorted into communities based on how the modularity of the graph changes when a node moves communities. In phase two, the graph is reinterpreted so that communities are seen as individual nodes. A detailed explanation is provided below.

==== Phase 1 ==== thumb|Figure 1: Each node in the graph is randomly assigned to a singleton community

===== Each node in the network is assigned to its own community. ===== The Louvain method begins by considering each node {{mvar|v}} in a graph to be its own community. This can be seen in Figure 1, where each dot (representing nodes) is a unique color (representing which community the node belongs to).

===== Nodes are grouped into communities ===== For each node {{mvar|v}}, we consider how moving {{mvar|v}} from its current community {{mvar|C}} into a neighboring community {{mvar|C'}} will affect the modularity of the graph partition. In the pseudo-code below, this happens in the for-loop. We select the community {{mvar|C'}} with the greatest change in modularity, and if the change is positive, we move {{mvar|v}} into {{mvar|C'}}; otherwise we leave it where it is. This continues until the modularity stops improving. thumb|Figure 2: Nodes are assigned to communities based on their modularities <syntaxhighlight line="1" lang="text">function moveNodes(Graph G, Partition P): do old_modularity <- current_modularity_of_partition for v in V(G), do # find the community that causes the largest increase in modularity when v is moved into it C' <- argmax(delta_Q) # delta_Q is the change in modularity if delta_Q > 0, then move v into C' end if end for update current_modularity_of_partition while current_modularity_of_partition > old_modularity return P end function </syntaxhighlight><ref name=":0" /> This process is applied repeatedly and sequentially to all nodes until no modularity increase can occur. Once this local maximum of modularity is hit, the first phase has ended. Figure 2 shows how the graph in Figure 1 might look after one iteration of phase 1.

==== Phase 2 ====

===== Communities are reduced to a single node ===== For each community in our graph's partition, the individual nodes making up that community are combined and the community itself becomes a node. The edges connecting distinct communities are used to weight the new edges connecting our aggregate nodes.

This process is modeled in the pseudo-code, where the function {{mvar|aggregateGraph}} returns a new graph whose vertices are the partition of the old graph, and whose edges are calculated using the old graph. This function does not show the edges being weighted, but a simple modification would allow for that information to be tracked. thumb|Figure 3: Communities are reduced to a single node with weighted edges <syntaxhighlight line="1" lang="text"> function aggregateGraph(Graph G, Partition P): V <- P E <- [(A,B) | (x,y) is in E(G), x is in A and A is in P, y is in B and B is in P] return Graph(V,E) end function </syntaxhighlight><ref name=":0" /> Figure 3 shows what the graph from Figure 2 would look like after being aggregated. This graph is analogous to the graph in Figure 1 in the sense that each node is assigned to a single community. From here, the process can be repeated so that more nodes are moved into existing communities until an optimal level of modularity is reached.

The pseudo-code below shows how the previous two functions work together to complete the process. <syntaxhighlight lang="text"> function louvain(Graph G, Partition P): do P <- moveNodes(G, P) done <- length(P) == length(V(G)) # every community is a single node, despite running moveNodes if not done, then: G <- aggregateGraph(G, P) P <- singletonPartition(G) end if while not done end function

function singletonPartition(Graph G): return [{v} | v is in V(G)] # each node is placed in its own community end function </syntaxhighlight><ref name=":0" />

=== Time complexity === Generally, the Louvain method is assumed to have a time complexity of <math>O(n \log{} n)</math>. Richard Blondel, co-author of the paper that originally published the Louvain method, seems to support this notion,<ref>{{Cite web |title=Louvain method for community detection |url=https://perso.uclouvain.be/vincent.blondel/research/louvain.html |access-date=2024-11-21 |website=perso.uclouvain.be}}</ref> but other sources claim the time complexity is "essentially linear in the number of links in the graph,"<ref>{{Cite web |title=Louvain - Analytics & Algorithms - Ultipa Graph |url=https://www.ultipa.com/document/ultipa-graph-analytics-algorithms/louvain/v4.5 |access-date=2024-11-21 |website=www.ultipa.com}}</ref> meaning the time complexity would instead be <math>O(m)</math>, where {{mvar|m}} is the number of edges in the graph. Unfortunately, no source has published an analysis of the Louvain method's time complexity so one is attempted here.

In the pseudo-code above, the function {{mvar|louvain}} controls the execution of the algorithm. It's clear to see that inside of {{mvar|louvain}}, {{mvar|moveNodes}} will be repeated until it is no longer possible to combine nodes into communities. This depends on two factors: how much the modularity of the graph can improve and, in the worst case, if the modularity can improve with every iteration of {{mvar|louvain}}, it depends on how quickly {{mvar|aggregateGraph}} will reduce the graph down to a single node.

If, in each iteration of {{mvar|louvain}}, {{mvar|moveNodes}} is only able to move one node into a community, then {{mvar|aggregateGraph}} will only be able to reduce the size of the graph by one. This would cause {{mvar|louvain}} to repeat {{mvar|v}} times. Since {{mvar|moveNodes}} iterates through all nodes in a graph, this would result in a time complexity of <math>\mathcal{O}(n^2)</math>, where {{mvar|n}} is the number of nodes.

It is unclear if this situation is possible, so the above result should be considered a loose bound. Blondel et al. state in their original publication that most of the run time is spent in the early iterations of the algorithm because "the number of communities decreases drastically after just a few passes."<ref name="Louvain" /> This can be understood by considering a scenario where {{mvar|moveNodes}} is able to move each node so that every community has two nodes. In this case, {{mvar|aggregateGraph}} would return a graph half the size of the original. If this continued, then the Louvain method would have a runtime of <math>n \log_2{n}</math>, although it is unclear if this would be the worst case, best case, average case, or none of those. Additionally, there is no guarantee the size of the graph would be reduced by the same factor with each iteration, and so no single logarithm function can perfectly describe the time complexity.

==Previous uses== *Twitter social Network (2.4 Million nodes, 38 million links) by Josep Pujol, Vijay Erramilli, and Pablo Rodriguez:<ref>{{cite arXiv|eprint=0905.4918v1|last1=Pujol|first1=Josep M.|title=Divide and Conquer: Partitioning Online Social Networks|last2=Erramilli|first2=Vijay|last3=Rodriguez|first3=Pablo|class=cs.NI|year=2009}}</ref> The authors explore the problem of partitioning Online Social Networks onto different machines. *Mobile phone Network (4 Million nodes, 100 Million links) by Derek Greene, Donal Doyle, and Padraig Cunningham:<ref>{{cite tech report|url=https://www.csi.ucd.ie/files/ucd-csi-2011-06.pdf|title=Tracking the Evolution of Communities in Dynamic Social Networks|first1=Derek|last1=Greene|first2=Dónal|last2=Doyle|first3=Pádraig|last3=Cunningham|institution=University College Dublin|number=UCD-CSI-2011-06|date=May 2011|access-date=2014-11-20 |url-status=dead |archive-url=https://web.archive.org/web/20130512153616/http://www.csi.ucd.ie/files/ucd-csi-2011-06.pdf |archive-date=2013-05-12}}</ref> Community-tracking strategies for identifying dynamic communities of different dynamic social networks. *Detecting species in network-based dynamical model.<ref>{{Cite journal|last1=Markovitch|first1=Omer | last2=Krasnogor|first2=Natalio | title=Predicting species emergence in simulated complex pre-biotic networks | journal=PLOS ONE|volume=13|issue=2|doi=10.1371/journal.pone.0192871|pmid=29447212 |pmc=5813963 |article-number=e0192871|year=2018 |bibcode=2018PLoSO..1392871M |doi-access=free }}</ref>

==Disadvantages== Louvain produces only non-overlapping communities, which means that each node can belong to at most one community. This is highly unrealistic in many real-world applications. For example, in social networks, most people belong to multiple communities: their family, their friends, their co-workers, old school buddies, etc. In biological networks, most genes or proteins belong to more than one pathway or complex. Furthermore, Louvain has been shown to sometimes produce arbitrarily badly connected communities, and has been effectively superseded (at least in the non-overlapping case) by the Leiden algorithm.

thumb|A graph illustrating how communities can become disconnected when using the Louvain algorithm. A worst case example of an arbitrarily badly connected community is a internally disconnected community. An internally disconnected community arises through the Louvain algorithm when a node that had been acting as a "bridge" between two groups of nodes in its community is moved to a new community, leaving the old one disconnected. The remaining nodes in the old community may also be relocated, but if their connection to the community is strong enough despite the removal of the "bridge" node, they will instead remain in place. For an example of this, see the image to the right; note how the removal of the bridge node, node 0, caused the red community to be split into two disjoint subgroups. While this is the worst-case scenario, there are other, more subtle problems with the Louvain algorithm that can also lead to arbitrarily badly connected communities, such as the formation of communities using nodes that are only weakly connected.

thumb|183x183px|An image depicting how the resolution limit of modularity can cause subcommunities to become hidden. Another common issue with the Louvain algorithm is the resolution limit of modularity - that is, multiple small communities being grouped together into a larger community. This causes the smaller communities to be hidden; for an example of this, see the visual depiction of the resolution limit to the right. Note how, when the green community is absorbed into the blue community to increase the graph's modularity, the smaller group of nodes that it represented is lost. There is no longer a way to differentiate those nodes from the nodes that were already in the blue community. Conversely, the nodes that were already in the blue community no longer appear distinct from those that were in the green community; in other words, whatever difference caused them to initially be placed in separate communities has been obscured.

Both the resolution limit of modularity and the arbitrarily badly connected community problem are further exacerbated by each iteration of the algorithm. Ultimately, the only thing the Louvain algorithm guarantees is that the resulting communities cannot be merged further; in other words, they're well-separated. To avoid the problems that arise from arbitrarily badly connected communities and the resolution limit of modularity, it is recommended to use the Leiden algorithm instead, as its refinement phase and other various adjustments have corrected these issues.<ref name=":0">{{Cite journal |last1=Traag |first1=V. A. |last2=Waltman |first2=L. |last3=van Eck |first3=N. J. |date=2019-03-26 |title=From Louvain to Leiden: guaranteeing well-connected communities |journal=Scientific Reports |language=en |volume=9 |issue=1 |page=5233 |doi=10.1038/s41598-019-41695-z |issn=2045-2322 |pmc=6435756 |pmid=30914743|arxiv=1810.08473 |bibcode=2019NatSR...9.5233T }}</ref>

==Comparison to other methods of non-overlapping community detection== When comparing modularity optimization methods, the two measures of importance are the speed and the resulting modularity value. A higher speed is better as it shows a method is more efficient than others and a higher modularity value is desirable as it points to having better-defined communities. The compared methods are, the algorithm of Clauset, Newman, and Moore,<ref name="Clauset" /> Pons and Latapy,<ref>{{cite journal |first1=Pascal |last1=Pons |first2=Matthieu |last2=Latapy |s2cid=121714719 |journal=Journal of Graph Algorithms and Applications |volume=10 |issue=2 |pages=191–218 |date=2006 |title=Computing Communities in Large Networks Using Random Walks |url=http://jgaa.info/accepted/2006/PonsLatapy2006.10.2.pdf |doi=10.7155/jgaa.00124 |arxiv=cond-mat/0412368 |doi-access=free}}</ref> and Wakita and Tsurumi.<ref>{{cite arXiv |eprint=cs/0702048 |last1=Wakita |first1=Ken |title=Finding Community Structure in Mega-scale Social Networks |last2=Tsurumi |first2=Toshiyuki |year=2007}}</ref>

{| border="1" class="wikitable" |+ Modularity Optimization Comparison<ref>{{cite journal|arxiv=0803.0476|doi=10.1088/1742-5468/2008/10/P10008|title=Fast unfolding of communities in large networks|journal=Journal of Statistical Mechanics: Theory and Experiment|volume=2008|issue=10|article-number=10008|year=2008|last1=Blondel|first1=Vincent D.|last2=Guillaume|first2=Jean-Loup|last3=Lambiotte|first3=Renaud|last4=Lefebvre|first4=Etienne|s2cid=334423|bibcode=2008JSMTE..10..008B}}</ref> ! ! Karate ! Arxiv ! Internet ! Web nd.edu ! Phone ! Web uk-2005 ! Web WebBase 2001 |- ! Nodes/links | 34/77 || 9k/24k || 70k/351k || 325k/1M || 2.6M/6.3M || 39M/783M || 118M/1B |- ! Clauset, Newman, and Moore | .38/0s || .772/3.6s || .692/799s || .927/5034s || -/- || -/- || -/- |- !Pons and Latapy |.42/0s || .757/3.3s || .729/575s || .895/6666s || -/- || -/- || -/- |- ! Wakita and Tsurumi |.42/0s || .761/0.7s || .667/62s || .898/248s || .56/464s || -/- || -/- |- ! Louvain Method |.42/0s || .813/0s || .781/1s || .935/3s || .769/134s || .979/738s || .984/152mn |} -/- in the table refers to a method that took over 24hrs to run. This table (from<ref name="Louvain" /><ref>{{Cite book |last1=Aynaud |first1=Thomas |title=Graph Partitioning |last2=Blondel |first2=Vincent D. |last3=Guillaume |first3=Jean-Loup |last4=Lambiotte |first4=Renaud |publisher=Wiley |year=2013 |isbn=978-1-84821-233-6 |editor-last=Bichot |editor-first=Charles-Edmond |edition=1 |publication-date=13 February 2013 |pages=315–345 |language=en |chapter=Multilevel Local Optimization of Modularity |doi=10.1002/9781118601181.ch13 |editor-last2=Siarry |editor-first2=Patrick |chapter-url=https://onlinelibrary.wiley.com/doi/10.1002/9781118601181.ch13}}</ref>) shows that the Louvain method outperforms many similar modularity optimization methods in both the modularity and the time categories.

==References== {{reflist}}  *"The Louvain method for community detection in large networks" Vincent Blondel http://perso.uclouvain.be/vincent.blondel/research/louvain.html

Category:Network theory