详细信息
DisHAP:基于层次亲和聚类的分布式大图划分算法 ( EI收录)
DisHAP:A Distributed Partition Algorithm for Large Scale Graphs Based on Hierarchical Affinity Clustering
文献类型:期刊文献
中文题名:DisHAP:基于层次亲和聚类的分布式大图划分算法
英文题名:DisHAP:A Distributed Partition Algorithm for Large Scale Graphs Based on Hierarchical Affinity Clustering
作者:柳菁[1];李琪[1]
机构:[1]绍兴文理学院计算机科学与工程系,浙江绍兴312000
年份:2021
卷号:49
期号:10
起止页码:2002
中文期刊名:电子学报
外文期刊名:Acta Electronica Sinica
收录:CSTPCD、、CSCD2021_2022、EI(收录号:20214711202735)、Scopus、北大核心、CSCD、北大核心2020
基金:国家自然科学基金青年科学基金(No.62002226)。
语种:中文
中文关键词:分布式大图划分;层次聚类;局部优化;分布式图计算;平衡划分
外文关键词:distributed large-scale graph partitioning;hierarchical clustering;local optimization;distributed graph computation;balanced partitioning
中文摘要:平衡图划分是改善并行图计算性能的关键.一个良好的划分算法应保证划分后的子图在负载均衡的前提下,减少子图之间的交互边(切割边)规模,从而减少网络通信.对此,本文设计一种基于层次亲和聚类的分布式大图划分算法(DisHAP).该算法采用亲和聚类的思想,将图初始划分为规模相等的k个子图;再将结果映射成顶点序列,以线性嵌入顺序处理节点,通过局部交换策略优化割边率;最后将DisHAP应用在MapReduce框架中,使用多种真实及理论图数据,与现有的大图划分算法做比较分析.以Twitter图为例,划分2,4,8,16,32个子区,相较于现有的大图划分算法(LDG,BLP,Spinner,Fennel,ParMetis及PSA-MIR算法),割边率减少1.7%~30.2%,说明了该算法的优越性.同时该算法具有良好的可扩展性,划分的子区数量及图的规模对划分时间具有较低的影响.
外文摘要:It is the key to improve the performance of graph calculation by improving the efficiency of the graph partitioning algorithm and reducing the communication edge scale between the subgraphs.Due to the limited memory capacity of single computing node,it is difficult to meet the partitioning requirements of large-scale graphs in terms of partitioning efficiency and partitioning quality.In this paper,a distributed graph partitioning algorithm based on hierarchical affinity clustering for massive scale graphs is designed,called DisHAP.It uses the Boruvka minimum spanning tree algorithm to balance cluster the graph under the constraint condition of the input graph according to the vertex similarity,and partitions the graph into k subgraph(partitions)with equal size.In order to optimize the fractiom of edges cut between these large subgraphs,we map the initial partition results to the vertex sequence and cut into a large number of subsections,randomly select the sub slice pairs in the neighbor subgraph sequence,and migrate the vertices according to the mutual exchange and the single vertex positive profit.Thus,the optimization problem of large data volume is converted to a large number of small problems to solve.The algorithm is applied to the MapReduce framework to effectively improve the efficiency of the algorithm.Finally,we use various actual and theoretical graph data to compare with existing graph partitioning algorithms to verify the effectiveness of the DisHAP algorithm.Taking the graph Twitter as an example,it is partitioned into 2,4,8,16,32 partitions.Compared to LDG,BLP,Spinner,Fennel,ParMetis and PSA-MIR algorithms,the fraction of edges cut is reduced by 30.2%,29.4%,10.2%,7.8%,1.7%,and 3.3%respectively.
参考文献:
正在载入数据...