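A Co-clustering Algorithm for Structured Web Documents

Source journal: Journal of Central South University (Science and Technology), 2010, No. 5

Authors: 邓冬梅, 龙际珍, 尹湘舟

Pages: 1871-1876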

Key words: co-clustering; similarity fusion; structured document

Abstract: To support effective retrieval and filtering of multimedia information on the Web, a co-clustering algorithm based on the fusion of text similarity and image similarity is proposed. Text similarity and image similarity are first computed separately. The resulting text similarity matrix and image similarity matrix are then fused by horizontal concatenation, the fused matrix is decomposed by singular value decomposition, and k-means co-clustering is applied, so that the clustering result combines text information and image information. The algorithm was tested on multimedia structured documents with two information sources, text and image. The results show that, compared with co-clustering on images alone, the F-Measure of every cluster is clearly higher, and compared with co-clustering on text alone, the F-Measure is also higher in clusters 1, 2, 3 and 7.
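The pipeline described in the abstract (similarity computation, horizontal concatenation, singular value decomposition, k-means) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state the similarity measure, the number of singular vectors retained, or the k-means initialisation, so cosine similarity, the `rank` parameter, farthest-first seeding, and the toy feature vectors below are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between rows (assumed similarity measure)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def fuse_horizontally(text_sim, image_sim):
    """Place the two n x n similarity matrices side by side, giving n x 2n."""
    return np.hstack([text_sim, image_sim])


def svd_embedding(fused, rank):
    """Represent each document by its top `rank` singular components."""
    u, s, _ = np.linalg.svd(fused, full_matrices=False)
    return u[:, :rank] * s[:rank]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first seeding (assumed)."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[nearest.argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels


# Toy data (hypothetical): six documents, each with a text feature vector
# and an image feature vector; documents 0-2 and 3-5 form two topics.
text_features = np.array([
    [3, 1, 0, 0], [2, 2, 0, 0], [4, 1, 0, 0],
    [0, 0, 1, 3], [0, 0, 2, 2], [0, 0, 1, 4],
], dtype=float)
image_features = np.array([
    [1.0, 0.0], [0.9, 0.1], [1.0, 0.2],
    [0.0, 1.0], [0.1, 0.9], [0.2, 1.0],
], dtype=float)

text_sim = cosine_similarity_matrix(text_features)
image_sim = cosine_similarity_matrix(image_features)
fused = fuse_horizontally(text_sim, image_sim)
embedding = svd_embedding(fused, rank=2)
labels = kmeans(embedding, k=2)
print(labels)
```

Because the two similarity matrices are concatenated rather than averaged, each document's row in the fused matrix carries its text similarities and its image similarities as separate coordinates, so the decomposition step can weight the two sources according to how much structure each contributes.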
