Overview

Similarity measure design for high dimensional data

Source journal: Journal of Central South University (English Edition), 2014, Issue 9

Authors: LEE Sang-hyuk, YAN Sun, JEONG Yoon-su, SHIN Seung-soo

Pages: 3534-3540

Key words: high dimensional data; similarity measure; difference; neighborhood information; financial fraud

Abstract: Information analysis of high dimensional data was carried out by applying a similarity measure. High dimensional data were treated as an atypical structure. Overlapped and non-overlapped data were introduced, and the similarity measure analysis was illustrated and compared with a conventional similarity measure. For overlapped data, similarity could be expressed with the conventional similarity measure. The similarity analysis of non-overlapped data provided the clue to computing the similarity of high dimensional data. The similarity measure for high dimensional data was designed with neighborhood information taken into account, and conservative and strict solutions were proposed. The proposed similarity measure was applied to express financial fraud among multi-dimensional datasets. In an illustrative example, financial fraud similarity with respect to age, gender, qualification and job was presented. With the proposed similarity measure, high dimensional personal data were evaluated for how similar they are to financial fraud. The calculation results show that actual fraud has a rather high similarity measure compared with the average, from a minimum of 0.0609 to a maximum of 0.1667.

Details

LEE Sang-hyuk¹, YAN Sun², JEONG Yoon-su³, SHIN Seung-soo⁴

(1. Department of Electrical and Electronic Engineering, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China;
2. International Business School Suzhou, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China;
3. Department of Information Communication Engineering, Mokwon University, 21 Mokwon-gil, Seo-gu, Daejeon, 302-318, Korea;
4. Department of Information Security, Tongmyong University, Sinseonno, Nam-gu, Busan, 608-711, Korea)
