Personalized Anonymity Privacy Protection Algorithm Based on Clustering
First published: 2023-05-12
Abstract: Anonymization is one of the most widely used data privacy protection techniques. It works by generalizing or suppressing the quasi-identifier attributes in the original data table so that the published data remain semantically consistent. However, most existing anonymity models do not consider the semantic similarity between sensitive attribute values, which leaves them vulnerable to similarity attacks, and they fail to strike a reasonable balance between data security and utility. This paper therefore proposes a clustering-based personalized (a,k,d)-anonymity privacy protection algorithm. The algorithm defines semantic similarity groups over the sensitive attribute and requires each equivalence class to contain at least d semantic similarity groups, which defends against similarity attacks. To meet the personalization requirement of the anonymity model, it sets a separate frequency constraint for each distinct sensitive attribute value in an equivalence class, limiting how often that value appears. The anonymization is implemented with maximum-dissimilarity clustering, which improves the utility of the anonymized data while preserving privacy. Experimental results show that, at a lower time cost than other clustering-based k-anonymity models, the algorithm reduces information loss by more than 50%, resists similarity attacks, and provides personalized privacy protection.
Keywords: privacy protection; similarity attack; clustering; personalization
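The three constraints named in the abstract can be illustrated with a small check on a single equivalence class. This is only a sketch under one assumed reading of (a,k,d)-anonymity: the class holds at least k records, its sensitive values span at least d semantic similarity groups, and each sensitive value's relative frequency stays at or below its own threshold. The function name `satisfies_akd`, the `group_of` mapping, the `alpha` thresholds, and the example values are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def satisfies_akd(sensitive_values, k, d, alpha, group_of, default_alpha=1.0):
    """Check one equivalence class against an assumed reading of (a,k,d)-anonymity.

    sensitive_values: sensitive attribute values of the records in the class
    k: minimum number of records in the class
    d: minimum number of distinct semantic similarity groups in the class
    alpha: per-value maximum relative frequency (hypothetical thresholds)
    group_of: maps each sensitive value to its semantic similarity group
    """
    n = len(sensitive_values)
    if n < k:
        return False  # too few records for k-anonymity

    # Count distinct semantic groups, not distinct raw values.
    groups = {group_of[v] for v in sensitive_values}
    if len(groups) < d:
        return False  # values are too similar: open to a similarity attack

    # Per-value frequency constraint; unlisted values are unconstrained.
    counts = Counter(sensitive_values)
    return all(c / n <= alpha.get(v, default_alpha) for v, c in counts.items())


# Hypothetical semantic groups and thresholds for illustration only.
group_of = {
    "flu": "respiratory",
    "pneumonia": "respiratory",
    "gastritis": "digestive",
    "HIV": "infectious",
}
alpha = {"HIV": 0.25}

print(satisfies_akd(["flu", "pneumonia", "gastritis", "HIV"], 4, 3, alpha, group_of))
print(satisfies_akd(["flu", "pneumonia", "flu", "pneumonia"], 4, 3, alpha, group_of))
```

The second call fails the check even though the class has two distinct values, because both fall into the same semantic group; this is the situation a similarity attack exploits.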