Tools
Privacy-preserving data mining : models and algorithms
Where to find it
Information & Library Science Library
Authors, etc.
Summary
Advances in hardware technology have increased the capability to store and record personal data about consumers and individuals, causing concerns that personal data may be used for a variety of intrusive or malicious purposes.
Privacy-Preserving Data Mining: Models and Algorithms proposes a number of techniques to perform the data mining tasks in a privacy-preserving way. These techniques generally fall into the following categories: data modification techniques, cryptographic methods and protocols for data sharing, statistical techniques for disclosure and inference control, query auditing methods, randomization and perturbation-based techniques.
This edited volume contains surveys by distinguished researchers in the privacy field. Each survey includes the key research content as well as future research directions.
Privacy-Preserving Data Mining: Models and Algorithms is designed for researchers, professors, and advanced-level students in computer science, and is also suitable for industry practitioners.
Contents
- Preface p. v
- List of Figures p. xvii
- List of Tables p. xxi
- An Introduction to Privacy-Preserving Data Mining p. 1 Charu C. Aggarwal and Philip S. Yu
- 1.1 Introduction p. 1
- 1.2 Privacy-Preserving Data Mining Algorithms p. 3
- 1.3 Conclusions and Summary p. 7
- References p. 8
- 2 A General Survey of Privacy-Preserving Data Mining Models and Algorithms p. 11 Charu C. Aggarwal and Philip S. Yu
- 2.1 Introduction p. 11
- 2.2 The Randomization Method p. 13
- 2.2.1 Privacy Quantification p. 15
- 2.2.2 Adversarial Attacks on Randomization p. 18
- 2.2.3 Randomization Methods for Data Streams p. 18
- 2.2.4 Multiplicative Perturbations p. 19
- 2.2.5 Data Swapping p. 19
- 2.3 Group Based Anonymization p. 20
- 2.3.1 The k-Anonymity Framework p. 20
- 2.3.2 Personalized Privacy-Preservation p. 24
- 2.3.3 Utility Based Privacy Preservation p. 24
- 2.3.4 Sequential Releases p. 25
- 2.3.5 The l-diversity Method p. 26
- 2.3.6 The t-closeness Model p. 27
- 2.3.7 Models for Text, Binary and String Data p. 27
- 2.4 Distributed Privacy-Preserving Data Mining p. 28
- 2.4.1 Distributed Algorithms over Horizontally Partitioned Data Sets p. 30
- 2.4.2 Distributed Algorithms over Vertically Partitioned Data p. 31
- 2.4.3 Distributed Algorithms for k-Anonymity p. 32
- 2.5 Privacy-Preservation of Application Results p. 32
- 2.5.1 Association Rule Hiding p. 33
- 2.5.2 Downgrading Classifier Effectiveness p. 34
- 2.5.3 Query Auditing and Inference Control p. 34
- 2.6 Limitations of Privacy: The Curse of Dimensionality p. 37
- 2.7 Applications of Privacy-Preserving Data Mining p. 38
- 2.7.1 Medical Databases: The Scrub and Datafly Systems p. 39
- 2.7.2 Bioterrorism Applications p. 40
- 2.7.3 Homeland Security Applications p. 40
- 2.7.4 Genomic Privacy p. 42
- 2.8 Summary p. 43
- References p. 43
- 3 A Survey of Inference Control Methods for Privacy-Preserving Data Mining p. 53 Josep Domingo-Ferrer
- 3.1 Introduction p. 54
- 3.2 A classification of Microdata Protection Methods p. 55
- 3.3 Perturbative Masking Methods p. 58
- 3.3.1 Additive Noise p. 58
- 3.3.2 Microaggregation p. 59
- 3.3.3 Data Wapping and Rank Swapping p. 61
- 3.3.4 Rounding p. 62
- 3.3.5 Resampling p. 62
- 3.3.6 PRAM p. 62
- 3.3.7 MASSC p. 63
- 3.4 Non-perturbative Masking Methods p. 63
- 3.4.1 Sampling p. 64
- 3.4.2 Global Recoding p. 64
- 3.4.3 Top and Bottom Coding p. 65
- 3.4.4 Local Suppression p. 65
- 3.5 Synthetic Microdata Generation p. 65
- 3.5.1 Synthetic Data by Multiple Imputation p. 65
- 3.5.2 Synthetic Data by Bootstrap p. 66
- 3.5.3 Synthetic Data by Latin Hypercube Sampling p. 66
- 3.5.4 Partially Synthetic Data by Cholesky Decomposition p. 67
- 3.5.5 Other Partially Synthetic and Hybrid Microdata Approaches p. 67
- 3.5.6 Pros and Cons of Synthetic Microdata p. 68
- 3.6 Trading off Information Loss and Disclosure Risk p. 69
- 3.6.1 Score Construction p. 69
- 3.6.2 R-U Maps p. 71
- 3.6.3 k-anonymity p. 71
- 3.7 Conclusions and Research Directions p. 72
- References p. 73
- 4 Measures of Anonymity p. 81 Suresh Venkatasubramanian
- 4.1 Introduction p. 81
- 4.1.1 What is Privacy? p. 82
- 4.1.2 Data Anonymization Methods p. 83
- 4.1.3 A Classification of Methods p. 84
- 4.2 Statistical Measures of Anonymity p. 85
- 4.2.1 Query Restriction p. 85
- 4.2.2 Anonymity via Variance p. 85
- 4.2.3 Anonymity via Multiplicity p. 86
- 4.3 Probabilistic Measures of Anonymity p. 87
- 4.3.1 Measures Based on Random Perturbation p. 87
- 4.3.2 Measures Based on Generalization p. 90
- 4.3.3 Utility vs Privacy p. 94
- 4.4 Computational Measures of Anonymity p. 94
- 4.4.1 Anonymity via Isolation p. 97
- 4.5 Conclusions and New Directions p. 97
- 4.5.1 New Directions p. 98
- References p. 99
- 5 k-Anonymous Data Mining: A Survey p. 105 V. Ciriani and S. De Capitani di Vimercati and S. Foresti and P. Samarati
- 5.1 Introduction p. 105
- 5.2 k-Anonymity p. 107
- 5.3 Algorithms for Enforcing k-Anonymity p. 110
- 5.4 k-Anonymity Threats from Data Mining p. 117
- 5.4.1 Association Rules p. 118
- 5.4.2 Classification Mining p. 118
- 5.5 k-Anonymity in Data Mining p. 120
- 5.6 Anonymize-and-Mine p. 123
- 5.7 Mine-and-Anonymize p. 126
- 5.7.1 Enforcing k-Anonymity on Association Rules p. 126
- 5.7.2 Enforcing k-Anonymity on Decision Trees p. 130
- 5.8 Conclusions p. 133
- Acknowledgments p. 133
- References p. 134
- 6 A Survey of Randomization Methods for Privacy-Preserving Data Mining p. 137 Charu C. Aggarwal and Philip S. Yu
- 6.1 Introduction p. 137
- 6.2 Reconstruction Methods for Randomization p. 139
- 6.2.1 The Bayes Reconstruction Method p. 139
- 6.2.2 The EM Reconstruction Method p. 141
- 6.2.3 Utility and Optimality of Randomization Models p. 143
- 6.3 Applications of Randomization p. 144
- 6.3.1 Privacy-Preserving Classification with Randomization p. 144
- 6.3.2 Privacy-Preserving OLAP p. 145
- 6.3.3 Collaborative Filtering p. 145
- 6.4 The Privacy-Information Loss Tradeoff p. 146
- 6.5 Vulnerabilities of the Randomization Method p. 149
- 6.6 Randomization of Time Series Data Streams p. 151
- 6.7 Multiplicative Noise for Randomization p. 152
- 6.7.1 Vulnerabilities of Multiplicative Randomization p. 153
- 6.7.2 Sketch Based Randomization p. 153
- 6.8 Conclusions and Summary p. 154
- References p. 154
- 7 A Survey of Multiplicative Perturbation for Privacy-Preserving Data Mining p. 157 Keke Chen and Ling Liu
- 7.1 Introduction p. 158
- 7.1.1 Data Privacy vs. Data Utility p. 159
- 7.1.2 Outline p. 160
- 7.2 Definition of Multiplicative Perturbation p. 161
- 7.2.1 Notations p. 161
- 7.2.2 Rotation Perturbation p. 161
- 7.2.3 Projection Perturbation p. 162
- 7.2.4 Sketch-based Approach p. 164
- 7.2.5 Geometric Perturbation p. 164
- 7.3 Transformation Invariant Data Mining Models p. 165
- 7.3.1 Definition of Transformation Invariant Models p. 166
- 7.3.2 Transformation-Invariant Classification Models p. 166
- 7.3.3 Transformation-Invariant Clustering Models p. 167
- 7.4 Privacy Evaluation for Multiplicative Perturbation p. 168
- 7.4.1 A Conceptual Multidimensional Privacy Evaluation Model p. 168
- 7.4.2 Variance of Difference as Column Privacy Metric p. 169
- 7.4.3 Incorporating Attack Evaluation p. 170
- 7.4.4 Other Metrics p. 171
- 7.5 Attack Resilient Multiplicative Perturbations p. 171
- 7.5.1 Naive Estimation to Rotation Perturbation p. 171
- 7.5.2 ICA-Based Attacks p. 173
- 7.5.3 Distance-Inference Attacks p. 174
- 7.5.4 Attacks with More Prior Knowledge p. 176
- 7.5.5 Finding Attack-Resilient Perturbations p. 177
- 7.6 Conclusion p. 177
- Acknowledgment p. 178
- References p. 179
- 8 A Survey of Quantification of Privacy Preserving Data Mining Algorithms p. 183 Elisa Bertino and Dan Lin and Wei Jiang
- 8.1 Introduction p. 184
- 8.2 Metrics for Quantifying Privacy Level p. 186
- 8.2.1 Data Privacy p. 186
- 8.2.2 Result Privacy p. 191
- 8.3 Metrics for Quantifying Hiding Failure p. 192
- 8.4 Metrics for Quantifying Data Quality p. 193
- 8.4.1 Quality of the Data Resulting from the PPDM Process p. 193
- 8.4.2 Quality of the Data Mining Results p. 198
- 8.5 Complexity Metrics p. 200
- 8.6 How to Select a Proper Metric p. 201
- 8.7 Conclusion and Research Directions p. 202
- References p. 202
- 9 A Survey of Utility-based Privacy-Preserving Data Transformation Methods p. 207 Ming Hua and Jian Pei
- 9.1 Introduction p. 208
- 9.1.1 What is Utility-based Privacy Preservation? p. 209
- 9.2 Types of Utility-based Privacy Preservation Methods p. 210
- 9.2.1 Privacy Models p. 210
- 9.2.2 Utility Measures p. 212
- 9.2.3 Summary of the Utility-Based Privacy Preserving Methods p. 214
- 9.3 Utility-Based Anonymization Using Local Recoding p. 214
- 9.3.1 Global Recoding and Local Recoding p. 215
- 9.3.2 Utility Measure p. 216
- 9.3.3 Anonymization Methods p. 217
- 9.3.4 Summary and Discussion p. 219
- 9.4 The Utility-based Privacy Preserving Methods in Classification Prob-lems p. 219
- 9.4.1 The Top-Down Specialization Method p. 220
- 9.4.2 The Progressive Disclosure Algorithm p. 224
- 9.4.3 Summary and Discussion p. 228
- 9.5 Anonymized Marginal: Injecting Utility into Anonymized Data Sets p. 228
- 9.5.1 Anonymized Marginal p. 229
- 9.5.2 Utility Measure p. 230
- 9.5.3 Injecting Utility Using Anonymized Marginals p. 231
- 9.5.4 Summary and Discussion p. 233
- 9.6 Summary p. 234
- Acknowledgments p. 234
- References p. 234
- 10 Mining Association Rules under Privacy Constraints p. 239 Jayant R. Haritsa
- 10.1 Introduction p. 239
- 10.2 Problem Framework p. 240
- 10.2.1 Database Model p. 240
- 10.2.2 Mining Objective p. 241
- 10.2.3 Privacy Mechanisms p. 241
- 10.2.4 Privacy Metric p. 243
- 10.2.5 Accuracy Metric p. 245
- 10.3 Evolution of the Literature p. 246
- 10.4 The FRAPP Framework p. 251
- 10.4.1 Reconstruction Model p. 252
- 10.4.2 Estimation Error p. 253
- 10.4.3 Randomizing the Perturbation Matrix p. 256
- 10.4.4 Efficient Perturbation p. 256
- 10.4.5 Integration with Association Rule Mining p. 258
- 10.5 Sample Results p. 259
- 10.6 Closing Remarks 263 Acknowledgments p. 263
- References p. 263
- 11 A Survey of Association Rule Hiding Methods for Privacy p. 267 Vassilios S. Verykios and Aris Gkoulalas-Divanis
- 11.1 Introduction p. 267
- 11.2 Terminology and Preliminaries p. 269
- 11.3 Taxonomy of Association Rule Hiding Algorithms p. 270
- 11.4 Classes of Association Rule Algorithms p. 271
- 11.4.1 Heuristic Approaches p. 272
- 11.4.2 Border-based Approaches p. 277
- 11.4.3 Exact Approaches p. 278
- 11.5 Other Hiding Approaches p. 279
- 11.6 Metrics and Performance Analysis p. 281
- 11.7 Discussion and Future Trends p. 284
- 11.8 Conclusions 285 References p. 286
- 12 A Survey of Statistical Approaches to Preserving Confidentiality of Contingency Table Entries p. 291 Stephen E. Fienberg and Aleksandra B. Slavkovic
- 12.1 Introduction p. 291
- 12.2 The Statistical Approach Privacy Protection p. 292
- 12.3 Datamining Algorithms, Association Rules, and Disclosure Limitation p. 294
- 12.4 Estimation and Disclosure Limitation for Multi-way Contingency Tables p. 295
- 12.5 Two Illustrative Examples p. 301
- 12.5.1 Example 1: Data from a Randomized Clinical Trial p. 301
- 12.5.2 Example 2: Data from the 1993 U.S. Current Population Survey p. 305
- 12.6 Conclusions p. 308
- Acknowledgments p. 309
- References p. 309
- 13 A Survey of Privacy-Preserving Methods Across Horizontally Partitioned Data p. 313 Murat Kantarcioglu
- 13.1 Introduction p. 313
- 13.2 Basic Cryptographic Techniques for Privacy-Preserving Distributed Data Mining p. 315
- 13.3 Common Secure Sub-protocols Used in Privacy-Preserving Distributed Data Mining p. 318
- 13.4 Privacy-preserving Distributed Data Mining on Horizontally Partitioned Data p. 323
- 13.5 Comparison to Vertically Partitioned Data Model p. 326
- 13.6 Extension to Malicious Parties p. 327
- 13.7 Limitations of the Cryptographic Techniques Used in Privacy-Preserving Distributed Data Mining p. 329
- 13.8 Privacy Issues Related to Data Mining Results p. 330
- 13.9 Conclusion p. 332
- References p. 332
- 14 A Survey of Privacy-Preserving Methods Across Vertically Partitioned Data p. 337 Jaideep Vaidya
- 14.1 Introduction p. 337
- 14.2 Classification p. 341
- 14.2.1 Naïve Bayes Classification p. 342
- 14.2.2 Bayesian Network Structure Learning p. 343
- 14.2.3 Decision Tree Classification p. 344
- 14.3 Clustering p. 346
- 14.4 Association Rule Mining p. 347
- 14.5 Outlier detection p. 349
- 14.5.1 Algorithm p. 351
- 14.5.2 Security Analysis p. 352
- 14.5.3 Computation and Communication Analysis p. 354
- 14.6 Challenges and Research Directions p. 355
- References p. 356
- 15 A Survey of Attack Techniques on Privacy-Preserving Data Perturbation Methods p. 359 Kun Liu and Chris Giannella and Hillol Kargupta
- 15.1 Introduction p. 360
- 15.2 Definitions and Notation p. 360
- 15.3 Attacking Additive Data Perturbation p. 361
- 15.3.1 Eigen-Analysis and PCA Preliminaries p. 362
- 15.3.2 Spectral Filtering p. 363
- 15.3.3 SVD Filtering p. 364
- 15.3.4 PCA Filtering p. 365
- 15.3.5 MAP Estimation Attack p. 366
- 15.3.6 Distribution Analysis Attack p. 367
- 15.3.7 Summary p. 367
- 15.4 Attacking Matrix Multiplicative Data Perturbation p. 369
- 15.4.1 Known I/O Attacks p. 370
- 15.4.2 Known Sample Attack p. 373
- 15.4.3 Other Attacks Based on ICA p. 374
- 15.4.4 Summary p. 375
- 15.5 Attacking k-Anonymization p. 376
- 15.6 Conclusion 376 Acknowledgments 377 References p. 377
- 16 Private Data Analysis via Output Perturbation p. 383 Kobbi Nissim
- 16.1 Introduction p. 383
- 16.2 The Abstract Model - Statistical Databases, Queries, and Sanitizers p. 385
- 16.3 Privacy p. 388
- 16.3.1 Interpreting the Privacy Definition p. 390
- 16.4 The Basic Technique: Calibrating Noise to Sensitivity p. 394
- 16.4.1 Applications: Functions with Low Global Sensitivity p. 396
- 16.5 Constructing Sanitizers for Complex Functionalities p. 400
- 16.5.1 k-Means Clustering p. 401
- 16.5.2 SVD and PCA p. 403
- 16.5.3 Learning in the Statistical Queries Model p. 404
- 16.6 Beyond the Basics p. 405
- 16.6.1 Instance Based Noise and Smooth Sensitivity p. 406
- 16.6.2 The Sample-Aggregate Framework p. 408
- 16.6.3 A General Sanitization Mechanism p. 409
- 16.7 Related Work and Bibliographic Notes p. 409
- Acknowledgments p. 411
- References p. 411
- 17 A Survey of Query Auditing Techniques for Data Privacy p. 415 Shubha U. Nabar and Krishnaram Kenthapadi and Nina Mishra and Rajeev Motwani
- 17.1 Introduction p. 415
- 17.2 Auditing Aggregate Queries p. 416
- 17.2.1 Offline Auditing p. 417
- 17.2.2 Online Auditing p. 418
- 17.3 Auditing Select-Project-Join Queries p. 426
- 17.4 Challenges in Auditing p. 427
- 17.5 Reading p. 429
- References p. 430
- 18 Privacy and the Dimensionality Curse p. 433 Charu C. Aggarwal
- 18.1 Introduction p. 433
- 18.2 The Dimensionality Curse and the k-anonymity Method p. 435
- 18.3 The Dimensionality Curse and Condensation p. 441
- 18.4 The Dimensionality Curse and the Randomization Method p. 446
- 18.4.1 Effects of Public Information p. 446
- 18.4.2 Effects of High Dimensionality p. 450
- 18.4.3 Gaussian Perturbing Distribution p. 450
- 18.4.4 Uniform Perturbing Distribution p. 455
- 18.5 The Dimensionality Curse and l-diversity p. 458
- 18.6 Conclusions and Research Directions p. 459
- References p. 460
- 19 Personalized Privacy Preservation p. 461 Yufei Tao and Xiaokui Xiao
- 19.1 Introduction p. 461
- 19.2 Formalization of Personalized Anonymity p. 463
- 19.2.1 Personal Privacy Requirements p. 464
- 19.2.2 Generalization p. 465
- 19.3 Combinatorial Process of Privacy Attack p. 467
- 19.3.1 Primary Case p. 468
- 19.3.2 Non-primary Case p. 469
- 19.4 Theoretical Foundation p. 470
- 19.4.1 Notations and Basic Properties p. 471
- 19.4.2 Derivation of the Breach Probability p. 472
- 19.5 Generalization Algorithm p. 473
- 19.5.1 The Greedy Framework p. 474
- 19.5.2 Optimal SA-generalization p. 476
- 19.6 Alternative Forms of Personalized Privacy Preservation p. 478
- 19.6.1 Extension of k-anonymity p. 479
- 19.6.2 Personalization in Location Privacy Protection p. 480
- 19.7 Summary and Future Work p. 482
- References p. 485
- 20 Privacy-Preserving Data Stream Classification p. 487 Yabo Xu and Ke Wang and Ada Wai-Chee Fu and Rong She and Jian Pei
- 20.1 Introduction p. 487
- 20.1.1 Motivating Example p. 488
- 20.1.2 Contributions and Paper Outline p. 490
- 20.2 Related Works p. 491
- 20.3 Problem Statement p. 493
- 20.3.1 Secure Join Stream Classification p. 493
- 20.3.2 Naive Bayesian Classifiers p. 494
- 20.4 Our Approach p. 495
- 20.4.1 Initialization p. 495
- 20.4.2 Bottom-Up Propagation p. 496
- 20.4.3 Top-Down Propagation p. 497
- 20.4.4 Using NBC p. 499
- 20.4.5 Algorithm Analysis p. 500
- 20.5 Empirical Studies p. 501
- 20.5.1 Real-life Datasets p. 502
- 20.5.2 Synthetic Datasets p. 504
- 20.5.3 Discussion p. 506
- 20.6 Conclusions p. 507
- References p. 508
- Index p. 511
Subjects
- Subject Headings A:
Other details
- Description: xxii, 513 pages : illustrations ; 25 cm.
- Series: Advances in database systems ; v. 34
- Published: New York : Springer, c2008.
- Language: English
- Notes: Includes bibliographical references and index.
- ISBN: 0387709916
- OCLC Number: 209333706
- Other Identifiers: LCCN: 2007943463
9780387709918
Related items
- Series: Advances in database systems 34.