Data virtualization for business intelligence architectures : revolutionizing data integration for data warehouses

Where to find it

Information & Library Science Library

Call Number
QA76.9.D37 L36 2012
Status
Available

Authors, etc.

Names: Rick F. van der Lans

Summary

Data virtualization can help you accomplish your goals with more flexibility and agility. Learn what it is and how and why it should be used with Data Virtualization for Business Intelligence Systems. In this book, expert author Rick van der Lans explains how data virtualization servers work, what techniques to use to optimize access to various data sources, and how these products can be applied in different projects. You'll learn the difference between this new form of data integration and older forms, such as ETL and replication, and gain a clear understanding of how data virtualization really works. Data Virtualization for Business Intelligence Systems outlines the advantages and disadvantages of data virtualization and illustrates how data virtualization should be applied in data warehouse environments. You'll come away with a comprehensive understanding of how data virtualization will make data warehouse environments more flexible and how it makes developing operational BI applications easier. Van der Lans also describes the relationship between data virtualization and related topics, such as master data management, governance, and information management, so you gain a big-picture understanding as well as the practical know-how you need to virtualize your data.

Contents

  • Foreword p. xiii
  • Preface p. xv
  • About the Author p. xix
  • Chapter 1 Introduction to Data Virtualization p. 1
  • 1.1 Introduction p. 1
  • 1.2 The World of Business Intelligence Is Changing p. 1
  • 1.3 Introduction to Virtualization p. 3
  • 1.4 What Is Data Virtualization? p. 4
  • 1.5 Data Virtualization and Related Concepts p. 5
  • 1.5.1 Data Virtualization versus Encapsulation and Information Hiding p. 5
  • 1.5.2 Data Virtualization versus Abstraction p. 6
  • 1.5.3 Data Virtualization versus Data Federation p. 7
  • 1.5.4 Data Virtualization versus Data Integration p. 8
  • 1.5.5 Data Virtualization versus Enterprise Information Integration p. 9
  • 1.6 Definition of Data Virtualization p. 9
  • 1.7 Technical Advantages of Data Virtualization p. 10
  • 1.8 Different Implementations of Data Virtualization p. 14
  • 1.9 Overview of Data Virtualization Servers p. 14
  • 1.10 Open versus Closed Data Virtualization Servers p. 15
  • 1.11 Other Forms of Data Integration p. 16
  • 1.12 The Modules of a Data Virtualization Server p. 18
  • 1.13 The History of Data Virtualization p. 19
  • 1.14 The Sample Database: World Class Movies p. 22
  • 1.15 Structure of This Book p. 25
  • Chapter 2 Business Intelligence and Data Warehousing p. 27
  • 2.1 Introduction p. 27
  • 2.2 What Is Business Intelligence? p. 27
  • 2.3 Management Levels and Decision Making p. 28
  • 2.4 Business Intelligence Systems p. 29
  • 2.5 The Data Stores of a Business Intelligence System p. 30
  • 2.5.1 The Data Warehouse p. 30
  • 2.5.2 The Data Marts p. 34
  • 2.5.3 The Data Staging Area p. 35
  • 2.5.4 The Operational Data Store p. 37
  • 2.5.5 The Personal Data Stores p. 38
  • 2.5.6 A Comparison of the Different Types of Data Stores p. 38
  • 2.6 Normalized Schemas, Star Schemas, and Snowflake Schemas p. 39
  • 2.6.1 Normalized Schemas p. 40
  • 2.6.2 Denormalized Schemas p. 40
  • 2.6.3 Star Schemas p. 41
  • 2.6.4 Snowflake Schemas p. 43
  • 2.7 Data Transformation with Extract Transform Load, Extract Load Transform, and Replication p. 44
  • 2.7.1 Extract Transform Load p. 44
  • 2.7.2 Extract Load Transform p. 45
  • 2.7.3 Replication p. 46
  • 2.8 Overview of Business Intelligence Architectures p. 47
  • 2.9 New Forms of Reporting and Analytics p. 48
  • 2.9.1 Operational Reporting and Analytics p. 48
  • 2.9.2 Deep and Big Data Analytics p. 49
  • 2.9.3 Self-Service Reporting and Analytics p. 49
  • 2.9.4 Unrestricted Ad-Hoc Analysis p. 50
  • 2.9.5 360-Degree Reporting p. 51
  • 2.9.6 Exploratory Analysis p. 51
  • 2.9.7 Text-Based Analysis p. 52
  • 2.10 Disadvantages of Classic Business Intelligence Systems p. 53
  • 2.11 Summary p. 56
  • Chapter 3 Data Virtualization Server: The Building Blocks p. 59
  • 3.1 Introduction p. 59
  • 3.2 The High-Level Architecture of a Data Virtualization Server p. 59
  • 3.3 Importing Source Tables and Defining Wrappers p. 60
  • 3.4 Defining Virtual Tables and Mappings p. 62
  • 3.5 Examples of Virtual Tables and Mappings p. 66
  • 3.6 Virtual Tables and Data Modeling p. 76
  • 3.7 Nesting Virtual Tables and Shared Specifications p. 77
  • 3.8 Importing Nonrelational Data p. 79
  • 3.8.1 XML and JSON Documents p. 79
  • 3.8.2 Web Services p. 84
  • 3.8.3 Spreadsheets p. 86
  • 3.8.4 NoSQL Databases p. 86
  • 3.8.5 Multidimensional Cubes and MDX p. 89
  • 3.8.6 Semistructured Data p. 92
  • 3.8.7 Unstructured Data p. 95
  • 3.9 Publishing Virtual Tables p. 96
  • 3.10 The Internal Data Model p. 101
  • 3.11 Updatable Virtual Tables and Transaction Management p. 106
  • Chapter 4 Data Virtualization Server: Management and Security p. 109
  • 4.1 Introduction p. 109
  • 4.2 Impact and Lineage Analysis p. 109
  • 4.3 Synchronization of Source Tables, Wrapper Tables, and Virtual Tables p. 110
  • 4.4 Security of Data: Authentication and Authorization p. 112
  • 4.5 Monitoring, Management, and Administration p. 114
  • Chapter 5 Data Virtualization Server: Caching of Virtual Tables p. 119
  • 5.1 Introduction p. 119
  • 5.2 The Cache of a Virtual Table p. 119
  • 5.3 When to Use Caching p. 120
  • 5.4 Caches versus Data Marts p. 122
  • 5.5 Where Is the Cache Kept? p. 122
  • 5.6 Refreshing Caches p. 123
  • 5.7 Full Refreshing, Incremental Refreshing, and Live Refreshing p. 124
  • 5.8 Online Refreshing and Offline Refreshing p. 125
  • 5.9 Cache Replication p. 126
  • Chapter 6 Data Virtualization Server: Query Optimization Techniques p. 127
  • 6.1 Introduction p. 127
  • 6.2 A Refresher Course on Query Optimization p. 128
  • 6.3 The Ten Stages of Query Processing by a Data Virtualization Server p. 132
  • 6.4 The Intelligence Level of the Data Stores p. 134
  • 6.5 Optimization through Query Substitution p. 134
  • 6.6 Optimization through Pushdown p. 137
  • 6.7 Optimization through Query Expansion (Query Injection) p. 139
  • 6.8 Optimization through Ship Joins p. 140
  • 6.9 Optimization through Sort-Merge Joins p. 141
  • 6.10 Optimization by Caching p. 142
  • 6.11 Optimization and Statistical Data p. 142
  • 6.12 Optimization through Hints p. 143
  • 6.13 Optimization through SQL Override p. 143
  • 6.14 Explaining the Processing Strategy p. 145
  • Chapter 7 Deploying Data Virtualization in Business Intelligence Systems p. 147
  • 7.1 Introduction p. 147
  • 7.2 A Business Intelligence System Based on Data Virtualization p. 147
  • 7.3 Advantages of Deploying Data Virtualization p. 148
  • 7.4 Disadvantages of Deploying Data Virtualization p. 151
  • 7.5 Strategies for Adopting Data Virtualization p. 151
  • 7.5.1 Strategy 1: Introducing Data Virtualization in an Existing Business Intelligence System p. 152
  • 7.5.2 Strategy 2: Developing a New Business Intelligence System with Data Virtualization p. 157
  • 7.5.3 Strategy 3: Developing a New Business Intelligence System Combining Source and Transformed Data p. 161
  • 7.6 Application Areas of Data Virtualization p. 163
  • 7.6.1 Unified Data Access p. 163
  • 7.6.2 Virtual Data Mart p. 163
  • 7.6.3 Virtual Data Warehouse: Based on Data Marts p. 165
  • 7.6.4 Virtual Data Warehouse: Based on Production Databases p. 165
  • 7.6.5 Extended Data Warehouse p. 167
  • 7.6.6 Operational Reporting and Analytics p. 167
  • 7.6.7 Operational Data Warehouse p. 168
  • 7.6.8 Virtual Corporate Data Warehouse p. 169
  • 7.6.9 Self-Service Reporting and Analytics p. 170
  • 7.6.10 Virtual Sandbox p. 171
  • 7.6.11 Prototyping p. 171
  • 7.6.12 Analyzing Semistructured and Unstructured Data p. 172
  • 7.6.13 Disposable Reports p. 173
  • 7.6.14 Extending Business Intelligence Systems with External Users p. 173
  • 7.7 Myths on Data Virtualization p. 174
  • Chapter 8 Design Guidelines for Data Virtualization p. 177
  • 8.1 Introduction p. 177
  • 8.2 Incorrect Data and Data Quality p. 177
  • 8.2.1 Different Forms of Incorrect Data p. 178
  • 8.2.2 Integrity Rules and Incorrect Data p. 179
  • 8.2.3 Filtering, Flagging, and Restoring Incorrect Data p. 179
  • 8.2.4 Examples of Filtering Incorrect Data p. 180
  • 8.2.5 Examples of Flagging Incorrect Data p. 184
  • 8.2.6 Examples of Restoring Misspelled Data p. 186
  • 8.3 Complex and Irregular Data Structures p. 188
  • 8.3.1 Codes without Names p. 188
  • 8.3.2 Inconsistent Key Values p. 190
  • 8.3.3 Repeating Groups p. 192
  • 8.3.4 Recursive Data Structures p. 192
  • 8.4 Implementing Transformations in Wrappers or Mappings p. 197
  • 8.5 Analyzing Incorrect Data p. 197
  • 8.6 Different Users and Different Definitions p. 198
  • 8.7 Time Inconsistency of Data p. 199
  • 8.8 Data Stores and Data Transmission p. 200
  • 8.9 Retrieving Data from Production Systems p. 202
  • 8.10 Joining Historical and Operational Data p. 203
  • 8.11 Dealing with Organizational Changes p. 204
  • 8.12 Archiving Data p. 205
  • Chapter 9 Data Virtualization and Service-Oriented Architecture p. 207
  • 9.1 Introduction p. 207
  • 9.2 Service-Oriented Architectures in a Nutshell p. 207
  • 9.3 Basic Services, Composite Services, Business Process Services, and Data Services p. 209
  • 9.4 Developing Data Services with a Data Virtualization Server p. 211
  • 9.5 Developing Composite Services with a Data Virtualization Server p. 213
  • 9.6 Services and the Internal Data Model p. 215
  • Chapter 10 Data Virtualization and Master Data Management p. 217
  • 10.1 Introduction p. 217
  • 10.2 Data Is a Critical Asset for Every Organization p. 217
  • 10.3 The Need for a 360-Degree View of Business Objects p. 219
  • 10.4 What Is Master Data? p. 219
  • 10.5 What Is Master Data Management? p. 221
  • 10.6 A Master Data Management System p. 222
  • 10.7 Master Data Management for Integrating Data p. 224
  • 10.8 Integrating Master Data Management and Data Virtualization p. 224
  • Chapter 11 Data Virtualization, Information Management, and Data Governance p. 231
  • 11.1 Introduction p. 231
  • 11.2 Impact of Data Virtualization on Information Modeling and Database Design p. 231
  • 11.3 Impact of Data Virtualization on Data Profiling p. 234
  • 11.4 Impact of Data Virtualization on Data Cleansing p. 239
  • 11.5 Impact of Data Virtualization on Data Governance p. 239
  • Chapter 12 The Data Delivery Platform: A New Architecture for Business Intelligence Systems p. 243
  • 12.1 Introduction p. 243
  • 12.2 The Data Delivery Platform in a Nutshell p. 243
  • 12.3 The Definition of the Data Delivery Platform p. 244
  • 12.4 The Data Delivery Platform and Other Business Intelligence Architectures p. 245
  • 12.5 The Requirements of the Data Delivery Platform p. 247
  • 12.6 The Data Delivery Platform versus Data Virtualization p. 249
  • 12.7 Explanation of the Name p. 250
  • 12.8 A Personal Note p. 251
  • Chapter 13 The Future of Data Virtualization p. 253
  • 13.1 Introduction p. 253
  • 13.2 The Future of Data Virtualization According to Rick F. van der Lans p. 254
  • 13.2.1 New and Enhanced Query Optimization Techniques p. 254
  • 13.2.2 Exploiting New Hardware Technology p. 255
  • 13.2.3 Extending the Design Module p. 256
  • 13.2.4 Data Quality Features p. 258
  • 13.2.5 Support for the Push-Model for Data Access p. 258
  • 13.2.6 Blending of Data Virtualization, Extract Transform Load, Extract Load Transform, and Replication p. 259
  • 13.3 The Future of Data Virtualization According to David Besemer, CTO of Composite Software p. 260
  • 13.3.1 The Empowered Consumer Gains Ubiquitous Data Access p. 261
  • 13.3.2 IT's Back Office Becomes the Cloud p. 261
  • 13.3.3 Data Virtualization of the Future Is a Global Data Fabric p. 261
  • 13.3.4 Conclusion p. 262
  • 13.4 The Future of Data Virtualization According to Alberto Pan, CTO of Denodo Technologies p. 262
  • 13.5 The Future of Data Virtualization According to James Markarian, CTO of Informatica Corporation p. 264
  • 13.5.1 How to Maximize Return on Data with Data Virtualization p. 265
  • 13.5.2 Beyond Looking Under the Hood p. 266
  • Bibliography p. 267
  • Index p. 269

Other details