Measuring Data Quality for Ongoing Improvement
 
 


A Data Quality Assessment Framework

 
Measuring Data Quality for Ongoing Improvement, 1st Edition, Laura Sebastian-Coleman, ISBN 9780123970336
 
 
 

  

Imprint: Morgan Kaufmann

Print Book ISBN: 9780123970336

eBook ISBN: 9780123977540

Pages: 376

Dimensions: 235 X 191

A ready-to-use framework for data quality measurement!


eBook formats: VST (VitalSource Bookshelf); DRM-free EPUB, Mobi (for Kindle), and PDF

 
 

Key Features

  • Demonstrates how to leverage a technology-independent data quality measurement framework for your specific business priorities and data quality challenges
  • Enables discussions between business and IT with a non-technical vocabulary for data quality measurement
  • Describes how to measure data quality on an ongoing basis with generic measurement types that can be applied to any situation

Description

The Data Quality Assessment Framework shows you how to measure and monitor data quality, ensuring quality over time. You’ll start with general concepts of measurement and work your way through a detailed framework of more than three dozen measurement types related to five objective dimensions of quality: completeness, timeliness, consistency, validity, and integrity. Ongoing measurement, rather than one-time activities, will help your organization reach a new level of data quality. This plain-language approach to measuring data can be understood by both business and IT. It provides practical guidance on how to apply the DQAF within any organization, enabling you to prioritize measurements and report effectively on results. Also included are strategies for using data measurement to govern and improve the quality of data, along with guidelines for applying the framework within a data asset. You’ll come away able to prioritize which measurement types to implement, knowing where to place them in a data flow and how frequently to measure. The book also includes common conceptual models for defining and storing data quality results for trend analysis, as well as generic business requirements for ongoing measuring and monitoring, including the calculations and comparisons that make measurements meaningful, help you understand trends, and detect anomalies.

Readership

Data quality engineers, managers, and analysts; application program managers and developers; data stewards; data managers and analysts; compliance analysts; business intelligence professionals; database designers and administrators; business and IT managers

Laura Sebastian-Coleman

Laura Sebastian-Coleman, a data quality architect at Optum Insight, has worked on data quality in large health care data warehouses since 2003. Optum Insight specializes in improving the performance of the health system by providing analytics, technology, and consulting services. Laura has implemented data quality metrics and reporting, launched and facilitated Optum Insight’s Data Quality Community, contributed to data consumer training programs, and led efforts to establish data standards and manage metadata. In 2009, she led a group of analysts from Optum and UnitedHealth Group in developing the original Data Quality Assessment Framework (DQAF), which is the basis for Measuring Data Quality for Ongoing Improvement. An active professional, Laura has delivered papers at MIT’s Information Quality Conferences and at conferences sponsored by the International Association for Information and Data Quality (IAIDQ) and the Data Governance Organization (DGO). From 2009 to 2010, she served as IAIDQ’s Director of Member Services. Before joining Optum Insight, she spent eight years in internal communications and information technology roles in the commercial insurance industry. She holds the IQCP (Information Quality Certified Professional) designation from IAIDQ, a Certificate in Information Quality from MIT, a B.A. in English and History from Franklin & Marshall College, and a Ph.D. in English Literature from the University of Rochester (NY).

Affiliations and Expertise

Laura Sebastian-Coleman, a data quality architect at Optum Insight.

Measuring Data Quality for Ongoing Improvement, 1st Edition

Dedication

Acknowledgments

Foreword

Author Biography

Introduction: Measuring Data Quality for Ongoing Improvement

Data Quality Measurement: the Problem we are Trying to Solve

Recurring Challenges in the Context of Data Quality

DQAF: the Data Quality Assessment Framework

Overview of Measuring Data Quality for Ongoing Improvement

Intended Audience

What Measuring Data Quality for Ongoing Improvement Does Not Do

Why I Wrote Measuring Data Quality for Ongoing Improvement

Section 1. Concepts and Definitions

Chapter 1. Data

Purpose

Data

Data as Representation

Data as Facts

Data as a Product

Data as Input to Analyses

Data and Expectations

Information

Concluding Thoughts

Chapter 2. Data, People, and Systems

Purpose

Enterprise or Organization

IT and the Business

Data Producers

Data Consumers

Data Brokers

Data Stewards and Data Stewardship

Data Owners

Data Ownership and Data Governance

IT, the Business, and Data Owners, Redux

Data Quality Program Team

Stakeholder

Systems and System Design

Concluding Thoughts

Chapter 3. Data Management, Models, and Metadata

Purpose

Data Management

Database, Data Warehouse, Data Asset, Dataset

Source System, Target System, System of Record

Data Models

Types of Data Models

Physical Characteristics of Data

Metadata

Metadata as Explicit Knowledge

Data Chain and Information Life Cycle

Data Lineage and Data Provenance

Concluding Thoughts

Chapter 4. Data Quality and Measurement

Purpose

Data Quality

Data Quality Dimensions

Measurement

Measurement as Data

Data Quality Measurement and the Business/IT Divide

Characteristics of Effective Measurements

Data Quality Assessment

Data Quality Dimensions, DQAF Measurement Types, Specific Data Quality Metrics

Data Profiling

Data Quality Issues and Data Issue Management

Reasonability Checks

Data Quality Thresholds

Process Controls

In-line Data Quality Measurement and Monitoring

Concluding Thoughts

Section 2. DQAF Concepts and Measurement Types

Chapter 5. DQAF Concepts

Purpose

The Problem the DQAF Addresses

Data Quality Expectations and Data Management

The Scope of the DQAF

DQAF Quality Dimensions

Defining DQAF Measurement Types

Metadata Requirements

Objects of Measurement and Assessment Categories

Functions in Measurement: Collect, Calculate, Compare

Concluding Thoughts

Chapter 6. DQAF Measurement Types

Purpose

Consistency of the Data Model

Ensuring the Correct Receipt of Data for Processing

Inspecting the Condition of Data upon Receipt

Assessing the Results of Data Processing

Assessing the Validity of Data Content

Assessing the Consistency of Data Content

Comments on the Placement of In-line Measurements

Periodic Measurement of Cross-table Content Integrity

Assessing Overall Database Content

Assessing Controls and Measurements

The Measurement Types: Consolidated Listing

Concluding Thoughts

Section 3. Data Assessment Scenarios

Purpose

Assessment Scenarios

Metadata: Knowledge before Assessment

Chapter 7. Initial Data Assessment

Purpose

Initial Assessment

Input to Initial Assessments

Data Expectations

Data Profiling

Column Property Profiling

Structure Profiling

Profiling an Existing Data Asset

From Profiling to Assessment

Deliverables from Initial Assessment

Concluding Thoughts

Chapter 8. Assessment in Data Quality Improvement Projects

Purpose

Data Quality Improvement Efforts

Measurement in Improvement Projects

Chapter 9. Ongoing Measurement

Purpose

The Case for Ongoing Measurement

Example: Health Care Data

Inputs for Ongoing Measurement

Criticality and Risk

Automation

Controls

Periodic Measurement

Deliverables from Ongoing Measurement

In-Line versus Periodic Measurement

Concluding Thoughts

Section 4. Applying the DQAF to Data Requirements

Context

Chapter 10. Requirements, Risk, Criticality

Purpose

Business Requirements

Data Quality Requirements and Expected Data Characteristics

Data Quality Requirements and Risks to Data

Factors Influencing Data Criticality

Specifying Data Quality Metrics

Concluding Thoughts

Chapter 11. Asking Questions

Purpose

Asking Questions

Understanding the Project

Learning about Source Systems

Your Data Consumers’ Requirements

The Condition of the Data

The Data Model, Transformation Rules, and System Design

Measurement Specification Process

Concluding Thoughts

Section 5. A Strategic Approach to Data Quality

Chapter 12. Data Quality Strategy

Purpose

The Concept of Strategy

Systems Strategy, Data Strategy, and Data Quality Strategy

Data Quality Strategy and Data Governance

Decision Points in the Information Life Cycle

General Considerations for Data Quality Strategy

Concluding Thoughts

Chapter 13. Directives for Data Quality Strategy

Purpose

Directive 1: Obtain Management Commitment to Data Quality

Directive 2: Treat Data as an Asset

Directive 3: Apply Resources to Focus on Quality

Directive 4: Build Explicit Knowledge of Data

Directive 5: Treat Data as a Product of Processes that can be Measured and Improved

Directive 6: Recognize Quality is Defined by Data Consumers

Directive 7: Address the Root Causes of Data Problems

Directive 8: Measure Data Quality, Monitor Critical Data

Directive 9: Hold Data Producers Accountable for the Quality of their Data (and Knowledge about that Data)

Directive 10: Provide Data Consumers with the Knowledge they Require for Data Use

Directive 11: Data Needs and Uses will Evolve—Plan for Evolution

Directive 12: Data Quality Goes beyond the Data—Build a Culture Focused on Quality

Concluding Thoughts: Using the Current State Assessment

Section 6. The DQAF in Depth

Functions for Measurement: Collect, Calculate, Compare

Features of the DQAF Measurement Logical Data Model

Facets of the DQAF Measurement Types

Chapter 14. Functions of Measurement: Collection, Calculation, Comparison

Purpose

Functions in Measurement: Collect, Calculate, Compare

Collecting Raw Measurement Data

Calculating Measurement Data

Comparing Measurements to Past History

Statistics

The Control Chart: A Primary Tool for Statistical Process Control

The DQAF and Statistical Process Control

Concluding Thoughts

Chapter 15. Features of the DQAF Measurement Logical Model

Purpose

Metric Definition and Measurement Result Tables

Optional Fields

Denominator Fields

Automated Thresholds

Manual Thresholds

Emergency Thresholds

Manual or Emergency Thresholds and Results Tables

Additional System Requirements

Support Requirements

Concluding Thoughts

Chapter 16. Facets of the DQAF Measurement Types

Purpose

Facets of the DQAF

Organization of the Chapter

Measurement Type #1: Dataset Completeness—Sufficiency of Metadata and Reference Data

Measurement Type #2: Consistent Formatting in One Field

Measurement Type #3: Consistent Formatting, Cross-table

Measurement Type #4: Consistent Use of Default Value in One Field

Measurement Type #5: Consistent Use of Default Values, Cross-table

Measurement Type #6: Timely Delivery of Data for Processing

Measurement Type #7: Dataset Completeness—Availability for Processing

Measurement Type #8: Dataset Completeness—Record Counts to Control Records

Measurement Type #9: Dataset Completeness—Summarized Amount Field Data

Measurement Type #10: Dataset Completeness—Size Compared to Past Sizes

Measurement Type #11: Record Completeness—Length

Measurement Type #12: Field Completeness—Non-Nullable Fields

Measurement Type #13: Dataset Integrity—De-Duplication

Measurement Type #14: Dataset Integrity—Duplicate Record Reasonability Check

Measurement Type #15: Field Content Completeness—Defaults from Source

Measurement Type #16: Dataset Completeness Based on Date Criteria

Measurement Type #17: Dataset Reasonability Based on Date Criteria

Measurement Type #18: Field Content Completeness—Received Data is Missing Fields Critical to Processing

Measurement Type #19: Dataset Completeness—Balance Record Counts Through a Process

Measurement Type #20: Dataset Completeness—Reasons for Rejecting Records

Measurement Type #21: Dataset Completeness Through a Process—Ratio of Input to Output

Measurement Type #22: Dataset Completeness Through a Process—Balance Amount Fields

Measurement Type #23: Field Content Completeness—Ratio of Summed Amount Fields

Measurement Type #24: Field Content Completeness—Defaults from Derivation

Measurement Type #25: Data Processing Duration

Measurement Type #26: Timely Availability of Data for Access

Measurement Type #27: Validity Check, Single Field, Detailed Results

Measurement Type #28: Validity Check, Roll-up

Measurement Logical Data Model

Measurement Type #29: Validity Check, Multiple Columns within a Table, Detailed Results

Measurement Type #30: Consistent Column Profile

Measurement Type #31: Consistent Dataset Content, Distinct Count of Represented Entity, with Ratios to Record Counts

Measurement Type #32: Consistent Dataset Content, Ratio of Distinct Counts of Two Represented Entities

Measurement Type #33: Consistent Multicolumn Profile

Measurement Type #34: Chronology Consistent with Business Rules within a Table

Measurement Type #35: Consistent Time Elapsed (hours, days, months, etc.)

Measurement Type #36: Consistent Amount Field Calculations Across Secondary Fields

Measurement Type #37: Consistent Record Counts by Aggregated Date

Measurement Type #38: Consistent Amount Field Data by Aggregated Date

Measurement Type #39: Parent/Child Referential Integrity

Measurement Type #40: Child/Parent Referential Integrity

Measurement Type #41: Validity Check, Cross Table, Detailed Results

Measurement Type #42: Consistent Cross-table Multicolumn Profile

Measurement Type #43: Chronology Consistent with Business Rules Across-tables

Measurement Type #44: Consistent Cross-table Amount Column Calculations

Measurement Type #45: Consistent Cross-Table Amount Columns by Aggregated Dates

Measurement Type #46: Consistency Compared to External Benchmarks

Measurement Type #47: Dataset Completeness—Overall Sufficiency for Defined Purposes

Measurement Type #48: Dataset Completeness—Overall Sufficiency of Measures and Controls

Concluding Thoughts: Know Your Data

Glossary

Bibliography

Index

Online Materials

Appendix A. Measuring the Value of Data

Appendix B. Data Quality Dimensions

Purpose

Richard Wang’s and Diane Strong’s Data Quality Framework, 1996

Thomas Redman’s Dimensions of Data Quality, 1996

Larry English’s Information Quality Characteristics and Measures, 1999

Appendix C. Completeness, Consistency, and Integrity of the Data Model

Purpose

Process Input and Output

High-Level Assessment

Detailed Assessment

Quality of Definitions

Summary

Appendix D. Prediction, Error, and Shewhart’s Lost Disciple, Kristo Ivanov

Purpose

Limitations of the Communications Model of Information Quality

Error, Prediction, and Scientific Measurement

What Do We Learn from Ivanov?

Ivanov’s Concept of the System as Model

Appendix E. Quality Improvement and Data Quality

Purpose

A Brief History of Quality Improvement

Process Improvement Tools

Implications for Data Quality

Limitations of the Data as Product Metaphor

Concluding Thoughts: Building Quality in Means Building Knowledge in

Quotes and reviews

"This book provides a very well-structured introduction to the fundamental issue of data quality, making it a very useful tool for managers, practitioners, analysts, software developers, and systems engineers. It also helps explain what data quality management entails and provides practical approaches aimed at actual implementation. I positively recommend reading it…" --ComputingReviews.com, January 2014

"The framework she describes is a set of 48 generic measurement types based on five dimensions of data quality: completeness, timeliness, validity, consistency, and integrity. The material is for people who are charged with improving, monitoring, or ensuring data quality." --Reference and Research Book News, August 2013

"If you are intent on improving the quality of the data at your organization you would do well to read Measuring Data Quality for Ongoing Improvement and adopt the DQAF offered up in this fine book." --Data and Technology Today blog, July 2013

 
 