Principles of Big Data, 1st Edition

Preparing, Sharing, and Analyzing Complex Information

Jules Berman
Morgan Kaufmann
ISBN 9780124045767





Learn simple but powerful methods that permit data to be shared and integrated among different Big Data resources.


Key Features

  • Learn general methods for specifying Big Data in a way that is understandable to humans and to computers
  • Avoid the pitfalls in Big Data design and analysis
  • Understand how to create and use Big Data safely and responsibly, guided by the laws, regulations, and ethical standards that apply to the acquisition, distribution, and integration of Big Data resources


Principles of Big Data helps readers avoid the common mistakes that endanger all Big Data projects. By stressing simple, fundamental concepts, this book teaches readers how to organize large volumes of complex data, and how to achieve data permanence when the content of the data is constantly changing. General methods for data verification and validation, as specifically applied to Big Data resources, are stressed throughout the book. The book demonstrates how adept analysts can find relationships among data objects held in disparate Big Data resources, when the data objects are endowed with semantic support (i.e., organized in classes of uniquely identified data objects). Readers will learn how their data can be integrated with data from other resources, and how the data extracted from Big Data resources can be used for purposes beyond those imagined by the data creators.
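A rough sketch of what this "semantic support" can look like in practice, in Python (the class name, properties, and values below are invented for this illustration and are not drawn from the book): every assertion about a data object is recorded as a triple built on a permanent, unique identifier, so assertions made in different resources can be pooled.

    import uuid

    # Each assertion takes the form (object identifier, property, value).
    # The identifier is permanent and unique; "Patient", "zip_code", and
    # "year_of_birth" are hypothetical names used only for illustration.
    patient_id = "urn:uuid:" + str(uuid.uuid4())

    resource_a = [
        (patient_id, "is_a", "Patient"),        # class membership
        (patient_id, "zip_code", "21201"),
    ]
    resource_b = [
        (patient_id, "year_of_birth", "1950"),  # a second resource adds to the same object
    ]

    # Integration reduces to pooling triples and selecting on the identifier.
    for _, prop, value in resource_a + resource_b:
        print(prop, "=", value)

Because the identifier, rather than any shared record layout, ties the assertions together, the two resources can be integrated without agreeing on a schema in advance.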


Audience

Data managers, data analysts, and statisticians.

Jules Berman

Jules Berman holds two bachelor of science degrees from MIT (Mathematics, and Earth and Planetary Sciences), a PhD from Temple University, and an MD from the University of Miami. He was a graduate researcher at the Fels Cancer Research Institute at Temple University and at the American Health Foundation in Valhalla, New York. His postdoctoral studies were completed at the U.S. National Institutes of Health, and his residency at the George Washington University Medical Center in Washington, D.C. Dr. Berman served as Chief of Anatomic Pathology, Surgical Pathology, and Cytopathology at the Veterans Administration Medical Center in Baltimore, Maryland, where he held joint appointments at the University of Maryland Medical Center and at the Johns Hopkins Medical Institutions. In 1998, he transferred to the U.S. National Institutes of Health as a Medical Officer and as the Program Director for Pathology Informatics in the Cancer Diagnosis Program at the National Cancer Institute. Dr. Berman is a past President of the Association for Pathology Informatics and the 2011 recipient of the association's Lifetime Achievement Award. He is a listed author on over 200 scientific publications and has written more than a dozen books in his three areas of expertise: informatics, computer programming, and cancer biology. Dr. Berman is currently a freelance writer.

Affiliations and Expertise

Ph.D., M.D., freelance author with expertise in informatics, computer programming, and cancer biology


Table of Contents



Author Biography



Introduction

Definition of Big Data

Big Data Versus Small Data

Whence Comest Big Data?

The Most Common Purpose of Big Data Is to Produce Small Data


Big Data Moves to the Center of the Information Universe

Chapter 1. Providing Structure to Unstructured Data


Machine Translation



Term Extraction


Chapter 2. Identification, Deidentification, and Reidentification


Features of an Identifier System

Registered Unique Object Identifiers

Really Bad Identifier Methods

Embedding Information in an Identifier: Not Recommended

One-Way Hashes

Use Case: Hospital Registration


Data Scrubbing


Lessons Learned


Chapter 3. Ontologies and Semantics


Classifications, the Simplest of Ontologies

Ontologies, Classes with Multiple Parents

Choosing a Class Model

Introduction to Resource Description Framework Schema

Common Pitfalls in Ontology Development


Chapter 4. Introspection


Knowledge of Self

eXtensible Markup Language

Introduction to Meaning

Namespaces and the Aggregation of Meaningful Assertions

Resource Description Framework Triples


Use Case: Trusted Time Stamp



Chapter 5. Data Integration and Software Interoperability


The Committee to Survey Standards

Standard Trajectory

Specifications and Standards


Compliance Issues

Interfaces to Big Data Resources


Chapter 6. Immutability and Immortality


Immutability and Identifiers

Data Objects

Legacy Data

Data Born from Data

Reconciling Identifiers across Institutions

Zero-Knowledge Reconciliation

The Curator’s Burden


Chapter 7. Measurement



Gene Counting

Dealing with Negations

Understanding Your Control

Practical Significance of Measurements

Obsessive-Compulsive Disorder: The Mark of a Great Data Manager


Chapter 8. Simple but Powerful Big Data Techniques


Look at the Data

Data Range


Frequency Distributions

Mean and Standard Deviation

Estimation-Only Analyses

Use Case: Watching Data Trends with Google Ngrams

Use Case: Estimating Movie Preferences


Chapter 9. Analysis


Analytic Tasks

Clustering, Classifying, Recommending, and Modeling

Data Reduction

Normalizing and Adjusting Data

Big Data Software: Speed and Scalability

Find Relationships, Not Similarities


Chapter 10. Special Considerations in Big Data Analysis


Theory in Search of Data

Data in Search of a Theory


Bigness Bias

Too Much Data

Fixing Data

Data Subsets in Big Data: Neither Additive nor Transitive

Additional Big Data Pitfalls


Chapter 11. Stepwise Approach to Big Data Analysis


Step 1. A Question Is Formulated

Step 2. Resource Evaluation

Step 3. A Question Is Reformulated

Step 4. Query Output Adequacy

Step 5. Data Description

Step 6. Data Reduction

Step 7. Algorithms Are Selected, If Absolutely Necessary

Step 8. Results Are Reviewed and Conclusions Are Asserted

Step 9. Conclusions Are Examined and Subjected to Validation


Chapter 12. Failure


Failure Is Common

Failed Standards


When Does Complexity Help?

When Redundancy Fails

Save Money; Don’t Protect Harmless Information

After Failure

Use Case: Cancer Biomedical Informatics Grid, a Bridge Too Far


Chapter 13. Legalities


Responsibility for the Accuracy and Legitimacy of Contained Data

Rights to Create, Use, and Share the Resource

Copyright and Patent Infringements Incurred by Using Standards

Protections for Individuals


Unconsented Data

Good Policies Are a Good Policy

Use Case: The Havasupai Story


Chapter 14. Societal Issues


How Big Data Is Perceived

The Necessity of Data Sharing, Even When It Seems Irrelevant

Reducing Costs and Increasing Productivity with Big Data

Public Mistrust

Saving Us from Ourselves

Hubris and Hyperbole


Chapter 15. The Future


Last Words





Quotes and reviews

"By stressing simple, fundamental concepts, this book teaches readers how to organize large volumes of complex data, and how to achieve data permanence when the content of the data is constantly changing. General methods for data verification and validation, as specifically applied to Big Data resources, are stressed throughout the book." --ODBMS.org, March 2014

"The book is written in a colloquial style and is full of anecdotes, quotations from famous people, and personal opinions." --ComputingReviews.com, February 2014

"The author has produced a sober, serious treatment of this emerging phenomenon, avoiding hype and gee-whiz cases in favor of concepts and mature advice. For example, the author offers ten distinctions between big data and small data, including such factors as goals, location, data structure, preparation, and longevity. This characterization provides much greater insight into the phenomenon than the standard 3V treatment (volume, velocity, and variety)." --ComputingReviews.com, October 2013
