Data Mining: Concepts and Techniques


Data Mining: Concepts and Techniques, 3rd Edition, Jiawei Han, Micheline Kamber, Jian Pei, ISBN 9780123814791


Morgan Kaufmann




240 X 197

A comprehensive and practical look at the concepts and techniques you need in the area of data mining and knowledge discovery


Key Features

    * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects.
    * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields.
    * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of your data.


    Data Mining: Concepts and Techniques presents the concepts and techniques used to process gathered data or information for use in various applications. Specifically, it explains data mining and the tools used to discover knowledge from collected data, a process also referred to as knowledge discovery from data (KDD). It focuses on the feasibility, usefulness, effectiveness, and scalability of techniques for large data sets. After describing data mining, this edition explains the methods for getting to know, preprocessing, processing, and warehousing data. It then presents information about data warehouses, online analytical processing (OLAP), and data cube technology. Next, it describes the methods involved in mining frequent patterns, associations, and correlations in large data sets. The book details the methods for data classification and introduces the concepts and methods for data clustering. The remaining chapters discuss outlier detection and the trends, applications, and research frontiers in data mining. This book is intended for computer science students, application developers, business professionals, and researchers who seek information on data mining.


    Data warehouse engineers, data mining professionals, database researchers, statisticians, data analysts, data modelers, and other data professionals working on data mining at the R&D and implementation levels, as well as upper-level undergraduate and graduate students studying data mining in computer science programs.

    Jiawei Han

    Jiawei Han is Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Well known for his research in the areas of data mining and database systems, he has received many awards for his contributions in the field, including the 2004 ACM SIGKDD Innovations Award. He has served as Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data, and on editorial boards of several journals, including IEEE Transactions on Knowledge and Data Engineering and Data Mining and Knowledge Discovery.

    Affiliations and Expertise

    University of Illinois, Urbana Champaign


    Micheline Kamber

    Micheline Kamber is a researcher with a passion for writing in easy-to-understand terms. She has a master's degree in computer science (specializing in artificial intelligence) from Concordia University, Canada.

    Affiliations and Expertise

    Simon Fraser University, Burnaby, Canada


    Jian Pei

    Jian Pei is Associate Professor of Computing Science and the director of Collaborative Research and Industry Relations at the School of Computing Science at Simon Fraser University, Canada. From 2002 to 2004, he was an Assistant Professor of Computer Science and Engineering at the State University of New York (SUNY) at Buffalo. He received a Ph.D. degree in Computing Science from Simon Fraser University in 2002, under Dr. Jiawei Han's supervision.

    Affiliations and Expertise

    Simon Fraser University, Burnaby, Canada

    Data Mining: Concepts and Techniques, 3rd Edition

    Foreword
    Foreword to Second Edition
    Preface
    Acknowledgments
    About the Authors

    Chapter 1 Introduction
        1.1 Why Data Mining?
            1.1.1 Moving toward the Information Age
            1.1.2 Data Mining as the Evolution of Information Technology
        1.2 What Is Data Mining?
        1.3 What Kinds of Data Can Be Mined?
            1.3.1 Database Data
            1.3.2 Data Warehouses
            1.3.3 Transactional Data
            1.3.4 Other Kinds of Data
        1.4 What Kinds of Patterns Can Be Mined?
            1.4.1 Class/Concept Description: Characterization and Discrimination
            1.4.2 Mining Frequent Patterns, Associations, and Correlations
            1.4.3 Classification and Regression for Predictive Analysis
            1.4.4 Cluster Analysis
            1.4.5 Outlier Analysis
            1.4.6 Are All Patterns Interesting?
        1.5 Which Technologies Are Used?
            1.5.1 Statistics
            1.5.2 Machine Learning
            1.5.3 Database Systems and Data Warehouses
            1.5.4 Information Retrieval
        1.6 Which Kinds of Applications Are Targeted?
            1.6.1 Business Intelligence
            1.6.2 Web Search Engines
        1.7 Major Issues in Data Mining
            1.7.1 Mining Methodology
            1.7.2 User Interaction
            1.7.3 Efficiency and Scalability
            1.7.4 Diversity of Database Types
            1.7.5 Data Mining and Society
        1.8 Summary
        1.9 Exercises
        1.10 Bibliographic Notes

    Chapter 2 Getting to Know Your Data
        2.1 Data Objects and Attribute Types
            2.1.1 What Is an Attribute?
            2.1.2 Nominal Attributes
            2.1.3 Binary Attributes
            2.1.4 Ordinal Attributes
            2.1.5 Numeric Attributes
            2.1.6 Discrete versus Continuous Attributes
        2.2 Basic Statistical Descriptions of Data
            2.2.1 Measuring the Central Tendency: Mean, Median, and Mode
            2.2.2 Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range
            2.2.3 Graphic Displays of Basic Statistical Descriptions of Data
        2.3 Data Visualization
            2.3.1 Pixel-Oriented Visualization Techniques
            2.3.2 Geometric Projection Visualization Techniques
            2.3.3 Icon-Based Visualization Techniques
            2.3.4 Hierarchical Visualization Techniques
            2.3.5 Visualizing Complex Data and Relations
        2.4 Measuring Data Similarity and Dissimilarity
            2.4.1 Data Matrix versus Dissimilarity Matrix
            2.4.2 Proximity Measures for Nominal Attributes
            2.4.3 Proximity Measures for Binary Attributes
            2.4.4 Dissimilarity of Numeric Data: Minkowski Distance
            2.4.5 Proximity Measures for Ordinal Attributes
            2.4.6 Dissimilarity for Attributes of Mixed Types
            2.4.7 Cosine Similarity
        2.5 Summary
        2.6 Exercises
        2.7 Bibliographic Notes

    Chapter 3 Data Preprocessing
        3.1 Data Preprocessing: An Overview
            3.1.1 Data Quality: Why Preprocess the Data?
            3.1.2 Major Tasks in Data Preprocessing
        3.2 Data Cleaning
            3.2.1 Missing Values
            3.2.2 Noisy Data
            3.2.3 Data Cleaning as a Process
        3.3 Data Integration
            3.3.1 Entity Identification Problem
            3.3.2 Redundancy and Correlation Analysis
            3.3.3 Tuple Duplication
            3.3.4 Data Value Conflict Detection and Resolution
        3.4 Data Reduction
            3.4.1 Overview of Data Reduction Strategies
            3.4.2 Wavelet Transforms
            3.4.3 Principal Components Analysis
            3.4.4 Attribute Subset Selection
            3.4.5 Regression and Log-Linear Models: Parametric Data Reduction
            3.4.6 Histograms
            3.4.7 Clustering
            3.4.8 Sampling
            3.4.9 Data Cube Aggregation
        3.5 Data Transformation and Data Discretization
            3.5.1 Data Transformation Strategies Overview
            3.5.2 Data Transformation by Normalization
            3.5.3 Discretization by Binning
            3.5.4 Discretization by Histogram Analysis
            3.5.5 Discretization by Cluster, Decision Tree, and Correlation Analyses
            3.5.6 Concept Hierarchy Generation for Nominal Data
        3.6 Summary
        3.7 Exercises
        3.8 Bibliographic Notes

    Chapter 4 Data Warehousing and Online Analytical Processing
        4.1 Data Warehouse: Basic Concepts
            4.1.1 What Is a Data Warehouse?
            4.1.2 Differences between Operational Database Systems and Data Warehouses
            4.1.3 But, Why Have a Separate Data Warehouse?
            4.1.4 Data Warehousing: A Multitiered Architecture
            4.1.5 Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual Warehouse
            4.1.6 Extraction, Transformation, and Loading
            4.1.7 Metadata Repository
        4.2 Data Warehouse Modeling: Data Cube and OLAP
            4.2.1 Data Cube: A Multidimensional Data Model
            4.2.2 Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Data Models
            4.2.3 Dimensions: The Role of Concept Hierarchies
            4.2.4 Measures: Their Categorization and Computation
            4.2.5 Typical OLAP Operations
            4.2.6 A Starnet Query Model for Querying Multidimensional Databases
        4.3 Data Warehouse Design and Usage
            4.3.1 A Business Analysis Framework for Data Warehouse Design
            4.3.2 Data Warehouse Design Process
            4.3.3 Data Warehouse Usage for Information Processing
            4.3.4 From Online Analytical Processing to Multidimensional Data Mining
        4.4 Data Warehouse Implementation
            4.4.1 Efficient Data Cube Computation: An Overview
            4.4.2 Indexing OLAP Data: Bitmap Index and Join Index
            4.4.3 Efficient Processing of OLAP Queries
            4.4.4 OLAP Server Architectures: ROLAP versus MOLAP versus HOLAP
        4.5 Data Generalization by Attribute-Oriented Induction
            4.5.1 Attribute-Oriented Induction for Data Characterization
            4.5.2 Efficient Implementation of Attribute-Oriented Induction
            4.5.3 Attribute-Oriented Induction for Class Comparisons
        4.6 Summary
        4.7 Exercises
        4.8 Bibliographic Notes

    Chapter 5 Data Cube Technology
        5.1 Data Cube Computation: Preliminary Concepts
            5.1.1 Cube Materialization: Full Cube, Iceberg Cube, Closed Cube, and Cube Shell
            5.1.2 General Strategies for Data Cube Computation
        5.2 Data Cube Computation Methods
            5.2.1 Multiway Array Aggregation for Full Cube Computation
            5.2.2 BUC: Computing Iceberg Cubes from the Apex Cuboid Downward
            5.2.3 Star-Cubing: Computing Iceberg Cubes Using a Dynamic Star-Tree Structure
            5.2.4 Precomputing Shell Fragments for Fast High-Dimensional OLAP
        5.3 Processing Advanced Kinds of Queries by Exploring Cube Technology
            5.3.1 Sampling Cubes: OLAP-Based Mining on Sampling Data
            5.3.2 Ranking Cubes: Efficient Computation of Top-k Queries
        5.4 Multidimensional Data Analysis in Cube Space
            5.4.1 Prediction Cubes: Prediction Mining in Cube Space
            5.4.2 Multifeature Cubes: Complex Aggregation at Multiple Granularities
            5.4.3 Exception-Based, Discovery-Driven Cube Space Exploration
        5.5 Summary
        5.6 Exercises
        5.7 Bibliographic Notes

    Chapter 6 Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods
        6.1 Basic Concepts
            6.1.1 Market Basket Analysis: A Motivating Example
            6.1.2 Frequent Itemsets, Closed Itemsets, and Association Rules
        6.2 Frequent Itemset Mining Methods
            6.2.1 Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation
            6.2.2 Generating Association Rules from Frequent Itemsets
            6.2.3 Improving the Efficiency of Apriori
            6.2.4 A Pattern-Growth Approach for Mining Frequent Itemsets
            6.2.5 Mining Frequent Itemsets Using Vertical Data Format
            6.2.6 Mining Closed and Max Patterns
        6.3 Which Patterns Are Interesting? Pattern Evaluation Methods
            6.3.1 Strong Rules Are Not Necessarily Interesting
            6.3.2 From Association Analysis to Correlation Analysis
            6.3.3 A Comparison of Pattern Evaluation Measures
        6.4 Summary
        6.5 Exercises
        6.6 Bibliographic Notes

    Chapter 7 Advanced Pattern Mining
        7.1 Pattern Mining: A Road Map
        7.2 Pattern Mining in Multilevel, Multidimensional Space
            7.2.1 Mining Multilevel Associations
            7.2.2 Mining Multidimensional Associations
            7.2.3 Mining Quantitative Association Rules
            7.2.4 Mining Rare Patterns and Negative Patterns
        7.3 Constraint-Based Frequent Pattern Mining
            7.3.1 Metarule-Guided Mining of Association Rules
            7.3.2 Constraint-Based Pattern Generation: Pruning Pattern Space and Pruning Data Space
        7.4 Mining High-Dimensional Data and Colossal Patterns
            7.4.1 Mining Colossal Patterns by Pattern-Fusion
        7.5 Mining Compressed or Approximate Patterns
            7.5.1 Mining Compressed Patterns by Pattern Clustering
            7.5.2 Extracting Redundancy-Aware Top-k Patterns
        7.6 Pattern Exploration and Application
            7.6.1 Semantic Annotation of Frequent Patterns
            7.6.2 Applications of Pattern Mining
        7.7 Summary
        7.8 Exercises
        7.9 Bibliographic Notes

    Chapter 8 Classification: Basic Concepts
        8.1 Basic Concepts
            8.1.1 What Is Classification?
            8.1.2 General Approach to Classification
        8.2 Decision Tree Induction
            8.2.1 Decision Tree Induction
            8.2.2 Attribute Selection Measures
            8.2.3 Tree Pruning
            8.2.4 Scalability and Decision Tree Induction
            8.2.5 Visual Mining for Decision Tree Induction
        8.3 Bayes Classification Methods
            8.3.1 Bayes’ Theorem
            8.3.2 Naïve Bayesian Classification
        8.4 Rule-Based Classification
            8.4.1 Using IF-THEN Rules for Classification
            8.4.2 Rule Extraction from a Decision Tree
            8.4.3 Rule Induction Using a Sequential Covering Algorithm
        8.5 Model Evaluation and Selection
            8.5.1 Metrics for Evaluating Classifier Performance
            8.5.2 Holdout Method and Random Subsampling
            8.5.3 Cross-Validation
            8.5.4 Bootstrap
            8.5.5 Model Selection Using Statistical Tests of Significance
            8.5.6 Comparing Classifiers Based on Cost-Benefit and ROC Curves
        8.6 Techniques to Improve Classification Accuracy
            8.6.1 Introducing Ensemble Methods
            8.6.2 Bagging
            8.6.3 Boosting and AdaBoost
            8.6.4 Random Forests
            8.6.5 Improving Classification Accuracy of Class-Imbalanced Data
        8.7 Summary
        8.8 Exercises
        8.9 Bibliographic Notes

    Chapter 9 Classification: Advanced Methods
        9.1 Bayesian Belief Networks
            9.1.1 Concepts and Mechanisms
            9.1.2 Training Bayesian Belief Networks
        9.2 Classification by Backpropagation
            9.2.1 A Multilayer Feed-Forward Neural Network
            9.2.2 Defining a Network Topology
            9.2.3 Backpropagation
            9.2.4 Inside the Black Box: Backpropagation and Interpretability
        9.3 Support Vector Machines
            9.3.1 The Case When the Data Are Linearly Separable
            9.3.2 The Case When the Data Are Linearly Inseparable
        9.4 Classification Using Frequent Patterns
            9.4.1 Associative Classification
            9.4.2 Discriminative Frequent Pattern-Based Classification
        9.5 Lazy Learners (or Learning from Your Neighbors)
            9.5.1 k-Nearest-Neighbor Classifiers
            9.5.2 Case-Based Reasoning
        9.6 Other Classification Methods
            9.6.1 Genetic Algorithms
            9.6.2 Rough Set Approach
            9.6.3 Fuzzy Set Approaches
        9.7 Additional Topics Regarding Classification
            9.7.1 Multiclass Classification
            9.7.2 Semi-Supervised Classification
            9.7.3 Active Learning
            9.7.4 Transfer Learning
        9.8 Summary
        9.9 Exercises
        9.10 Bibliographic Notes

    Chapter 10 Cluster Analysis: Basic Concepts and Methods
        10.1 Cluster Analysis
            10.1.1 What Is Cluster Analysis?
            10.1.2 Requirements for Cluster Analysis
            10.1.3 Overview of Basic Clustering Methods
        10.2 Partitioning Methods
            10.2.1 k-Means: A Centroid-Based Technique
            10.2.2 k-Medoids: A Representative Object-Based Technique
        10.3 Hierarchical Methods
            10.3.1 Agglomerative versus Divisive Hierarchical Clustering
            10.3.2 Distance Measures in Algorithmic Methods
            10.3.3 BIRCH: Multiphase Hierarchical Clustering Using Clustering Feature Trees
            10.3.4 Chameleon: Multiphase Hierarchical Clustering Using Dynamic Modeling
            10.3.5 Probabilistic Hierarchical Clustering
        10.4 Density-Based Methods
            10.4.1 DBSCAN: Density-Based Clustering Based on Connected Regions with High Density
            10.4.2 OPTICS: Ordering Points to Identify the Clustering Structure
            10.4.3 DENCLUE: Clustering Based on Density Distribution Functions
        10.5 Grid-Based Methods
            10.5.1 STING: STatistical INformation Grid
            10.5.2 CLIQUE: An Apriori-like Subspace Clustering Method
        10.6 Evaluation of Clustering
            10.6.1 Assessing Clustering Tendency
            10.6.2 Determining the Number of Clusters
            10.6.3 Measuring Clustering Quality
        10.7 Summary
        10.8 Exercises
        10.9 Bibliographic Notes

    Chapter 11 Advanced Cluster Analysis
        11.1 Probabilistic Model-Based Clustering
            11.1.1 Fuzzy Clusters
            11.1.2 Probabilistic Model-Based Clusters
            11.1.3 Expectation-Maximization Algorithm
        11.2 Clustering High-Dimensional Data
            11.2.1 Clustering High-Dimensional Data: Problems, Challenges, and Major Methodologies
            11.2.2 Subspace Clustering Methods
            11.2.3 Biclustering
            11.2.4 Dimensionality Reduction Methods and Spectral Clustering
        11.3 Clustering Graph and Network Data
            11.3.1 Applications and Challenges
            11.3.2 Similarity Measures
            11.3.3 Graph Clustering Methods
        11.4 Clustering with Constraints
            11.4.1 Categorization of Constraints
            11.4.2 Methods for Clustering with Constraints
        11.5 Summary
        11.6 Exercises
        11.7 Bibliographic Notes

    Chapter 12 Outlier Detection
        12.1 Outliers and Outlier Analysis
            12.1.1 What Are Outliers?
            12.1.2 Types of Outliers
            12.1.3 Challenges of Outlier Detection
        12.2 Outlier Detection Methods
            12.2.1 Supervised, Semi-Supervised, and Unsupervised Methods
            12.2.2 Statistical Methods, Proximity-Based Methods, and Clustering-Based Methods
        12.3 Statistical Approaches
            12.3.1 Parametric Methods
            12.3.2 Nonparametric Methods
        12.4 Proximity-Based Approaches
            12.4.1 Distance-Based Outlier Detection and a Nested Loop Method
            12.4.2 A Grid-Based Method
            12.4.3 Density-Based Outlier Detection
        12.5 Clustering-Based Approaches
        12.6 Classification-Based Approaches
        12.7 Mining Contextual and Collective Outliers
            12.7.1 Transforming Contextual Outlier Detection to Conventional Outlier Detection
            12.7.2 Modeling Normal Behavior with Respect to Contexts
            12.7.3 Mining Collective Outliers
        12.8 Outlier Detection in High-Dimensional Data
            12.8.1 Extending Conventional Outlier Detection
            12.8.2 Finding Outliers in Subspaces
            12.8.3 Modeling High-Dimensional Outliers
        12.9 Summary
        12.10 Exercises
        12.11 Bibliographic Notes

    Chapter 13 Data Mining Trends and Research Frontiers
        13.1 Mining Complex Data Types
            13.1.1 Mining Sequence Data: Time-Series, Symbolic Sequences, and Biological Sequences
            13.1.2 Mining Graphs and Networks
            13.1.3 Mining Other Kinds of Data
        13.2 Other Methodologies of Data Mining
            13.2.1 Statistical Data Mining
            13.2.2 Views on Data Mining Foundations
            13.2.3 Visual and Audio Data Mining
        13.3 Data Mining Applications
            13.3.1 Data Mining for Financial Data Analysis
            13.3.2 Data Mining for Retail and Telecommunication Industries
            13.3.3 Data Mining in Science and Engineering
            13.3.4 Data Mining for Intrusion Detection and Prevention
            13.3.5 Data Mining and Recommender Systems
        13.4 Data Mining and Society
            13.4.1 Ubiquitous and Invisible Data Mining
            13.4.2 Privacy, Security, and Social Impacts of Data Mining
        13.5 Data Mining Trends
        13.6 Summary
        13.7 Exercises
        13.8 Bibliographic Notes

    Bibliography
    Index

    Quotes and reviews

    "[A] well-written textbook (2nd ed., 2006; 1st ed., 2001) on data mining or knowledge discovery. The text is supported by a strong outline. The authors preserve much of the introductory material, but add the latest techniques and developments in data mining, thus making this a comprehensive resource for both beginners and practitioners. The focus is data: all aspects. The presentation is broad, encyclopedic, and comprehensive, with ample references for interested readers to pursue in-depth research on any technique. Summing Up: Highly recommended. Upper-division undergraduates through professionals/practitioners."--CHOICE

    "This interesting and comprehensive introduction to data mining emphasizes the interest in multidimensional data mining--the integration of online analytical processing (OLAP) and data mining. Some chapters cover basic methods, and others focus on advanced techniques. The structure, along with the didactic presentation, makes the book suitable for both beginners and specialized readers."--ACM’s Computing Reviews.com

    We are living in the data deluge age. Data Mining: Concepts and Techniques shows us how to find useful knowledge in all that data. This Third Edition significantly expands the core chapters on data preprocessing, frequent pattern mining, classification, and clustering. It also comprehensively covers OLAP and outlier detection, and examines mining networks, complex data types, and important application areas. The book, with its companion website, would make a great textbook for analytics, data mining, and knowledge discovery courses.--Gregory Piatetsky, President, KDnuggets

    Jiawei, Micheline, and Jian give an encyclopaedic coverage of all the related methods, from the classic topics of clustering and classification, to database methods (association rules, data cubes) to more recent and advanced topics (SVD/PCA, wavelets, support vector machines)…. Overall, it is an excellent book on classic and modern data mining methods alike, and it is ideal not only for teaching, but as a reference book.--From the foreword by Christos Faloutsos, Carnegie Mellon University

    "A very good textbook on data mining, this third edition reflects the changes that are occurring in the data mining field. It adds cited material from about 2006, a new section on visualization, and pattern mining with the more recent cluster methods. It’s a well-written text, with all of the supporting materials an instructor is likely to want, including Web material support, extensive problem sets, and solution manuals. Though it serves as a data mining text, readers with little experience in the area will find it readable and enlightening. That being said, readers are expected to have some coding experience, as well as database design and statistics analysis knowledge… Two additional items are worthy of note: the text’s bibliography is an excellent reference list for mining research; and the index is very complete, which makes it easy to locate information. Also, researchers and analysts from other disciplines--for example, epidemiologists, financial analysts, and psychometric researchers--may find the material very useful."--Computing Reviews

    "Han (engineering, U. of Illinois-Urbana-Champaign), Micheline Kamber, and Jian Pei (both computer science, Simon Fraser U., British Columbia) present a textbook for an advanced undergraduate or beginning graduate course introducing data mining. Students should have some background in statistics, database systems, and machine learning and some experience programming. Among the topics are getting to know the data, data warehousing and online analytical processing, data cube technology, cluster analysis, detecting outliers, and trends and research frontiers. Chapter-end exercises are included."--SciTech Book News

    "This book is an extensive and detailed guide to the principal ideas, techniques and technologies of data mining. The book is organised in 13 substantial chapters, each of which is essentially standalone, but with useful references to the book’s coverage of underlying concepts. A broad range of topics are covered, from an initial overview of the field of data mining and its fundamental concepts, to data preparation, data warehousing, OLAP, pattern discovery and data classification. The final chapter describes the current state of data mining research and active research areas."--BCS.org
