Programming Massively Parallel Processors, 2nd Edition

A Hands-on Approach

David Kirk and Wen-mei Hwu

Publisher: Morgan Kaufmann

Print ISBN: 9780124159921

eBook ISBN: 9780123914187

Pages: 514

Trim size: 235 × 191 mm

Learn how to program massively parallel processors with this practical, hands-on approach using CUDA.

Print Book (Paperback): USD 74.95

eBook (EPUB, PDF, and VitalSource Bookshelf formats): USD 74.95

Print Book + eBook bundle: USD 89.94 (40% off the combined list price of USD 149.90)
 
 

Key Features

Updates in this new edition include:

  • New coverage of CUDA 5.0, improved performance, enhanced development tools, increased hardware support, and more
  • Increased coverage of related technologies such as OpenCL, plus new material on algorithm patterns, GPU clusters, host programming, and data parallelism
  • Two new case studies (on MRI reconstruction and molecular visualization) explore the latest applications of CUDA and GPUs for scientific research and high-performance computing

Description

Programming Massively Parallel Processors: A Hands-on Approach shows students and professionals alike the basic concepts of parallel programming and GPU architecture. Various techniques for constructing parallel programs are explored in detail. Case studies demonstrate the development process, which begins with computational thinking and ends with effective and efficient parallel programs. Performance, floating-point format, parallel patterns, and dynamic parallelism are covered in depth.
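For a taste of this hands-on style, here is a minimal CUDA C vector-addition program of the kind the early chapters build up to (a sketch for illustration only, not a listing from the book; the names and sizes are arbitrary):

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    // Each thread computes one element of C = A + B.
    __global__ void vecAdd(const float *A, const float *B, float *C, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n) C[i] = A[i] + B[i];                  // guard: the grid may overshoot n
    }

    int main(void) {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        // Allocate and initialize host arrays.
        float *hA = (float *)malloc(bytes);
        float *hB = (float *)malloc(bytes);
        float *hC = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

        // Allocate device global memory and copy the inputs over.
        float *dA, *dB, *dC;
        cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
        cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

        // Launch enough 256-thread blocks to cover all n elements.
        const int threads = 256;
        vecAdd<<<(n + threads - 1) / threads, threads>>>(dA, dB, dC, n);

        // Copy the result back and spot-check it.
        cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
        printf("C[0] = %.1f\n", hC[0]);  // expect 3.0

        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        free(hA); free(hB); free(hC);
        return 0;
    }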

This best-selling guide to CUDA and GPU parallel programming has been revised to include more parallel programming examples, coverage of commonly used libraries such as Thrust, and explanations of the latest tools. With these improvements, the book retains its concise, intuitive, practical approach based on years of road-testing in the authors' own parallel computing courses.
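Thrust, covered in Chapter 16, raises the level of abstraction well above raw kernels; a few lines express common parallel patterns (again a sketch, assuming only a standard CUDA toolkit, which bundles Thrust):

    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <thrust/reduce.h>
    #include <cstdio>

    int main(void) {
        // Data placed in a device_vector lives in GPU memory.
        thrust::device_vector<int> d(4);
        d[0] = 3; d[1] = 1; d[2] = 4; d[3] = 1;

        thrust::sort(d.begin(), d.end());                    // parallel sort on the GPU
        const int sum = thrust::reduce(d.begin(), d.end());  // parallel reduction

        printf("sum = %d\n", sum);  // expect 9
        return 0;
    }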

Readership

Advanced students, software engineers, programmers, hardware engineers

David Kirk

David B. Kirk is well recognized for his contributions to graphics hardware and algorithm research. By the time he began his studies at Caltech, he had already earned B.S. and M.S. degrees in mechanical engineering from MIT and worked as an engineer for Raster Technologies and Hewlett-Packard's Apollo Systems Division. After receiving his doctorate, he joined Crystal Dynamics, a video-game company, as chief scientist and head of technology. In 1997 he became Chief Scientist at NVIDIA, a leader in visual computing technologies, and he is currently an NVIDIA Fellow.

At NVIDIA, Kirk led graphics-technology development for some of today's most popular consumer-entertainment platforms, playing a key role in providing mass-market graphics capabilities previously available only on workstations costing hundreds of thousands of dollars. For his role in bringing high-performance graphics to personal computers, Kirk received the 2002 Computer Graphics Achievement Award from the Association for Computing Machinery's Special Interest Group on Computer Graphics and Interactive Techniques (ACM SIGGRAPH) and, in 2006, was elected to the National Academy of Engineering, one of the highest professional distinctions for engineers.

Kirk holds 50 patents and patent applications relating to graphics design, has published more than 50 articles on graphics technology, has won several best-paper awards, and edited the book Graphics Gems III. A technological "evangelist" who cares deeply about education, he has supported new curriculum initiatives at Caltech and has been a frequent university lecturer and conference keynote speaker worldwide.

Affiliations and Expertise

NVIDIA Fellow

Wen-mei Hwu

Wen-mei W. Hwu is a professor who holds the Sanders-AMD Endowed Chair in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. His research interests are in the areas of architecture, implementation, compilation, and algorithms for parallel computing. He is the chief scientist of the Parallel Computing Institute, director of the IMPACT research group (www.impact.crhc.illinois.edu), and a co-founder and CTO of MulticoreWare.

For his contributions to research and teaching, he has received the ACM SIGARCH Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the Tau Beta Pi Daniel C. Drucker Eminent Faculty Award, the ISCA Influential Paper Award, the IEEE Computer Society B. R. Rau Award, and the Distinguished Alumni Award in Computer Science of the University of California, Berkeley. He is a fellow of both IEEE and ACM.

He directs the UIUC CUDA Center of Excellence and serves as one of the principal investigators of the NSF Blue Waters petascale computer project. Dr. Hwu received his Ph.D. in Computer Science from the University of California, Berkeley.

Affiliations and Expertise

CTO of MulticoreWare and professor specializing in compiler design, computer architecture, microarchitecture, and parallel processing, University of Illinois at Urbana-Champaign


Table of Contents

Preface

Target Audience

How to Use the Book

Online Supplements

Acknowledgements

Dedication

Chapter 1. Introduction

1.1 Heterogeneous Parallel Computing

1.2 Architecture of a Modern GPU

1.3 Why More Speed or Parallelism?

1.4 Speeding Up Real Applications

1.5 Parallel Programming Languages and Models

1.6 Overarching Goals

1.7 Organization of the Book

References

Chapter 2. History of GPU Computing

2.1 Evolution of Graphics Pipelines

2.2 GPGPU: An Intermediate Step

2.3 GPU Computing

References and Further Reading

Chapter 3. Introduction to Data Parallelism and CUDA C

3.1 Data Parallelism

3.2 CUDA Program Structure

3.3 A Vector Addition Kernel

3.4 Device Global Memory and Data Transfer

3.5 Kernel Functions and Threading

3.6 Summary

3.7 Exercises

References

Chapter 4. Data-Parallel Execution Model

4.1 CUDA Thread Organization

4.2 Mapping Threads to Multidimensional Data

4.3 Matrix–Matrix Multiplication—A More Complex Kernel

4.4 Synchronization and Transparent Scalability

4.5 Assigning Resources to Blocks

4.6 Querying Device Properties

4.7 Thread Scheduling and Latency Tolerance

4.8 Summary

4.9 Exercises

Chapter 5. CUDA Memories

5.1 Importance of Memory Access Efficiency

5.2 CUDA Device Memory Types

5.3 A Strategy for Reducing Global Memory Traffic

5.4 A Tiled Matrix–Matrix Multiplication Kernel

5.5 Memory as a Limiting Factor to Parallelism

5.6 Summary

5.7 Exercises

Chapter 6. Performance Considerations

6.1 Warps and Thread Execution

6.2 Global Memory Bandwidth

6.3 Dynamic Partitioning of Execution Resources

6.4 Instruction Mix and Thread Granularity

6.5 Summary

6.6 Exercises

References

Chapter 7. Floating-Point Considerations

7.1 Floating-Point Format

7.2 Representable Numbers

7.3 Special Bit Patterns and Precision in IEEE Format

7.4 Arithmetic Accuracy and Rounding

7.5 Algorithm Considerations

7.6 Numerical Stability

7.7 Summary

7.8 Exercises

References

Chapter 8. Parallel Patterns: Convolution: With an Introduction to Constant Memory and Caches

8.1 Background

8.2 1D Parallel Convolution—A Basic Algorithm

8.3 Constant Memory and Caching

8.4 Tiled 1D Convolution with Halo Elements

8.5 A Simpler Tiled 1D Convolution—General Caching

8.6 Summary

8.7 Exercises

Chapter 9. Parallel Patterns: Prefix Sum: An Introduction to Work Efficiency in Parallel Algorithms

9.1 Background

9.2 A Simple Parallel Scan

9.3 Work Efficiency Considerations

9.4 A Work-Efficient Parallel Scan

9.5 Parallel Scan for Arbitrary-Length Inputs

9.6 Summary

9.7 Exercises

Reference

Chapter 10. Parallel Patterns: Sparse Matrix–Vector Multiplication: An Introduction to Compaction and Regularization in Parallel Algorithms

10.1 Background

10.2 Parallel SpMV Using CSR

10.3 Padding and Transposition

10.4 Using Hybrid to Control Padding

10.5 Sorting and Partitioning for Regularization

10.6 Summary

10.7 Exercises

References

Chapter 11. Application Case Study: Advanced MRI Reconstruction

11.1 Application Background

11.2 Iterative Reconstruction

11.3 Computing FHD

11.4 Final Evaluation

11.5 Exercises

References

Chapter 12. Application Case Study: Molecular Visualization and Analysis

12.1 Application Background

12.2 A Simple Kernel Implementation

12.3 Thread Granularity Adjustment

12.4 Memory Coalescing

12.5 Summary

12.6 Exercises

References

Chapter 13. Parallel Programming and Computational Thinking

13.1 Goals of Parallel Computing

13.2 Problem Decomposition

13.3 Algorithm Selection

13.4 Computational Thinking

13.5 Summary

13.6 Exercises

References

Chapter 14. An Introduction to OpenCL™

14.1 Background

14.2 Data Parallelism Model

14.3 Device Architecture

14.4 Kernel Functions

14.5 Device Management and Kernel Launch

14.6 Electrostatic Potential Map in OpenCL

14.7 Summary

14.8 Exercises

References

Chapter 15. Parallel Programming with OpenACC

15.1 OpenACC Versus CUDA C

15.2 Execution Model

15.3 Memory Model

15.4 Basic OpenACC Programs

15.5 Future Directions of OpenACC

15.6 Exercises

Chapter 16. Thrust: A Productivity-Oriented Library for CUDA

16.1 Background

16.2 Motivation

16.3 Basic Thrust Features

16.4 Generic Programming

16.5 Benefits of Abstraction

16.6 Programmer Productivity

16.7 Best Practices

16.8 Exercises

References

Chapter 17. CUDA FORTRAN

17.1 CUDA FORTRAN and CUDA C Differences

17.2 A First CUDA FORTRAN Program

17.3 Multidimensional Array in CUDA FORTRAN

17.4 Overloading Host/Device Routines With Generic Interfaces

17.5 Calling CUDA C via iso_c_binding

17.6 Kernel Loop Directives and Reduction Operations

17.7 Dynamic Shared Memory

17.8 Asynchronous Data Transfers

17.9 Compilation and Profiling

17.10 Calling Thrust from CUDA FORTRAN

17.11 Exercises

Chapter 18. An Introduction to C++ AMP

18.1 Core C++ AMP Features

18.2 Details of the C++ AMP Execution Model

18.3 Managing Accelerators

18.4 Tiled Execution

18.5 C++ AMP Graphics Features

18.6 Summary

18.7 Exercises

Chapter 19. Programming a Heterogeneous Computing Cluster

19.1 Background

19.2 A Running Example

19.3 MPI Basics

19.4 MPI Point-to-Point Communication Types

19.5 Overlapping Computation and Communication

19.6 MPI Collective Communication

19.7 Summary

19.8 Exercises

Reference

Chapter 20. CUDA Dynamic Parallelism

20.1 Background

20.2 Dynamic Parallelism Overview

20.3 Important Details

20.4 Memory Visibility

20.5 A Simple Example

20.6 Runtime Limitations

20.7 A More Complex Example

20.8 Summary

Reference

Chapter 21. Conclusion and Future Outlook

21.1 Goals Revisited

21.2 Memory Model Evolution

21.3 Kernel Execution Control Evolution

21.4 Core Performance

21.5 Programming Environment

21.6 Future Outlook

References

Appendix A. Matrix Multiplication Host-Only Version Source Code

Appendix Outline

A.1 matrixmul.cu

A.2 matrixmul_gold.cpp

A.3 matrixmul.h

A.4 assist.h

A.5 Expected Output

Appendix B. GPU Compute Capabilities

Appendix Outline

B.1 GPU Compute Capability Tables

B.2 Memory Coalescing Variations

Index

Quotes and reviews

"For those interested in the GPU path to parallel enlightenment, this new book from David Kirk and Wen-mei Hwu is a godsend, as it introduces CUDA (tm), a C-like data parallel language, and Tesla(tm), the architecture of the current generation of NVIDIA GPUs. In addition to explaining the language and the architecture, they define the nature of data parallel problems that run well on the heterogeneous CPU-GPU hardware ... This book is a valuable addition to the recently reinvigorated parallel computing literature."
- David Patterson, Director of The Parallel Computing Research Laboratory and the Pardee Professor of Computer Science, U.C. Berkeley. Co-author of Computer Architecture: A Quantitative Approach

"Written by two teaching pioneers, this book is the definitive practical reference on programming massively parallel processors--a true technological gold mine. The hands-on learning included is cutting-edge, yet very readable. This is a most rewarding read for students, engineers, and scientists interested in supercharging computational resources to solve today's and tomorrow's hardest problems."
- Nicolas Pinto, MIT, NVIDIA Fellow, 2009

"I have always admired Wen-mei Hwu's and David Kirk's ability to turn complex problems into easy-to-comprehend concepts. They have done it again in this book. This joint venture of a passionate teacher and a GPU evangelizer tackles the trade-off between the simple explanation of the concepts and the in-depth analysis of the programming techniques. This is a great book to learn both massive parallel programming and CUDA."
- Mateo Valero, Director, Barcelona Supercomputing Center

"The use of GPUs is having a big impact in scientific computing. David Kirk and Wen-mei Hwu's new book is an important contribution towards educating our students on the ideas and techniques of programming for massively parallel processors."
- Mike Giles, Professor of Scientific Computing, University of Oxford

"This book is the most comprehensive and authoritative introduction to GPU computing yet. David Kirk and Wen-mei Hwu are the pioneers in this increasingly important field, and their insights are invaluable and fascinating. This book will be the standard reference for years to come."
- Hanspeter Pfister, Harvard University

"This is a vital and much-needed text. GPU programming is growing by leaps and bounds. This new book will be very welcomed and highly useful across inter-disciplinary fields."
- Shannon Steinfadt, Kent State University

"GPUs have hundreds of cores capable of delivering transformative performance increases across a wide range of computational challenges. The rise of these multi-core architectures has raised the need to teach advanced programmers a new and essential skill: how to program massively parallel processors." – CNNMoney.com

 
 