GPU Computing Gems Jade Edition


GPU Computing Gems Jade Edition, 1st Edition, Wen-mei Hwu, ISBN 9780123859631

W Hwu   

Morgan Kaufmann




240 X 197

Leading minds in GPGPU share cutting-edge parallel computing techniques that increase the speed of scientific innovation


Key Features

  • This second volume of GPU Computing Gems offers 100% new material of interest across industry, including finance, medicine, imaging, engineering, gaming, environmental science, green computing, and more
  • Covers new tools and frameworks for productive GPU computing application development and offers immediate benefit to researchers developing improved programming environments for GPUs
  • Presents even more hands-on, proven techniques that demonstrate how general-purpose GPU computing is changing scientific research
  • Distills the best practices of the community of CUDA programmers; each chapter provides insights and ideas as well as hands-on skills applicable to a variety of fields


GPU Computing Gems, Jade Edition describes successful application experiences in GPU computing and the techniques that contributed to that success. Divided into six sections, the book explains how efficient GPU execution is achieved through algorithm implementation techniques and approaches to data structure layout. More specifically, it considers three general requirements: a high level of parallelism, coherent memory access by threads within warps, and coherent control flow within warps.

The book begins with an overview of parallel algorithms and data structures. The first few chapters focus on accelerating database searches, leveraging the Fermi GPU architecture to further accelerate prefix operations, and implementing hash tables on the GPU. The reader is then walked systematically through the fundamental optimization steps for implementing a bandwidth-limited algorithm, GPU-based libraries of numerical algorithms and software products for numerical analysis with dedicated GPU support, and the adoption of GPU computing techniques in production engineering simulation codes.

The next chapters discuss the state of GPU computing in interactive physics and artificial intelligence, programming tools and techniques for GPU computing, and the edge and node parallelism approach for computing graph centrality metrics. The book also proposes an alternative approach that balances computation regardless of node degree variance. This book will be useful to application developers in a wide range of application areas.


Software engineers, programmers, hardware engineers, advanced students

Wen-mei Hwu

Wen-mei W. Hwu is a Professor and holds the Sanders-AMD Endowed Chair in the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. His research interests are in the areas of architecture, implementation, compilation, and algorithms for parallel computing. He is the chief scientist of the Parallel Computing Institute and director of the IMPACT research group (www.impact.crhc.illinois.edu). He is a co-founder and CTO of MulticoreWare. For his contributions to research and teaching, he received the ACM SIGARCH Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the Tau Beta Pi Daniel C. Drucker Eminent Faculty Award, the ISCA Influential Paper Award, the IEEE Computer Society B. R. Rau Award, and the Distinguished Alumni Award in Computer Science of the University of California, Berkeley. He is a fellow of IEEE and ACM. He directs the UIUC CUDA Center of Excellence and serves as one of the principal investigators of the NSF Blue Waters Petascale computer project. Dr. Hwu received his Ph.D. degree in Computer Science from the University of California, Berkeley.

Affiliations and Expertise

CTO, MulticoreWare and professor specializing in compiler design, computer architecture, microarchitecture, and parallel processing, University of Illinois at Urbana-Champaign


Editors, Reviewers, and Authors


Managing Editor


Area Editors




State of GPU Computing

Section 1: Parallel Algorithms and Data Structures


In this Section

Chapter 1. Large-Scale GPU Search

1.1 Introduction

1.2 Memory Performance

1.3 Searching Large Data Sets

1.4 Experimental Evaluation

1.5 Conclusion


Chapter 2. Edge v. Node Parallelism for Graph Centrality Metrics

2.1 Introduction

2.2 Background

2.3 Node v. Edge Parallelism

2.4 Data Structure

2.5 Implementation

2.6 Analysis

2.7 Results

2.8 Conclusions


Chapter 3. Optimizing Parallel Prefix Operations for the Fermi Architecture

3.1 Introduction to Parallel Prefix Operations

3.2 Efficient Binary Prefix Operations on Fermi

3.3 Conclusion


Chapter 4. Building an Efficient Hash Table on the GPU

4.1 Introduction

4.2 Overview

4.3 Building and Querying a Basic Hash Table

4.4 Specializing the Hash Table

4.5 Analysis

4.6 Conclusion



Chapter 5. Efficient CUDA Algorithms for the Maximum Network Flow Problem

5.1 Introduction, Problem Statement, and Context

5.2 Core Method

5.3 Algorithms, Implementations, and Evaluations

5.4 Final Evaluation

5.5 Future Directions


Chapter 6. Optimizing Memory Access Patterns for Cellular Automata on GPUs

6.1 Introduction, Problem Statement, and Context

6.2 Core Methods

6.3 Algorithms, Implementations, and Evaluations

6.4 Final Results

6.5 Future Directions


Chapter 7. Fast Minimum Spanning Tree Computation

7.1 Introduction, Problem Statement, and Context

7.2 The MST Algorithm: Overview

7.3 CUDA Implementation of MST

7.4 Evaluation

7.5 Conclusions


Chapter 8. Comparison-Based In-Place Sorting with CUDA

8.1 Introduction

8.2 Bitonic Sort

8.3 Implementation

8.4 Evaluation

8.5 Conclusion


Section 2: Numerical Algorithms


State of GPU-Based Numerical Algorithms

In this Section

Chapter 9. Interval Arithmetic in CUDA

9.1 Interval Arithmetic

9.2 Importance of Rounding Modes

9.3 Interval Operators in CUDA

9.4 Some Evaluations: Synthetic Benchmark

9.5 Application-Level Benchmark

9.6 Conclusion


Chapter 10. Approximating the erfinv Function

10.1 Introduction

10.2 New erfinv Approximations

10.3 Performance and Accuracy

10.4 Conclusions


Chapter 11. A Hybrid Method for Solving Tridiagonal Systems on the GPU

11.1 Introduction

11.3 Algorithms

11.4 Implementation

11.5 Results and Evaluation

11.6 Future Directions

Source code


Chapter 12. Accelerating CULA Linear Algebra Routines with Hybrid GPU and Multicore Computing

12.1 Introduction, Problem Statement, and Context

12.2 Core Methods

12.3 Algorithms, Implementations, and Evaluations

12.4 Final Evaluation and Validation of Results, Total Benefits, and Limitations

12.5 Future Directions


Chapter 13. GPU Accelerated Derivative-Free Mesh Optimization

13.1 Introduction, Problem Statement, and Context

13.2 Core Method

13.3 Algorithms, Implementations, and Evaluations

13.4 Final Evaluation

13.5 Future Direction


Section 3: Engineering Simulation


State of GPU Computing in Engineering Simulations

In this Section

Chapter 14. Large-Scale Gas Turbine Simulations on GPU Clusters

14.1 Introduction, Problem Statement, and Context

14.2 Core Method

14.3 Algorithms, Implementations, and Evaluations

14.4 Final Evaluation

14.5 Test Case and Parallel Performance

14.6 Future Directions


Chapter 15. GPU Acceleration of Rarefied Gas Dynamic Simulations

15.1 Introduction, Problem Statement, and Context

15.2 Core Methods

15.3 Algorithms, Implementations, and Evaluations

15.4 Final Evaluation

15.5 Future Directions


Chapter 16. Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics

16.1 Introduction, Problem Statement, and Context

16.2 Core Method

16.3 Algorithms, Implementations, and Evaluations

16.4 Evaluation and Validation of Results, Total Benefits, Limitations

16.5 Future Directions



Chapter 17. CUDA Implementation of Vertex-Centered, Finite Volume CFD Methods on Unstructured Grids with Flow Control Applications

17.1 Introduction, Problem Statement, and Context

17.2 Core (CFD and Optimization) Methods

17.3 Implementations and Evaluation

17.4 Applications to Flow Control — Optimization


Chapter 18. Solving Wave Equations on Unstructured Geometries

18.1 Introduction, Problem Statement, and Context

18.2 Core Method

18.3 Algorithms, Implementations, and Evaluations

18.4 Final Evaluation

18.5 Future Directions



Chapter 19. Fast Electromagnetic Integral Equation Solvers on Graphics Processing Units

19.1 Problem Statement and Background

19.2 Algorithms Introduction

19.3 Algorithm Description

19.4 GPU Implementations

19.5 Results

19.6 Integrating the GPU NGIM Algorithms with Iterative IE Solvers

19.7 Future Directions


Section 4: Interactive Physics and AI for Games and Engineering Simulation


State of GPU Computing in Interactive Physics and AI

In this Section

Chapter 20. Solving Large Multibody Dynamics Problems on the GPU

20.1 Introduction, Problem Statement, and Context

20.2 Core Method

20.3 The Time-Stepping Scheme

20.4 Algorithms, Implementations, and Evaluations

20.5 Final Evaluation

20.6 Future Directions



Chapter 21. Implicit FEM Solver on GPU for Interactive Deformation Simulation

21.1 Problem Statement and Context

21.2 Core Method

21.3 Algorithms and Implementations

21.4 Results and Evaluation

21.5 Future Directions



Chapter 22. Real-Time Adaptive GPU Multiagent Path Planning

22.1 Introduction

22.2 Core Method

22.3 Implementation

22.4 Results


Section 5: Computational Finance


State of GPU Computing in Computational Finance

In this Section

Chapter 23. Pricing Financial Derivatives with High Performance Finite Difference Solvers on GPUs

23.1 Introduction, Problem Statement, and Context

23.2 Core Method

23.3 Algorithms, Implementations, and Evaluations

23.4 Final Evaluation

23.5 Future Directions


Chapter 24. Large-Scale Credit Risk Loss Simulation

24.1 Introduction, Problem Statement, and Context

24.2 Core Methods

24.3 Algorithms, Implementations, Evaluations

24.4 Results and Conclusions

24.5 Future Developments



Chapter 25. Monte Carlo–Based Financial Market Value-at-Risk Estimation on GPUs

25.1 Introduction, Problem Statement, and Context

25.2 Core Methods

25.3 Algorithms, Implementations, and Evaluations

25.4 Final Results

25.5 Conclusion


Section 6: Programming Tools and Techniques


Programming Tools and Techniques for GPU Computing

In this Section

Chapter 26. Thrust: A Productivity-Oriented Library for CUDA

26.1 Motivation

26.2 Diving In

26.3 Generic Programming

26.4 Benefits of Abstraction

26.5 Best Practices


Chapter 27. GPU Scripting and Code Generation with PyCUDA

27.1 Introduction, Problem Statement, and Context

27.2 Core Method

27.3 Algorithms, Implementations, and Evaluations

27.4 Evaluation

27.5 Availability

27.6 Future Directions



Chapter 28. Jacket: GPU Powered MATLAB Acceleration

28.1 Introduction

28.2 Jacket

28.3 Benchmarking Procedures

28.4 Experimental Results

28.5 Future Directions


Chapter 29. Accelerating Development and Execution Speed with Just-in-Time GPU Code Generation

29.1 Introduction, Problem Statement, and Context

29.2 Core Methods

29.3 Algorithms, Implementations, and Evaluations

29.4 Final Evaluation

29.5 Future Directions


Chapter 30. GPU Application Development, Debugging, and Performance Tuning with GPU Ocelot

30.1 Introduction

30.2 Core Technology

30.3 Algorithm, Implementation, and Benefits

30.4 Future Directions



Chapter 31. Abstraction for AoS and SoA Layout in C++

31.1 Introduction, Problem Statement, and Context

31.2 Core Method

31.3 Implementation

31.4 ASA in Practice

31.5 Final Evaluation



Chapter 32. Processing Device Arrays with C++ Metaprogramming

32.1 Introduction, Problem Statement, and Context

32.2 Core Method

32.3 Implementation

32.4 Evaluation

32.5 Future Directions


Chapter 33. GPU Metaprogramming: A Case Study in Biologically Inspired Machine Vision

33.1 Introduction, Problem Statement, and Context

33.2 Core Method

33.3 Algorithms, Implementations, and Evaluations

33.4 Final Evaluation

33.5 Future Directions


Chapter 34. A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs

34.1 Introduction, Problem Statement, and Context

34.2 Core Method

34.3 Algorithms, Implementations, and Evaluations

34.4 Final Evaluation

34.5 Future Directions


Chapter 35. Dynamic Load Balancing Using Work-Stealing

35.1 Introduction

35.2 Core Method

35.3 Algorithms and Implementations

35.4 Case Studies and Evaluation

35.5 Future Directions



Chapter 36. Applying Software-Managed Caching and CPU/GPU Task Scheduling for Accelerating Dynamic Workloads

36.1 Introduction, Problem Statement, and Context

36.2 Core Method

36.3 Algorithms, Implementations, and Evaluations

36.4 Final Evaluation



Quotes and reviews

It wasn't until recently that parallel [GPU] computing made people realize that there are whole areas in computing science that we can tackle. … When you can do something 10 or 100 times faster, something magical happens and you can do something completely different.

—Jen-Hsun Huang, CEO, NVIDIA
