Biostatistics, 2nd Edition

1. INTRODUCTION

1.1 What is Biostatistics?

1.2 Data – The Key Component of a Study

1.3 Design – The Road to Relevant Data

1.4 Replication – Part of the Scientific Method

1.5 Applying Statistical Methods

Concluding Remarks

Exercises

References

2. DATA AND NUMBERS

2.1 Data: Numerical Representation

2.2 Observations and Variables

2.3 Scales Used with Variables

2.4 Reliability and Validity

2.5 Randomized Response Technique

2.6 Common Data Problems

Concluding Remarks

Exercises

References

3. DESCRIPTIVE METHODS

3.1 Introduction to Descriptive Methods

3.2 Tabular and Graphic Presentation of Data

3.2.1 Frequency Tables

3.2.2 Line Graphs

3.2.3 Bar Charts

3.2.4 Histograms

3.2.5 Stem-and-Leaf Plots

3.2.6 Dot Plots

3.2.7 Scatter Plots

3.3 Measures of Central Tendency

3.3.1 Mean, Median, and Mode

3.3.2 Use of the Measures of Central Tendency

3.3.3 The Geometric Mean

3.4 Measures of Variability

3.4.1 Ranges and Percentiles

3.4.2 Box Plots

3.4.3 Variance and Standard Deviation

3.5 Rates and Ratios

3.5.1 Crude and Specific Rates

3.5.2 Adjusted Rates

3.6 Measures of Change Over Time

3.6.1 Linear Growth

3.6.2 Geometric Growth

3.6.3 Exponential Growth

3.7 Correlation Coefficients

3.7.1 Pearson Correlation Coefficient

3.7.2 Spearman Rank Correlation Coefficient

Concluding Remarks

Exercises

References

4. PROBABILITY AND LIFE TABLES

4.1 A Definition of Probability

4.2 Rules for Calculating Probabilities

4.2.1 Addition Rule for Probabilities

4.2.2 Conditional Probabilities

4.2.3 Independent Events

4.3 Definitions from Epidemiology

4.3.1 Rates and Probabilities

4.3.2 Sensitivity, Specificity, and Predicted Value Positive and Negative

4.3.3 Receiver Operating Characteristic Plot

4.4 Bayes’ Theorem

4.5 Probability in Sampling

4.5.1 Sampling With Replacement

4.5.2 Sampling Without Replacement

4.6 Estimating Probabilities by Simulation

4.7 Probability and the Life Table

4.7.1 The First Four Columns of the Life Table

4.7.2 Some Uses of the Life Table

4.7.3 Expected Values in the Life Table

4.7.4 Other Expected Values in the Life Table

Concluding Remarks

Exercises

References

5. PROBABILITY DISTRIBUTIONS

5.1 The Binomial Distribution

5.1.1 Binomial Probabilities

5.1.2 Mean and Variance of the Binomial Distribution

5.1.3 Shapes of the Binomial Distribution

5.2 The Poisson Distribution

5.2.1 Poisson Probabilities

5.2.2 Mean and Variance of the Poisson Distribution

5.2.3 Finding Poisson Probabilities

5.3 The Normal Distribution

5.3.1 Normal Probabilities

5.3.2 Transforming to the Standard Normal Distribution

5.3.3 Calculation of Normal Probabilities

5.3.4 Normal Probability Plot

5.4 The Central Limit Theorem

5.5 Approximations to the Binomial and Poisson Distributions

5.5.1 Normal Approximation to the Binomial Distribution

5.5.2 Normal Approximation to the Poisson Distribution

Concluding Remarks

Exercises

References

6. STUDY DESIGNS

6.1 Design: Putting Chance to Work

6.2 Sample Surveys and Experiments

6.3 Sampling and Sample Designs

6.3.1 Sampling Frame

6.3.2 Importance of Probability Sampling

6.3.3 Simple Random Sampling

6.3.4 Systematic Sampling

6.3.5 Stratified Random Sampling

6.3.6 Cluster Sampling

6.3.7 Problems Due to Unintended Sampling

6.4 Designed Experiments

6.4.1 Comparison Groups and Randomization

6.4.2 Random Assignment

6.4.3 Sample Size

6.4.4 Single and Double Blind Experiments

6.4.5 Blocking and Extraneous Variables

6.4.6 Limitations of Experiments

6.5 Variations in Study Designs

6.5.1 The Cross-Over Design

6.5.2 The Case Control Design

6.5.3 The Cohort Study Design

Concluding Remarks

Exercises

References

7. INTERVAL ESTIMATION

7.1 Prediction, Confidence, and Tolerance Intervals

7.2 Distribution-Free Intervals

7.2.1 Prediction Interval

7.2.2 Confidence Interval

7.2.3 Tolerance Interval

7.3 Confidence Intervals Based on the Normal Distribution

7.3.1 Confidence Interval for the Mean

7.3.2 Confidence Interval for a Proportion

7.3.3 Confidence Interval for Crude and Adjusted Rates

7.4 Confidence Interval for the Difference of Two Means and Proportions

7.4.1 Difference of Two Independent Means

7.4.2 Difference of Two Dependent Means

7.4.3 Difference of Two Independent Proportions

7.4.4 Difference of Two Dependent Proportions

7.5 Confidence Interval and Sample Size

7.6 Confidence Interval for Other Measures

7.6.1 Confidence Interval for the Variance

7.6.2 Confidence Interval for Pearson Correlation Coefficient

7.7 Prediction and Tolerance Intervals Based on the Normal Distribution

7.7.1 Prediction Interval

7.7.2 Tolerance Interval

Concluding Remarks

Exercises

References

8. TESTS OF HYPOTHESES

8.1 Preliminaries in Tests of Hypotheses

8.1.1 Definitions of Terms Used in Hypothesis Testing

8.1.2 Determination of Decision Rule

8.1.3 Relationship of the Decision Rule, α and β

8.1.4 Conducting the Test

8.2 Testing Hypotheses about the Mean

8.2.1 Known Variance

8.2.2 Unknown Variance

8.3 Testing Hypotheses about the Proportion and Rates

8.4 Testing Hypotheses about the Variance

8.5 Testing Hypotheses about the Pearson Correlation Coefficient

8.6 Testing Hypotheses about the Difference of Two Means

8.6.1 Difference of Two Independent Means

8.6.2 Difference of Two Dependent Means

8.7 Testing Hypotheses about the Difference of Two Proportions

8.7.1 Difference of Two Independent Proportions

8.7.2 Difference of Two Dependent Proportions

8.8 Tests of Hypotheses and Sample Size

8.9 Statistical and Practical Significance

Concluding Remarks

Exercises

References

9. NONPARAMETRIC TESTS

9.1 Why Nonparametric Tests?

9.2 The Sign Test

9.3 The Wilcoxon Signed Rank Test

9.4 The Wilcoxon Rank Sum Test

9.5 The Kruskal-Wallis Test

9.6 The Friedman Test

Concluding Remarks

Exercises

References

10. ANALYSIS OF CATEGORICAL DATA

10.1 Goodness-of-Fit Test

10.2 The 2 by 2 Contingency Table

10.2.1 Comparing Two Independent Binomial Proportions

10.2.2 Expected Cell Counts Assuming No Association

10.2.3 The Odds Ratio – a Measure of Association

10.2.4 Fisher’s Exact Test

10.2.5 Analysis of Paired Data: The McNemar Test

10.3 The r by c Contingency Table

10.3.1 Testing Hypothesis of No Association: The Chi-Square Test

10.3.2 Testing Hypothesis of No Trend

10.4 Multiple 2 by 2 Tables

10.4.1 Analyzing the Tables Separately

10.4.2 The Cochran-Mantel-Haenszel Test

10.4.3 The Mantel-Haenszel Common Odds Ratio

Concluding Remarks

Exercises

References

11. ANALYSIS OF SURVIVAL DATA

11.1 Data Collection in Follow-Up Studies

11.2 The Life Table Method

11.3 The Product-Limit Method

11.4 Comparison of Two Survival Distributions

11.4.1 The Cochran-Mantel-Haenszel Test

11.4.2 The Log-Rank Test

Concluding Remarks

Exercises

References

12. ANALYSIS OF VARIANCE

12.1 Assumptions for the Use of the ANOVA

12.2 One-Way ANOVA

12.2.1 Sums of Squares and Mean Squares

12.2.2 The F Statistic

12.2.3 The ANOVA Table

12.3 Multiple Comparisons

12.3.1 Error Rates: Individual and Family

12.3.2 Tukey-Kramer Method

12.3.3 Fisher’s Least Significant Difference Method

12.3.4 Dunnett’s Method

12.4 Two-Way ANOVA for the Randomized Block Design with m Replicates

12.5 Two-Way ANOVA with Interaction

12.6 Linear Model Representation of the ANOVA

12.6.1 The Completely Randomized Design

12.6.2 The Randomized Block Design with m Replicates

12.6.3 Two-Way ANOVA with Interaction

12.7 ANOVA with Unequal Numbers of Observations in Subgroups

Concluding Remarks

Exercises

References

13. LINEAR REGRESSION

13.1 Simple Linear Regression

13.1.1 Estimation of Coefficients

13.1.2 The Variance of Y|X

13.1.3 The Coefficient of Determination (R²)

13.2 Inference about the Coefficients

13.2.1 Assumptions for Inference in Linear Regression

13.2.2 Regression Diagnostics

13.2.3 The Slope Coefficient

13.2.4 The Y-Intercept Coefficient

13.2.5 The ANOVA Summary Table

13.3 Interval Estimation for μ_{Y|X} and Y|X

13.3.1 Confidence Interval for μ_{Y|X}

13.3.2 Prediction Interval for Y|X

13.4 Multiple Linear Regression

13.4.1 The Multiple Linear Regression Model

13.4.2 Specification of a Multiple Linear Regression Model

13.4.3 The Parameter Estimates, ANOVA, and Diagnostics

13.4.4 Multicollinearity Problems

13.4.5 Extending the Regression Model: Dummy Variables

Concluding Remarks

Exercises

References

14. LOGISTIC AND PROPORTIONAL HAZARD REGRESSION

14.1 Introduction to Logistic Regression

14.2 Simple Logistic Regression

14.3 Multiple Logistic Regression

14.4 Ordered Logistic Regression

14.5 Introduction to Proportional Hazard Regression

Concluding Remarks

Exercises

References

15. ANALYSIS OF SURVEY DATA

15.1 Introduction to Design-Based Inference

15.2 Complex Design and Unequal Selection Probability

15.2.1 Sample Weight

15.2.2 Poststratification

15.2.3 The Design Effect

15.3 Strategies for Variance Estimation

15.3.1 Replicated Sampling: A General Approach

15.3.2 Balanced Repeated Replication

15.3.3 Jackknife Repeated Replication

15.3.4 Linearization Method

15.4 Strategies for Analysis

15.4.1 Preliminary Analysis

15.4.2 Subpopulation Analysis

15.4.3 Descriptive Analysis

15.4.4 Contingency Table Analysis

15.4.5 Linear and Logistic Regression Analysis

Concluding Remarks

Exercises

References

Appendices

A. BASIC MATHEMATICAL CONCEPTS

B. STATISTICAL TABLES

B1. Random Digits

B2. Binomial Probabilities

B3. Poisson Probabilities

B4. Critical Values for the t Distribution

B6. Charts for Confidence Intervals for the Proportion

B7. Critical Values for the Chi-Square Distribution

B8. Factors, k, for Two-Sided Tolerance Limits for Normal Distribution

B9. Critical Values for the Wilcoxon Signed Rank Test

B10. Critical Values for the Wilcoxon Rank Sum Test

B11. Critical Values for the F Distribution

B12. The Studentized Range for the Kramer-Tukey Procedure

B13. The Studentized Range for the Dunnett Procedure

C. SELECTED GOVERNMENTAL BIOSTATISTICAL DATA

C1. Population Census Data

C2. Vital Statistics

C3. Sample Surveys

C4. Life Tables

D. SOLUTIONS TO SELECTED EXERCISES