Bader, Martin (Professor),
R-ticulate : a beginner's guide to data analysis for natural scientists / Beginner's guide to data analysis for natural scientists. Martin Bader, Sebastian Leuzinger. - 1 online resource (xiii, 205 pages) : illustrations (some color) -
Includes bibliographical references and index.
Table of Contents
Foreword ix
Preface xi
About the Companion Website xiii
1 Hypotheses, Variables, Data 1
1.1 Occam’s Razor 2
1.2 Scientific Hypotheses 2
1.3 The Choice of a Software 3
1.3.1 First Steps in R 3
1.4 Variables 5
1.4.1 Variable Names and Values 5
1.4.2 Types of Variables 10
1.4.3 Predictor and Response Variables 11
1.5 Data Processing and Data Formats 12
1.5.1 The Long vs. the Wide Format 12
1.5.2 Choice of Variable, Dataset, and File Names 12
1.5.3 Adding, Removing, and Subsetting Variables and Data Frames 14
1.5.4 Aggregating Data 17
1.5.5 Working with Time and Strings 19
2 Measuring Variation 23
2.1 What Is Variation? 23
2.2 Treatment vs. Control 23
2.3 Systematic and Unsystematic Variation 24
2.4 The Signal-to-Noise Ratio 25
2.5 Measuring Variation Graphically 26
2.6 Measuring Variation Using Metrics 27
2.7 The Standard Error 29
2.8 Population vs. Sample 31
3 Distributions and Probabilities 35
3.1 Probability Distributions 35
3.2 Finding the Best Fitting Distribution for Sample Data 37
3.2.1 Graphical Tools 37
3.2.2 Goodness-of-Fit Tests 39
3.3 Quantiles 42
3.4 Probabilities 44
3.4.1 Density Functions (dnorm, dbinom, …) 44
3.4.2 Probability Distribution Functions (pnorm, pbinom, …) 46
3.4.3 Quantile Functions (qnorm, qbinom, …) 48
3.4.4 Random Sampling Functions (rnorm, rbinom, …) 49
3.5 The Normal Distribution 50
3.6 Central Limit Theorem 50
3.7 Test Statistics 52
3.7.1 Null and Alternative Hypotheses 53
3.7.2 The Alpha Threshold and Significance Levels 54
3.7.3 Type I and Type II Errors 54
References 56
4 Replication and Randomisation 57
4.1 Replication 57
4.2 Statistical Independence 60
4.3 Randomisation 61
4.4 Randomisation in R 64
4.5 Spatial Replication and Randomisation in Observational Studies 65
5 Two-Sample and One-Sample Tests 67
5.1 The t-Statistic 67
5.2 Two Sample Tests: Comparing Two Groups 67
5.2.1 Student’s t-Test 67
5.2.1.1 Testing for Normality 68
5.2.1.2 What to Write in a Report or Paper and How to Visualise the Results of a t-Test 74
5.2.1.3 Two-Tailed vs. One-Tailed t-Tests 75
5.2.2 Rank-Based Two-Sample Tests 77
5.3 One-Sample Tests 78
5.4 Power Analyses and Sample Size Determination 79
6 Communicating Quantitative Information Using Visuals 83
6.1 The Fundamentals of Scientific Plotting 84
6.2 Scatter Plots 85
6.3 Line Plots 87
6.4 Box Plots and Bar Plots 89
6.5 Multipanel Plots and Plotting Regions 91
6.6 Adding Text, Formulae, and Colour 92
6.7 Interaction Plots 94
6.8 Images, Colour Contour Plots, and 3D Plots 94
6.8.1 Adding Images to Plots 94
6.8.2 Colour Contour Plots 96
References 101
7 Working with Categorical Data 103
7.1 Tabling and Visualising Categorical Data 103
7.2 Contingency Tables 105
7.3 The Chi-squared Test 106
7.4 Decision Trees 108
7.5 Optimising Decision Trees 111
References 113
8 Working with Continuous Data 115
8.1 Covariance 115
8.2 Correlation Coefficient 116
8.3 Transformations 118
8.4 Plotting Correlations 120
8.5 Correlation Tests 122
References 124
9 Linear Regression 125
9.1 Basics and Simple Linear Regression 125
9.1.1 Making Sense of the summary Output for Regression Models Fitted with lm 128
9.1.2 Model Diagnostics 131
9.1.3 Model Predictions and Visualisation 135
9.1.4 What to Write in a Report or Paper? 137
9.1.4.1 Material and Methods 137
9.1.4.2 Results 137
9.1.5 Dealing with Variance Heterogeneity 137
9.2 Multiple Linear Regression 140
9.2.1 Multicollinearity in Multiple Regression Models 143
9.2.2 Testing Interactions Among Predictors 147
9.2.3 Model Selection and Comparison 148
9.2.4 Variable Importance 151
9.2.5 Visualising Multiple Linear Regression Results 151
References 154
10 One or More Categorical Predictors – Analysis of Variance 155
10.1 Comparing Groups 155
10.2 Comparing Groups Numerically 155
10.3 One-way ANOVA Using R 161
10.4 Checking for the Model Assumptions 162
10.5 Post Hoc Comparisons 162
10.6 Two-way ANOVA and Interactions 165
10.7 What If the Model Assumptions Are Violated? 166
Reference 168
11 Analysis of Covariance (ANCOVA) 169
11.1 Interpreting ANCOVA Results 171
11.2 Post Hoc Test for ANCOVA 176
References 177
12 Some of What Lies Ahead 179
12.1 Generalised Linear Models 179
12.2 Nonlinear Regression 185
12.2.1 Initial Parameter Estimates (Starting Values) 187
12.2.2 Nonlinear Model Fitting and Visualisation 187
12.3 Generalised Additive Models 189
12.4 Modern Approaches to Dealing with Heteroscedasticity 191
12.4.1 Variance Modelling Using Generalised Least-squares Estimation 193
12.4.2 Robust, Heteroscedasticity-Consistent Covariance Matrix Estimation 195
References 198
Index 201
"This book is a compact, example-based statistics textbook that closely follows contemporary curricula taught in large parts of the world. It is a user-friendly textbook without unnecessary frills, filled instead with real-world examples, practical tips, online exercises, resources, and references to extensions, all on a level that is commonly taught at introductory postgraduate and levels. Several features clearly distinguish this book from what is currently available on the market. On the one hand, many of the easier textbooks available are lengthy, covering a wide range of topics that are not necessarily taught at university, some including methods that are now rarely used, particularly in the private sector. Further, most texts assume prior familiarity with statistical software and lack a gentle introduction to the specific software they use. On the other hand, the more specialized textbooks are well beyond the reach of most of today's students, even at postgraduate level, again often assuming a high level of statistical programming skill. This book aims to fill that gap. While at its core a traditional printed book, it comes with a wealth of online teaching material for lecturers and students. The authors use R, quite simply the most widely used statistical software in science. The content structure is unusual in that statistical skills are introduced at the same time as software (programming) skills in R. This poses a challenge for students and their lecturers but, in the authors' experience, seems by far the best way of teaching. Through a careful, concurrent, step-by-step introduction to both statistical principles and software skills, this text guides the student in an unprecedented way. A color coding system is used to keep the two strands of content apart."--
About the Authors
Martin Bader gained an MSc in geography at Saarland University in Germany and an MSc in biology at Waikato University, New Zealand. He earned a PhD in plant ecology at the University of Basel, Switzerland. After post-doctoral stints in Switzerland and Australia, he joined the New Zealand Forest Research Institute as a forest ecologist and biostatistician. Following a senior lecturer appointment at Auckland University of Technology, New Zealand, he moved to Linnaeus University, Sweden, where he is now a professor of forest ecology. He has taught undergraduate and postgraduate courses in statistics at universities and research institutes in various parts of the world. His research focuses on the physiological responses of plants to climate change and their biotic interactions.
Sebastian Leuzinger completed his first degree in marine biology at James Cook University, Australia, followed by a postgraduate degree in statistics (University of Neuchatel, Switzerland) and a PhD in plant ecology (University of Basel, Switzerland). He did post-doctoral research in forest ecology and modelling at ETH Zurich, Switzerland, before joining Auckland University of Technology, where he is a full professor in ecology. He has taught undergraduate and postgraduate statistics for natural scientists for over a decade. His research focuses on global change impacts on plants, with a special interest in meta-analysis of global change experiments.
9781119717997 1119717981 1119718007 1119718023 9781119717980 9781119718000 9781119718024
9781119717997 O'Reilly Media
R (Computer program language)
Science--Data processing.
Science--Statistical methods.
Electronic books.
Q183.9 / .B33 2024
502.85/5133