Uña-Álvarez, Jacobo de, 1972-
The statistical analysis of doubly truncated data : with applications in R / Jacobo de Uña-Álvarez, Carla Moreira, Rosa M. Crujeiras. - 1 online resource. - Wiley series in probability and statistics.
Includes bibliographical references and index.
Table of Contents
Preface
List of Abbreviations
Notation
1 Introduction
1.1 Random Truncation
1.2 One-sided Truncation
1.2.1 Left-truncation
1.2.2 Right-truncation
1.2.3 Truncation vs. Censoring
1.3 Double Truncation
1.4 Real Data Examples
1.4.1 Childhood Cancer Data
1.4.2 AIDS Blood Transfusion Data
1.4.3 Equipment-S Rounded Failure Time Data
1.4.4 Quasar Data
1.4.5 Parkinson’s Disease Data
1.4.6 Acute Coronary Syndrome Data
References
2 One-Sample Problems
2.1 Nonparametric Estimation of a Distribution Function
2.1.1 The NPMLE
2.1.2 Numerical Algorithms for Computing the NPMLE
2.1.3 Theoretical Properties of the NPMLE
2.1.4 Standard Errors and Confidence Limits
2.2 Semiparametric and Parametric Approaches
2.2.1 Semiparametric Approach
2.2.2 Parametric Approach
2.3 R Code for the Examples
2.3.1 Code for Example 2.1.8
2.3.2 Code for Examples 2.1.11 and 2.1.13
2.3.3 Code for Example 2.1.14
2.3.4 Code for Example 2.1.15
2.3.5 Code for Example 2.1.22
2.3.6 Code for Example 2.2.6
2.3.7 Code for Example 2.2.8
References
3 Smoothing Methods
3.1 Some Background in Kernel Estimation
3.2 Estimating the Density Function
3.3 Asymptotic Properties
3.4 Data-driven Bandwidth Selection
3.4.1 Normal Reference Bandwidth Selection
3.4.2 Plug-in Bandwidth Selection
3.4.3 Least-squares Cross-validation Bandwidth Selection
3.4.4 Smoothed Bootstrap Bandwidth Selection
3.4.5 Bandwidth Selectors in Practice
3.5 Further Issues in Kernel Density Estimation
3.6 Estimating the Hazard Function
3.7 R Code for the Examples
3.7.1 Code for Example 3.2.1
3.7.2 Code for Examples 3.3.4 and 3.3.5
3.7.3 Code for Examples 3.4.2 and 3.4.3
3.7.4 Code for Example 3.5.1
3.7.5 Code for Example 3.6.4
3.7.6 Code for Example 3.6.5
References
4 Regression Analysis
4.1 Observational Bias in Regression
4.2 Proportional Hazards Regression
4.3 Accelerated Failure Time Regression
4.4 Nonparametric Regression
4.5 R Code for the Examples
4.5.1 Code for Example 4.1.1
4.5.2 Code for Example 4.1.4
4.5.3 Code for Example 4.2.4
4.5.4 Code for Example 4.3.2
4.5.5 Code for Example 4.4.2
References
5 Further Topics
5.1 Two-Sample Problems
5.2 Competing Risks
5.2.1 Cumulative Incidences
5.2.2 Regression Models for Competing Risks
5.3 Testing for Quasi-independence
5.4 Dependent Truncation
5.5 R Code for the Examples
5.5.1 Code for Example 5.1.3
5.5.2 Code for Example 5.2.4
5.5.3 Code for Example 5.2.6
5.5.4 Code for Example 5.3.1
5.5.5 Code for Example 5.4.3
References
A Packages and Functions in R
A.1 Computing the NPMLE and Standard Errors
A.2 Assessing the Existence and Uniqueness of the NPMLE
A.3 Semiparametric and Parametric Estimation
A.4 Kernel Estimation
A.5 Regression Analysis
A.6 Competing Risks
A.7 Simulating Data
A.8 Testing Quasi-independence
A.9 Dependent Truncation
References
Index
"This book is the result of a long-standing collaboration among the three authors, which began when Carla Moreira was a PhD student under the supervision of Jacobo de Uña-Álvarez. Carla successfully defended her thesis, entitled 'The Statistical Analysis of Doubly Truncated Data: New Methods, Software Development, and Biomedical Applications', at the Universidade de Vigo in July 2010. At that time, just a reduced group of people seemed to be aware of the importance of random double truncation. Research papers on this topic were scarce before 2010, with the contribution by Bradley Efron and Vahe Petrosian in 1999 as the most relevant one. And, of course, no software was available. So, for us, it was a risky and exciting research exercise to embrace such an initiative. This book aims to serve as a companion for those ones interested in learning about doubly truncated data analysis and inference, presenting a wide range of tools for estimating distribution and regression models. All the methods presented in this book are accompanied by real data and simulated examples and, at the end of each chapter, the reader will find the do-it-yourself code, mostly based on DTDA package. This book is not written with the aim of being just read: its main purpose is to invite the reader to think, explore and experience"--
About the Authors
Jacobo de Uña-Álvarez is Professor at the Department of Statistics and Operations Research, University of Vigo in Spain.
Carla Moreira is Associate Researcher at the Centre of Mathematics, School of Sciences, University of Minho in Portugal. She is also affiliated with the Statistical Inference, Decision and Operations Research group, University of Vigo, Spain, and with the Epidemiology Research Unit, Institute of Public Health, University of Porto, Portugal.
Rosa M. Crujeiras is Associate Professor at the Department of Statistics, Mathematical Analysis and Optimization, University of Santiago de Compostela, Spain.
1119951372 9781119500476 1119500478 9781119500483 1119500486 9781119500469 111950046X
Biometry--methods.
Statistics as Topic.
Data Interpretation, Statistical.
Programming Languages.
Models, Statistical.
Biometry--Methods.
Statistics--Computer programs.
R (Computer program language)
Electronic books.
QH323.5
570.1/5195
QH 323.5