Machine learning and big data with kdb+/q / Jan Novotny, Paul A. Bilokon, Aris Galiotos, Frédéric Déléze.

By: Novotny, Jan, 1982- [author.]
Contributor(s): Bilokon, Paul A, 1982- [author.] | Galiotos, Aris, 1979- [author.] | Délèze, Frédéric, 1975- [author.]
Language: English Series: Wiley financePublisher: Chichester, West Sussex : Wiley, 2020Description: 1 online resourceContent type: text Media type: computer Carrier type: online resourceISBN: 9781119404736; 9781119404729 ; 9781119404743Subject(s): kdb+ (Computer file) | Query languages (Computer science) | Databases | Machine learning | Big dataGenre/Form: Electronic books.DDC classification: 005.74 LOC classification: QA76.7Online resources: Full text is available at Wiley Online Library Click here to view
Contents:
TABLE OF CONTENTS Preface xvii About the Authors xxiii Part One Language Fundamentals Chapter 1 Fundamentals of the q Programming Language 3 1.1 The (Not So Very) First Steps in q 3 1.2 Atoms and Lists 5 1.2.1 Casting Types 11 1.3 Basic Language Constructs 14 1.3.1 Assigning, Equality and Matching 14 1.3.2 Arithmetic Operations and Right-to-Left Evaluation: Introduction to q Philosophy 17 1.4 Basic Operators 19 1.5 Difference between Strings and Symbols 31 1.5.1 Enumeration 31 1.6 Matrices and Basic Linear Algebra in q 33 1.7 Launching the Session: Additional Options 35 1.8 Summary and How-To’s 38 Chapter 2 Dictionaries and Tables: The q Fundamentals 41 2.1 Dictionary 41 2.2 Table 44 2.3 The Truth about Tables 48 2.4 Keyed Tables are Dictionaries 50 2.5 From a Vector Language to an Algebraic Language 51 Chapter 3 Functions 57 3.1 Namespace 59 3.1.0.1 .quantQ. Namespace 60 3.2 The Six Adverbs 60 3.2.1 Each 60 3.2.1.1 Each 61 3.2.1.2 Each-left \: 61 3.2.1.3 Each-right /: 62 3.2.1.4 Cross Product /: \: 62 3.2.1.5 Each-both ' 63 3.2.2 Each-prior ': 66 3.2.3 Compose (’) 67 3.2.4 Over and Fold / 67 3.2.5 Scan 68 3.2.5.1 EMA: The Exponential Moving Average 69 3.2.6 Converge 70 3.2.6.1 Converge-repeat 70 3.2.6.2 Converge-iterate 71 3.3 Apply 72 3.3.1 @ (apply) 72 3.3.2 . (apply) 73 3.4 Protected Evaluations 75 3.5 Vector Operations 76 3.5.1 Aggregators 76 3.5.1.1 Simple Aggregators 76 3.5.1.2 Weighted Aggregators 77 3.5.2 Uniform Functions 77 3.5.2.1 Running Functions 77 3.5.2.2 Window Functions 78 3.6 Convention for User-Defined Functions 79 Chapter 4 Editors and Other Tools 81 4.1 Console 81 4.2 Jupyter Notebook 82 4.3 GUIs 84 4.3.1 qStudio 85 4.3.2 Q Insight Pad 88 4.4 IDEs: IntelliJ IDEA 90 4.5 Conclusion 92 Chapter 5 Debugging q Code 93 5.1 Introduction to Making It Wrong: Errors 93 5.1.1 Syntax Errors 94 5.1.2 Runtime Errors 94 5.1.2.1 The Type Error 95 5.1.2.2 Other Errors 98 5.2 Debugging the Code 100 5.3 Debugging Server-Side 102 Part Two Data Operations Chapter 6 Splayed and Partitioned Tables 107 6.1 Introduction 107 6.2 Saving a Table as a Single Binary File 108 6.3 Splayed Tables 110 6.4 Partitioned Tables 113 6.5 Conclusion 119 Chapter 7 Joins 121 7.1 Comma Operator 121 7.2 Join Functions 125 7.2.1 ij 125 7.2.2 ej 126 7.2.3 lj 126 7.2.4 pj 127 7.2.5 upsert 128 7.2.6 uj 129 7.2.7 aj 131 7.2.8 aj0 134 7.2.8.1 The Next Valid Join 135 7.2.9 asof 138 7.2.10 wj 140 7.3 Advanced Example: Running TWAP 144 Chapter 8 Parallelisation 151 8.1 Parallel Vector Operations 152 8.2 Parallelisation over Processes 155 8.3 Map-Reduce 155 8.4 Advanced Topic: Parallel File/Directory Access 158 Chapter 9 Data Cleaning and Filtering 161 9.1 Predicate Filtering 161 9.1.1 The Where Clause 161 9.1.2 Aggregation Filtering 163 9.2 Data Cleaning, Normalising and APIs 163 Chapter 10 Parse Trees 165 10.1 Definition 166 10.1.1 Evaluation 166 10.1.2 Parse Tree Creation 170 10.1.3 Read-Only Evaluation 170 10.2 Functional Queries 171 10.2.1 Functional Select 174 10.2.2 Functional Exec 178 10.2.3 Functional Update 179 10.2.4 Functional Delete 180 Chapter 11 A Few Use Cases 181 11.1 Rolling VWAP 181 11.1.1 N Tick VWAP 181 11.1.2 TimeWindow VWAP 182 11.2 Weighted Mid for N Levels of an Order Book 183 11.3 Consecutive Runs of a Rule 185 11.4 Real-Time Signals and Alerts 186 Part Three Data Science Chapter 12 Basic Overview of Statistics 191 12.1 Histogram 191 12.2 First Moments 196 12.3 Hypothesis Testing 198 12.3.1 Normal p-values 198 12.3.2 Correlation 201 12.3.2.1 Implementation 202 12.3.3 t-test: One Sample 202 12.3.3.1 Implementation 204 12.3.4 t-test: Two Samples 204 12.3.4.1 Implementation 205 12.3.5 Sign Test 206 12.3.5.1 Implementation of the Test 208 12.3.5.2 Median Test 211 12.3.6 Wilcoxon Signed-Rank Test 212 12.3.7 Rank Correlation and Somers’ D 214 12.3.7.1 Implementation 216 12.3.8 Multiple Hypothesis Testing 221 12.3.8.1 Bonferroni Correction 224 12.3.8.2 Šidák’s Correction 224 12.3.8.3 Holm’s Method 225 12.3.8.4 Example 226 Chapter 13 Linear Regression 229 13.1 Linear Regression 230 13.2 Ordinary Least Squares 231 13.3 The Geometric Representation of Linear Regression 233 13.3.1 Moore–Penrose Pseudoinverse 235 13.3.2 Adding Intercept 237 13.4 Implementation of the OLS 240 13.5 Significance of Parameters 243 13.6 How Good is the Fit: R2 244 13.6.1 Adjusted R-squared 247 13.7 Relationship with Maximum Likelihood Estimation and AIC with Small Sample Correction 248 13.8 Estimation Suite 252 13.9 Comparing Two Nested Models: Towards a Stopping Rule 254 13.9.1 Comparing Two General Models 256 13.10 In-/Out-of-Sample Operations 257 13.11 Cross-validation 262 13.12 Conclusion 264 Chapter 14 Time Series Econometrics 265 14.1 Autoregressive and Moving Average Processes 265 14.1.1 Introduction 265 14.1.2 AR(p) Process 266 14.1.2.1 Simulation 266 14.1.2.2 Estimation of AR(p) Parameters 268 14.1.2.3 Least Square Method 268 14.1.2.4 Example 269 14.1.2.5 Maximum Likelihood Estimator 269 14.1.2.6 Yule-Walker Technique 269 14.1.3 MA(q) Process 271 14.1.3.1 Estimation of MA(q) Parameters 272 14.1.3.2 Simulation 272 14.1.3.3 Example 273 14.1.4 ARMA(p, q) Process 273 14.1.4.1 Invertibility of the ARMA(p, q) Process 274 14.1.4.2 Hannan-Rissanen Algorithm: Two-Step Regression Estimation 274 14.1.4.3 Yule-Walker Estimation 274 14.1.4.4 Maximum Likelihood Estimation 275 14.1.4.5 Simulation 275 14.1.4.6 Forecast 276 14.1.5 ARIMA(p, d, q) Process 276 14.1.6 Code 276 14.1.6.1 Simulation 277 14.1.6.2 Estimation 278 14.1.6.3 Forecast 282 14.2 Stationarity and Granger Causality 285 14.2.1 Stationarity 285 14.2.2 Test of Stationarity – Dickey-Fuller and Augmented Dickey-Fuller Tests 286 14.2.3 Granger Causality 286 14.3 Vector Autoregression 287 14.3.1 VAR(p) Process 288 14.3.1.1 Notation 288 14.3.1.2 Estimator 288 14.3.1.3 Example 289 14.3.1.4 Code 293 14.3.2 VARX(p, q) Process 297 14.3.2.1 Estimator 297 14.3.2.2 Code 298 Chapter 15 Fourier Transform 301 15.1 Complex Numbers 301 15.1.1 Properties of Complex Numbers 302 15.2 Discrete Fourier Transform 308 15.3 Addendum: Quaternions 314 15.4 Addendum: Fractals 321 Chapter 16 Eigensystem and PCA 325 16.1 Theory 325 16.2 Algorithms 327 16.2.1 QR Decomposition 328 16.2.2 QR Algorithm for Eigenvalues 330 16.2.3 Inverse Iteration 331 16.3 Implementation of Eigensystem Calculation 332 16.3.1 QR Decomposition 333 16.3.2 Inverse Iteration 337 16.4 The Data Matrix and the Principal Component Analysis 341 16.4.1 The Data Matrix 341 16.4.2 PCA: The First Principal Component 344 16.4.3 Second Principal Component 345 16.4.4 Terminology and Explained Variance 347 16.4.5 Dimensionality Reduction 349 16.4.6 PCA Regression (PCR) 350 16.5 Implementation of PCA 351 16.6 Appendix: Determinant 354 16.6.1 Theory 354 16.6.2 Techniques to Calculate a Determinant 355 16.6.3 Implementation of the Determinant 356 Chapter 17 Outlier Detection 359 17.1 Local Outlier Factor 360 Chapter 18 Simulating Asset Prices 369 18.1 Stochastic Volatility Process with Price Jumps 369 18.2 Towards the Numerical Example 371 18.2.1 Numerical Preliminaries 371 18.2.2 Implementing Stochastic Volatility Process with Jumps 374 18.3 Conclusion 378 Part Four Machine Learning Chapter 19 Basic Principles of Machine Learning 381 19.1 Non-Numeric Features and Normalisation 381 19.1.1 Non-Numeric Features 381 19.1.1.1 Ordinal Features 382 19.1.1.2 Categorical Features 383 19.1.2 Normalisation 383 19.1.2.1 Normal Score 384 19.1.2.2 Range Scaling 385 19.2 Iteration: Constructing Machine Learning Algorithms 386 19.2.1 Iteration 386 19.2.2 Constructing Machine Learning Algorithms 389 Chapter 20 Linear Regression with Regularisation 391 20.1 Bias–Variance Trade-off 392 20.2 Regularisation 393 20.3 Ridge Regression 394 20.4 Implementation of the Ridge Regression 396 20.4.1 Optimisation of the Regularisation Parameter 401 20.5 Lasso Regression 403 20.6 Implementation of the Lasso Regression 405 Chapter 21 Nearest Neighbours 419 21.1 k-Nearest Neighbours Classifier 419 21.2 Prototype Clustering 423 21.3 Feature Selection: Local Nearest Neighbours Approach 429 21.3.1 Implementation 430 Chapter 22 Neural Networks 437 22.1 Theoretical Introduction 437 22.1.1 Calibration 440 22.1.1.1 Backpropagation 441 22.1.2 The Learning Rate Parameter 443 22.1.3 Initialisation 443 22.1.4 Overfitting 444 22.1.5 Dimension of the Hidden Layer(s) 444 22.2 Implementation of Neural Networks 445 22.2.1 Multivariate Encoder 445 22.2.2 Neurons 446 22.2.3 Training the Neural Network 448 22.3 Examples 451 22.3.1 Binary Classification 451 22.3.2 M-class Classification 454 22.3.3 Regression 457 22.4 Possible Suggestions 463 Chapter 23 AdaBoost with Stumps 465 23.1 Boosting 465 23.2 Decision Stumps 466 23.3 AdaBoost 467 23.4 Implementation of AdaBoost 468 23.5 Recommendation for Readers 474 Chapter 24 Trees 477 24.1 Introduction to Trees 477 24.2 Regression Trees 479 24.2.1 Cost-Complexity Pruning 481 24.3 Classification Tree 482 24.4 Miscellaneous 484 24.5 Implementation of Trees 485 Chapter 25 Forests 495 25.1 Bootstrap 495 25.2 Bagging 498 25.2.1 Out-of-Bag 499 25.3 Implementation 500 25.3.1 Prediction 503 25.3.2 Feature Selection 505 Chapter 26 Unsupervised Machine Learning: The Apriori Algorithm 509 26.1 Apriori Algorithm 510 26.2 Implementation of the Apriori Algorithm 511 Chapter 27 Processing Information 523 27.1 Information Retrieval 523 27.1.1 Corpus: Leonardo da Vinci 523 27.1.2 Frequency Counting 524 27.1.3 tf-idf 528 27.2 Information as Features 532 27.2.1 Sample: Simulated Proteins 533 27.2.2 Kernels and Metrics for Proteins 535 27.2.3 Implementation of Inner Products and Nearest Neighbours Principles 535 27.2.4 Further Topics 539 Chapter 28 Towards AI – Monte Carlo Tree Search 541 28.1 Multi-Armed Bandit Problem 541 28.1.1 Analytic Solutions 543 28.1.2 Greedy Algorithms 543 28.1.3 Confidence-Based Algorithms 544 28.1.4 Bayesian Algorithms 546 28.1.5 Online Gradient Descent Algorithms 547 28.1.6 Implementation of Some Learning Algorithms 547 28.2 Monte Carlo Tree Search 558 28.2.1 Selection Step 561 28.2.2 Expansion Step 562 28.2.3 Simulation Step 563 28.2.4 Back Propagation Step 563 28.2.5 Finishing the Algorithm 563 28.2.6 Remarks and Extensions 564 28.3 Monte Carlo Tree Search Implementation – Tic-tac-toe 565 28.3.1 Random Games 566 28.3.2 Towards the MCTS 570 28.3.3 Case Study 579 28.4 Monte Carlo Tree Search – Additional Comments 579 28.4.1 Policy and Value Networks 579 28.4.2 Reinforcement Learning 581 Chapter 29 Econophysics: The Agent-Based Computational Models 583 29.1 Agent-Based Modelling 584 29.1.1 Agent-Based Models in Society 584 29.1.2 Agent-Based Models in Finance 586 29.2 Ising Agent-Based Model for Financial Markets 587 29.2.1 Ising Model in Physics 587 29.2.2 Ising Model of Interacting Agents 587 29.2.3 Numerical Implementation 588 29.3 Conclusion 592 Chapter 30 Epilogue: Art 595 Bibliography 601 Index 607
Summary: "The book will start with an examination of the foundations of kdb+/q and will proceed to consider the practicalities of dealing with real high-frequency data, and then demonstrate how kdb+/q can be used to solve econometric problems of practical importance. The exploratory journey of the language follows the path the high-frequency quants undertake every time they develop a working strategy: from data description and summary statistics to basic regression methods and cointegration, from volatility estimation and modelling to optimal execution, from market impact and microstructure analyses to advanced machine learning techniques including the neural networks"-- Provided by publisher.
Tags from this library: No tags from this library for this title. Log in to add tags.
    Average rating: 0.0 (0 votes)
Item type Current location Home library Call number Status Date due Barcode Item holds
EBOOK EBOOK COLLEGE LIBRARY
COLLEGE LIBRARY
005.74 N859 2020 (Browse shelf) Available CL-52957
Total holds: 0

ABOUT THE AUTHOR
JAN NOVOTNY is an eFX quant trader at Deutsche Bank. Previously, he worked at the Centre for Econometric Analysis on high-frequency econometric models. He holds a PhD from CERGE-EI, Charles University, Prague.

PAUL A. BILOKON is CEO and founder of Thalesians Ltd and an expert in algorithmic trading. He previously worked at Nomura, Lehman Brothers, and Morgan Stanley. Paul was educated at Christ Church College, Oxford, and Imperial College.

ARIS GALIOTOS is the global technical lead for the eFX kdb+ team at HSBC, where he helps develop a big data installation processing billions of real-time records per day. Aris holds an MSc in Financial Mathematics with Distinction from the University of Edinburgh.

FRÉDÉRIC DÉLÈZE is an independent algorithm trader and consultant. He has designed automated trading strategies for hedge funds and developed quantitative risk models for investment banks. He holds a PhD in Finance from Hanken School of Economics, Helsinki.

Includes bibliographical references and index.

TABLE OF CONTENTS
Preface xvii

About the Authors xxiii

Part One Language Fundamentals

Chapter 1 Fundamentals of the q Programming Language 3

1.1 The (Not So Very) First Steps in q 3

1.2 Atoms and Lists 5

1.2.1 Casting Types 11

1.3 Basic Language Constructs 14

1.3.1 Assigning, Equality and Matching 14

1.3.2 Arithmetic Operations and Right-to-Left Evaluation: Introduction to q Philosophy 17

1.4 Basic Operators 19

1.5 Difference between Strings and Symbols 31

1.5.1 Enumeration 31

1.6 Matrices and Basic Linear Algebra in q 33

1.7 Launching the Session: Additional Options 35

1.8 Summary and How-To’s 38

Chapter 2 Dictionaries and Tables: The q Fundamentals 41

2.1 Dictionary 41

2.2 Table 44

2.3 The Truth about Tables 48

2.4 Keyed Tables are Dictionaries 50

2.5 From a Vector Language to an Algebraic Language 51

Chapter 3 Functions 57

3.1 Namespace 59

3.1.0.1 .quantQ. Namespace 60

3.2 The Six Adverbs 60

3.2.1 Each 60

3.2.1.1 Each 61

3.2.1.2 Each-left \: 61

3.2.1.3 Each-right /: 62

3.2.1.4 Cross Product /: \: 62

3.2.1.5 Each-both ' 63

3.2.2 Each-prior ': 66

3.2.3 Compose (’) 67

3.2.4 Over and Fold / 67

3.2.5 Scan 68

3.2.5.1 EMA: The Exponential Moving Average 69

3.2.6 Converge 70

3.2.6.1 Converge-repeat 70

3.2.6.2 Converge-iterate 71

3.3 Apply 72

3.3.1 @ (apply) 72

3.3.2 . (apply) 73

3.4 Protected Evaluations 75

3.5 Vector Operations 76

3.5.1 Aggregators 76

3.5.1.1 Simple Aggregators 76

3.5.1.2 Weighted Aggregators 77

3.5.2 Uniform Functions 77

3.5.2.1 Running Functions 77

3.5.2.2 Window Functions 78

3.6 Convention for User-Defined Functions 79

Chapter 4 Editors and Other Tools 81

4.1 Console 81

4.2 Jupyter Notebook 82

4.3 GUIs 84

4.3.1 qStudio 85

4.3.2 Q Insight Pad 88

4.4 IDEs: IntelliJ IDEA 90

4.5 Conclusion 92

Chapter 5 Debugging q Code 93

5.1 Introduction to Making It Wrong: Errors 93

5.1.1 Syntax Errors 94

5.1.2 Runtime Errors 94

5.1.2.1 The Type Error 95

5.1.2.2 Other Errors 98

5.2 Debugging the Code 100

5.3 Debugging Server-Side 102

Part Two Data Operations

Chapter 6 Splayed and Partitioned Tables 107

6.1 Introduction 107

6.2 Saving a Table as a Single Binary File 108

6.3 Splayed Tables 110

6.4 Partitioned Tables 113

6.5 Conclusion 119

Chapter 7 Joins 121

7.1 Comma Operator 121

7.2 Join Functions 125

7.2.1 ij 125

7.2.2 ej 126

7.2.3 lj 126

7.2.4 pj 127

7.2.5 upsert 128

7.2.6 uj 129

7.2.7 aj 131

7.2.8 aj0 134

7.2.8.1 The Next Valid Join 135

7.2.9 asof 138

7.2.10 wj 140

7.3 Advanced Example: Running TWAP 144

Chapter 8 Parallelisation 151

8.1 Parallel Vector Operations 152

8.2 Parallelisation over Processes 155

8.3 Map-Reduce 155

8.4 Advanced Topic: Parallel File/Directory Access 158

Chapter 9 Data Cleaning and Filtering 161

9.1 Predicate Filtering 161

9.1.1 The Where Clause 161

9.1.2 Aggregation Filtering 163

9.2 Data Cleaning, Normalising and APIs 163

Chapter 10 Parse Trees 165

10.1 Definition 166

10.1.1 Evaluation 166

10.1.2 Parse Tree Creation 170

10.1.3 Read-Only Evaluation 170

10.2 Functional Queries 171

10.2.1 Functional Select 174

10.2.2 Functional Exec 178

10.2.3 Functional Update 179

10.2.4 Functional Delete 180

Chapter 11 A Few Use Cases 181

11.1 Rolling VWAP 181

11.1.1 N Tick VWAP 181

11.1.2 TimeWindow VWAP 182

11.2 Weighted Mid for N Levels of an Order Book 183

11.3 Consecutive Runs of a Rule 185

11.4 Real-Time Signals and Alerts 186

Part Three Data Science

Chapter 12 Basic Overview of Statistics 191

12.1 Histogram 191

12.2 First Moments 196

12.3 Hypothesis Testing 198

12.3.1 Normal p-values 198

12.3.2 Correlation 201

12.3.2.1 Implementation 202

12.3.3 t-test: One Sample 202

12.3.3.1 Implementation 204

12.3.4 t-test: Two Samples 204

12.3.4.1 Implementation 205

12.3.5 Sign Test 206

12.3.5.1 Implementation of the Test 208

12.3.5.2 Median Test 211

12.3.6 Wilcoxon Signed-Rank Test 212

12.3.7 Rank Correlation and Somers’ D 214

12.3.7.1 Implementation 216

12.3.8 Multiple Hypothesis Testing 221

12.3.8.1 Bonferroni Correction 224

12.3.8.2 Šidák’s Correction 224

12.3.8.3 Holm’s Method 225

12.3.8.4 Example 226

Chapter 13 Linear Regression 229

13.1 Linear Regression 230

13.2 Ordinary Least Squares 231

13.3 The Geometric Representation of Linear Regression 233

13.3.1 Moore–Penrose Pseudoinverse 235

13.3.2 Adding Intercept 237

13.4 Implementation of the OLS 240

13.5 Significance of Parameters 243

13.6 How Good is the Fit: R2 244

13.6.1 Adjusted R-squared 247

13.7 Relationship with Maximum Likelihood Estimation and AIC with Small Sample Correction 248

13.8 Estimation Suite 252

13.9 Comparing Two Nested Models: Towards a Stopping Rule 254

13.9.1 Comparing Two General Models 256

13.10 In-/Out-of-Sample Operations 257

13.11 Cross-validation 262

13.12 Conclusion 264

Chapter 14 Time Series Econometrics 265

14.1 Autoregressive and Moving Average Processes 265

14.1.1 Introduction 265

14.1.2 AR(p) Process 266

14.1.2.1 Simulation 266

14.1.2.2 Estimation of AR(p) Parameters 268

14.1.2.3 Least Square Method 268

14.1.2.4 Example 269

14.1.2.5 Maximum Likelihood Estimator 269

14.1.2.6 Yule-Walker Technique 269

14.1.3 MA(q) Process 271

14.1.3.1 Estimation of MA(q) Parameters 272

14.1.3.2 Simulation 272

14.1.3.3 Example 273

14.1.4 ARMA(p, q) Process 273

14.1.4.1 Invertibility of the ARMA(p, q) Process 274

14.1.4.2 Hannan-Rissanen Algorithm: Two-Step Regression Estimation 274

14.1.4.3 Yule-Walker Estimation 274

14.1.4.4 Maximum Likelihood Estimation 275

14.1.4.5 Simulation 275

14.1.4.6 Forecast 276

14.1.5 ARIMA(p, d, q) Process 276

14.1.6 Code 276

14.1.6.1 Simulation 277

14.1.6.2 Estimation 278

14.1.6.3 Forecast 282

14.2 Stationarity and Granger Causality 285

14.2.1 Stationarity 285

14.2.2 Test of Stationarity – Dickey-Fuller and Augmented Dickey-Fuller Tests 286

14.2.3 Granger Causality 286

14.3 Vector Autoregression 287

14.3.1 VAR(p) Process 288

14.3.1.1 Notation 288

14.3.1.2 Estimator 288

14.3.1.3 Example 289

14.3.1.4 Code 293

14.3.2 VARX(p, q) Process 297

14.3.2.1 Estimator 297

14.3.2.2 Code 298

Chapter 15 Fourier Transform 301

15.1 Complex Numbers 301

15.1.1 Properties of Complex Numbers 302

15.2 Discrete Fourier Transform 308

15.3 Addendum: Quaternions 314

15.4 Addendum: Fractals 321

Chapter 16 Eigensystem and PCA 325

16.1 Theory 325

16.2 Algorithms 327

16.2.1 QR Decomposition 328

16.2.2 QR Algorithm for Eigenvalues 330

16.2.3 Inverse Iteration 331

16.3 Implementation of Eigensystem Calculation 332

16.3.1 QR Decomposition 333

16.3.2 Inverse Iteration 337

16.4 The Data Matrix and the Principal Component Analysis 341

16.4.1 The Data Matrix 341

16.4.2 PCA: The First Principal Component 344

16.4.3 Second Principal Component 345

16.4.4 Terminology and Explained Variance 347

16.4.5 Dimensionality Reduction 349

16.4.6 PCA Regression (PCR) 350

16.5 Implementation of PCA 351

16.6 Appendix: Determinant 354

16.6.1 Theory 354

16.6.2 Techniques to Calculate a Determinant 355

16.6.3 Implementation of the Determinant 356

Chapter 17 Outlier Detection 359

17.1 Local Outlier Factor 360

Chapter 18 Simulating Asset Prices 369

18.1 Stochastic Volatility Process with Price Jumps 369

18.2 Towards the Numerical Example 371

18.2.1 Numerical Preliminaries 371

18.2.2 Implementing Stochastic Volatility Process with Jumps 374

18.3 Conclusion 378

Part Four Machine Learning

Chapter 19 Basic Principles of Machine Learning 381

19.1 Non-Numeric Features and Normalisation 381

19.1.1 Non-Numeric Features 381

19.1.1.1 Ordinal Features 382

19.1.1.2 Categorical Features 383

19.1.2 Normalisation 383

19.1.2.1 Normal Score 384

19.1.2.2 Range Scaling 385

19.2 Iteration: Constructing Machine Learning Algorithms 386

19.2.1 Iteration 386

19.2.2 Constructing Machine Learning Algorithms 389

Chapter 20 Linear Regression with Regularisation 391

20.1 Bias–Variance Trade-off 392

20.2 Regularisation 393

20.3 Ridge Regression 394

20.4 Implementation of the Ridge Regression 396

20.4.1 Optimisation of the Regularisation Parameter 401

20.5 Lasso Regression 403

20.6 Implementation of the Lasso Regression 405

Chapter 21 Nearest Neighbours 419

21.1 k-Nearest Neighbours Classifier 419

21.2 Prototype Clustering 423

21.3 Feature Selection: Local Nearest Neighbours Approach 429

21.3.1 Implementation 430

Chapter 22 Neural Networks 437

22.1 Theoretical Introduction 437

22.1.1 Calibration 440

22.1.1.1 Backpropagation 441

22.1.2 The Learning Rate Parameter 443

22.1.3 Initialisation 443

22.1.4 Overfitting 444

22.1.5 Dimension of the Hidden Layer(s) 444

22.2 Implementation of Neural Networks 445

22.2.1 Multivariate Encoder 445

22.2.2 Neurons 446

22.2.3 Training the Neural Network 448

22.3 Examples 451

22.3.1 Binary Classification 451

22.3.2 M-class Classification 454

22.3.3 Regression 457

22.4 Possible Suggestions 463

Chapter 23 AdaBoost with Stumps 465

23.1 Boosting 465

23.2 Decision Stumps 466

23.3 AdaBoost 467

23.4 Implementation of AdaBoost 468

23.5 Recommendation for Readers 474

Chapter 24 Trees 477

24.1 Introduction to Trees 477

24.2 Regression Trees 479

24.2.1 Cost-Complexity Pruning 481

24.3 Classification Tree 482

24.4 Miscellaneous 484

24.5 Implementation of Trees 485

Chapter 25 Forests 495

25.1 Bootstrap 495

25.2 Bagging 498

25.2.1 Out-of-Bag 499

25.3 Implementation 500

25.3.1 Prediction 503

25.3.2 Feature Selection 505

Chapter 26 Unsupervised Machine Learning: The Apriori Algorithm 509

26.1 Apriori Algorithm 510

26.2 Implementation of the Apriori Algorithm 511

Chapter 27 Processing Information 523

27.1 Information Retrieval 523

27.1.1 Corpus: Leonardo da Vinci 523

27.1.2 Frequency Counting 524

27.1.3 tf-idf 528

27.2 Information as Features 532

27.2.1 Sample: Simulated Proteins 533

27.2.2 Kernels and Metrics for Proteins 535

27.2.3 Implementation of Inner Products and Nearest Neighbours Principles 535

27.2.4 Further Topics 539

Chapter 28 Towards AI – Monte Carlo Tree Search 541

28.1 Multi-Armed Bandit Problem 541

28.1.1 Analytic Solutions 543

28.1.2 Greedy Algorithms 543

28.1.3 Confidence-Based Algorithms 544

28.1.4 Bayesian Algorithms 546

28.1.5 Online Gradient Descent Algorithms 547

28.1.6 Implementation of Some Learning Algorithms 547

28.2 Monte Carlo Tree Search 558

28.2.1 Selection Step 561

28.2.2 Expansion Step 562

28.2.3 Simulation Step 563

28.2.4 Back Propagation Step 563

28.2.5 Finishing the Algorithm 563

28.2.6 Remarks and Extensions 564

28.3 Monte Carlo Tree Search Implementation – Tic-tac-toe 565

28.3.1 Random Games 566

28.3.2 Towards the MCTS 570

28.3.3 Case Study 579

28.4 Monte Carlo Tree Search – Additional Comments 579

28.4.1 Policy and Value Networks 579

28.4.2 Reinforcement Learning 581

Chapter 29 Econophysics: The Agent-Based Computational Models 583

29.1 Agent-Based Modelling 584

29.1.1 Agent-Based Models in Society 584

29.1.2 Agent-Based Models in Finance 586

29.2 Ising Agent-Based Model for Financial Markets 587

29.2.1 Ising Model in Physics 587

29.2.2 Ising Model of Interacting Agents 587

29.2.3 Numerical Implementation 588

29.3 Conclusion 592

Chapter 30 Epilogue: Art 595

Bibliography 601

Index 607

"The book will start with an examination of the foundations of kdb+/q and will proceed to consider the practicalities of dealing with real high-frequency data, and then demonstrate how kdb+/q can be used to solve econometric problems of practical importance. The exploratory journey of the language follows the path the high-frequency quants undertake every time they develop a working strategy: from data description and summary statistics to basic regression methods and cointegration, from volatility estimation and modelling to optimal execution, from market impact and microstructure analyses to advanced machine learning techniques including the neural networks"-- Provided by publisher.

Description based on print version record and CIP data provided by publisher.

There are no comments for this item.

to post a comment.