A general introduction to data analytics / (Record no. 59879)

000 -LEADER
fixed length control field 12268cam a2200505 i 4500
001 - CONTROL NUMBER
control field 20348447
003 - CONTROL NUMBER IDENTIFIER
control field CITU
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20230216165728.0
006 - FIXED-LENGTH DATA ELEMENTS--ADDITIONAL MATERIAL CHARACTERISTICS--GENERAL INFORMATION
fixed length control field m |o d |
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION
fixed length control field cr |n|||||||||
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 180207s2018 nju ob 001 0 eng
010 ## - LIBRARY OF CONGRESS CONTROL NUMBER
LC control number 2018005929
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9781119296256 (pdf)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9781119296263 (epub)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
Cancelled/invalid ISBN 9781119296249 (cloth)
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 9781119296294
040 ## - CATALOGING SOURCE
Original cataloging agency DLC
Language of cataloging eng
Description conventions rda
Transcribing agency DLC
Modifying agency DLC
041 ## - LANGUAGE CODE
Language code of text/sound track or separate title eng.
042 ## - AUTHENTICATION CODE
Authentication code pcc
050 00 - LIBRARY OF CONGRESS CALL NUMBER
Classification number QA276.4
082 00 - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 519.50285
Edition number 23
100 1# - MAIN ENTRY--PERSONAL NAME
Preferred name for the person Moreira, João,
Dates associated with a name 1969-
Relator term author.
245 12 - TITLE STATEMENT
Title A general introduction to data analytics /
Statement of responsibility, etc by João Mendes Moreira, André C. P. L. F. de Carvalho, Tomáš Horváth.
264 #1 - PUBLICATION, DISTRIBUTION, ETC. (IMPRINT)
Place of publication, distribution, etc Hoboken, NJ :
Name of publisher, distributor, etc John Wiley & Sons,
Date of publication, distribution, etc 2018.
300 ## - PHYSICAL DESCRIPTION
Extent 1 online resource.
336 ## - CONTENT TYPE
Source rdacontent
Content type term text
Content type code txt
337 ## - MEDIA TYPE
Source rdamedia
Media type term computer
Media type code c
338 ## - CARRIER TYPE
Source rdacarrier
Carrier type term online resource
Carrier type code cr
500 ## - GENERAL NOTE
General note ABOUT THE AUTHORS
João Mendes Moreira, PhD, is an assistant professor in the Faculty of Engineering at the University of Porto, Porto, Portugal and is also a researcher in LIAAD-INESC TEC, Porto, Portugal.
André de Carvalho, PhD, is a full professor in the Institute of Mathematics and Computer Science at the University of São Paulo, Brazil.
Tomáš Horváth, PhD, is an assistant professor at the Faculty of Informatics of the Eötvös Loránd University in Budapest, Hungary, and is also associated with the Faculty of Science at the Pavol Jozef Šafárik University in Košice, Slovakia.
504 ## - BIBLIOGRAPHY, ETC. NOTE
Bibliography, etc Includes bibliographical references and index.
505 0# - CONTENTS
Formatted contents note TABLE OF CONTENTS
Preface
Acknowledgments
Presentational Conventions
About the Companion Website
Part I Introductory Background
1 What Can We Do With Data?
1.1 Big Data and Data Science
1.2 Big Data Architectures
1.3 Small Data
1.4 What is Data?
1.5 A Short Taxonomy of Data Analytics
1.6 Examples of Data Use
1.6.1 Breast Cancer in Wisconsin
1.6.2 Polish Company Insolvency Data
1.7 A Project on Data Analytics
1.7.1 A Little History on Methodologies for Data Analytics
1.7.2 The KDD Process
1.7.3 The CRISP-DM Methodology
1.8 How this Book is Organized
1.9 Who Should Read this Book
Part II Getting Insights from Data
2 Descriptive Statistics
2.1 Scale Types
2.2 Descriptive Univariate Analysis
2.2.1 Univariate Frequencies
2.2.2 Univariate Data Visualization
2.2.3 Univariate Statistics
2.2.4 Common Univariate Probability Distributions
2.3 Descriptive Bivariate Analysis
2.3.1 Two Quantitative Attributes
2.3.2 Two Qualitative Attributes, at Least one of them Nominal
2.3.3 Two Ordinal Attributes
2.4 Final Remarks
2.5 Exercises
3 Descriptive Multivariate Analysis
3.1 Multivariate Frequencies
3.2 Multivariate Data Visualization
3.3 Multivariate Statistics
3.3.1 Location Multivariate Statistics
3.3.2 Dispersion Multivariate Statistics
3.4 Infographics and Word Clouds
3.4.1 Infographics
3.4.2 Word Clouds
3.5 Final Remarks
3.6 Exercises
4 Data Quality and Preprocessing
4.1 Data Quality
4.1.1 Missing Values
4.1.2 Redundant Data
4.1.3 Inconsistent Data
4.1.4 Noisy Data
4.1.5 Outliers
4.2 Converting to a Different Scale Type
4.2.1 Converting Nominal to Relative
4.2.2 Converting Ordinal to Relative or Absolute
4.2.3 Converting Relative or Absolute to Ordinal or Nominal
4.3 Converting to a Different Scale
4.4 Data Transformation
4.5 Dimensionality Reduction
4.5.1 Attribute Aggregation
4.5.1.1 Principal Component Analysis
4.5.1.2 Independent Component Analysis
4.5.1.3 Multidimensional Scaling
4.5.2 Attribute Selection
4.5.2.1 Filters
4.5.2.2 Wrappers
4.5.2.3 Embedded
4.5.2.4 Search Strategies
4.6 Final Remarks
4.7 Exercises
5 Clustering
5.1 Distance Measures
5.1.1 Differences between Values of Common Attribute Types
5.1.2 Distance Measures for Objects with Quantitative Attributes
5.1.3 Distance Measures for Non-conventional Attributes
5.2 Clustering Validation
5.3 Clustering Techniques
5.3.1 K-means
5.3.1.1 Centroids and Distance Measures
5.3.1.2 How K-means Works
5.3.2 DBSCAN
5.3.3 Agglomerative Hierarchical Clustering Technique
5.3.3.1 Linkage Criterion
5.3.3.2 Dendrograms
5.4 Final Remarks
5.5 Exercises
6 Frequent Pattern Mining
6.1 Frequent Itemsets
6.1.1 Setting the min_sup Threshold
6.1.2 Apriori – a Join-based Method
6.1.3 Eclat
6.1.4 FP-Growth
6.1.5 Maximal and Closed Frequent Itemsets
6.2 Association Rules
6.3 Behind Support and Confidence
6.3.1 Cross-support Patterns
6.3.2 Lift
6.3.3 Simpson’s Paradox
6.4 Other Types of Pattern
6.4.1 Sequential patterns
6.4.2 Frequent Sequence Mining
6.4.3 Closed and Maximal Sequences
6.5 Final Remarks
6.6 Exercises
7 Cheat Sheet and Project on Descriptive Analytics
7.1 Cheat Sheet of Descriptive Analytics
7.1.1 On Data Summarization
7.1.2 On Clustering
7.1.3 On Frequent Pattern Mining
7.2 Project on Descriptive Analytics
7.2.1 Business Understanding
7.2.2 Data Understanding
7.2.3 Data Preparation
7.2.4 Modeling
7.2.5 Evaluation
7.2.6 Deployment
Part III Predicting the Unknown
8 Regression
8.1 Predictive Performance Estimation
8.1.1 Generalization
8.1.2 Model Validation
8.1.3 Predictive Performance Measures for Regression
8.2 Finding the Parameters of the Model
8.2.1 Linear Regression
8.2.1.1 Empirical Error
8.2.2 The Bias-variance Trade-off
8.2.3 Shrinkage Methods
8.2.3.1 Ridge Regression
8.2.3.2 Lasso Regression
8.2.4 Methods that use Linear Combinations of Attributes
8.2.4.1 Principal Components Regression
8.2.4.2 Partial Least Squares Regression
8.3 Technique and Model Selection
8.4 Final Remarks
8.5 Exercises
9 Classification
9.1 Binary Classification
9.2 Predictive Performance Measures for Classification
9.3 Distance-based Learning Algorithms
9.3.1 K-nearest Neighbor Algorithms
9.3.2 Case-based Reasoning
9.4 Probabilistic Classification Algorithms
9.4.1 Logistic Regression Algorithm
9.4.2 Naive Bayes Algorithm
9.5 Final Remarks
9.6 Exercises
10 Additional Predictive Methods
10.1 Search-based Algorithms
10.1.1 Decision Tree Induction Algorithms
10.1.2 Decision Trees for Regression
10.1.2.1 Model Trees
10.1.2.2 Multivariate Adaptive Regression Splines
10.2 Optimization-based Algorithms
10.2.1 Artificial Neural Networks
10.2.1.1 Backpropagation
10.2.1.2 Deep Networks and Deep Learning Algorithms
10.2.2 Support Vector Machines
10.2.2.1 SVM for Regression
10.3 Final Remarks
10.4 Exercises
11 Advanced Predictive Topics
11.1 Ensemble Learning
11.1.1 Bagging
11.1.2 Random Forests
11.1.3 AdaBoost
11.2 Algorithm Bias
11.3 Non-binary Classification Tasks
11.3.1 One-class Classification
11.3.2 Multi-class Classification
11.3.3 Ranking Classification
11.3.4 Multi-label Classification
11.3.5 Hierarchical Classification
11.4 Advanced Data Preparation Techniques for Prediction
11.4.1 Imbalanced Data Classification
11.4.2 For Incomplete Target Labeling
11.4.2.1 Semi-supervised Learning
11.4.2.2 Active Learning
11.5 Description and Prediction with Supervised Interpretable Techniques
11.6 Exercises
12 Cheat Sheet and Project on Predictive Analytics
12.1 Cheat Sheet on Predictive Analytics
12.2 Project on Predictive Analytics
12.2.1 Business Understanding
12.2.2 Data Understanding
12.2.3 Data Preparation
12.2.4 Modeling
12.2.5 Evaluation
12.2.6 Deployment
Part IV Popular Data Analytics Applications
13 Applications for Text, Web and Social Media
13.1 Working with Texts
13.1.1 Data Acquisition
13.1.2 Feature Extraction
13.1.2.1 Tokenization
13.1.2.2 Stemming
13.1.2.3 Conversion to Structured Data
13.1.2.4 Is the Bag of Words Enough?
13.1.3 Remaining Phases
13.1.4 Trends
13.1.4.1 Sentiment Analysis
13.1.4.2 Web Mining
13.2 Recommender Systems
13.2.1 Feedback
13.2.2 Recommendation Tasks
13.2.3 Recommendation Techniques
13.2.3.1 Knowledge-based Techniques
13.2.3.2 Content-based Techniques
13.2.3.3 Collaborative Filtering Techniques
13.2.4 Final Remarks
13.3 Social Network Analysis
13.3.1 Representing Social Networks
13.3.2 Basic Properties of Nodes
13.3.2.1 Degree
13.3.2.2 Distance
13.3.2.3 Closeness
13.3.2.4 Betweenness
13.3.2.5 Clustering Coefficient
13.3.3 Basic and Structural Properties of Networks
13.3.3.1 Diameter
13.3.3.2 Centralization
13.3.3.3 Cliques
13.3.3.4 Clustering Coefficient
13.3.3.5 Modularity
13.3.4 Trends and Final Remarks
13.4 Exercises
Appendix A: Comprehensive Description of the CRISP-DM Methodology
References
Index
520 ## - SUMMARY, ETC.
Summary, etc A guide to the principles and methods of data analysis that does not require knowledge of statistics or programming.
A General Introduction to Data Analytics is an essential guide to understanding and using data analytics. The book is written in easy-to-understand terms and does not require familiarity with statistics or programming. The authors, noted experts in the field, explain the intuition behind the basic data analytics techniques. The text also contains exercises and illustrative examples.
Designed to be easily accessible to non-experts, the book motivates the need to analyze data. It explains how to visualize and summarize data, and how to find natural groups and frequent patterns in a dataset. The book also explores predictive tasks, whether classification or regression. Finally, the book discusses popular data analytics applications, such as web mining, information retrieval, social network analysis, working with text, and recommender systems. The learning resources offer:
A guide to the reasoning behind data mining techniques
A unique illustrative example that extends throughout all the chapters
Exercises at the end of each chapter and larger projects at the end of each of the text’s two main parts
Together with these learning resources, the book can be used as a 13-week course guide, one chapter per course topic.
The book is written in a format that allows non-mathematicians, non-statisticians and non-computer scientists interested in an introduction to data science to understand the main data analytics concepts. A General Introduction to Data Analytics is a basic guide to data analytics written in highly accessible terms.
526 ## - STUDY PROGRAM INFORMATION NOTE
-- 500-599
-- 510
588 ## - SOURCE OF DESCRIPTION NOTE
Source of description note Description based on print version record and CIP data provided by publisher.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Mathematical statistics
General subdivision Methodology.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Electronic data processing.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name as entry element Data mining.
655 #4 - INDEX TERM--GENRE/FORM
Genre/form data or focus term Electronic books
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name Carvalho, André Carlos Ponce de Leon Ferreira,
Relator term author.
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name Horváth, Tomáš,
Dates associated with a name 1976-
Relator term author.
856 ## - ELECTRONIC LOCATION AND ACCESS
Link text Full text available at Wiley Online Library
Uniform Resource Identifier https://onlinelibrary.wiley.com/doi/book/10.1002/9781119296294
906 ## - LOCAL DATA ELEMENT F, LDF (RLIN)
a 7
b cbc
c origcop
d 1
e ecip
f 20
g y-gencatlg
942 ## - ADDED ENTRY ELEMENTS
Source of classification or shelving scheme
Item type EBOOK
Holdings
| Withdrawn status | Lost status | Source of classification or shelving scheme | Damaged status | Not for loan | Permanent Location | Current Location | Date acquired | Shelving control number | Full call number | Barcode | Date last seen | Price effective from | Item type |
|  |  |  |  |  | COLLEGE LIBRARY | COLLEGE LIBRARY | 2021-03-24 | 127 | 519.50285 M8139 2019 | CL-50391 | 2021-03-24 | 2021-03-24 | EBOOK |
|  |  |  |  |  | COLLEGE LIBRARY | COLLEGE LIBRARY | 2021-03-24 | 127 | 519.50285 M8139 2019 | CL-51087 | 2021-09-16 | 2021-03-24 | EBOOK |