Big data : concepts, technology and architecture / Balamurugan Balusamy, Nandhini Abirami R, Amir H. Gandomi.
By: Balusamy, Balamurugan [author.]
Contributor(s): R, Nandhini Abirami [author.] | Gandomi, Amir Hossein [author.]
Language: English
Publisher: Hoboken, NJ : John Wiley and Sons, Inc., 2021
Edition: First edition
Description: 1 online resource
Content type: text
Media type: computer
Carrier type: online resource
ISBN: 9781119701828
Subject(s): Big data | Data mining
Genre/Form: Electronic books.
DDC classification: 005.7
LOC classification: QA76.9.B45 | B35 2021
Online resources: Full text is available at Wiley Online Library

| Item type | Current location | Home library | Call number | Status | Date due | Barcode | Item holds |
|---|---|---|---|---|---|---|---|
| EBOOK | COLLEGE LIBRARY | COLLEGE LIBRARY | 005.7 B2191 2021 | Available | | | |
Includes index.
Table of Contents
Acknowledgments xi
About the Author xii
1 Introduction to the World of Big Data 1
1.1 Understanding Big Data 1
1.2 Evolution of Big Data 2
1.3 Failure of Traditional Database in Handling Big Data 3
1.4 3 Vs of Big Data 4
1.5 Sources of Big Data 7
1.6 Different Types of Data 8
1.7 Big Data Infrastructure 11
1.8 Big Data Life Cycle 12
1.9 Big Data Technology 18
1.10 Big Data Applications 21
1.11 Big Data Use Cases 21
Chapter 1 Refresher 24
2 Big Data Storage Concepts 31
2.1 Cluster Computing 32
2.2 Distribution Models 37
2.3 Distributed File System 43
2.4 Relational and Non-Relational Databases 43
2.5 Scaling Up and Scaling Out Storage 47
Chapter 2 Refresher 48
3 NoSQL Database 53
3.1 Introduction to NoSQL 53
3.2 Why NoSQL 54
3.3 CAP Theorem 54
3.4 ACID 56
3.5 BASE 56
3.6 Schemaless Databases 57
3.7 NoSQL (Not Only SQL) 57
3.8 Migrating from RDBMS to NoSQL 76
Chapter 3 Refresher 77
4 Processing, Management Concepts, and Cloud Computing 83
Part I: Big Data Processing and Management Concepts 83
4.1 Data Processing 83
4.2 Shared Everything Architecture 85
4.3 Shared-Nothing Architecture 86
4.4 Batch Processing 88
4.5 Real-Time Data Processing 88
4.6 Parallel Computing 89
4.7 Distributed Computing 90
4.8 Big Data Virtualization 90
Part II: Managing and Processing Big Data in Cloud Computing 93
4.9 Introduction 93
4.10 Cloud Computing Types 94
4.11 Cloud Services 95
4.12 Cloud Storage 96
4.13 Cloud Architecture 101
Chapter 4 Refresher 103
5 Driving Big Data with Hadoop Tools and Technologies 111
5.1 Apache Hadoop 111
5.2 Hadoop Storage 114
5.3 Hadoop Computation 119
5.4 Hadoop 2.0 129
5.5 HBASE 138
5.6 Apache Cassandra 141
5.7 SQOOP 141
5.8 Flume 143
5.9 Apache Avro 144
5.10 Apache Pig 145
5.11 Apache Mahout 146
5.12 Apache Oozie 146
5.13 Apache Hive 149
5.14 Hive Architecture 151
5.15 Hadoop Distributions 152
Chapter 5 Refresher 153
6 Big Data Analytics 161
6.1 Terminology of Big Data Analytics 161
6.2 Big Data Analytics 162
6.3 Data Analytics Life Cycle 166
6.4 Big Data Analytics Techniques 170
6.5 Semantic Analysis 175
6.6 Visual Analysis 178
6.7 Big Data Business Intelligence 178
6.8 Big Data Real-Time Analytics Processing 180
6.9 Enterprise Data Warehouse 181
Chapter 6 Refresher 182
7 Big Data Analytics with Machine Learning 187
7.1 Introduction to Machine Learning 187
7.2 Machine Learning Use Cases 188
7.3 Types of Machine Learning 189
Chapter 7 Refresher 196
8 Mining Data Streams and Frequent Itemset 201
8.1 Itemset Mining 201
8.2 Association Rules 206
8.3 Frequent Itemset Generation 210
8.4 Itemset Mining Algorithms 211
8.5 Maximal and Closed Frequent Itemset 229
8.6 Mining Maximal Frequent Itemsets: the GenMax Algorithm 233
8.7 Mining Closed Frequent Itemsets: the Charm Algorithm 236
8.8 CHARM Algorithm Implementation 236
8.9 Data Mining Methods 239
8.10 Prediction 240
8.11 Important Terms Used in Bayesian Network 241
8.12 Density Based Clustering Algorithm 249
8.13 DBSCAN 249
8.14 Kernel Density Estimation 250
8.15 Mining Data Streams 254
8.16 Time Series Forecasting 255
9 Cluster Analysis 259
9.1 Clustering 259
9.2 Distance Measurement Techniques 261
9.3 Hierarchical Clustering 263
9.4 Analysis of Protein Patterns in the Human Cancer-Associated Liver 266
9.5 Recognition Using Biometrics of Hands 267
9.6 Expectation Maximization Clustering Algorithm 274
9.7 Representative-Based Clustering 277
9.8 Methods of Determining the Number of Clusters 277
9.9 Optimization Algorithm 284
9.10 Choosing the Number of Clusters 288
9.11 Bayesian Analysis of Mixtures 290
9.12 Fuzzy Clustering 290
9.13 Fuzzy C-Means Clustering 291
10 Big Data Visualization 293
10.1 Big Data Visualization 293
10.2 Conventional Data Visualization Techniques 294
10.3 Tableau 297
10.4 Bar Chart in Tableau 309
10.5 Line Chart 310
10.6 Pie Chart 311
10.7 Bubble Chart 312
10.8 Box Plot 313
10.9 Tableau Use Cases 313
10.10 Installing R and Getting Ready 318
10.11 Data Structures in R 321
10.12 Importing Data from a File 335
10.13 Importing Data from a Delimited Text File 336
10.14 Control Structures in R 337
10.15 Basic Graphs in R 341
Index 347
"This book offers comprehensive coverage of Big Data tools, terminologies and technologies for researchers, business professionals and graduates. This book begins with an overview of what Big Data is and emphasizes all the key concepts of Big Data end to end. Big Data concepts, technologies and terminologies, as well as storage, processing and analysis techniques and much more, are all logically organized and reinforced by diagrams and case studies. This book refines readers' understanding of Big Data with in-depth analysis of key concepts. The case studies provided in this book give insight into key concepts. The initial chapters of the book shed light on various characteristics of Big Data that distinguish it from traditional database management systems. Big Data analytics is covered in detail in a separate chapter. Hadoop, the heart of Big Data, is handled in the Big Data processing chapter, and a deep understanding of its concepts is provided"-- Provided by publisher.
About the Author
BALAMURUGAN BALUSAMY, PhD, is a Professor with the School of Computing Science and Engineering at Galgotias University, Greater Noida, India.
NANDHINI ABIRAMI R is an IT Consultant and Research Scholar at VIT University in Vellore.
SEIFEDINE KADRY, PhD, is a Professor of Data Science at the Faculty of Applied Computing and Technology at Noroff University College, Kristiansand, Norway.
AMIR H. GANDOMI, PhD, is a Professor of Data Science at the Faculty of Engineering & Information Technology, University of Technology Sydney, Australia.
