Big data analytics : a practical guide for managers / Kim H. Pries, Robert Dunnigan.

By: Pries, Kim H, 1955-
Contributor(s): Dunnigan, Robert
Language: English
Publisher: Boca Raton, FL : CRC Press, [2015]
Copyright date: c2015
Description: xix, 556 pages ; 24 cm
Content type: text
Media type: unmediated
Carrier type: volume
ISBN: 9781482234510
Subject(s): Management -- Statistical methods | Management -- Data processing | Big data | Data mining | Database management
DDC classification: 658/.0557
LOC classification: HD30.215 | .P75 2015
Item type: BOOK
Current location: COLLEGE LIBRARY
Home library: COLLEGE LIBRARY, SUBJECT REFERENCE
Call number: 658.0557 P9332 2015
Status: Checked out
Date due: 08/20/2025
Barcode: CITU-CL-47532
Total holds: 0

ABOUT THE AUTHOR/S
Kim H. Pries has four college degrees: a bachelor of arts in history from the University of Texas at El Paso (UTEP), a bachelor of science in metallurgical engineering from UTEP, a master of science in engineering from UTEP, and a master of science in metallurgical engineering and materials science from Carnegie-Mellon University.

Pries worked as a computer systems manager, a software engineer for an electrical utility, and a scientific programmer under a defense contract for Stoneridge, Incorporated (SRI). He has worked as software manager, engineering services manager, reliability section manager, and product integrity and reliability director.

In addition to his other responsibilities, Pries has provided Six Sigma training for both UTEP and SRI and cost reduction initiatives for SRI. Pries is also a founding faculty member of Practical Project Management. Additionally, in concert with Jon Quigley, Pries was a cofounder and principal with Value Transformation, LLC, a training, testing, cost improvement, and product development consultancy.

He trained for Introduction to Engineering Design and Computer Science and Software Engineering with Project Lead the Way. He currently teaches biotechnology, computer science and software engineering, and introduction to engineering design at the beautiful Parkland High School in the Ysleta Independent School District of El Paso, Texas.

Robert Dunnigan is a manager with Janus Consulting Partners and is based in Dallas, Texas. He holds a bachelor of science in psychology and in sociology with an anthropology emphasis from North Dakota State University. He also holds a master of business administration from INSEAD, "the business school for the world," where he attended the Singapore campus.

As a Peace Corps volunteer, Robert served over 3 years in Honduras developing agribusiness opportunities. As a consultant, he later worked on the Afghanistan Small and Medium Enterprise Development project in Afghanistan, where he traveled the country with his Afghan colleagues and friends seeking opportunities to develop a manufacturing sector in the country.

Robert is an American Society for Quality–certified Six Sigma Black Belt and a Scrum Alliance–certified Scrum Master.

Includes bibliographical references and index.

Table of Contents

Introduction
So What Is Big Data?
Growing Interest in Decision Making
What This Book Addresses
The Conversation about Big Data
Technological Change as a Driver of Big Data
The Central Question: So What?
Our Goals as Authors
References

The Mother of Invention's Triplets: Moore's Law, the Proliferation of Data, and Data Storage Technology
Moore's Law
Parallel Computing, Between and Within Machines
Quantum Computing
Recap of Growth in Computing Power
Storage, Storage Everywhere
Grist for the Mill: Data Used and Unused
Agriculture
Automotive
Marketing in the Physical World
Online Marketing
Asset Reliability and Efficiency
Process Tracking and Automation
Toward a Definition of Big Data
Putting Big Data in Context
Key Concepts of Big Data and Their Consequences
Summary
References

Hadoop
Power through Distribution
Cost Effectiveness of Hadoop
Not Every Problem Is a Nail
Some Technical Aspects
Troubleshooting Hadoop
Running Hadoop
Hadoop File System
MapReduce
Pig and Hive
Installation
Current Hadoop Ecosystem
Hadoop Vendors
Cloudera
Amazon Web Services (AWS)
Hortonworks
IBM
Intel
MapR
Microsoft
To Run Pig Latin Using Powershell
Pivotal
References

HBase and Other Big Data Databases
Evolution from Flat File to the Three V's
Flat File
Hierarchical Database
Network Database
Relational Database
Object-Oriented Databases
Relational-Object Databases
Transition to Big Data Databases
What Is Different about HBase?
What Is Bigtable?
What Is MapReduce?
What Are the Various Modalities for Big Data Databases?
Graph Databases
How Does a Graph Database Work?
What Is the Performance of a Graph Database?
Document Databases
Key-Value Databases
Column-Oriented Databases
HBase
Apache Accumulo
References

Machine Learning
Machine Learning Basics
Classifying with Nearest Neighbors
Naive Bayes
Support Vector Machines
Improving Classification with Adaptive Boosting
Regression
Logistic Regression
Tree-Based Regression
K-Means Clustering
Apriori Algorithm
Frequent Pattern-Growth
Principal Component Analysis (PCA)
Singular Value Decomposition
Neural Networks
Big Data and MapReduce
Data Exploration
Spam Filtering
Ranking
Predictive Regression
Text Regression
Multidimensional Scaling
Social Graphing
References

Statistics
Statistics, Statistics Everywhere
Digging into the Data
Standard Deviation: The Standard Measure of Dispersion
The Power of Shapes: Distributions
Distributions: Gaussian Curve
Distributions: Why Be Normal?
Distributions: The Long Arm of the Power Law
The Upshot? Statistics Are not Bloodless
Fooling Ourselves: Seeing What We Want to See in the Data
We Can Learn Much from an Octopus
Hypothesis Testing: Seeking a Verdict
Two-Tailed Testing
Hypothesis Testing: A Broad Field
Moving on to Specific Hypothesis Tests
Regression and Correlation
p Value in Hypothesis Testing: A Successful Gatekeeper?
Specious Correlations and Overfitting the Data
A Sample of Common Statistical Software Packages
Minitab
SPSS
R
SAS
Big Data Analytics
Hadoop Integration
Angoss
Statistica
Capabilities
Summary
References

Google
Big Data Giants
Google
Go
Android
Google Product Offerings
Google Analytics
Advertising and Campaign Performance
Analysis and Testing
Facebook
Ning
Non-United States Social Media
Tencent
Line
Sina Weibo
Odnoklassniki
Vkontakte
Nimbuzz
Ranking Network Sites
Negative Issues with Social Networks
Amazon
Some Final Words
References

Geographic Information Systems (GIS)
GIS Implementations
A GIS Example
GIS Tools
GIS Databases
References

Discovery
Faceted Search versus Strict Taxonomy
First Key Ability: Breaking Down Barriers
Second Key Ability: Flexible Search and Navigation
Underlying Technology
The Upshot
Summary
References

Data Quality
Know Thy Data and Thyself
Structured, Unstructured, and Semistructured Data
Data Inconsistency: An Example from This Book
The Black Swan and Incomplete Data
How Data Can Fool Us
Ambiguous Data
Aging of Data or Variables
Missing Variables May Change the Meaning
Inconsistent Use of Units and Terminology
Biases
Sampling Bias
Publication Bias
Survivorship Bias
Data as a Video, Not a Snapshot: Different Viewpoints as a Noise Filter
What Is My Toolkit for Improving My Data?
Ishikawa Diagram
Interrelationship Digraph
Force Field Analysis
Data-Centric Methods
Troubleshooting Queries from Source Data
Troubleshooting Data Quality beyond the Source System
Using Our Hidden Resources
Summary
References

Benefits
Data Serendipity
Converting Data Dreck to Usefulness
Sales
Returned Merchandise
Security
Medical
Travel
Lodging
Vehicle
Meals
Geographical Information Systems
New York City
Chicago CLEARMAP
Baltimore
San Francisco
Los Angeles
Tucson, Arizona, University of Arizona, and COPLINK
Social Networking
Education
General Educational Data
Legacy Data
Grades and other Indicators
Testing Results
Addresses, Phone Numbers, and More
Concluding Comments
References

Concerns
Part Two: Basic Principles of National Application
Collection Limitation Principle
Data Quality Principle
Purpose Specification Principle
Use Limitation Principle
Security Safeguards Principle
Openness Principle
Individual Participation Principle
Accountability Principle
Logical Fallacies
Affirming the Consequent
Denying the Antecedent
Ludic Fallacy
Cognitive Biases
Confirmation Bias
Notational Bias
Selection/Sample Bias
Halo Effect
Consistency and Hindsight Biases
Congruence Bias
Von Restorff Effect
Data Serendipity
Converting Data Dreck to Usefulness
Sales
Merchandise Returns
Security
CompStat
Medical
Travel
Lodging
Vehicle
Meals
Social Networking
Education
Making Yourself Harder to Track
Misinformation
Disinformation
Reducing/Eliminating Profiles
Social Media
Self Redefinition
Identity Theft
Facebook
Concluding Comments
References

Epilogue
Michael Porter's Five Forces Model
Bargaining Power of Customers
Bargaining Power of Suppliers
Threat of New Entrants
Others
The OODA Loop
Implementing Big Data
Nonlinear, Qualitative Thinking
Closing
References

With this book, managers and decision makers are given the tools to make more informed decisions about big data purchasing initiatives. Big Data Analytics: A Practical Guide for Managers not only supplies descriptions of common tools, but also surveys the various products and vendors that supply the big data market. Comparing and contrasting the different types of analysis commonly conducted with big data, this accessible reference presents clear-cut explanations of the general workings of big data tools. Instead of spending time on how to install specific packages, it focuses on the reasons why readers would install a given package. The book provides authoritative guidance on a range of tools, including open source and proprietary systems. It details the strengths and weaknesses of incorporating big data analysis into decision-making and explains how to leverage the strengths while mitigating the weaknesses. It describes the benefits of distributed computing in simple terms; includes substantial vendor and tool material, especially for open source decisions; covers prominent software packages, including Hadoop and Oracle Endeca; examines GIS and machine learning applications; and considers privacy and surveillance issues. The book further explores basic statistical concepts that, when misapplied, can be the source of errors. Time and again, big data is treated as an oracle that discovers results nobody would have imagined. While big data can serve this valuable function, all too often these results are incorrect yet are still reported unquestioningly. The probability of erroneous results increases as more variables are compared, unless preventative measures are taken. The authors explain these concepts so that managers can ask better questions of their analysts and vendors about the appropriateness of the methods used to arrive at a conclusion.
Because the world of science and medicine has been grappling with similar issues in the publication of studies, the authors draw on the efforts of those fields and apply them to big data.


Features:

Provides guidance on an array of tools, including open source and proprietary systems
Supplies proven techniques for ensuring data quality
Considers surveillance and privacy issues
Examines GIS applications and machine learning
Describes how different types of organizations address their data needs
