index

A
A/B testing, 82–84
ACID (atomicity, consistency, isolation, durability) principles, 33
adjacent possible, 201–202
Adner, Ron, The Wide Lens: A New Strategy for Innovation, 202
The Age of the Platform (Simon), xv
Amazon
  Best Buy price matching, 198
  Big Data statistics, 98–99
  and Amazon model, 127–128
  shipping fee error, 14
analysis, 77–79
  A/B testing, 82–84
  data visualization, 84–86
    heat maps, 86–87
    Tableau software, 85
    time series analysis, 87–88
    Visually software, 85
  predictive analytics, 100–102, 136–137
    LDMU (Law of Diminishing Marginal Utility), 103
    LLN (Law of Large Numbers), 102–103
  regression analysis, 80–82
  sentiment analysis, 97–98
  text analytics, 95–97
Anderson, Chris
  Free: The Future of a Radical Price, 14
  “Tech Is Too Cheap to Meter,” 14
Angry Birds, 62
Apache
  Cassandra, 124
  Hadoop (See Hadoop)
Apple, Big Data statistics, 99
appliances, 211
applications, 114
AppStore (Apple), 20
automation, 88
  machine learning, 89–90
  nanotechnology, 90–91
  NFC (near field communication), 91–93
  RFID (radio-frequency identification), 91–93
  sensors, 90–91

B
bad data, 162–163
Bayesian methods of analysis, 81–82
Beane, Billy, xv–xvi
  Howe, Art, and, xvii–xviii
  sabermetrics and, xvi–xvii
Best Buy, 198
Bezos, Jeff, 19–20
Bhambhri, Anjul, 17
BI (business intelligence), 68–70
Big Brother, 187
Big Data
  acceptance gains within organization, 171
  bad data and, 162–163
  capabilities, 79–80
  characteristics, 50–52
  checklist, 177
  community knowledge, 173
  as complement, 56–57
  completeness, 65–68
  conferences and, 173
  consumers, 63–64
  current presence, 50–51
  data model evolution, 172–173
  definition, 49–50
  dynamism, 62
  evolution, 201–203
  experiments, 169–171
  fragmentation, 52–54
  goal setting, 166–167
  goals, 178
  Google Trends, 51
  as initiative, 175–176
  IT and, 177–178
  iterativeness, 173–174


bindex.indd 225 2/15/2013 4:00:09 PM 226  i n d e x

Big Data (continued)
  limitations, 105–106
  market size, 10–12
  naysayers, 22
  network effects, 174
  Obama re-election and, 51–52
  pitfalls, 174–180
  precision, 59–61
  predictions and, 57
  revolution, 12
    consumers and, 12–13
    platforms and, 19–20
    social media, 21–22
    technology costs, 14–15
  startup, 165–167
  timing, 24–25
  training for, 168–169
  unpredictability, 62
  venture capitalists and, 132–133
  vision, 171–172
Big Data: The Next Frontier for Innovation, Competition, and Productivity, 77–78
The Big Short (Lewis), xvi
BigQuery (Google), 129
BigTable (Google), 129
Black Swan (Taleb), 60–61
Bricklin, Dan, 30
BrightContext, 98
Brin, Sergei, 19–20
bucket testing. See A/B testing
Burry, Dr. Michael, 60
business readiness, 163–164
BYOD (bring your own device), 63

C
CapitalOne, A/B testing, 82–83
car insurance, 2–5
  Progressive, Snapshot, 3–4
Carnegie Mellon University, Very Large Information Systems course, 168
case studies
  Explorys, 147–152
  NASA, 152–158
  Quantcast, 141–146
Cassandra, 124
citizen journalists, 64
Cloudera, 117
  Kornacker, Marcel, 117
CloudFlare, machine learning and, 89
Cognos, 68
collaborative filtering, 104–105
columnar databases, 125–127
complementary role of Big Data, 56–57
completeness of Big Data, 65–68
composition of data, 39–40
consumers
  Big Data revolution and, 12–13, 63–64
  consumer fatigue, 189–191
costs
  Amazon shipping fee error, 14
  data storage, 14
COTS (commercial off the shelf) system, 32
crowdsourcing, recommendation engines, 58
The Cult of the Amateur (Keen), 206–207

D
data. See also Big Data
  born digital, 40
  composition, 39–40
  versus heuristics, 23–24
  limitations, 218
  metadata, 29
  poly-structured, 35
  versus rules of thumb, 23–24
  science risks, 14–17
  scientists, 14–17
    problem solving, 179–180
  semi-structured, 35–36
  structured data, 30–33
  transactional data, 31
  unstructured, 35
The Data Asset: How Smart Companies Govern Their Data for Business Success (Fisher), 41–42
Data Deluge, 29
  Laney, Douglas, 49–50
  variety, 50
  velocity, 50
  volume, 50
data disconnect, 44–45
Data Driven: Profiting from Your Most Important Business Asset (Redman), 178
data dysfunction, 41–42
data governance, 64
data management
  as continuum, 55–56


  data governance, 64
  MDM (master data management), 64
data mining, 69–70
Data Mining 2.0, 70
data model evolution, 172–173
data storage costs, 14
data theft, employees, 63
data visualization, 84–86
  heat maps, 86–87
  Tableau software, 85
  time series analysis, 87–88
  Visually software, 85
DataFlux, 41–42
De Goes, John, 131
Deep Web, 66
digital, birth of data, 40
DLF (Data Liberation Front), 204
Dremel (Google), 129
Dunbar’s Number, 54
dynamism of Big Data, 62
dysfunctional data, 41–42

E
EDI (Electronic Data Interchange), 36
EHRs (electronic health records), 93–95
e-mail, semi-structured data, 36
employees
  data theft, 63
  knowledge workers, 191–194
employers, BYOD (bring your own device), 63
Enterprise Miner (SAS), 70
Epstein, Theo, xviii
ERD (Entity Relationship Diagram), 32
ERP (enterprise resource planning) systems, 30–31
ETL (extract, transform, and load), 32
exabytes, 11
Exif (Exchangeable Image File) data, 37
experimenting with Big Data, 169–171
Explorys case study, 147–152

F
Facebook
  Big Data statistics, 99
  Google and, 67
  Hadoop and, 114
  machine learning and, 89
fail whale, 170
Feldman, Konrad, 142
filtering, collaborative, 104–105
Fisher, Tony, The Data Asset: How Smart Companies Govern Their Data for Business Success, 41–42
Flickr
  Rush 2012 Las Vegas Photos, 37
  searches, 37–38
fragmentation of Big Data, 52–54
Frankston, Bob, 30
Free: The Future of a Radical Price (Anderson), 14
Friedman, Thomas L., That Used to Be Us: How America Fell Behind in the World It Invented and How We Can Come Back, 213
Fruition Sciences, 52

G
gamification, 7
goal setting, 166–167
Goldbloom, Anthony, 130
Google, 17–19
  Amazon model and, 127–128
  Big Data statistics, 99
  Big Data tools, 129
    BigQuery, 129
    BigTable, 129
    Dremel, 129
  Facebook information and, 67
  machine learning and, 89
  MapReduce, 129
  Safari browser, 184–185
  Street View, 183–184
  Trends, Big Data, 51
Great Recession, 60
Grid Engine, 134
Grimes, Seth, 97

H
Hack, Martin, 89
Hadapt, 119
Hadoop, 114
  Cloudera, 116, 117
  enterprise vendors, 120–121
  Facebook and, 114
  Hadoop: The Definitive Guide (White), 115
  HBase, 115–116
  HDFS (Hadoop Distributed File System), 115


Hadoop (continued)
  Hive, 116
  Hortonworks, 118–119
  limitations, 121
  MapR, 118–119
  MapReduce, 115
  origins, 115
  Pig, 116
  Splunk, 118–119
  start-ups, Hadoop-based, 119–120
  Talend, 116
Hammerbacher, Jeff, 16
hardware considerations, 133–136
Harris, Jim, 33
Hastings, Reed, 58–59
HBase, 115–116
HDFS (Hadoop Distributed File System), 115
heat maps, 86–87
heuristics, versus data, 23–24
high elasticity of demand, 209
Hortonworks, 118–119
housing market, 60
Howard, Jeremy, 130
Howe, Art, xvii–xviii
HR (Human Resources), data use, 8–9
Huffington Post, machine learning and, 89–90
Hurricane Sandy, 20
Hyperion, 68

I
IA (Information Access), 96
IM (information management), employees and, 41
infonomics, 17–19, 162
information size
  exabytes, 11
  zettabytes, 11
Infosphere BigInsights, 120
Instagram, 62
insurance, car insurance, 2–5
  Progressive, 3–4
Intelligent Miner, 70
Internet of Things, 207
intranets, 33
IOD (Information on Demand) conference, xv
iPhone, AppStore, 20
IR (Information Retrieval), 96
IT and Big Data, 177–178

J
Jain, Anil, 147
James, Bill, sabermetrics and, xviii
Johnson, Steven, Where Good Ideas Come From: The Natural History of Innovation, 201

K
, 129–131
Kahler, Scott, 115
Keen, Andrew, The Cult of the Amateur, 206–207
Kelley, Kevin, xviii
knowledge bases, 33
knowledge workers, 191–194
Kornacker, Marcel, Cloudera, 117
Kryder’s Law, 14

L
Laney, Douglas, 18
  Big Data definition, 49–50
LDMU (Law of Diminishing Marginal Utility), 103
The Lean Startup (Ries), 83–84
Lewis, Michael, xv–xvi
  The Big Short, xvi
  Moneyball: The Art of Winning an Unfair Game, xv–xvi
limitations of data, 218
LLN (Law of Large Numbers), 102–103
Lougheed, Charlie, 147
Lytro, photo focus, 39

M
machine learning, 89–90
Mandelbaum, Michael, That Used to Be Us: How America Fell Behind in the World It Invented and How We Can Come Back, 213
MapR, 118–119
MapReduce (Google), 115, 129
mass market, 53
Mauboussin, Michael J., The Success Equation: Untangling Skill and Luck in Business, Sports, and Investing, 194
McHale, Stephen, 147
McKinsey, Big Data: The Next Frontier for Innovation, Competition, and Productivity, 77–78
McKnight, William, 126–127


MDM (master data management), 64
Meil, Doug, 147
Menino, Thomas, 6
  urban mechanics, 212–213
metadata, 29, 36–39
  Netflix, 58–59
  photos, 36–39
  Rush 2012 Las Vegas Photos, 36
metapredictions, 211
Microsoft Analysis Services, 70
MicroStrategy, 68
Mitchell, Tom, 89
Moneyball: The Art of Winning an Unfair Game (Lewis), xv–xvi
Morey, Daryl, sabermetrics and, xviii
MRP (manufacturing resource planning) systems, 30–31

N
nanotechnology, automation and, 90–91
NASA case study, 152–158
Nest Thermostat, 209–210
Netflix, 58–59
  Qwikster, 59
New Urban Mechanics, 212–213
NewSQL, 124–125
NFC (near field communication), 91–93
NLP (Natural Language Processing), 94–95
NoSQL databases, 122–124

O
Obama, Barack, re-election, 51–52
ODaF (Open Data Foundation), 204–205
OLAP (online analytical processing), 69

P
Page, Larry, 19–20
Patil, D.J., 16
Pandora, 53
paradox, 194–195
Path app (iOS), 186
Pho, Kevin, 94
photos
  Exif (Exchangeable Image File) data, 37
  Lytro, 39
  Rush 2012 Las Vegas Photos, 36–37
  tagging, 36–37
platforms
  AppStore (Apple), 20
  Big Data Revolution and, 19–20
Pole, Andrew, 198–199
poly-structured data, 35
pothole reporting, 5–8
Precog, 131
predictions
  Big Data and, 57
  metapredictions, 211
  pregnancy, 198–200
  Silver, Nate, 20–21, 218–219
predictive analytics, 100–102, 136–137
  LLN (Law of Large Numbers), 102–103
pregnancy predictions, 198–200
privacy, 184–188
  security issues, 188–189
problem solving like data scientists, 179–180
Progressive insurance, Snapshot, 3–4

Q
Quantcast case study, 141–146
querying data, JOIN statements, 32
Qwikster, 59

R
RainStor, 119–120
RDBMS (relational database management system), 61
recommendation engines, 58
recruiting, 8–9
Redman, Tom, Data Driven: Profiting from Your Most Important Business Asset, 178
Redshift, 128
refrigerators, 210–211
regression analysis, 80–82
regression toward the mean, 101
Ries, Eric
  Book Cover Experiment Data, 85
  The Lean Startup, 83–84
relational data models, 32
research, translational, 52
retaining employees, 8–9
RFID (radio-frequency identification)
  automation and, 91–93
  student IDs, 63


RIM (Research in Motion), 206
road hazard reporting, 5–8
Rush
  collaborative filtering and, 104–105
  fuzzy pictures, 39
  Rush 2012 Las Vegas Photos, 36

S
sabermetrics, xvi–xvii
  James, Bill, xviii
  Morey, Daryl and, xviii
Safari browser, Google and, 184–185
SAS Enterprise Miner, 70
Scion Capital, 60
searches, Flickr, 37–38
Sears, 167
security issues, 188–189
semi-structured data, 35–36
sensors, automation and, 90–91
sentiment analysis, 97–98
SEO (search engine optimization), 176
showrooming, 197–198
The Signal and the Noise: Why So Many Predictions Fail--but Some Don’t (Silver), 21
Silver, Nate, 20–21
  data-driven predictions, 218–219
  limitations of data, 218
  The Signal and the Noise: Why So Many Predictions Fail--but Some Don’t, 21
Skytree, 89
Small Data, 55–56
Smart ThinQ appliances, 211
smartphones
  BYOD (bring your own device), 63
  RIM (Research in Motion), 206
  tagging, 38
social media, 19–20
  Big Data Revolution, 21–22
  fatigue, 189–190
split testing. See A/B testing
Splunk, 118–119
SQL (Structured Query Language), 33
  NewSQL, 124–125
  NoSQL databases, 122–124
starting with Big Data, 165–167
statistical significance, 106
statistical techniques
  A/B testing, 82–84
  regression, 80–82
storage solutions, 121
  columnar databases, 125–127
  NewSQL, 124–125
  NoSQL databases, 122–124
Street Bump road hazard reporting app, 6
Street View (Google), 183–184
structured data, 30–33
  ratio to unstructured, 40
  semi-structured data, 35–36
sub-prime housing, 60
The Success Equation: Untangling Skill and Luck in Business, Sports, and Investing (Mauboussin), 194
Summly, machine learning and, 89
Super Bowl, 53
Surface Web, 65–66
Sutter, Paul, 142
Sybase, 112–113

T
Tableau software, 85
tagging
  photos, 36–37
  smartphones, 38
TALC (Technology Adoption Life Cycle), 23
Taleb, Nassim Nicholas, Black Swan, 60–61
Tapscott, Don, Wikinomics: How Mass Collaboration Changes Everything, 152
“Tech Is Too Cheap to Meter” (Anderson), 14
technology
  costs, Big Data revolution and, 14–15
  TALC (Technology Adoption Life Cycle), 23
  ubiquitous, 63
text analytics, 95–97
That Used to Be Us: How America Fell Behind in the World It Invented and How We Can Come Back (Friedman and Mandelbaum), 213
time series analysis, 87–88
tools, disclaimers, 112–113
training for Big Data, 168–169
transactional data, 31
translational research, 52


Tucci, Joseph, 2–3
Twitter
  Big Data statistics, 99
  machine learning and, 89
Tyreman, Gary, 134

U
unstructured data
  ratio to structured, 40
  Web 2.0, 35

V
variety (Big Data), 50
VDP (Vibrant Data Project), 203–204
velocity (Big Data), 50
venture capitalists, Big Data and, 132–133
VisiCalc, 30
vision, 171–172
visualization, 84–86
  heat maps, 86–87
  Tableau software, 85
  time series analysis, 87–88
  Visually software, 85
Visually software, 85
VoltDB, 124–125
volume (Big Data), 50

W
Web 2.0, 33–35
  unstructured data, 35
WebVan, 60
Where Good Ideas Come From: The Natural History of Innovation (Johnson), 201
White, Tom, Hadoop: The Definitive Guide, 115
The Wide Lens: A New Strategy for Innovation (Adner), 202
Wikinomics: How Mass Collaboration Changes Everything (Tapscott and Williams), 152
wikis, 33
Williams, Anthony D., Wikinomics: How Mass Collaboration Changes Everything, 152

X–Y–Z
XML (eXtensible Markup Language), 35
zettabytes, 11
Zuckerberg, Mark, 19–20
