

Review

Mine the Gap: Augmenting Foresight Methodologies with Data Analytics

World Futures Review
© The Author(s) 2020
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/1946756720905639
journals.sagepub.com/home/wfr

Anne Boysen1

Abstract The explosion of Big Data and analytic tools in recent years has brought new opportunities to the field of foresight. Big Data and improved analytics capabilities can expand the knowledge base and act as a corrective to our cognitive biases. Moreover, several data mining and machine learning techniques that increase performance for businesses can be applied in foresight to help researchers discover patterns that may be early signals of change and correct our misperception of patterns where they don’t exist. This article discusses the opportunities and limitations of various data mining and machine learning techniques in foresight.

Keywords foresight, future studies, data analytics, machine learning, artificial intelligence, methodology

1 University of Houston, TX, USA

Corresponding Author: Anne Boysen, University of Houston, 4800 Calhoun Rd, Houston, TX 77004, USA. Email: [email protected]

Without data you’re just another person with an opinion.
—W. Edwards Deming

The explosion of Big Data and analytic tools in recent years has brought new opportunities to the field of Foresight. Information gathering and processing that once took weeks and months can now be accomplished in much shorter time and with fewer resources. With the increased data access and analytics capabilities comes not only speed and accuracy, but also better opportunities to study data directly without interpreting intermediaries, such as journalists, publishers, and research institutions. Walls that once existed between foresight professionals and raw data are crumbling, or at least becoming more penetrable, as both the access to and the analytic capabilities of Big Data become ever more available.

While text mining tools that automate environmental scanning are gaining more attention, little has been written about applying statistics, data mining, and machine learning techniques to discover novel patterns directly in primary data. While this article discusses applications of analytics of both primary and secondary sources, it will specifically make a case for the former.

Can Big Data and Analytics Help Fight Cognitive Bias?

The centrality of data and empirical deduction has waxed and waned in philosophy and academic research. More than half a century ago Karl Popper popularized the Hypothetico-Deductive method, which has become widely adopted in social sciences as a means to fight positivistic assumptions or theories without corroborating evidence. In his seminal work The Logic of Scientific Discovery, Popper (2002, 18–24) warns that “it must be possible for an empirical scientific system to be refuted by experience” and that “a subjective experience, or a feeling of conviction, can never justify a scientific statement.” Later schools of thought have posited that there were more solid barriers between the researcher and the objective truth than merely collecting and analyzing data. Social privilege, language, and culture influence the interpretation of reality and how we perceive and emphasize the data we have access to.

Discourses in Foresight have followed a similar trajectory, alternating between empirical deduction and forecasting and more critical approaches that question metanarratives and values (Inayatullah 2009). After all, data from the future on which to make falsifying statements do not exist. The various courses of events a futurist must consider will often depart from current reality in both kind and magnitude, making empirical data from the present less useful when envisioning alternative futures. Questioning common default assumptions is often seen as a more viable approach than predicting the future (Dator 2009).

If our default assumptions color our selection of data sources, it might seem as if data analytic approaches fail to correct cognitive bias. After all, we can decide to include some sources and not others. But while the selection of data sources is left to human judgment, we cannot necessarily infer causal relationships within the data. Unlike authored information often curated by personalized algorithms, raw data lack narrative and incentives to focus on some elements at the expense of others. It is reasonable to assume that we are less likely to fall for cognitive bias in our overall analysis if we can better prevent personal interests from influencing our analysis.

In a time when “post truth” has entered our dictionary (“Word of the Year” 2016) and anti-scientific factions weaponize warped readings of post-modernism to advance relativistic agendas, efforts to rebuild intersubjective consensus around verifiable facts might be more important than ever (Kuntz 2012; London School of Economics and Political Science 2017). The subjective pitfalls Popper warned about in the mid-twentieth century are still valid, and a strong foundation in data can help us build a solid base of empirical evidence, one that rests on empirically deduced logic rather than subjectively induced assumptions (“Confirmation Bias and the Power of Disconfirming Evidence” 2017). An empirical approach could be our best defense against biases amplified by social influence and echo chambers (Mounk 2018).

Discovering interconnection, correlations, and causal effects in the systems we want to understand helps us understand current dynamics or locate early warnings of change. If we want to contemplate future state t₂ or t₃, we should first obtain fine-tuned information about the current state t₀. While comprehensive data analysis may have fewer direct applications for fleshing out complete scenarios, our assumptions around data, causal connections, and change should be informed by rigorous data gathering and analysis.¹ If we employ analytics customized to the type of insights we want to unearth, we reduce the chances that our research is biased or misaligned with our research objectives. By doing our own primary data analysis, we can ensure a more future-oriented problem focus and also reduce the chance of sponsor bias, which can be hard to detect when we rely on secondary reports and desk research (Sarniak 2015).

Of course, a mere focus on data is not enough to remove bias from research, since the interpretation of objective data is bound by the epistemologies of the researcher. Thus, an emphasis on comprehensive data approaches should not be used to trivialize critical discourses around ontology and paradigms. For example, Inayatullah does not argue that critical analysis of the metaphors that surround data should in any case sacrifice the data collection effort. In fact, he suggests that the Causal Layered Analysis (CLA) framework should not be used at the expense of data orientation (Inayatullah 2014). It might therefore be more appropriate to view data-centric and more post-structural approaches as synergistic rather than as two epistemologies where one precludes the other. Rather, data should be seen in the context of meanings, and meanings in the context of available data. Since machine-analyzed data are intrinsically more value-neutral and comprehensive, we should expect a more robust foundation for the critical analysis they enable. A student in my Data Analytics class in the Foresight program at the University of Houston obtained granular knowledge around what makes a person likely to consider opportunities in the emerging gig economy. Building a model that considered a complex combination of demographic and attitudinal variables, she was able to sketch out the combination of traits of people who will be more or less likely to thrive in the gig economy. This knowledge would be useful to scenario exercises focused on the future of work, job automation, or industrial reorganization. In its absence, our scenarios might be informed by anecdotal case examples or loosely applicable secondary findings.

Environmental Scanning through Direct Observation in Big Data

Futurists strive for attributes somewhat similar to those sought by data scientists in their information quests. Keeping an eye on external forces of sociocultural, technological, economic, environmental, or political nature, futurists often adopt the STEEP framework, which allows for the detection of early change in the macro-environment (Bishop and Hines 2012). And while futurists consider criteria such as credibility, novelty, likelihood, impact, and relevance of their sources (Bishop 2009), the four Vs of Big Data—volume, velocity, veracity, and variety—are essential attributes of machine learning pipelines (Dea 2015). The main difference between futurists’ and data scientists’ data gathering methods is that while the former method is inductive, limited, and prone to availability bias, a data scientist can amass and analyze humanly impossible amounts of data with less interference of subjective bias.

Essential to the environmental scanning process is identifying inputs which, through a process of analysis, interpretation, and prospection, render an outcome that can be used in strategy formation (Voros 2003). Early in the information gathering process a foresight practitioner must identify early signals and trends. Conway (2006) distinguishes between trend spotting and trend analysis, pointing out that analysis considers the existing themes and patterns in society. Merriam-Webster defines “trend” as a prevailing tendency or inclination, a general movement. While a trend reflects current events that, based on the frequency of its mention in blogs, Twitter, news stories, and so on, can be gauged as either increasing or decreasing, an emerging issue is a latent issue that has not yet reached mainstream attention. Richard Lum writes that if a trend is a

historical change up until the present, then an emerging issue is a possible new technology, a potential public policy issue, or a new concept or idea that, while perhaps fringe thinking today, could mature and develop into a critical mainstream issue in the future or become a major trend in its own right. (Lum 2016)

Dator (2018, 7) describes emerging issues as the “far left tail of the ‘S’ curve of growth, barely visible, and just beginning to pop into view.” Hiltunen (2010, 15) writes that weak signals are information about emerging issues and potential future changes. Since weak signals and emerging issues often signify a potential disruption rather than a continuation of a trend, we cannot necessarily rely on continuous data from the past.

Unlike trends, which we can quantify via observable frequency metrics, weak signals and emerging issues are more vulnerable to selection bias on the part of the researcher as well as the medium reporting the signal. These biases are even more difficult to harness as search engines and news aggregators increasingly personalize our information. Foresight researchers must confront not only their own cognitive biases, but also the algorithms that are optimized to exploit these biases. Paradoxically, the more immersed in data and algorithms we become, the more critical it is that we skillfully deploy carefully chosen algorithms to navigate our information journey. In a comprehensive literature review, Mühlroth and Grottke (2018) analyzed fifty foresight articles that used a combination of expert and computer-driven scanning for weak signals. The authors found that queries that were led by experts were more prone to human bias than those that were fully computer-driven. While recognizing that human experts are more effective in later stages such as strategic decision making and implementation, they conclude that data mining and automated search strategies are required in the early stages of the corporate foresight process to greatly reduce human actor bias.

While trend analysis and forecasting techniques have long traditions in foresight, the study of data mining techniques for weak signal identification has a shorter tradition. Most data mining approaches currently used in corporate foresight seem to revolve mostly around searching, scraping, and summarizing written text. The data entities are therefore the coded ideas rather than the phenomena themselves. Hiltunen (2008) found that most futurists find weak signals from sources such as politicians, domain experts, journalists and other futurists, and direct observation of ordinary people. With the exception of the last category, these are sources that merely express the ideas of change rather than the change itself. Without dismissing the importance of these reports of change, we risk adopting biases that are embedded into the mediation of these reports. Big Data analytics, on the other hand, makes direct observation of early signals more feasible. One example is when businesses learn about new customer behaviors from data mining their sales data (Attewell and Monaghan 2015; Linoff and Berry 2011). In this context, scanning for weak signals might have more in common with precursor analysis, where subcultures, jurisdictions, or early adopters may indicate later change in larger, more slow-moving systems (Molitor 2017). By discovering smaller changes in limited datasets, it might be possible to infer or deduce more impactful changes.

User-Generated Content and Unstructured Data

Which types of data mining methods we choose largely depends on what type of data we use. We can distinguish different types of data based on measurement level and whether the data are structured or unstructured. While structured data analysis has decades-long roots in business intelligence, many businesses still struggle to make sense of their unstructured data. With a ratio of unstructured to structured data of 8:2 (Das and Mohan Kumar 2013), this leaves a lot of potential insights in the dark. Although jurisdictions surrounding the data’s origin restrict access and use, unstructured data are more often publicly available than structured and preprocessed datasets. There seems to be an inverse relationship between data access and data cleanliness. While online user behavior such as URL visits, click-through rates, and web cursor activity easily converts into machine-readable formats, access to such insight is regulated and typically restricted to internet service providers, search engines, and website owners. However, vast sources of user data are available in the form of blogs, discussion boards, social media (Kayser and Blind 2017), and even aggregate-level search history (Stephens-Davidowitz 2017). This type of data typically requires preprocessing and is analyzed using natural language processing (NLP) and natural language understanding techniques.

Interestingly, while AI (artificial intelligence)-driven foresight aggregators such as Shaping Tomorrow’s Athena make sophisticated use of unstructured data, foresight professionals seem less inclined to make use of existing datasets and resources found in their clients’ or employers’ relational databases.² Straightforward statistical or machine learning models can be applied to public and proprietary datasets for valuable insights. Why this is not more common could be due to the common problem with data silos and corporate cultures where foresight professionals rarely cross paths with analytics departments. For subcontracting consultants there may also be confidentiality boundaries that prevent the foresight professional’s access to proprietary data. As outside foresight professionals, it might be useful to discuss data policies with clients before embarking on new projects. Futurists working internally in organizations should make connections across departments to gain access to data and insights used for purposes other than foresight.

Unsupervised Learning Detects Signals—and Noise

While text mining and NLP methods that parse and analyze volumes of text will continue to be a core element of AI-driven foresight tools, machine learning methods offer untapped potential for pattern recognition. Since these methods use classification and clustering techniques, the foresight professional will gain more robust scientific support for their findings. By quantifying potentially interesting observations or searching for new patterns in existing datasets, we can find weak signals that indicate change. In other words, machine learning fills the gap between text-based data mining and traditional statistical methods.

Let us, for example, consider shifting consumer behavior. We might have a hunch that two customer groups, which otherwise have little else in common, suddenly show similar behaviors or attitudes. This could indicate a shift in underlying values in these two groups and affect how we should view stakeholders in our alternative futures. Both structured and unstructured data can help us discern interesting signals. Using text mining to scan the behaviors of these target groups in social media, we might find word frequencies, combinations, and sentiments around relevant mentions. If we have access to demographic data, geolocation, or other metadata we might be able to find approximate correlations harboring the weak signal of a new emerging issue. However, when these data are not accessible or require too much preparation to be practical, we can gain new insights by pulling together various existing databases or sometimes survey data to explore for patterns that may spawn a new trend. Machine learning techniques such as distance-based optimization or cluster models allow us to view signals in multivariate vector spaces that are too complex for the human mind to handle intuitively. Hence, instead of using human heuristics to infer relationships, we can view signals contextually using statistically deduced models. Not only does this add scientific support to the observed pattern, it also helps us avoid the temptation of inferring patterns where none exists, a phenomenon known as apophenia.

Companies sometimes accidentally discover weak signals in their existing data while conducting more conventional analytics exercises (Harryson et al. 2014). Given the irregular nature of weak signals, identification lends itself to computational anomaly detection. In larger datasets, irregular data that could be signals of change are usually paired with other attributes, which makes it possible to learn more about the wider context in which the irregularities occur. By applying rules to detect associations between attributes, we can study emerging new trends by proxy of these more frequently occurring associated attributes.

Supervised Methods and Predictive Analytics

Often when we think of AI we think of predictive analytics. “Prediction” is sometimes seen as a contentious word among futurists because it suggests that the future can be predicted, an assumption many futurists reject. Unfortunately, the term is not often explained precisely in mainstream media. In machine learning, predictive analytics is mostly understood as a classification model that can predict certain outcomes on a target variable based on the correlations it finds in labeled training data. The meaning of the word “prediction” differs both in kind and in magnitude from the way the word is commonly used in foresight. Few data scientists would claim predictive powers for large complex systems, especially where the interconnections are unknown. Moreover, predictions are assigned specific probability values, and few would argue that their models have absolute predictive power. Fit statistics are computed to alert the researcher if they might have run into an overfit model rather than a model that fits external reality (Attewell and Monaghan 2015, 32–35). Embedded into these models are techniques that quantitatively find levels of accuracy, precision, and recall, which tell us how good the model is at identifying occurrences and also predicting their likelihood (Koehrsen 2018). Nor are “black swans” likely to be discovered by predictive analytics (Taleb 2007), since these models look for patterns in past data.³ It is also important to keep in mind that correlation does not always indicate causation (Pearl and Mackenzie 2018). Predictive analytics informs us what is likely to happen, but not necessarily why.

Predictive models have proven effective in identifying new customers, victims of crime, patients with certain health risks, and other use cases where we need to do individual predictions for each observation. Since actionable microlevel insights are a common goal when using predictive models, they might be found to be of less direct relevance to futurists searching for change in larger systems.

However, predictive analytics may have some direct application for pattern recognition and some indirect applications which will be addressed later. Decision trees, for example, render informative visualizations that can build a deep understanding of hidden connections and illustrate connections intuitively to a larger audience. With this application of predictive analytics, we consider not only the individual outcomes, but also the connectivity between each data point. Decision trees in this context can be used as an alternative to other multivariate approaches. And while parametric multivariate approaches that are used to find central tendencies usually require numeric interval data, decision trees can handle categorical data (Gunluk et al. 2019).

Artificial Neural Networks (ANNs) and New Types of Data

The most sophisticated predictive models are neural networks, especially what we know as deep learning. ANNs mimic the highly complex nonlinear structure of neurons in the human brain. An input layer, such as an image, feeds information that triggers the firing of “neurons” connected in various hidden layers until the output more or less resembles the original (Boysen 2019). ANNs are a type of predictive analytics capable of handling data that are of a different type and level of resolution than the aforementioned methods. While lower level data analytic methods limit the datapoints to those that are perceptible to human reasoning, ANNs can handle data rendered in different formats. ANNs can analyze elements that are not necessarily of direct relevance to futurists, such as hues and saturation in an image. This is especially true for neural nets with several hidden layers, also called deep neural networks. One way to think about this is the different ways a computer can understand a digit. If the meaning of a digit is encoded into standard character format, such as UTF-8 or ASCII, the subsequent analysis will be straightforward. However, if the digit is handwritten, such as in the Modified National Institute of Standards and Technology database (MNIST), deep layers in a neural network are used to find edges, colors, saturation, and so on necessary for the computer to understand that it is in fact a digit and not something else (LeCunn et al. n.d.). Since ANNs predict based on attributes and not logical reasoning, they sometimes make nonsensical mistakes such as failing to see the difference between dogs and muffins (Yao 2017). While such misclassifications are often humorous, they can sometimes lead to insensitive errors. The autotagging feature of the photo sharing site Flickr ran into a faux pas when it suggested a photo from the Dachau concentration camp be tagged as “jungle gym” (Auerback 2018).

The “synaptic” reasoning inside the network is often so complex that we cannot determine the reason why the network makes a particular connection. Since we don’t know the inner connections between the input and output layers, these methods have been called blackbox algorithms. Since futurists have a greater need to understand the reasoning behind the connections and not only predict individual outcomes, the value of neural networks that use deep learning might be in analyzing a greater variety of data sources. While of limited importance today, as more nontraditional data become available to futurists, deep learning can be useful to process a more diverse pool of data such as image or audio data to look for trends and weak signals.

Are Machines Biased?

Using data analytics to mitigate human bias is not helpful if the data our machines train on have adopted these biases.⁴ In fact, without ethical goals and adequate feature engineering, we risk amplifying rather than eliminating the cognitive biases we first set out to combat.

In her book Weapons of Math Destruction, Cathy O’Neil (2016) points to examples where unprivileged loan applicants have been denied mortgages, where schoolteachers get punished for schools’ past performances, and where poor people are microtargeted with predatory schemes culminating from their metadata, such as zip codes and home value. In O’Neil’s cases, injustices embedded in historic data are not only exacerbated but also given a veil of false machine-driven objectivity. Social bias has been found in NLP, where deep learning algorithms draw word associations when trained on corpora that reflect cultural stereotypes (Bolukbasi et al. 2016). Recently, Amazon had to scrap an AI recruiting tool after it trained on human resources data which demonstrated that women were less likely to get technical jobs in the company (Lavanchy 2018).

However, arguing that algorithmic methods are inherently biased implies that the machines themselves and not the humans feeding them are the sources of this bias, and this is clearly not the case. Several efforts at debiasing are already happening, and computers are being trained to disambiguate unintended word associations. Removing bias from algorithms can be as simple as removing a few input variables or more complex, such as neutralizing word embeddings. Algorithms learn meanings and associations by finding the cosine distance between co-occurring terms in word vector spaces, so by deliberately assigning equidistance between stereotypically loaded words and their offensive similes we can greatly reduce these otherwise unfortunate associations. Big tech companies are rolling out initiatives to prevent machine-originated prejudice via programs such as Facebook’s Fairness Flow, IBM’s AI Fairness 360, and TensorFlow’s What-If tool, which helps analysts find how a model responds to a single feature, overcoming the problem with blackbox algorithms mentioned earlier (Wiggers 2018, 2019). Google proactively applies sophisticated methods to ensure that search results are not only narrowly relevant to queries but also strive to meet ethical standards (Webster et al. 2018).

But there can also be value in studying the outcome of biased algorithmic output for diagnostic reasons. By holding up a “mirror,” machine learning models can help organizations learn from deep-rooted, sometimes unconscious biases that affect organizational cultures. Before ditching their AI recruiting system, Amazon hiring managers were able to learn about the hidden biases and language influences that had penalized female candidates in their hiring processes. They learned that successful resumes often contained words like “executed” and “captured,” which exuded confidence. These words were more frequently expressed by male candidates (Logg 2019). When training data are cleansed of bias, algorithms are inherently more transparent and reliable in their predictive outcomes than humans.

Machine learning projects that can result in algorithmic bias are in most instances very different from the type of models, data types, and objectives most futurists will encounter in their analytic endeavor. Hence, when futurists control not only the sources of data to be processed but also the interpretation of the results, chances of social bias that could skew the analysis are almost nonexistent.

A Digital Wild West, the New Oil Barons, and Privacy Protection

Data-driven foresight can only be as good as the data we have access to. As the saying goes, “garbage in, garbage out.” Ironically, while over 2.5 quintillion bytes of data are created every single day, the majority of it is still inaccessible to most people (Ahmad 2018). Moreover, the information discrepancy between those who own and those who don’t own data is intensifying. One example is data generated by connected devices. When once dumb gadgets and appliances become smarter, they don’t only simplify our lives and help us make economical and efficient choices. Whole supply chains use these data to monitor our domestic habits and behaviors. Such data could be treasure troves for futurists who want to learn about changes in people’s lived environment. However, little of these data will likely become accessible to the public anytime soon. One reason is obviously to protect users’ privacy. But even when the data are anonymized, companies that use them to maximize profit for themselves have few incentives to share them.

A leader article in the Economist (“The World’s Most Valuable Resource Is No Longer Oil, But Data” 2017) urges governments to open up some of their data vaults while recognizing the data economy as public infrastructure. India’s digital identity system, Aadhaar, is one such example. Other available data sources are provided by companies that submit datasets to public repositories such as Kaggle to create competitions where companies can draw benefit from the collective knowledge of data scientists. Yet there are still no widely available repositories, so the onus is on the individual analyst not only to extract, clean, and transform raw data into usable formats, but also to understand privacy regulations around the data they want to use for analytical purposes. It is worth mentioning that while futurists can make use of the same tools that allow businesses to microtarget individuals, our research is intrinsically disinterested in personally identifiable information. This is because our objective is measuring change, not pitching individuals with product offerings. Futurists may therefore find use in anonymized data that are of less value for businesses whose objective is direct customer interaction.

Conclusion

Human intelligence has evolved over millions of years, machine intelligence for less than a century. Machine intelligence is not superior to that of humans. Data analytic methods are valuable because of how they complement, not replace, human cognition. The same features that make our brain so efficient at deriving meaning are also the qualities that distract us from being thorough and impartial analysts. We can multitask and make sense out of very limited data, but in doing so we make shortcuts and put creative spins which can distort our analysis. We don’t always know where our objective analysis ends and our subjective interpretations start. Data mining and machine learning methods offer the impartiality and analytical capability that humans lack. While full objectivity remains a tenuous goal, data deductive approaches early in the foresight process can help ensure that we base our interpretations on a robust body of unbiased insights. However, while providing more comprehensive data and analytic capabilities, AI is not likely to replace the human capacity to ponder various future outcomes. Instead, it will ensure that we can assess the current situation with a quantitative measurement of precision and accuracy we would not be able to find using human approaches alone.

We can approach the future not with certitude, but with enough granular insights about what happens today to make meaningful and provocative scenarios about the future. We apply data analytics in ways that let us discover big trends and nascent change, and gain a good understanding of likely cause and effect in closed systems so that we can anticipate what might happen when these systems are affected by autonomous forces in open systems. Data mining helps us do the dirty work of understanding relationships where they exist, but it’s up to us to find implications and build narratives on these relationships.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.
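As a supplementary illustration of the word-embedding discussion under “Are Machines Biased?”, the following minimal Python sketch shows one way a word vector can be made equidistant from two stereotypically loaded terms. It assumes the “neutralize” idea described by Bolukbasi et al. (2016): the component of a word vector that lies along a bias direction is removed. The three-dimensional vectors and the helper names (`unit`, `neutralize`, `cosine_similarity`) are hypothetical and invented for this example; they do not come from a trained model or from the article. The sketch reports cosine similarity, which is one minus the cosine distance mentioned in the text.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def unit(a):
    """Scale a vector to length 1."""
    length = norm(a)
    return [x / length for x in a]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 minus cosine distance)."""
    return dot(a, b) / (norm(a) * norm(b))


def neutralize(word, direction):
    """Remove the component of `word` that lies along the unit vector `direction`."""
    scale = dot(word, direction)
    return [w - scale * d for w, d in zip(word, direction)]


# Hypothetical toy embeddings, invented for illustration only.
he = unit([0.9, 0.1, 0.3])
she = unit([-0.8, 0.2, 0.4])
engineer = [0.5, 0.7, 0.2]

# Assumed bias direction: the normalized difference of the two unit vectors.
bias_direction = unit([h - s for h, s in zip(he, she)])
engineer_neutral = neutralize(engineer, bias_direction)

gap_before = cosine_similarity(engineer, he) - cosine_similarity(engineer, she)
gap_after = (cosine_similarity(engineer_neutral, he)
             - cosine_similarity(engineer_neutral, she))

print(f"similarity gap before neutralizing: {gap_before:.3f}")
print(f"similarity gap after neutralizing:  {gap_after:.3f}")
```

Because both reference vectors are scaled to unit length, removing the component along their difference leaves the neutralized vector with the same cosine similarity to each of them. Real debiasing pipelines work on high-dimensional embeddings and estimate the bias direction from many word pairs, which this toy example does not attempt.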

ORCID iD

Anne Boysen https://orcid.org/0000-0001-7673-5589

Notes

1. While traditional hypothesis testing uses significance testing to prevent type 1 and type 2 errors, Big Data analytics and machine learning often use different types of validation techniques. This is useful because of the diminishing return of p values in large datasets and the ability to test new information against training data.
2. This presumption is based on the absence of such mentions in common foresight discussions. Customer databases may be more frequently used in corporate foresight without being a central topic, which could be due to client agreements and confidentiality constraints.
3. To the extent black swans can be discovered in Big Data, the types of anomaly detection mentioned earlier are likely to be more productive than predictive modeling.
4. It’s important to appreciate that the word bias usually has a different meaning in machine learning. Bias is a parameter in artificial neural networks which has the same function as the intercept in a linear equation. In this article, however, bias refers to social bias.

References

Ahmad, I. 2018. “How Much Data Is Generated Every Minute? [Infographic].” Social Media Today, June 15. https://www.socialmediatoday.com/news/how-much-data-is-generated-every-minute-infographic-1/525692/.
Attewell, P. A., and D. Monaghan. 2015. Data Mining for the Social Sciences: An Introduction. Oakland: University of California Press.
Auerbach, D. 2018. Bitwise: A Life in Code. New York: Pantheon Books.
Bishop, P. 2009. “Horizon Scanning: Why Is It So Hard?” http://law.uh.edu/faculty/thester/courses/Emerging%20Tech%202011/Horizon%20Scanning.pdf.
Bishop, P., and A. Hines. 2012. Teaching about the Future. New York: Palgrave Macmillan.
Bolukbasi, T., K. Chang, J. Zou, V. Saligrama, and A. Kalai. 2016. “Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings.” Neural Information Processing Systems Foundation. https://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf.
Boysen, A. 2019. “How Data Science Enhances Foresight.” Futurist.com by Glen Hiemstra, February 26. https://www.futurist.com/2019/02/26/how-data-science-enhances-foresight-by-anne-boysen/.
“Confirmation Bias and the Power of Disconfirming Evidence.” 2017. Farnam Street. https://fs.blog/2017/05/confirmation-bias/.
Conway, M. 2006. “An Overview of Foresight Methodologies.” http://www.forschungsnetzwerk.at/downloadpub/An-Overview-of-Foresight-Methodologies1.pdf.
Das, T. K., and P. Mohan Kumar. 2013. “Big Data Analytics: A Framework for Unstructured Data Analysis.” International Journal of Engineering and Technology 5 (1): 153–56.
Dator, J. 2009. “Alternative Futures at the Manoa School.” Journal of Futures Studies 14 (2): 1–18.
Dator, J. 2018. “Emerging Issues Analysis: Because of Graham Molitor.” World Futures Review 10 (1): 5–10.
Dea, J. 2015. “Do You Know the 4 V’s of Big Data?” INTELEX Blog, July 16. https://blog.intelex.com/2015/07/16/do-you-know-the-4-vs-of-big-data/.
Gunluk, O., J. Kalagnanam, M. Li, M. Menickelly, and K. Scheinberg. 2019. “Optimal Decision Trees for Categorical Data via Integer Programming.” http://www.optimization-online.org/DB_FILE/2018/01/6404.pdf.
Harryson, M., E. Métayer, and H. Sarrazin. 2014. “The Strength of ‘Weak Signals.’” McKinsey Quarterly, February. https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/the-strength-of-weak-signals.
Hiltunen, E. 2008. “Good Sources for Weak Signals: A Global Study of Where Futurists Look for Weak Signals.” Journal of Futures Studies 12 (4): 21–44.
Hiltunen, E. 2010. Weak Signals in Organizational Futures Learning. Aalto University School of Economics: Aalto Print. http://epub.lib.aalto.fi/pdf/diss/a365.pdf.
Inayatullah, S. 2009. “Causal Layered Analysis: An Integrative and Transformative Theory and Method.” In Futures Research Methodology, Version 3.0, edited by J. Glenn and T. Gordon. Washington, DC: The Millennium Project.
Inayatullah, S. 2014. “Causal Layered Analysis (CLA) Defined (2014).” Metafuture. https://www.metafuture.org/causal-layered-analysis-cla-defined-2014/.
Kayser, V., and K. Blind. 2017. “Extending the Knowledge Base of Foresight: The Contribution of Text Mining.” Technological Forecasting & Social Change 116:208–15.
Koehrsen, W. 2018. “Beyond Accuracy: Precision and Recall.” Towards Data Science. https://towardsdatascience.com/beyond-accuracy-precision-and-recall-3da06bea9f6c.
Kuntz, M. 2012. “The Postmodern Assault on Science: If All Truths Are Equal, Who Cares What Science Has to Say?” EMBO Reports 13:885–89. doi:10.1038/embor.2012.130.
Lavanchy, M. 2018. “Amazon’s Sexist Hiring Algorithm Could Still Be Better than a Human.” Phys.org, November 1. https://phys.org/news/2018-11-amazon-sexist-hiring-algorithm-human.html.
LeCun, Y., C. Cortes, and C. Burges. n.d. “The MNIST Database of Handwritten Digits.” http://yann.lecun.com/exdb/mnist/.
Linoff, G., and M. J. A. Berry. 2011. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. Indianapolis: John Wiley.
Logg, J. M. 2019. “Using Algorithms to Understand the Biases in Your Organization.” Harvard Business Review, August 9. https://hbr.org/2019/08/using-algorithms-to-understand-the-biases-in-your-organization.
London School of Economics and Political Science. 2017. “Is Post-Modernism to Blame for Our Post-Truth World?” [Audio Podcast], Public Lectures and Events, October 2. http://www.lse.ac.uk/lse-player?id=3892.
Lum, R. 2016. “Trends vs. Emerging Issues: What Is the Difference?” Visionforesightstrategy.wordpress.com, April 3. https://visionforesightstrategy.wordpress.com/2016/04/03/trends-vs-emerging-issues-what-is-the-difference/.
Molitor, G. T. T. 2017. “The Molitor Model of Change.” World Futures Review 10 (1): 13–21. doi:10.1177/1946756717747636.
Mounk, Y. 2018. “What an Audacious Hoax Reveals about Academia.” The Atlantic, October 5. https://www.theatlantic.com/ideas/archive/2018/10/new-sokal-hoax/572212/.
Mühlroth, C., and M. Grottke. 2018. “A Systematic Literature Review of Mining Weak Signals and Trends for Corporate Foresight.” Journal of Business Economics 88 (5): 643–87.
O’Neil, C. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York: Crown.
Pearl, J., and D. Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. New York: Basic Books.
Popper, K. R. 2002. The Logic of Scientific Discovery. New York: Routledge.
Sarniak, R. 2015. “9 Types of Research Bias and How to Avoid Them.” https://www.quirks.com/articles/9-types-of-research-bias-and-how-to-avoid-them.
Stephens-Davidowitz, S. 2017. Everybody Lies: What the Internet Can Tell Us about Who We Really Are. New York: HarperCollins Publishers.
Taleb, N. N. 2007. The Black Swan: The Impact of the Highly Improbable. New York: Random House.
“The World’s Most Valuable Resource Is No Longer Oil, But Data.” 2017. The Economist, May 6. https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data.
Voros, J. 2003. “A Generic Foresight Process Framework.” Foresight 5 (3): 10–21.
Webster, K., V. Axelrod, J. Baldridge, and M. Recasens. 2018. “Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns.” Transactions of the Association for Computational Linguistics 6:605–17. https://arxiv.org/pdf/1810.05201.pdf.
Wiggers, K. 2018. “Google’s What-If Tool for TensorBoard Helps Users Visualize AI Bias.” https://venturebeat.com/2018/09/11/googles-what-if-tool-for-tensorboard-lets-users-visualize-ai-bias/.
Wiggers, K. 2019. “MIT CSAIL Researchers Propose Automated Method for Debiasing AI Algorithms.” https://venturebeat.com/2019/01/26/mit-csail-researchers-propose-automated-method-for-debiasing-ai-algorithms/.
“Word of the Year.” 2016. Oxford Languages. https://languages.oup.com/word-of-the-year/word-of-the-year-2016.
Yao, M. 2017. “Chihuahua or Muffin? Searching for the Best Computer Vision API.” TOPBOTS, September 22. https://www.topbots.com/chihuahua-muffin-searching-best-computer-vision-api/.

Author Biography

Anne Boysen holds a Master’s degree in Foresight from the University of Houston and a Graduate Certificate in Business Analytics from Penn State University. She has worked as an analyst in corporate settings, as an independent data analyst, and as a foresight consultant in collaboration with other futurists, enriching foresight projects with data analytics and insights. Ms. Boysen teaches data mining for the graduate program in Foresight at the University of Houston.