
Big Data and Real-Time Analytics for Agile Global Development

Robert Kirkpatrick
Director, UN Global Pulse

www.unglobalpulse.org
@unglobalpulse

Hyperconnectivity

• Airplane Flights
• Telephone Calls
• Internet Traffic
• Social Networking

Real-time isn’t just faster. It’s different.

ABOUT GLOBAL PULSE

BIG DATA IN REAL TIME: 3 OPPORTUNITIES

1. Real-time Events: Earlier detection of anomalies and events allows rapid response to crises.

2. Real-time Trends: A current analysis of population activities and dynamics supports more effective programme planning and implementation.

3. Real-time Evaluation: Real-time measurement of impact-related behavior change allows for rapid, adaptive course corrections in programmes.

AGILE GLOBAL DEVELOPMENT?

What Do We Mean by Real-Time?

Global Pulse Definition: “Information about a phenomenon available quickly enough to maintain an accurate reflection of its current state, such that effective action may be taken in response.”

Note: timeframe for effective response is relative to context:

• Malnutrition: Months

• Starvation: Weeks

• Cholera: Days

• Earthquake: Hours

GLOBAL PULSE CHALLENGES

BIG DATA IS A HUMAN RIGHTS ISSUE

• Never analyze personally identifiable information
• Never analyze confidential data
• Never seek to re-identify individuals

User-generated Data

• Falsification
• Sensor network distribution
• Perceptions vs. facts: Flu Trends detects influenza-like illness (ILI), not influenza.
• Sentiment Analysis: sarcasm, irony, hyperbole, humor, and the elusiveness of intent.
• Text mining: context and significance
• Expressed vs. actual intentions

Behavioral Data

• Selection bias: income, education, age, gender, technical aptitude, service provider
• Media coverage drives behavior change
• Correlation is not causality

KNOW YOUR DATA

Big Data is… just data. However…

• News organizations have developed verification methodologies
• Perceptual data is useful for detecting events
• False perceptions drive population behavior
• Selection bias can be an advantage: in developing countries, online inflation may precede offline inflation

THE PROBLEM WITH TELESCOPES… AND MACROSCOPES

I know the data is out there… somewhere.

Data Philanthropy?

Actually, we need a global real-time public/private data commons.

How to integrate real-time data into institutions?

• This data may be less accurate than official sources.

• But it’s faster.

• And it’s cheaper to collect.

• How can we leverage the speed to change the outcome?

USGS Twitter Earthquake Detector (TED)

Trending topic: are vaccines halal?

25 October 2012 | www.unglobalpulse.org

Online vaccination conversation breakdown

Examples

BACKGROUND: A GROWING BODY OF EVIDENCE

RESEARCH

How Mobile Phone Carriers See the World

Call Detail Records (CDRs)
• Caller ID (hashed phone #)
• Caller Tower Location
• Receiver ID (hashed phone #)
• Receiver Tower Location
• Call Start Time
• Call Duration

Airtime Expense Records
• Caller ID (hashed phone #)
• Caller Tower Location
• Amount of Purchase
• Time of Purchase
• Balance at Time of Purchase

Modeling Behaviors in Mobile Data

Consumption variables
• Number of calls, call duration, SMS/MMS/voice
• Size, frequency, total number of airtime purchases
• Handset Type and Features

Social variables
• Degree of the social network
• Weight of the contacts, frequency of communication

Mobility variables
• Diameter of mobility and social network
• Radius of gyration
• Mobility Patterns

Source: Telefonica Research, 2011

EXAMPLE: AIRTIME CREDIT PURCHASE DATA

SIZE AND FREQUENCY PREDICT HOUSEHOLD INCOME

[Figure: average size of purchase vs. average # of purchases / month, for higher and lower household income]

CALLING PATTERNS AND ECONOMIC OPPORTUNITY

[Figure: lower socioeconomic level vs. higher socioeconomic level]

MEN AND WOMEN USE THEIR PHONES DIFFERENTLY

Men:
• Fewer calls
• Shorter calls
• Larger social network
• More work-related calls

Women:
• More calls
• Longer calls
• Smaller social network
• More personal calls

Tracking population movement to predict cholera

Source: Linus Bengtsson et al., PLoS Medicine, 2011

A mobility index to evaluate H1N1 response in Mexico City

Source: Telefonica Research, 2011
See: http://www.unglobalpulse.org/publicpolicyandcellphonedata

TWITTER PREDICTS SPREAD OF INFLUENZA

r² = .958
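For reference, a minimal sketch of how a figure like this can be computed, assuming it denotes the squared Pearson correlation between a tweet-derived rate and an official surveillance rate. The function name `pearson_r` and the weekly values are made up for illustration; they are not data or code from the cited study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly values, for illustration only.
flu_tweet_rate = [1.0, 1.4, 2.1, 3.0, 2.6, 1.8]
official_ili_rate = [1.1, 1.3, 2.0, 3.2, 2.7, 1.7]

r = pearson_r(flu_tweet_rate, official_ili_rate)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

A value of r² close to 1 means the two series rise and fall together; it says nothing about which one drives the other.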

“You Are What You Tweet: Analyzing Twitter for Public Health.” M. J. Paul and M. Dredze, 2011.
http://www.cs.jhu.edu/%7Empaul/files/2011.icwsm.twitter_health.pdf

Rumi Chunara et al., American Journal of Tropical Medicine and Hygiene, 2012, 86:39-45

GOOGLE SEARCHES FOR SYMPTOMS PREDICT DENGUE

GLOBAL PULSE RESEARCH

2011 PROOF OF CONCEPT STUDIES
Online at: http://www.unglobalpulse.org/applyingbigdatatodevelopment

http://www.unglobalpulse.org/projects/can-social-media-mining-add-depth-unemployment-statistics
• Online Discussions & Unemployment: Ireland
• Online Discussions & Unemployment: United States

http://www.unglobalpulse.org/projects/twitter-and-perceptions-crisis-related-stress

Jakarta: nine million tweets per day

Map of Twitter usage in Jakarta – by Eric Fischer

Tweets per day about food, during Ramadan in Indonesia

Start of Ramadan

End of Ramadan

Tweets predict food basket inflation (rice, chilies, fish, sugar, corn, cooking oil)

Tweets about the price of rice (per month)

Official Food Price Inflation (monthly from 25 cities)

PULSE LABS
Joint Research | Rapid Prototyping | Capacity Building

Pulse Lab Network

Pulse Lab Jakarta: October 2012
Pulse Lab Kampala: June 2013
Other locations: ???

Pulse Lab New York Research
Measuring Change in Perception of HIV/AIDS

See: http://www.unglobalpulse.org/WorldAIDSDay-Part2

Pulse Lab New York Research
Mobile Networks as Drought Impact Sensors

Proposal
• Obtain 2011-2012 mobile CDRs and airtime purchases.
• Derive mobility, consumption, and social variables.
• Correlate variables with precipitation levels, survey data.
• Identify signatures of drought impacts in 2011.
• Identify signatures of aid impact in 2012.
• Develop and evaluate prototype during next drought.
• Release open source “appliance” through GSMA.
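As a rough illustration of the "derive mobility variables" step, the sketch below computes a radius of gyration per caller from CDR-like records. It assumes the CDR fields listed earlier in the deck and uses one common definition of radius of gyration (root-mean-square distance of visited locations from their centroid). The names `CallRecord`, `TOWER_KM` and `mobility_by_caller`, the tower coordinates on a flat kilometre grid, and the sample records are all hypothetical, not part of the proposal.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class CallRecord:
    caller_id: str       # hashed phone number
    caller_tower: str    # tower identifier
    receiver_id: str     # hashed phone number
    receiver_tower: str  # tower identifier
    start_time: str      # ISO 8601 timestamp
    duration_s: int      # call duration in seconds


def radius_of_gyration(points):
    """Root-mean-square distance of visited points from their centroid."""
    if not points:
        raise ValueError("at least one point is required")
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


# Hypothetical tower positions in km on a local flat grid (an assumption;
# real tower locations would be latitude/longitude pairs).
TOWER_KM = {"T1": (0.0, 0.0), "T2": (3.0, 4.0), "T3": (6.0, 8.0)}

# Hypothetical records, for illustration only.
records = [
    CallRecord("a1f3", "T1", "9c2e", "T2", "2011-06-01T08:00:00", 60),
    CallRecord("a1f3", "T3", "9c2e", "T2", "2011-06-01T18:30:00", 120),
    CallRecord("9c2e", "T2", "a1f3", "T1", "2011-06-02T09:15:00", 45),
]


def mobility_by_caller(records, tower_km):
    """Radius of gyration per caller, from the towers used to place calls."""
    visited = {}
    for rec in records:
        visited.setdefault(rec.caller_id, []).append(tower_km[rec.caller_tower])
    return {caller: radius_of_gyration(pts) for caller, pts in visited.items()}


print(mobility_by_caller(records, TOWER_KM))
```

A per-caller value like this could then be aggregated by region and month before being compared with precipitation or survey data.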

Research Tool 1
Crimson Hexagon: ForSight

Food Prices: What a real crisis looks like

• 23 July - 2 Aug: ‘tempeh’ and ‘tofu’ hot debate
• 14 - 21 Aug: Ramadhan / Idul Fitri

Comparing Crises

Tweets about food

• 18 Mar - 7 Apr: Fuel subsidy cut plan and protests against it
• 23 July - 2 Aug: ‘tempeh’ and ‘tofu’ hot debate during soybean shortage

Research Tool 2
SAS Social Media Analytics and SAS Text Miner

Analytic Workflow

1) Over 200,000 new Indonesian language documents per day
2) Extract conversations about rice, cooking oil, fuel, employment, etc.
3) Capture sentiment and mood for Bahasa
4) Detect location, price, availability, specific govt. programs, etc.
5) Explore results and correlate with official BPS statistics: Consumer Price Index (CPI) for 12 common foods

Pipeline stages:
• Internet Conversation
• Relevance Filter
• Sentiment, Mood & Influence
• Topic & Geography Categories
• Global Pulse Interactive Dashboard

Moods: anxious, confident, confused, hostile, sad, happy
Emoticons: (-”-) ;-) ((+_+)) :@ :( :)
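A toy stand-in for the topic filter and mood capture steps, to make the idea concrete. It is not how the SAS tools work: it uses plain keyword and emoticon matching. It assumes the six moods pair with the six emoticons in the order listed on the slide. The keyword lists, the function name `tag_post` and the sample posts are hypothetical.

```python
# Assumption: moods and emoticons on the slide are paired in list order.
MOOD_BY_EMOTICON = {
    "(-”-)": "anxious",
    ";-)": "confident",
    "((+_+))": "confused",
    ":@": "hostile",
    ":(": "sad",
    ":)": "happy",
}

# Hypothetical Indonesian keywords per topic, for illustration only.
TOPIC_KEYWORDS = {
    "rice": ["beras"],
    "cooking oil": ["minyak"],
    "eggs": ["telur"],
}


def tag_post(text):
    """Return the topics and moods found in one post by substring matching."""
    lowered = text.lower()
    topics = sorted(
        topic
        for topic, words in TOPIC_KEYWORDS.items()
        if any(word in lowered for word in words)
    )
    moods = sorted(
        {mood for emoticon, mood in MOOD_BY_EMOTICON.items() if emoticon in text}
    )
    return {"topics": topics, "moods": moods}


# Hypothetical posts: two about basic needs, one unrelated.
posts = [
    "Harga beras naik lagi :(",
    "Minyak murah hari ini :)",
    "Selamat pagi",
]

tagged = [tag_post(p) for p in posts]
relevant = [t for t in tagged if t["topics"]]  # keep only on-topic posts
print(relevant)
```

Substring matching misses sarcasm, irony and context, which is exactly the limitation the deck raises for sentiment analysis and text mining.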

The Signals Are Getting Stronger

è Big increase in volume of relevant conversations over 18 months

40000

35000 minyak (oil) ketahanan pangan (food security) 30000 budidaya (culvaon) 25000 telur (eggs) 20000

15000

10000

5000

0

Indonesians are increasingly using social media to discuss basic needs So Are the Temporal Correlations

è Listening to social conversations provides insight on official data

[Chart: Social Media Food Index and BAPPENAS Food Price Index, vertical axis -2.5 to 2.5]

2013 Research Agenda
1. Social media for social protection
2. Mobile vulnerability maps
3. Real-time food security – social media, PriceStats, Nokia Life and Nielsen
4. Twitter Dengue monitor
5. Predicting malaria risk through mobility
6. Passive polling for gender discrimination
7. Digital M&E Toolkit

ROBERT KIRKPATRICK
Director, UN Global Pulse
www.unglobalpulse.org
[email protected]
+1 (650) 796-5709

Image credit: Aaron Koblin, 24 hours of AT&T phone calls and Internet traffic flowing through New York City