A*STAR AI Initiative

From Perception to Cognition: Towards Human-Understanding and Human-Centricity in AI

Kenneth Kwok, PhD
Principal Scientist, Institute of High Performance Computing
Programme Manager, A*STAR AI Initiative

Let there be AI…

… We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.

• Simulate every aspect of human learning and intelligence
• Programming Computers to Use a Language
• Self Improvement (Learn)
• Randomness and Creativity

A Short History of AI

Milestones (from the timeline figure, in chronological order):
• McCulloch and Pitts (1943)
• ENIAC (1945)
• EDVAC (1949)
• GPS (1957)
• Perceptron (1957)
• LISP (1958)
• ADALINE (1959)
• Logic based Q&A (1964)
• ELIZA (1965)
• Semantic Nets (1966)
• Minsky and Papert paper (1969)
• NLP at Stanford (1970)
• (1972)
• MYCIN (1974)
• Backpropagation (1974)
• Frames (1975)
• Convolutional NN (1979)
• PROSPECTOR (1979)
• LISP machines (1980)
• Commercial Expert Systems
• Hopfield Nets (1982)
• SOAR (1983)
• SVM (1983)
• NetTalk (1985)
• Return of NNs and Backpropagation
• Q‐learning (1989)
• LSTM (1997)
• IBM Deep Blue (1997)
• Robo‐Cup (1997)
• DARPA Grand Challenge (2005)
• DARPA Urban Challenge (2007)
• ImageNet (2009)
• IBM (2011)
• Apple (2011)
• DeepMind Atari Games (2015)
• DeepMind AlphaGo (2016)
• CMU Libratus (2017)

(Timeline axis: 1945 to 2020. Eras and themes marked on the figure: Dartmouth Conference (1956), Reasoning as Search, Logic, 1st AI Winter, Knowledge and Reasoning, Expert Systems, 2nd AI Winter, Neural Networks, Reinforcement Learning, Grand Challenges, Big Data / Computational Power.)

Recent Successes (State of the Art)

• Object recognition (ImageNet Challenge)
• Voice assistants (Siri, Alexa/Echo, Google Home)
• Machine translation (Google, Nuance)
• Go, Chess and Poker (IBM Deep Blue, DeepMind AlphaGo, Libratus)
• Autonomous vehicles (Google, Uber, nuTonomy)
• Trivia/Q&A (IBM Watson for Jeopardy)
• Medical/legal assistance (DeepMind, Watson)

A*STAR AI Capabilities

A*STAR Speech and Language

English, Chinese and Southeast Asian Languages

Spoken Dialogue System (components shown in the figure):
• Speech Recognition
• Language Understanding
• Dialogue Management
• Response Generation
• Speech Synthesis

Capabilities:
• Speech Recognition
• Machine Translation
• Speech Synthesis
• Speaker Recognition (Voice Biometrics)

Benchmarking of Capabilities

I²R’s English speech recognition solution won the 2015 ASpIRE (Automatic Speech Recognition in Reverberant Environments) Challenge, organized by IARPA (US), in which 169 teams from 32 countries took part.

Benchmarking of Capabilities

I²R’s engine performed more than 10% better than the acoustic feature extraction engines of Nuance and Google for Mandarin speech recognition. Benchmarking was conducted by a Japanese firm in several application scenarios, under both clean and noisy conditions.

A*STAR Computer Vision

• Image Segmentation
• Image / Object Classification
• Action, Activity Recognition

Virtual Radiologist for CT Images

Task: Nodule Segmentation

Approach: Deep Convolutional Neural Networks (DCNN) Using Human Organ Medical Images

Lung nodule detection from CT Images: Classification Accuracy 80%

Tumor Tissue Image Classification

Description

• Target: automatically classify a given tissue image into tumor or non‐tumor groups.
• SPIE paper: extract 43 features: colour + texture (Gabor & co‐occurrence matrix), then apply ELM/SVM to do classification.
• Classification result: ~91% accuracy by ELM and ~89% accuracy by SVM; in contrast, deep convolutional neural networks achieve ~96% accuracy.

Convolutional Neural Networks (layer sizes from the figure)

• In: 32x32
• Conv (3x3): 32x32, depth 16, weight 3x3x3x16
• Pool (2x2): 16x16, depth 16
• Conv (3x3): 16x16, depth 32, weight 3x3x16x32
• Pool (2x2): 8x8, depth 32
• FC: depth 128, weight 8x8x32x128
• Out: depth 2

Method / Approach

• Deep convolutional neural networks: 1 input layer + 2 convolutional layers + 2 max‐pooling layers + 1 fully connected layer + 1 output layer
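The layer stack above can be checked with a little shape bookkeeping. The sketch below is illustrative only: it assumes an RGB input (depth 3), "same" padding for the 3x3 convolutions, 2x2 max-pooling with stride 2, and no bias terms. None of these are stated on the slide, although they are consistent with the sizes and weight shapes shown in the figure. The function names are hypothetical.

```python
# Shape and weight bookkeeping for the small CNN described on the slide.
# Assumptions (not stated in the source): RGB input, 'same' padding for
# the 3x3 convolutions, stride-2 2x2 max-pooling, no bias terms.

def conv_same(h, w, c_in, c_out, k=3):
    """k x k 'same' convolution: keeps h x w, changes depth to c_out."""
    return (h, w, c_out), k * k * c_in * c_out

def max_pool(h, w, c, k=2):
    """k x k max-pooling with stride k: halves h and w when k=2, no weights."""
    return (h // k, w // k, c), 0

def dense(h, w, c, units):
    """Fully connected layer over the flattened h x w x c volume."""
    return (1, 1, units), h * w * c * units

def trace_network(input_shape=(32, 32, 3)):
    """Return a list of (layer name, output shape, number of weights)."""
    steps = [
        ("conv1", conv_same, {"c_out": 16}),
        ("pool1", max_pool, {}),
        ("conv2", conv_same, {"c_out": 32}),
        ("pool2", max_pool, {}),
        ("fc", dense, {"units": 128}),
        ("out", dense, {"units": 2}),
    ]
    shape = input_shape
    trace = [("input", shape, 0)]
    for name, op, kwargs in steps:
        shape, n_weights = op(*shape, **kwargs)
        trace.append((name, shape, n_weights))
    return trace

if __name__ == "__main__":
    for name, shape, n_weights in trace_network():
        print(f"{name:6s} shape={shape} weights={n_weights}")
```

Under these assumptions the fully connected layer (8x8x32x128 weights) accounts for almost all of the parameters, which is typical for small networks of this shape.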

Expected Impact

• Significantly higher classification accuracy
• General approach applicable to other biomedical applications.

(Figure: input images classified as tumor or non‐tumor.)

Automation for Medical Imaging: Human Anatomy Recognition Using Multi-Modal Data and Deep Learning

Motivation

• A human operator has to configure imaging parameters for each diagnostic procedure (e.g. lung, heart). This is slow, and it is inaccurate if the patient moves or is covered by a blanket, even when the operator has the medical knowledge to estimate organ position from surface features.
• Non‐intrusive technology is needed (one that does not interfere with medical devices)

Approach

• Develop vision‐based sensing technology and algorithms to estimate patient position
• Low‐cost depth & thermal sensors make multi‐modal data more accessible for higher recognition accuracy
• Human skeleton dataset of 40K frames collected
• Poisson surface reconstruction, human detection using depth image, pose estimation & recognition, skeleton recognition
• Deep learning (CNN) to automate alignment during medical imaging

Achievement

• Ability to accurately predict feature points such as joints and internal organs, generalisable to other object detection applications
• Automate medical imaging and increase throughput
• 2 cm mean error of joint locations achieved, comparable with current gold standard of human operators
• Runs in real‐time on a laptop equipped with mid‐range GPU

Coating Surface Defect Inspection

‐ Need regular checking
‐ Time consuming and labor intensive
‐ Exposure to dangerous environment

Virtual Defect Inspection Engineer

Description

• Traditional methods: detect defect + feature extraction + pattern classification
• An integrated approach using deep learning
• High potential to reduce the time taken over traditional methods
• High potential to reduce the complexity of developing accurate models

Method / Approach

• Develop an automatic recognition system based on a deep neural network architecture.

(Figure: Coating Surface Image in, Deep Learning, Defect Patterns & Localization out.)

Performance

• 95% for defect classification
• 80% for detection of coating surface defect

A*STAR Data Analytics

• Machine Health Analytics
• Biomedical Informatics
• Consumer Analytics

Machine Health Monitoring

Time to change bearings?

‐ Requires regular checking and maintenance

‐ Time consuming and labour intensive

Virtual Machine Health Doctor: AI-based Predictive Maintenance

• Accurately predict the remaining useful life of a machine based on sensor data.
• Effectively reduce machine downtime.
• Effectively reduce labor cost on regular maintenance.

• Data Module: Raw Signals, Pre-processing
• Feature Module
  – Extraction: Time domain, Frequency domain, Time-frequency domain
  – Selection: Fisher’s ratio
• Modelling Module
  – Techniques: Neural Networks (NN), Support Vector Regression
• Evaluation Module
  – Metrics: Root-Mean Squared Error (RMSE), Mean Absolute Percentage Error, Precision (PR), Prognostics Horizon (PH), Confidence Interval (CI)
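Two of the evaluation metrics listed above, RMSE and Mean Absolute Percentage Error, are simple to state in code. The sketch below uses their standard textbook definitions and applies them to hypothetical remaining-useful-life (RUL) values. The numbers are made up for illustration and are not results from the project.

```python
import math

def rmse(actual, predicted):
    """Root-mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, so that case is rejected.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is 0")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

if __name__ == "__main__":
    # Hypothetical RUL values in hours (illustrative only).
    actual_rul = [100.0, 80.0, 50.0]
    predicted_rul = [90.0, 84.0, 55.0]
    print("RMSE:", rmse(actual_rul, predicted_rul))
    print("MAPE (%):", mape(actual_rul, predicted_rul))
```

RMSE is expressed in the same unit as the prediction (here, hours), while the percentage error is scale-free, which is one reason to report both.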

Virtual Data Scientist: Use AI to Automate Model Building

Virtual Scientist: Automated Model Building System

Input: Data

Stages (as laid out in the figure):
• Data Clean
• Algorithm Selection
• Feature Extraction and Engineering
• Hyper‐Parameter Tuning
• Validation
• Deployment

Key Features:
‐ Automatic Feature Extraction and Engineering
‐ Automatic Algorithm and Hyper‐Parameter Tuning
‐ Self‐adapting and learning for new dataset
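To make "automatic algorithm and hyper-parameter tuning" concrete, here is a minimal sketch of the idea: try every candidate algorithm with every hyper-parameter setting and keep the one that scores best on held-out validation data. It is not the A*STAR system. The two toy classifiers, the search grid, the single vibration feature and the data are all hypothetical, and exhaustive grid search stands in for whatever search strategy the real system uses.

```python
from itertools import product

def accuracy(model, data):
    """Fraction of (feature, label) pairs the model classifies correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

def threshold_model(train, threshold):
    """Predict faulty (1) when the feature exceeds a fixed threshold."""
    return lambda x: int(x > threshold)

def centroid_model(train, shift):
    """Split halfway between the two class means, moved by `shift`."""
    normal = [x for x, y in train if y == 0]
    faulty = [x for x, y in train if y == 1]
    boundary = (sum(normal) / len(normal) + sum(faulty) / len(faulty)) / 2 + shift
    return lambda x: int(x > boundary)

# Hypothetical search space: algorithm name -> (builder, hyper-parameter grid).
SEARCH_SPACE = {
    "threshold": (threshold_model, {"threshold": [0.5, 1.0, 1.5, 2.0]}),
    "centroid": (centroid_model, {"shift": [-0.2, 0.0, 0.2]}),
}

def auto_build(train, valid, search_space=SEARCH_SPACE):
    """Return (score, algorithm name, params, model) with the best validation accuracy."""
    best = None
    for name, (builder, grid) in search_space.items():
        keys = sorted(grid)
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            model = builder(train, **params)
            score = accuracy(model, valid)
            if best is None or score > best[0]:
                best = (score, name, params, model)
    return best

if __name__ == "__main__":
    # Made-up (vibration feature, label) pairs; label 1 means faulty.
    train = [(0.4, 0), (0.6, 0), (0.9, 0), (1.8, 1), (2.2, 1), (2.5, 1)]
    valid = [(0.5, 0), (1.1, 0), (1.4, 1), (2.0, 1)]
    score, name, params, _ = auto_build(train, valid)
    print(name, params, score)
```

A practical system would add cross-validation and a smarter search (for example random or Bayesian search), but the select-by-validation-score loop is the core of the idea.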

Manufacturing, Healthcare, Services & Digital Economy, Urban Systems

Working with the Embraer team on this automated approach for predictive maintenance under the A*STAR Aerospace Programme.

Preliminary Results on Bearing Fault Detection

Task: Faulty or Normal?

Manual Model Building (a lot of parameters to be manually adjusted)

Prediction Accuracy 0.65

Auto Model Building

Prediction Accuracy 0.85

1st Prize Rakuten-Viki Global TV Recommender Challenge

Motivation / Objectives

• To build a personalized TV recommender system for worldwide Rakuten‐Viki fans
• Recommend videos that a user is likely to watch (precision) and to watch for a long time (engagement)
• “Cold‐start” problem: 20+% of users do not appear in the training data
• Data sparsity problem: most users viewed <= 5 videos in the training data

Approach

• Typical recommendation algorithms do not work well here because of the sparsity and cold‐start problems
• Formulate the task as a classification problem instead of a typical recommendation problem: predict the probability that a user will watch a given video

(Figure: distribution of the number of videos viewed in the training data for users tested in Feb 2015 (left) and Mar 2015 (right).)

Achievement / Impact / Value Capture

• 1st Prize Winner
• Overcame the “cold‐start” and data sparsity problems
• Robust and scalable approach for online recommendations
• Flexible enough to incorporate other general features

(Figures: Leader Board Ranking; Performance Score over the Leader Board Journey.)

Gap between AI and True Intelligence

• Essentially pattern matching ~ Mostly PERCEPTION
• No UNDERSTANDING, largely BLACK BOX approaches

Where is Understanding?

Cognition: Human-Level AI

• A computer program capable of acting intelligently in the world must have a general representation of the world in terms of which its inputs are interpreted.
• Designing such a program requires commitments about what knowledge is and how it is obtained...
• More specifically, we want a computer program that decides what to do by inferring... a certain strategy will achieve its assigned goal. This requires formalising concepts of causality, ability, and knowledge.

JOHN McCARTHY 1927–2011

A*STAR Social Cognitive Computation

Individual, Groups, Communities

Social Collaborative Integrative Psychometrics & intelligence thinking & psychological decision science & technologies modelling Cognitive systems

Applications

• Behaviour motivations
• Consumer preferences
• Understand ground sentiments
• Enhance learning productivity
• Brand surveillance
• People and behaviour profiling
• Cognitive ability assessment
• Targeted marketing
• Optimized business transactions
• Person-product matching
• Strategic crowdsourcing
• Improve consumer satisfaction

Fine-Grained SentiMo API & SDK released to licensees

Design Features and Novelty

• Fine‐grained multi‐dimensional outputs (positive, negative, neutral, mixed, sadness, anger, happiness, excitement…)
• Comprehensive lexicons, fully in‐house developed (English, Internet slang, local language and domain word collections)
• Linguistic processing units (decomposer, negation handler, amplifier handler…)

Performance

(Figure: average F1‐score for positivity, negativity and neutrality recognition, on a 0.00 to 1.00 axis, comparing SentiMoAPI v1.0, ToolA and ToolB.)

http://172.20.98.207:8080/sentimo‐webportal/sentimo_api.html
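The design features above mention lexicons, a negation handler and an amplifier handler. A minimal sketch of how such pieces can fit together in a lexicon-based scorer follows. It is not the SentiMo implementation: the tiny lexicon, the word lists, the weights and the function names are all hypothetical, and the simple rule used here is that a negator or amplifier modifies the next sentiment word.

```python
# Toy lexicon-based sentiment scorer (illustrative, not SentiMo).
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATORS = {"not", "never", "no"}
AMPLIFIERS = {"very": 1.5, "extremely": 2.0}

def score_sentence(text):
    """Return (positive, negative) totals for a sentence."""
    for mark in ",.!?":
        text = text.replace(mark, " ")
    positive = negative = 0.0
    negate, boost = False, 1.0
    for token in text.lower().split():
        if token in NEGATORS:
            negate = True                      # flips the next sentiment word
        elif token in AMPLIFIERS:
            boost *= AMPLIFIERS[token]         # scales the next sentiment word
        elif token in LEXICON:
            value = LEXICON[token] * boost
            if negate:
                value = -value
            if value > 0:
                positive += value
            else:
                negative += -value
            negate, boost = False, 1.0         # modifiers are used up
    return positive, negative

def classify(text):
    """Map the totals to one of: positive, negative, neutral, mixed."""
    positive, negative = score_sentence(text)
    if positive == 0 and negative == 0:
        return "neutral"
    if positive > 0 and negative > 0:
        return "mixed"
    return "positive" if positive > 0 else "negative"

if __name__ == "__main__":
    for sentence in ["The train was not very good",
                     "Great service but terrible crowd",
                     "The bus arrived"]:
        print(sentence, "->", classify(sentence))
```

Keeping separate positive and negative totals, rather than a single net score, is what allows a "mixed" output alongside positive, negative and neutral.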

Our Real‐World Sentiment Analysis Case Studies

• Ground sensing of day‐to‐day commuter sentiments
• Discovering consumer preferences across products
• Understanding brand perceptions across cities
• Quantifying positivity generated from public campaigns

http://imageanalysis.socialanalyticsplus.net

“A method and system for sentiment classification and emotion classification”, Patent Cooperation Treaty (PCT) Application PCT/SG2015/050469

People Analytics

System and Platform

Application

Learning from Experience: Rapid Causal Learning

Build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems+

Experiment: Learning Causality from Experience

• Knowledge Level: Causal Learning. Relationship between Lightning and Thunder: Lightning reliably predicts Thunder
• Ground Level: Causal Learning (Temporal Correlation). Signals in the experiment: Lightning flash (L), Blinking beacon (B), Headlight flashes of vehicles (HG)

Causality from Temporal Correlation

Strength(Cause(Event1, Event2)) = Prob(Cause(Event1, Event2)) – Wt * Uncert(Cause(Event1, Event2))

Inspired by Contingency Model of Causal Learning from Psychology

+ See also: Building Machines that Learn and Think Like People (Lake, Ullman, Tenenbaum, and Gershman, 2016) in arXiv:1604.00289v3
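The strength formula above can be illustrated with a small sketch. The slide does not say how Prob and Uncert are estimated, so the choices here are assumptions made purely for illustration: Prob is the fraction of Event1 occurrences followed by Event2 within a time window, and Uncert is 1/sqrt(n) for n observed Event1 occurrences, so that uncertainty shrinks as evidence accumulates. The function name, the window and the weight are hypothetical.

```python
import math

def causal_strength(cause_times, effect_times, window, weight=0.5):
    """Strength = Prob - Wt * Uncert, under the assumptions in the text above.

    Prob   : fraction of cause events followed by an effect within `window`
    Uncert : 1 / sqrt(number of cause events)  (assumed form)
    """
    n = len(cause_times)
    if n == 0:
        return 0.0
    followed = sum(
        any(0 < e - c <= window for e in effect_times) for c in cause_times
    )
    prob = followed / n
    uncertainty = 1.0 / math.sqrt(n)
    return prob - weight * uncertainty

if __name__ == "__main__":
    # Made-up event times in seconds.
    lightning = [0, 10, 20, 30]
    thunder = [2, 12, 22, 32]
    beacon = list(range(1, 32, 3))   # blinks regularly, unrelated to thunder

    print("lightning -> thunder:", causal_strength(lightning, thunder, window=3))
    print("beacon    -> thunder:", causal_strength(beacon, thunder, window=3))
```

With this toy data the lightning flashes are always followed by thunder, so they score higher than the regularly blinking beacon, which only sometimes happens to precede it.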

Learning to play Atari Games in a human-like way

• Google DeepMind published a Nature paper+ on learning to play 50 Atari games using deep reinforcement learning
• DeepMind achieved human‐level scores in 50% of the games. However, learning took a long time (hours!) with many iterations, and the system had to learn from scratch for each game.

+ Mnih, V. et al. (2015) Human‐level control through deep reinforcement learning, Nature, 518, p. 529

That is NOT human performance

• We are currently building a system using our causal learning method to learn to play the same games in a human‐like way

Learning Object Interactions in Game Environment
• Properties of barriers, gaps, missiles, shields etc.
• Predicting trajectories
• Effects of a successful shot and of being shot etc.

Learning Behavioural Scripts
• Behaviours such as shooting, dodging shots, hiding, chasing

Learning Game Play
• Rules of specific games
• Adapt behaviours to game

Goals
• Human‐level scores
• Fast on‐line learning
• Transfer between games
• ...50+ games

Learning to play Space Invaders

Commonsense Knowledge Representation and Reasoning

Achievements:
• Codified a commonsense knowledge base (KB) using a semantic graph representation (figure: Semantic Graph using Neo4J)
• 3.4 million concepts in about 10 million relational assertions
• Sourced from KBs such as ConceptNet, YAGO and DBpedia, augmented by an 8‐billion‐word text corpus represented as word embeddings in vector space using Word2Vec
• Applied the KB in tasks such as topic categorisation, sentiment analysis and commonsense reasoning

Current Work:
• Develop representations for narrative knowledge (modelling temporally extended events) (figure: Representation for Events and Scripts)
• Reasoning about events for event monitoring and prediction
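A semantic graph of relational assertions, as described above, can be pictured as a set of (concept, relation, concept) edges. The sketch below is a tiny in-memory stand-in, not the Neo4J-based KB itself. The class and method names are hypothetical, the example assertions are made up, and the relation labels follow the ConceptNet naming style (IsA, UsedFor, Causes).

```python
from collections import defaultdict

class CommonsenseKB:
    """Minimal in-memory graph of (subject, relation, object) assertions."""

    def __init__(self):
        self._edges = defaultdict(list)   # subject -> [(relation, object)]

    def add(self, subject, relation, obj):
        """Store one relational assertion."""
        self._edges[subject].append((relation, obj))

    def related(self, subject, relation):
        """Objects linked to `subject` by `relation` (one hop)."""
        return [o for r, o in self._edges.get(subject, []) if r == relation]

    def is_a(self, concept, category):
        """Follow IsA edges transitively to test category membership."""
        stack, seen = [concept], set()
        while stack:
            node = stack.pop()
            if node == category:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.related(node, "IsA"))
        return False

if __name__ == "__main__":
    kb = CommonsenseKB()
    kb.add("dog", "IsA", "mammal")
    kb.add("mammal", "IsA", "animal")
    kb.add("umbrella", "UsedFor", "staying dry")
    kb.add("rain", "Causes", "wet ground")

    print(kb.is_a("dog", "animal"))          # reached via dog -> mammal -> animal
    print(kb.related("rain", "Causes"))
```

A graph database plays the same role at scale: the transitive IsA check here corresponds to a variable-length path query over the stored assertions.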

(Figure: Event Models for Event Monitoring)

A*STAR AI Initiative

Human-Centric AI Programme

(Figure: programme stack, top to bottom)
• Human‐Centric AI: Singaporean and Asian Culture
• Speech & Language, Social Cognitive Computing, Video & Image, Data Analytics
• Deep Learning / Machine Learning
• Good Old‐Fashioned AI (GOFAI)

Human‐Centric AI Research

AI that understands humans, reasons for humans and learns like humans.

• Knowledge of human needs/motivations, social/cultural norms, and commonsense
• Personalised and Explainable
• Instructable through real‐time instruction and demonstration, or able to learn from experience with a small number of examples

Specifically, human‐centric AI that understands Singaporean and Asian cultures.

Human‐Centric AI Research

(Figure: interaction between Human and AI. Labels: Learns like humans; Understands humans; Reasons for humans; Cognitive; Human‐like; Empathetic; Explainable; Explicit instructions; Implicit signals (socio‐cultural behaviors, commonsense, mental state); Machine‐learning; Explanations.)

Human-Centric AI Programme

(Figure: programme stack, top to bottom)
• Social‐Cultural Understanding, Visual Intelligence, Language and Expressions
• Human‐Centric AI: Singaporean and Asian Culture
• Speech & Language, Social Cognitive Computing, Video & Image, Data Analytics
• Deep Learning / Machine Learning
• Good Old‐Fashioned AI (GOFAI)

Towards Understanding Humans in Multi‐modal Content

“Understanding Humans” means
• Being able to extract and create representations of humans from multi‐modal data sources*
• In order to reason about
  – Human roles
  – Relationships with other Humans and Objects
  – Behaviours
  – Goals and intentions
  – Mental Models

* primarily images and videos

Example questions (figure captions):
• How are these people related? (Roles and Relationships)
• What are their intentions? (Goals and Intentions)
• What is the woman thinking? (Mental Models)
• What could happen next? (Behaviours)

From Perception to Cognition: Still a long journey...

• Knowledge – the next frontier
• Understanding Humans
  – Goals, Intentions
  – Motivations
  – Mental models
  – etc...
• “Explainability”

Thank you

Contact us

Kenneth Kwok A*STAR Initiative Programme Manager

[email protected]

Cheston Tan A*STAR Artificial Intelligence Initiative Programme Manager

[email protected]
