Projet Pensées Profondes

December 18, 2014

École Normale Supérieure de Lyon

Raphaël Charrondière, Marc Chevalier, Quentin Cormier, Tom Cornebize, Yassine Hamoudi, Thomas Pellissier Tanon, Valentin Lorentz

Adviser: Eddy Caron

A need for answers

Who was the wife of Louis Pasteur?

Existing tools

∙ WolframAlpha: Marie Pasteur
∙ Google: page of Marie Pasteur
∙ Bing: Wikipedia page of Louis Pasteur
∙ Yahoo: answers.com page containing this question

These tools are closed source, and they fail on some questions.

PPP introduces Platypus

http://askplatyp.us/

Demo time!

Overview: Architecture

∙ Core, connected to the Web User Interface and to the Logging backend
∙ Question Parsing modules: Grammatical, Machine Learning Standalone, Machine Learning Reformulation
∙ Other modules: Spell-checker, Computer Algebra, OEIS
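The architecture places a core between the interface and the modules. As a rough illustration of that mediating role, here is a minimal Python sketch. It assumes, hypothetically, that every module is a plain function from a request string to a list of response strings, and that the core re-submits new responses for a fixed number of passes so that one module's output (a spell-checked question) can feed another module (a parser). The names run_core, spell_checker and toy_parser, and the pass count, are illustrative only and are not the project's API.

```python
from typing import Callable, List

# Hypothetical module signature: a request string in, zero or more responses out.
Module = Callable[[str], List[str]]


def run_core(question: str, modules: List[Module], passes: int = 2) -> List[str]:
    """Toy core loop: send each pending request to every module and
    re-submit the new responses during the next pass."""
    seen = {question}
    pending = [question]
    responses: List[str] = []
    for _ in range(passes):
        produced: List[str] = []
        for request in pending:
            for module in modules:
                for response in module(request):
                    if response not in seen:  # skip outputs already produced
                        seen.add(response)
                        produced.append(response)
        responses.extend(produced)
        pending = produced
    return responses


def spell_checker(request: str) -> List[str]:
    """Toy spell-checker that knows a single correction."""
    return [request.replace("Pasteru", "Pasteur")] if "Pasteru" in request else []


def toy_parser(request: str) -> List[str]:
    """Toy question parser that knows a single question."""
    if request == "Who was the wife of Louis Pasteur?":
        return ["(Louis Pasteur, wife, ?)"]
    return []


if __name__ == "__main__":
    print(run_core("Who was the wife of Louis Pasteru?", [spell_checker, toy_parser]))
```

With a misspelled question, the parser only succeeds on the second pass, once the spell-checker's correction has been fed back through the core.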

Datamodel

How can a question asked in natural language be represented formally? With a tree structure.

Datamodel – Leaves: resources

∙ Strings: “Isaac Newton”
∙ Dates: 25 December 1642
∙ Geographic coordinates: 47° 30’ 18” N; 9° 44’ 57” E
∙ …

Datamodel – Nodes: operations

∙ Full triples: (Isaac Newton, birth date, 25 December 1642 (Julian)) → true
∙ Missing: (?, birth date, 25 December 1642 (Julian)) → [Isaac Newton]
∙ Union, Intersection, Difference, …
∙ And, Or, Not
∙ Exists
∙ Sort, First, Last
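To make these semantics concrete, here is a small Python sketch that evaluates a few of the listed operations against a toy in-memory knowledge base of (subject, predicate, object) strings, not the real back-end. The class names Resource, Missing, Triple, Exists and Intersection mirror the operations above but are hypothetical, and only a subset of the operations is covered. A full triple evaluates to a boolean, a triple with a missing element evaluates to the list of values that fill the gap, and a nested triple is evaluated first so that its list can serve as a subject.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Set, Tuple, Union

Fact = Tuple[str, str, str]


@dataclass(frozen=True)
class Resource:
    value: str


@dataclass(frozen=True)
class Missing:
    """The '?' of a triple."""


@dataclass(frozen=True)
class Triple:
    subject: Node
    predicate: Node
    object: Node


@dataclass(frozen=True)
class Exists:
    node: Node


@dataclass(frozen=True)
class Intersection:
    left: Node
    right: Node


Node = Union[Resource, Missing, Triple, Exists, Intersection]

# Toy knowledge base built only from facts quoted on the slides (as of 2014).
KB: Set[Fact] = {
    ("Isaac Newton", "birth date", "25 December 1642 (Julian)"),
    ("United Kingdom", "prime minister", "David Cameron"),
    ("David Cameron", "residence", "10 Downing Street"),
}


def evaluate(node: Node, kb: Set[Fact]) -> Union[bool, List[str]]:
    """Return a bool for yes/no nodes and a list of values otherwise."""
    if isinstance(node, Resource):
        return [node.value]
    if isinstance(node, Exists):
        return len(evaluate(node.node, kb)) > 0
    if isinstance(node, Intersection):
        right = evaluate(node.right, kb)
        return [v for v in evaluate(node.left, kb) if v in right]
    if isinstance(node, Triple):
        # None marks a missing position; every other position is a list of
        # candidate values (a nested triple contributes the list it evaluates to).
        parts = [None if isinstance(n, Missing) else evaluate(n, kb)
                 for n in (node.subject, node.predicate, node.object)]
        matches = [fact for fact in sorted(kb)
                   if all(c is None or v in c for v, c in zip(fact, parts))]
        if None not in parts:
            return bool(matches)  # full triple: true or false
        gap = parts.index(None)  # position of the (first) missing element
        return list(dict.fromkeys(fact[gap] for fact in matches))
    raise TypeError(f"unsupported node: {node!r}")


if __name__ == "__main__":
    date = Resource("25 December 1642 (Julian)")
    print(evaluate(Triple(Resource("Isaac Newton"), Resource("birth date"), date), KB))  # True
    print(evaluate(Triple(Missing(), Resource("birth date"), date), KB))  # ['Isaac Newton']
```

Returning plain lists keeps the sketch short; Sort, First and the other listed operations would need an ordering key and are left out.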

Datamodel – Examples

Each triple is written (subject, predicate, object), and ? marks the missing element.

“Is Brussels the capital of Belgium and the European Union?”
(Belgium, capital, Brussels) and (European Union, capital, Brussels)

“Who is the mayor of the capital of Kreis Bergstraße?”
((Kreis Bergstraße, capital, ?), mayor, ?)

“What is the birth date of the first president of Germany?”
(First(Sort((Germany, president, ?), mandate)), birth date, ?)

“Is there a medical treatment for Ebola?”
Exists((Ebola, medical treatment, ?))

“Who are the children of François Mitterrand and Anne Pingeot?”
(François Mitterrand, child, ?) and (Anne Pingeot, child, ?)

Question parsing

“Where does the prime minister of the United Kingdom live?”

Map the input question to the most relevant normal form. For example, the nested normal form ((United Kingdom, prime minister, ?), residence, ?) is more relevant than the flat one (prime minister of the United Kingdom, residence, ?).

Dependency tree output by the Stanford Parser

ROOT
  root → live
    nsubj → minister
      det → the
      amod → prime
      prep_of → Kingdom
        det → the
        nn → United
    aux → does
    advmod → Where

Identify question word

The advmod dependent “Where” is identified as the question word and detached from the tree:

ROOT
  root → live
    nsubj → minister
      det → the
      amod → prime
      prep_of → Kingdom
        det → the
        nn → United
    aux → does

Merging

Merge along the nn, amod, … edges: “United” and “Kingdom” have the same named-entity tag (LOCATION) and become “United Kingdom”; “prime” and “minister” become “prime minister”.

ROOT
  root → live
    nsubj → prime minister
      det → the
      prep_of → United Kingdom
        det → the
    aux → does

Removal

Remove the det, aux, … edges:

ROOT
  root → live
    nsubj → prime minister
      prep_of → United Kingdom

Nounification

Nounify the verb “live” into “residence”:

ROOT
  root → residence
    nsubj → prime minister
      prep_of → United Kingdom

Normalization

Normalize is applied recursively from the root. The root “residence” and its nsubj subtree give (Normalize(nsubj subtree), residence, ?); then “prime minister” and its prep_of subtree give (United Kingdom, prime minister, ?). The resulting normal form is:

((United Kingdom, prime minister, ?), residence, ?)
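The five steps above can be sketched in Python on a hand-built copy of the dependency tree. This is a minimal illustration that assumes a simple node type (word, named-entity tag, labelled children) and does not call the Stanford Parser. The rules are reduced to what this one example needs: merging folds amod leaves, and nn leaves that share the head's named-entity tag, into their head; removal drops det and aux; nounification uses a hypothetical two-entry table; and normalization turns a head with an nsubj or prep_of dependent into (dependent, head, ?). It is not the Grammatical module's actual rule set.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

QUESTION_WORDS = {"where", "when", "who", "what", "which", "how"}
NOUNIFY = {"live": "residence", "born": "birth date"}  # hypothetical table
REMOVED_DEPS = {"det", "aux"}


@dataclass
class Node:
    word: str
    ner: Optional[str] = None  # named-entity tag, e.g. "LOCATION"
    children: List[Tuple[str, Node]] = field(default_factory=list)


def identify_question_word(node: Node) -> Optional[str]:
    """Detach an advmod question word such as 'Where' and return it."""
    for i, (dep, child) in enumerate(node.children):
        if dep == "advmod" and child.word.lower() in QUESTION_WORDS:
            del node.children[i]
            return child.word
    return None


def merge(node: Node) -> None:
    """Fold leaf modifiers into their head: amod always, nn only with the same NE tag."""
    for _, child in node.children:
        merge(child)
    prefix, kept = [], []
    for dep, child in node.children:
        mergeable = dep == "amod" or (dep == "nn" and child.ner == node.ner)
        if mergeable and not child.children:
            prefix.append(child.word)
        else:
            kept.append((dep, child))
    node.word = " ".join(prefix + [node.word])
    node.children = kept


def remove(node: Node) -> None:
    """Drop det and aux dependents together with their subtrees."""
    node.children = [(d, c) for d, c in node.children if d not in REMOVED_DEPS]
    for _, child in node.children:
        remove(child)


def normalize(node: Node) -> Union[str, tuple]:
    """A head with an nsubj or prep_of dependent becomes (dependent, head, '?')."""
    for dep, child in node.children:
        if dep in ("nsubj", "prep_of"):
            return (normalize(child), node.word, "?")
    return node.word


def simplify(root: Node) -> Tuple[Optional[str], Union[str, tuple]]:
    question_word = identify_question_word(root)
    merge(root)
    remove(root)
    root.word = NOUNIFY.get(root.word, root.word)  # nounify the root verb
    return question_word, normalize(root)


def example_tree() -> Node:
    """Fresh copy of the dependency tree shown above (below ROOT)."""
    kingdom = Node("Kingdom", "LOCATION",
                   [("det", Node("the")), ("nn", Node("United", "LOCATION"))])
    minister = Node("minister", children=[("det", Node("the")),
                                          ("amod", Node("prime")),
                                          ("prep_of", kingdom)])
    return Node("live", children=[("nsubj", minister), ("aux", Node("does")),
                                  ("advmod", Node("Where"))])


if __name__ == "__main__":
    print(simplify(example_tree()))
    # ('Where', (('United Kingdom', 'prime minister', '?'), 'residence', '?'))
```

The steps mutate the tree in place, which is why example_tree builds a fresh copy for every run.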

Reformulation: a translation module

“When was the president of the United States born?”

In this example, the module rewrites a tree whose outer predicate is “president” by swapping the “president” and “birth date” nodes, which yields the normal form ((United States, president, ?), birth date, ?).

Operation

Projection: trees ↦ vector space. The dictionary associates a vector triple with a vector.

Projection

myrequest = compact(anothersubject, anotherpredicate, anotherobject), where anothersubject = compact(subject, predicate, object): nested triples are compacted bottom-up into a single vector.

From vector to tree

uncompact splits the root vector into subject, predicate and object vectors; each of them is then mapped to the nearest word: wordsubject, wordpredicate, wordobject.
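The last step, from vectors back to words, is a nearest-neighbour search. Below is a minimal Python sketch of that step alone. It assumes a made-up three-dimensional dictionary and Euclidean distance as the meaning of “nearest”; the module's real vectors and its learned compact and uncompact operations are not reproduced, and the vector given to “?” is purely a hypothetical device for recovering the missing element.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, ...]

# Hypothetical 3-d dictionary; "?" gets a vector only so that the missing
# element of a triple can be recovered like any other word.
DICTIONARY: Dict[str, Vector] = {
    "president": (0.9, 0.1, 0.0),
    "birth date": (0.0, 0.9, 0.2),
    "United States": (0.1, 0.0, 0.9),
    "?": (0.0, 0.0, 0.0),
}


def nearest_word(vector: Sequence[float],
                 dictionary: Dict[str, Vector] = DICTIONARY) -> str:
    """Return the word whose vector has the smallest Euclidean distance to `vector`."""
    return min(dictionary, key=lambda word: math.dist(dictionary[word], vector))


def vectors_to_triple(subject: Sequence[float], predicate: Sequence[float],
                      obj: Sequence[float]) -> Tuple[str, str, str]:
    """Map the three vectors produced by uncompact back to a word triple."""
    return (nearest_word(subject), nearest_word(predicate), nearest_word(obj))


if __name__ == "__main__":
    print(vectors_to_triple((0.2, 0.1, 0.8), (0.8, 0.2, 0.1), (0.05, 0.0, 0.1)))
    # ('United States', 'president', '?')
```

A linear scan is enough for a toy dictionary; a vocabulary of realistic size would call for an index structure instead.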

Keywords questions

“United States president”

“president United States”

Goal of the module

Classify each word into one of four categories:

{to ignore, subject, predicate, object}

Example of classification

(United, subject) (States, subject) (president, predicate)

(president, predicate) (United, subject) (States, subject)
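The classification itself is the learned part of the module. Assuming its output is given as (word, category) pairs like the ones above, a minimal Python sketch of one way to turn them into a triple follows. The grouping rule is an illustrative assumption, not the module's documented behaviour: “to ignore” words are dropped, words of the same category are joined in their original order, and an empty position becomes ?.

```python
from typing import Iterable, Tuple

CATEGORIES = ("subject", "predicate", "object")


def keywords_to_triple(classified: Iterable[Tuple[str, str]]) -> Tuple[str, ...]:
    """Build a (subject, predicate, object) triple from (word, category) pairs.

    Words labelled 'to ignore' are dropped, words of the same category are
    joined in their original order, and an empty position becomes '?'.
    """
    groups = {category: [] for category in CATEGORIES}
    for word, category in classified:
        if category in groups:
            groups[category].append(word)
    return tuple(" ".join(groups[c]) or "?" for c in CATEGORIES)


if __name__ == "__main__":
    print(keywords_to_triple([("United", "subject"), ("States", "subject"),
                              ("president", "predicate")]))
    # ('United States', 'president', '?')
```

Both keyword orders from the example produce the same triple, which is the point of classifying words instead of relying on their position.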

Look-up table

Word $w \mapsto$ vector $V_w \in \mathbb{R}^{25}$

If two words $w_1$ and $w_2$ are synonymous, then $\lVert V_{w_2} - V_{w_1} \rVert_2 \approx 0$.

Back-end

Wikidata (http://wikidata.org):

∙ A free, structured version of Wikipedia
∙ 12 million entries
∙ Multilingual

Answer extraction: Question

Where does the prime minister of the United Kingdom live?

Answer extraction: Normal form

((United Kingdom, prime minister, ?), residence, ?)

Answer extraction: Module work

The inner triple is resolved first; “United Kingdom” maps to a list of candidate items:

([Q145, Q7887906] (United Kingdom), P6 (prime minister), ?) → Q192 (David Cameron)

Its result becomes the subject of the outer triple:

(Q192 (David Cameron), P551 (residence), ?) → Q169101 (10 Downing Street)

Final output: Q169101 (10 Downing Street)
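The module work above can be replayed offline. The following Python sketch hard-codes only the identifiers shown on the slides (Q7887906 is kept as a second candidate for “United Kingdom” but has no statements here) and resolves the nested tree bottom-up, taking the union of results over all subject candidates. The real module is written in PHP and queries Wikidata; the tables and function names here are hypothetical. search_url only builds the URL of a request to the Wikibase wbsearchentities API action, which a live version could use to find candidates. It does not send the request.

```python
from typing import Dict, List, Tuple, Union
from urllib.parse import urlencode

# Offline stand-ins restricted to the identifiers shown on the slides.
ITEM_CANDIDATES: Dict[str, List[str]] = {"United Kingdom": ["Q145", "Q7887906"]}
PROPERTIES: Dict[str, str] = {"prime minister": "P6", "residence": "P551"}
STATEMENTS: Dict[Tuple[str, str], List[str]] = {
    ("Q145", "P6"): ["Q192"],       # P6 is Wikidata's "head of government"
    ("Q192", "P551"): ["Q169101"],  # P551 is "residence"
}
LABELS: Dict[str, str] = {"Q192": "David Cameron", "Q169101": "10 Downing Street"}

Tree = Union[str, Tuple["Tree", str, str]]


def answer(tree: Tree) -> List[str]:
    """Resolve a (subject, predicate, '?') tree to a list of item IDs."""
    if isinstance(tree, str):
        return ITEM_CANDIDATES.get(tree, [])
    subject, predicate, _ = tree
    prop = PROPERTIES[predicate]
    results: List[str] = []
    for item in answer(subject):  # try every candidate item for the subject
        for value in STATEMENTS.get((item, prop), []):
            if value not in results:
                results.append(value)
    return results


def search_url(label: str, kind: str = "item") -> str:
    """URL of a wbsearchentities query that a live version could send."""
    params = {"action": "wbsearchentities", "search": label,
              "language": "en", "type": kind, "format": "json"}
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)


if __name__ == "__main__":
    ids = answer((("United Kingdom", "prime minister", "?"), "residence", "?"))
    print(ids, [LABELS[i] for i in ids])  # ['Q169101'] ['10 Downing Street']
```

Keeping every candidate and merging the results is what lets the ambiguous label “United Kingdom” still produce a single answer.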

Answer extraction: Final output

Computer Algebra System

Based on SymPy (math) and PLY (parsers).

Two syntaxes:

∙ Mathematica
∙ Intuitive syntax with permissive notations
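To show what the two front-ends must handle, here is a small standard-library Python sketch. It uses regular expressions as a stand-in for the module's PLY grammars, so it is not how the module actually parses input. It rewrites simple Mathematica calls (Sin[x]^2) and a few permissive notations (2x^2, 3(x+1)) into Python syntax. The resulting strings could then be passed to SymPy, for example with sympy.sympify. Known limitations of this sketch: Mathematica names are only lowercased, so ArcTan does not become SymPy's atan, and inputs such as 1e3 are misread as implicit products.

```python
import re

_FUNCTION_CALL = re.compile(r"\b([A-Z][A-Za-z]*)\[")


def mathematica_to_python(expr: str) -> str:
    """Rewrite simple Mathematica calls such as Sin[x]^2 into sin(x)**2."""
    s = _FUNCTION_CALL.sub(lambda m: m.group(1).lower() + "(", expr)
    return s.replace("]", ")").replace("^", "**")


def permissive_to_python(expr: str) -> str:
    """Rewrite a few permissive notations: '^' for powers and implicit products."""
    s = expr.replace("^", "**")
    s = re.sub(r"(\d)\s*([A-Za-z(])", r"\1*\2", s)  # 2x -> 2*x, 3(x+1) -> 3*(x+1)
    s = re.sub(r"\)\s*([A-Za-z0-9(])", r")*\1", s)  # (x+1)(x-1) -> (x+1)*(x-1)
    return s


if __name__ == "__main__":
    print(mathematica_to_python("Sin[x]^2 + Cos[x]^2"))  # sin(x)**2 + cos(x)**2
    print(permissive_to_python("2x^2 + 3(x+1)"))         # 2*x**2 + 3*(x+1)
```

A real grammar, such as the PLY-based one the slide mentions, avoids the ambiguities that make regex rewriting fragile.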

Let’s sum up: Platypus vs. WolframAlpha

Nested question

“Who is the wife of the president of the United States?”
∙ WolframAlpha: Barack Obama
∙ Platypus: Michelle Obama

“What are the birth dates of the daughters of the wife of the president of the United States?”
∙ WolframAlpha: Barack Obama
∙ Platypus: Saturday, July 4, 1998 & Sunday, June 10, 2001

Conjunction

“Who is the actor of Inception and Titanic?”
∙ WolframAlpha: all the actors of the two movies
∙ Platypus: Leonardo DiCaprio

Better database

Not answered by Wikidata:

∙ “How fast is the TGV?”
∙ “How wide is a tennis court?”

→ Improve Wikidata? → Use another database?

Better question parsing

Not parsed correctly:

∙ “What is the date of birth of Isaac Newton?”
∙ “In which band does Bono sing?”

→ Train the Stanford CoreNLP library? → Improve the algorithm of the Grammatical module? → Better datasets for the ML modules?

New modules

Ideas: cooking recipes, train timetables, HAL, programming language interpreter, weather forecasts, literature, cinema, music, translation, sport statistics and predictions.

Other ideas…

∙ Support for other languages (French…)
∙ Improve the user experience
∙ Advertise Platypus

Some facts

∙ 23 repositories & 2313 commits
∙ Repositories by language: 6 PHP (Wikidata libraries and module), 12 Python (other modules, core, and libraries), 1 C++ (ML-Reformulation), 1 Shell (deployment scripts), 1 LaTeX (this presentation and the report), 1 Markdown (the specification), 1 HTML/CSS/JavaScript (the Web User Interface)
∙ 26k lines of code (13k in PHP, 10k in Python)
∙ 4.3k lines of LaTeX & 2.9k lines of Markdown
∙ 6 accepted pull requests to libraries we use (corenlp-python, aspell-python, python-dateutil, Wikibase)

The PPP?

∙ A powerful, open-source framework
∙ Innovative question parsing algorithms
∙ A demo, Platypus, with general knowledge and math

Questions?

http://projetpp.github.io/
https://twitter.com/ProjetPP
https://github.com/ProjetPP