Projet Pensées Profondes
December 18, 2014
École Normale Supérieure de Lyon
Raphaël Charrondière, Marc Chevalier, Quentin Cormier, Tom Cornebize, Yassine Hamoudi, Thomas Pellissier Tanon, Valentin Lorentz
Adviser: Eddy Caron
A need for answers
Who was the wife of Louis Pasteur?
Existing tools
∙ WolframAlpha: Marie Pasteur
∙ Google: Wikipedia page of Marie Pasteur
∙ Bing: Wikipedia page of Louis Pasteur
∙ Yahoo: answers.com page containing this question
These tools are closed-source, and they fail on some questions.
PPP introduces Platypus
http://askplatyp.us/
Demo time!
Overview: Architecture
∙ Web User Interface
∙ Core
∙ Logging backend
∙ Modules: Wikidata, Computer Algebra, OEIS, Spell-checker, other modules
∙ Question Parsing: Grammatical, Machine Learning Standalone, Machine Learning Reformulation
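The architecture slide only names the components. As a rough illustration of how they could fit together, the sketch below models each module as a plain Python function and the Core as a dispatcher that logs the question, asks every module, and ranks the candidate answers. The `Answer` type, its `score` field and the toy modules are hypothetical; how the actual Core communicates with its modules is not described here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Answer:
    text: str
    score: float  # hypothetical confidence score, higher is better


# Hypothetical module interface: question in, candidate answers out.
Module = Callable[[str], List[Answer]]


def core(question: str, modules: List[Module], log: List[str]) -> List[Answer]:
    """Log the question, ask every module, and return the answers best-first."""
    log.append(question)  # stand-in for the logging backend
    answers: List[Answer] = []
    for module in modules:
        answers.extend(module(question))
    return sorted(answers, key=lambda answer: answer.score, reverse=True)


def toy_knowledge_module(question: str) -> List[Answer]:
    # Toy stand-in for a knowledge-base module such as the Wikidata one.
    if "Louis Pasteur" in question:
        return [Answer("Marie Pasteur", 0.9)]
    return []


def toy_silent_module(question: str) -> List[Answer]:
    # A module with nothing to say about this question.
    return []


log: List[str] = []
answers = core("Who was the wife of Louis Pasteur?",
               [toy_knowledge_module, toy_silent_module], log)
print(answers[0].text)  # Marie Pasteur
```

Keeping modules behind one narrow interface means a new module can be added without changing the Core.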
Datamodel
How can a question asked in natural language be represented formally? As a tree.
Datamodel – Leaves: resources
∙ Strings: “Isaac Newton”
∙ Dates: 25 December 1642
∙ Geographic coordinates: 47° 30′ 18″ N, 9° 44′ 57″ E
∙ …
Datamodel – Nodes: operations
∙ Full triple: (Isaac Newton, birth date, 25 December 1642 (Julian)) → true
∙ Triple with a missing element: (?, birth date, 25 December 1642 (Julian)) → [Isaac Newton]
∙ Union, Intersection, Difference, …
∙ And, Or, Not
∙ Exists
∙ Sort, First, Last
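A minimal sketch of these trees as Python dataclasses, covering only the node types needed for the examples (resources, the missing marker, triples, intersection and existence). The class names, the `obj` field name and the `show` helper are hypothetical and do not reproduce the project's actual datamodel library.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """Leaf: a string, a date, geographic coordinates, ... (kept as text here)."""
    value: str


@dataclass(frozen=True)
class Missing:
    """Leaf standing for the unknown element "?" of a triple."""


@dataclass(frozen=True)
class Triple:
    subject: Node
    predicate: Node
    obj: Node  # "object" on the slides


@dataclass(frozen=True)
class Intersection:
    left: Node
    right: Node


@dataclass(frozen=True)
class Exists:
    child: Node


Node = Union[Resource, Missing, Triple, Intersection, Exists]


def show(node: Node) -> str:
    """Render a tree in the (subject, predicate, object) notation of the slides."""
    if isinstance(node, Resource):
        return node.value
    if isinstance(node, Missing):
        return "?"
    if isinstance(node, Triple):
        return f"({show(node.subject)}, {show(node.predicate)}, {show(node.obj)})"
    if isinstance(node, Intersection):
        return f"{show(node.left)} ∩ {show(node.right)}"
    if isinstance(node, Exists):
        return f"Exists({show(node.child)})"
    raise TypeError(f"unknown node: {node!r}")


newton = Triple(Missing(), Resource("birth date"), Resource("25 December 1642 (Julian)"))
print(show(newton))  # (?, birth date, 25 December 1642 (Julian))
```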
Datamodel – Examples
Triples are written (subject, predicate, object), and ? marks the missing element.
∙ “Is Brussels the capital of Belgium and the European Union?”
  (Belgium, capital, Brussels) ∧ (European Union, capital, Brussels)
∙ “Who is the mayor of the capital of Kreis Bergstraße?”
  ((Kreis Bergstraße, capital, ?), mayor, ?)
∙ “What is the birth date of the first president of Germany?”
  (First(Sort((Germany, president, ?), mandate)), birth date, ?): the list of presidents is sorted by mandate, and First keeps its first element.
∙ “Is there a medical treatment for Ebola?”
  Exists((Ebola, medical treatment, ?))
∙ “Who are the children of François Mitterrand and Anne Pingeot?”
  (François Mitterrand, child, ?) ∩ (Anne Pingeot, child, ?)
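To make the semantics of these trees concrete, here is a small evaluator over a hand-made set of facts. It uses nested tuples rather than dedicated classes. The tuple encoding, the function names and the `FACTS` set are assumptions made for this sketch. Only triples whose subject or object is missing are handled, and First/Sort are left out.

```python
from typing import Set, Tuple

# Hand-made toy knowledge base of (subject, predicate, object) facts.
FACTS: Set[Tuple[str, str, str]] = {
    ("Belgium", "capital", "Brussels"),
    ("European Union", "capital", "Brussels"),
    ("François Mitterrand", "child", "Jean-Christophe Mitterrand"),
    ("François Mitterrand", "child", "Mazarine Pingeot"),
    ("Anne Pingeot", "child", "Mazarine Pingeot"),
}

MISSING = ("missing",)


def res(value: str) -> tuple:
    return ("resource", value)


def triple(s: tuple, p: tuple, o: tuple) -> tuple:
    return ("triple", s, p, o)


def resources(tree: tuple) -> Set[str]:
    """Evaluate a node whose value is a list (here: a set) of resources."""
    kind = tree[0]
    if kind == "resource":
        return {tree[1]}
    if kind == "triple":
        _, s, p, o = tree
        if o == MISSING:
            subjects, predicates = resources(s), resources(p)
            return {c for (a, b, c) in FACTS if a in subjects and b in predicates}
        if s == MISSING:
            predicates, objects = resources(p), resources(o)
            return {a for (a, b, c) in FACTS if b in predicates and c in objects}
        raise ValueError("this sketch needs a missing subject or object")
    if kind == "intersection":
        return resources(tree[1]) & resources(tree[2])
    raise ValueError(f"not a resource-list node: {kind}")


def holds(tree: tuple) -> bool:
    """Evaluate a boolean node: a full triple, And, or Exists."""
    kind = tree[0]
    if kind == "triple" and MISSING not in tree[1:]:
        _, s, p, o = tree
        return any((a, b, c) in FACTS
                   for a in resources(s) for b in resources(p) for c in resources(o))
    if kind == "and":
        return holds(tree[1]) and holds(tree[2])
    if kind == "exists":
        return len(resources(tree[1])) > 0
    raise ValueError(f"not a boolean node: {kind}")


brussels = ("and",
            triple(res("Belgium"), res("capital"), res("Brussels")),
            triple(res("European Union"), res("capital"), res("Brussels")))
children = ("intersection",
            triple(res("François Mitterrand"), res("child"), MISSING),
            triple(res("Anne Pingeot"), res("child"), MISSING))

print(holds(brussels))      # True
print(resources(children))  # {'Mazarine Pingeot'}
```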
Question parsing
Map the input question to the most relevant normal form.
Where does the prime minister of the United Kingdom live?
→ (prime minister of the United Kingdom, residence, ?)
→ ((United Kingdom, prime minister, ?), residence, ?)
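Dependency tree output by the Stanford Parser
Each edge is written relation(head, dependent).
root(ROOT, live)
nsubj(live, minister)
aux(live, does)
advmod(live, Where)
det(minister, the)
amod(minister, prime)
prep_of(minister, Kingdom)
det(Kingdom, the)
nn(Kingdom, United)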
Identify question word
advmod(live, Where): “Where” is identified as the question word and removed from the tree, which leaves:
root(ROOT, live), nsubj(live, minister), aux(live, does), det(minister, the), amod(minister, prime), prep_of(minister, Kingdom), det(Kingdom, the), nn(Kingdom, United)
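The dependency tree can be written as a list of (relation, head, dependent) edges, and this step then amounts to finding the question word among the dependents and dropping its edge. The `QUESTION_WORDS` set is an assumption. Words are identified by their text only, whereas real parser output also carries word positions, which is why the two occurrences of “the” cannot be told apart here.

```python
from typing import List, Optional, Tuple

Edge = Tuple[str, str, str]  # (relation, head, dependent)

# Assumed list of question words; not taken from the module itself.
QUESTION_WORDS = {"who", "what", "when", "where", "which", "why", "how"}

TREE: List[Edge] = [
    ("root", "ROOT", "live"),
    ("nsubj", "live", "minister"),
    ("aux", "live", "does"),
    ("advmod", "live", "Where"),
    ("det", "minister", "the"),
    ("amod", "minister", "prime"),
    ("prep_of", "minister", "Kingdom"),
    ("det", "Kingdom", "the"),
    ("nn", "Kingdom", "United"),
]


def extract_question_word(edges: List[Edge]) -> Tuple[Optional[str], List[Edge]]:
    """Return the first question word found and the tree without its edge."""
    for edge in edges:
        dependent = edge[2]
        if dependent.lower() in QUESTION_WORDS:
            return dependent, [e for e in edges if e != edge]
    return None, edges


word, rest = extract_question_word(TREE)
print(word)       # Where
print(len(rest))  # 8
```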
Merging
Merge along nn, amod, … edges:
∙ nn(Kingdom, United): both words have the same named-entity tag (LOCATION), so they become “United Kingdom”
∙ amod(minister, prime): the words become “prime minister”
Resulting tree: root(ROOT, live), nsubj(live, prime minister), aux(live, does), det(prime minister, the), prep_of(prime minister, United Kingdom), det(United Kingdom, the)
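A sketch of the merging step on the edge list left once the question word is removed. It merges every amod edge, and an nn edge only when both words carry the same named-entity tag, as on the slide. The `NER` dictionary stands in for the tagger output and is an assumption, as is restricting the rule to these two relations (the slide's “…” is not expanded).

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (relation, head, dependent)

TREE: List[Edge] = [
    ("root", "ROOT", "live"),
    ("nsubj", "live", "minister"),
    ("aux", "live", "does"),
    ("det", "minister", "the"),
    ("amod", "minister", "prime"),
    ("prep_of", "minister", "Kingdom"),
    ("det", "Kingdom", "the"),
    ("nn", "Kingdom", "United"),
]
NER: Dict[str, str] = {"United": "LOCATION", "Kingdom": "LOCATION"}  # assumed tagger output


def merge(edges: List[Edge], ner: Dict[str, str]) -> List[Edge]:
    """Repeatedly fuse a dependent into its head ("prime" + "minister")."""
    edges, ner = list(edges), dict(ner)
    changed = True
    while changed:
        changed = False
        for edge in edges:
            relation, head, dep = edge
            same_tag = ner.get(head) is not None and ner.get(head) == ner.get(dep)
            if relation == "amod" or (relation == "nn" and same_tag):
                merged = f"{dep} {head}"  # the modifier comes first in English
                edges.remove(edge)
                # Rename the head everywhere and re-attach the dependent's children.
                edges = [(r, merged if h in (head, dep) else h, merged if d == head else d)
                         for (r, h, d) in edges]
                if head in ner:
                    ner[merged] = ner[head]
                changed = True
                break  # the list changed: restart the scan
    return edges


for edge in merge(TREE, NER):
    print(edge)
```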
Removal
Remove det, aux, … edges (“the”, “does”), which leaves:
root(ROOT, live), nsubj(live, prime minister), prep_of(prime minister, United Kingdom)
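A sketch of the removal step: edges labelled det or aux are dropped, together with anything that would hang below a removed word. The `REMOVABLE` set contains only the two relations named on the slide; the full list used by the module is not given.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (relation, head, dependent)

REMOVABLE = {"det", "aux"}  # the slide's "…" is left unexpanded

TREE: List[Edge] = [  # output of the merging step
    ("root", "ROOT", "live"),
    ("nsubj", "live", "prime minister"),
    ("aux", "live", "does"),
    ("det", "prime minister", "the"),
    ("prep_of", "prime minister", "United Kingdom"),
    ("det", "United Kingdom", "the"),
]


def remove(edges: List[Edge], removable: Set[str] = REMOVABLE) -> List[Edge]:
    """Drop removable edges, then keep only what is still reachable from ROOT."""
    kept = [e for e in edges if e[0] not in removable]
    reachable = {"ROOT"}
    changed = True
    while changed:
        changed = False
        for _, head, dep in kept:
            if head in reachable and dep not in reachable:
                reachable.add(dep)
                changed = True
    return [e for e in kept if e[1] in reachable]


print(remove(TREE))
```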
Nounification
Nounify the verb “live” into “residence”:
root(ROOT, residence), nsubj(residence, prime minister), prep_of(prime minister, United Kingdom)
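A sketch of nounification as a table lookup applied to the word attached to ROOT. Apart from live ↦ residence, which comes from the slide, the `NOUNS` entries are hypothetical examples. A real implementation would also need part-of-speech information to know which words are verbs.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (relation, head, dependent)

# live -> residence is from the slides; the other entries are hypothetical.
NOUNS: Dict[str, str] = {"live": "residence", "write": "author", "direct": "director"}

TREE: List[Edge] = [  # output of the removal step
    ("root", "ROOT", "live"),
    ("nsubj", "live", "prime minister"),
    ("prep_of", "prime minister", "United Kingdom"),
]


def nounify(edges: List[Edge], nouns: Dict[str, str] = NOUNS) -> List[Edge]:
    """Replace the root word by its noun form, wherever it appears."""
    root = next(dep for rel, _, dep in edges if rel == "root")
    noun = nouns.get(root, root)
    return [(rel, noun if head == root else head, noun if dep == root else dep)
            for rel, head, dep in edges]


print(nounify(TREE))
```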
Normalization
The tree is normalized recursively, from the root down:
∙ Normalize(ROOT) = (Normalize(nsubj subtree), residence, ?)
∙ Normalize(prime minister, with prep_of United Kingdom) = (Normalize(United Kingdom), prime minister, ?)
∙ Normalize(United Kingdom) = United Kingdom
Result: ((United Kingdom, prime minister, ?), residence, ?)
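The recursion above can be sketched in a few lines. The rule used here (a word with a single nsubj or prep_of child becomes the predicate of a triple whose subject is the normalized child) covers only the two relations shown on the slide and is an assumption of the sketch. Other relations raise an error rather than being guessed at.

```python
from typing import List, Tuple, Union

Edge = Tuple[str, str, str]  # (relation, head, dependent)
Tree = Union[str, tuple]     # a resource, or a (subject, predicate, object) triple

TREE: List[Edge] = [  # output of the nounification step
    ("root", "ROOT", "residence"),
    ("nsubj", "residence", "prime minister"),
    ("prep_of", "prime minister", "United Kingdom"),
]

# Relations whose dependent becomes the subject of a triple (only those on the slide).
SUBJECT_RELATIONS = {"nsubj", "prep_of"}


def normalize(edges: List[Edge], word: str) -> Tree:
    """Recursively turn the subtree rooted at `word` into a triple or a resource."""
    children = [(rel, dep) for rel, head, dep in edges if head == word]
    if not children:
        return word  # a leaf is a resource
    if len(children) == 1 and children[0][0] in SUBJECT_RELATIONS:
        return (normalize(edges, children[0][1]), word, "?")
    raise ValueError(f"no rule in this sketch for {children}")


def normalize_question(edges: List[Edge]) -> Tree:
    root = next(dep for rel, _, dep in edges if rel == "root")
    return normalize(edges, root)


print(normalize_question(TREE))
# (('United Kingdom', 'prime minister', '?'), 'residence', '?')
```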
Reformulation: a translation module
When was the president of the United States born?
Input tree: ((United States, birth date, ?), president, ?)
Reformulated tree: ((United States, president, ?), birth date, ?)
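Operation
Projection: trees $\mapsto$ vector space. The dictionary associates a vector with a triple of vectors.
Projection
anothersubject = compact(subject, predicate, object)
myrequest = compact(anothersubject, anotherpredicate, anotherobject)
A nested triple is compacted first, and its vector is then used as the subject of the enclosing triple.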
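A sketch of the projection, assuming `compact` is a single linear layer followed by tanh applied to the concatenation of the three vectors, and that “?” has its own vector in the dictionary. The dimension, the dictionary entries and the weights `W` are made up for illustration (the look-up table in these slides is 25-dimensional, and real weights would be learned). What the sketch shows is the bottom-up recursion, in which a nested triple is compacted before its parent. The helper name `project` is hypothetical.

```python
import math
from typing import Dict, List, Union

Vector = List[float]
Tree = Union[str, tuple]  # a word, or a (subject, predicate, object) triple

DIM = 2  # the look-up table in the slides is 25-dimensional; 2 keeps this readable

# Made-up dictionary entries and weights, for illustration only (not trained values).
DICTIONARY: Dict[str, Vector] = {
    "United Kingdom": [0.9, 0.1],
    "prime minister": [0.2, 0.8],
    "residence": [-0.5, 0.4],
    "?": [0.0, 0.0],
}
W: List[Vector] = [  # DIM rows, 3 * DIM columns
    [0.5, 0.0, 0.3, 0.0, 0.2, 0.0],
    [0.0, 0.5, 0.0, 0.3, 0.0, 0.2],
]


def compact(s: Vector, p: Vector, o: Vector) -> Vector:
    """Map a triple of vectors to one vector: tanh(W · [s; p; o]) (assumed form)."""
    x = s + p + o  # list concatenation, length 3 * DIM
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def project(tree: Tree) -> Vector:
    """Project a tree bottom-up: nested triples are compacted before their parent."""
    if isinstance(tree, str):
        return DICTIONARY[tree]
    s, p, o = tree
    return compact(project(s), project(p), project(o))


anothersubject = project(("United Kingdom", "prime minister", "?"))
myrequest = project((("United Kingdom", "prime minister", "?"), "residence", "?"))
print(myrequest)
```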
From vector to tree
root vector → uncompact → (subject, predicate, object) vectors → nearest word for each → (wordsubject, wordpredicate, wordobject)
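The reverse direction, sketched under similar assumptions: a hypothetical linear `uncompact` splits a vector into subject, predicate and object vectors, and each one is replaced by the nearest dictionary word in Euclidean distance. The dictionary and the decoder weights `U` are invented for the example.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
DIM = 2

# Made-up dictionary and decoder weights, for illustration only.
DICTIONARY: Dict[str, Vector] = {
    "United States": [0.9, 0.1],
    "president": [0.2, 0.8],
    "birth date": [-0.5, 0.4],
}
U: List[Vector] = [  # 3 * DIM rows, DIM columns
    [1.0, 0.0], [0.0, 1.0],    # rows producing the subject vector
    [0.5, 0.5], [-0.5, 0.5],   # rows producing the predicate vector
    [0.0, -1.0], [1.0, 0.0],   # rows producing the object vector
]


def uncompact(v: Vector) -> Tuple[Vector, Vector, Vector]:
    """Split one vector into subject, predicate and object vectors (assumed linear map)."""
    out = [sum(u * vi for u, vi in zip(row, v)) for row in U]
    return out[:DIM], out[DIM:2 * DIM], out[2 * DIM:]


def nearest(v: Vector) -> str:
    """Return the dictionary word whose vector is closest to v (Euclidean distance)."""
    return min(DICTIONARY, key=lambda word: math.dist(DICTIONARY[word], v))


def decode(v: Vector) -> Tuple[str, str, str]:
    subject, predicate, obj = uncompact(v)
    return nearest(subject), nearest(predicate), nearest(obj)


print(nearest([0.1, 0.9]))  # president
print(decode([0.9, 0.1]))
```

Snapping each decoded vector to a dictionary word guarantees that the output tree only contains known words, even when the decoded vectors are approximate.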
Keyword questions
∙ “United States president”
∙ “president United States”
Goal of the module
Classify each word into one of four categories: {to ignore, subject, predicate, object}
Example of classification
∙ (United, subject) (States, subject) (president, predicate)
∙ (president, predicate) (United, subject) (States, subject)
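Once each word is labelled, a triple can be assembled by joining the words of each category in order and using “?” for an empty slot. This assembly rule is an assumption (the slides only show the classification itself), and the label "ignore" stands for the slides' “to ignore”. Because the labels do not depend on word positions across categories, both keyword orders above give the same triple.

```python
from typing import Dict, List, Tuple

LABELS = ("ignore", "subject", "predicate", "object")


def build_triple(labelled: List[Tuple[str, str]]) -> Tuple[str, str, str]:
    """Join the words of each category; an empty slot becomes "?" (assumed rule)."""
    parts: Dict[str, List[str]] = {"subject": [], "predicate": [], "object": []}
    for word, label in labelled:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        if label != "ignore":
            parts[label].append(word)

    def join(words: List[str]) -> str:
        return " ".join(words) if words else "?"

    return join(parts["subject"]), join(parts["predicate"]), join(parts["object"])


first = build_triple([("United", "subject"), ("States", "subject"), ("president", "predicate")])
second = build_triple([("president", "predicate"), ("United", "subject"), ("States", "subject")])
print(first)            # ('United States', 'president', '?')
print(first == second)  # True
```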
Look-up table
Word $w \mapsto$ vector $V_w \in \mathbb{R}^{25}$
If two words $w_1$ and $w_2$ are synonymous, then $\|V_{w_1} - V_{w_2}\|_2 \approx 0$.
Back-end: Wikidata (http://wikidata.org)
∙ A free knowledge base
∙ A structured version of Wikipedia
∙ 12 million entries
∙ Multilingual
Answer extraction: Question
Where does the prime minister of the United Kingdom live?
Answer extraction: Normal form
((United Kingdom, prime minister, ?), residence, ?)
Answer extraction: Module work
∙ Labels are mapped to Wikidata identifiers: (([Q145, Q7887906] (United Kingdom), P6 (prime minister), ?), P551 (residence), ?)
∙ The inner triple ([Q145, Q7887906], P6, ?) is resolved to Q192 (David Cameron).
∙ The outer triple becomes (Q192, P551, ?) and is resolved to Q169101 (10 Downing Street).
∙ Answer: Q169101 (10 Downing Street)
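These steps can be replayed with small dictionaries built from the identifiers on these slides. The dictionaries stand in for Wikidata itself, which the real module queries. The `LABELS`, `CLAIMS` and `NAMES` tables and the `resolve` function are assumptions of this sketch. Note that the second candidate for “United Kingdom” (Q7887906) simply yields no claim and drops out.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, tuple]  # a label, or a (subject, predicate, object) triple

# Tiny stand-in for Wikidata, using only identifiers shown on the slides.
LABELS: Dict[str, List[str]] = {
    "United Kingdom": ["Q145", "Q7887906"],
    "prime minister": ["P6"],
    "residence": ["P551"],
}
CLAIMS: Dict[Tuple[str, str], List[str]] = {
    ("Q145", "P6"): ["Q192"],
    ("Q192", "P551"): ["Q169101"],
}
NAMES: Dict[str, str] = {"Q192": "David Cameron", "Q169101": "10 Downing Street"}


def resolve(tree: Tree) -> List[str]:
    """Return the identifiers a label or an (s, p, ?) triple evaluates to."""
    if isinstance(tree, str):
        return LABELS.get(tree, [])
    subject, predicate, obj = tree
    if obj != "?":
        raise ValueError("this sketch only handles triples whose object is missing")
    results: List[str] = []
    for s in resolve(subject):        # nested triples are resolved first
        for p in resolve(predicate):
            for value in CLAIMS.get((s, p), []):
                if value not in results:
                    results.append(value)
    return results


answer = resolve((("United Kingdom", "prime minister", "?"), "residence", "?"))
print([NAMES[q] for q in answer])  # ['10 Downing Street']
```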
Answer extraction: Final output
Computer Algebra System
Based on SymPy (math) and PLY (parsers).
Two syntaxes:
∙ Mathematica
∙ Intuitive syntax with permissive notations
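The CAS relies on SymPy and PLY, neither of which is used below. Instead, this is a self-contained sketch, written for illustration, of what a permissive syntax can accept: implicit multiplication (`2x`, `(1+2)(3+4)`) and `^` for powers. The grammar, the names `tokenize`, `Parser` and `evaluate`, and the choice to evaluate numerically instead of building symbolic expressions are all assumptions of this sketch.

```python
import re
from typing import Dict, List, Optional

# Numbers, names, or any single non-space character (operators, parentheses).
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d*)?)|([A-Za-z_]\w*)|(\S))")


def tokenize(text: str) -> List[str]:
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        tokens.append(match.group(match.lastindex))
        pos = match.end()
    return tokens


class Parser:
    """Recursive descent: expr -> term ((+|-) term)*, term -> factor ((*|/)? factor)*,
    factor -> -factor | atom (^ factor)?, atom -> number | name | (expr)."""

    def __init__(self, text: str, env: Dict[str, float]):
        self.tokens = tokenize(text)
        self.pos = 0
        self.env = env

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> Optional[str]:
        token = self.peek()
        self.pos += 1
        return token

    def expr(self) -> float:
        value = self.term()
        while self.peek() in ("+", "-"):
            operator = self.take()
            rhs = self.term()
            value = value + rhs if operator == "+" else value - rhs
        return value

    def term(self) -> float:
        value = self.factor()
        while True:
            token = self.peek()
            if token in ("*", "/"):
                self.take()
                rhs = self.factor()
                value = value * rhs if token == "*" else value / rhs
            elif token is not None and (token == "(" or token[0].isalnum() or token[0] == "_"):
                value *= self.factor()  # implicit multiplication: "2x", "(1+2)(3+4)"
            else:
                return value

    def factor(self) -> float:
        if self.peek() == "-":
            self.take()
            return -self.factor()
        base = self.atom()
        if self.peek() == "^":
            self.take()
            return base ** self.factor()  # right-associative, binds tighter than unary minus
        return base

    def atom(self) -> float:
        token = self.take()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token[0].isdigit():
            return float(token)
        if token in self.env:
            return self.env[token]
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text: str, env: Optional[Dict[str, float]] = None) -> float:
    parser = Parser(text, env or {})
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2x^2 + 3x", {"x": 2}))  # 14.0
```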
Let’s sum up
Nested question
∙ “Who is the wife of the president of the United States?”
  WolframAlpha: Barack Obama; Platypus: Michelle Obama
∙ “What are the birth dates of the daughters of the wife of the president of the United States?”
  WolframAlpha: Barack Obama; Platypus: Saturday, July 4, 1998 & Sunday, June 10, 2001
Conjunction
∙ “Who is the actor of Inception and Titanic?”
  WolframAlpha: all the actors of the two movies; Platypus: Leonardo DiCaprio
Better database
Not answered by Wikidata:
∙ “How fast is the TGV?”
∙ “How wide is a tennis court?”
→ Improve Wikidata?
→ Use another database?
Better question parsing
Not parsed correctly:
∙ “What is the date of birth of Isaac Newton?”
∙ “In which band does Bono sing?”
→ Train the Stanford CoreNLP library?
→ Improve the algorithm of the Grammatical module?
→ Better datasets for the ML modules?
New modules
Ideas: cooking recipes, train timetable, HAL, programming language interpreter, weather, literature, cinema, music, translation, sport statistics and predictions
Other ideas...
∙ Support for other languages (French, ...)
∙ Improve the user experience
∙ Advertise Platypus
Some facts
∙ 23 repositories & 2313 commits:
  6 PHP: Wikidata libraries and module
  12 Python: other modules, core, and libraries
  1 C++: ML-Reformulation
  1 Shell: deployment scripts
  1 LaTeX: this presentation and the report
  1 Markdown: the specification
  1 HTML/CSS/JavaScript: the Web User Interface
∙ 26k lines of code (13k in PHP, 10k in Python)
∙ 4.3k lines of LaTeX & 2.9k lines of Markdown
∙ 6 accepted pull requests to libraries we use (corenlp-python, aspell-python, python-dateutil, Wikibase)
The PPP?
∙ A powerful, open-source question answering framework
∙ Innovative question parsing algorithms
∙ A demo, Platypus, with general knowledge and math
Questions?
http://projetpp.github.io/
https://twitter.com/ProjetPP
https://github.com/ProjetPP