
Neural Theorem Proving in Lean using Proof Artifact Co-training and Language Models

Jesse Michael Han (Univ. of Pittsburgh), Jason Rute (CIBO Technologies), Yuhuai Tony Wu (Univ. of Toronto), Edward Ayers (Univ. of Cambridge), Stanislas Polu (OpenAI)

With thanks to the N2Formal team at Google AI

The Lean Theorem Prover

Why formal theorem proving?

● Mechanically check mathematical proofs

● Digitize mathematics

● Unify and archive mathematical knowledge

● Prove correctness of software and hardware

● Make mathematics accessible to computers in a new way

Why Lean?

● Popular and newsworthy
● Extensive and growing mathlib library (e.g. Perfectoid Spaces, Con(ZFC - CH))
● Easy to learn and use
● Great tools and customization (e.g. the meta keyword)
● Active user base and supportive community
● IMO Grand Challenge (in Lean 4)

Lean versions

Version | Maintainer | GitHub repo | Website | mathlib | Lean GPT-f
Lean 3.4.2 | Microsoft Research | leanprover/lean (archived) | leanprover.github.io | No | No
Lean 3.27.0c | Lean Community | leanprover-community/lean | leanprover-community.github.io | Yes | Yes
Lean 4.0.0-m1 | Microsoft Research | leanprover/lean4 | leanprover.github.io | No | No

For all Lean (any version), mathlib, and Lean GPT-f questions: https://leanprover.zulipchat.com

Demo of Lean and gptf

Autoregressive Language Modeling

● Next word (token) prediction
● Transformers (GPT-2, GPT-3, etc.)

Prompt and completion

Today, there will be a talk at the New Technologies in Mathematics Seminar on "Neural Theorem Proving in Lean using Proof Artifact Co-training and Language Models". The talk will be delivered by Pascal Dascouet, Assistant Professor at the French Mathematics Centre (CNRS) and the director of the TMEM Group.

The talk will explore how applications of machine learning may help in the "proof exploration of elegant theorems", including foundations, differential equations, topology and group theory.

Example from Talk to Transformer (https://app.inferkit.com/demo)

Seq-to-seq modeling with autoregressive LMs

Tactic state:
p q : Prop,
h : p ∧ q
⊢ q ∧ p

Tactic to predict: cases h with hp hq

Training Example: GOAL p q : Prop, ⇥ h : p ∧ q ⇥ ⊢ q ∧ p PROOFSTEP cases h with hp hq

Inference Example: repeatedly sample the next token† from the model's distribution.

Prompt: GOAL a b : Prop, ⇥ h : a ∧ b ⇥ ⊢ b ∧ a PROOFSTEP

Candidate next tokens: cases (0.81), apply (0.10), rcases (0.04)

Sampled result: GOAL a b : Prop, ⇥ h : a ∧ b ⇥ ⊢ b ∧ a PROOFSTEP cases h with ha hb

† Tokens are generated via byte pair encoding. They may not be whole words.
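To make the GOAL ... PROOFSTEP format concrete, here is a minimal Python sketch of how a pretty-printed tactic state might be serialized into a prompt. The helper name and the lm.sample API are hypothetical stand-ins, not the authors' actual code.

def make_prompt(tactic_state: str) -> str:
    # Newlines in the pretty-printed state become the separator
    # token rendered as ⇥ on the slides.
    return "GOAL " + tactic_state.replace("\n", " ⇥ ") + " PROOFSTEP"

prompt = make_prompt("a b : Prop,\nh : a ∧ b\n⊢ b ∧ a")
# => "GOAL a b : Prop, ⇥ h : a ∧ b ⇥ ⊢ b ∧ a PROOFSTEP"

# With some language-model API `lm` (hypothetical), candidate tactics
# are produced by repeatedly sampling next tokens up to a stop token:
# candidates = [lm.sample(prompt) for _ in range(n_candidates)]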

Extracting Proof Data from Lean

LeanStep datasets

Proof modes

Tactic proof:

lemma and_swap : p ∧ q → q ∧ p :=
begin
  intro h,
  cases h with hp hq,
  constructor,
  exact hq,
  exact hp
end

Term proof:

lemma and_swap : p ∧ q → q ∧ p :=
λ (h : p ∧ q), ⟨h.right, h.left⟩

LeanStep: Tactic proofs

Tactic proof dataset (as needed for Lean GPT-f)

Tactic proof:

lemma and_swap : p ∧ q → q ∧ p :=
begin
  intro h,
  cases h with hp hq,
  constructor,
  exact hq,
  exact hp
end

Tactic state before cases h with hp hq:

p q : Prop,
h : p ∧ q
⊢ q ∧ p

LeanStep dataset:
● Human-written tactic command (text)
● Hypotheses and goals (text)
● Declaration name
● ~140k human-written goal-tactic pairs
● ~19k tactic-proved theorems from mathlib and lean core
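For illustration only, one goal-tactic pair from the and_swap proof might be stored along the following lines; the field names here are hypothetical, since the real extraction pipeline defines its own schema.

# One illustrative LeanStep record (hypothetical field names).
record = {
    "decl_name": "and_swap",
    "goal": "p q : Prop,\nh : p ∧ q\n⊢ q ∧ p",
    "proofstep": "cases h with hp hq",
}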

Even more tactic information available

Tactic proof:

lemma and_swap : p ∧ q → q ∧ p :=
begin
  intro h,
  cases h with hp hq,
  constructor,
  exact hq,
  exact hp
end

Data to extract:
● Tactic command and position
● Hypotheses and goals
● Tactic name
● Tactic arguments
● Full abstract syntax tree of the proof
● Declaration name
● Hidden tactic state information:
  ○ Open namespaces
  ○ Environment
  ○ Metavariables
  ○ Other hidden information

tactic.interactive.cases (none, ``(h)) [`hp, `hq]

LeanStep: Term proofs

Lean stores term proofs for all theorems

#print of_iff_true

theorem of_iff_true : ∀ {a : Prop}, (a ↔ true) → a :=
λ {a : Prop} (h : a ↔ true), iff.mp (iff.symm h) trivial

Generate datasets by adding holes to the proof term
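As a quick Lean 3 illustration (a variant of the stored proof above, not part of the dataset): replacing a subterm with the hole _ makes the elaborator report exactly the hypotheses and goal that the hole must fulfill, as in the table below.

theorem of_iff_true' : ∀ {a : Prop}, (a ↔ true) → a :=
λ {a : Prop} (h : a ↔ true), iff.mp (iff.symm h) _
-- error: don't know how to synthesize placeholder
-- context: a : Prop, h : a ↔ true ⊢ true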

Proof term with hole | Hypotheses and goal (tactic state) | Term that fulfills the goal
_ | ⊢ ∀ {a : Prop}, (a ↔ true) → a | λ {a : Prop} (h : a ↔ true), iff.mp (iff.symm h) trivial
λ {a : Prop}, _ | a : Prop ⊢ (a ↔ true) → a | λ (h : a ↔ true), iff.mp (iff.symm h) trivial
λ {a : Prop} (h : a ↔ true), _ | a : Prop, h : a ↔ true ⊢ a | iff.mp (iff.symm h) trivial
λ {a : Prop} (h : a ↔ true), iff.mp _ trivial | a : Prop, h : a ↔ true ⊢ true ↔ a | iff.symm h
λ {a : Prop} (h : a ↔ true), iff.mp (iff.symm _) trivial | a : Prop, h : a ↔ true ⊢ a ↔ true | h
λ {a : Prop} (h : a ↔ true), iff.mp (iff.symm h) _ | a : Prop, h : a ↔ true ⊢ true | trivial

Mix1: Derived tactic steps from term proof data

Proof term with hole | Hypotheses and goal (tactic state) | Term that fulfills the goal
λ {a : Prop} (h : a ↔ true), iff.mp _ trivial | a : Prop, h : a ↔ true ⊢ true ↔ a | iff.symm h

● Proof term prediction: predict the masked proof term from the hypotheses and goal. Treat it as an exact tactic.

a : Prop, h : a ↔ true ⊢ true ↔ a ⟹ exact (iff.symm h)

● Next lemma prediction: predict the outermost lemma in the masked proof term. Treat it as an apply tactic.

a : Prop, h : a ↔ true ⊢ true ↔ a ⟹ apply (iff.symm)
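For concreteness, both derived tactic steps really do make progress on the example goal when replayed in Lean 3; these example commands are illustrative, not dataset entries.

-- Proof term prediction, replayed as an exact tactic:
example {a : Prop} (h : a ↔ true) : true ↔ a :=
by exact (iff.symm h)

-- Next lemma prediction, replayed as an apply tactic; the remaining
-- goal ⊢ a ↔ true is then closed by the hypothesis:
example {a : Prop} (h : a ↔ true) : true ↔ a :=
by { apply iff.symm, exact h }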

Mix2: Fill-in-the-blank tasks from term proof data

Proof term with hole | Hypotheses and goal (tactic state) | Term that fulfills the goal
λ {a : Prop} (h : a ↔ true), iff.mp _ trivial | a : Prop, h : a ↔ true ⊢ true ↔ a | iff.symm h

● Skip proof: predict the masked term from the partial proof (cf. N2Formal's "skip tree task").

λ {a : Prop} (h : a ↔ true), iff.mp _ trivial ⟹ iff.symm h

● Type prediction: predict the type (i.e. the goal) of the masked-out term from the partial proof.

λ {a : Prop} (h : a ↔ true), iff.mp _ trivial ⟹ true ↔ a
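A minimal Python sketch of how both fill-in-the-blank examples above could be generated from one (proof term, subterm, type) triple. The string-level masking is a simplification: the real pipeline works on the elaborated syntax tree, not on strings.

proof = "λ {a : Prop} (h : a ↔ true), iff.mp (iff.symm h) trivial"
subterm, subterm_type = "iff.symm h", "true ↔ a"
masked = proof.replace("(" + subterm + ")", "_")  # "... iff.mp _ trivial"

skip_proof_example = (masked, subterm)            # predict the masked term
type_prediction_example = (masked, subterm_type)  # predict its type (the goal)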

Mix2: Classification tasks from term proof data

Proof term with hole | Hypotheses and goal (tactic state) | Term that fulfills the goal
λ {a : Prop} (h : a ↔ true), iff.mp _ trivial | a : Prop, h : a ↔ true ⊢ true ↔ a | iff.symm h
λ {a : Prop} (h : a ↔ true), iff.mp (iff.symm h) _ | a : Prop, h : a ↔ true ⊢ true | trivial

● Premise classification: predict whether a given library theorem is used in the proof of the goal.

a : Prop, h : a ↔ true ⊢ true ↔ a with candidate premise iff.symm
a : Prop, h : a ↔ true ⊢ true ↔ a with candidate premise trivial

● Local context classification: predict which local variables are used in the proof.

a : Prop, h : a ↔ true ⊢ true ↔ a ⟹ a, h
a : Prop, h : a ↔ true ⊢ true ⟹ (none)

Mix2: Elaboration tasks from theorem proving

● Proof term elaboration: predict the fully elaborated proof from the pretty-printed proof.

λ {a : Prop} (h : a ↔ true), h.symm.mp trivial ⟹ λ {a : Prop} (h : iff a true), @iff.mp true a (@iff.symm a true h) trivial
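Lean itself can produce the elaborated side of such pairs; a minimal illustration (output abbreviated and lightly reformatted):

set_option pp.all true
#check λ {a : Prop} (h : a ↔ true), h.symm.mp trivial
-- prints (roughly): λ {a : Prop} (h : iff a true),
--   @iff.mp true a (@iff.symm a true h) trivial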

● Tactic state elaboration: predict the fully elaborated tactic state from the pretty-printed state.

a : Prop, h : a ↔ true ⊢ true ↔ a ⟹ a : Prop, h : iff a true ⊢ iff true a

Mix2: Naming tasks from proof term data

● Theorem naming: Predict name of theorem from its type (theorem statement)

∀ {a : Prop}, (a ↔ true) → a ⟹ of_iff_true

Language Model Training objectives

A theorem proving AI environment

Proof search and evaluation

● Train a model on the LeanStep tactic proof dataset
● Incorporate it into a Lean testing environment (implemented with Lean metaprogramming)
● Breadth-first tree search (implemented with Lean metaprogramming)

[Figure: running cases h with hp hq on the state p q : Prop, h : p ∧ q ⊢ q ∧ p yields the new state p q : Prop, hp : p, hq : q ⊢ q ∧ p; the search succeeds when a branch reaches "no goals!"]

Breadth-first proof search

a b : ℕ, h : a.succ < b ⊢ a ≤ b ∧ ¬b ≤ a

Query N tactic commands from the model:

exact ⟨le_of_lt h, not_le_of_lt h⟩
split
...
cases le_total a b

Each candidate either fails or produces new goals.

Perform breadth-first search over the resulting tactic states:
● up to a fixed depth D
● restricting the maximum size Q of the queue
● stopping when a branch reaches "no goals!"
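A minimal Python sketch of this search loop; suggest and run_tactic are hypothetical stand-ins for the language model and the Lean metaprogramming environment (the actual implementation is Lean metaprogramming, not Python).

from collections import deque

def bfs_proof_search(root_state, suggest, run_tactic,
                     n_samples=8, max_depth=16, max_queue=64):
    """Return a list of tactic commands closing root_state, or None."""
    queue = deque([(root_state, [])])          # (tactic state, proof so far)
    while queue:
        state, proof = queue.popleft()
        if len(proof) >= max_depth:            # fixed depth D
            continue
        for tac in suggest(state, n_samples):  # query N candidates from model
            new_state = run_tactic(state, tac) # None if the tactic fails
            if new_state is None:
                continue
            if new_state == "no goals":
                return proof + [tac]
            if len(queue) < max_queue:         # cap the queue size Q
                queue.append((new_state, proof + [tac]))
    return None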

Results

Lean GPT-f language model

● Based on the GPT-f model
● Decoder-only Transformer similar to GPT-3
● 837M trainable parameters
● Pretrained on:
  ○ CommonCrawl
  ○ WebMath (GitHub, arXiv Math, Math StackExchange)

Training and Evaluation

Training:
● Split all data by (hash of) theorem name (see the sketch below):
  ○ train (80%)
  ○ validate (5%)
  ○ test (15%)

Evaluation:
● Evaluate the model on test theorems
● Use breadth-first proof search
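A minimal sketch of such a deterministic split: every datapoint extracted from a theorem lands in the split chosen by the hash of its name, so no test theorem leaks into training. The use of md5 and the exact thresholds are illustrative assumptions.

import hashlib

def split_of(theorem_name: str) -> str:
    bucket = int(hashlib.md5(theorem_name.encode()).hexdigest(), 16) % 100
    if bucket < 80:
        return "train"      # 80%
    if bucket < 85:
        return "validate"   # 5%
    return "test"           # 15%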

● Proof-artifact co-training (PACT): co-train the transformer using all of (see the sketch below):
  ○ Tactic data
  ○ Mix1 (next lemma and proof term prediction)
  ○ Mix2 (all other tasks)
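A minimal sketch of what co-training over these task mixes might look like at the data-loading level; the datasets, weights, and sampling scheme here are illustrative, not the paper's exact recipe.

import random

datasets = {
    "tactic": ["<tactic example>"],  # human-written goal-tactic pairs
    "mix1":   ["<mix1 example>"],    # next lemma / proof term prediction
    "mix2":   ["<mix2 example>"],    # skip proof, elaboration, naming, ...
}
weights = {"tactic": 1.0, "mix1": 1.0, "mix2": 1.0}

def cotraining_batch(batch_size):
    names = list(datasets)
    probs = [weights[n] for n in names]
    # Each example in a batch is drawn from a randomly chosen task,
    # so every gradient step mixes all training objectives.
    return [random.choice(datasets[random.choices(names, weights=probs)[0]])
            for _ in range(batch_size)]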

Results and Co-training vs Pre-training Ablation

Results by modules

Examples and Testimonials

lie_algebra.morphism.map_bot_iff

[Figure: human-written proof vs. Lean GPT-f proof]

Thank You!

Paper on arXiv: Proof Artifact Co-training for Theorem Proving with Language Models

gptf tactic is available at https://github.com/jesse-michael-han/lean-gptf

Contact us for more about the Lean datasets

Appendix

LeanStep tactic proof dataset

● ~140k human-written goal-tactic pairs

● Spanning ~19k tactic-proved theorems from mathlib and lean core.