Neural Theorem Proving in Lean Using Proof Artifact Co-Training and Language Models
Presented by Jason Rute

Jesse Michael Han (Univ. of Pittsburgh), Jason Rute (CIBO Technologies), Yuhuai Tony Wu (Univ. of Toronto), Edward Ayers (Univ. of Cambridge), Stanislas Polu (OpenAI)

With thanks to the N2Formal team at Google AI

The Lean Theorem Prover

Why formal theorem proving?
● Mechanically check mathematical proofs
● Digitize mathematics
● Unify and archive mathematical knowledge
● Prove correctness of software and hardware
● Make mathematics accessible to computers in a new way

Why Lean?
● Popular and newsworthy
● Extensive and growing mathlib library (e.g. perfectoid spaces, the independence of the continuum hypothesis)
● Easy to learn and use
● Great tools and customization (metaprogramming)
● Active user base and supportive community
● IMO Grand Challenge (in Lean 4)

Lean versions

  Version        Maintainer          GitHub Repo                 Website                         mathlib  Lean GPT-f
  Lean 3.4.2     Microsoft Research  leanprover/lean (archived)  leanprover.github.io            No       No
  Lean 3.27.0c   Lean Community      leanprover-community/lean   leanprover-community.github.io  Yes      Yes
  Lean 4.0.0-m1  Microsoft Research  leanprover/lean4            leanprover.github.io            No       No

For all Lean (any version), mathlib, and Lean GPT-f questions: https://leanprover.zulipchat.com

Demo of Lean and gptf

Autoregressive Language Modeling
● Next word (token) prediction
● Transformers (GPT-2, GPT-3, etc.)
● Prompt and completion

Example from Talk to Transformer (https://app.inferkit.com/demo); the prompt is the first sentence, and the rest is generated by the model (note the invented speaker):

  Today, there will be a talk at the New Technologies in Mathematics Seminar on "Neural Theorem Proving in Lean using Proof Artifact Co-training and Language Models". The talk will be delivered by Pascal Dascouet, Assistant Professor at the French Mathematics Centre (CNRS) and the director of the TMEM Group. The talk will explore how applications of machine learning may help in the "proof exploration of elegant theorems", including foundations, differential equations, topology and group theory.

Seq-to-seq modeling with autoregressive LMs

Given the tactic state

  p q : Prop,
  h : p ∧ q
  ⊢ q ∧ p

the human-written proof applies the tactic cases h with hp hq. This becomes one training example:

  GOAL p q : Prop, ⇥ h : p ∧ q ⇥ ⊢ q ∧ p PROOFSTEP cases h with hp hq

Inference example: repeatedly sample the next token† from the model's distribution.

  GOAL a b : Prop, ⇥ h : a ∧ b ⇥ ⊢ b ∧ a PROOFSTEP _
      next-token candidates: cases (0.81), apply (0.10), rcases (0.04)
  GOAL a b : Prop, ⇥ h : a ∧ b ⇥ ⊢ b ∧ a PROOFSTEP cases h with ha hb

† Tokens are generated via byte pair encoding. They may not be whole words.

Extracting Proof Data from Lean: LeanStep datasets

Proof modes: Lean proofs can be written as tactic proofs or as term proofs.

Tactic proof:

  lemma and_swap : p ∧ q → q ∧ p :=
  begin
    intro h,
    cases h with hp hq,
    constructor,
    exact hq,
    exact hp
  end

Term proof:

  lemma and_swap : p ∧ q → q ∧ p :=
  λ (h : p ∧ q), ⟨h.right, h.left⟩

LeanStep: Tactic proofs

The tactic proof dataset (as needed for Lean GPT-f) records, for each step of a tactic proof like the one above, the tactic state at that step (e.g. p q : Prop, h : p ∧ q ⊢ q ∧ p) together with:
● The human-written tactic command (text)
● The hypotheses and goals (text)
● The declaration name

In total, the dataset contains:
● ~140k human-written goal-tactic pairs
● ~19k tactic-proved theorems from mathlib and Lean core
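Each goal-tactic pair is serialized into the GOAL ... PROOFSTEP format shown earlier, with ⇥ marking the line breaks of the pretty-printed state. A minimal Python sketch of that serialization (the function name is hypothetical, not part of LeanStep):

  def serialize_tactic_example(tactic_state: str, tactic: str) -> str:
      # Replace the newlines of the pretty-printed state with the ⇥ marker,
      # then join state and tactic with the GOAL/PROOFSTEP keywords.
      flat_state = " ⇥ ".join(line.strip() for line in tactic_state.splitlines())
      return f"GOAL {flat_state} PROOFSTEP {tactic}"

  example = serialize_tactic_example("p q : Prop,\nh : p ∧ q\n⊢ q ∧ p",
                                     "cases h with hp hq")
  # "GOAL p q : Prop, ⇥ h : p ∧ q ⇥ ⊢ q ∧ p PROOFSTEP cases h with hp hq"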
Even more tactic information available

Beyond the goal-tactic pairs, more data can be extracted from each tactic proof:
● Tactic command and position
● Hypotheses and goals
● Tactic name
● Tactic arguments
● Full abstract syntax tree of the proof
● Declaration name
● Hidden tactic state information:
  ○ Open namespaces
  ○ Environment
  ○ Metavariables
  ○ Other hidden information

For example, the step cases h with hp hq parses to:

  tactic.interactive.cases (none, ``(h)) [`hp, `hq]

LeanStep: Term proofs

Lean stores term proofs for all theorems:

  #print of_iff_true
  theorem of_iff_true : ∀ {a : Prop}, (a ↔ true) → a :=
  λ {a : Prop} (h : a ↔ true), iff.mp (iff.symm h) trivial

Generate datasets by adding holes to the proof term. Each row below pairs a proof term with a hole, the hypotheses and goal at the hole (a tactic state), and the term that fulfills the goal:

  1. Hole:  _
     State: ⊢ ∀ {a : Prop}, (a ↔ true) → a
     Term:  λ {a : Prop} (h : a ↔ true), iff.mp (iff.symm h) trivial

  2. Hole:  λ {a : Prop}, _
     State: a : Prop ⊢ (a ↔ true) → a
     Term:  λ (h : a ↔ true), iff.mp (iff.symm h) trivial

  3. Hole:  λ {a : Prop} (h : a ↔ true), _
     State: a : Prop, h : a ↔ true ⊢ a
     Term:  iff.mp (iff.symm h) trivial

  4. Hole:  λ {a : Prop} (h : a ↔ true), iff.mp _ trivial
     State: a : Prop, h : a ↔ true ⊢ true ↔ a
     Term:  iff.symm h

  5. Hole:  λ {a : Prop} (h : a ↔ true), iff.mp (iff.symm _) trivial
     State: a : Prop, h : a ↔ true ⊢ a ↔ true
     Term:  h

  6. Hole:  λ {a : Prop} (h : a ↔ true), iff.mp (iff.symm h) _
     State: a : Prop, h : a ↔ true ⊢ true
     Term:  trivial

Mix1: Derived tactic steps from term proof data

Take row 4 above (hole iff.mp _ trivial, state a : Prop, h : a ↔ true ⊢ true ↔ a, term iff.symm h):

● Proof term prediction: predict the masked proof term from the hypotheses and goal; treat it as an exact tactic.

  a : Prop, h : a ↔ true ⊢ true ↔ a  →  exact (iff.symm h)

● Next lemma prediction: predict the outer-most lemma in the masked proof term; treat it as an apply tactic.

  a : Prop, h : a ↔ true ⊢ true ↔ a  →  apply (iff.symm)

Mix2: Fill-in-the-blank tasks from term proof data

● Skip proof: predict the masked term from the partial proof (cf. N2Formal's "skip tree" task).

  λ {a : Prop} (h : a ↔ true), iff.mp _ trivial  →  iff.symm h

● Type prediction: predict the type (i.e. the goal) of the masked-out term from the partial proof.

  λ {a : Prop} (h : a ↔ true), iff.mp _ trivial  →  true ↔ a

Mix2: Classification tasks from term proof data

● Premise classification: predict whether a given library theorem is used in the proof of the goal, e.g. classify iff.symm as used for the state a : Prop, h : a ↔ true ⊢ true ↔ a.

● Local context classification: predict which local variables are used in the proof, e.g. a, h for the state a : Prop, h : a ↔ true ⊢ true ↔ a.
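Each of the Mix1 and Mix2 tasks above ultimately becomes a (prompt, completion) text pair for the language model. A minimal Python sketch of how one hole row might fan out into co-training pairs; the keyword strings and the helper name are illustrative assumptions, not the exact LeanStep format:

  def cotraining_pairs(partial_proof, state, goal_type, term, head_lemma):
      # One hole row yields several (prompt, completion) pairs,
      # one per training task. Task keywords here are hypothetical.
      return {
          "proof term prediction": (f"GOAL {state} PROOFTERM", f"exact ({term})"),
          "next lemma prediction": (f"GOAL {state} NEXTLEMMA", f"apply ({head_lemma})"),
          "skip proof":            (f"SKIPPROOF {partial_proof}", term),
          "type prediction":       (f"TYPEPRED {partial_proof}", goal_type),
      }

  pairs = cotraining_pairs(
      partial_proof="λ {a : Prop} (h : a ↔ true), iff.mp _ trivial",
      state="a : Prop, h : a ↔ true ⊢ true ↔ a",
      goal_type="true ↔ a",
      term="iff.symm h",
      head_lemma="iff.symm",
  )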
Mix2: Elaboration tasks from theorem proving

● Proof term elaboration: predict the fully elaborated proof from the pretty-printed proof.

  λ {a : Prop} (h : a ↔ true), h.symm.mp trivial
  →  λ {a : Prop} (h : iff a true), @iff.mp true a (@iff.symm a true h) trivial

● Tactic state elaboration: predict the fully elaborated tactic state from the pretty-printed state.

  a : Prop, h : a ↔ true ⊢ true ↔ a  →  a : Prop, h : iff a true ⊢ iff true a

Mix2: Naming tasks from proof term data

● Theorem naming: predict the name of a theorem from its type (the theorem statement).

  ∀ {a : Prop}, (a ↔ true) → a  →  of_iff_true

[Diagram omitted: the language model, its training objectives, and a theorem proving AI environment.]

Proof search and evaluation

● Train a model on the LeanStep tactic proof dataset.
● Perform breadth-first tree search (implemented with Lean metaprogramming).
● Incorporate into a Lean testing environment (implemented with Lean metaprogramming).

For example, from the state

  p q : Prop, h : p ∧ q ⊢ q ∧ p

the sampled tactic cases h with hp hq leads to

  p q : Prop, hp : p, hq : q ⊢ q ∧ p

and further tactic steps eventually reach: no goals!

Breadth-first proof search

Starting from a goal such as

  a b : ℕ,
  h : a.succ < b
  ⊢ a ≤ b ∧ ¬b ≤ a

● Query N tactic commands from the model, e.g. exact ⟨le_of_lt h, not_le_of_lt h⟩, split, cases le_total a b, ... (some of which fail).
● Perform breadth-first search over the resulting goal states, up to a fixed depth D and restricting the queue to a maximum size Q, until some branch reaches "no goals!".

Results

Lean GPT-f language model
● Based on the MetaMath GPT-f model
● Decoder-only Transformer similar to GPT-3
● 837M trainable parameters
● Pretrained on:
  ○ CommonCrawl
  ○ WebMath (GitHub, arXiv Math, Math StackExchange)

Training and Evaluation

Training:
● Split all data by (a hash of) the theorem name: train (80%), validate (5%), test (15%).
● Proof artifact co-training (PACT): co-train the transformer using all of:
  ○ Tactic data
  ○ Mix1 (next lemma and proof term prediction)
  ○ Mix2 (all other tasks)

Evaluation:
● Evaluate the model on test theorems.
● Use breadth-first proof search.

[Charts omitted: overall results, a co-training vs. pre-training ablation, and results broken down by mathlib module.]

Examples and Testimonials

lie_algebra.morphism.map_bot_iff: human-written proof vs. Lean GPT-f proof. [Proofs omitted.]

Thank You!

● Paper on arXiv: Proof Artifact Co-training for Theorem Proving with Language Models
● The gptf tactic is available at https://github.com/jesse-michael-han/lean-gptf
● Contact us for more about the Lean datasets

Appendix

LeanStep tactic proof dataset
● ~140k human-written goal-tactic pairs
● Spanning ~19k tactic-proved theorems from mathlib and Lean core.
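As a rough illustration of the breadth-first proof search described above, here is a minimal Python sketch. The functions suggest (query the model for N candidate tactic commands) and apply_tactic (run one tactic in Lean, returning the new state or None on failure) are hypothetical stand-ins for the model API and the Lean metaprogram, and states are simplified to opaque values:

  from collections import deque

  def bfs_proof_search(root, suggest, apply_tactic,
                       n_samples=8, max_depth=6, max_queue=64):
      # Each queue entry is (tactic state, tactics applied so far, depth).
      queue = deque([(root, [], 0)])
      while queue:
          state, proof, depth = queue.popleft()
          if depth >= max_depth:                     # fixed depth D
              continue
          for tactic in suggest(state, n_samples):   # query N commands
              new_state = apply_tactic(state, tactic)
              if new_state is None:                  # tactic failed
                  continue
              if new_state == "no goals":            # proof complete
                  return proof + [tactic]
              if len(queue) < max_queue:             # cap queue size Q
                  queue.append((new_state, proof + [tactic], depth + 1))
      return None                                    # search exhausted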