<<

Some applications in human behavior modeling (and open questions)

Jerry Zhu

University of Wisconsin-Madison

Simons Institute Workshop on Spectral Algorithms: From Theory to Practice Oct. 2014

1 / 30 Outline

Human Memory Search

Machine Teaching

2 / 30 Verbal fluency

Say as many as you can without repeating in one minute.

3 / 30 Semantic “runs” 1. cow, horse, chicken, , , , tiger, porcupine, gopher, rat, mouse, duck, goose, horse, bird, pelican, alligator, crocodile, iguana, goose 2. elephant, tiger, dog, cow, horse, sheep, cat, lynx, elk, moose, antelope, deer, tiger, wolverine, bobcat, mink, rabbit, wolf, coyote, fox, cow, zebra 3. cat, dog, horse, chicken, duck, cow, pig, gorilla, giraffe, tiger, lion, ostrich, elephant, squirrel, gopher, rat, mouse, gerbil, hamster, duck, goose 4. cat, dog, sheep, goat, elephant, tiger, dog, deer, lynx, wolf, mountain goat, bear, giraffe, moose, elk, , aardvark, platypus, lion, skunk, wolverine, raccoon 5. dog, cat, , elephant, monkey, sea lion, tiger, leopard, bird, squirrel, deer, antelope, snake, beaver, robin, panda, vulture 6. deer, muskrat, bear, fish, raccoon, zebra, elephant, giraffe, cat, dog, mouse, rat, bird, snake, lizard, lamb, hippopotamus, elephant, skunk, lion, tiger 4 / 30 7. dog, cat, ferret, fish, cow, horse, pig, sheep, elephant, tiger, lion, bear, giraffe, bird, groundhog, ox 8. antelope, baboon, cat, dog, elephant, frog, giraffe, hippopotamus, jaguar, dog, cat, horse, pig, chicken, bird, hippopotamus, cow 9. dog, cat, cow, sheep, lamb, rooster, chicken, tiger, elephant, monkey, orangutan, lemur, crocodile, giraffe, owl, bird, lion, elephant, hippopotamus, rhinoceros, penguin, seal, walrus, whale, dolphin 10. dog, cat, bear, deer, puma, opossum, moose, bear, wolf, coyote, lion, tiger, elephant, giraffe, hyena, cougar, mountain lion, antelope, elk Memory search

K = 12, obj = 14.1208

human hippopotamus whale

dolphin muskyshark squid 1 jaguar walrusporpoise baboon alligator

muskox iguana boa onstrictor eel c polar bear tiger crocodile cobra snake crab elephant turtle gorilla rhinoceroscougar monkey rattlesnake panda kangaroo sea lion moleseal clam cow oxlion puma pig cheetahbisonlamb otter worm 0.5 wolf ocelotaardvark chipmunk armadillo cockroach rabbit gilamonster tortoise python moosedog wallaby elk raccoon llamalemur skunk muskrat wildebeest ape chimp fish anemone prairie dog salamander amphibiantadpole bear panther porcupine tasmanian devil Guinea pig bobcat goat hedgehogwolverine chameleon octopus frog ram fox caterpillar leopard badger mongoosecat snail platypus dingo hyena lobster centipede buffaloyak shrimp 0 alpaca pony mule donkey deer opossum newt horse gazelle impalasloth chinchilla rodent lizard spider coyote tapir wombat reindeer beaver gerbil lynx anteaterkoala racoon orangutan gibbonjackrabbit vole penguin ladybug antelope bunny groundhog mouse jackal sheep woodchuck weasel gecko wasp zebra camelcaribou shrew marten duck mare giraffe pelican hornet beetleflea fly −0.5 bull squirrel hamster parrot butterflygnat hare roadrunnerswan moth gopher colt raven bee rat fowlcrow woodpecker roosterquail bat loonflamingooriole mosquito ostrich vulture wren owl sparrow emu heron bumblebee buzzard pigeon gull seagullturkey crane dove bird ibis bluejay canary −1 chickenhawkcockatiel eagle falcon robinchickadeepeacock cricketgrasshopper grouse cardinal hummingbird goose

−1.5 −1 −0.5 0 0.5 1 1.5

5 / 30 Memory search

K = 12, obj = 14.1208

human hippopotamus whale

dolphin muskyshark squid 1 jaguar walrusporpoise baboon alligator

muskox iguana boa onstrictor eel manatee c polar bear tiger crocodile cobra snake crab elephant turtle gorilla rhinoceroscougar monkey rattlesnake panda kangaroo sea lion moleseal clam cow oxlion puma pig cheetahbisonlamb otter worm 0.5 wolf ocelotaardvark chipmunk armadillo cockroach rabbit gilamonster tortoise python ant moosedog wallaby elk raccoon llamalemur skunk muskrat wildebeest ape chimp fish anemone prairie dog salamander amphibiantadpole bear panther porcupine tasmanian devil Guinea pig bobcat goat hedgehogwolverine chameleon octopus frog ram fox caterpillar leopard badger mongoosecat 6 / 30 snail platypus dingo hyena lobster centipede buffaloyak shrimp 0 alpaca pony mule donkey deer opossum newt insect horse gazelle impalasloth chinchilla rodent lizard spider coyote tapir wombat reindeer beaver gerbil lynx anteaterkoala racoon orangutan gibbonjackrabbit vole penguin ladybug antelope bunny groundhog mouse jackal sheep woodchuck weasel gecko wasp zebra camelcaribou shrew marten duck mare giraffe pelican hornet beetleflea fly −0.5 bull squirrel hamster parrot butterflygnat hare roadrunnerswan moth gopher colt raven bee rat fowlcrow woodpecker roosterquail bat loonflamingooriole mosquito ostrich vulture wren owl sparrow emu heron bumblebee buzzard pigeon gull seagullturkey crane dove bird ibis bluejay canary −1 chickenhawkcockatiel eagle falcon robinchickadeepeacock cricketgrasshopper grouse cardinal hummingbird goose

−1.5 −1 −0.5 0 0.5 1 1.5 Censored random walk

[Abbott, Austerweil, Griffiths 2012]

I V: n word types in English

I P : (dense) n × n transition matrix

I Censored random walk: observing only the first token of each type x1, x2, . . . , xt,... ⇒ a1, . . . , an I (star example)

7 / 30 The estimation problem

Given m censored random walks n (1) (1)  (m) (m)o D = a1 , ..., an , ..., a1 , ..., an , estimate P .

8 / 30 Each observed step is an absorbing random walk

(with Kwang-Sung Jun)

I P (ak+1 | a1, . . . , ak) may contain infinite latent steps I Instead, model this observed step as an absorbing random walk with absorbing states V\{a1, . . . , ak} QR P ⇒ I 0 I

9 / 30 Maximum Likelihood

−1 I Fundamental matrix N = (I − Q) , Nij is the expected number of visits to j before absorption when starting at i Pk I p(ak+1 | a1, . . . , ak) = i=1 NkiRi1 Pm Pn (i) (i) (i) I log likelihood i=1 k=1 log p(ak+1 | a1 , ..., ak ) I Nonconvex, gradient method

10 / 30 Other estimators

I PRW: pretend a1, . . . , an not censored

I PFirst2: Use only a1, a2 in each walk (consistent)

11 / 30 Star graph 2 x-axis: m, y-axis: kPb − P kF

12 / 30 2D Grid

13 / 30 Erd¨os-R´enyiwith p = log(n)/n

14 / 30 Ring graph

15 / 30 Questions

I Consistency?

I Rate?

16 / 30 Outline

Human Memory Search

Machine Teaching

17 / 30 Passive learning: iid 1. given training data D = (x1, y1) ... (xn, yn) ∼ p(x, y) 2. finds θˆ consistent with D Risk |θˆ − θ∗| = O(n−1)

Learning a noiseless 1D threshold classifier

θ∗

x ∼ uniform[0, 1]  −1, x < θ∗ y = 1, x ≥ θ∗ Θ = {θ : 0 ≤ θ ≤ 1}

18 / 30 Risk |θˆ − θ∗| = O(n−1)

Learning a noiseless 1D threshold classifier

θ∗

x ∼ uniform[0, 1]  −1, x < θ∗ y = 1, x ≥ θ∗ Θ = {θ : 0 ≤ θ ≤ 1} Passive learning: iid 1. given training data D = (x1, y1) ... (xn, yn) ∼ p(x, y) 2. finds θˆ consistent with D

18 / 30 Learning a noiseless 1D threshold classifier

θ∗

x ∼ uniform[0, 1]  −1, x < θ∗ y = 1, x ≥ θ∗ Θ = {θ : 0 ≤ θ ≤ 1} Passive learning: iid 1. given training data D = (x1, y1) ... (xn, yn) ∼ p(x, y) 2. finds θˆ consistent with D Risk |θˆ − θ∗| = O(n−1)

18 / 30 Risk |θˆ − θ∗| = O(2−n)

Active learning (sequential experimental design)

Binary search θ∗

19 / 30 Active learning (sequential experimental design)

Binary search θ∗

Risk |θˆ − θ∗| = O(2−n)

19 / 30 θ∗

Risk |θˆ − θ∗| = , ∀ > 0

Machine teaching

What is the minimum training set a helpful teacher can construct?

20 / 30 Machine teaching

What is the minimum training set a helpful teacher can construct? θ∗

Risk |θˆ − θ∗| = , ∀ > 0

20 / 30 Comparing the three

θ∗ θ∗ θ∗ { O(1/n){ O(1/2n ) passive learning "waits" active learning "explores" teaching "guides" The teacher knows θ∗ and the learning algorithm.

21 / 30 non-iid, n = d + 1

Example 2: Teaching a Gaussian distribution d Given a training set x1 . . . xn ∈ R , let the learning algorithm be n 1 X µˆ = x n i i=1 n 1 X Σˆ = (x − µˆ)(x − µˆ)> n − 1 i i i=1 How to teach N(µ∗, Σ∗) to the learner quickly?

22 / 30 Example 2: Teaching a Gaussian distribution d Given a training set x1 . . . xn ∈ R , let the learning algorithm be n 1 X µˆ = x n i i=1 n 1 X Σˆ = (x − µˆ)(x − µˆ)> n − 1 i i i=1 How to teach N(µ∗, Σ∗) to the learner quickly?

non-iid, n = d + 1

22 / 30 An Optimization View of Machine Teaching A D A(D)

θ∗ −1 Α−1 (θ ∗ ) Α Θ

DD

min |D| D∈D s.t. A(D) = θ∗

The objective is the teaching dimension [Goldman, Kearns 1995] of θ∗ with respect to A, Θ.

23 / 30 Example 3: Works on Humans

Human categorization on 1D stimuli [Patil, Z, Kope´c,Love 2014]

human training set human test accuracy machine teaching 72.5% iid 69.8% (statistically significant)

24 / 30 New task: teaching humans how to label a graph

Given:

I a graph G = (V,E) ∗ I target labels y : V 7→ {−1, 1} I a label-completion cognitive model A (graph diffusion algorithm) such as:

I mincut I harmonic function [Z, Gharahmani, Lafferty 2003] I local global consistency [Zhou et al. 2004] I ... Find the smallest seed set:

min |S| S⊆V s.t. A(y∗(S)) = y∗

(inverse problem of semi-supervised learning)

25 / 30 Example: A = local-global consistency [Zhou et al. 2004]

F = (1 − α)(I − αD−1/2WD−1/2)−1y∗(S)   1  y = sgn F −1

26 / 30 |A−1(y∗)| is large Turns out 4649 out of 216 = 65536 training sets S lead to the target label completion

27 / 30 Optimal seed set 1: |S| = 3

28 / 30 Optimal seed set 2: |S| = 3

29 / 30 Questions

I How does |S| relate to (spectral) properties of G?

I How to solve for S?

30 / 30