Higher-order (with hypergraph cuts)

Austin Benson · Cornell University WPI CS Colloquium · May 7, 2021

Graph (network) data that model important complex systems are everywhere.

• Communications: nodes are people/accounts; edges show info. exchange.
• Physical proximity: nodes are people; edges link those that interact in close proximity.
• Commerce: nodes are products; edges link co-purchased products.
• Cell biology: nodes are proteins; an edge links two proteins that interact.

Network science studies the model to gain insight and make predictions about these systems.

1. Evolution / changes What new connections will form? (email auto-fill suggestions, rec. systems)

2. Clustering / partitioning / community detection How to find groups of related nodes? (similar products, protein functions)

3. Spreading and traversing How does stuff move over the network? (viruses or misinformation)

4. Ranking Which things are important? (PageRank and its variants)

Real-world systems are composed of “higher-order” interactions that we often reduce to pairwise ones.

• Communications: nodes are people/accounts; emails often have several recipients, not just one.
• Physical proximity: nodes are people; people gather in groups.
• Commerce: nodes are products; several products can be purchased at once.
• Cell biology: nodes are proteins; protein complexes may involve several proteins.

Graph with (pairwise) edges. Hypergraph with (multiway) hyperedges.

What new insights does this give us?

We can ask the same network science questions while accounting for higher-order structure.

1. Evolution / changes What new connections will form? (email auto-fill suggestions, rec. systems)

2. Clustering / partitioning / community detection How to find groups of related nodes? (similar products, protein functions)

3. Spreading and traversing How does stuff move over the network? (viruses or misinformation)

4. Ranking Which things are important? (PageRank and its variants)

Graphs are a great model for pairwise relationships. But hypergraphs can encode richer “higher-order” structure.

Retail product graph. An edge indicates two retail products that are frequently co-purchased.

Based on this information, how should a grocery store separate products into different departments or aisles? Or bundle their products?

Retail product hypergraph. Hyperedge = set of multiple items that are frequently purchased together.

Having access to “higher-order” data might affect your decision! What if people often buy products {1,2,3} together, but rarely buy {1,3} alone?

A hyperedge represents a multiway relationship among a set of objects (nodes).

arXiv:2103.05031

Higher-order network science
w/ R. Abebe, M. Schaub, J. Kleinberg, A. Jadbabaie

1. Temporal evolution of higher-order interactions. Simplicial Closure and Higher-order Link Prediction, PNAS 2018.

2. Hypergraph cuts for local and global clustering. Hypergraph Cuts with General Splitting Functions, arXiv, 2020. Minimizing Localized Ratio Cuts in Hypergraphs, KDD, 2020. Hypergraph clustering: from blockmodels to modularity, arXiv, 2021.

We collected many datasets of timestamped hyperedges.

bit.ly/sc-holp-data
1. Coauthorship in different domains.
2. Emails with multiple recipients.
3. Tags on Q&A forums.
4. Threads on Q&A forums.
5. Contact/proximity measurements.
6. Musical artist collaboration.
7. Substance makeup and classification codes applied to drugs the FDA examines.
8. U.S. Congress committee memberships and bill sponsorship.
9. Combinations of drugs seen in patients in ER visits.

https://math.stackexchange.com/q/80181

Thinking of higher-order data as a weighted projected graph with filled-in structures is a convenient viewpoint.

Data.
t1 : {1, 2, 3, 4}
t2 : {1, 3, 5}
t3 : {1, 6}
t4 : {2, 6}
t5 : {1, 7, 8}
t6 : {3, 9}
t7 : {5, 8}
t8 : {1, 2, 6}

Projected graph W. Wij = # of hyperedges containing nodes i and j.
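The definition of W can be made concrete on the eight example hyperedges t1..t8. This is a minimal Python sketch; the function name `projected_graph` and the choice of a `Counter` keyed by sorted node pairs are illustrative assumptions, not something taken from the talk.

```python
from collections import Counter
from itertools import combinations

def projected_graph(hyperedges):
    """Return W as a Counter: W[(i, j)] = number of hyperedges containing both i and j (i < j)."""
    W = Counter()
    for e in hyperedges:
        for i, j in combinations(sorted(set(e)), 2):
            W[(i, j)] += 1
    return W

# The eight timestamped hyperedges t1..t8 from the example data.
hyperedges = [{1, 2, 3, 4}, {1, 3, 5}, {1, 6}, {2, 6},
              {1, 7, 8}, {3, 9}, {5, 8}, {1, 2, 6}]
W = projected_graph(hyperedges)

for (i, j), weight in sorted(W.items()):
    print(f"W[{i},{j}] = {weight}")
```

Pairs that never co-occur are simply absent from the counter, so looking them up returns 0.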


What’s more common in empirical data?

• Open triangle: each pair has been in a hyperedge together, but all 3 nodes have never been in the same hyperedge.
• Closed triangle: there is some hyperedge that contains all 3 nodes.

There is lots of variation in the fraction of triangles that are open, but datasets from the same domain are similar.

[Figure: fraction of triangles that are open (0.00 to 1.00) vs. edge density in the projected graph (log scale), one point per dataset: coauth-DBLP, coauth-MAG-geology, coauth-MAG-history, tags-stack-overflow, tags-math-sx, tags-ask-ubuntu, threads-stack-overflow, threads-math-sx, threads-ask-ubuntu, email-Eu, email-Enron, congress-bills, congress-committees, NDC-substances, NDC-classes, contact-high-school, contact-primary-school, DAWN, music-rap-genius.]

See also Patania-Petri-Vaccarino (2017) for similar ideas in collaboration networks.

How do new closed triangles appear? Can we predict which new higher-order interactions will occur?

Triangles “fill in” or “close” over time.

t1 : {1, 2, 3, 4}
t2 : {1, 3, 5}
t3 : {1, 6}
t4 : {2, 6}
t5 : {1, 7, 8}
t6 : {3, 9}
t7 : {5, 8}
t8 : {1, 2, 6}
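To see a triangle close on this example, the sketch below classifies every triangle of the projected graph as open or closed, once after t7 and once after t8. The helper name `triangles` and the brute-force enumeration over node triples are assumptions made for illustration; they are only practical for tiny examples.

```python
from itertools import combinations

def triangles(hyperedges):
    """Split the triangles of the projected graph into (open, closed) sets of node triples."""
    sets = [frozenset(e) for e in hyperedges]
    # Every pair of nodes that has appeared together in some hyperedge.
    pairs = {frozenset(p) for e in sets for p in combinations(e, 2)}
    nodes = sorted(set().union(*sets))
    open_tris, closed_tris = set(), set()
    for tri in combinations(nodes, 3):
        if all(frozenset(p) in pairs for p in combinations(tri, 2)):
            # Closed if some single hyperedge contains all three nodes.
            if any(set(tri) <= e for e in sets):
                closed_tris.add(tri)
            else:
                open_tris.add(tri)
    return open_tris, closed_tris

timeline = [{1, 2, 3, 4}, {1, 3, 5}, {1, 6}, {2, 6},
            {1, 7, 8}, {3, 9}, {5, 8}, {1, 2, 6}]
open_before, closed_before = triangles(timeline[:7])  # state after t7
open_after, closed_after = triangles(timeline)        # state after t8
newly_closed = open_before & closed_after

print("open after t7:", sorted(open_before))
print("closed by t8: ", sorted(newly_closed))
```

On this data the triple (1, 2, 6) is open after t7 and is closed by the hyperedge arriving at t8.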

Weak and strong ties are useful characterizations.

Substances in marketed drugs recorded in the National Drug Code directory.

Bin weighted edges into “weak” and “strong” ties in the projected graph W. Wij = # of simplices containing nodes i and j.

• Weak ties. Wij = 1 (one hyperedge contains i and j)
• Strong ties. Wij ≥ 2 (at least two hyperedges contain i and j)

Closure depends on structure in projected graph.
• First 80% of the data (in time) ⟶ record configurations of triplets not in closed triangle.
• Remainder of data ⟶ find fraction that are now closed triangles.

[Figure: three panels, each with closure probability on the vertical axis.]
• Left: increased edge density increases closure probability.
• Middle: increased tie strength increases closure probability.
• Right: tension between edge density and tie strength.

Left and middle observations are consistent with theory and empirical studies of social networks. [Granovetter 73; Kossinets-Watts 06; Backstrom+ 06; Leskovec+ 08]

We used this for a new higher-order link prediction task.

Data.
t1 : {1, 2, 3, 4}
t2 : {1, 3, 5}
t3 : {1, 6}
t4 : {2, 6}
t5 : {1, 7, 8}
t6 : {3, 9}
t7 : {5, 8}
t8 : {1, 2, 6}

• Observe simplices up to time t.
• Predict which groups of > 2 nodes will appear after time t.

We predict structure that graph models would not even consider!
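A toy version of this task can be sketched in Python: observe t1..t7, list the open triangles as candidate 3-node groups, and rank them. The scoring rule used here, a p-power combination of the three projected-graph weights, and the function name `rank_open_triangles` are assumptions for illustration; the extra repeated interaction {1, 2} in the second call is hypothetical data, added only to show how tie strength changes the ranking.

```python
from collections import Counter
from itertools import combinations

def rank_open_triangles(observed, p=1.0):
    """Score each open triangle (i, j, k) by (W_ij^p + W_jk^p + W_ik^p)^(1/p), highest first."""
    W = Counter()
    for e in observed:
        for pair in combinations(sorted(e), 2):
            W[pair] += 1
    nodes = sorted({v for e in observed for v in e})
    scores = {}
    for i, j, k in combinations(nodes, 3):
        w = (W[(i, j)], W[(j, k)], W[(i, k)])
        is_triangle = all(x > 0 for x in w)
        is_closed = any({i, j, k} <= set(e) for e in observed)
        if is_triangle and not is_closed:
            scores[(i, j, k)] = sum(x ** p for x in w) ** (1.0 / p)
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Observe t1..t7; the hyperedge t8 = {1, 2, 6} is held out as the "future".
observed = [{1, 2, 3, 4}, {1, 3, 5}, {1, 6}, {2, 6}, {1, 7, 8}, {3, 9}, {5, 8}]
ranking = rank_open_triangles(observed, p=1.0)

# Hypothetical extra interaction between nodes 1 and 2 strengthens that tie.
boosted = rank_open_triangles(observed + [{1, 2}], p=1.0)

print(ranking)
print(boosted)
```

With only t1..t7 the two candidates tie, since every edge involved has weight 1; strengthening one tie is enough to separate them.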

Our structural analysis helps us with prediction.

1. Edge density matters. Predicting open triangles first is the easiest way to get a performance boost.

2. Tie strength matters a lot! We tried lots of stuff, but simple averaging works extremely well.

$\mathrm{score}_p(i, j, k) = (W_{ij}^p + W_{jk}^p + W_{ik}^p)^{1/p}$

In contrast…
• Long paths help with classical link prediction methods [Liben-Nowell & Kleinberg 07]
• Complex k-hop neighborhood computations work well for modern graph neural networks

Higher-order network science
w/ N. Veldt, J. Kleinberg, P. Chodrow

1. Temporal evolution of higher-order interactions. Simplicial Closure and Higher-order Link Prediction, PNAS 2018.

2. Hypergraph cuts for local and global clustering. Hypergraph Cuts with General Splitting Functions, arXiv, 2020. Minimizing Localized Ratio Cuts in Hypergraphs, KDD, 2020. Hypergraph clustering: from blockmodels to modularity, arXiv, 2021.

In network science, a wide array of applications rely on finding graph clusters and small graph cuts.

Applications
• community detection
• graph partitioning
• semi-supervised learning
• routing/flow problems
• dense subgraph detection
• localized clustering

Cluster = densely connected node set that is sparsely connected to rest of graph

Cut = number of edges crossing a cluster boundary
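As a small illustration of the cut definition, the sketch below counts (weighted) edges crossing the boundary of a node set. The graph, two triangles joined by one edge, is a made-up example, and `cut_value` is an illustrative name.

```python
def cut_value(edges, S):
    """Total weight of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(w for u, v, w in edges if (u in S) != (v in S))

# Hypothetical graph: triangles {1, 2, 3} and {4, 5, 6} joined by the edge (3, 4).
# Each entry is (endpoint, endpoint, weight).
edges = [(1, 2, 1), (1, 3, 1), (2, 3, 1),
         (4, 5, 1), (4, 6, 1), (5, 6, 1),
         (3, 4, 1)]

print(cut_value(edges, {1, 2, 3}))     # one triangle as the cluster
print(cut_value(edges, {1}))           # a single node
print(cut_value(edges, {1, 2, 3, 4}))  # the triangle plus one extra node
```

The densely connected triangle has the smallest cut of the three node sets, matching the intuition behind the cluster definition.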

Cut and clustering problems are well understood and widely applied in graph analysis.

Types of cut problems
• minimum s-t cut
• multiway cut
• min conductance cut
• sparsest cut

An edge is cut if its nodes are separated.

A classical example of a graph cut problem is the minimum s-t cut.

$\underset{S \subset V}{\text{minimize}} \ \mathrm{cut}(S)$ subject to $s \in S$, $t \notin S$.

The penalty for cutting an edge is its weight. Example (figure): $\mathrm{cut}(S) = 2$.

How do we define hypergraph cut problems? Once defined, how do we solve them?

A first thought. Apply a bipartite graph expansion and solve the cut problem on the graph.


What exactly is this cut measuring? Is it right for applications? Are there alternatives?

There are two major challenges.

1. Modeling Questions. There are many ways to generalize graph cut and clustering problems to the hypergraph setting.
2. Scalability Issues. Hypergraphs can grow not only in terms of nodes and hyperedges, but also hyperedge size.

We’ll address these with

1. Generalized and unifying models for hypergraph cut functions.
2. Methods that scale up to hypergraphs with millions of nodes, and millions of very large hyperedges.

The hypergraph cut function has existed for decades.

A hyperedge is cut if its nodes are separated.

The hypergraph cut is the number of cut hyperedges.

The hypergraph minimum s-t cut problem separates s and t in a way that minimizes the hypergraph cut.

This has a polynomial-time solution [Lawler 1973].
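The standard hypergraph cut and the s-t problem can be illustrated on a toy instance. The sketch below is not the polynomial-time method cited above; it is an exponential brute force over all node sets containing s but not t, meant only to make the definitions concrete. The hypergraph and the function names are assumptions for the example.

```python
from itertools import combinations

def hypergraph_cut(hyperedges, S):
    """Number of hyperedges with at least one node inside S and at least one outside."""
    S = set(S)
    return sum(1 for e in hyperedges if 0 < len(set(e) & S) < len(set(e)))

def brute_force_min_st_cut(nodes, hyperedges, s, t):
    """Try every S with s in S and t not in S; exponential, for tiny examples only."""
    others = [v for v in nodes if v not in (s, t)]
    best_S, best_val = None, float("inf")
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            S = {s, *extra}
            val = hypergraph_cut(hyperedges, S)
            if val < best_val:
                best_S, best_val = S, val
    return best_S, best_val

# Hypothetical hypergraph on nodes 1..6.
nodes = [1, 2, 3, 4, 5, 6]
hyperedges = [{1, 2, 3}, {2, 3, 4}, {4, 5, 6}, {3, 4}]
best_S, best_val = brute_force_min_st_cut(nodes, hyperedges, s=1, t=6)
print(best_S, best_val)
```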

This cut function seems natural at first, but does it always make sense?

There are 14 distinct ways to cut a 4-node hyperedge.

[Figure: a grid of splits of the 4-node hyperedge {1, 2, 3, 4}.]
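The count of 14 can be checked by enumerating every way to partition four nodes into clusters and discarding the single uncut configuration. This is a minimal sketch; the recursive helper `set_partitions` is an illustrative name.

```python
def set_partitions(items):
    """Yield every partition of a list into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or into a new block of its own.
        yield [[first]] + partition

partitions = list(set_partitions([1, 2, 3, 4]))
cuts = [p for p in partitions if len(p) > 1]          # drop the uncut hyperedge
two_cluster_cuts = [p for p in cuts if len(p) == 2]   # splits into exactly two sides

print(len(partitions), len(cuts), len(two_cluster_cuts))
```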

Seven when restricting to two clusters.

Here’s how the standard hypergraph cut function sees them.

[Figure: the same grid of splits of the hyperedge {1, 2, 3, 4}.]

Should we really treat [one split of the hyperedge] the same as [a different split]?

We introduce a generalized class of hypergraph cut functions based on splitting functions.

A splitting function associates a penalty to each configuration of nodes in a hyperedge.

$w_e(\{1\})$ = penalty for the {1} vs. {2,3,4} split

Assumptions.
• Uncut ignoring: $w_e(e) = w_e(\emptyset) = 0$.
• Non-negativity: $w_e(U) \geq 0$ for all $U \subset e$.
• Symmetry: $w_e(U) = w_e(e \setminus U)$ for all $U \subset e$.

Example of symmetry: $w_e(\{1\}) = w_e(\{2,3,4\})$.
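The three assumptions are easy to test by brute force on a small hyperedge. In this sketch a splitting function is modeled as a Python callable `w(U, e)`, which is an interface chosen for illustration; `lopsided` is a made-up function that deliberately violates symmetry.

```python
from itertools import combinations

def subsets(e):
    """Yield every subset of e as a frozenset."""
    items = sorted(e)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def satisfies_assumptions(w, e):
    """Check uncut ignoring, non-negativity, and symmetry of w on hyperedge e."""
    e = frozenset(e)
    uncut_ignoring = w(e, e) == 0 and w(frozenset(), e) == 0
    non_negative = all(w(U, e) >= 0 for U in subsets(e))
    symmetric = all(w(U, e) == w(e - U, e) for U in subsets(e))
    return uncut_ignoring and non_negative and symmetric

def all_or_nothing(U, e):
    """Standard hypergraph cut: penalty 1 whenever the hyperedge is split."""
    return 0 if len(U) in (0, len(e)) else 1

def lopsided(U, e):
    """Hypothetical counterexample: penalty depends on |U| only, so w({1}) != w({2,3,4})."""
    return 0 if len(U) in (0, len(e)) else len(U)

e = {1, 2, 3, 4}
print(satisfies_assumptions(all_or_nothing, e))
print(satisfies_assumptions(lopsided, e))
```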

We focus on a very natural class of splitting functions.

Hypergraph minimum s-t cut problem.

$\underset{S \subset V}{\text{minimize}} \ \sum_{e \in E} w_e(e \cap S) \equiv \mathrm{cut}_H(S)$ subject to $s \in S$, $t \notin S$.

Assume all hyperedges of the same size have the same splitting function. In theory, we could assign a completely different function to each hyperedge.

Example (figure): $\mathrm{cut}_H(S) = f(2) + f(1)$.

Cardinality-based splitting functions.

$w_e(U) = f(\min(|U|, |e \setminus U|))$

This allows us to distinguish between splits of different sizes, such as a 1-vs-3 split and a 2-vs-2 split.
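The generalized cut with a cardinality-based splitting function can be sketched directly from the two formulas on this slide. The two-hyperedge example, the dictionary representation of f, and the name `generalized_cut` are assumptions for illustration; the example is chosen so that one hyperedge is split 2-vs-2 and the other 1-vs-3.

```python
def generalized_cut(hyperedges, S, f_by_size):
    """cut_H(S) = sum over hyperedges e of f(min(|e & S|, |e - S|)).

    f_by_size[r] is a dict mapping the size of the smaller side to a penalty,
    shared by all hyperedges with r nodes.
    """
    S = set(S)
    total = 0
    for e in hyperedges:
        e = set(e)
        f = f_by_size[len(e)]
        total += f[min(len(e & S), len(e - S))]
    return total

# Hypothetical hypergraph: two 4-node hyperedges sharing nodes 3 and 4.
hyperedges = [{1, 2, 3, 4}, {3, 4, 5, 6}]
S = {1, 2, 5}  # splits the first hyperedge 2-vs-2 and the second 1-vs-3

all_or_nothing = {4: {0: 0, 1: 1.0, 2: 1.0}}
heavier_balanced = {4: {0: 0, 1: 1.0, 2: 1.5}}

print(generalized_cut(hyperedges, S, all_or_nothing))    # f(2) + f(1) with f(2) = 1
print(generalized_cut(hyperedges, S, heavier_balanced))  # f(2) + f(1) with f(2) = 1.5
```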

Cardinality-based functions are easy to specify.

Only need to specify f(1), f(2), …, f(⌊r/2⌋), where r = hyperedge size, with $f(j) = w_j$.

For a 4-node hyperedge:
• f(0) = 0
• f(1) = f(3) = 1 (scale to 1 WLOG)
• f(2) = $w_2$

Example (figure): $\mathrm{cut}_H(S) = f(2) + f(1) = w_2 + 1$.

Different weights lead to different min cuts in practice.
[Figure: minimum s-t cuts on a hypergraph of math.stackexchange.com tags as $w_2$ varies from 1.0 to 2.0. Node = tag on math.stackexchange.com; hyperedge = set of tags in the same post. s-seed = symplectic-linear-algebra; t-seed = bernoulli-numbers. Each tag is marked as being on the s side or the t side.]

We solve hypergraph cut problems with graph reductions.

Gadgets (expansions) model a hyperedge with a small graph.

[Figure: a hyperedge and three gadgets for it: clique expansion, star expansion, and Lawler gadget [1973].]

In a graph reduction, we first replace all hyperedges with graph gadgets and then exactly solve the resulting graph s-t cut problem.

Existing gadgets model cardinality-based splitting functions.

• Clique gadget models the quadratic penalty $w_e(U) = |U| \cdot |e \setminus U|$, i.e., $w_i = i \cdot (r - i)$. Does not require adding new vertices. [Agarwal+ 06; Zhou+ 06; Benson+ 16]
• Star gadget models the linear penalty $w_e(U) = \min\{|U|, |e \setminus U|\}$. Equivalent to bipartite expansion of a hypergraph. [Hu-Moerder 85; Heuer+ 18]
• Lawler gadget models the all-or-nothing penalty $w_e(U) = 0$ if $U \in \{e, \emptyset\}$, and $1$ otherwise. Models the standard hypergraph cut function. [Lawler 73; Ihler+ 93; Yin+ 17]
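The three gadget-to-penalty correspondences can be checked by brute force on a 4-node hyperedge: fix U on the source side, fix the rest of the hyperedge on the sink side, and let any auxiliary nodes pick whichever side gives the cheaper directed cut. The unit edge weights and the exact arc layout of each gadget below are assumptions made for this sketch (the slide's figure may use a different scaling), and all function names are illustrative.

```python
from itertools import combinations, product

INF = float("inf")

def gadget_penalty(arcs, aux, U):
    """Minimum directed cut of a gadget with U on the source side.

    arcs: list of (tail, head, weight); aux: auxiliary nodes free to sit on either side.
    """
    best = INF
    for sides in product([True, False], repeat=len(aux)):
        source = set(U) | {a for a, on_source in zip(aux, sides) if on_source}
        cost = sum(w for tail, head, w in arcs if tail in source and head not in source)
        best = min(best, cost)
    return best

def clique_gadget(e):
    # Unit-weight edge between every pair, stored as two opposite arcs; no auxiliary nodes.
    pairs = list(combinations(e, 2))
    return [(u, v, 1) for u, v in pairs] + [(v, u, 1) for u, v in pairs], []

def star_gadget(e):
    # One auxiliary node "c" joined to every node of e by a unit-weight edge.
    return [(v, "c", 1) for v in e] + [("c", v, 1) for v in e], ["c"]

def lawler_gadget(e):
    # Two auxiliary nodes: v -> "a" (infinite), "a" -> "b" (weight 1), "b" -> v (infinite).
    arcs = [(v, "a", INF) for v in e] + [("a", "b", 1)] + [("b", v, INF) for v in e]
    return arcs, ["a", "b"]

e = (1, 2, 3, 4)
for U in [(), (1,), (1, 2), (1, 2, 3), (1, 2, 3, 4)]:
    row = [gadget_penalty(*make(e), U) for make in (clique_gadget, star_gadget, lawler_gadget)]
    print(U, row)
```

Under these assumptions the printed rows follow the quadratic, linear, and all-or-nothing penalties respectively.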

How can we model other cardinality-based splitting functions?

Other cardinality-based functions are also used in other hypergraph clustering applications.

• All-or-nothing: $w_e(U) = 0$ if $U \in \{e, \emptyset\}$, and $1$ otherwise.
• Linear penalty: $w_e(U) = \min\{|U|, |e \setminus U|\}$.
• Quadratic penalty: $w_e(U) = |U| \cdot |e \setminus U|$.
• Discount cut: $w_e(U) = \min\{|U|^\alpha, |e \setminus U|^\alpha\}$. Used for consensus clustering. [Yaros-Imielinski 13]
• L-M submodular: $w_e(U) = \frac{1}{2} + \frac{1}{2} \cdot \min\left\{1, \frac{|U|}{\lfloor \alpha |e| \rfloor}, \frac{|e \setminus U|}{\lfloor \alpha |e| \rfloor}\right\}$. Used for hypergraph spectral clustering. [Li-Milenkovic 18]

No graph reduction strategy has been designed for these. Can we develop one?

36 We made a new gadget for C-B splitting functions.

AAAHvnicfVVbb9s2FFa6Le68S9PtcS/sAg9JIDt2ijTJgADGWhQr0GLZnLQFTCOjpCOJMElpJJU4JfRD97afskNf2sjJRkASRZ7vO3cyKgU3tt//e+PBZ59/sdl6+GX7q6+/+fbR1uPv3pqi0jFcxIUo9PuIGRBcwYXlVsD7UgOTkYB30fS53393BdrwQp3bmxImkmWKpzxmFpcutz7QCDKuHBM8U3t1m1qYWfe8+0tNfqK5KVkMrt87OIxlTejV9SXsXOySU5LuUMnVDhXIbckFodpPQrJcAEIjFk+NYCb/uLu722tTUMlK1+XWdr/Xnw9ydzJYTraD5Ti7fLwZ0qSIKwnKxkhtxoN+aSeOactjAWh8ZQAtnrIMxjhVTIKZuHmQatLBlYSkhcZHWTJfbd+GII9mNw0WZ1lUCaZnzdWoKKa4Y+oGfuxzYFQlI9CQhLoSkKBxIis0t7k8gDXxyqbHE8dVWVlQ8cLAtBLEFsRniiRcQ2zFDWlaafn0Q6h4DKlmccikkczmYcm9V6FkU4hBiIW9XlTwSDN9450rrk3o05LpolKJCUtmLWhlEGU1n4UmZyWYMOU2jJmI/X/iMaUorGR6av6LtSfBMtycx1SAdedVauEPSGqHkXhy3H8SCdR7W8LmkGkAVbv5x8tc59zCmkwkKqidf9+SaHdIbm1pft7fx2rtGYvcMItzpjLoxYXc/6sC48vb7A+eHZ4cnOwbkBxLMMKil91rzEbXO9Hlqhthr4Ceyz092l582tSHkWEv+fi0aSaKiAmKv9TDhqBMpWGYFAJLY4idFBcJnFINgs1W2AKNb5bX+HwwcT5JPtmNjJ6dj5jywdWg4BodkAy7hKZMcnGTQMoqYWtHTbqaNwvCpL4C6nbntjKDGYTktN87CWPsVIvRZgKbARXYmUk9RdNJ5KbKzjzVcAF2Zm+MXXg4qdedegHYfhpGNzIqxEt0yS1YTO1+e/O6dsqrkLx2snYczaUjsPcJ40KyDomWkKUODxhVEabTVj6l9ytY1zB6+caHZKXgfNAIn4tmtTPikxIvvEC7VyjpY8BEmbP6k6l/vlqLepIJ4HHeXcT+vh1MtMGDp3lySE9zO8tyxDOJmuiiqjydo5F0dLFe3ykL+RpP9+Q+xHKjbqrYo7OI6TEWH82jYubolX932jT3JxTJgWe5xXP36LC0pEPOcyAsthUTBGFtOsUTwl8AMOuQ1eiQF3gzMRUDicBeY/96WYLKiJmHsb1Q1WkTMifo9nsDkJ0VepQXGqPDVUYKRbCoiIDUEsMT8Ihbfm0P6o8keDU8/V8SPfdkzoJB8PfLYP02uTt5e9AbPOsNfj/YHh4vb5qHwQ/Bj8FOMAiOgmHwa3AWXARx8M/G5sajja3WsJW2ZKtYiD7YWGK+DxqjNfsXpVq3vA== C-B w (U)=f (min( U , e U )). e | | | \ | b This gadget models min(|U|, |e\U|, b).

Theorem [Veldt-Benson-Kleinberg 20a]. Nonnegative linear combinations of the C-B gadget can model any submodular cardinality-based splitting function. (F is submodular on X if $F(A \cap B) + F(A \cup B) \le F(A) + F(B)$ for any $A, B \subseteq X$.)

All the applications of hypergraph cuts map to a submodular cardinality-based splitting function.
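One way to see the "nonnegative linear combination" idea, sketched under an assumption: take f with f(0) = 0 that is concave and nondecreasing on i = 0..k, where i = min(|U|, |e\U|). Then f(i) can be written as a sum of capped terms c_b * min(i, b) with c_b >= 0. The helper names below are hypothetical, and this is not necessarily the construction used in the cited paper.

```python
def cb_gadget_coefficients(f):
    """Given a list f[0..k] with f[0] == 0, return {b: c_b} such that
    f[i] == sum(c_b * min(i, b) for b in 1..k) for every i in 0..k."""
    k = len(f) - 1
    coeffs = {}
    for b in range(1, k + 1):
        if b < k:
            # Negative second difference: nonnegative exactly when f is concave at b.
            coeffs[b] = 2 * f[b] - f[b - 1] - f[b + 1]
        else:
            # Final slope: nonnegative exactly when f is nondecreasing at the end.
            coeffs[b] = f[k] - f[k - 1]
    return coeffs

def evaluate(coeffs, i):
    """Evaluate the combination of capped terms at split size i."""
    return sum(c * min(i, b) for b, c in coeffs.items())

f = [0, 4, 7, 9, 10]                 # hypothetical concave, nondecreasing penalties
coeffs = cb_gadget_coefficients(f)
print(coeffs)
print([evaluate(coeffs, i) for i in range(len(f))])
```

If f is not concave, at least one coefficient comes out negative, so this particular decomposition no longer uses only nonnegative weights.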

See also Graph Cuts for Minimizing Robust Higher Order Potentials, Kohli et al., 2008.

37 Submodularity is key to efficient algorithms.

Cardinality-based splitting functions.

$w_e(U) = f(\min(|U|, |e \setminus U|))$

Example from the figure: $\mathrm{cut}_H(S) = f(2) + f(1)$

Theorem [Veldt-Benson-Kleinberg 20a]. The hypergraph min s-t cut problem with a cardinality-based splitting function is graph-reducible (via gadgets) if and only if the splitting function is submodular.
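To make the cut value concrete, here is a small sketch that sums a cardinality-based penalty f(min(|S ∩ e|, |e \ S|)) over the hyperedges of a toy hypergraph. The hypergraph, the function names, and the two example penalties (an all-or-nothing penalty and a linear penalty capped at a threshold) are illustrative assumptions.

```python
def hypergraph_cut(hyperedges, S, f):
    """Sum of f(min(|S & e|, |e - S|)) over all hyperedges; f(0) must be 0."""
    S = set(S)
    total = 0
    for e in hyperedges:
        inside = len(S & set(e))
        total += f(min(inside, len(e) - inside))
    return total

def all_or_nothing(i):
    """Penalty 1 whenever the hyperedge is split at all."""
    return 1 if i >= 1 else 0

def capped_linear(delta):
    """Penalty grows with the smaller side of the split, capped at delta."""
    return lambda i: min(i, delta)

H = [("a", "b", "c"), ("b", "c", "d", "e"), ("d", "e", "f")]   # toy hypergraph
S = {"a", "b", "c"}

print(hypergraph_cut(H, S, all_or_nothing))     # only the middle hyperedge is split
print(hypergraph_cut(H, S, capped_linear(5)))   # the middle hyperedge is split 2-2
```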

What happens when the splitting function isn’t submodular? Can we use some other algorithm?

38 Hardness and open questions for 4-node case.

Setting: 4-node hyperedges with w1 = 1 fixed and w2 varying.

[Figure: number line for w2 with ticks at 0, 1, 2. Below 1: NP-hard (reduction from max cut). Between 1 and 2: graph reducible. Above 2: complexity unknown (??).]

[Figure: three example min s-t cut solutions. w2 = 0.5: not graph reducible, NP-hard! w2 = 1.5: graph reducible. w2 = 2.5: not graph reducible, complexity unknown.]
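The three regimes for the 4-node case can be checked by brute force against the definition of submodularity. This is a sketch: the helper names are hypothetical, and the 4-node splitting function is assumed to assign penalty 0 to an unsplit hyperedge, w1 to a 1-3 split, and w2 to a 2-2 split.

```python
from itertools import combinations

def is_submodular(w, ground):
    """Check w(A & B) + w(A | B) <= w(A) + w(B) for all subsets A, B of ground."""
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    return all(w(A & B) + w(A | B) <= w(A) + w(B) + 1e-12
               for A in subsets for B in subsets)

def four_node_splitting(w1, w2):
    """Cardinality-based splitting function on a 4-node hyperedge."""
    penalties = {0: 0.0, 1: w1, 2: w2}
    return lambda U: penalties[min(len(U), 4 - len(U))]

ground = range(4)
for w2 in (0.5, 1.5, 2.5):
    print(w2, is_submodular(four_node_splitting(1.0, w2), ground))
```

With w1 = 1, only the middle value w2 = 1.5 passes the check, which lines up with the graph-reducible regime on the slide.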

39 Local Hypergraph Clustering

Minimizing Localized Ratio Cut Objectives in Hypergraphs
Veldt, Benson, Kleinberg. KDD 2020

The goal of local graph clustering is to find a good cluster S near a seed set R.

Examples. Localizing the left atrial cavity in a full-body MRI [Veldt+ 19]. Finding a specific person’s social communities [Fountoulakis+ 20]. Determining related products from co-purchasing data.

Nate Veldt

40 HyperLocal does localized hypergraph clustering by repeated hypergraph s-t cuts.

We introduce a new Hypergraph Local Conductance (HLC) objective.

$$\min_{S \subset V} \; \mathrm{HLC}_{R,\varepsilon}(S) = \frac{\mathrm{cut}_H(S)}{\mathrm{vol}_H(S \cap R) - \varepsilon\, \mathrm{vol}_H(S \cap \bar{R})}$$

The numerator is the hypergraph cut function. In the denominator, the first term encourages overlap with the seed set and the second term discourages overlap outside the seed set.

$\mathrm{cut}_H(S)$ = cut from a C-B splitting function
$\mathrm{vol}_H(X) = \sum_{i \in X} \sum_{e \in E:\, i \in e} 1$ = sum of hypergraph degrees in X

Theorem [Veldt-Benson-Kleinberg 2020b]. If $\mathrm{cut}_H(S)$ is any cardinality-based submodular hypergraph cut function, the HLC objective can be minimized in polynomial time by solving a bounded number of hypergraph minimum s-t cut problems.

[Figure: s-t cut construction with source s, sink t, and seed set R.]
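A small sketch that evaluates the Hypergraph Local Conductance objective for a candidate set S on a toy hypergraph. The data, the function names, and the choice to return infinity when the denominator is not positive are assumptions made for illustration; the all-or-nothing penalty stands in for a generic C-B splitting function.

```python
def hlc(hyperedges, S, R, eps, f):
    """cut_H(S) / (vol_H(S & R) - eps * vol_H(S - R)) for a cardinality-based f."""
    S, R = set(S), set(R)
    # Hypergraph degree of a node = number of hyperedges containing it.
    degree = {}
    for e in hyperedges:
        for v in e:
            degree[v] = degree.get(v, 0) + 1
    cut = sum(f(min(len(S & set(e)), len(set(e) - S))) for e in hyperedges)
    vol_in = sum(degree.get(v, 0) for v in S & R)     # overlap with the seed set
    vol_out = sum(degree.get(v, 0) for v in S - R)    # overlap outside the seed set
    denom = vol_in - eps * vol_out
    # Assumption: treat sets with a nonpositive denominator as infeasible.
    return cut / denom if denom > 0 else float("inf")

def all_or_nothing(i):
    return 1 if i >= 1 else 0

H = [("a", "b", "c"), ("b", "c", "d", "e"), ("d", "e", "f")]   # toy hypergraph
R = {"a", "b"}                                                   # seed set

print(hlc(H, {"a", "b"}, R, 0.5, all_or_nothing))
print(hlc(H, {"a", "b", "c"}, R, 0.5, all_or_nothing))
```

In this toy case, growing the set from the seeds to {a, b, c} lowers the objective, because the cut shrinks faster than the penalty for leaving the seed set grows.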

41 Detecting Amazon product categories from review data

Runtime and accuracy for detecting products of the same category from seed nodes

Cluster                    |T|    time (s)   HyperLocal   Baseline1   Baseline2
Amazon Fashion             31     3.5        0.83         0.77        0.6
All Beauty                 85     30.8       0.69         0.60        0.28
Appliances                 48     9.8        0.82         0.73        0.56
Gift Cards                 148    6.5        0.86         0.75        0.71
Magazine Subscriptions     157    14.5       0.87         0.72        0.56
Luxury Beauty              1581   261        0.33         0.31        0.17
Software                   802    341        0.74         0.52        0.24
Industrial & Scientific    5334   503        0.55         0.49        0.15
Prime Pantry               4970   406        0.96         0.73        0.36


• 2.3M Amazon products (nodes), reviewed by 4.3M users (hyperedges).
• Mean hyperedge size > 17. Max hyperedge size ~9.3k nodes!
• Product categories provide cluster labels.
• All-or-nothing penalty (w_i = 1).

42 Detecting online forum questions on the same topic

[Figure: F1 scores (y-axis) for one cluster per StackOverflow tag (x-axis; e.g. tcl, julia, ocaml, verilog). Legend: HyperLocal (ours), TN/BN (neighborhood baselines), FlowSeed (clique expansion + graph method).]

• 15M StackOverflow questions (nodes), answered by 1.1M users (hyperedges).
• Mean hyperedge size 23.7, max hyperedge size ~60k.
• Tags provide cluster labels.
• Delta-linear splitting function w_i = min(i, 5000).

43 We carefully apply our graph reduction techniques to growing subsets of the hypergraph.

[Figure: growing subsets of the hypergraph around the seed set R, with source s and sink t.]

The resulting algorithm is very fast because it doesn’t have to explore the entire hypergraph.

44 HyperLocal has strong theoretical guarantees.

Theorem [Veldt-Benson-Kleinberg 2020b]. Runtime guarantees. The runtime of HyperLocal depends only on the size of the seed set R, not the size of the hypergraph.

Theorem [Veldt-Benson-Kleinberg 2020b]. Cluster detection guarantees. If R overlaps enough with a target set T, HyperLocal will find a cluster S almost as good as T.

Hypergraph localized conductance:

$$\min_{S \subset V} \; \mathrm{HLC}_{R,\varepsilon}(S) = \frac{\mathrm{cut}_H(S)}{\mathrm{vol}_H(S \cap R) - \varepsilon\, \mathrm{vol}_H(S \cap \bar{R})}$$

$\mathrm{cut}_H(S)$ = cut from a C-B splitting function
$\mathrm{vol}_H(X)$ = sum of hypergraph degrees in X

Hypergraph normalized cut:

$$\frac{\mathrm{cut}(S)}{\mathrm{vol}(S)} + \frac{\mathrm{cut}(S)}{\mathrm{vol}(\bar{S})}$$

45 Global Clustering and Community Detection

Generative Hypergraph Clustering: from Blockmodels to Modularity
Chodrow, Veldt, Benson. arXiv 2101.09611

In global clustering, we assign every node to one cluster.

Examples. Finding social communities based on group interactions. Separating retail products into different departments/categories. Minimizing communication costs in parallel computing [Çatalyürek-Aykanat 99; Kabiljo+ 17].

46 Contact hypergraphs

Hyperedge = group of students that interact during the day. Measured by wearable sensors.

Our new hypergraph method beats graph methods at identifying groups of students (nodes) that belong to the same class (cluster) based on group interactions (hyperedges).

47 We develop a generative model for hypergraph clustering.

$$\Pr(A \mid z, \Omega, \theta) = \prod_{R \in \mathcal{R}} \Pr(a_R \mid z, \Omega) = \prod_{R \in \mathcal{R}} \frac{e^{-b_R(\theta_R)\,\Omega(z_R)} \left(b_R(\theta_R)\,\Omega(z_R)\right)^{a_R}}{a_R!}$$
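Each factor in the likelihood on this slide has the form of a Poisson probability: a rate appears in the exponent and is raised to the power a_R, divided by a_R!. A minimal sketch of the corresponding log-likelihood follows; the node tuples, counts, and rates are hypothetical numbers, and the rate is passed in directly rather than computed from cluster labels and an affinity function.

```python
import math

def poisson_log_prob(a, rate):
    """log of exp(-rate) * rate**a / a! for a nonnegative integer a and rate > 0."""
    return -rate + a * math.log(rate) - math.lgamma(a + 1)

def log_likelihood(counts, rates):
    """Sum of Poisson log-probabilities over node tuples R.

    counts[R] is the observed number of hyperedges on tuple R (a_R);
    rates[R] is the Poisson rate assigned to that tuple by the model.
    """
    return sum(poisson_log_prob(counts[R], rates[R]) for R in counts)

# Hypothetical example: one observed hyperedge and one absent tuple.
counts = {("u", "v", "w"): 1, ("u", "v", "x"): 0}
rates = {("u", "v", "w"): 0.8, ("u", "v", "x"): 0.1}
print(log_likelihood(counts, rates))
```

Working with logarithms turns the product over tuples into a sum, which is the form a maximum likelihood procedure would typically optimize.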

[Figure: a 5-tuple of nodes with ground-truth cluster labels. The affinity function Ω decides the probability that this tuple becomes a hyperedge (versus no hyperedge).]

Maximum likelihood estimation to infer cluster labels and affinity functions involves minimizing hypergraph cuts.

48 Different affinity functions correspond to different types of hypergraph cut penalties

Example 1 (All-or-nothing). The affinity function depends only on whether all nodes are from the same ground-truth cluster. Optimizing the maximum likelihood function involves minimizing an AON hypergraph cut penalty.

Example 2 (Group Number). The affinity function depends only on the number of clusters appearing in a hyperedge.

Example 3 (Relative Plurality). The affinity function depends only on the difference in count of the top two most frequent clusters.
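The three examples differ only in which summary of a hyperedge's cluster labels the affinity function is allowed to see. A short sketch of those summaries, with hypothetical function names and labels:

```python
from collections import Counter

def all_or_nothing_stat(labels):
    """True when every node in the hyperedge has the same cluster label."""
    return len(set(labels)) == 1

def group_number_stat(labels):
    """Number of distinct clusters appearing in the hyperedge."""
    return len(set(labels))

def relative_plurality_stat(labels):
    """Count of the most frequent cluster minus count of the second most frequent."""
    counts = sorted(Counter(labels).values(), reverse=True)
    second = counts[1] if len(counts) > 1 else 0
    return counts[0] - second

labels = ["A", "A", "A", "B", "C"]       # cluster labels of a 5-node hyperedge
print(all_or_nothing_stat(labels))
print(group_number_stat(labels))
print(relative_plurality_stat(labels))
```

An affinity function of each type would then map the corresponding summary (together with the hyperedge size) to a value; that mapping is what gets fitted.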

49 We can use true cluster labels to evaluate affinities.

Bayesian information criterion (BIC)

Dataset              all-or-nothing   group number   relative plurality   scale
trivago-clicks       1.6854           1.6866         2.0257               × 10^8
house-committees     2.7128           2.7128         2.7119               × 10^5
senate-committees    9.7934           9.7934         9.7736               × 10^4

Trivago hypergraph

Vacation Rentals

Hyperedge = set of vacation rentals that a user “clicks out” on during a browsing session

50 We can use max. likelihood estimation to infer clusters.

Trivago hypergraph

[Figure: Adjusted Rand Index (y-axis) versus maximum hyperedge size considered (x-axis) for three methods: hypergraph method (AON), graph-based method (unweighted), graph-based method (weighted).]
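For reference, the Adjusted Rand Index compares a predicted clustering with ground-truth labels and corrects for chance agreement. Below is a from-scratch sketch of the standard pair-counting formula. It assumes at least two items, and the convention of returning 1.0 in the degenerate case where the denominator vanishes is a choice made here.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index between two labelings of the same items (n >= 2)."""
    n = len(true_labels)
    # Pairs of items that share a cell of the contingency table.
    cells = Counter(zip(true_labels, pred_labels))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    # Pairs that share a true cluster, and pairs that share a predicted cluster.
    sum_rows = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)   # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:                      # degenerate case (assumed convention)
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))   # same partition, renamed labels
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))   # partition that crosses the truth
```

A score of 1 means the two partitions agree up to relabeling, and scores near 0 or below mean agreement no better than chance.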

Our new hypergraph method beats graph methods at identifying vacation rentals (nodes) from the same country (cluster) based on browsing behavior.

51 Recap

Hypergraphs model rich higher-order structure.

We can revisit classical problems and gain fresh insights.

There are many new theoretical and algorithmic questions.

Applications to higher-order data analysis are abundant!

52 arXiv:2103.05031

53 Higher-order network science with hypergraph cuts

THANKS!

Austin R. Benson
http://cs.cornell.edu/~arb
@austinbenson
[email protected]

Lots of data available at https://www.cs.cornell.edu/~arb/data/

Supported by ARO MURI, ARO Award W911NF19-1-0057, NSF Award DMS-1830274, and JP Morgan Chase & Co.

54