Mining Text Outliers in Document Directories ICDM 2020: 20Th IEEE International Conference on Data Mining Edouard Fouche´ ∗, Y

Total Page:16

File Type:pdf, Size:1020Kb

Mining Text Outliers in Document Directories ICDM 2020: 20Th IEEE International Conference on Data Mining Edouard Fouche´ ∗, Y Mining Text Outliers in Document Directories ICDM 2020: 20th IEEE International Conference on Data Mining Edouard Fouche´ ∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ j October 19, 2020 ∗KARLSRUHE INSTITUTE OF TECHNOLOGY (KIT), ∗∗UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN KIT – The Research University in the Helmholtz Association www.kit.edu This talk is about. Classifying documents into directories For example: emails, news articles, research papers . Jobs Research Fun Email 1: Re: Your application... Email 5: Fw: Grant Proposal... Email 7: Re: e best cat memes... Email 2: Interview scheduling... Email 6: Support Vector Machines... Email 8: ICDM Registration... Email 3: Preparing your internship... Email 9: Get-together, RSVP... Email 4: Self-supervised methods... Documents may be classified wrongly: Type M: Misclassification (wrong folder) Type O: Out-of-distribution (no adequate folder) We see those mistakes as semantic “outliers” We present an approach to mine both outliers types simultaneously Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ – kj-NN October 19, 2020 2/18 This talk is about. Classifying documents into directories For example: emails, news articles, research papers . Jobs Research Fun Email 1: Re: Your application... Email 5: Fw: Grant Proposal... Email 7: Re: e best cat memes... Email 2: Interview scheduling... Email 6: Support Vector Machines... Email 8: ICDM Registration... Email 3: Preparing your internship... Email 9: Get-together, RSVP... Email 4: Self-supervised methods... Documents may be classified wrongly: Type M: Misclassification (wrong folder) Type O: Out-of-distribution (no adequate folder) We see those mistakes as semantic “outliers” We present an approach to mine both outliers types simultaneously Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ – kj-NN October 19, 2020 2/18 This talk is about. Classifying documents into directories For example: emails, news articles, research papers . Jobs Research Fun Email 1: Re: Your application... Email 5: Fw: Grant Proposal... Email 7: Re: e best cat memes... Email 2: Interview scheduling... Email 6: Support Vector Machines... Email 8: ICDM Registration... Email 3: Preparing your internship... Email 9: Get-together, RSVP... Email 4: Self-supervised methods... M Documents may be classified wrongly: Type M: Misclassification (wrong folder) Type O: Out-of-distribution (no adequate folder) We see those mistakes as semantic “outliers” We present an approach to mine both outliers types simultaneously Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ – kj-NN October 19, 2020 2/18 This talk is about. Classifying documents into directories For example: emails, news articles, research papers . Conference? Jobs Research Fun Travel? Email 1: Re: Your application... Email 5: Fw: Grant Proposal... Email 7: Re: e best cat memes... Email 2: Interview scheduling... Email 6: Support Vector Machines... Email 8: ICDM Registration... O Email 3: Preparing your internship... Email 9: Get-together, RSVP... Email 4: Self-supervised methods... M Documents may be classified wrongly: Type M: Misclassification (wrong folder) Type O: Out-of-distribution (no adequate folder) We see those mistakes as semantic “outliers” We present an approach to mine both outliers types simultaneously Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ – kj-NN October 19, 2020 2/18 This talk is about. Classifying documents into directories For example: emails, news articles, research papers . Conference? Jobs Research Fun Travel? Email 1: Re: Your application... Email 5: Fw: Grant Proposal... Email 7: Re: e best cat memes... Email 2: Interview scheduling... Email 6: Support Vector Machines... Email 8: ICDM Registration... O Email 3: Preparing your internship... Email 9: Get-together, RSVP... Email 4: Self-supervised methods... M Documents may be classified wrongly: Type M: Misclassification (wrong folder) Type O: Out-of-distribution (no adequate folder) We see those mistakes as semantic “outliers” We present an approach to mine both outliers types simultaneously Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ – kj-NN October 19, 2020 2/18 This talk is about. Classifying documents into directories For example: emails, news articles, research papers . Conference? Jobs Research Fun Travel? Email 1: Re: Your application... Email 5: Fw: Grant Proposal... Email 7: Re: e best cat memes... Email 2: Interview scheduling... Email 6: Support Vector Machines... Email 8: ICDM Registration... O Email 3: Preparing your internship... Email 9: Get-together, RSVP... Email 4: Self-supervised methods... M Documents may be classified wrongly: Type M: Misclassification (wrong folder) Type O: Out-of-distribution (no adequate folder) We see those mistakes as semantic “outliers” We present an approach to mine both outliers types simultaneously Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ – kj-NN October 19, 2020 2/18 Why are there such outliers? (1/2) We drawning in data Document repositories have grown very large They tend to be highly multi-modal: many classes, folders Number of articles on ArXiv Number of CS articles on ArXiv 1.6e+06 math CV CY OH 2.0e+05 physics+gr-qc+nlin+nucl+quant-ph IT GT SD 1.4e+06 cond-mat 1.8e+05 LG IR MM astro-ph CL CC MA 1.2e+06 hep-* 1.5e+05 AI DM NA 1.0e+06 cs NI DB ET stat 1.2e+05 DS NE GR 8.0e+05 q-bio CR HC AR 1.0e+05 DC PL SC 6.0e+05 7.5e+04 LO CG PF RO DL MS 4.0e+05 5.0e+04 SI FL OS SE CE GL 2.0e+05 2.5e+04 SY 0.0e+00 0.0e+00 1991 1995 1999 2003 2007 2011 2015 2019 1991 1995 1999 2003 2007 2011 2015 2019 1 1 Data obtained from https://www.kaggle.com/Cornell-University/arxiv Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ – kj-NN October 19, 2020 3/18 Why are there such outliers? (2/2) Maintenance is difficult Human classification: sloppy, unreliable Handled by many/different users Different user ! Different classification 2 2 The Daily Struggle Meme – Jake Clark – Modified (ML/AI) Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ – kj-NN October 19, 2020 4/18 Why are they difficult to find? (1/2) Ambiguity Documents may have complex semantic Sometimes, the correct class is unknown yet, e.g., an emerging field Folder structures are domain/user-specific ! unsupervised Rabbit, duck, both?3 3 Rabbit and Duck – Fliegende Blatter,¨ 23 October 1892 – Public Domain Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ – kj-NN October 19, 2020 5/18 Why are they difficult to find? (2/2) Type O/M must be detected simultaneously Type O outliers may be detected as Type M by mistake The noise from Type M outliers hinders the detection of Type O Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ – kj-NN October 19, 2020 6/18 Why are they difficult to find? (2/2) Type O/M must be detected simultaneously O O Type O outliers may be detected as Type M by mistake The noise from Type M outliers hinders the detection of Type O Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ – kj-NN October 19, 2020 6/18 Why are they difficult to find? (2/2) Type O/M must be detected simultaneously M O M M M M O Type O outliers may be detected as Type M by mistake The noise from Type M outliers hinders the detection of Type O Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ – kj-NN October 19, 2020 6/18 Our Contributions We explore the problem of mining text outliers in document directories We are first to distinguish between Type O/M outliers We propose a new approach to detect text outliers kj-Nearest Neighbours (kj-NN) Exploit similarities between documents/phrases Extract semantic labels and similar documents ! interpretability We provide an extensive evaluation Improve the current state of the art (real-world/synthetic data) Interpretable results Code & data: https://github.com/edouardfouche/MiningTextOutliers Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ – kj-NN October 19, 2020 7/18 Our Contributions We explore the problem of mining text outliers in document directories We are first to distinguish between Type O/M outliers We propose a new approach to detect text outliers kj-Nearest Neighbours (kj-NN) Exploit similarities between documents/phrases Extract semantic labels and similar documents ! interpretability We provide an extensive evaluation Improve the current state of the art (real-world/synthetic data) Interpretable results Code & data: https://github.com/edouardfouche/MiningTextOutliers Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y. Meng∗∗, F. Guo∗∗, H. Zhuang∗∗, K. Bohm¨ ∗ & J. Han∗∗ – kj-NN October 19, 2020 7/18 Our Contributions We explore the problem of mining text outliers in document directories We are first to distinguish between Type O/M outliers We propose a new approach to detect text outliers kj-Nearest Neighbours (kj-NN) Exploit similarities between documents/phrases Extract semantic labels and similar documents ! interpretability We provide an extensive evaluation Improve the current state of the art (real-world/synthetic data) Interpretable results Code & data: https://github.com/edouardfouche/MiningTextOutliers Introduction Related Work Our Method Experiments Conclusions Edouard Fouche´∗, Y.
Recommended publications
  • OUTLIERS: the STORY of SUCCESS by Malcolm Gladwell
    An Executive Summary of OUTLIERS: THE STORY OF SUCCESS by Malcolm Gladwell Who is Malcolm Gladwell Malcolm Gladwell is most known for his books, “The Tipping Point: How little things can make a big difference”, “Blink: The power of thinking without thinking”, and “Outliers: The Story of Success”. Amazingly, all the books have been featured on The New York Best Sellers list. Gladwell, who was born on September 3, 1963, is not only a writer but is also a journalist and speaker. As a staff writer for the prestigious “New Yorker” ever since 1966, he has gained a lot of popularity due to this books that deal with his research in various areas of social science. Most importantly, he is known to write in a way that provokes and challenges the reader to think differently. This is our summary of his book: Outliers: The Story of Success Preston and Stig’s General Thoughts on the Book We were initially turned on to this book by a comment that Charlie Munger made during a shareholders meeting. Although many might find the discussion of extraordinary people un-relatable, we are of the opinion that the book provides great insights into the importance of finding your own unique “assets”. Perhaps contrarily to others I don’t see this book as a testament to the concept of luck. I would much rather say that if you are willing to work hard, are talented, and you are lucky to have the right opportunities, you will become successful. Even the most successful people in this world are hard working, and a single individual has hardly been successful by merely having birthdays on a lucky date or simply being gifted with special skills.
    [Show full text]
  • Outliers (Book) 1 Outliers (Book)
    Outliers (book) 1 Outliers (book) Outliers Author(s) Malcolm Gladwell Cover artist Allison J. Warner Country United States Language English Genre(s) Psychology, sociology Publisher Little, Brown and Company Publication date November 18, 2008 Media type Hardback, paperback, audiobook Pages 304 ISBN 9780316017923 [1] OCLC Number 225870354 Dewey Decimal 302 22 LC Classification BF637.S8 G533 2008 Outliers: The Story of Success is a non-fiction book written by Malcolm Gladwell and published by Little, Brown and Company on November 18, 2008. In Outliers, Gladwell examines the factors that contribute to high levels of success. To support his thesis, he examines the causes of why the majority of Canadian ice hockey players are born in the first few months of the calendar year, how Microsoft co-founder Bill Gates achieved his extreme wealth, and how two people with exceptional intelligence, Christopher Langan and J. Robert Oppenheimer, end up with such vastly different fortunes. Throughout the publication, Gladwell repeatedly mentions the "10,000-Hour Rule", claiming that the key to success in any field is, to a large extent, a matter of practicing a specific task for a total of around 10,000 hours. The publication debuted at number one on the bestseller lists for The New York Times and The Globe and Mail, holding the position on the former for eleven consecutive weeks. Generally well-received by critics, Outliers was considered more personal than Gladwell's other works, and some reviews commented on how much Outliers felt like an autobiography. Reviews praised the connection that Gladwell draws between his own background and the rest of the publication to conclude the book.
    [Show full text]
  • Students' Book· Audio Scripts
    STUDENTS’ BOOK· AUDIO SCRIPTS throughout the book is pick up on these T: What’s that? UNIT 7 Recording 1 little things that we really need to go J: Blocking? It’s where you stand or move P = Presenter I = Ian back and look at again if we are to really to, y’know? Like, when you say your words understand why successful people are as you might have to walk quickly across the P: Hello and welcome back to the Focus successful as they are. stage. Or move in front of someone. It’s all podcast. I’m Jenny Osmond, the editor of P: I think the 10,000 hours magic number planned and er, you have to remember it. Focus, the monthly science and technology is really interesting because, as you know, T: Oh, I see. magazine from the BBC. I used to play tennis professionally, and J: But it’s funny: for, for other things I have He’s the hugely influential author of Blink I hit a load of tennis balls when I was a terrible memory. I’m totally useless. I and the Tipping Point. His work is quoted younger. And I’m sure, I must have done always forget birthdays and dates. I’m always by academics, presidents and your mates 10,000 hours’ worth, you know, I must late for things. It’s just … yeah … luckily, I’m down the pub. And now Malcolm Gladwell have done four hours a day, and stuff. OK with my lines. has turned that deft mind of his to a new And I remember speaking to Martina P: What about you, Tim? subject: the science of success.
    [Show full text]
  • Outliers-Malcolm Gladwell
    NonFiction Book Club: May 2021 Outliers: The Story of Success by Malcolm Gladwell In this stunning new book, Malcolm Gladwell takes us on an intellectual journey through the world of "outliers"--the best and the brightest, the most famous and the most successful. He asks the question: what makes high-achievers different? His answer is that we pay too much attention to what successful people are like, and too little attention to where they are from: that is, their culture, their family, their generation, and the idiosyncratic experiences of their upbringing. Along the way he explains the secrets of software billionaires, what it takes to be a great soccer player, why Asians are good at math, and what made the Beatles the greatest rock band. Author: Malcolm Gladwell Malcolm Gladwell is the author of five New York Times bestsellers—The Tipping Point, Blink, Outliers, What the Dog Saw, and David and Goliath. He is also the co-founder of Pushkin Industries, an audio content company that produces the podcasts Revisionist History, which reconsiders things both overlooked and misunderstood, and Broken Record, where he, Rick Rubin, and Bruce Headlam interview musicians across a wide range of genres. Gladwell has been included in the TIME 100 Most Influential People list and touted as one of Foreign Policy's Top Global Thinkers. Find similar reads using these book appeal terms in NoveList Plus: Genre- Business & Economics; Tone- Thought-provoking, Upbeat; Writing Style- Accessible, Engaging You might also enjoy: NonFiction Book Club: May 2021 Discussion Questions: 1. According to Lewis Terman’s research, what is the relationship between intelligence and success? Did any of the findings surprise you? 2.
    [Show full text]
  • Malcolm Gladwell: the Only Way You Can Develop Your Ideas Is If You Have a Crowd
    Malcolm Gladwell: The only way you can develop your ideas is if you have a crowd. And it's that kind of conversation that is at the heart of every, every intellectual revolution. [INTRO MUSIC] Stephen Scherr: Hi everyone. And welcome to Talks at GS. I'm Stephen Scherr, Chief Financial Officer of Goldman Sachs. I'm thrilled to be joined today by writer and podcaster Malcolm Gladwell. Malcolm is, of course, the author of several best- selling books, including The Tipping Point, Outliers, and David and Goliath. He is the host of the podcast Revisionist History, a contributor to The New Yorker since 1996, and his latest book, The Bomber Mafia, is now on sale. Malcolm, thank you for joining us and welcome back. Malcolm Gladwell: I am delighted to be on GS Talks. Stephen Scherr: Well, why don't we begin just with your new book, The Bomber Mafia, which is, you know, for me it was a kind of fascinating history of war, technology, and how innovation can lead to unexpected ends in many respects. And you start at the beginning sort of reflecting on your look for a topic that you would be quite excited about to write. And I'm curious how you came upon this story and what drew you to write about this sort of moment in history. Malcolm Gladwell: Well, you know, I've always been a little bit of a military buff. And I've always been particularly fascinated with the bombing campaigns of the Second World War because my father is English, was English.
    [Show full text]
  • Outliers the Story of Success
    Outliers The Story of Success Malcolm Gladwell New York: Little Brown and Company, 2008 “This is a book about outliers, about men and women who do things that are out of the ordinary.” [p. 17] Includes: 2 Quotes from Outliers What makes successful people so different? In this book synopsis with Randy Mayeux, you will find that the secret of success may have more to do with opportunity, the “10,000 hour rule,” luck, and what Gladwell calls 5 Detailed outline “accumulative advantage,” rather than individual merit. During this presentation, you will discover that maybe outliers aren’t really outliers after all. Outliers is a bestseller and voted as one of the Best Books 7 Observations and Discussion of 2008. Quotes 1. In Outliers, I want to do for our understanding of success what Stewart Wolf did for our understanding of health. (p. 11). 2. Canadian hockey is a meritocracy. Thousands of Canadian boys begin to play the sport at the “novice” level, before they are even in kindergarten. From that point on, there are leagues for every age class, and at each of those levels, the players are sifted and sorted and evaluated, with the most talented separated out and groomed for the next level. By the time players reach their midteens, the very best of the best have been channeled into an elite league… This is the way most sports pick their future stars… For that matter, it is not all that different from the way the world of classical music picks its future virtuosos, or the way the world of ballet picks its future ballerinas, or the way our elite educational system picks its future scientists and intellectuals.
    [Show full text]
  • Worksheet for Malcolm Gladwell | What We Should Know About Talking to Strangers (Episode 256)
    Worksheet for Malcolm Gladwell | What We Should Know about Talking to Strangers (Episode 256) To long-time listeners of this show, Malcolm Gladwell is probably a household name. He’s written bestsellers that are likely on your shelf How do you respond to expectations? right now — like The Tipping Point, Blink, Outliers, and What the Dog Saw — and he writes and Gretchen Rubin, host of the Happier hosts the popular Revisionist History Podcast, Podcast and author of The Four which goes back and reinterprets “something Tendencies: The Indispensable Personality from the past: an event, a person, an idea. Something overlooked. Something Profiles That Reveal How to Make Your misunderstood.” An interview with him has been Life Better (and Other People’s Lives on our wish list for more than a decade, so this is Better, Too), joined us for Episode 18 to especially exciting for all of us at Harbinger HQ. discuss how answering this one simple In this episode we talk to Malcolm about his question gives us a framework to make latest book (which he considers his angriest work better decisions, manage time efficiently, to date), Talking to Strangers: What We Should suffer less stress, and engage with others Know about the People We Don’t Know. We more effectively. discuss why the tools we have when we talk to our friends betray us when we talk to strangers — and what we can do about it — as well as delve into Malcolm’s intense research, writing, and project selection process. ● Upholders -- Motivated by both outer and inner expectations.
    [Show full text]
  • Athenæum Bookshelf
    ATHENÆUM BOOKSHELF June, 2021 ART, ARCHITECTURE & DESIGN ANTIQUITY IN GOTHAM ELIZABETH MACAULAY-LEWIS “The ancient architecture of New York City.” 283 p., ill. LETTERS TO CAMONDO EDMUND DE WAAL Artist Edmund de Waal turns his eye on Count Moïse de Camondo, a neighbor of the Ephrussi, who de Waal wrote about in his bestseller The Hare with Amber Eyes. Camondo created a magnificent treasure-filled home for his son Nissim. When Nissim was killed in World War I before he could inherit the masterpiece the Count bequeathed the home to France and it became a museum. De Waal’s book is presented as a series of letters to the Count. 182 p., ill. MY LIFE AS AN ARCHITECT IN TOKYO KENGO KUMA The architect of Tokyo’s new Olympic Stadium shows us his other works. 127 p., ill. THE OAK PARK STUDIO OF FRANK LLOYD WRIGHT LISA D. SCHRENK A look at the studio Wright used from 1898 to 1909. 326 p., ill. PHILADELPHIA BUILDS MICHAEL J. LEWIS “Essays on architecture.” If you missed his Zoom presentation for the Athenaeum, you may watch the recording of it from our YouTube channel. 376 p., ill. STANHOPE, CHRONOLOGICALLY CAROLYN GILLS FRAZIER “The work of Stanhope Spencer Johnson, AIA, 1881-1973, Lynchburg, Virginia.” 298 p., ill. BIOGRAPHY MY BROKEN LANGUAGE QUIARA ALEGRÍA HUDES A memoir of growing up in Philadelphia and finding her voice and ability to weave captivating stories by a Puitzer Prize-winning playwright. Quiara Alegría Hudes is also part of the team responsible for the hit Broadway musical In the Heights.
    [Show full text]
  • Talking to Strangers Event # 238 by Malcolm Gladwell Reviewed by Robert Schmidt
    Book # 128 Talking to Strangers Event # 238 by Malcolm Gladwell Reviewed by Robert Schmidt About the Author Malcolm Gladwell is the author of five New York Times bestsellers — The Tipping Point, Blink, Outliers, What the Dog Saw, and David and Goliath. He is also the co-founder of Pushkin Industries, an audio content company that produces the podcasts Revisionist History, which reconsiders things both overlooked and misunderstood, and Broken Record, where he, Rick Rubin, and Bruce Headlam interview musicians across a wide range of genres. Gladwell has been included in the TIME 100 Most Influential People list and touted as one of Foreign Policy’s Top Global Thinkers. About the Book A Best Book of the Year: The Financial Times, Bloomberg, Chicago Tribune, and Detroit Free Press Malcolm Gladwell offers a powerful examination of our interactions with strangers -- and why they often go wrong. How did Fidel Castro fool the CIA for a generation? Why did Neville Chamberlain think he could trust Adolf Hitler? Why are campus sexual assaults on the rise? Do television sitcoms teach us something about the way we relate to each other that isn't true? The Book’s ONE THING It is hard to understand strangers and peer into their mind, but what is required of us is: restraint and humility BLUE SKY LEADERSHIP CONSULTING | 210-219-9934 | [email protected] Need to grow top line revenue? Improve bottom-line profits? Build accountable and trusting teams? Improve cash flow? Develop leadership team members? Blue Sky Leadership Consulting works with small and mid-market organizations to leverage Strategic Thinking and Execution Planning.
    [Show full text]
  • Penguin-Press-Catalogue-AW21.Pdf
    Penguin Press Penguin Autumn/Winter Press 2O21 PENGUIN PRESS Autumn/Winter PRESS CATALOGUE S/S20 Date: 26 FEB 2021 Designer: Tom Prod. Controller: Pub. Date: ISBN: 9780141998176 SPINE WIDTH: 9 MM 2O 21 • Estimated • Confi rmed FORMAT 170mm x 242mm PRINT ••••CMYK PROOFING METHOD • Wet proofs • Digital only • No further proof required Cover photo: Deena Stryker photographs collection / Duke University Rubenstein Library/Gado/Getty Images CatalogueAutumnWinter2021_COV.indd 1 02/03/2021 09:52 Contents Recently Announced 4 Allen Lane 13 Particular Books 49 Pelican 59 Penguin Classics 63 Penguin Modern Classics 75 Penguin Paperbacks 91 Penguin Press, 20 Vauxhall Bridge Road, London, SW1V 2SA Recently Announced What White People Can Do Next From Allyship to Coalition Emma Dabiri An incisive and deeply practical essay from the acclaimed author of Don’t Touch My Hair Stop the Denial Emma Dabiri is a teaching fellow in the African Stop the False Equivalencies Languages, Cultures and Literatures Section of the Interrogate Whiteness African department at SOAS, a Visual Sociology Interrogate Capitalism PhD researcher at Goldsmiths and the author of Denounce the White Saviour Don’t Touch My Hair, which was an Irish Times Abandon Guilt bestseller. She has presented several television and radio programmes including BBC Radio 4’s We need to talk about racial injustice in a new critically-acclaimed documentaries Journeys into way: one that builds on the revolutionary ideas of Afro-futurism and Britain’s Lost Masterpieces. the past and forges new connections. In this robust and nuanced examination of race, class and capitalism, Emma Dabiri draws on years of academic study and lived experience, as well as personal refl ections on a year like no other.
    [Show full text]
  • Mining Text Outliers in Document Directories
    Mining Text Outliers in Document Directories Edouard Fouche´∗, Yu Mengy, Fang Guoy, Honglei Zhuangy, Klemens Bohm¨ ∗, Jiawei Hany ∗Karlsruhe Institute of Technology (KIT), Germany yUniversity of Illinois at Urbana-Champaign, USA ∗ edouard.fouche, klemens.boehm @kit.edu f g y yumeng5, fangguo1, hzhuang3, hanj @illinois.edu f g Abstract—Nowadays, it is common to classify collections of Email 1: Re: Your application. documents into (human-generated, domain-specific) directory Jobs Email 2: Preparing your internship. structures, such as email or document folders. But documents Email 3: Interview scheduling. may be classified wrongly, for a multitude of reasons. Then they are outlying w.r.t. the folder they end up in. Orthogonally to this, Email 4: FYI: Self-supervised methods. and more specifically, two kinds of errors can occur: (O) Out-of- Email 5: Fw: Grant Proposal. Type M distribution: the document does not belong to any existing folder Research in the directory; and (M) Misclassification: the document belongs Email 6: Support Vector Machines. to another folder. It is this specific combination of issues that we address in this article, i.e., we mine text outliers from massive Email 7: Re: The best cat memes. document directories, considering both error types. Fun Email 8: ICDM Registration Confirmation. We propose a new proximity-based algorithm, which we Email 9: Get-together, RSVP. dub kj-Nearest Neighbours (kj-NN). Our algorithm detects text outliers by exploiting semantic similarities and introduces a Conference? Travel? Type O self-supervision mechanism that estimates the relevance of the ??? original labels. Our approach is efficient and robust to large proportions of outliers.
    [Show full text]
  • Outliers-Summarized-2Ai1evw
    15-Page Summary Contents (Click to Jump to a section) QUICK SYNOPSIS KEY TERMS 6 KEY POINTS IN THIS BOOK PART I: OPPORTUNITY PART II: LEGACY QUOTES WORTH SHARING ACKNOWLEDGEMENTS @GETSIGNALS :: WWW.GETSIGNALS.COM Outliers by Malcolm Gladwell ON GOODREADS PUBLISHED 2008 Quick Synopsis: Genius is over-rated. Success is not just about innate ability – it’s a combination of several key factors like opportunity, hard work, and culture. Chance and luck, such as when and where you were born, can influence your opportunities. Bestselling author of The Tipping Point and Blink, Malcolm Gladwell, dives deep into this concept, exploring reasons why some succeed and others don’t. In this summary of his top hit, Outliers, we explore it all. -- (P.S. By downloading this guide, you get 30% off a ticket to INBOUND 2014 where Malcolm will be speaking. Join 5,000+ professionals and register today with the code OUTLIERS30). @GETSIGNALS :: WWW.GETSIGNALS.COM Key Terms CULTURE OF HONOR: A culture in which a man’s reputation is at the center of his livelihood and self-worth. CULTURE LEGACY: A type of social inheritance in which cultural norms are passed down from generation to generation, despite changing location or circumstances. MATTHEW EFFECT: A term Robert Merton coined describing the phenomenon where those who have an edge over others are given opportunities that increase that edge. It’s a term that describes “the rich get richer” phenomenon. MERITOCRACY: A merit-based system of reward whereby rewards are given based on an individual’s level of competence and ability. OUTLIER: Something (or someone) classified differently from a main or related group.
    [Show full text]