Harvey Mudd College at SemEval-2019 Task 4: The D.X. Beaumont Hyperpartisan News Detector

Evan Amason, Jake Palanker, Mary Clare Shen, Julie Medero
Harvey Mudd College
301 Platt Boulevard, Claremont, CA 91711
Abstract

We use the 600 hand-labelled articles from SemEval Task 4 (Kiesel et al., 2019) to hand-tune a classifier with 3000 features for the Hyperpartisan News Detection task. Our final system uses features based on bag-of-words (BoW), analysis of the article title, language complexity, and simple sentiment analysis in a naive Bayes classifier. We trained our final system on the 600,000 articles labelled by publisher. Our final system has an accuracy of 0.653 on the hand-labeled test set. The most effective features are the Automated Readability Index and the presence of certain words in

of words that characterize hyperpartisan writing. We also considered complexity features such as type-to-token ratio and automated readability index. Based on the performance of these features we attempt to answer the question of whether hyperpartisan writing is more or less complex than non-hyperpartisan writing. A successful classifier could be very useful in today's society. For example, it could be used to create a browser plug-in to check online articles for political bias in real time as the user reads. People on social media could use it to verify the legitimacy of a political article before sharing it with their followers.
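To make the two complexity features named above concrete, the following is a minimal Python sketch of type-to-token ratio and the Automated Readability Index. It uses the standard ARI formula, 4.71 (characters/words) + 0.5 (words/sentences) - 21.43. The tokenizer, the sentence splitter, and the function names are illustrative assumptions, not a description of our system's actual preprocessing.

```python
import re


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters and digits.
    # A real system may tokenize differently (e.g. keep apostrophes).
    return re.findall(r"[A-Za-z0-9]+", text.lower())


def type_token_ratio(text):
    # Number of distinct word types divided by the number of tokens.
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def automated_readability_index(text):
    # Standard ARI: 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43
    words = tokenize(text)
    # Assumed sentence splitter: break on runs of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    chars = sum(len(w) for w in words)  # letters and digits only
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43
```

Each function maps an article's text to a single number, so either value could be appended to a bag-of-words feature vector. How such real-valued features are binned or otherwise adapted for a naive Bayes classifier is left open here.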