PRELIMINARY VERSION: DO NOT CITE The AAAI Digital Library will contain the published version some time after the conference

Bigram and Unigram Based Text Attack via Adaptive Monotonic Heuristic Search

Xinghao Yang1*, Weifeng Liu2, James Bailey3, Dacheng Tao4, Wei Liu1 1School of Computer Science, University of Technology Sydney, Australia, 2School of Information and Control Engineering, China University of Petroleum (East China), China, 3School of Computing and Information Systems, The University of Melbourne, Australia, 4School of Computer Science, Faculty of Engineering, The University of Sydney, Australia [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

Deep neural networks (DNNs) are known to be vulnerable to adversarial images, while their robustness in text classification is rarely studied. Several lines of text attack methods have been proposed in the literature, such as character-level, word-level, and sentence-level attacks. However, it is still a challenge to minimize the number of word distortions necessary to induce misclassification while simultaneously ensuring lexical correctness, syntactic correctness, and semantic similarity. In this paper, we propose the Bigram and Unigram based Monotonic Heuristic Search (BU-MHS) method to examine the vulnerability of deep models. Our method has three major merits. Firstly, we propose to attack text documents not only at the unigram word level but also at the bigram level to avoid producing meaningless outputs. Secondly, we propose a hybrid method that replaces input words with both their synonyms and sememe-consistent candidates, which greatly enriches the potential substitutions compared to using synonyms alone. Lastly, we design a search algorithm, i.e., Monotonic Heuristic Search (MHS), to determine the priority of word replacements, aiming to reduce the modification cost of an adversarial attack. We evaluate the effectiveness of BU-MHS on the IMDB, AG's News, and Yahoo! Answers text datasets by attacking four state-of-the-art DNN models. Experimental results show that our BU-MHS achieves the highest attack success rate while changing the smallest number of words compared with other existing models.

Original Input       Unigram Attack     Bigram Attack
New York             Fresh York         Empire State
Machine Learning     Device Learning    Data Mining
Primary School       Major School       Elementary School

Table 1: Difference between unigram and bigram attacks.

1 Introduction

Deep neural networks (DNNs) have exhibited brittleness towards adversarial attacks in the image domain, where an adversarial image is intentionally modified with only a small number of pixel perturbations (Szegedy et al. 2014; Goodfellow, Shlens, and Szegedy 2015). This phenomenon has raised great interest in the Computer Vision community, while the vulnerability of DNNs in the Natural Language Processing (NLP) field is generally underestimated, especially for security-sensitive NLP tasks such as spam filtering, webpage phishing, and sentiment analysis (Atallah et al. 2001).

Compared to image attacks, there are non-trivial difficulties in crafting text adversarial samples. Firstly, text adversarial samples should be lexically correct, syntactically correct, and semantically similar to the original. This ensures that the adversarial modifications are imperceptible to human readers. Secondly, the words in text sequences are discrete tokens instead of continuous pixel values as in images. Therefore, it is infeasible to directly compute the model gradient with respect to every word. Thirdly, making small perturbations on many pixels may still yield a meaningful image from a human perception perspective, but any small change to a text document, even a single word, can make a sentence meaningless.

Several text attack methods have been proposed, such as character-level, sentence-level, and word-level attacks (Wang et al. 2019). Character-level attacks (e.g., noise → nosie) lead to lexical errors, and sentence-level attacks (i.e., inserting a whole sentence into the original text) often cause significant semantic changes. To avoid these problems, many recent works have focused on word-level attacks that replace an original word with another carefully selected one (Zhang et al. 2019). However, existing methods mostly generate substitution candidates for every individual word (i.e., a unigram), which can easily break commonly used phrases, leading to meaningless outputs (e.g., high school → tall school). In addition, when sorting word replacement orders, most algorithms calculate a word importance score (WIS) and attack words in descending order of the WIS. There are different definitions of the WIS, such as probability weighted word saliency (PWWS) (Ren et al. 2019) and the change of the DNN's prediction before and after deleting a word (Jin et al. 2020). One major drawback of using such a static attack order is word substitution inflexibility: sequentially selecting the top-3 WIS words {top1, top2, top3} may not fool a classifier, but sometimes the combination {top1, top3} can.

In this work, we propose a new word-level attack method named Bigram and Unigram based Monotonic Heuristic Search (BU-MHS), which effectively addresses all the drawbacks above. Unlike traditional unigram word attacks, we consider both unigram and bigram substitutions. In our approach, we generate more natural candidates by replacing a bigram with its synonyms (e.g., high school → secondary school). Table 1 lists several examples that illustrate the superiority of bigram attacks in comparison with unigram attacks. Additionally, we propose to replace input words by considering both their synonyms and sememe-consistent words. By incorporating these complementary candidates, we have better choices for crafting high-quality adversarial texts. More importantly, we propose an effective candidate search method, Monotonic Heuristic Search (MHS), to determine word priorities. The MHS inherits the best-performing candidate combinations from the previous generation and determines the next replacement word with a heuristic search. For instance, if changing the {top1} word cannot mislead a classifier, the static methods used in the literature will select the combination {top1, top2} in the second iteration, but our adaptive MHS will check more combinations, e.g., {top1, top2}, {top1, top3}, etc. Compared with static strategies, the MHS allows us to fool DNN models with much fewer modifications, which is significant in reducing semantic changes and grammatical mistakes. Figure 1 illustrates an example of our algorithm.

Figure 1: A successful attack example showing the workflow of our method. The input "The strategic value of information technology is flawed" is expanded into synonym candidates and sememe candidates, the best candidate is selected for each position, and MHS priority determination picks the replacement, yielding the output "The strategic value of computer science is flawed". In this example, a CNN model is misled from "Sci/Tech" to "Business" by replacing one bigram.

Our main contributions in this work are summarized as below:

• We propose to attack text documents not only at the unigram word level but also at the bigram level to generate natural adversarial samples and avoid semantic errors.

• We propose a hybrid approach to generate word substitutions from both synonym candidates and sememe candidates. Such a complementary combination enables us to craft more meaningful adversarial examples.

• We design MHS to effectively prioritize word replacements, which minimizes the number of word modifications and reduces semantic and syntactic mistakes.

*Corresponding Author.
Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

2 Related Work

Text attack methods can be categorized into character-level, sentence-level, and word-level attacks (Wang, Jin, and He 2019). Character-level attacks generate adversarial texts by deleting, inserting, or swapping characters (Belinkov and Bisk 2018; Ebrahimi et al. 2018). However, these character-level modifications lead to misspelled words, which can be easily detected by spell-checking machines. Sentence-level attacks concatenate an adversarial sentence before or after the original text to confuse deep architecture models (Jia and Liang 2017; Wallace et al. 2019a), but they usually lead to dramatic semantic changes and generate human-incomprehensible sentences (Gan and Ng 2019; Wallace et al. 2019b). Word-level attacks replace original input words with carefully picked words. The core problems are (1) how to select proper candidate words and (2) how to determine the word substitution order.

Incipiently, Papernot et al. (2016) projected words into a 128-dimension embedding space and leveraged the Jacobian matrix to evaluate the input-output interaction. However, a small perturbation in the embedding space may lead to totally irrelevant words, since there is no hard guarantee that words close in the embedding space are semantically similar. Therefore, subsequent studies focused on synonym substitution strategies that search for synonyms in the GloVe embedding space, in existing thesauri (e.g., WordNet and HowNet), or with a BERT Masked Language Model (MLM).

By using GloVe, Alzantot et al. (2018) designed a population-based genetic algorithm (GA) to imitate natural selection. However, the GloVe embedding usually fails to distinguish antonyms from synonyms. For example, the nearest neighbors of expensive in the GloVe space are {pricey, cheaper, costly}, where cheaper is its antonym. Therefore, GloVe-based algorithms have to use a counter-fitting method to post-process the adversary's vectors to ensure the semantic constraint (Mrkšić et al. 2016). Compared with GloVe, utilizing well-organized linguistic thesauri, e.g., the synonym-based WordNet (Miller 1998) and the sememe-based HowNet (Dong, Dong, and Hao 2006), is simple and easy to implement. Ren et al. (2019) sought synonyms using the WordNet synsets and ranked the word replacement order via probability weighted word saliency (PWWS). Zang et al. (2020) manifested that the sememe-based HowNet can provide more substitute words than WordNet and proposed Particle Swarm Optimization (PSO) to determine which group of words should be attacked. In addition, some recent studies utilized a BERT MLM to generate contextual perturbations, such as BERT-Attack (Li et al. 2020) and BERT-based Adversarial Examples (BAE) (Garg and Ramakrishnan 2020). The pre-trained BERT MLM can ensure that the predicted token fits the sentence well, but it is unable to preserve semantic similarity. For example, in the sentence "the food was [MASK]", predicting the [MASK] as good or bad is equally fluent but results in opposite sentiment labels. Notably, all these works focused on unigram attacks.

3 Algorithm

This section details the proposed BU-MHS method. Formally, let X = {X1, X2, ..., XN} denote the input space containing N sentences, and Y = {Y1, Y2, ..., YK} represent the output space of K labels. The DNN classifier F learns a mapping from the text space to the labels, F : X → Y.

3.1 Black-box Text Attack

We design our method in a black-box setting where no network architectures, intermediate parameters, or gradient information are available. The only capability of the black-box adversary is to query the output labels (confidence scores) of the threat model, acting as a standard user.

A well-trained DNN classifier F aims to produce the correct label Ytrue ∈ Y for any input X ∈ X, i.e., F(X) = Ytrue, by maximizing the posterior probability:

    arg max_{Yi ∈ Y} P(Yi | X) = Ytrue    (1)

A rational text attack pursues a human-imperceptible perturbation ΔX that can fool the classifier F when it is added to the original X. The altered input X* = X + ΔX is defined as the text adversarial example. Generally, a successful adversarial example can mislead a well-trained classifier into either an arbitrary label other than the true label,

    arg max_{Yi ∈ Y} P(Yi | X*) ≠ Ytrue    (2)

or a pre-specified label Y*,

    arg max_{Yi ∈ Y} P(Yi | X*) = Y*    (3)

where Y* ≠ Ytrue. The attack strategies defined in Eq. (2) and Eq. (3) are known as untargeted attack and targeted attack, respectively. A valid text perturbation needs to satisfy lexical, grammatical, and semantic constraints. As our attack method makes no character modifications, the lexical constraint is naturally retained. Additionally, we propose a bigram substitution strategy to avoid meaningless outputs, and we introduce an adaptive search algorithm, MHS, to minimize the number of word perturbations while preserving semantic similarity and syntactic coherence.

3.2 Replacement Candidate Selection

HowNet annotates words with their sememes, where a sememe is a minimum unit of semantic meaning in linguistics. For example, the word "apple" has multiple sememes, e.g., "fruit", "computer", etc. Words sharing the same sememe can be interchangeable in crafting adversarial examples.

Candidate set creation. Suppose the input sentence contains n words, i.e., X = {w1, w2, ..., wn}. For each word wi, we first connect it to its next word wi+1 and check whether the bigram (wi, wi+1) has synonyms in the synonym space W. If yes, we collect all the synonyms to create the bigram candidate set Bi and skip searching candidates for wi and wi+1 separately¹. Otherwise, we gather all the candidate words for wi from the synonym space W and the sememe space H and denote them as a subset Si ⊂ W ∪ H. It is worth mentioning that we apply a candidate filter here to make sure that all the candidate words in Si have the same part-of-speech (POS) tags as wi. Replacing words with the same POS tags (e.g., nouns) helps avoid imposing grammatical errors.

If wi is a named entity (NE), we enlarge Si by absorbing more same-type NE words. An NE refers to a predefined real-world object that can be symbolized by a proper noun, such as person names, organizations, and locations (Nouvel, Ehrmann, and Rosset 2016). The candidate NE (denoted as NECAND) must have the same NE type as the original word. It is selected as the most frequently appearing word in the complementary NE set W − WYtrue, where WYtrue contains all the NEs of the Ytrue class. We then update the synonym set as Si ← Si ∪ NECAND.

Considering polysemy, a word may have more than one meaning defined in HowNet. To guarantee valid substitutions, we take only words that have at least one common sememe with the original word wi into its candidate set Si.

Best candidate selection. Given the candidate set Si (or Bi)², every w′i ∈ Si is a potential candidate for the replacement of the word wi. We define the candidate importance score Iw′i for each substitution candidate w′i as the reduction of the prediction probability:

    Iw′i = P(Ytrue | X) − P(Ytrue | X′i), ∀ w′i ∈ Si    (4)

where

    X = w1, w2, ..., wi, ..., wn    (5)
    X′i = w1, w2, ..., w′i, ..., wn    (6)

Then we pick the word w′i that achieves the highest Iw′i as the best substitution word w*i. Formally, the candidate selection function is given as below:

    w*i = R(wi, Si) = arg max_{w′i ∈ Si} Iw′i    (7)

Repeating this procedure on every word one by one solves the first key issue of our method, as summarized in lines 1-11 of Algorithm 1.

¹This means that bigram substitution takes precedence.
²In the rest of the paper, we slightly abuse the notation by using Si to denote the substitution candidates for wi. If wi belongs to a bigram (wi, wi+1), then Si is equivalent to Bi.

3.3 Adaptive Priority Determination

Word priority determination sorts the word replacement order. Given the best substitution word w*i for each original wi, we obtain n adversarial examples {X*1, ..., X*n}, each modified on one word, i.e., X*i = {w1, ..., w*i, ..., wn}. The change of the true-label probability between X and X*i denotes the attack effect that can be achieved by modifying wi to w*i:

    ΔPi = P(Ytrue | X) − P(Ytrue | X*i)    (8)

A straightforward way of determining the word replacement priority is to sort the words by ΔPi in descending order and select the top-k ones. However, we empirically find that replacing words in such a static order often leads to local optima and word over-substitution. This means that simply selecting the top-k words by ΔPi does not necessarily provide the best word combination for misleading DNNs.

In this paper, we propose the Monotonic Heuristic Search (MHS) method, which adaptively determines the word substitution priority. We first create the initial generation G0 as an empty set (line 12 of Algorithm 1). Then we set the maximum number of words that can be modified, i.e., M = min(M, n). This threshold forces us to stop the loop if the input example does not admit an adversarial alteration after M substitutions. The MHS procedure is listed in lines 14-21 of Algorithm 1. For each generation, we first create the population set for the current generation Gm using the function F defined in Algorithm 2. Specifically, F directly returns all the best substitution words {w*1, ..., w*n} as the first generation. Then we iteratively query the classifier F to check whether its prediction is changed by applying the first-generation candidates. If a population member Xadv achieves a successful attack, the optimization completes and returns Xadv. Otherwise, we calculate the probability shift ΔPadv in line 21. If we cannot find a successful attack in the first generation, we run the second iteration with the help of ΔPadv.

In the next generation, we call F again to construct Gm in three steps, as listed in lines 5-8 of Algorithm 2. Firstly, we search for the most effective element of the previous generation Gm−1, i.e., the one attaining the maximal ΔPadv, and denote this best element as Gm−1_best. Then we remove all the candidate words belonging to Gm−1_best from the full candidate set {w*1, ..., w*n}, resulting in s (1 ≤ s ≤ n) remaining elements. Finally, we combine Gm−1_best with each remaining candidate w*i and assign the result to the current population member Gm(i). The greedy search in lines 16-21 of Algorithm 1 is the same as in the first generation but replaces one more word/bigram in every next generation to craft Xadv. This procedure does not stop until it successfully finds an adversarial example or reaches the upper threshold M. The heuristic optimization method enables us to preserve the best population member from the previous generation and adaptively determine which word should be altered in the current generation. Based on the MHS, we achieve a higher successful attack rate by replacing fewer words compared with static baselines. This solves the second key issue.

Algorithm 1: The proposed BU-MHS Attack Algorithm
  Input: Sample text with n words X = (w1, ..., wn)
  Input: Maximum word replacement bound M
  Input: Classifier F
  Output: Adversarial example Xadv
  /* Select candidates for input words */
  1  for i = 1 to n do
  2    Connect wi with its next word as (wi, wi+1);
  3    Collect the bigram candidate set Bi for (wi, wi+1) from the synonym space W;
  4    if Bi ≠ ∅ then
  5      Find the best bigram candidate from Bi;
  6      i += 1;  /* skip attacking wi+1 */
  7    else
  8      Get a synonym-sememe candidate set Si for wi from W ∪ H;
  9      if wi is a NE then
  10       Si ← Si ∪ NECAND;
  11     Find the best unigram candidate from Si;
  /* The MHS search starts */
  12 Create the initial generation as empty, G0 = ∅;
  13 Set the upper bound M = min(M, n);
  14 for m = 1 to M do
  15   Gm = F(Gm−1, {w*1, ..., w*n});
  16   for Candidate ⊂ Gm do
  17     Replace the Candidate words in X to craft Xadv;
  18     if F(X) ≠ F(Xadv) then
  19       break;  /* successful attack */
  20     else
  21       ΔPadv = P(Ytrue|X) − P(Ytrue|Xadv);
  22 return Xadv

Algorithm 2: Function of Generation Creation (F)
  Input: Last generation combinations Gm−1
  Input: The best substitution words {w*1, ..., w*n}
  Output: Current generation combinations Gm
  1 Initialize the current generation Gm = ∅;
  2 if Gm−1 = ∅ then
  3   Gm = {w*1, ..., w*n};  /* 1st generation */
  4 else
  5   Search for the most effective element Gm−1_best from Gm−1 that achieves the highest ΔPadv;
  6   Remove all words of Gm−1_best from {w*1, ..., w*n}, resulting in s (1 ≤ s ≤ n) remaining elements;
  7   for i = 1 to s do
  8     Create the new generation element Gm(i) by combining Gm−1_best with the ith remaining element w*i;
  9 return Gm

4 Experiments

We evaluate the effectiveness of our BU-MHS method on widely used text datasets. We provide code and data via a fully anonymous link³ to ensure reproducibility.

³https://github.com/AdvAttack/TextAttack

4.1 Datasets and victim models

Datasets. We conduct experiments on three publicly available benchmarks. IMDB (Maas et al. 2011) is a binary sentiment classification dataset containing 50,000 movie reviews. AG's News (Zhang, Zhao, and LeCun 2015) is a news classification dataset with 127,600 samples belonging to 4 topic classes. Yahoo! Answers (Zhang, Zhao, and LeCun 2015) is a ten-class topic dataset with 1,400,000 training samples and 60,000 test samples. The average text length (without punctuation) of IMDB (227 words) is much longer than that of AG's News (38 words) and Yahoo! Answers (32 words).

Victim models. We apply our attack algorithm to four popular victim models. Word CNN (Kim 2014) stacks a word embedding layer with 50 embedding dimensions and a convolutional layer with 250 filters, each with kernel size 3. Character-based CNN (Char CNN) (Zhang, Zhao, and LeCun 2015) is composed of a 69-dimensional character embedding layer, 6 convolutional layers, and 3 densely-connected layers. Word LSTM passes the input sequence through a 100-dimension embedding layer, followed by a 128-unit long short-term memory layer and a dropout of 0.5. Bidirectional LSTM (Bi-LSTM) consists of a 128-dimension word embedding layer, a bidirectional layer that wraps 64 LSTM units, a dropout of 0.3, and a fully connected layer for classification. Table 2 lists the classification accuracy on the original legitimate test samples.

Dataset          Model       Test Accuracy
IMDB             Word CNN    87.97%
                 Bi-LSTM     85.71%
AG's News        Word CNN    90.75%
                 Char CNN    89.24%
                 Word LSTM   91.62%
Yahoo! Answers   Word CNN    71.21%
                 Bi-LSTM     71.60%

Table 2: Test accuracy of the four DNN models before attacks.

4.2 Baselines

We compare our method with representative black-box word-level attack algorithms as listed below.

• RAND attack randomly selects a synonym from WordNet and ranks the attack order by our MHS algorithm.

• Word saliency attack (WSA) (Li, Monroe, and Jurafsky 2016) gets replacement words from WordNet and rephrases texts in word saliency (WS) descending order. The word saliency is similar to Eq. (4) but replaces wi with unknown.

• PWWS (Ren et al. 2019) chooses candidate words from WordNet and sorts the word attack order by multiplying the word saliency and the probability variation.

• PSO (Zang et al. 2020) selects word candidates from HowNet and employs Particle Swarm Optimization to find adversarial text. This method treats every sample as a particle whose location in the search space needs to be optimized.

• TextFooler (TEFO) (Jin et al. 2020) obtains synonyms from the GloVe space and defines the WIS by iteratively deleting input words and calculating the changes of the DNN score.

• BERT-ATTACK (BEAT) (Li et al. 2020) takes advantage of a BERT MLM to generate candidates and attacks words in static WIS descending order. The WIS is similar to Eq. (4) but changes wi to [MASK].

4.3 Evaluation Metrics and Experiment Settings

Evaluation Metrics. We use two metrics to evaluate the text attack performance, i.e., the successful attack rate (SAR) and the average number of word substitutions. The SAR is defined as the misclassification rate of the classifier F after it is attacked. To quantify the perturbation cost, we count the number of word substitutions in each sample and compute the average number for each dataset to denote the adversarial attack cost. Intuitively, a rational hacker aims to achieve a high SAR by substituting a small number of words.

Experimental Settings. We train all the DNN models using the ADAM optimizer (Kingma and Ba 2015), where the parameters are set as: learning rate = 0.001, β1 = 0.9, β2 = 0.999, ε = 10⁻⁷. We deploy BU-MHS and the first three baselines on Keras. The PSO, TEFO, and BEAT are tested on the TextAttack framework (Morris et al. 2020), where the Char CNN model and the Yahoo! Answers dataset are currently unavailable. We set the upper bound of the word replacement number to M = 20 and use the recommended parameters for all baselines. The attack performance is assessed on 1000 test samples of each dataset, following the conventional setting (Zang et al. 2020; Jin et al. 2020). For our BU-MHS, the synonym candidates and sememe candidates are picked from WordNet and HowNet, respectively.

4.4 Experimental Results and Analysis

The experimental results on the SAR and the average word replacement number are listed in Table 3 and Table 4, respectively. We demonstrate the three contributions mentioned in the Introduction by asking three research questions:

Q1: Is our adaptive MHS superior to static baselines? To validate this, we design U-MHS, which searches substitution words only in WordNet and attacks text only at the unigram word level (the same as WSA and PWWS) but employs our MHS to determine the word substitution priority. Experimental results in Table 3 and Table 4 show that U-MHS achieves a higher SAR and changes a much smaller number of words compared with the static counterparts (WSA and PWWS). Besides, RAND delivers a higher SAR than WSA on IMDB and AG's News. This also illustrates the merit of our adaptive MHS.

Q2: Is the hybrid of synonyms and sememes beneficial? We present a hybrid version of U-MHS, i.e., HU-MHS, which is the same as U-MHS but integrates HowNet to search synonym-sememe candidates. Table 3 shows that HU-MHS accomplishes the highest SAR in all cases and outperforms U-MHS by a large margin. Intriguingly, Table 4 exhibits that HU-MHS achieves such a high SAR with the fewest word substitutions. This strongly suggests the profit of incorporating HowNet in the candidate selection step.

Q3: What is the advantage of incorporating the bigram attack? The bigram substitution is vitally significant in improving semantic smoothness and generating meaningful sentences. To show this, we list three adversarial examples from IMDB (Table 6), AG's News (Table 7), and Yahoo! Answers (Table 8). We can see from these adversarial examples that our bigram substitution greatly reduces the semantic variations. For example, in Table 7, our method replaces one bigram (Olympic Games → Olympiad) but causes less semantic variation than HU-MHS changing two unigrams. Overall, Table 3 and Table 4 show that HU-MHS, BU-MHS, and U-MHS almost sweep the top-3 results on all datasets, indicating the superiority of our method.

Platform and Efficiency Analysis. We conduct all experiments on an Enterprise Linux Workstation 7.7 with 2.7GHz CPU frequency and 176GB memory. Table 5 lists the time consumption of various methods on the AG's News dataset. Table 5 shows that our BU-MHS is more efficient than the dynamic PSO but costs more time than the static counterparts.

                            M = n/2                 M = n           M = 20
Dataset    Model       BEAT    PSO     TEFO    PWWS    WSA     PWWS    RAND    U-MHS   HU-MHS  BU-MHS
IMDB       Word CNN    91.02%  100%    100%    94.60%  38.85%  90.56%  81.13%  95.58%  100%    100%
           Bi-LSTM     90.15%  100%    100%    99.76%  81.70%  97.09%  94.79%  98.90%  100%    100%
AG's News  Word CNN    86.91%  85.48%  82.75%  85.48%  77.98%  81.98%  82.42%  85.76%  92.77%  91.77%
           Word LSTM   77.52%  79.87%  85.17%  79.87%  74.73%  76.81%  79.76%  81.07%  88.40%  86.21%
           Char CNN    -       -       -       75.48%  68.33%  75.25%  75.26%  80.70%  92.17%  91.49%
Yahoo!     Word CNN    -       -       -       68.39%  66.82%  68.23%  51.96%  70.27%  90.14%  88.58%
Answers    Bi-LSTM     -       -       -       67.76%  66.20%  67.29%  51.64%  67.92%  87.95%  85.60%

Table 3: The successful attack rate (SAR) of various attack algorithms. For each row, the highest SAR is in bold, the second highest is underlined, and the third highest is in italics.
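As a concrete reading aid for Tables 3 and 4, the two metrics of Section 4.3 can be sketched in a few lines of Python. The attack-record structure and the numbers below are hypothetical stand-ins, not the paper's code or data.

```python
# Sketch of the two evaluation metrics from Section 4.3 (hypothetical data
# structures). Each record holds the victim's prediction after the attack,
# the true label, and how many words/bigrams were substituted.

def successful_attack_rate(records):
    """SAR: fraction of attacked samples the classifier now mislabels."""
    fooled = sum(1 for r in records if r["pred_after"] != r["true_label"])
    return fooled / len(records)

def average_substitutions(records):
    """Average number of word substitutions over successful attacks
    (one plausible reading of the paper's perturbation cost)."""
    costs = [r["n_substituted"] for r in records
             if r["pred_after"] != r["true_label"]]
    return sum(costs) / len(costs) if costs else 0.0

records = [
    {"true_label": 1, "pred_after": 0, "n_substituted": 2},   # fooled
    {"true_label": 1, "pred_after": 1, "n_substituted": 20},  # survived
    {"true_label": 0, "pred_after": 2, "n_substituted": 4},   # fooled
    {"true_label": 2, "pred_after": 0, "n_substituted": 6},   # fooled
]
print(successful_attack_rate(records))  # 0.75
print(average_substitutions(records))   # 4.0
```

Whether the substitution count is averaged over all samples or only successful ones is a reporting choice; the sketch uses the latter.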

                            M = n/2                 M = n          M = 20
Dataset    Model       BEAT   PSO    TEFO   PWWS   WSA    PWWS   RAND   U-MHS  HU-MHS  BU-MHS
IMDB       Word CNN    7.5    3.42   8.01   8.625  16.15  5.87   7.31   4.5    2.07    2.11
           Bi-LSTM     7.89   5.22   8.13   5.53   10.55  5.16   5.67   4.238  2.28    2.33
AG's News  Word CNN    5.97   5.02   7.42   6.29   9.51   6.03   5.23   4.815  4.16    4.32
           Word LSTM   6.24   5.82   8.49   8.05   10.12  7.72   6.19   5.97   5.74    5.99
           Char CNN    -      -      -      5.59   8.23   5.44   5.06   4.389  3.41    3.52
Yahoo!     Word CNN    -      -      -      3.34   3.95   3.15   3.65   2.78   1.85    1.89
Answers    Bi-LSTM     -      -      -      3.47   3.83   3.2    3.74   2.98   2.28    2.4

Table 4: The average number of word substitutions of various attack methods. For each row, the smallest number is in bold, the second smallest is underlined, and the third smallest is in italics.
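The low substitution counts of the MHS variants come from the adaptive generation scheme of Algorithm 2. A minimal Python sketch of that generation-creation step, under the simplifying assumptions that a combination is just a set of word positions and that the callback delta_p stands in for the ΔP_adv scores queried from the victim model:

```python
# Sketch of Algorithm 2 (generation creation) under simplifying assumptions:
# a "combination" is a frozenset of word positions to replace, and delta_p
# maps a combination to its probability drop (a stand-in for ΔP_adv).

def create_generation(prev_gen, best_subs, delta_p):
    """Return the next population of substitution combinations."""
    if not prev_gen:                        # 1st generation: single-word edits
        return [frozenset([i]) for i in range(len(best_subs))]
    # keep the best-performing combination of the previous generation ...
    g_best = max(prev_gen, key=delta_p)
    # ... and extend it with every word position not yet used
    remaining = [i for i in range(len(best_subs)) if i not in g_best]
    return [g_best | {i} for i in remaining]

best_subs = ["w0*", "w1*", "w2*", "w3*"]
gen1 = create_generation([], best_subs, None)
# pretend position 2 gave the largest probability drop in generation 1
gen2 = create_generation(gen1, best_subs, lambda c: 1.0 if 2 in c else 0.1)
print(sorted(sorted(c) for c in gen2))  # [[0, 2], [1, 2], [2, 3]]
```

This mirrors the monotonic property described in Section 3.3: each generation inherits the best member of the previous one and grows it by exactly one position.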

            BEAT   PSO    TEFO   PWWS   WSA    RAND   BU-MHS
Word CNN    1.94   6.52   0.14   1.05   0.79   2.97   2.87
Word LSTM   1.05   10.4   0.17   2.62   2.7    5.6    8.45

Table 5: The top and bottom rows show the runtime (hours) of attacking Word CNN and Word LSTM, respectively.

4.5 Transferability

The transferability of adversarial examples indicates whether the adversarial samples generated to mislead a specific model F can also mislead other models F′. To evaluate the transferability, we construct three more CNN models named Word CNN2, Word CNN3, and Word CNN4. Different from the previous Word CNN model, Word CNN2 has one more fully connected layer, Word CNN3 replaces the ReLU nonlinearity with Tanh, and Word CNN4 adds one convolutional layer. We apply the adversarial examples generated on Word CNN to attack these three new models and the LSTM model. Figure 2 shows the results on the original Word CNN and the transferred models. It can be seen from Figure 2 that our method attains the best transfer attack performance, demonstrating the strength of our method in transfer attacks.

Figure 2: Transfer attack on Yahoo! Answers. Lower accuracy indicates higher transferability (the lower the better).

4.6 Adversarial Retraining

Adversarial retraining is an effective way to improve a model's robustness by adding adversarial examples to the training set. In this experiment, we randomly generate and append {500, 1000, 1500, 2000} AG's News adversarial samples to its training set and retrain the Word CNN model. Figure 3 shows the five-run mean accuracy of Word CNN on the clean test set after adversarial training. From Figure 3 we see that our method generates more effective adversarial samples than PWWS for improving the model robustness. We also evaluate whether the retrained model is immune to adversarial attacks by attacking the retrained model. Results in Table 9 show that retrained victim models can defend against the attacks to a certain degree. Additionally, our BU-MHS retains a higher SAR than PWWS after retraining, indicating that BU-MHS is harder to defend against by adversarial retraining.

4.7 Targeted Attack Evaluations

Algorithm 1 can be easily adapted to perform a targeted attack with slight modifications, e.g., changing the success condition in line 18 from the untargeted test of Eq. (2) to the targeted test of Eq. (3).

PWWS (Successful attack. True label score = 45.74%): I think that the movie was really good dear. Subject, acting and Nusrat BAD Fateh BAD ALi Khan's music were marvellous tall. Although the director has succeeded in showing the status of women in rural areas and how they suffer at the hands of male-dominated culture, he has neglected Phoolan BAD's character a bit piece and has focussed more on the violence faced by her.

HU-MHS (Successful attack. True label score = 2.69%): I think that the movie was really presumably good. Subject, acting and Nusrat Fateh ALi Khan's music were marvellous. Although the director has succeeded in showing the status of women in rural rustic areas and how they suffer at the hands of male-dominated culture, he has neglected Phoolan's character a bit and has focussed more on the violence faced by her.

BU-MHS (Successful attack. True label score = 41.69%): I think that the movie was really good. Subject, acting and Nusrat Fateh ALi Khan's music were marvellous. Although the director has succeeded in showing the status of women in rural areas country and how they suffer at the hands of male-dominated culture, he has neglected Phoolan's character a bit and has focussed more on the violence faced by her.

Table 6: Adversarial examples on IMDB (attacking Word CNN). Green texts are original words, while red ones are substitutions.
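The transfer evaluation of Section 4.5 can be sketched as follows. This is a minimal illustration only: it assumes each victim model exposes a prediction function, and the toy models and data below are hypothetical stand-ins, not the paper's Word CNN2/3/4, LSTM, or actual adversarial examples.

```python
# Transfer-attack evaluation sketch: adversarial examples crafted against a
# source model are fed to other models; lower accuracy on them indicates
# higher transferability (the lower the better, as in Figure 2).

def transfer_accuracy(model, adv_examples):
    """Accuracy of `model` on (adversarial_text, true_label) pairs."""
    correct = sum(1 for text, label in adv_examples if model(text) == label)
    return correct / len(adv_examples)

# Toy stand-ins for victim models (a real study would load the trained
# Word CNN variants and the LSTM and call their predict functions).
def toy_model_a(text):
    return 0                  # always predicts class 0

def toy_model_b(text):
    return len(text) % 2      # predicts by text-length parity

adv_set = [("fresh york is big", 1), ("device learning", 0), ("major school", 1)]
scores = {name: transfer_accuracy(m, adv_set)
          for name, m in [("model_a", toy_model_a), ("model_b", toy_model_b)]}
```

A model on which `transfer_accuracy` is low is one the transferred adversarial examples fool often.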

PWWS (Successful attack. True label score = 37.17%): Afghan women make arrive brief Olympic debut introduction. Afghan women made a short-lived debut in the Olympic Games on Wednesday as 18-year-old judo wildcard Friba Razayee was defeated after 45 seconds of her first match peer in the under-70kg middleweight.

HU-MHS (Successful attack. True label score = 41.13%): Afghan women make brief Olympic debut introduction. Afghan women made a short-lived debut in the Olympic Games on Wednesday as 18-year-old judo wildcard Friba Razayee was defeated after 45 seconds of her first match supply in the under-70kg middleweight.

BU-MHS (Successful attack. True label score = 40.02%): Afghan women make brief Olympic debut. Afghan women made a short-lived debut in the Olympic Games Olympiad on Wednesday as 18-year-old judo wildcard Friba Razayee was defeated after 45 seconds of her first match in the under-70kg middleweight.

Table 7: Adversarial examples by attacking Word LSTM model on AG’s News dataset.
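The adversarial retraining procedure of Section 4.6, appending n adversarial samples to the training set and measuring clean-test accuracy, can be sketched as below. `train` and `evaluate` are hypothetical stand-ins for the paper's Word CNN training and evaluation routines, and the toy implementations used here exist only to make the loop runnable.

```python
# Adversarial retraining sketch: for each augmentation size n, retrain on the
# clean training set plus n adversarial samples and record clean-test accuracy.

def retrain_curve(train, evaluate, clean_train, adv_pool, test_set,
                  sizes=(500, 1000, 1500, 2000)):
    """Return {n: clean-test accuracy after retraining with n adv samples}."""
    results = {}
    for n in sizes:
        augmented = clean_train + adv_pool[:n]   # join adversarial examples
        model = train(augmented)                 # retrain from scratch
        results[n] = evaluate(model, test_set)
    return results

# Toy stand-ins: the "model" is just the training-set size, and "accuracy"
# grows with it, capped at 1.0 (purely illustrative, not real training).
toy_train = len
toy_eval = lambda model, test: min(1.0, model / 10000)

curve = retrain_curve(toy_train, toy_eval,
                      clean_train=list(range(8000)),
                      adv_pool=list(range(4000)),
                      test_set=None)
```

In the paper's setting, plotting `curve` for each attack method's adversarial pool yields curves like those in Figure 3, where a higher curve means the method's samples improved robustness more.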

PWWS (Failure. True label score = 75.85%): How do I become go a baseball player actor?

HU-MHS (Successful attack. True label score = 31.79%): How do I become a baseball cubicle player?

BU-MHS (Successful attack. True label score = 4.95%): How do I become a baseball player ballplayer?

Table 8: Adversarial examples by attacking Bi-LSTM model on Yahoo! Answers dataset.

Figure 3: Adversarial retraining results. The higher the accuracy, the more robust the model after retraining.

            Before retraining       After retraining
Method      SAR       # words       SAR       # words
PWWS        81.98%    6.03          73.42%    8.165
BU-MHS      91.77%    4.32          77.36%    7.66

Table 9: Attack the retrained Word CNN on AG's News. "# words" denotes the average number of word substitutions.

            SAR                     # words replaced
Model       PWWS      BU-MHS        PWWS      BU-MHS
Word CNN    87.21%    94.67%        4.78      3.69
Char CNN    32.61%    65.07%        11.56     9.08
Word LSTM   77.92%    87.87%        7.09      5.68

Table 10: Targeted attack results on AG's News dataset.
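The targeted-attack adaptation discussed in Section 4.7 amounts to changing only the attack's success condition. A minimal sketch, with a hypothetical toy classifier standing in for the victim model F (the real condition lives in line 18 of Algorithm 1, not shown here):

```python
# Untargeted success: the prediction changes, F(X_adv) != F(X).
# Targeted success: the prediction equals the chosen label, F(X_adv) == y_target.

def attack_succeeded(f, x_orig, x_adv, y_target=None):
    """Success test for untargeted (y_target=None) or targeted attacks."""
    if y_target is None:
        return f(x_adv) != f(x_orig)   # any label flip counts
    return f(x_adv) == y_target        # must hit the chosen label

# Toy 4-class classifier (AG's News has 4 labels): class = word count mod 4.
f = lambda text: len(text.split()) % 4

x_orig = "afghan women make brief olympic debut"   # 6 words -> class 2
x_adv = "afghan women make brief debut"            # 5 words -> class 1
```

Under this toy classifier the perturbation above succeeds as an untargeted attack and as a targeted attack on label 1, but fails as a targeted attack on label 0, illustrating why targeted attacks are the stricter criterion.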

F(X) ≠ F(X_adv) to F(X_adv) = Y_target. The targeted attack experiments are conducted on the AG's News dataset, and the four target labels are: 0 (World), 1 (Sports), 2 (Business), and 3 (Sci/Tech). We found similar results on all four labels but report only the results for Y_target = 0 due to the space limitation. The results shown in Table 10 indicate that our BU-MHS attains a much higher SAR and replaces fewer words than PWWS for all victim models. This means our method is powerful for both targeted and untargeted attacks.

5 Conclusions
In this paper, we have proposed a novel BU-MHS algorithm for crafting natural language adversarial samples. BU-MHS exploits unigram and bigram modifications to avoid semantic errors and employs adaptive MHS to reduce the attack cost. The hybrid synonym-sememe approach provides more candidate options. Future research directions include designing defense methods via an n-gram strategy.

6 Acknowledgments
Weifeng Liu was supported in part by the National Natural Science Foundation of China (Grant No. 61671480), in part by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (Grant No. 202000009), and in part by the Major Scientific and Technological Projects of CNPC (Grant No. ZD2019-183-008). Dacheng Tao was supported by Australian Research Council Projects FL-170100117, DP-180103424, and IH-180100002.

References
Alzantot, M.; Sharma, Y.; Elgohary, A.; Ho, B.-J.; Srivastava, M.; and Chang, K.-W. 2018. Generating natural language adversarial examples. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2890–2896.
Atallah, M. J.; McDonough, C. J.; Raskin, V.; and Nirenburg, S. 2001. Natural language processing for information assurance and security: an overview and implementations. In Proceedings of the 2000 Workshop on New Security Paradigms, 51–65.
Belinkov, Y.; and Bisk, Y. 2018. Synthetic and natural noise both break neural machine translation. In International Conference on Learning Representations.
Dong, Z.; Dong, Q.; and Hao, C. 2006. HowNet and the Computation of Meaning.
Ebrahimi, J.; Rao, A.; Lowd, D.; and Dou, D. 2018. HotFlip: White-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 31–36.
Gan, W. C.; and Ng, H. T. 2019. Improving the Robustness of Question Answering Systems to Question Paraphrasing. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 6065–6075.
Garg, S.; and Ramakrishnan, G. 2020. BAE: BERT-based Adversarial Examples for Text Classification. arXiv preprint arXiv:2004.01970.
Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2015. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations.
Jia, R.; and Liang, P. 2017. Adversarial Examples for Evaluating Reading Comprehension Systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2021–2031.
Jin, D.; Jin, Z.; Tianyi Zhou, J.; and Szolovits, P. 2020. Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment. In The Thirty-Fourth AAAI Conference on Artificial Intelligence.
Kim, Y. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 1746–1751.
Kingma, D. P.; and Ba, J. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations.
Li, J.; Monroe, W.; and Jurafsky, D. 2016. Understanding neural networks through representation erasure. arXiv preprint arXiv:1612.08220.
Li, L.; Ma, R.; Guo, Q.; Xue, X.; and Qiu, X. 2020. BERT-ATTACK: Adversarial attack against BERT using BERT. arXiv preprint arXiv:2004.09984.
Maas, A. L.; Daly, R. E.; Pham, P. T.; Huang, D.; Ng, A. Y.; and Potts, C. 2011. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 142–150.
Miller, G. A. 1998. WordNet: An electronic lexical database. MIT Press.
Morris, J. X.; Lifland, E.; Yoo, J. Y.; Grigsby, J.; Jin, D.; and Qi, Y. 2020. TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP.
Mrkšić, N.; Ó Séaghdha, D.; Thomson, B.; Gašić, M.; Rojas-Barahona, L.; Su, P.-H.; Vandyke, D.; Wen, T.-H.; and Young, S. 2016. Counter-fitting word vectors to linguistic constraints. In The 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics, 142–148.
Nouvel, D.; Ehrmann, M.; and Rosset, S. 2016. Named entities for computational linguistics. Wiley Online Library.
Papernot, N.; McDaniel, P.; Swami, A.; and Harang, R. 2016. Crafting adversarial input sequences for recurrent neural networks. In IEEE Military Communications Conference, 49–54.
Ren, S.; Deng, Y.; He, K.; and Che, W. 2019. Generating natural language adversarial examples through probability weighted word saliency. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 1085–1097.
Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; and Fergus, R. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations.
Wallace, E.; Feng, S.; Kandpal, N.; Gardner, M.; and Singh, S. 2019a. Universal trigger sequences for attacking and analyzing NLP. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, 2153–2162.
Wallace, E.; Rodriguez, P.; Feng, S.; Yamada, I.; and Boyd-Graber, J. 2019b. Trick me if you can: Human-in-the-loop generation of adversarial examples for question answering. Transactions of the Association for Computational Linguistics 7: 387–401.
Wang, W.; Wang, L.; Wang, R.; Wang, Z.; and Ye, A. 2019. Towards a Robust Deep Neural Network in Texts: A Survey. arXiv preprint arXiv:1902.07285.
Wang, X.; Jin, H.; and He, K. 2019. Natural language adversarial attacks and defenses in word level. arXiv preprint arXiv:1909.06723.
Zang, Y.; Qi, F.; Yang, C.; Liu, Z.; Zhang, M.; Liu, Q.; and Sun, M. 2020. Word-level Textual Adversarial Attacking as Combinatorial Optimization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 6066–6080.
Zhang, W. E.; Sheng, Q. Z.; Alhazmi, A. A. F.; and Li, C. 2019. Generating textual adversarial examples for deep learning models: A survey. CoRR, abs/1901.06796.
Zhang, X.; Zhao, J.; and LeCun, Y. 2015. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, 649–657.