Analysis of DNS TXT Record Usage and Consideration of Botnet Communication Detection

Title Analysis of DNS TXT Record Usage and Consideration of Botnet Communication Detection

Author(s) ICHISE, Hikaru; JIN, Yong; IIDA, Katsuyoshi

IEICE Transactions on Communications, E101.B(1), 70-79 Citation https://doi.org/10.1587/transcom.2017ITP0009

Issue Date 2018-01

Doc URL http://hdl.handle.net/2115/71053

Type article

File Information e101-b_1_70.pdf

Instructions for use

Hokkaido University Collection of Scholarly and Academic Papers : HUSCAP IEICE TRANS. COMMUN., VOL.E101–B, NO.1 JANUARY 2018 70

PAPER Special Section on Internet Technologies to Accelerate Smart Society Analysis of DNS TXT Record Usage and Consideration of Botnet Communication Detection∗

Hikaru ICHISE†a), Nonmember, Yong JIN††b), Member, and Katsuyoshi IIDA†††c), Senior Member

SUMMARY There have been several recent reports that botnet communication between bot-infected computers and Command and Control servers (C&C servers) using the Domain Name System (DNS) protocol has been used by many cyber attackers. In particular, botnet communication based on the DNS TXT record type has been observed in several kinds of botnet attack. Unfortunately, the DNS TXT record type has many forms of legitimate usage, such as hostname description. In this paper, in order to detect and block out botnet communication based on the DNS TXT record type, we first differentiate between legitimate and suspicious usages of the DNS TXT record type and then analyze real DNS TXT query data obtained from our campus network. We divide DNS queries sent out from an organization into three types — via-resolver, and indirect and direct outbound queries — and analyze the DNS TXT query data separately. We use a 99-day dataset for via-resolver DNS TXT queries and an 87-day dataset for indirect and direct outbound DNS TXT queries. The results of our analysis show that Fig. 1 Typical workflow in a botnet attack. about 30%, 8% and 19% of DNS TXT queries in via-resolver, indirect and direct outbound queries, respectively, could be identified as suspicious DNS traffic. Based on our analysis, we also consider a comprehensive the bot program sends probes to its corresponding Command botnet detection system and have designed a prototype system. and Control (C&C) server to identify its existence as well key words: botnet communication, DNS TXT record, via-resolver DNS query, direct outbound DNS query, and indirect outbound DNS query as to update its status. After it is finished collecting a number of bot-infected computers, the C&C server can instruct 1. Introduction them to perform several kinds of cyber attack. Here, we refer to the communication between a bot-infected computer Botnet, a malicious logical network of cyber attackers, has and the C&C server, which is the most important informa- become a significant security threat in cyberspace [4], [5]. tion transmission in a botnet-based cyber attack, as botnet There are many well known botnets, such as Conficker, communication. With regard to the above workflow, in this Storm, TDL4, and Zeus, each of which once infected mil- research we target the botnet communication as a means to lions of computers [6]. Once a computer is infected by a bot analyze and detect botnet-based cyber attacks. program, which is a kind of malware and also a core program To date, Internet Relay Chat (IRC), Hypertext Transfer of a botnet, it can attempt several kinds of cyber attack such Protocol (HTTP) and Peer-to-Peer (P2P) protocols have been as Advanced Persistent Threat (APT), Distributed Denial of used in botnet communication [8]. In addition, many recent Service (DDoS), spreading spam mails, and phishing [6], [7]. reports indicate that the Domain Name System (DNS) proto- Fig. 1 shows a typical workflow of a botnet-based cyber at- col [9], [10] is also used in botnet communication [11]–[13]. tack. First, a computer within an organization somehow gets Basically, DNS protocol is mainly used for name resolution, infected by a bot program such as through web browsing, which is translating the hostname to an IP address in the spam mail, or clicking a phishing site by mistake. After that, Internet, but the increasing popularity of Internet services has led to some minor records, such as DNS TXT, which Manuscript received February 20, 2017. is used in many Internet services. In [14], Xu et al. empir- Manuscript revised May 22, 2017. ically show that cyber attackers can effectively hide botnet Manuscript publicized July 5, 2017. communication by using a DNS-based stealthy messaging †The author is with Technical Department, Tokyo Institute of Technology, Tokyo, 152-8550 Japan. system that uses hash functions to encode the contents. In ††The author is with Global Scientific Information and Comput- [15], Anagnostopoulos et al. show how mobile botnets use ing Center, Tokyo Institute of Technology, Tokyo, 152-8550 Japan. DNS protocol in botnet communication and also evaluate †††The author is with Information Initiative Center, Hokkaido the magnitude of DNS-based amplification attacks. Conse- University, Sapporo-shi, 060-0811 Japan. quently, DNS packets, which are currently considered to be ∗ Preliminary versions of this paper have been presented in [1]– secure network traffic, have also become a target of mon- [3]. itored communication since network administrators cannot a) E-mail: [email protected] b) E-mail: [email protected] simply block all DNS traffic. Thus, an effective detection c) E-mail: [email protected] solution for botnet communication is needed. DOI: 10.1587/transcom.2017ITP0009 In this research, as a preliminary step to develop a Copyright © 2018 The Institute of Electronics, Information and Communication Engineers ICHISE et al.: ANALYSIS OF DNS TXT RECORD USAGE AND CONSIDERATION OF BOTNET COMMUNICATION DETECTION 71 method for detecting DNS-based botnet communication, we bot-infected computer performs botnet communication us- analyze real DNS traffic from our campus network. There ing DNS TXT record with them directly. In [21], the authors are three possible ways to use DNS traffic in botnet commu- introduced several utility software applications for creating nication: through via-resolver, indirect outbound and direct and detecting DNS tunnels. In [22], the authors analyzed an outbound communication. The via-resolver type completely archived malicious dataset covering one year and a 30-day uses DNS full resolvers in botnet communication which is a real DNS traffic obtained from a DNS full resolver, and they typical usage in normal name resolution. The indirect out- confirmed that a high ratio of DNS TXT record transactions bound type partially uses DNS full resolvers only to obtain might increase the risk of infection by botnet. the IP addresses of C&C servers. After that, the bot-infected As we can see from the existing research, although the computers communicate with the C&C servers directly us- usage of DNS protocol and DNS TXT record in botnet com- ing DNS protocol. The direct outbound type does not use munication has not gone unexamined, no clear conclusions DNS full resolvers at all, but communicates with the C&C regarding the official and malicious usages of DNS TXT servers directly from the beginning. All three types have record or the architecture of DNS-based botnet communica- recently been detected in DNS-based botnet communication tion (via-resolver, indirect and direct outbound) have been and reported in the literature [16], [17]. More importantly, provided. Since DNS TXT record is also used for legitimate recent detections also indicate that DNS TXT record, which purposes, such as Sender Policy Framework (SPF) [23] and has flexible usages compare to other DNS resource records, domainkeys [24], which are used for email sender authen- is increasingly being used in botnet communication. tication, the proper usages of DNS TXT record need to be Considering the above, in our analysis we focus on DNS identified in order to detect botnet communication. More im- TXT record and investigate the DNS traffic obtained from the portantly, in addition to the traffic of DNS TXT record, which DNS full resolvers of our campus network over a period of is a medium for botnet communication, the botnet commu- 99 days for via-resolver type botnet communication. For the nication architecture also needs to be considered since new other two types, we investigate all the DNS traffic, excluding DNS resource records can also appear as new media. the via-resolver DNS traffic, obtained at the gateway of the campus network to the Internet for 87 days. Based on our 3. DNS TXT Record Type and Scope of this Paper analysis, we also consider a detection system for the three types of DNS-based botnet communication and show the While there are three types of DNS traffic — via-resolver, basic system architecture. indirect outbound and direct outbound — the objective of this research is to devise a way to detect any DNS-based botnet 2. Related Work communication using DNS TXT records. In this section, we start by examining the DNS TXT resource record and the Hiding information in DNS traffic is not a new technology. three types of DNS traffic. The first implementation, called DNS tunneling, appeared in 1998 [8] and the target information embedded in FQDN and 3.1 DNS TXT Record Type CNAME was used. After that, in 2004, Kaminsky presented his implementation to tunnel arbitrary data over DNS traffic Although the major objective of DNS protocol is to pro- to the security community [18]. A few years later, DNS- vide name resolution between the hostname and IP address, based VPN was created [19]. it also provides supplementary functions using TXT (Text), There are some conventional ways to detect “abnor- MX (eMail eXchange), SRV (Service), and other resource mal” DNS traffic. In 2011, several reports discussed the record types. Of these, the DNS TXT record type is relatively existence of DNS-based botnet communication and detec- flexible to use and provides a field to store some short text tion approaches [12], [16], [17]. So far as we know, [12] descriptions. Conventionally, the original RFC1035 only is the earliest one about DNS-based botnet communication. provided 512 octets for each UDP DNS packet†, but now it It discussed ways to detect DNS abnormal usage attempts becomes possible that each UDP DNS packet can provide and also provided recommendations to mitigate exposure. more than 4000 octets through “Extension mechanisms for In [16], the authors analyzed 14 million DNS transactions of DNS (EDNS0)” [25]. This standard allows us to store a large DNS TXT records and found instances of botnet communi- amount of information in DNS TXT records, such as SPF cation used by “Feederbot”. In Feederbot, the IP addresses records and DomainKeys which are used to deal with spam of C&C servers are embedded in the bot program and the mail protections††. Besides SPF and DomainKeys records, botnet communication is performed directly with the C&C there are other legitimate ways to use the DNS TXT record servers using DNS TXT record without going through DNS type based on application requirements, but we will not de- full resolvers. In [17], another botnet named Morto has been scribe them in detail here. Unfortunately, EDNS0 also pro- reported, where DNS TXT record was also used in botnet vides possibilities for some malicious parties to use the DNS communication. In Morto, a bot-infected computer works in collaboration with Domain Flux [20] to find C&C servers by †DNS also supports TCP, but we focus on the majority part. querying the IP address of the generated domain names using ††SPF and DomainKeys resources records use the DNS TXT DNS full resolvers. After identifying the C&C servers, the record type in DNS protocol. IEICE TRANS. COMMUN., VOL.E101–B, NO.1 JANUARY 2018 72

Fig. 2 An example of via-resolver DNS query.

TXT record type to transport malicious informati on useful for cyber attacks depending on its flexibility. Therefore, we need to establish a method to differentiate the legitimate and malicious DNS traffic, especially that using the DNS TXT Fig. 3 Direct & Indirect outbound DNS queries. record type, which has appeared in botnet communication.

3.2 Via-Resolver DNS Query of which is illustrated as arrow numbered two in Figs. 2 and 3(a), respectively. The second type is where the bot-infected In general, most organizations set up multiple DNS full re- computer never uses the DNS full resolvers, but instead sends solvers to provide name resolution service to their internal DNS queries to the C&C server directly from the start, as computers. In this case, the internal computers send name shown in Fig. 3(b). We refer to this as a direct outbound DNS resolution requests to one of the DNS full resolvers, which query. In this case, the IP address of the C&C server might performs the name resolution on their behalf. We define have been embedded in the bot program at the installation such a name resolution request as a via-resolver DNS query stage, so the bot-infected computer can communicate with it in this research and Fig. 2 shows an example of via-resolver directly without looking for its IP address via the DNS full name resolution. The via-resolver name resolution relies resolvers. completely on DNS full resolvers, so we need to monitor all As well as the DNS TXT record type, there are other DNS traffic on the DNS full resolvers in order to analyze a exceptions where a direct outbound DNS query is a legiti- via-resolver DNS query. Several researchers have reported mate usage, such as updates of anti-virus software, the use the usage of a via-resolver DNS TXT query in botnet com- of DNS black lists to check for spam mail, and the use of munication [8], [22], so the via-resolver DNS TXT query is public DNS resolvers. Some anti-virus software use the one of our monitoring and analysis targets when we design DNS TXT record type to check the update status and some a botnet communication detection system. spam-mail checkers use the DNS TXT record type to check the domain name in DNS black lists. These usages of the 3.3 Indirect and Direct Outbound DNS Queries DNS TXT record type are legitimate, so we need to differentiate them from malicious usages. On the other hand, Even though official DNS full resolvers are available, some several public DNS full resolvers have been launched to pro- internal computers may use their private DNS full resolvers vide an effective name resolution service by using direct or public DNS full resolvers (from the Internet) for purposes outbound DNS queries. The Google public DNS [26] is a such as name resolution and independent configuration of popular public DNS resolver, which is announced with the IP a DNS full resolver. In this research, we define this kind addresses “8.8.8.8” and “8.8.4.4”, and some internal com- of name resolution request without using official DNS full puters in many organizations may use these addresses for resolvers as a direct outbound DNS query. Several studies name resolution. Obviously, the DNS queries to these pub- have also reported the use of direct outbound DNS queries lic DNS resolvers should be categorized as direct outbound in botnet communication [8], [16], [17]. DNS queries since they do not use the DNS full resolvers. Two types of outbound DNS query, which include indi- Therefore, we also need to filter them out from malicious rect outbound query and direct outbound query, are involved direct outbound DNS queries, but we have to include them in botnet communication. One is where the bot-infected into the analysis of DNS TXT usages. computer uses the DNS full resolvers to query the IP address of the C&C server only and then sends DNS queries 4. Analysis of Via-Resolver DNS TXT Query to the C&C server directly, as shown in Fig. 3(a). We refer to this as an indirect outbound DNS query. Note that the As explained in Sect. 3.2, the DNS TXT record type is used difference between via-resolver DNS query and indirect out- for botnet communication in various botnets. To detect bot- bound DNS query is the way of bot communication, each net communication based on DNS TXT record type, legiti- ICHISE et al.: ANALYSIS OF DNS TXT RECORD USAGE AND CONSIDERATION OF BOTNET COMMUNICATION DETECTION 73

Table 1 Usages of DNS TXT record.

Category Basis Usages SPF RFC7208 Formatted regular TXT record DomainKeys RFC6376 Formatted regular TXT record DNSSD RFC6763 Formatted regular TXT record NFSv4 RFC 7530 - Anti-virus e.g., Sophos Base64 encoded long TXT record Spam and DNSBL - - P2P tracker - NTP RFC5905 - Misc Various Various long TXT record Unconﬁrmed Various Base64 encoded long TXT record

Fig. 4 Architecture to obtain via-resolver DNS TXT query data. • SPF (RFC4408) and DomainKeys (RFC4870): For this category, we filtered the responses for DNS TXT query mate usages of DNS TXT record type must be distinguished which included “v=spf” and “domainkey” strings. from suspicious ones. In this section, we describe our anal- • DNS-based Service Discovery (RFC6763): For this ysis of via-resolver DNS TXT query data obtained from the category, we identified the DNS TXT queries which DNS full resolvers of our campus network. included “._dns-sd” in the FQDNs. • NFSv4 (RFC3530): We identified and fil- 4.1 DNS Traffic Capturing and Analytical Methodology tered out the DNS TXT queries which included “_nfsv4idmapdomain” in the FQDNs. Figure 4 depicts a simplified network topology of the DNS • Anti-Virus software: This category includes the update traffic capture from the DNS full resolvers set up in our cam- processes of anti-virus software which use the DNS pus network and the methodology that enabled us to obtain TXT record type. We identified DNS TXT queries that the DNS traffic. As shown in the topology, we only need to included “.sophos.” [27], “immunet” [28], and AVG capture the DNS traffic launched by the DNS full resolvers corporation’s domain [29] in the FQDNs. (steps 2 and 3) and only store DNS TXT query type in the • Spam email check and DNS black lists: This cate- DNS traffic server. With such a network configuration, we gory includes specific domain names for spam email captured and stored the DNS TXT query data for 99 days check and DNS black lists. We identified the DNS (from March 24, 2014 to June 30, 2014). Note that before TXT queries which included “spamcop.net” [30], the analysis we made the source IP addresses anonymous “spamhaus.org” [31], “rbl.maps.vix.com” and “sa- to respect the privacy of individual users. After we pas- accredit.habeas.com” in the FQDNs. sively obtained the DNS TXT query data from the DNS full • P2P tracker: This category represents P2P trackers used resolvers, we analyzed them through a two-step process: by BitTorrent. We identified the DNS TXT queries 1. Differentiate legitimate usages of the DNS TXT record included “bttorent” and “tracker” strings in the FQDNs. type and filter out unconfirmed usages (Sect. 4.2) • NTP (RFC1305): Some corporations use the DNS TXT 2. Investigate the detection ratio calculated from the un- record type to obtain the IP addresses of NTP servers. confirmed usages of DNS TXT record using a conven- We identified the DNS TXT queries which include “ntp tional method (Sect. 4.3) minpool” and “time” in the FQDNs. • Misc.: This category includes miscellaneous applications and campus internal communications. For mis- 4.2 Classify Usages of the DNS TXT Record Type cellaneous applications, we identified the DNS TXT queries which include: “time.asia.apple.com”, “ap- To identify legitimate usages of the DNS TXT record type ple.com” (push notifications for mail deliveries by and filter out unconfirmed usages, we statistically analyzed Apple), “planex.co.jp” (software updates for network the obtained DNS traffic according to all published official devices manufactured by Planex Corporation), “gate- usages of the DNS TXT record type. Table 1 lists some way.com”, and “xmpp.org” [32] in the FQDNs. For official usages of the DNS TXT record type as well as un- internal communication, we identified the DNS TXT confirmed categories. These are limited to some typical queries which included “titech.ac.jp” in the FQDNs. examples we have found, so it is possible that new usages • Unconfirmed: This is a group of usages of the DNS will appear with the introduction of new applications. Con- TXT record type which were not included in any of the sidering these possible usages of the DNS TXT record type above categories. specifically, we performed the following procedures to categorize the obtained DNS TXT query data. The last category, “Unconfirmed”, contains instances IEICE TRANS. COMMUN., VOL.E101–B, NO.1 JANUARY 2018 74

Table 2 Usages and statistics of DNS TXT record. IP address information” can be used for evaluation of our Category # of queries Ratio [%] research. The former method, we call “URL detection” in SPF and domainkey 12,223 0.24 this research, is based on the database of malicious URLs DNS-based service discovery 213,978 4.30 NFSv4 3,596,481 72.14 provided by URL scanners. And the latter method, we call Anti-Virus 597,901 12.00 “IP address detection” in this research, is based on the IP SPAM Check and DNS Blacklist 180,600 3.63 address database provided by antivirus solutions. Both de- P2P Tracker 446 0.01 tection methods can detect suspicious communications from NTP 632 0.01 Misc 380,723 7.63 the list of IP addresses. The experiment of “Virustotal.com” † Unconfirmed 2,293 0.04 was performed at Dec. 12, 2016 . Total 4,985,277 100 Table 3 shows the detection results of “Virustotal.com” by comparing with the “Unconfirmed” usages. From the results, first, we need to point out that the total detection ratio that cannot be added into any other category. Those usages of our analysis (30.3%) is much higher than “Total” usages of the DNS TXT record type have not been announced offi- that do not use our analysis (9.77%). That is, among the cially by specific application vendors nor are they based on 2,334 IP addresses in total, the “Virustotal.com” detected any standard protocols based on DNS. Therefore, these “Un- 228 IP addresses in URL or IP detection; and among the 330 confirmed” usages possibly include suspicious information IP addresses in Unconfirmed usages, 100 IP addresses were exchange, such as DNS-based botnet communication. The matched with the URL or IP detection. Note that “URL or statistical results are shown in Table 2. Note that we only IP detection” means the sum of “URL detection” and “IP ad- counted the DNS TXT queries that received responses since dress detection”. This will give us the detection results with bot-infected computers need to exchange information with the widest coverage of suspicious IP addresses among detec- the C&C servers. Here, we call a query-and-response pair tion methods we used. Details on each detection category as a query having a response. Consequently, in our statis- can be referred to Table 3. Next, note that the cost effective- tical analysis we obtained 2,293 “Unconfirmed” DNS TXT ness of the detection is much higher in our analysis. That query-and-response pairs as suspicious. is, in our analysis, we detected about 43.9% (100/228) of IP addresses which might involve in malicious communication 4.3 Results of Via-Resolver DNS TXT Query Analysis only by investigating about 14.1% (330/2334) of that in the conventional method. Note that our method cannot detect In Sect. 4.2, we investigated official usages, announced le- 128 (= 228 − 100) IP addresses. This is because Virusto- gitimate usages as well as unconfirmed usages of the DNS tal.com is not specialized to detect only bot communications; TXT record type and obtained 2,293 “Unconfirmed” usage there exist suspicious IP addresses not by bot communica- of DNS TXT query from approximately five million query- tions. To detect such IP addresses, we have to use other and-response pairs of DNS TXT record. This means we can methods by collaboration with our proposed method. greatly reduce the number of query-and-response pairs re- In all, in our analysis we extracted 330 unique desti- quired to be investigated from five millions to 2,293 through nation IP addresses of DNS TXT record queries that might extracting only the unconfirmed usage. The next question have been involved in malicious communication during the is how is the detection ratio through such extractions of the 99 days, which means an average of 3.3 IP addresses per unconfirmed usage. In this section, we therefore investigate day. Considering there are four DNS full resolvers set in the detection ratio calculated by the conventional method. our campus network and based on the network traffic condi- The results are described in Table 3. tions of our university, three or four times as many unique Before calculating the detection ratio, we first count the destination IP addresses will be detected as suspicious per number of destination IP addresses in the DNS TXT queries. day (about 13 IP addresses) from an organization with the The number of total DNS TXT queries is of 4,985,277, same scale as that of our university. We consider this to be which corresponds to 2,334 unique destination IP addresses, a reasonable number for handling by human operators. whereas that of unconfirmed TXT queries is of 2,293, which corresponds to 330 unique destination IP addresses, as shown 5. Indirect/Direct Outbound DNS TXT Query Analysis in the column of “# of unique IP addresses” in Table 3. This equivalents that extractions of the unconfirmed usage As described in Sect. 3.3, indirect and direct outbound DNS reduced the number of destination IP addresses to 14.1%. TXT queries are used for botnet communication in vari- However, if the extracted results do not include botnet com- ous botnets. To detect botnet communication using indirect munications, our method is not effective. To investigate the and direct outbound DNS TXT queries, it is important to effectiveness, we use “Virustotal.com” [33], a free third- distinguish between legitimate and suspicious usages of the party security check web site. “Virustotal.com” provides a † brief check of target IP addresses to see if they are involved The results described in [1], [2] are different from this paper. This is because the experiment in [1], [2] was performed at Apr. 18, in downloading suspicious files and hosting URLs to iden- 2015, which is different date with this paper. Even if the same list tify malware, infected files, malicious web sites, etc. Among of IP addresses sent to Virustotal.com, the statistical results will be them, “Searching for URL scan reports” and “Searching for changed if we tested at different date. ICHISE et al.: ANALYSIS OF DNS TXT RECORD USAGE AND CONSIDERATION OF BOTNET COMMUNICATION DETECTION 75

Table 3 “Virustotal.com” check results. DNS TXT queries unique IP addresses IP detection URL detection IP and URL IP or URL Ratio of IP or URL Total 4,985,277 2,334 161 190 123 228 9.77% (228/2334) Unconﬁrmed 2,293 330 55 87 42 100 30.3% (100/330) Unconﬁrmed / Total 0.0460% 14.1% - - - 43.9% -

destination IP addresses, and then check if NS_DB created in step 2 has these addresses. 4. If the destination IP address of a direct outbound DNS TXT query is in NS_DB, go to check the next record; otherwise, log it for further investigation. 5. Based on the check results of step 3, we create two unique destination IP address lists: one is for indirect outbound DNS TXT queries and the other is for direct Fig. 5 System topology for DNS traffic capture. outbound DNS TXT queries. 6. To confirm the results of our analysis, we also checked the IP addresses in the two IP address lists through a DNS TXT record in indirect and direct outbound DNS TXT security check site, “Virustotal.com”. queries. In this section, we describe our analysis of indirect and direct outbound DNS TXT query data obtained from our In general, we believe that any direct outbound DNS campus network. Note that the query data we analyze in this query can be involved in some kind of malicious commu- section do not include the via-resolver DNS TXT query data. nication and it depends on the convenience of the DNS resource record type. Since use of the DNS TXT record type in 5.1 DNS Traffic Capturing and Analytical Methodology botnet communication has been reported, however, we only analyzed DNS TXT record types that were involved in direct We obtained real DNS traffic from our campus network over outbound DNS queries. a period of about three months (87 days) and analyzed all indirect and direct outbound DNS TXT query data. The 5.2 Results of Indirect/Direct DNS TXT Query Analysis system topology we used to obtain the DNS traffic is shown in Fig. 5. There are four DNS full resolvers set up in our The number of unique destination IP addresses captured per campus network and we assume that some users use them day in indirect and direct outbound DNS TXT queries and while others send DNS queries to the outside DNS servers the corresponding results from the “Virustotal.com” check in the Internet directly. Thus, we captured all DNS traffic are shown in Figs. 6 and 7, respectively. As shown in Fig. 6, in the border routers and investigated the possibility of DNS the total number of unique destination IP addresses captured TXT record usage in botnet communication. To respect the in direct outbound DNS TXT queries varies approximately privacy of internal users, we made the IP address of the from 15 to 120, and the average number of unique IP ad- internal computers in the captured DNS packets anonymous dresses per day is 62. On the other hand, as shown in Fig. 7, in a one-to-one manner in order to trace the behavior in the the total number of unique destination IP addresses in indi- corresponding name resolution (steps 1, 4, a and b in Fig. 5) rect outbound DNS TXT queries varies approximately from after they had been made anonymous (except 2 and 3 due to 160 to 305 and the average number of unique IP addresses they belong to the official full DNS resolvers) We captured per day is 235. These numbers may include duplicated IP the DNS traffic in our campus network from Nov. 1, 2014, addresses, though, since we only divided captured DNS traf- to Jan. 26, 2015, and analyzed the DNS TXT query data fic on a daily basis without considering the TTL-based cache through the following steps. in this analysis. Therefore, in actual operation the number 1. Capture all DNS traffic by mirroring from the border of IP addresses that we would need to process is less than routers and filter out valid query-response pairs. Here, 300 per day, which is a reasonable number that will become “valid” means that in a query-response pair the query smaller if we can remove the duplicated IP addresses. is initialized from an internal computer and the corre- Next, we checked the destination IP addresses captured sponding response is returned. from the indirect and direct outbound DNS TXT queries on 2. Filter out NS records from the query-response pairs ob- “Virustotal.com”. As Fig. 6 shows, the hit rate of the IP tained in step 1 and map corresponding glue A records addresses captured in direct outbound DNS TXT queries is that can also be obtained from the same pairs or suc- comparatively stable at about 5 per day (the average hit rate is cessive queries for the glue A record in case of out-of- about 8%). This number includes the IP addresses that saw bailiwick NS record [34]. After that, store them in a hits in URL-based detection or IP-based detection or both. database called NS_DB. This result indicates that for direct outbound DNS queries, 3. Capture all DNS TXT queries sent directly from the we need to investigate in detail about 5 IP addresses per day, internal computer to the outside and obtain their query which seems not to be large for the network administrators. IEICE TRANS. COMMUN., VOL.E101–B, NO.1 JANUARY 2018 76

DNS TXT queries may have been infected by some kind of malware, especially those in direct outbound DNS queries. This is because direct outbound DNS query requires hard coding of IP addresses for servers, which seem to be suspicious compared with indirect outbound one. Therefore, it is appropriate to detect botnet attacks in the early stage (while searching for C&C server and performing botnet communication) by monitoring indirect and direct outbound DNS TXT queries. Note that DNS protocol is continuously used in botnet communication after C&C server has been identified because it is easy to avoid being monitored. In this evaluation, we used “Virustotal.com” for further investigation and the main purpose of that was to confirm the effectiveness Fig. 6 VirusTotal results for direct outbound DNS query. of our analytical method. Thus, in actual operation we can take action based just on the monitoring results. Consid- ering over blocking and misdetection, we believe it can be mitigated by prerequisite leaning for legitimate IP address of name servers. In addition, like other security solutions, our analysis also has a zero-day vulnerability [35], but this is a challenge with respect to all anti-virus and security-related solutions. Therefore, we need to be very careful when creating policies for the destination IP addresses obtained in the indirect and direct outbound DNS TXT queries.

6. Consideration of Botnet Communication Detection

In Sect. 4, we investigated DNS TXT record type to find via-resolver-based botnet communications. In Sect. 5, we examined botnet communication based on either indirect or Fig. 7 VirusTotal results for indirect outbound DNS query. direct outbound DNS TXT queries. In this section, we dis- cuss the possibility of a comprehensive DNS-based botnet detection system that would cover all the three query types On the other hand, the results for the indirect outbound DNS (via-resolver and indirect/direct outbound DNS query) and TXT queries are not so good, as we can see from Fig. 7. The look at the design of such a system. Via-resolver and in- average hit number of IP addresses per day reached about direct/direct outbound DNS queries based bot communica- 53 (the average hit rate is about 22%), which means about tions are different. We do not tend to compare the detection 19% IP addresses in total had hits in “Virustotal.com” check. methods of via-resolver DNS query and indirect/direct DNS We also checked the destination IP address lists to see which query. We tend to categorize legitimate usages of DNS public DNS resolvers were used in the campus network and TXT record type in via-resolver DNS queries and block the found only Google public DNS resolvers were involved; i.e., malicious destination IP addresses detected in indirect/direct 8.8.8.8 and 8.8.4.4. Since public DNS resolvers can also be outbound DNS queries. This system is based on botnet com- used by bot-infected computers to search for the C&C servers munication detection by monitoring DNS TXT queries. The belonging to indirect outbound DNS TXT queries, we have basic idea of the system is as follows. to include these as part of a target. Regular name resolution 1. Via-resolver DNS TXT queries need to be checked to must retrieve the corresponding authoritative NS records and confirm legitimate usages. their IP addresses during its process. However, the destina- 2. All received NS (Name Server) information, including tion IP address of a direct outbound DNS query cannot be its FQDN and glue A record, needed in an organization traced in NS_DB, and in an indirect outbound DNS query must be stored in a database and updated periodically. it is not usual to use the DNS full resolver partially in one 3. The destination IP addresses of all indirect and direct single name resolution process. Finally, in DNS-based bot- outbound DNS queries must be checked if they are net communication, even if an infected computer repeatedly included in the database. sends a direct outbound DNS query to the same IP address (a C&C server), a method with NS_DB can successfully de- Based on the above points, we introduce two databases tect it since there is no valid NS record corresponding to the into the system, as shown in Fig. 8. The first one (TXT_DB) destination IP address in NS_DB. is for storing information regarding legitimate usages of the Our results indicate that it is possible that the destina- DNS TXT record. The second one (NS_DB) is for register- tion IP addresses captured in indirect and direct outbound ing valid NS information. These two databases are used to fil- ICHISE et al.: ANALYSIS OF DNS TXT RECORD USAGE AND CONSIDERATION OF BOTNET COMMUNICATION DETECTION 77

the system checks the usages of the DNS TXT record type in the TXT_DB. If the usage of the DNS TXT record type is listed in the TXT_DB then the DNS packet will be passed. On the other hand, if the usage of the DNS TXT query is not listed on the TXT_DB, information regarding the target DNS TXT query will be logged for further investigation. Next, we consider the other case, when the source IP address does not belong to an official DNS full resolver, which is in the top branch of the flow chart. In this case, the system first checks whether the destination IP address of the DNS query is listed on NS_DB. Since direct outbound DNS queries without using an official DNS full resolver are Fig. 8 Architecture of a DNS-based botnet detection system (covers via- resolver, and indirect and indirect outbound DNS TXT queries). unusual, we must drop the query if it is not listed on NS_DB. If the destination IP address of the DNS query is listed on NS_DB and the query is the DNS TXT record type, the system then checks TXT_DB. If the DNS TXT query is not listed on TXT_DB, we should log a suspicious queries entry similar to that for the DNS full resolver case. Note that for the special destination IP addresses such as official public DNS resolver, we will register them in the NS_DB in advance. After the system has collected “suspicious queries,” the network operators should investigate the IP addresses listed on the suspicious queries with collaboration of other security facilities to confirm the fact. Basically, we consider the collected queries are suspicious due to they use abnormal usages of DNS protocol. However, this botnet detection system also has some shortcomings. First, it can only detect botnet communication using the DNS TXT record type, which means we cannot find botnet communications using the A and/or CNAME record type. Second, the attacker may implement a DNS TXT-based application that is very similar to a “legitimate” usage, and this will make it more difficult to identify suspicious DNS traffic. Third, the system needs collaboration with other published security information and we may not able to detect highly suspicious botnet communication if there is no necessary information. To overcome the first issue, we can extend the botnet detection system to support A and/or CNAME record types. For the second one, we need to find deeper characteristics of “true legitimate applications” that cannot be mimicked by attackers. For the last one, we need to develop an advanced method to find “highly suspicious” botnet communication — e.g., by using multiple third-party security check web sites in combination to ensure the necessary data is always available. Finally, we also need to consider about name resolution Fig. 9 Flow chart of comprehensive detection system. privacy. DNS over Transport Layer Security (TLS) [36] has been defined as a standard and we need to add a new feature to obtain DNS queries under its deployment. These solutions ter unconfirmed DNS TXT queries and check indirect/direct will be part of our future work. outbound DNS TXT queries which can probably be used for detecting botnet communication. 7. Conclusion The detection procedure of the comprehensive detection system works as shown in Fig. 9, which is a flow chart of the Although the use of DNS TXT record type in botnet com- system. When a firewall (denoted by FW in the figure) munication has been widely reported, the literature does not detects a DNS packet, it first checks the source IP address provide a comprehensive botnet detection system to prevent of the packet. If the source IP address belongs to an official such botnet communication. Our goal in this research has DNS full resolver and the record type is DNS TXT record, been to detect three types of botnet communication based on IEICE TRANS. COMMUN., VOL.E101–B, NO.1 JANUARY 2018 78

DNS TXT queries: through via-resolver as well as indirect [11] M. Feily, A. Shahrestani, and S. Ramadass, “A survey of botnet and and direct outbound DNS TXT queries. To deal with the botnet detection,” Proc. IEEE Int’l Conference on Emerging Secu- via-resolver DNS TXT queries, we analyzed 99-day DNS rity Information, Systems and Technologies, pp.268–273, Glyfada, traffic of TXT records obtained from DNS full resolvers of Greek, June 2009. DOI: 10.1109/SECURWARE.2009.48 [12] S. Bromberger, “DNS as a covert channel within pro- our campus network and categorized its usages. To deal with tected networks,” White paper of Department of Energy, indirect and direct outbound DNS TXT queries, we analyzed http://energy.gov/oe/downloads/dns-covert-channel-within-protecte 87-day DNS traffic of TXT records obtained from the border d-networks, Jan. 2011. gateway of our campus network, which did not include the [13] N.M. Hands, B. Yang, and R.A. Hansen, “A study on botnets utilizing via-resolver DNS TXT query data. DNS,” Proc. ACM Conference on Research in Information Technol- ogy (RIIT’15), pp.23–28, Chicago, IL, USA, Sept.-Oct. 2015. DOI: In the first case, we significantly reduced the number of 10.1145/2808062.2808070 IP addresses needed to be investigated in detail (from 2,334 [14] K. Xu, P. Butler, S. Saha, and D. Yao, “DNS for massive- to 330) by extracting “Unconfirmed” usages of DNS TXT scale command and control,” IEEE Trans. Dependable and Se- record. We then confirmed that among the “Unconfirmed” cure Computing, vol.10, no.3, pp.143–153, May/June 2013. DOI: DNS TXT queries about 30.3% were identified as “highly 10.1109/TDSC.2013.10 suspicious” and it had about 43.9 % common detection rate [15] M. Anagnostopoulos, G. Kambourakis, and S. Gritzalis, “New facets of mobile botnet: Architecture and evaluation,” Int’l Journal of In- with a conventional security check site “Virustotal.com.”. formation Security, 19 pages, Dec. 2015. DOI: 10.1007/s10207-015- In the latter case, through a similar analysis, we found that 0310-0 about 19% and 8% of destination IP addresses were identi- [16] C.J. Dietrich, C. Rossow, F.C. Freiling, H. Bos, M. Steen, and fied as “highly suspicious” in indirect and direct outbound N. Pohlmann, “On botnets that use DNS for command and con- DNS TXT queries, respectively. Based on our analysis, we trol,” Proc. IEEE European Conference on Computer Network De- fence (EC2ND’11), pp.9–16, Gothenburg, Sweden, Sept. 2011. DOI: discussed the possibility of a comprehensive botnet detection 10.1109/EC2ND.2011.16 system and proposed a design for such a system. Although [17] C. Mullaney, “Morto worm sets a (DNS) record,” http://www.syma this botnet detection system has shortcomings, we intend to ntec.com/connect/blogs/morto-worm-sets-dns-record, Aug. 2011. find ways to overcome these in our future work. [18] D. Kaminsky, “The black ops of DNS,” Presented in Black Hat USA 2004, Las Vegas, NV, July 2004. http://www.blackhat.com/presenta tions/bh-usa-04/bh-us-04-kaminsky/bh-us-04-kaminsky.ppt References [19] AnalogBit, “tcp-over-dns,” http://analogbit.com/software/tcp-over- dns, last accessed on July 6 2016. [1] H. Ichise, Y. Jin, and K. Iida, “Analysis of via-resolver DNS TXT [20] J. Riden, “How fast-flux service networks work,” http://www.honey queries and detection possibility of botnet communications,” Proc. net.org/node/132, Accessed on May 30 2016. IEEE Pacific Rim Conference on Communications, Computers and [21] G. Farnham, “Detecting DNS tunneling,” White paper of SANS insti- Signal Processing (PACRIM2015), pp.216–221, Aug. 2015. DOI: tute, https://www.sans.org/reading-room/whitepapers/dns/detecting- 10.1109/PACRIM.2015.7334837 dns-tunneling-34152, 32 pages, March 2013. [2] H. Ichise, Y. Jin, and K. Iida, “Analysis of via-resolver DNS [22] A.M. Kara, H. Binsalleeh, M. Mannan, A. Youssef, and M. Debbabi, TXT queries and detection possibility of botnet communications,” “Detection of malicious payload distribution channels in DNS,” Proc. IEICE Commun. Express, vol.5, no.3, pp.74–78, March 2016. DOI: IEEE Int’l Conference on Communications (ICC2014), pp.853–858, 10.1587/comex.2015XBL0186 Sydney, Australia, June 2014. DOI: 10.1109/ICC.2014.6883426 [3] Y. Jin, H. Ichise, and K. Iida, “Design of detecting botnet com- [23] S. Kitterman, “Sender policy framework (SPF) for Authorizing Use munication by monitoring direct outbound DNS queries,” Proc. of Domains in Email, Version 1,” IETF RFC7208, April 2014. IEEE Int’l Confrerence on Cyber Security and Cloud Comput- [24] D. Crocker, T. Hansen, and M. Kucherawy, “DomainKeys identified ing (CSCloud2015), pp.37–41, New York, NY, Nov. 2015. DOI: mail (DKIM) signatures,” IETF RFC6376, Sept. 2011. 10.1109/CSCloud.2015.53 [25] J. Damas and P. Vixie, “Extension mechanisms for DNS (EDNS0),” [4] S. Khattak, N.R. Ramay, K.R. Khan, A.A. Syed, and S.A. Khayam, IETF RFC6891, April 2013. “A taxonomy of botnet behavior, detection, and defense,” IEEE Com- [26] Google, “Introduction to Google public DNS,” https://developers. mun. Surveys & Tuts., vol.12, no.2, pp.898–924, Oct. 2013. DOI: google.com/speed/public-dns/docs/intro, Accessed on May 30 2016. 10.1109/SURV.2013.091213.00134 [27] Sophos Ltd., “Overview of the Sophos live protection architecture in [5] H. Binsalleeh, “Botnets: Analysis, detection, and mitigation,” Net- SESC 9.5+,” https://www.sophos.com/en-us/support/knowledgebase work Security Technologies: Design and Applications, ed. A. Amine, /111334.aspx, Accessed on May 30 2016. O.A. Mohamed, and B. Benatallah, pp.204–223, IGI Global, Her- [28] Cisco, “Immunet 3.0,” http://www.immunet.com/, Accessed on May shey, PA, Nov. 2013. DOI: 10.4018/978-1-4666-4789-3.ch012 30 2016. [6] S. Soltani, S.A.H. Seno, M. Nezhadkamali, and R. Budiarto, “A [29] AVG, “AVG internet security 2015,” http://www.avg.com/us-en/ survey on real world botnets and detection mechanisms,” Int’l Journal internet-security, Accessed on May 30 2016. of Information and Network Security, vol.3, no.2, pp.116–127, April [30] SpamCop, “How can I check if an IP is on the list?” https://www. 2014. spamcop.net/fom-serve/cache/351.html, Accessed on May 30 2016. [7] McAfee, “McAfee labs threats report,” http://www.mcafee.com/us/ [31] Spamhaus project, “Spamhaus,” https://www.spamhaus.org, Ac- resources/reports/rp-quarterly-threats-mar-2016.pdf, March 2016. cessed on May 30 2016. [8] OpenDNS inc., “The role of DNS in botnet command and control,” [32] J. Hildebrand, P.Saint-Andre, and L. Stout, “XEP-0156: Discovering online http://info.opendns.com/rs/opendns/images/OpenDNS_Secu alternative XMPP connection methods,” http://xmpp.org/extensions/ rityWhitepaper-DNSRoleInBotnets.pdf, 2012. xep-0156.html, Accessed on May 30 2016. [9] P. Mockapetris, “Domain names: Concepts and facilities,” IETF [33] Google, “Virustotal,” https://www.virustotal.com/en/, Accessed on RFC1034, Nov. 1987. May 30 2016. [10] P. Mockapetris, “Domain names: Implementation and specification,” [34] W. Hardaker, “Child-to-parent synchronization in DNS,” IETF IETF RFC1035, Nov. 1987. RFC7477, March 2015. ICHISE et al.: ANALYSIS OF DNS TXT RECORD USAGE AND CONSIDERATION OF BOTNET COMMUNICATION DETECTION 79

[35] Symantec, Inc., “What is a zero-day vulnerability?,” http://www. pctools.com/security-news/zero-day-vulnerability/, Accessed on May 30 2016. [36] Z. Hu, L. Zhu, J. Heidemann, A. Mankin, D. Wessels, P. Hoﬀman, “Speciﬁcation for DNS over transport layer security (TLS),” IETF RFC7858, May 2016.

Hikaru Ichise received the B.S. degree in mathematics from Kwansei Gakuin University, Sanda, Japan in 2008. Currently, he is a technical staﬀ in Tokyo Institue of Technology, Tokyo, Japan. His research interest is of detecting botnet communciations using DNS protocol.

Yong Jin received his M.E. degree in elec- tronic and information systems engineering and Ph.D. degree in Industrial Innovation Sciences from Okayama University, Japan in 2009 and 2012, respectively. In April 2012, he joined Na- tional Institute of Information and Communica- tions Technology, Japan, as a researcher. From October 2013, he joined the Global Scientiﬁc Information and Computing Center of Tokyo In- stitute of Technology as an assistant professor. His research interests include network architecture, traﬃc engineering and Internet technology. He is a member of IEICE.

Katsuyoshi Iida received B.E., M.E. and Ph.D. degrees in, respectively, Computer Sci- ence and Systems Engineering from Kyushu In- stitute of Technology (KIT), Iizuka, Japan in 1996, Information Science from Nara Institute of Science and Technology (NAIST), Ikoma, Japan in 1998, and Computer Science and Systems En- gineering from KIT in 2001. Currently, he is an Associate Professor in the Information Initiative Center, Hokkaido University, Sapporo„ Japan. His research interests include network systems engineering such as network architecture, performance evaluation, QoS, and mobile networks. He is a member of the WIDE project and IEEE. He received the 18th TELECOM System Technology Award and the Tokyo Tech Young Researcher’s Award in 2003 and 2010, respectively.