
Study on OS Fingerprinting and NAT/Tethering based on DNS Log Analysis Deliang Chang Qianli Zhang Xing Li Tsinghua University CERNET Center CERNET Center [email protected] Tsinghua University Tsinghua University [email protected] [email protected] ABSTRACT What's more, NAT devices are being widely deployed OS fingerprinting and NAT detection are considered impor- in the Internet nowadays. NAT detection is often con- tant in various researches like network troubleshooting, de- sidered important by ISPs since the share of one IP ployment of services. Previous passive approaches usually address among several devices may allow the unautho- require raw network traffic, which is often difficult to de- rized use of the network. The presence of NAT devices ploy. In this paper, a novel method is designed to fingerprint also breaks the end-to-end connectivity and make the the OS and classify the NAT only using DNS log. Features network topology more complex. of the Windows, MacOS/iOs, Android and Linux operating There're several approaches about NAT detection prob- systems can be automatically extracted from labelled DNS lem. If more than one are devices detected from the log. With these features a simple classifier can fingerprint same IP at the same time, it shows a great possibility the OS types of these devices accurately. We apply this al- of presence of NAT or tethering behavior. Many active gorithm on data set from a large scale network. Analysis or passive approaches were proposed to detect the pres- also reveals that nowadays the Windows operating systems ence of NAT or tethering. Tools [12, 1] we mentioned are widely used in tethering or NAT, which is contrary to our before are also able to detect NAT devices. TTL value previous knowledge. in [11] or IPID sequences pattern in [4] was used as feature to count hosts and detect NAT devices. Some Categories and Subject Descriptors methods combined several features to improve accuracy. For example, Beverly in [5] developed a naive Bayesian C.2.3 [Computer-Communication Networks]: Net- classifier and used TTL, don't fragment (DF) and other work OperationsNetwork Monitoring field of TCP/IP header as features. Most previous methods [12, 9, 2, 6, 11, 4] were based General Terms on traffic analysis. Especially they focused on HTTP Algorithm, Measurement traffic, which often incurs in-depth analysis on raw pack- ets. However, raw packet capture is often deemed hard Keywords for a large-scale network. In this paper, we resort to a novel method to finger- OS Fingerprinting, DNS, Supervised Learning print OS types by DNS logs. DNS service can be pro- vided by network administrators or third-party providers. 1. INTRODUCTION Any provider would have DNS requests logged when Fingerprinting operation system (OS) on a remote de- users use its service. Moreover, DNS [10] runs on appli- vice passively is often required by network administra- cation layer and need no changes of a running network. tors to make network policy or deploy network services. The logs are also easier to store than raw packets as OS fingerprinting is also widely used in NAT detection. well. There are some previous works about OS fingerprint- In our algorithm, we can successfully fingerprint OS ing, since it's a research area that attracts high atten- types by only using the DNS logs. The previous work tion for a long time. Some approaches [1, 7, 3] are based on traffic analysis focused on "how" a device active, which means sending probe packets to devices transmit a packet, and in our paper we're interested need to be fingerprinted. Another approaches [12, 8, 6] in "what" do they transmit. Since the algorithm using uses passive ways. They fingerprint the OS by process- DNS log is behavior based, methods that aim to defeat ing information collected from those devices passively. or frustrate the OS fingerprint algorithm by manipu- They used some fields in TCP/IP headers or other kinds lating the packets' format will not succeed anymore. of traffic based on knowledge about OS behavior, and And furthermore, we'll study about NAT based on the some of them use combined features as well. 1 method and result of OS fingerprinting. formula (1). Then for a domain name d and an OS cat- 0 egory os0, we further use value minimum value of C of 2. OVERVIEW OF PROPOSED METHOD os0 and each other OS category (shown in formula(2) ). The basic idea to fingerprint the OSes using DNS log It represents the "worst possible" discriminatory power is, typically, most modern operating systems may use of d for os0. os-specific DNS queries to implement tasks like achiev- At last we score and sort every domain name in the DNS log for each OS automatically by C . In this ing the latest OS patch. Also, some operating sys- d;os0 temes may have some os-specific applications to incur way we can find "the most representative domain names os-specific DNS queries, for example, iMessage in iOS for each kind of operating system". We use these do- systems. Thus we can infer the OS types by this knowl- main names as our features. We call the set of features edge. representing an OS category os0 as "a feature set" of To extract these OS specific features, we borrow the os0. We also make sure that each feature set has the text categorization concepts. same size. First of all, we use the domain name field of the DNS log entry as our training sample, since the domain (ad − bc)2(a + b + c + d) C0 = (1) names may carry information about the OS it comes d;os1;os2 (a + b)(c + d)(a + c)(b + d) from. It represents the activity of the client. In this C = min C0 (2) d;os0 d;os0;osi context, a user's activity can be viewed as an article, osi6=os0;osi2C while each queried domain name can be regarded as a word in such an article. We try to extract feature has domain d in its DNS log? from all those articles and select the best ones by using yes no some feature selection algorithm of text categorization belongs to os1 a c in section 3.1. Then the OS fingerprinting problem is belongs to os2 b d equal to classify a new article of these feature domain names. We build a simple classifier in section 3.2 to Table 1: Contingency Table deal with this problem. It is demonstrated that by only using a small group of domain names we can fingerprint OSes easily and accurately. As for NAT, we consider it 3.2 Classification a multi-class classification problem. We'll discuss it in There have been a lot of classifiers designed for var- section 4.2. At last, the paper is concluded in section ious purposes. Naive bayes, decision tree or k-nearest 5. neighbors algorithm are all capable of fingerprinting the OS after the features have been selected. 3. ALGORITHM Since the whether a domain name appearance in one IP address's DNS log is a simplest yes-no question, ham- 3.1 Feature Extraction ming distance may be appropriate for our classifier. We define the feature set of OS as S , a remote de- Devices running different OSes may have different ac- i OSi cess behavior. Based on this observation, we train DNS vice needed fingerprinting as devj, and the set consists of dev 's domain records as P . We compute the in- log labelled by DHCP fingerprinting and extract the j devj tersection of set S for each OS , then we choose the features for each category of OS. OSi i For a specific operating system, it will resolve some OS with the biggest cardinality of the intersection. In specific domain names to implement some necessary other words, for a device devj, we identify devj's OS as tasks such as fetching latest update patch or clock syn- OS0 when it meets the following equation. chronization. Some os-specific applications with auto OS0 =arg max jSOSi \ Pdevj j; update or sync function may also generate particular OSi (3) DNS queries. Moreover, users using different OSes may OSi 2 fandroid; linux; mac; wing have different preference. The task is thus to find the OS specific DNS queries. 4. EVALUATION AND DISCUSSION As we discussed in section 2, we regard a device's DNS log as an articile. Then we choose Chi-squared 4.1 OS Fingerprinting test as our feature selection methods. Since it has a Though the concept of the classifier is simple, the clear mathematical meaning, is easy to implement, and result in figure 1 clearly shows that our approach is of low complexity and good effect. functioning in OS fingerprinting problem. For a domain name and two OS categories, the con- We use DNS log dumped from the DNS resolver of tingency table is shown in table 1. a, b, c, d are num- Tsinghua University. Many of devices in Tsinghua Uni- ber of IP addresses. And the score is calculated using versity use this server as their default DNS server, and 2 ing as home routers of TP-LINK, NET-GEAR or other brands. Other OS are also identified by DHCP method, which means that it's the operating system running at device on network access point. Differences between NAT and non-NAT devices can be observed clearly in figure 2b. According to the fig- ure, about 90% of devices running a normal OS has less than 2 features from other feature sets apart from Figure 1: Evaluation in Different Amount of Features the first one, which is less than 50% for NAT devices.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages4 Page
-
File Size-