THE DOMAINTOOLS REPORT, 2016 EDITION
THE DISTRIBUTION OF MALICIOUS DOMAINS

SUMMARY

In our previous reports, we profiled malicious domains by describing patterns in their registration details: top level domain (TLD), free email provider, Whois privacy provider, and hosting location. In this edition, we compared the distributions of malicious domains vs neutral domains across a measure of age (both of the domain and of the name server domain) and a measure of the entropy of the domain name. We also examined malicious domains across registrars to find additional clues as to how and when these domains were registered.

KEY FINDINGS

DOMAIN AGE

Even among young domains, there are far more neutral than malicious domains. However, when we examine bad domains as a class, many more of them are relatively young. Neutral domains, as a class, show less of a skew toward youth.

DOMAIN NAME ENTROPY

Domain names with high entropy (that is, those that are gibberish combinations of letters and numbers) are more likely to be malicious than linguistically coherent domains. While this may not be surprising, it is informative to see the specific data.

NAME SERVER DOMAIN AGE

Most domains have a name server associated with them. The domains of the name servers themselves can act as a statistical signal; the signal shows that more malicious domains have comparatively young name server domains.

DOMAIN REGISTRAR

Some domain registrars stand out for having high percentages of malicious domains registered through them. And, in one particular case, the registrar also has fairly high absolute numbers of malicious domains in addition to a high percentage.

© Copyright DomainTools, 2016. All Rights Reserved.

OVERVIEW

In the DomainTools Report, we mine DomainTools data in order to discover patterns in domain registrations that may help researchers or security analysts learn more about concentrations of malicious activity. In the first two reports, we examined attributes such as top level domain (TLD), Whois privacy providers, and registration behaviors of domain registrants strongly connected to high-volume malicious activity.

We believe that malicious actors behave in a predictable manner, and the more we profile that behavior, the better we can defend against them. Those prior reports found high concentrations of malicious domains in various Japanese and Chinese privacy providers, email providers, and bulk domain registration agents. The data in those reports have helped us paint a broad picture, and the data in this latest report hopes to add to our understanding of cybercriminals.

For this edition of the report, we examined several new attributes, some of which readers may have considered before. They include:

>> Age of the domain (as of Feb 2016)
>> Age of name server domain (as of Feb 2016)
>> Entropy of the domain name composition
>> Registrar of the domain

As in earlier editions of The DomainTools Report, having nearly all of the existing domains’ registration information at our fingertips has allowed us to pull out some interesting patterns. Ultimately, we believe it will be possible to predict the likelihood that a new or previously-unseen domain will be malicious, based on its unique composition of attributes. We will do this via a variety of techniques including machine learning.

NOTE

DomainTools already has a proven algorithm for predicting risk of domains based on how tightly-coupled they are to existing malicious activity. The work described here may be able to complement that algorithm by identifying risky domain profiles even when the domains in question are not closely connected to prior bad behavior.

Like snowflakes or fingerprints, no two domains are exactly alike. At a minimum, each name is unique, but in most cases there are multiple attributes that differ. Some of these differences, individually or in concert with others, may help predict the risk level of the domain.

METHODOLOGY AND CHARTING

THE DATA SET

Readers of earlier editions will recall that our methodology is to look at well-vetted blacklists and at the entire population of active domains. We attempt to find spikes in the relative concentration of known-bad domains versus the overall background levels of badness.

For this report, the data set we examined was approximately 140,000,000 domains extracted from passive DNS data. For the set of malicious domains, we used data from high-quality blacklist feeds from partners of DomainTools. To be clear, while there are well over 300 million domains currently registered worldwide, we opted to analyze the subset that are actually seen in DNS requests as we believe the results of the analysis are more relevant when they focus on domains actually receiving traffic.

CALCULATIONS

In previous editions of the DomainTools Report, we introduced what we call “VCP” Charts (Volume, Concentration, Proportion). In this report, we establish a new calculation, which we call “Signal Strength.” This describes how malicious domains are distributed across a certain linear attribute, such as age, compared to how neutral domains are distributed across the same attribute. It is essentially a measurement of how much the distribution skews towards being an indicator of a malicious domain.

For example: of all blacklisted domains, 4.46% are currently between 12 and 13 months old. Of all neutral domains, 1.52% of them are in this same age range. Therefore, if we compare the two percentages, the rate at which malicious domains fall into that age range is 2.93 times the rate at which neutral domains fall into that same range. We call this a “signal strength” of 2.93.
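The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the ratio as described in the text; the function name `signal_strength` is ours and is not taken from any DomainTools code.

```python
def signal_strength(malicious_share: float, neutral_share: float) -> float:
    """Ratio of the malicious share to the neutral share for one bin.

    Both arguments are the share of their own class that falls into the bin,
    expressed in the same unit (both percentages or both fractions).
    """
    if neutral_share <= 0:
        raise ValueError("neutral_share must be positive")
    return malicious_share / neutral_share


# Figures quoted in the text for domains between 12 and 13 months old.
strength = signal_strength(4.46, 1.52)
print(f"{strength:.2f}")
```

A value above 1 means the bin is over-represented among malicious domains relative to neutral ones; a value below 1 means the opposite.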

Please note that, to improve legibility, the age data we present in this report is grouped by quarter (3 months, 6 months, 9 months, and so forth).

Why do we measure signal strength across each value in the distribution? While our report can simply compare the two distributions as a whole through standard statistical measures, we wanted our research to help inform our risk scoring and reputation scoring algorithms as to which attributes and which values within that attribute indicate maliciousness, and the relative strength of the signals.
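To make the per-value measurement concrete, the sketch below groups ages into 3-month bins and computes the ratio for each bin. The binning convention (ages 0 to 3 in the first bin, 4 to 6 in the second, and so on), the function names, and the toy ages are all assumptions made for illustration; the report does not specify these details.

```python
from collections import Counter


def quarter_bin(age_months: int) -> int:
    """Upper edge, in months, of the 3-month bin containing age_months.

    Assumed convention: ages 0-3 fall in bin 3, ages 4-6 in bin 6, and so on.
    """
    return max(3, -(-age_months // 3) * 3)


def share_by_bin(ages_months):
    """Fraction of one class (malicious or neutral) that falls in each bin."""
    counts = Counter(quarter_bin(a) for a in ages_months)
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()}


def signal_by_bin(malicious_ages, neutral_ages):
    """Per-bin ratio of malicious share to neutral share.

    Bins with no neutral domains are skipped, since the ratio is undefined.
    """
    mal = share_by_bin(malicious_ages)
    neu = share_by_bin(neutral_ages)
    return {b: mal.get(b, 0.0) / neu[b] for b in sorted(neu)}


# Toy ages in months, invented purely for illustration.
malicious_ages = [1, 2, 2, 5, 7, 11, 12, 30]
neutral_ages = [2, 8, 14, 20, 26, 33, 35, 36]
print(signal_by_bin(malicious_ages, neutral_ages))
```

Keeping one ratio per bin, rather than a single summary statistic for the two distributions, is what allows each attribute value to be weighed separately later on.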

DOMAIN AGE

Many security professionals are leery of brand-new domains. Some have even suggested that all new domains should go through a mandatory “waiting period” during which they must prove themselves to be free of harmful activity (malware, phishing, etc.) before they can be released into DNS for general use. These ideas are well-intentioned but they aren’t likely to be adopted in the near future. Thus, it falls to the security community to provide effective defenses against harmful domains of any age.

We examined the rates at which domains of various ages appeared on blacklists. The results (depicted in the first chart in this section) tend to support at least some level of “age discrimination” against domains. However, compared to all existing and active domains, these are still comparatively small numbers. It makes the most sense from an analytical standpoint to look at the age distribution of all domains in a classification, the classes being “malicious” and “neutral.”

Given the huge population of domains over the history of the Internet, and because a lot of malicious domains are ephemeral (see note), it is logical that neutral domains skew older than malicious ones. Of all malicious domains reported by our blacklist feeds, over 75% are under 13 months old. However, it is still worth noting that as a percentage of all youthful domains, malicious ones are a relatively low population.

NOTE

Malicious domains don’t tend to stay around as long as neutral ones. Some of them are taken down by law enforcement, ICANN, research/white-hat sinkholes, etc. Others are used for a brief time and then discarded as they appear on blacklists and become less effective. Blacklisted domains don’t necessarily get taken down per se, but they often are not renewed by their owners and thus drop out of DNS after a year or two.

KEY TAKEAWAYS

As we can see, the distributions show a significantly younger average for malicious domains than for neutral domains. Malicious domains tend to be younger, and they will not remain active for any extended periods of time. For anyone who has studied spam or phishing campaigns, this may not be surprising. Domains used for those activities are often registered, used, and discarded over a very brief period of time, sometimes well under one day.

The last chart in this section provides another way of seeing that maliciousness is heavily skewed towards younger domains. This is especially true at 21 months and younger, as these have signal strength well above the average.

[Figure: Age Distribution of Malicious Domains (by month) and Age Distribution of Neutral Domains (by month). X-axis: age in months; Y-axis: % of distribution, 0% to 25%.]

The two charts above show the age distributions of malicious and neutral domains, respectively. The next chart overlays both distributions. It is important to note that the Y-axis is “contribution to the class,” not absolute numbers; in other words, the taller red bars do not mean there are more bad domains than good domains. The last chart shows the signal strength calculation for maliciousness as a function of domain age, including a line indicating the average and a 95% confidence interval.

[Figure: Age Distribution of Domains - Malicious vs Neutral (by month). X-axis: age in months; Y-axis: % of distribution, 0% to 25%.]

[Figure: Age of Domains - Signal Strength (by month). X-axis: age in months; Y-axis: signal strength, 0.0 to 4.0, with the average marked.]

NAME SERVER DOMAIN AGE

Almost every domain in DNS requires a name server host that is authoritative for the domain. Name servers, in turn, include a domain; for example, in ns1.domaincontrol.com, the name server domain is “domaincontrol.com.” We applied the same analytical framework to the name server domain age as we did with domain age to identify whether or not there was a strong signal of maliciousness.

The chart in this section shows an interesting pattern. It turns out that if a domain is malicious, it is much more likely that it has a young name server domain. Aside from a few outliers, 4 years seems to mark a fairly strong threshold for this signal. In other words, for domains with name server domains older than 4 years, there’s not much of a signal, except for a couple of isolated spikes that represent large volumes associated with specific name server domains that came online during specific, brief intervals. The one at 51 months is particularly high, and merited further investigation.

It turned out that 51 months ago, late 2011, several “sinkhole” name servers were activated for the Conficker worm. Conficker and its variants were widespread from 2008-2011 or so (and have reappeared occasionally since), but the sinkholes were a key part of fighting back against the virus. Because part of Conficker’s “lifecycle” was to generate large numbers of domains, the sinkholes accumulated the high numbers that contribute to the spike seen here. Sinkholes are a method that researchers and white-hat hackers use to neutralize command-and-control infrastructure or to study malware. The sinkhole causes the malware either to connect to non-routable IP addresses (effectively halting it) or to connect to servers under the researcher’s control.

KEY TAKEAWAYS

As we compare the mean and median of the age of name server domains linked to badness, we see that they are significantly younger than those corresponding to neutral domains. The signal degradation over time is not as clear as it is for the age of the domain itself. Nonetheless, we can confidently say that younger name server domains correlate to more malicious activity than older ones.
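As an illustration of the ns1.domaincontrol.com example, the sketch below derives a name server domain from a name server host name. It rests on a simplifying assumption that the registered domain is always the last two labels, which does not hold for suffixes such as .co.jp; the function name is hypothetical and this is not the method used for the report.

```python
def name_server_domain(ns_host: str) -> str:
    """Return the last two labels of a name server host name.

    Simplifying assumption: the registered domain is always the last two
    labels. That is wrong for suffixes such as .co.jp or .co.uk; a real
    pipeline would consult the Public Suffix List instead.
    """
    labels = ns_host.rstrip(".").lower().split(".")
    if len(labels) < 2:
        raise ValueError(f"not a fully qualified host name: {ns_host!r}")
    return ".".join(labels[-2:])


print(name_server_domain("ns1.domaincontrol.com"))  # domaincontrol.com
```

Once the name server domain is known, its age can be looked up and binned in the same way as the age of the domain itself.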

[Figure: Age of Name Server Domain - Signal Strength (by month). X-axis: age in months; Y-axis: signal strength, 0 to 35, with the average marked.]

ENTROPY IN DOMAIN NAMES

Most security analysts have likely seen high entropy domain names. “Entropy” in this context refers to linguistically chaotic patterns of characters in domain names. For example, the name “domaintools.com” has very low entropy, because the combinations of letters that make up the name are not random and they appear frequently in English and other languages. A name like “fqwqxqyqkxqfz.com,” on the other hand, has high entropy. A human can spot a high-entropy domain name at a glance, but to analyze millions of domains, we let computers do the work. We created algorithms that calculated the entropy of all active domains, and compared the neutral and malicious pools.

In the vast majority of cases, gibberish domain names have no beneficial purpose. They are typically auto-generated and used for machine-to-machine communication such as botnet command and control channels, spam campaigns, or other malicious activity. The only legitimate use of such constructions that we have ever encountered is domains used in secure encrypted communication products; but this is a very low incidence compared to the numbers of malicious high-entropy domain names.

In our calculations of entropy, the higher the number, the more randomness there is in the domain name. The first chart in this section tells the tale. There is a well-defined curve in the neutral domains (by far the largest pool), where domain names of low entropy form the bulk of the distribution, and the numbers diminish sharply as entropy increases. They level off at low numbers as the names become increasingly random. Malicious domains, taken collectively, have a slightly different profile, showing more in the high-entropy region than the neutral domains.

The second chart (next page) breaks out three different categories of malicious activity: spam, phishing, and malware. Here the curves diverge, and each of the malicious categories has a slightly different distribution. The spam domains show a secondary peak in the region of high entropy. Since spammers use and discard high volumes of domains, they often use domain generation algorithms (DGAs) to efficiently generate large numbers of domain names. DGAs often produce high entropy domain names, which likely explains that secondary peak.
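The report does not publish its entropy algorithm. One possible proxy for “linguistically chaotic” names, offered here only as a sketch, is the average surprisal of adjacent character pairs under a small reference model. The word list, the smoothing choice, and every name in the code below are assumptions made for illustration.

```python
import math
from collections import Counter

# Tiny reference vocabulary, invented for illustration. A real model would be
# trained on a large corpus of known-good names or dictionary words.
REFERENCE_WORDS = [
    "domain", "tools", "internet", "network", "security", "online", "mail",
    "server", "cloud", "data", "report", "register", "name", "host", "web",
    "site", "shop", "news", "service", "market",
]
ALPHABET_SIZE = 37  # a-z, 0-9 and hyphen


def train_bigrams(words):
    """Count character pairs and how often each character starts a pair."""
    pair_counts, first_counts = Counter(), Counter()
    for word in words:
        for a, b in zip(word, word[1:]):
            pair_counts[a + b] += 1
            first_counts[a] += 1
    return pair_counts, first_counts


def name_entropy(domain, pair_counts, first_counts):
    """Mean surprisal, in bits per character pair, of the leftmost label.

    Uses add-one smoothing so unseen pairs get a small nonzero probability.
    """
    label = domain.lower().split(".")[0]
    pairs = [a + b for a, b in zip(label, label[1:])]
    if not pairs:
        return 0.0
    bits = 0.0
    for pair in pairs:
        p = (pair_counts[pair] + 1) / (first_counts[pair[0]] + ALPHABET_SIZE)
        bits -= math.log2(p)
    return bits / len(pairs)


pair_counts, first_counts = train_bigrams(REFERENCE_WORDS)
for name in ("domaintools.com", "fqwqxqyqkxqfz.com"):
    print(name, round(name_entropy(name, pair_counts, first_counts), 2))
```

A pair-based score is used here rather than plain per-character Shannon entropy because the latter ignores letter order, and so cannot tell whether the letter combinations in a name are common in natural language.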

[Figure: Entropy of Domain Names (Malicious vs Neutral). X-axis: entropy; Y-axis: % of distribution, 0% to 9%.]

KEY TAKEAWAYS

As we look at the signal strength, we see a cluster of above-average signals in the higher entropy ranges, corresponding to higher rates of badness. We also notice a difference in the distributions of different types of maliciousness across the entropy spectrum, with spam being much higher than phishing and malware. Phishing domains are most similar in entropy to neutral domains, and this makes sense because phishing domains are intended to imitate legitimate domains.

[Figure: Entropy of Domain Names (by Category). Series: Malware, Phishing, Spam. X-axis: entropy; Y-axis: % of distribution, 0% to 8%.]

[Figure: Entropy of Domain Names (Signal Strength). X-axis: entropy; Y-axis: signal strength, 0 to 7, with the average marked.]

DOMAIN REGISTRARS

The domain registrar, readily visible in a Whois record, can be analyzed in the way we examined attributes in previous reports. We broke out the pool of domains by registrar to see whether specific ones showed “hot spots” of malicious domains. Concentration is more important in this case than absolute number, because many registrars have lots of bad domains, but their overall rates of badness are relatively low. GoDaddy is a good example; as a signal, GoDaddy registration doesn’t tell us much about the domain’s risk level. A lot of bad domains are registered through GoDaddy, but much higher numbers of neutral domains are as well. Similarly, as shown in the original DomainTools report, many malicious domains are registered using a gmail.com email address, but the concentration of badness tied to gmail addresses is well below the average.

For this analysis, we return to the VCP Chart format. The chart (next page) shows how registrars compare in terms of volume (absolute numbers of domains), concentration (rate of malicious versus neutral domains), and proportion of the different malicious activity types for each registrar. Note the ones above both averages, particularly the ones above the average concentration.

In addition to the chart on the next page, the tables below show the top 10 registrars by total malicious domains and by concentration of malicious domains (minimum of 1,000 malicious domains).

A few registrars stand out for having comparatively high percentages of malicious registrations, but at low absolute numbers. There is one notable outlier, with a relatively high volume and high rate of maliciousness. This registrar (GMO) seems to be favored by certain spammers who use the domains, mainly registered with .co.jp email addresses, for large spam campaigns.

KEY TAKEAWAYS

With large-scale access to data, and some automation kung-fu, one could theoretically create a security rule that blocked or quarantined messages sent from domains tied to a specific registrar. In practice, such a rule would almost certainly block some legitimate traffic, causing headaches for users. But as one attribute among many that collectively compose a risk profile, the registrar attribute does provide a discernible signal as there are some registrars with very high concentrations of malicious domains.

TOP 10 REGISTRARS BY TOTAL MALICIOUS DOMAINS

      REGISTRAR                         MALICIOUS DOMAINS   % MALICIOUS
  1   GMO Internet Inc.                           307,046        11.86%
  2   GoDaddy.com, LLC                            170,356         0.87%
  3   PublicDomainRegistry.com                     82,464         2.77%
  4   eNom, Inc.                                   78,442         1.44%
  5   Alpnames Limited                             57,337         6.06%
  6   Xiamen Nawang Technology                     35,924        17.02%
  7   Xin Net Technology Corporation               31,848         3.46%
  8   Chengdu West Dimension                       28,762         1.34%
  9   Name.com, Inc.                               28,432         3.69%
 10   HiChina Zhicheng Technology                  25,992         1.21%

TOP 10 REGISTRARS BY CONCENTRATION OF MALICIOUS DOMAINS

      REGISTRAR                               % MALICIOUS
  1   Nanjing Imperiosus                           32.67%
  2   Xiamen Nawang Technology                     17.02%
  3   DomainContext, Inc.                          12.50%
  4   GMO Internet Inc.                            11.86%
  5   Todaynic.com Inc.                             9.99%
  6   Shanghai Meicheng Technology                  9.25%
  7   TLD Registrar Solutions Ltd.                  8.94%
  8   Chengdu Fly-Digital Technology                8.05%
  9   Alpnames Limited                              6.06%
 10   Xiamen ChinaSource Internet Service           6.01%
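The difference between ranking by volume and ranking by concentration can be reproduced from the rows of the volume table. The sketch below assumes that the percentage column is malicious domains divided by all of a registrar's domains in the data set; the report does not spell out the denominator, and the helper names are hypothetical.

```python
# (registrar, malicious domains, % of the registrar's domains that are malicious)
# Rows copied from the report's top-10-by-volume table.
BY_VOLUME = [
    ("GMO Internet Inc.", 307_046, 11.86),
    ("GoDaddy.com, LLC", 170_356, 0.87),
    ("PublicDomainRegistry.com", 82_464, 2.77),
    ("eNom, Inc.", 78_442, 1.44),
    ("Alpnames Limited", 57_337, 6.06),
    ("Xiamen Nawang Technology", 35_924, 17.02),
    ("Xin Net Technology Corporation", 31_848, 3.46),
    ("Chengdu West Dimension", 28_762, 1.34),
    ("Name.com, Inc.", 28_432, 3.69),
    ("HiChina Zhicheng Technology", 25_992, 1.21),
]

MIN_MALICIOUS = 1_000  # minimum volume stated in the report


def rank_by_concentration(rows, min_malicious=MIN_MALICIOUS):
    """Sort registrars by concentration, dropping low-volume ones."""
    eligible = [r for r in rows if r[1] >= min_malicious]
    return sorted(eligible, key=lambda r: r[2], reverse=True)


def implied_total(malicious, pct):
    """Approximate registrar size implied by a count and a percentage.

    Assumes pct = malicious / total * 100, which is our reading of the table.
    """
    return round(malicious / (pct / 100))


for name, malicious, pct in rank_by_concentration(BY_VOLUME)[:3]:
    print(f"{name}: {pct:.2f}% of roughly {implied_total(malicious, pct):,} domains")
```

Under that reading, GoDaddy's large malicious count sits on a far larger base of registrations, which is why it ranks second by volume yet falls well below the registrars in the concentration table.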

[Figure: Malicious Domains by Registrar (Volume vs Concentration). X-axis: Malicious Domains, 1,000 to 200,000; Y-axis: Concentration, 0.07% to 40.00%. Legend: malware, phishing, spam. The average is marked on each axis.]

ABOUT VCP CHARTS

This chart plots the total number of malicious domains on the X-axis vs the concentration of malicious domains on the Y-axis, using a logarithmic scale on both axes. Each mark is a pie chart showing the relative proportion of types of malicious activity. The total size of the pie charts represents the relative volume of malicious domains. The crossing gray lines show 95% confidence intervals around the averages for each axis.

BUILDING A COMPOSITE PICTURE

The signals here are all relatively subtle within the approximately 140 million total domains we surfaced via passive DNS data. With the possible exception of name entropy, none of the signals by themselves are strong enough to be dispositive. Even with entropy, if one were to block all high-entropy domain names, there could be (rare) false positives in the form of the encrypted communications domains mentioned earlier. Similar caveats would apply to security rules based on any of the other attributes taken one at a time.

However, these signals may prove extremely valuable in combination. An ongoing DomainTools project seeks to use machine learning and other techniques to analyze various composites of attribute signals to develop high-confidence domain risk assessment.

In the meantime, we hope that these analyses are helpful to security professionals, researchers, and anyone else interested in better understanding large-scale patterns in domain registration data with respect to nefarious activities.
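One simple way to see why weak signals can add up is to treat each signal strength as a likelihood ratio and multiply the ratios into prior odds, as in a naive Bayes model. The sketch below is an illustration only and is not the DomainTools project described above; it assumes the attributes are independent, and the prior rate and all signal values other than 2.93 are hypothetical.

```python
import math


def combine_signals(prior_malicious_rate, signal_strengths):
    """Posterior probability of maliciousness under a naive independence assumption.

    Each signal strength is treated as a likelihood ratio, P(value | malicious)
    divided by P(value | neutral), and multiplied into the prior odds.
    """
    if not 0 < prior_malicious_rate < 1:
        raise ValueError("prior_malicious_rate must be between 0 and 1")
    log_odds = math.log(prior_malicious_rate / (1 - prior_malicious_rate))
    for strength in signal_strengths:
        if strength <= 0:
            raise ValueError("signal strengths must be positive")
        log_odds += math.log(strength)
    odds = math.exp(log_odds)
    return odds / (1 + odds)


# 2.93 is the age signal quoted earlier in the report; the 1% prior and the
# other two strengths are hypothetical values chosen for illustration.
print(combine_signals(0.01, [2.93]))
print(combine_signals(0.01, [2.93, 4.0, 1.5]))
```

Real attributes are correlated (for example, domain age and name server domain age), so a product of ratios will tend to overstate confidence; that is one reason a learned model is a more plausible route than this back-of-the-envelope combination.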

ABOUT DOMAINTOOLS

DomainTools is the leader in domain name, DNS and Internet OSINT-based cyber threat intelligence and cybercrime forensics products and data. With over 14 years of domain name, DNS and related ‘cyber fingerprint’ data across the Internet, DomainTools helps companies assess security threat risks, profile attackers, investigate online fraud and crimes, and map cyber activity in order to stop attacks.

Our goal is to stop security threats to your organization before they happen, using domain/DNS data, predictive analysis, and monitoring of trends on the Internet. We collect and retain Open Source Intelligence (OSINT) data from many sources and we index and analyze the data based on various connection algorithms to deliver actionable intelligence, including domain scoring and forensic mapping.

DomainTools uses over 10 billion related DNS data points to build a map of ‘who’s doing what’ on the Internet. Government agencies, Fortune 500 companies and leading security firms use our data as a critical ingredient in their threat investigation and cybercrime forensics work.

For more information about DomainTools' data and products, please visit our website at www.domaintools.com.

WORLD’S LARGEST DNS FORENSICS DATABASE**

>> Over 300 million known domains in DNS
>> 10 Billion+ current and historical Whois records
>> 4.5 Billion+ IP address change events
>> 1.8 Billion+ Registrar change events
>> 3 billion+ name server change events

** These figures are from Q1 2016, but they are inherently out of date, as we add about 5M records a day.

206.838.9020 www.domaintools.com