qflow: a fast customer-oriented NetFlow database for accounting and data retention

Hallgrímur H. Gunnarsson

Faculty of Industrial Engineering, Mechanical Engineering and Computer Science
University of Iceland
2014

QFLOW: A FAST CUSTOMER-ORIENTED NETFLOW DATABASE FOR ACCOUNTING AND DATA RETENTION

Hallgrímur H. Gunnarsson

60 ECTS thesis submitted in partial fulfillment of a Magister Scientiarum degree in Computer Science

Advisors
Snorri Agnarsson
Helmut Neukirchen

Faculty Representative
Jón Ingi Einarsson

Faculty of Industrial Engineering, Mechanical Engineering and Computer Science
School of Engineering and Natural Sciences
University of Iceland
Reykjavik, September 2014

qflow: a fast customer-oriented NetFlow database for accounting and data retention

60 ECTS thesis submitted in partial fulfillment of an M.Sc. degree in Computer Science

Copyright © 2014 Hallgrímur H. Gunnarsson
All rights reserved

Faculty of Industrial Engineering, Mechanical Engineering and Computer Science
School of Engineering and Natural Sciences
University of Iceland
Hjarðarhagi 2-6
107 Reykjavík, Iceland

Telephone: 525 4000

Bibliographic information: Hallgrímur H. Gunnarsson, 2014, qflow: a fast customer-oriented NetFlow database for accounting and data retention, M.Sc. thesis, Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland.

Printing: Háskólaprent, Fálkagata 2, 107 Reykjavík
Reykjavik, Iceland, September 2014

Abstract

Internet service providers in Iceland must manage large databases of network flow data in order to charge customers and comply with data retention laws. The databases need to efficiently handle large volumes of data, often billions or trillions of records, and they must support fast queries of traffic volume per customer over time and extraction of raw flow data for given customers.

Popular open-source tools for storing flow data, such as nfdump and flow-tools, are backed by flat binary files. They do not provide any type of indexing or summaries of customer traffic. As a result, flow queries for a given customer need to linearly scan through all the flow records in a given time period.

We present a high-performance customer-oriented flow database that provides fast customer queries and compressed flow storage. The database is backed by indexed flow tablets that allow for fast extraction of customer flows and traffic volume per customer.

Útdráttur

Internet service providers in Iceland need to store large volumes of network measurement data in order to bill for network usage and comply with data retention laws. Database systems for such data must handle very large numbers of records, often billions or trillions, and they must support fast lookups of traffic volume and of raw flow records per customer.

Popular open-source tools, e.g. nfdump and flow-tools, store the records in flat files. They offer neither indexes for fast search nor summaries of per-customer traffic volume. Consequently, every record must be read in order to answer a query.

In this thesis we present a new high-performance, customer-oriented database system for flow data. The system stores the data in a compressed format while still supporting fast per-customer queries. The database is built on a collection of small table shards (tablets) that together form a single whole. Each tablet is accompanied by an index that enables fast queries for a customer's flow records and traffic volume.


Contents

List of Figures

List of Tables

1. Introduction
   1.1. Motivation
   1.2. Requirements
   1.3. Contribution
   1.4. Related work
   1.5. Structure of thesis

2. Flow-based monitoring
   2.1. Network monitoring
   2.2. Flow probes
      2.2.1. Overview
      2.2.2. Flow export
      2.2.3. Packet sampling
   2.3. Cisco NetFlow
      2.3.1. History
      2.3.2. Version 5
      2.3.3. Version 9
      2.3.4. Storage requirements
   2.4. Observation points
      2.4.1. Edge deployment
      2.4.2. Ingress/egress monitoring
      2.4.3. Deployment strategies
      2.4.4. Customer traffic

3. Design and implementation
   3.1. Architecture
   3.2. Collector
      3.2.1. Design
      3.2.2. Flow format
      3.2.3. Configuration
      3.2.4. Backend protocol
   3.3. Database
      3.3.1. Design
      3.3.2. Table queue
      3.3.3. Record format
      3.3.4. Indexer
      3.3.5. Tablets
      3.3.6. Materialized views
   3.4. Filtering
      3.4.1. Language
      3.4.2. Implementation
   3.5. Reports
      3.5.1. Flow extraction
      3.5.2. Flow summary
      3.5.3. Flow filter
      3.5.4. Time-based reports
      3.5.5. Customer reports

4. Evaluation
   4.1. Environment
   4.2. Collector
      4.2.1. Preparation
      4.2.2. Results
   4.3. Indexer
   4.4. Flow storage
   4.5. Flow extraction
      4.5.1. Preparation
      4.5.2. Results
   4.6. Materialized views

5. Conclusions
   5.1. Summary
   5.2. Future work

Bibliography

A. Flow protobuf

B. Collector configuration protobuf

C. Grammar for the filter language

List of Figures

2.1. Flow probe internals
2.2. Relative sampling error
2.3. NetFlow v5 export packet
2.4. Structure of NetFlow v9 export packet
2.5. NetFlow v9 template flowset
2.6. NetFlow v9 data flowset
2.7. NetFlow edge deployment
2.8. Example with both ingress and egress monitoring enabled
2.9. Provider network with both ingress and egress monitoring enabled
3.1. An overview of the qflow system
3.2. Collector overview
3.3. Directory layout of the flow database
3.4. Flow capture pipeline
3.5. Structure of a block
3.6. Directory layout for flow tablets
3.7. Internal layout of a flow tablet
3.8. View file format
3.9. Parse tree for example filter expression
4.1. File size of materialized view
4.2. Update time for materialized view
4.3. Query time for materialized view
4.4. Export time for materialized view

List of Tables

2.1. Format of NetFlow v5 header
2.2. Format of NetFlow v5 record
2.3. Format of NetFlow v9 header
4.1. Collector performance results for NetFlow v5
4.2. Collector performance results for NetFlow v9
4.3. Indexer performance for 10M records
4.4. Indexer performance for 20M records
4.5. Indexer performance for 30M records
4.6. Indexer performance for 40M records
4.7. Storage efficiency of qflow vs. flow-tools
4.8. Flow extraction performance for a single IP
4.9. Flow extraction performance for a network


1. Introduction

1.1. Motivation

Internet service providers (ISPs) in Iceland must store and query large volumes of network traffic monitoring data in order to charge customers and comply with data retention laws. A typical ISP might need to store and process up to 50K flow records per second, which translates into more than 8 GB per hour of monitoring data [9]. Data retention laws require the data to be stored for six months, and for 50K records per second it would take 34 TB to store the resulting 770 billion records.
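To make the arithmetic explicit, the short Python sketch below roughly reproduces these figures; the 44-byte average record size is an assumption back-derived from the 34 TB / 770 billion records estimate, not a measured value.

RECORDS_PER_SECOND = 50_000
BYTES_PER_RECORD = 44              # assumed average on-disk record size
SECONDS_PER_MONTH = 30 * 24 * 3600
RETENTION_MONTHS = 6

records = RECORDS_PER_SECOND * SECONDS_PER_MONTH * RETENTION_MONTHS
terabytes = records * BYTES_PER_RECORD / 1e12

print(f"{records / 1e9:.0f} billion records")   # roughly 780 billion
print(f"{terabytes:.0f} TB of raw flow data")   # roughly 34 TB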

A number of commercial and open source solutions exist for storing and managing network flow data, e.g. Cisco NetFlow collector, pmacct, flow-tools and nfdump. They usually store flow records in either a relational database or raw binary files. In general, relational databases offer flexibility and a powerful query language, but they are known to be slower, especially in terms of insertion rate, and consume more disk space when compared to specialized tools that use raw binary files [6].

Furthermore, scaling relational databases to billions or trillions of records can be a real challenge. For example, the iiBench 1B row insert benchmark for MySQL shows that the insert rate is highly dependent on the table size. As the table grows and the data no longer fits in memory, performance degrades dramatically. At the beginning of the benchmark, MySQL can sustain around 40,000 inserts per second, but after 200M rows have been inserted, the rate has fallen to around 5,000 inserts per second. Close to 1B rows, the rate is down to 876 inserts per second. At 50K records per second, the table would reach 200M rows in about an hour, and 1B rows in less than six hours.

In contrast, tools based on raw binary files can handle a high insertion rate, e.g. nfdump can store over 250K records per second on a dual-core machine [6]. The binary files are flat, without any sort of index, and the insertion rate does not degrade over time. The flow records are written in the order they are received and files are usually rotated regularly, e.g. every 5, 10 or 15 minutes.

Although they support fast insertion, such tools present two problems when they are used for data retention and billing. First, extracting flows for a given customer requires sequentially scanning all the flow records within the given time period. For a typical ISP, this could mean scanning hundreds of gigabytes or terabytes to locate a relatively small number of records.

Secondly, fetching hourly/daily/monthly traffic summaries per customer also re- quires sequentially scanning all the flow records. Typically, ISPs write scripts to precompute summaries and store them in a relational database. After each rotation of a raw binary file, a script will compute the volume per IP and update summaries in a relational database. This can also be problematic to scale when the number of files and IPs grows large.

In this thesis we present a new system that stores flow records in custom sorted flow tables that allow for fast extraction of customer flows. In addition, the system maintains an index for each flow table that can be used to answer queries about traffic volume per customer, and it also maintains higher-level aggregate summaries (hourly, daily, monthly) in an efficient manner.

1.2. Requirements

This section describes the key requirements of the system. The requirements are based on our experience working with Icelandic ISPs on flow accounting and data retention.

1. Data retention fulfillment

The EU data retention directive, adopted into article 42 of the Icelandic Electronic Communications Act no. 81/2003 [1], requires that every Internet service provider retain a log of Internet traffic metadata for the purpose of law enforcement. The log must contain customer IP addresses, all connections that were established, the time of each connection, and the IP address of both connection endpoints. The log must be retained for six months, after which any older records are to be removed. The ISP must be able to deliver customer metadata to law enforcement upon request.

2. Compressed storage

Given the large volume of data and the requirement to store it for six months, the system should store the data in a compressed format.

Typical zlib compression ratios are on the order of 2:1 to 5:1 [13]. Given 50K records per second and 34 TB of data over six months, zlib compression could be expected to reduce the storage requirement to somewhere between 6 and 17 TB.

3. Traffic summaries

The system needs to maintain time-based aggregates of traffic volume per customer IP, e.g. hourly, daily and monthly aggregates. The time periods should be configurable. Updates and queries must be fast.

This requirement is motivated by the needs of both ISPs and their customers. First, for usage-based charging of network traffic, an ISP needs to determine the traffic volume per customer during a given billing period (usually 1 month). This data is then imported into a billing system at the end of the billing period.

Secondly, according to the Icelandic regulation no. 526/2011 on electronic telecommunications billing [18], customers are entitled to a detailed itemization of bills for telecommunications charges. In particular, they are entitled to data usage details broken down by time period with hourly precision. Internet service providers usually provide a web interface where customers can view their traffic volumes broken down by different time periods (monthly, daily, hourly).

4. Flow extraction

When necessary, the system must be able to drill down and extract raw records for a given customer. This is necessary both for data retention fulfillment, and to settle any billing disputes when customers complain about usage-based charges.

The system should respond to extraction queries in a reasonable amount of time. This condition precludes the possibility of sequentially scanning the records to filter out customer records, since it would be prohibitively expensive to scan gigabytes or terabytes for each extraction query.

5. Scalability

Ideally, the system should scale to any volume of data given that enough hardware is available to store and process it.

To support large installations that exceed the capacity of a single machine, the system should be able to run on multiple machines in a distributed fashion.


6. Customer tagging

Internet service providers usually use some sort of an IPAM (IP address management) system to manage and allocate IP blocks within their IP space. At a given point in time, every active IP address should be allocated to a specific customer. IP allocations can change over time, and some IP allocations can be dynamic, e.g. with RADIUS, where users are assigned an IP for a limited period of time from a shared pool of IPs.

To accurately track and account for traffic volume per customer, every flow record should be associated with the appropriate customer according to the IPAM system. Customer allocations in the IPAM system are usually associated with some type of unique identifier that is understood by the billing system. By “tagging” flows with this identifier, the correct customer can be charged when the records are summarized and imported into the billing system.

The system must support live tagging of flows as they arrive. The tag should follow the flow throughout the system and also be part of any aggregate, such that it is possible to separate the traffic volume of two customers that shared the same IP during a given billing period. It must be possible to push updated tags to the system.

7. Destination-sensitive billing

In Iceland, the dominant ISP billing model is to charge customers only for foreign traffic that originates or terminates outside of Iceland. Domestic traffic, exchanged within Iceland, is free. This is because submarine cables are expensive and foreign bandwidth is still scarce relative to domestic bandwidth.

The system must provide features to distinguish and separately account foreign and domestic traffic. It must work both for smaller ISPs that do not have their own foreign links and purchase foreign transit from a larger upstream provider, and for larger ISPs that have their own dedicated foreign links.
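To illustrate requirement 6 (customer tagging) above, a tagger can be thought of as a longest-prefix match from IP prefixes to customer identifiers. The sketch below is purely illustrative — the prefixes and tag names are made up — and it is not the qflow implementation.

import ipaddress

# Hypothetical tag table exported from the IPAM system: prefix -> customer identifier.
TAGS = {
    ipaddress.ip_network("192.0.2.0/25"): "CUST-1001",
    ipaddress.ip_network("192.0.2.0/24"): "CUST-1000",
}

def tag_for_ip(ip_str):
    """Return the tag of the most specific matching prefix, or None for unknown IPs."""
    ip = ipaddress.ip_address(ip_str)
    best = None
    for prefix, tag in TAGS.items():
        if ip in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, tag)
    return best[1] if best else None

print(tag_for_ip("192.0.2.10"))    # CUST-1001 (the more specific /25 wins)
print(tag_for_ip("192.0.2.200"))   # CUST-1000
print(tag_for_ip("198.51.100.1"))  # None (not a customer IP)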

1.3. Contribution

In this thesis, we present a new flexible high-performance NetFlow accounting system. It meets all the key requirements laid out in the previous section and provides several key features that are not present in other flow accounting systems.


Our first contribution is a flow collector that can distribute flows to multiple backends in a configurable way. Unlike other systems (e.g. flow-tools and flowd), our system separates the collection and the capture (storage) of flows. This separation allows us to distribute the processing of flows over multiple machines. Furthermore, the collector is stateless, and a large ISP could run multiple redundant collector replicas for increased reliability and scalability. Finally, the collector can pre-process the flows and apply common business logic, such as filtering and tagging.

Our second contribution is a compressed customer-oriented database for storing flows. The database is designed to sustain high insertion rates and provide fast access to customer flows and traffic summaries without a sequential scan of all the flows. Other systems, e.g. flow-tools and flowd, store flows in flat binary files, and extracting customer flows requires scanning all the flows.

1.4. Related work

nfdump [15], silk [11] and flow-tools [8] are the most popular tools for storing flows. When it comes to data retention and flow accounting, these systems have several shortcomings. First, all of them store flow records in compressed flat binary files. They do not provide any sort of indexing. As a result, the entire set of flows needs to be sequentially scanned (and decompressed) to extract the flows of a given customer. Secondly, they do not provide any facilities for maintaining higher-level summaries of customer traffic. Finally, these tools are designed as monolithic collectors that dump flows to disk. They provide limited options for distributing work across multiple machines.

Column-oriented databases using bitmap indexing have been proposed for indexing flow data [6, 9, 19]. A column-oriented database stores its data vertically in columns rather than rows. The values in a given column are stored contiguously, which often improves compression ratios due to data similarity. Furthermore, a query only needs to read the columns that appear in the query, leading to less I/O when compared to row-oriented databases, especially when dealing with rows with a large number of attributes, such as flow records.

A bitmap index can be constructed for a given column, and queries can be answered by performing bitwise logical operations on the bitmaps. Bitmap indexes provide good performance for large read-only datasets [19], which makes them especially convenient for flow data.

The proposed column-oriented flow systems offer a more general solution to the flow extraction problem than our system, but none of them meet all of our other requirements. The column-oriented systems are primarily designed for fast offline analysis of flow data, and they lack essential flow accounting features such as NetFlow v9 support, an extensible storage format, online customer identification/tagging and destination-sensitive billing.

1.5. Structure of thesis

The remainder of the thesis is organized as follows. Chapter 2 reviews the necessary background information on flow-based monitoring and the NetFlow technology. Chapter 3 describes the architecture, design and implementation of the qflow system. Chapter 4 presents our evaluation and experimental results. Finally, Chapter 5 concludes and outlines future work.

2. Flow-based monitoring

This chapter describes flow-based monitoring and the NetFlow technology.

2.1. Network monitoring

The three most common approaches to network monitoring are SNMP, packet traces, and network flows. Here we briefly review each one and discuss its applicability to data retention and traffic billing.

Simple Network Management Protocol (SNMP) is the de facto standard protocol used for managing and monitoring network devices such as routers and switches. Devices run an SNMP agent (server) which exposes various knobs and metrics, such as CPU load, memory usage, and interface counters. Clients collect metrics by polling SNMP servers at a regular frequency.

Within ISPs, SNMP is used to monitor link utilization and, in some cases, it is also used for billing, e.g. billing based on 95th percentile bandwidth is common for wholesale and large links. However, SNMP is not suitable for data retention or destination-sensitive billing because it only provides a coarse-grained overview of link utilization and traffic volume. It does not provide any information about the traffic itself, such as the source/destination IP.

In contrast to SNMP, packet traces provide a complete view of the traffic. Packet headers are collected by passively tapping links and feeding the packets to a nearby server for processing. Packet traces are popular within the network research community because they provide the greatest level of detail for studying traffic behavior. However, using packet traces for network accounting and data retention is impractical because it is expensive, both in terms of the hardware for collecting traces on high-speed links and the cost of storing traces over a long period of time. Moreover, the method is unnecessarily expensive because packet traces provide much more detail than is required for accounting and data retention.

Network flows provide a good compromise in terms of cost and detail. Most network devices support exporting network flow information to an external collector (server). Flows contain less detail than packet-level traces, since they do not provide information on individual packets. Instead, flows provide a “connection-level” view of the traffic. A network flow record summarizes a set of packets that share common header attributes and keeps track of aggregate statistics, such as the total bytes and packets belonging to each flow.

The rest of this chapter will discuss network flows and flow-based monitoring in greater detail.

2.2. Flow probes

2.2.1. Overview

Flow-based traffic measurement methods are based on monitoring network traffic in terms of flows. Packets are observed at an observation point by a flow probe which aggregates packets into flows and exports flow records to an external flow collector. An observation point is a location in the network where packets can be observed, e.g. an interface on a router. Flow probes are usually embedded into flow-capable network devices, such as Cisco routers, but it is also possible to run a standalone software probe on a server, e.g. nProbe [5], that observes packets using a network tap or port mirroring on a switch.

A flow is defined as a set of IP packets passing through an observation point during a certain time interval and sharing a set of common properties (the flow key), e.g. source/destination IP, source/destination port, input/output interface, protocol [4]. The flow key can be configurable or fixed, depending on the underlying flow export protocol. Older flow protocols, such as Cisco’s NetFlow versions 1 to 8, use a fixed flow key and some of the NetFlow versions differ only in which flow key they use. Newer protocols, such as Cisco’s NetFlow version 9 and IETF’s IPFIX, are designed to be extensible. They use a flexible template-based approach that allows the user to specify flow keys (“templates”) and multiple flow keys can exist within the same flow probe.


Figure 2.1: Flow probe internals (the router inspects each packet arriving on an interface, derives the flow key — source/destination IP address, source/destination port, protocol, etc. — and updates the matching flow record's packet and byte counters in the flow cache)

The flow key specifies the granularity of the aggregation that happens in the flow probe. For each observed flow, the probe maintains a flow record that includes counters such as the total number of packets/bytes in the flow and the time of the first and last packet in the flow. Active flow records are maintained in a flow cache and the probe updates the appropriate flow record for each observed packet. A small flow key will result in fewer flows being tracked and less memory usage, but it will also provide less detail on the observed network traffic.
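The update loop of a flow cache can be sketched as follows. This is a simplified model for illustration, not an actual probe implementation, and the flow key shown is just one possible choice.

import time
from collections import namedtuple

# One possible flow key: the classic 5-tuple plus the input interface.
FlowKey = namedtuple("FlowKey", "src_ip dst_ip src_port dst_port proto in_if")

class FlowRecord:
    def __init__(self, now):
        self.packets = 0
        self.bytes = 0
        self.first = now          # time of the first packet in the flow
        self.last = now           # time of the most recent packet

flow_cache = {}                   # FlowKey -> FlowRecord

def observe_packet(key, length, now=None):
    """Update (or create) the flow record that matches this packet."""
    now = time.time() if now is None else now
    record = flow_cache.get(key)
    if record is None:
        record = flow_cache[key] = FlowRecord(now)
    record.packets += 1
    record.bytes += length
    record.last = now

# Two packets of the same TCP connection land in a single flow record.
key = FlowKey("192.0.2.10", "198.51.100.5", 40000, 80, 6, 1)
observe_packet(key, 1500)
observe_packet(key, 40)
print(flow_cache[key].packets, flow_cache[key].bytes)   # 2 1540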

Flow records have a limited lifetime at the probe. A flow record is created when the first packet belonging to that flow is observed, and it expires when one of five events occurs:

1. The idle timeout of the flow expires, i.e. there has been no activity on the flow for some time.

2. The natural end of the flow, e.g. TCP connection is closed. This only applies to protocols that have connection-oriented semantics, such as TCP.

3. The active timeout of the flow. This ensures that long-lived active flows are regularly exported.


4. The flow cache is full. The details depend on the probe implementation, but it might, for example, evict the oldest flow record to make room for new entries.

5. Overflow protection. The flow record counters have a fixed size and a record is expired if an update would cause an overflow to occur.

An expired flow record is queued for export to an external collector. For the sake of efficiency, the queued records are usually batched into flow export packets according to the semantics of the underlying export protocol, e.g. each NetFlow version 5 export packet can carry up to 30 flow records.

2.2.2. Flow export

Each flow probe (also known as a flow exporter) is configured to export flows to one or more collectors at a given IP and port. Most flow probes, especially those embedded in routers, use a simple exporting strategy: each packet is sent to every configured collector. However, some devices, such as Cisco routers, only support two collector destinations. If more destinations are required, the fan-out must be implemented on the collector side.

Export packets are usually transported from probe to collector using either UDP or SCTP. Originally, NetFlow only supported UDP, and therefore many older devices still only support UDP. However, exporting via UDP has several major disadvantages [2]:

1. UDP is congestion unaware. The exporter sends packets as fast as it can generate them, without regard to available bandwidth or how fast the collector is consuming them.

2. UDP is unreliable. Packets can be lost, duplicated, or delivered out of order. The collector must be robust and tolerate such events.

3. UDP is vulnerable to spoofing and packet insertion. Due to the lack of a handshake, an attacker could blindly spoof packets from the exporter.

SCTP is a reliable message-oriented transport layer protocol. It has many advantages over UDP, including:

1. SCTP is congestion aware and provides congestion control similar to that of TCP. Messages are buffered until they can be sent.


2. SCTP is reliable. Messages are buffered until they have been acknowledged by the collector. Lost messages are retransmitted.

3. SCTP uses a 4-way handshake with signed cookies, which prevents spoofing.

Furthermore, SCTP opens the door for more advanced features in the flow probe. For example, it is possible to configure a backup collector in Cisco routers. The router sends SCTP heartbeat messages to the primary and backup collectors. If the primary goes down, it will start exporting to the backup collector. This is not possible when exporting via UDP because it is a “fire-and-forget” protocol. It is connectionless and the probe communication is only in one direction: the flow probe talks to the collector, but the collector doesn’t talk back.

As a result, SCTP is generally preferred when it is available. This is especially true when flow collection serves a critical business function, such as when customers are charged for usage based on flow data. In that case, lost records equal lost revenue.

2.2.3. Packet sampling

Aside from the tradeoff between flow detail and memory, there is also a tradeoff between flow accuracy and CPU cost. As the packet rate grows, it can be expensive for routers to examine every packet. Most flow probes support packet sampling to reduce measurement overhead. Instead of processing every packet, the probe will sample packets with a probability p, i.e. the flow cache is updated only for a fraction p of all packets. Later, the external flow collector can renormalize (invert) the flow counters by multiplying by 1/p to obtain an unbiased estimate of the original traffic [7].

The 95% confidence interval of the estimated percent error can be approximated by [16]:

\[ \%\,\text{error} \leq 196 \sqrt{\frac{1}{c}} \tag{2.1} \]

where $c$ is the number of samples that belong to the traffic class, e.g. a single customer.
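Both the renormalization and the error bound are easy to evaluate numerically; the sketch below uses illustrative values only.

import math

def renormalize(sampled_bytes, p):
    """Unbiased estimate of the true byte count from counters sampled with probability p."""
    return sampled_bytes / p

def error_bound_percent(samples):
    """95% bound on the relative error, in percent, per equation (2.1)."""
    return 196.0 * math.sqrt(1.0 / samples)

print(renormalize(10_000_000, 1 / 100))          # 1e9: estimated bytes before sampling
for c in (100, 1_000, 10_000):
    print(c, round(error_bound_percent(c), 1))   # 19.6, 6.2, 2.0 (percent)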

Since the error is a function of the number of samples used to make the estimate, the accuracy can be increased by increasing the number of samples. This can be done in two ways:

1. Increasing the sampling probability

2. Sampling over a longer period of time


Figure 2.2: Relative sampling error (relative error in percent as a function of the number of samples in the class, from 1 to 10,000 samples)

Figure 2.2 shows the number of samples required to obtain a given % error.

In a typical billing application, where a service provider charges customers, as identified by IP address, for byte usage, the objective is to determine from all the packets traversing the network during the billing period (usually 1 month) how many bytes belong to a particular customer. In that case, the class in the error equation corresponds to a customer.

A common ISP billing strategy is to include an allowance, say 10 GB, and charge users for each additional GB. If the sampling probability is 1/100 and the observed interface has an MTU of 1500 bytes, then at least 70,000 samples will be taken for a customer that consumes 10 GB. Using the equation above, the error would be within about 0.75%. This lower bound on the number of samples assumes that the average packet size equals the MTU, which is the worst case, since it results in the fewest packets. For real traffic, the packet size distribution would most likely be bimodal, with very small (40 byte) and very large (1500 byte) packets being most common [21]. Consequently, the expected error would be even lower.

Finally, the customer charge can incorporate the confidence interval to counter the sampling error. Billing by the lower bound of the confidence interval would ensure that no customer is overcharged.


2.3. Cisco NetFlow

2.3.1. History

NetFlow is a set of protocols developed by Cisco for flow-based monitoring. Although originally developed by Cisco, NetFlow has become the de facto industry standard and is currently supported by a wide range of network devices from different vendors, e.g. Cisco, Alcatel and Juniper. Cisco has released nine versions of the NetFlow protocol, designated NetFlow v1 to v9. In recent years, the IETF, the Internet standards body, has defined a new protocol called IPFIX in an attempt to unify and standardize on a common flow protocol. IPFIX is heavily based on NetFlow v9 but also includes several new extensions. Informally, IPFIX is also known as NetFlow v10 because its export format is compatible with the NetFlow versioning scheme and it uses the NetFlow version identifier 10.

Currently, the most widely used and supported versions of NetFlow are v5 and v9. The versions before v5 are deprecated and the versions between v5 and v9 are mostly small variations of v5 that are seldom used or supported. Each variation required a new version because all NetFlow versions before v9 used a fixed export format and flow key. NetFlow v9 is designed to be future-proof. It is flexible and extensible: record formats are defined using templates and new fields can be added without changing the protocol itself.

NetFlow v5 is still the most popular format, but it is slowly being replaced by NetFlow v9. The main issue with NetFlow v5 is that it is not extensible and it lacks support for IPv6. As Internet service providers move to adopt IPv6, they need to be able to monitor and charge for IPv6 traffic as well.

In the sections below, we will describe the inner workings of NetFlow v5 and v9.

2.3.2. Version 5

NetFlow v5 defines a flow as a unidirectional sequence of packets that share the following predefined key fields [3]:

1. Source IP address.

2. Destination IP address.

3. Source port for UDP/TCP, otherwise 0.


4. Destination port for UDP/TCP, type/code for ICMP, otherwise 0.

5. IP protocol type, e.g. 6 for TCP, 17 for UDP.

6. IP Type of Service (ToS).

7. Ingress interface, using the SNMP index of the interface.

The values of the key fields differentiate one flow from another. Flow records also contain non-key fields that provide additional information about the flow, e.g. the nexthop IP address, but a change in the value of a non-key field does not create a new flow. It depends on the implementation, but in most cases, the values of non-key fields are decided by the first packet in the flow.

The NetFlow v5 export format is based on a fixed-length binary format with a fixed set of fields. Figure 2.3 shows the structure of a typical NetFlow v5 packet exported over UDP. Each packet contains a common header followed by a number of flow records. All fields are encoded using big-endian byte order. Table 2.1 shows the fields present in a NetFlow v5 header. Table 2.2 shows the fields present in a NetFlow v5 flow record.

Figure 2.3: NetFlow v5 export packet (IP and UDP headers, followed by the flow header — version, number of records, probe uptime, probe clock, residual nanoseconds, sequence number — and a sequence of flow records carrying the source, destination and nexthop IPs, input/output interfaces, packet and byte counters, and other fields)


Name               Type    Description
version            uint16  NetFlow export format version number, always 5
count              uint16  Number of flow records in this export packet (1-30)
uptime_ms          uint32  Time in milliseconds since this device booted
unix_secs          uint32  Time in seconds since epoch
unix_nsecs         uint32  Residual nanoseconds since epoch
flow_sequence      uint32  Sequence counter of total flows exported
engine_type        uint8   Type of flow-switching engine
engine_id          uint8   Slot number of the flow-switching engine
sampling_interval  uint16  Sampling mode (2 bits) and interval (14 bits)

Table 2.1: Format of NetFlow v5 header

When exporting over UDP, the sequence counter can be used to detect lost flows and duplicates. However, the detection must also take into account that packets can be received out of order and the flow probe might restart, which would reset the sequence counter.

Name       Type    Description
srcaddr    uint32  Source IP address
dstaddr    uint32  Destination IP address
nexthop    uint32  IP address of next hop router
input      uint16  SNMP index of the input interface
output     uint16  SNMP index of the output interface
dPkts      uint32  Total number of packets in the flow
dOctets    uint32  Total number of Layer 3 bytes in the packets of the flow
first      uint32  Uptime at start of flow
last       uint32  Uptime at the time the last packet of the flow was received
srcport    uint16  TCP/UDP source port number or equivalent
dstport    uint16  TCP/UDP destination port number or equivalent
pad1       uint8   Unused (zero) bytes
tcp_flags  uint8   Cumulative OR of TCP flags
prot       uint8   IP protocol type (for example, TCP = 6; UDP = 17)
tos        uint8   IP type of service (ToS)
src_as     uint16  Autonomous system number of the source
dst_as     uint16  Autonomous system number of the destination
src_mask   uint8   Source address prefix mask bits
dst_mask   uint8   Destination address prefix mask bits
pad2       uint16  Unused (zero) bytes

Table 2.2: Format of NetFlow v5 record
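As a concrete illustration of the fixed big-endian layout in Table 2.1, the sketch below decodes the 24-byte v5 header from the start of an export packet. It is a minimal example, not the qflow collector code.

import struct

# Big-endian NetFlow v5 header: version, count, uptime_ms, unix_secs, unix_nsecs,
# flow_sequence, engine_type, engine_id, sampling_interval (24 bytes in total).
V5_HEADER = struct.Struct(">HHIIIIBBH")

def parse_v5_header(packet):
    (version, count, uptime_ms, unix_secs, unix_nsecs,
     flow_sequence, engine_type, engine_id, sampling) = V5_HEADER.unpack_from(packet, 0)
    if version != 5:
        raise ValueError("not a NetFlow v5 packet")
    return {
        "count": count,
        "uptime_ms": uptime_ms,
        "unix_secs": unix_secs,
        "flow_sequence": flow_sequence,
        "sampling_mode": sampling >> 14,          # upper 2 bits
        "sampling_interval": sampling & 0x3FFF,   # lower 14 bits
    }

# A header announcing 2 flow records with sequence number 1000.
pkt = V5_HEADER.pack(5, 2, 123456, 1_400_000_000, 0, 1000, 0, 0, 100)
print(parse_v5_header(pkt))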


2.3.3. Version 9

NetFlow v9 is the successor to NetFlow v5. It abandons the fixed record format in favor of a flexible template-based system that allows new record types to be defined.

Flow header

Each NetFlow v9 message starts with a common flow header. It is similar to the one used for NetFlow v5 but has been slightly refined. Table 2.3 shows the fields present in the NetFlow v9 header.

The source ID field is new, but serves a similar purpose to the engine type/ID fields found in the NetFlow v5 header. It identifies the NetFlow process within the exporter device. For instance, when a router has multiple line cards that are running separate NetFlow processes, the collector can use the source ID to separate different export streams coming from the same device/source IP. This is especially important in NetFlow v9 because the sequence number and template state is scoped to the observation domain, i.e. the uniqueness of template IDs is local to the observation domain. Therefore, the collector must maintain separate state for each observation domain (source IP + source ID).

Name             Type    Description
version          uint16  NetFlow export format version number, always 9
count            uint16  Total number of records in the export packet
uptime_ms        uint32  Time in milliseconds since this device booted
unix_secs        uint32  Time in seconds since epoch
sequence_number  uint32  Incremental sequence counter of all export packets
source_id        uint32  Identifies the Observation Domain

Table 2.3: Format of NetFlow v9 header


Flowsets

The flow header is followed by one or more flowsets, as shown in Figure 2.4.

Figure 2.4: Structure of a NetFlow v9 export packet (a flow header followed by a sequence of flowsets)

There are four types of flowsets:

1. Template flowset: Contains one or more template records. Each template record describes the type and length of individual fields within subsequent flow records that match the template.

2. Data flowset: Contains a template ID and one or more flow records. The records cannot be decoded without the right template.

3. Options template: Special type of template flowset that describes the format of options data records.

4. Options data: Special type of data flowset that contains options data records. Rather than supplying information about flows, these records describe metadata about the NetFlow process itself, e.g. the sampling interval.

Different types of flowsets can be interleaved in the same packet in any given order.

Every flowset starts with a common flowset header. It contains two fields: the flowset ID, which identifies the flowset type, and the flowset length, which contains the total size of the flowset including the flowset header itself. The remainder of the flowset depends on its type. The following types are supported:

• 0: Reserved for template flowset.

• 1: Reserved for options template.

• 2-255: Reserved for future use.

• 256-65535: Data flowsets.


Template flowset

The template flowset contains a set of template records. Each template record describes the format (template) of subsequent flow records in data flowsets with the given template ID. Figure 2.5 shows the structure of a typical template flowset.

The NetFlow v9 standard (RFC3954) defines 79 field types. The standard types include all the fields of NetFlow v5 and new fields such as MAC addresses, VLAN IDs, MPLS labels, and IPv6 addresses. The field type attribute in a template record is a 16-bit short, so there is plenty of room for future types.

When the collector receives a template record, it needs to store the template to be able to decode future data flowsets that match the given template ID. Flow probes will generally send templates periodically to refresh the collector. Templates are not persistent across flow probe restarts. Consequently, if the collector receives a new template definition for an already existing template ID, it must override the previous definition.
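Template handling in a collector can be sketched as a small cache keyed by observation domain and template ID: newer definitions overwrite older ones, and data records can only be decoded once the matching template is known. The code below is a simplified illustration, not the actual collector implementation.

# (exporter IP, source ID, template ID) -> list of (field type, field length) pairs.
templates = {}

def handle_template(source_ip, source_id, template_id, fields):
    """Store or overwrite a template definition received from an exporter."""
    templates[(source_ip, source_id, template_id)] = list(fields)

def decode_data_record(source_ip, source_id, template_id, payload):
    """Slice one data record according to its template; None if the template is unknown."""
    fields = templates.get((source_ip, source_id, template_id))
    if fields is None:
        return None                    # cannot decode yet, wait for a template refresh
    record, offset = {}, 0
    for field_type, length in fields:
        record[field_type] = payload[offset:offset + length]
        offset += length
    return record

# Template 256 with two fields: type 8 (IPv4 source address) and type 12 (IPv4 destination).
handle_template("10.0.0.1", 0, 256, [(8, 4), (12, 4)])
print(decode_data_record("10.0.0.1", 0, 256, bytes([192, 0, 2, 1, 198, 51, 100, 7])))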

Figure 2.5: NetFlow v9 template flowset (a flowset header with flowset ID 0 and length, followed by template records; each template record contains a template ID, a field count and a list of field specifications)

Data flowset

After sending templates to the collector, the flow probe can transmit flow records using the template. The flowset ID in a data flowset header identifies the template ID required to decode the flow records contained within the flowset. Figure 2.6 shows the structure of a typical data flowset.


Figure 2.6: NetFlow v9 data flowset (a flowset header carrying the template ID and length, followed by flow records whose field values are laid out according to the referenced template)

2.3.4. Storage requirements

According to Cisco, the volume of NetFlow export data is estimated at roughly 1.5% of the actual traffic observed. The average customer in Iceland uses 60 GB per month, which amounts to roughly 1 GB of NetFlow data per month.

Another way to estimate the NetFlow volume is to use the flow export rate. NetFlow v5 uses a fixed-length export format with a 24-byte flow header, followed by up to 30 flow records of 48 bytes each. Assuming that a flow probe waits for 30 records before sending an export packet, we can calculate the NetFlow v5 volume in bytes per second for a flow rate $R$ (flows per second) as follows:

\[ \frac{24R}{30} + 48R \tag{2.2} \]

For example, given a flow rate R of 50K flows per second, the resulting NetFlow v5 volume would be roughly 2.3 MB per second, or roughly 8 GB per hour.
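The calculation is easy to verify with a few lines of code (a quick sketch of equation (2.2)):

HEADER_BYTES = 24          # NetFlow v5 flow header
RECORD_BYTES = 48          # NetFlow v5 flow record
RECORDS_PER_PACKET = 30    # assumed full export packets

def v5_bytes_per_second(flow_rate):
    return HEADER_BYTES * flow_rate / RECORDS_PER_PACKET + RECORD_BYTES * flow_rate

rate = v5_bytes_per_second(50_000)
print(round(rate / 2**20, 2), "MB per second")       # ~2.33
print(round(rate * 3600 / 2**30, 1), "GB per hour")  # ~8.2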

2.4. Observation points

2.4.1. Edge deployment

Internet service providers that charge customers for network usage usually deploy NetFlow on the edges of the network, where traffic originates or terminates, as opposed to deploying NetFlow on backbone/core routers. The deployment plan needs to be comprehensive enough to capture all traffic required for billing and data retention, but it must also ensure that the same traffic isn't counted twice, which can happen, for example, when a packet traverses a path through the network that contains two active NetFlow probes, resulting in two separate flows being exported for the same traffic. The next section describes such scenarios in more detail.

Figure 2.7 shows an example service provider network with NetFlow deployment on the edges. The circles marked R represent routers and the smaller filled circles represent active NetFlow monitoring on the given network links.

Figure 2.7: NetFlow edge deployment

2.4.2. Ingress/egress monitoring

In general, NetFlow supports both ingress (inbound) and egress (outbound) monitoring. On routers, NetFlow is configured per interface in ingress and/or egress mode. Ingress monitoring accounts for all packets entering an interface, usually before any packet operations are performed by the device, such as ACLs or NAT. Egress monitoring accounts for all packets leaving an interface.

Internet service providers that charge for network usage usually need to measure both the download and upload traffic of customers, but enabling both ingress and egress monitoring on the same device can result in duplication. Figure 2.8 depicts a router with six interfaces. Ingress monitoring is enabled on interface 4, and egress monitoring on interface 3. A packet that enters on interface 4 and is switched out on interface 3 will be examined twice. Depending on the flow probe and environment, this might result in two flows with the same information or a single flow with double the traffic. NetFlow v9 contains a direction field that allows the collector to distinguish the two flows (one will be marked as ingress, the other as egress), but NetFlow v5 has no such field. As a result, it is recommended not to mix ingress and egress monitoring on the same device when using NetFlow v5.

Figure 2.8: Example router with both ingress and egress monitoring enabled (ingress monitoring on interface 4, egress monitoring on interface 3)

Figure 2.9: Provider network with both ingress and egress monitoring enabled (devices A-D, with ingress monitoring where traffic enters the network and egress monitoring where it exits)

The same general principle also applies to the network as a whole. In Figure 2.9, enabling ingress monitoring for packets that enter device B and egress monitoring for packets that exit device C would result in duplicate flows for packets traversing that path.


2.4.3. Deployment strategies

For usage-based charging, the objective is that every customer packet is accounted for once and only once. This effectively means that for any path through the network, there must only be one interface that counts the packet. Of course, it is also possible to filter duplicate flows at the collector, but it is easier and less error prone to deploy NetFlow in a way that avoids duplicate flows.

A simple deployment strategy is to enable ingress monitoring on all customer-facing and outward-facing interfaces of edge routers. A customer packet will be examined at the customer interface where it enters the service provider network, but not at the edge interface where it leaves the network, e.g. heading for the Internet. Likewise, a return packet will be examined when it enters the service provider network, but not at the customer edge interface. The two unidirectional flows resulting from a TCP connection could be exported from different devices, but together they provide a full view of any traffic going to or from customers.

2.4.4. Customer traffic

With ingress monitoring on all customer-facing and outward-facing interfaces, all flows have the same “direction”, regardless of whether they are in fact inbound or outbound customer traffic. It is up to the collector to determine the true direction with respect to the customer.

This is often accomplished by maintaining a list of customer IPs (local prefixes) at the collector. When a flow is received, the collector will look up the source and destination IPs and determine which endpoint belongs to the customer. If the destination IP address belongs to the customer, then it is inbound (download) traffic with respect to that customer. Otherwise, it is outbound.

In the case of local customer-to-customer traffic, both endpoints will be customer IPs. For one customer the flow will be classified as inbound traffic, but as outbound traffic for the other customer. Flow accounting systems would typically produce two accounting records for such a flow, one for each customer.
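A minimal sketch of this direction lookup is shown below; the prefixes are illustrative and the code is not the qflow implementation.

import ipaddress

# Local customer prefixes, e.g. loaded from a prefix list maintained at the collector.
LOCAL_PREFIXES = [ipaddress.ip_network("192.0.2.0/24"),
                  ipaddress.ip_network("203.0.113.0/24")]

def is_local(ip_str):
    ip = ipaddress.ip_address(ip_str)
    return any(ip in prefix for prefix in LOCAL_PREFIXES)

def classify(src_ip, dst_ip):
    """Return (customer IP, direction) accounting entries for one flow."""
    entries = []
    if is_local(dst_ip):
        entries.append((dst_ip, "inbound"))    # download for the destination customer
    if is_local(src_ip):
        entries.append((src_ip, "outbound"))   # upload for the source customer
    return entries

print(classify("198.51.100.9", "192.0.2.10"))  # [('192.0.2.10', 'inbound')]
print(classify("192.0.2.10", "203.0.113.5"))   # customer-to-customer: two entries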

3. Design and implementation

This chapter presents the architecture, design and implementation of the qflow system.

3.1. Architecture

The system is organized as a pipeline that receives and processes flows. The flows are tagged, sorted, indexed, aggregated and stored in a customer-oriented flow database. Conceptually, the database contains a set of flow tables. Each flow table contains a set of flows, organized by customer, along with an index that summarizes the traffic volume per customer and points to the flows in the flow table.

Figure 3.1: An overview of the qflow system (NetFlow is received by the collector and passes through the capture, indexing and aggregation stages; the IPAM system supplies customer tags, and the reporting stage feeds the billing system, customer reports, network planning, traffic analysis and lawful intercept)


The system has five parts:

1. Collector: receives flows from exporters, translates them into an internal format, applies tags and filters, and then sends the flows to a number of backends for further processing.

2. Capture: receives flows from collectors and dumps them to disk in a temporary spool directory. Rotates files periodically.

3. Indexing: picks up finalized flow files from the spool directory, transforms them into flow tablets and appends them to a given flow table.

4. Aggregation: maintains higher level aggregates (e.g. daily/monthly) of traffic volume per customer for each flow table.

5. Reporting: provides flow data to external systems, such as billing, reports for customers, etc.

Each part will be covered in more detail in the sections that follow.

3.2. Collector

3.2.1. Design

The collector acts as a broker between flow exporters, such as Cisco routers, and flow backends that consume flows for use in a wide variety of applications. The collector receives NetFlow packets from exporters, extracts the flows, applies common logic such as filters, tags, and other business rules, and then exports the flows to backends for further processing. An overview of the collector is shown in Figure 3.2.

The collector supports both NetFlow v5 and v9. It translates received flow records into a common protobuf-based [12] intermediate format that it uses internally during processing and externally when talking to backends. The backends are organized into groups and the collector distributes flows across the backends in a group. The collector configuration decides which flows to send to a given backend group.

This design has multiple benefits:

1. It provides loose coupling. New applications can be developed, tested and deployed without re-configuring or changing the NetFlow devices that export flows.

2. It solves the problem with NetFlow devices that only support two export destinations.

3. It abstracts away the complexity of dealing with multiple NetFlow versions and transport protocols.

4. It uses TCP for publishing flows to backends instead of unreliable protocols such as UDP.

5. It allows common functionality, such as flow filtering and tagging, to be encapsulated in one place.

6. It enables flows to be distributed across multiple backends that don’t need to run on the same machine as the collector.

Figure 3.2: Collector overview (exporters send NetFlow v5/v9 to the collector, which translates the flows into the flow protobuf format and, according to its configuration, distributes them to backends organized into backend groups)

3.2.2. Flow format

The collector transmits flows in a common intermediate format using Google Protocol Buffers [12], also known simply as protobuf. The protobuf library provides a typed language for defining protobuf messages, often called protos, and tools to generate serialization code for multiple programming languages, including C++, Java and Python. An example proto definition for a flow record:

message Flow {
  // Number of incoming bytes
  optional uint64 bytes = 1;

  // Number of incoming packets
  optional uint64 packets = 2;

  // Source IP address
  optional string src_ip = 3;

  // Destination IP address
  optional string dst_ip = 4;

  // TCP/UDP source port number
  optional uint32 src_port = 5;

  // TCP/UDP destination port number
  optional uint32 dst_port = 6;
  // ...
}

The full definition of the flow protobuf record that we use in our system is given in Appendix A.

A protobuf message consists of a sequence of fields. Each field has a name, an associated type, a unique numbered tag (used to identify the field in the serialized binary format) and a field marker such as optional/required/repeated.

Using protocol buffers for flows is a good choice for several reasons:

1. The protobuf language is flexible and allows protobuf messages to be extended with new fields without breaking backwards compatibility. Old binaries simply ignore unknown new fields. This is an important feature because NetFlow v9 is itself extensible, so naturally the intermediate flow format should also be extensible to accommodate future changes. If NetFlow v9 introduces a useful field that is missing from the protobuf, it can simply be added along with the necessary parsing code without breaking any existing code.

2. The protobuf binary format is compact because it only encodes the fields that are in use and employs tricks such as variable-length encoding. A compact representation is important for NetFlow v9 because many more fields might be supported than are actually used, and the unused fields shouldn't take up space in the serialized format.

3. The protobuf library is available for a wide range of programming languages and platforms. Third party developers can easily write new backends that consume flow protobuf messages. Other flow systems, such as flow-tools and nfdump, define their own custom binary format, which makes interoperability harder.

4. The protobuf library is fast. It can encode/decode messages on the order of a hundred nanoseconds per message [12].

During protobuf translation, the collector flattens and denormalizes the flow records into a stream of flow protos, where each flow proto contains the complete information and can be processed independently of other flow protos. The flat stream makes processing easy, since the flows can be filtered and processed in any order. This also enables flows to be distributed across backends without worrying about issues such as template management.

The same protobuf type is used for both NetFlow v5 and v9. As a result, backends should not care whether a flow originally came from NetFlow v5 or v9. The protobuf is currently capable of storing a superset of NetFlow v5 along with the most commonly used fields from NetFlow v9. Furthermore, the protobuf has been augmented to contain extra fields that are specific to our system, such as the customer IP, tag, and direction.

Within the collector, the flow protobuf also makes flow operations more generic. Code for filtering and tagging just needs to operate on the protobuf; it doesn't need special cases for handling different versions of NetFlow. Moreover, this separation of concerns makes unit testing easier, since the code for filtering and tagging can be tested separately and independently of the code that deals with NetFlow.

3.2.3. Configuration

The collector configuration is also defined using a protobuf message. See Appendix B for the full protobuf definition.

The configuration contains five elements:

1. Taggers. A tagger contains a map of tag definitions (IP prefix → tag).


2. Prefix groups. A prefix group contains a list of IP prefixes.

3. Backend groups. A backend group contains a list of backends. The collector will maintain a persistent TCP connection to each backend.

4. Matchers. A matcher contains two backend groups and a filter expression that will be applied to incoming flows. If the filter matches a flow, it will be sent to the first backend group. If it doesn't match, it will be sent to the second group. Both backend groups are optional.

5. Exporters. The collector will drop packets from unknown exporters. Each exporter has a definition that specifies how to handle flows received from that exporter. It includes which taggers and matchers to run, and how to determine the customer IP and direction.

A typical configuration for an Icelandic ISP without its own dedicated foreign links is shown below. The configuration separates foreign and domestic flows and sends them to separate backends for processing.

exporter { name: "router1" ip: "10.20.30.40" customers: "local" tagger: "customers" matcher: "innlent" } matcher { name: "innlent" // This will match the peer IP against the Icelandic prefixes. // The peer IP is the IP address of the endpoint opposite the // customer IP. pattern: "peer_ip in isroutes" backend_group: "dumper_innlent" backend_group_complement: "dumper_erlent" } backend_group { name: "dumper_innlent" // A large ISP could define multiple backends. Flows would be // distributed across the backends in round-robin fashion. backend { name: "backend1" host: "127.0.0.1:9100" }

28 3.2. Collector

} backend_group { name: "dumper_erlent" backend { name: "backend2" host: "127.0.0.1:9200" } } tagger { // List of customer allocations from IPAM system. name: "customers" path: "/flow/networks/tags.txt" } prefix_group { // Contains a list of local customer prefixes. name: "local" path: "/flow/networks/local.txt" } prefix_group { // Contains a list of Icelandic prefixes. name: "isroutes" path: "/flow/networks/is-net.txt" }

An ISP with its own dedicated foreign links would export ingress flows from the edge routers that receive foreign traffic. The collector configuration for such a scenario would contain separate exporter entries for routers handling foreign and domestic traffic. An example configuration is shown below:

exporter { name: "farice" ip: "10.20.30.40" customers: "local" tagger: "customers" matcher: "innlent" } exporter { name: "rix" ip: "10.20.30.50" customers: "local" tagger: "customers" matcher: "erlent" }

29 3. Design and implementation

matcher { // Matcher with no pattern matches everything. name: "innlent" backend_group: "innlent" } matcher { // Matcher with no pattern matches everything. name: "erlent" backend_group: "erlent" } // ...

3.2.4. Backend protocol

The collector communicates with backends using a simple message-based wire protocol. TCP provides a reliable byte-oriented stream, but the inherent lack of message boundaries requires the application to do its own message framing. The backend protocol uses length-prefix framing, where each message is prefixed with its length.

Messages can be exchanged in both directions between collector and backend. Every message starts with a single-byte character that denotes the message type, followed by the actual message contents. The following message types are defined:

• H: Heartbeat message. The collector periodically sends heartbeat messages to check backend health. The backend should respond back with a heartbeat message.

• F: Flow record message. Contains a single flow proto.
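As an illustration of the framing, the sketch below sends a single framed message over an established backend connection. The 4-byte big-endian length prefix, and the decision to include the type byte in the length, are illustrative assumptions; the exact wire layout used by qflow is not reproduced here.

#include <arpa/inet.h>   // htonl
#include <sys/socket.h>  // send
#include <cstdint>
#include <string>

// Send one framed message: [4-byte length][type byte][payload].
// In this sketch the length covers the type byte plus the payload.
bool SendMessage(int fd, char type, const std::string& payload) {
  uint32_t len = htonl(static_cast<uint32_t>(payload.size() + 1));
  std::string frame(reinterpret_cast<const char*>(&len), sizeof(len));
  frame.push_back(type);
  frame.append(payload);
  return send(fd, frame.data(), frame.size(), 0) ==
         static_cast<ssize_t>(frame.size());
}

A flow record would then be forwarded by serializing the flow proto to a string and passing it to SendMessage with type 'F'.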

The qflow code provides a FlowListener class for writing collector backends. The backend code simply creates an instance of the listener and provides a callback that is invoked whenever a flow arrives. The FlowListener class encapsulates the backend protocol.
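A backend built on this class might look roughly like the following sketch. The header name, the constructor argument and the callback signature are assumptions made for illustration, since the exact FlowListener interface is not reproduced in this text.

#include <functional>
#include "qflow/flow_listener.h"  // hypothetical header name

int main() {
  // Listen on the address configured for this backend in the collector
  // config (e.g. 127.0.0.1:9100) and invoke the callback once per flow.
  FlowListener listener("127.0.0.1:9100");
  listener.Run([](const qflow::netflow& flow) {
    // Handle the flow, e.g. append it to the table's spool directory.
  });
  return 0;
}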


(Diagram: a database directory with subdirectories for tablets, views, and the queue.)

Figure 3.3: Directory layout of the flow database

3.3. Database

3.3.1. Design

The flow database contains a set of flow tables. The number of flow tables and their purpose are left up to the service provider, but there is usually one flow table for each accounting dimension, e.g. separate flow tables for foreign and domestic flows. Having separate flow tables allows the service provider to account for each traffic volume separately. Figure 3.3 shows the directory layout for the flow database.

Each flow table is backed by a set of flow tablets. Each tablet contains a part of the flows that make up the table. The flows in a tablet are sorted by customer. All flows belonging to the same customer appear consecutively within the tablet.

Each flow tablet is accompanied by a flow tablet index. The index contains one entry per customer. It summarizes the flow volume for that customer and points to the customer’s flow records within the tablet.

The tablet index enables fast extraction of customer flows, since it is possible to look up exactly where the customer flows are located. Furthermore, many queries can be answered directly from the index without reading the underlying flow records, e.g. queries about customer traffic volume, which are the most common queries in flow accounting systems.

Although the index can answer simple queries about customer traffic volume, the information is scoped to the given tablet. For queries about traffic volume over a longer period of time, e.g. a whole day or month, it is more convenient to maintain precomputed aggregates that are updated every time a new tablet is added to the flow table than to query every tablet index within the given time interval. These precomputed aggregates are similar to materialized views within relational databases.

A flow table can have one or more materialized views associated with it. It is common to compute hourly, daily and monthly customer summaries for each flow table.

3.3.2. Table queue

Each flow table has a temporary spool directory on disk for incoming flows. The dumper is a collector backend that receives flows and dumps them to disk in the spool directory. The spool directory serves as a queue for the next stage in the processing pipeline – the indexer, which transforms dumped files into flow tablets.

The dumper writes flows into flow files. A flow file contains a sequence of records and each record contains a serialized flow record protobuf. The record format will be described in more detail in the next section.

The flow files are named YYYY-MM-DD.HHMMSS, e.g. 2014-08-24.131500, which denotes when the given file was created. The dumper rotates files based on file size and time interval, e.g. 250 MB or 15 minutes, whichever one comes first.
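The rotation check itself is a small piece of logic; the sketch below is illustrative, with the 250 MB / 15 minute thresholds taken from the example above.

#include <cstddef>
#include <ctime>

// Rotate when either limit is reached, whichever comes first.
bool ShouldRotate(std::size_t bytes_written, std::time_t opened_at,
                  std::size_t max_bytes = 250 * 1024 * 1024,
                  int max_seconds = 15 * 60) {
  return bytes_written >= max_bytes ||
         std::time(nullptr) - opened_at >= max_seconds;
}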

(Diagram: the collector feeds two dumper processes; each dumper writes into its table's queue, and an indexer turns the queued files into tablets. One dumper/queue/indexer pipeline ends in the domestic flow table and the other in the foreign flow table.)

Figure 3.4: Flow capture pipeline


Many dumper processes can be running at the same time, dumping different types of flows to disk. One scenario, particularly common in Iceland, is running two dumper processes, one for domestic traffic and the other for international traffic. The flow collector is configured to separate the traffic and send it to separate dumper backends. The flows are then dumped into separate spool directories and will end up in separate flow tables. Figure 3.4 shows an example.

3.3.3. Record format

We use a simple block-compressed record-oriented file format for storing flows on disk. The same record format is used both for dumped flow files, pending in the spool directory, and the final flow tablets.

Flows can be written using the RecordWriter class:

RecordWriter writer("2014-08-20.131000.131500");

string s;
qflow::netflow flow;

flow.SerializeToString(&s);
writer.Write(s);

And they can be read back using the RecordReader class:

RecordReader reader("2014-08-20.131000.131500");

qflow::netflow flow;
string s;
while (!reader.Eof()) {
  if (!reader.Read(&s))
    break;
  if (!flow.ParseFromString(s))
    break;
  // do something with flow
}


Each record holds a variable-length arbitrary binary blob. The classes are general in nature, but in our case the record blob always contains a serialized flow proto.

(Diagram: each block starts with a header holding a magic number, the block size, a checksum, the number of records, and the compression scheme, followed by the records, each prefixed with its size.)

Figure 3.5: Structure of a block

Internally, each flow file contains a sequence of compressed blocks, and each block contains a sequence of records. Figure 3.5 shows the block format.

Currently, the record format supports both zlib [10] and snappy [14] compression. Snappy is a compression library from Google that offers very high speeds and rea- sonable compression. It can compress at 250 MB/s and decompress at 500 MB/s on a single core [14]. Zlib is slower but offers better compression ratios.

As each block is compressed independently, we can seek within the file and randomly access blocks. This is an important property for indexing, as we will discuss further in the next section.

3.3.4. Indexer

The indexer program watches the spool directory for each flow table and picks up flow files after they are rotated. It transforms the flow files into flow tablets and adds them to the corresponding flow table.

The transformation consists of five main steps:

1. The flow records are sorted by customer key (customer IP/mask/tag) and written into a new flow tablet file. The flow tablet is given the same name as the original flow file, i.e. the file start time.

2. An index is constructed for the newly created flow tablet.

3. The new tablet is added to the flow table.


4. The index is used to update any materialized views associated with the table.

5. The original file is deleted.

The flow records are sorted using a parallel merge sort algorithm, which allows the indexer to take advantage of multi-core machines. The number of threads is configurable and can be specified using a command line flag (--sort_threads=N).
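The sketch below illustrates the general idea behind such a parallel sort, not the qflow implementation: the input is split into one chunk per thread, each chunk is sorted concurrently, and the sorted chunks are then merged.

#include <algorithm>
#include <thread>
#include <vector>

// Sort `records` using `nthreads` worker threads (cf. --sort_threads=N).
// The comparator would order flows by customer key (IP, tag, mask).
template <typename T, typename Cmp>
void ParallelSort(std::vector<T>* records, Cmp cmp, int nthreads) {
  const std::size_t n = records->size();
  const std::size_t chunk = (n + nthreads - 1) / nthreads;

  // Phase 1: sort each chunk in its own thread.
  std::vector<std::thread> workers;
  for (int i = 0; i < nthreads; ++i) {
    std::size_t lo = std::min(n, i * chunk);
    std::size_t hi = std::min(n, lo + chunk);
    workers.emplace_back([=] {
      std::sort(records->begin() + lo, records->begin() + hi, cmp);
    });
  }
  for (auto& t : workers) t.join();

  // Phase 2: merge adjacent sorted runs pairwise until one run remains.
  for (std::size_t width = chunk; width < n; width *= 2) {
    for (std::size_t lo = 0; lo + width < n; lo += 2 * width) {
      std::size_t mid = lo + width;
      std::size_t hi = std::min(n, lo + 2 * width);
      std::inplace_merge(records->begin() + lo, records->begin() + mid,
                         records->begin() + hi, cmp);
    }
  }
}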

The tablets are organized into a hierarchy of time intervals as shown in Figure 3.6. At the top level, there is one directory for each month. Under a particular month, there is a directory for each day, and it contains all the flow tablets, plus their indexes, that started during that day.

2014-06/
2014-07/
2014-08/
    2014-08-01/
    2014-08-02/
        2014-08-02.000000   (+ index file)
        2014-08-02.001500   (+ index file)
        2014-08-02.003000   (+ index file)

Figure 3.6: Directory layout for flow tablets

3.3.5. Tablets

As previously mentioned, tablets use the same record format as normal flow files. The difference lies in the record order and block organization.

Flow tablets have two invariants:

1. The flow records are always sorted by customer. More specifically, by customer IP, tag and routing mask. These three fields form the key that is used during sorting and for the customer entry in the tablet index.

2. A block only contains records for a single customer.


(Diagram: the tablet's blocks in order; blocks 0-2 hold customer A's records, block 3 holds customer B's, and blocks 4-5 hold customer C's, so no block mixes records from two customers.)

Figure 3.7: Internal layout of a flow tablet

The second condition is preserved during indexing by flushing a block to disk when crossing from one customer’s records to another. Figure 3.7 demonstrates the typical block layout within a flow tablet.
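In code, the loop that maintains this invariant can be sketched as follows; the TabletWriter interface and the CustomerKey helper are hypothetical names used only for illustration.

#include <string>
#include <vector>

// Hypothetical interfaces for illustration; the real qflow classes
// may differ. CustomerKey() builds the (customer IP, tag, mask) sort
// key and TabletWriter wraps the block-compressed record writer.
struct TabletWriter {
  void Write(const std::string& record);
  void FlushBlock();  // forces the current block to disk
};
std::string CustomerKey(const qflow::netflow& flow);

void WriteTablet(TabletWriter* writer,
                 const std::vector<qflow::netflow>& sorted_flows) {
  std::string prev_key;
  for (const qflow::netflow& flow : sorted_flows) {
    const std::string key = CustomerKey(flow);
    // Crossing a customer boundary: flush so that no block ever holds
    // records from two different customers.
    if (!prev_key.empty() && key != prev_key) {
      writer->FlushBlock();
    }
    std::string s;
    flow.SerializeToString(&s);
    writer->Write(s);
    prev_key = key;
  }
}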

The tablet index contains a sequence of fixed-length index records. There is one index record for every unique customer key that appears in the flow tablet. The index record holds a pointer to the first tablet block for the given customer key. It also contains the total number of inbound and outbound flows and bytes contained within the tablet for that customer key.

The tablet index also has an order invariant. The index records are always sorted by customer key. The tablet index is designed to be mapped directly into memory, which makes it easy to locate a particular index record by using binary search.

Because the flow tablet and its index are sorted by customer key, consecutive cus- tomer IPs will be placed next to each other. When querying for a range of IPs, e.g. a CIDR, it is possible to locate the first IP and then simply walk over each adjacent IP until the range is exhausted.
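The following sketch shows how such a memory-mapped index can be searched with a binary search. The field layout of the index record is illustrative; the real qflow record layout is not reproduced here.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustrative fixed-length index record.
struct IndexRecord {
  uint8_t  customer_ip[16];  // IPv4-mapped or IPv6 address
  uint8_t  prefix_length;
  char     tag[16];
  uint64_t block_offset;     // offset of the customer's first block
  uint64_t flows_in, bytes_in;
  uint64_t flows_out, bytes_out;
};

// `base` points at the mmap'ed index holding `count` records sorted by
// customer key; binary search locates a key in O(log n) comparisons.
const IndexRecord* FindCustomer(const IndexRecord* base, std::size_t count,
                                const IndexRecord& key) {
  auto less = [](const IndexRecord& a, const IndexRecord& b) {
    int c = std::memcmp(a.customer_ip, b.customer_ip, sizeof(a.customer_ip));
    if (c != 0) return c < 0;
    if (a.prefix_length != b.prefix_length)
      return a.prefix_length < b.prefix_length;
    return std::memcmp(a.tag, b.tag, sizeof(a.tag)) < 0;
  };
  const IndexRecord* end = base + count;
  const IndexRecord* it = std::lower_bound(base, end, key, less);
  if (it != end && !less(key, *it)) return it;
  return nullptr;  // key not present in this tablet
}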

3.3.6. Materialized views

The tablet indexes provide a high-resolution view of the traffic volume per customer, but traffic billing is usually done on a monthly basis. It is also common to give a daily breakdown of volume to customers so they can keep an eye on their usage. For this reason, the qflow system provides a feature to store and maintain precomputed time-based views for each flow table. The view is updated every time a new tablet is added to the flow table.

The granularity of views is configurable by the user, but it is common to compute hourly, daily and monthly customer summaries. Each flow table has a configuration file that specifies the views, e.g.

view {
  name: "month"
  pattern: "%Y-%m"
}
view {
  name: "date"
  pattern: "%Y-%m-%d"
}
view {
  name: "hour"
  pattern: "%Y-%m-%d.%H"
}

When a file named “2014-08-24.153000” is added to the flow table, the indexer will also update the following views:

1. views/month/2014-08

2. views/date/2014-08-24

3. views/hour/2014-08-24.15
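Deriving the view names amounts to parsing the timestamp embedded in the tablet name and reformatting it with each view pattern. A minimal sketch under that assumption, using the POSIX strptime/strftime functions:

#include <time.h>
#include <string>

// Map a tablet name such as "2014-08-24.153000" and a view pattern such
// as "%Y-%m-%d.%H" to the corresponding view name ("2014-08-24.15").
std::string ViewName(const std::string& tablet, const std::string& pattern) {
  struct tm tm = {};
  strptime(tablet.c_str(), "%Y-%m-%d.%H%M%S", &tm);
  char buf[64];
  strftime(buf, sizeof(buf), pattern.c_str(), &tm);
  return buf;
}

For example, ViewName("2014-08-24.153000", "%Y-%m") yields "2014-08".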

The views are derived from tablet indexes. Each view holds within it a list of all tablet indexes that have been added to it. This prevents the same tablet from being added twice and allows the system to inspect whether a tablet is missing from the view.

The view file format consists of a sequence of aggregation records. There is one record for every unique customer key that appears in the tablet indexes that the view is based on. The record contains the total number of inbound and outbound flows and bytes. Figure 3.8 shows the file structure.

The view has the same order invariant as the tablet index. The records are always sorted by customer key and the file format is designed to be memory mapped directly to allow for easy binary searching.


(Diagram: a view file consists of a header, a list of the contributing tablet files (e.g. 2014-01-02.131500, 2014-01-02.133000), a section of IPv4 entries, and a section of IPv6 entries; each entry holds the ip, prefix_length and tag together with the inbound/outbound flow and byte counters.)

Figure 3.8: View file format

The view format is optimized for fast lookups at the expense of updates. To update a view, a new shadow copy is created and then it is atomically renamed to replace the old copy. The view files are usually small, e.g. 10 MB for 200K keys, so doing updates via shadow copy should be cheap.
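The update itself follows the standard write-then-rename idiom, sketched below; the actual serialization of the view contents is omitted, and a production version would also fsync the shadow copy before renaming it.

#include <cstdio>
#include <fstream>
#include <string>

// Atomically replace a view file: write the new contents to a shadow
// copy and rename it over the old file. rename(2) is atomic on POSIX
// filesystems, so readers see either the old or the new view, never a mix.
bool ReplaceView(const std::string& path, const std::string& contents) {
  const std::string tmp = path + ".tmp";
  std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
  out << contents;
  out.close();
  if (!out) return false;
  return std::rename(tmp.c_str(), path.c_str()) == 0;
}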

3.4. Filtering

3.4.1. Language

The classical UNIX tcpdump tool provides a user-friendly language based on Berkeley Packet Filter (BPF)[17] to filter a stream of packets. The filtering language is based on boolean expressions that operate on protocol fields, e.g. IP addresses, TCP/UDP port numbers, protocol type, etc. A filtering expression is evaluated for each packet to determine whether it should pass the filter. An example BPF filter expression is shown below:

dst net 192.168.2.0/24 and dst port 22

The qflow system includes a filtering language inspired by BPF, but instead of operating on packets and protocol fields, it operates on flows and flow record fields. The language supports all the fields present in our flow protobuf.

The same expression would be expressed in the qflow filtering language as:


dst_ip in 192.168.2.0/24 and dst_port == 22

The language supports three field types: integer, string, and IP address. For all types, it provides the usual comparison operators, including <, <=, >, >=, ==, and !=. Additionally, it provides several ways of matching IP addresses:

1. Match a single IP:

dst_ip == 192.168.10.20
dst_ip < 192.168.10.20
dst_ip > 192.168.10.20

2. Match a CIDR range:

dst_ip in 192.168.2.0/24

3. Match a list of CIDR ranges:

dst_ip in {192.168.2.0/24, 10.15.10.0/24}

4. Match a list of CIDR ranges from a file:

dst_ip in "/flow/networks/local.txt"

5. Match a list of CIDR ranges from a predefined prefix group within the environment:

dst_ip in isroutes

The full grammar of the filtering language is described in Appendix C.

3.4.2. Implementation

The filtering language is provided as a library that can be linked into different components of the overall system, e.g. collector, capture, and query tools. The library provides a class that takes a filter expression as a string and returns a compiled filter expression:


FilterBuilder builder;
FilterExpression *e = builder.Build(
    "dst_ip in 192.168.2.0/24 and dst_port == 22");

The compiled expression can then be evaluated in the context of a flow protobuf to determine if it matches or not:

if (e->Matches(&proto)) {
  // ...
}

Internally, the filter library builds a parse tree which is evaluated for each flow. Figure 3.9 shows the parse tree for the example expression used earlier.

(Diagram: the root of the tree is an AND node; its left child is the test "dst_ip in 192.168.2.0/24" and its right child is the test "dst_port == 22".)

Figure 3.9: Parse tree for example filter expression

The open-source flex tool is used to tokenize the filter expression and then the parse tree is constructed using a custom LL(1) parser.
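The parse tree can be represented by a small class hierarchy, as in the illustrative sketch below; the real qflow FilterExpression classes are not reproduced here.

#include <cstdint>
#include <memory>

// Illustrative parse-tree nodes.
struct Node {
  virtual ~Node() = default;
  virtual bool Matches(const qflow::netflow& flow) const = 0;
};

struct AndNode : Node {
  std::unique_ptr<Node> left, right;
  bool Matches(const qflow::netflow& flow) const override {
    return left->Matches(flow) && right->Matches(flow);
  }
};

struct DstPortEquals : Node {
  uint32_t port;
  explicit DstPortEquals(uint32_t p) : port(p) {}
  bool Matches(const qflow::netflow& flow) const override {
    return flow.dst_port() == port;
  }
};

// A CIDR test node ("dst_ip in 192.168.2.0/24") would similarly parse the
// address field of the flow and compare it against the prefix.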

3.5. Reports

This section demonstrates how common queries can be answered using the qflow command line tools. In the examples, the command is usually shown on the first line, prefixed by a shell prompt $, followed by the command output.


3.5.1. Flow extraction

Customer flows can be extracted by using the qflow-extract program.

For example, the following command will extract all flows in the erlent flow table during 2014-08-21 with a customer IP within the range 10.20.30.0/24:

$ qflow-extract erlent date:2014-08-21 -s 10.20.30.0/24

The extracted flows are written to standard output, where they can be piped into another program for further processing or redirected to a file.

3.5.2. Flow summary

Flows can be summarized by using the qflow-sum program.

It reads flows from standard input and writes a summary table to standard output. The summary key can be given as an option. For example, the following command will produce a summary for every input/output interface pair:

$ qflow-sum -k input_if,output_if < flows
input_if  output_if  flows-in  bytes-in    flows-out  bytes-out
339       128        42450     3487183741  0          0
...

The summary key can be any field present in the flow protobuf.

3.5.3. Flow filter

Flows can be filtered by using the qflow-filter program.

It takes a filter expression as an argument, reads flows from standard input and writes filtered flows to standard output. For example:

$ qflow-filter 'protocol == 16 && customer_ip in 192.168.20.0/24'


3.5.4. Time-based reports

Traffic volume over time can be queried by using the qflow-aggr program.

To get a list of overall traffic per month for the erlent flow table:

$ qflow-aggr erlent month
         Size      Compressed  Flows-in  Bytes-in  Flows-out  Bytes-out
2014-01  208.9 GB  81.7 GB     1.6 G     383.7 TB  1.5 G      16.9 TB
2014-02  178.7 GB  71.0 GB     1.4 G     393.8 TB  1.7 G      14.0 TB
...

The month argument in the command selects that particular view type.

We can get the same report but in the scope of a given customer IP or network by using the -s option:

$ qflow-aggr erlent month -s 10.20.30.0/24
         Size      Compressed  Flows-in  Bytes-in  Flows-out  Bytes-out
2014-01  307.4 MB  114.6 MB    1.8 G     329.6 GB  1.9 M      405.5 GB
2014-02  207.8 MB  80.0 MB     1.5 G     401.9 GB  1.0 M      292.4 GB
...

We can also filter by a given month by appending a selector to the view:

$ qflow-aggr erlent month:2014-02 -s 10.20.30.0/24
         Size      Compressed  Flows-in  Bytes-in  Flows-out  Bytes-out
2014-02  207.8 MB  80.0 MB     1.5 G     401.9 GB  1.0 M      292.4 GB

The selector fetches views that begin with the given string. For example, we could get the same overview for every day in 2014-02 by using the following command:

$ qflow-aggr erlent date:2014-02 -s 10.20.30.0/24
            Size    Compressed  Flows-in  Bytes-in  Flows-out  Bytes-out
2014-02-01  9.1 MB  3.6 MB      55.1 K    29.1 GB   55.0 K     16.3 GB
...

There is also a special view for the tablets. For example:


$ qflow-aggr erlent tablet:2014-02 -s 10.20.30.0/24
                   Size     Compressed  Flows-in  Bytes-in  Flows-out  ..
2014-02-01.000000  40.1 KB  17.5 KB     215       129.6 MB  257        ..
...

3.5.5. Customer reports

Customer traffic volume can be queried by using the qflow-customer program.

For example:

$ qflow-customer erlent month:2014-08
               Size      Compressed  Flows-in  Bytes-in  ..  Tag
10.20.30.1/32  636.4 KB  314.7 KB    7.5 K     6.9 GB    ..  abcd
...

The command supports several options. The -s option filters by customer IP. The -t 1G option excludes any customer entry with less than 1G of inbound bytes. The -O option excludes entries without a tag. Customer traffic is usually tagged using some kind of identifier that makes sense to the billing system, so an entry without a tag cannot be billed. The -m option enables a machine-friendly output format.

A monthly billing run would typically use a command like this:

$ qflow-customer erlent month:2014-05 -m -t 1G -O
#ip,size,csize,flows-in,bytes-in,flows-out,bytes-out,tag
10.101.32.2/21,1129521,603002,12983,6018147900,0,0,customer1
10.101.32.3/21,9012852,3703883,103596,33392658000,0,0,customer2
10.101.32.4/21,573417,358298,6591,2676368400,0,0,customer1
10.101.32.5/21,328773,164965,3779,3104398500,0,0,customer3
10.101.32.6/21,4263348,1771798,49004,21885276300,0,0,customer4

Conversely, the -o option can be used to find IPs without a tag, i.e. orphaned entries that need to be fixed.


4. Evaluation

This chapter presents experimental results that demonstrate how the system performs and behaves under various workloads.

4.1. Environment

All experiments were performed on a dedicated 64-bit x86 Ubuntu 12.04 server running Linux kernel version 3.5.0. The hardware specifications are shown below:

1. Architecture: Sandybridge

2. CPUs: 2x Intel Xeon CPUs @ 2.0GHz, each with 6 hyperthreaded cores.

3. Cache: 384 KB L1 cache, 1.5 MB L2 cache, 15 MB L3 cache.

4. RAM: 32 GB DDR3 SDRAM

5. Disk: 4x Western Digital WD4000FYYZ 4 TB disks in a software RAID-5 configuration

6. Network: Intel I350 Gigabit Ethernet

4.2. Collector

This section evaluates the maximum flow throughput of the collector.


4.2.1. Preparation

Configuration

We used a collector configuration that is typical for an ISP that purchases upstream international connectivity and needs to separate foreign/domestic traffic based on peer IP address.

The full configuration is shown below:

exporter { name: "replay" ip: "127.0.0.1" customers: "local" matcher: "innlent" tagger: "customers" } matcher { name: "innlent" pattern: "peer_ip in isroutes" backend_group: "innlent" backend_group_complement: "erlent" } tagger { name: "customers" path: "/flow/networks/tags.txt" } backend_group { name: "innlent" backend { name: "b1" host: "127.0.0.1:9100" } } backend_group { name: "erlent" backend { name: "b2" host: "127.0.0.1:9200" } } prefix_group {

46 4.2. Collector

name: "local" path: "/flow/networks/local.txt" } prefix_group { name: "isroutes" path: "/flow/networks/is-net.txt" }

To simulate a real workload, the tags and prefix groups were populated with real data from a medium sized ISP in Iceland.

1. is-net.txt contains 78 prefixes

2. local.txt contains 36 prefixes

3. tags.txt contains 2230 prefixes

Instrumentation

The collector was instrumented to output a statistic line every 30 seconds containing three fields:

1. The current timestamp (seconds since epoch)

2. The total number of flows processed

3. The total number of CPU seconds used for the process (user+kernel time)

The CPU time for the process was retrieved using the getrusage system call. For example:

struct rusage ru;
if (getrusage(RUSAGE_SELF, &ru) != 0)
  return;

double cpusec = static_cast<double>(ru.ru_utime.tv_sec) +
                static_cast<double>(ru.ru_utime.tv_usec) * 1e-6 +
                static_cast<double>(ru.ru_stime.tv_sec) +
                static_cast<double>(ru.ru_stime.tv_usec) * 1e-6;


The instrumentation allows us to calculate the flow throughput per core – the number of flows processed per CPU second consumed.

Traffic generator

To model a realistic workload, we captured around 3 million NetFlow v5 and v9 packets being exported over UDP from a router handling real user traffic. The NetFlow v9 export contained a single template with the same fields as NetFlow v5.

The packets were captured using tcpdump and written to separate pcap files, e.g.

tcpdump -n -s 2000 -w v5.pcap port 8888
tcpdump -n -s 2000 -w v9.pcap port 8889

We wrote a custom replay tool that reads a pcap file containing NetFlow packets and sends them as fast as possible to a target collector, e.g.

./replay -f ./v9.pcap -t 127.0.0.1:8888

4.2.2. Results

We ran a separate experiment for each NetFlow version. The steps are shown below:

1. Fake backends: start two netcat processes with output redirected to /dev/null.

2. Start collector, wait until it has connected to both backends and is ready to start processing flows.

3. Start the traffic generator using either the NetFlow v5 or the v9 pcap input.

4. Wait for 15 minutes to collect 30 statistic samples from the collector.


Time elapsed (s)  Δ Flows   Δ CPU seconds  Flows per CPU second
30                1762917    9.02456       195347
60                5785206   29.9539        193137
90                5784757   29.9539        193122
120               5784986   29.9499        193156
150               5784833   29.9659        193047
180               5784921   29.9659        193050
210               5784539   29.9579        193089
240               5784640   29.9499        193144
270               5785286   29.9539        193140
300               5793788   29.9739        193295
330               5784309   29.9419        193185
360               5785144   29.9459        193187
390               5784848   29.9499        193151
420               5784557   29.9539        193116
450               5785104   29.9539        193134
480               5784984   29.9579        193104
510               5785345   29.9499        193168
540               5785287   29.9579        193114
570               5785292   29.9499        193166
600               5784882   29.9419        193204
630               5785101   29.9499        193159
660               5785093   29.9499        193159
690               5785399   29.9579        193118
720               5784550   29.9499        193141
750               5784740   29.9499        193147
780               5785150   29.9499        193161
810               5784680   29.9539        193120
840               5784468   29.9539        193113
870               5794443   29.9699        193342
900               5785022   29.9579        193105

Table 4.1: Collector performance results for NetFlow v5


Time elapsed (s)  Δ Flows   Δ CPU seconds  Flows per CPU second
30                1238764    6.56441       188709
60                5571828   29.9099        186287
90                5553710   29.8499        186055
120               5562957   29.8779        186190
150               5561174   29.8739        186155
180               5543514   29.7459        186363
210               5549906   29.7299        186678
240               5601589   29.9859        186808
270               5569883   29.8579        186547
300               6876027   29.7579        231066
330               6953267   29.8939        232598
360               6584359   29.8819        220346
390               5570445   29.8539        186590
420               5571983   29.9099        186292
450               5584396   29.9339        186558
480               5581606   29.9099        186614
510               5561041   29.8539        186275
540               5561534   29.8739        186167
570               5559198   29.8779        186064
600               5543595   29.8979        185418
630               5582370   29.7499        187644
660               5584250   29.9339        186553
690               5561732   29.9139        185925
720               5564407   29.9379        185865
750               5532652   29.6379        186675
780               5543466   29.8099        185961
810               5551856   29.8659        185893
840               5560750   29.8819        186091
870               5561726   29.8979        186024
900               5419689   29.7419        182224

Table 4.2: Collector performance results for NetFlow v9

According to the results shown in Tables 4.1 and 4.2, the collector can handle roughly 180K flows per second. NetFlow v9 appears to be slightly more expensive than v5 in terms of CPU time, which is not surprising given the added complexity of parsing NetFlow v9.

Furthermore, the collector is clearly CPU bound and limited by the fact that it is single threaded. It consumes nearly 30 CPU seconds during every 30 second period. That is, it manages to fully utilize a single core but stays capped there. This suggests that the performance could be increased by making the collector multithreaded.

4.3. Indexer

This section evaluates the performance of the flow indexer.

We instrumented the indexer to output the same type of CPU statistics as the collector for every flow file that it processes.

We prepared four flow files of different sizes based on real traffic. The files were compressed using snappy.

1. 10M records, file size: 432 MB.

2. 20M records, file size: 864 MB.

3. 30M records, file size: 1.3 GB.

4. 40M records, file size: 1.7 GB.

For each file, we measured the indexer performance when using 1, 2, 4, 8 and 16 sorting threads. The results are shown in Tables 4.3, 4.4, 4.5, and 4.6.

N    Avg. time (s)  Avg. CPU seconds  Flows per second
1    148.1          147.5             67521
2     91.3          132.6             107411
4     62.0          123.1             161290
8     53.1          130.6             188323
16    48.1          142.4             207900

Table 4.3: Indexer performance for 10M records


N    Avg. time (s)  Avg. CPU seconds  Flows per second
1    279.4          278.4             71581
2    186.2          281.9             107296
4    144.1          283.1             138792
8    117.8          286.6             169779
16   107.9          304.3             185536

Table 4.4: Indexer performance for 20M records

N    Avg. time (s)  Avg. CPU seconds  Flows per second
1    430.6          429.4             69670
2    283.1          428.7             106007
4    210.5          430.8             142517
8    181.4          444.5             165380
16   156.5          459.4             191693

Table 4.5: Indexer performance for 30M records

N    Avg. time (s)  Avg. CPU seconds  Flows per second
1    551.5          549.639           72529
2    384.9          587.849           103923
4    281.2          576.233           142247
8    239.9          595.282           166736
16   214.8          628.884           186219

Table 4.6: Indexer performance for 40M records

The results show that the parallel mergesort algorithm makes a huge difference. Adding more threads considerably lowers the total time required to process a file. In all cases, using 16 threads takes less than half the time compared to using only one thread. But that also implies that there is considerable overhead and the results show diminishing returns for every thread that is added.

We implemented a basic version of parallel mergesort for use in qflow. It has not been profiled and optimized, so there is probably considerable room for improvement.


4.4. Flow storage

This section evaluates the storage efficiency of our system.

We imported production NetFlow v5 data collected over a single day into both qflow and flow-tools. The total number of imported flows was 638.6M. Using the formula from Section 2.3.4, the total size of the exported flows was at least 28.5 GB. Both systems were configured to use the same compression scheme: zlib with compression level 5.

                            qflow     flow-tools
Uncompressed size           79.3 GB   38.1 GB
Compressed size              9.2 GB    9.8 GB
Compression ratio            8.6:1     3.8:1
NetFlow compression ratio    3.1:1     2.9:1

Table 4.7: Storage efficiency of qflow vs. flow-tools

Table 4.7 shows the results. The NetFlow compression ratio shows the ratio with respect to the estimated size of the NetFlow v5 input data, i.e. the 28.5 GB figure.

The uncompressed size of qflow was roughly 2x the size of flow-tools. This can be explained by differences in the way the two systems encode flows for storage on disk.

First, flow-tools uses a fixed-length record format that closely mirrors the actual NetFlow v5 packet format. The uncompressed size should therefore be close to the actual NetFlow v5 input. The extra overhead (38.1 GB vs. 28.5 GB) is because the records are flattened out and contain extra information from the v5 header, such as exporter IP, router uptime, etc. The flattened records use 64 bytes, whereas NetFlow v5 uses 23 and 47 bytes for the header and record, respectively.

In contrast, qflow uses protobufs to encode the flows. Although protocol buffers employ techniques such as variable-length encoding of integers to save space, it is hard to beat a fixed binary format. Protocol buffers, however, bring other benefits that flow-tools lacks, such as extensibility and optional fields, both of which are required for a good NetFlow v9 implementation, which flow-tools does not support.

In addition, qflow stores IP addresses as strings within the protocol buffer. This probably accounts for a large percentage of the total uncompressed size, and also explains why qflow achieves such a good compression ratio. The rationale behind using strings for IPs was that it is simple and we can use the same fields for both IPv4 and IPv6 addresses. We also expected the strings to compress very well, which appears to be the case. However, there may be a case for using separate fixed-size integer fields for IP addresses to improve performance, since parsing IP addresses from strings is more expensive and it happens frequently for operations such as flow sorting and filtering.

Despite the large difference in uncompressed size, and the fact that the qflow figure also includes the indexes, qflow ends up with a slightly smaller compressed size. This might be explained by better compression due to data similarity in flow tablets, since qflow clusters all customer flows together and those flows are more likely to have similarities than two random flows. The customer IP is one example that compresses very well when the customer flows are clustered together.

4.5. Flow extraction

This section evaluates the flow extraction performance of our system and the flow-tools system.

4.5.1. Preparation

We imported production NetFlow v5 data collected over a single day at an Icelandic ISP into both qflow and flow-tools. The number of collected flows was roughly 630M. The flows were split into 96 files, each one containing a 15 minute slice of the day. The same flows were imported into both systems, and both systems were configured to use the same compression scheme (zlib with compression level 5).

We performed two extraction tests:

1. Extracting a single IP. We picked a single IP at random from all the customer IPs. The IP had 10260 flows that day, containing 6 GB of observed traffic.

2. Extracting a network (CIDR range). We picked a /24 network range at random. The range had roughly 1.6M flows that day, containing 185 GB of observed traffic.

Each extraction was repeated 10 times and the results averaged. To make sure that the systems were actually getting the data from disk, we cleared the disk/page cache before each extraction using the following command:


echo 1 > /proc/sys/vm/drop_caches

Extraction in flow-tools requires a filter definition. We used the following filter configuration:

filter-primitive customer-ip
  type ip-address-prefix
  permit aa.bb.cc.dd/32
  default deny

filter-primitive customer-network
  type ip-address-prefix
  permit ee.ff.gg.hh/24
  default deny

filter-definition extract-ip
  match ip-destination-address customer-ip
  or
  match ip-source-address customer-ip

filter-definition extract-network
  match ip-destination-address customer-network
  or
  match ip-source-address customer-network

The commands used for the extraction in flow-tools were:

flow-cat 2014-08-21 | flow-nfilter -F ./filter.conf -f extract-ip
flow-cat 2014-08-21 | flow-nfilter -F ./filter.conf -f extract-network

For qflow extraction, we used the following commands:

qflow-extract erlent date:2014-08-21 -s aa.bb.cc.dd/32
qflow-extract erlent date:2014-08-21 -s ee.ff.gg.hh/24


4.5.2. Results

Tables 4.8 and 4.9 show the timing results for a single IP and a network, respectively. The elapsed time for flow-tools was almost the same for both tests, even though we were extracting a good deal more flows in the second test. This is expected because flow-tools needs to sequentially scan all records to extract the customer flows. As a result, flow-tools needs to do the same amount of I/O in both tests and it is clearly I/O bound.

System       Average wall time (s)   Average CPU time (s)
flow-tools   386.5                   482.5
qflow          2.6                     0.04

Table 4.8: Flow extraction performance for a single IP

System       Average wall time (s)   Average CPU time (s)
flow-tools   387.3                   616.9
qflow          3.9                     0.1

Table 4.9: Flow extraction performance for a network

Extraction using qflow only took a few seconds and was two orders of magnitude faster than flow-tools. This is not surprising because qflow uses the indexes to locate the customer flows and then only reads those flows. As a result, the extraction time will be dominated by the disk seek time required to search the files. Using back-of-the-envelope calculations, we can estimate the worst-case time bound for qflow extraction and verify that the experiment was within the expected bound.

The index files contained an average of around 10K keys, so in the worst case, it will take log2(10000) ≈ 13 seeks to locate the customer within the index using binary search. The server has enterprise-grade drives, so we can assume a seek time of around 5 ms. This brings the total seek time for a single index file to 65 ms. We had 96 index files, which means 96 times 65 ms, a total of 6.24 seconds. According to hdparm benchmark results, the RAID array can sustain 400 MB/s sequential data reads. The size of the extracted flows was 230 KB in the first test and 29 MB in the second. This adds 0.9 microseconds to the first test, and 97 ms to the second test. Finally, at worst the customer had flows in every 15-minute file, which would require an extra seek per file to get to the records. This brings the total to 11.04 seconds and 11.13 seconds, respectively.


4.6. Materialized views

This section evaluates the performance of materialized views.

First, we created 96 tablet index files with 10K to 1M random keys and then measured the time it takes to update a materialized view of a given size using an index file of the same size. Each update was repeated 10 times and the result was averaged.

Secondly, we created 100 materialized views with 10K to 1M random keys and then measured the query and export (full dump) time as a function of the size. The query targets were picked randomly from the set of existing keys.

We flushed the disk/page cache before each test.

(Plot: view file size versus number of keys, from 0 to 1M keys; file sizes range up to roughly 70 MB.)

Figure 4.1: File size of materialized view

Figure 4.1 shows that the materialized view format is efficient in terms of disk space. It can handle 1M customer keys using only 60MB.


(Plot: update wall time in milliseconds versus number of keys, from 0 to 1M keys; update times range up to roughly 2.5 seconds.)

Figure 4.2: Update time for materialized view

Figure 4.2 shows that the shadow copy technique is quite fast. Even at 1M keys, updates only take a few seconds. Considering that the total number of Icelandic IPv4 addresses is around 800K [20], and that updates only occur when flow tablets are added, which is usually every 5-15 minutes, the update performance is far from being a problem.

(Plot: query wall time in milliseconds versus number of keys, from 0 to 1M keys; query times range up to roughly 35 ms.)

Figure 4.3: Query time for materialized view


Figure 4.3 shows that our materialized views provide interactive query performance. Even at 1M keys, queries can be answered well within 100 milliseconds. Also note that the query performance is based on a cold cache, whereas in reality it is likely that commonly queried views would already be in the page cache.

(Plot: export wall time in seconds versus number of keys, from 0 to 1M keys; export times range up to roughly 6 seconds.)

Figure 4.4: Export time for materialized view

Figure 4.4 shows the export performance. The export operation is mostly used to export data for billing, discover orphaned IPs, and locate customers that are exceeding a given threshold, e.g. their transfer allowance. As such, the export performance is not critical. Nonetheless, the export operation performs adequately and finishes in a few seconds.


5. Conclusions

5.1. Summary

The use of qflow for flow accounting provides major advantages when compared with flow systems based on relational databases or flat binary files. The system is engineered for high performance and scales to billions of records on a single machine, but it also supports distributing work over a set of machines to scale beyond that.

Furthermore, the system can store flows compactly in a compressed format without sacrificing the ability to extract customer flows quickly. Extraction from a billion flows can be completed within a few seconds. Aggregate queries about customer traffic volumes can be answered within a hundred milliseconds.

Finally, the system is highly configurable and supports different types of deployment scenarios for both small and large ISPs.

Availability: This work is distributed under an open source license and is available at: http://hhg.is/qflow/

5.2. Future work

In this section, we discuss several ideas for improving the qflow system which, due to time constraints, could not be addressed in this thesis.

1. Making the collector multithreaded. Currently, the collector can only utilize one core because it is single threaded. As we saw in the experimental results, the collector is CPU bound, which means that more threads are needed to increase the maximum capacity.

It would be trivial to parallelize NetFlow v5 parsing because the packets can be processed independently and in any order, but the NetFlow v9 parsing would be more tricky since the template management requires the packets to be processed in a particular order and the template state needs to be shared between threads.

2. Distributed query service. When running qflow in a distributed environment, the individual databases need to be queried separately and the results merged. It would be useful to have a query mixer that acts as a frontend to a set of backend databases.

3. Duplicate detection in collector. With UDP, it is possible to get duplicate packets. The collector should protect against this. It can be tricky to implement duplicate detection because packets can also arrive out of order. Currently, this feature is also missing from other flow accounting systems, such as flow-tools and flowd.

4. Alternative load balancing strategies in the collector. Currently, the collector only supports one strategy for balancing load between a set of backends: round-robin. It might be interesting to explore alternative strategies, such as sharding by consistently hashing flows.

5. Column-oriented storage of flows. Currently, our flow database offers fast access in one dimension – the customer IP address. It would be interesting to explore alternative storage backends that support fast queries in multiple dimensions. Using column-oriented databases with bitmap indexes in the field of flow monitoring has proven promising [6].

Bibliography

[1] Althingi. Electronic Communications Act no. 81/2003, 2003.

[2] Cisco. NetFlow Reliable Export with SCTP. http://cisco.com/c/en/us/td/docs/ios/netflow/configuration/guide/15_1s/nf_15_1s_book/nflow_export_sctp.pdf, 2006. [Online; accessed 21-Aug-2014].

[3] Cisco. NetFlow Export Datagram Format. http://www.cisco.com/en/US/docs/net_mgmt/netflow_collection_engine/3.6/user/guide/format.html, 2014. [Online; accessed 20-Aug-2014].

[4] B. Claise. RFC 3954: NetFlow Services Export Version 9 (2004). http://www.ietf.org/rfc/rfc3954.txt, 2014. [Online; accessed 20-Aug-2014].

[5] L. Deri. nprobe: an open source netflow probe for gigabit networks. In Pro- ceedings of Terena TNC, 2003.

[6] L. Deri, V. Lorenzetti, and S. Mortimer. Collection and exploration of large data monitoring sets using bitmap databases. In Proceedings of the Second International Conference on Traffic Monitoring and Analysis, TMA’10, pages 73–86, Berlin, Heidelberg, 2010. Springer-Verlag.

[7] N. Duffield. Sampling for passive internet measurement: A review. Statistical Science, pages 472–498, 2004.

[8] M. Fullmer and S. Romig. The OSU flow-tools package and Cisco NetFlow logs. In Proceedings of the 2000 USENIX LISA Conference, 2000.

[9] F. Fusco, M. P. Stoecklin, and M. Vlachos. Net-fli: on-the-fly compression, archiving and indexing of streaming network traffic. Proceedings of the VLDB Endowment, 3(1-2):1382–1393, 2010.

[10] J.-l. Gailly and M. Adler. Zlib compression library, 2004.

[11] C. Gates, M. P. Collins, M. Duggan, A. Kompanek, and M. Thomas. More netflow tools for performance and security. In LISA, volume 4, pages 121–132, 2004.


[12] Google. Protocol Buffers: Google's Data Interchange Format. http://code.google.com/p/protobuf, 2014. [Online; accessed 19-Aug-2014].

[13] J.-l. Gailly, G. Roelofs, and M. Adler. zlib: Technical Details. http://www.zlib.net/zlib_tech.html, 2014. [Online; accessed 22-Aug-2014].

[14] S. Gunderson. Snappy. http://code.google.com/p/snappy, 2014. [Online; accessed 15-Aug-2014].

[15] P. Haag. Nfdump. Available from World Wide Web: http://nfdump.sourceforge.net, 2010.

[16] InMon Corporation. sFlow accuracy and billing. http://inmon.com/pdf/sFlowBilling.pdf, 2004. [Online; accessed 21-Aug-2014].

[17] S. McCanne and V. Jacobson. The BSD packet filter: A new architecture for user-level packet capture. In Proceedings of the USENIX Winter 1993 Conference, pages 2–2. USENIX Association, 1993.

[18] Ministry of the Interior. Regulation no. 526/2011 on electronic telecommuni- cations billing, 2011.

[19] F. Reiss, K. Stockinger, K. Wu, A. Shoshani, and J. M. Hellerstein. Enabling real-time querying of live and historical stream data. In Scientific and Statistical Database Management, 2007. SSDBM'07. 19th International Conference on, pages 28–28. IEEE, 2007.

[20] Reykjavik Internet Exchange. List of Icelandic IPv4 prefixes. http://www.rix.is/is-net.txt, 2014. [Online; accessed 21-Aug-2014].

[21] R. Sinha, C. Papadopoulos, and J. Heidemann. Internet packet size distributions: Some observations. Technical Report ISI-TR-2007-643, USC/Information Sciences Institute, May 2007.

A. Flow protobuf

message DataFlowset { repeated netflow flow = 1; }

message netflow {
  // Number of incoming bytes
  optional uint64 bytes = 1;

  // Number of incoming packets
  optional uint64 packets = 2;

  // IP protocol
  optional uint32 protocol = 3;

  // Type of Service setting when entering incoming interface
  optional uint32 src_tos = 4;

  // Cumulative OR of all the TCP flags seen for this flow
  optional uint32 tcp_flags = 5;

  // Source IP address
  optional string src_ip = 6;

  // Source address subnet mask (slash notation)
  optional uint32 src_mask = 7;

  // Destination IP address
  optional string dst_ip = 8;

  // Destination address subnet mask (slash notation)
  optional uint32 dst_mask = 9;


  // TCP/UDP source port number
  optional uint32 src_port = 10;

  // TCP/UDP destination port number
  optional uint32 dst_port = 11;

  // Input interface index
  optional uint32 input_if = 12;

  // Output interface index
  optional uint32 output_if = 13;

  // IP address of next-hop router
  optional string nexthop_ip = 14;

  // Source BGP autonomous system number
  optional uint32 src_as = 15;

  // Destination BGP autonomous system number
  optional uint32 dst_as = 16;

  // Timestamp of first packet in flow
  optional uint32 first_switched = 17;

  // Timestamp of last packet in flow
  optional uint32 last_switched = 18;

  // Type of Service byte setting when exiting outgoing interface
  optional uint32 dst_tos = 19;

  enum IpVersion {
    IPV4 = 0;
    IPV6 = 1;
  }

  optional IpVersion ip_version = 20 [default = IPV4];

  enum Direction {
    INGRESS = 0;
    EGRESS = 1;
  }

  // Flow direction as indicated in flow export. This field is only
  // available in Netflow v9 and is based on interface
  // ingress/egress exporting.
  optional Direction direction = 21 [default = INGRESS];

  // The rate at which packets are sampled, i.e. a value of 100
  // indicates that one of every 100 packets is sampled. If this
  // field is set, then the flow counters (bytes, packets) have
  // been scaled up accordingly.
  optional uint32 sampling_interval = 22;

  // Customer IP address.
  optional string customer_ip = 23;
  // Customer routing mask (taken from the src/dst mask).
  optional uint32 customer_mask = 24;
  // Flow direction from the customer point of view.
  optional Direction customer_direction = 25 [default = INGRESS];
  // Customer tag, arbitrary string identifier.
  optional string customer_tag = 26;

  // Exporter IP address.
  optional string exporter_ip = 27;
  // Exporter source port.
  optional int32 exporter_port = 28;
}


B. Collector configuration protobuf

message Backend {
  // Name of backend.
  optional string name = 1;
  // Backend specification in the form of "ip:port".
  optional string host = 2;
};

message BackendGroup {
  // Name of backend group.
  optional string name = 1;
  repeated Backend backend = 2;
};

message Matcher {
  // Name of matcher.
  optional string name = 1;
  // Name of backend group for flows that match the pattern.
  optional string backend_group = 2;
  // Name of backend group for flows that don't match the pattern.
  optional string backend_group_complement = 3;
  // Match expression. Uses the qflow filter language.
  // The expression can also reference prefix groups defined in
  // this config, e.g. "peer_ip in isroutes".
  optional string pattern = 4;
};

message Tagger {
  // Name of tagger.
  optional string name = 1;
  // Path to a file containing a list of tag definitions, one line
  // per tag. The line format is: "prefix|tag".
  optional string path = 2;
};

message PrefixGroup {
  // Name of prefix group.
  optional string name = 1;
  // List of prefixes.
  repeated string prefix = 2;
  // Import prefixes from file.
  repeated string path = 3;
};

message Exporter {
  // Exporter name.
  optional string name = 1;
  // Exporter IP address.
  optional string ip = 2;
  // List of taggers to run.
  repeated string tagger = 3;
  // Name of a prefix group containing customer prefixes.
  optional string customers = 4;
  // List of matchers to run.
  repeated string matcher = 5;
};

message FlowConfig {
  repeated Exporter exporter = 1;
  repeated Matcher matcher = 2;
  repeated BackendGroup backend_group = 3;
  repeated Tagger tagger = 4;
  repeated PrefixGroup prefix_group = 5;
};

C. Grammar for the filter language

<filter-expression> ::= <and-expression> { "||" <and-expression> }

<and-expression>    ::= <term> { "&&" <term> }

<term>              ::= "true"
                      | "false"
                      | "not" <term>
                      | "inbound"
                      | "outbound"
                      | "(" <filter-expression> ")"
                      | <field-id> <operation> <value>

<field-id>          ::= "src_ip" | "dst_ip" | "src_port" | "dst_port" | ...

<operation>         ::= "==" | "!=" | "<" | ">" | "<=" | ">=" | "in"

<value>             ::= <int> | <string> | <ip-address> | <cidr>
                      | <cidr-list> | <name>

<cidr-list>         ::= "{" <cidr> { "," <cidr> } "}"
