Cloud Kafka

What's CKafka?

Product Introduction

What's CKafka? Product Introduction

Copyright Notice

©2013-2017 Tencent Cloud. All rights reserved.

Copyright in this document is exclusively owned by Tencent Cloud. You must not reproduce, modify, copy or distribute in any way, in whole or in part, the contents of this document without Tencent Cloud's prior written consent.

Trademark Notice

All trademarks associated with Tencent Cloud and its services are owned by Tencent Cloud Computing (Beijing) Company Limited and its affiliated companies. Trademarks of third parties referred to in this document are owned by their respective proprietors.

Service Statement

This document is intended to provide users with general information about Tencent Cloud's products and services only and does not form part of Tencent Cloud's terms and conditions. Tencent Cloud's products or services are subject to change. Specific products and services and the standards applicable to them are exclusively provided for in Tencent Cloud's applicable terms and conditions.



Contents

Documentation Legal Notice
What's CKafka?
    Product Overview
    Technical Principle
    Product Comparison
    SLA



Product Overview

Cloud Kafka is a high-throughput, highly scalable message service provided by Tencent and based on the self-developed CMQ engine. Compatible with Apache Kafka 0.9, Cloud Kafka offers advantages in performance, scalability, business security, and OPS, giving you powerful features at low cost while eliminating tedious OPS work.

Application Scenarios

1. Webpage tracking
Cloud Kafka processes website activities (PV, search, and other user activities) in real time and posts them to topics by type. These data streams can be used for real-time monitoring or offline statistical analysis. Because each page view generates a large amount of activity information, website activity tracking requires high throughput. Cloud Kafka meets the requirements of both high throughput and offline processing.

2. Log aggregation
Cloud Kafka offers low-latency processing, easy support for multiple data sources, and distributed data processing (consumption). Compared with a centralized log aggregation system, Cloud Kafka provides stronger persistence guarantees and lower end-to-end latency at the same performance. These features make Cloud Kafka an ideal "log collection center". Multiple servers or applications can send operation logs to a Cloud Kafka cluster asynchronously and in batches, without storing them locally or in a database. Cloud Kafka submits and compresses messages in batches, so producers barely notice the performance overhead. Consumers can then use storage and analysis systems such as Hadoop to collect and analyze the pulled logs.

3. Big Data Scenario
Some big data business scenarios require processing and aggregating a large amount of concurrent data, which demands high cluster processing performance and high scalability. Cloud Kafka's implementation, including its data distribution mechanism, allocation of disk storage space, handling of message formats, server selection, and data compression, also makes it suitable for handling massive real-time messages and aggregating distributed application data, which simplifies system OPS.

In a specific big data scenario, Cloud Kafka handles both offline data and streaming data well and makes data aggregation and analysis easy.

Advantages

1. Decoupling
Producers and consumers are effectively decoupled. As long as both sides honor the same API constraints, the processing on either side can be scaled or modified independently.

2. Scalability
Because message processing is decoupled, scaling out horizontally is enough to increase the enqueue rate and processing rate of messages, which is very flexible.

3. Peak Load Shifting
The message queue absorbs sudden access pressure, so the system does not crash completely under a burst of overload requests, which effectively improves system robustness.

4. Resiliency
When some system components fail, the overall system is not affected, which increases fault tolerance. If the process handling a message fails, the messages in the queue can still be processed after the system recovers.

5. Sequential Read/Write
Cloud Kafka guarantees message ordering within a Partition, which is consistent with most message queues. In addition, Cloud Kafka reads and writes data on disk sequentially, which greatly improves disk efficiency.

6. Asynchronous Communication
When the business does not need to process messages immediately, Cloud Kafka provides an asynchronous message processing mechanism. Under heavy traffic, messages are simply put into the queue and processed once the traffic drops, which relieves system pressure.
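
The peak load shifting and asynchronous communication points above can be illustrated with a small simulation. This is only a sketch that uses an in-memory queue rather than Cloud Kafka itself; the function name `simulate_backlog`, the arrival counts, and the per-tick capacity are assumptions made up for the example.

```python
from collections import deque


def simulate_backlog(arrivals_per_tick, capacity_per_tick):
    """Return the queue length after each tick.

    arrivals_per_tick: messages produced in each tick (hypothetical load).
    capacity_per_tick: how many messages the consumer can process per tick.
    """
    queue = deque()
    backlog = []
    next_id = 0
    for arrivals in arrivals_per_tick:
        # Producers only enqueue; they never wait for the consumer.
        for _ in range(arrivals):
            queue.append(next_id)
            next_id += 1
        # The consumer drains the queue at its own fixed pace.
        for _ in range(min(capacity_per_tick, len(queue))):
            queue.popleft()
        backlog.append(len(queue))
    return backlog


# A burst of 10 messages per tick, followed by quiet ticks.
print(simulate_backlog([10, 10, 1, 1, 1, 1], capacity_per_tick=4))
# prints [6, 12, 9, 6, 3, 0]
```

The backlog grows during the burst and drains afterwards, while the consumer never handles more than its fixed capacity per tick.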

Glossary

| No. | Name | Explanation |
| --- | --- | --- |
| 1 | Broker | A server in the Cloud Kafka cluster |
| 2 | Topic | The message type; Cloud Kafka is message-oriented |
| 3 | Partition | A physical partitioning concept. One Topic can contain one or more Partitions, and Cloud Kafka uses the Partition as its allocation unit |
| 4 | Replica | A copy of a Partition, used to guarantee the high availability of the Partition |
| 5 | Offset | The unique serial number of a message within a Partition |
| 6 | Producer | The producer, responsible for publishing messages |
| 7 | Consumer | The consumer, which consumes messages from the cluster |
| 8 | Consumer group | A group of consumers. Each consumer must belong to one consumer group. Each message can be consumed by multiple consumer groups, but by only one consumer within each group |
| 9 | Zookeeper | Stores the cluster metadata and handles leader election, fault tolerance, etc. |


Technical Principle

How It Works

The architecture of Cloud Kafka is as follows:

A typical Cloud Kafka cluster is shown above. Producers may publish messages generated by web activities, service logs, or other sources. Producers post messages to Cloud Kafka's Broker cluster in push mode, and consumers consume messages from the Brokers in pull mode. Consumers are divided into a number of Consumer Groups. In addition, the cluster manages its configuration through Zookeeper, which also handles leader election, fault tolerance, and so on.
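
The push and pull sides of this flow can be sketched with an in-memory stand-in for a single partition. This is an illustration under stated assumptions, not the Cloud Kafka client API: the class `PartitionLog` and its methods `append` and `fetch` are hypothetical names.

```python
class PartitionLog:
    """In-memory stand-in for one partition on a Broker (illustration only)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Producer side (push): store the message and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def fetch(self, offset, max_messages=10):
        """Consumer side (pull): read messages starting at the given offset."""
        return self._messages[offset:offset + max_messages]


log = PartitionLog()
for event in ["page_view", "search", "click"]:
    log.append(event)

# Each consumer tracks its own offset and pulls when it is ready.
offset = 0
batch = log.fetch(offset, max_messages=2)
offset += len(batch)
print(batch, offset)  # ['page_view', 'search'] 2
```

Because the consumer decides when to call `fetch` and from which offset, a slow consumer simply falls behind instead of being overwhelmed by pushed data.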

Implementation of High Throughput

In Cloud Kafka, a huge amount of network data is persisted to disk, and disk files are sent over the network. The performance of these processes directly affects Kafka's overall throughput, which is improved mainly in the following ways:

1. Efficient Use of Disks
   - Sequential reading and writing of data on disks improves disk utilization.
   - Write message: The message is written to the page cache and flushed to disk by an asynchronous thread.
   - Read message: The message is transferred directly from the page cache to the socket and sent out. If the data is not found in the page cache, disk I/O occurs: the messages are loaded from disk into the page cache and then sent directly through the socket.
2. Zero Copy Mechanism of Broker
   - The sendfile system call sends data directly from the page cache to the network.
3. Reduced Network Overhead
   - Data compression reduces the network load.
   - Batch processing mechanism: the Producer writes data to the Broker in batches, and the Consumer pulls data from the Broker in batches.
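
To make the "Reduced Network Overhead" item concrete, the sketch below compares compressing messages one at a time with compressing them as a single batch. It is a rough illustration only: `zlib` from the Python standard library stands in for whichever codec is actually configured, and the sample log records and function names are invented for the example.

```python
import json
import zlib


def compressed_size_individually(messages):
    """Total bytes if every message is compressed and sent on its own."""
    return sum(len(zlib.compress(m)) for m in messages)


def compressed_size_batched(messages):
    """Bytes if the messages are joined into one batch and compressed once."""
    return len(zlib.compress(b"\n".join(messages)))


# Hypothetical log records that share most of their structure.
messages = [
    json.dumps({"host": "web-01", "level": "INFO", "msg": "request ok", "seq": i}).encode()
    for i in range(200)
]

raw = sum(len(m) for m in messages)
single = compressed_size_individually(messages)
batched = compressed_size_batched(messages)
print(raw, single, batched)
```

Records in a batch repeat the same field names and values, so compressing the batch as a whole removes redundancy that per-message compression cannot see.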

Data Persistence

The data persistence of Cloud Kafka is mainly achieved through the following principles:

1. Storage Distribution of Partitions in Topic
In the file storage of Cloud Kafka, a topic has multiple partitions, and each partition physically corresponds to a folder that stores the partition's messages and index files. For example, if you create two topics, topic1 with 5 partitions and topic2 with 10 partitions, a total of 15 folders are generated across the cluster.

2. File Storage Method in Partition
A partition is physically composed of multiple segments of equal size. These segments are read and written sequentially and can be deleted quickly upon expiration, which improves disk utilization.
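
The folder count in the example and the idea of locating a message inside a segmented partition can be sketched as follows. The `<topic>-<partition>` folder naming follows Apache Kafka and is an assumption for Cloud Kafka; the helper names and the segment base offsets are hypothetical.

```python
from bisect import bisect_right


def partition_dirs(topics):
    """List one folder name per partition, e.g. 'topic1-0'.

    topics: mapping of topic name -> partition count.
    The '<topic>-<partition>' naming follows Apache Kafka and is an
    assumption for Cloud Kafka.
    """
    return [f"{name}-{p}" for name, count in topics.items() for p in range(count)]


def segment_for_offset(base_offsets, offset):
    """Return the base offset of the segment that holds the given offset.

    base_offsets: sorted first offsets of each segment in one partition.
    """
    index = bisect_right(base_offsets, offset) - 1
    if index < 0:
        raise ValueError("offset is older than the first retained segment")
    return base_offsets[index]


dirs = partition_dirs({"topic1": 5, "topic2": 10})
print(len(dirs))  # 15 folders, matching the example in the text

# Hypothetical segments starting at offsets 0, 1000 and 2000.
print(segment_for_offset([0, 1000, 2000], 1500))  # 1000
```

Since expired data is dropped a whole segment at a time, deletion only removes the oldest files and never rewrites the rest of the partition.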

Scale Out

One Topic can be divided into multiple Partitions and distributed in one or more Brokers.

One consumer can subscribe to one or more of these Partitions. The Producer is responsible for distributing messages evenly across the Partitions. Messages within a Partition are ordered.
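
One way a producer could spread messages across Partitions is sketched below. This is not the actual Cloud Kafka partitioner: the class `SimplePartitioner` is hypothetical, and CRC32 is used as a stand-in hash for keyed messages.

```python
import zlib
from itertools import count


class SimplePartitioner:
    """Pick a partition for each message (illustration only)."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._counter = count()

    def partition(self, key=None):
        if key is None:
            # No key: rotate through the partitions to spread load evenly.
            return next(self._counter) % self.num_partitions
        # Keyed: the same key always maps to the same partition, which
        # keeps that key's messages in order. CRC32 is a stand-in hash.
        return zlib.crc32(key) % self.num_partitions


partitioner = SimplePartitioner(num_partitions=3)
print([partitioner.partition() for _ in range(6)])  # [0, 1, 2, 0, 1, 2]
print(partitioner.partition(b"user-42") == partitioner.partition(b"user-42"))  # True
```

Adding Partitions (and Brokers to host them) raises the number of slots the producer can rotate through, which is what lets a Topic scale out.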


Consumer Group Design

Cloud Kafka does not delete consumed messages. Every consumer must belong to a group. Consumers in the same Consumer Group do not consume the same partition at the same time, while different groups can consume the same message at the same time. This supports diverse consumption modes (queue mode and publish/subscribe mode).
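
The two consumption modes can be sketched with a simple partition assignment. The round-robin strategy and the function name `assign_partitions` are assumptions chosen for illustration; the source does not specify which assignment strategy Cloud Kafka uses.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer of a group (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Queue mode: consumers in one group split the partitions between them.
group_a = assign_partitions(partitions, ["a1", "a2"])

# Publish/subscribe mode: a second group independently sees every partition.
group_b = assign_partitions(partitions, ["b1"])

print(group_a)  # {'a1': [0, 2], 'a2': [1, 3]}
print(group_b)  # {'b1': [0, 1, 2, 3]}
```

Within a group no partition appears twice, while each group as a whole covers all partitions, so every group receives every message.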

Multiple Copies

Why is a multi-copy design required? To enhance system availability and reliability. Replicas are evenly distributed throughout the cluster. The Replica assignment algorithm is as follows:

1. Sort all Brokers (assuming a total of n Brokers) and the Partitions to be assigned.
2. Assign the i-th Partition to the (i mod n)-th Broker.
3. Assign the j-th Replica of the i-th Partition to the ((i + j) mod n)-th Broker.
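
The three steps above can be sketched as a short function. Treating the Partition's first copy as Replica j = 0 is an interpretation of the steps, and the function name `assign_replicas` is hypothetical.

```python
def assign_replicas(num_brokers, num_partitions, replication_factor):
    """Map each partition to the list of Brokers that hold its Replicas.

    Brokers and Partitions are identified by their index after sorting.
    Replica j of Partition i goes to Broker (i + j) mod n; j = 0 is the
    first Replica, which lands on Broker i mod n.
    """
    if replication_factor > num_brokers:
        raise ValueError("replication factor cannot exceed the number of Brokers")
    return {
        i: [(i + j) % num_brokers for j in range(replication_factor)]
        for i in range(num_partitions)
    }


# 3 Brokers, 4 Partitions, 2 Replicas per Partition.
for partition, brokers in assign_replicas(3, 4, 2).items():
    print(partition, brokers)
# 0 [0, 1]
# 1 [1, 2]
# 2 [2, 0]
# 3 [0, 1]
```

Shifting each Replica by one Broker keeps the copies of a Partition on different Brokers, so a single Broker failure never removes all copies.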

Leader Election Mechanism

Cloud Kafka dynamically maintains an ISR (in-sync replicas) set in Zookeeper. All Replicas in the ISR are caught up with the Leader.

Only a member in the ISR may be chosen as the Leader.

With f+1 Replicas in the ISR, a Partition can tolerate the failure of f Replicas without losing committed messages. By comparison, a majority-vote scheme requires a total of 2f+1 Replicas (including the Leader and Followers): f+1 Replicas must have copied a message before it is committed, and to ensure that a new Leader is correctly selected, the number of failed Replicas cannot exceed f.
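
The rule that only ISR members may become Leader, and that f+1 in-sync Replicas tolerate f failures, can be sketched as below. Picking the first surviving ISR member is a simplifying assumption, and `elect_leader` is a hypothetical helper rather than a Cloud Kafka API.

```python
def elect_leader(isr, failed):
    """Return the first surviving ISR member, or None if none survives.

    isr: ordered list of Replica ids currently in sync with the Leader.
    failed: set of Replica ids that are down.
    """
    for replica in isr:
        if replica not in failed:
            return replica
    return None


isr = [1, 2, 3]  # f + 1 = 3 in-sync Replicas, so f = 2

print(elect_leader(isr, failed={1, 2}))     # 3: f failures, Replica 3 takes over
print(elect_leader(isr, failed={1, 2, 3}))  # None: f + 1 failures, no Leader
```

Because every ISR member already holds all committed messages, any survivor can take over without losing committed data.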


Product Comparison

The details of performance comparison between Cloud Kafka and other message service products are as follows:

| Features | CKafka | Apache Kafka | RabbitMQ | RocketMQ | CMQ |
| --- | --- | --- | --- | --- | --- |
| Advantages | Very high throughput. Very flexible scalability. Very low OPS cost | High throughput | High reliability | High reliability | Very high reliability. Finance and other scenarios with strong consistency |
| Disadvantages | Occasional message loss in extreme circumstances | Occasional message loss. Not flexible scalability. Multiple dependent components, large OPS. Limited security protection, poor isolation and compatibility | Poor performance. Not flexible expansion | Manual (not automatic) HA switching | Average throughput for strong consistency |
| Development language | C++ | Scala | Erlang | Java | C++ |
| Scalability | Very flexible and easy to scale. Only the vip address needs to be specified to send messages, and broker changes are transparent for both sending and receiving messages | Not flexible enough. The broker address needs to be specified to send messages, and zookeeper coordination scheduling is required to receive messages | Not flexible enough. The broker address needs to be specified to send messages | More flexible. The sender and receiver are connected to the name server | Flexible, smooth scale-out. Logically, a single Queue can provide services across multiple clusters |
| Throughput | Very large | Large | Average | Average | Average |
| General Performance | Million-level QPS | Million-level QPS | 100-thousand-level QPS | 100-thousand-level QPS | 100-thousand-level QPS |
| 2C 4GB Stress test | Read/write 220,000 QPS | Read/write 200,000 QPS | Read/write 100,000 QPS | Read/write 100,000 QPS | Read/write 120,000 QPS |
| Synchronization Algorithm | ISR (Replica) | ISR (Replica) | GM | Double synchronous writes | Raft |
| Availability | Very high availability. Automatic switching between master and slave. Tencent Cloud Message Service guarantees an availability of 99.95% | High availability. Automatic switching between master and slave, but messages may be lost after switching due to asynchronous flush and replication | Automatic switching between master and slave. The mirror queue supports m/s, with the master providing services and the slave for backup only | Automatic switching between master and slave is not supported. The slave only reads and does not write when the master is not available | Very high availability. The broker provides highly available services as long as it contains 2 nodes |
| Consumption method | Pull | Pull | Pull and push | Pull | Pull and push |
| Message reliability | Higher. Reliability is improved by three copies, with good disaster recovery capability for clusters and rare occurrence of faults | Low. The Broker only provides an asynchronous flush mechanism, and master and slave only provide asynchronous replication, which may result in the loss of some messages | High. When you send messages, the specified messages are persistently written to the disk | High. The Broker writes to two disks synchronously. A message indicating success is returned only when both master and slave are written successfully | Extremely high. Message loss is avoided by synchronous flush, with a data persistence of 99.999999% |
| Data verification | CRC | CRC | None | CRC | checksum |
| Message rewind | Yes | Yes | No | No | Yes |
| Security protection | Yes | No | No | No | Yes |
| Monitoring and alarming | Yes | No | No | No | Yes |
| Service support | Yes | No | No | No | Yes |

Note: "2C 4GB Stress Test" indicates the result of a stress test on a 2-core 4GB memory server.


SLA

1. Tencent Cloud CKafka Message Service

CKafka (Cloud Kafka) is a distributed, high-throughput, and highly scalable messaging system that is compatible with the open-source Kafka API (versions 0.9 and 0.10). Based on the publish/subscribe model, CKafka decouples messages and enables producers and consumers to interact asynchronously without having to wait for each other. CKafka offers advantages such as data compression and simultaneous support for offline and real-time data processing. It is suitable for compressed log collection, monitoring data aggregation, and other scenarios.

2. Service Guarantee Indicators

Tencent Cloud stipulates customized service level indicators for the cloud service you purchase and commits to providing you with the maximum guarantee in terms of data management and business quality. Tencent Cloud reserves the right to adjust any indicator as circumstances change. Unless otherwise specified, a "month" referred to herein has a length of 30 calendar days, and shall be calculated on the basis of a calendar month.

2.1 CKafka Message Service

2.1.1 Data Storage Persistence

The CKafka service you apply for has a monthly data storage persistence of 99.999999%.

2.1.2 Destroyable Data

When you request to delete any data or before you discard or resell any device, Tencent Cloud will perform a complete, permanent deletion on all your data through low-level disk formatting, and degauss the hard disks that are due for scrap.

2.1.3 Right to Know

For now, users' CKafka service is deployed in six data centers: Shanghai Data Center, Guangzhou Data Center, Beijing Data Center, Chengdu Data Center, Shanghai Financial Data Center, and Shenzhen Financial Data Center.

Tencent Cloud helps users choose a data center with the best network condition to store their data. Users can select the region where they belong (Guangzhou, Shanghai, Beijing, Chengdu, Shanghai Finance, or Shenzhen Finance) when making a CVM purchase.

Those data centers available to users shall comply with local laws and regulations and applicable laws and regulations of the PRC.

Tencent Cloud will not disclose any of users' data to any third party, unless such disclosure is required by regulatory authorities for supervision and auditing purposes.

2.1.4 Data Auditing

In accordance with the applicable laws and regulations, and provided that the relevant process is followed and all necessary documents are available, Tencent Cloud may provide information regarding CVMs, including operation logs of key components, operation records of OPS personnel, and operation records of users, if required by regulatory authorities or if necessary for other reasons such as the collection of evidence during investigation into security incidents.

2.1.5 Service Availability

A service availability of 99.95% is guaranteed for the CKafka Message Service. This means the service should be available to users for at least 30 x 24 x 60 x 99.95% = 43178.4 minutes each month and unavailable for at most 43200 - 43178.4 = 21.6 minutes each month. Service unavailable time is calculated per user instance.

If the service recovers from a failure within 5 minutes, the failure is not counted as service downtime. Unavailability duration refers to the period from the moment the failure occurs to the recovery of service, including maintenance duration. If recovery takes more than 5 minutes, the failure is counted in the unavailability duration.
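
The availability arithmetic and the 5-minute rule can be checked with a few lines of code. This is only a sketch of the rules as stated in this section; the function names and the sample outage durations are made up for illustration.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # a "month" is 30 days in this SLA


def allowed_downtime_minutes(availability_percent_x100=9995):
    """Maximum unavailable minutes per month for the given availability.

    The availability is passed as an integer (99.95% -> 9995) so that the
    arithmetic stays exact until the final division.
    """
    return MINUTES_PER_MONTH * (10000 - availability_percent_x100) / 10000


def counted_downtime_minutes(outages):
    """Sum the outages that count: those lasting more than 5 minutes."""
    return sum(minutes for minutes in outages if minutes > 5)


print(MINUTES_PER_MONTH)           # 43200
print(allowed_downtime_minutes())  # 21.6
# Hypothetical month with outages of 3, 5 and 12 minutes: only 12 counts.
print(counted_downtime_minutes([3, 5, 12]))  # 12
```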

2.1.6 Failure Recovery Capability

Tencent CKafka is designed with failure recovery capability. When a physical server fails, the service is automatically migrated to a new parent host without requiring any user intervention, so as to ensure continued service for customers. Meanwhile, Tencent Cloud's professional team provides maintenance support on a 24/7 basis.

3. Service Billing Accuracy

The billing details for Tencent Cloud services are displayed on the customer's purchase and order pages. You can choose the services you need from a variety of service categories and make a purchase at the listed prices. Please refer to the information published on Tencent Cloud website for the actual prices, and the fee will be charged based on the service specifications and the length of usage.

4. Compensation

4.1 Scope

Compensation is applicable to circumstances where a user claims for compensation for incidents/failures caused by Tencent Cloud, such as the user's inability to use services properly or access them and the inability to access any particular website (service site for developers).

4.2 Compensation Standards

Downtime duration = time when the failure is resolved - start time of failure. Downtime duration is calculated in minutes, and a duration of less than 1 minute is counted as 1 minute. For example, a downtime duration of 1 minute and 1 second is counted as 2 minutes.

Hundred-fold compensation for CKafka Message Service failures:

Postpaid: a cash coupon in an amount equal to the daily fee of the failed instance ÷ 24 ÷ 60 × downtime duration (in minutes) × 100 will be offered. The cash coupon shall not exceed the total fee of the CKafka service.
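
The postpaid formula and the minute rounding rule can be worked through in code. This is a sketch of the rules as written above, not a billing implementation; the function names, the sample fees, and the `total_fee` parameter used for the cap are assumptions for illustration.

```python
def billable_downtime_minutes(downtime_seconds):
    """Round downtime up to whole minutes (61 seconds -> 2 minutes)."""
    return -(-downtime_seconds // 60)  # ceiling division on integers


def compensation(daily_fee, downtime_seconds, total_fee):
    """Cash coupon amount for a postpaid instance, capped at total_fee."""
    minutes = billable_downtime_minutes(downtime_seconds)
    amount = daily_fee / 24 / 60 * minutes * 100
    return min(amount, total_fee)


print(billable_downtime_minutes(61))  # 2
# Hypothetical instance: daily fee 1440, 30 minutes of downtime.
print(compensation(daily_fee=1440, downtime_seconds=30 * 60, total_fee=43200))  # 3000.0
# A full day of downtime would exceed the cap, so the cap applies.
print(compensation(daily_fee=1440, downtime_seconds=24 * 3600, total_fee=2000))  # 2000
```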
