Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale

Total pages: 16

File type: PDF; size: 1020 KB

Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale. Compliments of Confluent.

Get started with Apache Kafka™ today. Confluent Open Source is a 100% open source Apache Kafka distribution for building robust streaming applications, with connectors, clients, a schema registry, and a REST proxy. It is thoroughly tested and quality assured, adds client support for Python, C/C++, and .NET, and offers an easy upgrade path to Confluent Enterprise. Start today at confluent.io/download.

Kafka: The Definitive Guide, by Neha Narkhede, Gwen Shapira, and Todd Palino. Copyright © 2017 Neha Narkhede, Gwen Shapira, Todd Palino. All rights reserved. Printed in the United States of America. Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Editor: Shannon Cutt. Production Editor: Shiny Kalapurakkel. Copyeditor: Christina Edwards. Proofreader: Amanda Kersey. Indexer: WordCo Indexing Services, Inc. Interior Designer: David Futato. Cover Designer: Karen Montgomery. Illustrator: Rebecca Demarest.

July 2017: First Edition. Revision history for the first edition: 2017-07-07, first release. See http://oreilly.com/catalog/errata.csp?isbn=9781491936160 for release details. The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Kafka: The Definitive Guide, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.
While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-99065-0 [LSI]

Table of Contents

Foreword xiii
Preface xvii

1. Meet Kafka 1
   Publish/Subscribe Messaging 1 · How It Starts 2 · Individual Queue Systems 3 · Enter Kafka 4 · Messages and Batches 4 · Schemas 5 · Topics and Partitions 5 · Producers and Consumers 6 · Brokers and Clusters 7 · Multiple Clusters 8 · Why Kafka? 10 · Multiple Producers 10 · Multiple Consumers 10 · Disk-Based Retention 10 · Scalable 10 · High Performance 11 · The Data Ecosystem 11 · Use Cases 12 · Kafka's Origin 14 · LinkedIn's Problem 14 · The Birth of Kafka 15 · Open Source 15 · The Name 16 · Getting Started with Kafka 16

2. Installing Kafka 17
   First Things First 17 · Choosing an Operating System 17 · Installing Java 17 · Installing Zookeeper 18 · Installing a Kafka Broker 20 · Broker Configuration 21 · General Broker 21 · Topic Defaults 24 · Hardware Selection 28 · Disk Throughput 29 · Disk Capacity 29 · Memory 29 · Networking 30 · CPU 30 · Kafka in the Cloud 30 · Kafka Clusters 31 · How Many Brokers? 32 · Broker Configuration 32 · OS Tuning 32 · Production Concerns 36 · Garbage Collector Options 36 · Datacenter Layout 37 · Colocating Applications on Zookeeper 37 · Summary 39

3. Kafka Producers: Writing Messages to Kafka 41
   Producer Overview 42 · Constructing a Kafka Producer 44 · Sending a Message to Kafka 46 · Sending a Message Synchronously 46 · Sending a Message Asynchronously 47 · Configuring Producers 48 · Serializers 52 · Custom Serializers 52 · Serializing Using Apache Avro 54 · Using Avro Records with Kafka 56 · Partitions 59 · Old Producer APIs 61 · Summary 62

4. Kafka Consumers: Reading Data from Kafka 63
   Kafka Consumer Concepts 63 · Consumers and Consumer Groups 63 · Consumer Groups and Partition Rebalance 66 · Creating a Kafka Consumer 68 · Subscribing to Topics 69 · The Poll Loop 70 · Configuring Consumers 72 · Commits and Offsets 75 · Automatic Commit 76 · Commit Current Offset 77 · Asynchronous Commit 78 · Combining Synchronous and Asynchronous Commits 80 · Commit Specified Offset 80 · Rebalance Listeners 82 · Consuming Records with Specific Offsets 84 · But How Do We Exit? 86 · Deserializers 88 · Standalone Consumer: Why and How to Use a Consumer Without a Group 92 · Older Consumer APIs 93 · Summary 93

5. Kafka Internals 95
   Cluster Membership 95 · The Controller 96 · Replication 97 · Request Processing 99 · Produce Requests 101 · Fetch Requests 102 · Other Requests 104 · Physical Storage 105 · Partition Allocation 106 · File Management 107 · File Format 108 · Indexes 109 · Compaction 110 · How Compaction Works 110 · Deleted Events 112 · When Are Topics Compacted? 112 · Summary 113

6. Reliable Data Delivery 115
   Reliability Guarantees 116 · Replication 117 · Broker Configuration 118 · Replication Factor 118 · Unclean Leader Election 119 · Minimum In-Sync Replicas 121 · Using Producers in a Reliable System 121 · Send Acknowledgments 122 · Configuring Producer Retries 123 · Additional Error Handling 124 · Using Consumers in a Reliable System 125 · Important Consumer Configuration Properties for Reliable Processing 126 · Explicitly Committing Offsets in Consumers 127 · Validating System Reliability 129 · Validating Configuration 130 · Validating Applications 131 · Monitoring Reliability in Production 131 · Summary 133

7. Building Data Pipelines 135
   Considerations When Building Data Pipelines 136 · Timeliness 136 · Reliability 137 · High and Varying Throughput 137 · Data Formats 138 · Transformations 139 · Security 139 · Failure Handling 140 · Coupling and Agility 140 · When to Use Kafka Connect Versus Producer and Consumer 141 · Kafka Connect 142 · Running Connect 142 · Connector Example: File Source and File Sink 144 · Connector Example: MySQL to Elasticsearch 146 · A Deeper Look at Connect 151 · Alternatives to Kafka Connect 154 · Ingest Frameworks for Other Datastores 155 · GUI-Based ETL Tools 155 · Stream-Processing Frameworks 155 · Summary 156

8. Cross-Cluster Data Mirroring 157
   Use Cases of Cross-Cluster Mirroring 158 · Multicluster Architectures 158 · Some Realities of Cross-Datacenter Communication 159 · Hub-and-Spokes Architecture 160 · Active-Active Architecture 161 · Active-Standby Architecture 163 · Stretch Clusters 169 · Apache Kafka's MirrorMaker 170 · How to Configure 171 · Deploying MirrorMaker in Production 172 · Tuning MirrorMaker 175 · Other Cross-Cluster Mirroring Solutions 178 · Uber uReplicator 178 · Confluent's Replicator 179 · Summary 180

9. Administering Kafka 181
   Topic Operations 181 · Creating a New Topic 182 · Adding Partitions 183 · Deleting a Topic 184 · Listing All Topics in a Cluster 185 · Describing Topic Details 185 · Consumer Groups 186 · List and Describe Groups 186 · Delete Group 188 · Offset Management 188 · Dynamic Configuration Changes 190 · Overriding Topic Configuration Defaults 190 · Overriding Client Configuration Defaults 192 · Describing Configuration Overrides 192 · Removing Configuration Overrides 193 · Partition Management 193 · Preferred Replica Election 193 · Changing a Partition's Replicas 195 · Changing Replication Factor 198 · Dumping Log Segments 199 · Replica Verification 201 · Consuming and Producing 202 · Console Consumer 202 · Console Producer 205 · Client ACLs 207 · Unsafe Operations 207 · Moving the Cluster Controller 208 · Killing a Partition Move 208 · Removing Topics to Be Deleted 209 · Deleting Topics Manually 209 · Summary 210

10. Monitoring Kafka 211
   Metric Basics 211 · Where Are the Metrics? 211 · Internal or External Measurements 212 · Application Health Checks 213 · Metric Coverage 213 · Kafka Broker Metrics 213 · Under-Replicated Partitions 214 · Broker Metrics 220 · Topic and Partition Metrics 229 · JVM Monitoring 231 · OS Monitoring 232 · Logging 235 · Client Monitoring 236 · Producer Metrics 236 · Consumer Metrics 239 · Quotas 242 · Lag Monitoring 243 · End-to-End Monitoring 244 · Summary 244

11. Stream Processing 247
   What Is Stream Processing? 248 · Stream-Processing Concepts 251 · Time 251 · State 252 · Stream-Table Duality 253 · Time Windows 254 · Stream-Processing Design Patterns 256 · Single-Event Processing 256 · Processing with Local State 257 · Multiphase Processing/Repartitioning 258 · Processing with External Lookup: Stream-Table Join 259 · Streaming Join 261 · Out-of-Sequence Events 262 · Reprocessing 264 · Kafka Streams by Example 264 · Word Count 265 · Stock Market Statistics 268 · Click Stream Enrichment 270 · Kafka Streams: Architecture Overview 272 · Building a Topology 272 · Scaling the Topology 273 · Surviving Failures 276 · Stream Processing Use Cases 277 · How to Choose a Stream-Processing Framework 278 · Summary 280

A. Installing Kafka on Other Operating Systems 281

Index 287

Foreword

It's an exciting time for Apache Kafka. Kafka is being used by tens of thousands of organizations, including over a third of the Fortune 500 companies. It's among the fastest-growing open source projects and has spawned an immense ecosystem around it. It's at the heart of a movement towards managing and processing streams of data.

So where did Kafka come from? Why did we build it? And what exactly is it? Kafka got its start as an internal infrastructure system we built at LinkedIn. Our observation was really simple: there were lots of databases and other systems built to store data, but what was missing in our architecture was something that would help us
Recommended publications
  • Large-Scale Learning from Data Streams with Apache SAMOA
    Large-Scale Learning from Data Streams with Apache SAMOA. Nicolas Kourtellis (Telefonica Research, Spain, [email protected]), Gianmarco De Francisci Morales (Qatar Computing Research Institute, Qatar, [email protected]), and Albert Bifet (LTCI, Télécom ParisTech, France, [email protected]). Abstract: Apache SAMOA (Scalable Advanced Massive Online Analysis) is an open-source platform for mining big data streams. Big data is defined as datasets whose size is beyond the ability of typical software tools to capture, store, manage, and analyze, due to the time and memory complexity. Apache SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks, such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms. It features a pluggable architecture that allows it to run on several distributed stream processing engines, such as Apache Flink, Apache Storm, and Apache Samza. Apache SAMOA is written in Java and is available at https://samoa.incubator.apache.org under the Apache Software License version 2.0. 1 Introduction: Big data are "data whose characteristics force us to look beyond the traditional methods that are prevalent at the time" [18]. For instance, social media are one of the largest and most dynamic sources of data. These data are not only very large due to their fine grain, but are also produced continuously. Furthermore, such data are nowadays produced by users in different environments and via a multitude of devices. For these reasons, data from social media and ubiquitous environments are perfect examples of the challenges posed by big data.
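The abstract above turns on the defining constraint of stream mining: each element is seen once, in bounded memory, with the summary always current. As a small framework-free illustration of that shape (plain Python, not the SAMOA API, which provides much richer distributed learners), here is Welford's one-pass algorithm for streaming mean and variance:

```python
class OnlineStats:
    """Streaming mean/variance via Welford's one-pass algorithm.

    Illustrative sketch only: distributed stream miners such as SAMOA
    implement far richer algorithms, but they share this shape: each
    element is seen once, state is constant-size, and the summary is
    up to date after every update.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # population variance of everything seen so far
        return self.m2 / self.n if self.n else 0.0


stats = OnlineStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)  # summary valid after each element; no second pass
```

After the loop, `stats.mean` is 5.0 and `stats.variance()` is 4.0, computed without ever holding the stream in memory.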
  • DSP Frameworks
    Università degli Studi di Roma "Tor Vergata", Dipartimento di Ingegneria Civile e Ingegneria Informatica. DSP Frameworks. Corso di Sistemi e Architetture per Big Data, A.A. 2017/18. Valeria Cardellini - SABD 2017/18.

    DSP frameworks we consider: Apache Storm (with lab); Twitter Heron (from Twitter, like Storm, and compatible with Storm); Apache Spark Streaming (lab), which reduces the size of each stream and processes streams of data in micro-batches; Apache Flink; Apache Samza; and cloud-based frameworks such as Google Cloud Dataflow and Amazon Kinesis Streams.

    Apache Storm: an open-source, real-time, scalable streaming system that provides an abstraction layer for executing DSP applications; initially developed by Twitter. A topology is a DAG of spouts (sources of streams) and bolts (operators and data sinks).

    Stream grouping in Storm concerns data parallelism: how streams are partitioned among multiple tasks (threads of execution). Shuffle grouping randomly partitions the tuples. Field grouping hashes on a subset of the tuple attributes. All grouping (i.e., broadcast) replicates the entire stream to all the consumer tasks. Global grouping sends the entire stream to a single task of a bolt. Direct grouping lets the producer of a tuple decide which task of the consumer will receive it.

    Storm architecture: master-worker.
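The grouping semantics described in those slides can be simulated without Storm. The sketch below uses hypothetical helper names in plain Python (not the Storm API, which declares groupings on a topology builder); it shows why field grouping matters for stateful bolts: hashing on a tuple attribute sends every tuple with the same key to the same task, while shuffle grouping only balances load:

```python
import random
from collections import defaultdict


def shuffle_grouping(tuples, num_tasks, seed=42):
    """Shuffle grouping: route each tuple to a random consumer task."""
    rng = random.Random(seed)
    tasks = defaultdict(list)
    for t in tuples:
        tasks[rng.randrange(num_tasks)].append(t)
    return tasks


def fields_grouping(tuples, num_tasks, key):
    """Fields grouping: hash one attribute, so tuples sharing a key value
    always reach the same task (what a stateful counting bolt needs)."""
    tasks = defaultdict(list)
    for t in tuples:
        tasks[hash(t[key]) % num_tasks].append(t)
    return tasks


events = [{"user": "ann", "page": "/a"},
          {"user": "bob", "page": "/b"},
          {"user": "ann", "page": "/c"},
          {"user": "cat", "page": "/a"}]

# all of ann's events land in exactly one task's list
by_user = fields_grouping(events, num_tasks=3, key="user")
```

A task counting page views per user is only correct under the fields grouping, since the shuffle grouping may scatter one user's events across tasks.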
  • Comparative Analysis of Data Stream Processing Systems
    Shah Zeb Mian. Comparative Analysis of Data Stream Processing Systems. Master's Thesis in Information Technology, February 23, 2020. University of Jyväskylä, Faculty of Information Technology. Contact information: [email protected]. Supervisors: Oleksiy Khriyenko and Vagan Terziyan. Study line: all study lines. Page count: 48+0. Abstract: Big data processing systems are evolving to be more stream-oriented, where data is processed continuously, as soon as it arrives. Earlier, data was often stored in a database, a file system, or another form of data storage system, and applications would query the data as needed. Stream processing is the processing of data in motion: it works on continuous data retrieved from different resources. Instead of periodically collecting huge static datasets, streaming frameworks process data as soon as it becomes available, hence reducing latency. This thesis conducts a comparative analysis of different stream processors based on selected features. The research focuses on Apache Samza, Apache Flink, Apache Storm, and Apache Spark Structured Streaming. The thesis also explains Apache Kafka, a log-based data store widely used in streaming frameworks. Keywords: Big Data, Stream Processing, Batch Processing, Streaming Engines, Apache Kafka, Apache Samza. Finnish summary (translated): Big data processing systems are currently evolving to be stream-oriented, meaning data is processed as soon as it arrives. More traditionally, data was stored in a database, in files, or in another storage system, and applications fetched the data when needed. A stream-based system processes data in motion: continuous data from many sources. Instead of fetching data periodically, stream-based frameworks can process data as soon as it becomes available, thus reducing latency.
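The batch-versus-stream distinction drawn in that abstract can be made concrete with Python generators (an illustrative toy, not any of the surveyed engines): a batch job must wait for the whole input before producing any output, while a streaming pipeline emits each result as soon as its input arrives:

```python
def batch_process(source):
    """Batch style: materialize the whole input, then transform it."""
    data = list(source)          # blocks until the source is exhausted
    return [x * x for x in data]


def stream_process(source):
    """Streaming style: emit each result as soon as its input arrives."""
    for x in source:
        yield x * x              # downstream sees this immediately


def arriving():
    """Stand-in for data in motion: elements become available one at a time."""
    for x in [1, 2, 3]:
        yield x


pipeline = stream_process(arriving())
first = next(pipeline)           # ready after only one element has arrived
```

The streaming version produces its first result after one element, which is exactly the latency reduction the thesis attributes to stream-oriented frameworks; the batch version cannot return anything until the source is exhausted.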
  • Interpretation of Dreams and Kafka's A Country Doctor: A Psychoanalytic Reading
    International Letters of Social and Humanistic Sciences, online 2015-11-30, ISSN 2300-2697, Vol. 63, pp. 1-6, doi:10.18052/www.scipress.com/ILSHS.63.1. © 2015 SciPress Ltd, Switzerland. Interpretation of Dreams and Kafka's A Country Doctor: A Psychoanalytic Reading. Sara Mirmobin (M.A. Student of Language and English Literature, University of Semnan, Iran) and Ensieh Shabanirad (PhD Candidate of Language and English Literature, University of Tehran, Iran). Corresponding author: [email protected]. Keywords: Kafka, A Country Doctor, Freud, dreams, psychological approach. Abstract: Dreams are so real that one cannot easily distinguish them from reality. We feel disappointed after waking up from a fascinating dream, and rejoice to wake up knowing a nightmare has ended. In some literary works the line between fancy and reality is blurred as well, which provides the opportunity to ponder them psychologically. The plot of some poems, novels, novellas, dramas, and short stories is centered on the mind, thoughts, or, generally speaking, the human psyche. This essay elaborates upon the "nightmarish" (rather than dreamlike) story, Kafka's A Country Doctor, by applying a psychological approach. It discusses the interpretation of some of the incidents of the story according to Freud's The Interpretation of Dreams. The id, ego, and superego, the three parts of the Freudian psychic apparatus, as well as their identification with the related characters, are also discussed. 1. Introduction: Franz Kafka, one of the most influential writers of the 20th century, was a German-language writer with a significant style of writing known as Kafkaesque. His stories are thoroughly bizarre, digging into the psychological state of the characters as well as characterizing different aspects of the human psyche.
  • Network Traffic Profiling and Anomaly Detection for Cyber Security
    Network traffic profiling and anomaly detection for cyber security. Laurens D'hooge, student number 01309688. Supervisors: Prof. dr. ir. Filip De Turck, dr. ir. Tim Wauters. Counselors: Prof. dr. Bruno Volckaert, dr. ir. Tim Wauters. A dissertation submitted to Ghent University in partial fulfilment of the requirements for the degree of Master of Science in Information Engineering Technology. Academic year: 2017-2018.

    Acknowledgements: This thesis is the result of 4 months of work, and I would like to express my gratitude towards the people who have guided me throughout this process. First and foremost I'd like to thank my thesis advisors, prof. dr. Bruno Volckaert and dr. ir. Tim Wauters. By virtue of their knowledge and clear communication, I was able to maintain a clear target. Secondly I would like to thank prof. dr. ir. Filip De Turck for providing me the opportunity to conduct research in this field with the IDLab research group. Special thanks to Andres Felipe Ocampo Palacio and dr. Marleen Denert are in order as well. Mr. Ocampo's PhD research into big data processing for network traffic and the resulting framework are an integral part of this thesis. Ms. Denert has been the go-to member of the faculty staff for general advice and administrative dealings. The final token of gratitude I'd like to extend to my family and friends for their continued support during this process.

    Abstract: This article is a short summary of the research findings of a creation of APT2.
  • Projects – Other Than Hadoop! Created By:-Samarjit Mahapatra [email protected]
    Projects – other than Hadoop! Created by Samarjit Mahapatra ([email protected]).

    Mostly compatible with Hadoop/HDFS:
    - Apache Drill: provides low-latency ad-hoc queries to many different data sources, including nested data. Inspired by Google's Dremel, Drill is designed to scale to 10,000 servers and query petabytes of data in seconds.
    - Apache Hama: a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS for massive scientific computations such as matrix, graph, and network algorithms.
    - Akka: a toolkit and runtime for building highly concurrent, distributed, and fault-tolerant event-driven applications on the JVM.
    - ML-Hadoop: Hadoop implementation of machine learning algorithms.
    - Shark: a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can execute Hive QL queries up to 100 times faster than Hive without any modification to the existing data or queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions, providing seamless integration with existing Hive deployments and a familiar, more powerful option for new ones.
    - Apache Crunch: a Java library that provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.
    - Azkaban: a batch workflow job scheduler created at LinkedIn to run their Hadoop jobs.
    - Apache Mesos: a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark, and other applications on a dynamically shared pool of nodes.
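Apache Crunch's stated goal, pipelines composed of many user-defined functions that are simple to write and easy to test, can be sketched in miniature with plain function composition (hypothetical toy code, not the Crunch API): each stage is an ordinary function over an iterable, so stages can be unit-tested in isolation and then chained into a job.

```python
from functools import reduce


def pipeline(*stages):
    """Chain user-defined stages into one runnable job."""
    def run(records):
        return reduce(lambda data, stage: stage(data), stages, records)
    return run


# Hypothetical user-defined stages (illustrative names, not Crunch's)
def parse(lines):
    return (line.strip().split(",") for line in lines)


def keep_errors(rows):
    return (r for r in rows if r[1] == "ERROR")


def count(rows):
    return sum(1 for _ in rows)


job = pipeline(parse, keep_errors, count)
n = job(["1,ERROR,disk\n", "2,INFO,boot\n", "3,ERROR,net\n"])  # n == 2
```

The testability claim falls out of the design: `keep_errors` can be asserted against a two-row list without ever constructing the full pipeline.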
  • An Innocent Abroad: Lectures in China, J. Hillis Miller
    An Innocent Abroad. The FlashPoints series is devoted to books that consider literature beyond strictly national and disciplinary frameworks, and that are distinguished both by their historical grounding and by their theoretical and conceptual strength. Our books engage theory without losing touch with history and work historically without falling into uncritical positivism. FlashPoints aims for a broad audience within the humanities and the social sciences concerned with moments of cultural emergence and transformation. In a Benjaminian mode, FlashPoints is interested in how literature contributes to forming new constellations of culture and history, and in how such formations function critically and politically in the present. Series titles are available online at http://escholarship.org/uc/flashpoints. Series editors: Ali Behdad (Comparative Literature and English, UCLA), Editor Emeritus; Judith Butler (Rhetoric and Comparative Literature, UC Berkeley), Editor Emerita; Michelle Clayton (Hispanic Studies and Comparative Literature, Brown University); Edward Dimendberg (Film and Media Studies, Visual Studies, and European Languages and Studies, UC Irvine), Founding Editor; Catherine Gallagher (English, UC Berkeley), Editor Emerita; Nouri Gana (Comparative Literature and Near Eastern Languages and Cultures, UCLA); Susan Gillman (Literature, UC Santa Cruz), Coordinator; Jody Greene (Literature, UC Santa Cruz); Richard Terdiman (Literature, UC Santa Cruz), Founding Editor. A complete list of titles begins on p. 306.

    An Innocent Abroad: Lectures in China. J. Hillis Miller. Northwestern University Press, Evanston, Illinois. www.nupress.northwestern.edu. Copyright © 2015 by Northwestern University Press. Foreword copyright © 2015 by Fredric Jameson. Published 2015. All rights reserved. Printed in the United States of America. 10 9 8 7 6 5 4 3 2 1. Library of Congress Cataloging-in-Publication Data: Miller, J.
  • Reflexión Académica En Diseño & Comunicación
    ISSN 1668-1673. XXXII • 2017. Year XVIII, Vol. 32, November 2017. Buenos Aires, Argentina. Reflexión Académica en Diseño & Comunicación. IV Congress of Creativity, Design and Communication for Teachers and Authorities of Secondary Education, 'Interfaces Palermo'. Universidad de Palermo, Facultad de Diseño y Comunicación, Centro de Estudios en Diseño y Comunicación, Mario Bravo 1050, C1175ABT, Ciudad Autónoma de Buenos Aires, Argentina. www.palermo.edu, [email protected]. Director: Oscar Echevarría. Publication coordinator: Diana Divasto. Rector: Ricardo Popovsky. Dean, Facultad de Diseño y Comunicación: Oscar Echevarría.

    Editorial committee: Lucia Acar (Universidade Estácio de Sá, Brazil); Gonzalo Javier Alarcón Vital (Universidad Autónoma Metropolitana, Mexico); Mercedes Alfonsín (Universidad de Buenos Aires, Argentina); Fernando Alberto Alvarez Romero (Pontificia Universidad Católica del Ecuador, Ecuador); Gonzalo Aranda Toro (Universidad Santo Tomás, Chile); Christian Atance (Universidad de Lomas de Zamora, Argentina); Mónica Balabani (Universidad de Palermo, Argentina); Alberto Beckers Argomedo (Universidad Santo Tomás, Chile); Renato Antonio Bertao (Universidade Positivo, Brazil); Allan Castelnuovo (Market Research Society, United Kingdom); Jorge Manuel Castro Falero (Universidad de la Empresa, Uruguay); Raúl Castro Zuñeda (Universidad de Palermo, Argentina); Michael Dinwiddie (New York University, USA); Mario Rubén Dorochesi Fernandois (Universidad Técnica Federico Santa María, Chile); Adriana Inés Echeverria (Universidad de la Cuenca del Plata, Argentina); Jimena Mariana García Ascolani (Universidad Comunera, Paraguay); Marcelo Ghio (Instituto San Ignacio, Peru); Clara Lucia Grisales Montoya (Academia Superior de Artes, Colombia); Haenz Gutiérrez Quintana (Universidad Federal de Santa Catarina, Brazil); José Korn Bruzzone (Universidad Tecnológica de Chile, Chile); Zulema Marzorati (Universidad de Buenos Aires, Argentina); Denisse Morales.
  • Kafka: Toward a Minor Literature
    Kafka: Toward a Minor Literature. Gilles Deleuze and Felix Guattari. Translation by Dana Polan. Foreword by Réda Bensmaïa. Theory and History of Literature, Volume 30. University of Minnesota Press, Minneapolis and London. The University of Minnesota gratefully acknowledges translation assistance provided for this book by the French Ministry of Culture. Copyright © 1986 by the University of Minnesota. Originally published as Kafka: Pour une littérature mineure, copyright © 1975 by Les éditions de Minuit, Paris. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Published by the University of Minnesota Press, 111 Third Avenue South, Suite 290, Minneapolis, MN 55401-2520. Printed in the United States of America on acid-free paper. Seventh printing, 2003. Library of Congress Cataloging-in-Publication Data: Deleuze, Gilles. Kafka: toward a minor literature. (Theory and history of literature; v. 30). Bibliography: p. Includes index. 1. Kafka, Franz, 1883-1924. Criticism and interpretation. I. Guattari, Felix. II. Title. III. Series. PT2621.A26Z67513 1986 833'.912 85-31822. ISBN 0-8166-1514-4; ISBN 0-8166-1515-2 (pbk.). The University of Minnesota is an equal-opportunity educator and employer. Contents: Foreword: The Kafka Effect, by Réda Bensmaïa, ix; Translator's Introduction, xxii; 1. Content and Expression, 3; 2. An Exaggerated Oedipus, 9; 3. What Is a Minor Literature?, 16; 4. The Components of Expression, 28; 5. Immanence and Desire, 43; 6.
  • Scaling Big Data with Hadoop and Solr
    Scaling Big Data with Hadoop and Solr (PDF, EPUB, ebook). Hrishikesh Vijay Karambelkar | 166 pages | 30 Apr 2015 | Packt Publishing Limited | 9781783553396 | English | Birmingham, United Kingdom.

    Excerpts: The default duration between two heartbeats is 3 seconds. Some other SQL-based distributed query engines to certainly bear in mind and consider for your use cases are: This mode can be turned off manually by running the following command. This step will actually create an authorization key with ssh, bypassing the passphrase check, as shown in the following screenshot. Solr has the notion of parent-child document relationships; these exist as separate documents within the index, limiting their aggregation functionality in deeply nested data structures. Fields may be split into individual tokens and indexed separately. Any key starting with "a" will go in the first region, with "c" the third region and "z" the last region. Now comes the interesting part. After the jobs are complete, the results are returned to the remote client via HiveServer2. Finally, Hadoop can accept data in just about any format, which eliminates much of the data transformation involved with data processing. The difference in ingestion performance between Solr and Rocana Search is striking. These tables support most of the common data types that you know from the relational database world.
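The excerpt's remark that fields may be split into individual tokens and indexed separately describes the core of an inverted index. A minimal sketch of that idea (plain Python with a crude analyzer, not Solr's actual analyzer chain):

```python
import re
from collections import defaultdict


def tokenize(text):
    """A crude analyzer: lowercase the text, split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_index(docs):
    """Inverted index: map each token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


docs = {1: "Scaling Big Data with Hadoop and Solr",
        2: "Hadoop accepts data in almost any format"}
index = build_index(docs)
# index["hadoop"] == {1, 2}; index["solr"] == {1}
```

Search then becomes set lookups and intersections over `index`, which is why tokenizing fields at index time is what makes full-text queries fast.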
  • Classifying, Evaluating and Advancing Big Data Benchmarks
    Classifying, Evaluating and Advancing Big Data Benchmarks. Dissertation submitted for the degree of Doctor of Natural Sciences to Faculty 12, Computer Science, of the Johann Wolfgang Goethe University in Frankfurt am Main, by Todor Ivanov from Stara Zagora. Frankfurt am Main, 2019 (D 30). Accepted as a dissertation by Faculty 12, Computer Science, of the Johann Wolfgang Goethe University. Dean: Prof. Dr. Andreas Bernig. Reviewers: Prof. Dott.-Ing. Roberto V. Zicari and Prof. Dr. Carsten Binnig. Date of defense: 23.07.2019.

    Abstract: The main contribution of the thesis is in helping to understand which software system parameters most affect the performance of Big Data platforms under realistic workloads. In detail, the main research contributions of the thesis are:
    1. Definition of the new concept of heterogeneity for Big Data architectures (Chapter 2);
    2. Investigation of the performance of Big Data systems (e.g., Hadoop) in virtualized environments (Section 3.1);
    3. Investigation of the performance of NoSQL databases versus Hadoop distributions (Section 3.2);
    4. Execution and evaluation of the TPCx-HS benchmark (Section 3.3);
    5. Evaluation and comparison of Hive and Spark SQL engines using benchmark queries (Section 3.4);
    6. Evaluation of the impact of compression techniques on SQL-on-Hadoop engine performance (Section 3.5);
    7. Extensions of the standardized Big Data benchmark BigBench (TPCx-BB) (Sections 4.1 and 4.3);
    8. Definition of a new benchmark, called ABench (Big Data Architecture Stack Benchmark), that takes into account the heterogeneity of Big Data architectures (Section 4.5).
    The thesis is an attempt to re-define system benchmarking taking into account the new requirements posed by Big Data applications.
  • 51.Dr.Ajoy-Batta-Article.Pdf
    www.TLHjournal.com. Literary Herald, ISSN 2454-3365, a UGC-approved, international refereed English e-journal. Impact Factor: 2.24 (IIJIF). Franz Kafka and Existentialism. Dr. Ajoy Batta, Associate Professor and Head, Department of English, School of Arts and Languages, Lovely Professional University, Phagwara (Punjab). Abstract: Franz Kafka was born on July 3, 1883 at Prague. His posthumous works brought him fame not only in Germany, but in Europe as well. By 1946 Kafka's works had a great effect abroad, especially in translation. Apart from Max Brod, who was the first commentator and publisher of the first Franz Kafka biography, we have Edwin and Willa Muir, the principal English translators of Kafka's works. Most studies of Franz Kafka's fiction present his works as an engagement with absurdity, a criticism of society, an element of the metaphysical, or the result of his legal profession, in the process failing to record the European influences that form an important factor in his fiction. In order to achieve a newer perspective on Kafka's art, and to understand his fiction better, the present paper endeavors to trace the European influences, particularly those of existentialists like Kierkegaard, Dostoevsky, and Nietzsche, in Kafka's fiction. Keywords: existentialism, absurd, meaningless, superman, despair, identity. Research Paper: Friedrich Nietzsche is perhaps the most conspicuous figure among the catalysts of existentialism. He is often regarded as one of the first and most influential modern existential philosophers. His thought exerted a deep influence during the 20th century, especially in Europe. With him, existentialism became a direct revolt against the state, orthodox religion, and philosophical systems.