BIG DATA TECHNOLOGY SPECIAL TECH SPARK, H2 2016

CONTENTS

Editorial
Financial services Tech Radar
Big data in financial services: past, present and future
Case study: Banking on NoSQL
Electronic trading and big data for global data distribution
Interactive notebooks for rapid big data development
Seven golden rules for diving into the data lake
Enterprise Blockchain Accelerator: Join us!
Blockchain and graph: more than the sum of their hype?
Drive fast, flexible VaR aggregation with Spark
Data islands in the stream
Batch, stream and Dataflow: what next for risk analytics?
Minimise data gridlocks with Mache

EDITORIAL

Neil Avery
CTO, Excelian, Luxoft Financial Services

For the last six years the financial services sector has struggled to keep pace with the overwhelming growth in big data, cloud, analytics and data-science technologies. The situation reminds me of the music industry. Once you had pop, rock, R&B, blues and a few other distinct genres and it was simple – you knew what you did and didn't like. Now, a new music genre emerges every three months, and such clear definitions are a thing of the past. Sound familiar?

Big data can’t simply be categorised as a branch of enterprise architecture, databases or analytics. It is a completely new IT genre with blending, processing and mutating of data at scale to create the 4Vs – volume, velocity, variety and veracity – and more. Big data forms a baseline platform, which brings us to the realisation that we are building something completely new – and much bigger.

The impact of big data thinking is as profound as the emergence of a new programming language (and we are seeing a lot more of them too). It is giving rise to a new industry – one standing on the shoulders of giants. Within this new landscape we see sub-genres rapidly evolving around graph-databases, streaming, cloud and DevOps, all with at least 17 new ways of solving old problems. The curious thing about this 'new industry' is that it makes the old one obsolete, replacing it with something as disruptive as the IT revolution was in the past. I have no doubt that we are on the cusp of the 'next big thing' and that the future of financial services technology will be transformed by the power of data, cloud, streaming, machine learning and internet of things. More importantly, it will be fundamentally different, with more giants and more shoulders.

With so much to cover, collating this issue of Tech Spark was both challenging and exciting. In any market, the emergence of new capabilities that enable dramatically different ways of doing things creates huge opportunities for disruption and financial services is no exception (the latest example is of course blockchain). Only a year ago this story would have been very different. Of course, big challenges remain, not least hiring people with the right skills to tap the rich potential of new, existing and unrealised use cases. But with everything to play for, I hope that this publication will give you more insight and inspiration as you venture forward on your big data journey.

EDITORIAL BOARD Neil Avery | Andre Nedelcoux

EDITORIAL AND MARKETING TEAM Martyna Drwal | Lucy Carson | Alison Keepe

CONTRIBUTORS Deenar Toraskar | Mark Perkins | Conrad Mellin | Raphael McLarens | Thomas Ellis | Ivan Cikic | Vasiliy Suvorov | James Bowkett | Darren Voisey | Jamie Drummond | Theresa Prevost | Aleksandr Lukashev | Alexander Dovzhikov

DESIGNED BY S4


Neil Avery

Inspired and encouraged by ThoughtWorks, we have created the first Excelian Tech Radar. It's a snapshot from the last 12 months that captures our key industry observations, based on work in capital markets with most of the tier one and tier two banks in London, North America and Asia-Pacific.

As a practice, we're constantly looking to learn, lead and stay abreast of top technology trends – and separate the hype from the fact, while understanding how hype can fuel demand.

On the Tech Radar, the category we've flagged as 'hold' means that we generally see this space as having matured sufficiently, that it has slowed and that other more creative and unique approaches could be explored. But as expected, the largest single group of technologies falls within the 'to investigate' category: this reflects a growing appetite for R&D investment. Some trends stand out as particularly noteworthy.

THREE TECHS TO TRACK

The first wave of hype is around the universal appeal and uptake of Spark, which provides everything that was promised and beyond. What's more, as one of the key innovations in the big data arena – Kafka being the other – it has really helped to drive big data adoption and shape its maturity as part of a viable business strategy. We also see innovation with Kafka K-Streams and Apache Beam pushing the streaming paradigm further forwards.

The second wave of hype is around lightweight virtualisation tool Docker. With its shiny application containers, it's witnessed a two-year growth frenzy, mostly attracting hardcore tecchies with little more than three years in tech development. Who'd have thought infrastructure could have such appeal?

Finally, the biggest tsunami is blockchain. They say there's a blockchain conference in the US every day. It's therefore no surprise that early innovators are scrambling, Fintechs are all the rage and incumbents are joining the R3 consortium to embrace the threat rather than risk disruption from new players. It's a two-pronged hype cycle that we haven't seen in a very long time, being industry led just as much as it is technology led.

[Figure: the Excelian Financial Services Tech Radar – technologies across analytics and machine learning, NoSQL and storage, streaming and messaging, grid and container platforms, cloud and blockchain, plotted on Adopt, Trial, Investigate and Hold rings.]

Neil Avery

When the genesis of Hadoop emerged from a Google file system paper published in 2003, it was the seed that launched the big data technology revolution. Since then, the financial services sector has weathered countless storms. But throughout the ups and downs, big data has continued to evolve into the all-pervasive force that it is today. So how did big data get so big and where's the smart money on where it will go from here?

WHERE IT ALL BEGAN

Google had been using the precursor technology to Hadoop in production for almost 10 years before publishing its 2003 paper. MapReduce was published in 2004 and finally, in January 2006, Hadoop was formally hatched from Nutch 107. The next two years witnessed a whirl of activity in the technology stack, paving the way to the first Hadoop Summit in March 2008. HBase, a columnar store based on Google's BigTable, had also emerged in 2007 and today forms the foundation for many NoSQL stores. Although many anticipated rapid change in the way we develop technology, it wasn't so clear back then where the change would come from and the central role that big data would come to play.

GLOBAL TURMOIL

In 2008, the sector was rocked with the collapse of the global markets – triggering what became widely termed the credit crunch. Not since the 1929 Wall Street Crash had the financial community seen 12 months like it. Merrill Lynch, AIG, Freddie Mac, Fannie Mae, HBOS, Royal Bank of Scotland, Bradford & Bingley, Fortis, Hypo and Alliance & Leicester all had to be rescued from the brink of collapse. Lehman Brothers didn't escape so lightly and filed for bankruptcy.

Since then, recovery has seen waves of regulation forced onto institutions. Reporting and compliance is now an industry in itself, one which costs billions to run. As a result, for a long time financial services IT providers were so focused on reporting that it diverted their attention from innovation. But while MiFID II, Dodd-Frank, FRTB, CCAR and other regulatory standards continued to impact the shape and pace of innovation, the mood was starting to change.

FRESH IMPULSE FOR INNOVATION

The last three years have seen a renaissance of IT innovation across the banks. The rate of adoption now allows companies to leverage technology not only for compliance and regulation, but to also benefit from additional data insights, data agility and a growing range of valuable use cases.

Traditional innovation incumbents are adopting data-warehouse replacements by using the de facto Hadoop standard. Having at last overcome many of its past challenges around performance, batch-style semantics and complexity, Hadoop is seen as a true data platform. It is the data lake where a plethora of tools are available to build any type of ecosystem and support multi-faceted views for different groups of users.

More recent financial services innovators are making a strong play for Apache Cassandra. It can leverage a less complex data environment and use some of its innovative data centre features to become cloud enabled.

When Apache Spark was launched in 2012, it forcibly upstaged Apache Storm. It has emerged as the SQL for big data platforms and everyone has built a Spark connector, including Cassandra, Couchbase, Hadoop, MongoDB, etc.

TODAY'S STATE OF PLAY

So far, 2016 has been an exciting year for big data. In Europe and North America, Hadoop projects continue to run at scale. You'll find Spark and Spark Streaming at the core of all projects. Particularly interesting is the expanding range of use cases – trade surveillance, fraud detection, market surveillance, real-time analytics and more. Banks feel more confident in extracting value from unstructured data using data lakes and there is an appetite to invest in technology that has proven its worth. A compelling example of this is the Wharf bank with more than 122 Hadoop clusters and over 60 staff running it day to day.

Many of the regulatory requirements and associated solutions have a natural implication for data collection, analysis and reporting. Rather than building bespoke database solutions, forward-looking institutions are increasingly leveraging data lakes – not only for compliance, but to drive innovation, which is once again becoming a true staple of financial services IT.

WHAT LIES AHEAD?

Eight years after the global meltdown, we are finally seeing financial services technology efforts coming full circle, accompanied by unprecedented levels of innovation. Spark is being adopted as the core of many systems. Data platforms have no value without analytics and the move to a standard solution has further accelerated Spark development. As a result, Spark Streaming and SparkSQL continue to evolve and lower the barriers to entry. While no-one can forecast with great certainty what tomorrow holds, the following are all likely to play a key part in the foreseeable future.

Streaming technologies like Spark Streaming fit well with the challenges of a Lambda architecture but, more importantly, Apache Kafka is now seen as the glue that enables disparate systems to function and scale. Like Spark, it has a streaming solution, Kafka K-Streams, which, as a result, will become part of many standard technology stacks.

Interactive notebooks that leverage the latest web technology and roll up visualisation with agile big data technology are now being adopted. Offering capabilities that until recently were impossible, they can now complete in seconds typical queries that would have previously taken 24 hours to run.

Graph databases continue to be experimental for most, however, some very exciting proof of concepts are happening in this field. TinkerPop and Titan, along with the pioneer, Neo4J, continue to be applied to more and more use cases, including blockchain.

Machine learning (ML) is probably the biggest technical innovation underway right now. Traditionally a realm of computer science, when combined with big data and analytics, machine learning creates a utopia that many system designers have long strived for, but without the conventional architecture. Indeed, it is the next progression from traditional analytics and, with seemingly infinite use cases, the field is exploding. Spark supports machine learning and has deployed a 1200 node deep learning neural net to manage most of its development activities. But ironically, it's Google with TensorFlow that is pushing the boundaries – which seems to bring us rather serendipitously back to where it all began.

WHERE WILL THE BIG MONEY BE?

To conclude, big data continues to become less about the doing and storing and more about the what if and analytics, in other words, deriving value from data and developing new and exciting use cases. Storing data was solved with the first generation platform, which, albeit fragmented and complex, proved its value. The second generation involved Spark. The maturation of that wave brought agility and simplification as key benefits. The pending wave of Fintech disruption – building around blockchain, mobile, digital, faster networks and cloud-first – is set to make the next ten years much more exciting than the last. So the big question now is where should you focus your efforts? Of course, there's no easy answer, but I'd take a fairly safe guess that the learning machines are coming and the smart money will soon follow.

1996 – Data storage became more cost-effective than paper for storing data
1997 – The term 'big data' used for the first time, by M. Cox and D. Ellsworth
2000 – According to a 1999 study by P. Lyman and R. Varian, the world had produced approx. 1.5 exabytes of unique information
2001 – The 3 Vs – data volume, velocity and variety – mentioned for the first time, in D. Laney's paper
2003 – Google file system paper published, the seed of Hadoop
2004 – MapReduce published
January 2006 – Hadoop born from Nutch 107
2007 – HBase emerged
March 2008 – First Hadoop Summit
2008 – Initial release of Apache Cassandra as an open source project; Global Financial Crisis
2011 – Initial release of Apache Kafka as an open source project; world's information storage capacity grew at a compound annual growth rate of 25% per year between 1986 and 2007
2012 – Apache Spark launched

Mark Perkins | Conrad Mellin | Aleksandr Lukashev | Alexander Dovzhikov

Case study: Banking on NoSQL

In 2015, Excelian was engaged to replace a critical end-of-day (EOD) and intraday rates marking system at a prominent Australian bank. Migration to a big data solution was not a foregone conclusion, so complex considerations were involved when specifying the best technologies for the task.

THE NEED FOR SPEED AND SCALE

Almost 20 years old, the existing marking system was written in Sybase, C++, Python and Unix – technologies typically used by investment banks in the late 1990s and early 2000s. Until the 2008/9 global financial crisis, the solution could handle the prevailing low trade volumes. But as business became more flow driven, the framework struggled to scale to the globalised, high-volume trading model that has since evolved. Complex set-up made it hard for the business to directly maintain the configuration of rates, so it had to rely on additional technology to make changes, which pushed up time and costs. When combined with the additional expense of using Sybase to replicate the volumes of real-time data required worldwide, the total cost of ownership became prohibitive. A scarcity of C++ skills and its limitations for rapid development further compounded the problems. Faced with these challenges, the bank wanted to completely overhaul the system using technologies that could deliver the rapid development speed and scalability to meet its business needs.

KEY REQUIREMENTS

The new marking system had to meet a range of key criteria.

User-friendly – create a flexible user interface so business users could easily define and persist rules about the source of raw data that constitutes a mark (ie whether it is data snapped from market data vendor systems or retrieved from a file/database saved to by traders) and perform automatic calculations on saved marks to produce new marks.

Fast – persist official marks to a lightweight storage layer able to handle high volumes with frequent changes while also distributing data globally, then save to recognised 'sources of truth' for auditors and historical analysis. It was accepted that retrieval from the ultimate official system – ie the Sybase relational database (RDBMS) or even tape – would take significantly longer.

Auditable – enable auditors to play back the trail to see exactly where any given mark came from. This required long term storage capabilities to ensure records are persisted in accordance with regulations.

Easy to inspect – ability to load ‘past-date’ rate data from the source of truth via the lightweight storage layer for inspection by users.

ASSESSMENT AND ARCHITECTURE PRINCIPLES

The decision to employ a NoSQL solution seemed obvious given the need for a data lake/playground where large quantities of unstructured data can be quickly housed for temporary storage, accessed, modified and replicated globally before finally being written to the RDBMS where it would not be accessed frequently.

Having considered the options, we chose Apache Cassandra for a variety of reasons:

The appeal of a cost-effective open source (fix-or-amend-it-yourself) solution.

NoSQL solutions typically have a flat database structure with no joins or indices, perfectly suited for the high volumes of data and performance level required to support the critical nature of EOD marking.

With a ring partitioning of the keys and data distributed equally among the cluster, the Cassandra model provides high resilience and maximum throughput. Although an existing cluster of high-end hardware was used – which doesn't exactly fit Cassandra's commodity hardware mould – it was fit for purpose.

Data filtering and manipulation was executed at Java application level, using a minimalist Cassandra query to retrieve or update the data.

Creating a service layer above Cassandra to persist/retrieve all data guarantees data quality by ensuring there is no direct data access or modification by users – which had been a regular feature of the old set-up.

The availability of high-quality monitoring tools, such as DataStax OpsCenter, also increased confidence in the technology choice. As part of the DataStax Enterprise offering, OpsCenter has proven extremely powerful for rapidly understanding the state of the cluster and performing administrative tasks.

NoSQL CHALLENGES

Every system has its pros and cons, and our experience with Cassandra on this project was no exception. By highlighting some of the key issues to anticipate, we hope that our insights will prove useful for anyone looking to switch to a NoSQL solution. Given that most developers have an RDBMS background, a significant shift in thinking is required from the application programming perspective.

Data integrity – it was vital to ensure consistency of data underpinning the cluster in different regions. Most modern NoSQL solutions deliver eventual consistency focused on data availability and partition tolerance. While this is how write operation performance is achieved, regional consistency was equally important. Cassandra extends the concept of eventual consistency by offering tuneable consistency. We allowed a write operation to succeed when 50%+1 nodes in a local data centre reported a successful completion of the operation, with other remote data centres set to be eventually consistent (a minimal code sketch of this setting follows this list of challenges).

Query performance – Cassandra's CQL query language is very similar to SQL. However, by comparison, CQL proved somewhat restrictive. So while Cassandra solves the problem of potentially time-consuming queries by forbidding anything at CQL level that can't be executed in constant time, it transfers responsibility for query performance to the application code. Although this makes any query performance problems explicit, it requires significantly more effort from a developer.

Latency – without indices, queries couldn't be optimised to exploit the known structure of housed data. Secondary indices have worked well in classic relational database systems for years, but only when the underlying data was confined to a local server. In a distributed environment, secondary indices are not recommended as they may introduce latency. Cassandra indexes should only be used when data has low cardinality.

Backwards compatibility – API changes between major Cassandra versions have broken builds and required some re-engineering of our internal code base. This could be both a development challenge and risk. However, Cassandra recently adopted a ‘tick-tock’ release model, which may improve things.
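The tuneable consistency described under 'Data integrity' above can be illustrated with a short sketch. This is not the bank's production code: the keyspace, table and column names are invented for the example, and the DataStax Python driver is used for brevity even though the project's application layer was Java.

```python
# A minimal sketch of a LOCAL_QUORUM write with the DataStax Python driver.
# Contact points, keyspace, table and columns are hypothetical placeholders.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["cass-node1", "cass-node2"])
session = cluster.connect("marks")            # hypothetical keyspace

# LOCAL_QUORUM: the write succeeds once a majority of replicas in the local
# data centre acknowledge it; remote data centres catch up asynchronously.
insert = SimpleStatement(
    "INSERT INTO official_marks (curve, tenor, as_of, value) "
    "VALUES (%s, %s, %s, %s)",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
session.execute(insert, ("AUD-BBSW", "3M", "2016-06-30", 1.9875))
cluster.shutdown()
```

LOCAL_QUORUM gives strong consistency within the local data centre while leaving cross-region replication asynchronous, which matches the behaviour described above.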

BENEFITS OF SWITCHING TO NoSQL

Global speed – while large volumes being moved worldwide still put tremendous pressure on the network, the speed of global data distribution has largely resolved the Sybase global replication issue.

Open source – the benefits of Cassandra's open source status are significant and enable easy pinpointing of bugs or potential enhancements, which has driven the bank to seek further open source solutions.

Real-time data – Cassandra's underlying data model of fast, efficient hashmap storage has delivered the near real-time properties that users wanted.

Ease of use – the ease of administering the system, especially for the initial set-up and addition of new nodes to expand the cluster, has greatly enhanced the user experience.

SYBASE VERSUS CASSANDRA

In order to run a comparison of data store performance before (Sybase model) and after (Cassandra model), a test environment was set up with two databases containing the same data, as illustrated in Figure 1:

[Diagram: a client application connected to a data service, which reads from and writes to both Cassandra and Sybase.]

Figure 1. Sybase versus Cassandra test environment.

The data sets were loaded, different volumes of records from both databases were tested and times measured. In Figure 2, a thousand records corresponds to roughly 1Mb of data in the database and on disk. In all cases the results showed that Cassandra significantly out-performed Sybase by a factor of three times or more.

[Chart: Cassandra vs Sybase data load latency in seconds, for batches of 960, 1,642, 1,740, 3,251 and 4,994 records.]

Figure 2. Timings (# of records vs time).
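The comparison above can be reproduced with a very simple timing harness along the following lines. This is only a sketch: the two fetch functions are placeholders standing in for reads through the data service against Cassandra and Sybase respectively, and the record counts are taken from Figure 2.

```python
# Hypothetical timing harness for comparing bulk reads from two data stores.
import time

def fetch_from_cassandra(n):
    # Placeholder: replace with a real read through the data service/driver.
    return [None] * n

def fetch_from_sybase(n):
    # Placeholder: replace with a real read through the data service/driver.
    return [None] * n

def time_load(fetch, record_count):
    """Time a single bulk read of record_count records."""
    start = time.perf_counter()
    rows = fetch(record_count)
    return len(rows), time.perf_counter() - start

for count in (960, 1642, 1740, 3251, 4994):        # record volumes from Figure 2
    for name, fetch in (("cassandra", fetch_from_cassandra),
                        ("sybase", fetch_from_sybase)):
        n, secs = time_load(fetch, count)
        print(f"{name:<10} {n:>6} records loaded in {secs:.3f}s")
```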

MATCH YOUR TECH CHOICE TO MIGRATION NEEDS

Migration from a relational database to a big data store is a major undertaking and it's not always obvious that NoSQL will offer the best approach. Too often it can seem as if migration to a big data solution can readily resolve all existing RDBMS woes. However, it's not a foregone conclusion and sometimes some fine-tuning of performance, or de-normalisation, are all that is required.

For this project, Cassandra was the best fit for the task; however, we highly recommend exploring all the big data technology options to make the best match for your next migration specification. Here are a few suggestions:

Seriously consider the underlying data types you need to store, for example, column store versus document. Consider too the consistency, availability and partition tolerance (CAP) theorem and de facto solutions. For instance, placing more weight on consistency and partition tolerance might point to the suitability of MongoDB or Redis, whereas availability and partition tolerance would suggest Cassandra or CouchDB.

Bear in mind the benefits of a NoSQL solution that supports a rich query language. It will lessen the burden on developers with an RDBMS background and accelerate time to delivery.

Finally, for mission-critical systems in an enterprise setting, you could consider a DataStax Enterprise solution. Its many advanced features can greatly improve production deployment, including the enhanced OpsCenter, caching and Spark master resilience, which are essential when running at scale. When combined with increased testing and integration with other key technologies, these powerful tools can greatly enhance a vanilla Cassandra deployment.

Raphael McLarens

Rapid advances in low-cost cloud infrastructure, the falling cost of big data toolkits and the rise of scalable, distributed, parallel computing are transforming the ability to carry out tasks that until recently were unfeasible and creating opportunities to bring disruptive new trading ideas to market. Capital markets have proved to be early adopters of technology in a range of areas. A good example of this was the rise of electronic equities trading in the late 1990s, later extending to other asset classes such as futures and FX.

In the last decade the technology arms race has gathered pace to improve performance in areas such as speed-to-market, asset coverage and algorithms. Away from the high-frequency trading world, with its narrow focus on reducing latency by micro- or nano-seconds, attention is turning to the potential of big data technologies to generate disruptive trade ideas and catch up with retail players in this space.

RESEARCH APPLICATIONS

US-based FinTech company Kensho (www.kensho.com) claims to have created the world's first computational knowledge engine for investment professionals. It combines machine learning, unstructured data engineering and powerful analytics that marry masses of financial data, such as stock prices and economic forecasts, with information about world events to produce unique insights and analyses. Instead of using traditional drop-down menus to select particular products, currencies or date ranges, Kensho's highly intuitive interface enables you to pose millions of different questions in plain English via Siri's speech search or IBM's Watson command-line search.

Let's take a scenario like a Fed rate hike, Chinese New Year or a natural disaster such as a hurricane. Kensho can rapidly assess which market sectors have performed better, for example, commodities or stocks, US or Europe and so forth. Within minutes it generates a detailed report of past performance trends – something that would take several analysts days or weeks to research through standard approaches. The benefits of this technology are already being utilised by big market players including Kensho backer and customer Goldman Sachs; JP Morgan; Bank of America; and CNBC, which regularly uses it for market commentary.

TRADING APPLICATIONS

Statistical analysis – once the domain of quants and mathematically inclined traders – used to be primarily focused on structured data. But given the rapid growth of unstructured data, big data technologies that use machine learning/artificial intelligence (AI) to filter and rationalise information offer game-changing potential.

Consider, for instance, the benefits of linking social media feeds into black-box style trading systems. In the past, several black-box trades were triggered erroneously when old news was broadcast by accident or a Twitter feed was hacked. For example, in April 2013, the Dow Jones dropped 143 points after a fake Associated Press tweet said the White House had been hit by two explosions1. The black-boxes had reacted purely to certain keywords on a few specific Twitter accounts. Fortunately, recent advances in machine learning and statistical analysis enable scrutiny of a much broader data set to verify the latest breaking news story or at least assign probabilistic indicators around it.

On the buy side, many hedge funds have been turning to more sophisticated mathematical models to drive their trading strategies, with several, such as Bridgewater, Two Sigma, Renaissance and Point72, investigating the benefits of big data. While naturally guarded about their plans, they hope the application of machine learning to masses of structured and unstructured data will give them an edge both over the previous generation of narrow quant models and, more importantly, the competition. The aim is not just to make research and analysis more efficient, but to design computer programs that can detect new patterns in the avalanche of data and generate new trade ideas.

Aidyia, a start-up based in Hong Kong, is mimicking evolutionary theory in its quest to optimise trading strategies. Another start-up, Sentient, is also using evolutionary computing to develop better models. A large set of predictive models are created by analysing various historic big data sets – including multi-language news feeds, exchange data, company accounts and macroeconomic indicators – to find potential correlations in key measures such as stock prices. These are constantly evaluated and the poor ones weeded out, while 'genes' from the successful models are used to seed the next population. Eventually, this produces a strong predictive model. Sentient claims that it now takes trading instructions on what stock to trade and when to enter/exit a position direct from its AI models. Crucially, because these self-learning systems are much more dynamic, they can adapt autonomously as markets evolve – faster than humans can.

1 The Telegraph ’Bogus’ AP tweet about explosion at the White House wipes billions off US markets, goo.gl/PhLT0s
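The 'weed out the poor models, reseed from the genes of the good ones' loop described above is essentially a genetic algorithm. The toy sketch below shows the shape of such a loop; it is not Aidyia's or Sentient's system, and backtest_score is a stand-in for evaluating a candidate model against historic data.

```python
# Toy evolutionary-selection loop: evaluate, keep the elite, crossover and mutate.
import random

def backtest_score(genes):
    """Placeholder fitness: replace with a real backtest over historic data."""
    return -sum((g - 0.5) ** 2 for g in genes)    # toy objective, optimum at 0.5

def evolve(pop_size=50, n_genes=8, generations=40, elite=10, mutation=0.1):
    population = [[random.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate every candidate model and weed out the poor performers.
        ranked = sorted(population, key=backtest_score, reverse=True)
        parents = ranked[:elite]
        # Seed the next population from the 'genes' of the successful models.
        children = []
        while len(children) < pop_size - elite:
            a, b = random.sample(parents, 2)
            child = [random.choice(pair) for pair in zip(a, b)]       # crossover
            child = [g + random.gauss(0, mutation) for g in child]    # mutation
            children.append(child)
        population = parents + children
    return max(population, key=backtest_score)

best = evolve()
print("best genes:", [round(g, 3) for g in best])
```

In a real system the 'genes' would encode model structure or trading parameters and the fitness function would be a full backtest, but the select-crossover-mutate cycle is the same.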

SOME CAUTIONARY CONSIDERATIONS

While some people say these technologies will give them a probabilistic edge over the previous generation of strategies, it is essential to understand that technology alone cannot fully predict the markets – not even in theory. Other detractors believe there is still too much hype to determine if these approaches have yet uncovered anything truly original and whether spin is playing a part in rebranding some existing approaches as smart or AI-enabled.

There is still a need for human judgment alongside technology-based approaches. One potential drawback of machine learning algorithms, for example, is to mistakenly spot patterns where none exist. This phenomenon, described as overfitting, occurs when a correlation is found between unrelated datasets. The example below on cheese consumption neatly illustrates this point.

[Chart: per capita cheese consumption correlated with the number of people who died by becoming tangled in their bedsheets, 2000–2009. Correlation: 94.71% (r = 0.947091). Data sources: U.S. Department of Agriculture and Centers for Disease Control & Prevention; chart: tylervigen.com]

Figure 1. The overfitting phenomenon in action.
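As a quick illustration of how easily two unrelated, trending series can produce a near-perfect correlation, the snippet below computes a Pearson coefficient on made-up numbers shaped like the chart above (the real series are on tylervigen.com).

```python
# Spurious correlation demo: two unrelated but upward-trending series.
# The numbers are illustrative only, not the actual USDA/CDC figures.
import numpy as np

cheese_lbs = np.array([29.8, 30.1, 30.5, 30.6, 31.3, 31.7, 32.6, 33.1, 32.7, 32.8])
bedsheet_deaths = np.array([327, 456, 509, 497, 596, 573, 661, 741, 809, 717])

r = np.corrcoef(cheese_lbs, bedsheet_deaths)[0, 1]
print(f"Pearson r = {r:.3f}")   # both series trend upwards, so r comes out near 1
```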

AN EVOLVING TECHNOLOGY LANDSCAPE

It's important to note that the ability to probe big data has only been possible thanks to the dramatic technology advances and enormous growth in low-cost cloud infrastructure, combined with the rapid rise of scalable, distributed, parallel computing. These have transformed the ability to carry out tasks with an ease and speed that were unfeasible a decade ago.

Take the case of Ufora, a specialist distributed computing start-up that recently made its software open source. Following the 2008 global financial crisis, Ufora's founder had a painful experience while carefully re-factoring large amounts of 'infrastructure-optimised' code to implement new models. This inspired him to set up Ufora and create a solution to produce more efficient code while enabling the code to be easily modified without impacting speed.

Ufora's platform can take in Python code and automatically parallelise it, then intelligently distribute the threads across a given cluster of local or Amazon Web Services (AWS) machines – effectively multi-threading it. It uses machine learning to understand how best to allocate and manage resources while the program is running, so can handle machine failures gracefully. One headline figure that demonstrates the power of this approach was when a conjugate gradient algorithm (iterative, numerical method) was applied to a matrix with one trillion elements. Within 45 minutes it had delivered a solution by leveraging Amazon Spot instances – essentially splitting the compute job up across 500-1000 cores – for a cost of just $10.
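For reference, the conjugate gradient method mentioned above is a short algorithm in itself; what Ufora adds is the automatic parallelisation and distribution. The sketch below is a plain single-machine NumPy version of the algorithm, not Ufora's implementation, run on a deliberately small test system.

```python
# Conjugate gradient for A x = b, with A symmetric positive definite.
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=None):
    n = len(b)
    max_iter = max_iter or 10 * n
    x = np.zeros(n)
    r = b - A @ x                      # residual
    p = r.copy()                       # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:      # converged
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Small, well-conditioned test system (illustrative only).
rng = np.random.default_rng(0)
M = rng.standard_normal((200, 200))
A = M @ M.T + 200 * np.eye(200)        # symmetric positive definite
b = rng.standard_normal(200)
x = conjugate_gradient(A, b)
print(np.allclose(A @ x, b, atol=1e-6))
```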

EXPLOIT THE TRUE POWER OF BIG DATA

The evolution of technology will continue to transform a range of industries, including financial services. Falling infrastructure costs and growing availability of mature software – including open source – are making the tools to manage big data far more efficient and affordable. Companies that still see IT purely as a support function to the core business will be left behind. By contrast, firms that invest in big data innovation and develop a clear information strategy aligned to their overall business vision will be in a prime position to exploit the full potential of data as an increasingly important commodity.

KEY TAKE-OUTS IN BRIEF

Many banks and hedge funds are trying to use AI/machine learning as part of their big data strategy.

The cost of big data toolsets is decreasing; even open source tech provides some useful self-learning algorithms.

Financial services firms need to develop a clear information strategy to keep pace with the competition.

Neil Avery

The latest web-based notebooks are considered by some to be revolutionary and a true sign of maturity in the big data industry. By closing the gap between analytics, visualisation and data agility, they create a powerful platform for rapid application development that can leverage many new business benefits.

Optimising agility has been a long-running technology goal: the ability to react fast and respond instantly to change is a key driver of competitive advantage. So, with the latest generation of notebooks coming on stream, the legacy notion that big data is slow and batch-oriented is being turned on its head. Using interactive notebooks as a platform for rapid application development (RAD) is setting a new benchmark for agile data analytics. And, for the financial services sector, they combine the trusted flexibility of Excel with the power of big data.

First generation notebooks offered an interactive computational environment, in which you can combine code execution, rich text, mathematics, visualisation, rich media and collaboration. They proved very useful at the time in fulfilling the needs of reproducible research. However, because they run on a single desktop, with restricted storage and power, their use beyond academic circles has been limited.

By contrast, the new generation notebooks are browser-based and utilise big data technology. As a result, they provide an agile, interactive experience similar to Excel but with the benefits of a visualisation layer that leverages analytics to process terabytes of data. The level of interaction enables real-time, ad hoc analysis which leads to greater business agility.
BLENDING AGILITY, FLEXIBILITY AND POWER

The interactive notebook platform is achieved by providing a glue that combines the different technologies to create a rich, integrated, user-oriented environment that enables seamless collaboration. A layer on top of Spark – called SparkSQL – provides a user-friendly interface that allows expressions to be presented in familiar SQL-type syntax.

Agility is achieved by leveraging the power of the latest HTML5 runtime found within compatible browser platforms. Rich HTML features support inline script-editing and real-time rendering using SVG graphics and powerful charting libraries such as D3 for dynamic data visualisation.

Data flexibility is gained through the use of Spark, which can combine a variety of sources – including big data, NoSQL storage layers and files – and then execute Spark jobs on demand. Spark also integrates with cloud platforms, the entire Hadoop stack and every horizontally scaling data platform currently on the market.

Power is again attributed to Spark and the data platform, with the ability to execute distributed data-shuffling via MapReduce across thousands of nodes and process terabytes of data.

BENEFITS FOR FINANCIAL SERVICES

Traders, quants and other data-heavy roles within financial services can benefit from the notebook platform. For example, what-if analysis can easily be overlaid and executed on-demand to provide immediate feedback. Backtesting, analytic validation and other regular functions that may require a formal release to production can also now be done – as and when needed – and the results viewed simply by visiting a webpage. Other types of post-analytics are also possible. The open nature of the platform makes a plethora of options accessible to anyone with sufficient domain knowledge to derive data insights that were never previously available.
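As a rough illustration of the SparkSQL-style interaction described above, the snippet below shows the kind of code a notebook cell might contain. The trades.csv file, its columns and the desk-level aggregation are hypothetical.

```python
# A minimal PySpark sketch of SQL-over-big-data, as it might appear in a notebook cell.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("notebook-what-if").getOrCreate()

trades = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/data/trades.csv"))          # hypothetical dataset
trades.createOrReplaceTempView("trades")

# Familiar SQL syntax executed as a distributed Spark job.
spark.sql("""
    SELECT desk, SUM(notional) AS total_notional
    FROM trades
    WHERE trade_date >= '2016-01-01'
    GROUP BY desk
    ORDER BY total_notional DESC
""").show()
```

In Zeppelin or Databricks the same query can be bound to the built-in charting, so what-if parameters can be changed and the results re-rendered without any formal release.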

POPULAR INTERACTIVE NOTEBOOKS

Currently, the three most popular notebook platforms are Apache Zeppelin, Databricks (SaaS) and Jupyter (IPython), which all have many potential use cases:

Facilitate adoption of agile development
Enable data visualisation
Analyse what-if and ad hoc scenarios
Prototype new applications
Support an ETL pipeline
De-risk big and fast data project deliveries.

APACHE ZEPPELIN

Apache Zeppelin is an Apache incubation project with Spark functionality at its core. With its multiple language back-end, Zeppelin is language-agnostic, so will likely attract more interpreters because they are easy to add via the API provided. Zeppelin's built-in interpreters include Spark, SQL, Scala, Shell, Markup, Cassandra and many others. While it is very similar to Jupyter/IPython, the user interface is more appealing and it supports a richer range of interpreters, as you can observe in Figure 1 below.

Figure 1. Apache Zeppelin provides a rich web environment to enable interactive data analytics.

DATABRICKS NOTEBOOK

The Databricks notebook, illustrated in Figure 2 on the following page, is broadly similar to Zeppelin, however, being available as a cloud-based SaaS pay-per-use model (hosted by Databricks), it delivers all the benefits of lower up-front costs. Databricks, which was founded by the creators of Apache Spark, is continuously pushing the forefront of data analytics, which ensures its platform offers the most up-to-date Spark features.

Figure 2. The cloud-based Databricks notebook offers the latest Spark functionality.

JUPYTER NOTEBOOK

Promoting itself as offering open source, interactive data science and scientific computing across over 40 programming languages, the Jupyter/IPython notebook is well funded and very popular in academic circles. Its growing list of academic backers includes the University of California, Berkeley and California State Polytechnic. In 2015 the Jupyter project received a $6 million funding boost to extend its capabilities as a tool for collaborative data science, creation of interactive dashboards and rich documentation workflows.

NEXT STEPS FOR NOTEBOOKS

To conclude, the potential of web-based notebooks to support rapid application development is as revolutionary as Apache Spark itself. Many users already regard Zeppelin and similar tools as a data science studio. In practice, we see the power of notebooks as having an exciting future that will give businesses access to novel benefits that have not been available until recently.

Closing the gap between analytics, visualisation and data agility provides tight feedback cycles that can be leveraged to create many unique commercial advantages. Removing the layers of business analysis and data modelling means that analysts with complete domain knowledge can now gain rich insights and utilise notebook power that ultimately combines the flexibility of Excel with big data. Whatever comes next in notebook innovation, it seems clear that they have a big future as a dynamic platform for big data application development.

Many thanks to Deenar Toraskar from ThinkReactive for his contribution to this article.

Thomas Ellis | Ivan Cikic

Here at Excelian we are seeing more and more large financial institutions turn to data lakes to provide the centralised, malleable repository of data needed to react swiftly and cost-effectively to ever-changing regulatory and market demands. However, it's not always plain sailing, so here we've flagged some of the key issues to consider before you dive in.

Data warehouses, with their rigid data storage structures, have tended to be slow to adapt to the new world order. Data lakes and associated big data technologies, on the other hand, offer an extremely powerful and agile tool that can deliver value fast while reducing cost. But although they offer great potential, difficulties in implementation and delivery can easily lead to a data lake becoming a data swamp. To help you avoid getting bogged down when planning your own data lake, we've pooled our experience to create a few top tips designed to keep you afloat.

RULE 1
ENGAGE STAKEHOLDERS EARLY

Paramount to any good agile delivery is engagement with product owners and business stakeholders. At the outset, try to establish buy-in from the individuals within the business directly responsible for essential parts of the project. Making these stakeholders directly available to your implementation team will ensure prompt answers and a more accurate understanding of the requirements. Ideally, actions required to complete the enquiries from the team should be monitored and expedited.

Key stakeholders will likely include:

Product owner – the individual who develops the project vision and can direct it

Data governance guardian – the person who understands the structure of data in the enterprise and how it flows

Infrastructure expert – an individual (preferably embedded with the team) responsible for managing the lake's infrastructure, ie its environments, hosts, installation and customised distributions

Security – a business-facing individual who can develop requirements around security and user access at a global level

End users – representatives from the teams likely to use the lake, whether supplying or extracting data to/from it.

RULE 2
SET CLEAR GUIDELINES

Requirements for data lake implementations are extremely fluid, so, when it comes to project governance, we strongly advise running the project in an agile fashion. That means each iteration planning session will involve defining a minimum set of features to implement, deliver, integrate and test for exposure to end users. A project like this will inevitably require wide integration across the organisation. It's therefore extremely important to establish high-level guidelines early on, along with early demonstrations of functionality and integration capability with end users.

Initially, work through each iteration with 'just enough' design up front, ideally using whiteboarding sessions with the entire development team. In this rapidly growing field, technology is constantly evolving, so it's important to include all the different perspectives to justify why certain decisions were made.

Any elaboration of the high-level requirements made by the development team should be validated with the wider group of stakeholders at the earliest opportunity, so refinements can be made when it is still manageable to do so.

And, while it may be tempting to migrate existing functionality directly onto the lake, caution is advised. Once you have established a known set of requirements, it's worth first taking a fresh perspective to ensure that the design can use the features of the new environment.

RULE 3
ADOPT A PHASED APPROACH

Consider running projects in a phased approach to continually build on the previous stages. Outlined below is a typical five-phase pattern that works well for us.

PHASE I
PROJECT INITIATION

This includes all the tasks/set-up required before writing the first line of production code.

Select and install your preferred big data distribution technology (eg Cloudera/Hortonworks/MapR) in your test environments and configure it as closely as possible to your target production environment.

Secure the test environments to the standards required – including configuration of Kerberos and synchronisation with any enterprise user directories, eg LDAP/Active Directory.

If the infrastructure for your test environments is unavailable, consider provisioning cloud instances and use these as systems integration test environments until your internal infrastructure is ready. Should test data confidentiality be a concern, anonymise or generate it.

Develop/configure a sandbox virtual machine (VM) instance that developers can run on their workstations. This should be a like-for-like copy of your test/target environment, including security – scaled down only to a single host.

Obtain or develop a suite of generic tools to automate deployment of lake artefacts and deliverables to target environments, including developers' sandbox VMs. These tools should use environment-based configuration to distribute and configure their deployments.

Obtain or develop a suite of integration testing tools that make it easier for developers to build acceptance and integration tests against a target environment. These tools should interact with your targeted big data technologies (eg HDFS/HBase/Hive) and make it easy to set them up and fill them with test data, as well as to tear them down. The integration tools should utilise the developed deployment tools at the earliest opportunity to ensure adequate and continuous testing.

Install and configure build servers to perform continuous delivery and static code analysis. The build process should:

Build and package the application and install it in a centralised artefact repository, running all unit tests

Raise any static analysis issues

Deploy to a development test environment and run integration and acceptance tests

On a successful run of the integration and acceptance tests, deploy to other systems integration test environments. This can either be automated or configured to provide push-button deployment to other environments.

Codify and document development standards and best practices, using IDE plug-ins to enforce them, eg SonarLint, Checkstyle.

PHASE II
INGESTION

In this phase, you will develop the interfaces required to obtain data from external systems and sources. It is likely that the end users' analytics and extraction requirements are still being determined, so look to simply accept and retain raw data in an immutable, machine-readable format. Storing raw data this way provides options when you need to rerun analytics, present lake content, generate metadata and manage disaster recovery. Ensure that any processing done on ingested data is idempotent; then, should the need arise, that data can be re-ingested as necessary.

As data starts to flow into the lake, it is important to generate and retain relevant metadata – including items such as lineage, type, any business-specific tags and validity. This metadata will be used in a variety of ways, including security and data discovery.

PHASE III
DATA DISCOVERY AND SELF SERVICE

Once data is present in the lake, you need to make it available to potential end users. Initially, these will be power users and data scientists with the security clearance to interrogate raw data and derive analytic and extraction use cases from it. There are a number of options to facilitate this.

Use your distribution's data governance tool, for example, Cloudera Navigator or Apache Atlas. End users can use these to explore the data on offer and establish uses for it.

Employ a third-party tool such as Waterline or Tableau. Third parties are producing some very interesting tools that interact with big data technologies to provide a rich data governance toolset.

Provide power users with notebook access to search and analyse data on the lake. Apache Zeppelin, Hue and Spark notebooks enable advanced users to write snippets of code that directly interact with and display data.

Build a custom solution – for instance, a simple user interface to effectively display your data catalogue and related metadata can be more than enough to interest end users.

Whatever data discovery route you opt for, be vigilant about security by ensuring it is sufficiently and appropriately locked down to enable only authorised access.

PHASE IV
ANALYTICS AND AGGREGATIONS

As end users' analytics or data aggregation requirements become more refined, you can begin to develop them against the lake. Responsibility for developing analytics can be handled by either the data lake project team or end users themselves. If allowing end users to develop their own analytic solution, it's important that they have the same like-for-like sandbox VM environment as the data lake developers and enough test data to develop their solution. Consider having data lake project team members 'consult' with these end users to ensure they're following best practice and their solution will be compatible with the lake. Ensure early integration with a test environment and that their analytics solution is segregated using its own application user, with appropriate authorisation rules in place.

When the data lake project team develops its own solution, constant communication with the end user is vital to validate and test what is being produced.

PHASE V
EXTRACTION

This phase, which can run concurrently with analytics and aggregation, is used to develop storage of any analytics or aggregation outputs in the lake using the most appropriate storage technology. Again, make this available to the end user by utilising an appropriate technology: for a large batch file output, for example, you could push to HDFS. Alternatively, for key value outputs with fairly low latency access patterns, store in HBase – and have clients pull using REST endpoints – or push data out onto a message bus.

As this data will probably be valuable to other end users, ensure that it is also published to your data discovery utilities, is suitably structured and that relevant security policies are in place.

RULE 4
SAFEGUARD SECURITY

Never underestimate the importance of security – and keep it front of mind at all times. It is imperative to secure your cluster using Kerberos. Due to its fiddly nature, all developers and support staff should become acquainted with it because it will be used to secure all environments, including the developers' sandbox VMs.

You can create a security choke point by surrounding your cluster with a firewall and providing REST endpoint access via Knox. This will greatly reduce your attack surface while simplifying identity assertion and external auditing of communication with the lake.

Each big data technology will come with its own authorisation tooling. Tools like Apache Ranger and Apache Sentry can manage this centrally for you. Of the two, Ranger has more features, including plug-ins for centralised authorisation and audit for most of the big data technologies. Sentry has similar features with a much reduced scope, but is under active development. If your data is especially sensitive, consider using SSL communication between services and additional encryption for data at rest.

RULE 5
PUT MONITORING IN PLACE

At the same time as developing your components, you'll also need to build corresponding metrics. Developers are best placed to incorporate metrics while the components are being built, for example, providing JMX MBeans and publishing them using Jolokia, and logging to an appropriate log facility, for example SLF4J and Log4j 2. Distributors also offer impressive tooling functionality which should be utilised effectively. In conjunction with this, consider employing log/metric aggregation tools like Logscape, the ELK stack or Splunk.

RULE 6
PLAN FOR DISASTER RECOVERY

It's vital to plan for business continuity, should your data lake be compromised. The data lake will largely become a repository of data with analytics running across it, so consider running another hot active site in parallel. All data is channelled to and ingested at both sites, and likewise all analytics and extractions also run synchronously. In addition, consider developing the capability to clean up state and replay. The process of ingesting raw data in the lake is idempotent, so in the event of a problem at a site, you can run a job to clean up the state of the problem, confident that once it is resolved, the data can be replayed to generate the correct state.
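The idempotent raw-data ingestion relied on in Phase II and Rule 6 above can be sketched as follows: each payload is written to a deterministic location derived from its source, business date and content hash, so replaying a feed is a safe no-op. The paths, function name and metadata fields are illustrative, and a real lake would write to HDFS rather than the local filesystem.

```python
# Minimal sketch of idempotent raw-data ingestion with lineage metadata.
import hashlib
import json
from pathlib import Path

LANDING_ZONE = Path("lake/raw")          # placeholder for an HDFS landing area

def ingest_raw(source: str, business_date: str, payload: bytes) -> Path:
    digest = hashlib.sha256(payload).hexdigest()
    target_dir = LANDING_ZONE / source / business_date
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{digest}.raw"
    if not target.exists():              # re-ingesting the same payload is a no-op
        target.write_bytes(payload)
        # Retain lineage metadata alongside the immutable raw record.
        meta = {"source": source, "business_date": business_date, "sha256": digest}
        (target_dir / f"{digest}.meta.json").write_text(json.dumps(meta))
    return target

ingest_raw("rates-feed", "2016-06-30", b'{"curve": "AUD-BBSW", "3M": 1.9875}')
```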

RULE 7
COMMIT TO BEST PRACTICE

Big data technologies are in a constant state of flux and innovation. There's fierce competition among distributors and vendors to see their tools adopted as the industry standard, so you'll often see a high rate of change in preferred tooling, methodologies and best practice. With this in mind, it's imperative that your developers stay at the top of their game. Consider creating opportunities for them to learn and share their knowledge and experience in ways that make their work more fulfilling and enjoyable.

These could include, for example:

Lightning talks, lunch-and-learn sessions and technology roundtables with walk-throughs of developed components

Team members running training labs for colleagues and users within the organisation to introduce them to new technologies

Sending members to conferences and having them report back with their findings and views on hot trends in the sector

Encouraging developers to contribute to open source solutions.

GET SET TO DIVE IN

As you'll see, there's a lot to think about when planning a data lake implementation. It's a long article but we make no apology. In fact, it barely gets below the surface of the many crucial elements required for a successful data lake implementation. In summary, these are:

Engage stakeholders early and identify key contacts in relevant teams to radically reduce decision-making time

Adopt agile methodology and 'just enough' high-level design to ensure sound project governance

Take a five-phase approach choosing from commonly used and emerging tools to best suit your needs

Put security first from the outset – it is a cross-cutting concern that is often addressed too late

Factor non-functional needs into your planning, eg disaster recovery, monitoring and best developer practices.

Of course, much of this may seem obvious, but it is often the obvious that gets overlooked and that's when tech projects can start to flounder – especially in such a fast-moving, dynamic and exciting area where things are changing rapidly. However, guided by these key principles, you'll be much better equipped to keep your head above water and ensure that your next data lake implementation goes swimmingly.

Vasiliy Suvorov

Over the last 18 months we have been avidly following developments in the blockchain space. For us and for many others, it began with an understanding of Bitcoin, the original blockchain model. With blockchain now firmly established as one of the most dynamic and disruptive forces in the tech arena, this month we are launching our Enterprise Blockchain Accelerator program. We hope you'll join us on what promises to be an exciting journey…

Although the elusive Satoshi Nakamoto, Bitcoin's anonymous creator, did not use the term blockchain in his paper, his underlying distributed, shared and decentralised ledger technology was truly revolutionary and formed the basis for the original blockchain. As an add-only database shared and simultaneously maintained by the so-called 'miners', it successfully enabled the world's first fully digital, decentralised currency that is not administrated by any bank or government.

A BRIEF HISTORY OF BLOCKCHAIN

Since the first 'genesis' block was placed on the Bitcoin network in 2009, the world has seen roughly three waves of interest and investments in Bitcoin and, increasingly in blockchain, its enabling fabric.

Wave 1: The first wave was all about understanding the Bitcoin itself.

Wave 2: The second was about discovering the blockchain and building alternative 'coins' – various asset tracking and exchange mechanisms that leveraged the unique characteristics of the Bitcoin protocol and its mining network.

Wave 3: Once the limitations of the original design were understood, the arrival of the third wave brought next-generation blockchain designs based on 'smart contracts' – code snippets that are tightly integrated with a blockchain so that participants can add arbitrary business logic to each transaction. These next-generation designs also introduced newer network-wide synchronisation mechanisms also known as 'consensus'.

THE FOURTH WAVE

Fast forward to the present and we are witnessing a fourth wave of blockchain investment. With it the unique characteristics of blockchains or, more broadly, distributed ledger technology (DLT), are being applied to real-world business cases. Today, business and technology opportunities to deploy blockchain/DLT solutions are being actively pursued by organisations from the open source community, industry and government-led alliances and associations, including industry incumbents like the Depository Trust & Clearing Corporation (DTCC).

It's quite remarkable that technology which was invented to disrupt banking and money management as we know it, has been embraced by the financial services sector and, to a certain degree, the financial regulators – the very same people it was supposed to displace. In fact, the successive waves of interest described above were actually driven by the business leaders, a rare phenomenon for such a complex emerging technology.

The later designs ensured that data synchronisation across the whole network was not based on energy-inefficient mining or, more formally, Bitcoin's Proof-of-Work based process. In the meantime, innovation continued around transaction scalability, data confidentiality, identity management and interoperability.

A WORLD OF OPPORTUNITIES

Along with many smart individuals and visionary companies, we see amazing potential for this technology. When applied correctly, it offers enormous scope to reduce costs, time and risk in financial markets. It could also enable truly innovative internet of things (IoT) applications, open up markets to new opportunities and drive new efficiencies by minimising or, in some cases, completely eliminating the need for intermediaries.

It's important, however, to stress that a blockchain/DLT approach is not a universal solution. Indeed, many applications will be better served by other well-established technologies, such as a NoSQL or relational database, micro-services architecture, business process management (BPM) solution or enterprise service bus (ESB).

Blockchain is undergoing a frenzy of innovation. In some ways, it's a bit like the Cambrian Explosion – the short but highly significant period in the history of evolution when most of the major animal groups appeared. New blockchain/DLT variants are constantly emerging to fill every potential market niche, ranging from libertarian crypto-currencies to highly secure permission-based applications for private financial markets.

WHAT'S AROUND THE BLOCK?

It's very likely that all of these applications and related blockchains will start to appear on the market as soon as 2017. Major corporate players are already announcing plans. It is also likely that between now and 2020, business model pivots, lessons learned and government regulation will drive standardisation and convergence. The early adopters are likely to be in the finance and logistics markets, followed by insurance and pharma. We also expect the internet of things to drive demand across all sectors, which will further accelerate the adoption of blockchain.

Looking further ahead to the 2020s, blockchain/DLT will start to become a mainstream IT component. Consequently, we can expect that many future systems will be built around integration of a handful of interoperable blockchains. Some of these will be public and decentralised, such as Ethereum, and some will be run by international consortia. Of course, there will also be completely private blockchain/DLTs as standard, but these will mostly augment or even replace existing database technologies to meet specific intra-company needs.

MAPPING THE ROUTE AHEAD

Our friends at Consult Hyperion have proposed a layered, modular approach to blockchain/DLT design study. This approach makes it easy to understand, or make, design trade-offs and perfectly match the resulting architecture to a specific set of use cases:

SHARED LEDGER APPLICATIONS

CONTRACT – How can we animate the immutable record so that events trigger actions?

SHARED LEDGER

CONSENSUS – How do we agree the immutable record of transactions?

CONTENT – What kind of assets will be in the transactions?

COMMUNICATIONS – Which entities create and propagate transactions?

Figure 1. Consult Hyperion 4x4 model of the shared ledger technology.

Navigating this new landscape and the road ahead is not easy. We strongly believe that our customers will benefit from hype-free, independent advice based on a solid understanding of their business needs and processes combined with the expertise to select the best technology.

To meet this requirement, we are delighted to announce the launch of our Enterprise Blockchain Accelerator program. We'll share a range of resources, including technical articles in this and future issues of TechSpark, blog posts and invitations to join an upcoming series of in-depth webinars on blockchain/DLTs.

With so much more to say on the subject, we'll keep you informed with a range of insights into both the technical and business aspects of blockchains/DLTs. In the next issue, for example, I'll take a closer look at what the future looks like for industry adoption and related challenges.

On page 32 of this issue, you can read the first technical feature in our new series, written by my colleague James Bowkett: 'Blockchain and graph, greater than the sum of their hype?' In the meantime, we look forward to your company as we move forward on the blockchain journey.

For more information on our Enterprise Blockchain Accelerator program, please contact James Bowkett ([email protected]) or Neil Avery ([email protected]).

James Bowkett

The buzz about blockchain technology is hard to ignore: it is mentioned in every other finance feature and is a standard item at any Fintech conference. In effect, its data structure is an encrypted temporal linked list of transactions; however, linked lists aren't appropriate for random access. So we decided to integrate one much-hyped technology with one enjoying something of a renaissance – graph databases – to see if the whole adds up to more than the sum of its parts.

Integrating the two approaches creates a platform for immutable, confirmed transactions, with an appropriate fast index that allows you to report on the data within. In a standalone blockchain application you can only search sequentially for an asset or transaction. But leveraging graph enables you to easily track the source of assets held on the blockchain. This has many use cases, notably around checking authenticity, fraud prevention and detecting money laundering.

Graph is a good fit for transactional data because it allows you to model, query and visualise relationships between entities, in addition to the entities themselves. Such relationships may also contain attributes which can be used for later querying. Contrast this with a relational or document store where the relationships are often modelled as entities in a similar way to the entities themselves. Although this latter approach can and does work, graph allows for a more natural way to think about and model transactional data.

This article is an abridged version of a series of Excelian blog posts where you can find details about the code and more on the approach we used.

BRINGING TO LIFE A BLOCKCHAIN-GRAPH MODEL

To explore in more detail how blockchain and graph work together in practice, we devised a prototype equity share issue using blockchain events to persist sell trades to graph database Neo4j. The architecture of this example is as follows:

A node.js app uses web3.js, which hooks into geth.exe, to register for events; geth.exe sends events from the Ethereum testnet blockchain, and the app inserts the results into Neo4j. Ethereum-Wallet.exe deploys the contract onto the blockchain and facilitates transactions.

Having installed the necessary platforms – node.js, Neo4j, geth, Ethereum (and web3) – it is possible to write and deploy smart contracts written in Solidity, Ethereum’s smart contract language. Solidity is reminiscent of Go (Golang), but without the rich choice of APIs – for good reasons, as we will see. The contract is then deployed to the Ethereum network and run on every miner node within the Ethereum virtual machine – hence the restricted APIs – and the immutable record of ownership is created.

Using the Ethereum wallet, the following smart contract is deployed:

contract ShareClass {
    string public name;
    uint8 public decimals;
    string public symbol;
    string public isin;
    string public description;

    /* the register of how many shares are owned by which account */
    mapping (address => uint256) public balanceOf;

    /* Generates an event on the blockchain to notify clients */
    event Transfer(address indexed from, address indexed to, uint256 value);

    /* Initializes the shareclass for the instrument with initial
       supply of all equity assigned to the issuer */
    function ShareClass(uint256 initialSupply,
                        string tokenName,
                        string isinId,
                        string desc,
                        uint8 decimalUnits,
                        string tokenSymbol) {
        // Give the creator all equities
        balanceOf[msg.sender] = initialSupply;
        // Set the name for display purposes
        name = tokenName;
        // Amount of decimals for display purposes
        decimals = decimalUnits;
        // Set the symbol for display purposes
        symbol = tokenSymbol;
        isin = isinId;
        description = desc;
    }

    function transfer(address recipient, uint256 quantity) {
        ensureSenderHasEnough(quantity);
        balanceOf[msg.sender] -= quantity;
        balanceOf[recipient] += quantity;
        // Notify of transfer event:
        Transfer(msg.sender, recipient, quantity);
    }

    function ensureSenderHasEnough(uint256 quantity) private {
        if (balanceOf[msg.sender] < quantity) throw;
    }
}

Analogous to a new share issue, the contract holds the register of how many shares in this issue are contained in each Ethereum wallet, plus information about the instrument itself, such as its ISIN. Each time the contract is traded, it will emit an event notification to all its listeners, which can be consumed using the web3 API in a node application. In this example, the transaction details will be stored in Neo4j, using the following Cypher code.

MATCH (owner:Account),(buyer:Account)
WHERE owner.address = '${ownerAddress}'
  AND buyer.address = '${buyerAddress}'
CREATE (owner)-[
  :SOLD_TO { amount: ${amount}, tstamp: timestamp() }
]->(buyer)

This code finds the owner and buyer accounts (nodes) in Neo4j, then creates a new ‘sold_to’ relationship in the database.
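In the prototype this write is issued from the node.js app inside the web3 event callback. Purely as an illustration of the same write path from a JVM component – the connection details, credentials and variable names here are assumptions, not part of the prototype – the Neo4j Bolt Java driver could execute the statement as follows:

import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Session;
import org.neo4j.driver.v1.Values;

public class SoldToWriter {
    public static void recordSale(String ownerAddress, String buyerAddress, long amount) {
        // Assumed local Neo4j instance with the Bolt protocol enabled
        Driver driver = GraphDatabase.driver("bolt://localhost:7687", AuthTokens.basic("neo4j", "password"));
        try (Session session = driver.session()) {
            session.run(
                "MATCH (owner:Account),(buyer:Account) "
                + "WHERE owner.address = {ownerAddress} AND buyer.address = {buyerAddress} "
                + "CREATE (owner)-[:SOLD_TO { amount: {amount}, tstamp: timestamp() }]->(buyer)",
                Values.parameters("ownerAddress", ownerAddress,
                                  "buyerAddress", buyerAddress,
                                  "amount", amount));
        } finally {
            driver.close();
        }
    }
}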

Without any additional code or configuration, Neo4j then provides a visualisation of the resulting graph of accounts and SOLD_TO relationships.

FUTURE POSSIBILITIES AND USE CASES

Our blockchain-graph equity share issue prototype, as described above, could be further extended and integrated with an identity or know-your-customer (KYC) service. This would make it easier to look up and identify each account holder for sending out annual shareholder reports or dividend payments. Because shareholders could manage their own identity and details for their different holdings in one place, the share issuer would be relieved of the burden of maintaining its own shareholder register.

In the meantime, it's clear that blockchain and its system of trust is enabling existing applications to work faster and more securely. It is also facilitating new distributed applications. For instance, Honduras has worked with Factom to build a blockchain-based land registry. This will deliver a secure, reliable and verifiable land registry. Although the project is still in its prototype stage, it highlights the value of the trust inherent in blockchain applications, especially in use cases where there is no pre-existing application.

In another scenario, we have identified a strong use case for fraud visualisation. With a few lines of Cypher, you can visualise centres of trading activity for targeted investigation – a functionality that is available out of the box. This makes graph, and Neo4j in particular, a compelling choice for data visualisation.

While blockchain offers undeniable benefits, legislation has a way to go to catch up with the pace of technical innovation in this area. For instance, the laws around property rights still need to be addressed, as does the issue of how to reverse an irreversible, immutable transaction, if it was in dispute. However, we can certainly expect the appropriate law changes to come in time.

SUMMING UP

Looking beyond the hype, while there are compelling use cases for blockchain (or an immutable ledger) on its own, we see its real value being realised when used as a complementary technology with other storage techniques such as big data. As illustrated in our prototype, using blockchain in combination with graph can provide visualisation by modelling transactions in a form that best fits its transactional data model – so in turn creating additional value. In other words, the output could indeed add up to more than the sum of the parts.

Deenar Toraskar
Neil Avery

With banks investing ever more resources to meet increasing regulatory reporting demands, Spark's big data processing methodology offers compelling evidence of a dramatically different approach that could reduce the reporting burden while driving new operational efficiencies.

Since 2008, FRTB, CCAR and BASEL requirements have inundated teams with untenable workloads and up to two-year backlogs. At the same time banks also need to prepare for more reporting changes and the anticipated switch from the long-established Value-at-Risk (VaR) reporting measure to Expected Shortfall (ES). Despite this, many teams still rely on traditional ways to manage and warehouse their data. Big data, and more recently Spark, have come to prominence with analytic use cases that map onto traditional problems, but solve them in a very different way. Spark's methodology provides a flexible and powerful processing stack that could transform the standard approach that many institutions are currently using. When applied to VaR it becomes a simple workflow of Spark tasks selecting, joining and filtering data, as we will explain.

WHAT IS VALUE AT RISK (VAR)?

Of all the regulatory reporting measures that need to be managed, for almost 20 years VaR has been the most widely adopted risk measure. It is used for risk management, margin calculations, regulatory reporting, capital charges and pre-trade decision making. If you have a trading account or a betting account, your broker or bookmaker is likely to use VaR to calculate margin calls. VaR is used for hedge optimisation, portfolio construction and to optimise the tracking error of a portfolio against a recognised benchmark. You can also use VaR to help make risk versus return trade-off decisions when managing the portfolio of assets in your pension fund.

VaR is equal to predicted worst loss over a target horizon within a given confidence interval. For example, if you hold $1000 worth of GOOG shares, one day VaR with a 95% confidence level is $22.

The higher the confidence level, the greater the VaR – 1 day VaR for $1000 in GOOG at a 99% confidence level is $31

The longer the time horizon, the greater the VaR – 10 day 95% VaR for $1000 in GOOG is $60

The more volatile the asset, the greater the VaR – 1 day 95% VaR for $1000 of TWTR shares is $41

VAR REPORTING CHALLENGES

VaR is used in many contexts, with many different VaR users in a typical enterprise, each with varied reporting needs, as this summary of a typical set of VaR users illustrates.

VaR view – Use

VaR by counterparty – Margining

VaR by trader – Trader P&L

VaR by desk – Desk head

VaR by legal entity – Finance/regulator

Contributory VaR/VaR drivers – All/what drove VaR?

FINANCIAL SERVICES STREAMING AND INTRODUCTION BIG DATA USE CASES BLOCKCHAIN PERFORMANCE TECH SPARK, H2 2016 | 37 It is important to note that VaR is not simply a linear LIMITATIONS OF TRADITIONAL DATA measure. For instance, the VaR of a portfolio containing WAREHOUSES assets A and B does not equal the sum of the VaR of While traditional data warehouses can be great for asset A plus the VaR of asset B, as seen in the example reporting simple information that can be aggregated below. in a linear way, they have their limitations: VaR ($1000 GOOG + $1000 TWTR) != VaR($1000 Reports are shallow schemas with limited analytical GOOG) + VaR($1000 TWTR) capabilities Therefore, SQL and traditional warehouses have Reporting is based on standard slice and dice limited utility when the information that is being operations and simple aggregation functions reported cannot be aggregated in a linear way. As well as VaR, many other important risk measures Limited support is available for non-linear or semi- such as counterparty credit risk fall into this category. structured data such as vectors, matrices, maps and Typically, risk reporting warehouses pre-aggregate nested structures VaR by all frequently used dimensions and use the Schemas are fixed, so new analytics and aggregations pre-aggregated values to report VaR. This helps to an require new views and schema changes extent, but views are limited and fixed. To calculate VaR along any other dimension, or for a custom set of There is limited support to run user-defined functions assets, users have to run a new VaR aggregation job on or to call out to external analytical libraries, leading the analytics engine and wait for the results to load in to a limited set of analytics being pre-aggregated. the data warehouse again.

THE DAWN OF HIGH-DEFINITION DATA WAREHOUSING

Big data technologies such as Spark SQL, Impala and Hive can be combined with serialisation formats such as Avro, Thrift, Protocol Buffers, Hive and HDFS, to build a smart, high-definition, adaptive, high-performance data warehouse.

Smart thanks to embedded custom analytics for aggregation and reporting that use Spark SQL and Hive user defined functions (UDFs). UDFs allow implementation of a domain specific language (DSL) on the data warehouse by extending SQL. UDFs can also call up external analytical libraries

High definition due to the ability of persistence formats such as Avro, Thrift, Protocol Buffers and Hive Serdes to model complex domain objects via rich type support

Adaptive via the ability of Avro, Thrift, and Hive Serdes to support evolvable schemas

High performance through Spark's capability for fast, large-scale data processing using an advanced DAG execution engine that supports cyclic data flow and in-memory computing.

With this approach, you can ask any question or run any reports with no need to request a custom analytical job, giving you easy access to deeper insights on demand. What's more, Spark SQL allows you to store a high-definition view of your data in the data warehouse. So instead of calculating a single VaR number based on limited assumptions, you can store a complete view of the trade in the warehouse, using a Hive Array data type, including all historic PNL.
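As a hedged sketch of how such a UDF might look from Java – the function, table and column names here are our own illustrations, not part of any production warehouse – VaR can be read straight off a stored historic PnL array at query time. Because VaR is a quantile rather than a sum, portfolio-level figures should be computed from the element-wise sum of the underlying PnL vectors rather than by adding per-position VaR numbers:

import java.util.Arrays;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import scala.collection.Seq;

public class VarUdfSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("hd-warehouse-var").getOrCreate();

        // var_95: 1-day 95% historical-simulation VaR from an array of historic daily PnL values
        spark.udf().register("var_95", (UDF1<Seq<Double>, Double>) pnl -> {
            double[] values = new double[pnl.size()];
            for (int i = 0; i < values.length; i++) {
                values[i] = pnl.apply(i);                          // Spark passes array columns as a Scala Seq
            }
            Arrays.sort(values);                                   // ascending, so the worst losses come first
            int index = (int) Math.floor(values.length * 0.05);    // 5th percentile of the PnL distribution
            return -values[index];                                 // report VaR as a positive loss figure
        }, DataTypes.DoubleType);

        // Illustrative query: one historic PnL vector stored per position as a Hive array column
        spark.sql("SELECT position_id, var_95(pnl_vector) AS var_1d_95 FROM positions").show();
    }
}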

FOCUS ON FLEXIBLE, USER-FRIENDLY FEATURES

In addition to the benefits outlined above, the high-definition data warehouse approach offers a wide range of features designed to optimise efficiency and ease of use:

The warehouse DSL is built on top of SQL, the most popular data analysis language

Spark SQL's industry standard JDBC and ODBC interfaces enable use of standard visualisation tools like Tableau, Zoomdata, Qlik, Microstrategy

Hundreds of standard Hive and community-contributed UDFs from the likes of Facebook and Brickhouse can be used out of the box and combined with custom UDFs

Spark UDFs are concise, quick to implement and can be unit-tested

Spark UDFs offer 'write-once, use-everywhere' versatility for streaming jobs, batching jobs, REST services, ad hoc queries and machine learning

FUTURE PROOF AND READY FOR CHANGE

The Basel Committee on Banking Supervision is currently carrying out a fundamental review of the trading book capital requirements (FRTB). A number of weaknesses have been identified with using VaR for determining regulatory capital requirements, including its inability to capture tail risk. Consequently, a move from VaR to Expected Shortfall (ES) is being proposed. It will be the biggest and most significant market risk regulatory change since the introduction of VaR almost two decades ago and will necessitate significant changes in the ways that banks manage data.

A high-definition warehouse as described above would allow you to calculate the new metrics required without changing the warehouse. Also known as Conditional Value at Risk (CVaR), ES is more sensitive to the shape of the loss distribution in the tail, which means you can make calculations on the fly by leveraging SparkSQL.
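ES is just another tail statistic over the same stored PnL vectors, so it could be exposed in the same way as the VaR UDF sketched earlier. A plain-Java sketch of the historical-simulation calculation – the sample values are made up for illustration – shows how little logic is involved:

import java.util.Arrays;

public class ExpectedShortfallSketch {
    // Historical-simulation ES: the average loss beyond the chosen quantile of the PnL distribution
    public static double expectedShortfall(double[] pnl, double confidence) {
        double[] sorted = pnl.clone();
        Arrays.sort(sorted);                                          // ascending, worst losses first
        int tailSize = (int) Math.floor(sorted.length * (1.0 - confidence));
        if (tailSize == 0) {
            tailSize = 1;                                             // always average at least the single worst day
        }
        double tailSum = 0.0;
        for (int i = 0; i < tailSize; i++) {
            tailSum += sorted[i];
        }
        return -tailSum / tailSize;                                   // report as a positive loss figure
    }

    public static void main(String[] args) {
        double[] historicPnl = { -31.0, -22.0, -5.0, 2.0, 4.0, 7.0, 12.0 };  // illustrative daily PnL values
        System.out.println(expectedShortfall(historicPnl, 0.975));
    }
}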

Polyglot persistence ensures that Spark SQL supports a variety of data sources like HBase, Cassandra and relational databases. This allows for joins across data sources, so positions can come from HDFS, time series from HBase or Cassandra and business hierarchies and reference data from a relational database.

BENEFITS BEYOND COMPLIANCE

Generally, when comparing the big data stack to traditional warehouses, vendors have focused on the cost savings. But in addition to reducing the burden of regulatory reporting, the new generation big data technologies offer the potential to build high-definition data warehouses that give your users self-service access to faster, deeper insights and deliver significant competitive advantage.

A high-definition view of data maximises your ability to obtain new insights. The ability to arbitrarily mine a dataset creates opportunities for business optimisation and new applications. Custom analytics made available via user-defined functions make the warehouse smarter, and transform SQL into a DSL form that can be viewed as a Spark script. Furthermore, data engineering teams could be freed up to focus on higher value functions, so saving additional time and money.

The technology stack to support this approach is available now. However, the rate of uptake in the financial services sector to date has been slow. By embracing new technologies such as Spark, you could more readily ensure compliance, while reducing development costs and saving on reporting resources that could be reinvested into the business – so helping to drive measurable commercial value.

Darren Voisey

Thanks to its ability to decouple systems and enable interoperability, application message broker software is prevalent in the finance sector, forming the backbone of trading platforms that distribute financial events and information worldwide. Here we take a look at how the recent beta phase arrival of Apache Kafka K-Streams is making waves and what the ripple effect might be.

WHAT IS A STREAM?

A stream is an ordered, and potentially infinite, sequence of data elements made available over time. Examples include website clicks and market stock price changes.

THE CURRENT SITUATION

Most developers and architects will be familiar with application messaging, especially the vanilla functionality of durable1 topics, point-to-point channels2 and competing consumers3. Many will also understand the extended routing and complex logic that can be constructed via the advanced message queuing protocol (AMQP) implementation supported by ActiveMQ and RabbitMQ.

While these technologies can be used to build asynchronous systems that can scale horizontally, they also have limitations (architecture compromises) that need to be considered. Notably:

Each topic queue has to fit on a single node within a cluster

Message ordering is not guaranteed

Messages are delivered at least once, so duplicates can occur.

Most applications need to allow for these limitations. For example, if an application keeps a running total of trades per minute, it will also have to maintain a separate state to handle failure and restart, because all read messages will have been removed from the queue.

This technology stack is typically seen in event-based remote procedure call (RPC) types of systems, but the limitations listed above do muddy the waters of what we would otherwise consider a perfect stream. However, if you don’t need the complex routing, there is a viable alternative.

ENTER APACHE KAFKA

Financial institutions are increasingly adopting Apache Kafka to integrate various big data platform components and using it as a de facto messaging standard. We are aware of at least one bank where uptake has reached such a level of maturity that it is now offered as a shared service. However, unlike RabbitMQ and ActiveMQ, Kafka supports only basic messaging patterns. So the question now is whether Apache Kafka has the potential to do more and if so, what next?

With Kafka often described as a 'distributed log', key differences from the other messaging technologies referenced here are as follows.

Scalability – topics can be sharded or partitioned according to custom rules that permit data to be distributed across nodes in a cluster, enabling horizontal scaling at topic level.

Durability – instead of removing messages once read, removal is based either on age or a custom compaction rule; so if your compaction rule is 'keep latest', your queue can be effectively a table.

Accuracy – with message order guaranteed within each shard, you can model a stream more accurately.

Routing – bear in mind, however, that the framework provides constrained routing logic, with no AMQP-type routing.

What makes queues with history so exciting is how much easier it becomes for applications to restart and initialise themselves with no need for separate journaling. Distributed queues can hold a lot of state and they then act as storage: if we then shard according to business needs, our ordering can be preserved. In other words, we have ordered streams of data that scale.

EVOLVING DESIGN

The ability of Kafka queues to operate as a persistence layer opens up interesting new approaches to system design. You can, for example, rely on Kafka to act as a journal and, through custom compaction, reduce the amount of processing required when initialising data from historical records.

However, what about creating projections on top of queues, in effect deriving new streams of data from underlying queues? This is where Kafka K-Streams, currently in pre-release, fits in well. It provides a Java library for building distributed apps using Apache Kafka. Your application components can therefore derive and publish new streams of data using one or more Kafka queues for the source and output. The durability – or persistence – of the queue data then underpins the components' resilience and fault tolerance.

KAFKA K-STREAMS IN PRACTICE

The following example consumes a stream of stock prices and derives the maximum price per stock per 500ms. Unlike some other stream processing frameworks, Kafka K-Streams doesn't use micro-batching, so you can achieve processing in near real time.

The code below performs the following steps:

Subscribes to a topic of stock prices, together with the classes to de-serialise the data

Creates a new key value pair based on the stock code within the message

Aggregates by the stock code – in this example we keep a running max

Windows the results every 500ms

Writes the results to a new topic called 'streams-output'.

This code is based on the early preview of Kafka K-Streams, with more comprehensive documentation provided by its developers at Confluent at http://docs.confluent.io/2.1.0-alpha1/streams/index.html

KStream<String, Payload> source = builder.stream(stringDeserializer, payloadDeserializer, sourceTopic);

KTable<Windowed<String>, Long> ktable = source
    .map((s, payload) -> {
        // generate a key based on the stock
        return new KeyValue<>(payload.stockCode, payload);
    })
    .aggregateByKey(
        () -> 0L,
        (aggKey, payload, aggregate) -> {
            // the aggregation tool: keep the running maximum price
            return Math.max(aggregate, payload.price);
        },
        TumblingWindows.of("stock-maxprices").with(500),
        stringSerializer, longSerializer, stringDeserializer, longDeserializer
    );

ktable.to("streams-output", windowedStringSerializer, longSerializer);

OUR ASSESSMENT

The examples and tests included within the Kafka source code proved invaluable during our research for this article. We were impressed at how succinct and easy to understand the API was to use. In fact, our main frustration was that RocksDB, the default local state store used by Kafka K-Streams, isn't supported on MS Windows, our work development environment.

SHOULD YOU GO WITH THE FLOW?

So what are the likely benefits if you decide to go with Kafka K-Streams? If your organisation is already using Kafka, then it's worth starting to look at it as a way to reduce your application logic, especially as it doesn't need additional infrastructure. And, if you are performing this type of logic with AMQP, Kafka and Kafka K-Streams could simplify your architecture and provide better scale and resilience.

To sum up, therefore, Kafka K-Streams enables you to simplify your application messaging and create new data islands in the stream as follows:

Derive new streams of data by joining, aggregating and filtering existing topics

Reduce your application logic by using the Kafka K-Streams DSL

Provide resilience and recoverability by building on Apache Kafka.

For more on Kafka K-Streams architecture, this might prove useful: http://docs.confluent.io/2.1.0-alpha1/streams/architecture.html

1 http://www.enterpriseintegrationpatterns.com/patterns/messaging/DurableSubscription.html 2 http://www.enterpriseintegrationpatterns.com/patterns/messaging/PointToPointChannel.html 3 http://www.enterpriseintegrationpatterns.com/patterns/messaging/CompetingConsumers.html

Neil Avery

In recent times banks have relied on compute grids to run risk analytics. However, this long-established batch paradigm is being challenged by streaming, which is on the cusp of replacing compute grids as we know them. Here we explore the evolution of streaming and evaluate some of the leading approaches, with a particular focus on Google's Dataflow.

SITUATION SNAPSHOT

Historically, risk analytics using a compute grid comprised of a job referencing trade IDs and a set of measures (greeks) sent to each compute core: there could be several millions of these tasks. Compute grids typically range in size from 500 compute cores; many have 10,000; and some up to 80,000 compute cores. Most systems process a business line's analytics as a batch for end-of-day processing, which might take 12 hours to complete. Some systems perform bank-wide aggregations to calculate exposure or analyse other factors that span all business lines. These systems represent a large portion of data centre estate and are classically bound to the server/task compute model.

In recent years many banks have invested in data lake platforms, which has triggered a shift from traditional MapReduce jobs to real-time streaming. Providing that it can scale and support sufficiently rich semantics, the streaming approach brings many benefits. Currently, we see Spark Streaming as the industry standard, and Kafka, a horizontally scalable messaging platform, is generally used as the datapipe that feeds into Spark Streaming for the execution phase. The streaming model has further evolved with the 2015 launch of Google's Dataflow, which presents an opportunity for fast and agile data processing while also replacing compute grids with streaming, as we will explain.

PHASE I
LIMITATIONS OF COMPUTE-BOUND RISK ENGINES

End-of-day batch jobs on a compute grid will submit millions of tasks; execution will involve long running analytics with multiple workflows, forking, joining and handling transient data for reuse in the series of domain/analytic derived workflow patterns. They are a fundamental operational requirement for calculating all kinds of insights about trade positions, risk, etc. Increasingly they are used for regulatory reporting requirements or more generally for calculating close-of-business prices/positioning/reports and setting up start-of-day processes. However, as well as taking hours to complete, they are not real-time. They are part of the T-3 process. Instrument and risk complexity often drives the quantity and duration of individual tasks comprising the overall job. You could think of it like this:

Trade -> for each risk measure/greek -> Curve data -> for each Tenor-blip -> calculate.

Combine this with credit valuation adjustment (CVA) or C-CAR and we move up the complexity curve. Depending on the business sector and instrument, the complexity required to drive ticking prices, real-time streams and ad hoc analysis make batch orientation unsuitable or even impossible. The process is too slow and cumbersome. Risk engines can sometimes be written to execute periodic micro-batches, however, they don't operate in real-time and only execute every five minutes.

PHASE II
HOW LAMBDA LINKS COMPUTE-BOUND AND REAL-TIME

In 2014 the Lambda architecture was introduced. It works on the idea that combining an approximation of streaming results with pre-generated batch results makes it possible to create a real-time view. Unfortunately, most streaming algorithms are based on sketching semantics which provide approximation rather than absolute correctness – consider, for example, Top-N, HyperLogLog and others1. As a result of this approximation, Lambda can't deliver sufficiently accurate results (batch parity). By the time streamed data has been batch-(re)processed, the (re)derived output will have changed, creating a system of eventual consistency2. The alternative is to retrigger a partial evaluation, but although this may serve a purpose, there is also scope to lose the real-time elements for which the Lambda approach was introduced.

PHASE III
DATA LAKES, LAMBDA AND THE DUALITY DILEMMA

The other major problem with the Lambda architecture is that you have to maintain two systems, adding more complexity to an already complicated situation. Data lakes can be viewed as part of a Lambda system: they represent a duality, but can also serve tasks similar to the compute analytics grid. Data lakes generally use Spark Streaming to consume, enrich and process data in real-time. However, these systems offer many of the business benefits associated with data agility and speed, for example, enabling traders and analysts to execute 'what-if' scenario analysis using an interactive notebook.

PHASE IV
STREAMING GOES SOLO WITH KAPPA

Kappa throws the Lambda two-system notion out of the door – duality has too many challenges. Kappa's approach is simplistic, in that the idea is to store all the source data within Kafka, then when you need a new version of the data, you replay the stream. Parallelism is required to scale out, but the need to deal with complex workflows may prove too much and make this approach impossible. While Kafka's Hadoop integration provides powerful leverage over HDFS and Hive, etc, the problem with the Kappa approach is the complexity of the analytics and their execution speed, for example, some analytics can take 10 hours to complete. This is a point we'll come back to.

THE FIRST WAVE OF STREAMING PLATFORMS

In the first wave of streaming platforms, Spark Streaming, Storm, Samza, Flink and Kafka K-Streams have all been dominant. All have contributed to our current value proposition of streaming use cases. However, the typical workload is only to stream a single parallel set of tasks. Storm and Samza support more elaborate workflows but they don't enable rich coordination in data-driven functional style. A classic workload would be massively parallel streaming analytics that process large volumes of data such as click-streams or ticking prices.

GOOGLE DATAFLOW: THE FUTURE OF STREAMING

The ground-breaking publication of Google's Dataflow3 model in 2015 established the first complete working framework for streaming. Leveraging its experience with MillWheel and FlumeJava, Google provided a comprehensive working model to execute streaming in its Google cloud. A batch is unified within the model and executed using a stream. In terms of resource management, the Dataflow model is geared towards hands-free DevOps. Because it runs in the cloud, Dataflow also makes it easy to optimise efficiency and control costs without the need to worry about resource tuning. For instance, the number of processing nodes you need is provisioned elastically to match your precise requirements, so you don't need to worry about tuning the number of processing nodes. What's more, Dataflow's programming model is functionally biased when compared to the topological or classic MapReduce model. This allows you to shape execution paths and workflow according to the underlying data at execution time.

Nonetheless, there remains a point of contention: is it more expensive to ship data or to scale compute? For example: a Monte Carlo simulation requires you to scale compute – in other words to process thousands of calculations on the same data with differing parameters. Conversely, building a curve model that can be used for subsequent calculations requires shipping yield, credit curves and volatility surfaces for a complex instrument to a single node.

CAN YOU ADAPT A COMPUTE GRID JOB TO RUN AS A STREAM?

In many financial services use cases, gathering the data required for analytics can be expensive. However, organising data into channels for scale-out via fast 10G Ethernet combined with scalable workflows can potentially overcome repeated chattiness.

Key features of the white paper4 include a complete set of core principles, including:

Windowing – various models to support unaligned event-time windows

Triggering – reaction and response to data triggers

Incremental processing – to support retractions and updates to the window.

It should also be noted that although the framework provides a model that enables a simple expression of parallel computation in a way that is independent of the underlying execution engine, tasks will remain computationally expensive and may take several hours to evaluate a risk measure.

We should also clarify that when we refer to a batch, we simply mean a stream of data with a known start and end – it is bounded data. Streaming without a start point or end point is called unbounded data.

DATAFLOW IMPLEMENTATION: BATCH TO STREAM

Dataflow is a native product on the Google cloud platform, which has evolved to a point where you can build almost anything using three key services:

Dataflow DF (routing and processing)

BigTable BT (NoSQL storage)

BigQuery BQ (query)

To convert an existing compute batch workflow to Dataflow you will need to build a Dataflow pipeline model as follows:

BigQuery-input -> transform -> PCollection (BigQuery rows) -> Transform (ParDo) -> output

The key steps involved are to:

Source data from BigQuery and/or cloud storage

Iterate the PCollection for each item and apply one or more transformations (ParDo) and apply an analytic or enrichment

Merge PCollections together to flatten or join (CoGroupByKey) data.

The above scenario gives you a classic source->fork->join pattern. However, Dataflow is so powerful that you can model any kind of pipeline/workflow in code, no matter how elaborate, as illustrated on the diagrams below.

Figure 1. A sequential pipeline showing Dataflow components in relation to each other: input -> transform -> PCollection -> transform -> PCollection -> transform -> output.

Figure 2. Forking the stream.

Figure 3. Joining data from different streams.

Once the pipeline is built into an artefact (including the QuantLib), it is deployed to the cloud using Dataflow managed services to convert the pipeline into an execution graph.
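As a concrete, heavily simplified sketch of the batch-to-Dataflow conversion described above – the table, bucket and field names are purely illustrative, and the ParDo body is a placeholder for a real analytic such as a QuantLib pricing call – a pipeline built with the Dataflow Java SDK of the time might look like this:

import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;

public class BatchToStreamSketch {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply(BigQueryIO.Read.from("my-project:risk.trades"))        // BigQuery input -> PCollection of table rows
         .apply(ParDo.of(new DoFn<TableRow, String>() {                // Transform (ParDo): apply an analytic or enrichment
             @Override
             public void processElement(ProcessContext c) {
                 TableRow trade = c.element();
                 // Placeholder for the real pricing/enrichment step
                 c.output(trade.get("trade_id") + "," + trade.get("notional"));
             }
         }))
         .apply(TextIO.Write.to("gs://my-bucket/risk/output"));        // output

        p.run();
    }
}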

THE DATAFLOW MODEL IN CODE

The Dataflow model provides windowing as part of the formal notation. As a result, comparing the two side by side shows Dataflow's clean readability versus Spark's somewhat complex code block. In the code below, the Dataflow version expresses where the data is being processed (windowing) as just another step alongside what is being processed. On the other hand, because the Spark model lacks a formal notion of event-time windowing, we have to intermingle the what and where portions of the code5.

Dataflow:

gameEvents
  [... input ...]
  [... filter ...]
  .apply("AddEventTimestamps",
      WithTimestamps.of((GameActionInfo i) -> new Instant(i.getTimestamp())))
  .apply("FixedWindowsTeam",
      Window.into(FixedWindows.of(Duration.standardMinutes(windowDuration))))
  .apply("ExtractTeamScore", new ExtractAndSumScore("team"))
  [... output ...];

Spark:

gameEvents
  [... input ...]
  [... filter ...]
  .mapToPair(event -> new Tuple2<WithTimestamp<String>, Integer>(
      WithTimestamp.create(event.getTeam(),
          (event.getTimestamp() / windowDuration) * windowDuration),
      event.getScore()))
  .reduceByKey(new SumScore());
  [... output ...];

CLOUD EXECUTION SERVICE LEVEL AGREEMENT

Dataflow is executed as a managed service on Google's cloud platform, so is covered by the following service level agreement (SLA):

You can carry out up to 25 concurrent Dataflow jobs per cloud platform project

Processing job requests are currently limited to 10Mb in size or smaller

The services limit the maximum compute engines according to work type. Batch executes using 1000 instances of 'n1-standard-1', whereas streaming mode executes using up to 4000 compute engine instances of 'n1-standard-4'.

APACHE BEAM: DATAFLOW ON-PREMISE

Google open-sourced Apache Beam as its contribution to the community to provide an on-premise Dataflow alternative. It is essentially the Dataflow API with runners implemented using Flink, Spark and Google cloud Dataflow. More interestingly, it is Google's first open-source contribution of this kind – by contrast, previous innovations like HDFS used the more hands-off 'throw a white paper over the wall' approach on the basis that someone else would pick up the idea and run with it.

Apache Beam enables you to operate the same expressive runtime-defined stream processing model. However, you are limited to on-premise, statically sized deployments commandeered by Zookeeper. You therefore won't have access to the compute resources available when leveraging cloud elasticity.

The vision for Apache Beam is to support a unified programming model for:

End users who want to write pipelines in a familiar language

SDK writers who want to make Beam concepts available in new languages

Runner writers who have a distributed processing environment and want to support Beam pipelines.

FUTURE RESEARCH

We are planning to convert an open source batch-oriented workload to Dataflow. Part of the exercise will be to identify performance challenges seen in this space, for example, relating to data locality and intermediate/transient data, and to establish performance metrics.

WHAT WILL DATAFLOW DELIVER AND WHEN?

As we have seen, the dominance of compute grids in financial services is being challenged by a major transition to streaming. Yet again, the leading innovators are providing direction on how we will move from a compute or server-oriented paradigm to one that is stream-oriented and server-free. While Lambda and Kappa have produced solutions to the challenge of providing real-time information, the inherent duality and constraints in their streaming models limit practical application.

Only now, with the advent of the Dataflow model, are we seeing a viable streaming solution with the potential to replace the largest financial services workloads and move them to a new paradigm built around real-time analytics, where batch is a subset. The challenge now is to prove to the incumbents that this migration path from batch to stream is indeed possible and practical. We'll be watching with interest.

1 https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101 2 https://www.oreilly.com/ideas/questioning-the-lambda-architecture 3 http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf 4 http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf 5 https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison#hourly-team-scores-windowing

James Bowkett
Jamie Drummond

In recent months we have been adding features to one of our main use cases for Mache – our open source cache solution for taming bottlenecks in grid applications. Mache is opening up exciting opportunities to dramatically improve network efficiency, clearing the way to more valuable big data applications.

First, let's briefly recap on why we developed Mache. Financial services – particularly investment banks – have always relied heavily on large-scale compute grids. But recently, organisations have been switching from traditional in-memory data grids to NoSQL products with the scalability to process vast volumes of data quickly.

Mache combines a scale-out NoSQL layer with a near cache to create a scalable platform that can be relied on to handle the most intensive data applications. In other words, Mache decouples applications from messaging and storage. And, with the addition of a new REST API, Mache makes storage perform less like a native database and more like a service.

By separating the data layer, the local cache allows data sharing between processes. This in turn enables applications to adopt a more micro-services style architecture. Consequently, by reducing database contention from competing applications, decoupling enables you to service more requests, thus reducing network load and congestion. However, it leaves the storage layer free for tuning by your application suite as required.

Among the key benefits we see for Mache are its wide range of integrations. These include:

Messaging and invalidation – it integrates equally well with Kafka, ActiveMQ/JMS and RabbitMQ

Big data and storage – it has been tested with Cassandra, Mongo DB and Couchbase

Client platforms – a new REST interface allows Mache to serve value objects via JSON to virtually any client application stack.

So let's take a look at the issues around network latency and saturation in a little more detail.

NETWORK LATENCY

With a conventional grid architecture, the number of requests can overwhelm the network and increase latency. Latency can be difficult to comprehend because network workload and complexity can vary greatly. It is well understood that as the network load increases, the variation of data can also increase greatly and, as a result, the mean response time will also rise.

Figure 1. Typical grid architecture: compute tasks running on compute nodes connect directly to the data store replicas.

FINANCIAL SERVICES STREAMING AND INTRODUCTION BIG DATA USE CASES BLOCKCHAIN PERFORMANCE TECH SPARK, H2 2016 | 51 Figure 2 below shows the impact on cache latency of Common data access patterns have a large number of increasing transactions in a local test environment. nodes. Therefore, by reducing the number of network This data, often seen under stress test conditions, will requests by a factor of the number of tasks per node, not surprise anyone who has measured loads in larger caching can significantly reduce latency. networks.

Figure 2. Cache latency and invalidation performance rises as the number of connections grow (cache latency, in msecs on a log scale, plotted against transactions per second).

NETWORK SATURATION

Network throughput is a finite resource. As we increase the number of cores in a grid, the amount of network bandwidth available to each core decreases. This is clearly illustrated in Figure 3, which displays the linear allocation of a 10 Gbit/s network (1.25 Gbyte/sec theoretical) across compute cores and the time needed to load 350 Mb of data with increasing numbers of data store replicas.

FINANCIAL SERVICES STREAMING AND 52 | TECH SPARK, H2 2016 INTRODUCTION BIG DATA USE CASES BLOCKCHAIN PERFORMANCE DATA 10GIGE WITH 10,000 COMPUTE NODES AND A 350 MB PAYLOAD

80 600

70 500

60

400 50

40 300 Time (s) 30 200

20

Total throughput (Gbyte/s) throughput Total 100 10

0 0 0 10 20 30 40 50 60 70 Data store replicas

Throughput Time (without Mache)

Figure 3. Data store replica shows that as data throughput rises, network efficiency falls.

HOW MACHE IMPROVES EFFICIENCY

By reducing contention for bandwidth, Mache delivers multiple benefits.

Reduce data duplication: tasks within a grid node can query the local Mache service in any access pattern required, helping to avoid database hot spots and the repeated need to send duplicate data across an overstretched network.

Enable data sharing: multiple processes within the same grid node can share the same data, negating the need to send the same data across the network more than once.

Enhance data affinity: because Mache can stay active between multiple job executions, with careful partitions, nodes can process the same data throughout the batch pipeline.

Reduce queries: because a larger subset of the dataset can be cached, tasks no longer use individual queries or require query optimisation.

Improve scheduling: Mache provides existent caching across tasks, so demands on grid bandwidth will be less ad hoc and can be better planned, for example, according to a refresh schedule.

Eliminate binary bundling: the language-agnostic REST interface decouples storage from the process, as well as from the client-side language. With many applications written in Java, .NET and more recently Go, REST also decouples code from the NoSQL vendor platform. So when it comes to deployment and upgrades, eliminating the need to bundle grid client-side binaries is a blessing.

A SMARTER WAY TO SCALE REQUESTS

Consider a scenario where a compute grid has to process market data and calculate the PNL for a given portfolio of trades. This is typically achieved by splitting the trades into smaller groups and creating a task per group. Tasks are then queued on a compute grid to execute as quickly as possible, with the results collated later in post-processing.

Historically, each node had to connect and download the required data from a data store. The approach can be simple to implement and the power of the compute grid is harnessed effectively. However, with ever-increasing volumes of data, task run time rises as the network soon reaches capacity. Most applications will try and cache the data locally within the requesting process.

Without Mache every compute task needs direct access to the data store. This greatly increases network traffic, which reduces throughput and increases latency. Typically, many tasks are scheduled at once, further compounding the problems. For instance, if 100 tasks start concurrently and all require the latest copy of the data, the network can suddenly get saturated. Akin to a denial of service attack, the event can cause grid nodes to appear offline. The issue then escalates as the grid attempts to heal, while network infrastructure struggles to manage the load.

Even with a 10 Gbit/s network, the maximum throughput is 1.25 Gbyte/s. With growing regulatory reporting requirements, compute grids need to scale to tens of thousands of cores, while at the same time coping with the explosive growth of market data over recent years1. With this in mind, it is easy to see how having all processes directly accessing the database does not scale.

In the same way that application developers try to increase performance by designing 'machine-sympathetic' applications that attempt to increase cache locality, what they actually need to design are 'network-sympathetic' architectures that can scale across thousands of compute nodes.

Figure 4. Comparison of a conventional compute approach with 48 task connections to the data cluster versus a single Mache-enabled one. Without Mache, 48 compute tasks on one compute node each issue their own query (select * from market_data where data_key = '?') over 48 individual data connections across the network, with no data sharing other than accidental temporal query locality at the DB. With Mache, the same 48 tasks call a local Mache REST service over the local loopback (GET http://localhost/mache/market_data/5); data is shared and the connections are concentrated from one application, which queries the cluster shards with a single batched request (select * from market_data where data_key in ('?', '?')).

FINANCIAL SERVICES STREAMING AND 54 | TECH SPARK, H2 2016 INTRODUCTION BIG DATA USE CASES BLOCKCHAIN PERFORMANCE MACHE FOCUS IN PRACTICE ON THE FUTURE

FOCUS ON THE FUTURE

To date, the project has been a great success and is generating a lot of interest from the open source software (OSS) community. It's great to see the broad appeal of many of the key Mache features for many application architectures. Notable among these are the ability to dedupe data into a localised service; binary decoupling from storage layer binaries and the potential for NoSQL and near-caching. In the coming months we aim to roll out a webinar series that will include a presentation of Mache with DataStax and its roadmap.

Building on the encouraging results so far, our team here at Excelian will continue to promote Mache as a key contribution to the open source community. It neatly complements our industry expertise and will enable the adaptation of NoSQL to many use cases within the financial services sector.

Find out more, view or download the code from: https://github.com/Excelian/Mache

1 http://www.nanex.net/aqck2/4625.html

How do you see big data impacting on your business? Are you forward thinking with your big data strategy? What does your technology stack look like now? What will it look like in two years' time? Can big data work with blockchain? Are you ready for streaming and machine learning?

You may know your company strategy, but are you as sure about your competitors?

We would like to invite you to take part in a completely anonymous and closed survey which will bring together multiple views and experiences of the big data landscape. Once all surveys have been completed and analysed, we will share all learnings with those involved and provide a current status of the industry – which provides valuable insights for you to leverage in your strategy for the years to come.

Why take part?
1. Contribute to the industry knowledge share
2. Benchmark your company against your competition
3. Receive a copy of the full research findings along with your final position

If you would like to get involved, please email [email protected] and we will be in touch.