Issue #45 | June 2015 | presented by www.jaxenter.com

The digital magazine for enterprise developers

Trust in Rust 1.0

The cons of Git and Mercurial: Slowing down peer reviews

Taking control of your CV: Where to get the best programming experiences

Automated End-to-End Testing ... and way more tips and tutorials

Editorial

Get more experience

What makes a good developer? Every developer will give you a different answer. But assuming you already have a combination of talent, education and fortunate social circumstances, what else is it that will guarantee you a bright career in IT? Is there some combination of special ingredients needed for a CV to impress employers? If you ask Shutterstock's Director of Engineering, the one thing that really counts is real life development experience. "To gain real world skills, developers have to create and take control of their own curriculum," writes Sandeep Chouksey in this latest issue, explaining exactly where it is that straight-out-of-uni developers can begin finding gems for their CV.

In keeping with this line of thought, we've got an especially high amount of IT experiences for you in this JAXMagazine issue. First up, Rust team member Huon Wilson is taking us on a tour of the new release of Rust 1.0, still fresh off the shelves. In the testing section, Claire Fautsch and Daniel Witkowski are walking us through automated end-to-end testing and handy open-source testing tools. From the Big Data world, we have Pierre Fricke showing us the new Foreign Data Wrappers in Postgres. If you're interested in FinTech, Lars Markull is predicting big things for Banking Service Providers. Meanwhile, we're also taking a look at how to avoid hiccups when moving to a new version of Oracle SOA or to a distributed version control system like Git or Mercurial. Finally, we've also got an overview of the big JavaScript frameworks for anyone who wants to look beyond the horizon of AngularJS – all in all, plenty of places to continue gaining new experiences.

Coman Hamilton, Editor

Index

Rust 1.0: Safe Systems Programming
Why you should take a closer look at Rust 1.0 – Huon Wilson

Taking control of your own curriculum
Finding the best programming experiences – Sandeep Chouksey

How Git and Mercurial can hurt your code review
The problems of DVCS – Marcin Kuzminski

Automated end-to-end testing using Vagrant, Puppet and JBehave
Regression testing – Dr. Claire Fautsch

Improving response time and latency measurements
Open-source tools for better testing – Daniel Witkowski

Disrupting Finance IT with Banking Service Providers
Making it easier to disrupt finance IT – Lars Markull

Siloed data – the new Postgres feature
The data runs through it – Pierre Fricke

7 tips for a successful Oracle SOA Suite upgrade
Keeping major and minor upgrades smooth – Matt Brasier

Choosing the right JavaScript framework
AngularJS, Ember or Backbone – Amanda Cline

Enhancing quality of experience for content-rich sites
Giving users what they want when they want it – Parvez Ahammad

Hot or Not

Rust 1.0 Rust 1.0 has arrived and people are pretty happy about it. You don’t need a garbage collector or runtime to achieve super duper control of performance with added security, meaning the newest competitor to the likes of C, C++ and Google’s Go is ready to kick ass. It’s stable, too – the Rust team want to be the foundation blocks of your apps and libraries, with the marriage of stability and regular release cycles meaning they’re ready for a big commitment *wink wink*.

Netflix OSS is special
And we mean special in a nice way, of course – with Netflix OSS taking home the Special Jury Award at this year's JAX 2015 conference in Mainz, Germany. The entertainment platform is big on open source and just, well, big … you know? From huge libraries to huge amounts of open-source tech, Netflix knows that bigger is best and puts its success down to the fact that their firm is massively into the open approach. Netflix's openness "resulted in our projects being of much higher quality than if we had kept them internal", says their Senior Engineer Ben Christensen. And that probably goes for all projects making use of their tools.

Sharing salaries with #talkpay
In a bid to try and kickstart interest and dialogue about the issue of pay inequality, developer Lauren Voswinkel plainly disclosed her salary, job title and experience under the hashtag #talkpay – and a lot of devs and tech heads followed suit. Normally a taboo topic, talking openly about pay is what Voswinkel believes will help make the act more commonplace, as well as clearly addressing the issue. Are you a woman in tech? Then #talkpay is for you. From an ethnic minority? Still for you. A member of the LGBT community, or basically not the stereotypical white male techie? Then the initiative aims to serve you.

Jigsaw Schmigsaw
So Java 9 will land in September 2016, but Project Jigsaw plans to ruin the welcome party. Nicolai Parlog delivered the breaking news (chuckle) that the project will likely f*ck up existing code. Whilst the breach depends on what you're working on, internal APIs and JARs may become unavailable and have you banging your head against a wall more so than usual with Java. You might also want to check if you're using the extension mechanism, which can be made available to all applications running on the JDK, because that's going bye-bye come Java 9. Before the move, get yourself schooled to the nines.

R.I.P. Agile
If the creators of your favourite software development method are calling for its head, then you know there's trouble brewing in the Agile camp. Dave Thomas and Andy Hunt have both declared death upon the Agile house, labelling the framework akin to a meaningless collection of aphorisms and marketing slogans. The idea of "inspect and adapt" as the basis of Agile has been supplanted by "Agile zealots", says Hunt, so he's created a new framework in his own attempt to usurp the usurpers (The GROWS Method). Piss off, half-arsed Agile-buffs.


Rust 1.0: Safe Systems Programming
Why you should take a closer look at Rust 1.0

Blazingly fast performance, prevention of nearly all segfaults, and low-level control without sacrificing safety or abstractions – these are the promises made by the 1.0 release of Rust. And that's just the start. Rust team member Huon Wilson shows us what's most exciting about the Mozilla-sponsored systems programming language.

by Huon Wilson

The Rust programming language has just reached 1.0, after several years of iterative improvement. It is a modern systems language, designed to give you low-level control, high performance and powerful concurrency – combining many of the best features from other languages, while losing the worst pitfalls of traditional systems languages like C or C++. To do this, it overcomes many of the traditional trade-offs, providing:

• Memory safety without garbage collection
• Concurrency without data races
• Abstraction without overhead
• Stability without stagnation

No garbage collection
Garbage collection is a powerful tool in software engineering, freeing you from worrying about keeping track of memory manually and allowing you to get on with the task of writing great code. It's great when it works, but garbage collectors have real downsides that make them inappropriate for many areas. Things like operating systems, embeddable libraries and (soft) real-time applications often need a greater degree of control and predictability than garbage collection can offer.


“The key concepts in Rust are ownership and borrowing.”

Rust allows developers to forgo a garbage collector entirely, without being thrown back into a world of forgotten frees, dangling pointers and segfaults. The key concepts in Rust are ownership and borrowing. These ideas are ubiquitous in programming and an important part of modern C++, but unlike other industry languages, Rust puts them front and centre, statically checking and leveraging them to guarantee memory safety without a garbage collector, something that has been previously unthinkable.

Rust's idea of ownership is that each value has exactly one parent that has complete control. As values get reassigned, placed into data structures or passed into functions, they move and are statically no longer accessible via their original path. And if they are not moved away at the end of a scope, they are automatically destroyed. To make ownership work at scale, Rust also provides ways to temporarily "borrow" (make a pointer to) a value for the duration of a scope.

As a bonus, ownership replaces more than just garbage collection. It is vital to Rust's concurrency guarantees, and even removes other classes of bugs like iterator invalidation. It also applies to resources other than memory, freeing you from managing when to close sockets or files, for example.
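The move-and-borrow rules are easier to see in code. Here is a minimal sketch of what the compiler enforces – our own example, not one of the article's listings:

fn main() {
    let names = vec![String::from("a"), String::from("b")];

    let moved = names;            // ownership moves to `moved`
    // println!("{:?}", names);   // compile error: use of moved value

    let borrowed = &moved;        // a temporary, scoped borrow
    println!("{:?}", borrowed);   // fine: `moved` still owns the data
}                                 // `moved` leaves scope here and is freed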
Concurrency
As mentioned, ownership also ensures your concurrent programs won't fall prey to some of the most insidious problems that can occur in them: data races. And all while maintaining a weak memory model, close to that used by hardware. Getting started with concurrent programs in Rust is simple: pass a closure to a function from the standard library (Listing 1).

Listing 1

use std::thread;

fn main() {
  let some_string = "from the parent";

  thread::spawn(move || {
    // run on a new thread
    println!("printing a string {}", some_string);
  });
}

One of the tenets of many languages designed for concurrent programming is that shared state should be minimized or even outlawed entirely, in favor of techniques like message passing. Ownership means that values in Rust have a single owner by default, so sending a value to a new thread through a channel will ensure the original thread doesn't have access to it: statically disallowing sharing.

However, message passing is just one tool in your toolbox: shared memory can be immensely useful. The type system ensures that only thread-safe data can actually be shared between threads. For example, the standard library offers two sorts of reference counting: Arc provides thread-safe shared memory (immutable by default), while the Rc type offers a performance boost over Arc by forgoing the synchronization needed for thread-safety. The type system statically ensures that it is not possible to accidentally send an Rc value from one thread to another.

When you do want to mutate memory, ownership provides further help. The standard library Mutex type takes a type parameter for the data that is to be protected by the lock. Ownership then ensures that this data can only be accessed when the lock is held; you cannot accidentally release the lock early. This sort of access-control guarantee falls automatically out of Rust's type system and is used in many places through the standard library itself and more broadly.

Zero-cost abstractions
Performance and predictability is one of the goals of Rust, and an important step to achieving that while still offering the safety and power required is zero-cost abstractions à la C++. Rust lets you construct high-level, generic libraries that compile down to specialized code you might have written more directly for each case.

To do this, Rust gives precise control over memory layout: data can be placed directly on the stack or inline in other data structures, and heap allocations are much rarer than in most managed languages, helping achieve good cache locality, an extremely large performance factor on modern hardware. This simple, direct layout of data means optimizers can reliably remove layers of function calls and types, to compile high-level code down to efficient and predictable machine code. Iterators are a primary example of this; the following code is an idiomatic way to sum the squares of a sequence of 32-bit integers:

fn sum_squares(nums: &[i32]) -> i32 {
  nums.iter()
    .map(|&x| x * x)
    .fold(0, |a, b| a + b)
}

This always runs as a single pass over the slice of integers, and is even compiled to use SIMD vector instructions when optimizations are on.

Powerful Types
Traditionally, functional programming languages offer features like algebraic data types, pattern matching, closures and flexible type inference. Rust is one of the many recent languages that don't fit directly into the functional mould that have adopted those features, incorporating all of them in a way that allows for flexible APIs without costing performance.


The iterator example above benefits from many of these ideas: it is completely statically typed, but inference means that types rarely have to be written. Closures are also crucial, allowing the operations to be written succinctly.

Algebraic data types are an extension of the enums found in many mainstream languages, allowing a data type to be composed of a discrete set of choices with information attached to each choice (Listing 2). Pattern matching is the key that makes manipulating these types easy: if shape is a value of type Shape, then you can handle each possibility (Listing 3).

Listing 2

struct Point {
  x: f64,
  y: f64
}

enum Shape {
  Circle { center: Point, radius: f64 },
  Rectangle { top_left: Point, bottom_right: Point },
  Triangle { a: Point, b: Point, c: Point }
}

Listing 3

match shape {
  Shape::Circle { radius, .. } =>
    println!("found a circle with radius {}", radius),
  Shape::Rectangle { top_left: tl, bottom_right: br } => {
    println!("found a rectangle from ({}, {}) to ({}, {})",
      tl.x, tl.y, br.x, br.y)
  }
  Shape::Triangle { .. } => println!("found a triangle"),
}

The compiler ensures that you handle all cases (a catch-all clause is opt-in), greatly aiding refactoring. These enums also allow Rust to forgo the so-called billion-dollar mistake: null references. References in Rust will never be null, with the Option type allowing you to opt in to nullability in a type-safe and localized manner.
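As a quick extra illustration of that opt-in nullability – our sketch, not one of the article's listings:

fn first_char(s: &str) -> Option<char> {
    s.chars().next()            // None for an empty string, never null
}

fn main() {
    match first_char("rust") {
        Some(c) => println!("starts with {}", c),
        None => println!("empty string"),
    }
}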

“Rust fills a niche sometimes considered impossible.”

Conclusion
Rust is sponsored by Mozilla, which is interested in a language that can replace C++'s performance and zero-cost abstractions for web browser development, while guaranteeing memory safety and easing concurrent programming.

Rust fills a niche sometimes considered impossible: providing low-level control and performance without giving up safety or abstractions. Of course, there's no free lunch: the compiler has a reputation as a demanding assistant that doesn't tolerate any risk, and the ownership model, being a bit unfamiliar, takes some time to learn.

The 1.0 release comes with the core language and libraries being tested and refined, and, importantly, the first guarantee of stability: code that compiles now should compile with newer versions for the foreseeable future. However, this release doesn't mean the language is done: Rust is adopting a train model, with new releases every six weeks. New and unstable features can be explored via regular pre-release betas and nightlies. There is a standard package manager, Cargo, which has been used to build up a growing ecosystem of libraries. Like the language, this ecosystem is young, so there's not yet the wide breadth of tooling and packages that many older languages offer (although a performant, easy FFI helps with the latter). Nonetheless, the language itself is powerful, and a good way to do low-level development without the traditional danger.

Huon Wilson is a computational statistics post-graduate student, previously a computational algebra honours student, as well as a member of Rust's core team.


Finding the best programming experiences

Taking control of your own curriculum

There are things you learn at university. And there are skills you can only ever learn in the real world. Shutterstock’s Director of Engineering Sandeep Chouksey gives us his tips on the best places to find the practical experiences that make a good programmer.

by Sandeep Chouksey

Throughout my career as a software engineer and technology leader, I've helped companies find and develop talent. The one constant: how unprepared most college grads are when entering the industry. The process of obtaining a Bachelor's Degree exposes engineers to new ideas and, most importantly, to how to learn. To gain real world skills, developers have to create and take control of their own curriculum.

Get real life software development experience
Internships are the best way to get real life experience. The tangibility of using real-world technologies and real-world data matched with real-world customers creates a learning experience not found in a classroom. In the best internships, you are working as part of a team to solve customer problems – an experience that is invaluable regardless of the type of company that you ultimately choose to work for or to start.

There are many things you will learn quickly in an internship in dynamic environments that will further your understanding. Agile development methodologies and using version control are two, as well as how to package, distribute and deploy a project. You might learn the best practices for writing unit tests, integration tests and load tests. You'll get exposure to teamwork with a source management tool or a continuous deployment environment, and learn how to operate with non-technical professionals as well.

If you can retain the big strategic picture and all of the steps to execute on it effectively, you'll have the kind of valuable experience companies crave, which you can learn during an internship. Those experiences help your resume stand out, with the promise of an easier transition from school to career, simply because you'll have less of a ramp-up period when you start your new job.

Be part of the community
Focus on a specific technology, language, or tech stack that has captured your interest and start attending the local meetups to both network and learn how different technologies are progressing.


Another way to be part of the community and get experience is to contribute to Open Source projects. You'll get experience working with teams, especially far-flung teams, and all of your work is public, which enables future employers to evaluate your code. Two great projects I am happy to recommend which offer structure and guidance: Mozilla and Google Summer of Code.

Write tests and understand what coverage means
Get in the habit of writing tests every time you write code. When you have an assignment, first write unit tests to run the code you'll write. This will prove whether the answers are correct, before you write code. Then, write the code. Build this habit early, and your life as an engineer will be much easier.

For whatever language you're using, learn the test framework. During your interview, it's likely the hiring manager will pose a problem and ask you to propose a solution. One way to make a big impression: write out, or at least talk about, how you would test your solution. It's a simple way to stand out in the interview process.
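A minimal sketch of that test-first habit with JUnit – the FizzBuzz example and all names are invented for illustration:

import org.junit.Assert;
import org.junit.Test;

public class FizzBuzzTest {

  // the production code under test; in test-first style this is
  // written only after the test below exists and fails
  static String say(int n) {
    if (n % 15 == 0) return "fizzbuzz";
    if (n % 3 == 0) return "fizz";
    if (n % 5 == 0) return "buzz";
    return Integer.toString(n);
  }

  @Test
  public void saysFizzBuzzForMultiplesOfThreeAndFive() {
    Assert.assertEquals("fizzbuzz", say(15));
    Assert.assertEquals("fizz", say(9));
    Assert.assertEquals("buzz", say(10));
    Assert.assertEquals("7", say(7));
  }
}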

Learn to profile your code
In the real world, you will rarely work on code where you have full access to evaluate every method being called by all the relevant libraries. It's important to learn how to find bottlenecks in your code because, as an engineer, you will spend a good amount of your time on this part of the job.

A code profiler runs your application and identifies "hot spots" that took proportionally longer to run, relative to other parts of your code. Many languages offer tools for code profiling. SQL queries can be analysed using "explain" or query analyser tools, depending on your database. There are also end-to-end load testing tools. Regardless of the tool, learn how to run and evaluate the results of a profiler or some type of analyser.

Get comfortable with ambiguity
There are no easy answers in business, no clear-cut solutions. The only solution is the one that you create. Sometimes, you try everything you know and you still can't get something to work. That's life. Being able to deal with uncertainty is one of the most important skills engineers can develop.

To understand if candidates are okay with truly not knowing, I ask open-ended, ambiguous questions. One question I often ask is: "Tell me how the browser works". My goal is to get more insight into their thought process. At other times, I'll focus on customer complaints: "A call came in stating it's taking our homepage up to 30 seconds to load. What do you do and how do you investigate?"

Better yet, the question that really told us how potential candidates think was this one: "You've just hooked up a computer to the network. It's not connecting to the internet. What do you do?" There is no right answer, but regardless of the candidates' answers, we responded with "that solution didn't work, the computer isn't online". If the candidate asked clarifying questions, we provided the most common case and kept the focus on troubleshooting the main issue. Perseverance, without frustration – well, that's a rare candidate to find.

It's all good experience
Your career will be a lifelong learning process. There's no doubt a job is in your future – as always, it's what you make of the opportunities. With a strong foundation, an open mind and a robust toolset, you can thrive in any environment and adapt as technologies, languages and industries change.

This has been adapted from a post that originally ran on the Shutterstock Tech blog. You can read the original here: bits.shutterstock.com/2015/05/20/taking-control-of-your-own-curriculum/

Sandeep Chouksey is the Director of Engineering at Shutterstock, a leading, global technology company providing high-quality licensed imagery and music to businesses, marketing agencies and media organizations. Sandeep focuses on driving quality by influencing product development and by building self-service tools and frameworks that can be utilized in the development process.


How Git and Mercurial can hurt your code review
The problems of DVCS

For all of their efforts in helping teams deliver more commits, Git and Mercurial have also introduced one significant problem: the slowing down of peer reviews. Marcin Kuzminski explains how “pragmatic groupings” can help.

by Marcin Kuzminski

Five years ago, the telecommunications company I worked at started to move from Subversion to distributed version control systems (DVCS) like Git and Mercurial. The benefits were huge, but like anything new and exciting, there were problems.

Git and Mercurial are both wonderful at letting software developers commit frequently, from anywhere, even remotely. The ease of commits allowed our team to go from 5 commits a day to 100 commits a day. Then, we noticed a problem: reading through the history of commits for peer review of code became too much to handle. In the short term, our peer review of code decreased in efficiency.

We eventually developed a technique that I call "pragmatic grouping". The technique can be used with either Git or Mercurial and requires no special technology. Just download Git or Mercurial from the Internet and you can get started. Before I explain the techniques of pragmatic grouping, I need to provide some background information on why I moved from Subversion.

Subversion limitations for new school developers
When our development team used Subversion, everyone had to be connected to the server to commit. This wasn't scaling for our team as we became more distributed. We had a lot of people working on the same project and had constant problems with merges and deployment. Our team members were wasting huge amounts of time fixing merge problems.

We were moving to a modern development model where people commit as often as possible. Each developer records each small step. With Subversion you have to constantly wait for commits. If someone else merges, you can have a conflict, and then you waste an hour just sorting through the conflicting merges. This can happen all the time.

As the development process changed, everyone knew that the tools had to change. My new school colleagues and I were changing our development workflow faster than Subversion could adapt. We were caught up in a type of "merge hell" as we tried to commit as often as possible, sometimes every 15 minutes. You always needed a connection to the office, and working from home was a nightmare due to constant communication lags.

Don't depend on the server
When you don't depend on a server connection, you can often work faster. This also means that you don't have to constantly look for places where you can get Internet access (or a VPN) just to be able to commit. With both Git and Mercurial, everyone has a backup of everything (files, history), not just the server.

With Git and Mercurial, anyone can become the server. You can commit very frequently if you need to without breaking the code of other developers. Commits are local. You don't step on the toes of other developers while committing. You don't break the builds or environments of other developers just by committing. People without "commit access" can commit (because committing in a distributed version control system does not imply uploading code).

This lowers the barrier for contributions. As an integrator, you can decide to pull their changes or not. Contributors can team up and handle their own merging, meaning less work for integrators in the end. Contributors can have their own branches without affecting others' (but being able to share them if necessary). This new programming style can reinforce


natural communication, since a DVCS makes communication essential. In Subversion, what you have instead are commit races, which force communication, but by obstructing your work.

Exploding number of commits
Other people in my organization were committing once per day or even once every few days. Often, these were developers working almost by themselves on a project. You can see by this difference that there is a great range of workflows in collaborative team development. My opinion is that the world is moving toward more collaboration and more frequent commits, with better packaging or representation of a set of commits.

The frequency of commits and the ability of the version control system to handle frequent commits greatly improves collaboration. For my team of developers building Web and cloud-based applications, Subversion was driving us nuts. We were encountering so many problems adapting Subversion to our new development process that we assigned one of the developers on our team to manage Subversion. We always needed to wait for him to fix problems, sometimes leaving us unable to work. With our development style so dependent on frequent commits, this downtime of Subversion caused huge frustration. Today, we're committing ten to twenty times a day per person. Sometimes a person will commit sixty times. In a small team of five developers, we'll have a hundred commits per day.

Taming the commit history
Our best practice is to commit as often as possible to clear out the working copy. Each developer marks out the steps of how they get from point A to point B. We then review all the individual steps together as one change. I use a special terminology called a "pragmatic history" to describe the workflow. For example, if a person does a pull request with twenty commits that are around a single idea, then we squash all twenty commits to one pragmatic. Going from twenty commits to one pragmatic is rare. A more typical scenario is to divide the twenty commits into three pragmatics. A typical group of pragmatics:

1. Testing
2. Implementation
3. Documentation

Don't enforce a single way to represent the workflow. You should give developers on your team the freedom to do what they think is best. This is especially important in large enterprises, where it's common to have mixed teams and each team may have a different way of doing things.

For example, if you have a typo in a doc and you fix it, then you fix another typo, you should enforce a workflow policy of saying, "please just squash these changes into one pragmatic." We tell our developers to take a few "pragmatic" steps. These are the major steps they did to achieve the functionality needed. Here's another example group of pragmatics:

1. Created the function
2. Extended the function with additional parameters
3. Wrote tests for the function

In this way, you can commit frequently, but during the code review, ask that people refer to pragmatics. For example, take twenty commits and change them into a nice history of five pragmatic, clear steps. You should avoid steps like "I fixed a typo" or "I renamed a function", which are not helpful to the code review process.

This pragmatic history is a workflow that I use and developed after many years of using version control systems extensively. Both Mercurial and Git support history edits. When people on my team approve the code with either Mercurial or Git, they make a note to squash down the commits into a few steps. Then the developer rearranges their code into a smaller group of pragmatics, and I click a button and accept the change.
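For readers who have not squashed commits before, an interactive rebase is the usual mechanism for folding a noisy series into pragmatics. The commit count, hashes and messages below are invented for illustration:

# group the last six commits into three pragmatics (Git)
git rebase -i HEAD~6

# in the editor that opens, keep one "pick" per pragmatic and
# mark the commits that belong to it as "squash":
#   pick   a1b2c3  Wrote tests for the function
#   squash d4e5f6  Fixed a typo in a test name
#   pick   0a1b2c  Created the function
#   squash 3d4e5f  Renamed the function
#   pick   6a7b8c  Documented the function
#   squash 9d0e1f  Fixed a typo in the docs

# Mercurial offers the same via its histedit extension:
hg histedit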
This technique allows you to do things like bisect. For example, if you're trying to retroactively find where a bug was introduced, you have clear steps to look at. Each of the steps covers an area of working functionality. You can then easily go to each step, dig down and identify the smaller step that caused the program to break.

This also makes reading the commit history very efficient, which is needed for the new style of distributed development that teams are moving to. If I read "fixed a typo" fifty times, it will get boring and I may lose motivation to dig down into the code. Developers may ask, "why do I even need to read the commit history?" Grouping the commits into pragmatics makes the history easier to read, makes work more fun for each developer, and makes your team much more efficient.

Summary
When I first started to look at the new tools like Mercurial and Git, I quickly saw that they solved the painful points of Subversion. That was pretty exciting, because each member could commit as often as they wanted to. Whether they were online or offline, it didn't matter. The merges and the whole history looked much better with Mercurial and Git. Both Mercurial and Git were generally faster and made us more productive.

My story of the pragmatic workflow illustrates what we did to handle the history of a large number of commits. When we were on Subversion and committing once every day or every few days, this problem didn't occur. As we started to work more collaboratively, the number of commits increased and we ran into problems with Subversion. Moving off of Subversion created a new set of challenges. One of these challenges was how to deal with the large number of commits. A change in management of the workflow resolved these problems.

Marcin Kuzminski is a Python enthusiast, open source evangelist, and co-founder of RhodeCode, Inc. In addition to Python, Marcin has extensive experience with JavaScript, Java, Ruby, and C++. He's worked as a software developer and CTO in Poland, Israel, Germany, and the USA. In his spare time, Marcin has been heavily involved in many open source projects including RhodeCode, VCS, Celery, Pyramid, Mercurial, Dulwich, Salt, and Libcloud. In the course of working on many large-scale enterprise projects he developed the underlying RhodeCode Enterprise technology to bring additional collaboration, performance, and usability to the teams and projects he worked with on a daily basis.


Automated end-to-end testing using Vagrant, Puppet and JBehave
Regression testing

What do you do when your business intelligence solution reaches its limit? Dr. Claire Fautsch tells us what aspects of testing emerged as the most important and most efficient when her team at Goodgame Studios began migrating data and making production changes.

by Dr. Claire Fautsch

Due to a rapidly increasing number of new players using our games, our existing business intelligence solution, based on PHP and MySQL, has reached its limit, triggering the need for a new data warehouse solution. At Goodgame Studios, we decided against a "big bang" migration due to the amount of data to be migrated and the importance of the existing solution in the day-to-day work of not only our data analysts, but also our marketing or conversion teams, to give an example. Instead, we opted for a step-by-step migration, using an agile project methodology. Furthermore, this approach would enable the smooth transition of users and reporting tools from the old system to the new one.

Soon it became obvious that (regression) testing was a crucial part of the development process. The reasons for this are twofold. On the one side, data consistency between the old and the new system is essential. On the other, as the development team is working with new, partially unknown technologies and systems, continuous rework is necessary. This includes the improvement and extension of newly developed components and the optimization of already migrated ELT (extract load transform) processes.

In this article, we will describe how we have automated end-to-end (E2E) testing. This would enable us to guarantee fully-tested ELT processes and regression tests before applying any changes to production, at the same time reducing the often tedious manual effort required by regression testing.

High Level Architecture Overview
To better understand the setup of the E2E tests, we will briefly outline the high-level architecture of our data warehouse (DWH) solution (see Figure 1). The data in our DWH is based on events, which various sources send to RabbitMQ servers in a JSON format (see our previous article for more information). Once the events reach the servers, they are consumed by a Consumer component, which reads the events from the RabbitMQ servers, validates the input, separates them based on their ID, and persists them to HDFS.

From HDFS, events are loaded into our analytical database (HP Vertica) using ELT jobs. The jobs are scheduled and triggered by a component referred to as Core. In addition, external data sources provide the supplementary data necessary for aggregations. This includes, for example, marketing or application download information.

Figure 1: High-level architecture


Requirements
The requirements for the tests were very simple:

1. Tests should be automated (i.e., no manual intervention or data preparation necessary)
2. There should be the option to execute full regression tests at any moment
3. Subsets of tests (e.g., related to a specific event) can be executed separately
4. The tests should be executed in a self-contained environment close to production (not necessarily in terms of resources)
5. There should be the option to mock data from external sources (due to some request limitations on APIs, for example)
6. It should be easy for each developer to execute tests on his development machine

Taking requirements 4 and 6 into account, it quickly became clear that the test environment should be some kind of virtual machine. If we want to be as close to the production environment as possible, we cannot have the different components run on only one machine. We need separate VMs for HDFS, Vertica, RabbitMQ, Consumer, and Core. Furthermore, the different machines need to be able to talk to each other. After some research, we came to the conclusion that Vagrant matches our needs perfectly.

When it comes to adding content to the virtual machines, i.e., provisioning them, Vagrant offers different out-of-the-box possibilities, such as shell scripts, Chef cookbooks, or Puppet modules. Finally, we use JBehave together with Maven as a test framework. This helps us to fulfill requirements 1, 2, 3, and 6. In the following sections, we will briefly describe each of the frameworks used and outline why it was our tool of choice.

Vagrant
Vagrant is an open-source tool developed for creating virtual (development) environments. Its main purpose is to provide a framework for lowering development environment setup time and to avoid excuses such as "... but it works on my computer". Vagrant is operated from the command line and provides a set of commands for basic usage.

Boxes are the main component of Vagrant. Boxes are preconfigured virtual machines for Vagrant environments; they are basically templates for the environment to be set up using Vagrant. There are several public catalogs of Vagrant boxes, or you can build your own box using the vagrant box command.

Once the box is set up, you will want to install software or change configurations, and this cannot be done manually. Vagrant offers so-called provisioners for this purpose, which allow you to handle this automatically. Either simple shell scripts or configuration management systems such as Puppet or Chef can be used for provisioning. If the company uses a configuration management system, provisioning scripts can be reused to set up development environments as close to production as possible.

We get asked relatively frequently why we use Vagrant and not Docker. Indeed, Docker would have also met our needs. The main reason for not opting for Docker is that we wanted to remain as close to our productive state as possible. Of course this is only possible within certain limits – mainly due to limited resources on the devs' workstations – but at least using virtual machines rather than Docker containers keeps us a bit closer.
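To make the multi-machine layout concrete, here is a minimal Vagrantfile sketch for two of the five boxes; the box name, IPs and manifest names are invented for illustration, not taken from the project:

Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"

  config.vm.define "rabbitmq" do |mq|
    mq.vm.network "private_network", ip: "192.168.50.10"
    mq.vm.provision "puppet" do |puppet|
      puppet.manifests_path = "puppet/manifests"
      puppet.manifest_file  = "rabbitmq.pp"
    end
  end

  config.vm.define "hdfs" do |node|
    node.vm.network "private_network", ip: "192.168.50.11"
    node.vm.provision "puppet" do |puppet|
      puppet.manifests_path = "puppet/manifests"
      puppet.manifest_file  = "hdfs.pp"
    end
  end
end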
Puppet
To keep everything fully automated and simple for the end user (the developer), we decided to opt for a configuration management system to provision our Vagrant boxes. A configuration management system allows system administrators to define the state of the company's IT infrastructure and automatically retain the correct state.

As no configuration management system was in use at our company at the moment of implementing the test environment, we opted for Puppet, since its initial learning curve seemed less steep than Chef's and it fulfilled all our needs. Puppet comes with its own declarative language. The system's state is defined via Puppet manifest files, which are basically the Puppet programs. The manifest files contain a set of resources, which describe the system. Manifests can be split up into modules to give them a clearer structure and keep similar functionalities grouped together and reusable.

For example, let's say we would like to set up one server with a MySQL database, a Java installation and monitoring installed, and a second one with only a MySQL database. We would then write a module for the MySQL setup, which could be reused to provision both servers – a sketch of this idea follows below.

Puppet installations usually work using a client-server principle, with one Puppet master (the server) and Puppet agents on every node you want to provision. However, for our Vagrant provisioning, we opted for a setup without a server, to keep the program simple and easy to run on every developer's machine.
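Here is a minimal sketch of that MySQL module idea; the layout is simplified and the java and monitoring classes are placeholders, not the authors' actual manifests:

# modules/mysql/manifests/init.pp
class mysql {
  package { 'mysql-server':
    ensure => installed,
  }
  service { 'mysql':
    ensure  => running,
    enable  => true,
    require => Package['mysql-server'],
  }
}

# manifests/site.pp - the module is reused on both servers
node 'appserver' {
  include mysql
  include java        # hypothetical module
  include monitoring  # hypothetical module
}
node 'dbserver' {
  include mysql
}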

JBehave
JBehave is a Java-based framework for test automation designed for behavior-driven development (BDD). The framework allows you to write test cases in a natural language. Each test case is a scenario, and related scenarios are grouped in one story. Each scenario is itself a set of steps, and each step is of one of the types Given, When, or Then. A Given step represents a precondition for a test case; for example, it allows you to define what the input data should look like. A When step, on the other hand, defines an action happening, and a Then step describes the outcome. You can also use And to combine several steps of one type. For example, a story could look like Listing 1.

Listing 1

Given 2 and 3 as input
When the numbers are summed up
Then the result is 5

This natural, syntax-like language even enables non-developers like business people to write tests. However, at some point we still need to tell the system what to do with the steps. For each step known to the system, there is an annotated Java method (within a POJO) representing the implementation. For the example given above, this would look (at a high level) like Listing 2.

Listing 2

import org.jbehave.core.annotations.Given;
import org.jbehave.core.annotations.Then;
import org.jbehave.core.annotations.When;
import org.junit.Assert;

public class Calculator {
  private int x;
  private int y;
  private int result;

  @Given("$xvalue and $yvalue as input")
  public void input(int xvalue, int yvalue) {
    x = xvalue;
    y = yvalue;
  }

  @When("the numbers are summed up")
  public void sum() {
    result = x + y;
  }

  @Then("the result is $sum")
  public void checkResult(int sum) {
    Assert.assertEquals(sum, result);
  }
}

Also, even when used for slightly different purposes than what it's intended for (which is BDD), JBehave proved to be a good tool for our use case. Most of the ELT jobs which we want to E2E test have different content but often perform similar steps, for example:

• Given an Event X as input
• When we run job X
• Then database table X should contain data Y

This means that once we have a set of test steps defined by a developer, even a non-developer can write new test cases.

Putting it all together
With Vagrant and Puppet, we have a self-contained test environment, provisioned to reflect the productive systems. Each developer can use the environment not only for testing but also for development purposes.

The JBehave tests need to run on an environment, ideally starting with a clean environment for each set of test runs to avoid negative side effects from previously failed runs. We have therefore set up our JBehave steps to work on the Vagrant environment, and before each single test we wipe the database, HDFS, and the queues to start with a clean environment. As this environment is used solely for the automated tests, we are not destroying anyone's data or risking the breakage of anything else. Each test case should include the data it needs in its Given steps.

We use Maven to launch the tests, and this is made easy for us by JBehave's Maven plugin. Using Maven properties, this also allows us to run only specific test cases. Currently we start the Vagrant environment manually using the vagrant up command; it needs to be fully started before tests can be run. In the future, we would like to include the Vagrant Maven plugin so that we can also control this completely automatically.

Conclusion
We did not introduce automated tests at the start of the project, but only a few months later, once a lot of the code and jobs had already been written. Consequently, there was a lot of extra effort required in the first step of setting up and configuring the environment and writing the test cases for the already existing jobs.

However, once you close that initial gap, developing automated test cases becomes part of the standard development process. It also now takes us a lot less time to execute regression tests than it did before (manually). Furthermore, we have observed that writing these automated test cases is a helpful step in the review process, especially if they are not written by the developer himself. Bugs can be identified at a much earlier stage, especially when testing edge and special cases.

In general, we can conclude that the additional amount of time spent developing those test cases is still less than the time previously spent on reviews, regression tests, and bug fixes. Additionally, the Vagrant environment has the nice side effect that each developer has his very own development environment at hand. The main points we have concluded from our setup are:

• Automated tests are not only a way to accelerate regression and E2E tests but also assist in the review of newly developed features.
• Don't think: "Oh, but we will lose time developing those tests". The time you will gain by avoiding bugs and by minimizing manual tests, as well as the improved quality you will deliver, makes up for it completely.
• Don't wait until your project is far along to start automating your (end-to-end) tests. Start from the beginning.
• Plan to write automated tests as a "standard" part of your development process, the same as documentation or unit tests.

Goodgame Studios is a German online games company which develops and publishes free-to-play web and mobile games. Founded in 2009, their portfolio now comprises nine games with over 200 million registered users.

Dr. Claire Fautsch is Senior Server Developer at Goodgame Studios, where she works in the Java core team and is also involved in the data warehouse project. Previously, she was employed as an IT Consultant in Zurich and Hamburg and as a Research and Teaching Assistant at the University of Neuchâtel (Switzerland). There, Dr. Fautsch also obtained her PhD in Computer Science on the topic of information retrieval, as well as her bachelor's and master's degrees in mathematics. She enjoys exploring new technologies and taking on new challenges.


Improving response time and latency measurements
Open-source tools for better testing

In his last article on "Why it's difficult to find performance problems during pre-production tests", Daniel Witkowski described common mistakes that can be made during performance analysis. In this tutorial, Daniel shows us how to use available open source tools to improve the quality of your measurements.

by Daniel Witkowski

Frequently, response time or latency measurements are done from inside the application. Even more often, they are done in the same method that should be monitored. This is what happens in most simple scenarios. In our case, let's use this approach to measure the processing time of a simple task (Listing 1).

There is a small difference between what this code does and what we think it should do (calculate a sum from 1 to 800 million), but this is outside the scope of this article, so let's assume it does something that takes time (Listing 2). Also, we would like to have some performance details, so let's collect:

• Average processing time – total processing time divided by the total number of samples
• Maximum processing time
• Average Throughput – total number of transactions divided by the time of the test

Later I will describe why this set of parameters, picked for monitoring, was wrong; however, this is very frequently the set of data people collect during performance tests. We would also like to see how the application profile changes when there is some more load, so we will try to create some garbage in the background so that the JVM (or Garbage Collector) will have some work to do.

Assuming we learned something useful from the previous article on pre-production testing, let's also enable GC logs and use jHiccup for monitoring (Listing 3). We'll also use the CMS Garbage Collector for some concurrent work (-XX:+UseConcMarkSweepGC).

Listing 4 is a method that will print details about each test cycle. As we use the nanoTime() method for better (but not really accurate) measurements, we divide the values to present time in milliseconds and Throughput per second. The output line from the tests will look like this:

Counter[13] TotalTime[263] AvgTime[20] Max[30] Throughput (tps)[48.04]

Listing 1: Processing time measurement

long start = System.nanoTime();
// do the real stuff
long processingTime = System.nanoTime() - start;
// save this processingTime for further analysis

Listing 2: Processing method code

long sum = 0;
for (int i = 0; i < 800000000; i++) {
  sum++;
  temp = sum;
}
temp = sum;

Listing 3: JVM parameters

GC monitoring: -Xloggc:gc.log -XX:+PrintGCApplicationStoppedTime
jHiccup: -javaagent:jHiccup.jar="-d 0 -i 1000 -s 3"

Listing 4: Performance reporting – code

private static void print(long pTotalRunTime) {
  System.out.println();
  System.out.format("Counter[%d]\tTotalTime[%d]\tAvgTime[%d]\tMax[%d]\tThroughput (tps)[%.2f]",
    counter.longValue(),
    totalProcessingTime.longValue()/1000/1000,
    totalProcessingTime.longValue()/counter.longValue()/1000/1000,
    timecostMax/1000/1000,
    (double) counter.longValue()/pTotalRunTime*1000*1000*1000);
}
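For readers who want to run the measurements themselves, here is a self-contained sketch that wires Listings 1, 2 and 4 together. The field names follow the listings, but the wiring (20 runs on one thread) is our assumption:

import java.util.concurrent.atomic.AtomicLong;

public class ProcessingTimeBenchmark {
  static final AtomicLong counter = new AtomicLong();
  static final AtomicLong totalProcessingTime = new AtomicLong();
  static volatile long temp;          // defeats dead-code elimination
  static volatile long timecostMax;

  public static void main(String[] args) {
    long testStart = System.nanoTime();
    for (int run = 0; run < 20; run++) {
      long start = System.nanoTime();
      long sum = 0;
      for (int i = 0; i < 800000000; i++) {
        sum++;
        temp = sum;
      }
      temp = sum;
      long processingTime = System.nanoTime() - start;
      counter.incrementAndGet();
      totalProcessingTime.addAndGet(processingTime);
      if (processingTime > timecostMax) timecostMax = processingTime;
    }
    print(System.nanoTime() - testStart);
  }

  private static void print(long pTotalRunTime) {
    System.out.format("Counter[%d]\tTotalTime[%d]\tAvgTime[%d]\tMax[%d]\tThroughput (tps)[%.2f]%n",
      counter.longValue(),
      totalProcessingTime.longValue() / 1000 / 1000,
      totalProcessingTime.longValue() / counter.longValue() / 1000 / 1000,
      timecostMax / 1000 / 1000,
      (double) counter.longValue() / pTotalRunTime * 1000 * 1000 * 1000);
  }
}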


Base line without any additional load
We want to have a best-case scenario, so we run this code without any additional tools enabled. The results are presented in the graphs: Throughput is at the fifty transactions per second level (Figure 1). Average time is below 20 ms and maximum time is at 30 ms (Figure 2). Processing time increases linearly (Figure 3). The GC logs do not show any pause, and jHiccup also confirms that there were no pauses over 15 ms (Figure 4). All data is consistent.

Figure 1: Processing Time
Figure 2: Throughput
Figure 3: Total time
Figure 4: jHiccup

First run with a custom memory fragmentation tool
Now that we have base line results ready, let's create some simple code that will create some garbage. The simplest solution is to create a thread that is running all the time and allocating some objects in memory (Listing 5).

Listing 5: Thread to simulate memory allocation

new Thread(new Runnable() {
  @Override
  public void run() {
    int size = 1024 * 1024 * 500;
    byte[] bsStatic = new byte[size];
    while (true) {
      // allocate data
      byte[] bs = null;
      bs = new byte[size];
      temp = bs.length;
    }
  }
}).start();

When we open the GC log, we see there are a lot of pauses (red line) and GC is running frequently (blue line shows memory utilization), but the pauses are very consistent. Unless the application load is very unusual, this is not something that simulates real user traffic (Figure 5). When we open jHiccup, we also see GC decreases consistently, but we now have pauses almost all the time, which is not good (Figure 6).

Figure 5: GC


We can try to tune our garbage-generating thread by adding some sleep time, randomizing the sleep time and the amount of data allocated, but there is an easier way to do this: Azul Systems' CTO Gil Tene created the Fragger tool, which does exactly what we need. It is open sourced and available on GitHub, so it's really easy to use.

Figure 6: jHiccup

What is a Fragger?
Fragger is a heap fragmentation inducer, meant to induce compaction of the heap on a regular basis using a limited (and settable) amount of CPU and memory resources. The purpose of HeapFragger is, amongst other things, to aid application testers in inducing inevitable-but-rare garbage collection events, such that they occur on a regular, more frequent, and more reliable basis. Doing so allows the characterization of system behaviour, such as the response time envelope, within practical test cycle times.

HeapFragger works on the simple basis of repeatedly generating large sets of objects of a given size, pruning each set down to a much smaller remaining live set after it has been promoted, and increasing the object size between passes such that it becomes unlikely to fit in the areas freed up by objects released in a previous pass without some amount of compaction. HeapFragger ages object sets before pruning them down in order to bypass potential artificial early compaction by young generation collectors.

By the time enough passes are done that the aggregate space allocated by the passes roughly matches the heap size (although a much smaller percentage is actually alive), some level of compaction is likely or inevitable. HeapFragger's resource consumption is completely tunable: it will throttle itself to a tuneable rate of allocation and limit its heap footprint to a configurable level. When run with default settings, HeapFragger will occupy 10% of total heap space and allocate objects at a rate of 50MB/sec.

Fragger works as a Java agent, so no code change is required. Multiple Java agents can be added to the Java command line, so there is no risk when you already use some tools through the Java agent interface:

-javaagent:Fragger.jar="-a 1200 -s 512"
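Put together with the flags from Listing 3, a complete command line might look like this (the main class name is a placeholder):

java -Xloggc:gc.log -XX:+PrintGCApplicationStoppedTime \
     -javaagent:jHiccup.jar="-d 0 -i 1000 -s 3" \
     -javaagent:Fragger.jar="-a 1200 -s 512" \
     MyBenchmark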

Fragger has some additional nice features: it ships with a pause detector and will print a message to the system error stream when it detects a pause. The threshold is configurable:

*** PauseDetector detected a 2252 ms pause at Thu May 21 15:12:33 CEST 2015 ***

Second run with Fragger
This time, when we open the GC log, we see a much better memory profile (blue line). We also see a lot of pauses, ranging from 0.1 seconds to 2.3 seconds (Figure 7). When we compare this with jHiccup, we see a similar profile: there are frequent pauses in the 0.5 second range, and at the end they increase to over 2 seconds (Figure 8). Since we see a similar profile in both the GC logs and jHiccup, we expect that this will have a visible impact on the data we measure in our application.

Figure 7: GC
Figure 8: jHiccup


Throughput decreased: at the beginning we see a similar Throughput as before (~50TPS), then it quickly drops to just over 20TPS (Figure 9). After a while it goes over 40TPS, but then drops below 40TPS again. If we compare this graph to the GC activity, we can correlate more GC activity with less Throughput. Starting from that, we can experiment with Fragger settings, heap sizing and Garbage Collector tuning to find the best configuration for our application.

Figure 9: Throughput

There is one more enhancement we should make while monitoring Throughput. Currently, we show "total historical Throughput", as it is calculated over the total run time. This is good enough for a batch-like application, where we do not care how Throughput changes over time as long as the total work can be finished on time.

Important note: When we work with transactional applications, it is much better to calculate Throughput in smaller time intervals. For the next test cycle we will keep all transactions, including their timestamps, and we will calculate Throughput in each interval. Let's assume 1 second is a good interval, as we have such a profile for jHiccup.
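A small sketch of the per-interval calculation described in the note above; the variable names are ours, not from the article:

import java.util.ArrayList;
import java.util.List;

public class IntervalThroughput {
  public static void main(String[] args) {
    // timestamps (ns) of finished transactions, collected during the test
    List<Long> finishTimesNanos = new ArrayList<>();
    // ... filled while the test runs ...

    if (finishTimesNanos.isEmpty()) return;
    long start = finishTimesNanos.get(0);
    long end = finishTimesNanos.get(finishTimesNanos.size() - 1);
    int seconds = (int) ((end - start) / 1_000_000_000L) + 1;

    // count how many transactions finished in each one-second bucket
    int[] perSecond = new int[seconds];
    for (long t : finishTimesNanos) {
      perSecond[(int) ((t - start) / 1_000_000_000L)]++;
    }
    for (int i = 0; i < seconds; i++) {
      System.out.format("second %d: %d TPS%n", i, perSecond[i]);
    }
  }
}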

When we look at the response time results, they are rather surprising (Figure 10): they did not change at all! Even though we see 2.3 second pauses, and jHiccup shows the 75th percentile is over 100 ms, we still have the same results. We know they are wrong – but why? If we only looked at the response time profile, we would assume we have a very stable and consistent application and would not look to change anything at all!

Figure 10: Processing time

The problem we have here is related to the way we measure and the way the JVM works. In our case, the whole processing is done in a loop that is usually not interrupted by GC threads, so the GC waits until our calculation finishes and only then stops the application. This stop is not measured. We could add an additional measurement to monitor the time from the last finish to the current start, but it would not have any business value. To help diagnose such problems, Azul Systems created an open source tool that helps to improve latency measurements. This tool is called LatencyUtils and it is available on GitHub.

Important note: When measuring response time or transaction latency, use at least one external monitoring tool to confirm your measurements are correct.

LatencyUtils from Azul Systems

Figure 10: Processing time The LatencyUtils package includes useful utilities for track- ing latencies. Especially for common in-process recording Throughput decreased and we see at the beginning simi- scenarios, which can exhibit significant coordinated omission lar Throughput as before (~50TPS) when it quickly drops sensitivity without proper handling. to over 20TPS (Figure 9). After a while it goes over 40TPS but then drops below 40TPS. If we compare this graph to The problem GC activity we can correlate more GC activity with less Latency tracking of in-process operations usually consists of Throughput. Starting from that we can experiment with simple time sampling around a tracked operation e.g. a data- Fragger settings, heap sizing and Garbage Collector tun- base read operation, for which latency statistics are being de- ing to find the best configuration for our application. There veloped may be surrounded by time measurement operation is one more enhancement we should do while monitoring immediately before and after the operation is performed, with Throughput. Currently, we show “total historical Through- the difference in time recorded in some aggregate statistics put” as it is calculated for total run time. This is good gathering form (average, std. deviation, histogram,. etc.) This enough for batch like application when we do not care how is later used to report on the experienced latency behaviour of Throughput changes over time as long as total work can be the tracked operation. finished on time. The problem with this extremely common form of latency When we look at response time results they are rather sur- behaviour tracking is that whenever pauses occur in the sys- prising (Figure 10). They did not change at all! Even though tem, latency statistics become significantly skewed toward we see 2.3 second pauses and jHiccup shows the 75 percen- falsely-optimistic values. This occurs in two key ways: tile is over 100 ms we still have the same results. We know they are wrong but why? If we only look at the response time • When a pause occurs during a tracked operation, a single profile we would assume we have a very stable and consistent long recorded latency will appear in the recorded values, application and we would not look to change anything at all! with no long latencies associated with any pending re- The problem we have here is related to the way we measure quests that may be stalled by the pause. and the way the JVM works. In our case, whole processing is • When a pause occurs outside of the tracked operation (and done in a loop that is usually not interrupted by GC threads, outside of the tracked time window) no long latency value so GC waits until our calculation finishes and stops. This would be recorded, even though any requested operation stop is not measured. We can add additional measurement to would be stalled by the pause. monitor time from last finish to current start but it will not have any business value. To help diagnose such problems, The solution Azul Systems created an open source tool that helps to im- The LatencyStats class is designed for simple, drop-in use as a latency behaviour recording object for common in-process Important note When we work with transactional applications it is much better to Important note calculate Throughput in some smaller time intervals. 
For the next test cycle we will keep all transactions including their timestamps When measuring response time or transaction latency, use at least and we will calculate Throughput in each interval. Let’s assume 1 one external monitoring tool to confirm your measurements are second as a good interval as we have such a profile for jHiccup. correct.
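To make the naive pattern concrete, here is a minimal sketch of it. This is not code from the article: the class, the simulated operation and the use of HdrHistogram as the statistics-gathering form are illustrative assumptions.

import java.util.concurrent.ThreadLocalRandom;
import org.HdrHistogram.Histogram;

public class NaiveSampling {
    public static void main(String[] args) {
        // aggregate statistics-gathering form: here, an HdrHistogram
        Histogram histogram = new Histogram(3);

        for (int i = 0; i < 100_000; i++) {
            long start = System.nanoTime();
            trackedOperation();                        // e.g. a database read
            long latency = System.nanoTime() - start;
            // only the operation itself is timed: a pause that lands
            // between iterations is never observed by this histogram
            histogram.recordValue(latency);
        }
        histogram.outputPercentileDistribution(System.out, 1_000_000.0);
    }

    private static void trackedOperation() {
        // stand-in for a real unit of work
        long bound = ThreadLocalRandom.current().nextLong(10_000);
        long x = 0;
        for (long i = 0; i < bound; i++) x += i;
        if (x == Long.MIN_VALUE) System.out.println(x); // defeat dead-code elimination
    }
}

Both failure modes apply to this loop: a GC pause inside trackedOperation() shows up as a single long sample, while a pause between two iterations leaves no trace at all.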

The solution
The LatencyStats class is designed for simple, drop-in use as a latency behaviour recording object for common in-process latency recording and tracking situations. LatencyStats includes under-the-hood tracking and correction of pause effects, compensating for coordinated omission. It does so by using pluggable pause detectors and interval estimators that, together with LatencyStats, transparently produce corrected histogram values for the recorded latency behaviour.
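As a sketch of what that drop-in use can look like, the histogram from the previous snippet is swapped for a LatencyStats object below. The package names are those of the open source LatencyUtils and HdrHistogram libraries; the surrounding class and simulated operation are again illustrative, and the explicit exit is an assumption about the background threads started by the default pause detector.

import org.HdrHistogram.Histogram;
import org.LatencyUtils.LatencyStats;

public class CorrectedSampling {
    public static void main(String[] args) {
        // LatencyStats runs a pause detector behind the scenes and back-fills
        // the histogram with latencies for requests stalled by detected pauses
        LatencyStats stats = new LatencyStats();

        for (int i = 0; i < 100_000; i++) {
            long start = System.nanoTime();
            trackedOperation();                          // e.g. a database read
            stats.recordLatency(System.nanoTime() - start);
        }

        Histogram corrected = stats.getIntervalHistogram();
        corrected.outputPercentileDistribution(System.out, 1_000_000.0);

        // the default pause detector starts background threads,
        // so this throwaway demo exits explicitly
        System.exit(0);
    }

    private static void trackedOperation() {
        long x = 0;
        for (int i = 0; i < 10_000; i++) x += i;
        if (x == Long.MIN_VALUE) System.out.println(x); // defeat dead-code elimination
    }
}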

LatencyUtils in action
This time we used LatencyUtils on top of the existing measurements in a really simple way. It is worth showing the full output (I removed some higher percentiles for the smaller list) so everybody can see the details (Listing 7). It is visible that the 50th percentile is at 19 ms and the 77.5th percentile is over 50 ms. In our code we had a maximum of 31 ms. It is also interesting to see that the total number of samples is over 13,000, which means that over 3,000 samples were added by LatencyUtils to cover for the uncounted pause time of the JVM.

Listing 7: LatencyUtils – test run statistics

Value      Percentile        TotalCount   1/(1-Percentile)
17.96      0.000000000000    3            1.00
18.48      0.100000000000    1800         1.11
18.61      0.200000000000    2769         1.25
18.87      0.300000000000    4243         1.43
19.14      0.400000000000    5353         1.67
19.66      0.500000000000    6917         2.00
19.92      0.550000000000    7535         2.22
20.19      0.600000000000    8016         2.50
20.71      0.650000000000    8663         2.86
21.50      0.700000000000    9287         3.33
23.99      0.750000000000    9922         4.00
51.64      0.775000000000    10254        4.44
104.86     0.800000000000    10583        5.00
167.77     0.825000000000    10918        5.71
248.51     0.850000000000    11245        6.67
352.32     0.875000000000    11577        8.00
413.14     0.887500000000    11747        8.89
482.34     0.900000000000    11911        10.00
566.23     0.912500000000    12081        11.43
662.70     0.925000000000    12242        13.33
775.95     0.937500000000    12406        16.00
838.86     0.943750000000    12486        17.78
905.97     0.950000000000    12568        20.00
1061.16    0.962500000000    12733        26.67
1224.74    0.971875000000    12858        35.56
1434.45    0.981250000000    12982        53.33
1778.38    0.990625000000    13104        106.67
2038.43    0.995312500000    13167        213.33
2139.10    0.996875000000    13188        320.00
2197.82    0.997656250000    13200        426.67
2231.37    0.998242187500    13207        568.89
2281.70    0.999023437500    13216        1024.00
2382.36    0.999926757813    13228        13653.33
2382.36    1.000000000000    13228

#[Mean = 146.33, StdDeviation = 337.25]
#[Max = 2382.36, Total count = 13228]
#[Buckets = 26, SubBuckets = 256]

Improved Throughput measurement
To improve the Throughput measurement, all results were stored in an ArrayList. After the test was finished, the Throughput in each second was calculated. This is how Throughput changes when GC activity is visible (Figure 11). It is clearly visible that GC pauses have a direct impact on application Throughput: each time there is a GC event, Throughput drops to 30 TPS or below (from the 50 TPS level). Using this data it is possible to tune the application (or JVM) for some given goals, like sustained Throughput (in this case, if we remove the startup phase we can have a guaranteed 20 TPS, and probably 30 TPS with some small tuning) or application availability (based on the Throughput value from GC logs). If we had not calculated Throughput in each second, we would have believed we had a sustained Throughput of 40 TPS. This assumption would have at least a 50% error.

Figure 11: Throughput
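A sketch of that post-processing step, assuming the test harness stored each transaction's completion timestamp (in milliseconds) in an ArrayList during the run (the class and field names are illustrative, not from the article):

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class PerSecondThroughput {
    // completion timestamps (ms) of all transactions, collected during the test
    static final List<Long> completionTimes = new ArrayList<>();

    public static void main(String[] args) {
        // after the test: bucket completions into 1-second intervals
        TreeMap<Long, Integer> perSecond = new TreeMap<>();
        for (long t : completionTimes) {
            perSecond.merge(t / 1000, 1, Integer::sum);   // 1-second bucket
        }
        // print TPS per interval; GC pauses show up as low or missing buckets
        perSecond.forEach((second, tps) ->
                System.out.printf("t=%ds throughput=%d TPS%n", second, tps));
    }
}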
Summary
In this article we have learnt a few important things:

• Response time measurement inside application code might have a very big measurement error
• Always measure important metrics with at least one external tool
• Use Java agents as a way to introduce additional tools to the application (used in jHiccup and Fragger)

With very little overhead and freely available tools it is really easy to extend your current monitoring framework with additional tools to confirm you do not miss a lot of transactions due to the Coordinated Omission Problem.

Daniel Witkowski has worked at Azul Systems for over 6 years. He used to work with Azul Vega appliances, which had up to 768 cores of custom-design CPU, trying to optimize systems and scale them up to use more parallel cores and improve performance. Now he works with Azul's Zing JVM, bringing concurrent and pauseless Garbage Collection to a broader audience. Daniel is actively engaged with developers and designers of cutting-edge Java systems who are looking to overcome typical JVM problems with memory management. Daniel has deep knowledge about performance analysis and tuning of J2EE systems. Formerly, he worked as a technical manager in various projects for telco, finance and e-commerce organizations and was responsible for designing and delivering business-critical systems in areas of J2EE and system integration. He is frequently engaged as a consultant to improve application bottlenecks, system performance and scalability.

Finance IT

Making it easier to disrupt Finance IT
Disrupting Finance IT with Banking Service Providers

The rise of new services in the financial service industry is inevitable. The new entrants in this industry – FinTech startups or other innovative players – are agile and decisive and are heavily capitalising on these advantages. However, the technical and regulatory requirements remain many and are often a cause for failure of new entrants. Banking Service Providers could decrease the problems the new players are facing. The offering of Banking Service Providers would foster innovation and allow new players to offer services in a faster and more secure way.

by Lars Markull

The term Banking Service Provider has not yet found its way into the FinTech language. However, there are parallels to the development of Payment Service Providers. During the rise of ecommerce, startups were facing problems outside of their core competence. Due to customer demand they had to integrate several different payment methods. This gave rise to Payment Service Providers, which were able to offer convenient solutions to all. We would not see so many ecommerce shops nowadays if each and every shop had to manage all these methods by themselves. The same will apply to the FinTech industry.

The missing layer
Financial data is strongly fragmented among many different kinds of providers: banks, credit card companies, depository banks and so on. And this fragmentation will only increase, since new entrants often create new silos. Providers of innovative new financial solutions could benefit heavily from aggregating all available sources into their new offering, since they will then be able to increase the touchpoints with customers.
Developing this connectivity to thousands of sources is resource-intensive, and a Banking Service Provider could offer this connectivity in a better and probably less cost-intensive way.


The foundation of such a service would be the technical connection to financial sources, which could be based on existing bank APIs or other infrastructures.

Regulation as a driving force
A Banking Service Provider adds different services around its core offering. Legal conformity while accessing financial service providers will be a key selling point. This will become an even more important issue once the European Union has finalised the Payment Service Directive 2 (PSD2). figo is one of the few Banking Service Providers, and while working with many FinTech startups and other innovative players, we have realized that a Banking Service Provider takes our partners to the next level. The development teams of our partners are able to focus on their core product and do not have to manage the complex banking backend. The product outcome is much slicker and smarter, since each party focuses on its strengths and the user benefits from services that are available for every bank and not just a few selected ones. Lastly, speed is crucial in the current FinTech situation, and as a Banking Service Provider we enable our partners to develop new services quickly, so they learn much faster from the outcome.

All about the right context
The current FinTech development is just at the beginning: new startups will continuously enter the space and innovative banks will develop new financial services as well. These players can offer certain banking services in a better and less cost-intensive way than a bank does; hence, banking services will become more and more "unbundled". For all of them, banking in the right context is the motivation. In order to achieve this, startups and banks are already actively looking for Banking Service Providers to use financial data in their newly developed services.
In the long term, the connectivity to new financial services could create yet another selling point: traditional players are actively looking for new services to implement in their existing offering. A Banking Service Provider will be in the position of offering access to many different services through one single access point – a potentially massive advantage in the future.

Lars Markull works at the leading German banking API provider figo and additionally as an independent FinTech consultant. He combines experience from traditional banking and consulting jobs with a big passion for innovation and disruption.

Advert

THE JAVA MUSEUM
20 YEARS OF JAVA PROGRAMMING

Introducing the first Java history museum — memories, quotes and photos from 20 years of Java programming.

Visit the website and follow @JavaMuseum on Twitter.

Big Data

The data runs through it
Siloed data – the new Postgres feature

With CIOs struggling to harness a patchwork of data solutions and silos, newly enhanced Foreign Data Wrappers for Postgres are providing critical integration for consistent data management and analysis.

by Pierre Fricke

In today's digital age, companies are trying to store, manage and make sense of vast pools of information. Data has also become more varied with the rapid proliferation of smartphones, new web apps for fun and business, and all the things that comprise the Internet of Things, just to name a few. Consequently, the typical data centre now contains a patchwork of data management technologies. From enterprise-class relational databases to standalone, niche NoSQL-only solutions to specialized extensions, the arsenal for managing data has become more diverse.
The need for solutions that can support new data types and evolving data demands has given rise to "shadow IT" haunting the data centre. Developers are the biggest culprits, and their tech tools of choice are NoSQL-only database solutions. Driven by a need to deliver new types of applications faster and respond more directly to business and operations departments employing agile development methods, application developers are seizing on solutions that enable them to work faster. They're spinning up clusters in the cloud or on-premises for new specialized applications, often working outside official controls. But by using NoSQL-only solutions to address specialized applications, they're adding silos of data to the enterprise environment.


Using NoSQL-only solutions, however, poses a host of challenges, complexities and even serious risks to the organization. "By 2017, 50% of data stored in NoSQL DBMSs will be damaging to the business due to a lack of applied information governance policies and programs," according to a recent Gartner report.
This has pitted application developers against database professionals – the database architects, DBAs and IT executives charged with maintaining data flow, stability and integrity throughout the enterprise. But Postgres has emerged as the data management solution for integration challenges, with a clever feature called a Foreign Data Wrapper that can integrate data from disparate sources, like MongoDB, Hadoop and MySQL. Initially, FDWs had only read capabilities, but recent enhancements to PostgreSQL enabled EnterpriseDB to develop new FDWs for MongoDB, Hadoop and MySQL that have write capabilities as well. These new FDWs are available on GitHub.

"Postgres has emerged as the data management solution for integration challenges."

Postgres – The data runs through it
Foreign Data Wrappers (FDWs) link Postgres databases with other data stores and let users access and manipulate the data as if it were part of a native Postgres table. Indeed, FDWs enable Postgres to act as the central hub, a federated database, in the enterprise. It does this by using the JSON datatype, one of the important features Postgres has added in recent releases that support NoSQL capabilities.
FDWs enable DBAs to use the database as a single integration point to read, analyse and write to data from many remote data stores. Developers working in C can create new FDWs using the hooks exposed by the database, and many FDWs have been published in the open source community. FDWs essentially bridge disparate data sources, pulling data from other databases and inserting it into Postgres tables, where users can work with it as if it were native Postgres data.
The implementation of Postgres FDWs is based on the SQL standard SQL/MED (SQL Management of External Data), support for which was introduced in 2011 in PostgreSQL 9.1. It is a standard way of accessing external data stores ranging from SQL and NoSQL-only databases to flat files. FDWs provide a SQL interface for accessing remote objects and large data objects stored in remote data stores. This enabled developers in the community to build FDWs with read capability. Support for the SQL/MED standard was improved upon in recent Postgres releases. Most recently, PostgreSQL 9.3 added the ability for FDWs to support write capabilities, and that has opened up whole new opportunities for organizations to use Postgres and these features to solve integration challenges.
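As a flavour of what this looks like in practice, the sketch below uses plain JDBC to set up and query a foreign table through postgres_fdw, the FDW for remote PostgreSQL servers that ships with PostgreSQL 9.3. It assumes the PostgreSQL JDBC driver is on the classpath, and all host names, credentials and table names are hypothetical:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FdwDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/hubdb", "hub_user", "secret");
             Statement st = conn.createStatement()) {

            // one-time setup: wrap a remote Postgres server as a foreign data source
            st.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw");
            st.execute("CREATE SERVER sales_srv FOREIGN DATA WRAPPER postgres_fdw " +
                       "OPTIONS (host 'sales-db.example.com', dbname 'sales')");
            st.execute("CREATE USER MAPPING FOR CURRENT_USER SERVER sales_srv " +
                       "OPTIONS (user 'remote_user', password 'remote_secret')");
            st.execute("CREATE FOREIGN TABLE orders (id int, total numeric) " +
                       "SERVER sales_srv OPTIONS (table_name 'orders')");

            // from here on the remote table behaves like a local one; since
            // PostgreSQL 9.3, postgres_fdw also accepts INSERT/UPDATE/DELETE
            st.execute("INSERT INTO orders VALUES (1, 99.90)");
            try (ResultSet rs = st.executeQuery("SELECT count(*) FROM orders")) {
                while (rs.next()) {
                    System.out.println("remote orders: " + rs.getLong(1));
                }
            }
        }
    }
}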
Advances in NoSQL capabilities alongside longstanding relational database features have given Postgres the ability to support the new unstructured data types and programming methods common to many NoSQL products. In Postgres, you can even combine unstructured data with relational tables, all while maintaining compliance with the Atomicity, Consistency, Isolation, Durability (ACID) principles of relational technologies, as well as centralized business processing rules and logic.
This is possible because Postgres is extensible. Unique in the database world, Postgres was developed with expansion in mind, making it easy to incorporate new data types, indexing schemes, languages, and much more without destabilizing or compromising existing features.
Achieving this interconnectivity is a simple thing for Postgres because of how the database has been designed. As it is an object-relational model, Postgres defines functionality as simple objects. Objects can be manipulated or enhanced, and new ones can be built. So creating, say, a new data type like JSON as the perfect stand-in for data from foreign sources is simply a means of following the same rules as all the other objects, and building and inserting an object. And as demand for new capabilities emerged, the open source community behind the PostgreSQL project has simply built these new capabilities into the database.

The History of Foreign Data Wrappers
Four years ago, Dave Page, member of the PostgreSQL community Global Development Group and EnterpriseDB Chief Architect, helped develop and release the first Foreign Data Wrapper, an experimental version of an FDW, for MySQL. The tool quickly became production ready and underwent a few changes, including being updated for a PGXN point release and 9.2 support two years ago. Subsequently, EnterpriseDB developed and released to the open source community an enhanced FDW for MySQL, building on the work that Page had done.

Foreign Data Wrappers – Value for the enterprise and its customers
Postgres FDWs offer IT organizations the ability to leverage existing data deployments with an enterprise-ready relational DBMS with NoSQL capabilities in a unified platform. By bringing these disparate data sources together, IT can offer up holistic views of key entities like customer and partner. These holistic views allow new applications of engagement to make intelligent recommendations and take targeted actions on behalf of their users and customers. This smart, focused customer engagement is the forefront of industry leadership and customer satisfaction, and underlines the crucial business value FDWs can play in today's digital age.

Pierre Fricke is Vice President of Products and Services at EnterpriseDB Corporation and was previously a Red Hat JBoss executive.

SOA

Keeping major and minor upgrades smooth
7 tips for a successful Oracle SOA Suite upgrade

There are enough reasons to have second thoughts about upgrading your Oracle SOA Suite. But if you know what to look out for, it’s easy to rule out any potential complications during your next version upgrade.

by Matt Brasier

Upgrading key infrastructure components such as Oracle SOA Suite can be a stressful process; as a key component in many people's software infrastructure, the costs of a failed SOA Suite upgrade can be high. At the same time, due to its place as infrastructure, there are often management expectations that an upgrade will be simple and risk free.
This article seeks to provide some tips that can help make sure your SOA Suite upgrade goes smoothly, whether it's a major version upgrade from 11g to 12c or a minor version upgrade. Our experience of performing SOA Suite upgrades has highlighted a number of areas that are commonly the cause of complications, and so these are the areas where we focus our tips.

1. Consider the SOA Suite upgrade as a whole
Oracle SOA Suite consists of many parts: as well as the core SOA components of BPEL, BPMN, Mediator and their associated database schemas, products such as Oracle BAM, Oracle Event Processor and Oracle Web Service Manager all need to be considered as part of the SOA Suite estate when planning an upgrade. An upgrade may need to be applied to multiple products to maintain compatibility, or products may need to have configuration or meta-data changes made to them to function correctly after the upgrade.


As well as the SOA Suite components themselves, an upgrade will likely involve deploying to a new version of WebLogic, and possibly a new Java Virtual Machine. When planning your upgrade, it is therefore important to ensure that you know which components you will be upgrading and that they are all working together as a supported set of versions.
As well as the individual components that make up your SOA Suite estate, you should consider the environments that you have. If you plan to upgrade a reference or pre-production environment before upgrading production, then you also need to consider what you will do if you find yourself needing that environment to reproduce production issues after it has been upgraded.

2. Understand your interfaces and the impact of an outage
Key to having a successful upgrade is ensuring that data remains consistent and services don't fail in unsupported ways. To achieve this it is important that you understand all of the systems that your SOA Suite communicates with, and how they will behave when SOA Suite is down for an upgrade. It may be necessary to shut down some of these systems before you start your upgrade, and bring them back up when it is complete. You may find that systems cope fine with an outage of a few minutes (for example a server restart), but fail in an ungraceful way when the servers are down for many hours for an upgrade. Don't assume that just because a system is fine when you restart servers, it will be fine when you perform the upgrade.
Where systems send data to SOA Suite, make sure you know what will happen to that data if SOA Suite is down; if you need to be able to replay that data into SOA Suite, you will need a (tested) approach for doing this.

3. Remember to update documentation
After your upgrade is complete, you should make sure that you update relevant documentation to reflect any changes. As well as just updating the version numbers in design documents, you should make sure that any support or test guides have updated paths in them. This is very important to avoid someone accidentally starting an old version of SOA Suite when following a set of instructions with outdated paths.

4. Test alarms
An area that has seen frequent changes, and that is the cause of many post-upgrade problems, is alarms. Oracle uses alarms (also known as scheduled tasks or timers) to schedule actions in a flow to occur at a specific time. The impact of upgrades on alarms can vary, depending on how they are used in the application, but at the very least you should consider whether any alarms are due to fire during the downtime of the upgrade and how to handle them. After the upgrade is complete you should check that alarms created before the upgrade with trigger times post-upgrade still fire correctly, and that new alarms also work. If you have any problems, you can refresh the alarm table or individual alarms from within Oracle Enterprise Manager Fusion Middleware Control.
5. Check your purging
Purging is important because it prevents your SOA Suite database schemas filling up with entries relating to completed process instances. These old and often irrelevant entries take up space in the database, but more importantly they clog up indexes and slow down performance. Oracle provides a set of purge scripts that can be scheduled to run against the database to clean up some of this old data, and many organizations extend these scripts to take a more business-oriented definition of old data.
After an upgrade it is important to remember that these scripts will not have been updated by the upgrade process to reflect any changes to the underlying database schema. Where the default SOA Suite purge scripts are being used, you should update your database jobs to call the new scripts. Where custom purge scripts are in use, you should identify any necessary changes to the scripts based on the Oracle-provided scripts, and make changes to your own scripts as appropriate.

6. Test your back-out strategy
Back-out or rollback strategies are only any use if they have been tested. Ensure that you have run through a trial run of any back-out strategy before you attempt the upgrade, and that you understand how long it takes. If you have a hard requirement that the system must be available at 7 am on a Monday morning, and you know your back-out procedure takes ten hours, then you need to be finished testing by 9 pm on Sunday night and ready to make the decision whether or not to back out of the upgrade.
You also need to understand if there is a "point of no return" in your upgrade (either a technical step, or a time beyond which there is no longer time to complete the back-out before the deadline) and ensure that before reaching that point you have all the information and resources you need to make the correct go or no-go decision.

7. Plan well
As we can see, there are lots of things to think about when performing an upgrade of your SOA Suite infrastructure, and the ones above are just a few of the more common causes of problems. To make sure your upgrade goes well, you should have rehearsed both the sunny and rainy day scenarios, and have a well-established plan for how you will deal with any post-upgrade problems (How will you get the resources and expertise you require? Are the people that performed the upgrade around afterwards to be able to explain exactly what they did? etc.).

Matt Brasier is a professional services consultant with thirteen years of experience tuning Enterprise Java applications, and the author of IT books including the Oracle SOA Suite 11g Tuning Cookbook. He has spent time working with Enterprise Java middleware, focusing on the performance aspects of application servers and SOA platforms. Matt is a Principal Consultant at C2B2 – The Leading Independent Middleware Experts, specialising in SOA and Java EE, providing professional services on the leading Java middleware platforms.

Web

AngularJS, Ember or Backbone
Choosing the right JavaScript framework

A look at the pros and cons of the AngularJS, Ember and Backbone JavaScript frameworks, and the factors programmers need to consider when making the choice for their project.

by Amanda Cline

JavaScript frameworks are becoming popular among web developers for building single-page applications, as they assist in keeping the code structured and maintainable, which eventually helps save a lot of time in the long run. Essentially, JavaScript frameworks avoid the mess created by spaghetti code.
There are many JavaScript frameworks available online, such as Backbone, Ember, AngularJS, Knockout and many others. But with too many options, choosing the right one becomes an overwhelming decision. Perhaps you may try experimenting with one or two frameworks, but may still feel confused about selecting the one you would want to master.
In this post, we'll be comparing three of the most talked-about JS frameworks: AngularJS, Ember and Backbone. But prior to that, let's first have a brief overview of these frameworks.

A quick look at the AngularJS, Ember and Backbone JavaScript Frameworks
AngularJS: This JavaScript framework gives developers the ability to "extend the HTML vocabulary for web apps". Also, AngularJS helps in adding some control to the applications using data binding, controllers and plain JavaScript. Besides this, the framework allows developers to create reusable components, and so much more.
Ember.js: It is a great framework for creating highly productive web applications. It comes with integrated Handlebars templates that help in writing less code for complex apps. Almost everything needed for writing an application is built in, including components, routing and many other things. This saves developers from reinventing the wheel.
Backbone.js: Backbone helps in making the code less cluttered by implementing event-driven communication between "Models" and "Views". This framework helps developers uncover the minimal set of models and collections, as well as UI primitives, used for creating web apps using JavaScript.

A Comparison of the AngularJS, Ember and Backbone JavaScript Frameworks
Here we'll discuss a few factors to understand which one of the three JS frameworks (i.e. AngularJS, Ember and Backbone) will best suit your project needs.

1. Community Around the Framework
You can always seek help from volunteers in a framework's community regarding your queries. And so, the level of community support a JavaScript framework provides is an important factor that needs to be considered when selecting a framework. Not to mention, the larger the community, the more support you can expect in the form of tutorials and other useful resources.
If you look at the two comparison charts (Figures 1 and 2), it is clear that AngularJS has attracted more interest and has a more active community (evaluated in terms of opened and closed issues). This doesn't mean that the other two frameworks, i.e. Ember and Backbone, render poor community support, but AngularJS's growth is far better than the other frameworks.

Figures 1 and 2: A comparison of AngularJS, Ember and Backbone


2. Learning Curve
Once you've identified the framework that provides a better level of community support, it is important to determine which one can help you begin the web application development task without much hassle. Put simply, you'll have to identify which one of the three frameworks has an easy learning curve.
AngularJS's two-way data binding will save you from writing "boilerplate" (i.e. repetitive) code when building a web application. But as your application becomes more complex, you'll find that AngularJS has a steep learning curve, as you will need to become familiar with lots of concepts, including filters, modules, routing and more.
Ember.js is easy to learn in comparison to AngularJS. But unlike the Backbone framework, Ember.js fails to offer guides on getting started with web app development. In essence, it does not have good documentation.
Compared to the AngularJS and Ember frameworks, Backbone is relatively simple and easy to use. That's because you only need to grasp a few concepts like Models, Collections etc. Plus, Backbone has great documentation.

3. Optimized for Performance
Compared to AngularJS and Ember, Backbone performs much faster. The performance of the AngularJS and Backbone frameworks may appear the same to you when building applications with smaller pages. However, as the page grows, apps created using AngularJS's two-way data binding might suffer decreased performance. On the other hand, since Backbone.js doesn't perform data binding, you'll need to write the binding on your own. Though this will make you write a larger amount of code, it will also help you focus more on improving performance without the need to make changes to the underlying framework.

Making a decision
There's no doubt that all three frameworks, i.e. AngularJS, Ember and Backbone, will be useful to any web developer. However, each framework contains its own set of pros and cons, and so it is important for you to carefully review which JavaScript framework provides a better set of advantages over the others for your project's needs. We've tried to cover three of the most important factors for choosing the right framework. However, make sure to select the one that best fits your project's specific needs.

Amanda Cline is one of the top-notch programmers currently serving for Xicom Technologies Ltd., a mobile app development company. She provides concrete information on the latest mobile technologies and also loves sharing tech information and ideas on application programming.

Web

Giving users what they want when they want it
Enhancing quality of experience for content-rich sites

When it comes to satisfying users of image-rich web applications, the speed and quality of every single image matters. Parvez Ahammad looks at the ways machine-learning algorithms and cloud application delivery solutions can improve the quality of experience.

by Parvez Ahammad

In the age of "IWWIWWIWI" (I want what I want when I want it), smartphones are increasingly becoming the go-to device for accessing websites, videos and applications, in addition to traditional desktop and laptop devices. According to a recent eMarketer study, the number of smartphone users worldwide will surpass 2 billion by next year. Depending on the generation of devices, users will also experience varying speeds and feeds in the web application delivery of photos and videos. Simultaneously, applications continue to offer increasingly enhanced, rich experiences with HD video and complex images. As a result, websites and applications are becoming fat, resulting in slow download speeds and suboptimal experiences.
For the end user, the quality of experience (QoE) is defined by the speed and quality of multimedia content delivered to devices. When it comes to image-rich web applications, every single image matters – the speed and quality of a video or image received on users' devices determines their level and time of engagement with the application. Offering individually tuned settings for optimal content delivery ensures the user's quality of experience is not compromised while scaling up to millions of videos and images across the entire web delivery pipeline.


Intuitive, context-aware media delivery
The longer users wait for images to fully download in an application, the more likely they are to "multi-browse" and move away from the current application, distracted by another. QoE for video streaming is measured by two main factors: the bitrate, or the bandwidth being used to deliver the content; and the rebuffer rate, or how often video playback is interrupted to reload more content. A video pausing to buffer due to bitrate interrupts the user experience and is also cause for tuning to another video or application.

"When it comes to image-rich web applications, every single image matters."

Rather than waiting for an entire video or image to be completely downloaded and queued in the delivery pipeline, modern cloud application delivery solutions are continually exploring how to optimize the delivery approach to serve users the essential content first. For rich video content, open caching provides a solution for delivering rich video content, even during high network congestion. By identifying data frequently sent in network traffic and caching it locally, applications are able to deliver content from the network edge, rather than calling back to the provider. Proprietary data can also be used to measure and predict user behaviour to optimize and customize the viewing experience, as Netflix has explored.
For images, machine-learning algorithms can be designed to determine which parts of an image need to be delivered first, based on the user's web delivery service, so that the importance of every part of an image can be determined in an automated fashion. Delivering the most essential parts of an image first, rather than waiting for an entire image to load, provides engaging user experiences and eliminates the delay and lag time that can result in users navigating away from an application or website. At Instart Logic, we've taken precisely such an approach to optimizing the quality and speed of application delivery. Our SmartVision technology determines the optimal threshold on the server side, sending first the part of the image file that delivers the best results for the user compared to the original quality of the application, while filling in the rest of the image in the background. This approach dramatically improves user engagement without sacrificing the visual experience.
Using machine-learning algorithms to automatically determine the most essential components of media delivery is one of the latest developments in application delivery, and can dramatically improve the delivery pipeline and QoE for users, increasing overall customer engagement.

Parvez Ahammad is the Senior Staff Data Scientist at Instart Logic and has an extensive background in computer vision, machine learning and signal processing, with applications to camera sensor networks, web application delivery, bioinformatics and neuroscience. He's also the creator of novel algorithmic technologies such as SmartVision at Instart Logic, and OpSIN and Salient Watershed at HHMI-Janelia, to name a few.

Imprint

Publisher: Software & Support Media GmbH

Editorial Office Address:
Software & Support Media
Saarbrücker Straße 36
10405 Berlin, Germany
www.jaxenter.com

Editor in Chief: Sebastian Meyen
Editors: Coman Hamilton, Natali Vlatko
Authors: Parvez Ahammad, Matt Brasier, Sandeep Chouksey, Amanda Cline, Dr. Claire Fautsch, Pierre Fricke, Marcin Kuzminski, Lars Markull, Huon Wilson, Daniel Witkowski
Copy Editor: Jennifer Diener
Creative Director: Jens Mainz
Layout: Flora Feher, Maria Rudi

Sales Clerk: Anika Stock, +49 (0) 69 630089-22, [email protected]

Entire contents copyright © 2015 Software & Support Media GmbH. All rights reserved. No part of this publication may be reproduced, redistributed, posted online, or reused by any means in any form, including print, electronic, photocopy, internal network, Web or any other method, without prior written permission of Software & Support Media GmbH.

The views expressed are solely those of the authors and do not reflect the views or position of their firm, any of their clients, or Publisher. Regarding the information, Publisher disclaims all warranties as to the accuracy, completeness, or adequacy of any information, and is not responsible for any errors, omissions, inadequacies, misuse, or the consequences of using any information provided by Publisher. Rights of disposal of rewarded articles belong to Publisher. All mentioned trademarks and service marks are copyrighted by their respective owners.
