23 February 2017

Wandisco (WAND LN) | Software | Initiation of Coverage | Rating: BUY | Target price: 509p | Price (22 February 2017): 378p

WANdisco: 99 Problems – initiating coverage with Buy rating

History doesn’t say whether WANdisco CEO David Richards woke up on 29 September 2016 and started humming Jay Z’s ‘99 Problems’. He could have totted up a series of problems as he reflected on the boardroom coup that ousted him. With the support of certain large shareholders, Mr Richards eventually regained the helm and the putsch leaders have exited. The latest trading news points to WANdisco having turned a corner – operating results are very encouraging and its outsized cash burn seems to be a thing of the past. Our own recent ‘deep-dive’ on WANdisco has left us confident. The technology USP is intact, the TAM is better defined, the use case is clearer, and users are happy. Furthermore, the company’s ability to execute on TAM has strengthened, and sales look set to accelerate. The icing on the cake, in our view, is the ‘right stuff’ CFO now in situ. WANdisco came close to being a footnote in tech history, but we think it is now fit to make history. Initiating with a Buy rating; our 509p target price implies 34% upside potential.

• Impressive technology. WANdisco’s strength remains DConE, its patent-protected ‘active transaction replication’ software, which allows data to be moved securely, at speed and at scale, between computing environments. Yes, there are pretenders and other varieties, but DConE is the high watermark. IP leadership remains central to the company’s success.

• Blue-chip customers. WANdisco’s clients number among the world’s best-known and well-regarded corporates in auto, entertainment, financial services, government, healthcare, IT, telecoms and utilities. Customers include Aviva, Honda, HP, Intel, Johnson & Johnson, Lockheed Martin, Nokia, Sun Microsystems and Wal-Mart.

• Due diligence. We spoke with WANdisco’s customers, channel partners and other ecosystem partners, as well as mathematicians/computer whizzes (not kids) and staff, in the course of researching this report. Customers were typically happy with the selling, on-boarding and implementation process. WANdisco’s habit of doing Proofs of Concept ahead of signing a main contract may seem riskier for an investor (after all, why not move straight to ‘Go’?), but customers see the value. Customers and channel partners consider the product unique and think that WANdisco has a functional lead. In short: impressive technology, more use cases, an expanding sales pipeline and improved ability to execute on TAM.

• Mojo restored. The latest FY update featured record Q4 bookings +97% YoY, H2 bookings +109% and cash of US$7.6m (up from US$1.1m at 30 June 2016). The cash outturn was particularly pleasing, as it suggests that the constant cash calls are a thing of the past. Remember, this is a subscription revenue business, so it boasts visibility into future periods.

• Valuation. WANdisco offers investors exposure to a business enjoying rigorous growth in a global target market, and to perhaps the ‘noisiest’ theme in IT currently – Big Data. We believe the company is now positioned to accelerate growth that should create further value for its shareholders. Our blended valuation model (DCF US706cents, sum-of-the-parts US538cents, FCF yield US665cents) leads us to a 12-month target price on WANdisco of 509p, implying 34% upside potential.

Key data
Bloomberg/Reuters codes: WAND LN / WAND.L
Market cap (£m): 121
FTSE ALL SHARE: 3,969
1mth perf (%): (3.9)
3mths perf (%): 132.3
12mths perf (%): 251.2
12mth high-low (p): 402 - 100
Free float (%): 71

Share price performance (indexed), Mar-16 to Jan-17: absolute and relative to FTSE All-Share [chart]

Key financials (year to Dec)
                2015A   2016E   2017E
Revenue ($m)       11      11      15
EBITDA            (20)    (10)     (6)
EPS adj         (0.88)  (0.53)  (0.39)
DPS (c)             0       0       0
FCF               (26)     (5)     (4)
FCF yield (%)   (19.3)   (3.6)   (2.6)

Prices are as of the close on 22 February 2017. All sources unless otherwise stated: Company data, FactSet, Stifel estimates.

George O’Connor george.o'[email protected] +44 (0) 20 7710 7694 UK Sales desk +44 (0) 20 7710 7600 Completed: 23 February 2017 01:58EST Disseminated: 23 February 2017 01:58EST

Stifel does and seeks to do business with companies covered in its research reports. As a result, investors should be aware that the firm may have a conflict of interest that could affect the objectivity of this report. Investors should consider this report as only a single factor in making their investment decision. All relevant disclosures and certifications appear on pages 67 - 69 of this report.

Key data(1) / Key information

Key profit and loss data ($)
                              2015A    2016E    2017E
Sales ($m)                     11.0     11.3     15.1
EBITDA                        (20.0)   (10.2)    (6.1)
EBITDA margin (%)            (182.3)   (90.0)   (40.4)
Gross profit                     10       10       14
Net income                    (29.9)   (14.7)   (15.7)
PBT rep                       (31.0)   (15.3)   (16.4)
EBITDA adj                    (16.0)    (8.1)    (3.9)
Depreciation & amortisation      10        9       10
DPS (c)                           0        0        0
FCFPS                          (0.9)    (0.2)    (0.1)

Key cash flow data ($)
                              2015A    2016E    2017E
Operating profit              (29.9)   (19.5)   (16.4)
Operating cash flow           (17.0)     0.4      0.2
Capex                          (0.1)    (0.1)    (0.1)
Dividends                         0        0        0
Net debt                       (2.6)    (7.5)    (3.2)
Taxes paid                       (1)       0        0
Free cash flow                (26.1)    (5.5)    (4.3)
Cash flow from investing          0        5       (4)
Cash at end of year               3        8        3

Key balance sheet ($)
                              2015A    2016E    2017E
Cash and cash equivalents       2.6      7.5      3.2
Total assets                   18.1     19.5     14.4

Target price methodology/risks
We use a blended model to arrive at our 12-month share price target of 509p, using discounted cash flow, sum-of-the-parts and free cash flow yield. While WANdisco has a number of adjacent growth opportunities, we believe the ‘cash generation’ bias in our valuation methodology reflects how investors see the benefits of subscription-based business models.

Risks to target price. In addition to general and macroeconomic risks, the downside risks include continued deceleration in the source code and Big Data markets. This would reduce cash inflow, thereby increasing the net cash outflow and lessening investor interest. Upside risks include better-than-expected revenue growth, possibly as a consequence of channel partner sales accelerating faster than anticipated.

Business description
WANdisco is an infrastructure software company that has developed a patent-protected method for data replication across heterogeneous compute environments.

Senior management
David Richards - CEO
Erik Miller - CFO

Key dates
8 March 2017 - Final results

Major shareholders
OppenheimerFunds - 15.02%
Schroder Investment Management - 9.86%
T. Rowe Price International - 6.09%
GAM - 4.38%
Ross Creek Capital - 3.88%

Website: www.wandisco.com

(1) Year end December. Data in millions, except per share and percentages.
Source: Company data, FactSet, Stifel estimates


Contents

INVESTMENT CASE ...... 4

RISKS ...... 7

OUR CENTRAL CASE ...... 10

What does WANdisco sell? The disco-tech ...... 16

Our due diligence notebook...... 21

Application Lifecycle Management: a review...... 26

Data replication – what’s that about? ...... 28

‘Big Data’, Big question ...... 29

Open Source: a review ...... 33

The competitive world ...... 38
Source Code Management ...... 38
Key vendors ...... 39
Data storage ...... 42

Board of Directors ...... 44

Analysis of forecasts ...... 46

Target price and valuation ...... 53

Appendix I: Citations ...... 61

Appendix II: The Paxos algorithm ...... 62

Appendix III: Hadoop – V1 to V2 ...... 64

Jargon buster ...... 65


INVESTMENT CASE

WANdisco is an infrastructure software company operating at the confluence of four IT axes: big/lots of data; migration to Cloud; importance of RASS; and agile software development/DevOps. The company has differentiated technology and an impressive client list, and is currently enjoying strong operational momentum.

Impressive technology
WANdisco replicates data across heterogeneous environments. Its Distributed Coordination Engine (DConE) is patent-protected (11 patents issued, 25 pending). DConE is capable of active transaction replication, in which data servers are equal peers in a distributed network. This means data is never lost (critical in disaster recovery situations) and can be moved, at speed and at scale, between computing environments (critical in replication and data migration scenarios) and in new-world Cloud SLA management (critical for customers migrating between IaaS providers). Customers talk about a very strong ROI. The technology is an enhancement of the Paxos algorithm to enable active-active replication between a variety of data sources, including Hadoop clusters, Cloud environments, NAS (network-attached storage) filers, etc. It enables continuous data access in the face of network outages, hardware failures and entire data centres going offline.
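As an illustrative aside (a toy sketch under our own assumptions, not WANdisco’s code): the essence of active-active, state-machine replication is that equal peers apply the same agreed, totally ordered log of transactions, so every peer converges on identical data and no single node is a point of failure.

```python
# Toy illustration of active-active replication: in DConE/Paxos terms,
# `agreed_log` stands in for the sequence of transactions a quorum of
# peers has already agreed an order for.

class Peer:
    def __init__(self, name):
        self.name = name
        self.store = {}

    def apply(self, op):
        key, value = op
        self.store[key] = value

def replicate(peers, agreed_log):
    # Every peer applies the same totally ordered operations.
    for op in agreed_log:
        for peer in peers:
            peer.apply(op)

peers = [Peer("London"), Peer("New York"), Peer("Tokyo")]
log = [("ticker", "WAND"), ("rating", "BUY"), ("target_p", 509)]
replicate(peers, log)

# Every peer holds identical data, so losing any one node loses nothing.
assert all(p.store == peers[0].store for p in peers)
```

The point of the sketch is the consensus guarantee: once the order of transactions is agreed, any surviving peer can serve the full, consistent data set.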

The bluest of the blue-chip customers
WANdisco’s customers are companies that are building software, are organised globally, and are migrating to the Cloud – this should make for a very large total addressable market (TAM). Sectors include auto, entertainment, financial services, government, healthcare, IT, telecoms and utilities. The customer list has a strong tech constituency that includes not only Accenture, ARM, Cisco, Dell, HP and IBM, but also banks like HSBC and such global brands as GE, Fidelity, John Deere, Johnson & Johnson, Pitney Bowes and Wal-Mart. The company has c200 customers (of which c31 are on WANdisco’s Big Data product, while most use its ALM product).

Figure 1: WANdisco customers

Source: WANdisco

Large TAM still expanding organically
Trying to draw a boundary around WANdisco’s TAM can be a frustrating exercise. Thinking through the technology tends to throw up new use cases in multiple adjacent customer markets and vertical industries, all in addition to the current focus on replication, migration and Application Lifecycle Management. Looking at the three core areas, we think: (1) disaster recovery should be a cUS$11bn TAM; (2) data migration to the Cloud suggests a cUS$7bn TAM; (3) inter-Cloud data replication, availability and procurement/SLA management should be a large and viable market as the cloud-based application hosting market matures and enterprise users think about their pricing power.


Attractive multiples – all about the growth
WANdisco is enjoying accelerating growth. The Q4 trading update (16 January 2017) headlined with record Q4 bookings +97% YoY, H2 bookings +109% and FY total bookings +72% YoY. Following a US$14m fund raising in summer 2016, cash stood at US$7.6m on 31 December, up from US$1.1m at 30 June 2016. The Q4 cash burn was US$200k – down from US$6.9m in Q4 2015. There were no borrowings on the company’s revolving credit facility at 31 December 2016, indicating that WANdisco has paid off its US$3.8m borrowings.

New multi-layered sales and distribution model is thriving
A key focus for 2016 was to establish a partner network. Mission accomplished – now, in addition to its own ‘direct sales team’, WANdisco has a set of Tier 1 channel partners. This includes IBM, with which WANdisco inked a rare OEM agreement, as well as significant channel partnerships with Oracle and Amazon. All are contributing to bookings and are instrumental in building the company’s sales pipeline. They also reduce the cost base, thereby hastening WANdisco along the path to profitability.

Management stays the course, maintains the ‘passion’
Management has been through the mill – and remains together. The ‘top table’ still includes founders David Richards and Yeturu Aahlad. New CFO Erik Miller has public and private software industry experience.

Valuing growth and the promise of more
WANdisco offers investors exposure to a business enjoying rigorous growth in a global target market, and to perhaps the ‘noisiest’ theme in IT currently – Big Data. We see abundant evidence that the company is now positioned to accelerate growth that should create further value for its shareholders. Our blended valuation model (taking in DCF US706cents, sum-of-the-parts US538cents and FCF yield US665cents) leads us to a 12-month share price target on WANdisco of 509p, implying 34% upside potential.
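The blend arithmetic above can be sanity-checked with a short calculation. We assume an equal weighting of the three per-share values and a cUS$1.25/£ exchange rate – neither the weights nor the rate is stated here, so treat both as our assumptions.

```python
# Sanity check of the blended target (assumptions: equal weights, cUS$1.25/GBP).
dcf, sotp, fcf_yield = 706, 538, 665      # US cents per share
blend_usc = (dcf + sotp + fcf_yield) / 3  # simple average, c636.3c
usd_per_gbp = 1.25
target_p = blend_usc / usd_per_gbp        # pence per share
assert round(target_p) == 509

price_p = 378                              # close, 22 February 2017
upside = target_p / price_p - 1
assert 0.34 < upside < 0.35                # c34% upside, as stated
```

Under these assumptions the three inputs reproduce the 509p target and the c34% upside almost exactly, which suggests the blend is a straight average.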

Figure 2: WANdisco revenue by geography (%) – N America 78%, Europe 17%, RoW 5%
Source: Company data, Stifel estimates

Figure 3: WANdisco bookings by division (%) [chart]
Source: Company data, Stifel estimates


Figure 4: WANdisco: All you need to know

Founded 2005; 11 patents issued, 25 pending; offices in Belfast, Sheffield and San Francisco. Q4 trading highlights: Q4 bookings +97% YoY; H2 bookings +109%; FY bookings US$15.5m; Q4 cash burn US$200k (versus US$6.9m a year earlier). "We have begun 2017 with a strong new business pipeline and a significantly reduced cost base". Customers… loads. CEO and board refresh + glitzy new technology + Big Data TAM on the up = opportunity.

Source: Stifel Research


RISKS

The key risks to our investment case include: (1) the core technology, (2) the competitive market, (3) the nature of the demand environment, and (4) WANdisco’s continued ability to deliver further sales growth at similar-to-recent rates.

Technology risk
In terms of technological risk, we identify two issues:

1. What problem does DConE solve, and is it ‘critical’ enough? Some repositioning has aimed at ‘nailing down’ the use case. However, in WANdisco Fusion, we see a coherent product with a defined end-market.

2. Is the core technology established? Dr Leslie Lamport, creator of Paxos (introduced in his paper ‘The Part-Time Parliament’), acknowledged to us that, despite interest in the Paxos offshoot Raft having led to a number of implementations, Paxos remains the standard approach to implementing fault-tolerant systems.

Sidebar: the origins of DConE
The original intellectual property (IP) underpinning DConE was first issued in 2005, after WANdisco’s technical founder Yeturu Aahlad spent five years working to create a peer-to-peer distributed system. Dr Aahlad’s work was based on a paper by mathematician/computer scientist Dr Leslie Lamport, who named his solution the Paxos algorithm (see Appendix II). That Dr Aahlad had been a distributed systems architect at Sun Microsystems tells us that he was in the right place at the right time, when the industry was just beginning to think about issues around distributed computing.

Good enough and DIY solutions
The technology industry is littered with examples of ‘good enough’ technology (note: this report was written in Microsoft Word 2010). ‘Good enough’ often means ‘cheaper’; in this case, we have also come across examples where companies have developed a (nearly) peer-based system using batch processing (i.e. not real-time), and solutions that re-hash the ‘master/slave’ methodology – i.e. they are not peer-to-peer. There is no direct competitor, and the key is data migration with no interruption (i.e. zero outage).

. Some of the ‘2.0’ web properties solve the problem by: (1) using more hardware (the unit of production in a data centre is a small, cheap compute blade, easily deployed) to create hardware-based fault tolerance – an easier fix, but not generally suitable; or (2) developing their own solutions. Later in this report, we look at two case studies (Airbnb and Netflix) to illustrate what cash- and engineering-rich companies can do for themselves.

. Raft. Raft is a consensus algorithm designed as an alternative to Paxos. It was meant to be more understandable than Paxos. Like Paxos, Raft offers a generic route to distributing a state machine across a cluster of computing systems, ensuring that each node in the cluster agrees upon the same series of state transitions. The difference is that Raft decomposes the problem into relatively independent sub-problems. A server in a Raft cluster is either a leader, a candidate, or a follower, with the leader responsible for log replication and informing the followers via a heartbeat message. There are a number of implementations (see https://raft.github.io/#implementations), but no large commercial sponsorship.

. Blockchain. Blockchain enjoys some status as a tech cure-all currently, and its distributed ledger technology is being explored as a trusted way to track the ownership of assets with no need of a central authority. The design goal is to speed up transactions and cut costs, while lowering fraud incidences. At its heart, Blockchain is a distributed file system. The database is shared by all nodes participating in a system, and people using Blockchain keep copies (blocks) of the Blockchain file. As such, the Blockchain database uses a distributed consensus model. Each block carries a cryptographic signature (aka a hash) of the preceding block – hence the ‘chain’ analogy, as blocks are added sequentially. It is a peer-to-peer network with many distributed nodes: the failure of one node (or even several nodes) will not prevent the rest of the network from functioning properly, and no data is lost. Current criticisms of Blockchain include its (lack of) speed and the visibility of the information (to anyone). We know of one WANdisco customer that had been exploring Blockchain, but now sees DConE as more suitable for its needs.
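For illustration only (a minimal sketch, not any production blockchain): each block embeds a hash of its predecessor, which is what links the chain and makes earlier blocks tamper-evident.

```python
import hashlib

def make_block(data, prev_hash):
    # Each block carries the hash of its predecessor - the 'chain'.
    payload = (prev_hash + data).encode()
    return {"data": data, "prev": prev_hash,
            "hash": hashlib.sha256(payload).hexdigest()}

genesis = make_block("genesis", "0" * 64)
b1 = make_block("tx: A pays B", genesis["hash"])
b2 = make_block("tx: B pays C", b1["hash"])

# Each block points at the hash of the one before it.
assert b2["prev"] == b1["hash"]

# Rewriting an earlier transaction changes its hash, so every later
# link in the chain would no longer match - tampering is detectable.
tampered = make_block("tx: A pays C", genesis["hash"])
assert tampered["hash"] != b1["hash"]
```

This hash-linking is the distributed consensus model’s audit trail: every participating node can verify the chain independently, with no central authority.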

Artificial intelligence taking over
Just to future-proof this report – there are moves afoot to bring more artificial intelligence (AI) into the software development world. It is not too much of a stretch to imagine AI developing software autonomously. In that case, ‘out of the box’ fault tolerance could be at such a level that, coupled with AI-developed software, knowledge (i.e. the practical implementation of data) would be ubiquitous and always available. This would negate the need for DConE. We offer no timeframe.

Running out of cash
WANdisco has returned to seek cash from the stock market several times. The latest trading update (16 January 2017) revealed that the Q4 cash burn was US$200k, down from US$6.9m in Q4 2015. Among the several corners the company has turned: (1) it has recut its cloth to match existing cash resources; and (2) there is a new appreciation of the importance of the balance sheet – a sea change we think was inspired by the new CFO. Our model suggests that the company does not need to raise fresh cash, but this depends on rising revenue and better sales execution, rather than trimming the cost base.

Sales execution is there at last
As befits an early-stage software company, WANdisco has had a multifaceted ‘let’s try a few things’ go-to-market strategy. We recall its pre-IPO days, when WANdisco built a ‘frictionless’ sales model for its ALM line of business, supplemented with an ‘inside’ sales team that concentrated on converting the free community to paying customers. However, ALM customers were a technical audience that knew what it wanted. The same approach was never going to work for Big Data, where WANdisco debuted in 2013, and where it would have to build its own enterprise sales team supplemented by a sales channel. The early moves with the Hadoop ‘distro’ companies (Cloudera and Hortonworks) looked sensible, yet ultimately proved to be the wrong starting point as the distros developed their own Disaster Recovery (DR) offer. Through iterations, WANdisco now has a multifaceted sales distribution model that headlines with a rare IBM OEM relationship. In addition, WANdisco has its own enterprise sales team, coupled to channel partners including Amazon and Oracle, plus a number of professional services organisations.

Competition – a wrinkle
Looking through the competitive pack (see below), we note a product competitor in the ALM market: ‘Git’. Git is an Open Source, distributed version control system designed to handle all projects, from small to very large, with speed. It is positioned as a replacement for version control tools like CVS or Subversion.

Git has surged in recent years, and companies like GitHub and (to some extent) Atlassian have done a better job of tethering themselves to the Git banner. While WANdisco has a Git (and indeed a Gerrit) offer in addition to Subversion, arguably the company needs to work harder to establish its brand in the Git market.

‘Forking’
WANdisco is an engineering-led software company that puts technology on a pedestal, and its customers look to it to figure out ‘what’s next’. In such situations, ‘forking’ comes with the territory in the Open Source world. ‘Forking’, or the development of competing variations, occurs when disputes lead projects to splinter into different forms. However, a number of recent staff and process changes in engineering at WANdisco should help reduce the risk of forking.


Difficulties in hiring
As a small company with developers in San Francisco, WANdisco may find it difficult to hire. In fairness, given that options are ‘under water’, we are surprised that unplanned staff attrition is not more of an issue. We caution that:

. Good people remain ‘hard’ to hire;

. The US, Northern Ireland and Sheffield offices may each find it difficult to hire. In the US, staff may be too ‘footloose’ and expensive, and too prone to receiving competitor calls. In Sheffield, they may be too scarce. In Northern Ireland, they might not be skilled enough.

We like the global nature of WANdisco, and think the company needs to have a distributed office structure to mirror how its customers are organised.

Staff are expensive
We refer to ITJobswatch for an impression of UK software developer costs for contractor and full-time staff. Staff costs are c70% of any tech company’s operating spend, and most of WANdisco’s staff are in three locations. We note that a Hadoop developer in the UK costs £800-850/day as a contractor, or c£64,000 per annum full time. This is a UK average, and local rates in Sheffield will be cheaper than the same role in the City of London. The same role in San Francisco averages US$112,000 for an FTE – that translates to £90,000, or c40% more on a ceteris paribus basis. While the issue of staff costs will be food for thought, balancing this we recall a conversation with analysts at Gartner, who noted that there can be a big difference between two hypothetical developers purely on the basis of their surroundings, and who suggested that developers are better somewhere like Silicon Valley, as it is a magnet for new thought and best practice. We also note an IDC Hadoop user study from 2016 in which users rated finding skilled Hadoop staff as the key challenge – this should lead to further wage inflation.

Figure 5: Q: Which of the following best describes the challenges you faced with your Hadoop implementation?

Source: International Data Corp., 2016
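The salary comparison above can be checked with quick arithmetic; the £90,000 translation implies a cUS$1.24/£ rate, which we back out below rather than assume.

```python
# Figures from the text: c£64,000 UK FTE vs cUS$112,000 San Francisco FTE,
# which the note translates to £90,000.
uk_gbp = 64_000
sf_usd = 112_000
sf_gbp = 90_000                   # per the text

implied_rate = sf_usd / sf_gbp    # cUS$1.24 per GBP
premium = sf_gbp / uk_gbp - 1     # San Francisco premium over UK

assert 1.2 < implied_rate < 1.3
assert 0.40 <= premium <= 0.41    # c40% more, ceteris paribus
```

The stated c40% premium holds (90,000/64,000 is c1.41x), on a ceteris paribus basis before any Sheffield-versus-City-of-London discount.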


OUR CENTRAL CASE

WANdisco is an infrastructure software company that has developed a patent-protected method for data replication across heterogeneous compute environments. The company focuses mainly on (1) Application Life Cycle Management and (2) Big Data. WANdisco has been on a roller coaster since its IPO in 2012, but we believe it has now found its feet. There is a new coherency with the Fusion product, a better defined sales model, and operating results look to be turning around.

What does WANdisco do?
WANdisco has built a range of enterprise-class data replication products using its own technology, DConE. These products improve data migration and ‘round-the-clock’ availability in areas like replication, mirroring and clustering, and also help to eliminate WAN latency.

At the company’s outset, DConE was focused on a relatively narrow area in version control. The company expanded into Hadoop (big) data replication with the 2013 launch of its Non-Stop NameNode, on the heels of its 2012 acquisition of AltoStor. In 2015, WANdisco debuted Fusion, which significantly expanded TAM as it connected beyond Hadoop distributors into the wider storage market, including vendors such as Amazon, EMC, IBM, Oracle and Teradata. A core technology runs through the products, which are designed to be independent of the underlying application so they can serve as the foundation for distributing other applications or databases. WANdisco enables geographically distributed servers to remain continuously synchronised (i.e. have the same data at the same time). This solves problems for companies with distributed divisions that are working collaboratively (e.g. in software design) and that operate over a WAN (wide area network). It also appeals to those concerned about network speed, latency, availability, scalability and security. WANdisco competes in the same markets as infrastructure software companies like CA, IBM, Micro Focus and Microsoft.

How does it do it?
WANdisco has developed active transaction replication to provide continuous availability, streaming back-up, uninterrupted migration, hybrid Cloud and Cloud bursting, and data consistency across clusters that are any distance apart. Under this model, all the data servers in a network are ‘equal’ (i.e. ‘peers’). To understand it better, compare this with active:passive – or rather master/slave – replication, where one server is the de facto controller: should the master go down, data is lost. Active:active is superior because the data servers maintain ‘a consensus’ across a distributed network, so the core data remains secure.
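The difference can be caricatured in a few lines (a deliberately simplified sketch of the two models under our own assumptions, not WANdisco’s implementation): a master that batches replication can lose in-flight writes when it fails, whereas equal peers that agree each write up-front all hold the full history.

```python
# Master/slave: writes queue on the master until a replication batch ships.
class Master:
    def __init__(self):
        self.pending = []    # writes not yet shipped to the slave
        self.slave = []      # the slave's copy

    def write(self, tx):
        self.pending.append(tx)

    def batch_replicate(self):
        self.slave.extend(self.pending)
        self.pending.clear()

m = Master()
m.write("tx1"); m.batch_replicate()
m.write("tx2")               # master fails before the next batch ships...
surviving = list(m.slave)    # ...so the slave never sees tx2
assert surviving == ["tx1"]  # tx2 is lost with the master

# Active-active: every write is agreed by the quorum of equal peers first,
# so any surviving peer already holds the full transaction history.
peers = [["tx1", "tx2"], ["tx1", "tx2"], ["tx1", "tx2"]]
assert all(p == ["tx1", "tx2"] for p in peers)  # lose any node, lose no data
```

This is the intuition behind the active:active claim above: consensus before acknowledgement means there is no single controller whose failure strands unreplicated data.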

WANdisco’s mission
WANdisco aims to push the limits of what can be achieved with distributed computer systems deployed on a WAN. Note that while WANdisco uses Open Source software and is a member of the Apache Foundation, DConE itself is not Open Source software. Talking with Dr Aahlad, we gained the impression that he remains committed to developing software that will create “a richer multipart interaction with the Internet”. Fundamental to this view, in our opinion, is that the Internet is a distributed computing platform, and so being able to harness various web servers and ‘see’ them as a single entity, and to coordinate computing resources without using a central server, is crucial.

Snapshot background
WANdisco was founded in 2005, when experienced tech executives David Richards and Jim Campigli swooped on the work of Yeturu Aahlad. Dr Aahlad had spent the preceding five years developing a practical implementation of the Paxos protocol, which facilitates the near-real-time replication of data. This eventually came to be called DConE. The trio coined the term ‘active-transactional data replication’ to explain how DConE creates LAN functionality across a WAN of any distance, at any scale. From an initial focus on the ALM market, following the 2012 IPO they decoupled the DConE IP from ALM. They then pointed DConE at new markets, the first of which was Big Data. Today, customers use WANdisco software for version control and for managing data in New Age distributed computing environments like Hadoop, HCFS (Hadoop compatible file systems), Cloud object storage and NFS.


WANdisco timeline
. 2005. In October, WANdisco gained its first customer through the sale of its initial product, CVS Multisite. Began working on its first product in the Subversion suite – Subversion Multisite, a product that many in the industry had thought was unachievable.

. 2006. Launched CVS Access Control and Subversion Access Control, offering security and audit capabilities.

. 2007. First sales in the UK, Japan and Australia. Launched Subversion Support Services. WANdisco was named in the Software Development Times 100 List.

. 2008. Opened Sheffield headquarters.

. 2009. WANdisco signed contracts with HP and Juniper Networks. The HP contract was the group’s first multimillion-dollar sale for Subversion Multisite. WANdisco was by then the major contributor to the Subversion project, as it employed full-time core Subversion developers. This allowed WANdisco to provide the same level of support for Open Source Subversion that was normally only available for commercially licensed software.

. 2010. In July, WANdisco moved its business away from a perpetual to an annuity-based subscription model.

. 2011. Release of uberSVN, an Open Source ALM platform. Launched uberAPPS, an online store offering applications and services for enterprise ALM that are certified to work with Subversion and uberSVN. Received the British Computing Society’s Business IT Innovation of the Year award for uberSVN, as well as being named in the Red Herring North America Top 100, an award recognising leading private technology start-up companies in North America.

. 2012. In June, a 5x over-subscribed AIM IPO raised US$15m at 180p. Acquired AltoStor, received US patent award. Opened Belfast development centre.

. 2013. Partnered with Hadoop distro Cloudera. Entered the Big Data market, launching Non-Stop NameNode. In May, ex Sage plc CFO Paul Harrison was appointed CFO. Raised US$19m in September. Shares moved towards 1500p.

. 2014. Burn rate exceeded US$2m/month. Launched Non-Stop HBase. Former Sage CEO Paul Walker appointed Chairman. Secured credit facility with HSBC.

. 2015. Debuted Fusion. Raised US$24m in January. Signed the IBM OEM partnership.

. 2016. Raised US$15m in June. Failed boardroom coup, with the protagonists resigning seven days later. Share price slumped to c180p. In what we view as a moment of clarity, WANdisco appointed Erik Miller as CFO. Inked a US$1m contract for self-driving cars via its IBM OEM relationship.

. 2017. A ‘knock it out of the park’ FY trading update headlined with record Q4 bookings +97% YoY, US$7.6m cash with a US$200k cash burn in Q4, versus US$6.9m the previous year. CEO David Richards talked about a ‘strong new business pipeline and progress towards profitability’.


Figure 6: WANdisco timeline [graphic summarising the 2005-2017 milestones listed above, from the first customer and the Sheffield HQ through the IPO, the Big Data entry, Fusion, the IBM OEM deal, the failed boardroom coup and the ‘knock it out of the park’ FY update]

Source: Company data, Stifel Research

Who are the customers?
Users are typically larger corporations, and include a number of Fortune Global 100 companies. These companies are building software, are organised globally and are migrating to the Cloud. Sectors include auto, entertainment, financial services, government, healthcare, IT, telecoms and utilities. The customer list has a strong tech constituency that includes not only Accenture, ARM, Cisco, Dell, HP and IBM, but also banks like HSBC and such global brands as GE, Fidelity, John Deere, Johnson & Johnson, Pitney Bowes and Wal-Mart. The company has c200 customers (of which c31 are on WANdisco’s Big Data product, while most use its ALM product).

The use case: What do customers use WANdisco for?
. Disaster recovery. Customers use WANdisco to ensure that if a server goes down (e.g. the area electricity gets knocked out), data is not lost. WANdisco products provide Cloud, on-premise and Cloud-to-Cloud replication with guaranteed data consistency and no data loss.

. Data migration to the Cloud. The Cloud is attracting customers like moths to a flame, because it is cost-effective, requires little upfront investment, and brings many other benefits to an enterprise. This use case will affect many companies, and will be long-running simply because companies will migrate to the Cloud at different paces. By its nature, however, a Cloud migration represents a one-off sale, which is arguably less interesting for WANdisco (and its shareholders).

. Hybrid Cloud. Customers could use a ‘hybrid’ Cloud, where they mix and match on-premise and Cloud servers, and move data between the two. Other users might move data from the ‘edge’ of the network to the core. Here, a customer talked us through a medical example – with on-the-edge data collection from remote locations and the analysis in the core. This gives WANdisco an annuity revenue stream.

. Inter-Cloud data replication and availability. In this scenario, customers look to maintain dual-supplier strategies (like their on-premise brethren of old), or to migrate to a new Cloud vendor after losing confidence in the current supplier owing to poor SLA management, the cost of extracting data, security issues, etc. They need to be able to move data between suppliers in order to avoid predatory pricing and ‘lock-in’. This nascent market reflects the early stage of enterprise users migrating to the Cloud. Some customers have already started to migrate between Cloud providers, but much of their eagerness gets diluted once they recognise the downtime required with traditional migration methods. We are reading more about these Cloud migrations in the trade press (http://searchcloudcomputing.techtarget.com/tip/Warning-signs-its-time-to-switch-cloud-service-providers), suggesting that there is a growing audience.

Page 12 Wandisco 23 February 2017

. Improving Cloud provider availability. In September 2015, Airbnb, Netflix and Tinder suffered outages of eight hours at Amazon in a single day. Had data been consistently replicated across multiple data centres within the Cloud provider in near-real time, these outages would not have occurred. This suggests that AWS data are replicated across data centres in a batch-based approach – hence the outages.

Impressive ROI Customers and channel partners speak positively of the realised ROI. While much of the evidence is anecdotal, a Forrester Total Economic Impact (TEI) study of WANdisco’s Subversion Multisite product revealed a 357% return on investment within a nine-month period.

The sales channel partners . IBM. In April 2016, WANdisco announced an OEM deal with IBM, under which IBM resells WANdisco Fusion as a white-labelled product called IBM Big Replicate. This is a two-year, non-exclusive deal whereby IBM sells Fusion and offers first-line product support, while WANdisco provides technical and engineering support to IBM. As to the economics, IBM appears to charge cUS$7k/node(server)/year to the customer in an on-premise deployment, and pays a 30% royalty to WANdisco. While there was a banner US$1m deal for autonomous cars in December 2016, we think the IBM channel will start to deliver as 2017 progresses.

. Amazon Web Services (AWS). In March 2016, WANdisco Fusion began selling on the AWS Marketplace, where it was listed as a featured product. We understand that the economics are an 80/20 split in WANdisco’s favour of any revenue generated through AWS. Fusion is priced as a 30% upcharge to S3 pricing: S3 is currently priced at US$0.03/GB/month, Fusion adds 30% on top of this, and WANdisco receives 80% of that upcharge. This is charged on all data under continuous replication. We believe there are four to five live customers via AWS.

. Oracle. WANdisco has a resale agreement in place with Oracle. WANdisco Fusion works with Oracle’s Big Data Appliance. The first deal through this channel was announced in October 2016, a US$1.5m deal with an unnamed regional US bank. An Oracle reseller sourced and closed the deal.

. HP. WANdisco announced a resale agreement with HP in Q2 2016. The banner client is the Dubai Connected City. Given the Micro Focus merger, we would imagine more sales activity in 2017 as reps get keen to show that they are ‘useful’.

. New Context. This partnership was inked in July 2016. New Context is essentially an outsourced IT management service. The concern that WANdisco addresses here is data integrity for clients that have data compliance policies, and WANdisco can control where the data goes and who has access to it in an auditable way. This is important for clients that have classified data with concomitant rules on read access, and that also have data that are not allowed to leave the US.

. Google. WANdisco is a listed technology partner of Google Dataproc. We understand that Google has developed its own data replication solution for Google search – the Google Spanner database – for which Google fitted its servers with atomic clocks and GPS receivers and ran dedicated cabling across its data centres. This is deemed good enough for Google search, but is not a commercial solution.
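The AWS Marketplace economics described above lend themselves to a quick worked example. The S3 price, the 30% upcharge and the 80/20 split are this report’s figures; the formula and the example data volume are our own illustrative assumptions, not company guidance:

```python
# Rough sketch of the Fusion-on-AWS economics described above.
# The S3 price, 30% upcharge and 80/20 split are the report's figures;
# the formula and the example volume are our assumptions.

S3_PRICE_PER_GB_MONTH = 0.03   # headline S3 storage price, US$/GB/month
FUSION_UPCHARGE = 0.30         # Fusion priced as a 30% upcharge on S3
WANDISCO_SHARE = 0.80          # 80/20 revenue split in WANdisco's favour

def monthly_fusion_revenue(replicated_gb: float) -> float:
    """WANdisco's monthly revenue on data under continuous replication."""
    fusion_fee = replicated_gb * S3_PRICE_PER_GB_MONTH * FUSION_UPCHARGE
    return fusion_fee * WANDISCO_SHARE

# Example: a customer continuously replicating 500TB (512,000GB)
print(round(monthly_fusion_revenue(512_000), 2))   # → 3686.4
```

On these numbers, even a half-petabyte deployment under continuous replication would yield under US$4k/month to WANdisco, consistent with this channel being early-stage with only four to five live customers.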


Revenue model WANdisco has an annual subscription licence model. This provides a visible forward revenue stream, and a foundation for further expansion as clients ‘buy more’. Prior to Q3 2010, WANdisco sold its products under a perpetual licence model, but dropped it in order to build better visibility into its model. Subscription agreements are typically one-year rolling agreements paid annually in advance, although in certain circumstances multi-year licences are agreed. The cost for all subscriptions is determined mainly by:

. The storage;

. The number of servers;

. The number of named users.

The software subscription licence includes the software, standard support (eight hours a day, five days a week) and upgrades for the length of the paid subscription period. Additional training, implementation services, Open Source support and premium support (24 hours a day, seven days a week) are available for an additional fee. The fee structure runs from US$4,995 to US$21,995 for the ‘Platinum’ service.
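To make the three pricing drivers above concrete, a purely hypothetical sketch follows. Only the US$4,995 to US$21,995 support-tier fees come from this report; every per-unit rate in the function is invented for illustration (the US$7,000 per-server rate merely borrows the cUS$7k/node figure cited for the IBM channel as a plausible order of magnitude):

```python
# Hypothetical illustration of the subscription model described above.
# The support-tier fees are the report's figures; the per-unit rates for
# storage, servers and named users are invented placeholders.

SUPPORT_TIERS = {"standard": 4_995, "platinum": 21_995}  # US$/year

def annual_subscription(storage_tb, servers, named_users,
                        support="standard"):
    """Annual fee driven by storage, server count and named users."""
    fee = storage_tb * 50 + servers * 7_000 + named_users * 100
    return fee + SUPPORT_TIERS[support]

# Example: 100TB across four servers, 50 named users, Platinum support
print(annual_subscription(100, 4, 50, "platinum"))   # → 59995
```

The shape, not the rates, is the point: each driver scales the annual fee independently, which is what gives the subscription model its ‘buy more’ expansion dynamic.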

There is a limited associated service whereby WANdisco offers free downloads of certified Subversion and Git binaries. For these customers, WANdisco has a “Freemium” business that offers paid-for products and services with enhanced features and functionality around Open Source software. There is also some free community support – but full support is a chargeable service, with tiered pricing.

What does Fusion do? WANdisco’s DConE technology is at the heart of Fusion. It allows multiple instances of the same application to operate on independent hardware without sharing any resources. This is active-transactional replication technology for continuous availability, streaming backup, uninterrupted migration and hybrid Cloud, and ensures data consistency across clusters that are any distance apart. This is possible because all of the application servers are synchronised continuously by the DConE engine, and operate as peers to each other, regardless of whether the servers are on the same LAN or are globally separated and accessible only over a WAN. The industry talks about this as being a ‘peer-to-peer’ system when there is no central co-ordinating ‘master’, or lead service. This is achieved by immediately replicating changes made against one server to the others (‘active-active replication’).

Using this WANdisco software creates the effect of a single-server system (i.e. a quasi-single instance), which then performs at LAN speed even though the servers themselves may be thousands of miles apart.

Once WANdisco’s products are installed at each site, each server becomes an active node on the WAN with its own DConE. These work cooperatively as peers to perform distributed transaction management tasks, handle conflicts, and ensure that the same write order is maintained across all of the servers. This means WANdisco provides One-Copy Equivalence (consistent data/single version of the truth) across a system of distributed servers connected over a WAN or LAN, and should one server go down there is no effect on the other servers in the implementation. This has significant implications in terms of maximising productivity, eliminating downtime and preventing data loss in a globally distributed collaborative work environment.

If a site goes down, with Fusion installed with each cluster or in each Cloud environment, each cluster knows the last good transaction it processed. Hence, when it comes back online, it can reach out to the other servers installed with the other participating clusters, grab all the transactions that it missed while it was offline, and apply them and re-sync automatically. This eliminates the risk of human error in recovery, and ensures against data loss. Fusion continuously replicates data.
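The recovery behaviour described above can be caricatured in a few lines. This is a toy sketch, not WANdisco code: real DConE agrees the transaction order via a Paxos-style co-ordination protocol, whereas here the order is simply given, and the `Peer` class is our own invention:

```python
# Toy sketch of the 'active-active' behaviour described above: every peer
# applies the same totally-ordered log of transactions, and a peer that
# was offline re-syncs by fetching the entries it missed. Real DConE
# reaches this order by distributed agreement; here it is simply given.

class Peer:
    def __init__(self, name):
        self.name = name
        self.log = []          # transactions applied, in the agreed order
        self.online = True

    def apply(self, txn):
        if self.online:
            self.log.append(txn)

    def resync(self, healthy_peer):
        """Catch up: fetch every transaction after the last one applied."""
        missed = healthy_peer.log[len(self.log):]
        self.log.extend(missed)
        self.online = True

a, b = Peer("London"), Peer("San Francisco")
for txn in ["t1", "t2"]:
    for p in (a, b):
        p.apply(txn)

b.online = False                 # site outage in San Francisco
for p in (a, b):
    p.apply("t3")                # b misses t3 while offline

b.resync(a)                      # automatic forward recovery
print(a.log == b.log)            # → True
```

The point of the sketch is the invariant: because every peer applies the same write order, a recovered node needs only the tail of any healthy peer’s log to restore One-Copy Equivalence, with no human intervention.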


Figure 7: WANdisco Technical USP

Source: WANdisco


What does WANdisco sell? The disco-tech

Under the WANdisco Fusion brand, products fall into two distinct product camps of (1) data storage – referencing Big Data, and (2) source code management – referencing the ALM market.

Figure 8: WANdisco product set

Big Data: Hadoop; NAS & SAN; SDK; Amazon S3; Google Cloud; MS Azure; IBM OpenStack SWIFT

ALM: SVN Multisite Plus; Git Multisite Plus; Gerrit Multisite Plus; Access Control Plus; Subversion and Git binaries

Source: Company data

Data storage products Hadoop. With this flagship product, WANdisco offers ‘always-on availability’ and performance in a Hadoop environment across clusters running any mix of distributions, any distance apart, with no downtime. There is continuous read/write access to data during replication and migration: clusters can be on any distribution supporting the Hadoop Compatible File System (HCFS) API, with read/write access to the same data everywhere. Hadoop deployments already in production scale up with existing hardware for greater cost savings and ROI.

User advantages include: (1) no significant administrator overhead for setup, monitoring, maintenance and disaster recovery; (2) no manual intervention required to handle out-of-sync conditions, with guaranteed data consistency across clusters; (3) no vendor lock-in – transfer data to other Hadoop clusters running any distribution and version; (4) no need for scheduled backups outside normal business hours due to resource contention with other applications.

NAS & SAN. WANdisco offers continuous replication and migration, and is billed as the only solution to provide continuous consistent connectivity to data as they change in Network Attached Storage (NAS) and Storage Area Networks (SAN) and replicates those data to any other WANdisco Fusion supported storage environment, either on-premise or in the Cloud.

User advantages include: (1) continuous read/write access to data during replication and migration – automated forward recovery in the event of an outage; (2) continuous one-way to N-way synchronisation and parallel file transfer capabilities optimised for NetApp filers; (3) scales to handle any number of source and target environments with any volume of data, either on-premise or in the Cloud.

SDK. The Software Developer Kit enables third-party developers to extend WANdisco’s Fusion to any potential data source. WANdisco has developed an integrated development environment (IDE), an API-based plug-in that allows developers to bring new target data sources to Fusion. It can also be used to reuse Fusion components for bandwidth management, encryption key handling and licensing.


Supported Cloud environments. The Cloud environments are Amazon S3, Google Cloud, MS Azure and IBM OpenStack SWIFT. For these, WANdisco offers a simple setup in both on-premise and Cloud environments. WANdisco recommends using the standard Cloud vendor utilities for installation and deployment, and then users can migrate data between any Fusion-supported on-premise environments and the Cloud.

IBM OEM In April 2016, IBM inked an OEM agreement to resell WANdisco Fusion, rebranded IBM Big Replicate. It is rare for IBM to sign an OEM in core technology – but there have been many at the application level. With this, WANdisco products have become a key embedded component of IBM’s Big Data, Cloud and analytics solutions. WANdisco has trained more than 5,000 quota-carrying IBM reps, so we expect more deals like the banner auto contract to follow.

Big Replicate is the IBM OEM version of WANdisco Fusion. Like Fusion, it is akin to a data insurance policy in that it provides the core functionality supporting continuous availability and performance with data consistency across clusters that are any distance apart, either on-premise or in the Cloud. It gives users and developers access to the same data, the same view of the data, read and write access to the same files – just as if they were working against a single data source at a single location.

IBM Big Replicate offers: . 100% availability – always-on, with performance surpassing the most demanding service-level agreements (SLAs);

. Reduced costs – increased capacity with no increase in hardware costs;

. Lower complexity – simplified backup, recovery, migration and expansion;

. A bridge to the Cloud – migration to the Cloud, and hybrid deployments, with no downtime; and

. Accelerated data access – real-time data wherever it is needed.

We concur with the IBM view that (1) enterprises are expected to move spending on traditional infrastructure to public Cloud and true private Cloud, and (2) customers are much more willing to move more Big Data applications into production when they are confident that the platform is hardened and meets enterprise standards. Industry analysts Wikibon also share that view (see figure). Joint research by IBM and WANdisco concluded that operationalising Hadoop to enterprise-grade standards with WANdisco Fusion had the following advantages:

. Shortened the product rollout and increased the pace at which projects move from lab to production;

. The Hadoop distribution could close sales engagements at twice the value of un-replicated Hadoop sales; and

. Led to customers scaling up their initial purchases by an average 220% within six months.

Big Replicate works directly with all of the Cloud object storage, which means that customers can migrate from Amazon/Microsoft/Google to IBM SWIFT. Some customers have already started to migrate between Cloud providers, and more would probably do so were it not for the downtime that traditional migration methods require.

On a more provincial level, on 15 June 2015, IBM announced a major investment in Spark. IBM plans to embed Spark into its own Analytics and Commerce platforms, offering Spark as a service on IBM Cloud. The company accordingly put more than 3,500 IBM researchers and developers to work on Spark-related projects worldwide. However, one of the difficulties in monetising Spark is the challenge of getting users to migrate their data sets in and out of IBM Cloud. With Big Replicate, IBM now has the tool to get customers migrating to its platform.

A significant win already On 19 December 2016, WANdisco announced a significant (US$1m) contract win with a major automotive multinational in Detroit. The use case is that the customer is moving data between data centres and the Cloud, and WANdisco is deemed to offer the only solution capable of delivering this with continuously changing large data sets as the customer moves to self-driving cars.

Figure 9: Worldwide spend on enterprise infrastructure (hardware, software & staffing), $bn

Source: Wikibon Public & True Private Cloud research, 2016

Source Code Management products Subversion Multisite Plus is WANdisco’s flagship product, accounting for the majority of its Source Code Management revenue. It uses Fusion technology to give globally distributed software engineering teams using Subversion local access to the same data at all times, regardless of where the data originate. Subversion Multisite makes it possible to achieve the same level of collaboration globally that is otherwise only possible between developers at a single location.

Subversion Multisite also enables continuous operation with no downtime or data loss in the event of network or server outages, be they planned or unplanned.

In an independent study conducted by Forrester, Subversion Multisite was shown to deliver an average 167% ROI within a nine-month payback period. In addition, the product was shown to increase the speed of software builds by 3-4 times, reduce downtime previously experienced due to maintenance from approximately two hours a day to zero, and thereby reduce overall development cycles by 40% or better.

Git Multisite plus. Similar to Subversion, this ensures no single point of failure for a Git master repository, benefits from no downtime and no disruption with instant synchronisation, plus LAN-speed access at all locations. This enables distributed software development teams to collaborate with no Git master downtime, no disruption and consistent security policy enforcement across all locations. Product advantages include automated recovery and LAN-speed read/write access to the same version of the Git master repository at every location. Furthermore, new users, teams, repositories and access rules can be added with no downtime and no disruption and there is selective replication of specific repositories between sites.


Gerrit Multisite plus. Gerrit is a web-based code review and repository management tool for the Git version control system. It enables teams of engineers with different user privileges to collaborate in building software. With WANdisco Fusion, Gerrit events can be replicated to and from any location with no single point of failure. Gerrit is a free, web-based team code collaboration tool that integrates with Git; it was developed at Google by Shawn Pearce from a set of patches for Rietveld (another software review tool), became a ‘fork’ and evolved into a separate project. ‘Gerrit’ is the given name of Gerrit Rietveld (1888-1964), the Dutch designer after whom Rietveld is named.

Access Control. This is an easy-to-use, point-and-click interface to implement and maintain security policies. While Subversion does provide security features, Subversion Access Control builds on those to offer an authorisation, authentication and audit security solution. Subversion Access Control can be implemented standalone or in combination with Subversion Multisite, allowing security policies to be immediately replicated to enforce consistency across all servers. This capability is especially relevant in the US, where audits under Sarbanes-Oxley focus on the potential financial losses that companies might experience if source code were to fall into the hands of a competitor. This is a significant issue in the current environment in which software development is frequently outsourced to regions with less robust intellectual property protection.

Open source binaries. WANdisco offers free downloads of certified Subversion and Git binaries that provide a complete, fully tested version of Subversion based on the most recent approved release. WANdisco offers these free downloads in order to increase its presence and profile in this market, and to act as a base upon which customers can purchase WANdisco’s support services and other products.

Support offering WANdisco offers enterprise-class support services for Subversion for organisations that require guaranteed response times, continuous access to web- and phone-based support, automated delivery of the latest fixes and upgrades, and other benefits that typically come with a commercial software vendor’s support contract. WANdisco employs Subversion developers who are significant contributors to the Open Source code base. In addition, Hyrum Wright, WANdisco’s director of Open Source, has also been the Subversion project release manager since 2008.

WANdisco’s Fusion in Subversion The key advantages of WANdisco’s active-active replication technology when applied to Subversion are, in industry jargon, the RAS factors (reliability, availability and scalability), as follows:

. Reliability: distributed servers remain continuously synchronised, so that users at every location have the same access for both reads and writes, as if they were all working in a single office. With other solutions using master/slave architectures, distributed developers often submit changes to outdated versions of the same software programmes, due to the lag time in receiving the latest data, creating significant additional work. Using WANdisco’s technology, software developers at different sites can make changes to the same source code files at the same time.

. Availability: each server is always an exact replica of every other, providing continuous operation with no downtime or data loss, even during routine maintenance. Recovery after server or network outages is automatic, without the risk of human error. EMC Symmetrix’s synchronous disaster recovery and business continuity solution can only provide this between servers connected over a Metropolitan LAN (generally distances of up to 120 miles).

. Scalability: WANdisco’s technology enables distributed software development to be scaled to include additional sites, servers and users, while maintaining consistent levels of performance and availability.


Sidebar: How does version control work? First, think of a version control system as being like a library (in the jargon: a repository) for storing multiple versions of a piece of software and keeping track of any changes made by software developers. When developing software, some developers will be tasked with developing the core element (the trunk) while others work on new features (the branches). Teams will find ‘bugs’ – bits of the software that do not work, and bits of code that do not work together. The central repository is important in keeping the ‘right’ code and the history of all the changes.

As such, version control is a central part of the software development process – ‘the blueprint’. Using WANdisco’s technology with Subversion, software developers at every site have local access to the same data at all times. Consequently, developers can make changes locally, yet see each other’s changes immediately, regardless of where the changes originated (think of the other site being on the other side of the world). Today, the software industry and large companies normally use a ‘follow the sun’ model of continuous development when designing new software. Given the economics of the industry – software programming is a fungible skill – the technical competence of (say) a developer in Belfast, Chennai or San Francisco is similar but their salaries differ, which helps illustrate why so much development work has moved offshore. This in turn moves ‘collaboration’ (working jointly on a project) from a local area (remember our earlier comment on developers having local access to the same data) to operating globally. It is the challenge of replicating that ‘local’ experience on a global (i.e. wide area) basis that has given WANdisco what we believe is a unique technology.
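For readers who prefer code to analogies, the ‘library of versions’ can be caricatured in a dozen lines. This toy `Repository` class is our own invention; real systems such as Subversion and Git add branching, merging and conflict detection on top:

```python
# A toy 'repository' to make the library analogy concrete: it stores
# every version of a file plus the history of who changed what.
# Invented for illustration; not how Subversion or Git store data.

class Repository:
    def __init__(self):
        self.versions = []     # one snapshot per commit
        self.history = []      # (revision, author, message)

    def commit(self, content, author, message):
        self.versions.append(content)
        self.history.append((len(self.versions), author, message))
        return len(self.versions)          # revision number

    def checkout(self, revision=None):
        """Latest version by default, or any earlier revision."""
        return self.versions[(revision or len(self.versions)) - 1]

repo = Repository()
repo.commit("v1: core trunk code", "alice", "initial trunk")
repo.commit("v1 + bugfix", "bob", "fix bug in trunk")

print(repo.checkout())      # → 'v1 + bugfix'
print(repo.checkout(1))     # → 'v1: core trunk code'
```

Every commit preserves both the snapshot and the audit trail of who changed what – the ‘right’ code and the history of all the changes that the sidebar describes.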

Why is it important? In our opinion, WANdisco has a disruptive technology and is set to tap into an increasing market opportunity as software developers strain under bigger data volumes and new software developments. This should (1) prompt users to replace old technology with new, while (2) allowing WANdisco to add new transaction capabilities in its incremental services – translating into continuing growth for WANdisco.


Our due diligence notebook

Analyst disclosure Between March 2014 and September 2016, WANdisco’s share price experienced a dramatic decline, falling 87%. WANdisco has given us an all-areas pass to the events that culminated in the failed coup of September 2016, to enable us to better understand what happened and the way forward. We believe the ‘99 problems’ are as follows:

A product misstep Having acquired AltoStor in 2012, WANdisco entered the Hadoop Big Data market the following year when it launched its Non-Stop NameNode product. Its Non-Stop HBase product debuted in 2014. There were a few technical hitches because DConE needed to insert source code into the admin engine of the Hortonworks and Cloudera Hadoop distributions. This was too invasive, and led to implementation issues and disappointing adoption rates. We think the product debuted prematurely – such speed to market is a classic Silicon Valley tactic.

. Remedial action taken. WANdisco spotted the issue and launched Fusion. Fusion is a better-architected product, the development team having learned from the Non-Stop experience. Jagane Sundar instilled a stricter discipline on technology insertion and forking, and there have been some exits from the technology team. The upshot is that Fusion did not just solve a customer problem: it connected beyond the Hadoop distributors into the wider storage market, including vendors such as Amazon, EMC, IBM, Oracle and Teradata.

Sales undershot expectations WANdisco consistently missed consensus forecasts. There was high churn in the direct sales team. The culture was aggressive, as were the compensation plans for an early-stage business. The spring was wound too tightly, in our view.

. Remedial action taken. A new Head of Sales has had a calming influence. He has a proper industry background, most recently at Tibco. Importantly, he had a working relationship with the WANdisco CEO prior to joining. In our view, the sales team is now better configured, and the revised compensation quotas are doable and a better ‘stretch’ for quota-carrying folks. There is an improved pipeline-management process, and (sales) staff attrition has been reduced. In addition, we note that Peter Scott, at WANdisco back in its pre-IPO days, now oversees the critical IBM relationship.

New culture Into a start-up culture, WANdisco hired a bevy of senior staff from Sage plc and other UK FTSE companies. The résumés were, of course, impressive, yet culturally this was the proverbial oil-and-water mix: an established company culture that emphasised M&A over technical expertise was at odds with a software start-up focused on organic growth. (Interestingly, it seems that Sage’s new CEO Stephen Kelly is himself tussling with that legacy.)

. Remedial action taken. Recent staff exits have removed much of the Sage influence; WANdisco is, in its DNA, an engineering-led organisation. While staff contentedness has suffered (see its Glassdoor rating, below), from our wanderings through the various offices, the culture seems that of a start-up rather than a spendthrift.


Figure 10: WANdisco, Glassdoor rating

(Chart: monthly Glassdoor rating, June 2014 to January 2017 – WANdisco vs all companies)

Source: Glassdoor

Whither Hadoop? WANdisco’s positioning in the Hadoop market led to a flurry of initial interest from customers, interested companies, potential corporate buyers and investors (we remember a 2013 trade show where the WANdisco stand was besieged). Yet the Hadoop market did not expand as quickly as expected – remember, it was once expected to deal a death blow to the relational database market. A Gartner analysis in 2014 suggested that growth in incremental clusters was stalling (see http://blogs.gartner.com/merv-adrian/2014/12/05/hadoop-deployments-slow-to-grow-so-far/). In preparing this report, we spoke to a number of companies active in the Hadoop market which expressed similar ‘disappointment’. This undermined confidence among those investors who had seen Hadoop as the Next Big Thing.

. Remedial action taken. DConE is a generic platform technology. While Non-Stop NameNode was a ‘Hadoop product’, WANdisco has made a point of showing that Fusion ‘connects’ to a wide range of Big Data databases. Yes, it is a Hadoop solution – but it is also a Hive or Spark offer. This widens TAM, and reinforces positive sentiment around the product. Note that Oracle is a channel partner for Fusion, and has publicly disclosed sales. Oracle’s entire business is built off its relational database; that Oracle is now supporting one of the New Age database platforms is a big endorsement of WANdisco, in our view.

. We are further encouraged after reading IDC end-user research into Hadoop usage patterns. The study surveyed 201 US Hadoop users in September 2016; the users averaged 40-50 Hadoop clusters with 500-900TB of storage. The challenges they cited with their Hadoop implementations included data management and retention, infrastructure costs, data relevancy, difficulty in selecting the right technology, and insufficient or lacking IT skills. This is a positive for WANdisco, given the high importance of integration and data migration. Users also achieved savings of US$10m to US$50m by implementing Hadoop, with many suggesting that savings above US$50m were possible within three years; news of this should encourage other users to ‘try’ the technology, and thereby expand TAM. IDC concluded that Hadoop has become a core data management platform for data analytics, with enterprises using Hadoop-powered analytics for (i) analysing varied data sources including operations, point-of-sale systems, web and social media, etc, and (ii) improving customer satisfaction, gaining competitive advantage and reducing time to bring products to market.


Figure 11: Top challenges with application/software components Technology, Data management, Insight and Integration

Source: International Data Corp, 2016

The burn rate At one stage, WANdisco was burning through US$2m/month. This was mostly a consequence of hiring too many staff. The FTE headcount was 200+ in 2015. New managers had a ‘grow fast/now we are in Silicon Valley’ mentality.

. Remedial action taken. History is written by the victors. Who knows what possessed WANdisco, which in pre-IPO mode prudently held onto its cash (after all, it had to make payroll each month), to splurge that cash after its IPO? For us, it was the influx of new names that poured gasoline on the flames of spending. Suffice to say, those folks are themselves no longer on the payroll, and the headcount has been halved to a sensible c100. This was a painful lesson for all involved.

One CFO out/another one in Paul Harrison, WANdisco’s CFO since 2013, resigned in June 2016.

. Remedial action taken. Erik Miller was appointed CFO in December 2016, having been appointed Interim SVP of Finance on 10 October 2016 – not even a week after he resigned following the short-lived boardroom coup on 6 October. Mr Miller has a ‘get on with it’ freshness to him and in our view his background is ideal for WANdisco. (See Management section)

Subversion – sank faster than expected Since WANdisco’s IPO in 2012, Git has usurped Subversion (SVN) with surprising speed in version control, and has sapped interest in other version control products. Today, Git is the most popular version control system, accounting for 70% of all search interest among the top five VCSs (see figure below). For its part, SVN retains a 13.5% share, and is still used by such very large organisations as Backblaze, FreeBSD, Mono and SourceForge. Concurrent Versions System (CVS) had the smallest share (0.8%) in 2016, even though many IDEs have built-in CVS support (NetBeans, PyCharm, etc.).

. Remedial action taken. WANdisco has a Git offer and has consolidated its position in the Subversion world, focusing on large multisite users. Undeniably, WANdisco’s almost ‘all or nothing’ push into Non-Stop NameNode resulted in it de-emphasising its ALM market position just as that market was on the verge of changing. Today, the ALM team is less of a ‘spare part’ at WANdisco, and more core to the offer.


Figure 12: Version control search trends

Source: Google Trends, GitHub, Atlassian, GitLab, Git Prime

CEO – the company champion CEO David Richards stepped back from meeting investors, and the company became less visible to the market.

• Remedial action taken. Mr Richards admits that he withdrew from investor engagement to concentrate on the business – remember, there was the product position in Big Data to rectify with the debut of Fusion – and other senior staffers, who have since left the company, were handling investor relations. When we mentioned this to the chairman of an LSE company, he expressed surprise and said that Mr Richards was one of the best salesmen he knew. The sinking share price might well have had a demoralising effect throughout the company, as it did for investors. The good news is that Mr Richards now seems re-energised.

Keep your friends close, and channel partners even closer

A spat with some of the Hadoop community vendors would not have gone down well among the company’s investors and users. We spoke to one user who admitted that the dispute had delayed their DConE implementation.

• Remedial action taken. We recall a similar row within the Apache community some years back. The important thing is that WANdisco has now developed a sales channel with symbiotic partners, rather than with partners where there was a degree of product conflict.

Further thoughts from our due diligence

We spoke with WANdisco’s customers, channel partners and other ecosystem partners, as well as mathematicians/computer whizzes (not kids) and staff, in the course of researching this report.

Customers were typically happy with the selling, on-boarding and first-implementation process. WANdisco’s habit of doing Proofs of Concept ahead of signing a main contract may seem riskier for an investor (after all, why not move straight to ‘Go’?), but customers see the value in them. Customers like the core product functionality, see the product as unique and think that WANdisco has a functional lead. The same sentiment was expressed by the channel partners, which had conducted their own technical due diligence on the product. Some expressed disappointment that the Hadoop market has not developed as quickly as expected. Users and sales partners showed little interest in the share price fall, or its recovery. The strong post-sale customer service ethos was flagged up several times.


WANdisco’s competitive edge

During our discussions with WANdisco customers and channel partners, one recurring theme we heard was that customers have globally distributed software engineering teams and face issues arising from network latency, inconsistent availability, and restrictions on scalability and security (full disclosure: one client had suffered from an earthquake). All commented on the uniqueness of WANdisco’s DConE.

Figure 13: Hype cycle – Big Data

Source: Gartner (2015), Stifel


Application Lifecycle Management: a review

WANdisco’s Source Code Management portfolio forms part of the Application Lifecycle Management (ALM) market. ALM covers managing the life of an application: the process begins with an idea, moves to creating the application, then to deploying it (when the application goes into production), and ends when the application reaches the end of its life and is removed from service. ALM software tools facilitate the architecture, coding, testing, tracking and release management, and can be divided into three distinct areas:

• Governance – encompasses all of the decision-making and project management for the application, and extends over its entire lifetime.

• Development – the process of creating the application, which happens first between the idea and deployment. For most applications, the development process reappears several more times in the application’s lifetime, both for upgrades and for wholly new versions.

• Operations – the work required to run and manage the application, beginning before deployment and then running continuously.

Advantages of using ALM tools include: (1) increased productivity, due to sharing best practices; (2) better quality, so that the software meets the needs and expectations of users; (3) broken-down boundaries, through collaboration and smooth information flow; (4) accelerated development, through simplified integration; (5) reduced maintenance time; and (6) increased flexibility, by reducing the time it takes to build and adapt applications that support new business initiatives.

We are interested in the ALM segment because it seems to be undergoing a revival – this might not be obvious from the latest financial results in the sector, but we are picking up signals via the user community, the trade press and industry events. There is new vigour and growth in this market: one IDC study suggests that the worldwide agile ALM software market was worth US$593.5m in 2015, +27.1% YoY. IDC also expects very strong growth for agile ALM software in the 2016-2020 period, with a 26.1% CAGR to US$1.89bn by 2020 (albeit from low initial numbers).

From a macro perspective, we believe software execution is under pressure from the increased complexity of multimodal deployment (i.e. apps on mobile, social, Cloud, and Big Data and analytics) and this is leading to greater complexity and more ALM automation. Concurrently, the increasing role and complexity of IT in the enterprise and the need to better align IT with business needs, corporate governance, and a plethora of regulatory requirements have combined to support ongoing growth for agile ALM.

SaaS offerings for agile ALM by major providers like IBM and Microsoft, augmenting existing offerings from CA Technologies, Agile Central (formerly Rally Software), VersionOne, Atlassian and other providers, should significantly contribute to market growth for agile software development practices. These more modern tools, which provide full ALM, have a common user interface, meta model and process engine.

ALM to migrate to DevOps

DevOps is seen as a new approach to the traditional ALM process. It is an enterprise software development phrase used to describe a type of agile relationship between Development and IT Operations. The goal of DevOps is to change and improve the relationship through better communication and collaboration between the two business units. The term “DevOps” was first coined in 2009 by Patrick Debois, who became one of its chief proponents.


Rather than seeing these as two distinct groups that are responsible for specific tasks but don’t really work together, the DevOps methodology recognises the interdependence of the two groups. By integrating these functions as one team or department, DevOps helps an organisation deploy software more frequently, while maintaining service stability and gaining the speed necessary for more innovation.

In a DevOps culture, software developers work with IT operations staff to ensure the organisation achieves optimal running of software with minimal problems. The guiding notion is that business units collaborate and break down the traditional silos. This culture emphasises a fast workflow, so that features move into production quickly and faults are detected and corrected as they occur, without disrupting other services.

The DevOps model found initial traction within native digital businesses and with applications running in public and private Clouds and at sites with massive traffic numbers – like Google, Amazon, Twitter, and Spotify, where DevOps helps ensure frequent deploys with a low failure rate. However, companies of all sizes are beginning to implement DevOps practices.

We believe the continued popularity of DevOps will increase the emphasis on software development tools, and so is relevant to WANdisco and its Software Configuration Management products.

Software Configuration Management

A software system comprises hundreds, thousands or tens of thousands of parts, each with an interface, that are plugged together to form the system. These software “parts” are referred to by different names – subsystems, modules or components – and must be identified with a version number. Version control software like Subversion, Git or Gerrit keeps track of all work and all changes in a set of files, and allows several developers (potentially widely separated in space and/or time) to collaborate.

Version control is important because as teams design, develop and deploy software, it is common for multiple versions of the same software to be deployed in different sites and for the software developers to be working simultaneously on updates. Bugs or features of the software are often only present in certain versions (because of the fixing of some problems and the introduction of others as the programme develops). Therefore, for the purposes of locating and fixing bugs, it is vitally important to be able to retrieve and run different versions of the software to determine in which version(s) the problem occurs. It may also be necessary to develop two versions of the software concurrently (e.g. where one version has bugs fixed, but no new features, while the other version is where new features are worked on). At the simplest level, developers could simply retain multiple copies of the different versions of the programme, and label them.

This simple approach is criticised as inefficient: many near-identical copies of the programme must be maintained, it requires a lot of self-discipline on the part of developers, and it often leads to mistakes. Furthermore, in software development, legal and business practice and other environments, it is increasingly common for a single document or snippet of code to be edited by a team, the members of which may be geographically dispersed and may pursue different and even contrary interests. Hence, version control that tracks and accounts for ownership of changes to documents and code is not only helpful but, for many users, critical.

The role for Open Source in ALM?

Recent Forrester research concludes that Open Source technology was a top priority for IT users (see above), so we expect to see Open Source seep into many technology areas. Indeed, Open Source software has found a strong foothold in recent years, particularly among companies seeking to drive costs down.

Market size – what is TAM like?

Gartner argues that the source code management product Subversion MultiSite forms part of the SCCM market, which Gartner valued at US$919.4m in 2015. Gartner valued the broader ALM market (the market which uberSVN is part of) at US$1.54bn in 2011, growing to US$1.76bn in 2015, and the Application Development market (including both SCCM and ALM) at US$8.64bn in 2011, growing to US$9.41bn in 2015.
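To make the version-control idea above concrete, here is a toy sketch (illustrative only; not WANdisco’s product, and real systems like Subversion or Git store deltas rather than full copies). It shows the “labelled copies” approach made systematic: each commit snapshots the file set, so any prior version can be retrieved exactly – for example, to locate the version in which a bug first appears.

```python
# Toy version store: each commit is an immutable snapshot, indexed by
# a version number, so any historical state can be reproduced on demand.

class TinyVCS:
    def __init__(self):
        self.history = []             # list of snapshots; index = version number

    def commit(self, files):
        """Store an immutable snapshot of the working files."""
        self.history.append(dict(files))
        return len(self.history) - 1  # version number of this commit

    def checkout(self, version):
        """Retrieve the exact file set as of a given version."""
        return dict(self.history[version])

vcs = TinyVCS()
v0 = vcs.commit({"app.py": "print('v1')"})
v1 = vcs.commit({"app.py": "print('v2')", "util.py": "def f(): pass"})

# Either version can be reproduced exactly, e.g. to test which one
# introduced a bug:
assert vcs.checkout(v0) == {"app.py": "print('v1')"}
assert "util.py" in vcs.checkout(v1)
```

The file names and contents here are invented for illustration; the point is only that retrieval-by-version is what distinguishes version control from keeping loose labelled copies.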


Data replication – what’s that about?

‘Replication’ is the process of sharing information to ensure consistency between software or hardware. An example of data replication would be the same data stored on multiple storage devices, or the same computing task executed many times. Data replication is used to improve the reliability, fault-tolerance or accessibility of computer systems. WANdisco helps customers make multiple clusters or locations appear seamless, while the underlying infrastructure shifts between data centres as well as towards the public Cloud. It is common to talk about ‘active’ and ‘passive’ replication in systems that replicate data or services:

• Active replication is performed by processing the same request at every replica.

• Passive replication has each single request processed on a single replica, with the resulting state then transferred to the other replicas.
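The distinction between the two styles can be sketched in a few lines of toy Python (our own illustration, not DConE code): in active replication every replica executes the request; in passive replication only the primary executes it and ships the resulting state to the backups.

```python
# Toy contrast of active vs passive replication using a trivial
# "account balance" state. All names here are illustrative.

def active_replicate(replicas, request):
    # Active: every replica processes the same request independently.
    for r in replicas:
        r["balance"] += request

def passive_replicate(primary, backups, request):
    # Passive: only the primary processes the request...
    primary["balance"] += request
    # ...then the new state (not the request) is copied to the backups.
    for b in backups:
        b["balance"] = primary["balance"]

a1, a2 = {"balance": 100}, {"balance": 100}
active_replicate([a1, a2], 25)

p, b = {"balance": 100}, {"balance": 100}
passive_replicate(p, [b], 25)

assert a1 == a2 == {"balance": 125}   # active: same request, same result
assert p == b == {"balance": 125}     # passive: state copied, not recomputed
```

Both end in the same state here; the trade-off in real systems is that active replication requires deterministic processing at every replica, while passive replication concentrates the work (and the failure risk) on the primary.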

WANdisco DConE architecture for replication

If at any time one master replica is designated to process all the requests, this is the primary-backup scheme (as in the master/slave model, discussed earlier) – the norm in areas like high-availability servers. High availability means servers that are more often available or do not fail; this is a specialist side of the IT market. In an alternative approach, any replica may process a request and then distribute a new state; this is termed a multi-primary (or multi-master) scheme. In this approach, some form of distributed concurrency control must be used, such as a distributed lock manager.

Load balancing, a more widely recognised term, is not task replication

Load balancing simply distributes a load of different (not the same) computations across machines, and allows a single computation to be dropped in case of failure. However, load balancing sometimes uses data replication (especially multi-master replication) internally to distribute its data among machines.

‘Back-up’ is something different altogether

‘Back-up’ is also different from replication: it saves a copy of data unchanged for a long period. By contrast, replicas are frequently updated and quickly lose any historical state.

So what is it?

‘Replication’ is regarded as one of the most important topics in the overall area of distributed systems. Whether one replicates data or computation, the objective is to have some group of processes that handle incoming events. If we replicate data, these processes are passive and operate only to maintain the stored data, reply to read requests and apply updates. When one replicates computation, however, the goal is to provide fault-tolerance – that means never-fail critical systems. For example, a replicated service might be used to control a telephone switch, so that if the main controller fails, the back-up can take over all of its functions. The underlying need is the same in both cases: by ensuring that the replicas see the same events in equivalent orders, they stay in consistent states, and hence any replica can respond to queries.
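The “same events in equivalent orders” point is the crux, and a toy example makes it concrete (our illustration, not any vendor’s code): replicas that apply an identical sequence of events deterministically end in identical states, while a different ordering can diverge – which is exactly why replication systems must first agree on event order.

```python
# Deterministic state machine: the event order fully decides the state.

def apply_events(events):
    state = {}
    for key, value in events:   # later writes to a key overwrite earlier ones
        state[key] = value
    return state

events = [("x", 1), ("y", 2), ("x", 3)]
replica_a = apply_events(events)
replica_b = apply_events(events)          # second replica, same event order
assert replica_a == replica_b == {"x": 3, "y": 2}

# The same events in a different order can produce a different state:
assert apply_events(list(reversed(events))) != replica_a
```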

In a master/slave environment, database replication is available on many database management systems, usually with a master/slave relationship between the original and the copies. The master logs the updates, which then ripple through to the slaves. Each slave outputs a message stating that it has received the update successfully, thus allowing the sending (and potentially re-sending until successfully applied) of subsequent updates.

Multi-master

Multi-master replication (i.e. updates can be submitted to any database node and then ripple through to the other servers) introduces increased costs and complexity, which may make it impractical in some situations. The most common challenge in multi-master replication is transactional conflict prevention or resolution – i.e. the requests ‘ping’ off each other. For example, if a record is changed on two nodes simultaneously, some replication systems would detect the conflict before confirming the commit and abort one of the transactions, while others would allow both transactions to commit and then try to resolve the conflict.
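A minimal sketch of conflict detection (a toy of our own devising, not a real replication engine): each node stamps its write with the version of the record it read. If two nodes both write from the same base version, the second commit is a conflict and must be aborted or merged.

```python
# Optimistic, version-stamped commits: a write succeeds only if the
# record has not changed since the writer last read it.

def try_commit(store, key, base_version, value):
    current_version = store.get(key, (0, None))[0]
    if base_version != current_version:   # someone else committed first
        return False                      # conflict detected: abort/resolve
    store[key] = (current_version + 1, value)
    return True

store = {}
assert try_commit(store, "record", 0, "node-A write") is True
# Node B read the same base version 0, so its write is a conflict:
assert try_commit(store, "record", 0, "node-B write") is False
assert store["record"] == (1, "node-A write")
```

This corresponds to the “detect before confirming the commit” family of systems described above; the alternative family commits both writes and reconciles afterwards.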


‘Big Data’, Big question

What is ‘Big Data’?

The concept of ‘Big Data’ revolves around the notion that data is growing so large it is becoming difficult to work with using standard database management and analytics tools. These difficulties include capturing, storing, searching, sharing and analysing the data – both structured and unstructured – and visualising it. In a 2001 research report, META Group analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out) and variety (range of data types and sources). This ‘3Vs’ model is still used to describe Big Data. Our only nuance would be to add a fourth ‘V’, ‘value’, which seems to us to be a core part of Big Data’s proposition. The use of ‘cost’ inputs, such as cheap data blade servers and Open Source tools for ingesting the data, has encouraged adoption, in our view.

While every technology has false starts (note the Gartner hype cycle), we expect interest in Big Data to grow because of the perceived benefits of working with progressively larger data sets that allow analysts to spot business trends, prevent diseases, combat crime, etc. Industry analysts still expect strong growth (see figure below). The concept has morphed out of the data warehousing community. In addition to general commercial challenges (sell George more life insurance by gathering every bit of information imaginable about him), there are Big Data ‘grand challenges’ in fields like meteorology, genomics, medical R&D, physics, environmental research and business informatics. Data sets also grow in size because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification readers and wireless sensor networks. Industry research concludes that the world’s technological per-capita capacity to store information has roughly doubled every 40 months (about every three years) since the 1980s, and that 2.5 quintillion bytes of data are created every day.

Figure 14: Big Data Software, Hardware and Professional services (US$bn)

Source: Wikibon


WANdisco in Big Data

Any mention of Big Data leads to a discussion of Hadoop. Hadoop has been the ‘database’ of choice for the large data world – Facebook is the most notable user. However, many Fortune 500 companies – including eBay, Bank of America and JP Morgan – have Hadoop deployments.

Hadoop is, in fact, an Apache project, and thereby an Open Source stablemate of WANdisco. Hadoop allows distributed processing of large data sets across clusters of computers, and is designed to scale up from single servers to thousands of machines, each offering local computation and storage. While we acknowledge that there are many ‘rows’ of data here, there is little underlying complexity in the core system (consider lists of George’s friends, favourite Bee Gees tracks, his Saturday Night Fever ‘likes’ and his bell-bottom snapchats – there is little overlap in the data; what matters is simply the amount of data that George is storing).

We spent time in the 1990s with IBM, which had a project called ‘Parallel Sysplex’. Under this project, IBM hammered together a series of its mainframes with the goal of making them act as one large computing resource – a tightly coupled cluster. At the time, no one really needed a mainframe of that enormity and the benefit of Parallel Sysplex was in better managing the mainframe systems.

With this in mind, we think the ‘obvious’ option for WANdisco would be for DConE to orchestrate the distributed agreements between each of the various data sources in any particular Big Data ‘database’. By co-ordinating all of the data, DConE should be able to tame what could well be an amorphous mass of varying data types. The data would only become ‘actionable’ after someone has created a tool to give complete visibility across the network – DConE could very well be that tool – as it was designed from the outset to manage distributed computing resources. In addition, Hadoop is not without its challenges, especially to early adopters of the technology. The big draw for the technology is its extreme scalability and use of cheap hardware. Hadoop employs a master node to keep track of data and to determine how to access them.

If the master goes down, everything could be at risk – this is something that WANdisco DConE could address – a significant opportunity.
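The single-point-of-failure argument can be illustrated with a toy majority-quorum rule (our own sketch of the general idea behind distributed agreement; this is emphatically not WANdisco’s DConE implementation): if an update is accepted once a majority of nodes acknowledge it, no single ‘master’ node is indispensable.

```python
# Toy majority-quorum decision rule: an update is accepted iff more
# than half of all cluster nodes acknowledged it.

def quorum_accepts(acks, total_nodes):
    """True iff a strict majority of the cluster acknowledged the update."""
    return acks > total_nodes // 2

# Five-node cluster: losing any two nodes (including a would-be master)
# still leaves a deciding majority of three...
assert quorum_accepts(acks=3, total_nodes=5) is True
# ...but a minority partition of two nodes cannot decide on its own,
# which is what prevents two halves of a split cluster diverging.
assert quorum_accepts(acks=2, total_nodes=5) is False
```

Real agreement protocols (Paxos and its relatives, on which the literature around DConE draws) add proposal numbering, leader election and recovery on top of this majority rule; the sketch shows only why removing the single master is possible at all.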

When the Hadoop ‘consensus’ broke down

Hadoop was once expected to take over the world. It didn’t.

Hadoop morphed in recent years. It started out as a batch-processing programme, whereby users collected data on their cheap data blades and it went into a data sump – more neatly described as a ‘data lake’. Analytics tools would then do a first run-through, and the data could then be transferred to the more usual line-of-business applications for the heavy-lifting analysis. However, the role of Hadoop changed with the advent of Hadoop V2 in 2014, which allowed transactional data to be ingested. This let users and developers think differently about Hadoop, but it also coincided with growth (measured in terms of Hadoop cluster sizes) beginning to stall. Functionally, Spark began to replace the batch-processing MapReduce as the execution engine in Hadoop. As Spark enabled continuous processing, new use cases appeared: with data processed continuously, users could think about ‘on the fly’ advanced analytics. This in turn gave rise to a new ecosystem of tools for building Big Data-rich applications as the Hadoop use case went from batch to on-line transactional. Rather than HBase, there was Cassandra; rather than YARN, there were Docker containers and Mesos or Kubernetes. In place of Flume, there was Kafka. And rather than HDFS, users could turn to S3 or just about any commodity object store.

All this innovation has left administrative and developmental complexity in its wake. Mainstream enterprises, unable or unwilling to hire the scarce, necessary and expensive skills, started increasingly looking to Amazon, Microsoft, Google, as well as other vendors building on their services or infrastructure, to deliver ‘as-a-service’ simplicity.


These Cloud vendors have their own views on analytics-as-a-service on Hadoop. Today, customers can choose to run Hadoop hosted in these vendors’ Clouds and swap in some specialised, proprietary functionality, such as elastic data ingest with Amazon’s Kinesis Firehose or a library of machine-learning APIs on Azure. Over time, these Cloud vendors may well provide much deeper integration among their proprietary services.

Test drive the new world

• Consider a financial services company implementing predictive customer analytics applications in order to offer the timeliest and most relevant financial information services to customers. The solution would need to offer 24/7 availability and automated disaster recovery – something no financial services firm can afford not to have. With WANdisco Fusion, these issues go away.

• Fusion’s architecture eliminates vendor lock-in, and supports staggered upgrades of Hadoop across locations without any interruption in operation. So Fusion can be implemented across different versions of the same platform, as well as mixed storage environments, including Oracle BDA, Cloudera, Hortonworks and MapR. This means users can migrate across IaaS vendors, and thereby avoid any vendor lock-in.

• Consider WANdisco’s million-dollar deal with a car manufacturer, which sees Fusion used to replicate data relating to self-driving technology and predictive maintenance to the Cloud. It is currently a US$1m royalty contract for WANdisco, but it comes at the start of a new industry (autonomous cars) and in time could build into a huge industry-wide contract.

Hybrid Cloud – the data migration and replication use case loud and clear

“Cloud will increasingly be the default option for software deployment”, state industry analysts Gartner in “Market Insight: Cloud Computing’s Drive to Digital Business Creates Opportunities for Providers” (June 2016). Gartner argues that Cloud-first and Cloud-only are replacing the defensive no-Cloud stance that dominated many large providers in recent years, and that most provider technology innovation is Cloud-centric, with the stated intent of retrofitting the technology to on-premise.

While not everything will be Cloud-based, the extreme of having nothing Cloud-based will largely disappear, and Gartner argues that hybrid will be the most common usage of the Cloud. But this will require public Cloud to be part of the overall strategy. Technology providers will increasingly assume that their customers will be able to consume Cloud capabilities.

Gartner predicts:

• By 2019, more than 30% of the 100 largest vendors’ new software investments will have shifted from Cloud-first to Cloud-only. Why? Because Cloud-first in software design and planning is gradually being augmented or replaced by Cloud-only, and this applies to private and hybrid Cloud scenarios alike.

• By 2020, more computing power will have been sold by IaaS and PaaS Cloud providers than sold and deployed into enterprise data centres.

“With the growth of both bimodal computing and Cloud provider offerings, software-defined enterprise data centers have become less centrally important than building a strong multi-provider management capability,” explained Thomas J. Bittman, VP and distinguished analyst at Gartner. “Unless very small, most enterprises will continue to have an on-premise (or hosted) data center capability. But with most compute power moving to IaaS providers, enterprises and vendors need to focus on managing and leveraging the hybrid combination of on-premise, off-premise, Cloud and non-Cloud architectures, with a focus on managing Cloud-delivered capacity efficiently and effectively.”


The hybrid Cloud market – assessing TAM

In an April 2016 research report by MarketsandMarkets, the hybrid Cloud market is estimated to jump from US$33.28bn in 2016 to US$91.74bn by 2021 – a 22.5% CAGR.
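As a quick sanity check of the MarketsandMarkets figures quoted above, the implied compound annual growth rate over the five years from 2016 to 2021 can be verified directly:

```python
# CAGR = (end_value / start_value) ** (1 / years) - 1
# US$33.28bn (2016) -> US$91.74bn (2021), a five-year span.

cagr = (91.74 / 33.28) ** (1 / 5) - 1
assert abs(cagr - 0.225) < 0.005   # ~22.5% CAGR, matching the report
```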

The growing requirement of organisations for agile, scalable and cost-effective Cloud computing solutions has increased the demand for hybrid Cloud solutions and services. Along with this, the rising need of standards for interoperability between Cloud services and existing systems, increasing demand to avoid vendor lock-in, and increasing digital services and their applications have contributed significantly towards the growth of the hybrid Cloud market. Organisations are experiencing a significant need for the adoption of hybrid Cloud solutions to keep themselves updated with the technological advancements in the Cloud market.

The players

The hybrid Cloud market ecosystem comprises hybrid Cloud solution and service vendors including HP, Microsoft, IBM, Cisco, Equinix, Oracle, VMware, Citrix, Rackspace and Amazon Web Services, Inc. A few other major vendors, such as TeraGo Networks Inc., Dell Inc., Panzura, VMTurbo Inc., Google Inc., RightScale and Verizon Terremark, offer comparatively narrower, yet locally effective, solutions and distribution networks in the hybrid Cloud market ecosystem.


Open Source: a review

WANdisco charges for its software. However, it uses Open Source components. Many investors are perplexed by the notion and economics of Open Source software. Here, we discuss its merits.

Open Source software is available in source code form – source code is the building block of the software. Having the source code allows anyone to develop the same software or understand its internal functioning, and to change, improve and distribute it. In the ‘normal’ world, the source code is copyright-protected. While Open Source is still viewed with some suspicion, industry stats suggest that 98% of enterprise-level companies use Open Source offerings in some capacity. Indeed, SAP Research has concluded that ‘software development is undergoing a major change from being a fully closed software development process towards a more community-driven Open Source software development process. Successful Open Source projects like Apache, PostgreSQL and many others are growing super-linearly.’

The development of the Open Source industry has had four pivotal moments:

• The first big push. On 31 March 1998, web browser software company Netscape posted c8 megabytes of compressed Communicator 5.0 code on the Mozilla.org website. By giving away its source code, Netscape was hoping to tap into a virtually unlimited talent pool to develop the next-generation browser, boost its browser market share and drive traffic to its Netcenter site, which it was trying to position as an onramp to the Internet. Giving away software was part of Netscape’s DNA – it started out giving its browser away free; at one time it was the most downloaded programme on the Internet and, according to IDC, was used by more than 75% of all surfers. Netscape gave software developers an open licence in exchange for an agreement to post their modifications of the code on Mozilla.org. For its part, Netscape would then add the best third-party enhancements to its own branded versions of the product. While the company became a footnote in the history of the IT industry, losing share to Microsoft and its bundled browser (Internet Explorer), this was one of the events that endorsed the notion of Open Source software.

• Linux. Mention ‘Open Source software’ to Europeans and the instant name recognition is likely to be Linux and its founder Linus Torvalds. Torvalds started out creating a new kernel based on Unix technology, which is now over 370 megabytes of source code. Torvalds announced that he had developed the Linux kernel in 1991, and by 1993 there were more than 100 developers working on it. In 1996, the penguin mascot came along. As Linux became more popular, there were a number of antagonistic interactions with Microsoft. In 2004, Microsoft published results from customer case studies evaluating the use of Windows vs. Linux, which concluded that Linux compared unfavourably to Windows in terms of reliability, security and total cost of ownership. Interestingly, this had the opposite effect and in fact led to Linux getting more industry support and being ported to all major platforms. Earlier, in 1999, IBM had announced an extensive project for the support of Linux, and Oracle, HP and Red Hat followed. In July 2009, Microsoft submitted 22,000 lines of source code to the Linux kernel. This has been referred to as ‘a historic move’ and as a possible bellwether of an improvement in Microsoft’s corporate attitude toward Linux and Open Source software. By 2011, Microsoft had become the 17th-largest contributor to the Linux kernel.

• The seminal IPO. For investors, the key to the Open Source world was the August 1999 Red Hat IPO. Red Hat was a Linux distributor. On IPO day, the share price tripled, valuing the company at US$3bn after it raised US$84m, on reported revenue of US$3.5m. Revenue fell back to US$2.7m in the subsequent quarter. It was the first test case for Linux in the broader public marketplace, and at the time it was seen as paving the way for future Open Source IPOs (names in the frame were VA Linux Systems, Caldera Systems, Linuxcare and Cygnus Solutions).


• The big public sector endorsement. In our discussions with various Open Source companies over the years, most concluded that government has been one of the slowest adopters of Open Source software, despite the industry arguing that “governments have an inherent responsibility and fiduciary duty to taxpayers” to implement Open Source. This has now changed. Of the ‘banner’ examples: (1) in 2009, the US White House switched its CMS from a proprietary system to the Drupal Open Source Content Management System; (2) accelerating the use of Open Source has been a goal of the UK administration since 2009, with government departments required to adopt Open Source software when ‘there is no significant overall cost difference between open and non-Open Source products’ because of its ‘inherent flexibility’; (3) in March 2012, the UK government launched a one-year migration project for all of its public institutions.

How is Open Source software developed?

Open Source software is developed in a public, collaborative manner – the industry talks about ‘community’ development. Software is built and maintained by a network of volunteer programmers. The traditional model of development works in a fairly centralised way, with defined roles for the developers who design (the ‘architects’), the people responsible for managing the project, and the people responsible for implementation. In the Open Source world, users are co-developers who are encouraged to submit additions to the software, code fixes, bug reports, documentation, etc. This model increases the rate at which the software evolves – there is even Linus’s Law, which states: “Given enough eyeballs, all bugs are shallow.” This means that if many users view the source code, they will eventually find all bugs and suggest how to fix them. Note that some users have advanced programming skills, and furthermore, each user’s machine provides an additional testing environment, offering the ability to find and fix new bugs. Open Source developers also use services provided for free on the Internet and Open Source tools such as the CVS and Subversion source control systems and the GNU Compiler Collection.

Examples of Open Source Prime examples of Open Source products are the Apache HTTP Server, the ecommerce platform osCommerce and the Mozilla Firefox Internet browser. One of the most successful Open Source products is Linux, an Open Source Unix-like operating system. Android, Google's operating system for mobile devices, is another.

What do naysayers gripe about? They have concerns about security, reliability and serviceability. Partly, the concerns are fuelled by a view of Open Source that relates back to its development model (the quip: a camel is a horse designed by committee), and also by the more general perceptions that Open Source licences are viral, that formal support and training are lacking, that the velocity of change is high, and that there is no long-term roadmap. That said, the evidence suggests differently. For example, an analysis of 5bn bytes of free/Open Source code by 31,999 developers shows that 74% of the code was written by the most active 10% of authors; the average number of authors involved in a project was 5.1, with the median at 2. There is the issue of a lack of technical and general support; however, Open Source companies often combat this by offering support, sometimes under a different product name.

Is Open Source software good/bad/indifferent relative to ‘closed’ software? Academic studies conclude that the value proposition for open, as compared to ‘normal’ or proprietary, software is based on its (1) security, (2) affordability, (3) transparency, (4) perpetuity, (5) interoperability, and (6) localisation. The top four reasons (as provided by Open Source Business Conference survey) why organisations choose Open Source software are (1) lower cost, (2) security, (3) no vendor ‘lock-in’, and (4) better quality. In our view, after talking to users, companies and industry analysts, the big pull is the ROI. A report by the Standish Group states that adoption of Open Source software models has resulted in savings of about US$60bn/year to consumers. Adoption has been strong. Acceptance of Open Source has been higher than many expected, and we believe the reasons for this include:

. The ROI.

Page 34 Wandisco 23 February 2017

. The Open Source support services offered by companies such as IBM, Oracle, HP and Dell.

. Hybrid Open Source/proprietary software business models are appearing.

. Open Source is being used to gain competitive advantage (and hence is receiving indirect promotion).

In some market sectors, adoption has surged. For example, the Internet has been dominated by Open Source and free software since its inception; most web servers run Open Source software. Apache has been the most installed web server since statistics began to be gathered.

How do Open Source software companies make money? Open Source software companies need a series of revenue-generating activities. (1) Software can be developed as a consulting project for one or more customers. The customers pay to direct the developers' efforts: to have bugs prioritised and fixed, or features added. Companies or independent consultants can also charge for training, installation, technical support or customisation of the software. While they provide the software freely, software companies can sell licences to proprietary add-ons such as data libraries. For example, an Open Source CAD program may require parts which are sold on a subscription or flat-fee basis. There is also the common 'dual-licence' approach, whereby there are free and paid-for elements: companies like Ingres, MySQL, Alfresco, SugarCRM, Puppet, Acquia and Zenos all have a free version (some GPL, some Apache, some LGPL). (2) Software companies will have enhancements to the free code, and will use their staff on a time-and-materials basis to work with 'enterprise customers' to create specific functionality for them.

Companies that offer Open Source software are able to establish an industry standard and, thus, gain competitive advantage; it also helps foster developer loyalty as developers feel empowered and have a sense of ownership of the end product. This should in turn lead to lower sales and marketing costs. (3) Another approach is best typified by SugarCRM. Promoting its software as Open Source, SugarCRM at the outset did not have an OSI-approved licence, so it sat in a ‘grey, in-between’ area. This has not stopped it from raising US$116m.

This gets into the 'freemium' topic 'Freemium' is a portmanteau of 'free' and 'premium': give some away for nothing, charge for the premium bits. It describes a business model by which a product or service is provided free of charge, but a premium is charged for advanced features, functionality, or related products and services. Freemium has been popular in the software industry and is particularly suited to software, as the manufacturing cost is negligible and, as long as cannibalisation is avoided, little should be lost by giving software away for free.

Freemium is closely related to tiered services. It has become a highly popular model – think about LinkedIn or FT.com. Ways in which the product or service may be restricted in the free version include:

. Feature-limited (e.g. a ‘lite’ version of software, such as Skype).

. Time-limited (e.g. only usable for 30 days, such as Microsoft Office).

. Capacity-limited (e.g. only a limited number of articles can be read, such as at the Harvard Business Review).

. Seat-limited (e.g. only usable on one computer rather than across a network).

. Customer class-limited (e.g. only usable by educational users).


Any idea of the market size? Vague There is a long-standing industry quip that Open Source vendors start out with a US$10bn market and make it a US$1bn one. The market is hard for industry analysts to size, as so much of the offer is free while they count revenue and spend. We note that research portal Statista expects Open Source software revenues to reach €57,326m in 2020, from €51,022m in 2016 (see figure).

Figure 15: Open Source software market 2008-2020 (€m)

2008: 7,924; 2009: 11,609; 2010: 17,197; 2011: 22,641; 2012: 28,716; 2013: 34,923; 2014: 40,822; 2015: 46,340; 2016: 51,022; 2017: 54,292; 2018: 55,930; 2019: 56,746; 2020: 57,326

Source: Statista

The collaborative element Noted industry commentator Richard Stallman claims that ‘Open Source is a development methodology’. However, it is also a culture of the people – i.e. the ‘community’. The community has a shared ethos which is coupled with a collaborative working model. The communities are aligned with one of the Open Source projects. Having a strong, growing community is important because it leads to increased popularity, increased use and a larger development pool to build more tools and more software. Some of the more prominent communities include:

. Apache Software Foundation, creators of the Apache web server; Subversion is part of this community;

. The Linux kernel community, a loose affiliation of developers headed by Linus Torvalds, creator of Linux;

. Eclipse Foundation;

. Mozilla Foundation, best known for the Firefox web browser;

. OW2, European-born community developing open-source middleware.

ALM and Open Source When making the case for Open Source, it is vital that providers demonstrate they have the same structure and ecosystems as users expect from the major proprietary software vendors. Consequently, Open Source offerings need to be packaged up with hosting, consultancy and the support network that many IT decision-makers consider to be a necessity for implementation.

Revision control systems such as Concurrent Versions System (CVS) and, later, Subversion (svn) and Git are examples of tools that help centrally manage the source code files of a software project and the changes to those files. Utilities that automate testing, compiling and bug reporting help preserve the stability of software projects that have numerous developers but no managers, quality controllers or technical support. Build systems that report compilation errors across different platforms include Tinderbox. Commonly used bugtrackers include Bugzilla and GNATS.

WANdisco gets involved In October 2009, WANdisco announced that it was hiring core Subversion contributors from CollabNet as the company moved to become a major corporate sponsor of the project. This included Hyrum Wright, president of the Subversion Corporation and release manager for the Subversion project since early 2008, who joined the company to lead its Open Source team. Subsequent developments sponsored by WANdisco included SubversionJ (a Java API) and implementation of the Obliterate command, similar to that provided by Perforce.

In January 2011, WANdisco announced that it had completed the acquisition of SVNForum.org – the world’s largest Subversion user community with over 20,000 active members. WANdisco has made the site more secure, added better spam protection and improved search engine optimisation.

The rise and rise of Git Git is a decentralised version control system. Designed by Linus Torvalds, Git is an Open Source, distributed version control system built to handle everything from small to very large projects with speed and efficiency, and it is positioned as a replacement for version control tools like Perforce or Subversion. It has gone through a steep growth curve and now has 19m developers working on c50m projects, up from 14m projects in 2015. There are c42m users.

Git does nearly all of its operations without needing a network connection. Every Git clone is a full-fledged repository with complete history and full revision-tracking capabilities, not dependent on network access or a central server. This is a peer-to-peer approach: clients can synchronise with one another, and changes they make in their repositories remain local unless synchronised with someone else. Depending on the requirements, Git can also be used with a centralised repository. With Git, clients can commit changes to their local repositories as new revisions while offline. In addition, each client has a complete copy of the data stored locally, so Git is extremely fast and no time is wasted waiting for network responses.
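Git's 'every clone is a full repository' property can be illustrated with a minimal sketch (a toy content-addressed commit store, not Git's actual implementation): each commit is addressed by the hash of its contents and points to its parent, so copying the object store replicates the entire history and enables fully offline commits.

```python
import hashlib
import copy

class Repo:
    """Toy content-addressed commit store, loosely in the spirit of Git."""
    def __init__(self):
        self.objects = {}   # hash -> commit record
        self.head = None    # hash of the latest commit

    def commit(self, message, tree):
        parent = self.head
        raw = f"{parent}|{message}|{tree}".encode()
        h = hashlib.sha1(raw).hexdigest()
        self.objects[h] = {"parent": parent, "message": message, "tree": tree}
        self.head = h
        return h

    def clone(self):
        # A clone copies the whole object store: full history, no server needed.
        other = Repo()
        other.objects = copy.deepcopy(self.objects)
        other.head = self.head
        return other

    def log(self):
        h, out = self.head, []
        while h is not None:
            out.append(self.objects[h]["message"])
            h = self.objects[h]["parent"]
        return out

origin = Repo()
origin.commit("initial import", "v1")
origin.commit("fix bug", "v2")
laptop = origin.clone()              # full history travels with the clone
laptop.commit("offline work", "v3")  # committed with no network at all
print(laptop.log())  # ['offline work', 'fix bug', 'initial import']
print(origin.log())  # ['fix bug', 'initial import'] - unchanged until synced
```

The key point for investors in this space: the clone diverges locally and only converges with the origin when the two are explicitly synchronised, which is exactly the 'no single copy of the truth' property discussed below.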

Git, BitKeeper and Bazaar are collectively referred to as 'Distributed Version Control Systems' (DVCS). These work offline, and everyone has a copy of the entire source code on their local machine. Hence, they suit software developers working in a loose-knit environment, where a prolonged disconnect can be spent working on implementations without disturbing the master repository. The problem is that there is no 'one copy of the truth' of the source code assets, which is problematic for software developers who undertake continuous builds. By contrast, Subversion (and CVS) takes a different approach, whereby the central repository is the single version of the truth of the source code.

Git’s growth, we think, was helped by a big push from GitHub. GitHub is a web-based Git repository hosting service, offering the version control and source code management functionality of Git as well as adding its own features. GitHub is the 50th most popular website in the world. In April 2016, GitHub reported having more than 14m users, 70k organisations and more than 35m repositories, making it the world’s largest source code host. It provides access control and several collaboration features such as bug tracking, feature requests, task management and wikis for every project. Business customers include Airbnb, Bloomberg, Hootsuite, IBM, NASA, SAP and many more (see https://github.com/business/customers).


The competitive world

Although it sells one technology, WANdisco operates in two distinct markets: Source Code Management (ALM) and Data Storage (Big Data). To our knowledge, no other vendor spans both of these markets. Source Code Management Companies active in the Source Code Management market are either established traditional companies or modern Open Source-oriented companies like WANdisco. The top-level view from industry analyst IDC is that this market segment is poised for significant transformation in the coming years, for a variety of reasons:

 Traditional tools are increasingly commoditised and consolidated as systems vendors bundle more core configuration management and control into operating environments, systems software and converged hardware solutions.

 Users are buying unified mobile and desktop endpoint management, software distribution and IT asset management, with a SaaS-based delivery model for change and configuration management solutions.

 There is demand for subscription pricing and Cloud SaaS delivery options that allow end-user self-service and unified control across on-premise, mobile, virtual client and Cloud-based resources.

IDC believes that the worldwide change and configuration management software market totalled US$5.9bn in 2015 (+7.1% YoY), with the leading vendors being Microsoft, VMware and Dell. While demand is healthy for mobile, virtualisation and Cloud management, there is pressure to consolidate as participants begin selling bundled solutions and point vendors struggle to gain momentum. IDC analyst Mary Johnston Turner, Research VP, Enterprise System Management Software, argues that, “Open Source communities are helping fuel the development of a new generation of agile, scalable, and highly innovative monitoring, analysis, automation, orchestration, and configuration management software technologies…The rate and pace of innovation taking place in the Open Source arena should definitely be factored into any future enterprise systems, Cloud, or DevOps management software purchase or strategy decision.”

Figure 16: Worldwide change and configuration management software market (US$m)

Vendor        2013      2014      2015      2015 share (%)   2014-2015 growth (%)
Microsoft     1,139.8   1,254.6   1,450.2   24.7             15.6
VMware        448.6     546.5     742.7     12.6             35.9
Dell          405.5     438.6     428.5     7.3              -2.3
HPE           437.9     414.2     426.4     7.3              2.9
IBM           370.8     350.0     332.2     5.7              -5.1
SAP           284.0     297.4     271.4     4.6              -8.8
BMC           188.1     183.7     190.4     3.2              3.6
Citrix        160.0     164.9     175.4     3.0              6.4
NEC           172.6     166.6     148.9     2.5              -10.6
Symantec      190.9     182.4     146.5     2.5              -19.7
—             163.8     163.5     143.7     2.4              -12.1
Micro Focus   121.6     115.7     129.2     2.2              11.7
Others        1,064.8   1,212.5   1,294.3   22.0             6.7
Total         5,148.4   5,490.9   5,879.6   100.0            7.1

Source: International Data Corp.


Git continues to rise While Subversion was long the most popular Open Source version control tool, since 2012 Git has usurped Subversion (SVN) in the version control market and has sapped interest in other version control products, making Git the most popular version control system today. Git accounts for 70% of all search interest among the top five VCSs (see figure below). For its part, SVN has a 13.5% share and is still used by some of the largest companies, including Backblaze, FreeBSD, Mono and SourceForge. Concurrent Versions System (CVS) had the smallest share (0.8%) in 2016, even though many IDEs have built-in CVS support: Eclipse, Emacs, NetBeans, PyCharm, etc. Technology leaders tell us that Git will ultimately beat SVN, partly due to the ease of collaboration brought by GitHub.

To gain an impression of the market share statistics, we looked at Re:Code, which illustrates usage of Open Source version control systems. The results support the view about the increasing importance of Git (see table).

Figure 17: Version Control systems used by developers, Eclipse Community

Source: Re:Code

Key vendors Atlassian Founded in 2002 in Australia and most noted for its collaboration software and the Confluence Wiki. Today, more than 1,700 staff in six locations service 68,000+ customers (teams of software developers, IT managers and knowledge workers) for collaboration, communication, service management and development products. Most of the product portfolio is in the collaboration arena, and includes JIRA for team planning and project management, Confluence for team content creation and sharing, HipChat for team messaging and communications, Bitbucket for team code sharing and management, and JIRA Service Desk for team services and support applications. It was floated in the US in December 2015. Regarding the WANdisco overlap, it does compete on some functional areas, but Atlassian is mostly in the collaboration arena.

CFEngine Privately held CFEngine is an Oslo- and California-based software and services company that supports the Open Source software CFEngine for distributed configuration management. Licensed under the GPL, CFEngine automates large-scale IT computing infrastructure: ensuring the availability, security and compliance of mission-critical applications and services. CFEngine configuration management products are highly scalable through decentralised, autonomous agents that can continuously monitor, self-repair, and update a global multi-site enterprise every five minutes – with negligible impact on system resources or performance. Users include Facebook, AT&T, Cisco, eBay, LinkedIn, AMD and the US Navy. They use it to manage servers, desktops and other heterogeneous computing devices.


CollabNet Privately held CollabNet is a leading provider of Enterprise Cloud Development and Agile ALM products and services for software-driven organisations. Best known as the founder of Subversion, CollabNet has focused on Cloud-based software development tools and practices for the past 15 years – first with Subversion, then TeamForge and then CloudForge. With more than 10,000 global customers, the company provides a suite of platforms and services to address Agile, DevOps and hybrid Cloud development. Its CloudForge development-Platform-as-a-Service (dPaaS) enables Cloud development through a flexible platform that is team friendly, enterprise ready and integrated to support leading third-party tools. The CollabNet TeamForge ALM, ScrumWorks Pro project management and Subversion Edge source code management platforms can be deployed separately or together, in the Cloud or on-premise.

CollabNet complements its technical offerings with consulting and training services, and operates hosted, onsite or in the private Cloud. In addition, it integrates with any tools and desktops via open APIs. CollabNet claims many customers improve productivity by as much as 70% while reducing costs by 80%, and bills TeamForge as the leading enterprise ALM platform for any technology stack and development process. As an indication of its relative size, CollabNet supports more than 10,000 customers with 6m users in 100 countries. On 2 February 2017, CollabNet revealed that in 2016 it had experienced accelerating growth in revenue and profitability for the third consecutive year, headlining with 200% YoY earnings growth; renewal rates also set an all-time record, exceeding 92%. CollabNet has been linked to a possible IPO many times. Since February 2014, it has been owned by Vector Capital, although terms of the acquisition were not disclosed.

Concurrent Versions System – for clarity CVS remains a version control system, not a company. CVS was created in the UNIX operating system environment and is available in both Free Software Foundation and commercial versions. It is a popular tool for programmers working on Linux and other UNIX-based systems, and it supports distributed, multisite and offline operations. The initial release was in November 1990, and there have been no new releases since 2008.

IBM ClearCase MultiSite IBM’s offer in this segment is ClearCase. Another master/slave approach, the ClearCase MultiSite repository can become a single point of failure, and can thereby pose an issue for remote sites separated by large timezone differences. We also understand that pricing is high, both for the software (we understand ASPs range from US$25k to US$300k) and for the associated T&M labour (the same again), and according to users it also needs more administrators. In fact, we understand it is not uncommon for organisations to appoint a full-time administrator to manage the ClearCase server and software at each development site. In addition, synchronisation is not automatic with each write operation, and must be done on a scheduled basis with an administrator monitoring.

Micro Focus – a thought Micro Focus is an ALM competitor, and its latest deal, merging with HPE’s software business, gives it another foothold in the ALM market; this comes in the wake of its Serena Software merger, which similarly strengthened its ALM position. Micro Focus is a consolidator in the Infrastructure software market. The Micro Focus product is StarTeam (from the Borland acquisition), an SCCM platform with a master/slave methodology whereby StarTeam manages both assets and activities in a single repository. Micro Focus is supportive of Open Source software – after all, it owns SUSE, the ‘other’ Linux distributor after Red Hat – and its StarTeam product has long had Open Source support.

Microsoft For Microsoft, Visual Studio Team System is the product family that brings together tools for the software development process. Visual Studio Team System connects with Microsoft’s Project Server to help project managers get up-to-date information on what developers are doing.


On a distributed computing basis, Microsoft has Visual SourceSafe (VSS), an SCCM solution. VSS by itself is not a client/server solution like CVS; instead, it is a client file-system solution, and within a LAN, file sharing is used to access the source code repository. However, this is not feasible for remote sites accessing the repository over a WAN, where Microsoft recommends purchasing SourceGear’s SourceOffSite to address the problem and effectively turn VSS into a client/server solution that can support remote access over a WAN. This effectively brings VSS to the same level as CVS, at an additional cost. Investors will appreciate that the master VSS database acts as a single point of failure.

Perforce Perforce Software is a privately held company with headquarters in Alameda, California and international offices in the UK, Australia and Canada. Perforce version management technology is billed as increasing productivity, improving software quality and helping reduce the complexity of global environments. Today, more than 400,000 users across 5,500 businesses use Perforce for enterprise version management. From the product perspective, Perforce has a central server architecture – which, again, can result in the master Perforce server becoming a single point of failure.

Christopher Seiwald founded the company in 1995 and sold it to investment group Summit Partners in February 2016. Here, too, we note some consolidation, as Perforce acquired Seapine Software in November 2016. Note that Seapine has developed a suite of software development lifecycle tools, which includes testing tools, configuration management, test-case management and requirements management.

Pushmi Pushmi is an Open Source enterprise standard for version control. It works by creating transparent slave replicas that are writable by normal Subversion clients, so it can be added to the existing infrastructure. Consequently, Pushmi is a master/slave solution where all write transactions are written to the central master server, and are then copied to the remote slave repositories.


Data storage The key players in the space (Google, IBM, Oracle) all indicate that (1) WANdisco IP is extremely broad, (2) there is a general acknowledgement that there isn’t anyone doing anything similar, and in their view (3) WANdisco has a multi-year head-start over anyone. There is no direct competitor. However, looking into the market we see a number of competitive areas:

. We can see operational overlaps with ETL (Extract, Transform, Load) vendors in the data migration business. The long list of established companies focused on this area includes Informatica, Information Builders, SAS and Syncsort. Here, the competitive threat can be somewhat overplayed.

. There is also some overlap with the database replication vendors. This is more often a master/slave architecture between the original and the copies. The master logs the updates, which then ripple through to the slaves; each slave outputs a message stating that it has received the update successfully, thus allowing the sending (and potentially re-sending until successfully applied) of subsequent updates. In the case of multi-master replication, updates can be submitted to any database node and then ripple through to the other servers – but here the challenge is transactional conflict prevention. Database replication also becomes difficult as it scales up.
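The master/slave log-shipping pattern described above can be sketched in a few lines (illustrative only, not any particular vendor's implementation): the master logs each update in order, ships it to every slave, and treats a slave's acknowledgement as the signal that the next update may be sent.

```python
# Toy sketch of master/slave replication with acknowledgements
# (illustrative only - real systems add retries, batching and failover).

class Slave:
    def __init__(self):
        self.data = {}
        self.applied = 0          # sequence number of the last update applied

    def apply(self, seq, key, value):
        self.data[key] = value
        self.applied = seq
        return seq                # the ack: "I have received update `seq`"

class Master:
    def __init__(self, slaves):
        self.data = {}
        self.log = []             # the master logs every update, in order
        self.slaves = slaves

    def write(self, key, value):
        self.data[key] = value
        seq = len(self.log) + 1
        self.log.append((seq, key, value))
        # Updates ripple through to the slaves; each must acknowledge
        # before the next update is considered safely replicated.
        for slave in self.slaves:
            ack = slave.apply(seq, key, value)
            assert ack == seq     # a real master re-sends until acknowledged

slaves = [Slave(), Slave()]
master = Master(slaves)
master.write("balance", 100)
master.write("balance", 250)
print(slaves[0].data)  # {'balance': 250} - both replicas match the master
```

Multi-master replication removes the single write path but, as the text notes, then has to prevent transactional conflicts when two masters update the same key concurrently – which this one-directional sketch never encounters.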

. In addition, there are more niche Big Data replication vendors like Attunity, Denodo, Talend and many others. Looking at Attunity, for example, we note that its Replicate software helps load and ingest data across all major databases, data warehouses and Hadoop, on-premise and in the Cloud. Attunity’s customer base for this product set is estimated to be c2,000 organisations globally. Through capabilities for replicating data to and from the Cloud, accelerating data warehouse deployments, and supporting Big Data (Hadoop) and workload optimisation, Attunity enables a broad scope of integration styles; it also supports Apache Kafka by enabling real-time data movement.

. The most potent are the IaaS vendors. Here, the first solution is to throw hardware at the problem. Note, for example, that Amazon Web Services has disaster recovery. Looking at the offer, we note that AWS has an Import/Export feature for users moving large amounts of data into and out of AWS using portable storage devices – for data sets of significant size, AWS Import/Export is often faster than Internet transfer. For data migration as customers move to the Cloud, AWS offers the Amazon Relational Database Service, alongside Amazon DynamoDB (a NoSQL database) and Amazon Redshift (a fully managed, petabyte-scale data warehouse service).

. Other Cloud infrastructure vendors, such as the distros Cloudera and Hortonworks, also compete. We have seen distros and infrastructure Hadoop companies launch their own, mostly Disaster Recovery-based, solutions. For example, Cloudera Manager Backup and Disaster Recovery (BDR) is its Hadoop DR offer. These are typically active:passive and likely reflect early Hadoop usage and the use of non-mission-critical applications. On user forums, we find lots of interest in using the ‘snapshot’ feature in HDFS. Snapshots take a point-in-time image, which could be of an entire file system, a sub-tree of a file system or just a file. Clearly this will not capture incremental data changes, and snapshotting can be bandwidth-intensive.

. Other technology solutions – here, we have Raft and, to some extent, Blockchain, yet neither has a major commercial endorsement. Docker is also lurking in the background. While not directly competitive, it allows users to develop distributed computing systems by using software ‘containers’. Docker can be integrated into various infrastructure tools, including Amazon Web Services, Ansible, CFEngine, Chef, IBM Bluemix, HPE Helion, OpenStack, Oracle Container Cloud Service and others. According to industry analyst 451 Research, ‘Docker is a tool that can package an application and its dependencies in a virtual container that can run on any Linux server. This helps enable flexibility and portability on where the application can run, whether on premises, public Cloud, private Cloud, bare metal, etc.’ Kafka also deserves a mention. Kafka, an Apache Open Source project, is used for building real-time data pipelines and streaming applications. Kafka lets users publish and subscribe to streams of records, so it is a messaging system; it also lets users store streams of records in a fault-tolerant way. We can see advantages like scalability and fault-tolerance, so there is some overlap with WANdisco. Kafka is widely used and may become more popular as users look to ingest data into a Hadoop cluster and then cleanse it. (Consider a use case: you might trawl the web for all references to daily commuters into the City of London – use Kafka to gather that raw data, then run analytics to find the only Irish guy with five kids.)
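Kafka's publish/subscribe log model can be sketched with an in-memory toy (not the real Kafka client or broker): producers append records to an ordered log, and each consumer tracks its own offset, so independent consumers can read the same stream from different points.

```python
class TopicLog:
    """Toy append-only log in the spirit of Kafka's topic/offset model."""
    def __init__(self):
        self.records = []

    def publish(self, record):
        self.records.append(record)
        return len(self.records) - 1   # offset of the new record

    def consume(self, offset):
        # Return all records from `offset` onward, plus the next offset
        # this consumer should poll from. The log itself is never mutated
        # by reads - that is what lets consumers be independent.
        batch = self.records[offset:]
        return batch, offset + len(batch)

topic = TopicLog()
for event in ["login", "page_view", "logout"]:
    topic.publish(event)

# Two independent consumers, each with its own offset.
batch_a, next_a = topic.consume(0)   # reads everything from the start
batch_b, next_b = topic.consume(2)   # joins later, reads only "logout"
print(batch_a)  # ['login', 'page_view', 'logout']
print(batch_b)  # ['logout']
```

The fault-tolerant storage the text mentions comes, in real Kafka, from partitioning this log across brokers and replicating each partition – the offset semantics, however, are exactly as above.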

. DIY. (1) Airbnb posted a blog detailing its migration of a Hive warehouse. The warehouse had grown from 350TB in mid-2013 to 11PB by the end of 2015. The sheer size of the warehouse gave rise to issues of reliability, so Airbnb opted to migrate to a new architecture. Airbnb decided that existing migration tools either had issues with a large data warehouse or carried a significant operational overhead, so it developed ‘ReAir’ to save time and effort when replicating data at this scale. Initially, all the data was in a single HDFS/Hive warehouse, but mixing production with ad hoc workloads led to reliability issues. Airbnb split the warehouse in two: production jobs and ad hoc queries. It then had to migrate the large warehouse and, after the split, keep datasets in sync; ReAir does this, and Airbnb has since open-sourced the tool for the community. ReAir is useful for replicating data warehouses based on the Hive metastore, and Airbnb has made the tools scalable to clusters that are petabytes in size. ReAir can work across a variety of different Hive and Hadoop versions and can operate largely standalone.

. ReAir includes two replication tools: batch and incremental.

. A batch replication tool to copy a specific list of tables at once, which is ideal for a cluster migration.

. An incremental replication tool to track changes that occur in the warehouse and replicate the objects as they are generated or modified. This keeps datasets in sync between clusters as it starts copying changes within seconds of object creation.

. Link to GitHub: https://github.com/airbnb/reair

. (2) Netflix has a loosely coupled microservice-based architecture that emphasises separation of concerns. It has developed EVCache, its data-caching service that provides a low-latency, high-reliability caching solution; it routinely handles upwards of 30m requests/sec, storing hundreds of billions of objects across tens of thousands of memcached instances, translating to c2 trillion requests a day globally. When Netflix launched in 130 additional countries, it built EVCache’s global replication system. EVCache is Open Source and has been in production for more than five years. Netflix’s Cloud-based service is spread across three Amazon Web Services (AWS) regions. While requests are mostly served from the region the member is closest to, this can change due to (say) problems with infrastructure or a regional/geographic failover; because of this, Netflix has a stateless application server architecture, whereby a server can take a request from any region. Consequently, the data must be replicated to all regions so it is available to serve member requests no matter where they originate. Netflix designed EVCache for itself, so it admits that one non-requirement was strong global consistency. Remember that Netflix doesn’t mind if (say) the Ireland and Virginia servers have slightly different recommendations for the same person, as long as the difference does not impact the browsing or streaming experience. For non-critical data, Netflix has an “eventual consistency” model for replication, whereby local or global differences are tolerated for a short time. This simplifies the EVCache replication design, and means that Netflix did not have to worry about global locking, quorum reads and writes, transactional updates, partial-commit rollbacks or other complications of distributed consistency.
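The eventual-consistency trade-off described above can be sketched with a small toy (illustrative only, not Netflix's EVCache implementation): writes land in the local region immediately and are replicated to other regions asynchronously, so replicas briefly diverge and then converge once the replication queue drains.

```python
from collections import deque

class Region:
    """A regional cache replica in a toy multi-region setup."""
    def __init__(self, name):
        self.name = name
        self.store = {}

class Replicator:
    """Asynchronous cross-region replication: last-writer-wins, no locking."""
    def __init__(self, regions):
        self.regions = regions
        self.queue = deque()   # pending cross-region replication events

    def write(self, region, key, value):
        region.store[key] = value                  # local write is immediate
        for other in self.regions:
            if other is not region:
                self.queue.append((other, key, value))

    def drain(self):
        # Apply all pending replication events; until this runs, regions
        # may return different answers for the same key - and that is
        # acceptable for non-critical data like recommendations.
        while self.queue:
            region, key, value = self.queue.popleft()
            region.store[key] = value

us, eu = Region("us-east"), Region("eu-west")
rep = Replicator([us, eu])
rep.write(us, "recs:alice", ["show-1"])
print(eu.store.get("recs:alice"))  # None - replicas briefly differ
rep.drain()
print(eu.store["recs:alice"])      # ['show-1'] - converged
```

Note what is absent: no global lock, no quorum, no transaction – which is precisely the simplification the eventual-consistency model buys, at the cost of the temporary divergence shown before `drain()` runs.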

Page 43 Wandisco 23 February 2017

Board of Directors

David Richards, Interim Chairman, President, CEO and Co-founder Since co-founding the company in Silicon Valley in 2005, David Richards has led WANdisco as CEO. With more than 20 years of executive experience in the software industry, Mr Richards sits on a number of advisory and executive boards of Silicon Valley start-up ventures. A passionate advocate of entrepreneurship, he has established, and successfully exited, several Silicon Valley technology companies. Mr Richards was the founder and CEO of Librados, an application integration software provider, and led the company’s acquisition by NASDAQ-listed NetManage, Inc. in 2005. He holds a BSc in Computer Science from the University of Huddersfield. He was appointed Interim Chairman on 6 October 2016, in the wake of the failed boardroom coup.

Erik Miller, CFO and Board Director Mr Miller was re-appointed CFO on 5 December 2016, having held the role of Interim SVP of Finance at WANdisco since 10 October 2016. His initial CFO and Board appointment was in September 2016, but he resigned in the wake of the boardroom coup on 6 October. Previously, Mr Miller was CFO of Nasdaq-listed Envivio, Inc., a provider of video transcoding software, from February 2010 to January 2016, through its acquisition by Ericsson AB. From January 2008 to July 2009, he served as CFO at SigNav Pty. Ltd., a component supplier to the wireless industry, and from March 2006 to January 2008 as CFO at Tangler Pty. Ltd., a social networking company; in both roles he was responsible for finance and administration functions. Mr Miller received a B.S. degree in Business Administration from the University of California, Berkeley.

Grant Dollens, Non-Executive Director Grant Dollens founded Global Frontier Investments, LLC, a long term-oriented global equities fund, in 2010, and serves as its portfolio manager. Previously, he was an investment analyst and member of the investment committee for Ayer Capital, a long/short equity healthcare fund, where he was focused on medical devices, diagnostics, healthcare services, biotechnology and pharmaceutical investments. Prior to Ayer, Grant Dollens was an associate in the healthcare group at BA Venture Partners (now Scale Ventures) where he sourced, evaluated and invested in private medical device, biotechnology, specialty pharmaceutical and healthcare service companies. Before BA Venture Partners, Mr Dollens was an investment banking analyst in corporate finance at Deutsche Bank Alex. Brown, focused on the technology sector.

Grant Dollens received his MBA from the Kellogg School of Management at Northwestern University, with majors in Analytical Finance, Management & Strategy, and Accounting. He received his B.S. in Biomedical Engineering from Duke University. He is a member of the Board of Visitors at the Pratt School of Engineering at Duke University.

Karl Monaghan, Non-Executive Director Karl Monaghan is currently Managing Partner at Ashling Capital LLP, which he founded in December 2002, to provide consultancy services to both quoted and private companies.

Prior to founding Ashling Capital, Mr Monaghan worked in Corporate Finance for Robert W Baird, Credit Lyonnais Securities, Bank of Ireland, Johnson Fry and BDO Stoy Hayward. Additionally, he trained as a Chartered Accountant with KPMG in Dublin and holds a Bachelor of Commerce from University College Dublin. Mr Monaghan brings a wealth of capital markets and board experience, and is currently a Non-executive Director of AIM companies CareTech Holdings plc and Sabien Technology Group plc. Mr Monaghan was appointed to the WANdisco board on 5 December 2016.


Dr. Yeturu Aahlad, Chief Scientist, Inventor & Co-Founder Dr. Aahlad is a recognised worldwide authority on distributed computing, a field in which he currently holds 28 patents. Prior to WANdisco, Dr. Aahlad served as the distributed systems architect for the iPlanet (Sun/Netscape Alliance) Application Server. At Netscape, Dr. Aahlad joined the elite team in charge of creating a new server platform based on the CORBA distributed object framework. Prior to Sun/Netscape, Dr. Aahlad worked on incorporating the CORBA security service into Fujitsu’s Object Request Broker. He designed and implemented the CORBA event services while working on Sun’s first CORBA initiative. Earlier in his career, Dr. Aahlad worked on a distributed programming language at IBM’s Palo Alto Scientific Center.

Dr. Aahlad holds a PhD in distributed computing from the University of Texas at Austin and a BS in EE from IIT Madras.


Analysis of forecasts

We modelled WANdisco using a range of inputs based on the demand environment, visible bookings traction and the fade rate and average selling prices. Our cost model assesses the gross margin by line of business and the operating cost model includes estimates for staff numbers and staff costs.

The bookings drivers WANdisco sells subscription licences. These are billed and then recognised as revenue on a pro rata basis over the contract term. Generally, a ‘booking’ has three components: the recognised revenue element; the deferred revenue element (billed and sitting on the balance sheet until revenue recognition conditions have been satisfied); and the ‘unbilled’ element, i.e. amounts under contract that WANdisco has yet to bill the customer – for (say) year two or three of a three-year contract term. We have generated bookings by looking at these components across the ALM and Fusion/Big Data divisions (see tables for our core assumptions). We have backcasted our assumptions to the prior years, 2014A and 2015A, to test their broad applicability.
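To make the mechanics concrete, the three-way split described above can be sketched in a few lines of Python. This is purely illustrative: the helper function and the US$300k example deal are our own constructs, not WANdisco contract data.

```python
# Illustrative sketch: split a multi-year subscription booking into the
# recognised, deferred and unbilled components described above.
# The deal terms below are hypothetical, not WANdisco contract data.

def split_booking(total_value, years, billed_years, months_recognised):
    """Decompose a subscription booking of total_value over a term of
    `years`, where `billed_years` annual instalments have been invoiced
    and `months_recognised` months of service have been delivered."""
    annual_bill = total_value / years
    billed = annual_bill * billed_years
    recognised = total_value * months_recognised / (years * 12)
    deferred = billed - recognised       # invoiced but not yet earned
    unbilled = total_value - billed      # contracted but not yet invoiced
    return recognised, deferred, unbilled

# A hypothetical US$300k, three-year deal: year one billed, three months delivered.
rec, defd, unb = split_booking(300_000, years=3, billed_years=1, months_recognised=3)
print(rec, defd, unb)  # 25000.0 75000.0 200000.0
```

Note that the three components always sum back to the headline booking, which is why bookings rather than revenue are the cleanest read on sales momentum.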

ALM assumptions
. Short-term bookings growth is predicated on larger contract signings in H2 and some increase in the customer count, given a new focus on the division in H2/2016. We have used an 85% renewal rate.

. We have maintained pricing on new bookings despite the recent evidence that WANdisco is winning larger deals. On renewal bookings, we have factored in some ‘upsell’ success with the ASP increasing ahead of the general inflation rate. However, for add-on bookings, we have elected to keep the ASP deflating.

. We see a residual long tail bookings stream in SmartSVN.

Figure 18: ALM booking assumptions (US$m)
Year to 31 December     2014A   2015A   2016E   2017E   2018E   2019E   2020E
New customer bookings     7.5     1.2     1.9     1.9     1.4     1.5     1.6
  Customer count           46      20      24      24      18      20      21
  ASP (US$000)          163.0    60.0    78.0    78.0    78.0    78.0    78.0
Add-on bookings           4.0     1.6     1.4     1.2     0.8     0.8     0.8
  Customer count           54      49      47      42      34      34      34
  ASP (US$000)           74.1    32.7    31.0    27.9    25.1    25.1    25.1
Renewal bookings          2.8     3.5     6.0     6.6     6.6     5.8     5.7
  Customer count           68      83     129     140     175     192     209
  ASP (US$000)           41.2    42.2    46.4    47.3    37.9    30.3    27.3
SmartSVN bookings         0.3     0.2     0.2     0.2     0.2     0.1     0.1
Total bookings (US$m)    14.6     6.5    9.51    9.84    9.04    8.35    8.26
Total customer count      168     152     200     206     226     246     263
Average ASP (US$000)     86.9    42.8    47.6    47.8    40.0    34.0    31.4
Bookings growth             -  -55.5%   46.3%    3.5%   -8.1%   -7.7%   -1.1%

Source: Company data, Stifel estimates

Fusion assumptions
. We envisage continued growth in the customer count, with new customer bookings growing markedly through our projections period, i.e. 80% YoY in 2016E followed by further growth in 2017E. We also assume a stronger ‘Go Live’ rate, as a more experienced and more numerous team dedicated to the function improves conversion.

. Our model is built on the cost of storage. While we envisage flat cost/TB through our projections period, we see users buying incrementally more storage. We note that IDC end-user research into Hadoop adoption concluded that storage was growing by c60% a year. We have used a cost/node methodology; while this remains valid, discussions with users suggest that the market is moving away from it.

. We have factored in an 80% renewal rate for 2016-2018, rising to 90% for 2019-20. The 90% rate is a better reflection of the average for this sector.

Figure 19: Fusion bookings assumptions (US$m)
Year to 31 December           2014A   2015A   2016E   2017E   2018E   2019E   2020E
New customer bookings          0.38    0.96    2.25    3.90    6.36    8.95   11.60
  Customer count                  9      17      31      52      83     115     146
  ASP (US$000)                   42    56.5    73.5    74.9    76.4    77.9    79.5
Go live                        0.36    0.72    2.57    4.46    7.27   10.24   13.26
  Customer count                  3       6      21      36      58      80     102
  ASP (US$000)                  120     120     120     122     125     127     130
  TB                              -       8       8       8      12      16      24
  Price/Node +5TB (US$000)        -    15.0    15.0    15.3    10.4     8.0     5.4
Go live to new customers (%)    33%     35%     70%     70%     70%     70%     70%
Renewal/Scale-up (US$m)        0.04    0.00    1.18    2.67    5.95   13.25   16.82
  Customer count                  1       5      17      29      50      72      92
  ASP (US$000)                   36    31.2    68.6    91.5   120.1   183.0   183.0
  Additional TB                  24      24      48      64      84     128     128
  Price/TB (US$)             1500.0  1300.0  1430.0  1430.0  1430.0  1430.0  1430.0
Bookings (US$m)                2.80    2.50    5.99   11.02   19.58   32.44   41.68
Total customer count             10      26      48      83     135     198     261
Average ASP (US$000)          280.0    96.2    96.2   115.4   138.5   166.2   199.4
Bookings growth                   -  -10.7%  139.8%   83.9%   77.7%   65.6%   28.5%

Source: Company data, Stifel estimates

How do we arrive at revenue? Revenue follows bookings at WANdisco. As we noted, revenue is recognised pro rata from bookings as the revenue recognition conditions are satisfied. We therefore apportion our bookings across revenue and receivables, and factor in that revenue is also a function of prior-period bookings.
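The toy model below sketches that mechanism: each year’s bookings are spread evenly over a contract term, so that reported revenue blends current and prior-period bookings. The three-year term and the mid-year signing phasing (half a year’s recognition in the first and last years) are our own simplifying assumptions, not the company’s actual recognition schedule.

```python
# Toy cohort model: recognise each year's bookings pro rata over a
# fixed contract term. Assumes a three-year term and mid-year signing
# (our simplification), so revenue in any year blends several cohorts.

def revenue_from_bookings(bookings_by_year, term_years=3):
    """bookings_by_year: bookings (US$m) signed in consecutive years.
    Returns recognised revenue per year, with half-year weighting in
    the first and last year of each contract."""
    n = len(bookings_by_year)
    revenue = [0.0] * n
    for start, booked in enumerate(bookings_by_year):
        annual = booked / term_years
        for offset in range(term_years + 1):
            # mid-year signing: half a year recognised at each end
            weight = 0.5 if offset in (0, term_years) else 1.0
            year = start + offset
            if year < n:
                revenue[year] += annual * weight
    return [round(r, 2) for r in revenue]

# Using bookings of the rough order of our 2015A-2017E estimates:
print(revenue_from_bookings([9.0, 15.5, 20.9]))  # [1.5, 5.58, 11.65]
```

The point of the sketch is the lag: year-three revenue is dominated by year-one and year-two bookings, which is why the balance-sheet release matters so much to near-term revenue.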

This model also gives rise to a book/bill ratio (see below). We are conscious that this will change over time, as it is influenced by contract terms.

Figure 20: WANdisco bookings assumptions (US$m)
Year to 31 December                2014A   2015A   2016E   2017E   2018E   2019E   2020E
New sales bookings                  17.4     9.0    15.5    20.9    28.6    40.8    49.9
Growth in bookings (%)               18%    -48%     72%     35%     37%     42%     22%
Deferred revenue (incl. unbilled)   19.3    16.2    18.0    23.3    28.3    36.4    43.5
Unbilled receivables                 8.0     6.3     7.7     9.2    11.3    14.4    17.3
New sales bookings                 17.36    9.01   15.50   20.86   28.63   40.78   49.94
New deferred revenue                12.5     6.5    11.2    15.0    20.6    29.4    36.0
New recognised revenue               4.9     2.5    3.64    4.03    5.66    6.97    8.46
Deferred revenue release             6.3     8.5     7.7    11.1    12.7    16.8    19.8
Revenue                             11.2    11.0    11.3    15.1    18.4    23.8    28.3

Source: Company data, Stifel estimates


Figure 21: Book/Bill (US$m)
Year to 31 December   2014A    2015A    2016E    2017E    2018E    2019E    2020E
Bookings             17.360    9.012   15.503   20.858   28.628   40.785   49.940
Growth (%)                -   -48.1%    72.0%    34.5%    37.3%    42.5%    22.4%
Revenue              11.218   10.994   12.234   14.801   18.153   23.943   28.394
Growth (%)                -    -2.0%    11.3%    21.0%    22.6%    31.9%    18.6%
Book/Bill (x)           1.5      0.8      1.3      1.4      1.6      1.7      1.8

Source: Company data, Stifel estimates

By geography The US is the early-adopter market. We expect North America to account for the vast majority of revenue through our projections period. At this juncture, while there is some Rest of World (RoW) revenue, it is nascent and, we believe, liable to be impacted by large contract flows.

Figure 22: WANdisco revenue by geography (US$m)
Year to 31 December   2014A   2015A   2016E   2017E   2018E   2019E   2020E
North America           9.4     7.3     7.8    10.6    13.0    16.9    20.1
Europe                  1.4     3.0     3.5     3.7     4.5     6.0     7.1
RoW                     0.4     0.8     0.0     0.8     0.9     1.0     1.1
Total revenue          11.2    11.0    11.3    15.1    18.4    23.8    28.3

Source: Company data, Stifel estimates

Currency All of WANdisco’s billings are made in US dollars. Revenues and any expenses of foreign operations are translated at the average rate. The main currency exposures are UK sterling and the Australian dollar; latest rates suggest some slight currency headwind due to sterling weakness. We see expenses, judged by the staff count, as follows:

Figure 23: WANdisco staff count by region
Region           FTE
North America     45
Europe            70
RoW                5
Group staff      120

Source: Stifel estimates

Just a thought We believe our estimates are conservative – particularly in the light of the strong end to Q4/2016. However, we acknowledge that a series of accelerators could potentially deliver better-than-expected earnings growth:

. More new clients. Sales are essentially upsells to free, on-base customers managed by WANdisco’s inside sales team, but with (1) a strong channel push, coupled with (2) additional quota-carrying sales staff, there could be upside to our new-account assumptions.

. Widening the sales reach. There are co-related product areas where WANdisco could make sales inroads. The first indication of this is likely to come from new channel-partner sales, where partners use domain or subject-matter experts to open new ‘use cases’. We saw this with the banner auto contract that WANdisco signed via its IBM OEM relationship. In our view, there should be sales opportunities in adjacent markets like healthcare, fintech and smart cities.

. Rising average selling prices (ASP). The recent experience at WANdisco is that it has been able to increase ASPs. Given (1) the macro backdrop, and (2) the fact that many users still expect Open Source software to be ‘free’ software, there may well be some customer push-back on attempts to increase prices. However, with future pricing predicated on storage, and given rising storage volumes, there may well be a ‘natural’ price inflator.

. New products. WANdisco has maintained an aggressive technology insertion rate – adding modules for new functional areas and expanding its product range. Maintaining this technology insertion rate should continue to drive upgrade spend among the customer base.

Are our sales expectations rational? (i) We assume six quota-carrying sales staff with an average ‘bag’ of US$1.5m. At 75% utilisation, they generate US$6.75m. For 2017E, there is a further US$8.2m from deferred revenue and US$4m from OEM, suggesting ‘doable’ revenue of US$18.9m. This is c30% ahead of our estimate of US$14.7m, so we are relaxed that our assumptions represent an achievable stretch. We also flag that the balance-sheet release means the new-sales hurdle is low: c55% of revenue is already pre-sold as the company starts the new year.
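The capacity arithmetic in (i) can be reproduced directly; all inputs are as stated in the text.

```python
# Sanity check on sales capacity, using the inputs stated in (i) above.

reps = 6
quota = 1.5              # US$m average 'bag' per quota-carrying rep
utilisation = 0.75
new_sales = reps * quota * utilisation    # capacity from the sales team

deferred_release = 8.2   # US$m released from the balance sheet in 2017E
oem = 4.0                # US$m from the OEM channel

doable = new_sales + deferred_release + oem
estimate = 14.7          # our 2017E revenue estimate (US$m)

print(round(new_sales, 2))               # 6.75
print(round(doable, 2))                  # 18.95, i.e. c US$18.9m
print(round(doable / estimate - 1, 2))   # 0.29, i.e. c30% headroom
```

The c30% headroom over our published estimate is the basis for our comfort that the 2017E number is an achievable stretch.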

(ii) Survey data suggests that our storage growth assumption could be on the conservative side. We note IDC analysis showing Hadoop users averaging storage growth of c60%+ YoY. The same survey data also points to some large Hadoop clusters that are already bigger than our assumptions, with respondents averaging 40-50 Hadoop clusters holding 500-900TB of storage.

(iii) We are still at the early stages of this technology. Usage patterns are based on users who themselves have limited experience with Hadoop in production (rather than test and experimentation) deployment (see figure).

Figure 24: Hadoop usage in production deployment

N = 137. Source: International Data Corp., 2016

Gross margin We assume a flat gross margin through our projections period.

As a caution, we note that Atlassian’s latest final results (to 30 June 2016) featured an 83% gross margin – for a company posting US$457m revenue, of which subscription was US$146.7m, maintenance US$218.8m, perpetual licence US$65.5m and other at US$26m. Increased hosting fees are a common reason for high cost of sales for SaaS application companies.


The staff-adjusted cost model As with most technology companies, while profit is seen as a function of Sales & Marketing, Administrative and R&D costs, in our view the most important cost is people: salaries were 58% of operating costs in 2015A. We attempt to capture this dynamic in our cost model, where we take a ‘bottom-up’ approach, using the reported cost of staff and projecting forward on the basis of: (1) wage inflation/deflation; and (2) assumed headcount numbers. Our assumptions are outlined in the table below and include:

. Group average headcount reduces from 159 in 2015A to 134 in 2016E.

. We have factored in an average cost/employee increase of 8.0% in 2016E, following the observed 7.2% in 2015A. Both reflect blended salary rates and the impact of (cheaper) new joiners. Our 2016E growth in average cost/staff is ahead of the current UK industry average of c3.3% – but given the US West Coast location, we need to monitor any local ‘war for talent’.

. Our assumptions are outlined in the table (see below):

Figure 25: Staff-adjusted cost model (US$m)
Year to 31 December            2014A     2015A    2016E     2017E     2018E     2019E     2020E
Software development             103        92       83        70        70        71        71
Selling & Distribution            48        49       34        25        27        29        31
Administration                    14        18       17        13        11         8         6
Average headcount                165       159      134       110       108       108       108
Growth in staff (%)              32%       -4%     -16%      -18%       -2%        0%        0%
Employee productivity
Revenue/employee (US$000)      67.99     69.14   100.82    136.96    170.23    223.84    262.38
EBITDA/employee (US$000)     -108.33   -100.55   -56.99    -28.85     -5.26     37.21     66.24
Employee cost
Staff costs (US$m)            -13.98    -14.44   -13.14    -11.33    -11.68    -12.26    -12.87
Cost/employee (US$000)        -84.72    -90.79   -98.06   -102.96   -108.11   -113.51   -119.19
Growth in cost/employee (%)       9%      7.2%     8.0%      5.0%      5.0%      5.0%      5.0%

Source: Company data, Stifel estimates

Our adjustments We have adjusted mainly for share-based payments, exceptionals and intangible amortisation. Capitalising R&D is allowable under IFRS (IAS 38), but irks old-school SSAP fans; clearly, this approach flatters the statutory profit result. The movement in intangible assets is a function of new capitalisation net of accumulated amortisation.

Capitalised R&D costs include staff costs, contractor and consultancy costs, and software costs which are directly attributable to the development of the software. Intangible fixed assets (capitalised salaries and capitalised software costs) are amortised over five years, in line with historical rates. Amortisation is first charged in the financial year following the year of capitalisation. Our assumptions are:

Figure 26: WANdisco R&D assumptions (US$m)
Year to 31 December                    2014A    2015A    2016E    2017E    2018E    2019E    2020E
Staff                                    103       92       83       70       70       71       71
Proportion in SW development (%)         62%      58%      62%      65%      65%      66%      66%
Staff salary rate (US$000)             84.72   108.95   117.67   123.55   129.73   136.21   143.02
Spend related to SW development (US$m)  8.73    10.02     9.77     8.65     9.08     9.67    10.15
R&D staff                                 58       60       54       46       46       46       46
R&D staff / total SW developers (%)      56%      65%      65%      65%      65%      65%      65%
Development spend                       4.88     6.53     6.35     5.62     5.90     6.29     6.60

Source: Company data, Stifel estimates
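The amortisation policy described above (straight-line over five years, first charge in the year after capitalisation) can be sketched as follows. The spend inputs are our 2016E-2018E development-spend estimates; the helper function itself is our own illustrative construct.

```python
# Sketch of the stated amortisation policy: capitalised R&D amortised
# straight-line over five years, first charged in the financial year
# AFTER the year of capitalisation.

def amortisation_schedule(capitalised_by_year, life=5):
    """capitalised_by_year: spend capitalised in years 0, 1, 2, ...
    Returns the amortisation charge falling in each of those years."""
    n = len(capitalised_by_year)
    charges = [0.0] * n
    for start, spend in enumerate(capitalised_by_year):
        annual = spend / life
        # first charge lands one year after capitalisation
        for year in range(start + 1, min(start + 1 + life, n)):
            charges[year] += annual
    return [round(c, 2) for c in charges]

# Using our 2016E-2018E development spend estimates (US$m):
print(amortisation_schedule([6.35, 5.62, 5.90]))  # [0.0, 1.27, 2.39]
```

Note how the one-year lag means the P&L charge trails cash spend, which is part of why capitalisation flatters statutory profit in a growth phase.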


Dividend policy We do not expect a dividend through our projections period.

Cash and collection WANdisco has minimal working capital requirements and lean capex. We assume a working capital inflow of US$3.38m in 2016E, up from US$1.7m in 2015A. The key driver behind our working capital assumption is our core debtor-day assumption. To arrive at the 2015A reported debtor days (223 days), we divided the US$6.73m of receivables in 2015A by revenue and multiplied by 365 days. For 2016E, we assume 188 days and multiply by revenue. We acknowledge that this is a clunky analytical tool, but it is transparent. Actual DSOs are likely to be 66-70 days.
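The debtor-day mechanics above reduce to a one-line formula, reproduced here as a check against the 2015A figure (the inverse form generating our 2016E receivables assumption):

```python
# Debtor-day arithmetic as described above. Inputs are the reported
# 2015A receivables (US$6.73m) and c US$11.0m revenue.

def debtor_days(receivables, revenue):
    """Receivables expressed as days of annual revenue."""
    return receivables / revenue * 365

def receivables_from_days(days, revenue):
    """Invert the formula to back out a receivables assumption."""
    return days / 365 * revenue

print(round(debtor_days(6.73, 11.0)))              # 223 days, matching 2015A
print(round(receivables_from_days(188, 11.3), 2))  # c US$5.8m for 2016E
```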

Using the same methodology, our trade creditor figure of US$3.09m for 2016E translates to creditor days rising from 90 days in 2015A to an assumed 100 days for 2016E.

Figure 27: WANdisco debtors and creditors (US$m)
Year to 31 December       2014A   2015A   2016E   2017E   2018E   2019E   2020E
Trade debtors              6.45    6.73    5.81    6.00    5.80    6.86    7.52
Trade creditors            3.20    2.71    3.09    3.72    4.04    4.57    5.42
Trade debtors/Turnover     0.25    0.25    0.25    0.25    0.25    0.25    0.25
Debtor days                 210     223     188     145     115     105      97
Creditor days               104      90     100      90      80      70      70

Source: Company data, Stifel estimates

The cash Our model indicates that WANdisco has adequate cash reserves to carry it through to cash generation. The latest trading update headlined FY cash of US$7.6m – around US$1m ahead of our expectation. The minimal US$200k Q4 outflow was a second positive surprise. We currently model the burn rate at US$6.25m a quarter. We await publication of the final results to understand the provenance of that extra US$1m and the progress on cash burn, and to gauge to what extent these are repeatable in future periods.

Figure 28: WANdisco net cash (US$m)
Year to 31 December            2014A    2015A   2016E   2017E   2018E   2019E   2020E
Operating cash flow           -12.41   -17.04    0.36    0.22    5.19    7.25    7.81
Free cash flow                -20.91   -26.14   -5.48   -4.34   -0.15    1.78    1.92
Net cash flow                 -23.01   -26.08   -5.48   -4.34   -0.15    1.78    1.92
Increase / (decrease) cash    -23.19     0.07    4.99   -4.34   -0.15    1.78    1.92
Closing cash / (debt)           2.48     2.56    7.55    3.21    3.06    4.84    6.76

Source: Company data, Stifel estimates


Figure 29: WANdisco cash burn assumptions (US$m)
Current staff                           134
Cost/head (US$'000)                   103.0
Annual salary cost (US$m)              13.8
Average quarterly salary cost (US$m)    3.4
Other direct costs (quarterly, US$m)   2.80
Annual costs (US$m)                    25.0

Quarterly cost (US$m)
R&D                                    2.00
S&M                                    2.13
G&A                                    2.13
Total quarterly costs                  6.25

Other direct costs (quarterly, US$'000)
Travel                                  570
Marketing                               500
IT                                      450
Facilities                              350
Legal/Audit/Prof                        530
Other                                   400
Total other direct costs (US$m)        2.80
Quarterly salary cost (US$m)           3.45
Total quarterly costs (US$m)           6.25

Source: Stifel estimates
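The Figure 29 build-up reduces to a simple identity (quarterly salaries plus other direct costs), reproduced below as a check using the inputs in the figure:

```python
# Reproducing the Figure 29 burn model: quarterly salary cost plus
# other direct costs gives the assumed quarterly cash burn.

staff = 134
cost_per_head = 103_000                               # US$/year
quarterly_salaries = staff * cost_per_head / 4 / 1e6  # US$m per quarter

other_direct = {  # quarterly, US$'000 (from Figure 29)
    "Travel": 570, "Marketing": 500, "IT": 450,
    "Facilities": 350, "Legal/Audit/Prof": 530, "Other": 400,
}
quarterly_other = sum(other_direct.values()) / 1000   # US$m per quarter

print(round(quarterly_salaries, 2))                   # 3.45
print(round(quarterly_other, 2))                      # 2.8
print(round(quarterly_salaries + quarterly_other, 2)) # 6.25
```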


Target price and valuation

Our 12-month target share price is 509p (implying 34% upside potential) WANdisco offers investors exposure to a business enjoying vigorous growth in a global target market, and to perhaps the ‘noisiest’ theme in IT currently – Big Data. We see abundant evidence that the company is now positioned to accelerate growth that should create further value for its shareholders. Our blended valuation model (taking in DCF at 706 UScents, sum-of-the-parts at 538 UScents and FCF yield at 665 UScents) leads us to a 12-month share price target for WANdisco of 509p, implying 34% upside potential.

Investment rationale
. Strong growth opportunity – growth is currently accelerating.

. An established business serving global Tier 1 customers. This is not a blue-sky business.

. Robust business model. The subscription revenue model gives visibility and helps to de-risk forecast sales.

. The offer. In our view, WANdisco technology has a competitive lead in an established growth market. This competitive lead is based on very strong technology building blocks.

. WANdisco is at an inflection point. We understand that sales activity has accelerated since mid-2016.

. Attractive cash flow profile. The company has minimal working capital requirements. Moreover, as WANdisco delivers closer to its target operating model, it should generate operating cash ahead of operating profit.

In addition, we believe that the stock market has overlooked the following points:

. Notable changes in the engineering team as WANdisco has refined its core offer. The Fusion product has been re-architected so as not to have the same problems as its predecessor.

. Greater clarity and coherence as to the use case, so conversations with salespeople and prospects are more productive.

. An established blended sales model now, with ‘big hitter’ endorsements from the likes of IBM, Oracle and Amazon.

. There has been a step-change in terms of cash management – WANdisco was cash-neutral in Q3, and in Q4 there was a minimal US$200k outflow.

. The culture, which must have been smashed to smithereens, is now more positive and more illustrative of a company ‘on the turn’. Mr Richards is now visibly doing more to engage with all investors, and there is a new, experienced CFO who is well-versed in the dynamics of smaller, growing, global software companies.

. The implementation team is busy and has increased its work rate converting a number of POCs (Proofs of Concept).

. Channel partners are bullish about the sales pipeline.


. There is a general acknowledgement that Big Data has not happened as quickly as first expected (‘where is the US$1bn Hadoop company?’ quipped one source), and many observers frame its development relative to the Gartner Hype Cycle. Nonetheless, there seems to be a general feeling that market participants are now better able to wrap their arms around the market and have a better ‘fix’ on the compass headings.

Figure 30: WANdisco: Valuation overview (x)
Year to 31 December         2014A     2015A     2016E      2017E      2018E      2019E      2020E
EV/Bookings (x)              7.78     15.03      8.40       6.45       4.71       3.26       2.62
EV/Sales (x)                12.06     12.30     11.55       8.92       7.31       5.58       4.63
EV/EBITDA (x)               -7.57     -8.46    -16.03     -34.80    -192.06      37.74      18.93
EV/EBIT (x)                 -3.52     -4.52     -6.68      -8.22      -9.66     -12.64     -16.43
EV/FCF (x)                  -6.47     -5.18    -23.76     -31.01    -894.64      74.70      68.25
EV/IC (x)                    9.65     19.05     13.66      15.54      13.56      15.11      23.75
P/E (x)                   -459.93   -541.98   -889.81   -1233.43   -1536.69   -2331.84   -3875.29
PEG (x)                        na     35.80     22.76      44.27      77.87      68.38      97.30
Free cash flow yield (%)   -18.3%    -19.1%     -3.6%      -2.6%      -0.1%       1.1%       1.2%
Dividend yield (%)             na        na        na         na         na         na         na
EV/NOPAT (x)                -3.53     -4.52     -8.84      -8.54     -10.12     -13.49     -18.01

Note: Priced 22 February 2017 Source: Company data, Stifel estimates

How do we value WANdisco?
. Share price target. We use a blended model to arrive at our 12-month share price target of 509p, using discounted cash flow, sum-of-the-parts and free cash flow yield. While WANdisco has a number of adjacent growth opportunities, we believe the ‘cash generation’ bias in our valuation methodology reflects how investors see the benefits of subscription-based business models.

. Risks to share price target. In addition to general and macroeconomic risks, the downside risks include continued deceleration in the source code and Big Data markets; this would reduce cash inflows, thereby increasing net cash outflow and lessening investor interest. Upside risks include better-than-expected revenue growth, possibly as a consequence of channel-partner sales accelerating faster than anticipated.

The valuation read-across from WANdisco’s listed peer group WANdisco has no operational peers listed on the LSE. That said, we acknowledge that there are a couple of Big Data analytics companies, Fusionex and First Derivatives, and LSE-listed Micro Focus is a competitor in the Source Code Management/ALM market. Hence, we position WANdisco among an international set of peers so that investors are better able to tease out the salient valuation points. We have included WANdisco in our Big Data and Systems Software segments, the latter being the closest proxy for its ALM market participation (see tables below).

We used Bloomberg consensus forecasts to establish the valuations for the peer group companies. We looked at revenue growth, EBITDA growth and margin, EV/EBITDA, FCF yield, P/E and EV/Sales. We make the following observations:

. WANdisco has slightly better revenue growth. Generally, the segment sports a low level of profitability, with several companies loss-making.

. Reviewing the profitability-based ratios tells us little. That said, a PEG ratio shows more, and this is frequently used for high-growth companies. Indeed, if we use FY2 EBITDA growth as the basis of a PEG formula, we arrive at 0.8x for WANdisco. Applying that methodology across our segments shows that the Big Data segment is, at 0.6x, one of the cheapest PEG ratings across our coverage universe, where the global average is 1.5x.


. There is a large valuation gap with the System Software segment, which is characterised by slow growth but decent levels of cash generation, while the stock market is ascribing premium valuations to the Big Data companies.

. Within the Big Data segment, there are some large valuation discrepancies. Note the valuation ‘blue water’ between the likes of Splunk and Atlassian and the majority of companies.

Figure 31: KPIs for Big Data companies
                                 EV      Share   Revenue growth    EBITDA growth    EBITDA margin   Operating
Company                        (£m)  price (p)     FY1     FY2      FY1       FY2     FY1     FY2    leverage
WANdisco (WAND LN)            174.6      377.5    2.8%   30.7%   -55.6%    -41.3%  -78.7%  -35.3%        -1.3
Fusionex (FXI LN)             308.5      137.0   17.8%   46.0%   -79.5%     66.5%    7.5%    8.5%         1.4
First Derivatives (CRW LN)    359.4     1215.0   15.6%   11.6%    12.0%     10.9%   31.3%   31.1%         0.9
Tableau* (DATA US)           3249.0       54.6    6.4%    8.1%  -158.3%     46.1%    6.4%    8.6%         5.7
Splunk** (SPLK US)           7853.7       65.5   40.5%   25.9%  -130.6%     60.9%    8.7%   11.1%         2.3
Hortonworks (HDP US)          551.1       10.5   28.8%   26.6%   -98.8%  -2983.7%   -1.3%   29.0%      -112.3
Talend (TLND US)              754.1       26.8   39.5%   34.7%    15.6%      9.5%  -21.8%  -17.7%         0.3
Attunity (ATTU US)            135.8        8.8   15.2%      na  -179.1%        na   10.7%      na          na
Atlassian (TEAM US)          5823.1       29.8   34.3%   29.1%   715.7%     33.5%   21.3%   22.1%         1.2
New Relic (NEWR US)          1656.6       35.1   44.0%   28.8%   -82.6%   -165.2%   -3.5%    1.8%        -5.7
Average                                          24.5%   26.8%    -4.1%   -329.2%   -1.9%    6.6%       -11.9

* Followed by our colleague Tom Roderick ** Followed by our colleague Brad Reback Note: Priced 22 February 2017 Source: Bloomberg data

Figure 32: Valuation ratios for Big Data companies
                                 EV    Share       EV/EBITDA (x)         EV/Sales (x)           PE (x)            EV/FCF (%)       PEG
Company                        (£m) price (p)    LTM    FY1    FY2     LTM   FY1   FY2     LTM    FY1   FY2     LTM    FY1   FY2   (x)
WANdisco (WAND LN)            174.6    377.5    -8.7  -19.6  -33.5    15.9  15.5  11.8    -3.6   -6.7    na  -10.1%     na    na   0.8
Fusionex (FXI LN)             308.5    137.0     9.3   45.4   27.3     4.0   3.4   2.3     2.4  -10.5    na    2.1%     na    na   0.4
First Derivatives (CRW LN)    359.4   1215.0    22.3   19.9   18.0     7.2   6.2   5.6    30.8   26.2  23.7    4.2%     na    na   1.6
Tableau* (DATA US)           3249.0     54.6   -33.6   57.7   39.5     3.9   3.7   3.4    -0.3  136.6   3.8    3.5%   2.5%  2.8%   0.9
Splunk** (SPLK US)           7853.7     65.5   -29.4   96.1   59.7    11.7   8.4   6.6   -29.8     na   1.2    1.3%   1.8%  2.9%   1.0
Hortonworks (HDP US)          551.1     10.5    -2.3 -182.5    6.3     3.0   2.3   1.8    -2.4   -5.8  -0.1      na  -5.8%  1.5%   0.0
Talend (TLND US)              754.1     26.8   -37.8  -32.6  -29.8     9.9   7.1   5.3      na  -32.5  -0.4   -1.4%   0.3%  0.3%  -3.1
Attunity (ATTU US)            135.8      8.8   -16.0   20.2     na     2.5   2.2    na      na   53.3    na   -0.9%     na    na    na
Atlassian (TEAM US)          5823.1     29.8   362.9   44.5   33.3    12.7   9.5   7.4      na   91.0   0.6    1.6%   2.8%  3.7%    na
New Relic (NEWR US)          1656.6     35.1   -31.5 -181.1  277.6     9.1   6.3   4.9      na  -66.6  -1.3   -0.9%  -0.9%  0.4%  -1.7
Average                                         23.5  -13.2   44.3     8.0   6.5   5.5    -0.5   20.5   3.9   -0.1%   0.1%  1.9%   0.0

* Followed by our colleague Tom Roderick ** Followed by our colleague Brad Reback Note: Priced 22 February 2017 Source: Bloomberg data

Figure 33: Valuation ratios for System Software companies
                              EV    Share      EV/EBITDA (x)       EV/Sales (x)          PE (x)           EV/FCF (%)        Div yield
Company                     (£m) price (p)   LTM   FY1   FY2     LTM   FY1   FY2    LTM   FY1   FY2    LTM   FY1   FY2     FY1   FY2
Micro Focus (MCRO LN)    7915.911     2205   15.5  12.0   9.8    6.4   5.8   3.6   29.6  12.6  12.1   3.5%  2.6%  4.2%    3.7%  3.9%
CA (CA US)               12576.58    32.19    8.4   8.6   8.6    3.1   3.2   3.2   17.7  13.4  13.2   7.7%  7.6%  7.6%      na    na
IBM* (IBM US)            206053.3   181.17   13.0  10.6  10.6    2.6   2.6   2.6   14.6  13.1  12.8   6.2%  5.7%  5.8%    3.2%  3.3%
Oracle** (ORCL US)       170095.1   42.315   11.3  10.2   9.7    4.6   4.5   4.4   20.1  16.5  15.1   7.3%  6.8%  7.4%    1.4%  1.4%
SAP** (SAP US)           113010.2  93.8199   17.7  14.0  13.1    5.1   4.8   4.4     na  21.8  19.9   3.2%  3.7%  4.0%    1.3%  1.4%
Software AG (SOW GY)     2655.263     34.7   10.2   9.5   9.1    3.0   3.0   2.9     na  14.6  13.9   7.2%  6.7%  7.0%    1.8%  1.9%
Average                                      12.7  10.8  10.1    4.1   4.0   3.5   20.5  15.3  14.5   5.9%  5.5%  6.0%    2.3%  2.4%

*Followed by our colleague David Grossman, **Followed by our colleague Brad Reback Note: Priced 22 February 2017 Source: Bloomberg data


Valuation range
. DCF valuation: 706 UScents. Our cost of capital assumptions include a WACC of 9.32% and a 3% terminal growth rate; at zero terminal growth, the implied valuation would be 493 UScents. Our WACC assumptions include a beta of 1.2, a risk-free rate of 2.5% (ahead of current UK 10-year government gilts at 1.5%) and an equity risk premium of 5.7% (calculated as the inverse of the sector P/E of 17.6x). We are not surprised at the DCF valuation, given the strong cash generation evidenced in the latest FY trading update, coupled with our expectation of an increasingly attractive EBITDA margin at WANdisco – i.e. we project a 35% margin at the end of our estimates period, which would be analogous to (say) the SUSE division of Micro Focus. (Our DCF sensitivity table appears below.)

Figure 34: DCF sensitivity analysis (UScents)
                        Terminal growth (%)
Cost of equity (%)    1.0    2.0    3.0    4.0    5.0
8.5                   549    617    710    844   1054
9.32                  545    613    706    840   1051
10                    542    610    703    837   1048
11                    538    606    699    833   1044

Note: Priced 22 February 2017 Source: Stifel estimates
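As a sanity check, the cost-of-equity inputs quoted above can be reproduced in a few lines. This is a minimal CAPM sketch, assuming the 'inverse equity method' means taking the reciprocal of the sector P/E; it is not our full DCF model, and the variable names are ours.

```python
# Sketch of the cost-of-equity inputs quoted in the text (CAPM form).
# Assumption: 'inverse equity method' = ERP is the reciprocal of the sector P/E.
sector_pe = 17.6     # sector P/E (x)
risk_free = 0.025    # assumed risk-free rate (vs. UK 10yr gilts at 1.5%)
beta = 1.2

equity_risk_premium = 1 / sector_pe                       # ~5.7%
cost_of_equity = risk_free + beta * equity_risk_premium   # ~9.3%
```

This reproduces, to rounding, the 5.7% premium and the circa 9.3% cost of capital used in the DCF.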

• SOTP valuation: US538cents. Here, we disentangled the divisions (ALM and Big Data) and applied standard multipliers, based on our latest sector valuation results. We arrive at a value of US538cents. We also assumed a 15% corporate overhead, a common aspect of sum-of-the-parts valuation models.

• We highlight the sum-of-the-parts valuation because we believe we should see further consolidation in 2017 in both of WANdisco’s operational segments, Source Code Management/ALM and Data Storage. Investors have seen Micro Focus become a consolidator on a global scale. We are also conscious that a trade buyer will value WANdisco from the perspective of its own channel pipeline. In this regard, consolidators like IBM will pay a premium to the market value simply because the asset offers them the ability to ‘squeeze’ greater sales volume out of the product.

Figure 35: Sum-of-the-parts (US$m)

Line of business                 Bookings 2017E (US$m)   Multiplier   Value (US$m)
ALM                              9.8                     3.9          38.4
Big Data                         11.0                    6.5          78.4
Group                            20.9                    5.6          116.7
Cash (US$m)                                                           7.5
Group value (US$m)                                                    202.6
Shares (m)                                                            32.0
Per share (UScents)                                                   633
Corporate overhead                                                    15%
‘Clean’ value/share (UScents)                                         538

Note: Priced 22 February 2017 Source: Stifel estimates
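The per-share arithmetic behind the sum-of-the-parts can be reproduced directly from the table's own figures. A minimal sketch of the group value, per-share and overhead haircut steps, with variable names of our own:

```python
# Reproducing the per-share steps of the sum-of-the-parts (values from Figure 35).
group_value_usd_m = 202.6   # group value including cash (US$m)
shares_m = 32.0             # shares in issue (m)
overhead = 0.15             # assumed corporate overhead haircut

per_share_cents = group_value_usd_m / shares_m * 100   # ~633 UScents
clean_per_share = per_share_cents * (1 - overhead)     # ~538 UScents
```

The 15% haircut applied to 633 UScents lands on the 538 UScents 'clean' value quoted in the text.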

• Free cash flow yield: US665cents. The FY2 free cash flow yield for our tech company universe is currently 4.2% (see table below). We also acknowledge that early-stage companies typically attract a lower yield, which expands through the company lifecycle. To reflect this, we have averaged WANdisco’s FCF from 2018 to 2020 to arrive at US$2.13m. Applying a 1% FCF yield to this, we arrive at a US665cents valuation.


Figure 36: FCF at different yield rates (UScents)

FCF (US$m): 2.13    Shares (m): 32.00

FCF yield                0.5%    1.0%    1.5%    2.5%    3.0%    3.4%    4.0%    4.5%
Market cap (US$m)        425.8   212.9   141.9   85.2    71.0    62.6    53.2    47.3
Price/share (UScents)    1,331   665     444     266     222     196     166     148

Note: Priced 22 February 2017 Source: Stifel estimates
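Each column of the yield table follows from one formula: capitalise the average FCF at the chosen yield, then divide by the share count. A minimal sketch with our own function name; small differences versus the published rows reflect rounding of the US$2.13m input:

```python
# Reproducing the yield-table arithmetic: market cap = FCF / yield,
# then convert to UScents per share.
avg_fcf_usd_m = 2.13   # average 2018E-2020E free cash flow (US$m)
shares_m = 32.0        # shares in issue (m)

def price_per_share_cents(fcf_yield):
    market_cap = avg_fcf_usd_m / fcf_yield   # implied market cap (US$m)
    return market_cap / shares_m * 100       # UScents per share
```

At a 1% yield this lands at roughly the US665cents valuation in the text; at 0.5% it reproduces the 1,331 UScents column.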

Figure 37: Technology sector average FCF yield (%)

                               EV/FCF (%)
Segment                        LTM      FY1      FY2
UK SaaS                        -0.8%    1.8%     3.6%
US SaaS                        -1.4%    -1.0%    1.6%
Consumer tech                  2.0%     3.0%     5.4%
E-commerce                     -0.1%    2.0%     0.4%
Accounting/Financial           3.2%     3.7%     4.5%
Security                       2.8%     3.2%     5.0%
Global IT Consultancy          5.3%     5.5%     6.2%
UK IT Consultancy              3.4%     2.7%     4.4%
Big Data                       -2.4%    0.8%     1.5%
Data Governance                3.9%     3.8%     5.9%
UK FinTech                     5.6%     -3.0%    3.5%
Global FinTech                 2.5%     1.8%     2.6%
Mobile/Test & Measurement      3.8%     4.0%     5.1%
Systems software average       6.0%     5.8%     6.3%
ERP                            5.3%     5.3%     6.5%
FANG                           2.2%     2.4%     3.5%
Collaboration software         -0.8%    -1.0%    2.3%
Payments                       2.8%     1.2%     0.8%
Average                        3.5%     3.4%     4.2%

Note: Priced 22 February 2017 Source: Bloomberg data


Figure 38: Profit & loss (US$m)

Year to 31 December       2014A   2015A   2016E   2017E   2018E   2019E   2020E
Revenue by division
ALM                       10.4    9.2     8.6     10.1    9.3     8.2     8.1
Fusion/Big Data           0.8     1.8     2.7     5.0     9.1     15.6    20.1
Total Revenue             11.2    11.0    11.3    15.1    18.4    23.8    28.3

Cost of sales                    -2.17     -0.75     -0.79    -1.06     -1.29     -1.67     -1.98
Gross profit                     9.05      10.25     10.49    14.04     17.13     22.16     26.30
Gross margin (%)                 81%       93%       93%      93%       93%       93%       93%
Reported EBIT pre exceptional    -38.476   -29.915   -19.49   -16.368   -13.951   -10.524   -7.977
Our adjustments:
Share based payments             11.907    4.057     2.03     2.231     2.454     2.700     2.970
Acquisition related items        0.145     0.000     0.000    0.000     0.000     0.000     0.000
Depreciation & Amortisation      8.550     9.870     9.34     10.270    10.795    11.348    11.930
Adj EBITDA pre exceptional       -17.874   -15.988   -8.128   -3.867    -0.702    3.523     6.923
Margin (%)                       -159%     -145%     -72%     -26%      -4%       15%       24%
Exceptional costs                -1.586    -0.614    4.380    0.000     0.000     0.000     0.000
Adj EBITDA post exceptional      -19.605   -16.602   -3.75    -3.867    -0.702    3.523     6.923
EBITDA margin (%)                -175%     -151%     -33%     -26%      -4%       15%       24%

Depreciation                     -0.267    -0.270    -0.22     -0.238    -0.261    -0.287    -0.316
Adj EBITA                        -18.141   -16.258   -8.344    -4.105    -0.963    3.236     6.607
Net Interest                     0.56      -0.51     -0.20     0.000     0.000     0.000     0.000
Adjusted pre tax profit          -17.584   -16.764   -8.54     -4.105    -0.963    3.236     6.607
Exceptional items                -1.586    -0.614    4.38      0.000     0.000     0.000     0.000
Share based payments             -11.907   -4.057    -2.029    -2.231    -2.454    -2.700    -2.970
Intangible amortisation          -8.283    -9.600    -9.120    -10.032   -10.534   -11.060   -11.613
Pre-tax profit post exceptional  -39.360   -31.035   -15.312   -16.368   -13.951   -10.524   -7.977

Taxation                                    1.05      1.13      0.575     0.604     0.634     0.666     0.699
Profit after tax                            -38.307   -29.906   -14.737   -15.764   -13.317   -9.858    -7.277
Other income                                -0.444    0.055     0.058     0.061     0.064     0.067     0.070
Dividends                                   0.0       0.0       0.0       0.0       0.0       0.0       0.0
Reported retained profit post exceptional   -38.751   -29.851   -14.679   -15.704   -13.253   -9.791    -7.207
Weighted average basic shares (m)           24.02     26.78     32.00     35.13     35.13     35.13     35.13
Weighted average fully diluted shares (m)   24.02     28.78     32.00     35.13     35.13     35.13     35.13

Diluted adjusted EPS (US$)   -1.03     -0.88    -0.53    -0.39    -0.31   -0.20   -0.12
Diluted reported EPS (US$)   -117.89   -67.48   -30.57   -15.64   -7.18   4.24    13.24
DPS (US$)                    0.0       0.0      0.0      0.0      0.0     0.0     0.0

Source: Company data, Stifel estimates


Figure 39: Cash flow (US$m)

Year to 31 December                        2014A    2015A    2016E    2017E    2018E    2019E   2020E
Operating Profit                           -38.31   -29.91   -14.74   -15.76   -13.32   -9.86   -7.28
Depreciation & Amortisation                8.55     9.87     9.34     10.27    10.79    11.35   11.93
Share based payments                       13.35    4.67     2.03     2.23     2.45     2.70    2.97
Increase/(decrease) creditors/payables     0.74     -0.43    0.38     0.63     0.32     0.53    0.85
Increase/(decrease) deferred income        6.15     -1.51    2.09     3.04     4.75     3.58    0.00
(Increase)/decrease debtors/receivables    -2.94    0.28     0.92     -0.19    0.19     -1.05   -0.66
(Increase)/decrease in gov grant           -0.15    -0.05    0.00     0.00     0.00     0.00    0.00
Working capital                            3.80     -1.71    3.38     3.48     5.26     3.06    0.19
Other                                      0.21     0.04     0.35     0.00     0.00     0.00    0.00
Operating cash flow                        -12.41   -17.04   0.36     0.22     5.19     7.25    7.81
Net interest                               -0.04    -0.06    0.18     0.16     0.15     0.13    0.12
Taxation                                   1.06     -0.58    0.40     0.42     0.44     0.47    0.49
Net capex                                  -0.48    -0.10    -0.08    -0.08    -0.09    -0.09   -0.10
Development                                -9.04    -8.37    -6.35    -5.06    -5.84    -5.97   -6.40
Free cash flow                             -20.91   -26.14   -5.48    -4.34    -0.15    1.78    1.92

Dividends        0.00     0.00     0.00    0.00    0.00    0.00   0.00
Acquisitions     -2.10    0.00     0.00    0.00    0.00    0.00   0.00
Other            0.00     0.06     0.00    0.00    0.00    0.00   0.00
Net cash flow    -23.01   -26.08   -5.48   -4.34   -0.15   1.78   1.92

Shares issued (net)         0.44     26.17   14.30   0.00    0.00    0.00   0.00
Cash/(debt) acquired        0.00     0.00    -3.83   0.00    0.00    0.00   0.00
Currency effects            -0.62    -0.02   0.00    0.00    0.00    0.00   0.00
Increase/(decrease) cash    -23.19   0.07    4.99    -4.34   -0.15   1.78   1.92

Opening cash/(debt)    25.67   2.48   2.56   7.55   3.21   3.06   4.84
Closing cash/(debt)    2.48    2.56   7.55   3.21   3.06   4.84   6.76

Source: Company data, Stifel estimates


Figure 40: Balance sheet (US$m)

Year to 31 December    2014A   2015A   2016E   2017E   2018E   2019E   2020E
Fixed Assets
P. P & Equipment       0.41    0.23    0.23    0.21    0.18    0.19    0.17
Intangible assets      9.81    8.58    5.92    4.98    4.51    5.55    5.15
Total fixed assets     10.22   8.81    6.15    5.18    4.68    5.74    5.33

Current Assets
Cash at hand and in bank           2.48    2.56    7.55    3.21    3.06    4.84    6.76
Trade debtors                      6.45    6.73    5.81    6.00    5.80    6.86    7.52
Other receivables & prepayments    0.00    0.00    0.00    0.00    0.00    0.00    0.00
Corp tax credit receivable         0.00    0.00    0.00    0.00    0.00    0.00    0.00
Total current assets               8.93    9.28    13.36   9.20    8.86    11.69   14.27
Total assets                       19.15   18.10   19.51   14.39   13.54   17.43   19.60

Current liabilities (Payables)
Short-term debt              0.01   0.00   0.00    0.00    0.00    0.00    0.00
Trade creditors              3.20   2.71   3.09    3.71    4.04    4.57    5.42
Deferred income              6.08   6.06   8.15    11.19   15.94   19.52   19.52
Current tax liabilities      0.00   0.00   0.00    0.00    0.00    0.00    0.00
Deferred government grant    0.08   0.03   0.03    0.03    0.04    0.04    0.05
Total current liabilities    9.36   8.80   11.27   14.95   20.02   24.13   24.99

Net Current Assets                       -0.43   0.48   2.08   -5.74   -11.16   -12.44   -10.72
Total Assets less Current Liabilities    9.79    9.29   8.24   -0.56   -6.47    -6.70    -5.39

Non-current liabilities
Deferred income                   5.19    3.70    6.36    9.44     14.36    22.89    30.08
Deferred tax liability            0.01    0.01    0.01    0.10     0.11     0.12     0.13
Retirement benefit obligations    0.00    0.00    0.00    0.00     0.00     0.00     0.00
Total non-current liabilities     5.19    3.70    6.36    9.54     14.47    23.01    30.22
Total Liabilities                 14.56   12.50   17.64   24.48    34.48    47.14    55.21
Net assets                        4.60    5.59    1.87    -10.09   -20.94   -29.71   -35.61

Shareholders’ Funds
Called-up share capital    3.88     4.67     4.67     4.67      4.67      4.67      4.67
Share premium              56.59    81.97    96.27    96.27     96.27     96.27     96.27
Translation reserve        -0.30    -0.25    -3.59    0.15      2.56      3.58      4.89
Other reserves             1.25     1.25     1.25     1.25      1.25      1.25      1.25
Retained earnings          -56.81   -82.05   -96.73   -112.43   -125.68   -135.48   -142.68
Shareholders’ funds        4.60     5.59     1.87     -10.09    -20.94    -29.71    -35.60
Equity and liabilities     19.15    18.10    19.51    14.39     13.55     17.44     19.60

Source: Company data, Stifel estimates


Appendix I: Citations

In this report we have mentioned a number of companies which are followed by our colleagues across the wider Stifel technology team. Please see their contact details below along with their current opinions.

Figure 41: Stifel coverage

Company           Code   Price    Rating   Analyst          Contact            Tel
Global Payments   GPN    78.70    Buy      Chris Brendler   [email protected]   00 1 (443) 224-1303
IBM               IBM    181.15   Buy      David Grossman   [email protected]   00 1 (415) 364-2541
Intuit            INTU   119.43   Hold     Brad Reback      [email protected]   00 1 (404) 869-8051
Microsoft         MSFT   64.36    Buy      Brad Reback      [email protected]   00 1 (404) 869-8051
Oracle            ORCL   42.51    Buy      Brad Reback      [email protected]   00 1 (404) 869-8051
PayPal            PYPL   42.42    Buy      Scott Devitt     [email protected]   00 1 (212) 271-3765
Salesforce.com    CRM    82.08    Buy      Tom Roderick     [email protected]   00 1 (312) 564-8701
SAP               SAP    94.01    Hold     Brad Reback      [email protected]   00 1 (404) 869-8051
Spirent           SPT    105.00   Buy      Lee Simpson      [email protected]   +44 (0) 20 7710 7652
Splunk            SPLK   64.98    Hold     Brad Reback      [email protected]   00 1 (404) 869-8051
Square            SQ     15.04    Buy      Scott Devitt     [email protected]   00 1 (212) 271-3765
Tableau           DATA   54.03    Buy      Tom Roderick     [email protected]   00 1 (312) 564-8701
WorldPay          WPG    267.16   Buy      Chris Brendler   [email protected]   00 1 (443) 224-1303

Source: Stifel Research


Appendix II: The Paxos algorithm

Paxos solves the problem of achieving consensus in a computer network, where consensus is the process of agreeing on one result among a group of participants. Paxos achieves consensus by maintaining one-copy equivalence based on quorum agreement, i.e. getting multiple data sources to agree to any new transaction proposed at any one of them. Once a quorum of the participants (Hadoop clusters, databases, or whatever) agrees to a transaction, the transaction is written by all of them at the same time. The consensus protocol approach has become the basis for the state machine replication approach to distributed computing, as proposed by Dr Leslie Lamport. The Paxos protocol was first described in 1989, named after a fictional legislative consensus system used on the Greek island of Paxos, and was later published as a journal article in 1998.

Paxos is usually used where durability is required (e.g. to replicate a file or a database), in which the amount of durable state could be large. The protocol attempts to make progress even during periods when some bounded number of replicas are unresponsive.

How does it work?

In a distributed system using Paxos, all of the servers in the system function as peers to deliver a cooperative approach to transaction management, ensuring the same transaction order at every server. Each server can function in any of three roles: (1) proposer, (2) acceptor and (3) learner. There are three phases in the Paxos algorithm, which can be repeated during the process of reaching consensus: (1) election of a server to be the coordinator, or proposer; (2) broadcast of the transaction proposal to its peers, which then assume the role of acceptors and either accept or reject the proposal; and (3) acceptance, once a majority of the servers acknowledge the proposer and accept its proposal, allowing consensus to be reached. The server acting as coordinator then broadcasts a ‘commit’ message to notify all of its peers to proceed with the transaction, making the order permanent.

When a server issues a proposal, it generates a sequence number for the proposal with a value higher than the last one of which it is aware, and broadcasts it to the other servers. If a majority of the other servers replying indicate that they have not seen a higher sequence number, the server is then allowed to act as coordinator, or leader, for this proposal. At this point, other proposers cannot proceed until consensus is reached on the current proposal.

In most deployments of Paxos, each participating process acts in three roles: Proposer, Acceptor and Learner. The Proposer (the leader) creates a proposal identified with a number N. This number must be greater than any previous proposal number used by this Proposer. Then, it sends a Prepare message containing this proposal to a Quorum of Acceptors. The Proposer decides who is in the Quorum.
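For illustration, the prepare/accept message flow described above can be sketched as a toy single-decree Paxos in Python. The class and function names are ours; this is emphatically not WANdisco's DConE implementation, and real deployments add leader election, message transport, persistence and failure handling.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Acceptor:
    promised: int = -1                           # highest proposal number promised
    accepted: Optional[Tuple[int, str]] = None   # (number, value) accepted, if any

    def prepare(self, n: int):
        # Phase 1b: promise not to accept proposals numbered below n,
        # and report any value already accepted.
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n: int, value: str) -> bool:
        # Phase 2b: accept unless a higher-numbered promise has been made.
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors, n, value):
    """One single-decree Paxos round: prepare, then accept, each needing a majority."""
    quorum = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    granted = [acc for ok, acc in replies if ok]
    if len(granted) < quorum:
        return None
    # If any acceptor already accepted a value, the proposer must adopt the
    # value with the highest proposal number instead of its own.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= quorum else None

acceptors = [Acceptor() for _ in range(5)]
first = propose(acceptors, 1, "txn-A")    # chosen by the quorum
second = propose(acceptors, 2, "txn-B")   # must converge on the already-chosen value
```

Note how the second, higher-numbered proposer is forced to re-propose the value already chosen; this is the property that keeps every replica's transaction order identical.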

Production uses of Paxos

• Google uses the Paxos algorithm in its Chubby distributed lock service in order to keep replicas consistent in case of failure. Chubby is used by BigTable, which is now in production in Google Analytics and other products.

• Google Spanner and Megastore use the Paxos algorithm internally.

• The OpenReplica replication service uses Paxos to maintain replicas for an open access system that enables users to create fault-tolerant objects. It provides high performance through concurrent rounds and flexibility through dynamic membership changes.

• Microsoft uses Paxos in the Autopilot cluster management service from Bing.


• WANdisco has implemented Paxos within its DConE active-active replication technology.

• XtreemFS uses a Paxos-based lease negotiation algorithm for fault-tolerant and consistent replication of file data and metadata.

• Heroku uses Doozerd, which implements Paxos, for its consistent distributed data store.

• Ceph uses Paxos as part of the monitor processes to agree which OSDs are up and in the cluster.

• Apache Mesos uses the Paxos algorithm for its replicated log coordination.

• Windows Fabric, used by many of the Azure services, makes use of the Paxos algorithm for replication between nodes in a cluster.

• Oracle NoSQL Database leverages a Paxos-based automated failover election process in the event of a master replica node failure to minimise downtime.


Appendix III: Hadoop – V1 to V2

Hadoop underpins most Big Data architectures. A major upgrade in 2014 migrated Hadoop from V1, characterised by batch-oriented MapReduce jobs, to the more transactional Hadoop V2. Hadoop V1 popularised MapReduce programming for batch jobs and demonstrated the potential value of large-scale, distributed processing. But MapReduce was not suitable for interactive analysis, and was constrained in its support for graph, machine learning and other memory-intensive algorithms; it was these areas that captured much attention. Two of the most important advances in Hadoop V2 were HDFS federation and the resource manager YARN. Hadoop V2 introduced a new processing model that lends itself to common Big Data use cases, including interactive SQL over Big Data, machine learning at scale, and the ability to analyse Big Data-scale graphs.
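The batch MapReduce model that Hadoop V1 popularised can be illustrated with the canonical word-count example, here as a toy in plain Python. The function names are ours, and in a real Hadoop job the map, shuffle and reduce phases run distributed across a cluster rather than in one process:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Mapper: emit a (word, 1) pair for every word in a document split.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle/sort: group intermediate values by key (the word).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big deal", "data at scale"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in docs)))
```

The same three-phase shape, map to key-value pairs, shuffle by key, reduce per key, is what a MapReduce job submits to the cluster; it is inherently batch, which is why interactive workloads motivated the V2 redesign.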

HDFS
HDFS is the Hadoop file system, and comprises two major components: a namespace service and a block storage service. The namespace service manages operations on files and directories, while the block storage service implements data node cluster management, block operations and replication. In Hadoop V1, a single NameNode managed the entire namespace for a Hadoop cluster. With HDFS federation, multiple NameNode servers manage namespaces, which allows for horizontal scaling, performance improvements and multiple namespaces.

YARN
YARN is a resource manager that was created by separating the processing engine and resource management capabilities of MapReduce. YARN is often called the Hadoop operating system because it is responsible for managing and monitoring workloads, maintaining a multi-tenant environment, implementing security controls, and managing high-availability features of Hadoop. Like an operating system on a server, YARN is designed to allow multiple, diverse user applications to run on a multi-tenant platform. YARN supports multiple processing models in addition to MapReduce.

Hive
Hive is the most popular SQL-in-Hadoop option, and the Hadoop community has invested heavily in making Hive faster, more scalable, and supportive of more SQL operations. This work has been done under the Stinger initiative. The first phase of the Stinger initiative realised 35x-45x speed improvements over prior versions of Hive and added new SQL functions. With Hadoop V2, the Stinger initiative released an initial version of a vectorised query engine that delivers an additional 5x-10x speed increase, improved VARCHAR and DATE types, and improvements to the query optimiser.

Spark
Spark is an Apache project that uses memory to improve the speed of large-scale data analysis programs. Programs can be written in Java, Scala or Python. Spark is a data analysis platform for building specialised Big Data tools such as Shark (a SQL engine), Spark Streaming for real-time data processing, MLlib for machine learning and GraphX for graph processing.

Figure 42: Hadoop V1 and V2

Source: ITPro


Jargon buster

Amdahl's Law: A law used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup using multiple processors.
Apache Software Foundation: The association of software engineers which provides support for the Apache community of Open Source software products.
Application Lifecycle Management, ALM: The continuous process of managing the life of an application through governance, development and maintenance.
Application Programme Interface, API: A set of rules that software programmes follow to communicate with each other, serving as an interface between different software programmes to enable them to communicate and share data.
ARPU: Average revenue per user (sometimes known as average revenue per unit), a measure used mainly by consumer communications and networking companies, defined as total revenue divided by number of subscribers.
Big Data: High-volume, high-velocity and/or high-variety information assets that require new forms of processing to enable enhanced decision-making, insight discovery and process optimisation.
Binaries: Executable (i.e. it does something) software rather than raw source code files.
Business Intelligence: The ability of an organisation to collect, maintain and organise knowledge, which produces large amounts of information. BI technologies provide historical, current and predictive views of business operations.
CIO: Chief Information Officer (aka IT Manager).
Cloud computing: A Cloud is a series of remote computers that are accessed over a network, as though they were one computer. A key benefit is flexibility: capacity 'spikes' can be accommodated by provisioning new servers; the Cloud can automatically direct more individual computers to serve pages for the site, and more money is paid for the extra usage.
COBOL: Common Business Oriented Language, the first widely-used high-level programming language for business applications.
CORBA: Common Object Request Broker Architecture.
Code drop: To deploy new or updated application code.
COTS: Common-off-the-shelf. Commodity parts (hardware/software) used in a system build, an acceptable approach now as building blocks and for 'firmware' deliverables.
CVS: The Open Source Concurrent Versions System, a software version control system.
Dashboard: A user interface that organises and presents information in a way that is easy to read, aiming to integrate information from multiple components into a unified display.
Data scraping: Data scraping, or ram scraping, is a technique in which a computer program extracts data from human-readable output coming from another program. The distinguishing feature is that the output being scraped was intended for display to an end-user, rather than as input to another program.
Data warehouse: A database used for reporting and data analysis; a central repository of data created by integrating data from multiple disparate sources.
DConE: Distributed Coordination Engine.
DevOps: The fusion of software development and IT operations in order to increase the insertion rate and success of new line-of-business applications.
Disaster recovery, DR: A set of policies and procedures to enable the recovery or continuation of IT infrastructure and systems following a natural or human-induced disaster.
Enterprise software: Software used in an organisation that is an integral part of a computer information system.
ERP: Systems which integrate internal and external management information across an entire company, embracing finance/accounting, manufacturing, sales and service, customer relationship management, etc.
ETL: Short for extract, transform, load: three database functions that pull data out of one database and place it into another. 'Extract' reads the data, 'Transform' makes it legible to the new database, 'Load' puts it into the new database.
Fault-tolerance: Enables a system to continue operating properly in the event of the failure of (or one or more faults within) some of its components.
Feature creep: Feature creep, creeping featurism or 'featuritis' is the ongoing expansion or addition of new features in a product, such as in computer software. Extra features go beyond the basic function of the product and so can result in software bloat and over-complication rather than simple design.
Forking: The development of competing variations of a software product; occurs when projects splinter into different forms, or R&D is organised into competing teams.
FTE: Full time equivalent/staff member/payroll jockey.
Hackathon: An event, typically lasting several days, in which a large number of people meet to engage in collaborative computer programming.
Hadoop: An open-source software framework for storage and large-scale processing of datasets on clusters of commodity hardware. Hadoop is an Apache top-level project being built and used by a global community of contributors and users.
Hybrid Cloud: A combination of private and public Cloud services. The goal is to combine services and data from a variety of Cloud models to create a unified, automated computing environment.
In-memory analytics: An approach to querying data when it resides in a computer's random access memory (RAM), as opposed to being stored on physical disks.
Jay Z: Shawn Corey Carter, aka Jay Z, is an American rapper, businessman, and investor. He is married to Beyoncé.
LAN: Local Area Network.
MPP (massively parallel processing): Processing of a programme by multiple processors working on different parts of the programme, with each processor using its own operating system and memory.
Mash up: An application that uses and combines data presentation or functionality from two or more sources to create new services.


MongoDB: A cross-platform, document-oriented database. Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favour of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster.
Neo4j: An open-source graph database, implemented in Java. Described by the developers as an 'embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables'. Neo4j is the most popular graph database.
NPS: Net promoter score. A management tool that can be used to gauge the loyalty of a firm's customer relationships; it serves as an alternative to traditional customer satisfaction research.
OEM: Original equipment manufacturer.
On-prem/on-premise: Software installed 'locally' (i.e. possibly in the tower under your desk, or the rack in the corner).
One-Copy Equivalence: All servers have exactly the same copy of data.
Open Source: Software provided in source code form under a free licence.
Petabyte: 1 PB = 1,000,000,000,000,000 B = 10^15 bytes = 1,000 terabytes.
Private Cloud: A highly controlled environment not open for public consumption, which sits behind a firewall and has a focus on governance, security and compliance.
Public Cloud: A highly scalable datacentre owned and operated by a third party for use by other companies or individuals. The complexities are masked from the consumer. Public Clouds are viable because they typically manage relatively repetitive or straightforward workloads.
Quality of Service: The performance of a telephony or computer network, particularly the performance seen by the users of the network. To measure it, several related aspects of the network service are considered, such as error rates, bit rate, throughput, transmission delay, availability, jitter, etc.
RASS: Reliable, Available, Serviceable, Secure, the central tenets of enterprise IT.
Read/write: 'Read' is capable of being displayed, whereas something changeable is 'write'. Disks, files and directories are read/write, but (say) operating systems allow you to protect objects with read-only capabilities so other users cannot modify them.
Replication: Sharing information.
Retention rate: The rate at which customers, whose contracts are due for renewal, either renew or extend their contracts.
SaaS: Software as a service, 'on-demand software', where the software is hosted centrally in a distant location and is accessed by users using a web browser.
Scrum: An iterative and incremental agile software development framework for managing product development. It defines 'a flexible, holistic product development strategy where a development team works as a unit to reach a common goal', challenging assumptions of the 'traditional, sequential approach' to product development.
Software reconfigurable: A wireless device that is configurable, with physical results (e.g. a change in operating frequency), by changing software settings, i.e. without physical intervention.
SQL: Structured query language, a way of interrogating databases for business analysis, associated heavily with relational databases.
STORM: A distributed computation framework written predominantly in the Clojure programming language. It uses custom-created 'spouts' and 'bolts' to define information sources and manipulations to allow batch, distributed processing of streaming data.
System administrator/sysadmin: Person responsible for the upkeep, configuration and reliable operation of computer systems, especially multi-user computers. The duties generally include the uptime, performance, resources and security of the computers.
Systems integrator, SI: An organisation that integrates offerings from various software vendors.
Unicode: Computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems.

Source: Stifel


Important Disclosures and Certifications I, George O’Connor, certify that the views expressed in this research report accurately reflect my personal views about the subject securities or issuers; and I, George O’Connor, certify that no part of my compensation was, is, or will be directly or indirectly related to the specific recommendations or views contained in this research report. Our European Policy for Managing Research Conflicts of Interest is available at www.stifel.com.

Rating and Price Target History for: Wandisco (WAND/LN) as of 02-22-2017


Rating Key

B - Buy UR - Under Review H - Hold NR - No Rating S - Sell NA - Not Applicable I - Initiation SU - Rating Suspended D - Discontinued

Created by BlueMatrix

For a price chart with our ratings and any applicable target price changes for WAND.LN go to http://stifel2.bluematrix.com/sellside/Disclosures.action?ticker=WAND.LN

Stifel or an affiliate is a market maker or liquidity provider in the securities of Wandisco. Stifel or an affiliate has received compensation for investment banking services from Wandisco in the past 12 months. Stifel or an affiliate expects to receive or intends to seek compensation for investment banking services from Wandisco in the next 3 months. Wandisco is provided with investment banking services by Stifel or was provided with investment banking services by Stifel or an affiliate within the past 12 months. Wandisco is a client of Stifel or an affiliate or was a client of Stifel or an affiliate within the past 12 months. Stifel or an affiliate is a corporate broker and/or advisor to Wandisco. The equity research analyst(s) responsible for the preparation of this report receive(s) compensation based on various factors, including Stifel’s overall revenue, which includes investment banking revenue.

Our investment rating system is three tiered, defined as follows:
BUY - We expect a total return of greater than 10% over the next 12 months with total return equal to the percentage price change plus dividend yield.
HOLD - We expect a total return between -5% and 10% over the next 12 months with total return equal to the percentage price change plus dividend yield.
SELL - We expect a total return below -5% over the next 12 months with total return equal to the percentage price change plus dividend yield.

Occasionally, we use the ancillary rating of SUSPENDED (SU) to indicate a long-term suspension in rating and/or target price, and/or coverage due to applicable regulations or Stifel policies.
SUSPENDED indicates the analyst is unable to determine a “reasonable basis” for rating/target price or estimates due to lack of publicly available information or the inability to quantify the publicly available information provided by the company and it is unknown when the outlook will be clarified. SUSPENDED may also be used when an analyst has left the firm.

Of the securities we rate, 48% are rated Buy, 42% are rated Hold, 3% are rated Sell and 7% are rated Suspended. Within the last 12 months, Stifel or an affiliate has provided investment banking services for 18%, 7%, 3% and 15% of the companies whose shares are rated Buy, Hold, Sell and Suspended, respectively.

Additional Disclosures


Please visit the Research Page at www.stifel.com for the current research disclosures and respective target price methodology applicable to the companies mentioned in this publication that are within Stifel's coverage universe. For a discussion of risks to target price please see our stand-alone company reports and notes for all stocks. The information contained herein has been prepared from sources believed to be reliable but is not guaranteed by us and is not a complete summary or statement of all available data, nor is it considered an offer to buy or sell any securities referred to herein. Opinions expressed are subject to change without notice and do not take into account the particular investment objectives, financial situation or needs of individual investors. Employees of Stifel, or its affiliates may, at times, release written or oral commentary, technical analysis or trading strategies that differ from the opinions expressed within. Past performance should not and cannot be viewed as an indicator of future performance.

As a multi-disciplined financial services firm, Stifel regularly seeks investment banking assignments and compensation from issuers for services including, but not limited to, acting as an underwriter in an offering or financial advisor in a merger or acquisition, or serving as a placement agent in private transactions. Affiliate Disclosures “Stifel”, includes Stifel Nicolaus & Company (“SNC”), a US broker-dealer registered with the United States Securities and Exchange Commission and the Financial Industry National Regulatory Authority and Stifel Nicolaus Europe Limited (“SNEL”), which is authorized and regulated by the Financial Conduct Authority (“FCA”), (FRN 190412) and is a member of the London Stock Exchange. Registration of non-US Analysts: Any non-US research analyst employed by SNEL contributing to this report is not registered/qualified as a research analyst with FINRA and is not an associated person of the US broker-dealer and therefore may not be subject to FINRA Rule 2241 or NYSE Rule 472 restrictions on communications with a subject company, public appearances, and trading securities held by a research analyst account. Country Specific and Jurisdictional Disclosures United States: Research produced and distributed by SNEL is distributed by SNEL to “Major US Institutional Investors” as defined in Rule 15a-6 under the US Securities Exchange Act of 1934, as amended. SNC may also distribute research prepared by SNEL directly to US clients, including US clients that are not Major US Institutional Investors. In these instances, SNC accepts responsibility for the content. SNEL is a non-US broker-dealer and accordingly, any transaction by a US client in the securities discussed in the document must be effected by SNC. US clients wishing to place an order should contact their SNC representative. Canadian Distribution: Research produced by SNEL is distributed in Canada by SNC in reliance on the international dealer exemption. 
This material is intended for use only by professional or institutional investors. None of the investments or investment services mentioned or described herein is available to other persons or to anyone in Canada who is not a “permitted client” as defined under applicable Canadian securities law.

UK and European Economic Area (EEA): This report is distributed in the EEA by SNEL, which is authorized and regulated in the United Kingdom by the FCA. In these instances, SNEL accepts responsibility for the content. Research produced by SNEL is not intended for use by and should not be made available to non-professional clients. The complete preceding 12-month recommendations history related to recommendation(s) in this research report is available at https://stifel2.bluematrix.com/sellside/MAR.action

Brunei: This document has not been delivered to, registered with or approved by the Brunei Darussalam Registrar of Companies, Registrar of International Business Companies, the Brunei Darussalam Ministry of Finance or the Autoriti Monetari Brunei Darussalam. This document and the information contained within will not be registered with any relevant Brunei Authorities under the relevant securities laws of Brunei Darussalam. The interests in the document have not been and will not be offered, transferred, delivered or sold in or from any part of Brunei Darussalam.
This document and the information contained within is strictly private and confidential and is being distributed to a limited number of accredited investors, expert investors and institutional investors under the Securities Markets Order, 2013 ("Relevant Persons") upon their request and confirmation that they fully understand that neither the document nor the information contained within have been approved or licensed by or registered with the Brunei Darussalam Registrar of Companies, Registrar of International Business Companies, the Brunei Darussalam Ministry of Finance, the Autoriti Monetari Brunei Darussalam or any other relevant governmental agencies within Brunei Darussalam. This document and the information contained within must not be acted on or relied on by persons who are not Relevant Persons. Any investment or investment activity to which this document or the information contained within relates is available only to, and will be engaged in only with, Relevant Persons. In jurisdictions where Stifel is not already licensed or registered to trade securities, transactions will only be effected in accordance with local securities legislation, which will vary from jurisdiction to jurisdiction and may require that a transaction is carried out in accordance with applicable exemptions from registration and licensing requirements. Non-US customers wishing to effect transactions should contact a representative of the Stifel entity in their regional jurisdiction except where governing law permits otherwise. US customers wishing to effect transactions should contact

their US salesperson.

The recommendation contained in this report was produced at 23 February 2017 01:58 EST and disseminated at 23 February 2017 01:58 EST.

The securities discussed in this report may not be available for sale in all jurisdictions and may have adverse tax implications for investors. Clients are advised to speak with their legal or tax advisor prior to making an investment decision.

Additional Information is Available Upon Request

© 2017 Stifel. This report is produced for the use of Stifel customers and may not be reproduced, re-distributed or passed to any other person or published in whole or in part for any purpose without the prior consent of Stifel.

Stifel Nicolaus Europe Ltd. 150 Cheapside, London, EC2V 6ET. Registered in England Number 03719559
