How We Use Open Source at Salesforce.com – Part 1

By Ian Varley | Published: July 18, 2014 (Series complete July 29, 2014)

While many may not realize the importance of open source at salesforce.com, it plays a critical role here. These days, developers at salesforce.com spend all day working in open-source software, but this wasn’t always the case.

In the early years of the company, the standard-issue workstation ran Windows. But a few developers noticed that building the Salesforce application went much faster on Linux, using the same exact hardware — cutting the build time from hours to minutes. They started passing around a CD, and before anyone knew what had happened, the change went viral and nearly everyone in engineering had switched to Linux. (Thankfully, our awesome corporate IT team soon followed suit and started issuing new machines with a supported version of Linux for new hires.)

On top of the Linux OS, the entire development stack also runs on open source. The Salesforce platform is primarily written in Java and runs on Sun’s JDK (which is open source). Most developers at salesforce.com write, compile, debug, and test the application in Eclipse, an open-source programming editor (maintained by the aptly named Eclipse Foundation). Many projects use Git as their source version control, although the central repository is Perforce because of its size.

Like most companies, we use many of the “usual suspect” Java open-source libraries during development: Guava, Gin, and Guice (open sourced by Google); libraries from the Apache Software Foundation; Jackson (a killer JSON parsing library written and maintained by Salesforce engineer Tatu Saloranta); and hundreds of others. Our extensive automated test suites make heavy use of the test frameworks JUnit, Mockito, JMockit, and Selenium.

Building the software is a big task for a complex platform like Salesforce. The platform was originally built using an open-source build framework. But as the complexity of the system has grown, the core build team has been making the transition to Maven (with the help of Jason van Zyl, the author of Maven). Maven’s declarative dependency management gives teams a much faster, more modular build.

From developer.salesforce.com/blogs/engineering/2014/07/open-source-at-salesforce-com.html 11 April 2015

Once code is committed, a large-scale system builds and packages the software artifacts for deployment using Jenkins, a simple continuous integration and scheduling framework. Our release package store is written in Java and uses a sequence of Jenkins instances to build, package, test, and promote the resulting compiled artifacts. Internal testing (that is, running hundreds of thousands of tests on every check-in) is done by a fleet of OpenStack instances.

Once the application is built and packaged, it needs to be deployed into production. In addition to three major releases per year, salesforce.com engineering also does hundreds of maintenance releases and patches containing performance improvements, capacity adds, configuration updates, and the like. Historically, this has been done using a homegrown tool that allows deployments under the extremely rigorous security restrictions that protect Salesforce production instances. However, the sheer scale of Salesforce’s deployment infrastructure has prompted recent advances in this area, increasing both automation and security. The new systems use a suite of open-source software tools, including Razor (hardware provisioning), Puppet (install automation), an orchestration tool, and Rundeck (operation).

And when the software is running in production, salesforce.com engineers monitor their services using an open-source metrics aggregation platform called Graphite. Data flows into Graphite using Kafka, a high-throughput distributed messaging system written by LinkedIn.

The Importance of Open Source at Salesforce.com – Controlling Your Own Destiny

An important benefit of open source is that it lets you control your own destiny. Adopting open-source software gives our engineers the ability to solve problems directly.

An example of this is the software that routes web requests to our servers (the “servlet container”). Salesforce started out using a well-established commercial product, which worked well for many years. However, a new project in 2011 ran into a snag when the servlet container lacked a key feature, and the vendor was unwilling to implement it. Instead, the team decided to switch to an open-source alternative called Jetty. This was a big project, of course. It’s hard to switch a core component in a big, complex application, especially one that’s been in active use for 10 years! But the project was a success, and Salesforce now runs Jetty everywhere.

Search Indexing at Salesforce.com

Another great example of open-source software usage at Salesforce is search indexing (the process of taking text, like Account details and Chatter posts, and making it accessible to fast user searches).

The original implementation of search at Salesforce used a popular open-source search indexer called Lucene. The search development team at the time decided to “fork” Lucene (that is, make a local version that diverges from the community-maintained version).

However, as everyone knows, the scale of Salesforce has increased manyfold over the years, to the tune of 1.5 billion transactions every day. Along the way, the search team discovered a challenge in the architecture of their implementation of Lucene. The team needed a way to scale this capability “out” rather than “up,” using a large number of smaller machines to process the same requests.

The solution they found was Solr. Solr is a horizontally scalable system with a loosely coupled REST interface. This architecture allowed the team to move the query and index processes to the same host, and cut out the requirement to use a SAN. This move also netted the team a spate of new features. (Solr still uses the latest version of Lucene for the core library.) And the team has contributed to and sponsored fixes to Solr, including a somewhat uncommon one: the ability to support indexers with over 10,000 cores!
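To make “loosely coupled REST interface” a bit more concrete, here is a minimal sketch of how a client might assemble a search request against Solr’s standard select handler. The host, the core name (`accounts`), and the helper class are hypothetical and chosen only for illustration; this is not the code the Salesforce search team uses. The sketch only builds the URL, so it runs without a Solr server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of Solr or Salesforce code.
class SolrQueryUrlSketch {
    // Builds a URL for Solr's select handler: /solr/<core>/select?q=...&rows=...&wt=json
    static String selectUrl(String baseUrl, String core, String query, int rows) {
        try {
            return baseUrl + "/solr/" + core + "/select"
                    + "?q=" + URLEncoder.encode(query, "UTF-8") // escape ':' and spaces
                    + "&rows=" + rows                           // maximum number of hits to return
                    + "&wt=json";                               // ask for a JSON response
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java runtime, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed local Solr instance and core name.
        System.out.println(selectUrl("http://localhost:8983", "accounts", "name:Acme Corp", 10));
    }
}
```

Because the interface is just HTTP, any client that can issue a GET request can query the index, which is part of what makes it straightforward to spread requests across many smaller machines.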

As you can see, we use a LOT of open-source. It gives us the flexibility needed to solve for customer and platform demands such as performance and scale. In Part 2, we’ll detail more of the ways that salesforce.com contributes to existing open source projects.

Open Source at Salesforce.com – Part 2: How We Contribute

By Ian Varley | Published: July 23, 2014

One of the great things about open source is that it lets companies with large engineering teams, like Salesforce, use the specialized expertise of their engineers for a much broader impact. Note: If you missed our open source intro, it’s a great place to start!

In Part 1 of this series, we looked at a few of the open source tools, frameworks, and products that salesforce.com engineers use to support the Salesforce service. Now we’ll look at how salesforce.com engineers contribute improvements to those same programs back to the community.

Open Source at Salesforce.com: Contributing to Apache Qpid

At salesforce.com, we deliver 3 major releases a year and dozens of patches. We need the ability to resolve customer issues quickly.

As an example, take our message queue (MQ) layer. Message queues are a way of shifting the execution of code to a later time; like a line at the bank, each request “queues up” and waits for its turn to be run. In the early days, many Salesforce developers wrote their own implementations of this common and useful pattern, but these were eventually centralized into a single consolidated queue system running on a closed-source commercial product.
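The “line at the bank” idea can be sketched in a few lines of Java. This is a deliberately tiny, single-process illustration with hypothetical names (`DeferredWorkQueue`, `enqueue`, `drain`); it is not the Salesforce MQ layer, nor the API of any commercial or open-source broker, which add persistence, networking, and failure handling on top of the same basic pattern.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration of deferred execution; not a real broker API.
class DeferredWorkQueue {
    // FIFO queue: work runs in the order it was enqueued.
    private final Queue<Runnable> pending = new ArrayDeque<>();

    // Called on the request path: record the work and return immediately.
    void enqueue(Runnable task) {
        pending.add(task);
    }

    // Called later by a worker: run everything queued so far, oldest first.
    // Returns the number of tasks that were executed.
    int drain() {
        int executed = 0;
        Runnable task;
        while ((task = pending.poll()) != null) {
            task.run();
            executed++;
        }
        return executed;
    }

    int size() {
        return pending.size();
    }

    public static void main(String[] args) {
        DeferredWorkQueue queue = new DeferredWorkQueue();
        List<String> log = new ArrayList<>();
        queue.enqueue(() -> log.add("send welcome email"));
        queue.enqueue(() -> log.add("recalculate report"));
        System.out.println("queued: " + queue.size() + ", done so far: " + log.size());
        queue.drain();
        System.out.println("completed in order: " + log);
    }
}
```

The point of the design is that `enqueue` is cheap and returns right away, while the expensive work happens whenever a worker gets around to calling `drain`.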

This worked well enough…until a bug appeared. In some situations, the task would fail with an opaque error message. The engineering team worked with the vendor for many release cycles, trying one patch after another to no avail. Had it been an open source program, the team would have made fixing this terrible bug their top priority and squashed it. But as it was, they were at the mercy of the vendor.

To solve this problem, as well as improve scalability, the engineering team searched for a replacement and eventually settled on Apache Qpid. The team once again ran into mysterious bugs… but because it was an open source project, they were able to look at the source code, debug the issues, fix them, and contribute them back — including a client-side fix that resulted in a 40 percent reduction in memory usage! And, not only does salesforce.com benefit, but everyone who uses Apache Qpid gets this improvement.

Test All The Things

Selenium is a browser-based automation tool. When you want to write integration tests against the user interface of a product (for example, the rendered web page), it’s helpful to have a tool that lets you craft those tests in a cross-platform, cross-browser way. Salesforce makes heavy use of Selenium in running our massive suite of functional and integration tests against every code check-in. Salesforce.com engineer Luke Inman-Semerau has been a committer on the Selenium project for the last 3 years, and has been a key contributor on documentation, Python, and Java.

Salesforce.com is a mobile-first platform, and that’s just as evident in our testing. We’ve been heavily involved in 3 different mobile drivers for Selenium. Luke is a committer on both the Selendroid project (Selenium for Android) and the ios-driver project (Selenium for iOS), along with Salesforce engineer Roman Salvador. Both Selendroid and ios-driver were created at eBay, and salesforce.com was an early adopter that helped incubate and evolve them to their current state.

We’ve also adapted Selenium to work with other mobile devices. Salesforce.com engineer Jim Evans has produced a new OS library, Windows Phone Driver, that allows you to use the same Selenium web driver API to automate web applications running on Windows Phone 8.1.

Greg Wester, Sagar Wanaselja, and David Louvton all presented on how Salesforce uses Selenium earlier this year.

Batch Data: Hadoop

The core of Salesforce’s business, of course, is data. Nearly every operation on Salesforce uses data in one form or another: viewing Accounts, executing Apex and Visualforce, generating reports, etc. Optimizing our use of data is the largest part of almost every engineer’s job at salesforce.com. And, no surprise, open source is a key part of this too.

Processing large batches of data can be a big resource draw. Doing this against a standard database is challenging because it requires you to first extract and transform the data, then load it somewhere else where you’ll do the heavy lifting. In 2004, Google’s MapReduce paradigm took the batch-processing world by storm and was quickly given a vibrant open source life in the form of Hadoop. Hadoop sends the computations to the data instead of the other way around.
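For readers who haven’t met MapReduce before, the shape of the paradigm can be sketched in plain Java as a word count, the traditional “hello world” of batch processing. This is a single-process illustration with hypothetical names; it does not use the Hadoop API, and it leaves out exactly the part Hadoop is valuable for, namely running the map and reduce steps on the machines where the data already lives.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process sketch of the map -> shuffle -> reduce flow; not the Hadoop API.
class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every word in one input record.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (sorted by key here).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values for one key into a single result.
    static int reduce(List<Integer> counts) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        return total;
    }

    // Wires the three phases together for a small in-memory input.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            totals.put(group.getKey(), reduce(group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("to be or not to be", "to do is to be")));
    }
}
```

Because each call to `map` looks at only one record and each call to `reduce` looks at only one key, both phases can be run in parallel on many machines, which is what lets the computation travel to the data.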

At salesforce.com, CTO Walter Macklem started a project in 2010 to introduce Hadoop at Salesforce, under the codename “Gridforce.” This made use of the Hadoop Distributed File System (HDFS) and now also uses Pig, which is a high-level language on top of MapReduce programs. Prashant Kommireddi, one of the team leads for the Gridforce team, is a committer on Pig, contributing regularly to its development.

Today, batch processing with Hadoop is used extensively in back-end processing, such as improving search relevance and discovering recommendations for items to follow in Chatter. It’s also part of a new program that allows access to pre-processed log files (code name “ELF”).

Getting Committed with HBase

Relational databases are extraordinary and powerful pieces of software, and Salesforce relies on relational databases. This gives a wide range of capabilities and a consistent basis for storing all customer data. However, it comes with some inherent limitations. Because of the depth of relational capabilities in the product (including triggers, views, indexes, and wide-ranging atomic transactions), there comes a point of diminishing returns: the amount of engineering effort required to incrementally improve performance becomes prohibitive to team scale.

So we asked ourselves: What if we could store vast numbers of records but with fewer assumptions and capabilities? What if we could scale our data storage hardware out horizontally but with the same data safety guarantees?

The answer to this question is Apache HBase. HBase is a horizontally scalable “NoSQL” database based largely on the design of Google’s Bigtable system. You may be familiar with HBase because it’s the same technology that runs the massive infrastructure behind Facebook Messages. It’s a fault-tolerant, consistent row store that scales linearly by adding commodity hardware machines. This means that some things that are easy for a relational database (like transactions and indexes) are comparatively much harder. (Though, as you’ll see in Part 3, we’re closing that gap via Phoenix, a SQL library on HBase that was open-sourced by Salesforce!)
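To give a feel for what a “row store” with fewer capabilities looks like, here is a toy, in-memory sketch of the data model: rows kept sorted by row key, each row holding named columns, with reads done either by key or by scanning a key range. The class and method names are hypothetical, and this is not the HBase client API; a real cluster splits the sorted key space into ranges served by different machines and adds durability on top.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a sorted row store; hypothetical names, not the HBase client API.
class SortedRowStoreSketch {
    // Row key -> (column name -> value), with rows kept sorted by row key.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Write one cell; the row is created on first write.
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Point read of one cell; returns null if the row or column is absent.
    String get(String rowKey, String column) {
        NavigableMap<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    // Range scan over sorted row keys: startRow inclusive, stopRow exclusive.
    NavigableMap<String, NavigableMap<String, String>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        SortedRowStoreSketch store = new SortedRowStoreSketch();
        // Made-up row keys that group related records under a shared prefix.
        store.put("audit#001", "field", "Account.Name");
        store.put("audit#002", "field", "Account.Owner");
        store.put("event#001", "type", "login");
        System.out.println(store.scan("audit#", "audit$").keySet());
    }
}
```

Note what is missing: no secondary indexes, no joins, no multi-row transactions. Everything hangs off the sorted row key, which is why key design matters so much and why a SQL layer on top is attractive.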

What will run on HBase? Initial features are targeting audit and compliance use cases, such as audit history, event tracking, and archival storage of older records. Eventually, though, we’re aiming to offer high-safety, low-cost large storage for “big objects” that use familiar APIs but don’t give you all the features of classic Salesforce objects.

HBase was the first major open source software project that Salesforce got deeply involved with at a community level. Lars Hofhansl, an architect at Salesforce for over 10 years, became an HBase committer in 2012, and he has since gone on to be the 0.94 series release manager and a member of the HBase PMC. Jesse Yates, another HBase committer, is also on the Salesforce HBase engineering team and has coauthored many critical HBase features, including Snapshots.

Both Lars and Jesse spoke at HBaseCon 2014 (as did Salesforce engineers Eli Levine and James Taylor). Along the way, the focus of Salesforce’s engineering efforts on HBase has been directed at bringing it up to the same level of world-class resiliency and data safety that we demand for our enterprise customers. (This is my team, so I could go on for hours about it. But I’ll leave that for a future post. However, you can check out Lars’, Jesse’s, and James’ presentations on our Slideshare channel.)

Speaking of Open Source Committers…

As you can see, salesforce.com isn’t just an open source consumer; we are also a contributor. But, beyond that, we’re also an open source “pusher” in that we actively support a large roster of engineers who work on open-source projects part-time or full-time. This includes project leaders like Tom Lane (PostgreSQL), Matz (Ruby), Jason van Zyl (Maven), Damien Katz (CouchDB), as well as all the other folks listed above.

In Part 3, we’ll talk about how salesforce.com has been busy creating and releasing new open-source projects.

Salesforce.com’s New Open Source Releases – Part 3

By Ian Varley | Published: July 29, 2014

In Part 1 and Part 2, we talked about open-source projects that started outside of Salesforce. But does Salesforce create and contribute new open source releases as well? Historically, yes, although typically not core platform features… But this is changing. With over 4 million custom apps written on our Platform, and 1.5 million developers in our community, open sourcing core platform technologies is critical for continued success. In the last year, Salesforce has open sourced two large libraries, Aura and Apache Phoenix.

Apache Phoenix: We Put the SQL Back in NoSQL

In Part 2 of this series, we talked about how Salesforce makes use of HBase, a NoSQL database for high scalability. For developers who are used to relational databases, one of the issues with NoSQL is that…well, there’s no SQL! Learning how to write optimized data access code with the HBase client API is not a task for the faint of heart.

Enter Apache Phoenix. Originally authored by James Taylor, an architect at salesforce.com, Apache Phoenix provides the ability to use HBase via high-performance, low-latency SQL statements, making it similar to any other relational database API. The project’s slogan is, “We put the SQL back in NoSQL!” You may recall, we profiled Apache Phoenix in a blog post back in May.

Since its open-sourcing in 2012, Phoenix has spiked in popularity and contribution. It was accepted as an Apache Incubator project in late 2013, and it became a top-level Apache project in early 2014. It was also included in mature distributions by Hortonworks (a company built on open source!).

Contributors from dozens of companies have provided patches and contributions, and the committer list has a broad representation from around the world. Recent contributions include the addition of hash join, a new CSV loader, Pig integration, and the inclusion of SQL Arrays.

As a top-level Apache project, Phoenix is perhaps the most well known of Salesforce’s open-source contributions in recent years. But it’s not the only one.

Aura: The Backbone of Salesforce1

Aura is the component-based UI framework that was used to create Salesforce1. It’s an event-driven, client-server UI architecture; developers can use it to build a single UI that will work seamlessly across desktops and devices, and interact with a server-side process (like the Salesforce service). It’s very much in the spirit of modern JavaScript frameworks, intended to elevate developers from the nitty-gritty of DOMs and JavaScript, allowing them to work faster and avoid browser compatibility pitfalls.

Aura (currently in closed pilot, and entering Beta during the next major release) will provide Salesforce customers with access to the exact same technology for authoring their own components. Developers can create components that integrate directly into Salesforce1. This provides better access to features and optimizations than were available previously.

Aura is open sourced on GitHub, and to date the project has over 7,000 commits. Why is it released as open source? As an open project, access to the source code gives transparency to customers. As its use increases, customers can contribute new features back to the project, leading to better quality for all.

(Interested in getting your hands dirty with Aura? Salesforce product manager Skip Sauls created a simple example that spins up Aura for some great HTML 5 demos.)

Force.com IDE

Many Salesforce developers did a dance of joy on the day Salesforce announced it had open-sourced the Eclipse Force.com IDE Plug-in. If you’re not familiar with it, the Force.com IDE is an Eclipse plug-in that makes it easier to work with Salesforce assets (code, schema, etc.) in Eclipse. Now that it is open source, regular users can contribute bug fixes and new features. Within a month of the open-source announcement, a couple dozen ideas had been posted, and one contributor had already begun creating a new feature for the plug-in.

And More…

Other projects have also been open-sourced in recent years. The Mobile SDKs for iOS and Android are both open (and, incidentally, make heavy use of another open library to blend web and native components). We also open-sourced Kylie, which is a JavaScript tool for measuring page rendering and script execution time.

There’s open source ON the Salesforce platform, as well as underneath it. In addition to the standard forcedotcom git repo and apex searches, you can find lots of open, shared Apex code in the force.com Code Share. And there are a ton of open source repositories available at the developerforce page, created by the salesforce.com developer relations team: mobile packs, wearables, and more.

There’s a thriving open source community around the Salesforce platform, partly seeded by Salesforce examples and SDKs, as well as by salesforce.com customers developing Apex components. More and more, being a Salesforce developer means working in open source projects, big and small, like this great example of a simple heartbeat monitor app from Product Manager and professional “Salesforce hacker” Adam Torman; the restforce Ruby gem written by community member Eric Holmes (incidentally, the original open source Ruby gem for Salesforce was written way back in 2005!); and nforce for Node.js developers, written by another community member, Kevin O’Hara.

New Open Source Releases: Conclusion

Building software with open source in mind is inherently beneficial because it drives you to do plain old “good design.” If you’re going to share something with the world, it has to be loosely coupled to the rest of your systems, and it has to be cohesive and comprehensible — it has to “do one thing well.” This, and the lessons of win-win collaboration, has been a huge boon to engineers at salesforce.com. Embracing the open-source model has been a sea change in the last few years.

Look for more involvement and contributions in the near future!

Big thanks to: Regina Burkebile, Reid Carlberg, Adam Torman, Doug Chasman, Eric Perret, Helen Kwong, James Taylor, Jeff Bergan, Jesse Yates, Jon Bruce, Lars Hofhansl, Phil Waligora, Rachel Mei, Santosh Rau, Scott Hansma, Vijay Devadhar, and Walter Macklem.