Spotify: A Product Story Episode 5 - Transcript

You’ve just boarded a long haul flight. Your jacket’s in the overhead compartment, hand luggage stashed under the seat in front of you. Seatbelt on. Tray table stored. The plane taxis down the runway -- and you take off.

You start to relax and lower your seat back.

Then, you hear this from the cockpit:

Ladies and gentlemen, this is your captain speaking, with an important message from our crew: we will now change engines, mid-flight. You may experience some turbulence as we lose some speed and even potentially some altitude… but we’re fairly certain it’s the right decision in the long run. Thank you for flying with us and hold on tight!

Changing the engine mid-flight. It sounds crazy -- but this is essentially the position we found ourselves in in 2013.

This is Spotify: A Product Story, and I’m your host, Gustav Söderström. I head up product, engineering, data and design for Spotify.

In this podcast, we’ll pull back the curtain on our biggest product launches. We’ll break down what worked, what didn’t, why Spotify ultimately succeeded, and how you can use the key lessons in product strategy that we’ve learned along the way.

On today’s episode -- When Your Winning Bet Becomes Your Losing Bet -- we’ll explain how Spotify’s proprietary stack went from being our superpower to being our kryptonite, and just how close we came to crashing and burning.

(01:47) From day one, Spotify was built to perform.

Daniel Ek: I think it comes back to your DNA as the founder, and my DNA was I loved technology. And I loved technology even for technology's sake.

That’s Spotify’s CEO, Daniel Ek. When he and his co-founder Martin Lorentzon started Spotify in the mid-2000s, peer-to-peer networking was the coolest tech around -- like the blockchain of its time. Disruptive and a little bit scary but potentially incredibly powerful.

So, even before they figured out exactly what Spotify would do, they made a bet on developing the tech and they never looked back.

Daniel Ek: We knew we wanted to build a company and again the role model at the time was Google, which had, you know, fantastic technology. And we were like, what could our sort of fantastic technology be? And that was the ambition of the company.

Spotify’s fantastic technology was a hybrid client-server and peer-to-peer system -- technology that made it possible for the first time to stream music fast enough that people not only switched over to Spotify from downloading music illegally, they even started paying for music again.

-- If you haven’t already heard them, now’s a good time to go back and listen to episodes 1 and 2 for the full story --

This feat of engineering -- the proprietary technology stack that let you listen to any song in the world in less than 500 ms -- was Spotify’s secret sauce -- the thing that made it succeed where so many others had failed.

Until, one day, it wasn’t.

Well, that’s not entirely true. In reality, it took several painful years for Spotify to fully outgrow its proprietary stack.

A fact that Emil Fredriksson knows better than almost anyone.

Emil Fredriksson: I walked in there on the first day, and it was just an empty, an empty building. And, started talking to the people there that I didn't know yet and assembling the office furniture and kind of like -- OK, so what do we do now? Where do we start?

Emil has been Spotify’s Operations Director since 2008. While the rest of the company (all 5 of them) figured out how Spotify could leverage peer-to-peer technology, Emil oversaw the very first Spotify prototype -- a standard client-server set-up stashed in a closet. At least until it outgrew the apartment Daniel and Martin were using as Spotify HQ at the time.

Emil Fredriksson: An apartment isn't built to have 10 or 20 computers running in the same place, so. You just can't plug more things in, you're going to trip the circuits. I remember once when I think the cleaners in the office had plugged in their vacuum to the wrong power socket and overloaded one of the circuits. So the servers had shut down. And Daniel was calling me and saying, there's all of these people saying that calling and text messaging and sending him Facebook messages or whatever that Spotify isn't working. So I had to jump in a taxi and go to the office. This was on a weekend -- and switched the circuit breakers back on and unplug the vacuum and try and get everything back up and running.

As Spotify added the peer-to-peer technology and local caching that we described in episode 1, the central servers that Emil was responsible for only needed to serve as little as 10% of all the listening happening on Spotify, and the system became much more fault tolerant, as it could always fall back on peer-to-peer if the central servers failed. But even with that, it was hard to keep up with the growth of Spotify, especially as the system still had some central components, such as the playlist system, that easily got overloaded during peak usage.

Emil Fredriksson: So what would happen was on Friday night, the people started to use Spotify more and more because there’s parties or people who were at home listening to music, and at Friday night, it would always fail, often the playlist system. And so what happened was I would get the alert and I would start calling the developers that knew how the playlist system worked and we would spend hours and hours on the Friday night trying to bring it back to a workable state. So looking back, you think like, well, why didn’t we fix it? Could we have done something to make it work better? But that was the reality at Spotify for me and for many of the developers -- it was putting out fires. So even thinking about building new features or strategy long term, that sort of thing, it just didn't have a space in our mind. We were just going from one week to the next.

Emil went from managing one server to an apartment full of servers to traveling all over Europe and North America -- trying to lease space and hire teams on the ground as fast as possible in order to set up enough data centers to keep up with Spotify’s explosive growth.

At the same time, Spotify’s user base was moving off of PCs and onto mobile devices -- so the peer-to-peer network that had made the desktop client so special in the beginning, was actually turning into a liability -- we couldn’t use it on mobile because it would drain people’s batteries, and use up all their data. Which resulted in more and more strain being put on our servers.

Meanwhile, the rest of the internet had started to catch up. The pay off in terms of latency and reliability that came from running our own data centers diminished every day, as cloud computing got better, faster, and cheaper.

Emil Fredriksson: So much time had passed that like that competitive edge is just like eroded for a number of years, like no one really even seemed to care that much about how -- I mean, obviously the product has to be really fast, but we weren't significantly faster than any of our competitors anymore, and that kind of is just table stakes. So that was very important in one of the phases of Spotify. It just wasn't that important. And it's just not technically that difficult to do anymore.

Gustav Söderström: Exactly. What is important changes and controlling parts of the stack may be vital during a period of time, but then it turns from your biggest advantage to one of your biggest costs and you actually want to move on to figure out another part of the stack or of the business model that is still vital, that is underperforming or that has lots of opportunity left. You don't want to stay and spend the same amount of cost on the opportunity that isn't there anymore.

Emil Fredriksson: Yeah, I think that's 100 percent true. And the challenge is that when you spent enough time to get into the nitty gritty details of doing something yourself, like zooming back out and realizing that this isn't the most important thing that I should be doing right now, that's like a personal challenge and not something that comes natural to humans, I believe, that's a fight that you have to take and really be purposeful about. Otherwise, it's so easy to just stay in your little bubble and continue iterating on your little problem that you've learned so much about and become so good at.

(09:13) The reality is, it took us longer than it probably should have to accept that our data centers were becoming such a bottleneck. Partly because we were too attached to the “fantastic technology” that we originally pioneered and partly because -- at some point -- being in a constant state of crisis had become Emil and his team’s comfort zone.

At the same time, we could only add new features as quickly as Emil and his team could establish new data centers and provision new servers, so Emil’s job wasn’t just unsustainable for him, it was actually inadvertently holding back the entire company, through no fault of his own.

Which brings us to today’s first product strategy lesson.

Lesson #1: Don’t get attached to the status quo. When you’re really good at something, you continue doing it -- because you’re really good at it, not because it’s necessarily the right thing to do!

And because we humans accept almost anything if it just happens slowly, over a long period of time, you accept things that -- if someone would’ve told you about them upfront, or they would’ve happened quickly -- you wouldn’t have. Sometimes you need to stop, zoom out and start from first principles again.

Emil and his team were doing a phenomenal job, but from a 10 thousand foot view, it was the wrong job.

Instead of racing to provision just the right amount of computing power -- not too much, which wasted money; not too little, which caused outages -- they should have been moving operations to the cloud, which would scale dynamically to fit Spotify’s needs.

Emil Fredriksson: We're really not a company that's well adjusted to these long term planning for physical infrastructure things, and it's just really far away from the sort of product that we're building. What we ended up with is really, really poor utilization because we needed to have all of this buffer for all of these unforeseen consequences. So we had really poor utilization, we had really long lead times and we had a company that was moving back and forth all over the place. So it's like a bad fit for the company.

Of course, at some level Emil had known this ever since the days of hooking up 10 to 20 servers in that first apartment and hoping that nobody tripped a circuit. So very, very early on he actually tried moving the Hadoop cluster for our recommendations system to the cloud.

Emil Fredriksson: I remember thinking that okay this is the future, this is where we have to go, and it's going to make our life so much easier. And it failed miserably, like it was immensely expensive to do the calculations, they were so slow that we couldn't use it at all, and we just had to roll back and build our own large on -- in data center analytics cluster to do that. And I mean, part of that probably was our -- the product was really immature and we didn't really understand how to use it. I'm not sure the use case matched completely and so on and so forth. But that was the first experience. And I kind of OK, this isn't mature enough and we're too big, so we have to continue building on our own. And I think we continue to build on our own for -- I mean -- years. And I still had this feeling of like Google and Amazon are -- they're better than us at this. Like we can do this now. But why are we doing this when there's someone else that has a thousand people working to do pretty much the same thing? So we kind of knew that this is a matter of time before this is going to happen. But it just wasn't the time then. And then we were building and building and building. And, you know, it's really difficult when you're fighting just to keep ahead of the growth of the company. The top of mind isn't like, OK, now we're going to do this huge migration, basically duplicating this entire system in two and running them in parallel just in order to get to the cloud. That was just difficult to imagine carving out the priority for the company at large and for the team that I was running at the time to start a project like that.

Gustav Söderström: So this is what I think many people miss with migrations like this. It sounds logical because you're switching from your own premise to the cloud. And actually, ideally, the cloud would require even less people. But that's not how it works. Like for two years, you're going to have to have twice the people because you need to keep the old system alive until the last service from the old system has migrated completely. So during two years, you're going to have like you're running two companies at the same time, right? It’s not like you can switch them over. And people don't really get how expensive and hard that is.

Emil Fredriksson: And it's not only people like. Sure, yeah. There’s twice the -- I mean, maybe it's not twice, but it's -- you running them in parallel, it's twice the amount of work. But it's also at least twice the amount of cost because you're keeping all of the things that you already have. And it's at least twice the amount of complexity, because now all of the product developers that are sitting there, they have to relate to two different systems that have -- behave in two different ways. So it’s really like -- if you look at it from like a time and cost and complexity, there's this huge hill that you have to climb. And that's also kind of a mental hill that you have to climb; OK, we know that we have to do it, but do we really want to start it? It's so painful, so we're just going to wait another quarter and see what happens. So it was Daniel that came and said, OK, we have to do this, it has to be a priority and that's what kick started the whole cloud migration when we did it for real the second time.

Gustav Söderström: I think this says something about the value of having the right people at the top and having maybe founders at the top or at least technically skilled people at the top. If you look at it from the outside, what you're saying is with this system this big, that was growing this fast. While it was actually not -- I mean, it wasn't in the same state when it's transitioning. By the end of the transition, Spotify would, if it were successful, would be vastly larger. So you'd run into new scaling problems as we were trying to shift. Right. So if there was ever a good reason for the analogy of switching engines mid-flight, I think this was it. But also the airplane was speeding up. It wasn’t like flying at some fixed altitude.

Emil Fredriksson: Yeah, you were like adding in more engines to the plane while we were running. I remember like ordering millions and millions of dollars of servers as we were like doing the migration because we had to have this overlap. And I mean, it doesn't feel great doing that, knowing that these are going to be in use for a short amount of time. But that's just the way -- that's what you had to do.

And so -- whether we felt ready or not -- the decision was made. After 6 years of running our hybrid client-server solution, it was time to migrate our entire operation -- over a hundred million users and counting -- to the cloud.

It would be the biggest infrastructure project in Spotify’s history, and our users would never even know. Unless, of course, everything went catastrophically wrong.

Tyson Singer: I’m Tyson Singer, I head up Platform at Spotify, which is a team that focuses on ensuring we have all the infrastructure set up for Spotify’s services to succeed as well as really focus on the productivity of all our employees.

By the time Tyson was brought in to oversee the migration in 2016, we knew 3 things:

1. It was going to be a painful, lengthy process, no matter how we went about it.

2. It was going to be enormously expensive -- basically double what infrastructure had cost us so far, and...

3. It was going to force us to put all other development on hold until we made the switch.

And that was just about all we knew.

Tyson Singer: So this was one of the things that did shock me a little bit, which was we didn't really have any decision-making frameworks or guidelines for the buy versus build decision at all.

(17:05) As a young, quickly growing company that had always prided itself on its engineering chops, the default mindset at Spotify was that -- if we needed a new tool, the best option was to build it ourselves. And for a long time, that was generally true.

Tyson Singer: We had amazing engineers and they could build things that were better than what was available. But then we kind of rested on our laurels and the rest of the industry sort of caught up and surpassed us with open source models and things that were supported by cloud vendors and whatnot. And then we started to sort of fall behind and we still had this mindset that we needed to build everything. And so teams would basically say, well, I've got this one, you know, interesting requirement that doesn't fit to anything that exist before. And there wasn't any sort of thinking, OK, what's the long term operating cost of them building this out? And what's the cost benefit ratio of having that super special feature that we would be able to get by building things out?

This brings us to product strategy lesson #2: the question isn’t if you can do it better -- it is if you should do it better.

As a company, you have limited resources, so your biggest cost isn’t actually going to be the direct cost of the thing you would buy. The biggest cost is actually the opportunity cost of what you would have been able to do with all those smart people, if they didn’t have to build this thing instead -- the thing that no other company could build and that you couldn’t buy anywhere -- the thing that actually differentiates you.

This is initially counterintuitive to many people, but if you think about it -- building everything yourself, instead of buying it, is basically the same thing as saying that your engineering team is the cheapest team around and that they don’t really have anything important to do. Is that really how you think about your team? Aren’t you selling yourself a bit short? In reality, your general rule should probably be to not build anything that someone else could build.

It’s the sign of a seasoned engineering manager or product developer -- and a more mature company -- to know when to take a step back and ask: how will building this particular tool move the needle on our core strategy?

Tyson saw right off the bat that -- unlike, say, a Facebook-sized company -- it just didn’t make sense for us to allocate all those resources towards hosting our own cloud when someone else could do it faster, better, and more cheaply.

Tyson Singer: Maintaining and managing your own data centers does suck up a lot of people. So being able to sort of shift that future headcount growth back into business-facing things instead of infrastructure, seemed also quite intuitive and made a lot of sense to me.

Our most precious engineering resource -- our people -- would be better invested elsewhere. To free them up, we decided to partner with an external cloud provider, just like Emil had tried early on. But this time, the cloud was everywhere, and much more advanced.

Outsourcing our cloud infrastructure meant that a lot of people’s jobs -- including Emil’s -- were about to drastically change. Hopefully for the better. But still, Daniel faced a lot of push back.

Daniel Ek: Especially the dev ops, which were very powerful constituents at Spotify at the time, were actually quite against the cloud all together. Part of that was the kind of a fear of losing their job and then the other part of that was that they were hired because they, they wanted to build a cool tech stack and it took quite some persuasion to try to clear people from saying -- look, you’re not going to lose your job in all of this. I spent a lot of time talking about that and saying -- we’re not going to cancel our dev ops, we’re gonna move you to more higher valued things. You should not be provisioning what new servers come in, you should be actually doing dev work instead to build a better environment for our engineers.

Yes, their current job was going away -- but as Daniel explained, no one was going to be let go because of this.

People who were very passionate about working on infrastructure exclusively may choose to move on, but for the people who wanted to stay, the migration would actually give them the chance to work on an engineer’s favorite thing -- a whole new set of hard problems.

Once the dev ops team was on board, the next step was choosing which cloud hosting provider to work with.

By 2016, Amazon Web Services was the industry leader, with Google Cloud Platform as the newer entrant -- the challenger.

Amazon was -- in many ways -- the obvious choice. But that doesn’t mean it was the right choice.

Tyson Singer: Not picking the industry leader was an interesting decision.

Picking Google over Amazon meant that we could work hand-in-hand with the very same company that inspired our own “fantastic technology” in the first place.

Tyson Singer: That came with some risk, which is this sort of new entrant didn't have the maturity of a lot of the enterprise level capabilities. But one of the things that working with Google gave us the advantage is they were working higher up in the stack. They were working on managed services. And so if you go back to our original hypothesis, which is we are trying to mitigate the opportunity cost of having to invest more and more in infrastructure, then that becomes a pretty good move. And then if you can pick a cloud provider where you can influence the direction of those managed services so that it fits your business needs more substantially, then it becomes a much better proposition.

Gustav Söderström: It seems like we deliberately chose to bet on someone who was a challenger, who was inclined to be more aggressive in helping us figure out solutions, just more hungry, because we would be one of the biggest customers for them instead of just yet another customer. There was also a really strong fit, honestly, in engineering culture, which if you speak to our engineers, they say it was very important. It was just this intangible trust. And we actually got access to not just the BD person, but the actual engineers, which enabled the solution to a lot of these problems that did turn up. Right. And if you think about it in retrospect, it seems quite reasonable that someone like Google, they would start building for what was the largest scale, which was SMBs; small and medium businesses, not for like, the unicorns of Spotify, that were super big scale. Not quite at Facebook level, but certainly not SMB level. So in retrospect, it's pretty obvious that we would come with very uncommon requirements to them and that maybe in retrospect, this picking the challenger was even more important than we realized because these uncommon requests would have been fallen completely flat, maybe on a bigger provider. What’s your sense?

Tyson Singer: Yeah, maybe. Although Google came into the cloud from a different angle than Amazon did. I think Amazon was really focused on the SMB market. Google had already a world scale solution and solved those sorts of problems. And so when they took some of their products to market, actually didn't fit with some of our sort of smaller scale needs. So I think Bigtable was rolled out. It's a Google product and it doesn't work very well at sort of low RPS, low request per rate. The variability in the latency is way too high. But we wanted a NoSQL, like a standardized, NoSQL type solution that fit with the Bigtable model. We did not want to have to have more and more variance in that. So that was one of the things that we needed or some of these just more simple things.

Gustav Söderström: How important do you think the kind of engineering culture, and engineer to engineer relationship was for this to work?

Tyson Singer: I think it was super important, because if you are going to sort of be the big fish in the little pond and influence things, you need to have the relationships set up at all levels. And so it can't just be the VPs and the CTO’s and all that talking to each other, because they are too removed from a lot of the details. And so being able to scale that out to every sort of engineer and our infrastructure team was really important to be able to drive the success.

(26:14) And that is lesson #3: there’s a benefit to picking the challenger. Actually, there are lots of benefits.

Because Google Cloud was just starting out, we immediately became one of their “marquee clients”.

That in turn meant that we could influence the roadmap to a different extent than with an already established provider, and as a result we could still get some of the benefits of having “built it ourselves”.

And by supporting the underdog, we were taking steps to prevent getting squeezed down the road by a monopoly.

All of this taken together gave us more leverage and helped us form a closer partnership, and therefore play a more active role in shaping their product -- a new position for us, the scrappy DIYers.

Tyson Singer: There wasn't really the experience of understanding because, you know, Spotify had been a small company that as a big company with a large amount of purchase power, that we could influence the roadmap and the feature set of other companies. And that really played out in the context of our cloud provider because we came in and scaled up very rapidly and our spend scaled up really rapidly. And so all of a sudden we had sort of this tremendous leverage and influence over what was then a pretty, you know, small scale cloud service.

But working with the newer entrant came with some surprisingly mundane issues as well.

Urs Hölzle: Ironically, one of the biggest problems we had was actually not one that either of us had really anticipated. And that was actually billing. Like you know -- can we send you a correct bill?

Nowadays Urs Hölzle is SVP of Engineering at Google, but back when he started in 1999:

Urs Hölzle: It was a very small company at the time, I actually interviewed in the garage before we moved to the first office. We had our self printed business cards and my first business card, I picked the title Search Engine Mechanic because everything was broken, basically this was still sort of university code, which sort of worked, but didn't scale and wasn't that stable. And so at the time, Google kind of went down many days around noon because it was just kind of falling over under load. And so from the beginning, really, my area was infrastructure performance, scalability, you know, make us survive the next week. Because at the time every week we had something like 10 percent more traffic and then that really -- if you barely survived the previous week, that becomes the focus of like, how can you survive 10 percent more? And this was all before cloud. So it's not like you can just spin up 10 percent more VMs and then survive with scaling.

I’m sure Emil will be relieved to hear that Google had the same setup and the same constant outages in their garage that we had in the apartment in the early days!

But, Google, of course came to the opposite conclusion.

Urs Hölzle: Then we started doing sort of what's known today as cloud, you know, the Google File System, distributed file system, Bigtable, MapReduce, these kinds of things. And a few years later, we kind of were outgrowing rented data centers and had to build our own and focused a lot on that. Then came a lot of networking because networking didn't really scale. And then really the external cloud started to become something because we sort of realized that, you know, the first 10 years we've done everything custom for ourselves because we have to, like you couldn't buy anything that was anything like a cloud. So you had to kind of build your own. But we realized that actually that for many companies, this was still a huge problem, like something that actually cost them enormous amounts of money and ended up not being such a great result and therefore that there would be a potential to attach a public product to that.

Google was always going to end up running their own cloud -- because they’re just that big -- which is why turning that service into a public product to try to get more scale than just your own infrastructure was a great move for them.

I asked Urs to join us on the show because I wanted to know what his perspective is on the relationship between Spotify and Google in those early days.

Urs Hölzle: I think that's actually also the thing that we were looking for -- because we were and actually still today are looking when we have certain customer -- or let's say certain industry, certain customer types -- we're really looking for one or two sort of lead customers where you can co-develop a product with or actually a platform with. Right? We felt that was actually a very good match, you know, sort of an engineering culture and that for the cloud native. Like, you know, Amazon's early lead customer was Netflix -- not available anymore. And so to us, Spotify looked like a Netflix in the sense that you had really the ambition and the talent to drive our products into the right direction and actually be ahead of other customers so that by the time other customers hit limits, you know, they weren't limits anymore because you had hit them first and together we fixed them. And you can only do that if two things are present. Right? One is that the problems actually do emerge earlier there. Right. So it is a lead customer in the sense of stressing your systems. But then also two that you have a high quality, high bandwidth interaction with them, meaning not just their talent base, but also the way they are willing to operate is such that you can have honest conversations and talk about the good and the bad and really make progress, sort of as much as if you were working for the same company.

In the beginning, it was a bumpy ride. Because we were figuring out the cloud together, at the same time. But as we both grew that shared experimental mindset became an advantage.

It is interesting to realize how both of us really needed each other -- for our separate strategic reasons -- Spotify as a new cloud business with unique requirements, and Google as a new cloud provider wanting to offer a unique service. They wanted a client that could simultaneously help guide the product development and be their “guinea pig” so when other big clients came along later, Google would have already encountered and troubleshot those problems. Which meant that while there were some frustrating moments -- take billing, for instance -- that was actually by design and needed to happen with a close and trusted partner.

Six years after deciding to make the switch -- we shut down our very last server; a Hadoop cluster outside of London. And we marked the occasion in a very Spotify fashion.

Emil Fredriksson: So there's like an urn with pieces of the last Hadoop server somewhere.

And Emil could finally get some sleep.


Emil Fredriksson: And I mean, I think for a lot of us, it was like a relief because we had spent so much time trying to fix and patch and maintain and keep it running for so many years and kind of like it's a weight off our shoulders, not having to worry about that because, you know, it's like you might be woken up in the middle of the night and fix something that's really detailed that you maybe hadn't thought about for a long while. And now Ok -- your mind is a bit cleaner. There's one less thing that you have to worry about and you can focus on the others. So for me, it's a big relief and I'm -- like for me, I think it's one of the projects that I'm most proud about in infrastructure at Spotify that, like, we managed to make that switch, that where you really have to have a lot of conscious -- consciousness and perseverance to go through with it.

By moving to the cloud, we had finally gotten past the era of Emil furiously provisioning hardware in order to keep up with our ambitions -- now, the only limit was our imagination.

Which can be both a good and a bad thing, as we learned.

Nicole Bouchard is a product manager at Spotify. By the time she joined in 2017, most of the back end systems were up and running in the cloud -- but we were just beginning the next phase: the Google Cloud data migration.

Nicole Bouchard: And so Spotify at the time had been really struggling with we've just moved on to GCP. BigQuery was very popular, but had not had really any idea like how we could use it, how we could set it up, how to integrate it well with the other data pipelining tools that we had.

It was Nicole’s job to figure out how to work with Google to properly harness all that data -- not get burned by it.

Nicole Bouchard: At times people would do things that they didn't realize were super expensive. So I remember one day we were looking at our data and we had a single query that cost something like thirty thousand dollars and just nobody realized it was running. It was just totally accidental. Somebody just like accidentally put like a, you know, select all star sort of thing. And it was like, oops, you've just copied every single bit of data in our database. Like, that's very expensive.


(35:59) Which brings us to our fourth and final strategy lesson, as popularized by the late great Stan Lee. Lesson #4: with great power comes great responsibility.

And what we’ve learned is that the better people understand their power, the better choices they make with it.

Nicole Bouchard: And a big bunch of our work with Google over, you know, the next couple of years were really reinforcing this idea that we needed better data both for ourselves as a management team, but also to be able to share with our users to help them make better choices. One of the biggest challenges for us is that even as a sort of like team that was doing all of this administration, we had no visibility into a lot of these details. And so we literally couldn't even visualize what was going on ourselves because the data wasn't available. And that was actually one of the biggest challenges throughout most of this period, was people like, oh, great, like, can you show me when I'm doing something wrong? And we’re like, actually, no, we can't. All we know is when things go wrong, things blow up and we won't know until things blow up. So you just have to be very -- try to be really conservative with what you're doing, because as soon as something goes wrong, it gets really hard to recover from. And we will have almost no cycles before we hit that point.

So, Nicole leaned even further into that partnership Tyson and Urs had established. Every two weeks the product and technical teams got together with Google’s engineers to discuss the trade-offs and tactical issues that we were facing. It started to feel less like we were two separate companies and more like we were one team, working towards the same goal of getting Spotifiers the data they needed without breaking the bank.

Nicole Bouchard: We didn't really expect the scale of change in the amount of processing we could do with data. We expected a little bit of growth, but it was actually exponential. And it was not just people are processing more data, but they're doing more things at the same time, doing more in parallel. They're asking more complicated questions. And that unlocked a huge amount more understanding of our users and just opened up even further questions, things that we'd never been able to ask or answer previously. And I think that was really intriguing to watch that evolution.

Gustav Söderström: A big part of the bet on Google was actually to bet on data being one of the most important things to be able to work with or manipulate data effectively. But it seems like we still underestimated the actual impact that it would have to unlock that type of productivity, is that fair?

Nicole Bouchard: Yeah, and I think it went hand in hand with the growth of both users on Spotify, so not only is the number of people generating data that we want to query higher, but we also have more and more people within the company doing that kind of investigation and asking different kinds of questions as we grow the business and open up new product lines or, you know, matured parts of the organization all of a sudden we are investing a lot more in understanding stuff that we just didn't have the bandwidth for before. So that added all of these extra layers of complexity. But one of the things that came with all that scale of data was all of a sudden the problem of finding the thing you needed to work with became so much harder. When it was a constrained pool, it was like, OK, well, you know, you go to the three or four data sets that everybody knows are reliable. And all of a sudden it was like, we've exploded this and there's a massive new problem. How do you make sure all these people who are asking new questions and trying to do new things are working with valuable data, the right data at the right time in the right ways.

Gustav Söderström: So you ran into this new problem, which was you unconstrained people and maybe the naive approach from management point of view was like -- I remember we spoke a lot about how long something took to do, like a data job, a batch job or something. So first you would have to wait for the capacity to free up and then would take some amount of time. I remember we measured cycle times quite a lot. And we saw these cycle times just contract from, you know, weeks, sometimes for a single job to, you know, days and then hours and closing in on instantaneous. Then the naive approach would be like, hey, we saved money because now you can do so much faster.
But what happens is people just ask more questions instead, right? So the best case stays the same. It's very hard to actually make that go down. But you can get so much more productivity for the same amount of capacity then. Right. So this was one of the big changes. But this also produced massively more data. So how did you think about solving that problem?

Nicole Bouchard: That became, I think, the focus of the DI org and to some extent still is today, you know, with some of the ongoing efforts we have right now is how do we leverage this massive amount of data we have and operate at the scale that we have it. And I think that's gone through a couple of different evolutions. I think one of the first things we were really noticing was we just have to, like, fundamentally make it findable. So, you know, how do we add a layer on top of it? Just that, you know, very naively you can do a search effectively for the data you're looking for. Then once that was solved, it was OK, well, it's not just enough to find a data set. I need to find a high quality data set to get that information. What does quality mean? How do we make sure that the stuff that is being generated is high quality so it is discoverable in the first place? Then there's the question of how do we make it so that these things are joinable and that they work together? Well, how do we start thinking about creating data sets that we think are representing some really key aspects of our business rather than everyone doing whatever they want? You know, really trying to think about that sort of next layer like these are the golden data sets, the ones that we're counting on and relying on as being key to the business.

Gustav Söderström: So this whole approach of let people kind of use and do whatever they want instead of constraining the problem by saying these are the things you can work with. So the solutions we had to solve, how unique do you think they are to Spotify? And maybe how much do you think they are the right solutions, the others just don't do it well yet?

Nicole Bouchard: My impression talking to peers in the industry through the various conferences or other situations is that Spotify's approach to democratising, not just data access, but data creation is an order of magnitude higher than many of our peers, just a completely different level like, you know, other companies will do this and sort of democratize it, but we take it to an extreme, both for better and for worse, you know, like we get the trade offs that come with that. But it's not always the good trade offs. But it was something that's, I think, you know, really fundamental to Spotify culture, this idea of if you need something, you can feel free to create it for yourself.

Gustav Söderström: So the risk is that it produces much more complexity for everyone else --

Nicole Bouchard: Correct.

Gustav Söderström: -- when they're trying to find which data set to use.
So you needed to put in more work in another place instead, of trying to explain and visualize what were trustable datasets, high quality, you know, maybe low quality, golden datasets and so forth.

Nicole Bouchard: And you had to do that for people who weren't necessarily data experts, so you couldn't rely on them having a strong background or knowledge or necessarily even knowing how to optimize, being aware of the need for optimization. So there's a lot more outreach and education involved in solving these problems than if you can be a little bit more hierarchical and you have a, you know, clear set of people who are accountable for this, all of a sudden your potential audience or the potential people you need to reach is any person at the company. The spectrum of what you get is so much broader than for companies that really have a more central data organization, who's very much in charge of creating data sets. You know, we don't have that. It's like anybody -- anybody creates data here at Spotify.

With Google’s help, Nicole and her team solved the problem of “too much productivity” -- that is, the number of queries and new data sets scaling up very, very quickly -- by giving people more information, not less. They could have created artificial constraints on who could create and access what kind of data and how often, but that has never been Spotify’s culture. Instead Nicole’s team visualized the problem, and created ready-made “golden data sets” -- specifically designed to help steer people in the right direction without actually limiting their access to information or ability to be productive. And we’ve seen this time and time again at Spotify -- the best solutions come from groups of people who have the tools, time and freedom they need to do their best work. And we’ll dive into that in even more detail in a later episode.

(45:00) So, what have we learned from Spotify’s journey to the cloud? Let’s recap!

Lesson #1: Don’t get attached to the status quo! When you’re really good at something, it’s easy to continue doing it, because you’re really good at it, not because it’s the right thing to do!

Lesson #2: The right question isn’t if you can do it better but if you should do it better. It’s tempting to make everything yourself just because you can, but that’s a strategic mistake in the long run.

Lesson #3: There’s a benefit to picking “the challenger.” Leverage your relative power, and support the competition to keep yourself from getting squeezed by a monopoly down the line.

Lesson #4: With great power comes great responsibility. But even more importantly: the better people understand their power, the better choices they make with it.

That’s it for this episode!


If you haven’t listened to our earlier episodes -- go back and check them out. We have interviews with Sean Parker of Napster, world-renowned investor and analyst Mary Meeker -- and the Spotifiers past and present who have shaped our strategy along the way.

Next week, we’ll go all the way to the bottom of the stack in our quest to create the perfect listening experience.

Thomas Cullen: We were sitting and meeting with you folks really early on about this subject. And I said, why don't you just use Bluetooth? And somebody on your team looked at me and goes because Bluetooth sucks.

That’s all coming up in the next episode.

Spotify: A Product Story is produced by Munck Studios for Spotify.

Special thanks to Allison Gilles, Spotify’s Director of Engineering and Data Infrastructure, and Niklas Gustavsson, our VP of Engineering, for all their help with the background research for this episode, and to Hans Zimmer for letting us use my all-time favorite song -- “You’re So Cool” -- in the episode.

Veronica Harth is our in-house Spotify correspondent.

We’re edited by Frances Harlow and mixed by Joakim Löfgren, Viktor Bergdahl and Andrea Fantuzzi.

Our theme music was composed by Andrea Fantuzzi.

I’m Gustav Söderström. Thanks for listening.
