RANKBRAIN AND THE ERA OF ARTIFICIAL INTELLIGENCE

www.greenlightdigital.com Search Algorithms

ADAM BUNN, DIRECTOR OF SEO AND CONTENT & ENGAGEMENT

Last October’s RankBrain story was one of the splashiest of 2015. When I studied the impact RankBrain seemed to be having on the SERPs and the information available about it, including two different patents that seemed closely related to what RankBrain is supposed to do, I was strongly reminded of a previous update from Google called Hummingbird. You may remember it as 2013’s major algorithm update, which I wrote about in that year’s SEO Briefing.

Hummingbird marked the first time Google had ostensibly named its entire algorithm – but if you dig into what it was all about, you’ll see it was predominantly a change to the way Google processed search queries, with a few supporting tweaks to existing algorithms as well as changes to the way sites were indexed. All the existing components, such as PageRank, Panda, Penguin and so on – the ones which actually determine rankings – were still running. There was a lot of panic and excitement about Hummingbird caused by the apparent change to the entire algorithm (even though the actual ranking algorithms barely changed at all), which was further spurred by the reports from Google that it affected 90% of queries (it was a change to the way they processed queries, so naturally it affected nearly all of them!).

In my estimation, the big change with Hummingbird was the introduction of a series of synonym databases and “substitution engines” that allowed Google to do a better job of returning relevant results for ambiguous natural language queries. A good example from one of the Hummingbird patents was handling queries such as “pizza places in London”. This would have been a difficult query for pre-Hummingbird Google to deal with because “places” is an ambiguous term that humans understand only because of the context that the search term sits within. But, post-Hummingbird, “places” would have been substituted with “restaurants”, resulting in more relevant results being returned. Effectively, Hummingbird allows Google to understand ambiguous parts of queries by their context.

But, here’s the thing: it does this with a sort of “brute force” approach that relies on trying out various known potential substitutions of any ambiguous word in context with the words either side of it, then the words one word away either side, then the words two words away, until an adequate substitution is found. Finally, the revised query is run alongside, or instead of, the original query. Ultimately, it isn’t a real “understanding” of the terms.

All of the context around Hummingbird’s query processing engine is important if you want to understand the RankBrain we know today, as it seems to be an evolution of Hummingbird’s bid to comprehend search terms. The big difference is that instead of revising some of the queries with the brute force approach described above, it revises them by guessing the intent of a query using a machine learning algorithm, which can learn from previous data about what searchers meant when making ambiguous queries, and apply those learnings to understand new ambiguous queries.

Compared to Hummingbird’s query processing, this makes RankBrain particularly adept at understanding queries that Google has never seen before. According to Google, RankBrain was rolled out gradually at the beginning of 2015, and helps with “a large fraction” of the queries it receives (15% was initially reported, but this later proved to be an unfounded assumption).

I won’t pretend to be an expert on machine learning but, from what we know, Google is using a branch of the science called “deep learning” that tries to create AI-like behaviour based on large data sets, and has probably been applying it to its products for a while. Google has had a dedicated deep learning project since 2011, and its capabilities in this field took big strides forward in 2013 and 2014 with the acquisition of DNNResearch Inc. and DeepMind Technologies respectively, with both companies focussed on deep learning.

A final point is that when RankBrain came out, Google publicly stated that it was “the third most important ranking factor”. Based on what I’ve put forward so far, you’d be right to conclude that it’s certainly not a “ranking factor” that you’d need to consider in the traditional sense, so I don’t quite know how to reconcile that statement. In my view it was almost certainly hyperbole designed to attract coverage of the company’s advances in artificial intelligence, or a twisted version of the truth. If I had to guess, I’d say that the “large fraction” of queries it affects means that it technically impacts the most top tier queries of any Google algorithm, having the third greatest reach in terms of ranking factors. Measured like this, algorithms we consider to be important ranking factors, such as Panda or Penguin, look insignificant in comparison as they usually only impact 2-3% of queries each time they’re rolled out. This kind of chicanery would provide a technically sound basis for Google’s claim, even though actual rankings are not impacted too much by RankBrain.
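To make the “brute force” substitution idea described earlier a little more concrete, here is a minimal Python sketch. It is an illustration only, not Google’s implementation: the SUBSTITUTIONS table, the function names and the max_distance cut-off are all hypothetical, and a real system would draw on far larger synonym databases. The sketch checks the words immediately either side of an ambiguous word, then the words one further out, and so on, until a known substitution is found.

```python
from typing import List, Optional

# Hypothetical substitution table (an assumption for illustration):
# ambiguous word -> {context word that triggers it -> replacement}.
SUBSTITUTIONS = {
    "places": {"pizza": "restaurants", "parking": "spaces"},
}


def find_substitution(tokens: List[str], index: int, max_distance: int = 3) -> Optional[str]:
    """Return a replacement for tokens[index], or None if no context word matches."""
    candidates = SUBSTITUTIONS.get(tokens[index])
    if candidates is None:
        return None  # the word is not in the table, so it is not treated as ambiguous
    # Widen the window one step at a time: adjacent words first, then further out.
    for distance in range(1, max_distance + 1):
        for pos in (index - distance, index + distance):
            if 0 <= pos < len(tokens) and tokens[pos] in candidates:
                return candidates[tokens[pos]]
    return None


def revise_query(query: str, max_distance: int = 3) -> str:
    """Return the query with any ambiguous words swapped for a known substitution."""
    tokens = query.lower().split()
    revised = list(tokens)
    for i in range(len(tokens)):
        replacement = find_substitution(tokens, i, max_distance)
        if replacement is not None:
            revised[i] = replacement
    return " ".join(revised)


if __name__ == "__main__":
    print(revise_query("pizza places in London"))
```

Under these assumptions, the revised query could then be run alongside, or instead of, the original. Note that nothing here is learned: if a context word is missing from the table, the ambiguous word is simply left as it is, which is the limitation a learned approach is meant to address.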

The Quality Update or “Phantom”

On 5th May 2015, a noticeable algorithm update took place. The update was initially named “Phantom” by the industry because there was no word from Google on its nature, or even that an update had happened. Our testing and analysis showed significant changes in the ranking impact of key user signals such as bounce rate. Later that month, Google confirmed it had changed the way site quality was being processed in its core algorithm, but provided no further detail on the changes.

NEWS WAVE

A comparatively minor update that seemingly benefitted sites that regularly updated their content – namely newspapers and magazines – was reported in June 2015. It corresponded with Google regaining access to the “Twitter Firehose” – a raw feed of tweets directly from Twitter – lending credence to the theory that Google was analysing trending topics and responding with fresh content, leading to a boost for news websites.

PANDA

The industry headed into 2015 believing that Panda had been incorporated as a permanent “switched on” part of the algorithm. This marked a change from the traditional behaviour of a Panda update, which would typically be pushed out monthly. By April, Google had stated this wasn’t the case, and July saw the start of the first Panda update of the year – almost 10 months after the previous confirmed update. Google eventually stated it would roll out the update over the course of several months.

In August 2015, many sites saw gains they’d made in July reversed, leading some to speculate that Google had wound back the update.

Given the huge reduction in frequency, the long roll outs and the general silence on the subject of Panda, it certainly seems we’re in a post-Panda era – at least as far as Google’s public acknowledgement of the algorithm goes. Google may have found a way to score quality without Panda or, more likely considering it was always their intention, has finally and quietly incorporated part or all of it into the main algorithm.

PENGUIN

Meanwhile, Penguin didn’t seem to run at all last year; there was no public announcement of an update and no notable shifts in rankings that pointed towards Penguin as the cause. We were all braced for a big update in October 2015 when Google stated the new Penguin would be coming “soon”, but they eventually postponed it to 2016 (and, of course, it still hasn’t happened as I write this). The last confirmed Penguin update was in December 2014.

The Varnish Works, 3 Bravingtons Walk, King’s Cross, London, N1 9AJ +44 (0)20 7253 7000