
New Signals to Search Engines: Future Proofing Your Search Marketing Strategy
by Mike Grehan

Incisive Interactive Marketing LLC. 55 Broad St, 22nd Floor, New York, NY 10004

1. There Will Be Change.

2. Can The Future Of Search Really Be Based On A 63-Year-Old Idea?

3. Crawling Through the Chaos.

4. Links, Clicks and Cliques.

5. The Ten Blue Links Must Die: From SEO to Digital Asset Optimization (DAO).

6. Collective Intelligence.

7. Connected Marketing: Generation Y, The Always-On Mobile Generation.

8. Blah, Blah, Blah, Very Nice Mike. But So What?


There Will Be Change.

I have been involved in media and marketing my entire career: ten years in broadcast (radio and TV) and the rest in marketing, both Agency- and Client-side.

I wrote my first guide to optimization (or positioning, as it was known then) in 2000. I find it remarkable that the advice I gave then on how to make a web site crawler-friendly and how to get quality links for better indexing and ranking remains pretty much the same.

Google is failing in its mission to make the world's information universally accessible by crawling the web. This isn't because there's anything wrong with Google; instead, it's because crawling was a perfectly acceptable way to find and index web pages back in the nineties, when only millions existed. Today, Google itself is aware of a trillion URLs, but it stands no chance of crawling and indexing them in a timely fashion. What's more, those trillion URLs don't even represent the entire web – it's just what Google has discovered so far.

There has to be a better way.

In the second edition of my book, I studied information retrieval (IR) techniques and how they applied to web search. I dove into network theory and became absorbed with how things (such as web pages) are connected and how communities are created.

Now, as I burn the midnight oil scribbling away at my next book I realize just how powerful social network analysis is for the future of marketing. The web is no longer just a collection of HTML pages linked together. It is a network of networks of people who are totally connected to each other. I believe it's the understanding of these connections that will shape not just search, but all of marketing. The new book will build upon the thoughts, concepts and ideas outlined in this paper.

I have been accused, at times, of taking too theoretical an approach to Search Engine Optimization. However, as a practitioner I have helped many Clients achieve major victories simply by better understanding the science of information retrieval on the web. IR is interdisciplinary, with roots in computer science, mathematics, library science, information science, information architecture, cognitive psychology, linguistics, statistics and physics. All of this underpins the freshness and relevance of search engine results.

Network theory in particular has long been applied to linkage data on the web. Its influence is still significant today, as the web evolves and increasing numbers of always-connected target groups emerge. Case in point: Algorithms such as HITS and Google's PageRank are already being applied and tested in social networking sites. New signals are emanating from the end user at search engines, such as query chains, click-through data and user trails. Therefore, tapping into the wisdom of crowds will surely be a vital component in the future of search.

I am not a math specialist. As with my previous books and papers, my goal is to translate the underlying complexities and mathematics of search into a simpler, non-math language, assessing the relevance of particular models as they apply to online marketing.

I believe the next major advances in search must be in the area of learning machines and artificial intelligence. Search engines started with the bare basics of computer technology, crawling, indexing and ranking HTML pages based on an end user's query. Significant improvements were then realized with Google's distributed computing across a grid and implementation of network theory in its PageRank algorithm. Now it's time for smarter programs and learning machines and even new protocols.

"A 5-inch-long rectangle with a long list of text results beneath it doesn't do much to help people make sense of the billions upon billions of unorganized bits of data in the world."
Susan Dumais, Principal Researcher, Adaptive Systems & Interaction Group, Microsoft Research.

Try not to think about this document as a white paper; instead, use it as a jumping-off point for discussion. This document is based on a vision. It answers questions, but also poses new ones. This kind of dialogue is healthy and valuable as we try to think of creative ways to improve our Clients' visibility at search engines and their overall marketing objectives.

Search marketing has been based around a twenty-year-old protocol that was never intended to be a commercial medium. For sure, the web technology giants are now looking at newer signals from end users. But now, maybe, they should even be considering new Internet platforms and protocols. Let's explore.


Can The Future Of Search Really Be Based On A 63-Year-Old Idea?

Mendel's concept of the laws of genetics was lost to the world for a generation because his publication did not reach the few who were capable of grasping and extending it; and this sort of catastrophe is undoubtedly being repeated all about us, as truly significant attainments become lost in the mass of the inconsequential.

The above passage comes directly from “As We May Think,” the seminal essay by American engineer and inventor Vannevar Bush. One of America's most successful scientists leading up to the Second World War, Bush is also known as “the man behind the atomic bomb.”

By the end of World War II, Bush questioned what scientists could do collectively as a follow-up to their joint work “[making] strange destructive gadgets.” In his essay, he argued that as humans turned from war, scientific efforts should shift from increasing physical abilities to making all collected human knowledge more accessible.

His idea sounds familiar, doesn’t it?

Google's mission is to organize the world's information and make it universally accessible and useful.

Bush made many predictions about future technology, including the fax machine, personal computers, the Internet, World Wide Web and speech recognition – all very startling as he was light years ahead of his time. But it was a system he called "memex" that probably makes him the earliest hypertext thinker.

Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and to coin one at random, "memex" will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.


As you read the entire paper, you realize just how thought-provoking and inspirational it is. Bush didn't actually believe that machines would ever be able to emulate human memory, but he was convinced that memex could augment the brain by suggesting and recording useful associations.

The human mind... operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain. It has other characteristics, of course; trails that are not frequently followed are prone to fade, items are not fully permanent, memory is transitory.

"Search as we know it today, is that loud two-stroke engine. But soon it will be the foundation for much more efficient and productive services as it becomes reinvented in a multitude of ways."
Andrei Broder, Yahoo! Research Fellow and Vice President for Computational Advertising.

Although Bush was a totally out-of-the-box thinker for his time, he didn't get quite as far as the potential of a new digital medium. But his innovative thinking and vision sparked a new breed of hypertext thinkers. One of these thinkers is Ted Nelson, an American sociologist, philosopher and pioneer of information technology. He coined the term "hypertext" in 1963 and published it in 1965. His work has mainly been based around making computers easily accessible to ordinary people. As he says:

"A user interface should be so simple that a beginner in an emergency can understand it within ten seconds."

Nelson founded Project Xanadu in 1960 with the goal of creating a computer network featuring a simple user interface. He claims some aspects of his vision are already being fulfilled by the invention of the World Wide Web, but as a whole, he dislikes the web with all its XML and embedded mark-up. "HTML is precisely what we were trying to prevent!" he once famously said.

Hypertext, and the World Wide Web as we know it, has its roots back in 1980. But of course, something else had to happen before anyone could even conceive the idea of a World Wide Web: somebody had to invent the Internet.

Yes, even though the words “web” and “Internet” are used interchangeably, they're two entirely different things. You're probably thinking, “well everyone knows that," but you'd be surprised.


It's not necessary to discuss how or why the Internet came about. But it is important to understand that the World Wide Web is more or less an application that is delivered via the Internet.

To be clear, the Internet is a global system of interconnected computer networks that interchange data by packet switching using the standardized Internet Protocol Suite (TCP/IP). It is a "network of networks."

It carries various information resources and services – such as email, file sharing and online gaming, among other things – as well as the inter-linked hypertext documents and other resources known as the World Wide Web.

And who’s the "father of the Internet"? That’d be American computer scientist Vint Cerf. You can find tons of stuff about him on the World Wide Web. And so, for the purpose of moving forward in this paper, the Internet has now been invented.

Let's now skip forward to 1980 and introduce, for a change, an English computer scientist, Tim Berners Lee. Now Sir Tim Berners Lee (co-ranked first in the list of “100 Greatest Living Geniuses” by The Daily Telegraph) expanded upon the ideas of earlier hypertext thinkers and proposed a project based on the sharing and updating of information among researchers.

Berners Lee was also a fellow with CERN (European Organization for Nuclear Research). By 1989 he became aware that CERN was the largest Internet node in Europe and saw an opportunity to join hypertext with the Internet.

"I just had to take the hypertext idea and connect it to the Transmission Control Protocol and domain name system ideas and — ta-da! — the World Wide Web."

Yes, that's how simply he describes an invention that has profoundly impacted the entire planet.

The first Web site was built at CERN on 6 August 1991. It provided an explanation about what the World Wide Web was, how one could own a browser and how to set up a Web server. It was also the world's first Web directory, as Berners Lee maintained a list of other Web sites apart from his own. And that, perhaps, also makes him the founder of web search.

But you know what? The web and what people expect from it has come a very long way since then.

So, is the future of search really still based on a 63-year-old idea?


Crawling Through the Chaos.

There is a prevailing assumption that Google has access to all the content on the entire web. This notion is not correct.

To many end users, Google is the web. Yet, mighty as it is, Google can only return results from the fraction of the web it has managed to crawl.

Crawler, spider, bot – they're all interchangeable terms for what search engines use to find, download and index web pages. That's what they were invented for: to download HTML pages.

However, they're almost blind to non-text content. The vast amount of user-generated content – video, images, audio and other file types – is not so easily handled or indexed. Nevertheless, we keep plugging in all kinds of alien-to-the-browser apps (Flash, for instance), trying to turn browsers into something they were never meant to be. As a result, we've made it harder for search engine crawlers to find, classify and index our stuff.

The search engine crawler is the "textbook SEO" favorite, but it faces many limitations. There are strong freshness requirements and multiple timescales. Trying to discover the relevance of existing pages in the index while dealing with the rapid arrival rate of new web content isn't an easy task, either.

The overhead (average number of fetches required to discover one new page) needs to be kept to a minimum. Bandwidth is also an issue: It wouldn't be practical to attempt to download the entire web every day (and probably not even possible). Some sites are so large that they simply can't be crawled from beginning to end, even in the space of a week.

In fact, no crawler is ever likely to crawl the entire reachable web. An almost infinite number of URLs, spider traps, spam and all kinds of other issues prevent it.

Moreover, there will always be a trade-off between re-crawling existing pages and crawling a new page. After all, in a connected world where breaking news is of global concern, search engines must be able to provide such information in near real-time to avoid end-user dissonance.
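To make that trade-off concrete, here is a rough sketch (in Python, with invented names and weights – not any engine's actual scheduler) of a crawl frontier that scores each URL by expected gain: known pages earn priority from how often they have changed and how stale they are, while never-crawled URLs get a fixed discovery bonus.

```python
import heapq
import time

NOVELTY_BONUS = 5.0  # assumed weight favouring never-crawled URLs

class CrawlFrontier:
    """Toy priority queue trading off re-crawl freshness vs. new-page discovery."""

    def __init__(self):
        self._heap = []        # (negative priority, url)
        self._history = {}     # url -> {"last_crawl": timestamp, "change_rate": estimate}

    def _priority(self, url):
        info = self._history.get(url)
        if info is None:
            return NOVELTY_BONUS                      # never crawled: discovery bonus
        staleness = time.time() - info["last_crawl"]  # seconds since last fetch
        # pages that change often and haven't been fetched recently score highest
        return info["change_rate"] * staleness / 3600.0

    def add(self, url):
        heapq.heappush(self._heap, (-self._priority(url), url))

    def next_url(self):
        return heapq.heappop(self._heap)[1] if self._heap else None

    def record_fetch(self, url, changed):
        info = self._history.setdefault(url, {"last_crawl": 0.0, "change_rate": 0.5})
        # crude running estimate of how often the page changes between visits
        info["change_rate"] = 0.8 * info["change_rate"] + 0.2 * (1.0 if changed else 0.0)
        info["last_crawl"] = time.time()

# usage sketch: freshly fetched pages fall to the back until they go stale again
frontier = CrawlFrontier()
for u in ["http://example.com/news", "http://example.com/about"]:
    frontier.add(u)
url = frontier.next_url()
frontier.record_fetch(url, changed=True)
frontier.add(url)
```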

New pages are primarily discovered when created and linked to from existing indexed pages. Of course, this is also where the "Filthy Linking Rich" dilemma I wrote about some years ago comes into play: web sites with a large number of links continually attract more links than those with only a few. As a result, more of their content is indexed and linked to, perhaps giving them an edge when it comes to ranking.

There's also the case of search engines knowing about certain web pages, but not having yet crawled them. After all, billions of links are extracted from billions of pages by Google, meaning there must be some order and priority as to which get crawled first. Recently, Google announced that their link-processing systems had hit the trillion URL mark (but knowing URLs exist doesn't necessarily mean they will get crawled). Plus, Google estimates that the number of links is growing by several billion each day.

In addition, the invisible Web still exists. Millions of pages are locked in databases or behind password-protected areas that crawlers are blocked from. So while search engine crawlers are certainly much smarter now than in the web's early days, they may never be able to provide a complete and timely discovery of web content.

"As content becomes more diverse, more complex, bigger and more fragmented... Getting it through HTTP and HTML may not be the right model anymore."
Andrew Tomkins, Vice President, Search Research, Yahoo!

So what real use will the crawler serve in the future? Maybe it'll be a backfill for other methods of information retrieval on the Internet?

The introduction of Google's Universal Search supports this assumption and proves that methods beyond the crawl are required to retrieve relevant information from the emerging structure of the web.

But can this method of data capture – which has its roots based on a technology that can be traced back to 1945 – be anywhere near as efficient as it was in the early days of the web?

User-generated content analysis. Cross-content analysis. Community analysis. Aggregate analysis. All of these must be taken into account to provide the most relevant results and richest end user experience.

The web grew too big for the original human-powered Yahoo! index to scale with. In response, they adopted the crawling/ranking algorithm seen at most search engines now. At the time, it seemed like the obvious way to go. But today, as user-generated content – ranging from social networking to blogs, photo sharing to video sites – grows exponentially, the crawler is slowly being defeated.

Is it time to explore new ways for search engines to bring together the world's information? Absolutely.

How about new protocols for different types of search engine feeds? And what about developing special relationships with publishers of user-generated content?

Change is coming.


Links, Clicks and Cliques.

There's an anecdote I tell at conferences when I'm speaking on linkage and connectivity data. It's the story of how foremost computer scientist Jon Kleinberg discovered the flaw in search engines trying to rank web pages based on the text on the page:

Back in 1997, when AltaVista was the dominant search engine, Kleinberg did a search for "search engine." He was totally surprised to learn that AltaVista didn't appear in its own results.

He then tried an informational query for "Japanese automotive manufacturer." He was even more astonished to observe that manufacturers such as Nissan, Toyota and Honda didn't appear at the top of the results.

Kleinberg then went back to the AltaVista home page and realized the words “search engine” didn't appear anywhere on the page. Similarly, trips to the Nissan, Toyota and Honda home pages turned up no sign of the phrase “Japanese automotive manufacturer.”

Kleinberg's research and work are thoroughly discussed in the fascinating book Six Degrees: The Science of a Connected Age, written by world-renowned physicist Duncan Watts. Watts and Kleinberg collaborated on the new science of a connected age, work that eventually led Kleinberg to develop the algorithm known as HITS, which is based on connectivity data and ranks documents on what are known as hub and authority scores (this occurred around the same time Larry Page and Sergey Brin were developing Google's PageRank algorithm).

In a nutshell, Kleinberg helped improve the quality of web search by applying social network analysis to the ranking mechanism. Instead of judging page quality by the text on the page, focus was shifted to the overall quality of the pages linking to it.
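To picture how that works, here is a minimal sketch of the HITS iteration on a tiny, made-up link graph: a page's authority score is the sum of the hub scores of the pages pointing at it, a page's hub score is the sum of the authority scores of the pages it points to, and both are normalized each round. The graph and page names are purely illustrative.

```python
# Toy HITS (hubs and authorities) iteration on a small, invented link graph.
links = {                       # page -> pages it links to
    "portal": ["nissan", "toyota", "honda"],
    "blog":   ["nissan", "toyota"],
    "nissan": [],
    "toyota": [],
    "honda":  ["nissan"],
}

pages = list(links)
hub = {p: 1.0 for p in pages}
auth = {p: 1.0 for p in pages}

for _ in range(20):             # a handful of iterations converges on this tiny graph
    # authority: endorsed by good hubs
    auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
    norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
    auth = {p: v / norm for p, v in auth.items()}
    # hub: points at good authorities
    hub = {p: sum(auth[q] for q in links[p]) for p in pages}
    norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
    hub = {p: v / norm for p, v in hub.items()}

print(sorted(auth.items(), key=lambda kv: -kv[1]))   # the manufacturer pages emerge as authorities
print(sorted(hub.items(), key=lambda kv: -kv[1]))    # the linking pages emerge as hubs
```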

This is why there has been so much emphasis on link building in the SEO community. But consider this: If a link is a kind of vote from one web page author to another (as Google refers to it), how do people without web pages vote (i.e. the haves vs. the have-nots)?

After all, it's not really democratic to instantly alienate a few hundred million Web users (back then) just because they didn't have links to vote with.

It's interesting because about four years ago I spoke to a well-known information retrieval scientist about human evaluation being folded into the algorithm. We talked about Yahoo's early recognition that it could never scale its human-powered index to match the exponential growth of the web.

I mentioned that there had to be added value knowing that an editor had actually viewed your web pages and indexed them. He responded that Kleinberg’s hubs and authorities algorithm did just that, with the hub sites functioning as editors, picking out authority sites and essentially improving the index in a mutually reinforcing way.

In a similar manner, the wisdom of crowds and the voice of the end user are now sending huge signals to search engines. As bookmarking, tagging and rating gain popularity and scale up, so will their influence on what appears in the search engine results pages (SERPs).

The strongest signals are now coming from the ubiquitous search engine toolbar. Mining the search trails of surfing crowds and looking at end user activity data provides search engines with unique insights that help identify the most relevant websites.

"In terms of signals, the toolbar is the big one. Being able to follow end users' trails to figure out how they connect with the content they're looking for."
Andrew Tomkins, Vice President of Search Research, Yahoo!

In fact, whereas search has traditionally been based around signals from content creators (text, links etc.), it is now very much based around modeling user behavior. Searchers submit queries, reformulate them (query chains), click on results and then navigate away from the search engine.

The search result they click on, though, is not always the destination page. Users generally browse far from the search results (anywhere up to five clicks) and visit a range of domains during their information search.

Search engines have always had access to query and click-through logs for implicit end user feedback for re-ranking documents. But it's post-search browsing behavior which provides valuable information on destinations that are truly relevant to the user’s information goals.
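A hedged sketch of how that post-search browsing data might be mined: given toolbar-style trails (a query followed by the pages visited after leaving the results page), counting where sessions actually end lets the true destination surface even when it wasn't the page that received the first click. The trail format and site names are assumptions made up for the example.

```python
from collections import Counter, defaultdict

# Toy toolbar trails: (query, [pages visited after leaving the results page]).
trails = [
    ("new york b&b", ["list-site.com", "reviews.com", "innfairview.com"]),
    ("new york b&b", ["list-site.com", "innfairview.com"]),
    ("new york b&b", ["otherlist.com", "reviews.com", "innfairview.com"]),
]

first_clicks = defaultdict(Counter)   # query -> pages clicked directly on the results page
destinations = defaultdict(Counter)   # query -> pages where the trail actually ended

for query, pages in trails:
    if not pages:
        continue
    first_clicks[query][pages[0]] += 1
    destinations[query][pages[-1]] += 1

query = "new york b&b"
print(first_clicks[query].most_common(1))   # the result that got the click
print(destinations[query].most_common(1))   # the page users actually ended up on
```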


End users provide huge amounts of information about the results they prefer for a given search by clicking on one result and choosing not to click on others. Search engines can use artificial neural networks to change the ordering of the search results to better reflect what users have actually clicked on in the past.

Why a neural network as opposed to just remembering a query and then counting how many times a result was clicked? The beauty of a neural network is that it can make reasonable guesses about results for queries it has never seen before (based on their similarity to other queries). This is significant as up to 25% of all queries to search engines each day have never been seen before.
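As a toy illustration of that generalization (not any engine's production system), here is a tiny one-hidden-layer network trained on an invented click log. Queries are represented as bags of words, so a query the network has never seen, such as "cheap hotel", still inherits preferences learned from "cheap flights" and "hotel deals". The vocabulary, URLs and training settings are all assumptions.

```python
import numpy as np

# Toy click log: (query, url the user clicked). All data invented for illustration.
clicks = [
    ("cheap flights", "flightsite.com"),
    ("cheap flights", "flightsite.com"),
    ("hotel deals",   "hotelsite.com"),
    ("hotel deals",   "hotelsite.com"),
    ("cheap deals",   "dealsite.com"),
]

vocab = sorted({w for q, _ in clicks for w in q.split()})
urls = sorted({u for _, u in clicks})

def encode(query):
    """Bag-of-words vector for a query."""
    words = query.split()
    return np.array([1.0 if w in words else 0.0 for w in vocab])

X = np.array([encode(q) for q, _ in clicks])
Y = np.array([[1.0 if u == clicked else 0.0 for u in urls] for _, clicked in clicks])

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(len(vocab), 8))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(8, len(urls)))    # hidden -> output weights

def forward(x):
    h = np.tanh(x @ W1)
    z = h @ W2
    p = np.exp(z - z.max(axis=-1, keepdims=True))  # softmax over candidate URLs
    return h, p / p.sum(axis=-1, keepdims=True)

lr = 0.5
for _ in range(500):                               # plain gradient descent on cross-entropy
    H, P = forward(X)
    dZ = (P - Y) / len(X)
    dW2 = H.T @ dZ
    dH = (dZ @ W2.T) * (1 - H ** 2)
    dW1 = X.T @ dH
    W1 -= lr * dW1
    W2 -= lr * dW2

_, probs = forward(encode("cheap hotel"))          # a query never seen in the log
for u, p in sorted(zip(urls, probs), key=lambda t: -t[1]):
    print(f"{u}: {p:.2f}")
```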

The collective intelligence which has previously been applied to linkage data for ranking documents can also be applied to clicks and search trails. Moving away from the limitation of systems that focus purely on queries and documents is most certainly a major shift in information retrieval online. The relationships between queries and documents and the relationships between documents have been easy for search engines to capture, model and compute. But toolbar data allows search engines to capture relationships between queries, documents and a user's true search context. Of course, search engines have always been able to determine the quality of a page by end user behavior before the toolbar idea. Detecting thousands of clicks on the browser back-button sends a clear enough signal that the page is of low quality.

So search engines now have a very powerful combination of signals. I pondered recently what the results of a purely human-powered search engine would be like these days. And then it occurred to me that, based on learnings from organic listings, that's exactly what Google's AdWords advertising program is centered around: implicit feedback from the end user.

The Ten Blue Links Must Die: From SEO to Digital Asset Optimization (DAO).

I'm not a big fan of the term "Web 2.0." For me, it's all about "always-on broadband." I don't think the web changed that much when we started adopting much faster, always-on connections, but our surfing habits certainly did.

"…when you are trying to remember the steps to the Charleston, a textual web page isn't going to be nearly as helpful as a video. The media of the results matters."
Marissa Mayer, VP, Search Products & User Experience, Google.

Long gone are the days of waiting for a graphic to download, only to then look at a static web page debating whether to read the text or not. End users expect a much richer experience these days. This is why YouTube is now as popular as (if not more popular than) Google itself.


It's a natural progression, then, for search engines to tap into every source of information possible before weaving it all together on the results page. Of course, this also changes the way we approach search marketing. It can no longer be about getting Clients into the top ten blue links on a results page. Instead it’s all about search engine positioning: Where, above the fold, can you make your Clients more visible? What file types or methods do you have to utilize to achieve this?

I've always believed that ranking reports provide little or no value. This is especially true in today's era of Universal Search, where ranking reports provide sub-optimal data. Your search marketing firm might tell you that you're in the top ten for X number of keywords at Google, but does that translate into actual visibility?

For example, at the time of writing, a search for “bed and breakfast new york” brings back this result at Google:

Ignore the paid search results for a moment. Which of the above results is number one for the search? Obviously, it's the first of Google's local listings. The rest of the organic results have been pushed down below the fold. And just look at the rich data that comes with those local results: maps, telephone numbers, links to web sites and customer reviews.

A ranking report for this query might tell you you're in the top ten organic listings, but in actuality, you would have had to be number one or two just to be seen above the fold.


Take a look at this result for the query “iron man” from early 2008.

As you can see, the only way for anyone selling Iron Man merchandise to survive is to switch to a paid search campaign – unless you have the power to use a vertical creep approach and push the official Iron Man site below the fold.

So what does this mean? Essentially, the old strategy of pushing blue links up the charts for visibility may not be as successful as pushing competitors down below the fold.

Universal Search offers many new methods of increasing visibility at search engines. Understanding these methods, and combining them with knowledge of end user behavior at the search engine interface, can be extremely powerful.

After all, studies have shown that end users rarely click through to the second page of results (most don't even bother scrolling). Instead, users act in accordance with what are known as query chains. Following an initial query, most users scan the page above the fold. If they don't see a result that looks promising, they simply reformulate the query. This can happen numerous times. Once a search engine spots these query chains, it can then anticipate the searcher's actual destination page.

For example, someone searching for “rare collections” may then follow it with “special editions” and then finally “limited edition books.” If this is observed many, many times, then it would be safe to assume that the results for the last query would also be good for the first.
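In code, the bookkeeping behind that assumption could look something like the following sketch: record which result each session eventually clicks, keyed by the first query in the chain, and only promote an association once it has been observed enough times. The session format and threshold are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy sessions: the chain of reformulated queries, then the result finally clicked.
sessions = [
    (["rare collections", "special editions", "limited edition books"], "firstfolio.com"),
    (["rare collections", "limited edition books"],                     "firstfolio.com"),
    (["rare collections", "special editions"],                          "collectorsbooks.com"),
]

chain_clicks = defaultdict(Counter)    # initial query -> results eventually clicked

for queries, clicked in sessions:
    chain_clicks[queries[0]][clicked] += 1

MIN_SUPPORT = 2                        # assumed threshold before trusting the association

def promoted_results(initial_query):
    """Results worth promoting for the first query in an observed chain."""
    return [url for url, n in chain_clicks[initial_query].most_common() if n >= MIN_SUPPORT]

print(promoted_results("rare collections"))
```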


The dynamic between end user activity and the way that search engines present these new types of data, then, is the key to developing new search marketing strategies. Plus, when executed correctly, DAO offers a unique opportunity to achieve maximum visibility in a minimum amount of time.

For example, earlier this year I made a short video promoting the Search Engine Strategies Conference in London (I'm honored to be the Chair of the show). I compressed it, uploaded it to Google Video, YouTube, Metacafe and AOL Video. After that, I bookmarked it at various sites, such as Delicious and StumbleUpon.

"Marketers need to embrace universal search by integrating video, images, books, etc. As far as Google trends go for 2009, they will affect SEOs as well."
Matt Cutts, Google Engineer and Head of Web Spam Team.

I then sent out an email telling everyone on my contacts list that it was available at YouTube. After it had been played over 100 times and tagged more than ten times around various social media sites, it turned up as the number one search result for "ses London" (yes, it's what you would call a long tail search). And it happened in hours – not days or weeks!

Even more noteworthy, a search today for “mike grehan ses” still brings two videos to the top of the pile – even after all these months!

Now that's the power of the social web and Universal Search.


Collective Intelligence.

Perhaps the biggest change in search is the shift toward information-seeking on social networking sites. The knowledge possessed by your friends and people you know acts as a supplement to the web's huge amount of other, less verifiable information. This knowledge can provide extremely qualified answers to specific queries through a process defined as information-seeking via a chain of trust.

Online communities are becoming an increasingly important area of research because of the rich signals they send to search engines. Web pages are no longer purely static. Real-time chat takes place constantly on the Web. Tagging and folksonomy data arrangement, along with rating and reputation systems, are ushering search into a whole new era.

Much research is also taking place into combining data from social networks and document reference networks like PageRank to create a dual layer of trust-enhanced (or socially-enhanced) search result ranking. It's a case of an algorithm mind-meld with the wisdom of crowds, if you will.

"There's a lot of expertise, knowledge, and context in users' social graphs, so putting tools in place to make "friend-augmented" search easy could make search more efficient and more relevant."
Marissa Mayer, VP, Search Products & User Experience, Google.
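One way to picture that dual layer is a simple weighted blend – purely illustrative, with made-up scores and weights rather than any engine's formula – in which a conventional link-based score is combined with a trust score derived from how strongly a page is endorsed within the searcher's own social graph.

```python
# Toy blend of a link-based score with a social-trust score. All numbers invented.
link_score = {            # e.g. a PageRank-style score, already computed elsewhere
    "bigbrand.com": 0.90,
    "nichereview.com": 0.40,
    "friendsblog.com": 0.20,
}

# endorsements (bookmarks, shares, ratings) from people in the searcher's network,
# weighted by how close each endorser is to the searcher (1 = direct friend)
social_endorsements = {
    "nichereview.com": [1.0, 0.5, 0.5],
    "friendsblog.com": [1.0, 1.0],
}

ALPHA = 0.6               # assumed weight on the traditional link layer

def blended_score(url):
    trust = sum(social_endorsements.get(url, [])) / 3.0   # crude normalisation
    return ALPHA * link_score.get(url, 0.0) + (1 - ALPHA) * min(trust, 1.0)

for url in sorted(link_score, key=blended_score, reverse=True):
    print(f"{url}: {blended_score(url):.2f}")
```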

Jon Kleinberg himself has shifted his research focus from a search engine's centralized index to the social structure of large online communities.

Similarly, Google has launched a new API to graph social networks across the web. This intensive research by major search engines into the web's social fabric clearly indicates that we're moving into a new form of information retrieval, one of networks of trust.

But of course, while the term "social media" may be new, the idea of user-generated content is not. After all, bulletin boards, discussion groups and forums have been around since the web's early days. But what is new is the format. In the past, user-generated content was strictly text-based, but today it encompasses all types of electronic content: movies, images, tags, rankings, ratings...the list goes on and on.

Whatever the term, the power of user-generated content cannot be denied. Social media has become indispensable to millions of users. Community question-answering networks in particular have become a popular destination for people looking for help.


In fact, in some countries this type of search has become more popular than search engine results themselves. For example, an average of 16 million people visit South Korean search portal Naver each day, keying in 110 million queries. Naver’s users also post an average of 44,000 questions a day on Knowledge iN, an interactive question-and-answer database. These receive about 110,000 answers, which range from one-sentence replies to academic essays complete with footnotes. Such widespread involvement shows that the influence of social media and connected marketing is stronger than ever.

Still not convinced? Then consider the Dewey Decimal system, which has long been the backbone of library cataloguing systems across the United States. Now, the Dewey Decimal system might seem a little out of place when discussing social media, but bear with me.

As an exercise, try visiting the Danbury Library’s online catalogue. As Gene Smith notes in his new book Tagging, what you’ll find is something most unlike a typical library.

For example, a search for The Catcher In The Rye brings up a list of related books and tags, not just a call number. These tags contain keywords such as “20th century adolescence” and “angst.” Click on the tag marked “angst” and you'll find a list of angst-riddled titles, such as The Bell Jar and The Virgin Suicides. In essence, each tag links to a new set of books and ideas.

Why is this significant? Well, it’s because these tags aren't the work of diligent librarians in the Danbury Library. Instead, they’re the result of Library Thing, a web site where more than 200,000 users track and tag their personal libraries.

In fact, Library Thing's members have added nearly 20 million tags to 15 million books, making it the second largest library in North America. It’s undeniable: People-powered meta data for the social web is becoming hugely popular.

That’s why it simply cannot be ignored by search engines. But how do they deal with this data and fold it in?


Connected Marketing: Generation Y, The Always-On Mobile Generation.

Perhaps the most dramatic change, though, will come with always-connected, 3G mobile phones. For example, the iPhone is already the leader in the number of mobile/local searches at Google. Such a device – which can send and receive email on the move, not to mention surf the web, receive RSS feeds, instant message, and dialogue at social networking sites – indicates that this truly is the age of connected marketing.

In fact, connected marketing may be a much more realistic and descriptive term than social media. In order to have social media, one needs to have social networks. In a network of trust you can tap into the wisdom of crowds. And social search provides much more validated and verified answers than those of a search engine algorithm.

A new phenomenon of digitally connected communities is emerging as a force to counterbalance the power of the big brands and advertising.

Modern consumers are forming communities and peer-groups to pool their power, resulting in a dramatic revolution of how businesses interact with their customers. We need to think about interactive engagement marketing and community-based communications. New marketing strategies will be built around this.

Traditional marketing methods are becoming increasingly ineffective and even counter-productive. The power of the brands and the abuses by marketing have created a vacuum for a counter-balance, and digitally connected communities are emerging as the counter-force to redress the balance. The way a business can and must interact with the powerful new communities is through engagement marketing, by enticing the communities to interact with the brands.

"Everything is interconnected: The physical world, social systems, your innermost thoughts, the unrelenting logic of the computer - everything forms one immense, interconnected system of reality. Nothing exists in isolation; everything is part of the system and part of a larger context."
Andy Hunt, Programmer and Author.


Blah, Blah, Blah, Very Nice Mike. But So What?

It's a thought paper, so…

Think about search yesterday.

Long before Google, even before the World Wide Web, there were search engines. In 1994, Brian Pinkerton launched WebCrawler, arguably the Web's first full-text retrieval search engine. I spoke with Pinkerton a few times many years ago when researching the second edition of my book. He explained to me how he had applied Cornell University computer science professor Gerard Salton's vector space model for the extraction and analysis of text for ranking documents. At that time, SEO was all about manipulating rankings through on-page factors.

A little cottage industry of search engine optimizers started to grow. More and more people became aware of the tactics and techniques they could employ to mess around with Web pages and outsmart competitors.

Then Google arrived and introduced a hyperlink-based algorithm called PageRank. This wasn’t a completely novel idea: before Google, foremost computer scientist Jon Kleinberg had written about incorporating network theory and citation analysis into a ranking algorithm.

But ten years ago, the emphasis in SEO switched from basic page tweaking to the quest for inbound links. The industry had to change with it. Unfortunately, to paraphrase Thomas Edison, PageRank came dressed in overalls and looked like work.

Think about…

Search Today.

Discovering knowledge from hypertext data such as text and link analysis is still a large part of a rapidly developing science under the banner of information retrieval. These signals have been very strong in helping search engines determine relevancy and rank according to authority. Yet even though industry leaders acknowledge that SEO is much more of a marketing process than a technical effort, much attention is still given to crawler activity and indexing.


Over time, crawlers have gotten smarter and CMS developers have become more aware of sites' need to be more crawler-friendly. But even after 10 years of research and development, inherent problems with search engine information retrieval still remain. Google's rollout of Universal Search last year, however, completely changed the playing field. All of the other major search engines have followed Google's lead and are discovering new methods of uncovering patterns in different types of web content, structure and end-user data.

End users who previously couldn't vote for content via links from web pages are now able to vote with their clicks, bookmarks, tags and ratings. These are very strong signals to search engines, and best of all, they don't rely on the elitism of one web site owner linking to another or the often mediocre crawl of a dumb bot.
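To sketch how those votes might be folded together (with weights and field names that are assumptions, not a known engine formula): clicks, bookmarks, tags and ratings can be combined into a single engagement boost, with deliberate actions like bookmarking weighted more heavily than a passing click.

```python
import math

# Assumed relative weights: a bookmark or tag is a stronger deliberate vote than a click.
WEIGHTS = {"clicks": 1.0, "bookmarks": 4.0, "tags": 3.0, "ratings": 2.0}

def engagement_boost(signals):
    """Combine end-user votes into one score; log damping stops heavy hitters swamping it."""
    raw = sum(WEIGHTS[name] * signals.get(name, 0) for name in WEIGHTS)
    return math.log1p(raw)

docs = {
    "howto-video":   {"clicks": 900,  "bookmarks": 120, "tags": 80, "ratings": 45},
    "press-release": {"clicks": 1500, "bookmarks": 3,   "tags": 1,  "ratings": 0},
}

for name, signals in docs.items():
    print(name, round(engagement_boost(signals), 2))
```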

So what about…

Search Tomorrow.

In November 2003 the industry was rocked by Google's Florida update. Google's rollout of Universal Search last year caught everyone by surprise. And now, with Google's launch of SearchWiki, you can say farewell to the old-fashioned ranking report.

The guys at Google don't give any advance warning about fairly dramatic changes that affect the entire SEO industry, so we need to be better prepared and understand where search is going.

Is your SEO vendor (or your in-house SEO team) fully aware of the new signals and able to develop new forms of optimization and strategy? After all, SEO will give way to a new form of Digital Asset Management and Optimization for a search engine audience expecting a much richer experience. This new type of SEO will place a much larger emphasis on optimizing a range of file types, from PDFs to images to audio/visual content.

More effort will be placed on feeds to search engines. This won’t be just XML feeds into paid inclusion and shopping comparison, but will also incorporate feeds with other types of information, such as local, financial, news and other verticals. Mobile will become much more popular and search will gradually become more of a personalized experience.

Personalization and Digital Asset Optimization will end 1999-style ranking reports, as search results will be based on blended results from end-user specifics, such as geographic location, time of day, previous searching history and peer group preference.
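A minimal sketch of what such blended, personalized scoring could look like – every feature, weight and profile field here is an illustrative assumption: a base relevance score gets nudged by the user's location, time of day, search history and peer-group preferences before the results are ordered.

```python
# Toy personalised re-ranking. Every score, weight and profile field is invented.
def personalised_score(result, user):
    score = result["base_relevance"]                       # from the core ranking function
    if result.get("city") == user.get("city"):
        score += 0.20                                      # geographic proximity
    if user.get("hour", 12) >= 18 and result.get("type") == "video":
        score += 0.10                                      # evening users lean toward rich media
    score += 0.15 * user.get("history_affinity", {}).get(result["topic"], 0.0)
    score += 0.10 * user.get("peer_prefs", {}).get(result["url"], 0.0)
    return score

results = [
    {"url": "citybistro.example", "base_relevance": 0.62, "city": "london", "type": "page",  "topic": "dining"},
    {"url": "food-vlog.example",  "base_relevance": 0.58, "city": "paris",  "type": "video", "topic": "dining"},
]
user = {"city": "london", "hour": 20,
        "history_affinity": {"dining": 0.8},
        "peer_prefs": {"food-vlog.example": 0.9}}

for r in sorted(results, key=lambda r: personalised_score(r, user), reverse=True):
    print(r["url"], round(personalised_score(r, user), 3))
```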

Online, monitoring of the customer's voice will become much more important than pushing a brand message. Monitoring search results for different file types will become increasingly important as end users (even competitors) upload content that may be related to your brand/product/service. Reputation management will be highly valued as marketing continues its reversal from a broadcast medium to a listening medium. Marketing into networks will see huge growth, and social search will grow with it.

Google's mission is to organize the world's information and make it universally accessible and useful.

I'm returning to this mission statement from Google, which I quoted at the beginning of this document. Having gathered together some of the facts relating to the future of information retrieval on the web, I firmly believe that Google is failing in its mission. That said, it's not because anything is wrong with Google. It's because the whole approach to a "walled garden" environment where everything is kept in order may not be feasible anymore. If the web (and crawling the web) really was the answer to the future of information retrieval, then it might be possible. But with so much content outside the crawl and end users now requiring a much richer experience than a few text-based documents, it may be time to rethink the whole thing.

"I had (and still have) a dream that the World Wide Web could be less of a television channel and more of an interactive sea of shared knowledge."
Sir Tim Berners Lee, Inventor of the World Wide Web.

We're essentially trying to force elephants into browsers that don't want them. The browser that Sir Tim Berners Lee invented, along with HTML and the HTTP protocol, was intended to render text and graphics on a page delivered to your computer via a dial-up modem, not to watch movies like we do today. Search engine crawlers were developed to capture text from HTML pages and analyze links between pages, but with so much information outside the crawl, is it the right method for an always-on, ever-demanding audience of self-producers?

It's clear that "Web 2.0" has created a fundamental transformation of the web into a true collaborative and social platform. In the realm of the social web, technology is being developed to augment "social cognition" - that is, the ability of a group of people to remember, think and reason.

In the 1975 science fiction movie Rollerball, all of the world's information is stored on computers. Somehow, the computers manage to lose the entire 13th century. Perhaps the movie is right: a centralized repository is not the way to go when trying to organize the world's information and make it universally accessible and useful.

"I just had to take the hypertext idea and connect it to the Transmission Control Protocol and domain name system ideas and — ta-da! — the World Wide Web."

It's this statement which is so intriguing, though. Sir Tim Berners Lee invented the World Wide Web in no time at all. He simply used his previous knowledge of hypertext thinking and applied a protocol to a medium that already existed.

So what about the possibility of developing new platforms or even new protocols?


Many years ago your grandparents sat in front of a brown wooden box listening to Franklin D Roosevelt. Who knows, maybe last week you bought a new HD-ready TV. Guess what? Same airwaves, different protocol.

Interesting thought!
