The Essential Guide to Enterprise Search in SharePoint 2013
Everything You Need to Know to Get the Most Out of Search and Search-based Applications

About the authors

Jeff Fried, CTO, BA Insight
Jeff is a long-standing search nerd. He was the VP of Products for LingoMotors, VP of Advanced Solutions for FAST Search, and technical product manager for all Microsoft enterprise search products. He is also a frequent writer who has authored 50 technical papers and co-authored two new books on SharePoint and search. He holds over 15 patents and routinely speaks at industry events.

Agnes Molnar, MVP
Agnes is a Microsoft SharePoint MVP and a Senior Solutions Consultant for BA Insight. She has also co-authored and contributed to several SharePoint books. She is a regular speaker at technical conferences and symposiums around the world.

Michael Himelstein, vTSP
Michael has more than 20 years of practical experience developing, deploying, and architecting search-based applications. In this role he has advised hundreds of the largest companies around the world on unified information access. He was previously a Technology Solutions Manager in the Enterprise Search Group at Microsoft.

Tony Malandain
Tony Malandain is a co-founder of BA Insight. Tony architected and built the first version of the product, which gained significant momentum on Microsoft Office SharePoint Server (MOSS) and positioned BA Insight as the leading Enhanced Search vendor for SharePoint. Tony was awarded a patent for the core AptivRank technology, which monitors the usage behavior of search users to influence relevancy automatically.

Eric Moore
Eric Moore is the lead for BA Insight’s Search Interactions and Content Enrichment teams. He is accustomed to living at the leading edge of search, and has deep experience with multimedia search, XML search, and content enrichment. Prior to BA Insight, Eric worked for five years at FAST and on the Microsoft Search Platform team. Eric has developed state-of-the-art products, algorithms, and platforms for specialized information workers.

SharePoint 2013 The essential guide to enterprise search

What’s in this e-book?

There’s a lot to say about SharePoint 2013, and about search in SharePoint 2013. This e-book is focused only on search, and is meant to give you a working understanding of the new features so that you can get oriented with them and think about how you will deploy and use them. It does not try to cover everything, nor is it meant to be a hands-on guide. In this book we will be covering five key areas as they relate to search. These key areas are color coded, and represented by the blocks below. Each section contains short chapters that can be read independently or continuously. The goal is to enable readers to focus on the information they need to learn about at the moment.

• User Experience
• Working with Queries & Results
• Working with Content
• Architecture, Deployment & Operations
• Applications & Development

Not every area of search has changed in SharePoint 2013, and those who are currently familiar with search won’t be lost at sea. For example, the deployment model, services architecture, and crawling and connector subsystems are pretty much the same as with SharePoint 2010. End users will see a dramatically different search UI, but they will be able to use it with no training (it’s quite intuitive). If you have built up a competency in search, you’ll be able to take it further in many ways, which we highlight throughout this e-book.

Deeper Dives
• TechNet: What’s new in SharePoint 2013 search
• Blog article from Microsoft Search Group
• TechNet landing page refreshed weekly with articles on SharePoint 2013
• Highlights of Search in SharePoint 2013


Highlights and Key Take-Aways

User Experience
What’s New? The face of search is totally revamped, not just in keeping with the new SharePoint UX overall, but with deep refinements, better display for results using Result Blocks, a hover panel with previews, and more.
Benefits: The search experience is easy, clean, and fast.

Working with Queries & Results
What’s New? In SharePoint 2013, search scopes, federated locations, and best bets are now deprecated in favor of result sources, query rules, and result templates.
Benefits: SharePoint 2013 is light-years ahead of other search platforms in this area. Result sources, query rules, and result templates offer remarkable control over search presentation. These are brand-new concepts, well worth learning: they arm site administrators and site collection administrators with the tools to field powerful, effective search.

Working with Content
What’s New? Crawling is the area that has changed least with SharePoint 2013, but there are still some great enhancements, including continuous crawling.
Benefits: With continuous crawling, users get fresher content faster.

What’s New? Business Connectivity Services has continued to evolve and now supports claims tokens through the BDC.
Benefits: Complex security scenarios are more tractable (though still hard).

What’s New? The Content Processing and Linguistics capabilities in SharePoint 2013 search are very strong and extensible. There are lots of new capabilities, including a completely new file parsing mechanism.
Benefits: This platform offers a lot of power to developers, as well as providing some key capabilities end users will notice.

Architecture, Deployment & Operations
What’s New? Under the hood, there is a new architecture, a new search core, and many new modules that are the culmination of the FAST acquisition: not just a combination of the best of FAST and SharePoint search, but some significant innovations from a continued investment in search.
Benefits: Search deployment and management is different, and largely better. Making search hum for O365 (fully multi-tenant, smoothly scalable and fault-tolerant, and manageable at multiple levels) was a key goal for this release, and there are big benefits for on-premise deployments too.

Applications & Development
What’s New? There’s a new development model for SharePoint 2013 generally, and for Search specifically.
Benefits: This makes extending search much more accessible, and will foster a lot of exciting search-based applications.

What’s New? There’s a new Content Enrichment Web Service (CEWS) that opens up content processing for extension.
Benefits: A lot of great possibilities are now open to developers.

What’s New? Search is used pervasively throughout the SharePoint 2013 platform, and powers the new web content management (WCM) and e-discovery capabilities, topic pages, the content-by-search web part, myTasks, mySiteView, and more, along with great enterprise search, people search, and site search.
Benefits: Your users will get more done and enjoy a variety of applications, both built in and tailored, all powered by search.

Table of Contents

Introduction – SharePoint 2013 Search is Here

Chapter 1: User Experience – The New Face of Search in SharePoint 2013
• Raising the Bar: The SharePoint 2013 User Experience
• First Class Search Interactions: More to Love
• The SharePoint 2013 Search Center Overview
• Refiners and Faceted Navigation
• Search Center Setup

Chapter 2: Working with Queries and Results – New Mechanisms in SharePoint 2013
• Query Processing: The Search Engine’s Automatic Transmission
• Query Rules and Query Suggestions
• Result Types and Result Templates

Chapter 3: Working with Content – Crawling, Connectors, and Content Processing
• Content Capture
• Content Processing
• Linguistics Processing

Chapter 4: Architecture, Deployment, and Operations – Getting under the Hood
• New Architecture, Single Search Engine Core
• Indexing and Partitions
• Analytics
• Federation and Result Sources
• Search in Exchange
• Search Administration
• Upgrade and Migration

Chapter 5: Applications and Development – New Models for Search-Based Applications
• The New Development Model in SharePoint 2013
• The Content Enrichment Web Service (CEWS)
• Search-Based Applications in SharePoint 2013

Conclusion

Introduction

SharePoint 2013 Search is Here

There’s a New Search in Town

SharePoint 2013 has arrived, and it is chock full of new capabilities and features. This is a release with major architectural changes, built “for the next 15 years”, and it is very different from SharePoint 2010.

With SharePoint 2013, the enterprise search capabilities are dramatically different and very exciting. Search has a new face, a new development model, and some remarkable built-in features. For search Jedis, this new platform has a lot to love:

• It is clean, fast, and easy to use.
• It is straightforward to install, administer, and scale.
• It provides very powerful high-end search features.
• It makes creating search-based applications simpler than ever.

For search Jedi apprentices, this release will change your world. Search is the “Force” used pervasively throughout SharePoint 2013 and has the power to transform the way your business uses SharePoint.

What is intriguing about this release is that it’s very clear that Microsoft’s investment and innovation around search hasn’t stopped; it has accelerated. They’ve hit a key design target (easy, powerful search that runs on premise or in the cloud) right on the money. Since this release is a key architectural change for SharePoint and a huge architectural change for search specifically, there are also many new features to build on. Peeking under the hood, there is evidence that there’s more innovation to come in future releases: powerful new mechanisms which aren’t fully used yet.

This isn’t a perfect release. There are some things that take getting used to, some areas that still need sanding, and some situations where you need to write code or turn to partners to boost the power of your search capabilities. We’ll point out some of these areas where you can turbocharge your search in this e-book.

Search technology (and basically all software that does sophisticated things around human language) is extremely hard in general. High-end search is very powerful, and can be applied in a myriad of situations, so covering everything is at odds with making search easy. The approach of providing hooks for extensibility and encouraging partners and customers to use them works, and Microsoft has a great set of partners to pull this off.

Search is still hard; don’t let the easy, simple user experience fool you into thinking otherwise. But Microsoft has done a remarkable job making this high-end technology accessible and easy for the mainstream. You will get enormous benefit from this release, so get to know it.

CHAPTER 1 User Experience – The New Face of Search in SharePoint 2013


Raising the Bar: The SharePoint 2013 User Experience

User experience broadly characterizes the way that people work through user interfaces, information, and product-specific concepts to get work done. SharePoint’s users can broadly be pegged to two groups:

Business End Users: regular, line-of-business users who utilize SharePoint for specific tasks and projects.

IT Users: IT professionals who manage, configure, and customize SharePoint for business users.

For any new generation of a product, user experience goals are straightforward: make it easier for the user to get work done faster, cheaper, and better. A simple, intuitive, attractive design also helps. Consumers expect ease-of-use and a certain amount of slickness when it comes to interacting with products; the bar is high when it comes to how they can get work done.

With SharePoint 2013, there are several developments surrounding user experience that business users can look forward to:

• Modern UI/Windows 8 Look and Feel: The new look and feel confronts users with the most “radical” update in 20 years (UI news link below) to prepare for a multi-device world. This look and feel for the Windows operating system supports mobile and has the ability to boost productivity for an increasingly mobile work force.

• Mobile and Tablet Deployment: Support for fluid layouts, touch, and voice interaction means that using SharePoint on Microsoft’s Surface tablet and the Apple iPad is much easier and smoother. Users can access information anywhere, at any time, with the same ease-of-use they’re familiar with from their desktop.

• SharePoint 2013 and Applications: The bar is also going up when it comes to ease of access to information. SharePoint 2013 is able to field mobile and search-driven experiences, for customer-facing as well as employee-only sites. There are a variety of full-fledged applications that run on your desktop, in your browser, and on leading mobile devices and present new ways to access and interact with SharePoint information, further enhancing the user experience and productivity.


Open for Designers

If you’re familiar with SharePoint you know that you can customize your interface to make it look nearly any way you want to, but you also know that the vast majority of business users leave the look and feel as the default and never change it. With SharePoint 2013, you no longer use PowerPoint to create themes in a proprietary format. It’s easy to theme sites using HTML (including support for HTML5), as shown below. This opens up SharePoint design to a much wider range of customization by designers, and will result in a lot of very attractive SharePoint sites.

The SharePoint 2013 user experience is a platform-wide update, ready for a new generation of interaction. Changes in the underlying presentation tier, service architecture, Object Model (OM), and Office Apps all further the goal of making it easier to configure and deploy valuable applications in this new delivery environment.

Mobile Challenges and Opportunities: Windows 8 and Metro

Windows 8 devices have a new interaction flow. The desktop, charms, apps, and tiles are distinctly different from the familiar Windows 7 desktop. This represents an opportunity for application developers to create truly engaging user experiences that work across many devices. However, it also poses a challenge for developers to learn the Windows 8 stack, and a learning curve for users. Touch is highly intuitive and highly engaging; that said, the question will be how and when users gain their first experience and confidence with idiomatic Windows 8. Will learning be amortized in the context of your project, or someone else’s?

Metro, as seen so far in SharePoint 2013, is a sparer, less dense way of presenting information, which is good from a user experience perspective. It also means there is less information displayed per page of results, and that decrease may trouble users who rely on “recall” over “precision” in their browsing and scanning. The solution to this problem may be to present information more effectively, to make the less more. In order to provide richer results, the design and consequent development of processing and enrichment processes will require new skills from the SharePoint application developer.

Deeper Dives
• TechNet on mobile devices and SharePoint 2013
• Blog with highlights of Design Features in SharePoint 2013
• Article on Windows 8 UI
• SharePoint 2013 UI blog


First Class Search Interactions: More to Love

SharePoint 2013 has revamped the user experience overall (not just for search), and offers nice user experience improvements for everyone. Highlights of the previous release, SharePoint 2010, included the roll out of the “ribbon” across all of Office and SharePoint, and the first roll out of Office Web Applications.

Search-specific developments for the end user in the SharePoint 2013 platform include a flatter, cleaner, and more responsive interface. The “flatness” comes from a top-down design that makes the transition between SharePoint Views (sites and document libraries), Search Views (search sites), and Detail Views (snippet and document) invisible. The improved responsiveness comes from the new architecture of the SharePoint 2013 presentation tier, which extensively uses modern HTML, JavaScript, and AJAX-style interactions with responsive SharePoint search and metadata services.

But that’s not all, folks. There’s a lot more to appreciate about the new Search User Experience with SharePoint 2013 Search:

• Document Previews: Office documents are rendered in the page for easy viewing, so there’s less interruption going from one view to the next.

• Interactive Elements: Fly outs or hover card patterns are implemented quickly and cleanly. Search results fly in and additional information about what you are looking for is available with a flick of the mouse.

• Transitions Across SharePoint Tasks: The disjunction between “contextual search” and “search sites” is gone in SharePoint 2013. There are fewer obvious differences between apps; this version of SharePoint does not feel stitched together like previous versions. New developments include the seamless flow between functions such as people search and search verticals.

• Productivity: Search helps users quickly return to important sites and documents by remembering what they have previously searched and clicked. The results of previously searched and clicked items are displayed as query suggestions at the top of the results page.

• Search Mechanisms Under the Hood: Queries, interpreting queries, returning relevant results, and the presentation of those results are pervasive across SharePoint 2013. It’s not always obvious that search is “there”, but search technologies are used across the SharePoint 2013 platform, and key new interfaces lower the complexity of customization IT professionals and application developers need to do in order to support business users.

Search powers a number of areas which may or may not be obvious as search:

• Upgrades to People Search and Social Features: making it easy to explore and find people, expertise, and conversations that are important to the task at hand.

• New Social Features: My Sites, Communities, Teams, and Conversations create dynamic content that is quickly indexed via constant incremental crawls and returned through SharePoint 2013 search.

• Personalization Features: search suggestions are personalized, and include visited documents, as described in the chapter on query rules and query suggestions. These show up “as if by magic”, and many users enjoy them without thinking about search at all.

Overall, the search interfaces are clearer and brighter, and all the different parts of SharePoint apps seem to work better together. It is also much easier to customize search-driven experiences in SharePoint 2013 than with any other enterprise search platform.

The New Face of Search

Search in SharePoint 2013 has a completely different look and feel from previous versions; the UI has been largely rewritten. The new face of search in SharePoint 2013 is easy-to-use, clean, and intuitive. It offers easy exploration and navigation of information while presenting information in an actionable format. This is a far cry from the ten blue links concept that the industry has been living with for nearly 20 years. There are also a number of changes that have been made to enhance ease-of-access to information, supporting both productivity and mobility.

While the out of the box interface is clean, and we view this as a positive enhancement, it is not as information-dense as heavy search users demand. There are a number of search-based applications that can bridge user requirements surrounding information access and analysis, and we will provide several next steps and options for review at the end of this e-book.

Deeper Dives
• Search User Interfaces book by Marti Hearst


The SharePoint 2013 Search Center Overview

The SharePoint 2013 Search Center has inherited the new look of SharePoint 2013. Overall it is clean, modern, and dynamic. As you can see from the screenshot below, it is quite different from what you are used to seeing. The familiar tabbed interface is apparent, but it has a more streamlined look and feel and includes some new out of the box tabs such as videos. There are also more actions that can be done directly from the search interface, including a hover panel.

Some of the capabilities from FAST show up in this release as well, deep refiners and document previews in particular. These have been taken to the next level with additional features such as the ability to show histograms for dates, and allow for a search inside the refiners. While both capabilities are welcome, they are somewhat limited, whetting your appetite for more.

*Note: The refiner counts are turned off by default, but they appear with one click in the web part configuration panel.

Document Previews and the Hover Panel

One of the most exciting new features added to SharePoint 2013 is the integration of document previews right within the search results. This feature leverages a new standalone server that hosts Office Web Applications. With Office Web Apps users can now open a document in a web client environment with reasonably high fidelity while preserving format, fonts, sizing, etc.

A key component within the document preview display is the “take a look inside” functionality. This provides the ability to jump specifically to a relevant section of the document, based on extraction of sections for several document types. For example, because it is likely that the slide titles in a PowerPoint presentation were designed by the presenter to summarize the content of each slide, these titles are extracted and shown as links. This feature is also available for Word documents and Excel documents (focused on graphs and named tables) as well as SharePoint sites (top sub sites and document libraries).

There are some limitations to the document preview features with SharePoint 2013. It is relatively slow and missing functionality that other preview products take for granted. This includes search term hit highlighting, the ability to immediately jump to the most relevant page of the document, as well as copy and paste functionality from within the preview. Breadth of content types is another area where SharePoint 2013 previews fall short: they are only available for content hosted in SharePoint, and only for a limited set of file formats (for example, Word and PowerPoint, but not PDF). This preview technology was not designed for documents to be consumed via this interface, but rather to determine if this is the particular document that you have been looking for. Notwithstanding these limitations, though, document previews are a boon to the user and a great addition to search.

The hover panel paradigm works well in the Search Center. This can be customized and may vary based on content type or tab. Default actions with document preview include the Edit, Send, and View Library features, as well as Follow, a social feature. They also allow some actions directly from the search page, including editing content directly in Office Web Apps.

For many applications, people will want to customize the search center, because it is not as information-dense as heavy search users or search-based applications demand. This type of customization is easy to do, and we’ll cover it later in the chapters about query rules, result sources, and the development model.

People Search

People Search is another strong part of the Search Center. As with SharePoint 2010, people search lights up with actions when used together with Lync, and phonetic search is available by default. The new hover panel provides a great way to show profiles and content, in addition to social connections.

Overall, the SharePoint 2013 Search Center interface is better than any other search UI we’ve seen on the market. It appears to be very robust, and holds true to Microsoft’s ‘works anywhere’ commitment. It functions smoothly both in the cloud with Office 365 and on premise, as well as in all of the major browsers (Internet Explorer, Mozilla, Chrome), and the experience on tablets like the iPad is pretty good. A word to the wise: just don’t let a sexy demo or quick test drive lull you into thinking that ‘it just magically works’. As with all search products, the navigation depends on having decent metadata.

Overall the out of the box interface is clean, fast, and provides relevant results, so the basic ‘must have’ elements of great search are covered. There are also a lot of exciting capabilities that make exploration easier, give users insight, and enable action directly from search.

Of course, everything works better with search when all of the products that are part of the search machine are Microsoft products. For example, People search lights up with actions when used with Lync; myTasks work with Project Server; and previews work only on documents stored in SharePoint with recent Office formats, and require a separate OWA server. If you don’t have servers that run these other products, the additional features associated with them simply don’t show up. However, search still works very well even without them. When you have all these parts in place, though, they work extremely well together, which is a big accomplishment for Microsoft with strong productivity benefits to the end user.

Deeper Dives
• TechNet: creating a search center in SharePoint 2013
• Intro to the hover panel
• Longitude Search Overview

Refiners and Faceted Navigation

Less than ten years ago, the idea of using faceted metadata for flexible search and navigation was just being hatched in an academic research project called the Flamenco project. Now it is de rigueur: it has proven to be effective, and enterprise search without it is subpar. Microsoft added search refinement in SharePoint 2010, with the refiners populated by whatever content is in the associated managed properties. SharePoint 2010 created refiners out of the top N results (called “shallow refiners”; the top 50 results was the default), and FAST Search for SharePoint created “deep refiners” out of the entire result set, even if it was millions of items.

With SharePoint 2013, there are now two different modes for the refiner web part: standard search results, and faceted navigation. For standard search results, refiners are generated as they were with FAST Search for SharePoint. You can now define display templates to use for rendering different kinds of refinements, which is a big win over SharePoint 2010. All refiners are now deep refiners.

Faceted navigation is more dynamic. It is used in conjunction with term sets (served from the term store), which are also used for navigation in document libraries. With faceted navigation, a term from the term store filters what kind of data should display. If the managed property is ‘refinable’, the refiners that show can depend on the term. This is handy in many search scenarios, including the online store scenario which inspired it. For example, users can use faceted navigation in an online store to find products more easily. The scenario below uses the term store terms Camera and Laptop and managed properties Megapixel Count, Color, and Manufacturer. So, with faceted navigation your terms would look like this:

• For the term Camera, add refiners for Megapixel Count and Manufacturer
• For the term Laptop, add refiners for Color and Manufacturer

The refiners that show up now are based on that term, which can be set based upon a page or catalog hierarchy, so that you get the following whether you navigate or search to laptops:

Configuring these refiners via the term store is convenient, and there are built-in tools that make it easy to create a hierarchy, customize the refiners within the hierarchy, and set up a very dynamic experience, as shown below.

Navigation and Search Unified

Hierarchy is also used to create results pages, as part of the WCM part of SharePoint 2013. Navigation settings are based on the same hierarchy, so that users can search, navigate, or refine their way to their result. Navigation controls also have built-in customization, as shown below.


As you can see, Faceted Navigation is quite a powerful capability. Refiners are available everywhere, they adjust dynamically, and they can be configured to an exact design, all controlled by metadata. All refiners used in Faceted Navigation are deep refiners, so there are no gaps caused by a missed item in the deeper result set.
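The difference between shallow and deep refiners, and the way a term selects which refiners appear, can be illustrated with a minimal Python sketch. This is not SharePoint code or its API: the result set, the manufacturer names (Contoso, Fabrikam), and the helper names `refiner_counts` and `refiners_for` are all hypothetical, chosen only to mirror the Camera and Laptop scenario described above.

```python
from collections import Counter

# Hypothetical ranked result set: 60 cameras, the last 10 from another maker.
RESULTS = [
    {"title": f"Camera {i}",
     "Manufacturer": "Contoso" if i < 50 else "Fabrikam"}
    for i in range(60)
]

def refiner_counts(results, prop, top_n=None):
    """Count values of one managed property over the top N results,
    or over the whole result set when top_n is None."""
    scope = results if top_n is None else results[:top_n]
    return Counter(item[prop] for item in scope if prop in item)

# Shallow refiner: built from the top 50 results only, so values that
# occur only further down the result set never show up.
shallow = refiner_counts(RESULTS, "Manufacturer", top_n=50)

# Deep refiner: built from the entire result set.
deep = refiner_counts(RESULTS, "Manufacturer")

# Faceted navigation: the current term decides which refiners are shown
# (illustrative mapping, following the online store scenario).
REFINERS_BY_TERM = {
    "Camera": ["Megapixel Count", "Manufacturer"],
    "Laptop": ["Color", "Manufacturer"],
}

def refiners_for(term):
    """Return the refiners configured for a term, or none if unconfigured."""
    return REFINERS_BY_TERM.get(term, [])

if __name__ == "__main__":
    print("shallow:", dict(shallow))
    print("deep:   ", dict(deep))
    print("Laptop refiners:", refiners_for("Laptop"))
```

In this toy data the shallow refiner never sees the second manufacturer because all of its items rank below position 50, which is the kind of gap deep refiners avoid.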

Deeper Dives
• TechNet: Managed metadata overview
• TechNet: configure faceted navigation in SharePoint 2013

Search Center Setup

For the IT Professional, SharePoint 2013 offers more control over the logic of search applications, and it exposes that control in a clear, consistent, and logical model. We’ve outlined key concepts as they relate to Search Center Setup and how they are used to deliver search results.

• Query Configuration: Query Rules are used to control ranking, query intent classifications, synonyms, and query rewriting in SharePoint 2013.

• Presentation Configuration: Query Rules and Display Templates determine what result snippet gets shown for what class of query, what type of document, and for what category of user. The integration of search query processing across the platform means that display templates and query rules are applicable throughout the application.

• Faceted Navigation: Metadata used for top-down navigation (“Faceted Navigation”) and metadata exposed as search results for bottom-up refinement are now both managed through the term store.

On the Premise or In the Cloud? Get Going Faster

Setting up a new search center is pretty straightforward. As illustrated below, site administrators can easily set up a SharePoint 2013 search center to run on premise or in the cloud.


The Search Center itself is a site template, and the good news is that with this latest release some of the rough edges from SharePoint 2010 have been removed. For example, this template now inherits design elements from a master page, so you don’t need to jump through hoops to make it match your design. This does not mean that you don’t still need to think about how to manage the ‘universal search center’, which may serve many site collections with different themes and designs, but you now have easier control.

Changes to Sites and Site Templates

There are a number of changes to sites and site templates overall in SharePoint 2013. The facilities for sharing (requesting and granting site permissions) are completely revamped and considerably improved, as shown in the screenshot below.

The Document Workspace site template has been removed in SharePoint 2013, simplifying the list of templates available when a new site collection is created. This will be a big change for users, since this template was a workhorse in SharePoint 2010. Most Meeting Workspace site templates from 2010 have also been discontinued in SharePoint 2013, including the basic, blank, decision, and social meeting workspace templates and the multipage meeting template. They have been replaced by features from other parts of SharePoint and from OneNote and Lync, which all support collaborative work, live conferences, smaller meetings, note-taking, and storage of notes and other conference-generated commentary. The benefit is that projects with multiple contributors and collaboration across geographically distributed teams are streamlined.

The facilities for web content management (covered in the Applications section) are remarkably improved, and totally driven by search. This makes creating externally-facing sites and applications much more effective. If you have responsibility for explaining and exposing a service or product to a market inside your company, the business-focused features that are new in SharePoint 2013 are a strong proposition for audiences inside the firm. For example, if you provide consulting services internally for a legal practice area, recommendations, customization of the search experience based on queries and personalized interaction, etc. enable users to find relevant information more quickly.

Deeper Dives
• TechNet: creating a search center in SharePoint 2013
• Blog on using the Content by Search Web Part

CHAPTER 2 Working with Queries and Results – New Mechanisms in SharePoint 2013


Query Processing: The Search Engine's Automatic Transmission

The search experience involves many different processes, so creating a great search experience requires covering everything, from the moment information is pulled from the source systems to the moment it is presented to the user in search results. SharePoint historically had strong coverage on the crawling side via its Business Connectivity Services and Protocol Handler framework, and strong coverage on the presentation side via its XSLT-driven core results web parts. FAST Search for SharePoint on the 2010 platform then brought coverage of the content processing area via its pipeline extensibility framework as well as its built-in entity extractors. SharePoint 2013 completes the coverage by providing a strong query processing framework, shown below.

But what does "query processing" mean exactly? If you're familiar with SharePoint 2010, think of query processing as the evolution of search scopes, federated locations, and best bets. With intranet search indexes now frequently reaching tens of millions of items, formulating the right query is more and more critical to finding relevant information. Fortunately, there are a number of techniques you can use to reformulate the query by understanding the intent behind it. You can leverage information such as:
• Where the query originated. For example, if you run a search from your company's helpdesk intranet site, you are likely to be looking for FAQs, how-tos, or IT specialists. The search engine can now capture that intent to provide more targeted results.
• Who launched the query. If you are based in the United States and searching for employee benefits, you are more likely looking for U.S. employee benefits than for those of Canada or the United Kingdom.
• What concepts or entities can be recognized in the query. For example, if you were searching for an expense report form, the search engine will return the Excel spreadsheet, InfoPath form, or web page that enables you to file your expense report.

Query Processing in Action

An example of query processing techniques combined would be a search for a weather forecast on Bing. The very first result you'll get at the top is the weather forecast for your location. Bing automatically understands the concept behind the query, and then correlates it with information about you, the user (in this case, your location) to provide you with the forecast. It is also worth noting that this answer is not displayed like the other results on screen. It is instead carefully rendered in a visual format to enable you to quickly make a decision based on that information.

Query processing in SharePoint 2013 is intended exactly for these scenarios: to enable a smart, targeted search experience that understands what the user is searching for and provides the optimal result straight from the search page. This is a very exciting new capability in SharePoint 2013, as it will open up many opportunities to rapidly build new applications driven by search which will look nothing like the standard list of ten blue links.

Getting in Gear: Result Sources, Query Rules, and Result Types

So let's dive now into the details of what SharePoint 2013 offers for query processing. We referred to query processing earlier as the evolution of search scopes and best bets. We meant it, literally! In SharePoint 2013, search scopes, federated locations and best bets are now deprecated in favor of result sources, query rules, and result blocks.

Result Sources enable you to focus searches on a subset of the total information accessible in your organization by applying extra conditions to the search queries on behalf of the end user. Stated as such, they sound very much like 2010 search scopes. The key difference here is that the extra conditions enabled in 2013 go far beyond what 2010 could do. SharePoint 2013 comes with a strong query builder to apply conditions based on the user, the search page URL (or any parameter found in it), the site, or the current date. Result sources can also be used to return results from remote content, much like federated locations in SharePoint 2010. (The result sources construct is covered in greater detail in the Federation chapter of this e-book.)

Query Rules allow conditional transformation of queries and results based on custom logic. Imagine you want to simplify searching for budget spreadsheets in your organization. Using query rules, you can type simple search queries such as "budget spreadsheet project X", and behind the scenes the request can be transformed into something much more elaborate. The query rule could recognize the terms budget and spreadsheet in the search query and rewrite the query so that the document content type must be 'budget', the file type Excel, and the file content must match the project name you specified in the search keywords. Additionally, the results would be sorted from the most recently modified file so that the freshest information is returned first. It is worth noting that the same Query builder functionality used for Result Sources is also available here as a means to define conditions on query rules or transform user queries.
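The budget spreadsheet rewrite described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how SharePoint implements query rules (those are configured through the query rule pages rather than written as code). The function name, the trigger terms, the 'budget' content type value, and the choice of xlsx as the only Excel extension are all assumptions made for the example. ContentType, FileExtension, and LastModifiedTime are used here as managed property names in a KQL-style string.

```python
# Hypothetical trigger terms; a real rule would be defined in the query rule UI.
TRIGGER_TERMS = ("budget", "spreadsheet")


def rewrite_budget_query(user_query):
    """Return (kql, sort_spec) if the rule fires, or None if it does not."""
    words = user_query.split()
    lowered = [w.lower() for w in words]
    # The rule fires only when every trigger term appears in the query.
    if not all(term in lowered for term in TRIGGER_TERMS):
        return None
    # Whatever remains after removing the trigger terms is treated as the project name.
    remainder = " ".join(w for w in words if w.lower() not in TRIGGER_TERMS)
    # Assumed property restrictions: content type 'budget', Excel files only.
    kql = 'ContentType:"budget" FileExtension:xlsx'
    if remainder:
        kql += f' "{remainder}"'
    # Assumed sort specification: most recently modified first.
    return kql, "LastModifiedTime:descending"
```

With this sketch, the query "budget spreadsheet project X" is rewritten into two property restrictions plus the quoted phrase "project X", sorted by modification time, while a query that lacks either trigger term is left alone.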


The last major new feature introduced for query processing on SharePoint 2013 is the Result Type construct. A result type supports the presentation of results in a tailored way, and the result block contains a small subset of results that are related in a specified manner. For instance, you can create several result blocks for sales collateral, knowledge base articles, documentation, etc. so that when a user searches for a specific product you can make sure to always return the top two or three pieces of sales collateral or knowledge base articles matching this query.

In spite of the enhanced capabilities these tools provide, you may run into scenarios where they are not suitable or flexible enough. For example, geo-searches (ranking or filtering search results based on distance), personalized queries (complex query changes based on who executes the query), synonym expansion, etc. are not supported. In these scenarios you can still rely on the Search API to build your own web part or search application that implements the appropriate logic. The API is for the most part comparable with the version seen in SharePoint 2010, with a few exceptions. The main exceptions include the removal of the FulltextSqlQuery class and syntax, which have been deprecated, and the appearance of the SearchExecutor class, which allows you to execute multiple related queries in one shot.

No Speed Limits

Microsoft has made it very easy to create search pages using this new functionality. In fact, you don't need programming experience to create pages, as all the functionality is user friendly and has point-and-click interfaces. Microsoft made it even easier by pushing this functionality not only to site collection administrators, but to site administrators as well. That's right, farm-level privileges are not required: as long as you own a site (such as your personal site) you can use these capabilities to build your own search center.

Two examples of applications you can build using these new features:
• A manufacturing dashboard that displays everything about a specific part based on its part number. Information could include the inventory level, the last orders for that part, the instructions on how to use that part, and forum discussions from your customers about that part.
• A knowledge portal that enables you to share FAQs, knowledge base articles, documentation, or tutorials to empower your support or helpdesk team.

Powering your applications via search has never been easier. The chapter on Search-Based Applications has many more examples, and we encourage you to explore what's possible, and even to try building some of your own.

Deeper Dives
• TechNet on query processing
• Blog: overview of search in SharePoint 2013
• List of terms for query builder
• New KQL syntax in SharePoint 2013


Query Rules and Query Suggestions

In the last chapter we introduced Query Processing and several new key concepts including Query Rules and Query Suggestions. Now, let's go into greater detail on these and some other query features like spell check and rank management. Working with these features, you can customize search to a great degree, without writing code.

Query Rules

Query Rules are a brand new feature in SharePoint 2013, and they are designed to enable you to act upon the intent of a query and provide a remarkable amount of control and configurability. The Query Rules framework is composed of three top-level components: Query Conditions, Query Actions, and Publishing options. These are all configurable via PowerShell, or via the UI shown to the right.

Query Conditions are rule sets that are meant to determine the intent of the query (does the query meet a rule?). Options for this include:
• Query contains a specific word or words
• Query contains a word in a specific dictionary
• Query contains an action word that matches a specific phrase or term set
• Query is common in a different source (like the Videos result source)
• Results include a common result type (like file type)
• Advanced rules which can match across a set of terms, dictionary, regular expression, etc.

If the query is against a particular result source (see the Result Source chapter in this book) or category, result source conditions can also be applied. If the Query Condition is met, Query Actions are then triggered.
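To make the three components concrete, here is a minimal Python sketch of a rule that combines a condition, a list of actions, and a publishing window. It is a toy model for illustration only and does not reflect SharePoint's object model or PowerShell cmdlets. The class name, field names, helper functions, and action strings are hypothetical, and the year in the dates is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional


@dataclass
class QueryRule:
    name: str
    condition: Callable[[str], bool]  # Query Condition: does the query meet the rule?
    actions: List[str]                # Query Actions: what to do if it does
    start: Optional[date] = None      # Publishing options: None means no lower bound
    end: Optional[date] = None        # None means no upper bound

    def is_active(self, today: date) -> bool:
        """A rule with no dates is always active; otherwise today must fall in the window."""
        if self.start is not None and today < self.start:
            return False
        if self.end is not None and today > self.end:
            return False
        return True


def contains_any(words):
    """Condition factory: the query contains at least one of the given words."""
    wanted = {w.lower() for w in words}
    return lambda query: any(token in wanted for token in query.lower().split())


def fired_actions(rules, query, today):
    """Collect the actions of every active rule whose condition matches the query."""
    fired = []
    for rule in rules:
        if rule.is_active(today) and rule.condition(query):
            fired.extend(rule.actions)
    return fired


# Hypothetical rule that is only published during an enrollment window.
enrollment = QueryRule(
    name="open enrollment",
    condition=contains_any(["insurance", "benefits"]),
    actions=["promoted result: enrollment portal", "result block: benefits forms"],
    start=date(2013, 11, 25),
    end=date(2013, 12, 26),
)
```

Calling fired_actions([enrollment], "health insurance plans", date(2013, 12, 1)) returns both actions, while the same query outside the window, or a query with no matching word, returns an empty list.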


Query Actions specify a series of actions that take place once a query condition is met (what to do if the rule is met). These actions include:
• Assign a promoted result. This replaces the "Best Bet" and a former FAST Search for SharePoint 2010 feature known as "Visual Best Bets". The configuration of the promoted result allows you to specify whether it should be treated as a best bet (hyperlink) or as a fully formatted HTML block (Visual Best Bet).
• Create and assign a results block. When a condition is met, one or more results blocks can be triggered. Result blocks specify an additional query to run and how to display its results. This feature includes a full query designer so you can build and test queries before finalizing them. You can show these results above the core results, or interleave them by ranking. Additionally, you can choose custom display templates instead of the default for the result or results block.
• Change the ranked results by changing the query. This allows you to assign additional parameters and weighting values (XRANK boosts) to the query (Query Transforms, for those familiar with FAST). For example, if the condition of the rule is met, apply an XRANK constant boost of x points. XRANK is a FAST capability that allows you to override the default relevancy ranking by boosting the relevancy score for particular results at query time.

Publishing Options determine when a query rule is active (when to do this?). A rule may be active in a specific time interval (start date, end date) or always active (the default). You can also configure a review date, which triggers an e-mail reminder to review the rule.

The power of query rules is not only in the flexibility they provide, but also the richness and complexity that can be derived from them. Imagine a single Query Condition being met, which then triggers a visual best bet, a results block from a remote SharePoint site, a results block from a cloud source, and a query transform that will boost results coming from the cloud. In addition, rules would determine that these actions are only taken between November 25th and December 26th. An example of how this would work in an intranet scenario would be a query rule that is active only during insurance open enrollment windows.

Query Suggestions

Query suggestions enable users to ask better questions, and make it simpler to search for information. This feature was sorely lacking in SharePoint 2010. In SharePoint 2013, Query Suggestions are supercharged, thanks in part to the addition of the Analytics Processing Component and the Analytics Reporting Database. These components provide for query analytics aggregation and persistent storage of these analytics.

Some key new features include:
• My Queries: a personal query log (in Analytics), which factors your personal SharePoint activity into the query suggestions.
• My Sites: this capability tracks sites you have visited, and factors them into the query suggestions.
• Our Terms: this feature uses information related to the most frequent queries across all users that "match" the search terms.

Query Suggestions now take two forms: Pre-Query Suggestions and Post-Query Suggestions. Both of these help the user ask better questions by showing what others have asked before; they differ in when they are displayed and how people use them.

Pre-Query Suggestions occur prior to a query being executed. The goal of pre-query suggestions is to aid users in selecting a query to help them find information, and to assist them in writing better queries. These suggestions are provided in two forms:
1 A list of items that others are typing for their queries.
2 A list of items you have clicked on before from your personal query log.

A key aspect of this feature is that it will never provide a suggestion for a search that did not yield a click-through (someone clicking on the document), and it will never provide a suggestion if the results would lead to a dead end (a zero-result query).

Pre-Query Suggestions include both a list of queries from other users, and a list of items you have clicked on before, as shown in the screenshot below.

Post-Query Suggestions are provided after a query is executed and when results are displayed. These suggestions are based upon the results that you have clicked on at least twice. They provide a quick means to go back to a document that you regularly review or select. They are similar to the "Related Queries" provided with SharePoint 2010. Suggestions can also be tuned (inclusions and exclusions) within the Service Application Admin Pages. It is also important to note that these are not tuned at the site collection level, but only at the SSA level.

Query Spell Correction

Spell correction is a familiar and very useful feature, since humans are prone to misspelling and of course fat-fingering. SharePoint 2013 provides spell correction by default, as shown below.

In SharePoint 2010, spell correction was implemented as a series of XML files that defined inclusion and exclusion items for the dictionary. In SharePoint 2013, Query Spell Correction is managed from within the term store of the Managed Metadata Service. Query Spellcheck Exclusions and Inclusions are nodes within the term store, as shown below. Dynamic dictionary creation is still supported, but is now managed from within the term store.

For search, Query Spell Corrections can be configured to use "Did You Mean" type functionality for query transforms.

Working Wonders with Queries

The mechanisms for query rules, query suggestions, and query spell checking are new with SharePoint 2013, and they may take some getting used to. Previously, there were some capabilities in SharePoint 2010 that processed queries, such as the keyword features that applied to synonyms, best bets, and promotions/demotions; these are now replaced by query rules. Once you become familiar with these new features, you will find you can work wonders.

In spite of all of the obvious pluses, there are some limitations with query rules. You can't call a program from a query rule, which blocks a variety of use cases. For example, synonym expansion is done on full queries, and on pre-built synonyms. This makes expansion easy to understand but has been a big annoyance to many search administrators in SharePoint 2010. This limitation can be addressed, but only through applications available through the Microsoft partner ecosystem, not via query rules. Calling applications based on query patterns (for example, pulling up an ATM location app when users search for 'bank branch near me') is feasible in SharePoint 2013, but not directly from query rules. However, these limitations are important only for a specialized set of search applications. The power of query rules, and query processing generally, in SharePoint 2013 is light-years ahead of other search platforms.

Learn to use these mechanisms, and you will be in a great position to dazzle your business users with the power of search.


Deeper Dives
• Good blog post on query rules
• TechNet on query processing
• List of terms for query builder
• New KQL syntax in SharePoint 2013

Result Types and Result Templates

There's another new concept in SharePoint 2013 search, called Result Types. Result types let you control how search results will be displayed, and let you display different content in different formats. For example, if you have e-mails, documents, and database records in the same result set, you may want to use different formats for each and display different managed properties for each. With SharePoint 2010, this meant creating complex XSLT, and there was no easy way to group similar results together for presentation. With SharePoint 2013, wizards ease configuration of displayed results, and HTML and JavaScript enable you to add finishing touches if needed.

The screenshot below has multiple result types, presented in result blocks. Videos, documents, personal recommendations, and a "visual best bet" (though it's no longer called that) all have their own presentation and their own result template.

Results Framework Redux

The Results framework is composed of three parts (as shown below):
• Rules Engine: a list of rules to determine if the result type should be triggered.
• Property List: associates the rule with document type, content type, or other managed property within SharePoint search.
• Rendering Template: defines how that particular result will be displayed.
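The three parts can be illustrated with a small Python sketch: a rule decides whether a result type applies, a property list names the fields to fetch, and a template renders them. This is only a conceptual model, assuming results arrive as plain dictionaries. The class, function, and field names are hypothetical, and the Python format strings stand in for the real HTML and JavaScript rendering templates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ResultType:
    name: str
    rule: Callable[[Dict[str, str]], bool]  # rules engine: should this type be triggered?
    properties: List[str]                   # property list: managed properties to return
    template: str                           # rendering template (a Python format string here)


def render(item, result_types, default_template="{Title}"):
    """Render an item with the first result type whose rule matches it."""
    for result_type in result_types:
        if result_type.rule(item):
            # Only the properties named in the property list reach the template.
            values = {p: item.get(p, "") for p in result_type.properties}
            return result_type.template.format(**values)
    # No result type matched: fall back to a plain title.
    return default_template.format(Title=item.get("Title", ""))


# Hypothetical result type for items whose content type is "spec documents".
spec_docs = ResultType(
    name="Specification documents",
    rule=lambda item: item.get("ContentType") == "spec documents",
    properties=["Title", "Author"],
    template="[SPEC] {Title} (by {Author})",
)
```

An item with ContentType "spec documents" is rendered with the specification template, while any other item falls through to the default title-only rendering. Keeping the property list separate from the template mirrors the requirement that a managed property be requested before a template can use it.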


Result Types Unleashed

The power of Result Types really becomes evident when looking at a real-world scenario. In the scenario below you have multiple documents that have been assigned content types (e.g. specification documents, data sheets, etc.).

Within Result Types you can:
1 Specify a rule based upon specific criteria. The rules can contain fairly advanced features, such as Boolean logic (AND, OR, NOT), equality (= or !=), or comparison (< or >). These rules can also be applied to managed properties. For example, the rule might be ContentType = "spec documents".
2 Specify which managed properties you would like to have returned once rule conditions have been met. You must specify at least one managed property before it can be used in a rendering template.
3 Specify where you would like the requested property list items to be displayed, using a tagging convention such as _#= contenttype =#_ in a Rendering template. The Rendering template is composed of HTML and might contain JavaScript. Within this simple-to-edit template (nothing like editing XSLT in SharePoint 2010) you can call specific graphics (icons, etc.) and style it in any way that you would normally style HTML.

Result types may seem complex to master, but once you become familiar with them you will appreciate how powerful they are. There are impressive tools in SharePoint 2013 that facilitate ease of use, and formatting is done using any tool you are familiar with. (SharePoint Designer has dropped the ability to do this kind of formatting, which will be annoying to some, but there are lots of great tools available to work with HTML and JavaScript.)

With SharePoint 2010, very few people actually did the kind of formatting and result templating that was possible; it was too complex and arcane to use. With SharePoint 2013, you will quickly find that result types and result templates are enjoyable to work with, and you'll discover that you use them naturally to make search results look great and work well for users.

Deeper Dives
• Customizing search results via Result Types and Display Templates
• TechNet: query variables

CHAPTER 3 Working with Content – Crawling, Connectors, and Content Processing


Content Capture

Capturing content is fundamental to search: if it's not crawled and indexed, you can't find it! The process of connecting to content sources, crawling them to get content, and making that content searchable is far more complex than most people realize. It was also one of the most frustrating areas to manage with SharePoint 2010.

As a quick orientation, the basic function of a crawler is shown in the figure below. The concept is simple enough: the crawler connects securely to a given content source, maps the content from the source system to the crawled properties of the search engine, and feeds the engine in either a full crawl or an incremental crawl (which finds any changes). What makes content capture different from one search engine to the next is the breadth of connectors, the coverage of different security models and data types, the performance (both throughput and latency), the robustness, and the ease of administration. SharePoint 2013 does well on all counts, although most connectors are supplied by Microsoft's partners, not Microsoft.

SharePoint 2013 supports multiple crawl components, crawl databases, and content sources as shown below. There are a number of connectors included out of the box:
• SharePoint
• HTTP (web crawler)
• File Share
• Business Data Connectivity (BDC) Framework, which also includes these connectors that are built on the BDC framework:
  – Exchange Public Folders
  – Lotus Notes
  – Documentum Connector
  – Taxonomy Connector (connects to MMS)
• People Profile Connector
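The basic crawler function described above (map source fields to crawled properties, then feed the engine in a full or incremental crawl) can be sketched in Python as follows. This is a deliberately simplified illustration: the function names, the dictionary-based source and index, and the numeric modified stamp are all assumptions, and security handling and deletions are left out entirely.

```python
def map_properties(source_item, mapping):
    """Map source-system field names to crawled property names."""
    return {
        crawled: source_item[field]
        for field, crawled in mapping.items()
        if field in source_item
    }


def crawl(source, index, mapping, since=None):
    """Feed items from the source into the index and return the ids that were fed.

    since=None means a full crawl (every item is fed). Otherwise only items
    modified after `since` are fed, which models an incremental crawl.
    """
    fed = []
    for item_id, item in source.items():
        if since is not None and item["modified"] <= since:
            continue  # unchanged since the last crawl, so skip it
        index[item_id] = map_properties(item, mapping)
        fed.append(item_id)
    return fed


# Hypothetical source system with two items and a one-field mapping.
source = {
    "a": {"subject": "Q1 plan", "modified": 1},
    "b": {"subject": "Q2 plan", "modified": 5},
}
mapping = {"subject": "Title"}
index = {}
```

A full crawl feeds both items; a later call with since=3 feeds only item "b", because it is the only one modified after that point.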


Connector and Crawling Changes

For the most part, these connectors are essentially the same as the connectors in SharePoint 2010. The connector and crawler infrastructure are the part of SharePoint 2013 taken most directly from SharePoint search, so they have the fewest changes. While few, there are still some notable changes, including some nice updates that address previous headaches:

Anonymous Crawl for HTTP

Anonymous authentication allows any user on a web site to access any public content without a user name and password challenge. SharePoint 2013 allows you to get at these web sites without associating the crawl with a user account. This is handy for general web crawling and makes the setup of web crawls simpler. SharePoint 2010 used the spsearch account to log into sites, which stymied many people trying to crawl SharePoint sites with anonymous access, public web sites, and the like. Previously there were workarounds, but they were painful. The updated functionality now offers a pain-free way to perform this task.

Asynchronous Web Part Crawl

A common way to improve performance of SharePoint sites has been to load web parts asynchronously, which dramatically speeds up the first display of the page. However, crawling these pages for search delivered incomplete information. In SharePoint 2013 search, the crawler now gets a full rendering of the page in order to index it. This doesn't work for all asynchronous pages, just for most out-of-the-box web part content. But it takes care of the vast majority of problems in this area.

Overall, the most noticeable change in content capture is Continuous Crawling. This is a new method of ensuring you have the most current data in your search index, and is available only for SharePoint content. Rather than living with a latency of several minutes and with full crawls that might take many minutes to start populating content in the index, you'll see content within seconds!

When you enable continuous crawls (using the UI shown below), a crawl schedule no longer applies: you are running crawls in parallel and the crawler gets changes from SharePoint sites every N minutes (set to 15 minutes by default, but this parameter is changeable). Continuous crawls do not stop for errors, but rather note the error and continue to crawl content. Continuous crawls can occur while other crawls (full or incremental) are active or starting, whereas incremental crawls need to wait for other incremental crawls to complete before starting. With this capability you can now keep content fresh, and won't experience mysterious delays when additional content sources are added.
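The scheduling difference between incremental and continuous crawls can be shown with a small timing model in Python. It is a sketch under simplifying assumptions, not a description of the crawler's internals: crawl durations are given up front in minutes, the function names are hypothetical, and the 15-minute interval is the default mentioned above.

```python
def incremental_start_times(interval, durations):
    """Incremental crawls are scheduled every `interval` minutes, but each one
    must wait for the previous incremental crawl to finish before starting."""
    starts = []
    previous_end = 0
    for i, duration in enumerate(durations):
        start = max(i * interval, previous_end)
        starts.append(start)
        previous_end = start + duration
    return starts


def continuous_start_times(interval, durations):
    """Continuous crawls start on schedule even if earlier crawls are still running."""
    return [i * interval for i in range(len(durations))]


# Hypothetical durations: one slow 40-minute crawl followed by two 10-minute crawls.
durations = [40, 10, 10]
incremental = incremental_start_times(15, durations)
continuous = continuous_start_times(15, durations)
```

In this model the slow first crawl pushes the later incremental crawls back to minutes 40 and 50, while the continuous crawls still start at minutes 0, 15, and 30. That delay is the kind of staleness continuous crawling is meant to avoid.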


The Taxonomy connector is new in SharePoint 2013, and you will see it at work even when you don't use it explicitly, since the term store is much more integrated with search. As you will read in other chapters of this book, you can now create entity extractors directly from term sets, set up WCM page hierarchies using the term store, define faceted navigation using taxonomies, and much more.

Building New Connectors

When you start getting into search, you quickly find that you want to get at more and more kinds of content from more and more places. Data silos are everywhere, and search lets you bridge these silos easily and securely. In order to do this, you need a connector for each content source, and many organizations have dozens of systems that require connectors well beyond what comes out of the box. Luckily there are two options: leverage a rich set of partner-built connectors or (if you are a developer) create new ones yourself.

SharePoint 2013 will still support existing protocol handlers (which are custom interfaces often written in unmanaged C++ code), using an interface used since MOSS 2003 and deprecated since SharePoint 2010. These can still be good for high performance or particular tasks. But the primary way to create connectors is through the BDC Framework, which was introduced in SharePoint 2010 as part of Business Connectivity Services (BCS). BCS is an umbrella term for a set of technologies that brings data from external systems into SharePoint Server 2013 and Office 2013 (shown in the figure below).

As with SharePoint 2010, you can make new connectors pretty simply. For systems with static schemas, straightforward security, and moderate performance needs, this is not a huge job. There are some great improvements in Business Connectivity Services as a whole. For example, there's tooling specifically to create External Content Types against OData sources, there are Representational State Transfer (REST) and Client Side Object Model (CSOM) interfaces, and External Content Types can be scoped to a single SharePoint app. Unfortunately, none of these apply to search: creating an indexing connector for search is not the same as creating an External Content Type.

The Business Data Connectivity (BDC) framework is largely the same in SharePoint 2013 as it was in SharePoint 2010 when it comes to search. There is one notable change though: Claims tokens are supported through the BDC. Previously, only Active Directory (AD)-format Access Control Lists (ACLs) were supported, which made it nearly impossible to cover some complex security scenarios. With Claims support, many of these scenarios are tractable, though still very much the domain of experts.


One warning: you shouldn't underestimate the effort involved in connector development, deployment, and maintenance. Don't fear connector development, but watch out for the classic "quicksand" trap. Too often a development project gets to basic connectivity quickly but then struggles to get security right and to achieve high performance and scale. If and when this is successful, the project is then dragged further down in troubleshooting and maintenance, since things change every time the source system changes. Plan your development carefully to avoid this trap. The best way to avoid it is to consider pre-built connectors for any complex system; that way you don't have to build your own from scratch, and you don't have to maintain it.

Changes from FAST Search for SharePoint

If you are used to FAST, there are a number of changes you will notice. These changes are all a byproduct of moving to a common, single search engine. First, there is no way to 'push' content to the index in SharePoint 2013. (With FAST, there was a mechanism called the Content API.)

There are also three connectors that you will notice are gone:
• The Lotus Notes connector, which had performance, security, and flexibility features beyond the Notes connector included with SharePoint 2010 and 2013.
• The Enterprise Web Crawler, which rendered dynamic sites, had high performance, and offered several high-end features.
• The Java Database Connectivity (JDBC) connector, which supported direct SQL access to databases.

Though these may seem like big gaps, there are ways to cover this functionality in SharePoint 2013, either with different mechanisms (many cases covered by the JDBC connector can be handled via the BDC), or with pre-built connectors from Microsoft Partners.

From Crawl to Index

Many of the most significant content capture changes you'll see with SharePoint 2013 search don't actually result from the connector and crawling components. For example, the content processing component adds some remarkable capabilities that show up to the end user looking like better content. The Indexer has lower latency and is much more robust, which is one key to continuous crawling and also alleviates many of the weird issues people encountered with crawling after outage events with SharePoint 2010 (which could cause the crawler and index to be out of sync). Additionally, improvements in schema management make mapping content much simpler with SharePoint 2013.

All of these areas are covered in other chapters of this book, but they contribute to the improvements discussed above to provide robust, scalable, and high-performance content capture. This is a great foundation to build on for any search deployment or search-based application.

Deeper Dives
• TechNet on managing continuous crawls
• MSDN on searching new content with SharePoint 2013
• Longitude Connectors Overview

SharePoint 2013 The essential guide to enterprise search 32 Chapter 3 Crawling, Connectors, and Content Processing

Content Processing

Content Processing is an essential pillar of search quality, but it is typically invisible to the end user. The development of content processing in SharePoint 2013 is focused on implementing platform-wide capabilities, and integrating and supporting built-in search-based applications such as WCM and e-discovery. In order to support the wide range of scenarios that depend on search, Microsoft provided extensibility, so that customers and partners can leverage the new search platform and hook into content processing.

The Content Processing component is brand new with SharePoint 2013. It takes content from the crawler and prepares it for indexing, as shown below. With SharePoint 2013, there is also a new Analytics Processing component that feeds information into Content Processing.

New Content Processing Subsystem, With a Heritage

To understand the changes in the content processing structure within SharePoint 2013, it is useful to look at the heritage of this release, especially for the content processing component. Those familiar with the final version of a “stand-alone” search engine offering from Microsoft, FAST Search for Internet Sites 2010 (aka FSIS), understand that FSIS was composed of three main structural components, as shown below. They were:

• Core FAST search engine (FAST ESP 5.3), in red in the figure below, which was a complete search engine, to which new components were added on the content side and query side.
• Content Transformation Services (CTS), which was responsible for content processing and ingestion and introduced the concept of processing flows. Flows are much more dynamic and expressive than the straight linear pipeline architecture found in FAST Search for SharePoint 2010.
• Interaction Management Services (IMS), which managed all query and result processing, using processing flows.


In SharePoint 2013, the underlying dataflow engine for content processing, which was first introduced as CTS, has been extended and enriched to host the content processing tasks for the entire SharePoint platform. Successful integration of a new content processing flow for search and enrichment for the whole SharePoint platform is a significant investment and engineering achievement. The benefits are potential scale out, improved management, “cloud ready” system architecture, and an improvement to Microsoft’s ability to integrate new content enrichment features inside the SharePoint platform.

New Capabilities for the IT Pro

For the IT Professional concerned with how content is processed, enriched, and made ready for search, these SharePoint 2013 content processing feature areas stand out (we cover these in more depth in the chapter on Linguistics):

• Linguistics features, in particular around phonetic search for person names, continue to improve in scope and depth. Cross-lingual name search (via People Search), for example, is a remarkable feature that makes it easy to find people (since human names are notoriously hard to spell right).
• Entity Extraction management, which was previously done via a set of separate files and ad hoc PowerShell scripting, is now moved into the Term Store. This is a big win because there is now a good UI and a robust set of tools with it.
• New format handlers implement document parsing. They replace IFilters for OOB document metadata.
• Higher throughput for Office document types and for PDF.
• Automatic content-based file format detection removes dependencies on file extensions.
• Content processing throughput and error reporting (this is tied to crawl reporting) are comprehensive and far simpler to understand.

Search analytics processing (which we cover in more depth in the chapter on Analytics) is an important new platform capability. The analytics module feeds information back into Content Processing for a variety of purposes, for example, to improve search relevance based on user behavior. Usage and search action events (document exposures and document click-throughs) are recorded into a new SharePoint 2013 analytics store. They are then processed into a form that enables search relevance to account for, for example, popular content, relevant query terms, or, in the context of recommendations, boosts for related user/related item results. This also supports search history boosts.

Hooks for the Savvy Developer

For developers familiar with the extensibility of FAST Search for SharePoint, SharePoint 2013 offers similar mechanisms. However, the content processing flow and search index are not as open as with previous FAST platforms; they are more of a streamlined and closed utility. You


will enjoy how easy it is to set up and operate these capabilities, and how little head-scratching you do in development, but you will be frustrated at how little you can get at. This is a sensible tradeoff in the context of a major platform upgrade and in accommodation of a hosted multi-tenant deployment model (O365). The capabilities and the ability to extend them are still there, but they feel limited. There are times when it takes sophistication and inventiveness to do what you want with the hooks provided.

The extension point for content processing is the Content Enrichment Web Service (CEWS). This is a new mechanism to enable content processing, called from a content processing flow at a single point, as shown below. We will cover CEWS in more depth in its own chapter, and touch on its applications in the chapter on Linguistics.

* Note: the CEWS call-out is not part of O365 and is only available for the Enterprise Edition of SharePoint 2013.

SharePoint’s management of content processing is highly scalable and streamlined. SP2013 content processing straddles the on-premise deployment of SharePoint and the deployment of SharePoint in hosted form via O365. If content enrichment beyond what is provided in SharePoint 2013 is important for your application, especially for content you already have, prepare to look for custom solutions that leverage the Content Enrichment Web Service.
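To make the idea of a single call-out point concrete, here is a minimal Python sketch of a processing flow that hands a few properties to an external enrichment function at exactly one stage and merges back only the properties it was told to accept. This is not the CEWS contract or any Microsoft API: `run_flow`, `make_callout`, `tagging_service`, and the property names are all hypothetical, chosen only to illustrate the shape of the mechanism.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]
Stage = Callable[[Item], Item]


def parse(item: Item) -> Item:
    """Built-in stage: collapse whitespace in the raw text to form the body."""
    return {**item, "body": " ".join(item["raw"].split())}


def count_words(item: Item) -> Item:
    """Built-in stage that runs after the call-out."""
    return {**item, "wordcount": str(len(item["body"].split()))}


def make_callout(service: Callable[[Item], Item],
                 inputs: List[str], outputs: List[str]) -> Stage:
    """Wrap an external service as one stage of the flow (hypothetical design)."""
    def callout(item: Item) -> Item:
        # Send only the declared input properties to the service.
        request = {k: item[k] for k in inputs if k in item}
        response = service(request)
        # Merge back only the declared output properties; ignore anything else.
        accepted = {k: v for k, v in response.items() if k in outputs}
        return {**item, **accepted}
    return callout


def run_flow(item: Item, stages: List[Stage]) -> Item:
    """Run the stages in order, passing each result to the next stage."""
    for stage in stages:
        item = stage(item)
    return item


def tagging_service(request: Item) -> Item:
    """Stand-in for a custom enrichment service; it also tries to change 'body'."""
    department = "finance" if "invoice" in request.get("body", "").lower() else "general"
    return {"department": department, "body": "tampered"}


flow = [parse, make_callout(tagging_service, ["body"], ["department"]), count_words]
result = run_flow({"raw": "  Invoice 42   for Contoso "}, flow)
```

In this sketch the service can add a `department` property but cannot overwrite `body`, which mirrors the “streamlined and closed” character described above: the flow stays in control and the extension sees only what it is given.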

Deeper Dives
• MSDN Section on Content Enrichment Web Service (CEWS)
• TechNet content processing description


Linguistics Processing

Linguistic processing, which aims to leverage the meaning of documents or words, is the ‘special sauce’ of search, and one of the most mysterious and difficult to understand areas. Human language is a tricky thing, and algorithms aimed at understanding it are complex and imperfect, yet this is what makes it seem like ‘search just works’ for end users. Linguistic tools, such as spellchecking of queries or grammatical normalizing of content or queries, can greatly simplify users’ search experience. Covering the wide variety of languages (SharePoint 2013 search covers 85 languages, from Afrikaans to Zulu) also means that you can find content that is generated by users from across geographic boundaries.

Linguistic processing is applied to both content and queries (as shown above), using a similar framework under the hood. As mentioned in other chapters, the content processing and query processing components have a heritage from modules called CTS and IMS, and they share an underlying framework for processing flows.

In preparing content for indexing, linguistics are applied in stages, each one building on the previous one. The figure below gives an overview of these steps in what is often called the ‘pipeline’. (The steps in gray are not OOB, but illustrate some of what is possible by adding third-party components.)

First, files must be parsed, teasing the indexable text out of PowerPoint, OneNote, PDF, etc. During this process the language is detected, since processing English is different from processing Japanese. Words and patterns (dates, times, URLs, etc.) are found, based on the text and language.

Next, the ‘magic’ begins: a variety of types of Text Analytics technology is then applied. Stemming or lemmatization (which allows forms of the same base word to be matched, for example “sing”, “singing”, “sung”, or “incorporate” and “incorporating”), synonyms (matching, for example, “car” and “auto”), and concept detection of various forms deal with the wide variety of ways humans say essentially the same thing. Entity extraction, which is a key linguistic capability for SharePoint 2013 search, and techniques like categorization, relationship extraction, and sentiment analysis add metadata


that greatly improves the ability to find and explore information.

Microsoft has the deepest natural language processing development capability on earth, because it has labs around the planet. This was strengthened with the FAST acquisition, since one of FAST’s specialties was linguistics applied to search. Strong language processing features show up in SharePoint 2013 search, which has continued a tradition of steady improvement in this area and has some extremely strong linguistic technology, including many improvements from SharePoint 2010. Some of the changes will be directly apparent to the end user, but many of them show up in subtle ways, and some are only relevant to specialists handling unusual situations. For those coming from SharePoint 2010 search, there are some remarkable new capabilities and improvements. For those coming from a FAST based platform, the capabilities are familiar, but are now much easier to work with. There are some capabilities you are used to from FAST which are no longer there; we mention the major ones as we cover each area.

There are some changes in SharePoint 2013 that will be noticeable to nearly all search deployments: document parsing is foremost, but also synonym management and custom entity extractors. Some changes will only be apparent or available to those extending search, and some will be visible only to a specialized group of deployments.

Changes in Document Parsing

SharePoint 2013 introduces a completely new document parsing facility, with some big improvements. These changes include:

• Automatic file format detection no longer relies on file extensions, eliminating the kind of errors that happened when users or applications did creative things like making .memo files.
• “Deep link extraction” works like a table of contents generator and allows you to click into previews for Word and PowerPoint formats.
• Metadata extraction for titles, authors, and dates provides better metadata and is much easier to understand than the techniques used in SharePoint 2010 (where “Optimistic Title extraction” was one of the top sources of user confusion).
• High-performance format handlers for HTML, DOCX, PPTX, TXT, Image, XML and PDF formats mean faster crawls and indexing.

The new parsing facility is enabled by default and supports 55 of the most common file formats, including things like Montage, Visio and OneNote. By comparison, the 2010 Microsoft Filter Pack supported 15 formats, and the Advanced Filter Pack (available for FAST only) supported 422. For most deployments, this means you will no longer have to seek out third party IFilters, though the IFilter API is still supported and there is a rich assortment of IFilters on the market that cover file types beyond the OOB 55.

Other Changes You’ll Notice

Language detection has changed with SharePoint 2013. In SharePoint 2010, language


detection was done ‘chunk wise’ on document parts like paragraphs. Now a much larger part of the document is used. The advantage of this is that language detection is generally better: the more language you can look at, the more reliably you can tell what language it’s in. There is a downside to this approach, however: documents that have mixed languages (partly in English and partly in French, for example) aren’t handled as well.

The Term Store (MMS) is well integrated with search now, which provides a number of big benefits. Customizations to Query Spelling Correction are now managed in the term store, both inclusions and exclusions (shown below).

Property Extraction (previously a FAST-only feature) is also manageable in the term store (shown below). However, only company names are available; if you were using property extraction for people names or place names, you’ll need to find a third-party alternative. Some things are still managed outside of the term store: Synonyms via a UI or PowerShell, Custom Extractors via PowerShell, and spell correction via a dynamic dictionary based on content in the index or a static OOB dictionary.

Offensive Content Filtering was a feature that could be enabled in FAST Search for SharePoint. This feature made it easy to shield users from the obscenities and profane language that are found in content (even business content) remarkably often. However, it is no longer supported with SharePoint 2013, so you’ll need to find a third-party alternative if this is important to you. Substring search, another FAST-only feature, was also removed. This provided n-gram matching without taking into consideration word boundaries, which was good for applications like part numbers.

Changes in Extensibility

There are notable changes in how you can extend linguistics processing with SharePoint 2013. These include:

Custom Extractors (previously FAST only) are more powerful, and you can have more of them (12 rather than the five allowed with FAST Search for SharePoint). These allow you to provide a list of terms (via PowerShell) and match them in the content, populating managed properties with consistent metadata, which is the lifeblood of information discovery.

Custom Word-Breaking now requires only one language-independent dictionary, rather than the one-dictionary-per-language facility in SharePoint 2010.
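The custom extractor idea described above (supply a list of terms, match them in content, and populate a property with consistent values) can be sketched in a few lines of Python. This is an illustration of dictionary-based extraction in general, not SharePoint’s implementation: the term list, the `ProductNames` property name, and the whole-word, case-insensitive matching rule are assumptions made for the example.

```python
import re
from typing import Dict, List


def build_matcher(terms: List[str]) -> "re.Pattern[str]":
    """Compile a whole-word, case-insensitive matcher for a list of terms."""
    # Try longer terms first so a multi-word term wins over its own prefix.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def extract(text: str, terms: List[str]) -> List[str]:
    """Return the dictionary terms found in text, using the dictionary spelling."""
    canonical = {term.lower(): term for term in terms}
    found: List[str] = []
    for match in build_matcher(terms).finditer(text):
        term = canonical[match.group(0).lower()]
        if term not in found:          # keep first-seen order, no duplicates
            found.append(term)
    return found


# Hypothetical dictionary and property name, for illustration only.
PRODUCT_TERMS = ["Longitude", "AptivRank", "SharePoint"]
doc = "Our sharepoint farm uses Longitude connectors; LONGITUDE handles security."
managed_properties: Dict[str, List[str]] = {"ProductNames": extract(doc, PRODUCT_TERMS)}
```

Normalizing every hit to the dictionary spelling is what gives the “consistent metadata” mentioned above: “sharepoint” and “LONGITUDE” in the text both end up as the canonical values, so refiners and filters built on the property do not fragment by casing.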


Customized stemming (done via registry settings in SharePoint 2010) is no longer supported. Third party specialists will find ways to customize this level of linguistics and handle specialized cases.

The biggest change is the availability of the Content Enrichment Web Service (CEWS). This provides a way to add linguistic processing of any type, such as the examples in gray in the pipeline figure above (concept extraction, relationship extraction, geo-tagging, summarization, etc.). With FAST Search for SharePoint, it was possible to extend the content processing pipeline through a sandboxed application, but this was both slow and limited in the information it could access. SharePoint 2013 introduces a much more open API which makes it possible to add specialized linguistics at lower levels as well as sophisticated text analytics. CEWS is covered in more depth in a separate chapter.

Putting Linguistics to Work

All of these cool capabilities come into their own when developing more specialized search based applications. This has become much more powerful with the application development hooks and tooling available, and you should expect to see some amazing applications built on SharePoint 2013 using these capabilities.

Deeper Dives
• TechNet article on linguistic search features in SP 2013
• MSDN Section on Custom Word Breakers
• Longitude AutoClassifier Overview

CHAPTER 4 Architecture, Deployment, and Operations: Getting Under the Hood


New Architecture, Single Search Engine Core

The first and foremost change to search within SharePoint 2013 is that there is only one search engine core. The idea that you would use the FAST engine for content and the SharePoint engine for people is completely eliminated in this release. There is now only one search engine within the SharePoint 2013 platform, which you can think of as bringing FAST to all search tiers. Powerful indexing, linguistics, extraction, and query expressiveness that are the heritage of FAST are now evident throughout the platform.

To appreciate the evolution from SharePoint 2010, it’s worth looking at the history in this area. The acquisition of FAST Search and Transfer in 2008 was regarded by the industry as a major step forward in taking the lead in the enterprise search marketplace. The incorporation of FAST within the overall SharePoint 2010 architecture allowed organizations to leverage enterprise class search capabilities in a platform that was within the cost and budget requirements of today’s enterprises. Unfortunately, the acquisition occurred midway between release cycles. This forced Microsoft to determine which features would be available in the wave 14 release (SharePoint 2010), and which features would need to be included in the next release.

FAST Search for SharePoint is a very powerful product, but there are numerous rough edges due primarily to the lack of time in the previous development cycle. The timeline also resulted in a hybrid architecture, with separate SharePoint and FAST farms, as shown below. This could be awkward and confusing to work with.

With the release of SharePoint 2013, the full realization of Microsoft’s investment in FAST Search and Transfer is now evident. The capabilities now available take enterprise search to a whole new level. They are the result of a new search architecture. The architecture, shown below, is relatively simple, though much of it is new.


There is a good walkthrough of the components on TechNet, which we won’t repeat here. Each of the components shown is covered in at least one chapter of this book as well. However, before we move forward, there are a few essential things to understand:

• Search is fully integrated into SharePoint, and there is no longer a separate Search Server. Certainly, a SharePoint 2013 server or services farm can be used only for search. To do this, you do want to have the MMS (term store) and User Profile service, at minimum, much as you did in SharePoint 2010.
• There are four different databases, each independent from the others. All of them can be partitioned, mirrored, and managed. The Crawl database scales with the amount of content crawled, so this is typically the database that has multiple instances in a large search deployment.
• Every component can be scaled out for capacity and for fault tolerance. Previously, there could be only one Search administration component, which meant you had to do creative workarounds to create truly fault-tolerant configurations.
• Search is now multitenant, except for a few things, such as the CEWS API. Much more administration can be done at the site collection (or tenant) level.

Not ‘just’ a Merger of FAST and SharePoint

You can think of this architecture as bringing FAST to every tier of SharePoint, but it is much more than that. This is not a mere merging of FAST and SharePoint; nearly every component in this architecture is new. Just as SharePoint 2013 is a major architectural release overall, search is in many ways a radical re-architecture. The computational platform underlying the search based interaction for SharePoint 2013 is a powerful distributed dataflow engine (called NodeRunner).

An illustration that underscores this is shown below. This is the same architecture, though not using the official terminology. The Crawl and OOB connectors (aka crawl component) are the least changed part of search in SharePoint 2013, and they retain the mssearch.exe name under the hood. The Content Processing Framework and Interaction Management Framework (aka Query Processing Component) are running flows, similar to CTS and IMS in the FAST Search for Internet Sites 2010 product (see the Content Processing chapter). These are running under NodeRunner. So is the search core, which is neither the FAST ESP core nor the SharePoint Search core.


It’s a new, next-gen search core that was the result of a decade of research and development at FAST, hardened through the Microsoft development process.

Also new in this architecture is the Analyzer (aka Analytics Processing Component), which we cover in the chapter on Analytics. The content processing component writes information about links and URLs to the link database. In turn, the analytics processing component writes information related to the relevance of these links and URLs to the search index via the content processing component. This enables some powerful capabilities like recommendations and usage-based relevance enhancement.

If you look inside the search service, you will find several search processes. These include MSSearch.exe (for the crawl component), NodeRunner.exe (which hosts search components), and a Host Controller (a Windows Service that supervises NodeRunner processes). The Host Controller monitors NodeRunner processes, detects failures, and restarts processes if they do fail. There can be multiple NodeRunner instances on the same server, each hosting one search component. On a default single server install there will be 5 instances of the NodeRunner.exe process, as shown to the left.

Although there is a fascinating dataflow engine and a next-gen search core, those are not exposed for developers; the only points of configuration for interaction are ResultSources, QueryRules, and CEWS. In SharePoint 2013, configuration alternatives are circumscribed to ensure that no configuration would result in excessive resource consumption for that instance relative to other instances that may be running through the same service. So, QueryRules run effectively in a sandbox that restricts calls to non-SharePoint services.

Full Range of Search Topologies

When people think of architecture, some think of deployment topologies: machines, nodes, and processes. There is lots of good material on this physical architecture, which we will not repeat here. But we’ll give you a flavor.

As with SharePoint 2010, the minimum configuration is just one node, and the minimum configuration with fault tolerance is two nodes (FAST Search for SharePoint required two and four nodes respectively). Scaling from there to ultra-scale search (including the scale of O365) is possible, and you can grow incrementally.

The medium farm topology, shown below based on the TechNet recommendation at www.microsoft.com/en-us/download/details.aspx?id=30383, is capable of supporting approximately 40 million items in the index. Note


that we expect the density (items per node) of SharePoint 2013 search to go up dramatically over time, just as FAST Search for SharePoint density did. The initial focus has been on scale-out in order to support O365, not on density.

What Matters is it Works

As a developer or user you don’t really need to know about the underlying algorithms or dataflow engine used in search. In fact, the search algorithms used by almost all search cores are a complex combination of linguistics and statistics, tuned heuristically. You can enjoy the result, and learn how to operate it well, apply it to different applications, and develop on top of it.

What will you notice about this architecture? There are many things beyond the capabilities that meet the eye. For example:

• The core engine is different, so relevance is different. Since Microsoft has a lot of data with which to tune relevance, you’ll notice first that the relevance is better OOB. But if you had customized relevance or spent time focused on it, you may have some work to do, or you may have a pleasant surprise.
• Indexing is atomic in the new search core. That has some very interesting implications, but mostly you’ll notice that it’s more robust and that you can do ‘normal’ backup and restore. For nearly all search engines it’s a dirty secret that data can occasionally get lost in indexing (so one in a million items may go missing), and an outage can result in needing a full re-index, but this core will be different.
• Scale-out is possible on a huge scale: big enough to run O365, and big enough for any challenge you can throw at it. FAST was always great at large scale, but this is


a different level; there should be less black art to building out big or high-throughput systems.

Ultimately, what matters is that it works. Other than the dogfooding done at Microsoft (which is pretty big), there isn’t much production experience with SharePoint 2013 yet, but every indication is that this is an architecture that is extremely solid, for both SharePoint generally and search specifically.

Deeper Dives
• SharePoint 2013: Search Logical Architecture
• TechNet Search technical diagrams
• TechNet on Planning for SharePoint 2013

Indexing and Partitions

In SharePoint 2013, there is a brand new search indexing core that is optimized for high volume throughput and overall scalability. The index component is the core of search; it accepts and administers both content and queries. Content data is indexed and stored in index partitions while the index component simultaneously handles queries and generates results.

Like many other features of SharePoint 2013, the Index Component and related architecture resembles FAST, with the ability to separate indexes into partitions for query loads and data volumes alike. This is a significant improvement over SharePoint 2010. The index is completely contained in these partitions and stored in the file system, without requiring a separate dip into SQL for metadata or for security entitlements. This is another huge improvement over SharePoint 2010, where the merge of results and security prevented deep refinement and also could bring performance to a snail’s pace.

Index partitions are separate, which provides a lot of flexibility. They can be stored individually on disk in a file set. Alternatively, they can be further divided into discrete sections containing a unique index component.


Microsoft has also developed a new nomenclature to describe the structure of the index. In FAST Search for SharePoint 2010, the structure of the index and configuration was described in terms of rows and columns. Adding columns increased the amount of content you could index, and adding rows increased query volume throughput and redundancy.

In SharePoint 2013 they have now adopted a Partition/Replica model to define functions within the overall search index, as shown below. Partitions are logical divisions of the overall search index. The entire index is composed of the aggregation of all the primary replicas across the logical partitions. When content is sent to the indexing component, a transaction is generated to acknowledge receipt of the content. Each partition then indexes the content from this transaction log. Secondary replicas are created as read only copies of the primary replica for scaling query volume or adding redundancy to the overall architecture.

Within a partition, there is only one primary replica that is responsible for writing data in the partition. Each partition can be served by one or more replicas of the index. The indexing component is responsible for managing and distributing the index across partitions. If an additional partition is added, the indexing component is responsible for the re-distribution of data across all the partitions. It is important to note that you can add additional partitions without re-indexing the data, but removal of a partition will force a complete re-indexing of all content.

A Simpler, More Robust Approach

This new structure of the search index in SharePoint 2013 allows for a fully redundant, scalable means of indexing content. The fact that you are not copying index files from server to server and row to row means there is considerably lower latency in making search indexes replicated and available. This also significantly reduces the server to server chatter that existed in previous versions. Each partition operates independently, thereby increasing throughput and performance of the overall search sub-system.

In a nutshell, the benefits of this approach are:

1 Better indexing throughput

2 Less network chatter

3 Faster availability of the search index.

As previously mentioned, the indexer is now atomic, which is a major breakthrough in search technology. Though the change itself is invisible to you, you’ll notice that it’s more robust and that you can do ‘normal’ backup and restore. Indexing and partitioning are deep stuff, and this is a new core capability done well.
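The Partition/Replica model described above can be pictured with a small toy model in Python. This is a sketch under stated assumptions, not how the SharePoint index component is built: the class names are invented, and the routing rule (a stable hash of the item id modulo the number of partitions) is an assumption, since the text does not say how items are assigned to partitions.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    primary: bool
    docs: Dict[str, str] = field(default_factory=dict)


@dataclass
class Partition:
    replicas: List[Replica]

    @property
    def primary(self) -> Replica:
        return next(r for r in self.replicas if r.primary)

    def write(self, doc_id: str, text: str) -> None:
        # Only the primary replica accepts the write ...
        self.primary.docs[doc_id] = text
        # ... and secondaries receive read-only copies of it.
        for replica in self.replicas:
            if not replica.primary:
                replica.docs[doc_id] = text


def new_partition(replica_count: int) -> Partition:
    return Partition([Replica(primary=(i == 0)) for i in range(replica_count)])


class Index:
    def __init__(self, partitions: int, replicas: int) -> None:
        self.replica_count = replicas
        self.partitions = [new_partition(replicas) for _ in range(partitions)]

    def _route(self, doc_id: str) -> Partition:
        # Assumed routing rule: stable hash of the id modulo the partition count.
        return self.partitions[zlib.crc32(doc_id.encode()) % len(self.partitions)]

    def add(self, doc_id: str, text: str) -> None:
        self._route(doc_id).write(doc_id, text)

    def add_partition(self) -> None:
        # Redistribute what is already indexed; the sources are not crawled again.
        items = [(k, v) for p in self.partitions for k, v in p.primary.docs.items()]
        count = len(self.partitions) + 1
        self.partitions = [new_partition(self.replica_count) for _ in range(count)]
        for doc_id, text in items:
            self.add(doc_id, text)

    def search(self, word: str) -> List[str]:
        # Fan out to one replica per partition (here simply the last) and merge.
        hits: List[str] = []
        for partition in self.partitions:
            replica = partition.replicas[-1]
            hits += [d for d, t in replica.docs.items() if word in t.split()]
        return sorted(hits)


index = Index(partitions=2, replicas=2)
for i in range(10):
    index.add(f"doc{i}", "quarterly report" if i % 2 else "meeting notes")
before = index.search("report")
index.add_partition()             # grows from two partitions to three
after = index.search("report")    # same hits, now spread over three partitions
```

The toy shows the two properties the text emphasizes: the full index is the union of the primary replicas, and adding a partition moves existing items around without going back to the content sources. It does not model the transaction log or partition removal.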


Analytics

Analytics are an often overlooked area, but have a crucial role in search, both in providing insight into user behavior and system operations, and in improving the user experience. SharePoint 2013 has a new analytics architecture, which merges web analytics (where people click and navigate) and search analytics (what people search for and what results they get). This is a great improvement over SharePoint 2010, where the web analytics service application was quite limited in both capability and scale. The result is called the Web Analytics Platform, which has been completely redesigned and integrated into the search service application of SharePoint 2013.

The analytics architecture consists of the analytics processing component, analytics reporting database and link database (as shown below). The analytics processing component analyzes crawled items (search analytics) and how users interact with search results (usage analytics). It uses the information to improve search relevance, and to create search reports, recommendations, and deep links.

The Analytics Processing Component extracts two kinds of information:

• Search analytics information such as links, anchor text, information related to people, metadata, etc. from items that it receives via the content processing component. It stores this information in the link database.
• Usage analytics information, such as the number of times an item is viewed, from the front-end via the event store.

The analytics processing component analyzes both types of information. The results are then returned to the content processing component to be included in the search index. Results from usage analytics are also stored in the analytics reporting database for reporting purposes. The analytics component updates the SharePoint search index at time intervals set via a timer job, so it is independent of the crawl schedule. This can be confusing if you are trying to understand why search relevance changed. There is an extension point for custom events, but the analytics processing and search index update data flows are sealed from enrichment updates outside the SharePoint 2013 crawl.

The results are most visible to the user as reports and recommendations. But there are several other ways that analytics shows up:

• Search relevance is enhanced based on user behavior (views, click-throughs, etc.)
• Popularity of content and of topics in discussion threads, which is driven by the number of views as well as the number of unique viewers, can be viewed directly


• Popularity can also be used to create in the event store within the Web Front End views through the Content by Search (WFE) server and are regularly pushed to (CBS) Web Part the analytics processing component where the actions are analyzed and reconciled. They Usage analytics in WCM are particularly are then pushed into the analytics reporting important, since they provide essential insight database and made available to the query into the effectiveness of your web site. These and processing components. This allows analytics are search driven, built to scale (scaling for search to keep track of user actions, was a weakness in SharePoint 2010), and queries, and trends to provide the user open for extension. A “Top Pages” web part is with better search results and suggestions. included by default. Some data like view counts This database now powers features such as are also pushed into the index so it can be personal and engine-wide query suggestions, included in search results, sorted on (i.e. what’s favorites, and other search personalization most viewed), etc. components not found in any other enterprise Personalized search queries and personal query search platform today. suggestions in SharePoint 2013 are based on Within the analytics system, there are five parts: analytics data and usage information for each user. Recommendations (both item-to-item • Event: Each item comes into the system and popularity based) are available through this as an event with certain parameters approach, as shown below. The “recommended • Filtering and Normalization: Each for you” list is simply a preconfigured Content event is looked at for special handling, by Search web part — it looks like a static list normalization, and filtering; some are but it’s generated dynamically by search. filtered out

The addition of both the Link database and • Custom Events: You can configure up the Analytics Reporting database provide for to 12 custom events in addition to what a great deal more personalization, analysis, comes OOB and relevancy within the engine. The Analytics • Calculation: Sum or average across events reporting database has been added to keep • Reports: A number of default reports track of all forms of analytics. Search Analytics are available, including top queries, most analyze crawled items and how users interact popular documents in a library or site, and with search results. These actions are stored historic usage of an item (view counts)
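To make the five parts listed above more concrete, here is a minimal Python sketch of a usage-event pipeline: events arrive, are normalized and filtered, are summed per item, and feed a simple "most popular" report. This is an illustration only, not the SharePoint implementation or API. The names `UsageEvent`, `normalize`, `aggregate`, `top_items`, and `register_custom_events`, as well as the event type strings, are hypothetical; only the limit of 12 custom events comes from the text.

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_CUSTOM_EVENTS = 12  # limit quoted in the text; everything else here is illustrative


@dataclass(frozen=True)
class UsageEvent:
    """Event: one user action with its parameters (hypothetical shape)."""
    event_type: str
    item_url: str
    user: str


def register_custom_events(names):
    """Custom Events: accept at most MAX_CUSTOM_EVENTS additional event types."""
    names = set(names)
    if len(names) > MAX_CUSTOM_EVENTS:
        raise ValueError(f"at most {MAX_CUSTOM_EVENTS} custom events are allowed")
    return names


def normalize(event):
    """Filtering and Normalization: trim and lower-case URLs, drop empty ones."""
    url = event.item_url.strip().lower()
    if not url:
        return None  # filtered out
    return UsageEvent(event.event_type, url, event.user)


def aggregate(events, allowed_types):
    """Calculation: sum events and count unique users per item."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for raw in events:
        event = normalize(raw)
        if event is None or event.event_type not in allowed_types:
            continue
        counts[event.item_url] += 1
        users[event.item_url].add(event.user)
    return {
        url: {"views": counts[url], "unique_users": len(users[url])}
        for url in counts
    }


def top_items(stats, n=3):
    """Reports: most viewed first, ties broken by unique users, then URL."""
    ranked = sorted(
        stats,
        key=lambda url: (-stats[url]["views"], -stats[url]["unique_users"], url),
    )
    return ranked[:n]
```

A caller would collect `UsageEvent` objects from the front end, pass them to `aggregate` together with the set of event types it cares about, and hand the result to `top_items`. Counting unique users alongside raw views mirrors the popularity description earlier in this section.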


The figure below shows an overview of the data flow for usage analytics, usage events, and recommendations.

Note that SharePoint 2010 web analytics aren’t supported when running in 14 mode, so running in 14 mode means running without any analytics.

Better Analytics Mean Better Search

The quality of search results has a direct correlation to the quality of the query and the volume of information that you provide to the search engine. In SharePoint 2013, the addition of the analytics reporting database significantly increases the quality and quantity of information that is provided to the search engine. Knowledge about the person asking the question and the community asking the question greatly improves the quality of results provided by the engine, as well as improving the quality of queries the user issues.

Deeper Dives
TechNet overview of Analytics in SharePoint 2013 »

Federation and Result Sources

Federation has been present in SharePoint since Microsoft released Search Server 2008 and Service Pack 2 for MOSS 2007. In a nutshell, this is the ability to query multiple search indexes on behalf of the user and to return all of these results together in a single view. Thanks to federation, users no longer have to use multiple search centers in order to search all content accessible within their organization. Instead they can go to a single search page and get all results available in one place.
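As a rough illustration of what "querying multiple search indexes on behalf of the user" implies, the following Python sketch fans one query out to several simulated sources at the same time and waits for all of them. It is a toy model under stated assumptions, not how SharePoint issues federated queries: `query_source` and `federated_search` are hypothetical names, and the delays stand in for real response times. It shows why a federated page can be ready no sooner than its slowest source answers.

```python
import asyncio
import time


async def query_source(name, delay_s, hits):
    """Stand-in for one search index; delay_s simulates its response time."""
    await asyncio.sleep(delay_s)
    return name, hits


async def federated_search(sources):
    """Query every source concurrently; return hits grouped by source and elapsed time."""
    started = time.perf_counter()
    answers = await asyncio.gather(
        *(query_source(name, delay_s, hits) for name, delay_s, hits in sources)
    )
    elapsed = time.perf_counter() - started
    return dict(answers), elapsed


if __name__ == "__main__":
    demo_sources = [
        ("local index", 0.02, ["policy.docx"]),
        ("remote farm", 0.10, ["contract.pdf", "invoice.xlsx"]),
    ]
    grouped, seconds = asyncio.run(federated_search(demo_sources))
    for source, hits in grouped.items():
        print(source, hits)
    print(f"page ready after about {seconds:.2f}s")
```

Because the sources are awaited together, the total wait tracks the slowest source rather than the sum of all sources, which is the best case for a federated page.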


Federating or Indexing?

Whenever someone is newly introduced to federation, the immediate next questions are: how does federation relate to indexing? Why should I continue to index remote systems if I can federate them?

The truth is that indexing, if possible, is always better. If you index the content you can control relevancy, freshness, performance, faceted navigation, and filtering for the end users (among other things). When you federate across search indices, you essentially relinquish control of these and become dependent on what the other system is capable of. With federation, your page will also be as slow as the slowest search engine queried and as relevant as the weakest search engine queried. So federating results must be done carefully.

Federation has proven very useful for scenarios where indexing may not be desirable or even feasible. For instance, your content may be spread across multiple offices with low-bandwidth connections, making any remote crawl last for days. In such conditions, you would not be able to keep your index fresh enough for your end users. Another scenario is when you have so much content to index that it may not fit within a farm. Imagine, for instance, a 50,000-employee company wanting to search across SharePoint and e-mails. Even at a low estimate of 10,000 items per mailbox (that’s roughly six months for an information worker), the mailboxes alone come to 50,000 × 10,000 = 500 million items, so with SharePoint content this would represent over half a billion items to index! Finally, the remote source may not allow for crawling, technically or through license restrictions (imagine a secured deep-web content provider). In these cases federation is pretty much the only way to go.

Result Sources for Federation in SharePoint 2013

SharePoint 2013 offers improved federation capabilities via a functionality called Result Source. On top of the Open Search protocol already supported in MOSS 2007 and SharePoint 2010, you can now federate results from remote SharePoint farms via result sources. This allows SharePoint 2013 to better cover distributed organizations. A result source is quite easy to configure, as shown below.

While the options on SharePoint 2010 to provide organization-wide search were limited to a multi-search center or a published centralized search service, SharePoint 2013 lets you federate across farms. You can now have one farm per region or office location and federate results across farms using result sources. You can do the same between your intranet and extranet farms.

While simple on the surface, this functionality fills a serious gap that existed in the overall scalability of SharePoint 2010. In the marketplace, FAST and SharePoint were being criticized for not having a global systems architecture. The approach was to tell users to centrally index all content in a large central farm, if the latency allowed. For global organizations, this was often not feasible.

There are limitations to the remote result source construct. It is limited to SharePoint 2013 and requires that all federated farms be upgraded to SharePoint 2013. Results are not interleaved, which is what users typically expect; rather, they are provided in result blocks. Refiners are also not combined in any way. Overcoming these limitations is an exercise left to partners. But despite these limitations, remote result sources are a major step forward and a great feature to use.

Result sources also take over the function of scopes in SharePoint 2010. They are a more powerful tool than both scopes and federation, and are worth getting to know.

Exchange 2013 Result Source

SharePoint 2013 also allows administrators to federate results between SharePoint and Exchange, providing a unified search experience where users can search both SharePoint content and their mailboxes through a single search center without having to index the mailboxes in SharePoint. Exchange remains in control of indexing the mailboxes, and users can search across systems using federation with no additional hardware requirement. This is available because Exchange 2013 has the same underlying search core (see the Exchange Search chapter).

Security via OAuth

SharePoint 2013 can also provide security-trimmed results in a much more streamlined way. The Kerberos protocol is no longer a prerequisite to providing security-trimmed results. Instead SharePoint 2013 offers strong security support through federation by leveraging the claims-based authentication mechanism built into SharePoint 2013 or by using the single sign-on/secure store service. A trust must be established between the farms using a new method called OAuth, which allows the passing of the current user’s claims to the remote farm when making the search request. This is similar in concept to establishing a trust between servers to consume service applications. OAuth is a new methodology replacing Kerberos shared authentication.

When combining result sources and result blocks, administrators can offer their users a single list of results comprised of both local and remote results. The remote results are shown as result blocks (one per source), either above all results or merged within the local results returned. Note however that faceted navigation and property filtering are still driven by local content only and do not reflect any filters or facets available from the remote indexes.

Office 365 and SharePoint Online

Office 365 has rendered organizations more agile by enabling them to consume SharePoint as a service without having to worry about capacity, backup, or maintenance. However, it also created a new challenge, as organizations migrating to the cloud were now facing siloed data, with some content available online and some content available within the organization network only. There was no single place from which to search both sets of content. SharePoint 2013 solves this scenario by enabling Remote SharePoint result sources to also support SharePoint Online, therefore enabling scenarios where SharePoint Online can federate with the on-premises search engine or vice versa. Result sources represent a key piece of technology to help organizations migrate to SharePoint Online.

Deeper Dives
Microsoft’s comparison of indexing vs. federating »
TechNet: configuring result sources »
Federation Use Cases »
Federation vs. Indexing »

Search in Exchange

Search in Exchange 2013 has been given a facelift. Pull back the curtain, and it is the same new search core used with SharePoint 2013, optimized for large volumes of e-mail.

To provide some comparison, Microsoft Exchange Server 2010 Search allows users to perform full-text searches across documents and attachments in messages that are stored in their mailboxes. Exchange Search (also known as full-text indexing) creates the initial index by crawling all messages in mailboxes within an Exchange 2010 database. As new messages arrive, Exchange 2010 Search updates the index based on notifications from the Microsoft Exchange Information Store service.

The figure below shows an overview of the Exchange Search architecture in Exchange 2010. Full-text indexes are not stored in your Exchange databases. The search index data for a particular mailbox database is stored in a directory that resides in the same location as the database files.

In Exchange 2013, the exsearch capability is replaced with a new search engine and index. This provides a much more powerful, more effective search for Exchange users, available through Outlook and Outlook Web Access alike.

Another significant outcome of this change is that Exchange 2013 can appear as a result source to SharePoint 2013, as introduced in the chapter on Federation. This opens up a number of scenarios combining e-mail and other documents. In previous versions of SharePoint, you had the ability to connect to and index Exchange public folders but not personal inboxes. That remains the same with SharePoint 2013 (unless third-party connectors are used), but now there is an ability to federate to Exchange.


The key concept to understand in regard to this functionality is that each system handles the data resident within its silo (e-mail, tasks, and contacts in Exchange 2013; documents and lists in SharePoint 2013). As discussed in the Federation chapter, there is some downside to this approach: federation does not provide the same content processing, relevance, or performance as indexing. But this level of integration between SharePoint and Exchange is a wonderful feature that will help many users. You can get a single view across Exchange and SharePoint, as shown below.

One of the new key features in SharePoint 2013 that relies heavily upon this tight integration between SharePoint 2013 and Exchange 2013 is the new Enterprise Content Management (ECM) stack and the associated e-Discovery components. From the e-Discovery perspective, the integration of SharePoint and Exchange allows for in-place preservation of information within SharePoint and Exchange. The e-Discovery console allows for a dashboard view of integrated, enterprise-wide case management.

A Unified View is a Better View

Since the first release of SharePoint, there has always been a desire to be able to support searching your personal inbox to provide a more holistic view of your information. In previous versions of SharePoint, there was support for indexing content from Microsoft Exchange, but only in public folders. With the release of SharePoint 2013 and the fact that Exchange 2013 is using the same search infrastructure, it is now possible to provide federated access to personal inbox results within SharePoint 2013.

The primary benefits of this approach are:

1  Exchange 2013 and SharePoint 2013 leverage the same core search sub-system

2  Possible to include federated personal inbox results from Exchange 2013

3  Eliminates the need to re-index all inbox data within SharePoint 2013


Deeper Dives
TechNet: What’s new in Exchange 2013 »
Overview of eDiscovery and In-Place Holds (SharePoint 2013) »

Search Administration

There tends to be a preconception that search requires no administration. This is due in part to the simplicity of the search interface and the general lack of awareness of how search works. But it is also due to people’s experience of web search, where they don’t have to do any upkeep. Little do they realize that Google.com has over 4,000 people administering search full time!

Administering enterprise search doesn’t take that much work, but it does need to be someone’s job (even if not a full-time job). There are two main levels of administration: system administration (installation, configuration, topology management), and search administration (rules, best bets, looking for no-results searches).

Simpler Architecture Means Simpler Administration

SharePoint 2013 search is simpler to administer on many levels than SharePoint 2010 was. Part of this is that there is only one search engine core, and no hybrid architecture (see the One Search Core chapter). For FAST Search for SharePoint, you had to install two farms (a FAST farm and a SharePoint farm) and make them work together, including creating multiple search service applications. There was extra work in installation, extra work in configuration, and extra work in reconfiguring sites. There was also more troubleshooting because the architecture was more complex.

There is now only one search core, only one installation, and only one search service application. There is a much simpler architecture, as shown below. As a result, SharePoint 2013 is much simpler to install, configure, and troubleshoot.


Multiple Administration Components

As mentioned in a previous chapter, the Search Administration Component is now fault tolerant, a big advantage for SharePoint 2013. The administration database now contains only configuration and log information (it also held security entitlements in SharePoint 2010). There are new tools to export and import configuration information, including PowerShell commands, so there are some very cool things you can do in configuration management.

Administration at Multiple Levels

Central Administration is still your friend with SharePoint 2013, and still the place where you create search service applications. You will find some new service applications in SharePoint 2013 (such as the Machine Translation Service) and improvements to existing ones. But many of the operations will be familiar. The screenshot below shows an example list of service applications.

As a new thing in SharePoint 2013, you now have site collection level search administration too. It’s pretty similar to central administration, naturally with a few limitations. Site collection administrators can set up and manage App catalogs, do term store management and User Profile Management, as shown in the screenshot below. Site collection administrators also have the power to manage some search settings in their site collections, which is a huge step forward.

Search administration at this level is pretty comprehensive, as you can see just by looking at the search administration screen below (note that this is from Office 365, where it’s called tenant administration).

It is natural that this level of administration was introduced in SharePoint 2013 because of the emphasis on running multitenant in the cloud. Site collection administrators can start crawls, create result sources, and much more. This includes creating managed properties, which could only be done via central administration in SharePoint 2010, despite the fact that site collection or site administrators typically understand their content and crawled properties much better than central IT.

Site administrators also have much more power with SharePoint 2013. They cannot create managed properties, but they have significant control over search as it applies to their sites. The table below shows some examples of what Site Collection and Site Administrators can do.

Administering the New Mechanisms

In other chapters we described new mechanisms like Query Rules, Result Types, and Result Sources. These are very powerful for the administrator. A search service application administrator can create result sources, and site collection administrators, site owners, and site designers can also create and configure result sources in order to give powerful search options to their end users. Query Rules and Result Types can be managed down to the site level. These have a wizard for configuration (for example, the query builder interface) with a built-in preview of what the results look like. Result Sources are easy to manage, as shown below.

There are very significant improvements in Analytics, resulting from the new Analytics module. There are also better crawl reports and process reports (see below). Since the Host Controller (described in the One Search Core chapter) is monitoring all NodeRunner processes, it can give the administrator a lot of insight into the system operations.
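The division of search administration rights described in this section can be summarized as a small lookup structure. The Python sketch below encodes only what the prose states (it does not reproduce the table referred to above), and the role and action names are hypothetical labels chosen for illustration, not SharePoint permission names.

```python
# Capabilities taken from the prose of this section only; role and action
# names are illustrative labels, not SharePoint permission levels.
CAPABILITIES = {
    "search_service_application_admin": {
        "create_result_source",
        "create_managed_property",
        "configure_query_rules",
        "configure_result_types",
    },
    "site_collection_admin": {
        "create_result_source",
        "create_managed_property",
        "configure_query_rules",
        "configure_result_types",
    },
    "site_admin": {
        # Site administrators cannot create managed properties.
        "create_result_source",
        "configure_query_rules",
        "configure_result_types",
    },
}


def can(role, action):
    """Return True if the given role is described as able to perform the action."""
    return action in CAPABILITIES.get(role, set())


if __name__ == "__main__":
    for role in CAPABILITIES:
        print(role, "may create managed properties:", can(role, "create_managed_property"))
```

Keeping the matrix as data rather than as scattered conditionals makes it easy to review against your own farm's delegation policy, which may well differ from this reading of the text.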


PowerShell is Your Friend

PowerShell support was added to SharePoint 2010 and many administrators fell in love with it, for good reason. There are even more PowerShell options in SharePoint 2013. This includes more PowerShell commands for search: general search administration, crawling, search service application, querying, metadata, and topology. In SharePoint 2013, PowerShell can now manage content sources and crawlers, not just report status. There are new options for creating a new search topology based on an XML configuration file, along with export and import commands. This means you will be able to create the same search topology in your development, test, staging, and production environments. This can be very useful for performance testing, custom development, creating standardized configurations, etc.

Ranking models are still configured via PowerShell, as in SharePoint 2010, but in 2013 site collection administrators now have the ability to call a specific ranking model defined by the SSA admin from within query components at the site level. This means that site collection administrators can do much more with relevance control and ranking, choosing from a library created by the central administrator.

PowerShell is available at all levels (central, site collection, and site administration), which gives you much more power. For example, we can create a PowerShell script for configuring all our search settings from the very beginning: creating a search service application, modifying its settings, creating the content sources, etc. PowerShell can also retrieve, create, or modify query results. In addition, PowerShell can get keywords, modify ranking models, and more. If you haven’t learned PowerShell already, you will definitely want to learn it now.

Big Advances in Search Administration

Search administration is still a complex task in SharePoint 2013, but Microsoft made the job much easier in this new version. The new single search core provides the power of FAST with a much simpler configuration than FAST Search for SharePoint. Search topology administration is still complex, but the topology can be much bigger and much more powerful. There are improvements in administration on all sides: crawling, content processing, query processing, analytics, and user experience. This is a search that administrators can learn to love.

Deeper Dives
TechNet: Index of Windows PowerShell cmdlets for SharePoint 2013 »
TechNet: search topology in SharePoint Server 2013 »
SharePoint 2013 Developer Dashboard »
TechNet: Manage the search schema in SharePoint 2013 »
TechNet: View search diagnostics in SharePoint Server 2013 »

Upgrade and Migration

You will love the capabilities of SharePoint 2013, and you probably own them already. But how much pain is it to move to them? Many organizations endured a very painful move from SharePoint 2003 to SharePoint 2007 and are still wary, despite a generally smooth move from SharePoint 2007 to SharePoint 2010.

The good news is that this release has put a lot of focus on upgrades and there is a lot of good material. In order to move customers on O365 to the new release, Microsoft had to develop techniques for doing this more smoothly than ever before.

The bad news is that upgrades are still tricky, especially for large and highly customized SharePoint farms. Even though the upgrade itself is fairly straightforward, there are usually lots of factors besides the software itself: the hardware necessary to handle an upgrade (there are no in-place upgrades to the new version), user awareness and education, and the work needed to take advantage of new features.

There are techniques that can reduce the risk and pain of upgrades, especially for search. These include things like use of cross-version federation and ‘search-first migration’. But let’s start with a look at the standard basic upgrade.

Database Attach Upgrade The only upgrade method for going from SharePoint 2010 to SharePoint 2013 is a Database Attach Upgrade. (In-place upgrades are now only for build-to-build changes). This works for both content databases and services databases.


The search databases have changed significantly with SharePoint 2013. The search administration database supports a database attach upgrade, but the search index databases do not. As with essentially all search engines, to do an upgrade you will need to recrawl your content. One very nice advantage with SharePoint 2013 is that you can use PowerShell to make this happen with much less effort.

The Database Attach method does help a lot with search. Content sources, server mappings, schema, federated locations, scopes, best bets, and the like are all preserved and upgraded. As mentioned in the search administration chapter, there are tools for configuration import and export as well as PowerShell commands that can do very interesting things, including automating and tailoring the upgrade process.

Deferred Site Collection Upgrade

The visual upgrade available in SharePoint Server 2010 has been replaced by a deferred site collection upgrade in SharePoint 2013. This allows existing 2010 site collections to work unchanged in SharePoint 2013. No SharePoint 2010 installation is required; SharePoint 2013 has all of the required SharePoint 2010 files included.

This process is much safer, because it is deeply backwards compatible. It is the default for all site collections upon a database attach upgrade, which then automatically run in “14 mode” on SharePoint 2013 servers. There is a new facility for health checks along with the upgrade, and a cool capability to create Upgrade Evaluation Sites. Essentially, this makes a side-by-side copy of an existing site collection, and allows users to preview an existing site in “15 mode”. Deferred site collection upgrade permits use of SharePoint 2010’s UI with fewer operational hassles, while retaining the master page, JScript, SPF, and CSS applications of SharePoint 2010. This is an expensive operation, so you probably don’t want to use it everywhere, but it is a great facility to allow for safe, well-managed upgrades, both from the software perspective and the user perspective.

With Search, an upgrade of search centers generates result templates that include the hover panel, and which have previews (when a separate Office Web Apps server or set of servers is available). Scopes are upgraded but can’t be changed. They are replaced by the new result sources, but the corresponding result sources aren’t generated automatically.

Working Across Versions: Search-First Migration

Since search is greatly improved in SharePoint 2013, it may be worth considering a “search first” upgrade. This lets you get the benefit of the new features and capabilities without needing to do everything at once. You can upgrade your content farms at any pace that works for you, while serving everything from SharePoint 2013 search.

This pattern uses something called SharePoint 2013 Federated Services. Only a few federated services support this: Search, Profile, Social, Secure Store, Managed Metadata, and BCS. But this is everything you need to do a search-first migration.


There are several steps to a search-first migration, as shown below.

Sequentially the steps are as follows:

1  Deploy and configure a new SharePoint 2013 services farm, including a search center. Migrate the search settings from the SharePoint 2010 farm. When the search-first migration is complete, this farm provides search functionality to end users who are still working in the SharePoint Server 2010 farm.

2  Crawl all content in the SharePoint Server 2010 farm by using a crawler (or multiple crawlers) in the SharePoint 2013 services farm. Continue to crawl this content regularly.

3  Configure the SharePoint Server 2010 farm to consume search from the 2013 services farm, using federated services. Some things are best handled with redirects (for example, using a new search center with the new functionality can’t be done via federated services).

The search-first migration pattern opens the door to a much wider set of possibilities: hybrid solutions.

Hybrid Solutions

When you talk about the upgrade of search from SharePoint 2010 to SharePoint 2013, there is the potential for some hybrid solutions using different versions of SharePoint or using cloud and on-premises SharePoint instances in the same company.

Generally, hybrid means a combination of on-prem and cloud content in a single view. There are several ways to accomplish this, including indexing and federation, as mentioned in the Federation chapter. The figure below illustrates the various permutations of hybrid configurations.


Crawling and indexing content from the cloud (such as from O365) is a very solid way to create a unified view, and has the benefits that indexing generally has: unified content processing, solid and consistent relevance and navigation, and consistently fast performance. Although this scenario is not supported by OOB connectors with SharePoint 2013, there are partner-built connectors that accommodate it.

With SharePoint 2013, the remote result source construct means that a view can be created using federation, specifically between O365 and on-prem SharePoint. There are limitations to the remote result source construct. It is limited to SharePoint 2013 and requires that all federated farms be upgraded to SharePoint 2013. Results are not interleaved, which is what users typically expect; rather, they are provided in result blocks. And refiners are not combined in any way. Overcoming these limitations is an exercise left to partners. But despite these limitations, remote result sources are a major step forward and a great feature to use.

Cross-version Configurations

How do these scenarios help with upgrade and migration? If you extend them to cross-version configurations, it becomes clear. Search-first migration is an example of crawling on-prem content from on-prem search (the upper left scenario in the figure above), but across versions. By crawling SharePoint 2010 content from SharePoint 2013 search, you can provide an upgrade path that can be done a step at a time, maximizing the benefit to users while minimizing initial effort.

The same idea applies to more general scenarios. When you have more than one SharePoint farm, you can handle cross-version scenarios. You can have search on SharePoint 2013 while you have content and other applications on SharePoint 2010 or even SharePoint 2007. You can have SharePoint 2013 in the cloud with SharePoint 2010 on-prem. You can include other content in the cloud that should be crawled, such as Microsoft CRM Online or SalesForce.com. With these techniques, it’s possible to field a very broad search solution with different versions and different options. This helps with many things, including migration.

Federation applies well to cross-version scenarios. Although SharePoint 2013 only supports same-version remote result sources, it is feasible for partners to create federation across multiple versions, which appears as a result source. A configuration like the one shown below provides many benefits. With respect to upgrades and migration, it means that legacy search systems can be left in place and federated into a SharePoint 2013 search center. While this is not as good as combining all content into a common index, it is a very useful technique that allows you to upgrade or migrate complex systems a piece at a time.
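The limitations of remote result sources noted in this section (results arrive as blocks rather than interleaved, and refiners are not combined) can be pictured with a short Python sketch. This is a conceptual model only, not SharePoint code: `build_results_page`, its parameters, and the dictionary layout are hypothetical. It places each remote source as one block at a chosen position among the local hits and builds refiners from local content alone.

```python
def build_results_page(local_hits, remote_blocks, local_refiners, block_rank=0):
    """Assemble one result page from local hits and remote result blocks.

    local_hits:     ranked list of local result titles
    remote_blocks:  mapping of remote source name -> list of result titles
    local_refiners: mapping of refiner name -> values, from local content only
    block_rank:     position among local hits where the blocks are placed
                    (0 puts the blocks above all local results)
    """
    page = [{"kind": "hit", "source": "local", "title": title} for title in local_hits]
    # One block per remote source; remote hits are never mixed individually
    # into the local ranking.
    blocks = [
        {"kind": "block", "source": source, "titles": list(titles)}
        for source, titles in remote_blocks.items()
        if titles
    ]
    position = max(0, min(block_rank, len(page)))
    page[position:position] = blocks
    # Refiners reflect local content only; nothing is merged from remote sources.
    return {"results": page, "refiners": dict(local_refiners)}


if __name__ == "__main__":
    page = build_results_page(
        local_hits=["Travel policy", "Expense form"],
        remote_blocks={"Remote farm": ["Regional travel FAQ"], "Legacy search": []},
        local_refiners={"FileType": ["docx", "pdf"]},
    )
    for entry in page["results"]:
        print(entry)
```

A partner solution that interleaves results or merges refiners would have to replace both the block placement and the refiner step, which gives a sense of why the text calls these limitations an exercise left to partners.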


Cross-version Hybrid Configurations

A variant of this is support of cross-version hybrid configurations. Specifically, you may wish to adopt SharePoint 2013 online via O365 while your on-prem farms remain on an earlier version. You may not actually have a choice in the matter, since O365 will shift to SharePoint 2013 fairly quickly, faster than you may be ready to upgrade your on-prem SharePoint farms. But the remote result source mechanism in SharePoint 2013 is very powerful, and has solved many of the toughest aspects of managing hybrid configurations with O365, such as security and single sign-on. It is feasible (though not OOB) to apply this to a cross-version hybrid configuration as shown below.

Migration From Previous Search Versions

The changes in SharePoint 2013 search are powerful and far reaching. Fielding a single new search core resolves a historical challenge with Microsoft search (a complex product lineup with many different versions). One tricky aspect of this change is that migrating from previous search versions depends on the flavor of search you are migrating from.

The case of migrating from SharePoint 2010 search to SharePoint 2013 search is the best supported one. There are some gotchas in this migration, as mentioned throughout this e-book, but the process is generally smooth and well covered by Microsoft. Going from SharePoint 2010 search to SharePoint 2013 is a step up in nearly every way, so there aren’t that many rough spots to consider. If you are migrating from FAST Search for SharePoint, many of the same tools and techniques apply, but there are more corner cases and more feature changes to consider.

If you are moving from FAST ESP or FAST Search for Internet Sites, there are significantly more considerations. The migration patterns and techniques still apply, but you are more likely to have a heavily customized search deployment that uses special FAST features which have been supplanted by other mechanisms. There is help available, however. Microsoft has a big ecosystem of partners, and there are some that have specific focus, tools, and techniques for this kind of migration. You may not get direct support from Microsoft, but you can tap into this ecosystem for help.

Summary: Options for Upgrade and Migration

Upgrading to SharePoint 2013 can be seamless, and there are valuable tools and processes provided OOB. The only supported approach is a database attach upgrade, so you should expect to provide extra hardware resources for your upgrade. But the deferred site collection upgrade facility provides a safe approach to upgrades and lets you delegate the work for each site collection to the appropriate owner if you like.

Upgrading search is part of upgrading SharePoint, and the standard upgrade process from SharePoint 2010 to SharePoint 2013 covers search well. But search poses special challenges — the more complex and customized your search configuration is, the more challenging the upgrade will be.

Search also offers solutions to many upgrade challenges for SharePoint as a whole. Since search bridges information silos, it can bridge across different farms, across different versions, and across on-prem and in the cloud instances. Techniques such as search-first migration, crawling remote farms from SharePoint 2013, and use of federation are available — not OOB but through Microsoft’s ecosystem. A unified view across these different dimensions provides users a great experience while allowing you to upgrade or migrate one piece at a time.

Deeper Dives
• Services upgrade overview for SharePoint Server 2013
• SharePoint Online administration
• BA Insight resources for integrating O365 content
• TechNet: SharePoint Server 2010 deprecated search features
• TechNet: FAST Search Server 2010 for SharePoint deprecated features

CHAPTER 5 Applications and Development: New Models for Search-Based Applications

The New Development Model in SharePoint 2013

With a new development model for SharePoint 2013 and for search, the capability to extend search is much more accessible. We think this development will foster a lot of exciting search-based applications.

Development with SharePoint 2013 emphasizes standard web technologies such as JavaScript and HTML, client-side programming, and remote calls. There’s a focus on running applications in the cloud, and there are several options for extending the out-of-the-box capabilities of the product. There is also the option to build business solutions with no or minimal use of server-side code.

JavaScript and modern HTML and CSS know-how are important for the UI designer and developer on SharePoint 2013. It should be easier for designers to use tools they are familiar with. Visual Studio 2012 offers strong tooling for both Office 2013 apps and SharePoint 2013 applications and solutions. A key goal of SharePoint 2013 for customization scenarios was to make developing applications for SharePoint much more like developing Facebook apps.

A New Programming Model

The figure below gives a bird’s-eye view of the changes between the SharePoint 2010 and SharePoint 2013 programming models. In SharePoint 2010, your custom code ran either server-side in SharePoint (as fully trusted code or in a sandboxed solution), or via a Client Side Object Model (CSOM). The SharePoint 2010 CSOM is a Windows Communication Foundation (WCF) service with three different proxies to enable Silverlight, JavaScript, and .NET managed clients to call into SharePoint remotely. With SharePoint 2013, the server-side code runs off the SharePoint server farm via declarative hooks like apps, declarative workflow, and remote events, which then communicate back to SharePoint using CSOM or REST.

There are lots of advantages to this model. Traditional SharePoint development was heavy lifting and had a steep learning curve; the new SharePoint 2013 model is much more manageable, which will open up SharePoint to a much wider audience of developers. Server-side code can impact the performance of SharePoint, be complex to install and upgrade, and can’t be run on public cloud services.

The CSOM in SharePoint 2013 is much more powerful: you can do almost everything the server-side APIs did in SharePoint 2010. In addition, it supports OData, now the leading industry protocol for performing CRUD (Create, Read, Update and Delete) operations against data, as shown below. Depending on your deployment scenario, you can still use sandbox and farm solutions to push server-side code to SharePoint 2013; however, Microsoft recommends that developers follow the new app model as the preferred way of building their custom applications for SharePoint 2013. The message is “don’t make any new sandbox solutions” and “build new farm solutions only if you absolutely have to”.

Apps for SharePoint

There’s a new way of packaging and deploying code in SharePoint 2013 which is aimed at development of lightweight apps. Apps for SharePoint don’t live in SharePoint. They execute in the browser client or on a remote web server; they’re granted permission into SharePoint sites via OAuth (a standard for providing delegated authorization to apps); they communicate over the new SharePoint 2013 CSOM APIs. There are three types of apps you can build for SharePoint 2013:
1. SharePoint-Hosted App (which runs within the browser)
2. Provider-Hosted App (which runs on another web server in the datacenter or cloud)
3. Azure Auto-Hosted App (which runs in an Azure instance that is invisibly provisioned by Office 365)

Apps are simple and powerful, but they have a number of limitations, and there are still many cases where SharePoint solutions are called for instead. Anything that uses server-side code, does farm-level work, has a high level of complexity, or has installation coupling or dependencies calls for a SharePoint solution rather than a SharePoint App.

What’s Special for Search

The SharePoint 2013 Search CSOM opens most (but not all) of the Query object model functionality for online, on-premises, and mobile development; the search results data is in JavaScript Object Notation (JSON). Queries support two language syntaxes: KQL (Keyword Query Language) and FQL (FAST Query Language); SQL is no longer supported.

In addition to the CSOM, there is a REST (Representational State Transfer) service, so you can remotely execute queries against the SharePoint 2013 Search service from client applications by using any technology that supports REST web requests. The Search REST service exposes two endpoints, query and suggest, and supports both GET and POST operations. Results are returned in either XML or JSON format.

At one level, a search app is just another SharePoint app, and a search solution is just another SharePoint solution. This is revolutionary enough: it means you can use search via a REST interface, include it in an Office App, and use it easily in combination with other parts of SharePoint. But customizing search also means creating or customizing connectors using BCS or a protocol handler (see the content capture chapter), customizing linguistics using the Content Enrichment Web Service (CEWS) (see the content processing, linguistics, and CEWS chapters), working with other service applications, and more. There are numerous search-specific web parts, including the new Content by Search web part (shown below), which is a powerful “swiss-army knife” tool.

Search combines well with other parts of SharePoint: with content management, with workflows, with BI, and with sites. It can also be used with several of the new service applications in SharePoint 2013. These include the Machine Translation Service, which supports automated language translation of files (think multilingual search), and the Work Management Service, which provides task aggregation functionality.

If you are doing query-side-only work, you might be able to use an app model. But for the most part, developing sophisticated search-based applications will remain the domain of SharePoint solutions with SharePoint 2013. There are several things (connectors and pipeline extensibility) which are still per SSA and not per tenant.

Building Search-based Applications

SharePoint 2013 is a great platform for building search-based applications. These run a wide gamut, from configuring departmental centers using query rules and result blocks, through extending content processing to add domain-specific entity extraction, to creating brand new user experiences. They are often specific to role, industry, and topic, and they usually have a strong and measurable business value because the end users use them for specific purposes. Wherever there is an identifiable group with a need to work with unstructured content (or a mix of structured and unstructured content) there’s a need for a search-based application. We discuss this more in the chapter on Search-Based Applications.

Nothing is perfect, and there are still challenges with the search development model. Some of the limits of SharePoint 2013 include:
• On the content side, there is no ‘push’ API for content, nor an ability to do partial updates. Continuous crawls are limited to SharePoint content only. There’s no mechanism for getting external data indexed into O365. Developers that want to approximate these from the outside have to live with limited performance and build very complex structures or use third-party frameworks.
• Many of the mechanisms inside search are sealed and can’t be extended. Update groups, query flows, analytics processing, and web crawling are examples. It’s completely understandable that these be kept intact from meddling developers, and some of them can be influenced safely using partner products. But it’s frustrating to see these mechanisms and not be able to touch them.
• The SharePoint App model and SharePoint Marketplace are aimed at lightweight, simple apps and not something you would use for a full business application today.

Developing with search is still hard. Intrinsically, areas like content processing and relevance are imperfect, since we’re dealing with human language and subjective opinions of the ‘right’ answer. There are no joins or aggregation internal to search, so there are limits to combining structured and unstructured content. But SharePoint 2013 is far ahead of any search platform in terms of available capabilities, performance, ease, and safety of development. And there is a strong ecosystem with available building blocks and complementary capabilities to use in creating great applications with search.

Deeper Dives
• Book chapter on developing with search from Wrox “Professional SharePoint 2010 Development”
• MSDN overview on developing with SharePoint 2013
• MSDN section on building search queries with SharePoint 2013
• SharePoint 2013 Developer Dashboard
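The Search REST service described under “What’s Special for Search” can be illustrated with a small sketch that only builds request URLs for the query and suggest endpoints; it does not contact a server. The host name, the helper name build_search_url, and the sample KQL are assumptions made for illustration. The /_api/search path and the querytext, selectproperties, and rowlimit parameters follow the documented REST interface, but check them against the MSDN reference before relying on them.

```python
from urllib.parse import quote


def build_search_url(site_url, kql, select=None, row_limit=10, endpoint="query"):
    """Build a GET URL for the SharePoint 2013 Search REST service.

    site_url and kql are caller-supplied; nothing is sent over the network.
    """
    if endpoint not in ("query", "suggest"):
        raise ValueError("endpoint must be 'query' or 'suggest'")

    def literal(text):
        # String parameters are wrapped in single quotes; embedded single
        # quotes are doubled (OData convention), then percent-encoded.
        return "'" + quote(text.replace("'", "''"), safe="") + "'"

    params = ["querytext=" + literal(kql)]
    if endpoint == "query":
        if select:
            params.append("selectproperties=" + literal(",".join(select)))
        params.append("rowlimit=" + str(int(row_limit)))
    return site_url.rstrip("/") + "/_api/search/" + endpoint + "?" + "&".join(params)


# Hypothetical site URL and KQL query, for illustration only.
print(build_search_url("https://contoso.example/sites/search",
                       'author:"Jeff Fried" sharepoint',
                       select=["Title", "Path"]))
```

To run a query for real, a client would send a GET request to the resulting URL with suitable authentication and an Accept header requesting JSON (for example application/json;odata=verbose); that part is left out here because it depends on the deployment.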


The Content Enrichment Web Service (CEWS)

As mentioned in the Content Processing chapter, one of the biggest changes in SharePoint 2013 is the availability of the Content Enrichment Web Service (CEWS). This provides a way to add linguistic processing of any type, such as concept extraction, relationship extraction, geo-tagging, summarization, etc. With FAST Search for SharePoint, it was possible to extend the content processing pipeline through a sandboxed application, but this was both slow and limited in the information it could access. SharePoint 2013 introduces a much more open API which makes it possible to add specialized linguistics at lower levels as well as sophisticated text analytics.

CEWS is the key extension point for content processing. In fact, it is the only extension point outside of changing the content or modifying it in a custom connector. There is no Content API in SharePoint 2013 for updating metadata in the search index independent of a crawl.

CEWS calls an external web service using SOAP via a proxy, as shown below.

Big Changes in Pipeline Extensibility

CEWS replaces FAST Search for SharePoint’s pipeline extensibility stage, which had a number of shortcomings. With FAST Search for SharePoint, an executable was run within a sandbox near the end of the content processing pipeline; this was a major performance bottleneck and deployment headache. Some crawled properties were available, but derived properties were not. No properties could be modified: you could return things in new properties, but only to a limited extent. And the executable was called for all content, so filtering logic was needed outside the pipeline, and the performance penalty of calling it was incurred for all content.

With SharePoint 2013, things have changed dramatically. Using a web service callout opens up many options and removes some of the difficulties in writing pipeline extension stages. The processing pipeline passes designated managed properties (including document text) to the remote service. There are hidden and read-only properties, but some managed properties (like Title) can be modified.

The mechanism for CEWS is fairly simple:
• The content processing component sends a SOAP RPC call to a configurable endpoint over HTTP.
• The payload contains an array of property objects.
• The web service performs some custom logic on the array of property objects, and returns an array of modified or new property objects.


• The web service must send a response to the web service client within a given timeout.
• No specific authentication or encryption mechanisms are supported as part of the contract. You can, however, apply your own security on the transport mechanism.

A trigger condition is registered in the ContentEnrichmentConfiguration object, which allows control of when the content flow calls out to an external web service. A set of PowerShell cmdlets is provided to control the configuration, and there are robust error handling mechanisms built in.

What to Look Out For

For those familiar with pipeline extensibility, CEWS will be easy to use. However, there are a variety of limitations and gotchas to look out for. One key difference in CEWS from FAST Search for SharePoint is where it is called. Specifically, you get content and managed properties after document parsing but before word breaking, as shown below.

The advantage of this is that you can provide custom linguistics even at a fairly low level, and influence other aspects of the pipeline. The control afforded by this is wonderful and will be exciting to those wanting to address specific linguistic processing at a low level.

The disadvantage is that you can’t leverage the work done in the pipeline when you are doing external processing, as shown below. This not only means extra work as a developer, but introduces the potential that linguistic processing could get out of sync.

The extensibility callouts are invoked synchronously, in line with the processing flow, so long-running enrichment tasks or batch-oriented processing tasks will require enrichment data flow management independent of and outside SharePoint 2013. Not all managed properties (nor any crawled properties) are visible to the CEWS, and less state (potentially useful for supplemental linguistics processing) is exposed than in FS4SP.
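The core of the CEWS exchange, where the service receives an array of property objects and returns an array of modified or new ones, can be modeled in a few lines. This is a sketch of the enrichment logic only, under stated assumptions: the real service is a SOAP endpoint, and the Property class, the KNOWN_PRODUCTS dictionary, the input property name "body", and the output property name "ProductMentions" are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Property:
    """Hypothetical stand-in for one property object in the payload."""
    name: str
    value: Any


# Hypothetical dictionary used for a very simple entity extraction.
KNOWN_PRODUCTS = ("SharePoint", "Exchange", "Lync")


def process_item(properties: List[Property]) -> List[Property]:
    """Return only the properties that are new or modified."""
    by_name = {p.name: p.value for p in properties}
    output: List[Property] = []

    # New property: product names mentioned in the document text.
    body = str(by_name.get("body", "")).lower()
    mentions = [name for name in KNOWN_PRODUCTS if name.lower() in body]
    if mentions:
        output.append(Property("ProductMentions", mentions))

    # Modified property: a Title with stray whitespace removed.
    title = by_name.get("Title")
    if isinstance(title, str) and title != title.strip():
        output.append(Property("Title", title.strip()))

    return output


item = [Property("Title", "  Q3 plan "),
        Property("body", "Migrating Lync and sharepoint content")]
for prop in process_item(item):
    print(prop.name, "=", prop.value)
```

Because the callout is synchronous, logic like this should stay fast and deterministic; anything long-running belongs in a process managed outside the pipeline, as noted above.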

Finally, the CEWS is visible as a single logical endpoint to the potentially many content processing flow instances in SharePoint 2013. There is only one ContentEnrichmentConfiguration object active, and only one trigger, etc. This means that throughput management, and support for multiple enrichment stages (more than one instance of taxonomy classifiers or custom entity extractors) need to be managed externally, which will pose some interoperability challenges if you are interested in doing multiple types of content enrichment.

* Note: The CEWS callout is not part of O365 and is only available for the Enterprise Edition of SharePoint 2013.

CEWS is a new mechanism in SharePoint 2013. It has many nice aspects: it is a more standard, higher performance mechanism than that available in the past. It also provides the ability to modify some managed properties, making it possible to address use cases that were nearly impossible with FAST Search for SharePoint.

CEWS also has limitations, and using it will require special attention by developers. But all mechanisms have limitations. Overall, Microsoft has provided a strong and essential extensibility mechanism that lets you do magic things with content processing and linguistics.

Deeper Dives
• MSDN section on Content Enrichment Web Service (CEWS)

Search-Based Applications with SharePoint 2013

SharePoint 2013 is designed to support applications. Many parts of SharePoint operate out-of-the-box as applications (formerly called ‘workloads’, although this term doesn’t seem to be used much with the new release). In addition to a new development model (covered in the previous chapter), a new App model and App marketplace, and an emphasis on running applications in the cloud, there are many capabilities to leverage in building new applications. Mobile applications, which played poorly with SharePoint 2010, are fully supported now. SharePoint is, more than ever, an application platform with a set of prebuilt applications and apps included.

Search-based applications are applications like any other, except that they take advantage of search technology in addition to other elements of SharePoint to create flexible and powerful user experiences. Because search is essential for dealing with diverse content, especially unstructured content, applications using search are found everywhere, and their importance is growing rapidly, in step with the explosion of content volume. Yet search is generally not well understood or fully used by developers. Even though search is simple on the outside, it is complicated on the inside. Many people aren’t comfortable with the notion of a search-driven application until they see one.


A Platform for Search-based Applications

SharePoint 2013 is explicitly meant to support search-based applications. As the figure below shows, search is built as an extensible platform. There are both general-purpose search and some pre-built search-based applications included, and search is also used pervasively throughout SharePoint, especially in WCM and MySites. Most importantly, there are great facilities for deploying apps and applications using search, with tooling and hooks specifically for application developers. So partners and customers can create Search-Based Applications and deploy them on the same platform.

Out-of-the-box Applications

There are three ‘general-purpose’ search applications included out-of-the-box with SharePoint 2013. Intranet search, typically used by all employees to find content throughout the enterprise, benefits from personalized search results based on search history and rich contextual previews. People search (which includes the advances from SharePoint 2010 such as phonetic name search) is integrated with Lync, which provides presence information and makes it easy to connect with people directly from search results. Site search (aimed at making public web sites easily navigable) is a big step up with this release as well. There are also search facilities built into each site. For example, every document library now has a search box at the top that enables users to search across metadata and the full text of its documents, and the result list is presented as a standard SharePoint view rather than as a results page.

A video search SBA is provided out-of-the-box, including a pre-built presentation format that makes it easy to recognize the video content you’re looking for. There are significant enhancements in video support for SharePoint 2013 generally, including a built-in HTML5 video player. The use of video, including enterprise podcasts, will be on the rise, so video search is now an important facility.

Search Driven Web Content Management

Web content management makes extensive use of search in SharePoint 2013. Search makes it possible to create compelling user experiences, and drives several key features.

Content by Search: The new Content by Search web part displays indexed content, letting you show content dynamically across multiple site collections. Users don’t know this is search powered; it just looks like well-presented content, as illustrated in the screenshot below. For a case like online catalogs, this is an essential mechanism and one that works very well.


There’s an HTML-based presentation template model that makes it easy to fine-tune the look and feel, and built-in web part editors to set up the query driving the content presentation, as shown below. This doesn’t require writing any code and is well within the reach of a business analyst. You see immediate previews of what the results will look like.

Metadata Navigation: As described in the chapter on refinement and faceted navigation, facets are available for users to drill into content. In addition to refiners (which are driven from the values in the content), metadata navigation defined from values in the term store is available.

Page hierarchies, URLs, and Topic Pages: Pages and page hierarchies are easily defined from the term store. You can also generate topic pages, which makes SEO straightforward. The figure below illustrates how this works; SharePoint now generates ‘friendly URLs’, which makes this process work like any ‘normal’ site.

Recommendations: A new recommendation facility is included which can surface suggestions based either on popularity or on correlations between items (see the chapter on Analytics).

There are other exciting things about WCM with SharePoint 2013. Standard web design tools and workflows are supported; there are great facilities for content variations, including a built-in language translation service; you can publish easily across sites; and video and images are easily embedded and beautifully rendered across multiple devices and resolutions. The URLs generated are clean, and search-engine optimization is directly supported. You can use catalog-enabled sites for scenarios such as a content repository, knowledge base, or product catalog. But the heart of WCM in this release is search, which makes dynamic page generation and remarkable site experiences possible.
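As a rough illustration of how a term hierarchy can drive page hierarchies and clean URLs, the sketch below turns a path of term labels into a URL path with one segment per term. The slug rules used here (lowercasing, hyphens for runs of non-alphanumeric characters), the function name friendly_url, and the sample terms are assumptions made for illustration; this is not SharePoint’s actual algorithm.

```python
import re


def friendly_url(term_path, base="/"):
    """Join term labels (root first) into a URL path, one segment per term."""
    segments = []
    for label in term_path:
        # Lowercase, then collapse anything that is not a-z or 0-9 into "-".
        slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
        if slug:
            segments.append(slug)
    return base.rstrip("/") + "/" + "/".join(segments)


# Hypothetical term path from a product catalog term set.
print(friendly_url(["Products", "Cameras", "Digital SLR"]))
# prints /products/cameras/digital-slr
```

The point of the sketch is the design idea: because each page corresponds to a term, renaming or moving a term in the hierarchy changes the URL without touching the page itself.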


MySites: Driven by Search

The social features in SharePoint, including MySites, are dramatically enhanced, building on the capabilities introduced in SharePoint 2010. SharePoint 2013 adds new features that improve and facilitate enterprise social activities within the organization: you can follow people as well as content, share personal documents easily and keep track of access, and keep up to date with activities of interest. Under the hood, there are two lists for providing social features: the Microfeed list and the Social List.

Search drives several key social features in SharePoint 2013, even ones where it’s not apparent that search is used under the hood. Clicking on a hashtag in a post or discussion shows a list of all conversations about that topic enterprise-wide. In MySites, users can access a list of all SharePoint tasks assigned to them, regardless of which sites the assignments are stored in. They can also see the documents they are following, as illustrated below. Another example is “My docs: shared with me”, which shows you all the documents shared with you from everyone’s My Documents. It looks like a form view but, in reality, it uses search underneath to aggregate content from all MySites across site collections. Behind the curtain, there’s a query against a “ShareWith” field for your name, which also filters out docs shared with everyone. All security trimmed, naturally.

e-Discovery: Driven by Search

SharePoint 2013 has gone one giant step further toward fielding a full e-Discovery application. There is now unified discovery across Exchange, SharePoint, and Lync, as shown below. Exchange now has the same search infrastructure as SharePoint, which makes unifying the search much easier (Lync archives via Exchange). The Discovery Center in SharePoint uses this to provide a unified console, with in-place holds that don’t impact the end user’s ongoing work. There’s more to e-Discovery than search, of course: preservation, holds, policy management, and export. But search is the cornerstone and is what makes it possible to recall all the information needed to react to legal actions, without getting irrelevant information that you have to sift through.

The e-Discovery functionality that SharePoint Server 2013 provides is a big step up from SharePoint 2010, and is probably the first time that you could consider this to be a full application. There are several parts to e-Discovery:
• A site collection where you perform e-Discovery queries across multiple SharePoint farms and Exchange servers and preserve the items that are discovered.
• In-place preservation of Exchange mailboxes and SharePoint sites (including SharePoint list items and SharePoint pages) while still allowing users to work with site content.
• Support for searching and exporting content from file shares.
• The ability to export discovered content from Exchange Server 2013 and SharePoint Server 2013.

The eDiscovery Center site template creates a portal for discovery cases and lets you conduct searches, place content on hold, and export content. For each case, you create a new collaboration site that uses the eDiscovery Case site template. You can export the results of an eDiscovery search for later import into a review tool.

SharePoint 2013 provides in-place holds: content that is put on hold is preserved, but users can still change it. The state of the content at the time of preservation is recorded. If a user changes the content or even deletes it, the original, preserved version is still available.

To implement eDiscovery across an enterprise, you configure SharePoint 2013 Search to crawl all file shares and websites that contain discoverable content, and configure the central Search service application to include results from Exchange Server 2013. Any content from SharePoint 2013, Exchange 2013, or a file share or website that is indexed by Search or by Exchange Server 2013 can be discovered from the eDiscovery Center.

Customize, Extend, and Create New Search-Based Applications

Search-based applications are found over a very wide range of roles, industries, and levels of sophistication. There are common patterns to these applications; the table below shows just a few of these application patterns. SharePoint 2013 provides models that span a spectrum from simple configuration, through extension of capabilities, to creation of new sophisticated search-based applications. The new mechanisms in SharePoint 2013 for customizing user experience (query rules, result blocks, and result sources) and the ability to theme SharePoint easily provide a lot of power for customizing search experiences without any code at all. Many areas can be extended (connectors, content processing, relevance, query processing, and UI) with moderate effort and standard tools. Fully custom code is supported as well.

We find that the use of modular building blocks speeds the construction of search-based apps dramatically. Since these applications follow common patterns, a relatively small number of sophisticated modules can cover a large number of applications. If you undertake a sophisticated search-based application, consider what’s available on the market as well as what you might build yourself, since pre-built building blocks can save substantial time and reduce risk.

Deeper Dives
• Book chapter on developing with search from Wrox “Professional SharePoint 2010 Development”
• Overview of eDiscovery and In-Place Holds (SharePoint 2013)
• Blog on using the Content by Search Web Part
• BA Insight
• Search as a Development Platform
• TotalView Search-Based Applications

Acceleration of Search

With SharePoint 2013, Microsoft has taken a big step forward in helping people do more with search:
• It is far easier to own and use high-end search capabilities
• Search is used pervasively
• Some search-based applications are built-in
• It is easier to create and operate tailored search-based applications

We expect many more interesting applications to emerge as a result.

Conclusion

This e-book has covered a lot of ground, since SharePoint 2013 has so many underlying changes, new capabilities, and new features. We’ve tried to cover everything in concise, readable chapters, across five major sections: User Experience; Working with Queries; Working with Content; Architecture, Deployment, and Operations; and Development and Applications.

[Figure: the five sections of the e-book: User Experience; Working with Queries & Results; Working with Content; Architecture, Deployment & Operations; Applications & Development]

This new platform has a lot to love:
• It is clean, fast, and easy to use
• It is straightforward to install, administer, and scale
• It provides very powerful high-end search features
• It makes creating search-based applications simpler than ever

Microsoft has done a remarkable job making this high-end technology accessible and easy for the mainstream. However, it is not a perfect platform, and there are still challenges with search. Search is, after all, a journey.

BA Insight is entirely focused on the road that lies ahead for search and SharePoint 2013, and we stand ready to help you on your journey. As you learn more about SharePoint 2013 and search, here are some things to consider and some steps we’d suggest:


Things to consider, each paired with a suggested next step:

• Consider: SharePoint 2013 includes a very powerful new search engine.
  Next step: Get to know the new release ASAP: download the bits, read about it, and confer with folks that know it.

• Consider: There are new mechanisms in SharePoint 2013 (result sources, query rules, and result types) that replace familiar ones and take some getting used to. These are now in the hands of site collection administrators and site administrators, so there is much more control at that level.
  Next step: Try to develop a champion amongst your site administrators who learns the new tools. Set up a playpen system where people can get used to the new mechanisms.

• Consider: Crawling and BCS have evolved further in SharePoint 2013, including a new continuous crawl feature; however, connectors are still largely left to partners.
  Next step: Take stock of your current and future content sources and think about extending search to more content. Look at learning how to make simple connectors yourself, and at Microsoft connector partners for more complex systems.

• Consider: The new search core in SharePoint 2013 is different from either FAST or SharePoint 2010, and you will notice improvements in relevance, performance, and robustness.
  Next step: Consider how quickly you can migrate to the new platform. Factor in techniques which allow you to upgrade a step at a time, such as search-first migration and cross-version federation.

• Consider: Hybrid configurations across on-prem SharePoint 2013 and O365 are supported OOB using result sources. Cross-version configurations are not supported OOB, but there are techniques and partner products for these cases.
  Next step: Consider adopting O365 quickly, in ways that mean you don’t need to do it all at once. Talk to Microsoft Partners about federation, cross-version configurations, and migration.

• Consider: Though SharePoint 2013 Search is great, there are still limitations and cases where the mechanisms don’t cover what you wish to accomplish.
  Next step: Look to the Microsoft partner ecosystem for training, components, and innovative solutions.

• Consider: The term store is now an administrative center for entity extraction, query suggestions, faceted refinement, WCM page hierarchies, and more.
  Next step: Get familiar with the term store. Find out where there are key lists in your organization (product names, project names, industries, etc.). You will be able to import these into the term store and use them for entity extraction.

• Consider: If you are coming from FAST, you will recognize a lot of concepts and powerful features. But you will also notice a number of things ‘missing’.
  Next step: Focus on the problem, not the specific mechanism. There’s a way to get it solved with this platform. Turn to Microsoft Partners for products that round out all the possibilities.

• Consider: SharePoint 2013 has a new development model that is lightweight and available to a much wider range of developers. Search in SharePoint 2013 is a powerful platform designed to support search-based applications.
  Next step: Consider applying JavaScript developers to building SharePoint Apps. Look around your organization for opportunities to apply search-based applications.
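
As a starting point for the JavaScript-based search applications mentioned above, here is a minimal sketch of how a developer might build a query URL for the SharePoint 2013 Search REST endpoint (`/_api/search/query`). The helper name `buildSearchQueryUrl`, the `SearchQueryOptions` type, and the example site URL are hypothetical and not part of any SharePoint library. The quote-doubling for embedded apostrophes follows OData conventions and is an assumption that should be verified against your farm.

```typescript
// Hypothetical options type for illustration; not a SharePoint API.
interface SearchQueryOptions {
  siteUrl: string;             // e.g. a site collection URL (assumed)
  queryText: string;           // keyword query entered by the user
  rowLimit?: number;           // maximum number of results to return
  selectProperties?: string[]; // managed properties to include in results
  sourceId?: string;           // GUID of a result source, if one is targeted
}

// Wrap a value in single quotes for the REST query string.
// Assumption: embedded single quotes are escaped by doubling (OData style).
function quote(value: string): string {
  return "'" + encodeURIComponent(value.replace(/'/g, "''")) + "'";
}

// Build the GET URL for the search REST endpoint from the options above.
function buildSearchQueryUrl(opts: SearchQueryOptions): string {
  const params: string[] = [`querytext=${quote(opts.queryText)}`];
  if (opts.rowLimit !== undefined) {
    params.push(`rowlimit=${opts.rowLimit}`);
  }
  if (opts.selectProperties && opts.selectProperties.length > 0) {
    params.push(`selectproperties=${quote(opts.selectProperties.join(","))}`);
  }
  if (opts.sourceId) {
    params.push(`sourceid=${quote(opts.sourceId)}`);
  }
  // Strip trailing slashes so the endpoint path is appended exactly once.
  const base = opts.siteUrl.replace(/\/+$/, "");
  return `${base}/_api/search/query?${params.join("&")}`;
}
```

A SharePoint App would typically send a GET request to the resulting URL with an `Accept: application/json;odata=verbose` header and read the results from the JSON response. Keeping URL construction in one small function makes it easier to add a result source or extra managed properties later without touching the calling code.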

BA Insight is Social!
Read our blog: www.DoMoreWithSearch.com
Follow us on Twitter: @bainsight
LinkedIn Group: Microsoft Enterprise Search
Or find us on Facebook

BA Insight is a leader in agile information integration, enabling businesses to drive innovation by leveraging all knowledge and data across the enterprise. Offering a new-generation, cost-effective alternative to expensive systems integration, the company’s award-winning technology provides a scalable foundation for liberating enterprise data, both structured and unstructured. Microsoft’s go-to partner for advanced search technologies, BA Insight enables customers to leverage their investments in SharePoint, FAST, and other enterprise systems, and extend them with an overlay of easy-to-assemble, highly targeted business applications. Since 2004, more than three million users around the world have relied on BA Insight for low-cost, on-demand access to the information they need.

To learn more about BA Insight, visit www.BAinsight.com.