First-hand knowledge.

Reading Sample

This sample provides an explanation of what enterprise information management really is, and what it means for any organization. You’ll also find a chapter that introduces a tool that will help you manage your information: SAP PowerDesigner, a modeling and design-time metadata management platform for information management designs.

“Introduction”

“Introducing Enterprise Information Management”

“SAP PowerDesigner”

Contents

Index

The Authors

Brague, Dichmann, Keller, Kuppe, On
Enterprise Information Management with SAP
605 Pages, 2014, $69.95/€69.95
ISBN 978-1-4932-1045-9

www.sap-press.com/3666

Introduction

Welcome to the second edition of Enterprise Information Management with SAP! The goal of this book continues to be to introduce readers to the concepts of Enterprise Information Management (EIM), provide examples of how SAP’s EIM solutions are used today, and offer technical instructions on performing some of the most common EIM tasks in SAP. The second edition includes updates to chapters on SAP Data Services, SAP HANA, SAP Information Steward, SAP Master Data Governance, SAP Information Lifecycle Management, and SAP Extended Enterprise Content Management by OpenText, which are based on recent releases, as well as some new chapters on SAP Rapid Deployment solutions, SAP PowerDesigner, and SAP HANA Cloud Integration.

Target Groups of the Book

This book is intended for both experienced practitioners and those who are new to managing, governing, and maximizing the use of information that impacts enterprises. Specifically, it will be of use to business process experts, architects, data stewards, data owners, business process owners, analysts, and developers who are new to the topic of EIM in SAP. While there are several specific “how to build” and “how this works” sections, the book content requires no previous knowledge of EIM or SAP’s solutions for EIM.

This book is also intended for existing information management experts who need to expand their skills from a specific EIM domain to broader information management strategies. This target group won’t need to reference all chapters, but will be interested in new capability information provided in many (e.g., the latest release information and new products available).

Structure of the Book

This book is divided into two parts:

• Part I: SAP’s Enterprise Information Management Strategy and Portfolio
  This part of the book starts by introducing EIM and its main concepts, including information governance and big data. After you understand the ideas behind EIM, we move on to an overview of the solutions for EIM within SAP’s portfolio, offering brief explanations of the main EIM solutions, as well as the rapid deployment paradigm for those solutions. Finally, Part I concludes with real-life examples of how SAP’s EIM solutions are used by several different customers.

• Part II: Working with SAP’s Enterprise Information Management Solutions
  This part of the book focuses on how to get started using SAP’s solutions for EIM. Part II includes product details on topics ranging from understanding the current state of your data, to managing unstructured content and getting started with master data governance. This section focuses on select parts of SAP’s EIM offerings with the goal of providing practical examples and step-by-step instructions for key SAP capabilities. You’ll learn how to model your information landscape (SAP PowerDesigner), get started assessing and monitoring your data (SAP Information Steward), integrate both on-premise and cloud data sources (SAP Data Services and SAP HANA Cloud Integration), use data quality transforms (SAP Data Services), turn text data into data points (SAP Data Services), govern your master data (SAP Master Data Governance), manage structured and unstructured content that impacts business processes (SAP Extended Content Management by OpenText), and set retention rules and retire information (SAP Information Lifecycle Management).

With the division of the book into two major parts, you can read the different parts as you need them. Part I is critical to understanding EIM and the role it plays in SAP’s strategy and portfolio. In Part II, you can access information and insight about the EIM capabilities that are most applicable to your projects, planning, and information management strategy.

More specifically, the book consists of the following chapters:

• Chapter 1: Introducing Enterprise Information Management
  This chapter provides an introduction to the concept of EIM. It defines EIM, discusses common use cases and business drivers for EIM, discusses the impact of big data on EIM, explains SAP’s strategy for EIM, and discusses common user roles of people and organizations that are normally involved in EIM. You’ll also get an introduction to NeedsEIM Inc., which is the fictional company used as a basis for examples throughout the book.

• Chapter 2: Introducing Information Governance
  Information governance is the practice of overseeing the management of your enterprise’s information. It touches all aspects of EIM and must be considered in any EIM strategy. This chapter provides tips for developing your governance standards and processes, and maps governance activities to technology enablers for these standards and processes.

• Chapter 3: Big Data with SAP HANA, Hadoop, and EIM
  This chapter introduces Big Data in the context of SAP’s solutions for EIM. Specifically, it focuses on the role of SAP HANA and Hadoop.

• Chapter 4: SAP’s Solutions for Enterprise Information Management
  This chapter describes SAP’s solutions for EIM, introducing and providing overviews of specific products. After reading this chapter, you will be able to quickly identify which chapters in Part II are of the most interest to you.

• Chapter 5: Rapid-Deployment Solutions for Enterprise Information Management
  This chapter explains the rapid-deployment paradigm for EIM solutions with predefined best practices, setting a foundation for the deployment of SAP EIM solutions.

• Chapter 6: Practical Examples of EIM
  This chapter discusses specific examples of EIM application by various customers. Content discussed includes recommendations for your EIM architecture (written by Procter & Gamble), the evolution of SAP Data Services (written by National Vision), and tips for successful Enterprise Content Management projects (written by Belgian Railways). In addition, there are other customer-written sections on data migration, managing master data, data archiving strategy recommendations, and recommendations for positioning different SAP tools for data and process integration.

• Chapter 7: SAP PowerDesigner
  This chapter focuses on the discipline of enterprise information architecture, and how SAP PowerDesigner enables you to understand your current information landscape, align business information with technical implementation, and plan for change.

• Chapter 8: SAP HANA Cloud Integration
  Chapter 8 introduces SAP HANA Cloud Integration as SAP’s solution for delivering integration between on-premise and cloud applications.

• Chapter 9: SAP Data Services
  Chapter 9 introduces SAP Data Services as a data foundation for EIM. It describes the components and architecture of SAP Data Services and walks you through specific examples of how to start doing data integration, data quality, and text data processing with SAP Data Services.

• Chapter 10: SAP Information Steward
  This chapter introduces SAP Information Steward, which can be used for profiling and getting to know the current state of your data. This chapter discusses cataloging your data assets, performing data profiling, and monitoring your data quality over time.

• Chapter 11: SAP Master Data Governance
  Chapter 11 describes how to get started using SAP Master Data Governance for your master data governance initiatives. It includes a description of SAP-provided master data governance processes and explains how to create custom governance processes. It also describes the use of SAP Business Workflow and BRFplus for governing master data. Finally, the chapter gives an example of using SAP Information Steward in conjunction with SAP Master Data Governance for monitoring and remediating master data.

• Chapter 12: SAP Information Lifecycle Management
  Chapter 12 provides background information on the concept of information lifecycle management. It then specifically introduces SAP Information Lifecycle Management, offering discussions of retention management, system decommissioning, and how SAP Information Lifecycle Management works to support the lifecycle of information.

• Chapter 13: SAP Extended Enterprise Content Management by OpenText
  Chapter 13 discusses the major features of SAP Extended Enterprise Content Management by OpenText, how it uses SAP ArchiveLink, and how it works with the SAP Business Suite.

• Online Appendices
  There are several appendices to assist you: Appendix A covers advanced data quality capabilities, Appendix B provides details on SAP’s migration content, and Appendix C provides tips for your first data archiving projects. The appendices and an example spreadsheet for monitoring your data migration projects can be downloaded from the book’s website at http://www.sap-press.com/3666.

Acknowledgments

This second edition would not have been possible without the incredible efforts of a diverse set of authors that contributed to the first edition of this book, guided to success by the spirited leadership of Ginger Gatling. They laid down a solid foundation to build upon.

The second edition brought back some familiar faces as well as some new, aspiring authors. Without exception, each brought fresh energy and commitment to provide valuable updates and new content to the book. It was a pleasure to work with each and every one of them, and I feel extremely appreciative for the extra time many put forth to make their updates meaningful and to keep the book on track. In addition, there were many other people that took time out of their already busy schedules to provide a fresh perspective or critical eye to the material. A special thank you to John Schitka, Ken Beutler, Marie Goodell, Connie Chan, Yingwu Gao, Bharath Ajendla, Anthony Hill, Michael Hill, and Niels Weigel. Your willingness to contribute and provide feedback was truly appreciated. Finally, I would like to acknowledge my manager, Subha Ramachandran, for supporting this project as a priority for me and others in the organization.

All of the royalties from this book will continue to be donated to Doctors Without Borders (Médecins Sans Frontières). Your purchase of this book helps us support an international medical humanitarian organization that delivers emergency aid in many countries. Thank you for enabling us to provide financial support to this important organization and its critical mission.

I hope the book becomes a valuable resource to you and your understanding of Enterprise Information Management with SAP. Enjoy!

Corrie Brague
Enterprise Information Management Product Management
SAP Labs, LLC

This chapter introduces Enterprise Information Management, including common use cases and big data. It also provides an overview of SAP’s strategy for Enterprise Information Management.

1 Introducing Enterprise Information Management

Cloud, big data, and social media are powering new opportunities for companies that can leverage information-driven insights in real time to respond to customer preferences, identify operational efficiencies, and in some cases, create completely new business models. To achieve transformative business results, best-run businesses treat information as a corporate asset. It’s carefully managed, thoughtfully governed, strategically used, and sensibly controlled.

Effective management of enterprise information can help your organization run faster. As a result, you can achieve new business outcomes: understanding and retaining your customers, getting the most from your suppliers, ensuring compliance without increasing your risk, and providing internal transparency to drive operational and strategic decisions.

SAP helps businesses run better and more simply by enabling IT to more easily manage and optimize enterprise information. SAP solutions for Enterprise Information Management (EIM) provide the critical capabilities to architect, integrate, improve, manage, associate, and archive all information. This chapter introduces EIM and explains what it is, why it’s important to organizations, how it fits into SAP’s strategy, and some typical user roles. Finally, the chapter concludes by introducing NeedsEIM Inc., a fictional company that we’ll use throughout the book to illustrate EIM principles.

1.1 Defining Enterprise Information Management

On Gartner’s IT glossary page, Enterprise Information Management is defined as “an integrative discipline for structuring, describing and governing information assets across organizational and technological boundaries to improve efficiency, promote transparency, and enable business insight.”1

1 Source: http://www.gartner.com/it-glossary/enterprise-information-management-eim/

EIM involves a strategic and governed execution of the following disciplines: enterprise architecture, data integration, data quality, master data management, content management, and lifecycle management. It addresses the management of all types of information, including traditional structured data, semi-structured and unstructured data, and content such as documents, emails, audio, video, and so forth.

To optimize the use and cost of managing information, we must first understand its lifecycle. The active management and governance of information helps in avoiding the costs that are associated with blind information hoarding. The risk of having too much information is just as real as not having enough when you need it.

Figure 1.1 shows a typical spend on information over time. This is a technology and resources spend curve. What may be surprising for most organizations is the increase in spend during the off-boarding phase. Many companies spend a lot of money maintaining information that is out of control. Is the information still used? In what systems? Can you decommission those systems? Are you managing pieces of information that are no longer used?

Figure 1.1 Typical Spend on Information Over Time (axes: spend versus time; phases: on-boarding, active use, off-boarding)

As illustrated in Figure 1.1, there’s an associated cost in bringing information into an organization, using the information, and hopefully retiring the information after it’s no longer producing value. The idea that organizations really just do three things with information (on-board, actively use, and then off-board) is powerful when thinking about EIM solutions.

After information is brought into your organization, it’s required for many uses beyond its original purpose. Hence, it’s advantageous to prepare the information for these manifold uses. That way, the effort to repurpose information during the active-use phase is greatly reduced. When the information is no longer required, it should be off-boarded or retired in a manner that meets your organization’s legal and business requirements. The truth is that most organizations don’t proactively consider the reuse and eventual off-boarding of information, which ends up costing millions in IT resources due to maintaining systems that are no longer used.

If you adopt an information strategy, the spend changes to what is shown in Figure 1.2.

Figure 1.2 Spend on Information with an Enterprise Information Management Strategy (axes: spend versus time; on-boarding: creation, migration, import, retention policy planning; active use: preparedness, migration, import; off-boarding: archival, deletion, decommissioning)

Figure 1.2 also provides detailed examples of the types of activities involved with EIM across the typical lifecycle of information. In the on-boarding phase, activities include the creation of information through online user creation, integration of processes that involves the creation of new information, import of information, and migration of information. Additionally, the on-boarding phase should include lifecycle planning (e.g., how long the information should be retained). Implementing governance and retention policies as the information is on-boarded dramatically lowers the cost of information over its effective lifetime. Notice that there is still some spend increase as information is actively used. This is from incrementally improving, enriching, and preparing information for alternative uses. The key to bending the cost curve down is understanding that information has tremendous value beyond its original purpose and proactively planning for that in your EIM strategy. The result is that the spend curve goes down over time in the active-use phase as information is simply repurposed. Again, this can be achieved because the incremental cost is just the provisioning of existing known and trusted information, as opposed to starting over for each new information initiative.
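The idea of attaching a retention policy to information at on-boarding time, rather than deciding its fate years later, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the 30-day on-boarding window, and the rule "retire after N days without use unless a legal hold applies" are hypothetical and are not taken from any SAP product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Phase(Enum):
    ON_BOARDING = "on-boarding"
    ACTIVE_USE = "active use"
    OFF_BOARDING = "off-boarding"


@dataclass(frozen=True)
class RetentionPolicy:
    # Hypothetical rule: keep a record for this many days after its last use.
    retention_days: int
    # A legal hold blocks retirement regardless of age.
    legal_hold: bool = False


@dataclass
class InformationRecord:
    name: str
    created: date
    last_used: date
    policy: RetentionPolicy  # assigned when the record is on-boarded


def lifecycle_phase(record: InformationRecord, today: date,
                    onboarding_days: int = 30) -> Phase:
    """Classify a record into one of the three lifecycle phases."""
    # Assumption: a record counts as "on-boarding" for its first 30 days.
    if today - record.created < timedelta(days=onboarding_days):
        return Phase.ON_BOARDING
    expires = record.last_used + timedelta(days=record.policy.retention_days)
    if today >= expires and not record.policy.legal_hold:
        return Phase.OFF_BOARDING
    return Phase.ACTIVE_USE


contract = InformationRecord(
    name="supplier contract",
    created=date(2010, 1, 1),
    last_used=date(2011, 1, 1),
    policy=RetentionPolicy(retention_days=365),
)
print(lifecycle_phase(contract, date(2014, 8, 25)).value)  # off-boarding
```

Because the policy travels with the record from the start, questions such as "Is the information still used?" and "Can it be retired?" become a lookup instead of an investigation, which is the cost effect the spend curve in Figure 1.2 describes.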

Next, we’ll look at an example of information flow through a company and then discuss how this relates to information management.

1.1.1 Example of Information Flow through a Company

NeedsEIM Inc. is a fictional company that’s based on real customer examples. We’ll explain NeedsEIM Inc. in detail in Section 1.7 and again throughout Part II of the book when we describe how to use various EIM capabilities. For now, we want to introduce NeedsEIM and the types of information it must deal with, including how information flows through the company. This leads to a discussion about the types of information included in EIM.

Figure 1.3 depicts the business processes of NeedsEIM. It manufactures retail durable goods, and the majority of its manufacturing is outsourced. This business model results in a complex and diverse supplier network that impacts most departments. The major departments include finance, which must deal with supplier payments, and the engineering and contracts department, which must coordinate contracts and technical spec drawings with the manufacturers.

The IT department must deal with diverse systems, including SAP and non-SAP systems. The procurement department is responsible for the supplier relationships and ensuring the company gets the most from its suppliers. The sales department is always looking for new and creative sales channels, including opportunities in the supplier population.

Figure 1.3 NeedsEIM Inc. (manufactures retail durable goods; outsourced manufacturing with a highly diverse and complex supplier network; departments shown: finance, contracts, procurement, IT, engineering, sales)

As an example of information flow through NeedsEIM Inc., let’s look at the process of contract negotiations with a supplier:

1. The proposed supplier must be researched for due diligence, including type of products or services provided, similar customers serviced, reference calls with current customers, quality history, financial and credit ratings, reliability and trustworthiness, and general reputation.
   • This involves emails, online research, and getting information from external sources such as Dun & Bradstreet.
   • This information is shared among the finance, engineering, procurement, and contracts departments.

2. Assuming the due diligence indicates that the supplier is approved, the supplier master data needs to be created and distributed to related systems. The scope, projects, pricing, contracts, and legal documents must be created.
   • This involves most departments and includes sales if the durable goods price point might be impacted.
   • The supplier sends and receives legal, technical, financial, and other information.

3. After the contracts are negotiated, the supplier requires ongoing communication, including technical drawings, bills of materials, and other information required to do the work. In addition, financial documents such as invoices, purchase orders, and so on are exchanged.
   • This includes a lot of collaboration among engineering, contracts, procurement, and the supplier.

Figure 1.4 shows the information as it needs to flow through each department. Departments use the information with their perspective in mind: They store it, update it, download it, and ensure that it meets the requirements for their department’s role with the supplier.

Figure 1.4 Example Information Flow for NeedsEIM Inc. (information flowing among the finance, contracts, IT, procurement, engineering, and sales departments)

As you can see in Figure 1.4, the reality is that information is often required by many departments. Sometimes when the information doesn’t move from one department to another due to application, political, and/or departmental silos, departments create their own “tribal” versions of the information, and each department has a different sense of its ownership of the information. (We’ll talk more about tribal information in Section 1.3.2.)

Earlier, we mentioned several kinds of information needed for negotiations with a supplier. This includes detailed information on the supplier, external references for the supplier, pricing and detailed contract information, engineering documents of what the supplier will provide or build for NeedsEIM Inc., as well as billing, invoicing, and all the typical supplier interactions. The next section will break this down further into types of information that are required and how this information is included in EIM.

1.1.2 Types of Information Included in Enterprise Information Management

Figure 1.5 shows the types of information that are included in SAP solutions for EIM that will be covered in this book.

Figure 1.5 Types of Information Included in Enterprise Information Management (information types under information governance, from create to retire, including the sample text comment “The car should self-drive on the highway”)

These information types are relevant for most companies, including NeedsEIM Inc. The following provides more information about these types:

1. Structured data
   This includes the familiar data that’s used within an application system (e.g., customers, products, and sales orders); for example, supplier information such as name, address, credit information, contact information, and so on. This also includes all purchase orders, sales orders, and other data that’s related to this supplier.

2. Desktop documents
   These include Microsoft Word, Microsoft Excel, Adobe Acrobat, and other desktop application documents. This data is stored across the enterprise on shared drives and laptops, which means that much of it isn’t controlled at an enterprise level. This content may be critical to the application data, so you need to manage it with the same importance as the structured data in the database. Examples include purchasing documents (e.g., invoices), contracts with suppliers, legal documents, résumés, and HR documents, to name a few.

3. Pictures, scanned documents, videos, and other images
   These could be scanned invoices, videos, pictures of products that are sold in a catalog, and drawings of products that are being designed and built. These become part of the content that needs to be managed and related to the structured data when required. Managing content that’s associated with a core business process is becoming increasingly important to process efficiency and regulatory compliance. Examples of such content include engineering documents that are to be shared with suppliers, pictures of raw materials, routine maintenance records in asset management, invoices, and expense report receipts.

4. Semi-structured data
   This is information such as RSS feeds, blogs and posts, emails associated with purchasing documents, and other semi-structured information that’s important to the enterprise.

5. Text data
   In Figure 1.5, the piece of information that reads “The car should self-drive on the highway” may come from a survey or be a comment on a social media or other website and, by itself, might not be important. However, if you’re looking at car design over the next five years, and 60% of the comments you receive include something about self-driving, this comment warrants further investigation. Information management includes looking into text you receive and analysis to determine sentiment, feedback, input, or actions that should be taken based on comments. Examples of text data include comments from supplier’s manufacturing process, feedback from internal departments, and comments on surveys and service tickets.

As you can see, EIM includes the support of traditional structured data and unstructured information, from the moment of creation through retirement. The retirement of data and information has the same value as creation. After information is no longer needed, it becomes a liability: a legal liability, a cost liability, or some other kind of liability. The entire life span of the data and information, and the governance of that information, is covered in EIM.

1.2 Common Use Cases for EIM

There are many use cases for EIM solutions. Three of the primary scenarios include the support of operational, analytical, and information governance initiatives.

1.2.1 EIM for Operational Initiatives

This scenario covers the use of EIM in the operation and execution of business processes and tasks that happen throughout the day. It has very broad applications, from ensuring that material replenishment data is set correctly, to customer data quality management, to migrating new data from a merger, to ensuring that all contracts and documents are available for the business process, to removing data that is no longer required.

SAP solutions for EIM provide trusted data to drive and deliver best practice business processes. This value includes the ability to holistically manage data within business processes, ensuring the quality and ability to reuse the data.

Here are a few examples of operational uses of EIM:

• Cloud integration
  As more business applications are running in the cloud, organizations need a way to integrate business processes and data between on-premise and cloud systems.

• Data migration due to mergers, acquisitions, and global implementations across all industries
  Information management lowers the risk of business and application disruption during mergers, acquisitions, and new application implementations.

32 33 1045.book Seite 34 Montag, 25. August 2014 4:41 16 1045.book Seite 35 Montag, 25. August 2014 4:41 16

1 Introducing Enterprise Information Management Common Use Cases for EIM 1.2

̈ Harmonized master data across line of businesses quality and assessment is an ongoing business process; it includes, for example, Harmonized master data across disparate applications enables a single view of tracking articles that have not been maintained in required stores, articles miss- master data across the enterprise. ing valid sales price conditions, articles missing required procurement data, ̈ Compliance and regulations in the financial industry and articles with duplicate EAN codes. The financial industry has requirements for financial risk-related data analysis. Notice that many of these examples are focused on ensuring that information is All data must meet quality levels and industry standards, and all associated con- managed, is available, is reliable, and serves the operational business process; the tent (e.g., documents and invoices) must be correctly associated to financial list can go on and on. contracts. Chapter 6 provides more detailed real-world and practical scenarios for EIM. ̈ Suspect tracking in public safety organizations Federal, local, and state agencies must share information on criminal activity and suspect tracking. Information management ensures that each new suspect is 1.2.2 EIM for Analytical Use Cases compared to others to confirm that it’s a unique suspect. Data quality rules can EIM has a long history in business intelligence (BI) and analytics. If you look at ensure that the most up-to-date information is available for suspect tracking. some definitions of EIM online, you’ll see statements saying that EIM drives ̈ Retaining and deleting information in the pharmaceutical industry decision-making analytics. Many of the operational use cases mentioned previ- During the development of new medicines, all documents and government ously also fit into operational reporting and have some reuse for strategic report- standards must be adhered to through various stages of research, development, ing and analytics as well. 
Some examples of EIM for BI and analytics include the trial, and release. When the compliance period has ended, information should following: be removed unless it’s required for a legal hold. ̈ Big data analysis ̈ Fraud detection in telecommunication and other industries To unlock the potential of big data sources, EIM provides the capabilities to Telecommunications, media, high tech, and utilities share similar requirements access and understand data from any source and variety, including Hadoop, for capturing, addressing, and mitigating fraudulent activity. Large volumes of and integrates it with existing data for better analysis of customer sentiment, data and real-time transactions place these industries at increased risk, as per- fraud detection, new innovation opportunities, and competitive insights. petrators can be “on and gone” before they are caught using traditional time- ̈ Analysis of supplier spend consuming software reporting methods provided by vendors today. Informa- Analysis of who are the top suppliers, how much they spend, and payment and tion management enables the filtering of diverse data to determine where the credit issues can only be done if supplier records are transparent and harmo- company is losing money across a broad spectrum of applications and business nized, cleansed, and de-duplicated. When making decisions that are related to processes. the supply network, the supplier data must be accurate and trusted. ̈ Plant maintenance compliance and data assessment ̈ True cost assessment of manufacturing goods in the manufacturing industry Ensuring that the virtual plant aligns with the physical plant, information man- Analyze total costs for making and delivering products. 
Crossing multiple busi- agement ensures that maintenance plans and documents are associated with ness domains, data must be cleansed, duplicates removed, and correlations cre- each asset, asset tags are accurate, functional location information is complete, ated to ensure that analysis provides accurate information. and all asset document and maintenance guides are available on the plant ̈ floor. Bring together timely, accurate, and actionable data to provide insights into the factors impacting sales and customer behavior ̈ Data quality and data assessment in the retail industry Silos of data sources and applications, limited business user access, and depen- The retail industry requires high data quality; for instance, retailers must know article data throughout all stores where the articles are sold. For retailers, data dence on IT to create reports limits the ability of a business to gain insights on

sales and customer behavior. Information management brings together the data and provides data lineage and analysis so the users can create reports and know where the data is coming from.

• Text mining to understand opinion and sentiment
Text and rich media content that’s accessible on the web or on social media sites contains a lot of information that can be analyzed and used for sentiment analysis to get a better understanding of consumer opinions about a product or idea.

1.2.3 EIM for Information Governance

A primary use case for EIM is the management and governance of information as a strategic asset, usually referred to as information governance. Information governance is a discipline that oversees the management of your enterprise’s information. Without it, there is no EIM. Information governance involves people, processes, policies, and technologies in support of managing information across the organization. It’s advisable to have some degree of information governance in place for any EIM use case, analytical or operational, as this provides a framework for the enterprise to reuse policies, standards, and organizational best practices.

Information governance is the linchpin of EIM that empowers business users to own and manage data as a strategic asset, governs data in the business process to optimize operational performance and ensure compliance, and establishes trust in structured and unstructured information by ensuring data quality throughout its lifecycle.

Information governance will be a common thread throughout the book and will be covered in more detail in Chapter 2.

1.3 Common Drivers for EIM

Information can be a strategic weapon if an organization manages it the way it manages enterprise assets such as capital. Treating information as an organizational asset recognizes that it moves from a single-purpose use to something that must be managed for multiple uses.

Common business problems that require an EIM strategy often may not have the words information, enterprise, management, data, or governance in them. The business issues driving initiatives for EIM include (but are not limited to) trucks going out at the wrong weight, deliveries to the wrong location, hazardous products not in compliance with government standards, customer satisfaction issues, incorrect billing, misunderstood supplier networks, services that don’t align with customer demand, lack of compliance with a government mandate that impacts payments or revenue, and so on. Many process issues are the result of a lack of an information management strategy, ranging from poor-quality data to master data not being updated correctly, to not having the documents required for order processing, to financial documents not aligned with sales documents, to different parts of the organization using similar terms in different ways.

Adoption of EIM capabilities is usually driven by a few fundamental needs: responding to a growing set of compliance requirements, improving operational efficiency, and applying information strategically to better manage your organization and gain competitive advantage.

Next, we discuss specific examples of issues as drivers of EIM adoption.

1.3.1 Operational Efficiency as a Driver of EIM

Operational efficiency includes many moving parts to ensure the company has an improved operational margin. From the EIM perspective, operational efficiency includes the provisioning and preparation of data so that it can be used to keep the business running well. The following subsections describe typical operational efficiency scenarios and the role of EIM.

Improving Payment Processing

The time that’s taken to collect payments and the improvement of payment processing is critical in all industries, and is heavily impacted by the quality of the data. One example is the healthcare industry, in which it’s critical to ensure that hospitals collect what they should from government agencies such as Medicare and Medicaid in the United States. Effectively provisioning data from disparate systems ensures data compliance with U.S. laws for Medicare and Medicaid and enables hospitals to receive their payments, which can have an impact in the millions of dollars.

Ensuring a Successful SAP ERP Go-Live

An SAP customer was implementing a new SAP system and had to migrate data from many non-SAP data sources. The customer was concerned about the large volumes of data to migrate from both the parent company and a variety of subsidiaries. It was critical that the entire business not be on hold during the migration, and the data from the migration had to be loaded accurately and safely. The requirement from the customer was a single, integrated application providing a high degree of visibility that could be easily rolled out to multiple subsidiaries, eliminating hours of custom coding to load data into the SAP system. SAP’s EIM solutions were used to extract data from third-party applications and support a smooth transfer to a new environment. This automated approach saved valuable resources and expedited data migration processes, resulting in a smooth and on-time go-live and reducing the overall cost of the implementation.

Consolidating Systems to Improve Information Management and Reduce IT Spend

An SAP customer ran 80% of its business with several SAP systems and wanted to reduce IT costs and improve transparency of information across the systems. The company had 8 SAP systems when only 3 were needed, and more than 400 non-SAP systems, most of which could be retired. EIM’s role in this included the assessment, alignment, migration, and retirement of data and legacy systems and ensuring that the 3 remaining SAP systems had accurate and timely information.

Speaking the Same Language to Increase Operational Efficiency

Another SAP customer had issues where no one spoke the same language. For example, the term margin covered different realities depending on the department and employees concerned. To set things right, the company specified four objectives for itself: to centralize its data in a common environment; to secure the data; to make the data more reliable, especially for management access; and to standardize its vocabulary for indicators. EIM accelerates employee access to information and, as a result, saves significantly on the amount of time required to perform routine tasks. Teams made enormous gains in responsiveness. Where it previously took one week for data to be available after accounts were closed, the operation is now instantaneous.

1.3.2 Information as an Organizational Asset

All organizations have assets (capital, employees, materials, brands, and physical and intellectual property) that are all managed carefully. Information is similar, as it, too, is an asset that must be managed and protected. With the right EIM strategy, information can be leveraged and used as an organizational asset. We’ll now discuss some specific examples of how actual companies use information as an organizational asset.

Improving Patient Care and Payer Response

An SAP customer is a large hospital conglomerate focused on first-class patient care and creating innovative ways to improve care. First-class patient care requires the management of information in large volumes and with daunting complexity. EIM was used to extract, transform, integrate, cleanse, load, and correlate patient records from many diverse systems for analysis by doctors and line managers. The cleansed and aligned data enabled line managers to improve operational efficiency (including aligning information across multiple hospitals). The project extended the use of information such that doctors now have the ability to “slice and dice” information as needed on patient groups and to provide recommended treatments and wellness programs based on trends, including re-admittance trends, long-term performance of different treatments, and so on. The other focus of the project was to ensure a high quality level of data provided to and by patients. The improved data quality improved patient service, which led to improved payments by payers, resulting in the collection of several million outstanding dollars.

Growing Past “Tribal Knowledge” to “Enterprise Information”

A large SAP customer had a wealth of information that was vitally needed across departmental lines, but the information (documents, spreadsheets, manuals) was locked up in information silos. Shared (or nonshared) hard drives, separate portals, and multiple content repositories held the data, with no central search or access capability. This resulted in “tribal knowledge”: the different departments could usually find the information that their employees created, but this information wasn’t effectively shared with other departments. By implementing a strategic Enterprise Content Management (ECM) and global search capability, the customer was able to create a single enterprise information store that all employees could search and use, regardless of department.

Improving Data Quality for Customer Interactions

Another SAP customer had a goal to create a 360-degree view of customer data for sales, marketing, and service. EIM was used to consolidate heterogeneous data into a single database; integrate structures and processes across sales, marketing, and service; and ensure a systematic information exchange between field sales and sales support. Improved customer data quality strengthened dialogue with these customers and systematized customer-related processes across sales, marketing, and service. The data quality improvement and improved transparency drove a new structured quotation process. The new process provides time savings and fewer errors when creating quotations.

1.3.3 Compliance as a Driver of EIM

As governmental regulations and controls increase, and the cost of legal issues due to data issues rises, compliance plays a key role in most industries. Every company and its network of suppliers that produce a durable good that ends up in a shopping cart have compliance requirements. Other organizations, such as utilities, government agencies, security, and financial service providers, are also subject to regulatory and compliance issues. In addition to industries, countries have import and export regulations that impact the ability to do business globally.

In the following subsections, we discuss some general examples of compliance issues that indicate a need for an information management strategy.

Keeping Data Too Long

For regulatory compliance, companies must ensure that they keep retention-relevant data for a minimum period of time, as defined by retention laws. They must also ensure that certain data is purged from the system. For example, data privacy laws mandate the destruction of person-related data after a specified period of time. In Germany, companies must delete data from rejected job applicants in their HR systems not earlier than 6 months, but not later than 12 months, after the applicant was rejected. Failing to comply with these regulations may result in large fines for companies. Another example is a pharmaceutical company that must keep information related to a new clinical trial for a number of years. After that time has passed, the information should be deleted. Not deleting sensitive information after the required retention period increases the risk from potential lawsuits.

Building According to Specs

Remember the Mars probe? Launched in December 1998 and lost on arrival at Mars in September 1999, the Mars Climate Orbiter was designed to gather amazing data to help scientists better understand the universe. However, none of that amazing data was gathered from the $125 million venture because groups working across the globe failed to operate under similar units of measure. Specifically, the American units of measurement used in construction had to be converted to metric units for operation. Core information management principles could have helped alleviate this risk by documenting the data definitions and outlining use of that data throughout the data’s lifecycle.

Maintaining Industry Standards for Data

Some external standards apply to entire industries. For example, global standards (GS1 standards) aim to help companies exchange information in the same format, thereby increasing the efficiency and visibility of supply and demand chains globally.[2] To participate, however, you not only need to understand the relevant GS1 standard, but you must also fully understand your data, the data model, and current data quality levels. Without this baseline understanding, your use of GS1 would be flawed at best, and you would miss golden optimization opportunities.

[2] Source: http://www.GS1.org/about/overview

1.4 Impact of Big Data on EIM

It’s well documented that the volume of data created in organizations is large and growing at an unprecedented velocity. Organizational datastores are now commonly measured in terabytes or even petabytes. There are many reasons for the unparalleled growth in datastores: social media, compliance and regulatory requirements, transactional data, sensor data (such as data from real-time shop floor sensors), multimedia content, mobile devices, RFID-enabled devices, the “internet of things” (connected devices), the never-ending quest to improve organizational effectiveness, and the list goes on. The fact is that data creation has become a by-product of nearly all individual and organizational activities. Moreover, the reason data is preserved and reused is that it has value well beyond its original use. We dare to say that the value of the data created to automate business processes may in some cases be greater than the process itself. Today, the

market has christened the phenomenon of organizations’ desire to harness the great torrent of data, as well as the velocity, variety, and variability of information known as big data. Figure 1.6 is a representation of the volume, velocity, variety, and variability of data. It remains to be seen if the term big data will stick. However, as long as organizations can create value through data, the continued growth and importance of data will endure. Fortunately, advancements in computational power, storage capacity, information access and management, and analytics are progressing at an equally impressive rate. Two such advancements are Hadoop and SAP HANA (to be discussed further in Chapter 3). The combination of massively greater amounts of data with the tools and talent to analyze it promises to launch the next wave of innovation and productivity and even spawn new business models.

[Figure labels: Mobile, Inventory, CRM Data, GPS, Planning, Emails, Demand, Tweets, Instant Messages, Speed, Velocity, Opportunities, Customer, Things, Service Calls, Sales Orders, Transactions]

Figure 1.6 Information Growing in Volume, Velocity, Variety, and Variability

One example of new innovation provided by the ability to manage and analyze big data is in the healthcare industry, specifically related to the area of cancer treatment. The human genome contains 6 billion DNA base pairs; as the genome sequence for each patient will be decoded in the near future, these billions of data points must be managed. Add to that documentation and features such as speech recognition, and you’ll end up with 20 terabytes per patient.

The velocity of data collection is building daily, and you must manage and make sense of your data on the fly. You need to remain flexible through instability and change. You can’t underestimate the pace of innovation, and you don’t want to be playing catch-up with your competitors. If planning and implementing a coherent data management strategy seems daunting when your organization owns a few terabytes of data, how difficult will it be when you own thousands of terabytes?

The best way to realize the promise of big data, today and in the future, is to develop and adopt an EIM strategy. This strategy should cover your entire enterprise to take advantage of the benefits of sharing information and aggregating data across your organization. Typical topics that must be considered for an effective EIM strategy include interoperable data models, architectures for analytical and transactional data, integration architecture, analytical architecture, and information security and compliance. The goal is to have data that is shareable and can be leveraged over time within and across business units. The deployment of SAP solutions for EIM within a defined EIM strategy is a key starting point. The alternative is to have massive amounts of disintegrated and unreliable data analyzed fast.

“Garbage in, garbage out” is one of the oldest adages in information processing; when the volume of data reaches the big data stage, getting productive use of poorly managed information becomes the equivalent of searching for a priceless antique in a landfill.

1.5 SAP’s Strategy for EIM

SAP recognizes the importance of maximizing the value of enterprise information in support of any data-driven analytical, operational, or governance initiatives. To achieve this, organizations need a comprehensive suite of solutions providing the capabilities from architect to archive. Figure 1.7 shows SAP solutions for EIM.

SAP solutions for EIM are comprehensive in functionality, including capabilities to support [...], data integration, data quality, master data management, enterprise content management, and information lifecycle management.


This chapter introduces SAP PowerDesigner as a modeling and design-time metadata management platform for information management designs.

7 SAP PowerDesigner

All enterprises today are or will be faced with a transformative event, such as regulation changes, merger and acquisition activity, or enablement of new business models from new technologies (e.g., cloud and in-memory). You need to be able to treat information as a corporate asset to succeed with such business transformation. This chapter focuses on the discipline of enterprise information architecture (EIA) as part of SAP Enterprise Information Management (EIM), and how tools such as SAP PowerDesigner, a modeling and design-time metadata management platform, enable you to understand your current information landscape, align business information with technical implementation, and plan for change.

Architecture is about planning for, designing, and executing change. The value of SAP PowerDesigner (hereafter PowerDesigner) is best realized when we use the current-state information models, captured and documented in the tool, to help us plan the next-generation business. Transformation needs a plan, and designing future-state versions of data models, aligned to the current conceptual data model (CDM) and business glossary, ensures we make a united step forward at every stage along the way.

Adding technical details in logical data models (LDMs) and physical data models (PDMs), together with specialized analytics models, ensures that we can communicate details to the responsible database development teams. PowerDesigner’s unique Link and Sync technology streamlines impact analysis and design-time change management, reducing the time, cost, and risk associated with change.
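
To make the idea of impact analysis more concrete, the following is a minimal sketch, not PowerDesigner’s actual Link and Sync implementation or API. It assumes a hypothetical dictionary of design-time dependencies (the model and object names are invented for illustration) and walks it breadth-first to list everything derived from a changed object.

```python
from collections import deque

# Hypothetical design-time dependencies (names invented for illustration):
# each key is a model object, each value lists the objects derived from it,
# following the CDM -> LDM -> PDM chain described in the text.
DERIVED_FROM = {
    "CDM.Customer": ["LDM.Customer"],
    "LDM.Customer": ["PDM.CRM.CUSTOMER", "PDM.DWH.DIM_CUSTOMER"],
    "CDM.Order": ["LDM.Order"],
    "LDM.Order": ["PDM.ERP.ORDERS"],
}


def impacted_objects(changed, dependencies):
    """Return every object reachable from `changed`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependencies.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


if __name__ == "__main__":
    # A change to the conceptual Customer entity touches one logical
    # entity and two physical tables in this invented landscape.
    for name in impacted_objects("CDM.Customer", DERIVED_FROM):
        print(name)
```

The point of the sketch is only that a change made at the conceptual level can be traced mechanically to every logical and physical object that depends on it, which is what makes design-time change management cheaper than discovering the same dependencies after deployment.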

In this chapter, we’ll explore enterprise information architecture, including the different model types, the core components of each, and how they work together to make a complete view of information for designers. This chapter will also cover how the repository helps with tasks such as managing model-to-model dependencies and impact analysis. You’ll learn the value that architecting, or

planning, provides to all organizations that are faced with managing complex change in information systems.

7.1 SAP PowerDesigner in the SAP Landscape

PowerDesigner provides architecture and modeling capabilities to all organizations and is uniquely integrated into many SAP products. PowerDesigner is integrated with SAP Business Suite and the SAP HANA Cloud Platform (HCP). Within the EIM landscape, PowerDesigner is integrated with SAP Information Steward (hereafter Information Steward), SAP BusinessObjects, and SAP Replication Server (hereafter Replication Server). PowerDesigner is also a key element of Intelligent Business Operations powered by SAP.

7.1.1 SAP Business Suite

PowerDesigner can connect to the SAP Business Suite and create a PDM representing the data dictionary by reading the business and technical metadata from SAP Business Suite. This is very useful when looking at SAP Business Suite as the standard definition for any homemade applications built around common data sets, or when preparing for an enterprise data warehouse and extracting data from SAP Business Suite to populate the warehouse as one of the key sources.

7.1.2 SAP HANA Cloud Platform

SAP HANA has a repository that’s used for the development and implementation of data structures and that is optimized for helping developers get the most out of SAP HANA’s unique in-memory capability. PowerDesigner can write to the SAP HANA repository or read from it. Reading the SAP HANA repository creates or updates a PDM in PowerDesigner. PowerDesigner can also take a PDM that includes SAP HANA-specific attribute and analytic views and create new, or merge with existing, repository objects.

7.1.3 SAP Information Steward, SAP BusinessObjects Universes, and Replication

PowerDesigner’s repository is read by Information Steward, enabling people to read metadata from PowerDesigner’s PDMs, LDMs, and CDMs. This allows all the known metadata, both operational and architectural, to be visible to the data steward as he manages the quality of information sources in operation.

PowerDesigner’s dimensional diagram can create SAP BusinessObjects universes. PowerDesigner can read a universe and create a new, or merge with an existing, dimensional diagram.

PowerDesigner can reverse engineer Replication Server’s catalog to create or merge with an existing data movement model. This data movement model can generate new replication definitions. Special patterns exist to streamline use cases of replication and SAP Data Services (Data Services) together to implement real-time loading and other scenarios.

7.2 Defining and Describing Business Information with the Enterprise Glossary

An enterprise glossary helps everyone define and describe information assets and related technology. It lists business terms in business language, independent of any data characteristics. One term can relate to multiple data items (atomic data elements), and a data item can have multiple terms associated with it.

Example
NeedsEIM Inc. defines its information model to have a customer entity that can have a customer address attribute, which combines the terms customer and address to make up its name.

In PowerDesigner, the enterprise glossary is a global service provided by the repository that is available to all users. It contains all terms, synonyms, and related terms, grouped by nested term categories. A glossary term identifies the term (Name) and provides a standard abbreviation for the term (Code) and a definition (Description). The glossary term will be created within a category folder (Category) and may also be further defined in an external system and referenced via a URL (Reference URL). As you can see in Figure 7.1 in the next subsection, the business term “commission” is defined, and every time the word commission appears in the design (such as a table or column name), the standard abbreviation

of “CMSN” will be used in the name. You can also see that this term is approved in the Status box, so you know it’s the right definition for this term.

PowerDesigner’s glossary is meant to be a direct reflection of the business glossary in Information Steward. Information Steward is used to capture, define, and manage the glossary terms and relate them to the metadata of operational systems, while in PowerDesigner, the same terms can be imported and then used to standardize names for all new information assets that are defined in any model.

7.2.1 Glossary Terms for Naming Standards Enforcement

Using a common business language ensures that when users collaborate across business units, or outside the company, they’re all using the same concepts in the same way. This is a critical part of establishing enterprise information architecture and a key component of any data dictionary. The enterprise glossary (see Figure 7.1) can be used to manage naming standards for all design models in PowerDesigner. The Name field is used for name lookup, and any name that matches a term is linked to that term. If there are any aliases associated, when you begin to type the alias, PowerDesigner detects the use of an alias and indicates that there is a preferred term to use in lieu of the alias. This helps establish the enterprise use of the preferred term and further increases understandability and readability of all models as everyone will be using the standard terms.

Figure 7.1 A Glossary Term in SAP PowerDesigner

7.2.2 Naming Standards Definitions

PowerDesigner can be configured to use the glossary to ensure all names used throughout a model are found within the list of terms. To configure PowerDesigner to use the enterprise glossary, follow these steps:

1. Select Tools > Model Options, and then select Naming Convention.
2. Check Enable glossary for autocompletion and compliance checking.
3. Select the Name to Code tab, and set Conversion Table to glossary terms.

You can combine multiple terms into one name (e.g., “Customer Address” using terms “Customer” and “Address”).

You can also enable automatic conversions of names to implementation Code values. In PowerDesigner, the Name field is the business language descriptor, while the Code field represents the name used for the object when converted into any sort of implementation code (e.g., when used in a CREATE TABLE statement).

7.3 The Conceptual Data Model

PowerDesigner supports the definition of a CDM. For an organization to treat information as a corporate asset, all information sources should be derived from a common definition, or a core concept. A CDM is meant to model a single definition of any data asset, independent of both the storage paradigm (relational, hierarchical) and the physical characteristics of the systems that will ultimately store them.

The enterprise CDM also represents the sum of all use cases for a given data concept. Any entity defined in the enterprise CDM will have all the attributes needed for all processes or all applications. For example, the enterprise CDM entity for customer will have all attributes together, whether used for order, relationship, support management, and more; while LDMs and PDMs that represent the individual systems will have their own subset of these attributes. This will help ensure that any attributes that are shared between implementations follow a common standard and will reduce the impedance mismatches found when you later need to integrate these data sets together.
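
The glossary-driven conversion of a business Name into an implementation Code, described in Section 7.2.2, can be pictured with a small sketch. This is an illustration only and not PowerDesigner’s conversion engine: the glossary dictionary and the function name are hypothetical, and only the “CMSN” abbreviation for “commission” comes from the text (the other codes are invented). The sketch assumes a Name is split on spaces and each word is looked up as a glossary term.

```python
# Hypothetical glossary: business term (Name) -> standard abbreviation (Code).
# "CMSN" for "commission" is taken from the text; the other codes are invented.
GLOSSARY = {
    "commission": "CMSN",
    "customer": "CUST",
    "address": "ADDR",
}


def name_to_code(name, glossary):
    """Convert a business Name into a Code by abbreviating each word.

    Returns the generated code plus the list of words that are not
    glossary terms, which a compliance check could then report.
    """
    parts = []
    unknown = []
    for word in name.split():
        abbreviation = glossary.get(word.lower())
        if abbreviation is None:
            # Not a glossary term: keep the word, but flag it.
            unknown.append(word)
            parts.append(word.upper())
        else:
            parts.append(abbreviation)
    return "_".join(parts), unknown


if __name__ == "__main__":
    print(name_to_code("Customer Address", GLOSSARY))  # ('CUST_ADDR', [])
    print(name_to_code("Commission Rate", GLOSSARY))   # ('CMSN_RATE', ['Rate'])
```

Returning the unmatched words alongside the code mirrors the two jobs the glossary does in the text: standardizing the abbreviation used in implementation code, and checking that every name in a model is built from approved terms.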

Let’s review the core components of an enterprise CDM by looking at elements, attributes, data items, and domains in the following sections.

7.3.1 Conceptual Data Elements, Attributes, and Data Items

PowerDesigner manages enterprise CDM concepts such as entities, attributes, data items, and domains. These four concepts make up the core of the CDM, and we’ll discuss them in more detail in the following subsections.

Entities

Entities are structured elements that define a core business concept that you need to keep account of, such as product, customer, or delivery. Anything the business as a whole needs to account for and keep records of should be represented by an entity in the CDM. A CDM’s entity should represent a single global view of all possible attributes that the concept may need for any given use case or business process.

Attributes and Data Items

In PowerDesigner, entity attributes and data items are separate but tightly related concepts. Data items in PowerDesigner represent a unique data cell: a single value of a specific type for a specific purpose. Examples of data items are Customer Name, Delivery Date, Product Description, or Phone Number.

Because data items exist independent of the entity attributes they represent, you can use them as a data dictionary, or list of all atomic data managed in the enterprise. This list of data items, or the data dictionary, is useful to communicate with the data stewards to ensure you have the right definition for the data independent of any use in an entity or any physical implementation in a database.

Entity attributes are a relationship, or link, between an entity and a data item. For example, when the Customer entity is related to the Customer Name data item, the Customer entity will have an attribute called Customer Name. Any changes made to the data item will be reflected in the attribute as well.

Domains

Domains provide another level of data standardization. A domain is a named set of common data characteristics for any number of data items (and therefore all attributes using that data item). For example, a domain called Name can define the data type, length, and other common characteristics of any name type of data item in the model. Anything using “Name” (e.g., Product Name, Customer Name, or Company Name) that is also using the Name domain will share this common characteristic. The key difference between a domain and a data item in PowerDesigner is that the data item is a direct representation of an attribute on one or more entities and carries a name representing a cell of information, while the domain is a common set of data characteristics used by one or more data items and doesn’t represent a cell of information itself, just its common structural characteristics.

7.3.2 Separation of Domains, Data Items, and Entity Attributes

The key advantages to this separation of entities, data items, and domains are freedom of expression and improved standardization.

Domains standardize common data characteristics for any information you need to manage for the business, regardless of what you call it. This ensures a consistent use of data structures for all attributes that are of a common concept, such as money, name, or phone number. When data items follow a common standard domain like this, comparing and integrating data is a lot easier. You won’t need to create complex transformation code to make the two different data elements match in form and structure, so you can get right to comparing values.

7.3.3 Entity Relationships

The enterprise CDM would not be complete without the relationships that are defined between the entities. The CDM is essentially an Entity-Relationship Diagram (ERD). The relationships between the entities complete the understanding of the business data the CDM represents. There are two major types of relationships in the CDM: the ones that represent how two entities are connected to each other, and the ones that represent entities that are, in essence, a specialization of another.

Relationships that represent the connections between two entities carry cardinality; that is, the frequency of the instances of each side. You can define relationships of cardinality types zero- or one-to-many, many-to-many, and one-to-one (see Figure 7.2, showing a one-to-many between Customer and Order and a many-to-many between Items and Order). Relationships representing a supertype/subtype,


also known as an “is-a” relationship, may also be defined in the CDM using the inheritance object. When you define an inheritance, or “is-a” relationship, all attributes of the parent are available attributes of each child.

To define a relationship in PowerDesigner, use the Relationship tool from the tool palette. Follow these steps:

1. Select the Relationship tool, click on one of the entities, and drag to the second entity to link.
2. To change the cardinality settings, double-click on the relationship line, and you can change the following:
   - Cardinalities: One to Many, Many to Many, or One to One
   - The Role name (in both directions) to label the relationship, typically with a verb
   - Mandatory (on each end), determining whether a parent can exist without any children, and whether a child can exist without a parent

Figure 7.2 An Example CDM (an Employee entity with “Is A” subtypes Stock Clerk, Shipper, and Sales, alongside Customer and Order entities)

7.3.4 Best Practices for Building and Maintaining an Enterprise CDM
Business details are discovered over time, not all at once. The definitions of business terms evolve as the business evolves. New terms are discovered, old terms obsoleted, and existing terms redefined. In the following subsections, we’ll discuss what to keep in mind when defining an enterprise CDM.

Version Terms and the Enterprise CDM
Different versions of the enterprise CDM will be attributed to different projects at different stages in their lifecycle. You can do this in PowerDesigner by setting up a configuration in the repository. Configurations are defined in the Repository menu, under Configurations. You can create a new configuration and then add specific model versions to it from a select list. Using a PowerDesigner configuration, you can indicate which specific versions of the enterprise CDM are related to which versions of the logical and physical models representing projects and implemented systems.

Don’t Overload a Single Concept
Let each data item represent a single concept. For example, break address concepts into their lowest levels of detail (street number, street name, city, etc.). You do this manually in PowerDesigner by creating additional data items for the more granular elements and removing the complex one. This way, the language that’s used to identify the data item and the meaning of the information it represents will be crisp and clear.

Keep Definitions Granular
If you need too many examples and too many sentences to describe a single business information concept, then it may be too complex for a single entity or data item to represent it. You should consider simplifying the concept to a common denominator or finding some way to separate it into multiple discrete concepts. In PowerDesigner, you simply create additional entities and attributes to define these more granular concepts.

Use Synonyms Where Possible
Make sure a common concept shares a common language. Assign synonyms to a common term in the enterprise glossary so that the preferred term is always known. You do this in PowerDesigner by double-clicking the term in the glossary browser and selecting the Synonyms tab. Any word you enter in the Synonyms list will be an alternate term defining the same concept as the term itself (now known as the preferred term). This way, you don’t confuse a different name as something with a completely different concept.
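The synonym practice can be pictured as a simple lookup from any alternate word to its preferred term. The following is only a minimal sketch in Python, not PowerDesigner's glossary implementation; the class name `GlossaryTerm`, the helper functions, and the sample terms ("Client", "Account Holder", "Purchase Order") are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """A preferred term plus the synonyms that should resolve to it (hypothetical structure)."""
    name: str
    synonyms: set = field(default_factory=set)


def build_lookup(terms):
    """Map every preferred term and synonym (case-insensitive) to its preferred term."""
    lookup = {}
    for term in terms:
        for alias in {term.name, *term.synonyms}:
            key = alias.lower()
            # A word may belong to only one concept; otherwise the glossary is ambiguous.
            if key in lookup and lookup[key] != term.name:
                raise ValueError(
                    f"{alias!r} is claimed by both {lookup[key]!r} and {term.name!r}"
                )
            lookup[key] = term.name
    return lookup


def preferred_term(lookup, word):
    """Return the preferred term for a word, or None if the glossary does not know it."""
    return lookup.get(word.lower())


# Illustrative glossary content only.
glossary = [
    GlossaryTerm("Customer", {"Client", "Account Holder"}),
    GlossaryTerm("Order", {"Purchase Order"}),
]
lookup = build_lookup(glossary)
print(preferred_term(lookup, "client"))
```

Rejecting a word that is claimed by two preferred terms mirrors the goal stated above: a different name should never be mistaken for a different concept, and one name should never stand for two.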


Keep Obsolete Concepts
If you have a concept that’s no longer needed, it’s better to leave the definition in the enterprise CDM, marked as obsolete. You can do this in PowerDesigner by unchecking the Generate checkbox, which prevents the concept from moving forward into LDMs and PDMs. This way, any new concepts that are similar won’t reuse the old terms and entities, but create new ones. This ensures that there will be no future confusion with older systems using the original definition of that concept.

Don’t Redefine and Reuse
This complements the idea that you should keep obsolete concepts around. If something has really changed enough that the definition of the concept deviates from the original idea, then a new term, new data item, or new entity should be defined, and the original one should be kept around for legacy reasons. In PowerDesigner, you can mark the old term as Legacy in the Stereotype field, and uncheck the Generate checkbox. A good test of this is whether the original concept fits within the new definition, or whether the data sets managed by the concept would have to be deliberately segregated to keep them understood.

7.4 Detailing Information Systems with Logical and Physical Data Models

The PowerDesigner LDM and PDM represent the Relational Database Management Systems (RDBMSs) that implement the data concepts from the enterprise CDM. These models differ fundamentally from the enterprise CDM in three key ways: scope, structure, and technical considerations.

7.4.1 Scope
LDMs and PDMs are slivers of the enterprise, representing a specific subset of the concept to be implemented. These models represent a given functional area of the business and their one or more physical databases. While the enterprise CDM has a single “namespace” (a name can only be used once for the entire enterprise CDM), the logical and physical layers allow for multiple namespaces, each one constrained by a given system boundary.

Example
In identifying and managing customer metadata, NeedsEIM Inc. creates an entity for the customer concept that has all attributes, including customer name, address, gender, age range, income bracket, and more. The LDM for the order-to-cash functional area will only take the name and address attributes. A completely separate LDM for customer relationship management will take only the demographic attributes.

LDMs and PDMs design information structures within a given storage paradigm. When targeting an RDBMS, the LDM represents the relational structures and includes relational concepts such as migrated foreign keys. The PDM adds the vendor- and version-specific RDBMS details such as physical data types, triggers and procedures, and more. Other types of LDMs exist, such as a hierarchical representation in canonical data models (XML structures) or an object-oriented representation targeting object-oriented systems design.

7.4.2 Structure and Technical Considerations
LDMs and PDMs contain structure definitions that have nothing to do with business data definitions, and everything to do with technical considerations for implementation. As shown in Figure 7.3, details such as foreign keys to define how relationships will be stored, or link entities storing the keys of many-to-many relationships are foreign to the business; they have no meaning when trying to understand a business concept. PDMs may involve denormalizing; for example, combining multiple tables or duplicating columns in more than one table to reduce the number of joins needed in a query and improve application performance.

The LDM helps us prepare for physical implementation, and represents the data structures for a given functional area. It may represent multiple databases, from multiple vendor/version RDBMSs. The PDM is an abstraction from the actual details of a physical implementation and is useful for application designers and developers to know what information is available. The PDM is there to develop the actual database and adds details such as indexes, views, referential integrity constraints, triggers, stored procedures, and more.

Each PDM is tightly related to a specific relational database vendor and version and is intended to be a 1:1 representation of the actual physical database. The PDM can be created by reverse engineering an existing running database. Any PDM can be used to generate new Data Definition Language (DDL) files to create


a new database, or can be compared to an existing database to update using DDL and Data Manipulation Language (DML) to change the schema while keeping the existing data in place.

Figure 7.3 Logical Data Model with Migrated Foreign Keys

7.5 Canonical Data Models, XML Structures, and Other Datastores

Enterprise information architecture goes beyond relational databases and includes information in all structures within the enterprise. One common representation of information in nonrelational structures is the XML-formatted messages used to communicate between systems. XML Schema Definitions (XSDs) represent the messages and the message structure.

PowerDesigner has a special XML model, shown in Figure 7.4, that represents an XSD directly and can map that model to one or more PDMs to show where the data in messages is read from or written to.

Figure 7.4 XML Model in SAP PowerDesigner showing complex type reuse

Many organizations have worked to standardize the structures of message formats by using a Canonical Data Model, which is an XML model that gathers all the elements of all the messages together and creates a series of XML complex types to define commonly reused data structures. This Canonical Data Model is a sort of data dictionary for the messages themselves.

In PowerDesigner, mappings can be created between the complex type definitions and the data model representing how message content can be stored in one or more physical databases (see Figure 7.5).

Figure 7.5 XML Model Mappings with a PDM
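To make the idea of mapping message content to physical tables more concrete, here is a small, hedged sketch in Python using only the standard library. It does not use or imitate any PowerDesigner API; the message layout, the `MAPPINGS` table, and the table and column names are assumptions invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical message; element names are illustrative only.
MESSAGE = """
<Order>
  <OrderNumber>1001</OrderNumber>
  <Customer>
    <Name>Ada Example</Name>
    <Phone>555-0100</Phone>
  </Customer>
</Order>
"""

# Hypothetical mapping: element path in the message -> (table, column) in the database.
MAPPINGS = {
    "OrderNumber": ("orders", "order_number"),
    "Customer/Name": ("customer", "customer_name"),
    "Customer/Phone": ("customer", "customer_phone"),
}


def rows_from_message(xml_text, mappings):
    """Group the mapped element values by their target table."""
    root = ET.fromstring(xml_text)
    rows = {}
    for path, (table, column) in mappings.items():
        element = root.find(path)
        if element is not None:
            rows.setdefault(table, {})[column] = element.text
    return rows


def store(connection, rows):
    """Insert one row per target table, passing values as bound parameters."""
    for table, values in rows.items():
        columns = ", ".join(values)
        placeholders = ", ".join("?" for _ in values)
        connection.execute(
            f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
            list(values.values()),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_number TEXT)")
conn.execute("CREATE TABLE customer (customer_name TEXT, customer_phone TEXT)")
rows = rows_from_message(MESSAGE, MAPPINGS)
store(conn, rows)
print(rows)
```

Keeping the mapping as data, separate from the code that applies it, is the same design choice the mapping approach reflects: the dependency between a message element and a column is recorded once and can be inspected on its own.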


Use the Mapping Editor from the Tools menu to define mappings. Then, create the mapping definitions by dragging the data elements from the left and dropping them on the XML structures on the right.

In PowerDesigner, you can also create a library of commonly reused complex types and then use shortcuts to reuse these in any number of XML models representing different sets of messages. To do this, create a new XML model in PowerDesigner, and either reverse engineer an existing XSD with the complex types defined, or use the palette to create new complex types in the model. When you check the model into the repository, click the Advanced button, and select Library in the Folder option.

7.6 Data Warehouse Modeling: Movement and Reporting

When you start trying to define and describe the data warehouse and business analytics systems, you need to understand data in motion between source systems and analytics stores. You also want to know the relationship between analytics systems and the underlying data warehouse database. This helps ensure that you’ve identified the right data sources, that you can answer the business questions needed to help in decision making, and that you know what parts of the system will be affected when changes happen to any given component of the environment.

PowerDesigner data mappings are captured using the Mapping Editor for easy, drag-and-drop identification of the dependencies between transactional systems and analytics systems. Follow these steps:

1. Select Mapping Editor from the Tools menu. If this is the first time you’ve started the Mapping Editor, you’ll be prompted to complete a wizard to identify the sources for the mappings.
2. You may identify one or more PDMs to represent the source for the data warehouse or master datastore.
3. Create mappings by dragging a source data element (table or column) from the left-hand side to the destination (table or column) on the right. You can also define mappings between an enterprise data warehouse and a series of data marts.

PowerDesigner table definitions allow you to mark mappings as a Fact or Dimension. To do this, go to the General tab, and select the option from the Dimensional type dropdown. At the physical table level, this helps report designers know which tables contain the different types of information, which ones represent things the business will measure, and the variables by which we partition them.

In PowerDesigner, you may select Multidimensional Objects > Retrieve Multidimensional Objects from the Tools menu and automatically detect the dimension type based on key structures of each table. For tables that have a compound primary key made up of foreign keys migrated from other tables, the logic determines that it’s a likely fact table, and for all other key structures, the table is determined to be a dimension.

Dimensional Modeling
In PowerDesigner, dimensional models represent the analytic reports themselves. The dimensional model is a graphical representation of fact and dimension objects. As shown in Figure 7.6, fact objects represent one or more fact tables coming together to make a single fact concept. Dimension objects represent the dimension tables collapsed into a simpler representation, complete with multiple hierarchies representing drill-up and drill-down opportunities within the attributes.

Figure 7.6 Dimensional Model with Facts, Dimensions, and Hierarchies
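The fact-versus-dimension detection rule described in this section (a compound primary key made up of migrated foreign keys suggests a fact table; anything else is treated as a dimension) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the tool's actual logic, which may take more into account; the `Table` structure and the sample tables are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Minimal, hypothetical description of a physical table's key structure."""
    name: str
    primary_key: tuple   # column names that make up the primary key
    foreign_keys: tuple  # column names that are foreign keys migrated from other tables


def classify(table):
    """Return 'fact' for a compound primary key built entirely from foreign keys, else 'dimension'."""
    pk = set(table.primary_key)
    is_compound = len(pk) > 1
    all_migrated = pk <= set(table.foreign_keys)
    return "fact" if is_compound and all_migrated else "dimension"


# Illustrative tables only.
tables = [
    Table("order_items", ("order_id", "item_id"), ("order_id", "item_id")),
    Table("customer", ("customer_id",), ()),
    Table("orders", ("order_id",), ("customer_id",)),
]
result = {t.name: classify(t) for t in tables}
print(result)
```

Because the rule looks only at key structure, it is a starting point rather than a verdict: a link table that carries no measures would also be flagged as a likely fact table and may need to be reclassified by hand.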


These models are created either by selecting New Dimensional Diagram from the PDM’s context menu, or running the wizard from Tools > Multidimensional Objects > Generate Cube.

Note
While it’s useful to mark tables as fact and dimension in order to identify where in the database the structures for analytics systems will likely be finding information, it’s not a description of a specific report.

7.7 Link and Sync for Impact Analysis and Change Management

PowerDesigner uses the dependencies that are tracked and managed between models to help facilitate impact analysis and change management. This is known as PowerDesigner’s Link and Sync technology. This allows CDMs, LDMs, and PDMs to remain synchronized through iterations of change without requiring designers, architects, and developers to redo their work.

Link and Sync captures the cross-domain dependencies, such as data used by a process step or flow, or the applications that access certain data assets. You can show all business tasks and all applications that interact with enterprise data.

In the following sections, we’ll discuss how PowerDesigner can be used to create links between any objects in any models, and how it automatically manages model-to-model synchronization through the model generation engine.

7.7.1 Link and Sync Technology
From the name, you see that Link and Sync has two parts: the Link part and the Sync part.

Linking
Linking is when a modeler recognizes a dependency between any two things in PowerDesigner and creates the link. You can create links between any PowerDesigner model, including models that aren’t directly used for but found in information and enterprise architecture, such as the requirements model or the business process model. Linking between such models happens naturally for the most part; for example, attaching a list of data elements to a process. When you define a CRUD matrix in a PowerDesigner Business Process Model (BPM) referencing data in a CDM, you’re creating links. When you create any type of dependency by drawing a reference, relationship, or inheritance, you’re creating a link. You can also create links by opening any object’s property sheet, going to the Traceability Links tab, and clicking the New button to select any object in any model.

You also establish links when binding requirements to any object through the requirements traceability matrix. This is easily done in PowerDesigner by simply opening the requirements traceability matrix, selecting any empty cell, and pressing the Space bar. To remove a link, select a cell that contains a checkmark (identifying the presence of a link), and press the Space bar. You can create dependencies between any two objects in PowerDesigner using the dependencies matrix, which looks and operates nearly identically to the requirements traceability matrix, but can be established between any two objects, in the same or in different models. To create a new dependency matrix, simply select New Traceability Matrix from the model’s pop-up menu in the object browser, and specify the object types to use for the rows and columns. You can also select which attribute will be used to identify the link, if more than one way to combine these objects is possible (e.g., reference or inheritance on an entity in a CDM).

Synching
The synchronizing part in PowerDesigner Link and Sync is when one model is generated from another. PowerDesigner keeps track of the transformed objects and their source. When you generate a model from another (for example, when creating a PDM from an LDM), the sync technology remembers everything. If you then make changes to the original model, the second generation isn’t a new creation of a new PDM, but a write into the existing one generated the first time. Sync technology publishes only the changes made in the LDM since the last generation. This way, any changes made to the PDM in areas not affected by the LDM change will be preserved.

To initiate a synch process, use the model generator from the Tools menu. For example, to synchronize an LDM to a PDM, open the LDM first, and select


Generate Physical Data Model from the Tools menu. This initiates the sync compare and presents the Compare/Merge dialog. After you accept the changes you want to synchronize, PowerDesigner automatically applies them to the selected PDM and opens the PDM model when complete.

PowerDesigner’s Merge Models dialog, shown in Figure 7.7, allows you to manually override any preserved changes if needed, simply by checking the empty checkbox next to the detected difference. This is sometimes useful when implementation starts to deviate too far from the original concept, and a reset in a precise area is needed to get the database design back on track.

Figure 7.7 Compare/Merge Showing Preserved Differences

Synchronization ensures that models derived from each other remain aware of each other and that dependencies can be tracked at the smallest level. This Sync technology makes it natural and easy for business analysts, technical analysts, architects, designers, and developers to remain in lockstep while managing continuous change at any level of abstraction.

7.7.2 Impact Analysis Reporting
The most important use case for keeping all these models linked and synchronized together is so that you can determine what will happen if you change anything. The Impact Analysis feature in PowerDesigner produces a list of impacted objects with a tree-like structure. Filters and other tools help scope the analysis to areas of interest. To begin an impact analysis in PowerDesigner, follow these steps:

1. Either select Impact Analysis from the Tools menu or right-click on any object in the browser or diagram area, and select Impact and Lineage Analysis from the pop-up menu.
2. Generate a diagram view from the tree view by clicking the Generate Diagram button on the Impact and Lineage Analysis dialog box, as shown in Figure 7.8. This diagram is very useful to collaborate with others in an easy-to-view format (see Figure 7.9).

Figure 7.8 SAP PowerDesigner Impact Analysis Dialog
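Conceptually, an impact analysis is a walk over the dependency links from the object being changed to everything downstream of it. The sketch below shows that idea as a breadth-first traversal in Python. It is an assumption-laden illustration, not how PowerDesigner computes its result; the `DEPENDENTS` dictionary and the object names in it are hypothetical.

```python
from collections import deque

# Hypothetical dependency links: each object maps to the objects that depend on it.
DEPENDENTS = {
    "CDM.Entity.Customer": ["LDM.Entity.CUST"],
    "LDM.Entity.CUST": ["PDM.Table.Customer"],
    "PDM.Table.Customer": ["PDM.View.V_Orders"],
    "PDM.View.V_Orders": [],
}


def impacted(start, dependents):
    """Breadth-first walk returning (object, depth) pairs for everything downstream of start."""
    seen = {start}
    queue = deque([(start, 0)])
    result = []
    while queue:
        node, depth = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:  # guard against cycles and duplicate paths
                seen.add(child)
                result.append((child, depth + 1))
                queue.append((child, depth + 1))
    return result


# Print the impacted objects as an indented tree.
for name, depth in impacted("CDM.Entity.Customer", DEPENDENTS):
    print("  " * depth + name)
```

Reversing the direction of the links (mapping each object to the objects it depends on) would turn the same traversal into a lineage view, answering where a given table or view came from.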


Figure 7.9 SAP PowerDesigner Impact Analysis Diagram

Impact analysis makes sure you won’t forget that certain dependencies exist and will take them into consideration on each and every change request from business or technical stakeholders. Downstream, you can see what objects will need to be changed, tested, and verified based on this change. Because you know what databases, applications, and systems will be affected, you can get all the right people involved, and when the change is made to the operational systems, it’s done in a way that minimizes any surprises and minimizes the risk of any unplanned downtime.

7.8 Comparing Models

Modeling is a great way to communicate and collaborate with different people on any complex project. To communicate effectively, it’s not always practical to open a modeling tool, navigate through multiple models, and read screens. To help share information in any model, PowerDesigner has ways to analyze and report on that information and then share it with all nonmodelers in the enterprise.

Model comparison is used whenever changes are made to a model and the model is checked into the repository. To initiate a compare in PowerDesigner, open the model you want to compare, and select Compare from the Tools menu. You must choose the other model to compare this one to, and select OK to run the comparison.

Figure 7.10 shows a typical Compare Models dialog for two CDMs. This comparison feature is also used when generating changes from one model into another when using Preserve Modifications. A compare between models can also be run at any time.

Figure 7.10 SAP PowerDesigner Compare Dialog

Model comparison is useful for several reasons. It’s a great way to see if there are any similarities in models from completely different sources. It’s also a great way to see what changes are made between two different versions of a model, or for understanding the gap between current and desired future state.


Options allow you to narrow the scope of the compare by excluding comments, data types, or other elements. We may force a compare between two objects that were not found to be the same by using the Manual Synchronization function.

Yellow and red flags indicate differences, bold and grayed out indicate presence and absence of whole objects, and the detailed compare window at the bottom shows the exact difference. The compare preview allows you to save the comparison as a Microsoft Excel spreadsheet for further analysis.
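The kind of result a model comparison reports (whole objects present in only one model, plus property-level differences in objects found in both) can be illustrated with a short Python sketch. It assumes each model version is represented as a plain dictionary of objects and their properties; that representation, the function name `compare_models`, and the sample data are hypothetical and unrelated to PowerDesigner's file format.

```python
def compare_models(old, new):
    """Compare two {object_name: {property: value}} snapshots of a model."""
    added = sorted(new.keys() - old.keys())      # objects only in the new version
    removed = sorted(old.keys() - new.keys())    # objects only in the old version
    changed = {}
    for name in sorted(old.keys() & new.keys()):
        diffs = {
            prop: (old[name].get(prop), new[name].get(prop))
            for prop in sorted(old[name].keys() | new[name].keys())
            if old[name].get(prop) != new[name].get(prop)
        }
        if diffs:
            changed[name] = diffs                # property -> (old value, new value)
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative model snapshots only.
version_1 = {
    "Customer": {"Name": "NAME", "Phone": "PHONE"},
    "Order": {"Order Number": "IDENTIFIER"},
}
version_2 = {
    "Customer": {"Name": "NAME", "Phone": "TEXT"},
    "Items": {"Item ID": "IDENTIFIER"},
}
report = compare_models(version_1, version_2)
print(report)
```

Narrowing the scope of a compare, as the options above allow, would correspond here to dropping selected properties (for example, comments) from both snapshots before calling the function.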

7.9 Summary

In this chapter, you learned that using PowerDesigner as an integral part of the SAP EIM solution gives you the power to successfully navigate the pitfalls of business transformation. PowerDesigner provides the right tools to manage information as a corporate asset today and into the future. PowerDesigner’s unique integration into the SAP landscape means designs in the models can easily translate directly to physical artifacts in databases, data movement, and reporting technologies.

In the next chapter, we’ll discuss SAP HANA Cloud Integration capabilities to connect databases and applications on-premise and in the cloud.


Contents

Introduction

PART I SAP’s Enterprise Information Management Strategy and Portfolio

1 Introducing Enterprise Information Management
1.1 Defining Enterprise Information Management
1.1.1 Example of Information Flow through a Company
1.1.2 Types of Information Included in Enterprise Information Management
1.2 Common Use Cases for EIM
1.2.1 EIM for Operational Initiatives
1.2.2 EIM for Analytical Use Cases
1.2.3 EIM for Information Governance
1.3 Common Drivers for EIM
1.3.1 Operational Efficiency as a Driver of EIM
1.3.2 Information as an Organizational Asset
1.3.3 Compliance as a Driver of EIM
1.4 Impact of Big Data on EIM
1.5 SAP’s Strategy for EIM
1.6 Typical User Roles in EIM
1.7 Example Company: NeedsEIM Inc.
1.7.1 CFO Issues
1.7.2 Purchasing Issues
1.7.3 Sales Issues
1.7.4 Engineering and Contracts Issues
1.7.5 Information Management Challenges Facing NeedsEIM Inc.
1.8 Summary

2 Introducing Information Governance
2.1 Introduction to Information Governance
2.2 Evaluating and Developing Your Information Governance Needs and Resources
2.2.1 Evaluating Information Governance
2.2.2 Developing Information Governance


2.3 Optimizing Existing Infrastructure and Resources
2.4 Establishing an Information Governance Process: Examples
2.4.1 Example 1: Creating a New Reseller
2.4.2 Example 2: Supplier Registration
2.4.3 Example 3: Data Migration
2.5 Rounding Out Your Information Governance Process
2.5.1 The Impact of Missing Data
2.5.2 Gathering Metrics and KPIs to Show Success
2.5.3 Establish a Before-and-After View
2.6 Summary

3 Big Data with SAP HANA, Hadoop, and EIM
3.1 SAP HANA
3.1.1 Business Benefits of SAP HANA
3.1.2 Basics of SAP HANA
3.1.3 SAP HANA Components and Architecture
3.1.4 SAP HANA for Analytics and Business Intelligence
3.1.5 SAP HANA as an Application Platform
3.1.6 SAP Business Suite on SAP HANA
3.1.7 SAP HANA and the Cloud
3.2 SAP HANA and EIM
3.2.1 Data Modeling for SAP HANA
3.2.2 Data Provisioning for SAP HANA
3.2.3 Data Quality for SAP HANA
3.3 Big Data and Hadoop
3.3.1 The Rise of Hadoop
3.3.2 Introduction to Hadoop
3.3.3 Hadoop 2.0 Architecture: HDFS, YARN, and MapReduce
3.3.4 Hadoop Ecosystem
3.3.5 Enterprise Use Cases
3.3.6 Hadoop in the Enterprise: The Bottom Line
3.4 SAP HANA and Hadoop
3.4.1 The V’s: Volume, Variety, Velocity
3.4.2 SAP HANA: Designed for Enterprises
3.4.3 Hadoop as an SAP HANA Extension
3.5 EIM and Hadoop
3.5.1 ETL: Data Services and the Information Design Tool
3.5.2 Unsupported: Information Governance and Information Lifecycle Management
3.6 Summary

4 SAP’s Solutions for Enterprise Information Management
4.1 SAP PowerDesigner
4.2 SAP HANA Cloud Integration
4.2.1 SAP HANA Cloud Integration for Process Integration
4.2.2 SAP HANA Cloud Integration for Data Services
4.3 SAP Data Services
4.3.1 Basics of SAP Data Services
4.3.2 SAP Data Services Integration with SAP Applications
4.3.3 SAP Data Services Integration with Non-SAP Applications
4.3.4 Data Cleansing and Data Validation with SAP Data Services
4.3.5 Text Data Processing in SAP Data Services
4.4 SAP Replication Server
4.4.1 SAP Replication Server Use Cases
4.4.2 Basics of SAP Replication Server
4.4.3 Data Assurance
4.4.4 SAP Replication Server Integration with SAP Data Services and SAP PowerDesigner
4.5 SAP Data Quality Management, Version for SAP Solutions
4.6 SAP Information Steward
4.6.1 Data Profiling and Data Quality Monitoring
4.6.2 Cleansing Rules
4.6.3 Match Review
4.6.4 Metadata Analysis
4.6.5 Business Term Glossary
4.7 SAP NetWeaver Master Data Management and SAP Master Data Governance
4.7.1 SAP NetWeaver Master Data Management
4.7.2 SAP Master Data Governance
4.8 SAP Solutions for Enterprise Content Management
4.8.1 Overview of SAP’s ECM Solutions
4.8.2 SAP Extended Enterprise Content Management by OpenText
4.8.3 SAP Document Access by OpenText and SAP Archiving by OpenText
4.9 SAP Information Lifecycle Management
4.9.1 Retention Management
4.9.2 System Decommissioning


4.10 Information Governance in SAP
4.10.1 Information Governance Use Scenario Phasing
4.10.2 Technology Enablers for Information Governance
4.11 NeedsEIM Inc. and SAP’s Solutions for EIM
4.12 Summary

5 Rapid-Deployment Solutions for Enterprise Information Management
5.1 Rapid-Deployment Solutions for Data Migration
5.1.1 Introduction to Data Migration
5.1.2 Data Migration Rapid-Deployment Content
5.1.3 Getting Started with Rapid Data Migration Rapid-Deployment Content
5.1.4 SAP Accelerator for Data Migration by BackOffice Associates
5.2 Rapid-Deployment Solutions for Information Steward
5.2.1 Information Steward Rapid-Deployment Solution Content
5.2.2 Getting Started with Information Steward Rapid-Deployment Solution Content
5.3 Rapid-Deployment Solutions for Master Data Governance
5.3.1 Master Data Governance Rapid-Deployment Solution Content
5.3.2 Getting Started with SAP Master Data Governance Rapid-Deployment Solution Content
5.4 Summary

6 Practical Examples of EIM
6.1 EIM Architecture Recommendations and Experiences by Procter and Gamble
6.1.1 Principles of an EIM Architecture
6.1.2 Scope of an EIM Enterprise Architecture
6.1.3 Structured Data
6.1.4 The Dual Database Approach
6.1.5 Typical Information Lifecycle
6.1.6 Data Standards
6.1.7 Unstructured Data
6.1.8 Governance
6.1.9 Role of the Enterprise Information Architecture Organization
6.2 Managing Data Migration Projects to Support Mergers and Acquisitions
6.2.1 Scoping for a Data Migration Project
6.2.2 Data Migration Process Flow
6.2.3 Enrich the Data Using Dun and Bradstreet (D&B) with Data Services
6.3 Evolution of SAP Data Services at National Vision
6.3.1 Phase 1: The Enterprise Data Warehouse
6.3.2 Phase 2: Enterprise Information Architecture - Consolidating Source Data
6.3.3 Phase 3: Data Quality and the Customer Hub
6.3.4 Phase 4: Application Integration and Data Migration
6.3.5 Phase 5: Next Steps with Data Services
6.4 Recommendations for a Master Data Program
6.4.1 Common Enterprise Vision and Goals
6.4.2 Master Data Strategy
6.4.3 Roadmap and Operational Phases
6.4.4 Business Process Redesign and Change Management
6.4.5 Governance
6.4.6 Technology Selection
6.5 Recommendations for Using SAP Process Integration and SAP Data Services
6.5.1 A Common Data Integration Problem
6.5.2 A Data Integration Analogy
6.5.3 Creating Prescriptive Guidance to Help Choose the Proper Tool
6.5.4 Complex Examples in the Enterprise
6.5.5 When All Else Fails…
6.6 Ensuring a Successful Enterprise Content Management Project by Belgian Railways
6.6.1 Building the Business Case
6.6.2 Key Success Factors for Your SAP Extended Enterprise Content Management by OpenText Project
6.7 Recommendations for Creating an Archiving Strategy
6.7.1 What Drives a Company into Starting a Data Archiving Project?
6.7.2 Who Initiates a Data Archiving Project?
6.7.3 Project Sponsorship
6.8 Summary

PART II Working with SAP’s Enterprise Information Management Solutions

7 SAP PowerDesigner ...... 269
7.1 SAP PowerDesigner in the SAP Landscape ...... 270
7.1.1 SAP Business Suite ...... 270
7.1.2 SAP HANA Cloud Platform ...... 270
7.1.3 SAP Information Steward, SAP BusinessObjects Universes, and Replication ...... 270
7.2 Defining and Describing Business Information with the Enterprise Glossary ...... 271
7.2.1 Glossary Terms for Naming Standards Enforcement ...... 272
7.2.2 Naming Standards Definitions ...... 273
7.3 The Conceptual Data Model ...... 273
7.3.1 Conceptual Data Elements, Attributes, and Data Items ...... 274
7.3.2 Separation of Domains, Data Items, and Entity Attributes ...... 275
7.3.3 Entity Relationships ...... 275
7.3.4 Best Practices for Building and Maintaining an Enterprise CDM ...... 276
7.4 Detailing Information Systems with Logical and Physical Data Models ...... 278
7.4.1 Scope ...... 278
7.4.2 Structure and Technical Considerations ...... 279
7.5 Canonical Data Models, XML Structures, and Other Datastores ...... 280
7.6 Data Warehouse Modeling: Movement and Reporting ...... 282
7.7 Link and Sync for Impact Analysis and Change Management ...... 284
7.7.1 Link and Sync Technology ...... 284
7.7.2 Impact Analysis Reporting ...... 287
7.8 Comparing Models ...... 288
7.9 Summary ...... 290

8 SAP HANA Cloud Integration ...... 291
8.1 SAP HANA Cloud Integration Architecture ...... 292
8.1.1 SAP HANA Cloud Platform ...... 294
8.1.2 Customer Environment On-Premise ...... 294
8.1.3 SAP HANA Cloud Integration User Experience ...... 295
8.2 Getting Started with SAP HANA Cloud Integration ...... 297
8.2.1 Blueprinting Phase ...... 297
8.2.2 Predefined Templates ...... 298
8.2.3 Setting Up Your HCI Tenant ...... 299
8.2.4 Setting Up Your Datastore ...... 300
8.2.5 Creating a New Project ...... 301
8.2.6 Moving a Task from a Sandbox to a Production Environment ...... 304
8.3 Summary ...... 305

9 SAP Data Services ...... 307
9.1 Data Integration Scenarios ...... 307
9.2 SAP Data Services Platform Architecture ...... 309
9.2.1 User Interface Tier ...... 310
9.2.2 Server Tier ...... 313
9.3 SAP Data Services Designer Overview ...... 314
9.4 Creating Data Sources and Targets ...... 318
9.4.1 Connectivity Options for SAP Data Services ...... 318
9.4.2 Connecting to SAP ...... 321
9.4.3 Connecting to Hadoop ...... 323
9.5 Creating Your First Job ...... 324
9.5.1 Create the Data Flow ...... 324
9.5.2 Add a Source to the Data Flow ...... 325
9.5.3 Add a Query Transform to the Data Flow ...... 325
9.5.4 Add a Target to the Data Flow ...... 325
9.5.5 Map the Source Data to the Target by Configuring the Query Transform ...... 326
9.5.6 Create the Job and Add the Data Flow to the Job ...... 327
9.6 Basic Transformations Using the Query Transform and Functions ...... 327
9.7 Overview of Complex Transformations ...... 330
9.7.1 Platform Transformations ...... 330
9.7.2 Data Integrator Transforms ...... 332
9.8 Executing and Debugging Your Job ...... 336
9.9 Exposing a Real-Time Service ...... 337
9.9.1 Create a Real-Time Job ...... 338
9.9.2 Create a Real-Time Service ...... 340
9.9.3 Expose the Real-Time Service as a Web Service ...... 342
9.10 Data Quality Management ...... 343
9.10.1 Data Cleansing ...... 345
9.10.2 Data Enhancement ...... 366
9.10.3 Data Matching ...... 369
9.10.4 Using Data Quality beyond Customer Data ...... 386
9.11 Text Data Processing ...... 388
9.11.1 Introduction to Text Data Processing Capabilities in SAP Data Services ...... 389

9.11.2 Entity Extraction Transform Overview ...... 391
9.11.3 How Extraction Works ...... 392
9.11.4 Text Data Processing and NeedsEIM Inc. ...... 394
9.11.5 NeedsEIM Inc. Pain Points ...... 394
9.11.6 Using the Entity Extraction Transform ...... 396
9.12 Summary ...... 403

10 SAP Information Steward ...... 405
10.1 Cataloging Data Assets and Their Relationships ...... 406
10.1.1 Configuring a Metadata Integrator Source ...... 407
10.1.2 Executing or Scheduling Execution of Metadata Integration ...... 409
10.2 Establishing a Business Term Glossary ...... 410
10.3 Profiling Data ...... 413
10.3.1 Configuration and Setup of Connections and Projects ...... 414
10.3.2 Getting Basic Statistical Information about the Data Content ...... 417
10.3.3 Identifying Cross-Field or Cross-Column Data Relationships ...... 422
10.4 Assessing the Quality of Your Data ...... 425
10.4.1 Defining Validation Rules Representing Business Requirements ...... 427
10.4.2 Binding Rules to Data Sources for Data Quality Assessment ...... 431
10.4.3 Executing Rule Tasks and Viewing Results ...... 433
10.5 Monitoring with Data Quality Scorecards ...... 437
10.5.1 Components of a Data Quality Scorecard ...... 439
10.5.2 Defining and Setting Up a Data Quality Scorecard ...... 441
10.5.3 Viewing the Data Quality Scorecard ...... 448
10.5.4 Identifying Data Quality Impact and Root Cause ...... 452
10.5.5 Performing Business Value Analysis ...... 454
10.6 Quick Starting Data Quality ...... 461
10.6.1 Assess the Data Using Column, Advanced, and Content Type Profiling ...... 462
10.6.2 Receive Validation and Cleansing Rule Recommendations ...... 462
10.6.3 Tune the Cleansing and Matching Rules Using Data Cleansing Advisor ...... 464
10.6.4 Publish the Cleansing Solution ...... 465
10.7 Summary ...... 465

11 SAP Master Data Governance ...... 467
11.1 SAP Master Data Governance Overview ...... 468
11.1.1 Deployment Options ...... 470
11.1.2 Change Request and Staging ...... 471
11.1.3 Process Flow in SAP Master Data Governance ...... 473
11.1.4 Use of SAP HANA in SAP MDG ...... 475
11.2 Getting Started with SAP Master Data Governance ...... 476
11.2.1 Data Modeling ...... 476
11.2.2 User Interface Modeling ...... 478
11.2.3 Data Quality and Search ...... 478
11.2.4 Process Modeling ...... 480
11.2.5 Data Replication ...... 481
11.2.6 Key and Value Mapping ...... 481
11.2.7 Data Transfer ...... 483
11.2.8 Activities beyond Customizing ...... 483
11.3 Governance for Custom-Defined Objects: Example ...... 484
11.3.1 Plan and Create Data Model ...... 484
11.3.2 Define User Interface ...... 489
11.3.3 Create a Change Request Process ...... 494
11.3.4 Assign Processors to the Workflow ...... 495
11.3.5 Test the New Airline Change Request User Interface ...... 496
11.4 Rules-Based Workflows in SAP Master Data Governance ...... 497
11.4.1 Classic Workflow and Rules-Based Workflow Using SAP Business Workflow and BRFplus ...... 498
11.4.2 Designing Your First Rules-Based Workflow in SAP Master Data Governance ...... 505
11.5 NeedsEIM Inc.: Master Data Remediation ...... 508
11.6 Summary ...... 511

12 SAP Information Lifecycle Management ...... 513
12.1 The Basics of Information Lifecycle Management ...... 515
12.1.1 External Drivers ...... 516
12.1.2 Internal Drivers ...... 516
12.2 Overview of SAP Information Lifecycle Management ...... 516
12.2.1 Cornerstones of SAP ILM ...... 517
12.2.2 Data Archiving Basics ...... 518
12.2.3 ILM-Aware Storage ...... 523
12.2.4 Architecture Required to Run SAP ILM ...... 527

12.3 Managing the Lifecycle of Information in Live Systems ...... 529
12.3.1 Audit Area ...... 529
12.3.2 Data Destruction ...... 532
12.3.3 Legal Hold Management ...... 532
12.4 Managing the Lifecycle of Information from Legacy Systems ...... 534
12.4.1 Preliminary Steps ...... 534
12.4.2 Steps Performed in the Legacy System ...... 536
12.4.3 Steps Performed in the Retention Warehouse System ...... 537
12.4.4 Handling Data from Non-SAP Systems During Decommissioning ...... 539
12.4.5 Streamlined System Decommissioning and Reporting ...... 539
12.5 System Decommissioning: Detailed Example ...... 542
12.5.1 Data Extraction ...... 543
12.5.2 Data Transfer and Conversion ...... 548
12.5.3 Reporting ...... 555
12.5.4 Data Destruction ...... 559
12.6 Summary ...... 562

13 SAP Extended Enterprise Content Management by OpenText ...... 563

13.1 Capabilities of SAP Extended ECM ...... 565
13.1.1 Data and Document Archiving ...... 566
13.1.2 Records Management ...... 567
13.1.3 Content Access ...... 568
13.1.4 Document-Centric Workflow ...... 568
13.1.5 Document Management ...... 568
13.1.6 Capture ...... 569
13.1.7 Collaboration and Social Media ...... 569
13.2 How SAP Extended ECM Works with the SAP Business Suite ...... 570
13.3 Integration Content for SAP Business Suite and SAP Extended ECM ...... 572
13.3.1 SAP ArchiveLink ...... 572
13.3.2 Content Management Interoperability Standard and SAP ECM Integration Layer ...... 574
13.3.3 SAP Extended ECM Workspaces ...... 575
13.4 Summary ...... 582

The Authors ...... 583
Index ...... 591

Index

A

Accelerated reporting, 542
Access server, 313
Active area, 473
Address
  cleansing, 125, 128
  cleansing/enhancement, 137
  correction, 366
  directories, 362, 363
  information, 55
  parsing, 353
  profiling, 423
  validated, 137
Address cleanse, 350, 352
  transform, 344
Advanced profiling, 422
  results, 424
AIS, 559
Alias, 272
All-world address directory, 362
Ambari, 104
Analytical use, 309
APIs, 337
Application architecture, 117
Application integration, 242
Application link enabling (ALE), 469, 481
Architecture, 209
  retention management, 527, 528
  system decommissioning, 528, 529
Archival data, 220, 223
Archive, 541, 565
  file, 526
  hierarchy, 525
  index, 526
Archive administration data
  transfer, 552
Archive Development Kit (ADK), 519
Archive Management, 553, 554
Archiving, 165
  object, 171, 173
  policies, 176
  scope, 264
  strategy, 261
  using SAP HANA, 262
Archiving object
  definition, 519
  SAP ILM-enabled, 527, 548
  specific customizing, 554
  work center, 520
Assessment, 59
Asynchronous replication, 135
Atomic data, 218
Attributes, 274
Audit area
  definition, 529
  demo, 530
  product liability, 530
  set up, 548
  tax, 530
Audit package
  create, 556
  extract to BI, 557
Auditing, 172
Automated electronic discovery, 169

B

BAdI, 469
  change UI for entity type, 478
BAPI, 321
Bar codes, 573
Best practices methodology, 183
Best record strategy, 385
BI, 175
Big data, 41, 42, 43
  processing and analysis, 323
  SAP HANA vs Hadoop, 109
Binding, 427, 431, 447, 454
Blueprint, 297
Break group, 373, 376
BRFplus, 179, 468, 480, 497, 498, 499, 501
  custom validations, 474
  single value decision table, 502, 503
  user agent decision table, 502
Bulk data load, 120
Business Address Services, 123, 137
Business efficiency, 154
Business glossary, 115, 148

Business intelligence (see Cluster, 98 Data (Cont.) Data dictionary BI), 147, 369, 370 CMC, 408 elements, 345 data items, 274 Business process descriptions, 149 connections, 414 enhancement, 366, 372 Data enrichment, 128 Business process manager, 45 internal scheduler, 434 integrated, 218 Data extract browser, 546 Business process owner, 45 set up Data Insight project, 416 integration, 50 Data flow, 238, 317, 322, 335, 385, 396, 397, Business rule, 232, 425 CMIS, 574 item, 274 400 Business Rules Framework, 468 Collaboration, 566, 569 lineage, 36, 406, 410 add query transform, 325 Business term glossary, 140, 405, 410 Compliance, 37, 60, 62 loading, 127 add source, 325 Business term taxonomy, 406 monitoring, 179 management, 224, 225 add target, 325 Business user, 142 requirements, 65 move/synchronize across enterprise, 133 add to job, 327 Business Value Analysis, 454 Conceptual data model (CDM), 116, 273 owner, 44 create, 303, 324, 339 Business-complete data, 548 Condition alias, 502 parsing, 356, 387 define, 311 Business-incomplete data, 548 Consolidate, 369 planning, 297 example, 302 Content, 32 policies, 51 GUI, 302 access, 565, 568 profiling, 232, 240 move to production, 304 C Content Data Extractor tool, 539 quality, 34, 39, 40 Data governance, 410 Content Management Interoperability Ser- real-time replication, 92 Data Insight, 406, 409, 414 Ǟ Canonical Data Model, 281 vices see CMIS retention, 217 Data Insight project, 415, 437, 439, 448, 454 Capture, 566, 569 Context data, 548 source, 71, 447 add table/file, 416 Cata dictionary, 215 extractor, 173, 535 sources, 237 define multiple, 415 CDE, 535, 545 Context information, 545 standardization, 128, 356 set up, 415 standardized, 359, 360 archive data, 545 Correction, 345 set up connection, 415 standards, 54, 220 extraction services, 548 CRM, 124, 137, 159 Data integration, 185, 246, 307, 312, 389 steward, 45 Centers of excellence, 54 content management, 159 bulk data load, 291 Ǟ stewardship, 209 Central 
Management Console see CMC Cross-domain dependency, 284 cloud to cloud, 291 Ǟ Culture dimension, 54 synchronization, 127, 248 Central Management Server see CMS on-premise to cloud, 291 Custom extraction rules, 392 transfer, 483 cFolders, 159 scenarios, 307 Customer complaints, 253 transformation, 327 Change management, 244, 284 Data integrator transform, 332 Customer information, 59 validation, 127, 128, 312 Change request, 472, 505, 510 Data lifecycle, 520 Customer relationship management, 62 Data archiving, 167, 168, 169, 173, 256, 261, create, 510 Data load, 228 263, 265, 518, 566 create process, 494 Data mart, 126, 210 basics, 518 process, 468 Data migration, 50, 66, 121, 124, 127, 176, process, 522 type, 499, 506 D 231, 242, 308, 369 Data Assurance, 136 UI, 496 activities, 188 DART Browser, 546 Data cleanse, 345, 353, 356, 366, 378 Checksum function, 551 business rules, 232 Data data correction, 362 Cleansing, 232 content, 185, 195 administrator, 221 data standardization, 356 package, 360, 378 data enrichment, 236 process, 352 analysis, 80 data validation, 364 standardization, 358 process flow, 231 rule, 140, 405, 406 analyst, 45, 97, 103, 410, 413, 415, 416 transform, 143, 344 rapid deployment, 184 Cleansing Package Builder, 143, 144, 387 architect, 233 assessment, 34 Data cleansing, 121, 127, 231 reasons for, 185 Cloud, 118 cleansing, 68 Data Cleansing Advisor, 464 scope, 229 applications, 118 consolidation, 383 Data destruction, 532, 559 Data model bulk data load, 120 correction, 128, 356, 362 in the live database, 532 activate, 488 rapid-deployment, 183 distribution, 224 in the retention warehouse, 559 create, 485 real-time data access, 120 domain, 174, 442, 450 security considerations, 561 plan, 485 SAP HANA database, 292 domains, 55

Data modeling, 407, 408, 476 Data quality scorecard (Cont.) Dual database strategy, 214 Entity relationship, 275 SAP HANA, 89 view, 448 Dun and Bradstreet, 29, 63, 236 define in PowerDesigner, 276 Data movement model (DMM), 116 view Business Value Analysis, 459 Duplicate check, 479 Entity type, 476, 485 Data profiling, 122, 140, 141, 346, 372, 405, Data replication, 481 Duplicate checking, 137 choose for business object, 488 413, 428, 432 framework, 469 create, 485 basic, 419, 422 Data Services relationships, 477 create validation rule, 427 connect to Hadoop, 323 E ERP, 137, 179 project, 415 connect to SAP BW, 322 ETL, 110, 120, 126, 127, 217, 235, 247, 248, set up task, 417 connect to SAP ERP, 321 Easy Document Management, 159 307, 406, 407, 408, 438, 452, 539 Data provisioning, 84 connect to SAP HANA, 323 ECM, 43, 50, 113, 154, 156, 162, 165, 251 Executive sponsor, 55 SAP HANA, 90 server tier components, 313 integrated, 563 Extract, Transform, Load, 541 Data quality, 67, 68, 127, 131, 132, 138, 139, Data services connectivity, 292 integration layer, 574 Extraction, transformation, and loading Ǟ see 148, 174, 176, 181, 186, 225, 226, 232, Data Services Workbench, 91 workspace, 575, 576, 578 ETL 239, 240, 307, 312, 378, 387, 389, 392, Data steward, 68, 72, 142, 143, 144, 178, 438, ECMLink, 575 Extractors, 321 414, 436, 437, 451, 452, 453, 465, 478, 443, 448, 502, 507 Editions, 477 479, 509 UI, 311 EIM, 25, 27, 28, 36, 40, 43, 69 assessment, 431 Data warehouse, 126, 214, 218, 308, 333 architecture recommendations, 209 F dashboard, 435 governance, 220 Hadoop, 110 levels, 51 modeling, 282 strategy, 43 Fact object, 283 management, 343 Database administrators, 262 with SAP HANA, 89 Failed record, 450 measurement, 143 Databases, 126 E-mail Response Management System, 159 Family match, 373 metrics, 226 Datastore, 318, 396 Emails, 563 File source, 320 monitor, 427 create, 300, 319 Enterprise Financial impact, 456 monitoring, 140, 141, 406, 452 import tables, 300 
application integration, 217 Financial master data, 152 process, 130 Decision tables, 499 search, 160 Floor Plan Manager (FPM), 489 requirements, 225, 405 Decommission, 26 services, 469 Flume, 102 root-cause analysis, 453 Decommissioning, 168, 539 workspace, 159 Form UIBB, 491 score, 431, 435, 437 De-duplication, 125, 231, 234, 235, 240, 365, Enterprise CDM, 273 scorecard, 426, 442, 446, 448, 452 370, 372 best practices, 276 scores, 448 Demographic data, 368 concepts, 274 G telephone patterns, 241 Dependency profiling, 423 obsolete definition, 278 Data Quality Advisor, 145, 461 Derivation, 478 versions, 277 Generic object services, 582 Data quality dimension, 438, 443, 445, 448, Digital asset management, 155 Enterprise Content Management Ǟ see ECM Geo directories, 368 450 Dimension object, 283 Enterprise data warehouse, 236, 344 Geocode, 240, 241 accuracy, 443, 445 Dimensional model, 283 Enterprise glossary, 271 Geocoding, 367, 370 completeness, 443 Direct linkage, 453 naming standards, 272 Geolocation, 219 conformity, 443, 446, 448 Direct marketing, 370 synonyms, 277 Geospatial, 366 consistency, 444 Discovery, 59 Enterprise information architecture, 269 data, 367 integrity, 444 Discrete format, 347 consolidate source data, 238 Global address cleanse, 350, 353, 357 timeliness, 444 Document, 563 details, 210 parse data, 350 uniqueness, 444 archiving, 565, 566 role of, 228 Global address cleansing, 423, 425 Data quality scorecard, 141, 437 management, 155, 566, 568 scope, 212 Global data manager, 45 Ǟ bind data sources to, 447 Document-centric workflow, 566, 568 Enterprise Information Management see Global standards, 41 components, 439 Domain, 274 EIM Governmental regulations, 40 drill into details, 449 Drawing management, 252 Entity, 274, 391 Governmental standards, 55 key data domain, 442 DRF, 469 attribute, 274 Grammatical parsing, 390 tile, 439 DSO, 557 data item, 274 GS1, 144 extraction, 391, 397 Guidelines, 227

H HCI Agent, 294 Information governance (Cont.) L HCP customized, 153 Hadoop, 42, 96, 100, 323 service layers, 292 develop, 58 LDM as SAP HANA extension, 109 Hive, 102 establish process, 60 structure definition, 279 bulk data transfer, 101 evaluate, 53 Legacy System Migration Workbench cluster, 98 framework, 69 (LSMW), 186 collect logs, 102 I preventative, 61 Legacy systems, 172 common use, 105 technology enablers, 177 extract data, 173 Ǟ ecosystem, 101 IDEA, 559 Information lifecycle management also see Legal compliance, 261 HBase, 103 IDoc, 316, 321, 469, 481, 483 SAP ILM Legal hold, 34, 170, 179 Ǟ HDFS, 99 ILM, 165, 168, 210 Information lifecycle management see ILM management, 169 Hive, 102, 108, 109 definition, 515 Information management, 220 setting, 533 in the enterprise, 107 drivers and pain points, 515 scope, 212 Legal hold management introduction, 98 external drivers, 516 Information management strategy, 37, 154, overview, 532 machine learning libraries, 104 for legacy data, 534 165, 166 Legal requirements, 51, 168 Mahout, 104 in live systems, 529 Information platform services (IPS), 313 Lifecycle, 27 MapReduce, 99, 106 internal drivers, 516 Information Steward Link and Sync, 284 master node, 98 work centers, 519 Data Insight module, 145 parts, 284 online archive, 106 ILM object metadata, 140 technology, 269 Pig, 103, 106, 108 definition, 519 Metapedia, 410 Linking, 284 SAP HANA, 109 ILM-aware storage, 523 In-memory cloud platform, 292 Local reporting, 555, 559 scripting, 103 system, 523 In-memory computing, 78 Logical data model (LDM), 278 SQL interface, 102 ILM-BC 3.0 Integrated ECM, 563 strengths and weaknesses, 108 certification, 523 Integration flow Tez, 100 Images, 32 web-based UI, 291 M worker node, 98 Impact analysis, 228, 284, 410, 453 Integration Platform as a Service (iPaaS), 118 Hadoop Distributed File System (HDFS), 99 reporting, 287 Intelligent Driver Assistant, 254 Maintenance notification, 162 HANA Cloud Integration for data integration, 
Implementation methodologies, 183 iPaaS, 93 Management reporting, 214 IPS, 313 120 Individual match, 373 Management reporting and analytics, 217 IRM, 530 HCI Industry standards, 34 Managing content, 32 IT administrator, 414, 416 blueprinting phase, 297 InfoCube, 557 Manual rule binding, 431 Information connectivity, 291 Map reduce, 100 access, 51 create project, 302 Mapping, 326 discovery, 50, 76, 175 create task, 301 J Mapping Editor, 282 lifecycle, 216 data flow editor, 296 MapReduce platform services, 312, 314 Java Message Service, 316 datastore, 300 text data processing, 323 policies, 52 Job define data extraction, 294 Master data, 29, 34, 37, 64, 68, 149, 151, 176, retention manager, 530 create, 324, 327, 340 integration steps, 297 177, 181, 215, 216, 477 security, 209, 227 execute/debug, 336 logs, 305 consolidation, 150 strategy, 54 real-time, 338 on-premise component (HCI Agent), 294 customer, 152 Information asset Predefined template, 298 export, import, convert, 483 Data Quality Advisor, 145 set up prerequisites, 299 harmonization, 150 Information governance, 26, 28, 33, 49, 52, set up tenant, 299 K manage centrally, 224 55, 58, 63, 67, 68, 76, 108, 110, 114, 139, set up user roles, 299 management, 213, 308, 370 173, 177, 189, 209, 223, 225, 232, 239, Key mapping, 481, 482 transform type, 303 material, 153 244, 260 Key words, 412 tutorial, 297 program recommendations, 243 committee, 68 Knowledge worker, 72 user experience, 295 strategy, 243 council, 67 KPI, 73

Master data governance, 499 Monitoring, 437 Physical data models, 116 Rapid-deployment solutions application framework, 469 Multiline data, 349 Pig, 103 data migration, 184 Master record, 374 Multiline format, 347 scripts, 323 Information Steward, 197 Match, 369 Multiline hybrid format, 347 Platform transformation, 330 SAP MDG, 203 comparison options, 373 PLM, 162 Real-time data replication and synchroniza- configuration, 374 Point-of-interest, 368 tion, 133 criteria, 373, 380 N Policies, 227 Real-time service, 340 group, 374, 382, 384 Policy expose as web service, 342 level, 373 No-match thresholds, 383 define, 549 Records management, 155, 179, 261, 565, 567 performance, 377 Nondiscrete data components, 349 definition, 178 Redundancy profiling, 423 scenario, 373 Nonparty data, 387 engine, 169 Reference data, 213, 215 score, 381 Nonrelational data, 109 implementation, 178 Regulatory compliance, 40 set, 373 Non-SAP systems, 534 set status to live, 550 Replication Server standards, 378 NoSQL, 96, 103 Policy category Data Assurance, 136 threshold, 374 datastore, 103 residence rules, 549 Reporting Match Criteria Editor, 381 retention rules, 549 increase performance, 542 Match Editor, 374, 380 Portal Site Management, 157 local, 559 Match method O PowerDesigner models, 116 Repository tier, 314 weighted scoring, 375 Predefined template, 298 Requirements traceability matrix, 285 Match transform, 344 OLAP, 78, 97, 126 Predictive analytics, 52, 175 Residence time Match Wizard, 374, 379, 380, 381 OLTP, 78, 97 algorithm, 80 definition, 521 Matching, 128, 129, 132 On-premise Pre-parsed data, 353 Retention, 176 process, 368 rapid-deployment, 183 Principle, 211 limits, 220 routine, 240 Oozie, 104 Print list, 531 management, 168, 169 score, 479 Open hub, 322 retrieve, 524 policies, 28, 171, 173, 178, 261 standards, 361 OpenText, 113, 156, 157, 565 Procedures, 227 time unit, 550 strategy, 371, 372 OpenText Knowledge center, 580 Process modeling, 480 Retention management, 256, 518 
techniques, 372 Operational analytics, 214 Procurement, 162 capabilities, 529 Matching method Operational data, 215, 216 Product liability, 173 unstructured data, 531 combination, 374, 376 Operational efficiencies, 37, 39, 245 Product lifecycle management (see Retention Management Cockpit rule-based, 374 Operational master data management, 150 PLM), 159 Administrator, 537 weighted scoring, 374 Operational reporting, 214 Product quality, 394 Line of Business, 537 MDG communicator, 493 Profiling task, 418 Retention period Mergers and acquisitions, 228, 516 Operational use, 308 Optimical character recognition (OCR), 254 view results, 421 definition, 521 Metadata, 147, 213, 215, 221, 252, 551 Project, 302 maximum, 550 analysis, 140, 147 Organizational change management, 175 Organizational ownership, 221 minimum, 550 apply, 222 Retention rules, 223 management, 139, 147, 452, 453 Output management, 155 Output schema, 400 Q basics, 550 Metadata integration Retention warehouse, 170, 172, 542 execution, 409 Quality, 37 set up, 173 Metadata integrator, 407, 409 dimension, 440, 509 Retirement, 33 configure, 407 P Query transform, 325, 326, 327 Row data Metadata management, 405, 406, 413 report discrepancies, 136 Metapedia, 148, 149, 410 PaaS, 88 Rule binding, 441, 448, 450 synonym/keyword, 413 Parallel processing architecture, 96 Rule tasks techniques, 411 Parsed data, 350, 353, 360 R execute, 433 Migration, 28, 38 Parsed output, 352, 356 Rapid Data Migration Rule-based, 374 Missing data, 72 Parsing, 345 content, 189 Rules-based workflow, 496, 499 Model comparison, 289 Physical data model (PDM), 270, 278 design, 505 Monitor, 139 structure definition, 279 Rapid Mart, 452

S SAP Data Services (Cont.) SAP Data Services Designer, 310 SAP HANA (Cont.) architecture, 309 SAP Digital Asset Management, 157 native advanced features, 85 SAP Accelerator for Data Migration by Back- batch jobs, 316 SAP Document Access, 157, 164, 169 real-time trigger-based replication, 92 Office Associates, 196 breakpoints, 336 SAP Document Access by OpenText, 514, 524 SAP Business Suite, 86 SAP ArchiveLink, 165, 169, 514, 572, 575 built-in functions, 327 SAP Document Presentment, 157 the cloud, 87 attachments, 551 call as external service, 480 SAP ECC, 124 with EIM, 89 documents, 524 central repository, 314 SAP Employee Management, 157 with SAP MDG, 475 SAP Archiving by OpenText, 157, 164, 165, cleansing transformation, 233 SAP Enterprise Asset Management, 162 XS server, 84 167, 169, 173, 514 CMC, 408 SAP Enterprise Portal, 157, 159, 161, 164, SAP HANA Cloud Integration (HCI), 93, 118 SAP Audit Format, 559 connect to file source, 320 470, 483, 575 SAP HANA Cloud Integration for process inte- SAP Business Process Management, 61, 179 data enhancement, 366 SAP ERP, 121, 123, 152, 170, 172, 173, 190 gration, 119 SAP Business Suite, 153, 162, 252, 255, 571 data quality, 240 document access, 164 SAP HANA Cloud Platform (HCP), 118, 294 standard business processes, 258 data validation, 364 migrate data to, 193 SAP HANA Enterprise Cloud, 87 validations, 474 migration content, 190 SAP Business Suite on SAP HANA, 86 Designer, 314 SAP HANA One, 88 SAP Extended ECM, 64, 66, 158, 161, 162, SAP Business Warehouse (SAP BW), 67, 123, enrich data, 236 SAP HANA Studio, 84 125, 126, 127, 169, 170, 322, 407, 452, 536 ETL, 237 164, 165, 178, 179, 251, 252, 254, 256, SAP Identity Management, 121 connect to retention warehouse, 173 ETL capabilities, 539 260, 565, 569, 571 SAP ILM reporting, 556 evolution, 236 ArchiveLink, 572 architecture, 527 SAP Business Workflow, 66, 74, 151, 179, extract legacy data, 242 capture, 569 cockpit roles, 537 259, 468, 494, 497, 498, 499, 573 function 
categories, 328 customer complaints, 254 conversion, 537, 551 configuration, 505 functions, 327 customize workspace, 579 conversion, replace old sessions, 554 SAP BusinessObjects BI, 147 history preservation, 333 integration with the SAP Business Suite, 570 cornerstones, 517 platform, 121, 125, 126, 312, 408 job, 316, 327, 336 metadata, 576 data archiving, 518 SAP BusinessObjects Business Intelligence, job server, 311, 313 migrate invoices to, 257 database storage option, 525 126, 179, 407 lineage analysis, 312 OpenText, 574 object, 548 SAP BusinessObjects Business Intelligence Local Object Library, 315 printout, 253 retention management, 518 (SAP BusinessObjects BI), 74 local repository, 314 UI options, 259 retention rules, 531 SAP BusinessObjects universe, 271 lookup function, 328 WebGUI, 578 Store Browser, 561 SAP BusinessObjects Web Intelligence, 124, major components, 309 workspace types, 577 system decommissioning, 518 189 management console UI, 311 SAP Extended Enterprise Content Manage- SAP IMG, 476 SAP Cloud Operations, 299 ment by OpenText SAP Content Server, 160 mappings, 311 SAP Information Lifecycle Management (ILM), success factors, 257 SAP CRM, 123, 138, 344 metadata, 311 113, 159, 164, 165, 166, 168, 169, 170, SAP Folders Management, 159 Customer Interaction Center, 259 migration content, 190 173, 178, 179, 182, 186, 256, 567 SAP GUI, 579 document access, 164 object types, 316 legacy functions, 173 SAP HANA, 42, 48, 77, 106, 123, 125, 126, SAP Customer Relationship Management (SAP overlap with SAP PI, 249 retention warehouse, 172 127, 170, 309, 323 CRM), 121, 190, 253 parsing, 350 SAP Information Steward, 61, 74, 113, 122, SAP Data Quality Management, 95, 128 Project Area, 315 analytics and BI, 85 139, 143, 147, 148, 149, 150, 153, 178, SDK, 127 query transform, 325 archiving, 262 179, 186, 188, 232, 233, 235, 309, 312, version for SAP solutions, 137 Rapid Data Migration, 187 as an application platform, 86 387, 407, 409, 416, 427, 431, 447, 466, SAP 
Data Services, 61, 64, 66, 67, 74, 120, real-time job, 316, 338 basics, 81 508, 509, 510 business benefits, 78 121, 122, 124, 125, 126, 127, 129, 150, real-time service, 337, 340 Business Value Analysis, 456 components and architecture, 82 151, 153, 168, 178, 181, 193, 229, 246, SAP HANA, 90 CMC, 408 data modeling, 89 307, 318, 322, 325, 360, 363, 367, 368, server tier, 313 Data Insight project, 414 data provisioning, 84, 89 372, 387, 394, 407, 409, 423, 425, 452, tool palette, 315 hyperlinked numbers, 420 454, 467, 514 data quality, 94 update source system, 234 metadata management, 410 address check, 137 Hadoop, 109 use Hadoop, 111 Quality Dimension attribute, 444 administration, 311 index server, 83


SAP Information Steward (Cont.)
  rapid-deployment solutions, 197
  read repository, 270
  SAP HANA, 94
  statistical information, 413
  UI, 311
SAP Invoice Management, 157, 257, 259
SAP IQ
  store archive file, 526
  store archive index, 526
SAP Landscape Transformation, 186
SAP Landscape Transformation Replication Server, 92
SAP LT Replication Server, 514, 539
SAP Master Data Governance
  rapid-deployment solution, 203
  SAP HANA, 95
SAP NetWeaver Application Server ABAP, 137, 186, 580
SAP NetWeaver Business Client, 470, 483, 537, 538, 548, 551, 575
  SAP ILM cockpits, 539
SAP NetWeaver Master Data Management (MDM), 467
SAP NetWeaver Master Data Management (SAP NW MDM), 63, 66, 113, 123, 125, 137, 150, 151, 153, 178, 179, 181, 467, 468, 470, 508
  assign processors to workflow, 495
  business activity, 480
  change request ID, 66
  configuration steps, 476
  custom-defined object, 484
  data quality, 138
  define UI, 489
  flex mode, 473
  generic workflow template, 499
  import master data, 483
  maintain SAP ERP attributes, 65
  master data changes, 234
  master data hub, 471
  multi-attribute drill-down, 475
  process flow, 473
  reuse mode, 473
  rules-based workflow, 497
  run on SAP ERP, 471
  searches, 479
  trigger workflow, 66
  UI modeling, 478
  with SAP HANA, 475
SAP Plant Maintenance (PM), 162
SAP Portal Content Management, 157
SAP Portal Content Management by OpenText, 255
SAP PowerDesigner, 115
  compare dialog, 286
  data mapping, 282
  define relationship, 276
  dimensional modeling, 283
  glossary, 272
  glossary, configure, 273
  impact analysis, 287
  library, complex types, 282
  Link and Sync technology, 284
  linking, 284
  mapping, 281
  model compare, 289
  realize value, 269
  SAP Business Suite, 270
  SAP HANA, 270
  synchronizing, 285
  table definition, 282
  XML model, 280
SAP Process Integration (PI), 246
SAP Process Orchestration, 64, 65, 66, 74, 246, 343
SAP Rapid Data Migration, 186
SAP Rapid Deployment solutions, 183
SAP Replication Server, 92, 114, 133
  Integration with SAP Data Services, 136
  Integration with SAP PowerDesigner, 136
SAP River, 81
SAP Smart Business, 476
SAP solutions for information lifecycle management, 513
  overview, 513
SAP StreamWork, 159
SAP Travel Receipt Management, 157
SAPUI5, 84
Scaling, 96
Scanned invoice, 169
Schema, 338
Scope, 278
Scorecard, 139
Scripting, 304
Semantic disambiguation, 390
Sentiment, 32
Sentiment analysis, 131
Service-level agreement, 72, 414
  normal, reverse, 72
Similarity scoring, 372
Single Instruction, Multiple Data (SIMD), 81
Single-object maintenance
  UI, 489
Slowly changing dimensions, 333
SN_META
  file, 551
Snapshot, 524, 545
Social media, 41, 569
SPRO, 476
Sqoop, 101
SRM, 62
SRS, 528
Staging, 472
  area, 473
Standardization, 345
  rules, 387
Standards, 227
Step type, 506, 507
Storage, 168, 172
Storage and retention service, 528
Storage system
  ILM-aware, 523
Structured data, 32, 33, 213
Subordinate record, 374
Supplier, 28
, 318
Sybase IQ, 127
Synchronizing, 285
Synonym, 412
  assign to common term, 277
System consolidation, 186
System decommissioning, 43, 169, 170, 172, 518, 539
  archive transactional data, 543
  configure retention warehouse system, 535
  convert data, 537
  data analysis, 534
  data transfer, 551
  data transfer and conversion, 548
  define audit areas, 537
  detailed example, 542
  enable system for SAP ILM, 535
  extract data, 543
  non-SAP systems, 539
  preliminary steps, 534
  report on legacy data, 537
  reporting, 555
  set up audit areas and rules, 548
  transfer and convert files, 551
  transfer archive administration data, 552
  transfer data, 537
System Decommissioning Cockpit
  Administrator, 538
  Line of Business, 538
System landscape harmonization, 516
System of record, 216

T

Task, 301
  move to production, 304
  start with web service, 305
  template, 302
Tax
  audit, 166
  reporting, 173
Technical requirement, 425
Tenant, 292
  set up, 299
Term
  hierarchies, 412
  related, 412
Text
  analytics, 388
  data, 32, 394, 395
  mining, 36
Text data processing, 121, 130, 131, 132, 181, 307, 389, 390, 393, 394, 399, 400
  dictionary, 393
  entity, 392
  entity types, 399
  extraction, 392
  rule, 393
  transform configuration, 396
  use cases, 388
Time reference, 550


TOAx
  tables, 524
Transaction
  ILM, 545
  ILM_DESTRUCTION, 559
  ILM_TRANS_ADMIN_ONLY, 552
  IRM_CUST, 550
  IRMPOL, 526, 548
  SARA, 543
  TAANA, 535
Transactional application, 218
Transactional data, 213, 543
Transform, 318, 397, 399
  address cleanse, 363, 365, 367
  case, 331
  data cleanse, 233, 353, 360, 367, 387
  entity extraction, 389, 393, 394, 396
  geocoder, 367, 368
  global address cleanse, 354
  history preserving, 334
  key generation, 335
  Map_Operation, 332
  match, 372, 373, 380, 385
  merge, 331
  query, 331
  Row_Generation, 332
  SQL, 331
  table comparison, 334
  transform configuration, 397
  user defined, 332
  validation, 332, 437
Transformation
  complex, 330

U

UI
  building blocks (UIBBs), 489
  configuration, 494
Unified business language, 115
Uniqueness profiling, 423
Universal data cleanse, 241
Universe, 271
UNSPSC, 144
Unstructured content, 212, 213, 571
Unstructured data, 105, 109, 168, 212, 401
  lifecycle, 221
  retention management, 531
  text, 389
  turn into structured data, 110
Unstructured information, 33, 563
User interface, 571

V

Validation, 471, 478
  rule, 509, 510
  transform, 364
Validation rule, 139, 142, 405, 406, 420, 425, 427, 432, 433, 441, 443, 446, 448, 450, 452, 454
  add, 445
  associate with data source, 447
  create in rule editor, 429
  test, 430
Value mapping, 481, 482

W

Web content management, 155
WebDAV
  ILM-enhanced interface, 523
Weight scoring, 374
Weighting, 446, 448
What-if analysis, 460
Work center
  archiving, 520
  reporting, 520
Work order, 163
Workflow, 316, 469
  distribute data maintenance, 471
  Hadoop, 104
  rules-based, 496
Workspaces, 161, 162, 163, 254, 256, 257, 315, 581
  binder workspace, 577
  business workspace, 577
  case workspace, 577
Write program
  log, 544

X

XML
  data archiving service, 528
  export/import master data, 483
  schema, 338, 342
XML DAS, 528
XML Schema Definition (XSD), 280

Y

YARN, 99

Z

ZooKeeper, 104


Corrie Brague is the director of Data Quality Product Management for SAP, where she defines software solutions that help businesses assess, improve, and monitor their data quality.

David Dichmann is director of product management for SAP’s enterprise architecture and modeling tool, PowerDesigner.

George Keller has more than 20 years of experience in the field of information management, having worked within engineering, business applications, and product management organizations. He has also served as a professional delivery project manager for a number of Fortune 100 clients.

Markus Kuppe is vice president and chief solution architect for SAP Master Data Governance. He has led various programs across the SAP Business Suite on topics such as analytics, user experience, and architecture. He is a frequent author and speaker at business events.

Phillip On is an Enterprise Information Management industry veteran with more than 13 years of experience in the field, working for SAP, Business Objects, and Oracle.

Brague, Dichmann, Keller, Kuppe, On
Enterprise Information Management with SAP
605 Pages, 2014, $69.95/€69.95
ISBN 978-1-4932-1045-9
www.sap-press.com/3666

We hope you have enjoyed this reading sample. You may recommend or pass it on to others, but only in its entirety, including all pages. This reading sample and all its parts are protected by copyright law. All usage and exploitation rights are reserved by the author and the publisher.