SQL Server to SQL Server PDW Migration Guide (AU3)
Total Page:16
File Type:pdf, Size:1020Kb
SQL Server to SQL Server PDW Migration Guide (AU3) Contents 4 Summary Statement 4 Introduction 4 SQL Server Family of Products 6 Differences between SMP and MPP 8 PDW Software Architecture 10 PDW Community 10 Migration Planning 11 Determine Candidates for Migration 11 Migration Considerations 12 Migration Approaches 13 SQL Server to PDW Migration 13 Main Migration Steps 13 PDW Migration Advisor 14 APS Migration Utility 14 Table Geometry 20 Optimization within PDW Microsoft SQL Server to SQL Server PDW Migration Guide (AU3) 29 Converting Database Schema Objects 29 Type Mapping 36 Syntax Differences 51 Conclusion 51 For More Information 51 Feedback 52 Appendix Microsoft SQL Server to SQL Server PDW Migration Guide (AU3) © 2014 Microsoft Corporation. All rights reserved. This document is provided “as-is.” Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes. You may modify this document for Microsoft SQL Server to SQL Server PDW Migration Guide (AU3) your internal, reference purposes. In this migration guide you will learn the differences between the SQL Server and Parallel Data Warehouse database platforms, which Summary runs on the Analytics Platform System appliance, and the steps necessary to convert a SQL Server database to Parallel Data Statement Warehouse. This guide has been updated to reflect the T-SQL improvements contained within Version 2, Appliance Update 3 (AU3) Migrating a data warehouse application from Microsoft® SQL Server® SMP to the Microsoft® SQL Server® Parallel Data Warehouse (PDW) Introduction region within the Microsoft Analytics Platform System (APS) appliance, provides many benefits, including improved query performance, scalability and the simplest integration to Hadoop. This white paper explores the remediation activities and provides guidance to overcome technical differences between the two database platforms in order to simplify migration projects. It describes the implementation differences of database objects, SQL dialects, and procedural code between the two platforms. Microsoft offers a wide range of SQL Server products to accommodate SQL Server the different business needs and technical requirements to provide a Family of robust and scalable enterprise-wide solution. Compact Edition Products Microsoft SQL Server Compact 4.0 is a compact database ideal for embedding in desktop and web applications. SQL Server Compact 4.0 gives developers a common programming model with other SQL Server editions for developing both native and managed applications. SQL Server Compact provides relational database functionality in a small footprint. Express Edition SQL Server 2014 Express Edition is available for free from Microsoft and provides a powerful database engine ideal for embedded applications or for redistribution with other solutions. Independent software vendors use it to build desktop and data-driven applications. If you need more advanced database features and support for greater than 10 GB databases, SQL Server Express is fully compatible with other editions of SQL Server can be seamlessly upgraded to enterprise versions of SQL Microsoft SQL Server to SQL Server PDW Migration Guide (AU3) 4 Server without requiring remediation to overcome changes in functionality. Standard Edition SQL Server 2014 Standard edition is a robust data management and business intelligence database for departments and small workgroups to support a wide variety of applications. SQL Server Standard Edition also supports common development tools for on premise and cloud. Enabling effective database management with minimal IT resources. SQL Server Standard Edition is compatible with other editions. Web Edition SQL Server 2014 Web edition is a low total-cost-of-ownership option for Web hosters and Web VAPs to provide scalability, affordability, and manageability capabilities for small to large scale Web properties. Business Intelligence Edition SQL Server 2014 Business Intelligence edition delivers a comprehensive platform empowering organizations to build and deploy secure, scalable and manageable BI solutions. It offers exciting functionality such as browser based data exploration and visualization; powerful data mash- up capabilities, and enhanced integration management. Enterprise Edition SQL Server 2014 Enterprise edition delivers comprehensive high-end datacenter capabilities with blazing-fast performance, unlimited virtualization, and end-to-end business intelligence. Enabling high service levels for mission-critical workloads and end user access to data insights. Parallel Data Warehouse While the other versions of SQL Server mentioned here have a symmetric multi-processing (SMP) architecture, SQL Server Parallel Data Warehouse has a massively parallel processing (MPP) architecture, in the form of a data warehousing appliance, designed and built for to manage high volumes of relational data (with up to 100x performance gains) and providing the simplest integration to Hadoop. This addition of SQL Server only runs on the Microsoft Analytics Platform System appliance; available from HP, Dell and Quanta. Microsoft SQL Server to SQL Server PDW Migration Guide (AU3) 5 As data volumes grow and the number of users increase, we find the Differences traditional architecture of running your data warehouse on a single SMP server insufficient to meet the business requirements. To overcome between these limitations and scale and perform beyond your existing deployment, Parallel Data Warehouse brings Massively Parallel SMP and Processing (MPP) to the world of SQL Server. Essentially parallelizing and distributing the processing across multiple SMP compute nodes. SQL MPP Server Parallel Data Warehouse is only available as part of Microsoft’s Analytics Platform System (APS) appliance. Before diving into the PDW architecture, let us first understand the differences between the architectures mentioned above. Symmetric Multi-Processing (SMP) Symmetric Multi-Processing (SMP) is the primary parallel architecture employed in servers. An SMP architecture is a tightly coupled multiprocessor system, where processors share a single copy of the operating system (OS) and resources that often include a common bus, memory and an I/O system. Think of this as a typical single server, multi-core with locally attached storage, running the Microsoft Windows OS. Massively Parallel Processing (MPP) Massively Parallel Processing (MPP) is the coordinated processing of a single task by multiple processers, each working on a different part of the task. With each processor using its own operating system (OS) and memory. MPP processors communicate between each other using some form of messaging interface via an “interconnect”. The setup for MPP is more complicated than SMP and one approach to simplify this while providing equal amounts of resources between processors is the “Shared-Nothing” architecture. This approach is the one Parallel Data Warehouse is based upon. Shared Nothing Architecture Microsoft SQL Server to SQL Server PDW Migration Guide (AU3) 6 The term shared nothing architecture was coined by Michael Stonebraker (1986) to describe a multiprocessor database management system in which neither memory nor disk storage is shared among the processors. For a database which follows the shared-nothing architecture, each processor has its own set of disks. Data is “horizontally partitioned” across nodes, such that each node has a subset of the rows from each table in the database. Each node is then responsible for processing only the rows on its own disks. Such architectures are especially well suited to data warehouse workloads, where large fact tables can be distributed across the nodes. In addition, every node maintains its own lock table and buffer pool, eliminating the need for complicated locking and software or hardware consistency mechanisms. Because shared nothing does not typically have a severe bus or resource contention it can be made to scale to hundreds or even thousands of machines. Microsoft SQL Server to SQL Server PDW Migration Guide (AU3) 7 PDW Software Architecture Admin Console The Admin Console is a web application that visualises the appliance state, health, and performance information. MPP Engine The MPP Engine is the brain of the SQL Server Parallel Data Warehouse (PDW) and delivers the Massively Parallel Processing (MPP) capabilities by doing the following: Generates the parallel query execution plan and coordinates the parallel query execution across the compute nodes. Stores and coordinates metadata and configuration data for all of the databases. Manages SQL Server PDW database authentication and authorization. Tracks hardware and software status. Microsoft SQL Server to SQL Server PDW Migration Guide (AU3) 8 Data Movement Service (DMS) Transfers data between the Computer nodes and the Control node. Processes query operations that require transferring data among the nodes. Improves query performance by optimizing data transfer speeds. Data is loaded in parallel directly from the loading server to the Compute nodes via the DMS. DMS transfers data from each Compute node directly to the backup server. Using PolyBase, DMS transfers data to and from an external Hadoop cluster, or the HDInsight Region on the appliance SQL Server