Talend Open Studio for Data Quality User Guide 7.1.1

Contents

Copyleft...... 3

What is Talend Studio?...... 4

Working with projects...... 5
    Creating a project...... 5
    Importing a demo project...... 6
    Importing projects...... 7
    Exporting a project...... 11

Profiling data...... 13
    Data profiling: concepts and principles...... 13
    Getting started with Talend Data Quality...... 14
    Profiling database content...... 29
    Column analyses...... 41
    Table analyses...... 91
    Redundancy analyses...... 157
    Correlation analyses...... 170
    Patterns and indicators...... 192
    Other management procedures...... 260
    Tasks...... 273

Managing metadata in the Studio...... 283
    Creating connections to data sources...... 283
    Managing connections to data sources...... 294

Appendix: Regular expressions on SQL Server...... 308
    Main concept...... 308
    Creating a regular expression function on SQL Server...... 308
    Testing the created function via the SQL Server editor...... 315

Copyleft

Adapted for 7.1.1. Supersedes previous releases. Publication date: October 15, 2019

The content of this document is correct at the time of publication. However, more recent updates may be available in the online version that can be found on Talend Help Center. This documentation is provided under the terms of the Creative Commons Public License (CCPL). For more information about what you can and cannot do with this documentation in accordance with the CCPL, please read: http://creativecommons.org/licenses/by-nc-sa/2.0/.

Notices

Talend is a trademark of Talend, Inc. All brands, product names, company names, trademarks and service marks are the properties of their respective owners.

License Agreement

The software described in this documentation is licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.html. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This product includes software developed at ASM, ANTLR, Apache ActiveMQ, Apache Ant, Apache Axiom, Apache Axis, Apache Axis 2, Apache Chemistry, Apache Common Http Client, Apache Common Http Core, Apache Commons, Apache Commons Bcel, Apache Commons Lang, Apache Datafu, Apache Derby Database Engine and Embedded JDBC Driver, Apache Geronimo, Apache HCatalog, Apache Hadoop, Apache Hbase, Apache Hive, Apache HttpClient, Apache HttpComponents Client, Apache JAMES, Apache Log4j, Apache Neethi, Apache POI, Apache Pig, Apache Thrift, Apache Tomcat, Apache Xml-RPC, Apache Zookeeper, CSV Tools, DataNucleus, Doug Lea, Ezmorph, Google's phone number handling library, Guava: Google Core Libraries for Java, H2 Embedded Database and JDBC Driver, HighScale Lib, HsqlDB, JSON, JUnit, Jackson Java JSON-processor, Java API for RESTful Services, Java Universal Network Graph, Jaxb, Jaxen, Jetty, Joda-Time, Json Simple, MapDB, MetaStuff, Paraccel JDBC Driver, PostgreSQL JDBC Driver, Protocol Buffers - Google's data interchange format, Resty: A simple HTTP REST client for Java, SLF4J: Simple Logging Facade for Java, SQLite JDBC Driver, The Castor Project, The Legion of the Bouncy Castle, Woden, Xalan-J, Xerces2, XmlBeans, XmlSchema Core, atinject. Licensed under their respective license.


What is Talend Studio?

Talend provides you with a range of open source and subscription Studios you can use to create your projects and manage data of any type or volume. Using the graphical User Interface and hundreds of pre-built components and connectors, you can design your Jobs with a drag-and-drop interface and native code generation.


Working with projects

Creating a project

A project is the highest physical structure for storing all different types of items. Once you launch your Talend Studio, and before you start a Business Model, a data integration Job, a Route, or any other task, you first need to create or import a project.

Creating a project at initial Studio launch

Procedure
1. Launch Talend Studio and connect to a local repository.
2. On the login window, select the Create a new project option and enter a project name in the field.

Note: Bear in mind:
• A project name is case insensitive.
• A project name must start with an English letter and can contain only letters, numbers, hyphens (-), and underscores (_).
• A hyphen (-) in a project name is treated as an underscore (_).

3. Click Finish to create the project and open it in the Studio.

Creating a new project after initial Studio launch

About this task To create a new local project after the initial startup of the Studio, do the following:

Procedure 1. On the login window, select the Create a new project option and enter a project name in the field.


2. Click Create to create the project. The newly created project is displayed on the list of existing projects.

3. Select the project on the list and click Finish to open the project in the Studio. Later, if you want to switch between projects, on the Studio menu bar, use the combination File > Switch Project or Workspace.

Importing a demo project

Talend provides you with different demo projects you can import into your Talend Studio. Available demos depend on the product you are using and may include ready-to-use Jobs which help you understand the functionalities of different Talend components. You can import the demo project either from the login window of your studio as a separate project, or from the Integration perspective into your current project.


Importing a demo project as a separate project

Procedure
1. Launch your Talend Studio and from the login window select Import a demo project and then click Select.
2. In the open dialog box, select the demo project you want to import and click Finish.

Note: The demo projects available in the dialog box may vary depending on the product you are using.

3. In the dialog box that opens, type in a name for the demo project you want to import and click Finish. A bar is displayed to show the progress of the operation.
4. On the login window, select from the project list the demo project you imported and then click Finish to open the demo project in the studio. All the samples of the demo project are imported into the studio under different folders in the repository tree, including the input files and metadata connection necessary to run the demo samples.

Importing a demo project into your current project

Procedure
1. Launch your studio and in the Integration perspective, click the icon on the toolbar.
2. In the open dialog box, select the demo project to import and click Finish. A bar is displayed to show the progress of the operation and then a confirmation message opens.
3. Click OK.

Importing projects

In Talend Studio, you can import one or more projects you already created with previous releases of the Studio.

About this task

Importing a single project

Procedure 1. From the Studio login window, select Import an existing project then click Select to open the Import wizard.


2. Enter a name for your new project in the Project Name field.

Warning: Make sure the project name you entered is unique, and bear in mind:
• A project name is case insensitive.
• A project name must start with an English letter and can contain only letters, numbers, hyphens (-), and underscores (_).
• A hyphen (-) in a project name is treated as an underscore (_).

3. Click Select root directory or Select archive file depending on the source you want to import from.
4. Click Browse... to select the workspace directory/archive file of the specific project folder. By default, the selected workspace is that of the current release. Browse up to reach the previous release's workspace directory or the archive file containing the projects to import.
5. Click Finish to validate the operation and return to the login window.

Results Upon successful project import, the names of the imported projects are displayed on the Project list of the login window.


You can now select the imported project you want to open in Talend Studio and click Finish to launch the Studio.

Note: A generation initialization window might come up when launching the application. Wait until the initialization is complete.

Importing multiple projects

Procedure
1. From the Studio login window, select Import an existing project then click Select to open the Import wizard.
2. Click Import several projects.
3. Click Select root directory or Select archive file depending on the source you want to import from.
4. Click Browse... to select the workspace directory/archive file of the specific project folder. By default, the selected workspace is that of the current release. Browse up to reach the previous release's workspace directory or the archive file containing the projects to import.


5. Select the Copy projects into workspace check box to make a copy of the imported project instead of moving it. This option is available only when you import several projects from a root directory.

Note: If you want to remove the original project folders from the Talend Studio workspace directory you import from, clear this check box. But we strongly recommend you to keep it selected for backup purposes.

6. Select the Hide projects that already exist in the workspace check box to hide existing projects from the Projects list. This option is available only when you import several projects.
7. From the Projects list, select the projects to import and click Finish to validate the operation.

Results Upon successful project import, the names of the imported projects are displayed on the Project list of the login window.


You can now select the imported project you want to open in Talend Studio and click Finish to launch the Studio.

Note: A generation initialization window might come up when launching the application. Wait until the initialization is complete.

Exporting a project

Talend Studio allows you to export projects created or imported in the current instance of Talend Studio.

Procedure 1. On the toolbar of the Studio main window, click to open the Export Talend projects in archive file dialog box.


2. Select the check boxes of the projects you want to export. If need be, you can select only parts of the projects through the Filter Types... link (for advanced users).
3. In the To archive file field, type in the name of, or browse to, the archive file where you want to export the selected projects.
4. In the Option area, select the compression format and the structure type you prefer.
5. Click Finish to validate the changes.

Results The archived file that holds the exported projects is created in the defined place.


Profiling data

Data profiling: concepts and principles

About Talend Data Quality

The following sections introduce Talend Data Quality and list its key features.

What is Talend Data Quality?

Talend DQ Portal is deprecated from Talend 7.1 onwards. You can connect the data quality data mart to your own reporting tool.

This data profiling tool allows you to identify potential problems before beginning data-intensive projects such as data integration. It centralizes several elements including:
• a data profiler;
• a data explorer;
• a pattern manager; for more information about the pattern manager, see Patterns and indicators on page 13;
• a metadata manager; for more information about the metadata manager, see Metadata repository on page 13.

Core features

This section describes the basic features of the Talend data profiling solution.

Metadata repository

Using Talend data quality, you can connect to data sources to analyze their structure (catalogs, schemas and tables), and store the description of their metadata in its metadata repository. You can then use this metadata to set up metrics and indicators. For more information, see Creating connections to data sources on page 283.

Patterns and indicators

Patterns are sets of strings against which you can define the content, structure and quality of highly complex data. The Profiling perspective of Talend Studio lists two types of patterns: regular expressions, which are predefined regular patterns, and SQL patterns, which are the patterns you add using LIKE clauses. For more information about patterns, see Patterns on page 192.

Indicators are the results achieved through the implementation of different patterns. They can represent the results of data matching and of various other data-related operations. The Profiling perspective of Talend Studio lists two types of indicators: system indicators, a list of predefined indicators, and user-defined indicators, a list of those defined by the user. For more information about indicators, see Indicators on page 222.
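To make the pattern distinction above concrete, the following is a hedged sketch (the customer table and customer_email column are hypothetical) showing the two pattern types applied to the same data. A regular expression can describe the structure of an email address precisely, while a SQL pattern can only express the coarser shapes that a LIKE clause allows:

    -- Regular expression pattern (precise structure of a simple email address):
    --   ^[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
    -- SQL pattern (LIKE clause), a much coarser check:
    SELECT customer_email
    FROM customer
    WHERE customer_email LIKE '%@%.%';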


Functional architecture

The functional architecture of your Talend Studio is an architectural model that identifies the Talend Studio functions, interactions and corresponding IT needs. The overall architecture has been described by isolating specific functionalities in functional blocks. The chart below illustrates the main architectural functional blocks explored within the studio.

The different types of functional blocks are:
• A Profiling perspective where you can use predefined or customized patterns and indicators to analyze data stored in different data sources.
• A Data Explorer perspective where you can browse and query the results of the profiling analyses done on data.

Getting started with Talend Data Quality

Working principles of data quality

From the Profiling perspective of Talend Studio, you can examine the data available in different data sources and collect statistics and information about this data. A typical sequence of profiling data using Talend Studio involves the following steps:
1. Connecting to a data source, including databases and delimited files, in order to be able to access the tables and columns on which you want to define and execute analyses. For more information, see Creating connections to data sources on page 283.
2. Defining any of the available data quality analyses, including database content analysis, column analysis, table analysis, redundancy analysis, correlation analysis, etc. These analyses will carry out data profiling processes that will define the content, structure and quality of highly complex data structures. The analysis results will be displayed graphically next to each of the analysis editors, or in more detail in the Analysis Results view.

Note: While you can use all analyses types to profile data in databases, you can only use Column Analysis and Column Set Analysis to profile data in delimited files.


Importing a data quality demo project

Talend provides you with different demo projects you can import into your Talend Studio. Available demos depend on the product you are using and may include ready-to-use Jobs which help you understand the functionalities of different Talend components. When you import the data quality demo project:
• input files and databases necessary to run the demo Jobs and analyses are imported under the Documentation folder in the Integration perspective of the studio.

• profiling analyses are imported in the DQ Repository tree view of the Profiling perspective. These analyses run on the databases and files you installed initially as pointed out by the data quality tutorials.
• data quality Jobs are imported in the Repository tree view of the Integration perspective. These Jobs use different data quality components to standardize, deduplicate and match data, for example.


You can run most of these Jobs without any prerequisites. However, for a few Jobs, you must restore the tbi, tutorials, cif and crm databases in your MySQL server and download some files locally. You can find the databases and files under the Documentation folder in the Repository tree view in the Integration perspective.

Note: As some of the demo Jobs are shared with the data quality tutorials, their names may be preceded by A, B, C, etc. or 1, 2, 3, etc. You must run these Jobs in the specified order.

You can import the demo project either from the login window of your studio as a separate project, or from the Integration perspective into your current project.

Importing a demo project as a separate project

Procedure
1. Launch your Talend Studio and from the login window select Import a demo project and then click Select.
2. In the open dialog box, select the demo project you want to import and click Finish.

Note: The demo projects available in the dialog box may vary depending on the product you are using.


3. In the dialog box that opens, type in a name for the demo project you want to import and click Finish. A bar is displayed to show the progress of the operation.
4. On the login window, select from the project list the demo project you imported and then click Finish to open the demo project in the studio. All the samples of the demo project are imported into the studio under different folders in the repository tree view, including the input files and metadata connection necessary to run the demo samples.

Importing a demo project into your current project

Procedure
1. Launch your studio and in the Integration perspective, click the icon on the toolbar.
2. In the open dialog box, select the demo project to import and click Finish. A bar is displayed to show the progress of the operation and then a confirmation message opens.
3. Click OK.

Important features and configuration options

This section details some important information about analysis editors, the Error log view and the help context embedded in Talend Studio.

Defining the maximum memory size threshold

From Talend Studio, you can control memory usage when using the Java engine to run two types of analyses: column analysis and the analysis of a set of columns. If you use column analysis or column set analysis to profile very big sets of data or data with many problems, you may run out of memory and end up with a Java heap error. By defining the maximum memory size threshold for these analyses, Talend Studio will stop the analysis execution when the memory limit size is reached and provide you with the analysis results that were measured on the data before the analysis execution was terminated by the memory limit size.

Before you begin

Procedure
1. On the menu bar, select Window > Preferences to display the Preferences window.
2. Perform one of the following steps:
• Expand Talend > Profiling and select Analysis tuning, or
• start typing analysis tuning in the dynamic filter field.
The Analysis tuning view is displayed.


3. In the Memory area, select the Enable analysis thread memory control check box.
4. Move the slider to the right to define the memory limit at which the analysis execution will be stopped.

Results The execution of any column analysis or column set analysis will be stopped if it exceeds the allocated memory size. The analysis results given in Talend Studio will cover the data analyzed before the interruption of the analysis execution.

Setting preferences of analysis editors and analysis results

You can decide once and for all which sections to fold by default when you open any of the connection or analysis editors. You can also set up the display of all analysis results and choose whether to show or hide the graphical results in the different analysis editors.

Procedure 1. On the menu bar, select Window > Preferences to display the Preferences window. 2. Expand Talend > Profiling and select Editor.


3. In the Folding area, select the check box(es) corresponding to the display mode you want to set for the different sections in all the editors.
4. In the Analysis results folding area, select the check boxes corresponding to the display mode you want to set for the statistic results in the Analysis Results view of the analysis editor.
5. In the Graphics area, select the Hide graphics in analysis results page option if you do not want to show the graphical results of the executed analyses in the analysis editor. This optimizes system performance when there are many graphics to generate.
6. In the Analyzed Items Per Page field, set the number of analyzed items you want to group on each page.
7. In the Business Rules Per Page field, set the number of business rules you want to group on each page.

Note: You can always click the Restore Defaults tab on the Preferences window to bring back the default values.

8. Click Apply and then OK to validate the changes and close the Preferences window.

Results While carrying on different analyses, all corresponding editors will open with the display mode you set in the Preferences window.


Setting the default frequency table parameters

In the Profiling perspective, when viewing the results of an analysis, 10 results are shown in the frequency tables by default. From the Preferences window of Talend Studio, you can edit the default value for frequency and low frequency tables. You cannot update frequency table parameters for:
• locked analyses,
• open analyses, and
• analyses with frequency indicators that use the current default value.

Procedure
1. On the menu bar, select Window > Preferences to display the Preferences window.
2. Expand Talend > Profiling and select Indicator settings.
3. In the Number of result shown fields, set the default value for Frequency table and Low frequency table.
4. Click Apply to analyses to apply the parameters to existing analyses.
5. In the Set the Frequency Table Parameters dialog box, select the analyses for which to apply the new frequency table parameters, and click OK.
6. Click OK to save your changes.

Displaying and hiding the help content in Talend Studio

Talend Studio provides you with cheat sheets that you can use as a quick reference to guide you through all common tasks in data profiling. You can also access a help panel that is attached to all wizards used in Talend Studio to create the different types of analyses or to set thresholds on indicators.

Displaying the cheat sheets

When you open Talend Studio for the first time, the Cheat Sheets view opens by default in the Profiling perspective. If you close the Cheat Sheets view in the Profiling perspective of Talend Studio, it remains closed whenever you switch back to this perspective until you open it manually.

Procedure To display the cheat sheets, either: 1. Press the Alt+Shift+Q and then H shortcut keys, or select Window > Show View from the menu bar. The Show View dialog box opens.


2. Expand the Help folder and then select Cheat Sheets.
3. Click OK to close the dialog box.
Or:
4. Select Help > Cheat Sheets from the menu bar. The Cheat Sheet Selection dialog box opens. You can also press the Alt+H shortcut keys to open the Help menu and then select Cheat Sheets.
5. Expand the Talend > Cheat Sheets folder, select the cheat sheet you want to open in Talend Studio and then click OK. The selected cheat sheet opens in the Talend Studio main window. Use the local toolbar icons to manage the display of the cheat sheets.


Hiding the help panel A help panel is attached to the wizards used in Talend Studio to create and manage profiling items. This help panel opens by default in all wizards.


Procedure 1. Select Window > Preferences > Talend > Profiling > Web Browser. The Web Browser view opens.

2. Select the Block browser help check box and then click OK. From now on, all wizards in Talend Studio display without the help panel.


Displaying the Module view

Talend Studio provides you with a Module view. This view shows whether the modules required for creating a connection to a database are installed. Checking the Module view helps you verify which modules you have, or should have, to run your profiling analyses smoothly.

Procedure 1. Select Window > Show View from the menu bar. The Show View dialog box opens.

2. Start typing Module in the filter field. 3. Select Modules from the list and then click OK. The Module view opens in Talend Studio.


4. From the toolbar of the Module view, use the dedicated icons to:
• browse your local system to the module you want to install, or
• open a list of all required external modules that are not integrated in Talend Studio.

For further information, see the Talend Installation Guide.

Displaying the error log view and managing log files

Talend Studio provides you with very comprehensive log files that maintain diagnostic information and record any errors that are encountered in the data profiling process. The Error Log view is the first place to look when a problem occurs while profiling data, since it will often contain details of what went wrong and how to fix it.

Procedure
1. Perform one of the following steps:
• press the Alt+Shift+Q and then L shortcut keys, or
• select Window > Show View from the menu bar.
The Show View dialog box opens.
2. Expand the General folder and select Error Log.
3. Click OK to close the dialog box. The Error Log view opens in Talend Studio.


Note: The filter field at the top of the view enables you to do dynamic filtering: as you type your text in the field, the list shows only the logs that match the filter.

You can use the icons on the view toolbar to carry out different management options, including exporting and importing the error log files. Each error log in the list is preceded by an icon that indicates the severity of the log: error, warning or information.
4. Double-click any of the error log files to open the Event Detail dialog box.


5. If required, click the icon in the Event Detail dialog box to copy the event detail to the clipboard and then paste it anywhere you like.

Opening new editors

It is possible to open new analysis or SQL editors in the Profiling and Data Explorer perspectives respectively.

Before you begin To be able to use Data Explorer in Talend Studio, you must install certain SQL explorer libraries that are required for data quality. If you do not install these libraries, the Data Explorer perspective will be missing from Talend Studio and many features will not be available. For further information about identifying and installing external modules, see the Talend Installation and Upgrade Guide .

Procedure
To open an empty new analysis editor, do the following:
1. In the DQ Repository tree view, expand the Data Profiling folder.
2. Right-click the Analysis folder and select New Analysis.
To open an empty new SQL editor from the Data Explorer perspective, do the following:
3. In the Connections view of the Data Explorer perspective, right-click any connection in the list. A contextual menu is displayed.


4. Select New SQL Editor. A new SQL empty editor opens in the Data Explorer perspective.

What to do next To open an empty SQL editor from the Profiling perspective of Talend Studio, see the procedure outlined in Creating and storing SQL queries on page 260.

Icons appended on analyses names in the DQ Repository

When you create any analysis type from the Talend Studio, a corresponding analysis item is listed under the Analyses folder in the DQ Repository tree view.

Note: The number of the analyses created in the studio will be indicated next to this Analyses folder in the DQ Repository tree view.

This analysis list gives you an idea about problems in one or more of your analyses before you even open them. If an analysis fails to run, a small red-cross icon is appended to it. If an analysis runs correctly but has violated thresholds, a warning icon is appended to it.


Profiling database content

Analyzing databases

You can analyze the content of a database to have an overview of the number of tables in the database, rows per table and indexes and primary keys. You can also analyze one specific catalog or schema in a database, if a catalog or schema is used in the physical structure of the database.

Creating a database content analysis

From the Profiling perspective of Talend Studio, you can create an analysis to examine the content of a given database.

Prerequisite(s): At least one database connection is set in the Profiling perspective of the studio. For further information, see Connecting to a database on page 283.

To create a database content analysis, you must first define the relevant analysis and then select the database connection you want to analyze. From the Statistical information view, you can:
• Click a catalog or a schema to list all tables included in it along with a summary of their content: number of rows, keys and user-defined indexes. The selected catalog or schema is highlighted in blue. Catalogs or schemas highlighted in red indicate potential problems in the data.
• Right-click a catalog or a schema and select Overview analysis to analyze the content of the selected item.
• Right-click a table or a view and select Table analysis to create a table analysis on the selected item. You can also view the keys and indexes of a selected table. For further information, see Displaying keys and indexes of database tables on page 36.
• Click any column header in the analytical table to sort the data listed in catalogs or schemas alphabetically.

Defining the analysis

Procedure
1. In the DQ Repository tree view, expand Data Profiling.
2. Right-click the Analyses folder and select New Analysis. The Create New Analysis wizard opens.


3. In the filter field, start typing connection overview analysis, select Connection Overview Analysis from the list that is displayed and click Next.

You can create a database content analysis in a shortcut procedure if you right-click the database under Metadata > DB connections and select Overview analysis from the contextual menu. 4. In the Name field, enter a name for the current analysis.


Note: Avoid using special characters in the item names, including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

5. Set the analysis metadata (purpose, description and author name) in the corresponding fields and click Next.

Selecting the database connection you want to analyze

Procedure 1. Expand DB Connections and select a database connection to analyze, if more than one exists. 2. Click Next.

3. Set filters on the tables and/or views you want to analyze in their corresponding fields, according to your needs, using the SQL language (see the sketch after this procedure). By default, the analysis examines all tables and views in the database.


4. Click Finish to close the Create New Analysis wizard. A folder for the newly created analysis is listed under the Analyses folder in the DQ Repository tree view, and the connection editor opens with the defined metadata.

Note: The display of the connection editor depends on the parameters you set in the Preferences window. For more information, see Setting preferences of analysis editors and analysis results on page 18.

5. Click Analysis Parameters and do the following:
a) In the Number of connections per analysis field, set the number of concurrent connections allowed per analysis to the selected database connection. You can set this number according to the available database resources, that is, the number of concurrent connections each database can support.
b) Check or modify filters on tables and/or views, if any.
c) Select the Reload databases check box if you want to reload all databases in your connection on the server when you run the overview analysis. When you try to reload a database, a message will prompt you for confirmation, as any change in the database structure may affect existing analyses.
6. In the Context Group Settings view, select from the list the context environment you want to use to run the analysis. The table in this view lists all context environments and their values that you define in the Contexts view in the analysis editor. For further information, see Using context variables in analyses on page 263.
7. Press F6 to execute the analysis.


A message opens at the bottom of the editor to confirm that the operation is in progress and analysis results are opened in the Analysis Results view.
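As a hedged sketch of step 3 above, and assuming the table and view filter fields accept SQL LIKE patterns, a table filter such as CUSTOMER% restricts the overview analysis to tables whose names start with CUSTOMER. Its effect is roughly equivalent to the following query against the database catalog (information_schema is shown for illustration only; the actual catalog tables differ per database):

    -- Approximate effect of a table filter like CUSTOMER% (hypothetical example):
    SELECT table_name
    FROM information_schema.tables
    WHERE table_name LIKE 'CUSTOMER%';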

Creating a catalog or schema analysis

You can use the Profiling perspective of Talend Studio to analyze one specific catalog or schema in a database, if this entity is used in the physical structure of the database. The result of the analysis gives analytical information about the content of this schema, for example number of rows, number of tables, number of rows per table and so on.

Before you begin At least one database connection has been created to connect to a database that uses the "catalog" or "schema" entity. For further information, see Connecting to a database on page 283.

Procedure
1. Under DB connections in the DQ Repository tree view, right-click the catalog or schema for which you want to create a content analysis and select Overview analysis from the contextual menu. This example shows how to create a schema analysis.
2. In the wizard that opens, enter a name for the current analysis.

Note: Avoid using special characters in the item names, including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.


3. If required, set the analysis metadata (purpose, description and author name) in the corresponding fields and click Next. 4. Set filters on the tables and/or views you want to analyze in their corresponding fields according to your needs using the SQL language. By default, the analysis examines all tables and views in the catalog.

5. Click Finish. A folder for the newly created analysis is listed under Analysis in the DQ Repository tree view, and the analysis editor opens with the defined metadata.
6. Press F6 to execute the analysis. A message opens at the bottom of the editor to confirm that the operation is in progress, and analysis results are opened in the Analysis Results view.

From the Statistical information view, you can:
• Click the schema to list all tables included in it along with a summary of their content: number of rows, keys and user-defined indexes. The selected schema is highlighted in blue. Schemas highlighted in red indicate potential problems in the data.
• Right-click a schema and select Overview analysis to analyze the content of the selected item.


• Right-click a table or a view and select Table analysis to create a table analysis on the selected item. You can also view the keys and indexes of a selected table. For further information, see Displaying keys and indexes of database tables on page 36.
• Click any column header in the analytical table to sort the listed data alphabetically.

Previewing data in the SQL editor

After you create a connection to a database, you can open a view in Talend Studio to see actual data in the database.

Before you begin A database connection is created in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283.

Procedure
1. In the DQ Repository tree view, expand Metadata > DB Connections.
2. Browse to a table in a given database connection, right-click it and select Preview. The SQL editor opens in Talend Studio listing the data in the selected table.

3. Optional: If required, do any modifications in the query and save it.

Results The query is listed under the Libraries > Source Files folder in the DQ Repository tree view.
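Because the preview is an ordinary SQL query, the saved source file contains a plain SELECT statement. As a hedged sketch, assuming a hypothetical customer table is previewed, the query could look similar to the following (the row-limit syntax varies by database):

    -- Hypothetical saved preview query:
    SELECT *
    FROM customer
    LIMIT 50;  -- some databases use TOP or ROWNUM instead of LIMIT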


Displaying keys and indexes of database tables

After analyzing the content of a database, you can display the details of the key and user-defined index of a given table. This information could be very interesting for the database administrator.

Before you begin At least one database content analysis has been created and executed in the Profiling perspective of Talend Studio. For further information, see Creating a database content analysis on page 29.

Procedure 1. In the Analysis Results view of the analysis editor, click a catalog or a schema under Statistical Information. All the tables included in the selected catalog or schema are listed along with a summary of their content: number of rows, keys and user-defined indexes.

2. In the table list, right-click the table and select View keys. You cannot display the key details of tables in a Hive connection.


The Database Structure and the Database Detail views display the structure of the analyzed database and information about the primary key of the selected table.

3. Optional: If one or both views do not show, select Window > Show View > Database Structure or Window > Show View > Database Detail.
4. In the table list, right-click the table and select View indexes. You cannot display the index details of tables in a Hive connection. The Database Structure and the Database Detail views display the structure of the analyzed database and information about the user-defined index of the selected table.

5. If required, click any of the tabs in the Database Detail view to display the relevant metadata about the selected table.


Synchronizing metadata connections and database structures

When the data in a source database is changed or updated, the relevant connection structure in Talend Studio must follow that change or update as well. Otherwise, errors may occur when trying to analyze a column that has been modified or deleted in the database. You can synchronize the connection structure displayed in the DQ Repository tree view with the database structures to eliminate any incoherences. You can perform synchronization at the following three different levels:
• DB connection: to refresh the catalog and schema lists,
• Tables: to refresh the list of tables,
• Column: to refresh the list of columns.

Synchronizing and reloading catalog and schema lists

You can compare and match the catalog and schema lists in the DQ Repository tree view with those in the database.

Before you begin A database connection is created in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283.

Procedure 1. In the DQ Repository tree view, expand Metadata > DB Connections. 2. Right-click the database connection you want to synchronize with the database and select Reload database list.


A message will prompt you for confirmation, as any change in the database structure may affect the analyses created on these catalogs or schemas from Talend Studio.
3. Click OK to close the confirmation message, or Cancel to stop the operation.

Results The selected database connection is updated with the new catalogs and schemas, if any.

Synchronizing and reloading table lists

You can compare and match the table lists in the DQ Repository tree view with those in the database.

Before you begin A database connection is created in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283.

Procedure
1. In the DQ Repository tree view, expand Metadata > DB Connections.
2. Browse through the entities in your database connection to reach the Tables folder you want to synchronize with the database.
3. Right-click the Tables folder and select Reload table list.


A message will prompt you for confirmation, as any change in the database structure may affect the analyses created on these tables from Talend Studio.
4. Click OK to close the confirmation message, or Cancel to stop the operation. The selected table list is updated with the new tables in the database, if any.

Synchronizing and reloading column lists

You can compare and match the column lists in the DQ Repository tree view with those in the database.

Before you begin A database connection is created in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283.

Procedure
1. In the DQ Repository tree view, expand Metadata > DB Connections.
2. Browse through the entities in your database connection to reach the Columns folder you want to synchronize with the database.
3. Right-click the Columns folder and select Reload column list.


A message will prompt you for confirmation, as any change in the database structure may affect the analyses created on these columns from Talend Studio.
4. Click OK to close the confirmation message, or Cancel to stop the operation.

Results The selected column list is updated with the new columns in the database, if any.

Column analyses

Where to start?

Talend Studio enables you to examine and collect statistics and information about the data available in database columns and in delimited files. From the Profiling perspective, you can:
• design a column analysis from scratch and define the analysis settings manually,
• create column analyses automatically preconfigured with the indicators appropriate to the type you select.

Procedure
1. To create a column analysis:
a) In the DQ Repository tree view, expand Data Profiling.
b) Right-click the Analysis folder and select New Analysis.
c) From the Column Analysis folder, select:

Option To...

Basic Column Analysis generate an empty column analysis where you can select the columns to analyze and manually assign the indicators on each column. For further information, see Creating a basic analysis on a database column on page 43.


Discrete Data Analysis create a column analysis on numerical data preconfigured with the Bin Frequency and simple statistics indicators. You can then configure further the analysis or modify it in order to convert continuous data into discrete bins (ranges) according to your needs. For further information, see Analyzing discrete data on page 85.

Nominal Values Analysis create a column analysis on nominal data preconfigured with indicators appropriate for nominal data, namely Value Frequency, Simple Statistics and Text Statistics indicators. For example results about these statistics, see Finalizing and executing the column analysis on page 52.

Pattern Frequency Analysis create a column analysis preconfigured with the Pattern Frequency, Pattern Low Frequency and the row and null count indicators. This analysis can discover patterns in your data. It shows frequent patterns and rare patterns so that you can identify quality issues more easily. For example results about these statistics, see Finalizing and executing the column analysis on page 52. To learn more about the databases and engines supported with Pattern Frequency Statistics indicators, see the documentation on Talend Help Center (https://help.talend.com).

Summary Statistics Analysis create a column analysis on numerical data preconfigured with the Summary Statistics and the row and null count indicators. This helps you to get a good idea of the shape of your numeric data by computing the range, the inter quartile range and the mean and median values. For an example of the use of Summary Statistics, see Setting system or user-defined indicators on page 48 and Finalizing and executing the column analysis on page 52.

2. Usually, the sequence of profiling data in one or multiple columns involves the following steps:
a) Connecting to the data source. For further information, see Creating connections to data sources on page 283.
b) Defining one or more columns on which to carry out data profiling processes that will define the content, structure and quality of the data included in the column(s).
c) Setting predefined system indicators, or indicators defined by the user, on the column(s) that need to be analyzed or monitored. These indicators will represent the results achieved through the implementation of different patterns.
d) Adding to the column analyses the patterns against which you can define the content, structure and quality of the data. For further information, see Adding a regular expression or an SQL pattern to a column analysis on page 63.

What to do next The Creating a basic analysis on a database column on page 43 section explains the procedures to analyze the content of one or multiple columns in a database. The Creating a basic column analysis on a file on page 74 section explains the procedures to analyze columns in delimited files.


Creating a basic analysis on a database column

About this task You can build your analysis from scratch, analyze the content of one or multiple columns and execute the created analyses using the Java or the SQL engine. This type of analysis provides statistics about the values within each column. When you use the Java engine to run a column analysis, you can view the analyzed data according to parameters you set yourself. For more information, see Using the Java or the SQL engine on page 56.

Note: When you use the Java engine to run a column analysis on big sets or on data with many problems, it is advisable to define a maximum memory size threshold to execute the analysis as you may end up with a Java heap error. For more information, see Defining the maximum memory size threshold on page 17.

You can also analyze a set of columns. This type of analysis provides statistics on the values across all the data set (full records). For more information, see Analyzing tables in databases on page 92. The sequence of creating a basic column analysis involves the following steps:

Procedure
1. Defining the column(s) to be analyzed. For more information, see Defining the columns to be analyzed on page 43.
2. Setting predefined system indicators or indicators defined by the user for the column(s). For more information, see Setting indicators on columns on page 48. For more information on indicator types and indicator management, see Indicators on page 222.
3. Adding the patterns against which to define the content, structure and quality of the data. For more information, see Using regular expressions and SQL patterns in a column analysis on page 63. For more information on pattern types and management, see Patterns on page 192.

Defining the columns to be analyzed and setting indicators

Defining the columns to be analyzed

The first step in analyzing the content of one or multiple columns is to define the column(s) to be analyzed. The analysis results provide statistics about the values within each column.

Prerequisite(s): At least one database connection is set in the Profiling perspective of the studio. For further information, see Connecting to a database on page 283.

When you select to analyze Date columns and run the analysis with the Java engine, the date information is stored in the studio as regular date/time of format YYYY-MM-DD HH:mm:ss.SSS for date/timestamp and of format HH:mm:ss.SSS for time. The date and time formats are slightly different when you run the analysis with the SQL engine.

Defining the analysis

Procedure 1. In the DQ Repository tree view, expand the Data Profiling folder.


2. Right-click the Analysis folder and select New Analysis.

The Create New Analysis wizard opens.

3. In the filter field, start typing basic column analysis, select Basic Column Analysis and click Next.


4. In the Name field, enter a name for the current column analysis.

Note: Avoid using special characters in the item names, including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

5. Set column analysis metadata (purpose, description and author name) in the corresponding fields and click Next to proceed to the next step.


Selecting the database columns and setting sample data

Procedure 1. Expand DB connections and in the desired database, browse to the columns you want to analyze.

Note: When profiling a DB2 database, if double quotes exist in the column names of a table, the double quotation marks cannot be retrieved when retrieving the column. Therefore, it is recommended not to use double quotes in column names in a DB2 database table.

2. Select the columns and then click Finish to close the wizard. A file for the newly created column analysis is listed under the Analysis node in the DQ Repository tree view, and the analysis editor opens with the analysis metadata.


This example analyzes full names, email addresses and sales figures.
3. In the Data preview view, click Refresh Data. The data in the selected columns is displayed in the table.
4. In the Data preview view, select:

Option To...

New Connection open a wizard and create a connection to the data source from within the editor. For further information about how to create a connection to data sources, see Connecting to a database on page 283 and Connecting to a file on page 293. The Connection field on top of this section lists all the connections created in the Studio.

Select Columns open the Column Selection dialog box where you can select the columns to analyze or change the selection of the columns listed in the table. From the open dialog box, you can filter the table or column lists by using the Table filter or Column filter fields respectively.


Select Indicators open the Indicator Selection dialog box where you can select the indicators to use for profiling columns. For further information, see Setting indicators on columns on page 48.

n first rows or n random rows list in the table the N first data records from the selected columns, or N random records from the selected columns.

Refresh Data display the data in the selected columns according to the criteria you set.

Run with sample data run the analysis only on the sample data set in the Limit field.

5. In the Limit field, set the number of data records you want to display in the table and use as sample data.
6. In the Analyzed Columns view, use the arrows in the top right corner to open different pages in the view if you analyze a large number of columns. You can also drag the columns to be analyzed directly from the DQ Repository tree view to the Analyzed Columns list in the analysis editor. If one of the columns you want to analyze is a primary or a foreign key, its data mining type automatically becomes Nominal when you list it in the Analyzed Columns view. For more information on data mining types, see Data mining types on page 90.
7. If required, right-click any of the listed columns in the Analyzed Columns view and select Show in DQ Repository view to locate it in the database connection in the DQ Repository tree view.

Setting indicators on columns

The second step after defining the columns to be analyzed is to set either system or user-defined indicators for each of the defined columns.

Setting system or user-defined indicators

About this task Prerequisite(s): A column analysis is open in the analysis editor in the Profiling perspective of the studio. For more information, see Defining the columns to be analyzed on page 43.

Procedure 1. From the Data preview view in the analysis editor, click Select indicators to open the Indicator Selection dialog box.


2. From the Indicator Selection dialog box:

Note: It is not very useful to use Pattern Frequency Statistics on a column of a Date type in databases when executing the analysis with the SQL engine. No data quality issues are returned by this indicator as all dates will be displayed using one single format. To learn more about profiling Date columns in Oracle, see the documentation on Talend Help Center (https://help.talend.com). If you attach the Date Pattern Frequency to a date column in your analysis, you can generate a date regular expression from the analysis results. For more information, see Generating a regular expression from the Date Pattern Frequency indicator on page 208.


3. Click OK. The selected indicators are attached to the analyzed columns in the Analyzed Columns view. The analysis in this example provides/computes the following:
• simple statistics on all columns. For further information about these indicators, see Simple statistics on page 229,
• the characteristics of textual fields and the number of most frequent values for each distinct record in the fullname column. For further information, see Text statistics on page 230 and Advanced statistics on page 223 respectively,
• patterns in the email column to show frequent and rare patterns so that you can identify quality issues more easily. For further information about these indicators, see Pattern frequency statistics on page 225,
• the range, the interquartile range and the mean and median values of the numeric data in the total_sales column. For further information about these indicators, see Summary statistics on page 230,
• the frequency of the digits 1 through 9 in the sales figures to detect fraud. For further information, see Fraud Detection on page 223.

Setting options for system or user-defined indicators

About this task You can define expected thresholds on the indicator's value. The threshold you define is used for measuring the quality of data. If the indicator's value is outside the defined threshold, then the data is of bad quality. You can define only one threshold or no threshold at all. You may set these thresholds either by value or by percentage with respect to the row count. Prerequisite(s): A column analysis is open in the analysis editor in the Profiling perspective of the studio. For more information, see Defining the columns to be analyzed on page 43. For more information about setting indicators, see Setting system or user-defined indicators on page 48.

Procedure

1. In the Analyzed Columns view in the analysis editor, click the option icon next to the indicator. 2. In the dialog box that opens, set the parameters for the given indicator. For example, if you want to flag if there are null values in the column you analyze, you can set 0 in the Upper threshold field for the Null Count indicator.


Indicators settings dialog boxes differ according to the parameters specific to each indicator. For more information about different indicator parameters, see Indicator parameters on page 258.
3. Click Finish to close the dialog box.
4. Save the analysis.

Setting user-defined indicators from the analysis editor

About this task
Prerequisite(s):
• A column analysis is open in the analysis editor in the Profiling perspective of the studio. For more information, see Defining the columns to be analyzed on page 43.
• A user-defined indicator is created in the Profiling perspective of the studio. For more information, see Creating SQL user-defined indicators on page 235.
To set user-defined indicators from the analysis editor for the columns to be analyzed, do the following:

Procedure
1. Either:
a) In the analysis editor, from the Analyzed Columns view, click the icon next to the column name for which you want to define the indicator. The UDI selector dialog box opens.


b) Select the user-defined indicators and then click OK.
2. Or:
a) In the DQ Repository tree view, expand Libraries > Indicators.
b) From the User Defined Indicator folder, drop the user-defined indicator(s) against which you want to analyze the column content onto the column name in the Analyzed Columns view. The user-defined indicator is listed under the column name.
c) If required, set a threshold for the user-defined indicator. For further information, see Setting options for system or user-defined indicators on page 50.
d) Save the analysis.

Finalizing and executing the column analysis

After defining the column(s) to be analyzed and setting indicators, as described in Defining the columns to be analyzed and setting indicators on page 43, you may want to filter the data that you want to analyze and decide which engine to use to execute the column analysis.
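The data filter is expressed in SQL. As a hedged sketch, assuming the Data Filter view accepts the body of a WHERE clause and using the fullname, email and total_sales columns analyzed in this chapter's example (the customer table name is hypothetical), a filter and its equivalent query could look like the following:

    -- Filter condition entered in the Data Filter view (hypothetical example):
    --   email IS NOT NULL AND total_sales > 0
    -- Equivalent query run against the source data:
    SELECT fullname, email, total_sales
    FROM customer
    WHERE email IS NOT NULL
      AND total_sales > 0;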

Before you begin • The column analysis is open in the analysis editor in the Profiling perspective of the studio. For more information, see Defining the columns to be analyzed on page 43. • You have set system or predefined indicators for the column analysis. For more information, see Setting indicators on columns on page 48. • You have installed in the studio the SQL explorer libraries that are required for data quality.
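The data filter entered in step 1 of the procedure below is the body of an SQL WHERE clause. As a hypothetical illustration, with the total_sales column used in this chapter, the following condition would restrict the analysis to customers with recorded sales (the exact syntax depends on your database):

    total_sales > 0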

Procedure 1. In the Data Filter view, enter an SQL WHERE clause to filter the data on which to run the analysis, if required. 2. In the Analysis Parameters view, do the following:


a) In the Number of connections per analysis field, set the number of concurrent connections allowed per analysis to the selected database connection. You can set this number according to the database available resources, that is the number of concurrent connections each database can support.

Note: Connection concurrency is not supported when using a connection to a SQLite database or a Hive database on Spark. Connection concurrency is supported when using a connection to a Hive2 server.

b) From the Execution engine list, select the engine, Java or SQL, you want to use to execute the analysis. If you select the Java engine: • select the Allow drill down check box to be able to drill down, in the Analysis Results view, the results of all indicators except Row Count. • in the Max number of rows kept per indicator field, set the number of data rows you want to drill down. For further information about these engines, see Using the Java or the SQL engine on page 56. 3. If you have defined context variables in the Contexts view in the analysis editor, do the following: a) use the Data Filter and Analysis Parameters views to set/select context variables to filter data and to decide the number of concurrent connections per analysis respectively. b) In the Context Group Settings view, select from the list the context environment you want to use to run the analysis. For further information about contexts and variables, see Using context variables in analyses on page 263. 4. Save the analysis and press F6 to execute it. The editor switches to the Analysis Results view. When you use the SQL engine, the analysis runs multiple indicators in parallel and results are refreshed in the charts while the analysis is still in progress. Below are the graphics representing the Frequency and Text Statistics for the fullname column.


For further information about the Frequency and Text Statistics, see Advanced statistics on page 223 and Text statistics on page 230 respectively. Below are the graphics representing the Pattern Frequency and Pattern Low Frequency statistics for the email column.


The patterns in the table use a and A to represent the email values: lowercase letters are replaced with a and uppercase letters with A. For example, a value such as John.Doe@example.com would typically be represented by the pattern Aaaa.Aaa@aaaaaaa.aaa. Each pattern can have up to 30 characters. If the total number of characters exceeds 30, the pattern is represented as follows: aaaaaAAAAAaaaaaAAAAAaaaaaAAAAA..., and you can place your pointer on the pattern in the table to get the original value. For further information about these indicators, see Pattern frequency statistics on page 225. Below are the graphics representing the Summary Statistics for the total_sales column.

For further information about these indicators, see Summary statistics on page 230. And below are the graphics representing the order of magnitude and the Benford's law statistics for the total_sales column.


For further information about the Benford's law statistics, usually used as an indicator of accounting and expenses fraud in lists or tables, see Fraud Detection on page 223.
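As a reference for reading this chart, Benford's law predicts that the leading digit d occurs with an approximate frequency of log10(1 + 1/d), which gives the following expected distribution:

    digit:     1      2      3      4     5     6     7     8     9
    expected:  30.1%  17.6%  12.5%  9.7%  7.9%  6.7%  5.8%  5.1%  4.6%

Sales figures whose leading-digit frequencies deviate strongly from this distribution may warrant a closer look.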

Results If you execute this analysis using the Java engine with the Allow drill down check box selected in the Analysis parameters view, the analyzed data is stored locally and you can access it in the Analysis Results > Data view. You can use the Max number of rows kept per indicator field to decide the number of data rows you want to make accessible. When you select the Java engine, the system looks for Java regular expressions first; if none is found, it looks for SQL regular expressions.

Note: If you connect to a database that is not supported in the studio (using the ODBC or JDBC methods), it is recommended to use the Java engine to execute the column analyses created on the selected database. For more information on the Java engine, see Using the Java or the SQL engine on page 56.

If you execute this analysis using the SQL engine, you can view the executed query for each of the attached indicators if you right-click an indicator and then select the View executed query option from the list. However, when you use the Java engine, SQL queries will not be accessible and thus clicking this option will open a warning message.

For more information on the Java and the SQL engines, see Using the Java or the SQL engine on page 56.

Using the Java or the SQL engine

About this task After setting the analysis parameters in the analysis editor, you can use either the Java or the SQL engine to execute your analysis. The choice of the engine can sometimes slightly change analysis results, for example when you select the summary statistics indicators to profile a DB2 database. This is because indicators are computed differently depending on the database type, and also because Talend uses special functions when working with Java.

SQL engine: If you use the SQL engine to execute a column analysis: • an SQL query is generated for each indicator used in the column analysis, the analysis runs multiple indicators in parallel and results are refreshed in the charts while the analysis is still in progress, • data monitoring and processing is carried out on the DBMS, • only statistical results are retrieved locally. Using this engine guarantees better system performance. You can also access valid/invalid data in the data explorer; for more information, see Viewing and exporting analyzed data on page 60.

Java engine: If you use the Java engine to execute a column analysis: • only one query is generated for all indicators used in the column analysis, • all monitored data is retrieved locally to be analyzed, • you can set the parameters to decide whether to access the analyzed data and how many data rows to show per indicator. This helps to avoid memory limitation issues since it may not be possible to store all analyzed data. When you execute the column analysis with the Java engine, you do not need different query templates specific for each database. However, system performance is significantly reduced in comparison with the SQL engine. Executing the analysis with the Java engine uses disk space as all data is retrieved and stored locally. If you want to free up some space, you may delete the data stored in the main studio directory, at Talend-Studio>workspace>project_name>Work_MapDB. To set the parameters to access analyzed data when using the Java engine, do the following:

Procedure 1. In the Analysis Parameter view of the column analysis editor, select Java from the Execution engine list.

2. Select the Allow drill down check box to store locally the data that will be analyzed by the current analysis. This check box is usually selected by default. 3. In the Max number of rows kept per indicator field, enter the number of data rows you want to make accessible. This field is set to 50 by default.


Results You can now run your analysis and then have access to the analyzed data according to the set parameters. For more information, see Viewing and exporting analyzed data on page 60.

Accessing the detailed view of the database column analysis

About this task Prerequisite(s): You have selected the Profiling perspective in the studio. A column analysis is defined and executed. To access a more detailed view of the analysis results of the procedures outlined in Defining the columns to be analyzed and setting indicators on page 43 and Finalizing and executing the column analysis on page 52, do the following:

Procedure 1. Click the Analysis Results tab at the bottom of the analysis editor to open the corresponding view. 2. Click the Analysis Result tab in the view and then the name of the analyzed column for which you want to open the detailed results.

Note: The display of the Analysis Results view depends on the parameters you set in the Preferences window. For more information, see Setting preferences of analysis editors and analysis results on page 18.

The detailed analysis results view shows the generated graphics for the analyzed columns accompanied with tables that detail the statistic results. Below are the tables that accompany the Frequency and Simple Statistics graphics in the Analysis Results view for the analyzed email column.


In the Simple Statistics table, if an indicator value is displayed in red, this means that a threshold has been set on the indicator in the column analysis editor and that this threshold has been violated. For further information about data thresholds, see Setting options for system or user-defined indicators on page 50. Below are the tables and the graphics representing the order of magnitude and the Benford's law statistics in the Analysis Results view for the analyzed total_sales column.


For further information about the Benford's law statistics, usually used as an indicator of accounting and expenses fraud in lists or tables, see Fraud Detection on page 223. 3. Right-click any data row in the result tables and select View rows to access a view of the analyzed data. For more information, see Viewing and exporting analyzed data on page 60.

Viewing and exporting analyzed data

After running your analysis using the SQL or the Java engine, you can right-click any of the rows in the statistic result tables in the Analysis Results view of the analysis editor and access a view of the actual data. If you ran the analysis with the Java engine, this view opens directly in the studio; if you ran it with the SQL engine, the view opens in the Data Explorer perspective.


Prerequisite(s): You have selected the Profiling perspective in the studio. A column analysis has been created and executed. 1. At the bottom of the analysis editor, click the Analysis Results tab to open a detailed view of the analysis results. 2. Right-click a data row in the statistic results of any of the analyzed columns and select one of the following options:

Option Operation

View rows Open a view on a list of all data rows in the analyzed column.

Note: For the Duplicate Count indicator, the View rows option will list all the rows that are duplicated. So if the duplicate count is 12 for example, this option will list 24 rows.

View values Open a view on a list of the actual data values of the analyzed column.

Options other than those listed above are available when using regular expressions and SQL patterns in a column analysis. For further information, see Using regular expressions and SQL patterns in a column analysis on page 63 and Viewing the data analyzed against patterns on page 67. When using the SQL engine, the view opens in the Data Explorer perspective listing the rows or the values of the analyzed data according to the limits set in the data explorer. If the Data Explorer perspective is missing from the studio, you must install certain SQL explorer libraries that are required for data quality to work correctly; otherwise, you may get an error message. For further information about identifying and installing external modules, see the Talend Installation and Upgrade Guide.


This explorer view also gives some basic information about the analysis itself. Such information is of great help when working with multiple analyses at the same time.

Warning: The data explorer does not support connections that have an empty user name, such as Single sign-on of MS SQL Server. If you analyze data using such a connection and try to view data rows and values in the Data Explorer perspective, a warning message prompts you to set your connection credentials to the SQL Server.

When using the Java engine, the view opens in the studio listing the number of the analyzed data rows you set in the Analysis parameters view of the analysis editor. For more information, see Using the Java or the SQL engine on page 56.

From this view, you can export the analyzed data into a csv file. To do that:

1. Click the icon in the upper left corner of the view. A dialog box opens.


2. Click Choose... and browse to where you want to store the csv file and give it a name. 3. Click OK to close the dialog box. A csv file is created in the specified location, holding all the analyzed data rows listed in the view.

Using regular expressions and SQL patterns in a column analysis

You can use regular expressions or SQL patterns in column analyses. These expressions and patterns will help you define the content, structure and quality of the data included in the analyzed columns. For more information on regular expressions and SQL patterns, see Patterns and indicators on page 192 and Steps to analyze database tables on page 91.

Adding a regular expression or an SQL pattern to a column analysis

About this task You can add to any column analysis one or more regular expressions or SQL patterns against which you can match the content of the column to be analyzed.


Warning: If the database you are using does not support regular expressions or if the query template is not defined in the studio, you first need to declare the user-defined function and define the query template before you can add any of the specified patterns to the column analysis. For more information, see Managing User-Defined Functions in databases on page 193.

Prerequisite(s): You have selected the Profiling perspective in the studio. A column analysis is open in the analysis editor. To add a regular expression or an SQL pattern to a column analysis, do the following:

Procedure 1. Follow the steps outlined in Defining the columns to be analyzed on page 43 to create a column analysis.

2. In the Analyzed Columns view in the analysis editor, click the icon next to the column name to which you want to add a regular expression or an SQL pattern, the email column in this example. The Pattern Selector dialog box opens.

3. Expand Patterns and browse to the regular expression or/and the SQL patterns you want to add to the column analysis. 4. Select the check box(es) of the expression(s) or pattern(s) you want to add to the selected column. 5. Click OK to proceed to the next step.


The added regular expression(s) or SQL pattern(s) are displayed under the analyzed column in the Analyzed Columns list. You can add a regular expression or an SQL pattern to a column simply by a drag and drop operation from the DQ Repository tree view onto the analyzed column. 6. Save the analysis and press F6 to execute it. The editor switches to the Analysis Results view. The results of the column analysis include those for pattern matching.
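For reference, a predefined pattern such as Email Address is backed by a regular expression definition. A simplified sketch of an email-matching expression (for illustration only, not the exact definition shipped with the studio) could be:

    '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

Values of the email column that do not match the expression are counted as non-matching in the pattern matching results.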

Editing a pattern in the column analysis

About this task Prerequisite(s): You have selected the Profiling perspective in the studio. A column analysis is open in the analysis editor.

Procedure 1. In the Analyzed Columns view in the analysis editor, right-click the pattern you want to edit and select Edit pattern from the contextual menu.


The pattern editor opens showing the selected pattern metadata.


2. In the Pattern Definition view, edit the pattern definition, change the selected database, or add other patterns specific to available databases using the [+] button. If the regular expression is simple enough to be used in all databases, select Default in the list. When you edit a pattern through the analysis editor, you modify the pattern in the studio repository. Make sure that your modifications are suitable for all other analyses that may be using the modified pattern. 3. Save your changes.

Viewing the data analyzed against patterns

About this task When you add one or more patterns to an analyzed column, you check all existing data in the column against the specified pattern(s). After executing the column analysis, using the Java or the SQL engine, you can access a list of all the valid/invalid data in the analyzed column. When you use the Java engine to run the analysis, the view of the actual data opens in the studio. If you use the SQL engine to execute the analysis, the view of the actual data opens in the Data Explorer perspective. Prerequisite(s): • You have selected the Profiling perspective in the studio. • You have installed in the studio the SQL explorer libraries that are required for data quality. If you do not install these libraries, the Data Explorer perspective will be missing from the studio and many features will not be available. For further information about identifying and installing external modules, see the Talend Installation and Upgrade Guide. To view the actual data in the column analyzed against a specific pattern, do the following:

Procedure 1. Follow the steps outlined in Defining the columns to be analyzed on page 43 and Adding a regular expression or an SQL pattern to a column analysis on page 63 to create a column analysis that uses a pattern. 2. Execute the column analysis. The editor switches to the Analysis Results view. 3. Browse to Pattern Matching under the name of the analyzed column. The generated graphic for the pattern matching is displayed accompanied with a table that details the matching results.


4. Right-click the pattern line in the Pattern Matching table and select:

Option To...

View valid/invalid values open a view of all valid/invalid values measured against the pattern used on the selected column

View valid/invalid rows open a view of all valid/invalid rows measured against the pattern used on the selected column

Results When using the SQL engine, the view opens in the Data Explorer perspective listing valid/invalid rows or values of the analyzed data according to the limits set in the data explorer.


This explorer view also gives some basic information about the analysis itself. Such information is of great help when working with multiple analyses at the same time. The data explorer does not support connections that have an empty user name, such as Single sign-on of MS SQL Server. If you analyze data using such a connection and try to view data rows and values in the Data Explorer perspective, a warning message prompts you to set your connection credentials to the SQL Server. When using the Java engine, the view opens in the Profiling perspective of the studio listing the number of valid/invalid data according to the row limit you set in the Analysis parameters view of the analysis editor. For more information, see Using the Java or the SQL engine on page 56.


You can save the executed query and list it under the Libraries > Source Files folders in the DQ Repository tree view if you click the save icon on the SQL editor toolbar. For more information, see Saving the queries executed on indicators on page 70.

Saving the queries executed on indicators

About this task From the studio and in the Data Explorer perspective, you can view the queries executed on different indicators used in an analysis. From the data explorer, you will be able to save the query and list it under the Libraries > Source Files folders in the DQ Repository tree view. Prerequisite(s): You have selected the Profiling perspective in the studio. At least one analysis with indicators has been created. To save any of the queries executed on an indicator set in a column analysis, do the following:

Procedure 1. In the column analysis editor, right-click any of the used indicators to open a contextual menu.


2. Select View executed query to open the data explorer on the query executed on the selected indicator. If the Data Explorer perspective is missing from the studio, you must install certain SQL explorer libraries that are required for data quality to work correctly; otherwise, you may get an error message. For further information about identifying and installing external modules, see the Talend Installation and Upgrade Guide.

Note: The data explorer does not support connections that have an empty user name, such as Single sign-on of MS SQL Server. If you analyze data using such a connection and try to view the executed queries in the Data Explorer perspective, a warning message prompts you to set your connection credentials to the SQL Server.

3. Click the save icon on the editor toolbar to open the Select folder dialog box.


4. Select the Source Files folder or any sub-folder under it and enter in the Name field a name for the open query. 5. Click OK to close the dialog box. The selected query is saved under the selected folder in the DQ Repository tree view.
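As an illustration of what such a query can look like, the statement executed for a simple indicator is typically a plain aggregate query. Assuming an analysis on the customer table used elsewhere in this chapter, the query behind the Row Count indicator would be similar to the following (actual queries generated by the studio depend on the database and on the indicator):

    SELECT COUNT(*) FROM customer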

Creating analyses from table or column names

In the studio, you can use simplified ways to create one or multiple column analyses. All you need to do is start from the table name or the column name under the relevant DB Connection folder in the DQ Repository tree view. However, the options available for creating column analyses when you start from the table name differ from those available when you start from the column name.


To create a column analysis directly from the relevant table name in the DB Connection, do the following: 1. In the DQ Repository tree view, expand Metadata > DB Connections. 2. Browse to the table that holds the column(s) you want to analyze and right-click it. 3. From the contextual menu, select:

Item To...

Match Analysis open the match analysis editor where you can define match rules and select the columns on which you want to use the match rules. For more information see Analyzing duplicates on page 134.

Table Analysis analyze the selected table using SQL business rules. For more information, see Creating a table analysis with SQL business rules on page 101.

Column Analysis analyze all the columns included in the selected table using the Simple Statistics indicators. For more information on the Simple Statistics indicators, see Simple statistics on page 229.

Pattern Frequency Analysis analyze all the columns included in the selected table using the Pattern Frequency Statistics along with the Row Count and the Null Count indicators. For more information on the Pattern Frequency Statistics, see Pattern frequency statistics on page 225. To learn more about the databases and engines supported with Pattern Frequency Statistics indicators, see the documentation on Talend Help Center (https://help.talend.com).

The above steps replace the procedures outlined in Defining the columns to be analyzed and setting indicators on page 43. Then you proceed following the steps outlined in Finalizing and executing the column analysis on page 52. To create a column analysis directly from the column name in the DB Connection, do the following: 1. In the DQ Repository tree view, expand Metadata > DB Connections. 2. Browse to the column(s) you want to analyze and right-click it/them. 3. From the contextual menu, select:

Item To...

Analyze create an analysis for the selected column. You must later set the indicators you want to use to analyze the selected column. For more information on setting indicators, see Setting indicators on columns on page 48. For more information on accomplishing the column analysis, see Finalizing and executing the column analysis on page 52.

Nominal Value Analysis create a column analysis on nominal data preconfigured with indicators appropriate for nominal data, namely the Value Frequency, Simple Statistics and Text Statistics indicators.


Simple Analysis analyze the selected column using the Simple Statistics indicators. For more information on the Simple Statistics indicators, see Simple statistics on page 229.

Pattern Frequency Analysis analyze the selected column using the Pattern Frequency Statistics along with the Row Count and the Null Count indicators. For more information on the Pattern Frequency Statistics, see Pattern frequency statistics on page 225. To learn more about the databases and engines supported with Pattern Frequency Statistics indicators, see the documentation on Talend Help Center (https://help.talend.com).

Analyze Column Set perform an analysis on the content of a set of columns. This analysis focuses on a column set (full records) and not on separate columns as it is the case with the column analysis. For more information, see Creating a simple table analysis (Column Set Analysis) on page 92.

Analyze Correlation perform column correlation analyses between nominal and interval columns or nominal and date columns in database tables. For more information, see Numerical correlation analyses on page 170.

Analyze matches open the match analysis editor where you can define match rules and select the columns on which you want to use the match rules. For more information see Analyzing duplicates on page 134.

The above steps replace one or both of the procedures outlined in Defining the columns to be analyzed and setting indicators on page 43. Now proceed following the same steps outlined in Finalizing and executing the column analysis on page 52.

Creating a basic column analysis on a file

About this task You can create a column analysis on a delimited file and execute the created analysis using the Java engine. From the studio, you can also analyze a set of columns; for more information, see Analyzing tables in delimited files on page 127. The sequence of profiling data in a delimited file involves the following steps:

Procedure 1. Defining the column(s) to be analyzed. For more information, see Defining the columns to be analyzed on page 43. 2. Setting predefined system indicators for the defined columns. For more information, see Setting indicators on columns on page 48. For more information on indicator types and indicator management, see Indicators on page 222. 3. Setting patterns for the defined columns. For more information on pattern types and management, see Patterns on page 192.


You can also use Java user-defined indicators when analyzing columns in a delimited file on the condition that a Java user-defined indicator is already created. For further information, see Defining Java user-defined indicators on page 239.

Defining the columns to be analyzed in a file

The first step in analyzing the content of one or multiple columns is to define the column(s) to be analyzed. The analysis results provide statistics about the values within each column. When you select to analyze Date columns and run the analysis with the Java engine, the date information is stored in the studio as regular date/time of format YYYY-MM-DD HH:mm:ss.SSS for date/timestamp and of format HH:mm:ss.SSS for time. The date and time formats are slightly different when you run the analysis with the SQL engine. Prerequisite(s): At least one connection to a delimited file is set in the Profiling perspective of the studio. For further information, see Connecting to a file on page 293.

Defining the column analysis

Procedure 1. In the DQ Repository tree view, expand the Data Profiling folder. 2. Right-click the Analysis folder and select New Analysis.

The Create New Analysis wizard opens.


3. In the filter field, start typing basic column analysis, select Basic Column Analysis and click Next.

4. In the Name field, enter a name for the current column analysis.


Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

5. Set column analysis metadata (purpose, description and author name) in the corresponding fields and click Next to proceed to the next step.

Selecting the file columns and setting sample data

Procedure 1. Expand FileDelimited connections and in the desired file, browse to the columns you want to analyze. In this example, you want to analyze the id, first_name and age columns from the selected connection. 2. Select the columns and then click Finish to close the wizard. A file for the newly created column analysis is listed under the Analysis node in the DQ Repository tree view, and the analysis editor opens with the analysis metadata.


3. In the Data preview view, click Refresh Data. The data in the selected columns is displayed in the table. You can change your data source and your selected columns by using the New Connection and Select Columns buttons respectively. 4. In the Limit field, set the number of data records you want to display in the table and use as sample data. For example, 50 records. 5. Select n first rows to list the first 50 records from the selected columns. 6. In the Analyzed Columns view, use the arrows in the top right corner to open different pages in the view if you analyze a large number of columns.

You can also drag the columns to be analyzed directly from the DQ Repository tree view to the Analyzed Columns list in the analysis editor. 7. Use the delete, move up or move down buttons to manage the analyzed columns. 8. If required, right-click any of the listed columns in the Analyzed Columns view and select Show in DQ Repository view to locate it in the database connection in the DQ Repository tree view.


Setting system and user-defined indicators

About this task The second step after defining the columns to be analyzed is to set statistics indicators for each of the defined columns.

Note: You can also use Java user-defined indicators when analyzing columns in a delimited file on the condition that a Java user-defined indicator is already created. For further information, see Defining Java user-defined indicators on page 239.

Prerequisite(s): An analysis of a delimited file is open in the analysis editor in the Profiling perspective of the studio. For more information, see Defining the columns to be analyzed on page 43.

Procedure 1. Follow the procedure outlined in Defining the columns to be analyzed on page 43. 2. From the Data preview view in the analysis editor, click Select indicators to open the Indicator Selection dialog box.


3. In the Indicator Selection dialog box, select the indicators you want to set on the analyzed columns. In this example, you set the Simple Statistics indicators on all columns, and the Text Statistics and Soundex Frequency indicators on the first_name column.

Note: You can set the text statistics indicators on a column only if its data mining type is set to nominal. Otherwise, these indicators will be grayed out in the Indicator Selection dialog box.

The selected indicators are attached to the analyzed columns in the Analyzed Columns view.


4. Save the analysis.

Setting options for system indicators

About this task Prerequisite(s): An analysis of a delimited file is open in the analysis editor in the Profiling perspective of the studio. For more information, see Defining the columns to be analyzed on page 43, Setting indicators on columns on page 48.

Procedure 1. Follow the procedures outlined in Defining the columns to be analyzed on page 43 and Setting indicators on columns on page 48.

2. In the Analyzed Columns view in the analysis editor, click the option icon next to the indicator. 3. In the dialog box that opens, set the parameters for the given indicator. Indicators settings dialog boxes differ according to the parameters specific for each indicator. For more information about different indicator parameters, see Indicator parameters on page 258. 4. Click Finish to close the dialog box.

Setting regular expressions and finalizing the analysis

About this task You can add one or more regular expressions to one or more of the analyzed columns. Prerequisite(s): An analysis of a delimited file is open in the analysis editor in the Profiling perspective of the studio. For more information, see Defining the columns to be analyzed on page 43, Setting indicators on columns on page 48 and Setting options for system indicators on page 81. To set regular expressions to the analyzed columns, do the following:


Procedure 1. Define the regular expression you want to add to the analyzed column. For further information on creating regular expressions, see Creating a new regular expression or SQL pattern on page 200. In this example, the regular expression checks for all words that start with an uppercase letter (see the example expression after this procedure).

2. Add the regular expression to the analyzed column in the open analysis editor, the first_name column in this example. For further information, see Adding a regular expression or an SQL pattern to a column analysis on page 63.

3. Save the analysis and press F6 to execute it.
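As an illustration of the expression mentioned in step 1, a Java regular expression that matches values beginning with an uppercase letter could be written as follows (a simplified sketch, not a pattern shipped with the studio):

    '^[A-Z].*'

Values of the first_name column that do not start with an uppercase letter are then reported as non-matching.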


If the format of the file you are using has problems, you will get an error message indicating which row causes the problem. The detailed analysis results view shows the generated graphics for the analyzed columns accompanied with tables that detail the statistic results. Below are the tables that accompany the statistics graphics in the Analysis Results view for the analyzed first_name column in the procedure outlined in Defining the columns to be analyzed in a file on page 75.


Viewing and exporting the analyzed data in a file

After running your file analysis using the Java engine and from the Analysis Results view of the analysis editor, you can right-click any of the rows in the statistic result tables and access a view of the actual data. Prerequisite(s): A file analysis has been created and executed. To view and export the analyzed data, do the following: 1. At the bottom of the analysis editor, click the Analysis Results tab to open a detailed view of the analysis results. 2. Right-click a data row in the statistic results of any of the analyzed columns and select one of the following options:

Option Operation

View rows Open a view on a list of all data rows in the analyzed column.

Note: For the Duplicate Count indicator, the View rows option will list all the rows that are duplicated. So if the duplicate count is 12 for example, this option will list 24 rows.

View values Open a view on a list of the actual data values of the analyzed column.

For Pattern Matching results, select one of the following options:

Option Operation

View valid/invalid rows open a view on a list of all valid/invalid rows measured against a pattern.

View valid/invalid values open a view on a list of all valid/invalid values measured against a pattern.

From this view, you can export the analyzed data into a csv file. To do that:

1. Click the icon in the upper left corner of the view. A dialog box opens.


2. Click Choose... and browse to where you want to store the csv file and give it a name. 3. Click OK to close the dialog box. A csv file is created in the specified location, holding all the analyzed data rows listed in the view.

Analyzing delimited data in shortcut procedures

You can profile data in a delimited file in a simplified way. All you need to do is start from the column name under the Metadata > FileDelimited folders in the DQ Repository tree view. For further information, see Creating analyses from table or column names on page 72.

Analyzing discrete data

This analysis enables you to analyze numerical data. It creates a column analysis in which indicators appropriate for numeric data are assigned to the column by default. Discrete data can only take particular values out of a potentially infinite set of values. Continuous data is the opposite of discrete data: it is not restricted to defined separate values but can occupy any value over a continuous range. This analysis uses the Bin Frequency indicator, which you must configure further in order to convert continuous data into discrete bins (ranges) according to your needs.


Prerequisite(s): At least one database connection is set in the Profiling perspective of the studio. For further information, see Connecting to a database on page 283.

Defining the analysis of discrete data

Procedure 1. In the DQ Repository tree view, expand Metadata and browse to the numerical column you want to analyze. 2. Right-click the numerical column and select Column Analysis > Discrete data Analysis. In this example, you want to convert customer age into a number of discrete bins, or range of age values. The New Analysis wizard opens. 3. In the Name field, enter a name for the analysis.

Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

4. Set the analysis metadata and click Finish. The analysis opens in the analysis editor and the Simple Statistics and the Bin Frequency indicators are automatically assigned to the numeric column.


5. Double-click the Bin Frequency indicator to open the Indicator settings dialog box.


6. Set the bins minimum and maximum values and the number of bins in the corresponding fields. If you set the number of bins to 0, no bin is created and the indicator computes the frequency of each value of the column. 7. Select the Set ranges manually check box. Continuous numeric data is aggregated into discrete bins: four ranges are listed in the table with a suggested bin size. The minimal value is the beginning of the first bin, and the maximal value is the end of the last bin. The size of each bin is determined by dividing the difference between the smallest and the largest values by the number of bins. You can modify these values if you want to set a bin size manually; the value in the number of bins field is updated automatically with the new range number.
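As a worked example with hypothetical values: a minimum of 20, a maximum of 60 and 4 bins give a bin size of (60 - 20) / 4 = 10, that is the ranges

    [20, 30)   [30, 40)   [40, 50)   [50, 60]

Adding or removing a range in the table updates the number of bins accordingly.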

Running the analysis and accessing the detail analysis results

Procedure 1. Run the analysis. The editor switches to the Analysis Results view.


The analysis creates age ranges with a limited, discrete set of possible values out of an unlimited, continuous range of age values. 2. Right-click any data row in the result tables or in the charts, the first age range in this example, and select View rows to access a view of the analyzed data.


The SQL Editor opens listing all customers whose age is between 28 and 39.
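The query shown in the SQL Editor is essentially a filter on the selected bin. Conceptually, it is similar to the following (assuming the analyzed table is named customer; the statement actually generated by the studio uses the fully qualified table and column names and may handle the bin boundaries differently):

    SELECT * FROM customer WHERE age BETWEEN 28 AND 39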

Data mining types

When you create a column analysis in the studio, you can see a Datamining Type box next to each of the columns you want to analyze. The selected type in the box represents the data mining type of the associated column.


These data mining types help the studio to choose the appropriate metrics for the associated column since not all indicators (or metrics) can be computed on all data types. Available data mining types are: Nominal, Interval, Unstructured Text and Other. The sections below describe these data mining types.

Nominal

Nominal data is categorical data whose values/observations can be assigned a code in the form of a number where the numbers are simply labels. You can count, but not order or measure, nominal data. In the studio, the mining type of textual data is set to nominal. For example, a column called WEATHER with the values: sun, cloud and rain is nominal. And a column called POSTAL_CODE that has the values 52200 and 75014 is nominal as well, in spite of the numerical values. Such data is of nominal type because it identifies a postal code in France. Computing mathematical quantities such as the average on such data makes no sense. In such a case, you should set the data mining type of the column to Nominal, because there is currently no way in the studio to automatically guess the correct type of data. The same is true for primary or foreign-key data. Keys are most of the time represented by numerical data, but their data mining type is Nominal.

Interval

This data mining type is used for numerical data and time data. Averages can be computed on this kind of data. In databases, sometimes numerical quantities are stored in textual fields. In the studio, it is possible to declare the data mining type of a textual column (e.g. a column of type VARCHAR) as Interval. In that case, the data should be treated as numerical data and summary statistics should be available.

Unstructured text

This is a new data mining type introduced by the studio. This data mining type is dedicated to handling unstructured textual data. For example, the data mining type of a column called COMMENT that contains commentary text cannot be Nominal, since the text in it is unstructured. Still, you could be interested in seeing the duplicate values of such a column, hence the need for this new data mining type.

Other

This is another new data mining type introduced in the studio. This type designates the data that the studio does not know how to handle yet.

Table analyses

Steps to analyze database tables

About this task You can examine the data available in single tables of a database and collect information and statistics about this data. The sequence of profiling data in one or multiple tables may involve the following steps:


Procedure 1. Defining one or more tables on which to carry out data profiling processes that will define the content, structure and quality of the data included in the table(s). 2. Creating SQL business rules based on WHERE clauses and add them as indicators to table analyses. 3. Creating column functional dependencies analyses to detect anomalies in the column dependencies of the defined table(s) through defining columns as either "determinant" or "dependent".

What to do next Check Analyzing tables in databases on page 92 for information about the different options to analyze a table.

Analyzing tables in databases

Table analyses can range from simple table analyses to table analyses that use SQL business rules or that detect anomalies in the table columns. Using Talend Studio, you can better explore the quality of data in a database table through either: • Creating a simple table analysis through analyzing all columns in the table using patterns. For more information, see Creating a simple table analysis (Column Set Analysis) on page 92. • Adding data quality rules as indicators to table analysis. For more information, see Creating a table analysis with SQL business rules on page 101. • Detecting anomalies in column dependencies. For more information, see Detecting anomalies in columns (Functional Dependency Analysis) on page 122. • Comparing a set of columns and creating groups of similar records using blocking and matching keys and/or survivorship rules. For more information, see Analyzing duplicates on page 134.

Creating a simple table analysis (Column Set Analysis)

You can analyze the content of a set of columns. This set can represent only some of the columns in the defined table or the table as a whole. The analysis of a set of columns focuses on a column set (full records) and not on separate columns as it is the case with the column analysis. The statistics presented in the analysis results (row count, distinct count, unique count and duplicate count) are measured against the values across all the data set and thus do not analyze the values separately within each column. With the Java engine, you may also apply patterns on each column and the result of the analysis will give the number of records matching all the selected patterns together. For further information, see Adding patterns to the analyzed columns on page 95.
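As a hypothetical illustration of how these indicators apply to full records rather than to individual columns, consider a set of four rows whose concatenated records are A, A, B and C:

    row count       = 4
    distinct count  = 3   (A, B, C)
    unique count    = 2   (B and C occur exactly once)
    duplicate count = 1   (A occurs more than once)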

Note: When you use the Java engine to run a column set analysis on big sets or on data with many problems, it is advisable to define a maximum memory size threshold to execute the analysis as you may end up with a Java heap error. For more information, see Defining the maximum memory size threshold on page 17.


Creating an analysis of a set of columns using patterns

This type of analysis provides simple statistics on the full records of the analyzed column set and not on the values within each column separately. For more information about simple statistic indicators, see Simple statistics on page 229. With this analysis, you can use patterns to validate the full records against all patterns and have a single-bar result chart that shows the number of the rows that match "all" the patterns.

Defining the set of columns to be analyzed

Prerequisite(s): At least one database connection is set in the Profiling perspective of the studio. For further information, see Connecting to a database on page 283.

Defining the analysis

Procedure Set column analysis metadata (Purpose, Description and Author) in the corresponding fields and click Next to proceed to the next step.

Selecting the set of columns you want to analyze

Procedure 1. Expand DB connections. 2. In the desired database, browse to the columns you want to analyze, select them and then click Finish to close the New Analysis wizard. In this example, you want to analyze a set of six columns in the customer table: account number (account_num), education (education), email (email), first name (fname), second name (lname) and gender (gender). The statistics presented in the analysis results are the row count, distinct count, unique count and duplicate count, which all apply to records (values of a set of columns). The analysis editor opens with the defined analysis metadata, and a folder for the newly created analysis is displayed under Analyses in the DQ Repository tree view.


Sample data is displayed in the Data Preview section and the selected columns are displayed in the Analyzed Columns section of the analysis editor. 3. In the Data preview section, select: Option To...

New Connection open a wizard and create or change the connection to the data source from within the editor. For further information about how to create a connection to data sources, see Connecting to a database on page 283 and Connecting to a file on page 293. The Connection field on top of this section lists all the connections created in the Studio.

Select Columns open the Column Selection dialog box where you can select the columns to analyze or change the selection of the columns listed in the table. From the open dialog box, you can filter the table or column lists by using the Table filter or Column filter fields respectively.

n first rows or n random rows list in the table N first data records from the selected columns or list N random records from the selected columns.

Refresh Data display the data in the selected columns according to the criteria you set.

Run with sample data run the analysis only on the sample data set in the Limit field.

4. In the Limit field, set the number of data records you want to display in the table and use as sample data.

Adding patterns to the analyzed columns

You can add patterns to one or more of the analyzed columns to validate the full record (all columns) against all the patterns, and not to validate each column against a specific pattern as is the case with the column analysis. The results chart is a single bar chart for the totality of the used patterns. This chart shows the number of the rows that match "all" the patterns.

Warning: Before being able to use a specific pattern with a set of columns analysis, you must manually set in the patterns settings the pattern definition for Java, if it does not already exist. Otherwise, a warning message will display prompting you to set the definition of the Java regular expression.

Before you begin An analysis of a set of columns is open in the analysis editor in the Profiling perspective of Talend Studio. For more information, see Defining the set of columns to be analyzed on page 93.

Procedure Select the check box(es) of the expression(s) you want to add to the selected column, then click OK. The added regular expression(s) display(s) under the analyzed column(s) in the Analyzed Columns view and the All Match indicator is displayed in the Indicators list in the Indicators view.


Finalizing and executing the analysis of a set of columns

What is left before executing this set of columns analysis is to define the indicator settings, data filter and analysis parameters.

Before you begin A column set analysis has already been defined in the Profiling perspective of the Talend Studio. For further information, see Defining the set of columns to be analyzed on page 93 and Adding patterns to the analyzed columns on page 95.

Procedure 1. In the Analysis Parameters view:


• In the Number of connections per analysis field, set the number of concurrent connections allowed per analysis to the selected database connection. You can set this number according to the database available resources, that is the number of concurrent connections each database can support. • From the Execution engine list, select the engine, Java or SQL, you want to use to execute the analysis. • If you select the Java engine, the Store data check box is selected by default and cannot be unselected. Once the analysis is executed, the profiling results are always available locally to drill down through the Analysis Results > Data view. For further information, see Filtering data against patterns on page 98. Executing the analysis with the Java engine uses disk space as all data is retrieved and stored locally. If you want to free up some space, you may delete the data stored in the main Talend Studio directory, at Talend-Studio/workspace/project_name/Work_MapDB. • If you select the SQL engine, you can use the Store data check box to decide whether to store locally the analyzed data and access it in the Analysis Results > Data view.

Note: If the data you are analyzing is very big, it is advisable to leave the Store data check box unselected in order not to store the results at the end of the analysis computation.

2. Save the analysis and press F6 to execute it.


The analysis editor switches to the Analysis Results view where you can read the analysis results in tables and graphics. The graphical result provides the simple statistics on the full records of the analyzed column set and not on the values within each column separately. When you use patterns to match the content of the set of columns, another graphic is displayed to illustrate the match and non-match results against the totality of the used patterns. 3. In the Simple Statistics table, right-click an indicator result and select View Rows or View Values. • If you run the analysis with the Java engine, a list of the analyzed data is opened in the Profiling perspective. • If you run the analysis with the SQL engine, a list of the analyzed data is opened in the Data Explorer perspective. 4. In the Data view, click Filter Data to filter the valid/invalid data according to the used patterns. You can filter data only when you run the analysis with the Java engine. For further information, see Filtering data against patterns on page 98. Filtering data against patterns After analyzing a set of columns against a group of patterns and having the results of the rows that match or do not match "all" the patterns, you can filter the valid/invalid data according to the used patterns.

Before you begin An analysis of a set of columns is open in the analysis editor in the Profiling perspective of Talend Studio. You have used the Java engine to execute the analysis. For more information, see Defining the set of columns to be analyzed on page 93 and Adding patterns to the analyzed columns on page 95.

Procedure 1. In the analysis editor, click the Analysis Results tab at the bottom of the editor to open the detailed result view. 2. Click Data to open the corresponding view. A table lists the actual analyzed data in the analyzed columns.


3. Click Filter Data on top of the table. A dialog box is displayed listing all the patterns used in the column set analysis.

4. Select the check box(es) of the pattern(s) according to which you want to filter data. 5. Select one of the following display options: Select To...

All data show all analyzed data.

matches show only the data that matches the selected pattern.

non-matches show the data that does not match the selected pattern(s).

6. Click Finish to close the dialog box.

Results In this example, data is filtered against the Email Address pattern, and only the data that does not match is displayed.


All email addresses that do not match the selected pattern appear in red. Any data row that has a missing value appears with a red background. The Previous and Next buttons under the table help you navigate back and forth through pages. Numbered buttons are displayed under the table to access pages directly: • when you open the Data view for the first time after running the analysis, • if you did not select a pattern in the Filter Data dialog box, or • if you selected All data as the display option in the Filter Data dialog box.

Creating a column analysis from a simple table analysis

You can create a column analysis on one or more columns defined in a simple table analysis (column set analysis).

Before you begin A simple table analysis is defined in the analysis editor in the Profiling perspective of Talend Studio.

Procedure 1. In the Analyzed Columns view, right-click the column(s) you want to create a column analysis on.


2. Follow the steps outlined in Creating a basic analysis on a database column on page 43 to continue creating the column analysis.

Creating a table analysis with SQL business rules

You can set up SQL business rules based on WHERE clauses and add them as indicators to table analyses. You can as well define expected thresholds on the SQL business rule indicator's value. The range defined is used for measuring the quality of the data in the selected table. It is also possible to create an analysis with SQL business rules on views in a database. The procedure is exactly the same as that for tables. For more information, see Creating a table analysis with an SQL business rule with a join condition on page 111.

Note: When you use the Java engine to run a column set analysis on big sets or on data with many problems, it is advisable to define a maximum memory size threshold to execute the analysis as you may end up with a Java heap error. For more information, see Defining the maximum memory size threshold on page 17.

Creating an SQL business rule

SQL business rules can be simple rules with WHERE clauses. They can also have join conditions in them to combine common values between columns in database tables and give a result data set. For an example of a table analysis with a simple business rule, see Creating a table analysis with a simple SQL business rule on page 106. For an example of a table analysis with a business rule that has a join condition, see Creating a table analysis with an SQL business rule with a join condition on page 111. For more information about using SQL business rules as indicators on a table analysis, see Creating a table analysis with SQL business rules on page 101.

Creating the business rule

Procedure 1. In the DQ Repository tree view, expand Libraries > Rules. 2. Right-click SQL.


3. From the contextual menu, select New Business Rule to open the New Business Rule wizard.

Consider as an example that you want to create a business rule to match the age of all customers listed in the age column of a defined table. You want to filter all the age records to identify those that fulfill the specified criterion. 4. In the Name field, enter a name for this new SQL business rule.


Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

5. Set other metadata (purpose, description and author name) in the corresponding fields and then click Next.

6. In the Where clause field, enter the WHERE clause to be used in the analysis. In this example, the WHERE clause is used to match the records where customer age is greater than 18. 7. Click Finish to close the New Business Rule wizard. A sub-folder for this new SQL business rule is displayed under the Rules folder in the DQ Repository tree view. The SQL business rule editor opens with the defined metadata.
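For illustration, a minimal sketch of the clause entered in step 6, assuming the analyzed table exposes a numeric age column as in this example; only the clause body is entered in the Where clause field, the full query being generated when the analysis runs:

-- WHERE clause body of the business rule: keep customers older than 18
age > 18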


Note: In the SQL business rule editor, you can modify the WHERE clause or add a new one directly in the Data quality rule view.

8. If required, set a value in the Criticality Level field. This will act as an indicator to measure the importance of the SQL business rule.

Creating a join condition

About this task This step is not obligatory. You can decide to create a business rule without a join condition and use it with only the WHERE clause in the table analysis. For an example of a table analysis with a simple business rule, see Creating a table analysis with a simple SQL business rule on page 106. For an example of a table analysis with a business rule that has a join condition, see Creating a table analysis with an SQL business rule with a join condition on page 111.

Procedure 1. In the SQL business rule editor, click Join Condition to open the corresponding view. 2. Click the [+] button to add a row in the Join Condition table.


3. Expand the Metadata folder in the DQ Repository tree view, and then browse to the columns in the tables on which you want to create the join condition. This join condition will define the relationship between a table A and a table B using a comparison operator on a specific column in both tables. In this example, the join condition will compare the "name" value in the Person and Person_Ref tables that have a common column called name.

Note: You must be careful when defining the join clause. In order to get an easy to understand result, it is advisable to make sure that the joined tables do not have duplicate values. For further information, see Creating a table analysis with an SQL business rule with a join condition on page 111.
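One way to verify this beforehand is to look for duplicate values in the join columns. The following query is a hedged sketch using this example's Person_Ref table and name column; run the same check on the other joined table as well:

-- list values of the join column that occur more than once (possible duplicates)
SELECT name, COUNT(*) AS occurrences
FROM Person_Ref
GROUP BY name
HAVING COUNT(*) > 1;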

4. Drop the columns from the DQ Repository tree view to the Join Condition table. A dialog box is displayed prompting you to select where to place the column: in TableA or in TableB. 5. Select a comparison condition operator between the two columns in the tables and save your modifications. In the analysis editor, you can now drop this newly created SQL business rule onto a table that has an "age" column. When you run the analysis, the join to the second column is done automatically.
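Conceptually, the rule with its join condition then behaves like the following query, sketched here with this example's table and column names; the studio generates the actual statement, which you can inspect later with View executed query:

-- rows of the join result set that satisfy the rule (the #match count)
SELECT Person.*
FROM Person
JOIN Person_Ref ON Person.name = Person_Ref.name
WHERE Person.age > 18;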

Warning: The table to which to add the business rule must contain at least one of the columns used in the SQL business rule.

Editing an SQL business rule

About this task To edit an SQL business rule, do the following:

Procedure 1. In the DQ Repository tree view, expand Libraries > Rules > SQL. 2. Right-click the SQL business rule you want to open and select Open from the contextual menu. The SQL business rule editor opens displaying the rule metadata.


3. Modify the business rule metadata or the WHERE clause as required.
4. Click the save icon on top of the editor to save your modifications. The SQL business rule is modified as defined.

Creating a table analysis with a simple SQL business rule

You can create analyses on either tables or views in a database using SQL business rules. The procedure for creating such an analysis is the same for a table or a view.
Prerequisite(s):
• At least one SQL business rule has been created in the Profiling perspective of the studio. For further information about creating SQL business rules, see Creating an SQL business rule on page 101.
• At least one database connection is set in the Profiling perspective of the studio. For further information, see Connecting to a database on page 283.
In this example, you want to add the SQL business rule created in Creating an SQL business rule on page 101 to a top_custom table that contains an age column. This SQL business rule will match the customer ages to identify those who are older than 18.

Defining the table analysis

Procedure 1. In the DQ Repository tree view, expand Data Profiling.


2. In the filter field, start typing business rule analysis, select Business Rule Analysis and click Next.

3. Set column analysis metadata (Purpose, Description and Author) in the corresponding fields and click Next to proceed to the next step.


Selecting the table you want to analyze

Procedure 1. Expand DB Connections, browse to the table to be analyzed and select it. 2. Click Finish to close the Create New Analysis wizard.

Note: You can directly select the data quality rule you want to add to the current analysis by clicking the Next button in the New Analysis wizard or you can do that at later stage in the Analyzed Tables view as shown in the following steps.

The analysis editor opens with the defined analysis metadata, and a folder for the newly created analysis is displayed under Analyses in the DQ Repository tree view.

3. If required: • Click Select Tables to open the Table Selection dialog box and select new table(s) to analyze.


You can filter the table or column lists by typing the desired text in the Table filter or Column filter fields respectively. The lists will show only the tables/columns that correspond to the text you type in.
• Select another connection from the Connection list to connect to a different database. This list has all the connections created in Talend Studio. If the tables listed in the Analyzed Tables view do not exist in the new database connection you want to set, you receive a warning message that enables you to continue or cancel the operation.
4. Right-click any of the listed columns in the Analyzed Columns view and select Show in DQ Repository view; the selected column is automatically located under the corresponding connection in the tree view.

Selecting the business rule

Procedure

1. Click the icon next to the table name where you want to add the SQL business rule. The Business Rule Selector dialog box is displayed.

2. Expand the Rules folder and select the check box(es) of the predefined SQL business rule(s) you want to use on the corresponding table(s). 3. Click OK. The selected business rule is listed below the table name in the Analyzed Tables view.


You can also drag the business rule directly from the DQ Repository tree view to the table in the analysis editor. 4. If required, right-click the business rule and select View executed query. The SQL editor opens in the Studio to display the query. 5. Save the analysis and press F6 to execute it. An information pop-up opens to confirm that the operation is in progress and the analysis editor switches to the Analysis Results view.


All age records in the selected table are evaluated against the defined SQL business rule. The analysis results have two bar charts: the first is a row count indicator that shows the number of rows in the analyzed table, and the second is a match and non-match indicator that shows in red the age records from the "analyzed result set" that do not match the criteria (age below 18).
6. Right-click the business rule results in the second table, or right-click the result bar in the chart itself and select:
You can carry out a table analysis in a direct and more simplified way. For further information, see Creating a table analysis with an SQL business rule in a shortcut procedure on page 120.

Creating a table analysis with an SQL business rule with a join condition

In some cases, you may need to analyze database tables or views using an SQL business rule that has a join clause that combines records from two tables in a database. This join clause compares common values between two columns and gives a result data set. The data in this set is then analyzed against the business rule. Depending on the analyzed data and the join clause itself, several different results of the join are possible, for example #match + #no match > #row count, #match + #no match < #row count or #match + #no match = #row count. The example below explains in detail the case where the data set in the join result is bigger than the row count (#match + #no match > #row count), which indicates duplicates in the processed data.
Prerequisite(s):
• At least one SQL business rule has been created in the Profiling perspective of the studio. For further information about creating SQL business rules, see Creating an SQL business rule on page 101.
• At least one database connection is set in the Profiling perspective of the studio. For further information, see Connecting to a database on page 283.
In this example, you want to add the SQL business rule created in Creating an SQL business rule on page 101 to a Person table that contains the age and name columns. This SQL business rule will match the customer ages to identify those who are older than 18. The business rule also has a join condition that compares the "name" value between the Person table and another table called Person_Ref through analyzing a common column called name. Below is a capture of both tables:

Below is a capture of the result of the join condition between these two tables:


The result set may give duplicate rows, as is the case here. Thus the results of the analysis may become a bit harder to understand. The analysis here does not evaluate the rows of the table directly against the business rule; it runs on the result set given by the business rule.
1. Define the table analysis and select the table you want to analyze as outlined in Creating a table analysis with a simple SQL business rule on page 106. The selected table is listed in the Analyzed Tables view.

2. Add the business rule with the join condition to the selected table through clicking the icon next to the table name.


This business rule has a join condition that compares the "name" value between two different tables through analyzing a common column. For further information about SQL business rules, see Creating an SQL business rule on page 101. 3. Save the analysis and press F6 to execute it. An information pop-up opens to confirm that the operation is in progress and the analysis editor switches to the Analysis Results view.

All age records in the selected table are evaluated against the defined SQL business rule. The analysis results have two bar charts: the first is a row count indicator that shows the number of rows in the analyzed table, and the second is a match and non-match indicator that indicates the age records from the "analyzed result set" that do not match the criteria (age below 18).

Note: If a join condition is used in the SQL business rule, the number of the rows of the join (#match + # no match) can be different from the number of the analyzed rows (row count). For further information, see Creating a table analysis with an SQL business rule with a join condition on page 111.

4. Right-click the Row Count row in the first table and select View rows. The SQL editor opens in the Studio to display a list of the analyzed rows.


5. Right-click the business rule results in the second table, or right-click the result bar in the chart itself and select: Below is the list of the invalid rows in the analyzed table.

6. In the SQL editor, click the save icon on the toolbar to save the executed query on the SQL business rule and list it under the Libraries > Source Files folder in the DQ Repository tree view. For further information, see Saving the queries executed on indicators on page 70. To better understand the Business Rule Statistics bar chart in the analysis results, do the following: 1. In the analysis editor, right-click the business rule and select View executed query.


The SQL editor opens in the Studio.

2. Modify the query in the top part of the editor to read as the following:

SELECT * FROM `my_person_joins`.`person` Person JOIN `my_person_joins`.`Person_ref` Person_ref ON (Person.`name`=Person_ref.`Name`)

This will list the result data set of the join condition in the editor.
3. In the top left corner of the editor, click the icon to execute the query.


The query result, that is the analyzed result set, is listed in the bottom part of the editor. 4. Go back to the analysis editor and click the Analysis Results tab at the bottom of the editor to open a detail view of the analysis results.

The analyzed result set may contain more or fewer rows than the analyzed table. In this example, the number of match and non-match records (5 + 2 = 7) exceeds the number of analyzed records (6) because the join of the two tables generates more rows than expected. Here, 5 rows (71.43%) match the business rule and 2 rows do not match. Because the join generates duplicate rows, this result does not mean that 5 rows of the analyzed table match the business rule. It only means that 5 rows among the 7 rows of the result set match the business rule. Some rows of the analyzed tables may not even be analyzed against the business rule; this happens when the join excludes these rows. For this reason, it is advisable to check for duplicates on the columns used in the join of the business rule in order to make sure that the join does not remove or add rows in the analyzed result set. Otherwise the interpretation of the result is more complex. In the Analysis Results view, if the number of match and non-match records exceeds the number of analyzed records, you can generate a ready-to-use analysis that analyzes the duplicates in the selected table.

Generating an analysis on the join results to analyze duplicates

In some cases, when you analyze database tables using an SQL business rule that has a join clause, the join results show that there are more rows in the join than in the analyzed table. This is because the columns in the analyzed table have some duplicate records; for an example, see Creating a table analysis with an SQL business rule with a join condition on page 111. You can generate a ready-to-use analysis to analyze these duplicate records. The results of this analysis help you to better understand why there are more records in the join results than in the table.

Before you begin A table analysis with an SQL business rule, that has a join condition, is defined and executed in the Profiling perspective of the studio. The join results must show that there are duplicates in the table.


For further information, see Creating a table analysis with an SQL business rule with a join condition on page 111.

Procedure 1. After creating and executing an analysis on a table that has duplicate records as outlined in Creating a table analysis with an SQL business rule with a join condition on page 111, click the Analysis Results tab at the bottom of the analysis editor. 2. Right-click the join results in the second table and select Analyze duplicates.

The Column Selection dialog box opens with the analyzed tables selected by default. 3. Modify the selection in the dialog box if needed and then click OK.


Two column analyses are generated and listed under the Analyses folder in the DQ Repository tree view and are open in the analysis editor.


4. Save the analysis and press F6 to execute it. The analysis results show two bars, one representing the row count of the data records in the analyzed column and the other representing the duplicate count. 5. Click Analysis Results at the bottom of the analysis editor to access the detail result view.


6. Right-click the row count or duplicate count results in the table, or right-click the result bar in the chart itself and select:

Option To...

View rows open a view on a list of all data rows or duplicate rows in the analyzed column.

View values open a view on a list of the duplicate data values of the analyzed column.

Creating a table analysis with an SQL business rule in a shortcut procedure

About this task
You can use a simplified way to create a table analysis with a predefined business rule. All you need to do is start from the table name under the relevant DB Connection folder.
Prerequisite(s):
• At least one SQL business rule is created in the Profiling perspective of the studio.
• At least one database connection is set in the Profiling perspective of the studio.
For more information about creating SQL business rules, see Creating an SQL business rule on page 101.
To create a table analysis with an SQL business rule in a shortcut procedure, do the following:

Procedure 1. In the DQ Repository tree view, expand Metadata > DB Connections, and then browse to the table you want to analyze. 2. Right-click the table name and select Table analysis from the list. The New Table Analysis wizard is displayed.


3. Enter the metadata for the new analysis in the corresponding fields and then click Next to proceed to the next step.

4. Expand Rules > SQL and then select the check box(es) of the predefined SQL business rule(s) you want to use on the corresponding table(s). 5. Click OK to proceed to the next step. The table name along with the selected business rule are listed in the Analyzed Tables view. 6. Save the analysis and press F6 to execute it.


An information pop-up opens to confirm that the operation is in progress and the analysis editor switches to the Analysis Results view.

Detecting anomalies in columns (Functional Dependency Analysis)

This type of analysis helps you detect anomalies in column dependencies by defining columns as either "determinant" or "dependent" and then analyzing the values in dependent columns against those in determinant columns. This analysis supports only database tables. It detects to what extent a value in a determinant column functionally determines another value in a dependent column. This can help you identify problems in your data, such as values that are not valid. For example, if you analyze the dependency between a column that contains United States Zip Codes and a column that contains states in the United States, the same Zip Code should always have the same state. Running the functional dependency analysis on these two columns will show whether there are any violations of this dependency.

Defining the analysis to detect anomalies in columns

Before you begin At least one database connection is set in the Profiling perspective of the studio. For further information, see Connecting to a database on page 283.

Procedure 1. In the DQ Repository tree view, expand Data Profiling. 2. In the filter field, start typing functional dependency analysis, select Functional Dependency Analysis from the list and click Next.


3. Set column analysis metadata (Purpose, Description and Author) in the corresponding fields and click Next to proceed to the next step.

Selecting the columns as either "determinant" or "dependent"

Procedure 1. Expand DB connections, browse to the database you want to analyze, select it and then click Finish to close the New Analysis wizard. The analysis editor opens with the defined analysis metadata, and a folder for the newly created analysis is displayed under Analyses in the DQ Repository tree view.


The Data Preview section shows sample data from all the table columns.
2. In the Left Column panel, click A Columns Set to open the Column Selection dialog box. Here you can select the first set of columns against which you want to analyze the values in the dependent columns. You can also drag the columns directly from the DQ Repository tree view to the left column panel. In this example, you want to evaluate the records present in the city column and those present in the state_province column against each other to see whether state names match the listed city names and vice versa.


3. In the Column Selection dialog box, expand DB Connections and browse to the column(s) you want to define as determinant columns. You can filter the table or column lists by typing the desired text in the Table filter or Column filter fields respectively. The lists will show only the tables/columns that correspond to the text you type in. 4. Select the check box(es) next to the column(s) you want to analyze and click OK to proceed to the next step. The selected column(s) are displayed in the Left Columns panel of the Analyzed Columns Set view. In this example, we select the city column as the determinant column.


5. Do the same to select the dependent column(s), or drag it/them from the DQ Repository tree view to the Right Columns panel. In this example, we select the state_province column as the dependent column. This will show whether the state names match the listed city names. If you right-click any of the listed columns in the Analyzed Columns view and select Show in DQ Repository view, the selected column is automatically located under the corresponding connection in the tree view.
6. Click the Reverse columns tab to automatically reverse the defined columns and thus evaluate the reverse relation, that is, which city names match the listed state names. You can connect to a different database by selecting another connection from the Connection list in the Data Preview section. This list shows all the connections created in the Studio. If the columns listed in the Analyzed Columns Set view do not exist in the new database connection you want to set, you will receive a warning message that enables you to continue or cancel the operation.

Finalizing and executing the functional dependency analysis

Procedure 1. In the Analysis Parameter view and in the Number of connections per analysis field, set the number of concurrent connections allowed per analysis to the selected database, if required. You can set this number according to the database available resources, that is the number of concurrent connections each database can support. 2. Save the analysis and press F6 to execute it. An information pop-up opens to confirm that the operation is in progress and the analysis editor switches to the Analysis Results view.

This functional dependency analysis evaluated the records present in the city column and those present in the state_province column against each other to see whether the city names match the listed state names and vice versa. The returned results, in the %Match column, indicate the functional dependency strength for each determinant column. The records that do not match are indicated in red. The #Match column in the result table lists the number of distinct determinant values in each of the analyzed columns. The #row column in the analysis results lists the actual relations between the determinant attribute and the dependent attribute. In this example, #Match in the first row of the result table represents the number of distinct cities, and #row represents the number of distinct pairs (city, state_province). Since these two numbers are not equal, the functional dependency relationship here is only partial and the ratio of the numbers (%Match) measures the actual dependency strength. When these numbers are equal, you have a "strict" functional dependency relationship, that is to say, each city appears only once with each state.

Note: The presence of null values in either of the two analyzed columns will lessen the "dependency strength". The system does not ignore null values, but rather counts them as values that violate the functional dependency.
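To make the #Match and #row figures concrete, the two counts can be sketched with the following queries, assuming a customer table with the city and state_province columns used in this example; %Match is then the ratio of the first count to the second:

-- #Match: number of distinct determinant values
SELECT COUNT(*) FROM (SELECT DISTINCT city FROM customer) AS determinants;

-- #row: number of distinct (determinant, dependent) pairs
SELECT COUNT(*) FROM (SELECT DISTINCT city, state_province FROM customer) AS pairs;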

3. In the Analysis Results view, right-click any of the dependency lines and select:

Option To...

View valid/invalid rows access a list in the SQL editor of all valid/invalid rows measured according to the functional dependencies analysis

View valid/invalid values access a list in the SQL editor of all valid/invalid values measured according to the functional dependencies analysis

View detailed valid/detailed invalid values    access a detailed list in the SQL editor of all valid/invalid values measured according to the functional dependencies analysis

From the SQL editor, you can save the executed query and list it under the Libraries > Source Files folders in the DQ Repository tree view if you click the save icon on the editor toolbar. For more information, see Saving the queries executed on indicators on page 70.

Analyzing tables in delimited files

From the Profiling perspective, you can better explore the quality of data in tables in delimited files through either: • Creating a simple table analysis through analyzing all columns in the table using patterns. For more information, see Creating a column set analysis on a delimited file using patterns on page 127. • Comparing a set of columns and creating groups of similar records using blocking and matching keys and/or survivorship rules. For more information, see Analyzing duplicates on page 134.

Creating a column set analysis on a delimited file using patterns

This type of analysis provides simple statistics on the number of records falling in certain categories, including the number of rows, the number of null values, the number of distinct and unique values, the number of duplicates, or the number of blank fields. For more information about these indicators, see Simple statistics on page 229. It is also possible to add patterns to this type of analysis and have a single-bar result chart that shows the number of the rows that match "all" the patterns.

Defining the set of columns to be analyzed in a delimited file

You can analyze the content of a set of columns in a delimited file. This set can represent only some of the columns in the defined table or the table as a whole. You can then execute the created analysis using the Java engine.
Prerequisite(s): At least one connection to a delimited file is set in the Profiling perspective of the studio. For further information, see Connecting to a database on page 283.


Warning: When carrying out this type of analysis, the set of columns to be analyzed must not include a primary key column.

Defining the column set analysis

Procedure Set column analysis metadata (Purpose, Description and Author) in the corresponding fields and click Next to proceed to the next step.

Selecting the set of columns you want to analyze in the delimited file

Procedure 1. Expand the FileDelimited connection and browse to the set of columns you want to analyze. 2. Select the columns to be analyzed, and then click Finish to close this New analysis wizard. The analysis editor opens with the defined analysis metadata, and a folder for the newly created analysis is displayed under Analyses in the DQ Repository tree view.


A sample data is displayed in the Data Preview section and the selected columns are displayed in the Analyzed Column section of the analysis editor. 3. If required, select another connection from the Connection box in the Analyzed Columns view. This box lists all the connections created in the Studio with the corresponding database names. By default, the delimited file connection you have selected in the previous step is displayed in the Connection box. 4. If required, click the Select columns to analyze link to open a dialog box where you can modify your column selection.


Note: You can filter the table or column lists by typing the desired text in the Table filter or Column filter fields respectively. The lists will show only the tables/columns that correspond to the text you type in.

5. In the column list, select the check boxes of the column(s) you want to analyze and click OK to proceed to the next step. In this example, you want to analyze a set of six columns in the delimited file: account number (account_num), education (education), email (email), first name (fname), last name (lname) and gender (gender). You want to identify the number of rows, the number of distinct and unique values and the number of duplicates.

Adding patterns to the analyzed columns in the delimited file


You can add patterns to one or more of the analyzed columns to validate the full record (all columns) against all the patterns, and not to validate each column against a specific pattern as it is the case with the column analysis. The results chart is a single bar chart for the totality of the used patterns. This chart shows the number of the rows that match "all" the patterns.

Warning: Before being able to use a specific pattern with a set of columns analysis, you must manually set in the patterns settings the pattern definition for Java, if it does not already exist. Otherwise, a warning message will display prompting you to set the definition of the Java regular expression.
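As an illustration, a Java pattern definition for an email check could look like the following simplified regular expression; this is a sketch for a hypothetical pattern, not the studio's built-in Email Address definition:

'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'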

Before you begin An analysis of a set of columns is open in the analysis editor in the Profiling perspective of Talend Studio. For more information, see Defining the set of columns to be analyzed on page 93.

Procedure Select the check box(es) of the expression(s) you want to add to the selected column, then click OK. The added regular expression(s) display(s) under the analyzed column(s) in the Analyzed Columns view and the All Match indicator is displayed in the Indicators list in the Indicators view.

Finalizing and executing the column set analysis on a delimited file


What is left before executing this set of columns analysis is to define the indicator settings, data filter and analysis parameters.

Before you begin A column set analysis is defined in the Profiling perspective of Talend Studio. For further information, see Defining the set of columns to be analyzed in a delimited file on page 127 and Adding patterns to the analyzed columns in the delimited file on page 130.

Procedure 1. In the Analysis Parameters view, select the Allow drill down check box to store locally the data that will be analyzed by the current analysis.

2. In the Max number of rows kept per indicator field enter the number of the data rows you want to make accessible.

Note: The Allow drill down check box is selected by default, and the maximum analyzed data rows to be shown per indicator is set to 50.

3. Save the analysis and press F6 to execute it.

Results The editor switches to the Analysis Results view and displays the graphical result corresponding to the Simple Statistics indicators used to analyze the defined set of columns.


When you use patterns to match the content of the set of columns, another graphic is displayed to illustrate the match and non-match results against the totality of the used patterns.

Filtering analysis data against patterns The procedure to filter the data of the analysis of a delimited file is the same as that for the database analysis. For further information, see Filtering data against patterns on page 98.

Creating a column analysis from the analysis of a set of columns

You can create a column analysis on one or more columns defined in the set of columns analysis.


Before you begin A simple table analysis is defined in the analysis editor in the Profiling perspective of Talend Studio.

Procedure 1. In the Analyzed Columns view, right-click the column(s) you want to create a column analysis on.

2. Follow the steps outlined in Creating a basic column analysis on a file on page 74 to continue creating the column analysis on a delimited file.

Analyzing duplicates

You can use the match analysis in the Profiling perspective of Talend Studio to compare columns in databases or delimited files and create groups of similar records using the VSR or the T-Swoosh algorithm. This analysis provides you with a simple way to create match rules, test them on a set of columns and see the results directly in the editor. You can also use the Profiling perspective to define match rules in a match rule editor and save them in the Talend Studio repository. For further information, see Creating a match rule on page 150.

Creating a match analysis

The match analysis enables you to compare a set of columns in databases or in delimited files and create groups of similar records using blocking and matching keys and/or survivorship rules.

About this task
This analysis enables you to create match rules and test them on data to assess the number of duplicates. Currently, you can test match rules only on columns in the same table.
Prerequisite(s): At least one database or file connection is defined under the Metadata node. For further information, see Connecting to a database on page 283.


The sequence of setting up a match analysis involves the following steps:

Procedure
1. Creating the connection to a data source from inside the editor if no connection has been defined under the Metadata folder in the Studio tree view. For further information, see Configuring the match analysis on page 139.
2. Defining the table or the group of columns you want to search for similar records using match processes. For further information, see Defining a match analysis from the Analysis folder on page 135 or Defining a match analysis from the Metadata folder on page 138.
3. Defining blocking keys to reduce the number of pairs that need to be compared. For further information, see Defining a match rule on page 140.
4. Defining match keys, the match methods according to which similar records are grouped together. For further information, see Defining a match rule on page 140.
5. Exporting the match rules from the match analysis editor and centralizing them in the studio repository. For further information, see Importing or exporting match rules on page 147.

Defining a match analysis from the Analysis folder

Procedure 1. In the DQ Repository tree view, expand Data Profiling. 2. Right-click the Analysis folder and select New Analysis.

The Create New Analysis wizard opens.


3. Start typing match in the filter field, select Match Analysis and then click Next to open a wizard.

4. Set the analysis name and metadata and then click Next.


Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

5. Expand DB connections or FileDelimited connections depending on if the columns you want to match are in a database or a delimited file. 6. Browse to the columns you want to match, select them and then click Finish. The columns you select should be in the same table. Currently, the match analysis does not work on columns in different tables. The match analysis editor opens listing the selected columns.


You can also define a match analysis starting from the table or columns you want to match. For further information, see Defining a match analysis from the Metadata folder on page 138. 7. Modify the parameters in the match analysis editor according to your needs. For further information, see Configuring the match analysis on page 139. Defining a match analysis from the Metadata folder

Procedure 1. In the DQ Repository tree view, expand Metadata. 2. Do either of the following operations: • Browse the database or the file connection to the table you want to match, right-click it and select Match Analysis, or • Browse the database or the file connection to the columns you want to match, right-click them and select Analyze matches. The columns you select should be in the same table. Currently, the match analysis does not work on columns in different tables. The match analysis editor opens listing all columns in the table or the group of selected columns. 3. Set the analysis name and metadata and click Next to open the analysis editor.


Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

4. Modify the parameters in the match analysis editor according to your needs. For further information, see Configuring the match analysis on page 139. Configuring the match analysis

Procedure 1. In the Limit field in the match analysis editor, set the number for the data records you want to use as a data sample. Data is displayed in the Data Preview table.

2. If required, click any column name in the table to sort the sample data in an ascending or descending order. 3. In the match analysis editor, select:

Option To...

locate the selected table under the Metadata node in the tree view.


New Connection create a connection to a database or to a file from inside the match analysis editor where you can expand this new connection and select the columns on which to do the match. For further information about how to create a connection to data sources, see Connecting to a database on page 283 and Connecting to a file on page 293.

Select Data update the selection of the columns listed in the table. If you change the data set for an analysis, the charts that display the match results of the sample data will be cleared automatically. You must click Chart to compute the match results for the new data set you have defined.

Refresh Data refresh the view of the columns listed in the table.

n first rows or n random rows    lists in the table the N first data records from the selected columns or N random records from the selected columns.

Select Blocking Key    define the column(s) from the input flow according to which you want to partition the processed data in blocks. For more information, see Defining a match rule on page 140.

Select Matching Key define the match rules and the column(s) from the input flow on which you want to apply the match algorithm. For more information, see Defining a match rule on page 140.

Results
The Data Preview table has some additional columns which show the results of matching data. These columns are described in the following table:

Column Description

GID represents the group identifier.

GRP_SIZE counts the number of records in the group, computed only on the master record.

MASTER identifies, by true or false, if the record used in the matching comparisons is a master record. There is only one master record per group. Each input record will be compared to the master record, if they match, the input record will be in the group.

SCORE measures the distance between the input record and the master record according to the matching algorithm used.

GRP_QUALITY only the master record has a quality score which is the minimal value in the group.

ATTRIBUTE_SCORE lists the match score and the names of the columns used as key attributes in the applied rules.

Defining a match rule

You can define match rules from the match analysis editor by defining:
• blocking keys, the column(s) from the input flow according to which you want to partition the processed data in blocks,


• matching keys and survivorship rules, the match algorithms you want to apply on columns from the input flow.

Defining a blocking key

About this task Defining a blocking key is not mandatory but strongly advisable. Using a blocking key to partition data in blocks reduces the number of records that need to be examined as comparisons are restricted to record pairs within each block. Using blocking column(s) is very useful when you are processing a big data set.
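To picture the effect of blocking, the following query is a rough sketch of the idea only, assuming a customer table, a MySQL-style SUBSTRING function, and a key built from the first character of the country column; the studio computes its block keys internally, so this is not what it executes:

-- derive a simple block key; match comparisons are then restricted to rows sharing the same key
SELECT lname, country, UPPER(SUBSTRING(country, 1, 1)) AS block_key
FROM customer
ORDER BY block_key;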

Procedure 1. In the Data section, click the Select Blocking Key tab and then click the name of the column(s) you want to use to partition the processed data in blocks. Blocking keys that have the exact name of the selected columns are listed in the Blocking Key table.

You can define more than one column in the table, but only one blocking key will be generated and listed in the BLOCK_KEY column in the Data table. For example, if you use an algorithm on the country and lname columns to process records that have the same first character, data records that have the same first letter in the country and last names are grouped together in the same block. Comparison is restricted to records within each block. To remove a column from the Blocking key table, right-click it and select Delete or click on its name in the Data table.
2. Select an algorithm for the blocking key, and set the other parameters in the Blocking Key table as needed.


In this example, only one blocking key is used. The first character of each word in the country column is retrieved and listed in the BLOCK_KEY column.
3. Click Chart to compute the generated key, group the sample records in the Data table and display the results in a chart. This chart allows you to visualize the statistics regarding the number of blocks and to adapt the blocking parameters according to the results you want to get.

Defining a matching key with the VSR algorithm

Procedure 1. In the Record linkage algorithm section, select Simple VSR Matcher if it is not selected by default. 2. In the Data section, click the Select Matching Key tab and then click the name of the column(s) on which you want to apply the match algorithm. Matching keys that have the exact names of the selected input columns are listed in the Matching Key table.

To remove a column from this table, right-click it and select Delete or click on its name in the Data table.
3. Select the match algorithms you want to use from the Matching Function column and the null operator from the Handle Null column. In this example, two match keys are defined: you want to use the Levenshtein and Jaro-Winkler match methods on first names and last names respectively and get the duplicate records. If you want to use an external user-defined matching algorithm, select Custom and use the Custom Matcher column to load the Jar file of the user-defined algorithm.
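As a rough illustration of how such a method scores a pair of values (the exact normalization used by the studio is not detailed here): the Levenshtein distance between "Jon" and "John" is 1 (one insertion); with a common length-based normalization this gives a similarity of 1 - 1/4 = 0.75, which would count as a match under a threshold of 0.7 but not under 0.8.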


Defining a matching key with the T-Swoosh algorithm

Procedure
First make sure to select the column(s) on which to apply the match algorithm, either from the Data section by using the Select Matching Key tab, or directly from the Matching Key table.

Creating a match key

Procedure
1. In the Record linkage algorithm section, select T-Swoosh.
2. In the Match and Survivor section, you define the criteria to use when matching data records. Click the [+] button to add a new rule, and then set the following criteria.
• Match Key Name: Enter the name of your choice for the match key.
• Matching Function: Select the type of matching you want to perform from the drop-down list. Select Custom if you want to use an external user-defined matching algorithm.
• Custom Matcher: This item is only used with the Custom matching function. Browse and select the Jar file of the user-defined algorithm.
• Threshold: Specify the match score (between 0 and 1) above which two values should be considered a match.
• Confidence Weight: Set a numerical weight (between 1 and 10) to the column you want to use as a match key. This value is used to give greater or lesser importance to certain columns when performing the match.
• Handle Null: Specify how to deal with data records which contain null values.
  • nullMatchNull: If both records contain null values, consider this a match.
  • nullMatchNone: If one record contains a null, do not consider this a match.
  • nullMatchAll: If one record contains a null, consider this a match.
• Survivorship Function: Select how two similar records will be merged from the drop-down list.
  • Concatenate: It adds the content of the first record and the content of the second record together - for example, Bill and William will be merged into BillWilliam. In the Parameter field, you can specify a separator to be used to separate values.
  • Prefer True (for booleans): It always sets booleans to True in the merged record, unless all booleans in the source records are False.
  • Prefer False (for booleans): It always sets booleans to False in the merged record, unless all booleans in the source records are True.
  • Most common: It validates the most frequently-occurring field value in each duplicates group.
  • Most recent or Most ancient: The former validates the most recent date value and the latter the most ancient date value in each duplicates group. The relevant reference column must be of the Date type.
  • Longest or Shortest: The former validates the longest field value and the latter the shortest field value in each duplicates group.
  • Largest or Smallest: The former validates the largest numerical value and the latter the smallest numerical value in a duplicates group.

Warning: Make sure you select Largest or Smallest as the survivorship function when the match key is of numeric type.


• Most trusted source: It takes the data coming from the source which has been defined as being most trustworthy. The most trusted data source is set in the Parameter field. • Parameter: For the Most trusted source survivorship function, this item is used to set the name of the data source you want to use as a base for the master record. For the Concatenate survivorship function, this item is used to specify a separator you want to use for concatenating data. 3. In the Match threshold field, enter the match probability threshold. Two data records match when the probability is above this value. In the Confident match threshold field, set a numerical value between the current Match threshold and 1. 4. In the Survivorship Rules For Columns section, define how data records survive for certain columns. Click the [+] button to add a new rule, and then set the following criteria: • Input Column: Enter the column to which you want to apply the survivorship rule. • Survivorship Function: Select how two similar records will be merged from the drop-down list. • Parameter: For the Most trusted source survivorship function, this item is used to set the name of the data source you want to use as a base for the master record. For the Concatenate survivorship function, this item is used to specify a separator to use for concatenating data. If you specify the survivorship function for a match key in the Match And Survivor section and also specify the survivorship function for the match key as an input column in the Survivorship Rules For Columns section, the survivorship function selected in the Match And Survivor section is applied to the column. 5. In the Default Survivorship Rules section, you define how to survive matches for certain data types: Boolean, Date, Number and String. a) Click the [+] button to add a new row for each data type. b) In the Data Type column, select the relevant data type from the drop-down list. c) In the Survivorship Function column, select how two similar records will be merged from the drop-down list. Note that, depending on the data type, only certain choices may be relevant.

Warning: Make sure you select Largest or Smallest as the survivorship function when the match key is of numeric type.

d) Parameter: For the Most trusted source survivorship function, this item is used to set the name of the data source you want to use as a base for the master record. For the Concatenate survivorship function, this item is used to specify a separator you want to use for concatenating data.
If you specify the survivorship function for a column in the Survivorship Rules For Columns section and also specify the survivorship function for the data type of the column in the Default Survivorship Rules section, the survivorship function selected in the Survivorship Rules For Columns section is applied to the column. If you do not specify the behavior for any or all data types, the default behavior (the Most common survivorship function) will be applied, that is, the most frequently-occurring field value in each duplicates group will be validated.
6. Save your changes.


Editing rules and displaying sample results

Procedure
1. To define a second match rule, put your cursor on the top right corner of the Matching Key table and click the [+] button to create a new rule. Follow the steps outlined in Defining a match rule on page 140 to define matching keys. When you define multiple conditions in the match rule editor, an OR match operation is conducted on the analyzed data. Records are evaluated against the first rule and the records that match are not evaluated against the second rule and so on.
2. Click the button at the top right corner of the Matching Key or Match and Survivor section and replace the default name of the rule with a name of your choice.

If you define more than one rule in the match analysis, you can use the up and down arrows in the dialog box to change the rule order and thus decide what rule to execute first. 3. Click OK. The rules are named and ordered accordingly in the section. 4. In the Match threshold field, enter the match probability threshold. Two data records match when the probability is above this value. In the Confident match threshold field, set a numerical value between the current Match threshold and 1. If the GRP-QUALITY calculated by the match analysis is equal to or greater than the Confident match threshold, you can be confident about the quality of the group. 5. Click Chart to compute the groups according to the blocking key and match rule you defined in the editor and display the results of the sample data in a chart.


This chart shows a global picture about the duplicates in the analyzed data. The Hide groups less than parameter is set to 2 by default. This parameter enables you to decide what groups to show in the chart; you usually want to hide groups of small group size. The chart in the above image indicates that out of the 1000 sample records you examined and after excluding items that are unique, by having the Hide groups less than parameter set to 2:
• 49 groups have 2 items each. In each group, the 2 items are duplicates of each other.
• 7 groups have 3 duplicate items and the last group has 4 duplicate items.
Also, the Data table indicates the match details of items in each group and colors the groups in accordance with their colors in the match chart.

How to show the match results

About this task To collect duplicates from the input flow according to the match types you define, Levenshtein and Jaro-Winkler in this example, do the following:

Procedure Save the settings in the match analysis editor and press F6. The analysis is executed. The match rule and blocking key are computed against the whole data set and the Analysis Results view is open in the editor.


In this view, the charts give a global picture about the duplicates in the analyzed data. In the first tables, you can read statistics about the count of processed records, distinct records with only one occurrence, duplicate records (matched records) and suspect records that did not match the rule. Duplicate records represent the records that matched with a good score - above the confidence threshold. One record of the matched pair is a duplicate that should be discarded and the other is the survivor record. In the second table, you can read statistics about the number of groups and the number of records in each group. You can click any column header in the table to sort the results accordingly.

Importing or exporting match rules

You can import match rules stored in the studio repository into the match editor and test them on your data. You can also export match rules from the editor and store them in the studio repository.

Importing match rules from the repository

Procedure

1. In the match editor, click the icon on top of the editor.


2. In the Match Rule Selector wizard, select the match rule you want to import into the match analysis editor and use on the analyzed data. A warning message displays in the wizard if the match rule you want to import is defined on columns that do not exist in the analyzed data. Ignore the message as you can define input columns later in the match analysis editor. 3. Select the Overwrite current Match Rule in the analysis check box if you want to replace the rule in the editor with the rule you import, otherwise, leave the box unselected. 4. Click OK. The match rule is imported and the matching and blocking keys and /or survivorship rules are listed in the Matching Key and Blocking Key tables respectively.


5. Click in the Input column and select from the list the column on which you want to apply the imported blocking and matching keys. If the analyzed data has a column that matches the input column in the imported keys, it will be automatically defined in the Input column; you do not need to define it yourself. When you analyze data with multiple conditions, the match results will list data records that meet any of the defined rules. When you execute the match analysis, an OR match operation is conducted on the data: data records are evaluated against the first rule and the records that match are not evaluated against the other rules.

Exporting match rules to the repository

Procedure

1. In the match editor, click the icon on top of the editor.


2. In the open wizard, enter a name for the rule and set other metadata, if needed. 3. Click Finish. The rule editor opens on the rule settings and the rule is saved and listed under Libraries > Rules > Match in the DQ Repository tree view.

Creating a match rule

In data quality, match rules are used to compare a set of columns and create groups of similar records using blocking and matching keys and/or survivorship functions. From the studio, you can create match rules with the VSR or the T-Swoosh algorithm and save them in the studio repository. Once centralized in the repository, you can import them in the match analysis editor and test them on your data to group duplicate records. For further information about the match analysis, see Creating a match analysis on page 134. The two algorithms produce different match results for two reasons:
• first, the master record is simply selected to be the first input record with the VSR algorithm. Therefore, the list of match groups may depend on the order of the input records,
• second, the output records do not change with the VSR algorithm, whereas the T-Swoosh algorithm creates new records.

Defining the rule

About this task

Procedure 1. In the DQ Repository tree view, expand Libraries > Rules. 2. Right-click Match and select New Match Rule. 3. In the New Match Rule wizard, enter a name and set other metadata, if required.


Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

Consider as an example that you want to create a rule to match customer full names.
4. Click Finish. A match rule editor opens in the studio and the new match rule is listed under Libraries > Rules > Match in the DQ Repository tree view.

In the Record Linkage algorithm view, the Simple VSR Matcher algorithm is selected by default.
5. Start defining the match rule items as described in Rules with the VSR algorithm on page 152 and Rules with the T-Swoosh algorithm on page 155.

Duplicating a rule

About this task To avoid creating a match rule from scratch, you can duplicate an existing one and work around its metadata and definition to have a new rule. To duplicate a rule, do the following:


Procedure
1. In the DQ Repository tree view, expand Libraries > Rules > Match.
2. Browse through the match rule list to reach the rule you want to duplicate.
3. Right-click its name and select Duplicate.
The duplicated rule is created under the Match folder in the DQ Repository tree view.
4. Double-click the duplicated rule to open it and modify its metadata and/or definition as needed.

Rules with the VSR algorithm

The VSR algorithm takes a set of records as input and groups encountered duplicates together according to the defined match rules. It compares pairs of records and assigns them to groups. The first processed record of each group is the master record of the group, so the order of the records is important and can have an impact on how master records are created. The VSR algorithm compares each record with the master of each group and uses the distances computed from the master records to decide to which group the record should go.
In the match analysis, the matching results of the VSR algorithm may vary depending on the order of the input records. If possible, put the records in which you have more confidence first in the input flow to improve the accuracy of the algorithm.
You can import and test the rule on your data in the match analysis editor. For further information, see Importing or exporting match rules on page 147.

Defining a blocking key from the match analysis

About this task
Defining a blocking key is not mandatory but advisable. A blocking key partitions the data into blocks and thus reduces the number of records to be examined, as comparisons are restricted to record pairs within each block. Using blocking keys is very useful when you are processing large data sets.

Procedure
1. In the rule editor, in the Generation of Blocking Key section, click the [+] button to add a row to the table.
2. Set the parameters of the blocking key as follows:
• Blocking Key Name: Enter a name for the column you want to use to reduce the number of record pairs that need to be compared.
• Pre-algorithm: Select an algorithm from the drop-down list and set its value where necessary. Defining a pre-algorithm is not mandatory. This algorithm is used to clean or standardize data before processing it with the match algorithm and thus improve the results of data matching.
• Algorithm: Select from the drop-down list the match algorithm you want to use and set its value where necessary.
• Post-algorithm: Select an algorithm from the drop-down list and set its value where necessary. Defining a post-algorithm is not mandatory. This algorithm is used to clean or standardize data after processing it with the match algorithm and thus improve the outcome of data matching.
3. If required, follow the same steps to add as many blocking keys as needed.
When you import a rule with several blocking keys into the match analysis editor, only one blocking key is generated and listed in the BLOCK_KEY column in the Data table.
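Blocking can be pictured as grouping records by a computed key before any pairwise comparison takes place, so that only records sharing the same key are compared. The following minimal Java sketch is an illustration only; the column and the key function (first three letters of the last name) are hypothetical choices, not taken from the Studio:

```java
import java.util.*;

public class BlockingKeyDemo {

    // Hypothetical blocking key: first three characters of the last name, upper-cased.
    static String blockingKey(String lastName) {
        String cleaned = lastName == null ? "" : lastName.trim().toUpperCase();
        return cleaned.length() < 3 ? cleaned : cleaned.substring(0, 3);
    }

    public static void main(String[] args) {
        List<String> lastNames = Arrays.asList("Smith", "Smyth", "Smithe", "Jones", "Jonas", "Brown");

        // Partition the records into blocks keyed by the blocking key.
        Map<String, List<String>> blocks = new HashMap<>();
        for (String name : lastNames) {
            blocks.computeIfAbsent(blockingKey(name), k -> new ArrayList<>()).add(name);
        }

        // Only record pairs inside the same block remain candidates for matching.
        long pairsWithoutBlocking = (long) lastNames.size() * (lastNames.size() - 1) / 2;
        long pairsWithBlocking = blocks.values().stream()
                .mapToLong(b -> (long) b.size() * (b.size() - 1) / 2)
                .sum();

        System.out.println("Blocks: " + blocks);
        System.out.println("Pairs without blocking: " + pairsWithoutBlocking);
        System.out.println("Pairs with blocking:    " + pairsWithBlocking);
    }
}
```

On this sample, blocking reduces the candidate pairs from 15 to 2, which is why it matters on large data sets.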


Defining a matching key

Procedure
1. In the rule editor, in the Matching Key table, click the [+] button to add a row to the table.
2. Set the parameters of the matching key as follows:
• Match Key Name: Enter the name of your choice for the match key.
• Matching Function: Select the type of matching you want to perform from the drop-down list. Select Custom if you want to use an external user-defined matching algorithm. In this example, two match keys are defined: you want to use the Levenshtein and Jaro-Winkler match methods on first names and last names respectively and get the duplicate records.
• Custom Matcher: This item is only used with the Custom matching function. Browse and select the Jar file of the user-defined algorithm.
• Confidence Weight: Set a numerical weight (between 1 and 10) for the column you want to use as a match key. This value is used to give greater or lesser importance to certain columns when performing the match.
• Handle Null: Specify how to deal with data records which contain null values.
3. In the Match threshold field, enter the match probability threshold.
Two data records match when the probability is above this value.
In the Confident match threshold field, set a numerical value between the current Match threshold and 1. Above this threshold, you can be confident about the quality of the group.
4. To define a second match rule, place your cursor on the top right corner of the Matching Key table and then click the [+] button. Follow the same steps to create the match rule.


When you define multiple conditions in the match rule editor, an OR match operation is conducted on the analyzed data: records are evaluated against the first rule, and the records that match are not evaluated against the second rule.
5. If required, put your cursor on the top right corner of the table, click the button and then replace the default names of the rules with names of your choice.

You can also use the up and down arrows in the dialog box to change the rule order and thus decide what rule to execute first.


6. Click OK.
The rules are named and ordered accordingly in the Matching Key table.
7. Save the match rule settings.
The match rule is saved and centralized under Libraries > Rules > Match in the DQ Repository tree view.

Rules with the T-Swoosh algorithm

You can use the T-Swoosh algorithm to find duplicates and to define how two similar records are merged, using a survivorship function, to create a master record. These new merged records are then used to find new duplicates. The differences between the T-Swoosh and the VSR algorithms are the following:
• When using the T-Swoosh algorithm, the master record is in general a new record that does not exist in the list of input records.
• When using the T-Swoosh algorithm, you can define a survivorship function for each column to create a master record.

Creating a match key

Procedure
1. In the Record linkage algorithm section, select T-Swoosh.
2. In the Match and Survivor section, define the criteria to use when matching data records. Click the [+] button to add a new rule, and then set the following criteria (a simplified sketch of how matching and survivorship combine is given at the end of this section):
• Match Key Name: Enter the name of your choice for the match key.
• Matching Function: Select the type of matching you want to perform from the drop-down list. Select Custom if you want to use an external user-defined matching algorithm.
• Custom Matcher: This item is only used with the Custom matching function. Browse and select the Jar file of the user-defined algorithm.
• Threshold: Specify the match score (between 0 and 1) above which two values should be considered a match.
• Confidence Weight: Set a numerical weight (between 1 and 10) for the column you want to use as a match key. This value is used to give greater or lesser importance to certain columns when performing the match.
• Handle Null: Specify how to deal with data records which contain null values:
• nullMatchNull: If both records contain null values, consider this a match.
• nullMatchNone: If one record contains a null, do not consider this a match.
• nullMatchAll: If one record contains a null, consider this a match.
• Survivorship Function: Select how two similar records will be merged from the drop-down list:
• Concatenate: Adds the content of the first record and the content of the second record together. For example, Bill and William are merged into BillWilliam. In the Parameter field, you can specify a separator to be used to separate the values.
• Prefer True (for booleans): Always sets booleans to True in the merged record, unless all booleans in the source records are False.
• Prefer False (for booleans): Always sets booleans to False in the merged record, unless all booleans in the source records are True.
• Most common: Validates the most frequently occurring field value in each duplicates group.


• Most recent or Most ancient: The former validates the latest date value and the latter the earliest date value in each duplicates group. The relevant reference column must be of the Date type.
• Longest or Shortest: The former validates the longest field value and the latter the shortest field value in each duplicates group.
• Largest or Smallest: The former validates the largest numerical value and the latter the smallest numerical value in each duplicates group.

Warning: Make sure you select Largest or Smallest as the survivorship function when the match key is of numeric type.

• Most trusted source: Takes the data coming from the source which has been defined as the most trustworthy. The most trusted data source is set in the Parameter field.
• Parameter: For the Most trusted source survivorship function, this item is used to set the name of the data source you want to use as a base for the master record. For the Concatenate survivorship function, this item is used to specify a separator you want to use for concatenating data.
3. In the Match threshold field, enter the match probability threshold.
Two data records match when the probability is above this value.
In the Confident match threshold field, set a numerical value between the current Match threshold and 1.
4. In the Survivorship Rules For Columns section, define how data records survive for certain columns. Click the [+] button to add a new rule, and then set the following criteria:
• Input Column: Enter the column to which you want to apply the survivorship rule.
• Survivorship Function: Select how two similar records will be merged from the drop-down list.
• Parameter: For the Most trusted source survivorship function, this item is used to set the name of the data source you want to use as a base for the master record. For the Concatenate survivorship function, this item is used to specify a separator to use for concatenating data.
If you specify the survivorship function for a match key in the Match And Survivor section and also specify the survivorship function for the same match key as an input column in the Survivorship Rules For Columns section, the survivorship function selected in the Match And Survivor section is applied to the column.
5. In the Default Survivorship Rules section, define the survivorship behavior for certain data types: Boolean, Date, Number and String.
a) Click the [+] button to add a new row for each data type.
b) In the Data Type column, select the relevant data type from the drop-down list.
c) In the Survivorship Function column, select how two similar records will be merged from the drop-down list. Note that, depending on the data type, only certain choices may be relevant.

Warning: Make sure you select Largest or Smallest as the survivorship function when the match key is of numeric type.

d) Parameter: For the Most trusted source survivorship function, this item is used to set the name of the data source you want to use as a base for the master record. For the Concatenate survivorship function, this item is used to specify a separator you want to use for concatenating data.


If you specify the survivorship function for a column in the Survivorship Rules For Columns section and also specify the survivorship function for the data type of that column in the Default Survivorship Rules section, the survivorship function selected in the Survivorship Rules For Columns section is applied to the column.
If you do not specify the behavior for any or all data types, the default behavior (the Most common survivorship function) is applied, that is, the most frequently occurring field value in each duplicates group is validated.
6. Save your changes.
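To make the interplay between match keys, confidence weights, the match threshold and survivorship more concrete, here is a minimal, self-contained Java sketch. It is an illustration only: the similarity measure, the weights and the survivorship choices are simplified assumptions, not the Studio's internal implementation.

```java
import java.util.*;

public class MatchAndSurviveDemo {

    // Normalized Levenshtein similarity in [0, 1]; a stand-in for functions
    // such as Levenshtein or Jaro-Winkler selected as the Matching Function.
    static double similarity(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) d[a.length()][b.length()] / max;
    }

    public static void main(String[] args) {
        // Two records with hypothetical match keys: first name and last name.
        String[] recordA = {"Bill", "Smith"};
        String[] recordB = {"William", "Smyth"};
        double[] confidenceWeights = {1.0, 2.0};   // the last name weighs more
        double matchThreshold = 0.6;               // records match above this score

        // Weighted average of the per-key similarities.
        double weightedSum = 0, totalWeight = 0;
        for (int i = 0; i < recordA.length; i++) {
            weightedSum += confidenceWeights[i] * similarity(recordA[i], recordB[i]);
            totalWeight += confidenceWeights[i];
        }
        double score = weightedSum / totalWeight;
        boolean isMatch = score > matchThreshold;
        System.out.printf("score=%.3f match=%b%n", score, isMatch);

        if (isMatch) {
            // Survivorship: build a master record, e.g. Concatenate on the first
            // name and Longest on the last name (simplified example choices).
            String survivedFirstName = recordA[0] + recordB[0];
            String survivedLastName = recordA[1].length() >= recordB[1].length()
                    ? recordA[1] : recordB[1];
            System.out.println("master record: [" + survivedFirstName + ", " + survivedLastName + "]");
        }
    }
}
```

With these sample records, the weighted score is above the threshold, so a new master record is produced, for example [BillWilliam, Smith], which mirrors how the T-Swoosh algorithm creates records that did not exist in the input.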

Redundancy analyses

What are redundancy analyses?

Redundancy analyses are column comparison analyses that help you better explore the relationships between tables by:
• comparing identical columns in different tables,
• matching foreign keys in one table to primary keys in another table, and vice versa.
Redundancy analyses support only database tables.

Comparing identical columns in different tables

From Talend Studio, you can create an analysis that compares two identical sets of columns in two different tables. This redundancy analysis supports only database tables.
Prerequisite(s): At least one database connection is set in the Profiling perspective of the studio. For further information, see Connecting to a database on page 283.
From the Analysis Results view, you can also access the actual analyzed data via the Data Explorer. To access the analyzed data rows, right-click any of the lines in the result table and select:

Option To...

View match rows access a list of all rows that could be matched in the two identical column sets

View not match rows access a list of all rows that could not be matched in the two identical column sets

View rows access a list of all rows in the two identical column sets

Warning: The data explorer does not support connections which have an empty user name, such as Single sign-on of MS SQL Server. If you analyze data using such a connection and you try to view data rows in the Data Explorer perspective, a warning message prompts you to set your connection credentials to the SQL Server.

The figure below illustrates the data explorer list of all rows that could be matched in the two sets, eight in this example.


From the SQL editor, you can save the executed query and list it under the Libraries > Source Files folders in the DQ Repository tree view if you click the save icon on the editor toolbar. For more information, see Saving the queries executed on indicators on page 70. The figure below illustrates the data explorer list of all rows that could not be matched in the two sets, three in this example.

Defining the analysis

Procedure
1. In the DQ Repository tree view, expand Data Profiling.
2. Right-click the Analyses folder and select New Analysis.


The Create New Analysis wizard opens.

3. In the filter field, start typing redundancy analysis, select Redundancy Analysis from the list and click Next.


4. In the Name field, enter a name for the current analysis.

Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

5. Set the analysis metadata (purpose, description and author name) in the corresponding fields and then click Next.


Selecting the identical columns you want to compare

Procedure
1. Expand DB connections and, in the desired database, browse to the columns you want to analyze, select them and then click Finish to close the wizard.
A file for the newly created analysis is listed under the Analysis folder in the DQ Repository tree view. The analysis editor opens with the defined analysis metadata.

The display of the analysis editor depends on the parameters you set in the Preferences window. For more information, see Setting preferences of analysis editors and analysis results on page 18.
2. Click Analyzed Column Sets to open the view where you can set the columns or modify your selection.
In this example, you want to compare identical columns in the account and account_back tables.


3. From the Connection list, select the database connection relevant to the database to which you want to connect.
You can find in this list all the database connections you create and centralize in the Studio repository.
4. Click A Column Set to open the Column Selection dialog box.


5. Browse the catalogs/schemas in your database connection to reach the table holding the columns you want to analyze.
You can filter the table or column lists by typing the desired text in the Table filter or Column filter fields respectively. The lists will show only the tables/columns that correspond to the text you type in.
6. Click the table name to list all its columns in the right-hand panel of the Column Selection dialog box.
7. In the list to the right, select the check boxes of the column(s) you want to analyze and click OK to proceed to the next step.
You can drag the columns to be analyzed directly from the DQ Repository tree view to the editor.
If you right-click any of the listed columns in the Analyzed Columns view and select Show in DQ Repository view, the selected column will be automatically located under the corresponding connection in the tree view.
8. Click B Column Set and follow the same steps to select the second set of columns, or drag it to the right column panel.
9. Select the Compute only number of A rows not in B check box if you want to match the data from the A set against the data from the B set, and not vice versa.

Finalizing and executing the analysis

Procedure
1. In the Data Filter view, enter an SQL WHERE clause to filter the data on which to run the analysis, if required.
2. In the Analysis Parameter view, in the Number of connections per analysis field, set the number of concurrent connections allowed per analysis to the selected database, if required.
You can set this number according to the database available resources, that is, the number of concurrent connections each database can support.
3. If you have defined context variables in the Contexts view in the analysis editor, complete the following steps:
a) Use the Data Filter and Analysis Parameter views to set/select context variables to filter data and to decide the number of concurrent connections per analysis respectively.
b) In the Context Group Settings view, select from the list the context environment you want to use to run the analysis.
For further information about contexts and variables, see Using context variables in analyses on page 263.
4. Save the analysis and press F6 to execute it.
A confirmation message is displayed.
5. Click OK if you want to continue the operation.

Results
The Analysis Results view opens showing the analysis results. In this example, 72.73% of the data present in the columns of the account table could be matched with the same data in the columns of the account_back table.
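Conceptually, the match percentage reported here is the share of rows in set A whose values are also found in set B. A minimal Java sketch of that computation (the row values below are invented for illustration) could look like this:

```java
import java.util.*;

public class RedundancyCheckDemo {
    public static void main(String[] args) {
        // Hypothetical values of the analyzed column set in table A and table B.
        List<String> accountRows = Arrays.asList(
                "A001", "A002", "A003", "A004", "A005",
                "A006", "A007", "A008", "A009", "A010", "A011");
        Set<String> accountBackRows = new HashSet<>(Arrays.asList(
                "A001", "A002", "A003", "A004", "A005", "A006", "A007", "A008"));

        long matched = accountRows.stream().filter(accountBackRows::contains).count();
        double matchPercentage = 100.0 * matched / accountRows.size();

        // 8 of 11 rows are found in the other table: 72.73 %
        System.out.printf("%d of %d rows matched (%.2f%%)%n",
                matched, accountRows.size(), matchPercentage);
    }
}
```

With 8 matched rows and 3 unmatched rows out of 11, the sketch reproduces the 72.73% figure of this example.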


Matching primary and foreign keys

You can create an analysis that matches foreign keys in one table to primary keys in the other table and vice versa. This redundancy analysis supports only database tables.

Prerequisites
A database connection is created in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283.

To match primary and foreign keys in tables, follow the procedures below.

Access the actual analyzed data

From the Analysis Results view, you can also access the actual analyzed data via the data explorer. To access the analyzed data rows, right-click any of the lines in the result table and select:

Option To...

View match rows access a list of all rows that could be matched in the two identical column sets

View not match rows access a list of all rows that could not be matched in the two identical column sets

View rows access a list of all rows in the two identical column sets

Warning: The data explorer does not support connections which have an empty user name, such as Single sign-on of MS SQL Server. If you analyze data using such a connection and you try to view data rows in the Data Explorer perspective, a warning message prompts you to set your connection credentials to the SQL Server.

The figure below illustrates in the data explorer the list of all analyzed rows in the two columns.


From the SQL editor, you can save the executed query and list it under the Libraries > Source Files folders in the DQ Repository tree view if you click the save icon on the editor toolbar. For more information, see Saving the queries executed on indicators on page 70.

Defining the analysis to match primary and foreign keys in tables

Procedure
1. In the DQ Repository tree view, expand the Data Profiling folder.
2. Right-click the Analyses folder and select New Analysis.

The Create New Analysis wizard opens.


3. In the filter field, start typing redundancy analysis, then select Redundancy Analysis and click Next.

4. In the Name field, enter a name for the current analysis.


Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

5. Set the analysis metadata (purpose, description and author name) in the corresponding fields and then click Finish. A file for the newly created analysis is displayed under the Analysis folder in the DQ Repository tree view. The analysis editor opens with the defined analysis metadata.

Selecting the primary and foreign keys

Procedure
1. Click Analyzed Column Sets to display the corresponding view.
In this example, you want to match the foreign keys in the customer_id column of the sales_fact_1998 table with the primary keys in the customer_id column of the customer table, and vice versa. This explores the relationship between the two tables to show, for example, whether every customer placed an order in the year 1998.


2. From the Connection list, select the database connection relevant to the database to which you want to connect.
This list contains all the connections you create and centralize in the Talend Studio repository.
3. Click A Column Set to open the Column Selection dialog box.
If you want to check the validity of the foreign keys, select the column holding the foreign keys for the A set and the column holding the primary keys for the B set.

4. Browse the catalogs/schemas in your database connection to reach the table holding the column you want to match.
In this example, the column to be analyzed is customer_id, which holds the foreign keys.


You can filter the table or column lists by typing the desired text in the Table filter or Column filter fields respectively. The lists will show only the tables/columns that correspond to the text you type in.
5. Click the table name to display all its columns in the right-hand panel of the Column Selection dialog box.
6. In the list to the right, select the check box of the column holding the foreign keys and then click OK to proceed to the next step.
You can drag the columns to be analyzed directly from the DQ Repository tree view to the editor.
If you right-click any of the listed columns in the Analyzed Columns view and select Show in DQ Repository view, the selected column will be automatically located under the corresponding connection in the tree view.
7. Click B Column Set and follow the same steps to select the column holding the primary keys, or drag it from the DQ Repository tree view to the right column panel.
If you select the Compute only number of rows not in B check box, you will look for any missing primary keys in the column in the B set.
8. Click Data Filter in the analysis editor to display the view where you can set a filter on each of the analyzed columns.
9. Press F6 to execute this key-matching analysis.
A confirmation message is displayed.
10. Click OK in the message if you want to continue the operation.
The execution of this type of analysis may take some time. Wait until the Analysis Results view opens automatically showing the analysis results.

Results

In this example, every foreign key in the sales_fact_1998 table is identified with a primary key in the customer table. However, 98.22% of the primary keys in the customer table could not be identified with foreign keys in the sales_fact_1998 table. These primary keys are for the customers who did not order anything in 1998.


Correlation analyses

What are column correlation analyses?

Talend Studio lets you explore relationships and correlations between two or more columns. These relationships and correlations give a new interpretation of the data by describing how data values are correlated at different positions.

Note: Column correlation analyses are possible only on database columns for the time being. You cannot use these analyses on file connections.

It is very important to make the distinction between column correlation analyses and all other types of data quality analyses. Column correlation analyses are usually used to explore relationships and correlations in data and not to provide statistics about the quality of data. Several types of column correlation analysis are possible. For more information, see Creating a numerical correlation analysis on page 171, Creating a time correlation analysis on page 179 and Creating a nominal correlation analysis on page 186. For more information about the use of data mining types in Talend Studio, see Data mining types on page 90.

Numerical correlation analyses

This type of analysis analyzes the correlation between nominal and interval columns and presents the result in a bubble chart. A bubble chart is created for each selected numeric column. In a bubble chart, each bubble represents a distinct value of the nominal column. For example, a nominal column called outlook with 3 distinct nominal instances: sunny (11 records), rainy (16 records) and overcast (4 records) will generate a bubble chart with 3 bubbles.

The second column in this example is the temperature column, where temperature is in degrees Celsius. The analysis in this example will show the correlation between the outlook and the temperature columns and will give the result in a bubble chart. The vertical axis represents the average of the numeric column and the horizontal axis represents the number of records of each nominal instance. The average temperature would be 23.273 for the "sunny" instances, 7.5 for the "rainy" instances and 18.5 for the "overcast" instances. The two things to pay attention to in such a chart are the position of the bubble and its size.


Usually, outlier bubbles must be further investigated. The closer the bubble is to the left axis, the less confident we are in the average of the numeric column. For example, the overcast nominal instance here has only 4 records, hence the bubble is near the left axis. We cannot be confident in the average with only 4 records. When looking for data quality issues, these bubbles could indicate problematic values. The bubbles near the top of the chart and those near the bottom of the chart may suggest data quality issues too. An average temperature that is too high or too low could indicate a bad temperature measurement. The size of the bubble represents the number of null numeric values. The more null values there are in the interval column, the bigger the bubble. When several nominal columns are selected, the order of the columns plays a crucial role in this analysis. A series of bubbles (with one color) is displayed for the average temperature and the weather. Another series of bubbles is displayed for the average temperature and each value of any other nominal column.
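The quantities plotted in the bubble chart, one bubble per nominal value, positioned by record count and average and sized by the number of null numeric values, can be reproduced with a simple aggregation. The following Java sketch uses invented sample rows in the spirit of the weather example above; it is an illustration of the computation, not the Studio's implementation:

```java
import java.util.*;

public class BubbleChartAggregationDemo {

    record Row(String outlook, Double temperature) {}   // temperature may be null

    public static void main(String[] args) {
        // Invented sample rows; the nominal column is "outlook", the numeric one "temperature".
        List<Row> rows = Arrays.asList(
                new Row("sunny", 25.0), new Row("sunny", 21.5), new Row("sunny", null),
                new Row("rainy", 8.0),  new Row("rainy", 7.0),
                new Row("overcast", 18.5));

        // For each nominal value: record count (horizontal axis), average of the
        // numeric column (vertical axis) and number of nulls (bubble size).
        Map<String, long[]> counts = new LinkedHashMap<>();   // {records, nulls}
        Map<String, double[]> sums = new LinkedHashMap<>();   // {sum, non-null count}
        for (Row r : rows) {
            counts.computeIfAbsent(r.outlook(), k -> new long[2])[0]++;
            if (r.temperature() == null) {
                counts.get(r.outlook())[1]++;
            } else {
                double[] s = sums.computeIfAbsent(r.outlook(), k -> new double[2]);
                s[0] += r.temperature();
                s[1]++;
            }
        }
        for (String outlook : counts.keySet()) {
            double[] s = sums.getOrDefault(outlook, new double[]{0, 0});
            double avg = s[1] == 0 ? Double.NaN : s[0] / s[1];
            System.out.printf("%s: records=%d, average=%.2f, nulls=%d%n",
                    outlook, counts.get(outlook)[0], avg, counts.get(outlook)[1]);
        }
    }
}
```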

Creating a numerical correlation analysis

Before you begin
A database connection is created in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283.

About this task
In the example below, you want to create a numerical correlation analysis to compute the average age of the personnel of different enterprises located in different states. Three database columns are used for the analysis: STATE, AGE and COMPANY.

Note: The numerical correlation analysis is possible only on database columns for the time being. You cannot use this analysis on file connections.

Defining the analysis

Procedure
1. In the DQ Repository tree view, expand Data Profiling.
2. Right-click the Analyses folder and select New Analysis.

The Create New Analysis wizard opens.


3. Start typing numerical correlation analysis in the filter field, select Numerical Correlation Analysis and then click Next.

4. In the Name field, enter a name for the current analysis.


Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

5. Set the analysis metadata (purpose, description and author name) in the corresponding fields and then click Finish.

Results
A folder for the newly created analysis is listed under Analysis in the DQ Repository tree view, and the analysis editor opens on the analysis metadata.

Selecting the columns you want to analyze and setting analysis parameters

Procedure
1. In the analysis editor, from the Connection list, select the database connection on which to run the analysis.


The numerical correlation analysis is possible only on database columns for the time being.
You can change your database connection by selecting another connection from the Connection list. If the analyzed columns do not exist in the new database connection you want to set, you receive a warning message that enables you to continue or cancel the operation.
2. Click Select Columns to open the Column Selection dialog box.


3. Browse the catalogs/schemas in your database connection to the column(s) you want to analyze.
You can filter the table or column lists by typing the desired text in the Table filter or Column filter fields respectively. The lists will show only the tables/columns that correspond to the text you type in.
4. Click the table name to list all its columns in the right-hand panel of the Column Selection dialog box.
5. In the column list, select the check boxes of the column(s) you want to analyze and click OK.
In this example, you want to compute the average age of the personnel of different enterprises located in different states. The columns to be analyzed are therefore AGE, COMPANY and STATE.
You can drag the columns to be analyzed directly from the corresponding database connection in the DQ Repository tree view into the Analyzed Columns area.
If you right-click any of the listed columns in the Analyzed Columns view and select Show in DQ Repository view, the selected column will be automatically located under the corresponding connection in the tree view.
The selected columns are displayed in the Analyzed Columns view of the analysis editor.


6. In the Indicators view, click to open a dialog box where you can set thresholds for each indicator.

The indicators representing the simple statistics are attached by default to this type of analysis.
7. In the Data Filter view, enter an SQL WHERE clause to filter the data on which to run the analysis, if required.
8. In the Analysis Parameter view, in the Number of connections per analysis field, set the number of concurrent connections allowed per analysis to the selected database connection, if required.
You can set this number according to the database available resources, that is, the number of concurrent connections each database can support.
9. If you have defined context variables in the Contexts view in the analysis editor, complete the following steps:
a) Use the Data Filter and Analysis Parameter views to set/select context variables to filter data and to decide the number of concurrent connections per analysis respectively.
b) In the Context Group Settings view, select from the list the context environment you want to use to run the analysis.
For more information about contexts and variables, see Using context variables in analyses on page 263.
10. Press F6 to execute the analysis.


Results
The editor switches to the Analysis Results view showing the results.

For more information about the analysis results, see Exploring the results of the numerical correlation analysis on page 177.

Exploring the results of the numerical correlation analysis

Before you begin
A numerical correlation analysis is defined and executed in the Profiling perspective of Talend Studio.

Procedure
1. In the Analysis Results view of the analysis editor, click Graphics, Simple Statistics or Data to show the generated graphic, the number of the analyzed records or the actual analyzed data respectively.
In the Graphics view, the data plotted in the bubble chart have different colors, with the legend pointing out which color refers to which data.

The closer the bubble is to the left axis, the less confident we are in the average of the numeric column. For the selected bubble in the above example, the company name is missing and there are only two data records, hence the bubble is near the left axis. We cannot be confident about the average age with only two records. When looking for data quality issues, these bubbles could indicate problematic values. The bubbles near the top of the chart and those near the bottom of the chart may suggest data quality issues too: an average age that is too high or too low in the above example.


2. From the generated graphic, you can perform the following actions:
• clear the check box of the value(s) you want to hide in the bubble chart,
• place the pointer on any of the bubbles to see the correlated data values at that position,
• right-click any of the bubbles and select:

Option To...

Show in full screen open the generated graphic in a full screen

View rows access a list of all analyzed rows in the selected column

Results
The figure below illustrates an example of the SQL editor listing the correlated data values at the selected position.

From the SQL editor, you can save the executed query and list it under the Libraries > Source Files folders in the DQ Repository tree view if you click the save icon on the editor toolbar. For more information, see Saving the queries executed on indicators on page 70. The Simple Statistics view shows the number of the analyzed records falling in certain categories, including the number of rows, the number of distinct and unique values and the number of duplicates.

The Data view displays the actual analyzed data.


You can sort the data listed in the result table by simply clicking any column header in the table.

Time correlation analyses

This type of analysis analyzes the correlation between nominal and date columns and gives the result in a Gantt chart that illustrates the start and finish dates of each value of the nominal column.
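The chart essentially plots, for each nominal value, the minimum and maximum of the date column, plus the number of null dates. A minimal Java sketch of that aggregation on invented sample data (the column names and values are made up for illustration) might look like this:

```java
import java.time.LocalDate;
import java.util.*;

public class TimeCorrelationDemo {

    record Person(String country, LocalDate birthdate) {}   // birthdate may be null

    public static void main(String[] args) {
        List<Person> rows = Arrays.asList(
                new Person("Mexico", LocalDate.of(1910, 5, 1)),
                new Person("Mexico", LocalDate.of(2000, 3, 15)),
                new Person("Mexico", null),
                new Person("Canada", LocalDate.of(1955, 7, 20)),
                new Person("Canada", LocalDate.of(1987, 1, 2)));

        // For each country: earliest birthdate, latest birthdate and null count,
        // which is what each range bar of the Gantt chart represents.
        Map<String, LocalDate[]> ranges = new LinkedHashMap<>();  // {min, max}
        Map<String, Integer> nulls = new LinkedHashMap<>();
        for (Person p : rows) {
            nulls.merge(p.country(), p.birthdate() == null ? 1 : 0, Integer::sum);
            if (p.birthdate() == null) continue;
            LocalDate[] r = ranges.computeIfAbsent(p.country(),
                    k -> new LocalDate[]{p.birthdate(), p.birthdate()});
            if (p.birthdate().isBefore(r[0])) r[0] = p.birthdate();
            if (p.birthdate().isAfter(r[1])) r[1] = p.birthdate();
        }
        ranges.forEach((country, r) -> System.out.printf(
                "%s: min=%s, max=%s, null birthdates=%d%n", country, r[0], r[1], nulls.get(country)));
    }
}
```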

Creating a time correlation analysis

Before you begin
A database connection is created in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283.

About this task
In the example below, you want to create a time correlation analysis to compute the minimal and maximal birth dates for each country listed in the selected nominal column. Two columns are used for the analysis: birthdate and country.

Note: The time correlation analysis is possible only on database columns for the time being. You cannot use this analysis on file connections.

Defining the time correlation analysis

Procedure
1. In the DQ Repository tree view, expand the Data Profiling folder.
2. Right-click the Analyses folder and select New Analysis.


The Create New Analysis wizard opens.

3. Start typing time correlation analysis in the filter field, select Time Correlation Analysis and click Next.


4. In the Name field, enter a name for the current analysis.

Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

5. Set the analysis metadata (purpose, description and author name) in the corresponding fields and then click Finish.
A folder for the newly created analysis is listed under Analysis in the DQ Repository tree view, and the analysis editor opens on the analysis metadata.

Selecting the columns for the time correlation analysis and setting analysis parameters

Procedure
1. In the analysis editor, from the Connection list, select the database connection on which to run the analysis.


The time correlation analysis is possible only on database columns for the time being.
You can change your database connection by selecting another connection from the Connection list. If the analyzed columns do not exist in the new database connection you want to set, you receive a warning message that enables you to continue or cancel the operation.
2. Click Select Columns to open the Column Selection dialog box and select the columns, or drag them directly from the DQ Repository tree view into the Analyzed Columns view.
If you right-click any of the listed columns in the Analyzed Columns view and select Show in DQ Repository view, the selected column will be automatically located under the corresponding connection in the tree view.

3. If required, click in the Indicators view to open a dialog box where you can set thresholds for each indicator.
The indicators representing the simple statistics are attached by default to this type of analysis.
4. In the Data Filter view, enter an SQL WHERE clause to filter the data on which to run the analysis, if required.
5. In the Analysis Parameter view, in the Number of connections per analysis field, set the number of concurrent connections allowed per analysis to the selected database connection, if required.


You can set this number according to the database available resources, that is, the number of concurrent connections each database can support.
6. If you have defined context variables in the analysis editor:
a) Use the Data Filter and Analysis Parameter views to set/select context variables to filter data and to decide the number of concurrent connections per analysis respectively.
b) In the Context Group Settings view, select from the list the context environment you want to use to run the analysis.
For further information about contexts and variables, see Using context variables in analyses on page 263.
7. Press F6 to execute the analysis.
The editor switches to the Analysis Results view showing the results.

For a detailed explanation of the analysis results, see Exploring the results of the time correlation analysis on page 183.

Exploring the results of the time correlation analysis

Before you begin
Prerequisite(s): A time correlation analysis is defined and executed in the Profiling perspective of the studio.
In the Analysis Results view of the analysis editor:

Procedure
Click Graphics, Simple Statistics or Data to show the generated graphic, the number of the analyzed records or the actual analyzed data respectively.

What to do next
In the Graphics view, you can clear the check box of the value(s) you want to hide in the chart.


The Gantt chart displays a range showing the minimal and maximal birth dates for each country listed in the selected nominal column. It also highlights the range bars that contain null values for birth dates. For example, in the above chart, the minimal birth date for Mexico is 1910 and the maximal is 2000. Of all the data records where the country is Mexico, 41 records have a null birth date.
You can also display a specific birth date range by placing the pointer at the start value you want to show and dragging it to the end value you want to show.
From the generated graphic, you can:
• clear the check box of the value(s) you want to hide in the chart,
• place the pointer on any of the range bars to display the correlated data values at that position,
• right-click any of the bars and select:

Option To...

Show in full screen open the generated graphic in a full screen

View rows access a list of all analyzed rows in the selected column

The figure below illustrates an example of the SQL editor listing the correlated data values at the selected range bar.


From the SQL editor, you can save the executed query and list it under the Libraries > Source Files folders in the DQ Repository tree view if you click the save icon on the editor toolbar. For more information, see Saving the queries executed on indicators on page 70. The Simple Statistics view shows the number of the analyzed records falling in certain categories, including the number of rows, the number of distinct and unique values and the number of duplicates.

The Data view displays the actual analyzed data.


You can sort the data listed in the result table by simply clicking any column header in the table.

Nominal correlation analyses

This type of analysis analyzes correlations between nominal columns in the same table and gives the result in a chart. In the chart, each nominal value is represented by a node that has a given color. The correlations between the nominal values are represented by lines: the thicker the line is, the weaker the association is. Thicker lines can therefore identify problems or correlations that need special attention. However, you can always inverse the edge weight, that is, give larger edge thickness to higher correlation, by selecting the Inverse Edge Weight check box below the nominal correlation chart. The correlations in the chart are always pairwise correlations: they show associations between pairs of columns.
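One simple way to picture such a pairwise association (this is an illustrative measure, not necessarily the exact weight the Studio computes) is to count how often each pair of values from two nominal columns occurs together; each distinct value becomes a node and each co-occurring pair an edge:

```java
import java.util.*;

public class NominalCorrelationDemo {
    public static void main(String[] args) {
        // Invented rows of two nominal columns: country and marital status.
        String[][] rows = {
                {"USA", "Married"}, {"USA", "Single"}, {"USA", "Married"},
                {"Mexico", "Single"}, {"Mexico", "Single"}, {"Canada", "Married"}};

        // Count co-occurrences of (country, marital status) pairs; each pair
        // corresponds to an edge between two value nodes in the chart.
        Map<String, Integer> edges = new LinkedHashMap<>();
        for (String[] row : rows) {
            edges.merge(row[0] + " -- " + row[1], 1, Integer::sum);
        }
        edges.forEach((edge, count) -> System.out.println(edge + " : " + count));
    }
}
```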

Creating a nominal correlation analysis

Before you begin
A database connection is created in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283.

About this task
In the example below, you want to create a nominal correlation analysis to explore the relationship between the marital status of people and the country they live in. Two columns are used for the analysis: country and marital-status.

Note: The nominal correlation analysis is possible only on database columns for the time being. You cannot use this analysis on file connections.

Defining the nominal correlation analysis

Procedure
1. In the DQ Repository tree view, expand the Data Profiling folder.
2. Right-click the Analyses folder and select New Analysis.

The Create New Analysis wizard opens.


3. Start typing nominal correlation analysis in the filter field, select Nominal Correlation Analysis and then click Next.

4. In the Name field, enter a name for the current analysis.


Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

5. Set the analysis metadata (purpose, description and author name) in the corresponding fields and then click Finish.
A folder for the newly created analysis is listed under Analysis in the DQ Repository tree view, and the analysis editor opens on the analysis metadata.

Selecting the columns you want to analyze

Procedure
1. In the analysis editor, from the Connection list, select the database connection on which to run the analysis.


The nominal correlation analysis is possible only on database columns for the time being.
You can change your database connection by selecting another connection from the Connection list. If the analyzed columns do not exist in the new database connection you want to set, you receive a warning message that enables you to continue or cancel the operation.
2. Click Select Columns to open the Column Selection dialog box and select the columns you want to analyze, or drag them directly from the DQ Repository tree view.
If you select too many columns, the analysis result chart will be very difficult to read.
You can right-click any of the listed columns in the Analyzed Columns view and select Show in DQ Repository view to locate the selected column under the corresponding connection in the tree view.

3. If required, click in the Indicators view to open a dialog box where you can set thresholds for each indicator.
The indicators representing the simple statistics are attached by default to this type of analysis.
4. In the Data Filter view, enter an SQL WHERE clause to filter the data on which to run the analysis, if required.
5. In the Analysis Parameter view, in the Number of connections per analysis field, set the number of concurrent connections allowed per analysis to the selected database connection, if required.
You can set this number according to the database available resources, that is, the number of concurrent connections each database can support.
6. If you have defined context variables in the analysis editor:
a) Use the Data Filter and Analysis Parameter views to set/select context variables to filter data and to decide the number of concurrent connections per analysis respectively.
b) In the Context Group Settings view, select from the list the context environment you want to use to run the analysis.
For further information about contexts and variables, see Using context variables in analyses on page 263.
7. Press F6 to execute the analysis.
The editor switches to the Analysis Results view showing the results.


For a detailed explanation of the analysis results, see Exploring the results of the nominal correlation analysis on page 190.

Exploring the results of the nominal correlation analysis

About this task
Prerequisite(s): A nominal correlation analysis is defined and executed in the Profiling perspective of the studio.
In the Analysis Results view of the analysis editor:

Procedure
Click Graphics, Simple Statistics or Data to show the generated graphic, the number of the analyzed records or the actual data respectively.

Results
The Graphics view shows the generated graphic for the analyzed columns.


To have a better view of the graphical result of the nominal correlation analysis, right-click the graphic in the Graphics panel and select Show in full screen.
In the above chart, each value in the country and marital-status columns is represented by a node that has a given color. Nominal correlation analysis is carried out to see the relationship between the number of married or single people and the country they live in. Correlations are represented by lines: the thicker the line is, the higher the association is, if the Inverse Edge Weight check box is selected.
The buttons below the chart help you manage the chart display. The following table describes these buttons and their usage:

Button Description

Filter Edge Weight Move the slider to the right to filter out edges with small weight and visualize the more important edges.

plus and minus Click the [+] or [-] buttons to zoom in and zoom out of the chart respectively.

Reset Click to put the chart back to its initial state.

Inverse Edge Weight By default, the thicker the line is, the weaker the correlation is. Select this check box to inverse the current edge weight, that is give larger edge thickness to higher correlation.

Picking Select this check box to be able to pick any node and drag it to anywhere in the chart.

Save Layout Click this button to save the chart layout.

Restore Layout Click this button to restore the chart to its previously saved layout.

The Simple Statistics view shows the number of the analyzed records falling in certain categories, including the number of rows, the number of distinct and unique values and the number of duplicates.


The Data view displays the actual analyzed data.

You can sort the data listed in the result table by simply clicking any column header in the table.

Patterns and indicators

Patterns

Patterns are sets of strings against which you can match the content of the columns to be analyzed.

Pattern types

Two types of patterns are listed under the Patterns folder in the DQ Repository tree view: regular expressions and SQL patterns.
Regular expressions (regex) are predefined patterns that you can use to search and manipulate text in the databases to which you connect. You can also create your own regular expressions and use them to analyze columns. When selecting a pattern in a Job, the regular expression for the current database type is used:
• If the regular expression does not exist for this database type, the default regular expression in the selected pattern is used.
• If you remove the regular expression for this database type in a pattern that is used in Jobs, the Jobs are updated with the default regular expression in the selected pattern.
SQL patterns are personalized patterns that are used in SQL queries. These patterns usually contain the percent sign (%). For more information on SQL wildcards, see http://www.w3schools.com/SQL/sql_wildcards.asp.
You can use either of the above two pattern types with column analyses or with the analyses of a set of columns (simple table analyses). These pattern-based analyses illustrate the frequencies of various data patterns found in the values of the analyzed columns. For more information, see Creating a basic analysis on a database column on page 43 and Creating an analysis of a set of columns using patterns on page 93.
From Talend Studio, you can generate graphs to represent the results of analyses using patterns. You can also view tables in the Analysis Results view that describe the generated graphs in words. From those graphs and analysis results you can easily determine the percentage of invalid values based on the listed patterns.
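As an illustration of how a pattern splits column values into matching and non-matching ones, consider the following Java sketch. The date pattern is a made-up example, and Java regex syntax is used because the Java engine relies on it; the equivalent SQL pattern noted in the comment is only an assumption for comparison:

```java
import java.util.*;
import java.util.regex.Pattern;

public class PatternMatchingDemo {
    public static void main(String[] args) {
        // Hypothetical column values and a date pattern 'YYYY-MM-DD'.
        List<String> values = Arrays.asList("2019-10-15", "15/10/2019", "2018-01-01", "n/a");
        Pattern datePattern = Pattern.compile("^\\d{4}-\\d{2}-\\d{2}$");

        long matching = values.stream().filter(v -> datePattern.matcher(v).matches()).count();
        System.out.printf("matching: %d, not matching: %d (%.2f%% invalid)%n",
                matching, values.size() - matching,
                100.0 * (values.size() - matching) / values.size());

        // A comparable SQL pattern used with LIKE would be something like
        // '____-__-__' (each underscore matching a single character).
    }
}
```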


Management processes for SQL patterns and regular expressions, including those for Java, are the same. For more information, see Managing regular expressions and SQL patterns on page 199.

Note: Some databases do not support regular expressions. To work with such databases, some configuration is necessary before being able to use regular expressions. For more information, see Managing User-Defined Functions in databases on page 193.

Managing User-Defined Functions in databases

The regular expression function is built into several databases, but many other databases do not support it. The databases that natively support regular expressions include MySQL, PostgreSQL, Oracle 10g and Ingres, while Microsoft SQL Server and Netezza, for example, do not. A different case is when the regular expression function is built into the database but the query template of the regular expression indicator is not defined.
From the Profiling perspective of Talend Studio, you can:
• extend the functionality of certain database servers to support the regular expression function. For more information, see Declaring a User-Defined Function in a specific database on page 193.
• define the query template for a database that supports the regular expression function. For more information, see Defining a query template for a specific database on page 193.
To learn more about using regular expressions with Teradata, see the documentation on Talend Help Center (https://help.talend.com). For information about using regular expressions with Netezza, see What you need to know about some databases on page 288.

Declaring a User-Defined Function in a specific database

The regular expression function is not built into all database environments. If you want to use Talend Studio to analyze columns against regular expressions in databases that do not natively support regular expressions, you can:
Either:
1. Install the relevant regular expression libraries on the database. For an example of creating a regular expression function on a database, see Main concept on page 308.
2. Create a query template for the database in Talend Studio. For more information, see Defining a query template for a specific database on page 193.
Or:
1. Execute the column analysis using the Java engine. In this case, the system uses Java regular expressions to analyze the specified column(s), and not SQL regular expressions. For more information on the Java engine, see Using the Java or the SQL engine on page 56.

Defining a query template for a specific database

A query template defines the query logic required to analyze columns against regular expressions. The steps to define a query template in Talend Studio include the following:
• Create a query template for a specific database,
• Set the database-specific regular expression if this expression is not simple enough to be used with all databases.


The example below shows how to define a query template specific to the Microsoft SQL Server database. Main concept on page 308 gives a detailed example of how to create a user-defined regular expression function on an SQL Server.

Before you begin

Procedure
1. In the DQ Repository tree view, expand Libraries > Indicators.
2. Expand System Indicators > Pattern Matching.

3. Double-click Regular Expression Matching, or right-click it and select Open from the contextual menu. The corresponding view is displayed to show the indicator metadata and its definition.


You now need to add, to the list of databases, the database for which you want to define a query template. This query template will compute the regular expression matching.
4. Click the [+] button at the bottom of the Indicator Definition view to add a field for the new template.

5. In the new field, click the arrow and select the database for which you want to define the template. In this example, select Ingres.
6. Copy the indicator definition of any of the other databases.
7. Click the Edit... button next to the new field.
The Edit expression dialog box is displayed.


8. Paste the indicator definition (template) in the Expression box and then modify the text after WHEN in order to adapt the template to the selected database. In this example, replace the text after WHEN with WHEN REGEX.
9. Click OK.
The new template is displayed in the field.
10. Click the save icon on top of the editor to save your changes.

Results
You have finished creating the query template specific to the Ingres database. You can now start analyzing the columns in this database against regular expressions.
If the regular expression you want to use to analyze data on this server is simple enough to be used with all databases, you can start your column analyses immediately. If not, you must edit the definition of the regular expression to work with this specific database, Ingres in this example.
If an analysis with a user-defined indicator runs successfully at least once and the indicator definition template for the database is later deleted, the analysis does not fail. It keeps running successfully because it uses the previously generated SQL query.
For more information on how to set the database-specific regular expression definition, see Editing a regular expression or an SQL pattern on page 211.

Editing a query template

You can edit the query template you create for a specific database.

Before you begin

Procedure
1. In the DQ Repository tree view, expand Libraries > Indicators.
2. Expand System Indicators > Pattern Matching.


3. Double-click Regular Expression Matching, or right-click it and select Open from the contextual menu. The corresponding view is displayed to show the indicator metadata and its definition.

4. Click the button next to the database for which you want to edit the query template. The Edit expression dialog box is displayed.


5. In the Expression area, edit the regular expression template as required and then click OK to close the dialog box and proceed to the next step.
The regular expression template is modified accordingly.

Deleting a query template

You can delete the query template you create for a specific database.

Before you begin

Procedure
1. In the DQ Repository tree view, expand Libraries > Indicators.
2. Expand System Indicators > Pattern Matching.


3. Double-click Regular Expression Matching, or right-click it and select Open from the contextual menu. The corresponding view is displayed to show the indicator metadata and its definition.

4. Click the button next to the database for which you want to delete the query template.

Results
The selected query template is deleted from the list in the Indicator Definition view.

Adding regular expressions and SQL patterns to column analyses

You can use regular expressions and SQL patterns in column analyses in order to check all existing data in the analyzed columns against these expressions and patterns. For more information, see Adding a regular expression or an SQL pattern to a column analysis on page 63.
You can also edit the regular expression or SQL pattern parameters after attaching it to a column analysis. For more information, see Editing a pattern in the column analysis on page 65.
After the execution of the column analysis that uses a specific expression or pattern, you can:
• access a list of all valid/invalid data in the analyzed column. For more information, see Viewing the data analyzed against patterns on page 67.

Managing regular expressions and SQL patterns

The management procedures of regular expressions and SQL patterns include operations like creating, testing, duplicating, importing and exporting. The sections below explain in detail each of the management options for regular expressions and SQL patterns. Management processes for both types of patterns are exactly the same.


Creating a new regular expression or SQL pattern You can create new regular expressions or SQL patterns, including those for Java, to be used in column analyses. Management processes for regular expressions and SQL patterns are the same. The procedure below, with all the included screen captures, reflects the steps to create a regular expression. You can follow the same steps to create an SQL pattern.

Before you begin

Procedure 1. In the DQ Repository tree view, expand Libraries > Patterns, and then right-click Regex.

2. From the contextual menu, select New Regex Pattern to open the corresponding wizard.


When you open the wizard, a help panel automatically opens with the wizard. This help panel guides you through the steps of creating new regular patterns. 3. In the Name field, enter a name for this new regular expression.

Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

4. If required, set other metadata (purpose, description and author name) in the corresponding fields and click Next.


5. In the Regular expression field, enter the definition of the regular expression to be created. The regular expression must be surrounded by single quotes.

Note: For the PostgreSQL database, regular expressions are not compatible across the different versions of the database. If you want to use the regular expression with PostgreSQL version 9.1 or greater, you must either: • in the PostgreSQL database configuration, set the standard_conforming_strings parameter to off and write double backslashes in the definition, or • in the Regular expression field in the wizard, use a single backslash in the expression definition. For example, a definition that matches digits is written '\\d+' in the first case and '\d+' in the second. For further information about PostgreSQL regular expressions, select Window > Show View, expand Help and then select Bookmarks.

6. From the Language Selection list, select the language (a specific database or Java). 7. Click Finish to close the dialog box. A subfolder for this new regular expression is listed under the Regex folder in the DQ Repository tree view, and the pattern editor opens with the defined metadata and the defined regular expression.


8. In the Pattern Definition view, click the [+] button and add as many regular expressions as necessary in the new pattern. You can define the regular expressions specific to any of the available databases or specific to Java.

Note: If the regular expression is simple enough to be used in all databases, select Default from the list.

Subfolders labeled with the specified database types or Java are listed below the name of the new pattern under the Patterns folder in the DQ Repository tree view.


9. Save the new pattern. Once the pattern is created, you can drop it directly onto a database column in the open analysis editor. 10. If required, click the pattern name to display its detail in the Detail View in the Studio.

Note: In the pattern editor, you can click Test next to the regular expression to test the regular pattern definition. For more information, see Testing a regular expression in the Pattern Test View on page 204. Also, from the Pattern Test View, you can create a new pattern based on the regular expression you are testing. For further information, see Creating a new pattern from the Pattern Test View on page 205.

Testing a regular expression in the Pattern Test View It is possible to test character sequences against a predefined or newly created regular expression.
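When the pattern language is Java, this test amounts to matching the character sequence with java.util.regex, as in the minimal sketch below (the pattern and test values are arbitrary examples, not taken from the Studio):

import java.util.regex.Pattern;

public class PatternTestSketch {
    public static void main(String[] args) {
        // A simple date-like pattern, written without the surrounding single quotes
        // that a database definition would require.
        Pattern pattern = Pattern.compile("^[0-9]{4}-[0-9]{2}-[0-9]{2}$");

        for (String candidate : new String[] { "2019-10-15", "15/10/2019" }) {
            boolean matches = pattern.matcher(candidate).matches();
            System.out.println(candidate + (matches ? " matches" : " does not match"));
        }
    }
}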

Before you begin At least one database connection is set in the Profiling perspective of Talend Studio.

Procedure 1. Follow the steps outlined in Creating a new regular expression or SQL pattern on page 200 to create a new regular expression. 2. In the open pattern editor, click Pattern Definition to open the relevant view.


3. Click the Test button next to the definition against which you want to test a character sequence to proceed to the next step. The test view is displayed in the Studio showing the selected regular expression.

4. In the Test Area, enter the character sequence you want to check against the regular expression. 5. From the DB Connection list, select the database in which you want to use the regular expression.

Note: If you select to test a regular expression in Java, the Java option will be selected by default and the DB Connections option and list will be unavailable in the test view.

6. Click Test. An icon is displayed in the upper left corner of the view to indicate if the character sequence matches or does not match the selected pattern definition. 7. If required, modify the regular expression according to your needs and then click Save to save your modifications. The pattern definition is modified accordingly in the pattern editor. You can create/modify patterns directly from the Pattern Test View via the Create Pattern button. For further information, see Creating a new pattern from the Pattern Test View on page 205. Creating a new pattern from the Pattern Test View You can create your own customized patterns from the Pattern Test View. The advantage of creating a pattern from this view is that you can create your customized pattern based on an already tested regular expression.


All you need to do is to customize the expression definition according to your needs before saving the new pattern.

Procedure 1. In the DQ Repository tree view, expand Libraries > Patterns > Regex and double-click the pattern you want to use to create your customized pattern. The pattern editor opens in Talend Studio.

2. Click Test next to the definition you want to use as a base to create the new pattern. The Pattern Test View is opened on the definition of the selected regular expression.

3. If required, test the regular expression through entering text in the Test Area. For further information, see Testing a regular expression in the Pattern Test View on page 204. 4. Click Create Pattern to open the New Regex pattern wizard.


5. In the Name field, enter a name for this new regular expression.

Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

6. If required, set other metadata (purpose, description and author name) in the corresponding fields and click Next to proceed to the next step. The definition of the initial regular expression is already listed in the Regular expression field.


7. Customize the syntax of the initial regular expression according to your needs. The regular expression definition must be surrounded by single quotes.

Note: For the PostgreSQL database, regular expressions are not compatible across the different versions of the database. If you want to use the regular expression with PostgreSQL version 9.1 or greater, you must either: • in the PostgreSQL database configuration, set the standard_conforming_strings parameter to off and write double backslashes in the definition, or • in the Regular expression field in the wizard, use a single backslash in the expression definition. For further information about PostgreSQL regular expressions, select Window > Show View, expand Help and then select Bookmarks.

8. From the Language Selection list, select the database in which you want to use the new regular expression. 9. Click Finish to close the wizard.

Results A subfolder for the new pattern is listed under the Regex folder, in the same folder as the initial regular pattern. The pattern editor opens on the pattern metadata and pattern definition. Generating a regular expression from the Date Pattern Frequency indicator You can generate a regular pattern from the results of an analysis that uses the Date Pattern Frequency indicator on a date column.

Before you begin In the Profiling perspective of Talend Studio, a column analysis is created on a date column using the Date Pattern Frequency indicator.

Note: To be able to use the Date Pattern Frequency indicator on date columns, you must set the execution engine to Java in the Analysis Parameter view of the column analysis editor. For more information on execution engines, see Using the Java or the SQL engine on page 56.

For more information on how to create a column analysis, see Creating a basic analysis on a database column on page 43.

Procedure 1. In the DQ Repository tree view, right-click the column analysis that uses the date indicator on a date column. 2. Select Open from the contextual menu to open the corresponding analysis editor.


3. Press F6 to execute the analysis and display the analysis results in the Analysis Results view of the editor.

In this example, 100.00% of the date values follow the pattern yyyy MM dd and 39.41% follow the pattern yyyy dd MM. 4. Right-click the date value for which you want to generate a regular expression and select Generate Regex Pattern from the contextual menu. The New Regex Pattern dialog box is displayed.


5. Click Next.

The date regular expression is already defined in the corresponding field. 6. Click Finish to proceed to the next step. The pattern editor opens with the defined metadata and the generated pattern definition.


The new regular expression is listed under Patterns > Regex in the DQ Repository tree view. You can drag it onto any date column in the analysis editor. 7. Optional: If required, click the Test button to test a character sequence against this date regular expression as outlined in Testing a regular expression in the Pattern Test View on page 204. Editing a regular expression or an SQL pattern You can open the editor of any regular expression or SQL pattern to check its settings and/or edit its definition in order to adapt it to a specific database type, or adapt it to a specific use.

Before you begin

Procedure 1. In the DQ Repository tree view, expand Libraries > Patterns. 2. Browse through the regular expression or SQL pattern lists to reach the expression or pattern you want to open/edit. 3. Right-click its name and select Open from the contextual menu. The pattern editor opens displaying the regular expression or SQL pattern settings.


4. Modify the pattern metadata, if required, and then click Pattern Definition to display the relevant view. In this view, you can edit the pattern definition, change the selected database, and add other patterns specific to the available databases through the [+] button.

Note: For the PostgreSQL database, regular expressions are not compatible across the different versions of the database. If you want to use the regular expression with PostgreSQL version 9.1 or greater, you must either: • in the PostgreSQL database configuration, set the standard_conforming_strings parameter to off and write double backslashes in the definition, or • in the Regular expression field in the wizard, use a single backslash in the expression definition. For further information about PostgreSQL regular expressions, select Window > Show View, expand Help and then select Bookmarks.

5. If the regular expression or SQL pattern is simple enough to be used in all databases, select Default in the list. 6. Click the save icon on top of the editor to save your changes. You can test regular expressions before you start using them against data in the specified database. For more information, see Testing a regular expression in the Pattern Test View on page 204.

Note: When you edit a regular expression or an SQL pattern, make sure that your modifications are suitable for all the analyses that may be using this regular expression or SQL pattern.


Exporting regular expressions or SQL patterns You can export regular expressions or SQL patterns from your current version of the studio to Talend Exchange, where they are saved as .xmi files. Other users can then import these patterns from the exchange server into their Talend Studio and use them in their analyses. You can also export regular expressions and SQL patterns and store them locally in a csv file. For more information about the content layout of the csv file, see Importing regular expressions or SQL patterns on page 219.

Note: Management processes for regular expressions and SQL patterns are the same. The procedure below, with all the included screen captures, reflects the steps to export regular expressions. You can follow the same steps to export SQL patterns.

Exporting regular expressions or SQL patterns to Talend Exchange You can export regular expressions or SQL patterns from your current version of studio to Talend Exchange where you can share them with other users. The exported patterns are saved as .xmi files on the exchange server. Patterns will be exported with the exact path they have in the initial repository. When users import them from Talend Exchange into the repository of their Talend Studio, these patterns will be imported under the same folder or subfolders they had in the initial repository.

About this task The below procedure uses regular expressions as an example. You can follow the same steps to export SQL patterns.

Procedure 1. In the DQ Repository tree view, expand Libraries > Patterns. 2. Right-click Regex and select Export for Talend Exchange. The Export for Talend Exchange wizard is displayed.


3. Browse to the folder where to save the regular expressions. 4. Click Select All to select all the regular expressions in the list or select the check boxes of the regular expressions you want to export to the specified folder. Select the Browse check box to list only the patterns you want to export. 5. Click Finish. The .xmi file of each selected pattern is saved as a zip file in the defined folder. 6. Upload the zip files to Talend Exchange at https://exchange.talend.com/. Please create an account on the Talend Exchange server if you do not have one already.

Note: When users import these .xmi files from Talend Exchange into the repository of their Talend Studio, the patterns are imported under the same family subfolder, and thus have the path they had in the initial repository.

Exporting a family of regular expressions or SQL patterns to Talend Exchange You can export regular expressions or SQL patterns from your current version of studio to Talend Exchange where you can share them with other users. The exported patterns are saved as .xmi files on the exchange server. Patterns will be exported with the exact path they have in the initial repository. When users import them from Talend Exchange into the repository of their Talend Studio, these patterns will be imported under the same folder or subfolders they had in the initial repository.

About this task The below procedure uses regular expressions as an example. You can follow the same steps to export SQL patterns.


Procedure 1. In the DQ Repository tree view, expand Libraries > Patterns, and then browse to the regular expression you want to export.

2. Right-click it and then select Export for Talend Exchange from the contextual menu. The Export for Talend Exchange wizard opens.


3. Click Select All to select all the regular expressions in the list, or select the check boxes of the regular expressions or SQL patterns you want to export to the folder. The .xmi file of each selected pattern is saved as a zip file in the defined folder. 4. Click Finish to close the wizard. 5. Upload the zip files to Talend Exchange at https://exchange.talend.com/. Please create an account on the Talend Exchange server if you do not have one already.

Note: When users import these .xmi files from Talend Exchange into the repository of their Talend Studio, the patterns are imported under the same family subfolder, and thus have the path they had in the initial repository.

Exporting regular expressions or SQL patterns to a csv file

Before you begin To export regular expressions to a csv file, do the following:

Procedure 1. In the DQ Repository tree view, expand Libraries > Patterns, and then right-click Regex. 2. From the contextual menu, select Export Patterns. The Export Patterns wizard opens.


3. Browse to the csv file where to save the regular expressions. 4. Click Select All to select all listed regular expressions or select the check boxes of the regular expressions you want to export to the csv file. 5. Click Finish to close the wizard. When users try to import these regular expressions from the csv file into the repository of their studios, the regular expressions will be imported under the Regex folder, and thus have the same path in the new repository. All exported regular expressions are saved in the defined csv file.

What to do next To export a single regular expression family to a csv file, do the following: 1. In the DQ Repository tree view, expand Libraries > Patterns, and then browse to the regular expression family you want to export.


2. From the contextual menu, select Export Patterns. The Export Patterns wizard opens.


3. Click Select All to select all the check boxes of the regular expressions or select the check boxes of the regular expressions you want to export to the csv file. 4. Click Finish to close the wizard. All exported regular expressions are saved in the defined csv file. When users try to import these regular expressions from the csv file into the repository of their studios, the regular expressions will be imported under the same subfolder, and thus have in the new repository the same path they had in the initial repository. Importing regular expressions or SQL patterns You can import regular expressions or SQL patterns from Talend Exchange into your Talend Studio and use them on analyses. This way you can share all the patterns created by other users and stored on the exchange server. You can also import the regular expressions or SQL patterns stored locally in a csv file. The csv file must have specific columns including the ones listed in the table below. The csv file may contain only a few of them.

Column name Description

Label The label of the pattern (must not be empty)

Purpose The purpose of the pattern (can be empty)

Description The description of the pattern (can be empty)

Author The author of the regular expression (can be empty)

Relative Path The relative path to the root folder (can be empty)

All_DB_Regexp The regular expression applicable to all databases (can be empty)

_Regexp

Importing regular expressions or SQL patterns from Talend Exchange You can import the .xmi files of regular expressions or SQL patterns from Talend Exchange into the repository of your current Talend Studio and use them to analyze columns. You can import only versions that are compatible with the version of your current Talend Studio.

About this task Prerequisite(s): Your network is up and running. The below procedure uses SQL patterns as an example. You can follow the same steps to import regular expressions. To import SQL patterns from Talend Exchange, do the following:

Procedure 1. In the DQ Repository tree view, expand Libraries > Exchange. If you have connection problems, you will not have access to any of the items under the Exchange node. A pop-up will prompt you to check your internet connection. 2. Under Exchange, expand SQL and right-click the name of the pattern you want to import.


3. Select Import in DQ Repository. You will have access only to versions that are compatible with the version of your current Talend Studio. The Import From Talend Exchange wizard opens.

4. Select the Overwrite existing items check box if some error and warning messages are listed in the Error and Warning area. This means that a pattern with the same name already exists in the current studio. The imported pattern will replace the one in Talend Studio. 5. Click Finish.


A progress information bar is displayed. The pattern is listed under the Patterns > SQL folder in the DQ Repository tree view. The patterns you import in your studio will be imported with the structure they had in the initial repository. They will be imported under the same folder or subfolder they had initially. Importing regular expressions or SQL patterns from a csv file

Procedure 1. In the DQ Repository tree view, expand Libraries > Patterns. 2. Right-click Regex and select Import patterns. The Import Patterns wizard opens.

3. Browse to the csv file holding the regular expressions. 4. In the Duplicate patterns handling area, select: Option Description

skip existing patterns Import only the regular expressions that do not exist in the corresponding lists in the DQ Repository tree view. A warning message is displayed if the imported patterns already exist under the Patterns folder.

rename new patterns with suffix Identify each of the imported regular expressions with a suffix. All regular expressions will be imported even if they already exist under the Patterns folder.

5. Click Finish.


A confirmation message is displayed. 6. Click OK. All imported regular expressions are listed under the Regex folder in the DQ Repository tree view.

Results The regular expressions are imported under the same folders or subfolders they had in the initial repository.

Note: A warning icon next to the name of the imported regular expression or SQL pattern in the tree view identifies that it is not correct. You must open the expression or the pattern and try to figure out what is wrong. Usually, problems come from missing quotes. Check your regular expressions and SQL patterns and ensure that they are encapsulated in single quotes.

Indicators

Indicators are the results achieved through the implementation of different patterns that are used to define the content, structure and quality of your data. Indicators also represent the results of highly complex analyses related not only to data matching, but also to various other data-related operations.

Indicator types

Two types of indicators are listed under the Indicators folder in the DQ Repository tree view: system indicators and user-defined indicators. User-defined indicators, as their name indicates, are indicators created by the user. You can use them through a simple drag-and-drop operation from the User Defined Indicators folder in the tree view. User-defined indicators are used only with column analyses. For more information on how to set user-defined indicators for columns, see Setting user-defined indicators from the analysis editor on page 51. System indicators are predefined indicators grouped under different categories in the System Indicators folder in the DQ Repository tree view. Each category of the system indicators is used with a corresponding analysis type. You cannot create a system indicator or drag it directly from the DQ Repository tree view to an analysis. However, you can open and modify the parameters of a system indicator to adapt it to a specific database for example. For further information, see Editing a system indicator on page 232. Only system indicators you can modify are listed under the System Indicators folder in the DQ Repository tree view. However, the Indicator Selection dialog box lists all indicators including the system indicators you cannot modify, such as Date pattern frequency. Several management options including editing, duplicating, importing and exporting are possible for system and user-defined indicators. For more information, see Managing user-defined indicators on page 234 and Managing system indicators on page 231. The sections that follow describe the system indicators used only on column analyses. These system indicators range from simple and advanced statistics to text string analysis, including summary data and statistical distributions of records.


Advanced statistics They determine the most probable and the most frequent values and build frequency tables. The main advanced statistics include the following values: • Mode: computes the most probable value. For numerical data or continuous data, you can set bins in the parameters of this indicator. It is different from the "average" and the "median". It is good for addressing categorical attributes. • Value Frequency: computes the number of most frequent values for each distinct record. • All other Value Frequency indicators are available to aggregate date and numerical data (with respect to "date", "week", "month", "quarter", "year" and "bin"). • Value Low Frequency: computes the number of less frequent records for each distinct record. • All other Value Low Frequency indicators are available to aggregate date and numerical data (with respect to "date", "week", "month", "quarter", "year" and "bin"), where "bin" is the aggregation of numerical data by intervals. Fraud Detection The Benford Law indicator (first-digit law) is based on examining the actual frequency of the digits 1 through 9 in numerical data. It is usually used as an indicator of accounting and expenses fraud in lists or tables. Benford's law states that in lists and tables the digit 1 tends to occur as a leading digit about 30% of the time. Larger digits occur as the leading digits with lower frequency, for example the digit 2 about 17%, the digit 3 about 12% and so on (the expected frequency of a leading digit d is log10(1 + 1/d)). Valid, unaltered data will follow this expected frequency. A simple comparison of first-digit frequency distribution from the data you analyze with the expected distribution according to Benford's law ought to show up any anomalous results. For example, let's assume an employee has committed fraud by creating and sending payments to a fictitious vendor. Since the amounts of these fictitious payments are made up rather than occurring naturally, the leading digit distribution of all fictitious and valid transactions (mixed together) will no longer follow Benford's law. Furthermore, assume many of these fraudulent payments have 2 as the leading digit, such as 29, 232 or 2,187. By using the Benford Law indicator to analyze such data, you should see the amounts that have the leading digit 2 occur more frequently than the usual occurrence pattern of 17%. When using the Benford Law indicator, it is advised to: • make sure that the numerical data you analyze do not start with 0 as Benford's law expects the leading digit to range only from 1 to 9. This can be verified by using the number > Integer values pattern on the column you analyze. • check the order of magnitude of the data either by selecting the min and max value indicators or by using the Order of Magnitude indicator you can import from Talend Exchange. This is because Benford's law tends to be most accurate when values are distributed across multiple orders of magnitude. For further information about importing indicators from Talend Exchange, see Importing user-defined indicators from Talend Exchange on page 253. In the result chart of the Benford Law indicator, digits 1 through 9 are represented by bars and the height of the bar is the percentage of the first-digit frequency distribution of the analyzed data. The dots represent the expected first-digit frequency distribution according to Benford's law. Below is an example of the results of an analysis after using the Benford Law indicator and the Order of Magnitude user-defined indicator on a total_sales column.


The first chart shows that the analyzed data varies over 5 orders of magnitude, that is there are 5 digits between the minimal value and maximal value of the numerical column. The second chart shows that the actual distribution of the data (height of bars) does not follow Benford's law (dot values). There are large differences between the frequency distribution of the sales figures and the expected distribution according to Benford's law. For example, the usual occurrence pattern for sales figures that start with 1 is 30%, but those figures in the analyzed data represent only 20%. Some fraud could be suspected here: sales figures may have been modified by someone, or some data may be missing. Below is another example of the result chart of a column analysis after using the Benford Law indicator (the code sketch that follows illustrates how the expected values are computed).
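A minimal Java sketch that recomputes Benford's expected frequencies and compares them with the observed leading digits of a list of amounts (the sample amounts are invented for illustration):

public class BenfordSketch {
    public static void main(String[] args) {
        // Invented sample amounts, for illustration only.
        double[] amounts = { 29, 232, 2187, 120, 1500, 87, 2300, 45, 210, 990 };

        int[] observed = new int[10];
        for (double amount : amounts) {
            String digits = String.valueOf((long) Math.abs(amount));
            int leading = Character.getNumericValue(digits.charAt(0));
            observed[leading]++;
        }

        for (int d = 1; d <= 9; d++) {
            double expected = Math.log10(1.0 + 1.0 / d);           // Benford's expected frequency
            double actual = (double) observed[d] / amounts.length; // observed frequency in the data
            System.out.printf("digit %d: expected %.1f%%, observed %.1f%%%n",
                    d, expected * 100, actual * 100);
        }
    }
}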


The red bar labeled as invalid means that this percentage of the analyzed data does not start with a digit. The 0 bar represents the percentage of data that starts with 0. Neither case is expected when analyzing columns using the Benford Law indicator, and this is why they are represented in red. For further information about analyzing columns, see Creating a basic analysis on a database column on page 43. Pattern frequency statistics Indicators in this group determine the most and least frequent patterns. Pattern frequency indicators Pattern frequency indicators include Pattern Frequency and Pattern Low Frequency.

Indicator Purpose

Pattern Frequency Computes the number of most frequent records for each distinct pattern.

Pattern Low Frequency Computes the number of less frequent records for each distinct pattern.

The above two indicators give patterns by converting alpha characters to a and numerics to 9. East Asia pattern frequency indicators East Asia pattern frequency indicators include East Asia Pattern Frequency and East Asia Pattern Low Frequency.

Indicator Purpose

East Asia Pattern Frequency Computes the number of most frequent records for each distinct pattern.

East Asia Pattern Low Frequency Computes the number of less frequent records for each distinct pattern.

The above two indicators are available only with the Java engine and handle both Latin and East Asian characters, which makes them useful when you want to identify patterns in Asian data. They give patterns by converting Asian characters to letters such as H, K, C and G, following the rules described in the following table:

Character type Usage

Latin numbers 9 replaces all ASCII digits

Latin lowercase letters a replaces all ASCII Latin characters

Latin uppercase letters A replaces all uppercase Latin characters

Full-width Latin numbers 9 replaces all ASCII digits

Full-width Latin lowercase letters a replaces all ASCII Latin characters

Full-width Latin uppercase letters A replaces all uppercase Latin characters

Hiragana H replaces all Hiragana characters

Half-width Katakana k replaces all half-width Katakana characters



Full-width Katakana K replaces all full-width Katakana characters

Katakana K replaces all Katakana characters

Kanji C replaces Chinese characters

Hangul G replaces Hangul characters
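As an illustration of these rules, a simplified character-level mapping could be sketched in Java as follows (this approximation covers only a few character types and is not the Studio's implementation):

public class EastAsiaPatternSketch {

    static char mapCharacter(char c) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(c);
        if (Character.isDigit(c)) {
            return '9';
        } else if (block == Character.UnicodeBlock.HIRAGANA) {
            return 'H';
        } else if (block == Character.UnicodeBlock.KATAKANA) {
            return 'K';
        } else if (block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS) {
            return 'C';
        } else if (block == Character.UnicodeBlock.HANGUL_SYLLABLES) {
            return 'G';
        } else if (Character.isUpperCase(c)) {
            return 'A';
        } else if (Character.isLowerCase(c)) {
            return 'a';
        }
        return c; // punctuation and anything else is left unchanged in this sketch
    }

    public static void main(String[] args) {
        String address = "東京都1-2-3 Shibuya";
        StringBuilder pattern = new StringBuilder();
        for (char c : address.toCharArray()) {
            pattern.append(mapCharacter(c));
        }
        System.out.println(pattern); // CCC9-9-9 Aaaaaaa
    }
}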

For more information about the supported Asian character types and the related Unicode ranges in column analyses, see the documentation on Talend Help Center (https://help.talend.com). Below is an example of a column analysis using the East Asia Pattern Frequency and East Asia Pattern Low Frequency indicators on an address column.

The analysis results of the East Asia Pattern Low Frequency indicator will look like the following:


These results give the number of the least frequent records for each distinct pattern. Some patterns have characters and numbers, while others have only characters. Patterns also have different lengths, so this shows that the address data is not consistent and you may need to correct and clean it. Word-based pattern indicators Word-based pattern indicators include case-sensitive and case-insensitive indicators. Word-based pattern indicators count the number of records for each distinct pattern and are available only with the Java engine.

Case-sensitive indicators

Indicator Purpose

CS Word Pattern Frequency Evaluates the most frequent word patterns.

CS Word Pattern Low Frequency Evaluates the least frequent word patterns.

Patterns focus on words and are case sensitive:

Pattern Description

[Word] Word starting with an uppercase character and consisting of lowercase characters

[WORD] Word with uppercase characters

[word] Word with lowercase characters

[Char] Single uppercase character

[char] Single lowercase character

[Ideogram] One of the CJK Unified Ideographs

[IdeogramSeq] Sequence of ideograms

[hiraSeq] Sequence of Japanese Hiragana characters

[kataSeq] Sequence of Japanese Katakana characters



[hangulSeq] Sequence of Korean Hangul characters

[digit] One of the Arabic numerals: 0,1,2,3,4,5,6,7,8,9

[number] Sequence of digits

When using the CS Word Pattern Frequency and CS Word Pattern Low Frequency indicators, the following strings are replaced with the following patterns:

String Pattern

A character is NOT a Word [Char] [word] [word] [WORD] [char] [Word]

someWordsINwORDS [word][Word][WORD][char][WORD]

[email protected] [Word][number]@[word].[word]

[email protected] [word][Word][digit]@[word].[word]

88 [Ideogram] [IdeogramSeq][number] 袁 花⽊蘭 Latin2 [Word][digit][IdeogramSeq] 中⽂ Latin3 [Word][digit][kataSeq] フランス Latin4 [Word][digit][hiraSeq] とうきょう Latin5## ## ##### [Word][digit][hangulSeq]
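A simplified sketch of how such case-sensitive word patterns can be produced for plain ASCII text is shown below (illustrative only; it covers the [Word], [WORD], [word], [Char], [char], [digit] and [number] tokens and leaves separators unchanged, ignoring the ideogram and other CJK sequences):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordPatternSketch {

    // Order matters: longer matches (words, numbers) are tried before single characters.
    private static final Pattern TOKEN = Pattern.compile(
            "[A-Z][a-z]+|[A-Z]{2,}|[a-z]{2,}|[A-Z]|[a-z]|[0-9]{2,}|[0-9]");

    static String toPattern(String input) {
        Matcher m = TOKEN.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());          // keep separators (spaces, '@', '.') as-is
            String token = m.group();
            if (token.matches("[A-Z][a-z]+")) out.append("[Word]");
            else if (token.matches("[A-Z]{2,}")) out.append("[WORD]");
            else if (token.matches("[a-z]{2,}")) out.append("[word]");
            else if (token.matches("[A-Z]")) out.append("[Char]");
            else if (token.matches("[a-z]")) out.append("[char]");
            else if (token.matches("[0-9]{2,}")) out.append("[number]");
            else out.append("[digit]");
            last = m.end();
        }
        out.append(input.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPattern("A character is NOT a Word"));
        // prints: [Char] [word] [word] [WORD] [char] [Word]
        System.out.println(toPattern("someWordsINwORDS"));
        // prints: [word][Word][WORD][char][WORD]
    }
}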

Case insensitive indicators

Indicator Purpose

CI Word Pattern Frequency Evaluates the most frequent word patterns.

CI Word Pattern Low Frequency Evaluates the least frequent word patterns.

Patterns focus on words and are case insensitive:

Pattern Description

[word] Word with lowercase characters

[char] Single lowercase character

[Ideogram] One of the CJK Unified Ideographs

[IdeogramSeq] Sequence of ideograms

[hiraSeq] Sequence of Japanese Hiragana characters

[kataSeq] Sequence of Japanese Katakana characters

[hangulSeq] Sequence of Korean Hangul characters



[digit] One of the Arabic numerals: 0,1,2,3,4,5,6,7,8,9

[number] Sequence of digits

[alnum] Alphanumeric value consisting of characters and Arabic numerals

When using the CI Word Pattern Frequency and CI Word Pattern Low Frequency indicators, the following strings are replaced with the following patterns:

String Pattern

A character is NOT a Word [char] [word] [word] [word] [char] [word]

someWordsINwORDS [word]

[email protected] [alnum]@[word].[word]

[email protected] [alnum]@[word].[word]

88 [Ideogram] [IdeogramSeq][number] 袁 花⽊蘭 Latin2 [word][digit][IdeogramSeq] 中⽂ Latin3 [word][digit][kataSeq] フランス Latin4 [word][digit][hiraSeq] とうきょう Latin5## ## ##### [word][digit][hangulSeq]

Phone number statistics Indicators in this group count phone numbers. They return the count for each phone number format. They validate the phone formats using the org.talend.libraries.google.libphonumber library. • Valid phone number count: computes the valid phone numbers. • Possible phone number count: computes the supposedly valid phone numbers. • Valid region code number count: computes phone numbers with a valid region code. • Invalid region code count: computes phone numbers with an invalid region code. • Well formed national phone number count: computes well formatted national phone numbers. • Well formed international phone number count: computes the international phone numbers that respect the international phone format (phone numbers that start with the country code). • Well formed E164 phone number count: computes the international phone numbers that respect the international phone format (a maximum of fifteen digits written with a + prefix). • Format Phone Number Frequency: shows the results of the phone number count in a pie chart divided into sectors. Simple statistics They provide simple statistics on the number of records falling in certain categories including the number of rows, the number of null values, the number of distinct and unique values, the number of duplicates, or the number of blank fields.


• Blank Count: counts the number of blank rows. A "blank" is non-null textual data that contains only white space. Note that Oracle does not distinguish between the empty string and the null value. • Default Value Count: counts the number of default values. • Distinct Count: counts the number of distinct values of your column. • Duplicate Count: counts the number of values appearing more than once. You have the relation: Duplicate count + Unique count = Distinct count. For example, a,a,a,a,b,b,c,d,e => 9 values, 5 distinct values, 3 unique values, 2 duplicate values. • Null Count: counts the number of null rows. • Row Count: counts the number of rows. • Unique Count: counts the number of distinct values with only one occurrence. It is necessarily less than or equal to the Distinct count. Soundex frequency statistics Indicators in this group use the Soundex algorithm built into the DBMS. They index records by sounds. This way, records with the same pronunciation (only English pronunciation) are encoded to the same representation so that they can be matched despite minor differences in spelling. • Soundex Frequency: computes the number of most frequent distinct records relative to the total number of records having the same pronunciation. • Soundex Low Frequency: computes the number of less frequent distinct records relative to the total number of records having the same pronunciation.

Note: Due to some limitation in Teradata soundex implementation, you may not be able to drill down the results of profiling Teradata with this indicator. For further information, see the documentation on this Teradata error on Talend Help Center (https://help.talend.com).
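The grouping principle can be reproduced outside a database with any Soundex implementation, for example the one in Apache Commons Codec (using that library here is an assumption made for illustration; the indicators themselves rely on the SOUNDEX function built into the DBMS):

import org.apache.commons.codec.language.Soundex;

public class SoundexSketch {
    public static void main(String[] args) {
        Soundex soundex = new Soundex();
        // Names that sound alike receive the same code, so a Soundex frequency
        // indicator counts them in the same group despite the spelling differences.
        System.out.println(soundex.encode("Robert")); // R163
        System.out.println(soundex.encode("Rupert")); // R163
        System.out.println(soundex.encode("Smith"));  // S530
        System.out.println(soundex.encode("Smyth"));  // S530
    }
}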

Summary statistics They perform statistical analyses on numeric data, including the computation of location measures such as the median and the average, and the computation of statistical dispersions such as the interquartile range and the range. • Mean: computes the average of the records. • Median: computes the value separating the higher half of a sample, a population, or a probability distribution from the lower half. • Interquartile range: computes the difference between the third and first quartiles. • Lower quartile (First quartile): computes the first quartile, that is the value below which the lowest 25% of the data fall. • Upper quartile (Third quartile): computes the third quartile, that is the value above which the highest 25% of the data fall. • Range: computes the difference between the maximum and minimum values. When using the summary statistics indicators to profile a DB2 database, analysis results could be slightly different between Java and SQL engines. This is because indicators are computed differently depending on the database type, and also Talend uses special functions when working with Java. Text statistics You can use the text statistics indicators to analyze columns only if their data mining type is set to nominal in the analysis editor. Otherwise, these statistics will be grayed out in the Indicator Selection dialog box. For further information on the available data mining types, see Data mining types on page 90.


Text statistics analyze the characteristics of textual fields in the columns, including minimum, maximum and average length. • Minimal Length: computes the minimal length of a non-null and non-empty text field. • Maximal Length: computes the maximal length of a non-null and non-empty text field. • Average Length: computes the average length of a non-null and non-empty field. Other text indicators are available to count each of the above indicators with null values, with blank values or with null and blank values. Null values will be counted as data of 0 length, that is to say the minimal length of null values is 0. This means that the Minimal Length With Null and the Maximal Length With Null will compute the minimal/maximal length of a text field including null values, that are considered to be 0-length text. Blank values will be counted as regular data of 0 length, that is to say the minimal length of blank values is 0. This means that the Minimal Length With Blank and the Maximal Length With Blank will compute the minimal/maximal length of a text field including blank values. The same will be applied for all average indicators. For example, compute the length of textual fields in a column containing the following values, using all different types of text statistic indicators:

Value Number of characters

"Brayan" 6

"Ava" 3

"_" 1

"" 0

"______" 10

Note: "_" represents a space character.

The results are as follows:

Managing system indicators

System indicators are predefined but editable indicators that you can use on analyses. For more information on the system indicators available in the studio, see Indicator types on page 222.


Editing a system indicator Although system indicators are predefined indicators, you can open their editors to check or to edit their settings and their definitions in order to adapt them to a specific database version or to a specific need, for example. However, you cannot edit the name of the system indicator.

Note: When you edit an indicator, you modify the indicator listed in the DQ Repository tree view. Make sure that your modifications are suitable for all analyses that may be using the modified indicator.

Procedure 1. In the DQ Repository tree view, expand Libraries > Indicators, and then browse through the indicator lists to reach the indicator you want to modify. 2. Right-click the indicator name and select Open from the contextual menu.

The indicator editor opens displaying the selected indicator parameters.


3. Modify the indicator metadata. You can edit any metadata field of the system indicator except the indicator name, which is not editable. 4. Click Indicator Definition. In this view, you can edit the indicator definition, change the selected database and add other indicators specific to available databases using the [+] button at the bottom of the editor. 5. Click the save icon on top of the editor to save your changes. If the indicator is simple enough to be used in all databases, select Default in the database list.


Exporting or importing system indicators You can export system indicators to folders or archive files and import them again in the studio on the condition that the export and import operations are done in compatible versions of the Studio. For further information, see Exporting data profiling items on page 270 and Importing data profiling items on page 267. Duplicating a system indicator To avoid creating a system indicator from scratch, you can duplicate an existing one in the indicator list. Once the copy is created, you can modify its metadata to obtain a new indicator and use it in data profiling analyses.

Procedure 1. In the DQ Repository tree view, expand Libraries > Indicators. 2. Browse through the indicator lists to reach the indicator you want to duplicate, right-click its name and select Duplicate from the contextual menu.

Results The indicator is duplicated as a user-defined indicator and is displayed under the User Defined Indicators folder in the DQ Repository tree view.

What to do next The indicator category of the duplicated Fraud Detection and Soundex indicators is User Defined Frequency. Thus, two columns are expected in the result set of the analysis that uses these duplicated indicators, whereas four columns are expected when using the original system indicators. To be able to use the duplicated Fraud Detection and Soundex indicators in data profiling analyses, you need to edit the indicator definition or create new indicators. For more information on editing user-defined indicators, see Editing a user-defined indicator on page 256.

Managing user-defined indicators

User-defined indicators, as their name indicates, are indicators created by the user. You can use these indicators to analyze columns through a simple drag-and-drop operation from the DQ Repository tree view to the columns listed in the editor.


The management options available for user-defined indicators include: create, export and import, edit and duplicate. Creating SQL user-defined indicators You can create your own personalized indicators from the studio.

Note: Management processes for user-defined indicators are the same as those for system indicators.

Defining the indicator

Procedure 1. In the DQ Repository tree view, expand Libraries > Indicators. 2. Right-click User Defined Indicators.

3. Select New Indicator from the contextual menu. The New Indicator wizard is displayed.


4. In the Name field, enter a name for the indicator you want to create.

Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

5. Optional: Set other metadata (purpose, description and author name) in the corresponding fields and click Finish.

Results The indicator editor opens displaying the metadata of the user-defined indicator.


Setting the indicator definition and category

Procedure 1. Click Indicator Category and select from the list a category for the indicator. The selected category determines the columns expected in the result set of the analysis that uses the user-defined indicator. The table below explains available categories.

Indicator category Description Expected query results

User Defined Match Evaluates the number of data matching a condition. The result set should have one row and two columns. The first column contains the number of values that match and the second column the total count.

User Defined Frequency Evaluates the frequency of records using user-defined indicators for each distinct record. The result set should have 0 or more rows and two columns. The first column contains a value and the second the frequency (count) of this value.

User Defined Real Value Evaluates a real function of data. The result set should have one row and one column that contain a real value.

User Defined Count (by-default category) Analyzes the quantity of records and returns a row count. The result set should have one row and one column that contain the row count.

2. Click Indicator Definition and then click the [+] button. 3. From the Database list, select a database on which to use the indicator. If the indicator is simple enough to be used in all databases, select Default in the database list. 4. Enter the database version in the Version field. 5. Define the SQL statement for the indicator you want to create: a) Click the Edit... button next to the SQL Template field. The Edit Expression dialog box opens.

b) In the Indicator Definition view, enter the SQL expression(s) you want to use in matching and analyzing data. You can drop templates from the templates list to complete the expression.

Example Set the expression to measure the maximal length of the values in a column as shown in the above capture. This view may have several input fields, one for each column expected by indicator category. For example, if you select the User Defined Count category, you will have only a Where Expression field; while if you select the User Defined Match category, you will have two fields: Matching Expression and Where Expression.


The SQL expressions are automatically transformed into a complete SQL template in the Full SQL Template view.

Also, the SQL expressions are automatically transformed into templates to view rows/values. Different tabs are available in the dialog box depending on what indicator category is selected. If you edit the SQL expression(s) in the Indicator Definition view, the templates will be updated accordingly in the other tabs. c) Use the Reset button to revert all templates according to the content of the Indicator Definition tab. d) Click OK. The dialog box is closed and the SQL template is displayed in the indicator editor. e) Use the [+] button and follow the same steps to add as many indicator definitions as needed.

Note: You do not need to define any parameters in the Indicator Parameters view when the user-defined indicator contains only SQL templates. These parameters are used only when indicators have a Java implementation. For further information, see Defining Java user-defined indicators on page 239.

6. Click the save icon on top of the editor.

Results The indicator is listed under the User Defined Indicators folder in the DQ Repository tree view. You can use this indicator to analyze columns through a simple drag-and-drop operation from the DQ Repository tree view to the columns listed in the editor. If an analysis with a user-defined indicator runs successfully at least once and the indicator definition template for the database is later deleted, the analysis does not fail. It keeps running successfully because it uses the previously generated SQL query. Defining Java user-defined indicators You can create your own personalized Java indicators from Talend Studio. Management processes for Java user-defined indicators are the same as those for system indicators.


You can also import a ready-to-use Java user-defined indicator from the Exchange folder in the DQ Repository tree view. This Java user-defined indicator connects to the mail server and checks if the email exists. For further information on importing indicators from Talend Exchange, see Importing user-defined indicators from Talend Exchange on page 253. Creating Java user-defined indicators Defining the custom indicator

Procedure 1. In the DQ Repository tree view, expand Libraries > Indicators. 2. Right-click User Defined Indicators.

3. Select New Indicator from the contextual menu. The New Indicator wizard is displayed.


4. In the Name field, enter a name for the Java indicator you want to create.

Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

5. If required, set other metadata (purpose, description and author name) in the corresponding fields and click Finish. The indicator editor opens displaying the metadata of the user-defined indicator.


Setting the definition and category of the custom indicator

Procedure 1. Click Indicator Category and select from the list a category for the Java indicator. The selected category determines the columns expected in the result set of the analysis that uses this indicator. The table below explains available categories.

Indicator category Description Expected query results

User Defined Match Evaluates the number of data matching a condition. The result set should have one row and two columns. The first column contains the number of values that match and the second column the total count.

User Defined Frequency Evaluates the frequency of records using user-defined indicators for each distinct record. The result set should have 0 or more rows and two columns. The first column contains a value and the second the frequency (count) of this value.

User Defined Real Value Evaluates real functions of data. The result set should have one row and one column that contain a real value.

User Defined Count Analyzes the quantity of records and returns a row count. The result set should have one row and one column that contain the row count.

2. Click Indicator Definition and then click the [+] button.


3. From the Database list, select Java. 4. Enter the Java class in the Java Class field.

Note: Make sure that the class name includes the package path. If this string parameter is not correctly specified, an error message is displayed when you try to save the Java user-defined indicator.

5. Select the Java archive holding the Java class: • Click the Edit... button. The UDI Selector dialog box opens.

• In the Select libraries view, select the check box of the archive holding the Java class and then select the class in the bottom panel of the wizard. • Click OK. The dialog box is closed and the Java archive is displayed in the indicator editor. You can add or delete Java archives from the Manage Libraries view of this dialog box. For more information on creating a Java archive, see Creating a Java archive for the user-defined indicator on page 244. 6. Click Indicator Parameters to open the view where you can define parameters to retrieve parameter values while coding the Java indicator.


You can retrieve parameter values with code similar to the following, which retrieves the EMAIL_PARAM parameter:

// Check prerequisite
IndicatorParameters param = this.getParameters();
if (param == null) {
    log.error("No parameter set in the user defined indicator " + this.getName()); //$NON-NLS-1$
    return false;
}
Domain indicatorValidDomain = param.getIndicatorValidDomain();
if (indicatorValidDomain == null) {
    log.error("No parameter set in the user defined indicator " + this.getName()); //$NON-NLS-1$
    return false;
}

// else retrieve email from parameter
EList<JavaUDIIndicatorParameter> javaUDIIndicatorParameter = indicatorValidDomain.getJavaUDIIndicatorParameter();
for (JavaUDIIndicatorParameter p : javaUDIIndicatorParameter) {
    if (EMAIL_PARAM.equalsIgnoreCase(p.getKey())) {
        // use the parameter value here (the original sample continues at this point;
        // see the full example linked below)
    }
}

For a more detailed sample of the use of parameters in a Java user-defined indicator, check https://github.com/Talend/tdq-studio-se/tree/master/sample/test.myudi/src/main/java/org/talend/dataquality/indicator/userdefine/email. 7. Click the [+] button at the bottom of the table and define in the new line the parameter key and value. You can edit these default parameters or even add new parameters any time you use the indicator in a column analysis. To do this, click the indicator option icon in the analysis editor to open a dialog box where you can edit the default parameters according to your needs or add new parameters. 8. Click the save icon on top of the editor. The indicator is listed under the User Defined Indicators folder in the DQ Repository tree view. You can use this indicator to analyze columns through a simple drag-and-drop operation from the DQ Repository tree view to the columns listed in the editor. Creating a Java archive for the user-defined indicator Before creating a Java archive for the user defined indicator, you must define, in Eclipse, the target platform against which the workspace plug-ins will be compiled and tested.


Procedure 1. Define the target platform: a) In Eclipse, select Preferences to display the Preferences window.

b) Expand Plug-in Development and select Target Platform then click Add... to open a view where you can create the target definition.


c) Select the Nothing: Start with an empty target definition option and then click Next to proceed to the next step.

d) In the Name field, enter a name for the new target definition and then click the Add... button to proceed to the next step.


e) Select Installation from the Add Content list and then click Next to proceed to the next step.

f) Use the Browse... button to set the path of the installation directory and then click Next to proceed to the next step. The new target definition is displayed in the location list.


g) Click Finish to close the dialog box. 2. Create a Java archive for the user-defined indicator: a) In Eclipse, check out the project from GIT at https://github.com/Talend/tdq-studio-se/tree/master/sample/test.myudi. In this Java project, you can find four Java classes that correspond to the four indicator categories listed in the Indicator Category view in the indicator editor.


Each one of these Java classes extends the UserDefIndicatorImpl indicator. The figure below illustrates an example using the MyAvgLength Java class.

package test.udi;

import org.talend.dataquality.indicators.sql.impl.UserDefIndicatorImpl;

/**
 * @author mzhao
 *
 * A very simple example of a java implementation of a user defined indicator. This indicator
 * returns a user defined real value. It implements the minimum number of required methods.
 */
public class MyAvgLength extends UserDefIndicatorImpl {

    private double length = 0;

    @Override
    public boolean reset() {
        super.reset();
        length = 0;
        return true;
    }

    @Override
    public boolean handle(Object data) {
        super.handle(data);
        // an indicator which computes the average text length on data which are more than 2 characters
        // (this means that text values with less than 2 characters are not taken into account).
        int dataLength = (data != null) ? data.toString().length() : 0;
        if (dataLength > 2) {
            length += dataLength;
        }
        return true;
    }

    /*
     * (non-Javadoc)
     *
     * @see org.talend.dataquality.indicators.impl.IndicatorImpl#finalizeComputation()
     */
    @Override
    public boolean finalizeComputation() {
        value = String.valueOf(this.length / (this.getCount() - this.getNullCount()));
        return super.finalizeComputation();
    }

}

b) Modify the code of the methods that follow each @Override according to your needs.
c) Optional: If required, use the following methods in your code to retrieve the indicator parameters:

Method  Description

Indicator.getParameters()  Returns an IndicatorParameters object

IndicatorParameters.getIndicatorValidDomain()  Returns a Domain object

Domain.getJavaUDIIndicatorParameter()  Returns a list of JavaUDIIndicatorParameter objects, each of which stores a key/value pair that defines a parameter

A short sketch showing how these methods can be combined is given after this procedure.
d) Save your modifications.


e) Using Eclipse, export this new Java archive.
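For reference, the following minimal fragment shows how the parameter-retrieval methods listed in step c fit together. It is only an illustrative sketch meant to be placed inside a user-defined indicator class such as the sample above; the parameter key MY_PARAM is a made-up example, and the getValue() accessor used to read the parameter value is an assumption, so adapt the names to the parameters you actually define in the indicator editor.

// Illustrative fragment only: retrieve the value of an assumed parameter key "MY_PARAM".
IndicatorParameters params = this.getParameters();              // Indicator.getParameters()
if (params != null) {
    Domain validDomain = params.getIndicatorValidDomain();      // IndicatorParameters.getIndicatorValidDomain()
    if (validDomain != null) {
        // Domain.getJavaUDIIndicatorParameter() returns the key/value pairs defined in the editor.
        for (JavaUDIIndicatorParameter p : validDomain.getJavaUDIIndicatorParameter()) {
            if ("MY_PARAM".equalsIgnoreCase(p.getKey())) {
                String myParamValue = p.getValue();              // assumed accessor for the parameter value
                // use myParamValue in your indicator logic
            }
        }
    }
}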

Results The Java archive is now ready to be attached to any Java indicator you want to create in the Profiling perspective of Talend Studio.

Exporting user-defined indicators

You can export user-defined indicators to archive files or to Talend Exchange to share them with other users.

Exporting user-defined indicators to an archive file

You can export user-defined indicators and store them locally in an archive file using the Export Item option on the studio toolbar. For further information on how to export indicators, see Exporting data profiling items on page 270.

Exporting user-defined indicators to Talend Exchange

You can export user-defined indicators from your current Talend Studio to Talend Exchange where you can share them with other users. The exported indicators are saved as .xmi files on the exchange server.

Before you begin At least one user-defined indicator is created in the Profiling perspective of Talend Studio.

Procedure 1. In the DQ Repository tree view, expand Libraries > Indicators. 2. Right-click the User Defined Indicator folder and select Export for Talend Exchange. The Export for Talend Exchange wizard is displayed.


3. Browse to the folder where you want to save the indicators. 4. Select the check boxes of the indicators you want to export to the specified folder. Select the Browse check box to list only the indicators you want to export. 5. Click Finish. The .xmi file of each selected indicator is saved as a zip file in the defined folder. 6. Upload the zip files to Talend Exchange at https://exchange.talend.com/. Create an account on the Talend Exchange server if you do not have one already.

Importing user-defined indicators

You can import user-defined indicators from a local archive file or from Talend Exchange into your studio and use them on your column analyses.

Importing user-defined indicators from an archive file

Using the Import Item option on the studio toolbar, you can import user-defined indicators stored locally in an archive file and use them on your column analyses. For further information on how to import indicators, see Importing data profiling items on page 267.

A warning icon next to the name of the imported indicator in the tree view indicates that it is not correct. You must open the indicator and try to figure out what is wrong.

Importing user-defined indicators from a csv file (deprecated feature)

This section describes a deprecated feature that is still available to provide backward compatibility. You can import indicators stored locally in a csv file to use them on your column analyses.

Before you begin You have selected the Profiling perspective of Talend Studio. The csv file is stored locally.

Procedure 1. In the DQ Repository tree view, expand Libraries > Indicators. 2. Right-click User Defined Indicators and select Import Indicators. The Import Indicators wizard opens.


3. Browse to the csv file holding the user-defined indicators. 4. In the Duplicate Indicators handling area, select:

Option  To...

skip existing indicators  import only the indicators that do not exist in the corresponding lists in the DQ Repository tree view. A warning message is displayed if the imported indicators already exist under the Indicators folder.

rename new indicators with suffix  identify each of the imported indicators with a suffix. All indicators will be imported even if they already exist under the Indicators folder.

5. Click Finish. All imported indicators are listed under the User Defined Indicators folder in the DQ Repository tree view.

Results

Note: A warning icon next to the name of the imported user-defined indicator in the tree view indicates that it is not correct. You must open the indicator and try to figure out what is wrong.


Importing user-defined indicators from Talend Exchange

You can import the .xmi files of user-defined indicators created by other users and stored on the Talend Exchange server into your current Talend Studio and use them on your column analyses. You can import only versions that are compatible with the version of your current Talend Studio. The indicators you can import from Talend Exchange include, for example: • Order of Magnitude: computes the number of digits between the minimal value and the maximal value of a numerical column. • Email validation via mail server: this Java user-defined indicator connects to a mail server and checks whether the email address exists.

Before you begin You have selected the Profiling perspective of Talend Studio. Your network is up and running.

About this task The following procedure shows how to import the Email validation via mail server indicator from the exchange server into Talend Studio.

Procedure 1. In the DQ Repository tree view, expand Libraries > Exchange. If you have connection problems, you will not have access to any of the items under the Exchange node. A pop-up will prompt you to check your internet connection. 2. Under Exchange, expand indicator and right-click the name of the indicator you want to import, a Java user-defined indicator in this example.

3. Select Import in DQ Repository. You will have access only to versions that are compatible with the version of your current Talend Studio. The Import From Talend Exchange wizard opens.


4. Select the Overwrite existing items check box if some error and warning messages are listed in the Error and Warning area. This means that an indicator with the same name already exists in the current studio. The imported indicator will replace the one in Talend Studio. 5. Click Finish. The user-defined indicator is imported from Talend Exchange and listed under the User Defined Indicators folder in the DQ Repository tree view. Before being able to use this indicator on column analyses to check emails by sending an SMTP request to the mail server, you must define the indicator parameters as follows: 6. Double-click the indicator to open the indicator editor.


This Java user-defined indicator has three parameters set by default: • a buffer size that gives the number of invalid email addresses stored in memory before they are saved in a file. • a file path to the list of invalid email addresses. • the email of the sender. 7. Modify the values of the BUFFER SIZE and INVALID DATA FILE according to your needs. 8. In the Parameters Value column, set the value for the EMAIL parameter, that is the address of the sender on the SMTP server. 9. Save the indicator.


Note: If you have an error message when running a column analysis with this indicator, please check your email server configuration.

You can also use the studio to create an SQL user-defined indicator or a Java user-defined indicator from scratch. For further information, see Creating SQL user-defined indicators on page 235 and Defining Java user-defined indicators on page 239 respectively.

Editing a user-defined indicator

You can open the editor of any system or user-defined indicator to check its settings and/or edit its definition and metadata in order to adapt it to a specific database type or need, if required.

Note: When you edit an indicator, make sure that your modifications are suitable for all analyses that may be using the modified indicator.

Before you begin At least one user-defined indicator is created in the Profiling perspective of Talend Studio.

Procedure 1. In the DQ Repository tree view, expand Libraries > Indicators > User Defined Indicators, and then browse through the indicator lists to reach the indicator you want to modify the definition of. 2. Right-click the indicator name and select Open from the contextual menu. The indicator editor opens displaying the selected indicator settings.


3. Modify the indicator metadata, if required. 4. Click Indicator Category to open the view and select an indicator category from the list: Indicator category Description

User Defined Match (default category) Uses user-defined indicators to evaluate the number of data records that match a regular expression or an SQL pattern. The analysis results show the record matching count and the record total count.

User Defined Frequency Uses user-defined indicators to evaluate, for each distinct data record, the frequency of records that match a regular expression or an SQL pattern. The analysis results show the distinct count, giving a label and a label-related count.

User Defined Real Value Uses user-defined indicators which return a real value to evaluate any real function of the data.

User Defined Count Uses user-defined indicators that return a row count.


If you change the indicator category, you can keep the existing indicator definition or remove the indicator definitions for all databases but the one for Java. 5. Click Indicator Definition to open the view. Set the indicator definition for one or more databases using the [+] button. If the indicator is simple enough to be used in all databases, select Default in the list. 6. Click Indicator Parameter and modify the indicator parameters as needed.

7. To add new default parameters, click the [+] button at the bottom of the table, click in the lines and define the parameter keys and values. 8. Click the save icon on top of the editor to save your changes.

Indicator parameters

This section describes indicator parameters displayed in the different Indicators Settings dialog boxes.

Bins Designer

Possible value Description

Minimal value Beginning of the first bin.

Maximal value End of the last bin.

Number of bins Number of bins.

Blank Options

Possible value Description

Aggregate nulls with blanks When selected, null data is counted as zero length text field. This means that null data is treated as an empty string. When not selected, null data is treated as any other text data.



Aggregate blanks When selected, blank texts (e.g. " ") are all grouped together and considered as an empty string. When not selected, blank texts are treated as any other text data.

Note: In Oracle, empty strings and null strings are the same objects. Therefore, you must select or clear both check boxes in order to get consistent results.

Data Thresholds

Possible value Description

Lower threshold Data smaller than this value should not exist.

Upper threshold Data greater than this value should not exist.

Value Frequency Parameters

Possible value Description

Number of results shown Number of displayed results.

Indicator Thresholds

Possible value Description

Lower threshold Lower threshold of matching indicator values.

Upper threshold Higher threshold of matching indicator values.

Lower threshold(%) Lower threshold of matching indicator values in percentage relative to the total row count.

Upper threshold(%) Higher threshold of matching indicator values in percentage relative to the total row count.

Expected value Only for the Mode indicator in the Advanced Statistics. Most probable value that should exist in the selected column.

Java Options

Possible value Description

Characters to replace List of the characters to be replaced.

Replacement characters List of the characters that will take the place of the replaced characters.

Note: Each character of the first field will be replaced by the character at the same position from the second field. For example, with the values "abc0123ABC,;.:" in the first field and "aaa9999AAApppp" in the second field any "a", "b" or "c" will be replaced by "a" and any "0", "1", "2" or "3" will be replaced by "9".
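To make the rule concrete, here is a small, self-contained Java sketch that reproduces the positional replacement described in the note above. It is only an illustration of the rule, not Talend's implementation; the field values are the ones used in the example.

public class ReplacementRuleDemo {

    // Replace each character found in charsToReplace by the character at the same
    // position in replacementChars; characters not listed are kept unchanged.
    static String applyRule(String input, String charsToReplace, String replacementChars) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            int pos = charsToReplace.indexOf(c);
            out.append(pos >= 0 ? replacementChars.charAt(pos) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Values taken from the example in the note above.
        String charsToReplace = "abc0123ABC,;.:";
        String replacementChars = "aaa9999AAApppp";
        System.out.println(applyRule("cab-321", charsToReplace, replacementChars)); // prints "aaa-999"
    }
}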


Phone number

Possible value Description

Country Country ISO2 code of the phone number.

Text Parameters

Possible value Description

Ignore case When selected, comparison of text data is not case sensitive.

Text Length

Possible value Description

Count nulls When selected, null data is counted as zero length text field.

Count blanks When selected, blank texts (e.g. " ") are counted as zero length text fields.

Other management procedures

Working with data quality referenced projects

Project referencing is a property you can set for a project in Talend Studio so that the project can be referenced by another project. For more information about how to establish a connection between projects, see Defining project references.

Creating and storing SQL queries

From Talend Studio, you can query and browse a selected database using the SQL Editor and then store these SQL queries under the Source Files folder in the DQ Repository tree view. You can then open the SQL Editor on any of these stored queries to rename, edit or execute the query.

Before you begin

Procedure 1. In the DQ Repository tree view, expand Libraries.


2. Right-click Source Files and select New SQL File from the contextual menu. The New SQL File dialog box is displayed. 3. In the Name field, enter a name for the SQL query you want to create and then click Finish to proceed to the next step. The SQL Editor opens on the new SQL query.

If the Connections view is not open, use the combination Window > Show View > Data Explorer > Connections to open it. 4. Enter your SQL statement in the SQL Editor. 5. From the Choose Connection list, select the database you want to run the query on.


6. On the SQL Editor toolbar, click the run icon to execute the query on the defined base table(s). Data rows are retrieved from the defined base table(s) and displayed in the editor.

A file for the new SQL query is listed under Source Files in the DQ Repository tree view.

7. Right-click an SQL file and from the contextual menu select: Option To...

Open open the selected Query file

Duplicate create a copy of the selected Query file


Option To...

Rename SQL File open a dialog box where you can edit the name of the query file

Open in Data Explorer open in the data explorer the SQL editor on the selected query file

Delete delete the query file

Note: The deleted item will go to the Recycle Bin in the DQ Repository tree view. You can restore or delete such an item via a right-click on the item. You can also empty the recycle bin via a right-click on it.

Note: • When you open a query file in the SQL Editor, make sure to select the database connection from the Choose Connection list before executing the query. Otherwise the run icon on the editor toolbar will be unavailable. • When you create or modify a query in a query file in the SQL Editor and try to close the editor, you will be prompted to save the modifications. The modifications will not be taken into account unless you click the save icon on the editor toolbar.

Using context variables in analyses

You can use context variables in the analysis editor to filter the data on which to run the analysis, or to decide the number of concurrent connections to the database allowed by the analysis. The Profiling perspective of Talend Studio allows you to define multiple contexts for profiling analyses and select specific context values with which to execute an analysis. You can define context parameters in the analysis editor to: • filter data using context variables with WHERE clauses and decide the data or tables/columns on which to run the analysis, • decide the number of concurrent connections allowed by the analysis. You usually set this number according to the database available resources, that is, the number of concurrent connections each database can support.

Note: You can use context variables only when you profile databases. Context variables cannot be used when profiling files.

Creating one or multiple contexts for the same analysis

You can create one or several contexts for a database analysis and select specific context values with which to execute the analysis. Defining context variables in an analysis enables you to run the analysis on different data using the same connection.


Defining contexts

Before you begin You have already created a database analysis in the Profiling perspective of Talend Studio.

Procedure 1. Double-click the database analysis to open the analysis editor. 2. Select Window > Show View > Profiling > Contexts to open the context view in the Profiling perspective. 3. Select the default context and click Edit to rename it, Prod in this example. Click OK.

Defining variables

Before you begin You have already created a database analysis in the Profiling perspective of Talend Studio.

Procedure 1. Click the [+] button at the bottom of the Contexts view to add lines in the table.

Suppose that you want to profile a postal_code column and analyze the postal codes that start with 15 in the development environment and with 75 in the production environment, and that you want to allow different concurrent connections per analysis in the two environments. 2. Click in the Name field and enter the name of the variable you are creating. Name the first variable where_clause for this example. 3. Repeat the above steps to define all the variables for the different contexts. In this example, set the value of the where_clause variable in the Test context to postal_code like '15%', and for the Prod context to postal_code like '75%'. Also set the value for the concurrent connections per analysis in the development and production environments to three and five respectively.

Selecting the context with which to run the analysis

Before you begin

At least one context has been created in the analysis editor. For further information, see Creating one or multiple contexts for the same analysis on page 263.

Procedure 1. Double-click the analysis to open the analysis editor. 2. In the Data Filter view and in the Where box, either: • enter context. followed by the name of the context variable you want to use to run the analysis, or • press Ctrl + space to open the list of context variables created in the Profiling perspective and select a variable of your choice. This list will be empty until you define context variables in the Contexts view in the analysis editor.

In this example, type context.where_clause in the Where box. You can use only one context variable to run the analysis; multiple variables are not supported. 3. In the Analysis Parameters view and in the Number of connection per analysis field, either: • enter context. followed by the name of the context variable you want to use, or • press Ctrl + space to open the list of context variables. This list will be empty until you define context variables in the Contexts view in the analysis editor.


For further information about defining context variables, see Creating one or multiple contexts for the same analysis on page 263. 4. In the Context Group Settings view, select from the list the context environment you want to use to run the analysis, prod for this example. The variables of the selected context will show in the table.

All context environments you create in the Contexts view in the analysis editor will be accessible from this list. 5. Run the analysis.

6. Right-click Row Count in the table and select View rows. The SQL Editor opens listing only the data rows where postal code starts with 75.


Importing data profiling items

You can import data profiling items, including analyses, database connections, patterns and indicators, into your current Talend Studio from various projects or different versions of Talend Studio. You cannot import an item without all its dependencies. When you import an analysis, for example, all its dependencies, such as a metadata connection and the patterns and indicators used in this analysis, are selected by default and imported with the analysis.

Warning: You cannot import into your current Talend Studio data profiling items created in versions older than 4.0.0. To use such items in your current Talend Studio, you must carry out an upgrade operation. For further information, see Upgrading project items from older versions on page 273.

Before you begin You have access to the root directory of another studio version in which data profiling items have been created.

About this task


Procedure 1. In the Profiling perspective, either: • Right-click anywhere in the DQ Repository tree view and select Import Items. • Click the icon on the toolbar and select Import Items. All editors which are open in the studio are automatically closed. The Import Item wizard is displayed.

2. Select the root directory or the archive file option according to whether the data profiling items you want to import are in the workspace file within the studio directory or are already exported into a zip file.


• If you select the root directory option, click Browse and set the path to the project folder containing the items to be imported within the workspace file of the Talend Studio directory. All items and their dependencies that do not exist in your current Talend Studio are selected by default in the dialog box. • If you select the archive file option, click Browse and set the path to the archive file that holds the data profiling items you want to import. All items and their dependencies that do not exist in your current Talend Studio are selected by default in the dialog box. 3. Select the Overwrite existing items check box if some error and warning messages are listed in the Error and Warning area. This means that items with the same names already exist in the current studio.

The imported items will replace the existing ones. When you import system indicators that are modified in a studio version, they will not overwrite the indicators in the current Talend Studio. All modifications from older versions will be integrated with the system indicators in the current Talend Studio. 4. Select or clear the check boxes of the data profiling items you want or do not want to import according to your needs. All dependencies for the selected item are selected by default. When you clear the check box of an item, the check boxes of the dependencies of this item are automatically cleared as well. Also, an error message will display on top of the dialog box if you clear the check box of any of the dependencies of the selected item. 5. Click Finish to validate the operation. The imported items display under the corresponding folders in the DQ Repository tree view. You can also import local project folders from the login window of your Talend Studio. For further information, see the Getting Started Guide. 6. Do the following to have every item working correctly: a) Run the analyses that have Java as their execution engine. This will compute and store locally the results of the indicators used in the analyses. You cannot open a list of the indicator results in the Analysis Results view in the current Talend Studio without running the analyses first as data is not imported with them from the old Talend Studio. b) Install missing third-party Java libraries or database drivers. When you import database connections for the first time, warning red icons may be docked on the connection names. This is because Talend Studio requires specific third-party Java libraries or database drivers (.jar files) to be installed to connect to sources and targets. Those libraries or drivers, known as external modules, can be required by some connection wizards. Due to


license restrictions, Talend may not be able to ship certain external modules within Talend Studio. For more information about identifying and installing external modules, see the Talend Installation and Upgrade Guide . c) Set the path for the drivers of the SQL Servers (2005 or 2008). If you import SQL Servers (2005 or 2008) connections into your current Talend Studio, a warning red icon is docked on the connection names in the DB connections folder. This indicates that the driver path for these connections is empty. You must open the connection wizard and redefine the connection manually to set the path to a JDBC driver you can download from the Microsoft download center. For further information on editing a database connection, see Opening or editing a database connection on page 294. You can also set the path to a JDBC driver for a group of database connections simultaneously in order not to define them one by one. For further information, see Migrating a group of connections on page 272.

Exporting data profiling items

You can export data profiling items including analyses, database connections, patterns and indicators, etc. from the current instance of the studio to the root directory of another studio or to archive files.

Before you begin At least one data profiling item has been created in Talend Studio.

Procedure 1. Complete one of the following steps: • Right-click anywhere in the DQ Repository tree view and select Export Items; or • click the icon on the toolbar and select Export Items. The Export Item wizard is displayed.


2. Select the root directory or archive file option and then click Browse... and browse to the directory/archive where you want to export the data profiling items. 3. Select the check boxes of the data profiling items you want to export or use the Select All or Deselect All tabs. When you select an analysis check box, all analysis dependencies including the metadata connection and any patterns or indicators used in this analysis are selected by default. Otherwise, if you have an error message on top of the dialog box that indicates any missing dependencies, click the Include dependencies tab to automatically select the check boxes of all items necessary to the selected data profiling analysis. 4. If required, select the Browse check box to show only the selected data profiling items in the export list. 5. Click Finish to validate the operation. A progress bar is displayed to indicate the progress of the export operation and the data profiling items are exported to the defined location.


Migrating a group of connections

You can import database connections from various projects or various versions of the Talend Studio.

Before you begin You have already migrated your database connections from an older version of the Talend Studio as outlined in Importing data profiling items on page 267.

About this task Some of the migrated JDBC connections may have a warning icon docked on their names in the DB connections folder in the DQ Repository tree view. This indicates that the driver path for these connections is empty after migration. Setting the driver path manually for each of the connections could be tedious, especially if you have imported a big number of them. Talend Studio enables you to set the driver path once for all of them. You may download such a driver from the Microsoft download center, for example.

Procedure 1. In the menu bar, select Window > Preferences to display the Preferences window. 2. In the search field, type jdbc and then select JDBC Driver Setting to open the corresponding view.

3. Set the JDBC parameters in the corresponding fields, and then click Apply to connections.... A dialog box is displayed to list all the JDBC connections that do not have the required JDBC driver after migration.


4. Select the check boxes of the connections for which you want to apply the driver settings and then click OK. A confirmation message is displayed. 5. Click OK to close the confirmation message. 6. In the Preferences window, click OK. A confirmation message is displayed. 7. Click OK to close the message and the Preferences window.

Upgrading project items from older versions

The following procedure concerns only the migration of data profiling items from versions older than 4.0.0. To migrate your data profiling items from version 4.0.0 onward, you simply need to import them into your current Talend Studio. For further information, see Importing data profiling items on page 267.

Procedure 1. From the folder of the old version of Talend Studio, copy the workspace file and paste it in the folder of your current Talend Studio. Accept to replace the current workspace file with the old file. 2. Launch Talend Studio connecting to this workspace.

Results The upgrade operation is completed once Talend Studio is completely launched, and you should have access to all your data profiling items. Regarding system indicators during migration, please pay attention to the following: • When you upgrade the repository items to version 4.2 from a prior version, the migration process overwrites any changes you made to the system indicators. • When you upgrade the repository items from version 4.2 to version 5.0 onward, you do not lose any changes you made to the system indicators, the changes are merged.

Tasks

Working with tasks

In the studio, it is possible to add tasks to different items, display the task list and delete any completed task from the task list. You can add tasks to different items either: • in the DQ Repository tree view on connections, catalogs, schemas, tables, columns and created analyses, • or, on columns, or patterns and indicators set on columns directly in the current analysis editor. For example, you can add a general task to any item in a database connection via the Metadata node in the DQ Repository tree view. You can add a more specific task to the same item defined in the context of an analysis through the Analyses node. And finally, you can add a task to a column in an analysis context (also to a pattern or an indicator set on this column) directly in the current analysis editor.


The procedure to add a task to any of these items is exactly the same. Adding tasks to such items will list these tasks in the Tasks list accessible through the Window > Show view... combination. Later, you can open the editor corresponding to the relevant item by double-clicking the appropriate task in the Tasks list.

Adding a task to a column in a database connection

Before you begin A database connection is created in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283.

Procedure 1. In the DQ Repository tree view, expand Metadata > DB Connections. 2. Navigate to the column you want to add a task to, account_id in this example. 3. Right-click the account_id and select Add task... from the contextual menu.

The Properties dialog box opens showing the metadata of the selected column.


4. In the Description field, enter a short description for the task you want to carry out on the selected item. 5. In the Priority list, select the priority level and then click OK to close the dialog box. The created task is added to the Tasks list. For more information on how to access the task list, see Displaying the task list on page 277.

From the task list, you can: • double-click a task to open the editor where this task has been set. • select the task check box once the task is completed in order to be able to delete it. • filter the task view according to your needs using the options in a menu accessible through the drop-down arrow on the top-right corner of the Tasks view. For further information about filtering the task list, see Filtering the task list on page 278.

Adding a task to an item in a specific analysis

The following procedure gives an example of adding a task to a column in an analysis. You can follow the same steps to add tasks to other elements in the created analyses.

Before you begin The analysis has been created in the Profiling perspective of Talend Studio.


Procedure 1. In the DQ Repository tree view, expand Analyses. 2. Expand an analysis and navigate to the item you want to add a task to, the account_id column in this example. 3. Right-click account_id and select Add task... from the contextual menu.

4. Follow the steps outlined in Adding a task to a column in a database connection on page 274 to add a task to account_id in the selected analysis. For more information on how to access the task list, see Displaying the task list on page 277.

Adding a task to an indicator in a column analysis

In the open analysis editor, you can add a task to the indicators set on columns. This task can be used, for example, as a reminder to modify the indicator or to flag a problem that needs to be solved later.

Before you begin • A column analysis is open in the analysis editor in the Profiling perspective of Talend Studio. • At least one indicator is set for the columns to be analyzed.

Procedure 1. In the Analyzed Columns list, right-click the indicator name and select Add task... from the contextual menu.


The Properties dialog box opens showing the metadata of the selected indicator.

2. In the Description field, enter a short description for the task you want to attach to the selected indicator. 3. On the Priority list, select the priority level and then click OK to close the dialog box. The created task is added to the Tasks list. For more information on how to access the task list, see Displaying the task list on page 277.

Displaying the task list

Adding tasks to items will list these tasks in the Tasks list.

Before you begin At least one task is added to an item in the Profiling perspective of Talend Studio.


Procedure 1. On the menu bar of Talend Studio, select Window > Show view.... The Show View dialog box is displayed.

2. Start typing task in the filter field and then select Tasks from the list. 3. Click OK. The Tasks view opens in the Profiling perspective of Talend Studio listing the added task(s).

4. If required, double-click any task in the Tasks list to open the editor corresponding to the item to which the task is attached. You can create different filters for the content of the task list. For further information, see Filtering the task list on page 278.

Filtering the task list

In the Profiling perspective of Talend Studio, the Tasks view lists all the tasks you create in Talend Studio. You can create filters to decide what to list in the task view.

Before you begin At least one task is added to an item in the Profiling perspective of Talend Studio.


Procedure 1. Follow the steps outlined in Displaying the task list on page 277 to open the task list. The Tasks view is displayed.

2. Click the drop-down arrow in the top right corner of the view, and then select Configure contents.... The Configure contents... dialog box is displayed showing the by-default configuration.


3. Click New to open a dialog box and then enter a name for the new filter. 4. Click OK to close the dialog box. The new filter is listed in the Configurations list.


5. Set the different options for the new filter as follows: a) From the Scope list, select a filter scope option, and then click Select... to open a dialog box where you can select a working set for your filter. b) Select whether you want to display completed tasks, uncompleted tasks, or both. c) Select whether to display tasks according to their priority or according to the text they contain. d) Finally, select the check boxes of the task types you want to list. 6. Click OK to confirm your changes and close the dialog box. The task list shows only the tasks that conform to the new filter options.


Deleting a completed task

When a task goal is met, you can delete this task from the Tasks list after labeling it as completed.

Before you begin At least one task is added to an item in the Profiling perspective of Talend Studio.

Procedure 1. Follow the steps outlined in Displaying the task list on page 277 to access the Tasks list.

2. Select the check boxes next to each of the tasks and right-click anywhere in the list.

3. From the contextual menu, select Delete Completed Tasks. A confirmation message is displayed to validate the operation. 4. Click OK to close the confirmation message. All tasks marked as completed are deleted from the Tasks list.


Managing metadata in the Studio

Creating connections to data sources

The Profiling perspective of Talend Studio enables you to create connections to databases and to delimited files in order to profile data in these data sources.

Connecting to a database

Before proceeding to analyze data in a specific database, you must first set up the connection to this database. From the Profiling perspective of Talend Studio, you can create a connection to a database management system (DBMS) and show the database content. The databases you can analyze from the Studio include Hive and Amazon Redshift. For further information about what databases you can profile, see the section about supported databases in the Talend Installation and Upgrade Guide. Connections to different databases are reflected by different tree levels and different icons in the DQ Repository tree view because the logical and physical structure of data differs from one relational database to another. The highest-level structure "Catalog", followed by "Schema" and finally by "Table", is not applicable to all database types. For further information, see Catalogs and schemas in database systems on page 292.

Creating a connection

Before you begin

Procedure 1. In the DQ Repository tree view, expand Metadata, right-click DB Connections and select Create DB Connection.

The Database Connection wizard opens.


2. In the Name field, enter a name for this new database connection. Do not use spaces in the connection name.

Note: Avoid using special characters in the item names including: "~", "!", "`", "#", "^", "&", "*", "\\", "/", "?", ":", ";", "\"", ".", "(", ")", "'", "¥", "'", """, "«", "»", "<", ">". These characters are all replaced with "_" in the file system and you may end up creating duplicate items.

3. If required, set other connection metadata (purpose, description and author name) in the corresponding fields and click Next to proceed to the next step.


4. In the DB Type field and from the drop-down list, select the type of database to which you want to connect. For example, MySQL. For further information about supported databases, see the Talend Installation and Upgrade Guide.

Note: If you select to connect to a database that is not supported in the studio (using the ODBC or JDBC methods), it is recommended to use the Java engine to execute the column analyses created on the selected database. For more information on column analyses, see Defining the columns to be analyzed and setting indicators on page 43, and for more information on the Java engine, see Using the Java or the SQL engine on page 56.

5. In the DB Version field, select the version of the database to which you are creating the connection. 6. Enter your login, password, server and port information in their corresponding fields. 7. In the Database field, enter the name of the database you are connecting to. If you need to connect to all of the catalogs within one connection, and the database allows you to, leave this field empty. 8. Click the Check button to verify that your connection is successful. If you have not already installed the database driver (.jar file) necessary to use the database, a wizard prompts you to install the relevant third-party module; click Download and Install and then close the wizard. For further information about identifying and installing external modules, see the Talend Installation and Upgrade Guide or click the How to install a driver link in the wizard. For further information about the Module view, see Displaying the Module view on page 24. 9. Click Finish to close the Database Connection wizard. A folder for the created database connection is displayed under DB Connection in the DQ Repository tree view. The connection editor opens with the defined metadata in Talend Studio.

Once you create the connection, you can open in Talend Studio a preview of the data in a specific database table. For further information, see Previewing data in the SQL editor on page 35.

Results From the connection editor, you can:


• Click Connection information to show the connection parameters for the relevant database. • Click the Check button to check the status of your current connection. • Click the Edit... button to open the connection wizard and modify any of the connection information. For information on how to set up a connection to a file, see Connecting to a file on page 293.

Creating a connection from a catalog or a schema

You can create a connection on a database catalog or schema directly from a database connection.

Before you begin At least one database connection is set in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283

About this task

Procedure 1. In the DQ Repository tree view, expand Metadata > DB Connections and browse to the catalog or schema on which you want to create the connection. 2. Right-click a catalog or schema and select Create a new connection.

A confirmation message is displayed. 3. Click OK.

Results A new connection named after the selected connection and catalog is created under DB Connections.


Creating a connection to a custom database

The database connection wizard in Talend Studio lists the databases to which you can create a connection and do profiling processes. You can still use Talend Studio to connect to a custom "unsupported" database.

Procedure 1. Choose JDBC as the database type in the connection wizard. 2. Fill in the connection parameters.

What to do next After creating the connection to a custom database, you can profile and monitor data in this database by using different analyses and indicators, as you do with supported databases. But you may need to change, in the Indicator Settings editor, the SQL query template for some indicators, such as the regex indicator where each database has a different function to call. For further information, see Editing a system indicator on page 232 and Editing a user-defined indicator on page 256.

Note: If you have a problem profiling a custom database even though you use a JDBC connection, the reason could be that some JDBC functions are not implemented by the JDBC driver library. Please raise an issue or ask support via Talend Community at: https://community.talend.com/

What you need to know about some databases

MySQL

When creating a connection to MySQL via JDBC, it is not mandatory to include the database name in the JDBC URL. Regardless of whether the database connection URL specified in the JDBC URL field includes the database name, all databases are retrieved. For example, if you specify jdbc:mysql://192.168.33.41:3306/tbi?noDatetimeStringSync=true, where tbi is the database name, or jdbc:mysql://192.168.33.41:3306/?noDatetimeStringSync=true, all databases are retrieved.
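As a purely illustrative aside, the following minimal Java sketch opens such a connection through the standard JDBC API. The URL and credentials are placeholder values based on the example above, and it assumes the MySQL JDBC driver is available on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;

public class MySqlConnectionDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and credentials; adapt them to your own MySQL server.
        String url = "jdbc:mysql://192.168.33.41:3306/tbi?noDatetimeStringSync=true";
        try (Connection connection = DriverManager.getConnection(url, "user", "password")) {
            System.out.println("Connected: " + !connection.isClosed());
        }
    }
}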


To support surrogate pairs, you need to edit the following properties in the MySQL server configuration file:

[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4
character-set-server=utf8mb4

Microsoft SQL Server

Microsoft SQL Server 2012 and later are supported. If you select to connect to the Microsoft SQL Server database with Windows Authentication Mode, you can select Microsoft or JTDS open source from the Db Version list.

Teradata

In the Teradata database, the regular expression function is installed by default only starting from version 14. If you want to use regular expressions with older versions of this database, you must install a User Defined Function in Teradata and add the indicator definition for Teradata in Talend Studio. To learn more about using regular expressions with Teradata, see the documentation on Talend Help Center (https://help.talend.com).

Netezza

The Netezza database does not support regular expressions. If you want to use regular expressions with this database, you must: • Install the SQL Extensions Toolkit package on a Netezza system. Use the regex_like function provided in this toolkit in the SQL template as documented in http://pic.dhe.ibm.com/infocenter/ntz/v7r0m3/topic/com.ibm.nz.sqltk.doc/r_sqlext_regexp_like.html. • Add the indicator definition for Netezza in the Pattern Matching folder in Talend Studio under Libraries > Indicators > System Indicators. The query template you need to define for Netezza is the following: SELECT COUNT(CASE WHEN REGEXP_LIKE(<%=COLUMN_NAMES%>,<%=PATTERN_EXPR%>) THEN 1 END), COUNT(*) FROM <%=TABLE_NAME%> <%=WHERE_CLAUSE%>. For a detailed procedure about how to add an indicator definition for a specific database, see Defining a query template for a specific database on page 193.

Hive

If you select to connect to the Hive database, you will be able to create and execute different analyses as with the other database types.


In the connection wizard, you must select from the Distribution list the platform that hosts Hive. You must also set the Hive version and model. For further information, see http://hadoop.apache.org/. If you decide to change the user name in an embedded mode of a Hive connection, you must restart the studio before being able to successfully run the profiling analyses that use the connection. For further information about the use of user credentials with Hive, see the documentation on Talend Help Center (https://help.talend.com).


If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform V1.3, you must set proper memory allocations for the map and reduce computations to be performed by the Hadoop system. In the second step in the connection wizard: 1. Click the button next to Hadoop Properties and in the open dialog box click the [+] button to add two lines to the table.

2. Enter the parameter names as mapred.job.map.memory.mb and mapred.job.reduce.memory.mb. 3. Set their values to the default value 1000. This value is normally appropriate for running the computations. If the Hadoop distribution to be used is Hortonworks Data Platform V2.0 (YARN), you must set the following parameter in the Hadoop Properties table: • The parameter is

yarn.application.classpath

• The value is

/etc/hadoop/conf,/usr/lib/hadoop/,/usr/lib/hadoop/lib/,/usr/lib/hadoop-hdfs/,/usr/lib/hadoop-hdfs/lib/,/usr/lib/hadoop-yarn/,/usr/lib/hadoop-yarn/lib/,/usr/lib/hadoop-mapreduce/,/usr/lib/hadoop-mapreduce/lib/

Note that one analysis type and a few indicators and functions are still not supported for Hive; see the table below for more details:


Unsupported indicators (with the SQL engine): • Soundex Low Frequency • Pattern (Low) Frequency • Upper Quartile and Lower Quartile • Median • All Date Frequency indicators

Unsupported functions: • The View rows contextual menu for column analyses with unique, duplicate and all textual indicators. For further information on the View rows menu, see Viewing and exporting analyzed data on page 60. • The View match rows contextual menu for column analyses with unique, duplicate and all textual indicators. For further information on View match rows, see Comparing identical columns in different tables on page 157. • All contextual menus on the analysis results of functional dependency analysis. For further information on this analysis, see Detecting anomalies in columns (Functional Dependency Analysis) on page 122.

Unsupported analyses: • The only analysis that is not supported for Hive is Time Correlation Analysis, as the Date data type does not exist in Hive. For further information on this analysis type, see Time correlation analyses on page 179.

Oracle

To support surrogate pairs, the NLS_CHARACTERSET parameter of the database must be set to UTF8 or AL32UTF8. The default NLS_CHARACTERSET parameters are: • NLS_CHARACTERSET=WE8ISO8859P15 • NLS_NCHAR_CHARACTERSET=AL16UTF16

Note: To check the database parameters, you can run the following SQL query: SQL> SELECT * FROM NLS_DATABASE_PARAMETERS;

Catalogs and schemas in database systems

The structure of a database defines how objects are organized in the database. Different data storage structures are used to store objects in databases. For example, the highest-level structure (such as "Catalog" followed by "Schema" and finally by "Table") is not applicable to all database types. The table below describes the structure of some databases in terms of catalog and schemas:

Database name Version Catalog Schema

Oracle no yes

MySQL yes no

SQLServer 2000/2005/2008 yes yes

DB2 no yes

DB2 ZOS no yes

Sybase yes yes

Informix yes yes



PointBase no yes

PostgreSQL yes yes

AS/400 V5R4 yes yes

Ingres no yes

Teradata no yes

Netezza yes yes

SQLite no no

Connecting to a file

Before you begin Before proceeding to analyze data in a delimited file, you must first set up the connection to this file.

Procedure 1. Expand the Metadata folder.

2. Right-click FileDelimited connections and then select Create File Delimited Connection to open the New Delimited File wizard. 3. Follow the steps defined in the wizard to create a connection to a delimited file. You can then create a column analysis and drop the columns to analyze from the delimited file metadata in the DQ Repository tree view to the open analysis editor. For further information, see Creating a basic column analysis on a file on page 74. For information on how to set up a connection to a database, see Connecting to a database on page 283.

Results


Managing connections to data sources

Several management options are available for each of the connections created in Talend Studio.

Managing database connections

Many management options are available for database connections including editing and duplicating the connection or adding a task to it. The sections below explain in detail these management options.

Opening or editing a database connection

You can edit the connection to a specific database and change the connection metadata and the connection information.

Before you begin A database connection is created in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283.

Procedure 1. In the DQ Repository tree view, expand Metadata > DB Connection. 2. Either: • Double-click the database connection you want to open, or, • right-click the database connection and select Open in the contextual menu.


The connection editor for the selected database connection is displayed.


3. Modify the connection metadata in the Connection Metadata view, as required. 4. Click the Edit... button in the Connection information view to open the Database Connection wizard.


5. Go through the steps in the wizard and modify the database connection settings as required. 6. Click Finish to validate the modifications. A dialog box opens prompting you to reload the updated database connection.

7. Select the reload option if you want to reload the new database structure for the updated database connection.

Note: If you select the don't reload option, you will still be able to execute the analyses using the connection even after you update it.

If the database connection is used by profiling analyses in the Studio, another dialog box is displayed to list all the analyses that use the database connection. It alerts you that if you reload the new database structure, all the analyses using the connection will become unusable, although they will still be listed in the DQ Repository tree view.


8. Click OK to accept reloading the database structure or Cancel to cancel the operation and close the dialog box. A number of confirmation messages are displayed one after the other. 9. Click OK to close the messages and reload the structure of the new connection.

Filtering a database connection

After setting a specific database connection in the studio, you may not want to view all databases in the DQ Repository tree view of your Studio. You can filter your database connections to list the databases that match the filter you set. This option is very helpful when the number of databases in a specific connection is very big.

Before you begin A database connection is created in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283.

Procedure 1. In the DQ Repository tree view, expand Metadata > DB Connection.


2. Right-click the database connection you want to filter and select Package Filter to open the corresponding dialog box.

3. In the Package Filter field, enter the complete name of the database you want to view and then click Finish to close the dialog box. Only the database that matches the filter you set is listed under the database connection in the DQ Repository tree view.


4. If you want to cancel the filter, do the following: a) In the Package Filter dialog box, delete the text from the Package Filter field. b) Click Finish to close the dialog box. All databases are listed under the selected database connection in the DQ Repository tree view.

Duplicating a database connection

To avoid creating a DB connection from scratch, you can duplicate an existing one in the DB Connections list and modify its metadata to create a new connection.

Before you begin A database connection is created in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283.

Procedure 1. In the DQ Repository tree view, expand Metadata > DB Connections. 2. Right-click the connection you want to duplicate and select Duplicate from the contextual menu.


Results
The duplicated database connection appears in the connection list in the DQ Repository tree view as a copy of the original connection. You can now open the duplicated connection and modify its metadata as needed.

Adding a task to a database connection or any of its elements

You can add a task to a database connection to use it as a reminder to modify the connection or to flag a problem that needs to be solved later, for example. You can also add a task to a catalog, a table or a column in the connection.

Before you begin A database connection is created in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283.

Procedure
1. Expand Metadata > DB Connections.
2. Right-click the connection to which you want to add a task, and then select Add task... from the contextual menu.
The Properties dialog box opens, showing the metadata of the selected connection.


3. In the Description field, enter a short description for the task you want to attach to the selected connection.
4. On the Priority list, select the priority level and then click OK to close the dialog box.

Results The created task is added to the Tasks list.

What to do next You can follow the same steps in the above procedure to add a task to a catalog, a table or a column in the connection. For further information, see Adding a task to a column in a database connection on page 274. For more information on how to access the task list, see Displaying the task list on page 277.

Filtering tables/views in a database connection

You can filter the tables/views to be listed under any database connection. This option is very helpful when the database to which the Studio connects contains a large number of tables. In that case, a message is displayed prompting you to set a table filter on the database connection so that only the defined tables are listed in the DQ Repository tree view.

Before you begin A database connection is created in the Profiling perspective of Talend Studio. For further information, see Connecting to a database on page 283.

Procedure
1. In the DQ Repository tree view, expand Metadata > DB Connections.
2. Expand the database connection in which you want to filter tables/views and right-click the desired catalog/schema.


3. Select Table/View Filter from the list to display the corresponding dialog box.

4. Set a table and a view filter in the corresponding fields and click Finish to close the dialog box.

Results Only tables/views that match the filter you set are listed in the DQ Repository tree view.

Deleting a database connection

You can move a database connection to the Studio Recycle Bin whether or not it is used by analyses.

Before you begin A database connection is created in Talend Studio. For further information, see Connecting to a database on page 283.


Procedure
1. In the DQ Repository tree view, expand Metadata > DB Connections.
2. Right-click a database connection and select Delete in the contextual menu.

The database connection is moved to the Recycle Bin. You can still run the analyses that use this connection while it is in the Recycle Bin. However, an alert message is displayed next to the connection name in the analysis editor.

3. To delete the connection from the Recycle Bin, do the following:
a) Right-click the database connection in the Recycle Bin and choose Delete from the contextual menu.
b) Click OK in the confirm deletion dialog box that opens.
If the connection is not used by any analysis, it is deleted from Talend Studio. If the connection is used by one or more analyses in Talend Studio, a dialog box is displayed listing those analyses:


• Either click OK to close the dialog box without deleting the database connection from the Recycle Bin, or
• select the Force to delete all the dependencies check box and then click OK to delete the database connection from the Recycle Bin and to delete all the dependent analyses from the Data Profiling node.
You can also permanently delete the database connection by emptying the Recycle Bin.
4. To empty the Recycle Bin, do the following:
a) Right-click the Recycle Bin and select Empty recycle bin. If the connection is not used by any analysis in the current Studio, a confirmation dialog box is displayed.
b) Click Yes to empty the Recycle Bin.
If the connection is used by one or more analyses in the Studio, a dialog box is displayed listing these analyses.


c) Click OK to close the dialog box without removing the connection from the recycle bin.

Restoring a database connection

You can restore a deleted database connection from the Talend Studio Recycle Bin.

Procedure In the Recycle Bin, right-click the connection and select Restore.

Results The database connection is moved back to the Metadata node.

Managing file connections

A few management options are available for file connections, including editing or deleting the connection, adding a task to it, or importing and exporting the connection. You can edit the connection to a specific file and change both the connection metadata and the connection information.

Before you begin
A connection to a file is created in the Profiling perspective of Talend Studio. For further information, see Creating connections to data sources on page 283.

Procedure
1. In the DQ Repository tree view, expand Metadata and then the node that holds your file connections.
2. Double-click the file connection you want to open. The connection editor for the selected connection is displayed.
3. Modify the connection metadata as required and click Next.
4. Go through the steps in the wizard and modify the file connection settings as required.


5. Click Finish to validate the modifications.
Other management procedures for file connections are the same as those for database connections. For further information on how to add a task to a file connection, see Adding a task to a database connection or any of its elements on page 301. For further information on how to delete or restore a file connection, see Deleting a database connection on page 303. For further information on how to import or export a file connection, see Importing data profiling items on page 267.


Appendix: Regular expressions on SQLServer

Main concept

The regular expression function is not built into all database environments. When you use such a database, you need to create a User-Defined Function (UDF) to extend the functionality of the database server. For example, MySQL, PostgreSQL, Oracle 10g and Ingres natively support regular expressions, while Microsoft SQL Server does not. After you create the regular expression function, you must use the Studio to declare that function in the specific database before you can use regular expressions on analyzed columns. For more information on how to declare a regular expression function in the Studio, see Defining a query template for a specific database on page 193 and Declaring a User-Defined Function in a specific database on page 193.
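For comparison, below is a minimal sketch of how the same pattern test is written in a database with native regular expression support and in SQL Server once the RegExMatch UDF described in this appendix has been created. The table and column names (customers, email) and the pattern are illustrative only.

-- MySQL: native support through the REGEXP operator
SELECT email FROM customers WHERE email REGEXP '@(gmail|yahoo)[.]com$';

-- PostgreSQL: native support through the ~ operator
SELECT email FROM customers WHERE email ~ '@(gmail|yahoo)[.]com$';

-- Microsoft SQL Server: no native operator, so the CLR UDF created in this appendix is called instead
SELECT email FROM customers WHERE dbo.RegExMatch(email, '@(gmail|yahoo)[.]com$') = 1;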

Creating a regular expression function on SQL Server

Prerequisite(s): You should have Visual Studio 2008, which is the version used for the example in this documentation, and the Visual Studio main window should be open. To create a regular expression function in SQL Server, follow the steps outlined in the sections below.

Creating a project in Visual Studio

About this task
You must start by creating an SQL Server database project. To do that:

Procedure 1. On the menu bar, select File > New > Project to open the New Project window.


2. In the Project types tree view, expand Visual C# and select Database.
3. In the Templates area to the right, select SQL Server Project and then, in the Name field, enter a name for the project you want to create, UDF function in this example.
4. Click OK to validate your changes and close the window. The Add Database Reference dialog box is displayed.

5. From the Available References list, select the database in which you want to create the project and then click OK to close the dialog box.

Note: If the database you want to create the project in is not listed, you can add it to the Available References list through the Add New Reference tab.

The project is created and listed in the Solution Explorer panel to the right of the Visual Studio main window.


Deploying the regular expression function to the SQL Server

About this task
You now need to add the new regular expression function to the created project, and then deploy the function to the SQL Server. To do that:

Procedure
1. In the project list in the Solution Explorer panel, expand the node of the project you created and right-click the Test Scripts node.
2. From the contextual menu, select Add > New Item....

The Add New Item dialog box is displayed.


3. From the Templates list, select Class and then, in the Name field, enter a name for the user-defined function you want to add to the project, RegExMatch in this example. The added function is listed under the created project node in the Solution Explorer panel to the right.
4. Click Add to validate your changes and close the dialog box.

5. In the code space to the left, enter the code for the regular expression function you added to the created project. Below is the code for the regular expression function used in this example.

using System;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;

public partial class RegExBase
{
    // CLR scalar function: returns 1 when matchString matches the given
    // regular expression pattern, and 0 otherwise.
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static int RegExMatch(string matchString, string pattern)
    {
        Regex r1 = new Regex(pattern.TrimEnd(null));
        if (r1.Match(matchString.TrimEnd(null)).Success)
        {
            return 1;
        }
        else
        {
            return 0;
        }
    }
}


6. Press Ctrl+S to save your changes. Then, on the menu bar, click Build and select the item that builds the project you created, Build UDF function in this example.

The lower pane of the window displays a message to confirm that the "build" operation was successful or not. 7. On the menu bar, click Build and in the contextual menu select the corresponding item to deploy the project you created, Deploy UDF function in this example.

The lower pane of the window displays a message to confirm that the "deploy" operation was successful, or not.

If required:
1. launch SQL Server and check whether the created function exists in the function list,
2. check whether the function works correctly. For more information, see Testing the created function via the SQL Server editor on page 315.
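As a quick sanity check, you can also query the system catalog to confirm that the function was created. The query below assumes the function was deployed into the talend database.

USE talend;
GO
-- A CLR scalar function appears with type FS (CLR_SCALAR_FUNCTION)
SELECT name, type, type_desc
FROM sys.objects
WHERE name = 'RegExMatch';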


Setting up the studio

About this task
Before being able to use regular expressions on analyzed columns in a database, you must first declare the created regular expression function, RegExMatch in this example, in the specified database via the Studio. To do that:

Procedure
1. In the DQ Repository tree view, expand Libraries > Indicators.
2. Expand System Indicators > Pattern Matching.
3. Double-click Regular Expression Matching, or right-click it and select Open from the contextual menu. The corresponding view displays the indicator metadata and its definition.

You now need to add, to the list of databases, the database for which you want to define a query template. This query template will compute the regular expression matching.
4. Click the [+] button at the bottom of the Indicator Definition view to add a field for the new template.


5. In the new field, click the arrow and select the database for which you want to define the template, Microsoft SQL Server.
6. Copy the indicator definition of any of the other databases.
7. Click the Edit... button next to the new field. The Edit expression dialog box is displayed.

8. Paste the indicator definition (template) in the Expression box and then modify the text after WHEN in order to adapt the template to the selected database (see the example after this procedure).
9. Click OK to proceed to the next step. The new template is displayed in the field.
10. Click the save icon at the top of the editor to save your changes.

Results For more detailed information on how to declare a regular expression function in the studio, see Defining a query template for a specific database on page 193 and Declaring a User-Defined Function in a specific database on page 193.
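As an illustration only, an adapted template for SQL Server could take the shape below. The SELECT skeleton and the placeholder tokens are assumptions and must be copied from the indicator definition of another database in your Studio, as described in step 6, because they can differ between versions; only the condition after WHEN, which calls the RegExMatch UDF in the talend database, is specific to SQL Server.

SELECT COUNT(CASE WHEN [talend].[dbo].RegExMatch(<%=__COLUMN_NAMES__%>, <%=__PATTERN_EXPR__%>) = 1
                  THEN 1 END),
       COUNT(*)
FROM <%=__TABLE_NAME__%> <%=__WHERE_CLAUSE__%>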


Testing the created function via the SQL Server editor

Procedure
1. Copy the code below and execute it:

create table Contacts (
    FirstName nvarchar(30),
    LastName nvarchar(30),
    EmailAddress nvarchar(30)
        CHECK (dbo.RegExMatch(EmailAddress,
            '[a-zA-Z0-9_\-]+@([a-zA-Z0-9_\-]+\.)+(com|org|edu|nz)') = 1),
    USPhoneNo nvarchar(30)
        CHECK (dbo.RegExMatch(USPhoneNo,
            '\([1-9][0-9][0-9]\) [0-9][0-9][0-9]\-[0-9][0-9][0-9][0-9]') = 1))

INSERT INTO [talend].[dbo].[Contacts] ([FirstName], [LastName], [EmailAddress], [USPhoneNo])
VALUES ('Hallam', 'Amine', '[email protected]', '0129-2090-1092'),
       ('encoremoi', 'nimportequoi', '[email protected]', '(122) 190-9090')
GO

2. To search for the records that match, use the following code:

SELECT [FirstName], [LastName], [EmailAddress], [USPhoneNo]
FROM [talend].[dbo].[Contacts]
WHERE [talend].[dbo].RegExMatch([EmailAddress],
    '[a-zA-Z0-9_\-]+@([a-zA-Z0-9_\-]+\.)+(com|org|edu|nz|au)') = 1

3. To search for the records that do not match, use the following code:

SELECT [FirstName], [LastName], [EmailAddress], [USPhoneNo]
FROM [talend].[dbo].[Contacts]
WHERE [talend].[dbo].RegExMatch([EmailAddress],
    '[a-zA-Z0-9_\-]+@([a-zA-Z0-9_\-]+\.)+(com|org|edu|nz|au)') = 0
