Tamr on Google Cloud Platform: Walkthrough

Overview

Tamr on Google Cloud Platform enables users to manage and publish data without learning a new SDK or writing Java code. This preview version lets users move data from Google Cloud Storage to BigQuery through a visual interface for selecting and transforming data. The preview of Tamr on Google Cloud Platform covers:

+ Attribute selection from CSV files
+ Joining CSV sources
+ Transformation of missing values
+ Publishing a table in BigQuery, Google’s fully managed, NoOps data analytics service

Signing In to Tamr & Google Cloud Platform

To get started, go to gcp-preview.tamr.com, register with Tamr, and sign in to Google Cloud Platform with a Gmail account.

+ If you don’t have a Google Cloud Platform account, you can still go through the Tamr portion of the offering, but you will not be able to push your dataset to BigQuery.
+ If you would like to register for a Google Cloud Platform account, select the “Free Trial” option at the top of the Google Cloud Platform sign-in page.

Selecting Sources

Once you have signed in:

+ Select the Google Cloud Platform project and bucket from which you would like to pull data into Tamr.

Adding / Subtracting Attributes

Once a data source has been added, its attributes appear on the left side of the screen. You can add all of them to the preview with the ‘Add All’ button, or add only the attributes of interest by dragging them into the preview. To find specific attributes, apply the filters above the attribute list. If you move an unwanted attribute into your preview, you can move it back by selecting its checkbox and clicking ‘Remove’.
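The walkthrough does not describe how Tamr reads a CSV source or applies an attribute selection internally. As a rough local illustration of the idea only, the following Python sketch uses the standard `csv` module to keep either all columns (loosely like ‘Add All’) or a chosen subset, and to drop attributes again (loosely like ‘Remove’). The function names `preview` and `remove`, the row limit, and the sample columns are hypothetical and are not part of Tamr’s interface.

```python
import csv
import io

# Hypothetical sample data standing in for a CSV file in a Cloud Storage bucket.
SAMPLE = """id,name,city,phone
1,Acme,Boston,555-0100
2,Globex,,555-0101
"""


def preview(csv_text, attributes=None, limit=5):
    """Return (chosen attributes, up to `limit` rows restricted to them).

    attributes=None keeps every column, loosely mirroring 'Add All'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    chosen = list(fieldnames) if attributes is None else list(attributes)
    unknown = [a for a in chosen if a not in fieldnames]
    if unknown:
        raise KeyError(f"unknown attributes: {unknown}")
    rows = []
    for row in reader:
        if len(rows) >= limit:
            break
        rows.append({a: row[a] for a in chosen})
    return chosen, rows


def remove(chosen, to_remove):
    """Drop attributes from the selection, loosely mirroring 'Remove'."""
    drop = set(to_remove)
    return [a for a in chosen if a not in drop]


# Select only two attributes of interest from the hypothetical sample.
cols, rows = preview(SAMPLE, ["id", "name"])
print(cols, rows)
```

Raising on an unknown attribute name is a design choice for the sketch: it surfaces typos early instead of silently producing empty columns.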
Joining Sources

To combine data from two or more sources, first select ‘Add a Source’ and pick the data source you want to add. Then specify a join key, i.e., a column in one source that contains the same information as a column in the other source. Once you have selected data from a bucket in your Google Cloud Storage, you are automatically asked for your join keys, as shown below:

Once the sources are joined, you can add attributes to and remove them from your custom preview as in the previous steps.

Transformations

At this point your preview contains the dataset you want, but the data may be ‘dirty’ and need transformations to clean it up. Tamr on Google Cloud Platform lets you:

+ Search for relevant attributes in your preview using the search bar above the preview.
+ See what percentage of an attribute’s values are missing by hovering over the horizontal green bar under the attribute name.
+ Transform missing values, either by specifying the value to use instead or by removing the row (record).

Moving Data to BigQuery via Google Cloud Dataflow

When you are happy with the preview of your dataset in Tamr, you can publish it as a new dataset in BigQuery, where you can run fast, SQL-like queries against it. For this data movement and transformation, Tamr uses Google Cloud Dataflow, a simple, flexible system for performing data processing tasks of any size. To publish, click the “Move My Data To BigQuery” button above your preview, enter the destination information, and click “Publish to BigQuery”. Tamr then starts a Google Cloud Dataflow job.
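This walkthrough does not say how the join and missing-value steps are carried out inside Tamr or the Dataflow job. The following is a minimal, local Python sketch of the same ideas, not Tamr’s or Dataflow’s implementation. It assumes an inner join on a single key column (rows without a match are dropped), treats an empty or whitespace-only field as missing, and uses hypothetical sample data and function names (`inner_join`, `missing_pct`, `fill_missing`, `drop_missing`).

```python
import csv
import io


def load(csv_text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def is_missing(value):
    # Assumption: None (short row) or an empty/whitespace string counts as missing.
    return value is None or value.strip() == ""


def inner_join(left, right, left_key, right_key):
    """Combine rows whose join-key values match (an inner join)."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[right_key], []).append(right_row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[left_key], []):
            merged = dict(left_row)
            for col, val in right_row.items():
                if col != right_key:
                    merged[col] = val  # on a name clash, the right-side value wins
            joined.append(merged)
    return joined


def missing_pct(rows, attribute):
    """Percentage of rows whose value for `attribute` is missing."""
    if not rows:
        return 0.0
    empty = sum(1 for r in rows if is_missing(r.get(attribute)))
    return 100.0 * empty / len(rows)


def fill_missing(rows, attribute, value):
    """Return new rows with missing values of `attribute` replaced by `value`."""
    return [{**r, attribute: value} if is_missing(r.get(attribute)) else r for r in rows]


def drop_missing(rows, attribute):
    """Return only the rows (records) where `attribute` is present."""
    return [r for r in rows if not is_missing(r.get(attribute))]


# Hypothetical sources: company records and contact records sharing an ID.
COMPANIES = """company_id,name,city
1,Acme,Boston
2,Globex,
3,Initech,Austin
"""
CONTACTS = """cid,phone
1,555-0100
2,555-0101
"""

joined = inner_join(load(COMPANIES), load(CONTACTS), "company_id", "cid")
print(missing_pct(joined, "city"))
```

The two cleanup functions return new lists rather than editing rows in place, so the joined preview stays unchanged and either option (fill or drop) can be tried against it.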
By following the ‘submitted’ link that appears after you click “Publish to BigQuery”, you can track the progress of your Cloud Dataflow job in the Google Cloud Platform console, as shown below:

Viewing Data in BigQuery

With your dataset now in BigQuery, you can run fast, SQL-like queries against it to generate business insights. To find your dataset, select “BigQuery” under “Big Data” in the Google Developers Console. To learn more about the kinds of queries BigQuery supports, see Google’s BigQuery documentation.

For your own personalized Tamr demo, visit www.tamr.com