How to work with Talend & Couchbase component

About Bigdata Dimension Labs

Bigdata Dimension Labs was built on the ideal that every business should be positioned to make more profitable, data-driven decisions. To achieve this, they have partnered with the best modern data sharing technologies in the world and are recognized as centers of excellence for:

Bigdata Dimension Labs have empowered organizations such as 3M, Agile and Bill Gates Foundation with 360° performance overview, real-time analytics for better decision-making, standardized reporting and faster data access and processing.

Imagine what they can do for you.

Find out more at BDDLabs.com

Content

How to work with Talend & Couchbase component

What is NoSQL?

Talend & Couchbase server

How does it work?

Couchbase And Talend Jobs: Create A Document

Couchbase And Talend Jobs: Read A Document

Conclusion


How to work with Talend & Couchbase component

While the whole world is shifting towards big data, NoSQL has become a crucial technology in the data management industry. The need to move and transform data between traditional and modern systems has likewise become mission critical for data-driven businesses. This data movement could mean feeding a new data warehouse project, migrating existing data from a traditional RDBMS to a new NoSQL platform, or adding new transformations to existing jobs.

Talend offers a diverse range of big data components to suit each data integration purpose. It also provides connectivity to leading NoSQL databases such as Couchbase, Cassandra, MongoDB, HBase, Neo4j, Apache CouchDB and Riak. Using Talend to manage unstructured data in a NoSQL scenario doesn’t require any specialized knowledge of NoSQL databases. In short, Talend is a big umbrella providing many connectors for all kinds of data movement and transformation.

What is NoSQL?

NoSQL stands for Not only SQL. It is a movement towards document stores that do not make use of the relational table model. The fundamental shift is in the way NoSQL stores data. For example, to store customer details in an RDBMS, you would need to split this information into tables and then use a server-side or report-side language to reassemble the data into its original form. In NoSQL, on the other hand, you just store the customer details. NoSQL is schema free, which means you don’t need to design your tables and structure up front; you can simply start storing values. All the values are stored in documents, and query joins are done using MapReduce. MapReduce is used to create a ‘view’ (like a result set), which consists of a subset of the overall data.
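To make the idea of a MapReduce view more concrete, here is a minimal sketch in plain Python. It runs entirely in memory and does not use Couchbase itself. The sample documents, their field names, and the helpers `map_feedback` and `build_view` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical documents keyed by document ID. No table design was needed,
# and the third document simply lacks a "feedback" attribute.
documents = {
    "doc1": {"customer_id": "123", "customer_name": "John", "feedback": "Fast delivery"},
    "doc2": {"customer_id": "123", "customer_name": "John", "feedback": "Good support"},
    "doc3": {"customer_id": "a-77", "customer_name": "Mary"},
}


def map_feedback(doc_id, doc):
    """Map step: emit a (key, value) pair only for documents that carry feedback."""
    if "feedback" in doc:
        yield doc["customer_id"], 1


def build_view(docs, map_fn, reduce_fn):
    """Group the emitted values by key, then reduce each group to one value."""
    grouped = defaultdict(list)
    for doc_id, doc in docs.items():
        for key, value in map_fn(doc_id, doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


# The resulting "view" covers only a subset of the data: customers with feedback.
feedback_count = build_view(documents, map_feedback, sum)
print(feedback_count)
```

The map step decides which documents appear in the view and under which key, and the reduce step aggregates the values emitted for each key.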

Couchbase Server is a NoSQL database. It is designed with a distributed architecture for performance, scalability, and availability. It enables developers to build applications more easily and quickly by combining the power of SQL with the flexibility of JSON.

Talend & Couchbase server

Talend enables you to manage and transform data between Couchbase Server, a NoSQL document database, and any other relational or big data system. This integration also allows you to efficiently build richer reports and analytics on the data stored in Couchbase, utilizing the power of Couchbase’s pre-computed indexes and aggregates.

What Components Can You Use?

Talend offers the following components to work with Couchbase Server.

tCouchbaseConnection : Opens a connection to a Couchbase bucket so that a transaction can be made, and lets other components reuse that connection.

tCouchbaseInput : Queries documents from the Couchbase database, fetching them either by their unique key or through views.

tCouchbaseOutput : Inserts, updates, upserts or deletes documents in the Couchbase database based on incoming flat data from a file, a database table, etc. Documents are stored as key/value pairs, where the value can be JSON or binary data.

tCouchbaseClose : Closes the connection to the Couchbase bucket once all transactions are done, in order to guarantee the integrity of transactions.
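The open, write, read and close lifecycle, and the difference between the insert, update, upsert and delete actions, can be illustrated with a small in-memory stand-in. This is only a sketch: `InMemoryBucket` is a hypothetical class, not the Couchbase SDK or a Talend API, and it assumes the usual key/value semantics in which an insert fails on an existing key and an update fails on a missing one.

```python
class InMemoryBucket:
    """Hypothetical stand-in for a bucket; NOT the Couchbase SDK or a Talend API."""

    def __init__(self):
        self._docs = {}
        self._open = False

    def open(self):
        self._open = True

    def close(self):
        self._open = False

    def _check_open(self):
        if not self._open:
            raise RuntimeError("connection is closed")

    def insert(self, key, value):
        """Create a new document; assumed to fail if the key already exists."""
        self._check_open()
        if key in self._docs:
            raise KeyError(f"insert failed, key already exists: {key}")
        self._docs[key] = value

    def update(self, key, value):
        """Replace an existing document; assumed to fail if the key is missing."""
        self._check_open()
        if key not in self._docs:
            raise KeyError(f"update failed, key not found: {key}")
        self._docs[key] = value

    def upsert(self, key, value):
        """Create the document or overwrite it, whichever applies."""
        self._check_open()
        self._docs[key] = value

    def delete(self, key):
        self._check_open()
        del self._docs[key]

    def get(self, key):
        self._check_open()
        return self._docs[key]


bucket = InMemoryBucket()
bucket.open()                                          # role of tCouchbaseConnection
bucket.insert("john", {"feedback": "Great"})           # role of tCouchbaseOutput
bucket.upsert("john", {"feedback": "Great service"})   # overwrites the existing document
bucket.upsert("123", {"feedback": "Late delivery"})    # creates a new document
bucket.delete("123")
current = bucket.get("john")                           # role of tCouchbaseInput (by key)
bucket.close()                                         # role of tCouchbaseClose
print(current)
```

Sharing one connection object across the write and read steps mirrors how the other components reuse the connection opened at the start of a job.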

How does it work?

Talend’s input and output Couchbase connectors allow you to manage and transform your data. To bring data from other data sources into Couchbase, the tCouchbaseOutput connector takes incoming data streams and transforms them into JSON documents before they are stored in Couchbase. When importing data, you can define which data fields need to be transformed into JSON attributes. Similarly, to export data from Couchbase to other data sources, the tCouchbaseInput connector uses the schema mapping specified by the user to read JSON documents and transform them into target data formats. You have the flexibility to define which attributes in your JSON document need to be exported and transformed. For this blog, I have created two simple jobs; more complex scenarios can be tackled with Talend as well.
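The import and export mappings described above can be sketched in a few lines of Python. This is an illustration of the idea, not Talend's implementation. The helpers `row_to_document` and `document_to_row`, the sample row, and the `internal_note` field are all hypothetical.

```python
import json


def row_to_document(row, key_field, json_fields):
    """Import direction: turn one flat row into (document key, JSON text),
    keeping only the fields selected as JSON attributes."""
    key = str(row[key_field])
    body = {name: row[name] for name in json_fields}
    return key, json.dumps(body)


def document_to_row(key, json_text, key_field, export_fields):
    """Export direction: turn a stored JSON document back into a flat row,
    keeping only the attributes selected for export."""
    body = json.loads(json_text)
    row = {key_field: key}
    for name in export_fields:
        row[name] = body.get(name)  # attributes missing from the document become None
    return row


# Hypothetical incoming row; "internal_note" is deliberately not mapped.
row = {
    "customer_id": "john",
    "customer_name": "John",
    "feedback": "Great service",
    "internal_note": "skip",
}

key, doc = row_to_document(row, "customer_id", ["customer_name", "feedback"])
exported = document_to_row(key, doc, "customer_id", ["feedback"])
print(key, doc)
print(exported)
```

Keeping the field selection as an explicit list in each direction corresponds to choosing which columns become JSON attributes on the way in and which attributes are exported on the way out.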

Couchbase And Talend Jobs: Create A Document

The first job reads a .txt file that contains unstructured data and creates a document from the data it reads. The input file consists of customer feedback, and the customer_id is not of a single data type: it consists of characters, numbers and special characters, as shown below. In the traditional approach, we would have started by creating a surrogate key (to serve as the primary key) for the customer_id. However, with Couchbase, we can store it as-is.

The overall job would look like the image given below. tCouchbaseConnection opens a connection to the Couchbase server. Once the connection is established, the input file is read and a few transformations are done in the tMap component, after which the data is written to a document in the Couchbase Server.

For the example job, the default bucket is used, and the tCouchbaseOutput settings look like this:

Note that the JSON configuration is very important, as it defines the way your document is stored. In the example, the JSON configuration is similar to what is shown below.

Once the job runs successfully, you can log in to Couchbase and check that the document has been created.

Couchbase And Talend Jobs: Read A Document

This job reads the document created by our previous job. There are two ways of reading the documents.


Using the key: Specify the IDs of the documents stored in the Couchbase database. In our example, it could be either 123,6534672 or john.

Using the views: Use this check box to view the document information as per the Map/Reduce functions and other settings. The schema here has three pre-defined fields: Id, Key, and Value. Id holds the document ID, Key holds the information specified by the key of the Map function, and Value holds the information specified by the value of the Map function.
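The two read paths can be contrasted with a short in-memory sketch. It does not call Couchbase or Talend. The stored documents, the helpers `read_by_key` and `read_by_view`, and the Map function `map_by_name` are hypothetical; only the Id, Key and Value field names come from the schema described above.

```python
# Hypothetical stored documents: document ID -> JSON-like body.
stored = {
    "john": {"customer_name": "John", "feedback": "Great service"},
    "6534672": {"customer_name": "Mary", "feedback": "Late delivery"},
}


def read_by_key(docs, doc_id):
    """Key-based read: fetch exactly one document by its ID."""
    return docs[doc_id]


def read_by_view(docs, map_fn):
    """View-based read: one row per emitted pair, using the Id, Key and Value fields."""
    rows = []
    for doc_id, doc in docs.items():
        for key, value in map_fn(doc):
            rows.append({"Id": doc_id, "Key": key, "Value": value})
    # Rows are returned ordered by the emitted key.
    return sorted(rows, key=lambda r: r["Key"])


def map_by_name(doc):
    """Hypothetical Map function: emit the customer name as key, the feedback as value."""
    yield doc["customer_name"], doc["feedback"]


one_doc = read_by_key(stored, "john")
view_rows = read_by_view(stored, map_by_name)
print(one_doc)
for r in view_rows:
    print(r)
```

A key-based read needs the document ID up front, while a view-based read returns every row the Map function emits, so it suits listing or filtering many documents at once.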

The job given below reads the document using the key. The key must be specified in the settings.

The tCouchbaseInput settings in the job are given below.

Once the job runs successfully, the output, filtered according to the settings, is displayed in the console.

The next job reads the document using the views.

Change the settings in tCouchbaseInput as shown below. Here I am creating a view ‘customer_view’ with customer_id, customer_name and feedback columns.

Save and run the job. Go to the Couchbase console and check that the view and the result set have been created. This view can then be used for other jobs or for ad-hoc queries.

Conclusion

You, too, are ready to begin your own enterprising journey. We hope that How To Work With Talend & Couchbase Component has armed you with the knowledge and know-how.

If you still want to learn more, I encourage you to visit BDDLabs.com to gain more information about modern cloud data sharing. You can also access documentation, view webinars, browse our offers, view scoops of upcoming events, and get support. We also invite you to schedule a free demo of our data sharehousing technology so your business can get started right away!

Email: [email protected] Phone: 888-856-2238
