The Ultimate Guide to Using Azure Data Box with Hubstor
Total Page:16
File Type:pdf, Size:1020Kb
The Ultimate Guide to Using Azure Data Box with HubStor HubStor Inc. Copyright 2018. www.hubstor.net Table of Contents Introduction ................................................................................................................................................................... 3 Process Overview .......................................................................................................................................................... 3 The Ordering Process ..................................................................................................................................................... 3 Device Arrival and Unpacking ........................................................................................................................................ 6 The Configuration .......................................................................................................................................................... 7 Connecting for Data Transfer ........................................................................................................................................ 9 Transferring Data to the Azure Data Box ....................................................................................................................... 9 Returning the Device ................................................................................................................................................... 11 Transferring Data to HubStor ...................................................................................................................................... 12 Run 'Blobless Archive' Job to Sync Original Metadata and Security Information ........................................................ 13 Conclusion ................................................................................................................................................................... 13 2 Introduction The logistics of archiving unstructured data to the cloud can be a challenge if your Internet connection is too slow or busy to support the transfer of a large data set. Never fear because the Azure Data Box is here! It allows you to quickly and easily move massive amounts of data into HubStor via Azure Blob Storage -- especially handy in situations where bandwidth is an issue. HubStor is certified to work with the Azure Data Box. This blog post will walk you through the new drive-shipping experience end to end including the steps to get the data into HubStor with synchronization of the data's original metadata and security access rights. Process Overview The Microsoft process for Azure Data Box is very simple. Here are the steps involved: With HubStor in the mix, there are two additional steps to complete the process: 1. Ingestion to HubStor -- the Azure Blob Storage account that contains the data loaded from Data Box is processed by HubStor to bring the data into a blob storage account within your HubStor tenant. This process runs entirely in the cloud and is fully managed by HubStor. 2. Sync of original metadata (aka. 'blobless archive') -- the HubStor Connector Service (HCS) runs against the original storage mount point containing the source data in order to match the original file attributes (i.e. last modified, created, etc.) and security information (i.e. access control lists, data ownership) with the drive-shipped content. The Ordering Process The ordering process occurs, as one might expect, through the Azure portal. A new blade called “Data Box” has been added in the Azure portal, from which you can place new orders, as well as manage your existing orders. Ordering a Data Box is as simple as clicking the add button and filling in the required information. First the shipping address: 3 Next, you specify the resource group and storage account which will be the destination wherein Microsoft will place the data that you copy to the device. This can be a new or existing storage account. 4 And that's really all there is to placing the order. Click the "Create" button, and your order is created, and will be shipped. You can always check on the status of a Data Box order by selecting it in the portal. Here’s the status of our device which has already arrived: 5 Device Arrival and Unpacking At last the day came -- we received our Data Box in a durable shipping container. Upon opening the container, we found the Data Box itself, all required cables (including two 10 Gbps ethernet cables), and some handy instructions to get us started. Getting the device up and running couldn't be simpler: Simply plug it in, connect the management interface (which has a hardcoded IP address), and at least one of the data ports (there is a 1 Gbps port, in addition to the two 10 Gbps ports). 6 We were pleasantly surprised: Expecting to have our office sound like an airport runway, we found the Data Box was incredibly quiet. The Configuration Now that the device was powered up and properly connected, the next step was to configure it. As we mentioned, the management port is configured to a specific subnet and IP address. This means you will need an ethernet adapter that is connected to the same subnet in order to access the Data Box administration portal. In our case, we used a machine with two 1 Gpbs network cards, only one of which had ever been used, so it was simply a matter of configuring the second card, and connecting it to the Data Box management port. With the connections complete, browsing to https://192.168.100.10 connected us with the Data Box: Device password - good question! Back to the Azure portal to look at the "Credentials" blade, where all required information is displayed: 7 Plugging in the device password unlocked the device, and brought us to an extremely simply Web interface on which we could finish configuring the device. In the above screenshot, we are already transferring data to the Data Box, so that is why it's reporting 3.2 TB of used space. To get to the point of transferring data, the only thing you need to do is visit the "Set network interfaces" page to configure the data port. We used the 1 Gbps data port for testing to a static IP to make it easier to find on our network. 8 Connecting for Data Transfer The Data Box itself presents on the network simply as a couple of large shares, depending on what type of Azure storage you wish to have the data created as: As you can see, in our configuration there are two shares: one for files you want to wind up in the Azure storage account as block blobs; another for those files you want in Azure storage as page blobs. In our case, accessing \\10.44.44.235\devdatabox_BlockBlob was the share for block blob destine files, and \\10.44.44.235\devdatabox_PageBlob was the share for page blob destine files. Of course, both shares are password protected, but simply clicking the proper "Get credentials" button above displays a dialog with the required user name and password. Interestingly, you can see there is an option for checksum computation, which is enabled by default. This option will allow you -- once the data has been updated to your storage account -- to download a file containg the CRC64 for every file you copied to the Data Box. If desired, this CRC64 can be used to verify the integrity of the file for chain of custody or other reasons. Transferring Data to the Azure Data Box Next up: Transferring data to our Data Box. This is really as simple as just copying folders and files, as you would to move data to any other share on your network. However, there are some important caveats to consider, as follows: • Any root level folders you create in either share will ultimately be created as containers in the storage account. Thus, these root level folder names must conform to the Azure restrictions on container names (see here for more details). • Any non-root level folders you create will simply be persisted as part of the blob path in storage account. As a result, all the usual naming restrictions apply for blob paths in Azure. 9 • Any files you create will become, as you'd expect, blobs in your storage account, and thus must adhere to naming restrictions for blobs in Azure. Fortunately, with the exception of the restrictions on container names, the above requirements are not likely to be an issue for our scenario. For our testing we had two goals: 1. Validate HubStor's 'blobless archive mode' -- 'Blobless archive' archive is a unique feature in HubStor that works with all methods of drive shipping to keep the data's original metadata and security ACLs in tact with drive-shippped workloads. For this exercise we relied on several data sets we have for testing. Contact HubStor for more details about how we seamlessly integrate with drive shipping or Data Box to get your data archived, with all original metadata intact. 2. Put the Data Box device itself through its paces, to validate for Microsoft the proper operation of the device at scale. For the second goal, we didn't happen to have 80 TB of data lying around that we could load into the Data Box. Instead, we put our engineers to work and developed a custom application, inventively named "Data Box Tester", that we could use to generate a folder structure and random files of a given size on the Data Box. This allows us to easily perform edge testing (e.g. 4.5 TB files, complex directory trees with many small files, etc.). For each file we generated on the Data Box, our tester utility recorded the details of the file along with a 64-bit checksum of the file, which we later