Querying Cloud Storage data | BigQuery | Google Cloud

8/23/2020 Querying Cloud Storage data | BigQuery | Google Cloud
https://cloud.google.com/bigquery/external-data-cloud-storage/

Querying Cloud Storage data

BigQuery supports querying Cloud Storage data in the following formats:

  • Comma-separated values (CSV)
  • JSON (newline-delimited)
  • Avro
  • ORC
  • Parquet
  • Datastore exports
  • Firestore exports

BigQuery supports querying Cloud Storage data from these storage classes (/storage/docs/storage-classes):

  • Standard
  • Nearline
  • Coldline
  • Archive

To query a Cloud Storage external data source, provide the Cloud Storage URI (#gcs-uri) path to your data and create a table that references the data source. The table used to reference the Cloud Storage data source can be a permanent table or a temporary table (#table-types).

Be sure to consider the location (/bigquery/external-data-sources#data-locations) of your dataset and Cloud Storage bucket when you query data stored in Cloud Storage.

Retrieving the Cloud Storage URI

To create an external table using a Cloud Storage data source, you must provide the Cloud Storage URI.

The Cloud Storage URI comprises your bucket name and your object (filename). For example, if the Cloud Storage bucket is named mybucket and the data file is named myfile.csv, the bucket URI would be gs://mybucket/myfile.csv. If your data is separated into multiple files, you can use a wildcard in the URI. For more information, see Cloud Storage Request URIs (https://cloud.google.com/storage/docs/xml-api/reference-uris).

BigQuery does not support source URIs that include multiple consecutive slashes after the initial double slash. Cloud Storage object names can contain multiple consecutive slash ("/") characters. However, BigQuery converts multiple consecutive slashes into a single slash. For example, the following source URI, though valid in Cloud Storage, does not work in BigQuery: gs://bucket/my//object//name.

To retrieve the Cloud Storage URI:

1. Open the Cloud Storage console.
   Cloud Storage console (https://console.cloud.google.com/storage/browser)
2. Browse to the location of the object (file) that contains the source data.
3. At the top of the Cloud Storage console, note the path to the object. To compose the URI, replace gs://bucket/file with the appropriate path, for example, gs://mybucket/myfile.json. bucket is the Cloud Storage bucket name and file is the name of the object (file) containing the data.

You can also use the gsutil ls (/storage/docs/gsutil/commands/ls) command to list buckets or objects.

Permanent versus temporary external tables

You can query an external data source in BigQuery by using a permanent table or a temporary table. A permanent table is a table that is created in a dataset and is linked to your external data source. Because the table is permanent, you can use access controls (/bigquery/docs/access-control) to share the table with others who also have access to the underlying external data source, and you can query the table at any time.

When you query an external data source using a temporary table, you submit a command that includes a query and creates a non-permanent table linked to the external data source. When you use a temporary table, you do not create a table in one of your BigQuery datasets. Because the table is not permanently stored in a dataset, it cannot be shared with others. Querying an external data source using a temporary table is useful for one-time, ad-hoc queries over external data, or for extract, transform, and load (ETL) processes.
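The URI rules from the "Retrieving the Cloud Storage URI" section can be checked before a table is created. The following is a minimal sketch in plain Python, not part of any Google library: the helper names compose_gcs_uri and is_bigquery_compatible are hypothetical, and the checks cover only the rules stated above (bucket plus object name, no consecutive slashes after the initial gs://).

```python
def compose_gcs_uri(bucket: str, object_name: str) -> str:
    """Build gs://bucket/object from a bucket name and an object name.

    Hypothetical helper: raises ValueError for inputs that would produce
    consecutive slashes after the initial "gs://".
    """
    if not bucket or "/" in bucket:
        raise ValueError(f"invalid bucket name: {bucket!r}")
    if not object_name:
        raise ValueError("object name must not be empty")
    # A leading slash or an embedded "//" would yield consecutive slashes,
    # which Cloud Storage allows but BigQuery does not.
    if object_name.startswith("/") or "//" in object_name:
        raise ValueError(f"consecutive slashes in object name: {object_name!r}")
    return f"gs://{bucket}/{object_name}"


def is_bigquery_compatible(uri: str) -> bool:
    """Return True if the URI has the gs:// prefix and no later '//'."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        return False
    return "//" not in uri[len(prefix):]


if __name__ == "__main__":
    print(compose_gcs_uri("mybucket", "myfile.csv"))
    print(is_bigquery_compatible("gs://bucket/my//object//name"))
```

Validating on the client side is a design convenience only; BigQuery itself remains the authority on which source URIs it accepts.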
Querying Cloud Storage data using permanent external tables

Required permissions and scopes

When you query external data in Cloud Storage using a permanent table, you need permissions to run a query job at the project level or higher, you need permissions that allow you to create a table that points to the external data, and you need permissions that allow you to access the table. When your external data is stored in Cloud Storage, you also need permissions to access the data in the Cloud Storage bucket.

BigQuery permissions

At a minimum, the following permissions are required to create and query an external table in BigQuery.

  • bigquery.tables.create
  • bigquery.tables.getData
  • bigquery.jobs.create

The following predefined IAM roles include both bigquery.tables.create and bigquery.tables.getData permissions:

  • bigquery.dataEditor
  • bigquery.dataOwner
  • bigquery.admin

The following predefined IAM roles include bigquery.jobs.create permissions:

  • bigquery.user
  • bigquery.jobUser
  • bigquery.admin

In addition, if a user has bigquery.datasets.create permissions, when that user creates a dataset, they are granted bigquery.dataOwner access to it. bigquery.dataOwner access gives the user the ability to create external tables in the dataset, but bigquery.jobs.create permissions are still required to query the data.

For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions (/bigquery/docs/access-control).

Cloud Storage permissions

In order to query external data in a Cloud Storage bucket, you must be granted storage.objects.get permissions. If you are using a URI wildcard (#wildcard-support), you must also have storage.objects.list permissions.

The predefined IAM role storage.objectViewer (/storage/docs/access-control/iam) can be granted to provide both storage.objects.get and storage.objects.list permissions.
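The permission requirements above amount to a small checklist, sketched below in plain Python. This is an illustration only: the function missing_permissions is hypothetical, it does not call IAM, and it assumes that a "*" in the source URI is what marks a wildcard.

```python
# Minimum BigQuery permissions listed above for creating and querying
# an external table.
BIGQUERY_REQUIRED = frozenset({
    "bigquery.tables.create",
    "bigquery.tables.getData",
    "bigquery.jobs.create",
})


def missing_permissions(granted, source_uri):
    """Return the sorted list of required permissions not in `granted`.

    Hypothetical helper: `granted` is any iterable of permission strings.
    A "*" in `source_uri` is assumed to indicate a wildcard URI.
    """
    required = set(BIGQUERY_REQUIRED)
    required.add("storage.objects.get")
    if "*" in source_uri:
        # Wildcard URIs also require listing objects in the bucket.
        required.add("storage.objects.list")
    return sorted(required - set(granted))


if __name__ == "__main__":
    granted = {"bigquery.tables.create", "bigquery.jobs.create",
               "storage.objects.get"}
    print(missing_permissions(granted, "gs://mybucket/myfile*.csv"))
```

In practice the granted set would come from whichever IAM roles the caller holds; how that set is obtained is outside the scope of this sketch.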
Scopes for Compute Engine instances

When you create a Compute Engine instance, you can specify a list of scopes for the instance. The scopes control the instance's access to Google Cloud products, including Cloud Storage. Applications running on the VM use the service account attached to the instance to call Google Cloud APIs.

If you set up a Compute Engine instance to run as the default Compute Engine service account (/compute/docs/access/create-enable-service-accounts-for-instances), and that service account accesses an external table linked to a Cloud Storage data source, the instance requires read-only access to Cloud Storage. The default Compute Engine service account is automatically granted the https://www.googleapis.com/auth/devstorage.read_only scope. If you create your own service account, apply the Cloud Storage read-only scope to the instance.

For information on applying scopes to a Compute Engine instance, see Changing the service account and access scopes for an instance (/compute/docs/access/create-enable-service-accounts-for-instances#changeserviceaccountandscopes). For more information on Compute Engine service accounts, see Service accounts (/compute/docs/access/service-accounts).

Creating and querying a permanent external table

You can create a permanent table linked to your external data source by:

  • Using the Cloud Console or the classic BigQuery web UI
  • Using the command-line tool's mk command
  • Creating an ExternalDataConfiguration (/bigquery/docs/reference/rest/v2/tables#externaldataconfiguration) when you use the tables.insert (/bigquery/docs/reference/rest/v2/tables/insert) API method
  • Using the client libraries

To query an external data source using a permanent table, you create a table in a BigQuery dataset that is linked to your external data source. The data is not stored in the BigQuery table.
Because the table is permanent, you can use access controls (/bigquery/docs/access-control) to share the table with others who also have access to the underlying external data source.

There are three ways to specify schema information when you create a permanent external table in BigQuery:

  • If you are using the tables.insert (/bigquery/docs/reference/rest/v2/tables/insert) API method to create a permanent external table, you create a table resource that includes a schema definition and an ExternalDataConfiguration (/bigquery/docs/reference/rest/v2/tables#externaldataconfiguration). Set the autodetect parameter to true to enable schema auto-detection (/bigquery/docs/schema-detect) for supported data sources.
  • If you are using the bq command-line tool to create a permanent external table, you can use a table definition file (/bigquery/external-table-definition), you can create and use your own schema file, or you can enter the schema inline with the bq tool. When you create a table definition file, you can enable schema auto-detection (/bigquery/docs/schema-detect) for supported data sources.
  • If you are using the console or the classic BigQuery web UI to create a permanent external table, you can enter the table schema manually or use schema auto-detection (/bigquery/docs/schema-detect) for supported data sources.

To create an external table:

1. Open the BigQuery web UI in the Cloud Console.
   Go to the Cloud Console (https://console.cloud.google.com/bigquery)
2. In the navigation panel, in the Resources section, expand your project and select a dataset.
3. Click Create table on the right side of the window.
4. On the Create table page, in the Source section:
     • For Create table from, select Cloud Storage.
     • In the Select file from Cloud Storage bucket field, browse for the file/Cloud Storage bucket, or enter the Cloud Storage URI (#gcs-uri). Note that you cannot include multiple URIs in the Cloud Console, but wildcards (/bigquery/docs/loading-data-cloud-storage#load-wildcards) are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you're creating.
     • For File format, select the format of your data.
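For the tables.insert route described earlier, the request body is a table resource that carries an ExternalDataConfiguration. The sketch below builds such a body as a plain Python dictionary and prints it as JSON; it does not send any request. The field names (tableReference, externalDataConfiguration, sourceUris, sourceFormat, autodetect, schema.fields) follow the REST v2 Table resource as best understood here and should be checked against the API reference. The function name external_table_resource and the project, dataset, and table names are hypothetical.

```python
import json


def external_table_resource(project_id, dataset_id, table_id,
                            source_uris, source_format="CSV",
                            schema_fields=None):
    """Build a table resource dict for an external table.

    Hypothetical helper. If `schema_fields` is None, schema auto-detection
    is requested; otherwise the explicit schema is included.
    """
    config = {
        "sourceUris": list(source_uris),
        "sourceFormat": source_format,
    }
    resource = {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "externalDataConfiguration": config,
    }
    if schema_fields is None:
        # No schema supplied: ask BigQuery to detect it.
        config["autodetect"] = True
    else:
        resource["schema"] = {"fields": list(schema_fields)}
    return resource


if __name__ == "__main__":
    body = external_table_resource(
        "my-project", "mydataset", "mytable",   # hypothetical names
        ["gs://mybucket/myfile.csv"],
    )
    print(json.dumps(body, indent=2))
```

Passing an explicit schema_fields list (for example, entries with name and type keys) replaces auto-detection with a hand-written schema, which mirrors the choice the text describes between a schema definition and the autodetect parameter.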
