Introduction to BigQuery Monitoring | Google Cloud
8/23/2020 Introduction to BigQuery monitoring | Google Cloud

Introduction to BigQuery monitoring

Monitoring and logging are crucial for running reliable applications in the cloud. BigQuery workloads are no exception, especially if your workload has high volumes or is mission critical. This document provides a high-level overview of the monitoring data that is available for BigQuery.

Metrics

Metrics are numerical values that are collected at regular intervals and made available for analysis. You can use metrics to:

- Create charts and dashboards.
- Trigger alerts for critical conditions or situations that need human intervention.
- Analyze historical performance.

In the case of BigQuery, the available metrics include the number of jobs that are running, how many bytes were scanned during a query, the distribution of query times, and other statistics. For the complete list of metrics available for BigQuery, see bigquery (/monitoring/api/metrics_gcp#gcp-bigquery) under Google Cloud metrics (/monitoring/api/metrics_gcp).

Use Cloud Monitoring (/monitoring/docs) to view BigQuery metrics and create charts and alerts. Each metric has a resource type, either bigquery_dataset, bigquery_project, or global, and a set of labels. Use this information to build queries in Monitoring Query Language (MQL) (/monitoring/mql). You can group or filter each metric by using the labels.
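As a rough sketch of how a resource type, a metric name, and label filters combine into MQL query text, a small Python helper could assemble the pieces. The helper name build_mql and its parameters are hypothetical and not part of any Google library. The query shape (fetch, metric, filter, group_by stages joined by pipes) is an assumption based on MQL's pipeline syntax.

```python
from typing import Mapping, Optional


def build_mql(
    resource_type: str,
    metric: str,
    label_filters: Optional[Mapping[str, str]] = None,
    window: Optional[str] = None,
) -> str:
    """Assemble MQL query text (hypothetical helper, for illustration only).

    resource_type: a monitored resource type such as 'global' or 'bigquery_project'.
    metric: the full metric name.
    label_filters: metric label names mapped to the values they must equal.
    window: optional group_by window such as '10m'.
    """
    lines = [f"fetch {resource_type}", f"| metric '{metric}'"]
    # Each label filter becomes its own filter stage in the pipeline.
    for label, value in (label_filters or {}).items():
        lines.append(f"| filter metric.{label} = '{value}'")
    if window:
        lines.append(f"| group_by {window}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_mql(
        "global",
        "bigquery.googleapis.com/query/count",
        {"priority": "interactive"},
    ))
```

The helper only produces text. You would still paste the result into the Cloud Monitoring query editor or send it through the Monitoring API.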
For example, to chart the number of interactive queries in flight, use the following MQL statement, which filters by priority equal to interactive:

    fetch global
    | metric 'bigquery.googleapis.com/query/count'
    | filter metric.priority = 'interactive'

The next example gets the number of load jobs in flight, grouped into 10-minute intervals:

    fetch bigquery_project
    | metric 'bigquery.googleapis.com/job/num_in_flight'
    | filter metric.job_type = 'load'
    | group_by 10m

For more information, see Creating charts and alerts for BigQuery (/bigquery/docs/monitoring-dashboard).

Logs

Logs are text records that are generated in response to particular events or actions. BigQuery creates log entries for actions such as creating or deleting a table, purchasing slots, or running a load job. For more information about logging in Google Cloud, see Cloud Logging (/logging/docs).

A log is an append-only collection of log entries. For example, you could write your own log entries to a log named projects/PROJECT_ID/logs/my-test-log. Many Google Cloud services, including BigQuery, create a type of log called audit logs (/logging/docs/audit). These logs record:

- Administrative activity, such as creating or modifying resources.
- Data access, such as reading user-provided data from a resource.
- System events that are generated by Google systems, rather than by user actions.

Audit logs are written in a structured JSON format. The base data type for Google Cloud log entries is the LogEntry (/logging/docs/reference/v2/rest/v2/LogEntry) structure. This structure contains the name of the log, the resource that generated the log entry, the timestamp, and other basic information.

The details of the logged event are contained in a subfield called the payload field. For audit logs, the payload field is named protoPayload.
The value of this field is an AuditLog (/logging/docs/reference/audit/auditlog/rest/Shared.Types/AuditLog) structure, indicated by the value of the protoPayload.@type field, which is set to type.googleapis.com/google.cloud.audit.AuditLog.

For operations on datasets, tables, and jobs, BigQuery currently writes audit logs in two different formats, although both share the AuditLog base type.

In the older format:

- The resource.type field is bigquery_resource.
- Details about the operation are written to the protoPayload.serviceData field. The value of this field is an AuditData (/bigquery/docs/reference/auditlogs/rest/Shared.Types/AuditData) structure.

In the newer format:

- The resource.type field is either bigquery_project or bigquery_dataset. The bigquery_project resource has log entries about jobs, while the bigquery_dataset resource has log entries about storage.
- Details about the operation are written to the protoPayload.metadata field. The value of this field is a BigQueryAuditMetadata (/bigquery/docs/reference/auditlogs/rest/Shared.Types/BigQueryAuditMetadata) structure.

We recommend consuming logs in the newer format. For more information, see Audit logs migration guide (/bigquery/docs/reference/auditlogs/migration).

Here is an abbreviated example of a log entry that shows a failed operation:

    {
      "protoPayload": {
        "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
        "status": {
          "code": 5,
          "message": "Not found: Dataset my-project:my-dataset was not found in location US"
        },
        "authenticationInfo": { ... },
        "requestMetadata": { ... },
        "serviceName": "bigquery.googleapis.com",
        "methodName": "google.cloud.bigquery.v2.JobService.InsertJob",
        "metadata": { ... }
      },
      "resource": {
        "type": "bigquery_project",
        "labels": { ... }
      },
      "severity": "ERROR",
      "logName": "projects/my-project/logs/cloudaudit.googleapis.com%2Fdata_access",
      ...
    }

For operations on BigQuery Reservations, the protoPayload is an AuditLog (/logging/docs/reference/audit/auditlog/rest/Shared.Types/AuditLog) structure, and the protoPayload.request and protoPayload.response fields contain more information. You can find the field definitions in BigQuery Reservation API (/bigquery/docs/reference/reservations/rpc). For more information, see Monitoring BigQuery Reservations (/bigquery/docs/reservations-monitoring).

INFORMATION_SCHEMA views

INFORMATION_SCHEMA (/bigquery/docs/information-schema-intro) views are another source of insights in BigQuery, which you can use in conjunction with metrics and logs. These views contain metadata about jobs, datasets, tables, and other BigQuery entities. For example, you can get real-time metadata about which BigQuery jobs ran over a specified time period, and then group or filter the results by project, user, tables referenced, and other dimensions.

You can use this information to perform more detailed analysis about your BigQuery workloads, and answer questions like:

- What is the average slot utilization for all queries over the past 7 days for a given project?
- Which users submitted a batch load job for a given project?
- What streaming errors occurred in the past 30 minutes, grouped by error code?

In particular, look at jobs metadata (/bigquery/docs/information-schema-jobs), streaming metadata (/bigquery/docs/information-schema-streaming), and reservations metadata (/bigquery/docs/information-schema-reservations) to get insights into the performance of your BigQuery workloads.
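To make one of these questions concrete, the following minimal sketch builds a SQL query that counts load jobs per user over a recent time window. The function name load_jobs_by_user_sql, its parameters, and the default region are hypothetical. The view and column names (JOBS_BY_PROJECT, user_email, job_type, creation_time) are assumed from the jobs metadata documentation and should be checked against the current schema before use.

```python
def load_jobs_by_user_sql(region: str = "us", days: int = 7) -> str:
    """Return SQL that counts load jobs per user (hypothetical helper).

    Assumes the JOBS_BY_PROJECT view exposes the user_email, job_type,
    and creation_time columns; verify against the jobs metadata docs.
    """
    # Reject anything that is not a plain region identifier, because the
    # value is interpolated into the SQL text.
    if not region.replace("-", "").isalnum():
        raise ValueError(f"unexpected region name: {region!r}")
    if not isinstance(days, int) or days <= 0:
        raise ValueError("days must be a positive integer")
    return (
        "SELECT user_email, COUNT(*) AS load_jobs\n"
        f"FROM `region-{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT\n"
        "WHERE job_type = 'LOAD'\n"
        f"  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)\n"
        "GROUP BY user_email\n"
        "ORDER BY load_jobs DESC"
    )


if __name__ == "__main__":
    print(load_jobs_by_user_sql())
```

The returned text can be run in the BigQuery console or passed to a client library. The sketch stops at building the query so that it needs no credentials or third-party packages.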
What's next

To learn how to create charts and alerts, see Creating charts and alerts for BigQuery (/bigquery/docs/monitoring-dashboard).

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/), and code samples are licensed under the Apache 2.0 License (https://www.apache.org/licenses/LICENSE-2.0). For details, see the Google Developers Site Policies (https://developers.google.com/site-policies). Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2020-08-19 UTC.

https://cloud.google.com/bigquery/docs/monitoring/