Troubleshooting and Debugging | Cloud Dataflow | Google Cloud

8/23/2020 Troubleshooting and debugging | Cloud Dataflow | Google Cloud
https://cloud.google.com/dataflow/docs/guides/troubleshooting-your-pipeline/

Troubleshooting and debugging

This page provides troubleshooting tips and debugging strategies that you might find helpful if you're having trouble building or running your Dataflow pipeline. This information can help you detect a pipeline failure, determine the reason behind a failed pipeline run, and identify some courses of action to correct the problem.

Dataflow provides real-time feedback on your job, and there is a basic set of steps you can use to check the error messages and logs, and to check for conditions such as your job's progress having stalled.

For guidance on common errors you might encounter when running your Dataflow job, see the Common error guidance (/dataflow/docs/guides/common-errors) page.

If a job fails, Dataflow might not automatically clean up your temporary and staged files. In this case, manually remove these files from any Cloud Storage buckets that are used as temporary buckets.

Checking your pipeline's status

You can detect any errors in your pipeline runs by using the Dataflow monitoring interface (/dataflow/pipelines/dataflow-monitoring-intf).

1. Go to the Google Cloud Console (https://console.cloud.google.com/).
2. Select your Google Cloud project from the project list.
3. Click the menu in the upper left corner.
4. Navigate to the Big Data section and click Dataflow. A list of running jobs appears in the right-hand pane.
5. Select the pipeline job you want to view. You can see each job's status at a glance in the Status field: "Running," "Succeeded," or "Failed."

Figure 1: A list of Dataflow jobs in the Developers Console with jobs in the running, succeeded, and failed states.

Basic troubleshooting workflow

If one of your pipeline jobs fails, you can select the job to view more detailed information on errors and run results.
When you select a job, you can view key charts for your pipeline, the execution graph, the Job info panel, and a panel for Job logs, Worker logs, and Job error reporting.

Checking job error messages

To view the Job logs generated by your pipeline code and the Dataflow service, expand the Job logs panel. You can filter the messages that appear in Job logs by clicking Any log level. To display only error messages, click Filter and select Errors. To expand an error message, click its expandable section.

Alternatively, you can click the Job error reporting panel. This panel shows where errors occurred along the chosen timeline and a count of all logged errors.

Viewing step logs for your job

When you select a step in your pipeline graph, the logs panel toggles from displaying Job logs generated by the Dataflow service to showing logs from the Compute Engine instances running your pipeline step.

Cloud Logging (/logging) combines all of the collected logs from your project's Compute Engine instances in one location. See Logging pipeline messages (/dataflow/pipelines/logging) for more information on using Dataflow's various logging capabilities.

Handling automated pipeline rejection

In some cases, the Dataflow service identifies that your pipeline might trigger known SDK issues (/dataflow/docs/resources/release-notes-service). To prevent pipelines that are likely to encounter issues from being submitted, Dataflow automatically rejects your pipeline and displays the following message:

    The workflow was automatically rejected by the service because it may trigger an identified bug in the SDK (details below).
    If you think this identification is in error, and would like to override this automated rejection, please re-submit this workflow with the following override flag: [OVERRIDE FLAG].
    Bug details: [BUG DETAILS].
    Contact Google Cloud Support for further help.
    Please use this identifier in your communication: [BUG ID].

After reading the caveats in the linked bug details, if you want to try to run your pipeline anyway, you can override the automated rejection. Add the flag --experiments=<override-flag> and resubmit your pipeline.

Determining the cause of a pipeline failure

Typically, a failed Apache Beam pipeline run can be attributed to one of the following causes:

- Graph or pipeline construction errors. These errors occur when Dataflow runs into a problem building the graph of steps that compose your pipeline, as described by your Apache Beam pipeline.
- Errors in job validation. The Dataflow service validates any pipeline job you launch. Errors in the validation process can prevent your job from being successfully created or executed. Validation errors can include problems with your Google Cloud project's Cloud Storage bucket, or with your project's permissions.
- Exceptions in worker code. These errors occur when there are errors or bugs in the user-provided code that Dataflow distributes to parallel workers, such as the DoFn instances of a ParDo transform.
- Slow-running pipelines or lack of output. If your pipeline runs slowly or runs for a long period of time without reporting results, check your quotas for streaming data sources and sinks such as Pub/Sub. Some transforms are also better suited to high-volume streaming pipelines than others.
- Errors caused by transient failures in other Google Cloud services.
  Your pipeline may fail because of a temporary outage or other problem in the Google Cloud services upon which Dataflow depends, such as Compute Engine or Cloud Storage.

Detecting graph or pipeline construction errors

A graph construction error can occur when Dataflow is building the execution graph for your pipeline from the code in your Dataflow program. During graph construction time, Dataflow checks for illegal operations.

If Dataflow detects an error in graph construction, keep in mind that no job will be created on the Dataflow service. Thus, you won't see any feedback in the Dataflow monitoring interface. Instead, you'll see an error message similar to the following in the console or terminal window where you ran your Apache Beam pipeline:

Java: SDK 2.x

For example, if your pipeline attempts to perform an aggregation like GroupByKey on a globally windowed, non-triggered, unbounded PCollection, you'll see an error message similar to the following:

    ...
    ... Exception in thread "main" java.lang.IllegalStateException:
    ... GroupByKey cannot be applied to non-bounded PCollection in the GlobalWindow with
    ... Use a Window.into or Window.triggering transform prior to GroupByKey
    ...

Should you encounter such an error, check your pipeline code to ensure that your pipeline's operations are legal.

Detecting errors in Cloud Dataflow job validation

Once the Dataflow service has received your pipeline's graph, the service will attempt to validate your job. This validation includes the following:

- Making sure the service can access your job's associated Cloud Storage buckets for file staging and temporary output.
- Checking for the required permissions in your Google Cloud project.
- Making sure the service can access input and output sources, such as files.
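Some access problems come down to a mistyped Cloud Storage path. As a minimal sketch, assuming your staging and temporary locations use the gs://bucket/object form, the following plain-Python pre-check catches malformed paths locally before you submit a job. The helper name check_gcs_path is hypothetical and is not part of any Dataflow or Apache Beam API. It checks syntax only and never contacts Cloud Storage, so it cannot confirm that a bucket exists or that you have permission to use it.

```python
from urllib.parse import urlparse


def check_gcs_path(path):
    """Return a list of problems found in a gs:// path (empty if none).

    Hypothetical helper: checks syntax only, never contacts Cloud Storage.
    """
    problems = []
    parsed = urlparse(path)
    if parsed.scheme != "gs":
        problems.append("expected the gs:// scheme, got %r" % parsed.scheme)
    if not parsed.netloc:
        problems.append("missing bucket name")
    return problems


if __name__ == "__main__":
    # Sample locations are made up for illustration.
    for location in ("gs://my-bucket/staging", "my-bucket/tmp", "gs:///tmp"):
        print(location, "->", check_gcs_path(location) or "looks well-formed")
```

A path that passes this check can still fail service-side validation, for example when the bucket does not exist or the project lacks permission to read it.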
If your job fails the validation process, you'll see an error message in the Dataflow monitoring interface, as well as in your console or terminal window if you are using blocking execution. The error message will look similar to the following:

    INFO: To access the Dataflow monitoring console, please navigate to
    https://console.developers.google.com/project/google.com%3Aclouddfe/dataflow/job/2
    Submitted job: 2016-03-08_18_59_25-16868399470801620798
    ...
    ... Starting 3 workers...
    ... Executing operation BigQuery-Read+AnonymousParDo+BigQuery-Write
    ... Executing BigQuery import job "dataflow_job_16868399470801619475".
    ... Stopping worker pool...
    ... Workflow failed. Causes: ...BigQuery-Read+AnonymousParDo+BigQuery-Write failed.
    Causes: ... BigQuery getting table "non_existent_table" from dataset "cws_demo" in p
    Message: Not found: Table x:cws_demo.non_existent_table HTTP Code: 404
    ... Worker pool stopped.
    ... com.google.cloud.dataflow.sdk.runners.BlockingDataflowPipelineRunner run
    INFO: Job finished with status FAILED
    Exception in thread "main" com.google.cloud.dataflow.sdk.runners.DataflowJobExecutio
    Job 2016-03-08_18_59_25-16868399470801620798 failed with status FAILED
    at com.google.cloud.dataflow.sdk.runners.DataflowRunner.run(DataflowRunner.java:
    at com.google.cloud.dataflow.sdk.runners.DataflowRunner.run(DataflowRunner.java:
    at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:180)
    at com.google.cloud.dataflow.integration.BigQueryCopyTableExample.main(BigQueryC

Detecting an exception in worker code

While your job is running, you may encounter errors or exceptions in your worker code. These errors generally mean that the DoFns in your pipeline code have generated unhandled exceptions, which result in failed tasks in your Dataflow job.
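One way to keep a single bad element from failing a task is to catch expected exceptions inside the per-element function and set the failing input aside for later inspection. The following is a minimal sketch of that idea in plain Python, not Beam code: parse_record, process_elements, and the sample inputs are all hypothetical. In a real pipeline, the same try/except would sit inside your DoFn's per-element processing, with failures written to a separate output.

```python
import logging


def parse_record(line):
    """Hypothetical per-element logic: parse 'name,count' into a tuple."""
    name, count = line.split(",")
    return name.strip(), int(count)


def process_elements(lines):
    """Apply parse_record to each element, setting failures aside.

    Returns (results, failures); each failure pairs the input with its error.
    """
    results, failures = [], []
    for line in lines:
        try:
            results.append(parse_record(line))
        except ValueError as err:
            # Log and keep the bad input instead of letting the exception escape.
            logging.warning("Skipping bad element %r: %s", line, err)
            failures.append((line, str(err)))
    return results, failures


if __name__ == "__main__":
    good, bad = process_elements(["a,1", "b,two", "c"])
    print(good)
    print([line for line, _ in bad])
```

Catching only ValueError is deliberate in this sketch: malformed input is handled, while unexpected errors still surface in the logs rather than being silently swallowed.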
