Elastic MapReduce Operation Guide
Product Documentation

Copyright Notice

©2013-2019 Tencent Cloud. All rights reserved.

Copyright in this document is exclusively owned by Tencent Cloud. You must not reproduce, modify, copy or distribute in any way, in whole or in part, the contents of this document without Tencent Cloud's prior written consent.

Trademark Notice

All trademarks associated with Tencent Cloud and its services are owned by Tencent Cloud Computing (Beijing) Company Limited and its affiliated companies. Trademarks of third parties referred to in this document are owned by their respective proprietors.

Service Statement

This document is intended to provide users with general information about Tencent Cloud's products and services only and does not form part of Tencent Cloud's terms and conditions. Tencent Cloud's products or services are subject to change. Specific products and services and the standards applicable to them are exclusively provided for in Tencent Cloud's applicable terms and conditions.
Contents

Operation Guide
Configure Cluster
Software Configuration
Mounting CHDFS
Exporting Software Configuration
Unified Management of Hive Metadata
Setting EMR Security Groups
Managing Cluster
Setting Tag
Instance Information
Custom Service Roles
Bootstrap Actions
Checking and Updating Public IP
Operation Logs
Cluster Scripts
Cluster Termination
Task Center
Viewing Bills
Managing Service
Adding Components
Restarting Services
Resetting Native UI Password
Software WebUI Entry
Operation Guide for Access to WebUI over Private Network
Service Monitoring
Role Management
Configuration Management
Configuration Rollback
Managing Configuration Groups
YARN Job Query
Impala Query Management
Managing Resources
Node Status
Node Specification Management
Adjusting Configuration
Scaling Clusters
Cluster Scale-in
Graceful Scale-In
Auto Scaling
Starting/Stopping Services
Monitoring and Alarming
Cluster Overview
Log Search
Configuring Alarms
Cluster Event
Cluster Inspection
Monitoring Metrics
Node Monitoring Metrics
HDFS Monitoring Metrics
YARN Monitoring Metrics
ZooKeeper Monitoring Metrics
Hive Monitoring Metrics
HBase Monitoring Metrics
Spark Monitoring Metrics
Presto Monitoring Metrics
ClickHouse Monitoring Metrics
Druid Monitoring Metrics
Kudu Monitoring Metrics
Alluxio Monitoring Metrics
PrestoSQL Monitoring Metrics
Impala Monitoring Metrics

Operation Guide
Configure Cluster
Software Configuration

Last updated: 2020-04-27 22:27:55

Feature

Software configuration enables you to customize configurations of components such as HDFS, YARN, and Hive when creating a cluster.

Custom Software Configuration

Software programs such as Hadoop and Hive have many configuration items. By using the software configuration feature, you can customize component parameters when creating a cluster.
During configuration, you need to provide the corresponding JSON file. You can write the file yourself, or generate it by exporting the software configuration parameters of an existing cluster, which lets you create a similar cluster quickly. For more information on how to export software configuration parameters, please see Exporting Software Configuration.

Currently, only parameters in the following files can be customized:

HDFS: core-site.xml, hdfs-site.xml, hadoop-env.sh, log4j.properties
YARN: yarn-site.xml, mapred-site.xml, fair-scheduler.xml, capacity-scheduler.xml, yarn-env.sh, mapred-env.sh
Hive: hive-site.xml, hive-env.sh, hive-log4j2.properties

Sample JSON file and description:

    [
      {
        "serviceName": "HDFS",
        "classification": "hdfs-site.xml",
        "serviceVersion": "2.8.4",
        "properties": {
          "dfs.blocksize": "67108864",
          "dfs.client.slow.io.warning.threshold.ms": "900000",
          "output.replace-datanode-on-failure": "false"
        }
      },
      {
        "serviceName": "YARN",
        "classification": "yarn-site.xml",
        "serviceVersion": "2.8.4",
        "properties": {
          "yarn.app.mapreduce.am.staging-dir": "/emr/hadoop-yarn/staging",
          "yarn.log-aggregation.retain-check-interval-seconds": "604800",
          "yarn.scheduler.minimum-allocation-vcores": "1"
        }
      },
      {
        "serviceName": "YARN",
        "classification": "capacity-scheduler.xml",
        "serviceVersion": "2.8.4",
        "properties": {
          "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<?xml-stylesheet type=\"text/xsl\" href=\"configuration.xsl\"?>\n<configuration><property>\n <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>\n <value>0.8</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.maximum-applications</name>\n <value>1000</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.root.default.capacity</name>\n <value>100</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>\n <value>100</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>\n <value>1</value>\n</property>\n<property>\n <name>yarn.scheduler.capacity.root.queues</name>\n <value>default</value>\n</property>\n</configuration>"
        }
      }
    ]

Configuration parameter descriptions:

serviceName: component name, which must be in uppercase.
classification: filename, which must be the full name including the file extension.
serviceVersion: component version, which must be the same as the corresponding component version in the EMR product version.
properties: parameters that need to be customized.

To modify configuration parameters in capacity-scheduler.xml or fair-scheduler.xml, set the key in properties to content and set the value to the content of the entire file.

To adjust the component configuration of an existing cluster rather than a new one, configure the component parameters of that cluster.

Accessing External Clusters

After configuring the HDFS access address information of an external cluster, you can read data in it.

Configuration during purchase

EMR allows you to configure access to external clusters when you create an EMR cluster. On the purchase page, you only need to import a JSON file that meets the requirements in the software configuration section.
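Before a file is imported on the purchase page, the requirements stated in the software configuration section can be checked locally. The following is a minimal sketch in Python, not an official EMR tool. The function name `validate_software_config` and the error messages are hypothetical, the checks cover only the rules written in this section, and it assumes that Hive is written as "HIVE" because serviceName must be uppercase. It does not verify that serviceVersion matches the EMR product version.

```python
import json

# Files this section lists as customizable, keyed by uppercase component name.
# Assumption: Hive is written as "HIVE" in serviceName.
ALLOWED_FILES = {
    "HDFS": {"core-site.xml", "hdfs-site.xml", "hadoop-env.sh", "log4j.properties"},
    "YARN": {"yarn-site.xml", "mapred-site.xml", "fair-scheduler.xml",
             "capacity-scheduler.xml", "yarn-env.sh", "mapred-env.sh"},
    "HIVE": {"hive-site.xml", "hive-env.sh", "hive-log4j2.properties"},
}
# Files whose entire content is passed under the single key "content".
WHOLE_FILE = {"capacity-scheduler.xml", "fair-scheduler.xml"}
REQUIRED_KEYS = ("serviceName", "classification", "serviceVersion", "properties")


def validate_software_config(text):
    """Return a list of problems found in a software configuration JSON string."""
    try:
        entries = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(entries, list):
        return ["top level must be a JSON array"]

    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: must be a JSON object")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            problems.append(f"entry {i}: missing {', '.join(missing)}")
            continue
        name, fname = entry["serviceName"], entry["classification"]
        if not isinstance(name, str) or not isinstance(fname, str):
            problems.append(f"entry {i}: serviceName and classification must be strings")
            continue
        if name != name.upper():
            problems.append(f"entry {i}: serviceName must be uppercase")
        allowed = ALLOWED_FILES.get(name.upper())
        if allowed is None:
            problems.append(f"entry {i}: unsupported component {name!r}")
        elif fname not in allowed:
            problems.append(f"entry {i}: {fname!r} is not customizable for {name}")
        props = entry["properties"]
        if not isinstance(props, dict):
            problems.append(f"entry {i}: properties must be a JSON object")
        elif fname in WHOLE_FILE and "content" not in props:
            problems.append(f"entry {i}: {fname} needs a 'content' key holding the whole file")
    return problems


if __name__ == "__main__":
    sample = """[{"serviceName": "HDFS", "classification": "hdfs-site.xml",
                  "serviceVersion": "2.8.4",
                  "properties": {"dfs.blocksize": "67108864"}}]"""
    print(validate_software_config(sample))  # prints []
```

An empty list means that none of the checked rules is violated. The console may still reject a file for reasons this sketch does not cover.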
The following example is based on an assumed external cluster.

Assumption

Assume that the nameservice of the external cluster to be accessed is HDFS8088, and the access method is as follows:

    <property>
        <name>dfs.ha.namenodes.HDFS8088</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.HDFS8088.nn1</name>
        <value>172.21.16.11:4008</value>
    </property>
    <property>
        <name>dfs.namenode.https-address.HDFS8088.nn1</name>
        <value>172.21.16.11:4009</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.HDFS8088.nn1</name>
        <value>172.21.16.11:4007</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.HDFS8088.nn2</name>
        <value>172.21.16.40:4008</value>
    </property>
    <property>
        <name>dfs.namenode.https-address.HDFS8088.nn2</name>
        <value>172.21.16.40:4009</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.HDFS8088.nn2</name>
        <value>172.21.16.40:4007</value>
    </property>

JSON file and description:

Based on the assumption above, import the following JSON file in the box. Its content must meet the same requirements as custom software configuration.

    [
      {
        "serviceName": "HDFS",
        "classification": "hdfs-site.xml",
        "serviceVersion": "2.7.3",
        "properties": {
          "newNameServiceName": "newEmrCluster",
          "dfs.ha.namenodes.HDFS8088": "nn1,nn2",
          "dfs.namenode.http-address.HDFS8088.nn1": "172.21.16.11:4008",
          "dfs.namenode.https-address.HDFS8088.nn1": "172.21.16.11:4009",
          "dfs.namenode.rpc-address.HDFS8088.nn1": "172.21.16.11:4007",
          "dfs.namenode.http-address.HDFS8088.nn2": "172.21.16.40:4008",
          "dfs.namenode.https-address.HDFS8088.nn2": "172.21.16.40:4009",
          "dfs.namenode.rpc-address.HDFS8088.nn2": "172.21.16.40:4007"
        }
      }
    ]

Configuration parameter descriptions:

serviceName: component name, which must be "HDFS".
classification: filename, which must be "hdfs-site.xml".
serviceVersion: component version, which must be the same as the corresponding component version in the EMR product version.
properties: the content must be the same as that in the assumption.
newNameServiceName: the nameservice of the newly created cluster. This parameter is optional. If it is left empty, the system generates the value; if it is specified, the value may contain only letters, digits, and hyphens.

Access to external clusters is supported only for high-availability clusters.
Access to external clusters is supported only for clusters with Kerberos disabled.

Configuration after purchase

EMR allows you to use the configuration distribution feature to access external clusters after creating an EMR cluster.

Assumption

Assume that the nameservice of the current cluster is HDFS80238. If the cluster is not a high-availability cluster, the nameservice is usually masterIp:rpcport, such as 172.21.0.11:4007.

The nameservice of the external cluster to be accessed is HDFS8088, and the access method is as follows:

    <property>
        <name>dfs.ha.namenodes.HDFS8088</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.HDFS8088.nn1</name>
        <value>172.21.16.11:4008</value>