Version Control. The Hive source code resides in the Apache Subversion (SVN) repository, with anonymous read-only access, committer read-write access, and a read-only GitHub mirror.

Downloads. On the mirror, all recent releases are available, but are not guaranteed to be stable. For stable releases, look in the stable directory.

17 January 2021: release 2.3.8 available. This release works with Hadoop 2.x.y. You can look at the complete change log for this release.
18 April 2020: release 2.3.7 available. This release works with Hadoop 2.x.y. You can look at the complete JIRA change log for this release.
26 August 2019: release 3.1.2 available. This release works with Hadoop 3.x.y. You can look at the complete JIRA change log for this release.
23 August 2019: release 2.3.6 available. This release works with Hadoop 2.x.y. You can look at the complete JIRA change log for this release.
14 May 2019: release 2.3.5 available. This release works with Hadoop 2.x.y. You can look at the complete JIRA change log for this release.
7 November 2018: release 2.3.4 available. This release works with Hadoop 2.x.y. You can look at the complete JIRA change log for this release.
1 November 2018: release 3.1.1 available. This release works with Hadoop 3.x.y. You can look at the complete JIRA change log for this release.
30 July 2018: release 3.1.0 available. This release works with Hadoop 3.x.y. You can look at the complete JIRA change log for this release.
21 May 2018: release 3.0.0 available. This release works with Hadoop 3.x.y. The on-disk layout of ACID tables has changed with this release. Any ACID table partition that had Update/Delete/Merge statements executed since the last major compaction must undergo another major compaction before upgrading to 3.0, and no more Update/Delete/Merge statements may be executed against these tables after the start of that major compaction. Not following this may lead to data corruption. Tables/partitions that only contain the results of Insert statements are fully compatible and don't need to be compacted. You can look at the complete JIRA change log for this release.
3 April 2018: release 2.3.3 available. This release works with Hadoop 2.x.y. You can look at the complete JIRA change log for this release.
18 November 2017: release 2.3.2 available. This release works with Hadoop 2.x.y. You can look at the complete JIRA change log for this release.
24 October 2017: release 2.3.1 available. This release works with Hadoop 2.x.y. You can look at the complete JIRA change log for this release.
25 July 2017: release 2.2.0 available. This release works with Hadoop 2.x.y. You can look at the complete JIRA change log for this release.
17 July 2017: release 2.3.0 available. This release works with Hadoop 2.x.y. You can look at the complete JIRA change log for this release.
07 April 2017: release 1.2.2 available. This release works with Hadoop 1.x.y, 2.x.y. You can look at the complete JIRA change log for this release.
8 December 2016: release 2.1.1 available. This release works with Hadoop 2.x.y. The Hive 1.x line will continue to be maintained with Hadoop 1.x.y support. You can look at the complete JIRA change log for this release.
20 June 2016: release 2.1.0 available. This release works with Hadoop 2.x.y.
The Hive 1.x line will continue to be maintained with Hadoop 1.x.y support. You can look at the complete JIRA change log for this release.
25 May 2016: release 2.0.1 available. This release works with Hadoop 2.x.y. The Hive 1.x line will continue to be maintained with Hadoop 1.x.y support. You can look at the complete JIRA change log for this release.
15 February 2016: release 2.0.0 available. This release works with Hadoop 2.x.y. The Hive 1.x line will continue to be maintained with Hadoop 1.x.y support. You can look at the complete JIRA change log for this release.
28 Jan 2016: hive-parent-auth-hook made available. This is a hook usable with Hive to fix an authorization issue. Users of Hive 1.0.x, 1.1.x and 1.2.x are encouraged to use this hook. More details can be found in the README inside the tar.gz file.
27 June 2015: release 1.2.1 available. This release works with Hadoop 1.x.y, 2.x.y.
21 May 2015: releases 1.0.1, 1.1.1, and ldap-fix are available. These two releases work with Hadoop 1.x.y, 2.x.y. They are based on Hive 1.0.0 and 1.1.0 respectively, plus a fix for an LDAP vulnerability issue. Hive users of these two versions are encouraged to upgrade. Users of previous versions can download and use the ldap-fix. More details can be found in the README attached to the tar.gz file. You can look at the complete JIRA change log for release 1.0.1 and release 1.1.1.
18 May 2015: release 1.2.0 available. This release works with Hadoop 1.x.y, 2.x.y.
8 March 2015: release 1.1.0 available. This release works with Hadoop 1.x.y, 2.x.y.
4 February 2015: release 1.0.0 available. This release works with Hadoop 1.x.y, 2.x.y.
12 November 2014: release 0.14.0 available. This release works with Hadoop 1.x.y, 2.x.y.
6 June 2014: release 0.13.1 available. This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y, 2.x.y.
21 April 2014: release 0.13.0 available. This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y, 2.x.y.
15 October 2013: release 0.12.0 available. This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y, 2.x.y.
15 May 2013: release 0.11.0 available. This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y, 2.x.y.
March 2013: HCatalog merges into Hive. Old HCatalog releases may still be downloaded.
11 January 2013: release 0.10.0 available. This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y, 2.x.y.

HowToRelease. Skip this section if this is NOT the first release in a series (i.e., release X.Y.0). Notify developers on the #hive IRC channel and the dev@hive mailing list that you are about to branch a release. Create a branch for the release series. Then increment the value of the version property in all pom.xml files on master. For example, if the current value is 0.7.0-SNAPSHOT, the new value should be 0.8.0-SNAPSHOT. Please note that the SNAPSHOT suffix is required in order to indicate that this is an unreleased development branch. This can be accomplished with a single command using Maven's Versions plugin as follows:
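A sketch of that single command, using the example version above; the -DgenerateBackupPoms=false flag is an optional convenience added here, not something the guide itself prescribes:

    $ mvn versions:set -DnewVersion=0.8.0-SNAPSHOT -DgenerateBackupPoms=false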
Commit these changes to master with a comment "Preparing for X.Y+1.0 development".

Updating Release Branch. These operations take place in the release branch. Check out the release branch and update the version property value in all pom.xml files. You should remove the SNAPSHOT suffix and set the version equal to hive-X.Y.Z, where Z is the point release number in this release series (0 for the first one, in which case this step is a no-op since you already did this above when creating the branch). Use Maven's Versions plugin to do this, as above. Commit these changes with a comment "Preparing for X.Y.Z development". If not already done, merge the desired patches from master into the branch and commit these changes. Avoid usage of "git merge" to keep the number of merge commits down: either request the committer who committed the patch in master to commit it to this branch, commit it yourself, or try a git cherry-pick for trivial patches. The specifics of this step can be laid down by the release manager. Select all of the JIRAs for the current release that aren't FIXED and do a bulk update to clear the 'Fixed Version' field. Likewise, use JIRA's Release Notes link to generate content for the RELEASE_NOTES.txt file; be sure to select 'Text' format. (It's OK to do this with a direct commit rather than a patch.) Update the release notes in trunk with the release notes in the branch. Tag the release candidate (R is the release candidate number, and also starts from 0).

Building. Make sure your release notes have been updated for any new commits, and go through the previous steps again if necessary. Build the release (binary and source versions) after running the unit tests. Manually create the md5 files; on a Mac, use md5 in place of md5sum. Verify that the MD5 checksums are valid. See http://www.apache.org/dev/release-signing.html and http://www.apache.org/dev/openpgp.html. Sign the release (see Step-By-Step Guide to Mirroring Releases for more information). Copy the release files to a public place. Publish the Maven artifacts to the Apache staging repository.

Voting. Call a release vote on dev at hive.apache.org.

Verifying the Release Candidate. Verify the PGP signature:
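A sketch of the usual check, assuming the candidate's source tarball, its .asc signature, and the project's KEYS file have been downloaded (the file names are illustrative):

    $ gpg --import KEYS
    $ gpg --verify apache-hive-X.Y.Z-src.tar.gz.asc apache-hive-X.Y.Z-src.tar.gz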
You need to have the "Admin" role in Hive's Jira for this step and the next. Close issues resolved in the release. Disable mail notifications for this bulk change. Send a release announcement to Hive user and dev lists as well as the Apache announce list. This email should be sent from your Apache email address: Rebuilding HDP Hive: patch, test and build. The HDP distribution will soon be deprecated in favor of Cloudera’s CDP. One of our clients wanted a new Apache Hive feature backported into HDP 2.6.0. We thought it was a good opportunity to try to manually rebuild HDP 2.6.0’s Hive version. This article will go through the process of backporting a list of patches provided by Hortonworks on top of the open source Hive project in the most automated way. Re-building HDP 2.6.0’s Hive. Our first step was to see if we were able to build Apache Hive as it is in HDP 2.6.0. For each HDP release, Hortonworks (now Cloudera) describes the list of patches applied on top of a given component’s release in the release notes. We will retrieve here the list of patches applied on top of Apache Hive 2.1.0 for HDP 2.6.0 and put them in a text file: We know there are 120 of them: Now comes the question how to apply those patches? Applying the patches in the order provided by Hortonworks. The first step is to clone the Apache Hive repository and checkout the version HDP 2.6.0 embeds which is the 2.1.0 : Our first guess was that the patches were listed on HDP’s release notes in the chronological order of merging in Hive’s master branch so we tried to get the patches and apply them in that order. For each patch in the list, we need to fetch the patch file issued in the associated JIRA and run the git apply command against it. It can be tedious for a large amounts of patches so we will see later in this article how to automate it. Let’s start small by trying to apply the first patch of our list: HIVE-9941 . We can see in the associated JIRA issue that there are multiple attachments: We will pick the most recent one and apply it: The git apply command gave us a few warnings about some white spaces and blank lines but the application of the patch worked. The success is confirmed with git status : In this case, the patch only created new files and did not perform any modifications but it worked regardless. Moving on to the second patch of the list, HIVE-12492 : This time, we encountered multiple No such file or directory errors. Tracking the cause of the error is kind of tricky. Let me show why it is so and why it make the all patching process hard to automate. Once we download and open the “HIVE-12492.02.patch” file, we see at the beginining that the “No such file or directory” error is applying to the file common/src/java/org/apache/hadoop/hive/conf/HiveConf.java : But the error is telling that the src/java/org/apache/hadoop/hive/conf/HiveConf.java does not exist instead of common/src/java/org/apache/hadoop/hive/conf/HiveConf.java . Notice how the second path prefixes the first one with common . This is because git apply has a property -p (described here) which removes the leading part components of file paths. By default it is set to 1 and not 0 , removing the first directory, common in our case. We did not encounter this problem with the previous patch because the made use of prefixing in the file paths, for example: Let’s try again with -p0 : This is better but still the patch won’t apply. This time we have an error patch does not apply which is clearly indicating us the patch can not be applied. 
Looking back at the JIRA list given by Hortonworks, we realized that it was actually given in alphanumerical order. Take the example of these two issues: HIVE-14405 and HIVE-14432. The first one was created before the second one, but the patch for the latter was released before the former. If these two patches were to modify the same file, it could lead to the error we just encountered. Applying the patches in alphanumerical order therefore seems like a bad choice. We will now order the patches chronologically, according to the date when they were committed to the master branch.

Applying the JIRA patches in the commit order. Let's iterate through every patch to see how many we can apply with this strategy. Getting and applying every patch file automatically from every JIRA's attachments can be tedious: there are 120 of them; we have just seen that some patches apply at the -p0 level while others apply at -p1; some JIRAs indicated by Hortonworks do not have any attachments (e.g. HIVE-16238); some JIRAs have attachments that are not patches (e.g. HIVE-16323 has a screenshot attached next to the patches); and some JIRAs have patches in .txt format instead of .patch (e.g. HIVE-15991). An excerpt (in CoffeeScript) from the script we made to get the latest version of each patch from JIRA generates a patch_manifest.yaml file; creating this manifest helps us apply the patches in the right order. Once we have built this manifest, we iterate through the list, download and try to apply as many patches as we can.

This did not go as well as expected: only 19 out of the 120 patches in the list applied successfully. All failures were due to "patch does not apply" errors, which could be because we are still not applying the fixes in the right order, because we are missing some code prerequisites (i.e. the JIRA list is incomplete), because some code needs to be modified to be backported, or all of the above. In the next part, we use another strategy: cherry-picking the commits related to the HIVE-XXXXX JIRAs we want to have in our build.

Applying the patches in pushed-to-master chronological order. Every commit message in Apache Hive has the related JIRA issue in its description. In this attempt, we are going to get the list of all the commits pushed to the Apache Hive master branch in order to generate, in chronological order, the list of the commits containing the patches we want to apply. The git log command with the right parameters generates a CSV file with the format timestamp;commit_hash;commit_message; we then match only the commits we want to apply by looking for the JIRA ids provided by Hortonworks, and order the result by date from oldest to newest:
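A sketch of such a pipeline, assuming an up-to-date clone of Apache Hive and a jira_ids.txt file holding the 120 HIVE-XXXXX ids (both file names are ours, not the article's):

    $ git log origin/master --pretty=format:'%at;%H;%s' > commits.csv
    $ grep -F -f jira_ids.txt commits.csv | sort -t ';' -k1,1 -n > commits_to_apply.csv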
Now that we have an ordered list of the commits we want to apply, we can go back to the 2.1.0 release tag and cherry-pick them. Cherry-picking seems like a better solution than patch application in our situation: indeed, we have just seen that we could not apply some patches because of missing code dependencies. Let's try with the first one; the JIRA is HIVE-14214 and the commit hash is b28ec7fdd8317b47973c6c8f7cdfe805dc20a806. We removed some of the output, but the takeaway is that we get conflicts: git cherry-pick is trying to git merge a commit from the official Hive repository into our current local branch. The benefit of cherry-picking is that we are comparing two files instead of just applying a diff, leaving us the choice of how to merge the content. To do that, we need to add the --strategy-option theirs parameter to the cherry-pick command. This solves most conflicts, but we can still encounter conflicts that must be resolved manually with git add or git rm, depending on the situation. Using the cherry-picking method and manual resolution for some patches, we were able to import all 120 JIRA patches into our code branch. Important note: we tried to automate this as much as possible, but keep in mind that running the unit tests after patching is strongly recommended before building and applying the fix to any production cluster. We will see that in the next part. With the script above, we are able to apply 100% of the patches. In the next part, we go over the steps to build and test the Hive distribution.

Before running the unit tests, let's try to see if we can actually build the Hive release. After a few minutes, the build stops with an error. Looking at the file ./storage-api/src/test/org/apache/hadoop/hive/ql/exec/vector/TestStructColumnVector.java, the class java.sql.Timestamp is indeed not imported. This file was modified by cherry-picking HIVE-16245. This patch changes 3 files: ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java, storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java, and storage-api/src/test/org/apache/hadoop/hive/ql/exec/vector/TestStructColumnVector.java. How did Hortonworks apply this specific patch without breaking the build? In the next part, we dig into Hortonworks' public Hive repository to find clues on how they applied the patches listed for HDP 2.6.0.0.

Exploring Hortonworks' hive2-release project. On GitHub, Hortonworks keeps a public repository for every component in the HDP stack. The one we want to check is hive2-release (https://github.com/hortonworks/hive2-release). The repository seems empty at first, but the files can actually be found by checking out the tags; there is one tag per HDP release. Let's try to find out how Hortonworks applied the patch that provoked the build error we saw earlier, and see which files were modified by the corresponding commit:
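A sketch of how one might locate and inspect that backport inside the checked-out HDP tag; the grep pattern is ours and the commit hash placeholder is left unfilled:

    $ git log --oneline --grep='HIVE-16245'    # locate the backport commit in the tag's history
    $ git show --stat <commit_hash>            # list the files modified by that commit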
As opposed to what's in the patch (and what was pushed to Apache Hive's master branch), the backport operated by Hortonworks only changes 2 of the 3 files. Either there is some black magic involved, or the process is manual and not fully documented. The file that provoked the build error we encountered earlier is not changed here. Backporting a list of patches to an open source project like Apache Hive is hard to automate. With the HDP distribution, and depending on the patches, Hortonworks applied some manual changes to the source code, or its engineers were working on a forked version of Hive. A good understanding of the global project structure (modules, unit tests, etc.) and its ecosystem (dependencies) is required in order to build a functional and tested release.

GettingStarted. You can install a stable release of Hive by downloading a tarball, or you can download the source code and build Hive from that.

Requirements. Java 1.7. Note: Hive versions 1.2 onward require Java 1.7 or newer. Hive versions 0.14 to 1.1 work with Java 1.6 as well. Users are strongly advised to start moving to Java 1.8 (see HIVE-8607). Hadoop 2.x (preferred), 1.x (not supported by Hive 2.0.0 onward). Hive versions up to 0.13 also supported Hadoop 0.20.x and 0.23.x. Hive is commonly used in production Linux and Windows environments, and Mac is a commonly used development environment. The instructions in this document are applicable to Linux and Mac; using them on Windows would require slightly different steps.

Installing Hive from a Stable Release. Start by downloading the most recent stable release of Hive from one of the Apache download mirrors (see Hive Releases). Next you need to unpack the tarball, which results in the creation of a subdirectory named hive-x.y.z (where x.y.z is the release number). Set the environment variable HIVE_HOME to point to the installation directory, and finally add $HIVE_HOME/bin to your PATH.

Building Hive from Source. The Hive Git repository for the most recent Hive code is located here: git clone https://git-wip-us.apache.org/repos/asf/hive.git (the master branch). All release versions are in branches named "branch-0.#" or "branch-1.#" or the upcoming "branch-2.#", with the exception of release 0.8.1, which is in "branch-0.8-r2". Any branches with other names are feature branches for works in progress. See Understanding Hive Branches for details. As of 0.13, Hive is built using Apache Maven.

Compile Hive on master. To build the current Hive code, run the Maven build from the master branch; here, the version placeholder refers to the current Hive version. If building the Hive source using Maven (mvn), we will refer to the directory packaging/target/apache-hive-{version}-SNAPSHOT-bin/apache-hive-{version}-SNAPSHOT-bin as the installation directory for the rest of the page.

Compile Hive on branch-1. In branch-1, Hive supports both Hadoop 1.x and 2.x. You will need to specify which version of Hadoop to build against via a Maven profile: to build against Hadoop 1.x use the profile hadoop-1, and for Hadoop 2.x use hadoop-2. For example, to build against Hadoop 1.x, run the above mvn command with the hadoop-1 profile.

Compile Hive Prior to 0.13 on Hadoop 0.20. Prior to Hive 0.13, Hive was built using Apache Ant, so older versions of Hive on Hadoop 0.20 are built with the Ant build. If using Ant, we will refer to the directory build/dist as the installation directory.

Compile Hive Prior to 0.13 on Hadoop 0.23. To build Hive with Ant against Hadoop 0.23, 2.0.0, or another version, build with the appropriate flag.

Running Hive. Hive uses Hadoop, so you must either have Hadoop in your path or export HADOOP_HOME pointing to your Hadoop installation. In addition, you must use HDFS commands to create /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) and set them chmod g+w before you can create a table in Hive (a sketch of these commands follows at the end of this section). You may find it useful, though it's not necessary, to set HIVE_HOME.

Running Hive CLI. To use the Hive command line interface (CLI), run the hive command from the shell.

Running HiveServer2 and Beeline. Starting from Hive 2.1, we need to run the schematool command below as an initialization step. For example, we can use "derby" as the db type.
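A sketch of this one-time setup, combining the warehouse directories required under Running Hive with the metastore initialization just described; the paths and the derby db type come from the text, everything else is assumed:

    $ $HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp
    $ $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
    $ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
    $ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
    $ $HIVE_HOME/bin/schematool -dbType derby -initSchema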
HiveServer2 (introduced in Hive 0.11) has its own CLI called Beeline. HiveCLI is now deprecated in favor of Beeline, as it lacks the multi-user, security, and other capabilities of HiveServer2. To run HiveServer2 and Beeline, start the hiveserver2 service from the shell and then connect with Beeline. Beeline is started with the JDBC URL of HiveServer2, which depends on the address and port where HiveServer2 was started; by default this is localhost:10000, so the URL will look like jdbc:hive2://localhost:10000. Beeline and HiveServer2 can also be started in the same process for testing purposes, for a user experience similar to HiveCLI.

Running HCatalog. The HCatalog server can be run from the shell in Hive release 0.11.0 and later, as can the HCatalog command line interface (CLI).

Running WebHCat (Templeton). The WebHCat server can likewise be run from the shell in Hive release 0.11.0 and later.

Configuration Management Overview. Hive by default gets its configuration from conf/hive-default.xml under the installation directory. The location of the Hive configuration directory can be changed by setting the HIVE_CONF_DIR environment variable. Configuration variables can be changed by (re-)defining them in conf/hive-site.xml. The Log4j configuration is stored in conf/hive-log4j.properties. Hive configuration is an overlay on top of Hadoop: it inherits the Hadoop configuration variables by default. Hive configuration can be manipulated by editing hive-site.xml and defining any desired variables (including Hadoop variables) in it, by using the set command (see the next section), or by invoking Hive (deprecated), Beeline or HiveServer2 using the syntax: $ bin/hive --hiveconf x1=y1 --hiveconf x2=y2 //this sets the variables x1 and x2 to y1 and y2 respectively; $ bin/hiveserver2 --hiveconf x1=y1 --hiveconf x2=y2 //this sets server-side variables x1 and x2 to y1 and y2 respectively; $ bin/beeline --hiveconf x1=y1 --hiveconf x2=y2 //this sets client-side variables x1 and x2 to y1 and y2 respectively.

Runtime Configuration. Hive queries are executed using map-reduce queries and, therefore, the behavior of such queries can be controlled by the Hadoop configuration variables. The HiveCLI (deprecated) and Beeline command SET can be used to set any Hadoop (or Hive) configuration variable. The SET command with the -v option shows all the current settings; without the -v option, only the variables that differ from the base Hadoop configuration are displayed.

Hive, Map-Reduce and Local-Mode. Hive generates map-reduce jobs for most queries. These jobs are then submitted to the map-reduce cluster indicated by the relevant Hadoop variable. While this usually points to a map-reduce cluster with multiple nodes, Hadoop also offers a nifty option to run map-reduce jobs locally on the user's workstation. This can be very useful to run queries over small data sets; in such cases local mode execution is usually significantly faster than submitting jobs to a large cluster. Data is accessed transparently from HDFS. Conversely, local mode only runs with one reducer and can be very slow processing larger data sets. Starting with release 0.7, Hive fully supports local mode execution. To enable this, the user can set the following option:
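(A sketch; the exact property name varies with the Hadoop version: mapreduce.framework.name is assumed here, and older setups used mapred.job.tracker=local instead.)

    $ hive
    hive> SET mapreduce.framework.name=local;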
In addition, mapred.local.dir should point to a path that's valid on the local machine (for example /tmp/<username>/mapred/local); otherwise, the user will get an exception about allocating local disk space. Starting with release 0.7, Hive also supports a mode to run map-reduce jobs in local mode automatically. The relevant options are hive.exec.mode.local.auto, hive.exec.mode.local.auto.inputbytes.max, and hive.exec.mode.local.auto.tasks.max. Note that this feature is disabled by default. If enabled, Hive analyzes the size of each map-reduce job in a query and may run it locally if the following thresholds are satisfied: the total input size of the job is lower than hive.exec.mode.local.auto.inputbytes.max (128MB by default); the total number of map tasks is less than hive.exec.mode.local.auto.tasks.max (4 by default); and the total number of reduce tasks required is 1 or 0. So for queries over small data sets, or for queries with multiple map-reduce jobs where the input to subsequent jobs is substantially smaller (because of reduction/filtering in the prior job), jobs may be run locally. Note that there may be differences in the runtime environment of the Hadoop server nodes and the machine running the Hive client (because of different jvm versions or different software libraries). This can cause unexpected behavior/errors while running in local mode. Also note that local mode execution is done in a separate child jvm (of the Hive client). If the user so wishes, the maximum amount of memory for this child jvm can be controlled via the option hive.mapred.local.mem. By default, it's set to zero, in which case Hive lets Hadoop determine the default memory limits of the child jvm.

Hive Logging. Hive uses log4j for logging. By default, logs are not emitted to the console by the CLI. The default logging level is WARN for Hive releases prior to 0.13.0; starting with Hive 0.13.0, the default logging level is INFO. The logs are stored in the directory /tmp/<user.name>, in the file /tmp/<user.name>/hive.log. Note: in local mode, prior to Hive 0.13.0 the log file name was ".log" instead of "hive.log". This bug was fixed in release 0.13.0 (see HIVE-5528 and HIVE-5676). To configure a different log location, set hive.log.dir in $HIVE_HOME/conf/hive-log4j.properties, and make sure the directory has the sticky bit set (chmod 1777 <dir>). If the user wishes, the logs can be emitted to the console by adding the arguments shown below: bin/hive --hiveconf hive.root.logger=INFO,console //for HiveCLI (deprecated), or bin/hiveserver2 --hiveconf hive.root.logger=INFO,console. Alternatively, the user can change only the logging level by using: bin/hive --hiveconf hive.root.logger=INFO,DRFA //for HiveCLI (deprecated), or bin/hiveserver2 --hiveconf hive.root.logger=INFO,DRFA. Another option for logging is TimeBasedRollingPolicy (applicable to Hive 1.1.0 and above, HIVE-9001), by providing the DAILY option as shown below: bin/hive --hiveconf hive.root.logger=INFO,DAILY //for HiveCLI (deprecated), or bin/hiveserver2 --hiveconf hive.root.logger=INFO,DAILY. Note that setting hive.root.logger via the 'set' command does not change logging properties, since they are determined at initialization time. Hive also stores query logs on a per-Hive-session basis in /tmp/<user.name>/, but this can be configured in hive-site.xml with the hive.querylog.location property. Starting with Hive 1.1.0, EXPLAIN EXTENDED output for queries can be logged at the INFO level by setting the hive.log.explain.output property to true. Logging during Hive execution on a Hadoop cluster is controlled by the Hadoop configuration. Usually Hadoop produces one log file per map and reduce task, stored on the cluster machine(s) where the task was executed. The log files can be obtained by clicking through to the Task Details page from the Hadoop JobTracker Web UI.
When using local mode (using mapreduce.framework.name=local), Hadoop/Hive execution logs are produced on the client machine itself. Starting with release 0.6, Hive uses hive-exec-log4j.properties (falling back to hive-log4j.properties only if it's missing) to determine where these logs are delivered by default. The default configuration file produces one log file per query executed in local mode and stores it under /tmp/<user.name>. The intent of providing a separate configuration file is to enable administrators to centralize execution log capture if desired (on an NFS file server, for example). Execution logs are invaluable for debugging run-time errors. For information about WebHCat errors and logging, see Error Codes and Responses and Log Files in the WebHCat manual. Error logs are very useful to debug problems. Please send them with any bugs (of which there are many!) to the Hive dev mailing list. From Hive 2.1.0 onwards (with HIVE-13027), Hive uses Log4j2's asynchronous logger by default. Setting hive.async.log.enabled to false disables asynchronous logging and falls back to synchronous logging. Asynchronous logging can give a significant performance improvement, as logging is handled in a separate thread that uses the LMAX disruptor queue for buffering log messages. Refer to https://logging.apache.org/log4j/2.x/manual/async.html for the benefits and drawbacks.

HiveServer2 Logs. HiveServer2 operation logs are available to clients starting in Hive 0.14. See HiveServer2 Logging for configuration.

Audit Logs. Audit logs are logged from the Hive metastore server for every metastore API invocation. An audit log has the function and some of the relevant function arguments logged in the metastore log file. It is logged at the INFO level of log4j, so you need to make sure that logging at the INFO level is enabled (see HIVE-3505). The name of the log entry is "HiveMetaStore.audit". Audit logs were added in Hive 0.7 for secure client connections (HIVE-1948) and in Hive 0.10 for non-secure connections (HIVE-3277; also see HIVE-2797).

Perf Logger. In order to obtain performance metrics via the PerfLogger, you need to set DEBUG level logging for the PerfLogger class (HIVE-12675). This can be achieved by setting the corresponding logger in the log4j properties file. If the logger level has already been set to DEBUG at root via hive.root.logger, this setting is not required to see the performance logs.

DDL Operations. The Hive DDL operations are documented in Hive Data Definition Language.

Creating Hive Tables. The first example statement creates a table called pokes with two columns, the first being an integer and the other a string. The second creates a table called invites with two columns and a partition column called ds. The partition column is a virtual column: it is not part of the data itself but is derived from the partition that a particular dataset is loaded into. By default, tables are assumed to be of text input format and the delimiters are assumed to be ^A (ctrl-a).

Browsing through Tables. The SHOW TABLES statement lists all the tables; with a pattern such as '.*s' it lists all the tables that end with 's'. The pattern matching follows Java regular expressions; check out this link for documentation: http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html. The DESCRIBE statement shows the list of columns.

Altering and Dropping Tables. Table names can be changed and columns can be added or replaced:
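A sketch of such statements, reusing the pokes and invites tables from above; the events table and the column names are illustrative:

    hive> ALTER TABLE events RENAME TO events_archive;
    hive> ALTER TABLE pokes ADD COLUMNS (new_col INT COMMENT 'a new column');
    hive> ALTER TABLE invites REPLACE COLUMNS (foo INT, bar STRING);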
Note that REPLACE COLUMNS replaces all existing columns and only changes the table's schema, not the data. The table must use a native SerDe. REPLACE COLUMNS can also be used to drop columns from the table's schema.

Metadata Store. Metadata is in an embedded Derby database whose disk storage location is determined by the Hive configuration variable named javax.jdo.option.ConnectionURL. By default this location is ./metastore_db (see conf/hive-default.xml). Right now, in the default configuration, this metadata can only be seen by one user at a time. The metastore can be stored in any database that is supported by JPOX. The location and the type of the RDBMS can be controlled by the two variables javax.jdo.option.ConnectionURL and javax.jdo.option.ConnectionDriverName. Refer to the JDO (or JPOX) documentation for more details on supported databases. The database schema is defined in the JDO metadata file package.jdo at src/contrib/hive/metastore/src/model. In the future, the metastore itself can be a standalone server. If you want to run the metastore as a network server so it can be accessed from multiple nodes, see Hive Using Derby in Server Mode.

DML Operations. The Hive DML operations are documented in Hive Data Manipulation Language. Loading data from flat files into Hive: the LOAD DATA statement loads a file that contains two columns separated by ctrl-a into the pokes table. 'LOCAL' signifies that the input file is on the local file system; if 'LOCAL' is omitted, the file is looked for in HDFS. The keyword 'OVERWRITE' signifies that existing data in the table is deleted; if the 'OVERWRITE' keyword is omitted, data files are appended to existing data sets. NO verification of data against the schema is performed by the load command. If the file is in HDFS, it is moved into the Hive-controlled file system namespace. The root of the Hive directory is specified by the option hive.metastore.warehouse.dir in hive-default.xml. We advise users to create this directory before trying to create tables via Hive. Two further LOAD statements load data into two different partitions of the table invites; the table must be created as partitioned by the key ds for this to succeed. A LOAD command pointing at an HDFS file/directory loads data from HDFS into the table; note that loading data from HDFS results in moving the file/directory, so the operation is almost instantaneous.

SQL Operations. The Hive query operations are documented in Select.

Example Queries. Some example queries are shown below. They are available in build/dist/examples/queries, and more are available in the Hive sources at ql/src/test/queries/positive.

SELECTS and FILTERS. A first query selects column 'foo' from all rows of partition ds=2008-08-15 of the invites table. The results are not stored anywhere, but are displayed on the console. Note that in all the examples that follow, INSERT (into a Hive table, local directory or HDFS directory) is optional. The next query selects all rows from partition ds=2008-08-15 of the invites table into an HDFS directory; the result data ends up in files (their number depending on the number of mappers) in that directory. NOTE: partition columns, if any, are selected by the use of *; they can also be specified in the projection clauses. Partitioned tables must always have a partition selected in the WHERE clause of the statement. Another query selects all rows from the pokes table into a local directory, and a further one selects the sum of a column; avg, min, or max can also be used. Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).

GROUP BY. Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*). A sketch of these queries follows.
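A sketch of the kinds of queries described above, against the invites table and its ds=2008-08-15 partition; the output directory and the bar grouping column are illustrative:

    hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';
    hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds='2008-08-15';
    hive> SELECT COUNT(*) FROM invites a WHERE a.ds='2008-08-15';
    hive> SELECT a.bar, COUNT(*) FROM invites a WHERE a.ds='2008-08-15' GROUP BY a.bar;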