KH Coder 3 Reference Manual
Koichi HIGUCHI*1
March 16, 2016
*1 Ritsumeikan University <[email protected]>

Contents

A KH Coder Reference Manual
  A.1 Setup
    A.1.1 Requirements
    A.1.2 Installation and start-up
    A.1.3 Settings
  A.2 Basic issues
    A.2.1 Data preparation
    A.2.2 Extraction of words
    A.2.3 Configuration of ChaSen and MeCab
    A.2.4 Data in English and other non-Japanese Languages
    A.2.5 Coding rules
    A.2.6 Window operations
  A.3 [Project] menu
    A.3.1 Create a [New] Project
    A.3.2 [Open] an existing project
    A.3.3 [Import and Export] projects
    A.3.4 [Settings]
  A.4 [Pre-Processing] menu
    A.4.1 [Check the Target File]
    A.4.2 [Run Pre-Processing]
    A.4.3 [Select Words to Analyze]
    A.4.4 [Word Clusters] > [use TermExtract]
    A.4.5 [Word Clusters] > [use ChaSen]
    A.4.6 [Check the Result of Word Extraction]
  A.5 [Tools] > [Words] menu
    A.5.1 [Frequency List] of words
    A.5.2 [Descriptive Stats] > [Term Frequency Distribution]
    A.5.3 [Descriptive Stats] > [Document Frequency Distribution]
    A.5.4 [Descriptive Stats] > [TF-DF Plot]
    A.5.5 [Search Words]
    A.5.6 [KWIC Concordance]
    A.5.7 [Word Association]
    A.5.8 [Correspondence Analysis] of words
    A.5.9 [Multi-Dimensional Scaling] of words
    A.5.10 [Hierarchical Cluster Analysis] of words
    A.5.11 [Co-Occurrence Network] of words
    A.5.12 [Self-Organizing Map] of words
  A.6 [Tools] > [Documents] menu
    A.6.1 Search Documents
    A.6.2 [Cluster Analysis] of documents
    A.6.3 [Naive Bayes Classifier] > [Build a Model from a Variable]
    A.6.4 [Naive Bayes Classifier] > [Classify Documents using a Model]
    A.6.5 [Naive Bayes Classifier] > [View a Model File]
    A.6.6 [Naive Bayes Classifier] > [View a Classification Log]
    A.6.7 [Export Document-Word Matrix]
    A.6.8 [Export Word-Context Matrix]
  A.7 [Tools] > [Coding] menu
    A.7.1 [Frequency] of codes
    A.7.2 [Crosstab] of codes
    A.7.3 [Similarity Matrix] of codes
    A.7.4 [Correspondence Analysis] of codes
    A.7.5 [Multi-Dimensional Scaling] of codes
    A.7.6 [Hierarchical Cluster Analysis] of codes
    A.7.7 [Co-Occurrence Network] of codes
    A.7.8 [Self-Organizing Map] of codes
    A.7.9 [Export Document-Code Matrix]
  A.8 [Tools] > [Variables and Headings] menu
    A.8.1 [Import Variables]
    A.8.2 [List] of variables
  A.9 [Tools] > [Convert the Target File] menu
    A.9.1 [Extract Partial Text]
    A.9.2 [Convert to CSV]
  A.10 [Tools] > [Plugin] menu
    A.10.1 [Sample] plugins
    A.10.2 [Unify *.txt Files in the Folder]
    A.10.3 [New Project with Blank Line Support]
    A.10.4 [Export Document-Word Matrix (Surface Form): Variable-length CSV (WordMiner)]
    A.10.5 [Reload Morphological Analysis Results]
    A.10.6 [Word Clusters: UniDic]
  A.11 [Tools] menu
    A.11.1 [Execute SQL Statements]
  A.12 [Help] menu
    A.12.1 [Manual (PDF)]
    A.12.2 [Latest Info (Web)]
    A.12.3 [About]
  A.13 Miscellaneous
    A.13.1 Conditions of use
    A.13.2 Support
    A.13.3 Source code
    A.13.4 Uninstalling KH Coder
B GNU GENERAL PUBLIC LICENSE
Bibliography

Part A KH Coder Reference Manual

Introduction

The basic procedures for quantitative text analysis using KH Coder can be learned using the Quick Start Tutorial. However, to carry out analyses with your own data, more detailed descriptions of its functions may be required.
This Reference Manual provides users with such information. Before preparing new text data to be analyzed, we recommend you read through Section A.2.1 to ensure your analyses progress smoothly. Similarly, before creating coding rules, consult the descriptions in Section A.2.5. We also suggest reading through Section A.2 before starting your analysis.

From Section A.3 onward, each command is described in turn, and you will occasionally see paragraphs named "Internal processing". If you use the commands provided by KH Coder as they are, you may skip these paragraphs. As outlined in Section A.13.1, KH Coder is open-source software, which can be modified and upgraded by individual users. Section A.11.1 describes how data stored in the databases (MySQL) by KH Coder can be retrieved directly, allowing more flexible data searches. The details necessary for carrying out such operations are given in the "Internal processing" paragraphs.

KH Coder supports Windows, Linux and Macintosh. Once installed, the specifications and operating methods are the same, regardless of the OS used. However, because the installation methods and configuration procedures differ somewhat between platforms, this manual provides separate information for Windows and Linux in Section A.1 (Setup). The installation method for Macintosh is almost the same as for Linux, but its configuration procedure is more complex. We now distribute automatic set-up software for Macintosh as part of our paid support, available through the official website.

A.1 Setup

A.1.1 Requirements

For Windows

To use KH Coder on Windows, a PC with Windows Vista or a later version is required. All software necessary to work with KH Coder, such as ChaSen, MySQL, Perl and R, is provided in the installation package for Windows. The package carries out all relevant configurations automatically. To analyze English text data, Java must also be installed.
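Because Java is the one requirement not bundled with the Windows package, it can be convenient to check for it before analyzing English text. The following is a minimal sketch in Python, not part of KH Coder: the helper name find_tools is hypothetical, and it assumes Java is reachable through the PATH environment variable, which may not be how KH Coder itself locates it.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH.

    Hypothetical helper for illustration only; KH Coder does not provide it.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # "java" is the only tool the Windows package does not bundle.
    for name, path in find_tools(["java"]).items():
        print(f"{name}: {path if path else 'not found on PATH'}")
```

The same helper can be called with other executable names if further tools need to be checked.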
For Linux

To use KH Coder on Linux, MySQL and Perl must already be installed. If the R software environment for statistical computing and graphics is not installed, some of KH Coder's functions will not be available. To analyze Japanese text data, ChaSen or MeCab is required. To analyze English text data, Stanford POS Tagger and Java need to be installed. The notes below are useful for setting up these essential software packages on Linux.

■ChaSen The dictionary used in ChaSen (IPADIC) must be in the EUC-JP character encoding system. It is desirable to copy the whole dictionary folder of ChaSen into the home folder so that KH Coder can modify the settings of the dictionary as required. After copying the dictionary, modify the settings file named "chasenrc" in the destination folder so that ChaSen refers to the copied dictionary. The "文法ファイル" (grammar file) statement in the "chasenrc" file specifies the directory of the dictionary. Normally, when ChaSen is installed on Linux, IPADIC is stored in the "dic" folder under /usr/local/share/chasen.

■MySQL First, you need to configure MySQL so it can correctly handle the EUC Japanese character sets. KH Coder 2.x converts Japanese text strings into EUC and then submits them to MySQL to search and retrieve data. Second, you need to configure access privileges in MySQL to allow KH Coder to carry out all operations, including database creation. To configure KH Coder, refer to "Manually setting up a connection to MySQL" under Section A.1.3.

■Perl To use KH Coder on Linux, it is necessary to install various Perl modules, such as DBI, DBD::mysql, Jcode, and Tk. If these required Perl modules are missing,