Package 'koRpus'

Package ‘koRpus’
February 15, 2013

Type Package
Title An R Package for Text Analysis
Author m.eik michalke <[email protected]>, with contributions from Earl Brown <[email protected]>, Alberto Mirisola, and Laura Hauser
Maintainer m.eik michalke <[email protected]>
Depends R (>= 2.10.0), methods
Enhances rkward
Suggests testthat
Description A set of tools to analyze texts. Includes, amongst others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided, to enable frequency analyses (supports Celex and Leipzig Corpora Collection file formats). Note: For full functionality a local installation of TreeTagger is recommended. Be encouraged to send feedback to the author(s)!
License GPL (>= 3)
Encoding UTF-8
LazyLoad yes
URL http://reaktanz.de/?c=hacking&s=koRpus
Version 0.04-36
Date 2012-08-27
Collate 'ARI.R' 'bormuth.R' 'C.ld.R' 'kRp.tagged-class.R' 'kRp.txt.freq-class.R' 'kRp.txt.trans-class.R' 'kRp.TTR-class.R' 'kRp.analysis-class.R' 'koRpus-internal.R' 'clozeDelete-method.R' 'coleman.liau.R' 'coleman.R' 'kRp.hyphen-class.R' 'correct-method.R' 'cTest-method.R' 'CTTR.R' 'dale.chall.R' 'danielson.bryan.R' 'dickes.steiwer.R' 'DRP.R' 'ELF.R' 'farr.jenkins.paterson.R' 'flesch.kincaid.R' 'flesch.R' 'FOG.R' 'FORCAST.R' 'freq.analysis.R' 'fucks.R' 'get.kRp.env.R' 'guess.lang.R' 'harris.jacobson.R' 'HDD.R' 'hyphen.R' 'hyph.XX-data.R' 'jumbleWords.R' 'K.ld.R' 'koRpus-internal.import.R' 'kRp.readability-class.R' 'kRp.lang-class.R' 'kRp.hyph.pat-class.R' 'kRp.filter.wclass.R' 'kRp.corp.freq-class.R' 'koRpus-internal.rdb.formulae.R' 'koRpus-internal.rdb.params.grades.R' 'koRpus-internal.roxy.all.R' 'koRpus-package.R' 'kRp.cluster.R' 'kRp.POS.tags.R' 'kRp.text.analysis.R' 'kRp.text.paste.R' 'kRp.text.transform.R' 'lang.support-de.R' 'lang.support-en.R' 'lang.support-es.R' 'lang.support-it.R' 'lang.support-ru.R' 'lex.div.num.R' 'lex.div.R' 'linsear.write.R' 'LIX.R' 'maas.R' 'manage.hyph.pat.R' 'MATTR.R' 'MSTTR.R' 'MTLD.R' 'nWS.R' 'plot.kRp.tagged.R' 'query-methods.R' 'readability.num.R' 'readability.R' 'read.corp.celex.R' 'read.corp.custom.R' 'read.corp.LCC.R' 'read.hyph.pat.R' 'RIX.R' 'R.ld.R' 'segment.optimizer.R' 'set.kRp.env.R' 'show.kRp.lang.R' 'show.kRp.corp.freq.R' 'show.kRp.readability.R' 'show.kRp.TTR.R' 'S.ld.R' 'SMOG.R' 'spache.R' 'strain.R' 'summary.kRp.lang.R' 'summary.kRp.readability.R' 'summary.kRp.tagged.R' 'summary.kRp.TTR.R' 'summary.kRp.txt.freq.R' 'textFeatures.R' 'tokenize.R' 'traenkle.bailer.R' 'treetag.R' 'TRI.R' 'TTR.R' 'U.ld.R' 'wheeler.smith.R'
Repository CRAN
Date/Publication 2012-08-28 05:32:05
NeedsCompilation no

R topics documented:
  koRpus-package
  ARI
  bormuth
  C.ld
  clozeDelete
  coleman
  coleman.liau
  correct.tag
  cTest
  CTTR
  dale.chall
  danielson.bryan
  dickes.steiwer
  DRP
  ELF
  farr.jenkins.paterson
  flesch
  flesch.kincaid
  FOG
  FORCAST
  freq.analysis
  fucks
  get.kRp.env
  guess.lang
  harris.jacobson
  HDD
  hyph.XX
  hyphen
  jumbleWords
  K.ld
  kRp.analysis-class
  kRp.cluster
  kRp.corp.freq-class
  kRp.filter.wclass
  kRp.hyph.pat-class
  kRp.hyphen-class
  kRp.lang-class
  kRp.POS.tags
  kRp.readability-class
  kRp.tagged-class
  kRp.text.analysis
  kRp.text.paste
  kRp.text.transform
  kRp.TTR-class
  kRp.txt.freq-class
  kRp.txt.trans-class
  lex.div
  lex.div.num
  linsear.write
  LIX
  maas
  manage.hyph.pat
  MATTR
  MSTTR
  MTLD
  nWS
  plot
  query
  R.ld
  read.corp.celex
  read.corp.custom
  read.corp.LCC
  read.hyph.pat
  readability
  readability.num
  RIX
  S.ld
  segment.optimizer
  set.kRp.env
  show
  SMOG
  spache
  strain
  summary
  textFeatures
  tokenize
  traenkle.bailer
  treetag
  TRI
  TTR
  U.ld
  wheeler.smith
  Index


koRpus-package    An R Package for Text Analysis.

Description
  An R Package for Text Analysis.

Details
  Package:   koRpus
  Type:      Package
  Version:   0.04-36
  Date:      2012-08-27
  Depends:   R (>= 2.10.0), methods
  Enhances:  rkward
  Encoding:  UTF-8
  License:   GPL (>= 3)
  LazyLoad:  yes
  URL:       http://reaktanz.de/?c=hacking&s=koRpus

  A set of tools to analyze texts. Includes, amongst others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided, to enable frequency analyses (supports Celex and Leipzig Corpora Collection file formats).
  Note: For full functionality a local installation of TreeTagger is recommended. Be encouraged to send feedback to the author(s)!

Author(s)
  Meik Michalke <[email protected]>


ARI    Readability: Automated Readability Index (ARI)

Description
  This is just a convenient wrapper function for readability.

Usage
  ARI(txt.file, parameters = c(asl = 0.5, awl = 4.71, const = 21.43), ...)

Arguments
  txt.file    Either an object of class kRp.tagged-class, a character vector which must be a valid path to a file containing the text to be analyzed, or a list of text features. If the latter, calculation is done by readability.num.
  parameters  A numeric vector with named magic numbers, defining the relevant parameters for the index.
  ...         Further valid options for the main function, see readability for details.

Details
  Calculates the Automated Readability Index (ARI). In contrast to readability, which by default calculates all possible indices, this function will only calculate the index value.
  If parameters="NRI", the simplified parameters from the Navy Readability Indexes are used; if set to ARI="simple", the simplified formula is calculated.
  This formula doesn’t need syllable count.

Value
  An object of class kRp.readability-class.

References
  DuBay, W.H. (2004). The Principles of Readability. Costa Mesa: Impact Information. WWW: http://www.impact-information.com/impactinfo/readability02.pdf; 22.03.2011.
  Smith, E.A. & Senter, R.J. (1967). Automated readability index. AMRL-TR-66-22. Wright-Patterson AFB, Ohio: Aerospace Medical Division.

Examples
  ## Not run:
  ARI(tagged.text)
  ## End(Not run)


bormuth    Readability: Bormuth’s Mean Cloze and Grade Placement

Description
  This is just a convenient wrapper function for readability.

Usage
  bormuth(txt.file, word.list, clz=35,
    meanc=c(const=0.886593, awl=0.08364, afw=0.161911,
      asl1=0.021401, asl2=0.000577, asl3=0.000005),
    grade=c(const=4.275, m1=12.881, m2=34.934, m3=20.388,
      c1=26.194, c2=2.046, c3=11.767,
      mc1=44.285, mc2=97.62, mc3=59.538), ...)

Arguments
  txt.file   Either an object of class kRp.tagged-class, a character vector which must be a valid path to a file containing the text to be analyzed, or a list of text features. If the latter, calculation is done by readability.num.
  clz        Integer, the cloze criterion score in percent.
  meanc      A numeric vector with named magic numbers, defining the relevant parameters for Mean Cloze calculation.
  grade      A numeric vector with named magic numbers, defining the relevant parameters for Grade Placement calculation. If omitted, Grade Placement will not be calculated.
  word.list  A vector or matrix (with exactly one column) which defines familiar words. For valid results the long Dale-Chall list with 3000 words should be used.
  ...        Further valid options for the main function, see readability for details.

Details
  Calculates Bormuth’s Mean Cloze and estimated grade placement. In contrast to readability, which by default calculates all possible indices, this function will only calculate the index value.
  This formula doesn’t need syllable count.

Value
  An object of class kRp.readability-class.

Examples
  ## Not run:
  bormuth(tagged.text, word.list=new.dale.chall.wl)
  ## End(Not run)


C.ld    Lexical diversity: Herdan’s C

Description
  This is just a convenient wrapper function for lex.div.

Usage
  C.ld(txt, char = FALSE, ...)

Arguments
  txt   An object of either class kRp.tagged-class or kRp.analysis-class, containing the tagged text to be analyzed.
  char  Logical, defining whether data for plotting characteristic curves should be calculated.
  ...   Further valid options for the main function, see lex.div for details.

Details
  Calculates Herdan’s C. In contrast to lex.div, which by default calculates all
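
As a rough orientation for the C.ld entry, Herdan’s C is commonly given as log(V) / log(N), where V is the number of distinct types and N the number of tokens. The sketch below computes that ratio in base R from a plain character vector. The function name herdan_c and the lowercasing step are assumptions made for illustration only; it does not use kRp.tagged-class objects and is not the package's own implementation.

```r
# Hypothetical helper, NOT part of koRpus: Herdan's C from a vector of tokens.
herdan_c <- function(tokens) {
  # Assumption: case is ignored, so "The" and "the" count as one type.
  tokens <- tolower(tokens)
  n_tokens <- length(tokens)          # N: number of tokens
  n_types <- length(unique(tokens))   # V: number of distinct types
  if (n_tokens < 2) {
    # log(1) is 0, so the ratio is undefined for fewer than two tokens.
    return(NA_real_)
  }
  log(n_types) / log(n_tokens)
}

tokens <- c("the", "cat", "sat", "on", "the", "mat")
herdan_c(tokens)  # log(5) / log(6): five types among six tokens
```

A text without any repeated words yields exactly 1, and the value falls as more tokens repeat.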
