Treebank of Chinese Bible Translations

Andi Wu
GrapeCity Inc.
[email protected]

Abstract

This paper reports on a treebanking project where eight different modern Chinese translations of the Bible are syntactically analyzed. The trees are created through dynamic treebanking, which uses a parser to produce the trees. The trees have been going through manual checking, but corrections are made not by editing the tree files but by re-generating the trees with an updated grammar and dictionary. The accuracy of the treebank is high because the grammar and dictionary are optimized for this specific domain. The tree structures essentially follow the guidelines of the Penn Chinese Treebank. The treebank covers a total of 7,872,420 characters.

represent different styles of Chinese writing, ranging over narration, exposition and poetry. Due to the diversity of the translators' backgrounds, some versions follow the language standards of mainland China, while others have more of a Taiwan or Hong Kong flavor. But they have one thing in common: they were all done very professionally, with great care put into every sentence. Therefore the sentences are usually well-formed. All this makes the Chinese translations of the Bible a high-quality and well-balanced corpus of the Chinese language.

To study the linguistic features of this text corpus, we have been analyzing its syntactic structures with a Chinese parser over the last few years. The result is a grammar that covers all the syntactic structures in this domain and a