Edlin: an easy to read linear learning framework

Kuzman Ganchev ∗
University of Pennsylvania
3330 Walnut St, Philadelphia PA

Georgi Georgiev
Ontotext AD
135 Tsarigradsko Ch., Sofia, Bulgaria
Abstract

The Edlin toolkit provides a machine learning framework for linear models, designed to be easy to read and understand. The main goal is to provide easy to edit working examples of implementations for popular learning algorithms. The toolkit consists of 27 Java classes with a total of about 1400 lines of code, of which about 25% are I/O and driver classes for examples. A version of Edlin has been integrated as a processing resource for the GATE architecture, and has been used for gene tagging, gene name normalization, named entity recognition in Bulgarian and biomedical relation extraction.

Keywords

Information Extraction, Classification, Software Tools

main advantage of Edlin is that its code is easy to read, understand and modify, meaning that variations are easy to experiment with. For industrial users, the simplicity of the code as well as relatively few dependencies means that it is easier to integrate into existing codebases.

Edlin implements learning algorithms for linear models. Currently implemented are: Naive Bayes, maximum entropy models, the Perceptron and one-best MIRA (optionally with averaging), AdaBoost, structured Perceptron and structured one-best MIRA (optionally with averaging) and conditional random fields. Because of the focus on clarity and conciseness, some optimizations that would make the code harder to read have not been made. This makes the framework slightly slower than it could be, but implementations are asymptotically fast and suitable for use on medium to large datasets.
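To make concrete what a learning algorithm for a linear model looks like, the following is a minimal sketch of a binary Perceptron with averaging, one of the algorithm families listed above. It is an illustration only and is not taken from Edlin: the class name `AveragedPerceptronSketch`, its methods, and the dense `double[]` feature representation are all assumptions made for this example.

```java
/**
 * Hypothetical sketch of a binary averaged Perceptron.
 * Not Edlin's actual API: all names here are assumptions for illustration.
 */
final class AveragedPerceptronSketch {
    private final double[] w;     // current weight vector
    private final double[] wSum;  // sum of the weight vector after every example
    private long steps = 0;       // number of examples processed so far

    AveragedPerceptronSketch(int numFeatures) {
        w = new double[numFeatures];
        wSum = new double[numFeatures];
    }

    private static double dot(double[] a, double[] x) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * x[i];
        }
        return s;
    }

    /** Trains on dense feature vectors; every label in ys must be +1 or -1. */
    void train(double[][] xs, int[] ys, int epochs) {
        for (int e = 0; e < epochs; e++) {
            for (int n = 0; n < xs.length; n++) {
                // Mistake-driven update: w changes only on a misclassified example.
                if (ys[n] * dot(w, xs[n]) <= 0) {
                    for (int i = 0; i < w.length; i++) {
                        w[i] += ys[n] * xs[n][i];
                    }
                }
                // Accumulate the current weights so they can be averaged later.
                for (int i = 0; i < w.length; i++) {
                    wSum[i] += w[i];
                }
                steps++;
            }
        }
    }

    /** Returns the average of the weight vectors seen during training. */
    double[] averagedWeights() {
        double[] avg = new double[w.length];
        if (steps == 0) {
            return avg; // untrained model: all zeros
        }
        for (int i = 0; i < w.length; i++) {
            avg[i] = wSum[i] / steps;
        }
        return avg;
    }

    /** Predicts +1 or -1 using the averaged weights. */
    int predict(double[] x) {
        return dot(averagedWeights(), x) > 0 ? 1 : -1;
    }
}
```

In this sketch the model is just a weight vector, and prediction is the sign of a dot product. Averaging replaces the final weights with the mean of all intermediate weight vectors, which tends to make the classifier less sensitive to the order of the last few updates. The naive per-example accumulation shown here favors readability over speed.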