
2014 22nd International Conference on Pattern Recognition

Scribe attribution for early medieval handwriting by means of letter extraction and classification and a voting procedure for larger pieces

Mats Dahllöf

Dept of Linguistics and Philology and Dept of Information Technology, Division of Visual Information and Interaction, Uppsala University, Sweden E-mail: mats.dahllof@lingfil.uu.se

1051-4651/14 $31.00 © 2014 IEEE. DOI 10.1109/ICPR.2014.334

Abstract: The present study investigates a method for the attribution of scribal hands, inspired by traditional palaeography in being based on comparison of letter shapes. The system was developed for and evaluated on early medieval Caroline minuscule manuscripts. The generation of a prediction for a page image involves writing identification, letter segmentation, and letter classification. The system then uses the letter proposals to predict the scribal hand behind a page. Letters and sequences of connected letters are identified by means of connected component labeling and split into letter-size pieces. The hand (and character) prediction makes use of a dataset containing instances of the letters b, d, p, and q, cut out from manuscript pages whose scribal origin is known. Letters are represented by features capturing the distribution of foreground. Cosine similarity is used for nearest neighbor classification. The hand behind a page is finally predicted by means of a voting procedure taking the highest scoring letter-level hits as its input. This hand prediction method was evaluated on pages from five different hands and reached an accuracy above 99% for four of them and 87% for a fifth, significantly more difficult one. The hand behind single toplisted letters was correctly predicted in 83% of the cases.

I. INTRODUCTION

The issue of predicting who physically produced a piece of handwriting (the “scribe”, “hand”, or “writer”) has received considerable attention. When it comes to modern handwriting, an important motivation for computational analysis is provided by forensic science, where the purpose is to establish facts about the origin of documents with a level of certainty appropriate for conclusions put forth as legal evidence. As can be expected, computational systems for hand prediction make use of machine learning. A feature extraction mechanism and a classification framework can be seen as the two central components in such a system. Both of them allow a variety of design choices, as is generally the case for machine learning tasks of this kind.

In the recently established field of “digital palaeography”, where historical documents are at the centre of interest, there is a hope that the use of computational methods will be a way of supporting and developing the philological methods for scribal attribution. Traditional palaeographers have in general relied on describing handwriting in qualitative terms. Their conclusions about the scribal origin of pieces of writing likewise tend to be based on qualitative evidence. The methods of such a discipline tend to be difficult to communicate, to reproduce, and to validate (see Stokes [10]). By contrast, a computer-aided palaeography, making use of techniques inspired by those developed in forensic analysis of modern handwriting, will be better equipped to characterize historical writing in precise and quantitative terms, and to work with models which can be objectively validated.

This paper describes a method for attributing pieces of handwriting to scribes by extracting plausible letter candidates from pages. The proposal is inspired by traditional palaeography, where writing styles and scribal hands in most cases are analyzed through inspection of letter forms. Its mode of operation consequently allows us to generate justifications for the predictions which would be close to the reasoning of a traditional philologist. Another motivation for this approach is that it fits within a framework for textual search and interpretation (OCR), where human supervision through letter data would be necessary for predicting which character each proposed letter instance represents.

The system was developed for the early medieval writing style known as the Caroline minuscule. It was evaluated on five works, each representing one scribal hand. Together they comprised about 1000 manuscript pages.

The analysis involves scaling and binarization as preprocessing steps, writing identification, and letter segmentation. The system predicts which scribal hand has produced a proposed letter instance by means of a nearest neighbor classification procedure using a dataset of cut-out letter instances from known hands. The letter-level classifier also produces character predictions, and its performance in that regard will also be discussed.

The hand behind a page is predicted by a voting procedure operating on a toplisted subset of the letter-level proposals generated for that page.

A Java implementation was used to develop and evaluate the method.

II. PREVIOUS STUDIES

Following Jain and Doermann [6], we can distinguish between literate and illiterate methods for the generation of features characterizing writing. The former analyze stretches of writing as sequences of letters, and the latter treat them as pen tip traces, without ascribing any conventional linguistic structure to them. The virtue of the illiterate approaches is that they do not rely on transcriptions of the manuscript images, and that they, at least in principle, can be applied independently of [...] and style. Examples of features of this kind are probability distributions of character fragment contours (based on a codebook generated from training data), as suggested by Schomaker and his coworkers [9]. The idea is developed by Bulacu and Schomaker [2], with character fragments represented by normalized bitmaps. These “sub or supra-allographic fragments” are called “graphemes”, and do not in general directly correspond to characters (as “graphemes” do in the terminology of linguists). Other examples of “illiterate” features are distributions of stroke fragments (Tang et al. [11]), and joint probability distributions of the orientations of any two hinged edge fragments (Bulacu and Schomaker [2]). Brink et al. [1] propose features based on the relation between the local width and direction of ink traces in a probability distribution. The approach of Jain and Doermann [6] relies on a simple character segmentation scheme and clustering of proposed segments to derive a “pseudo-alphabet” of contour gradient descriptors. The proposal involves a distance measure over these pseudo-alphabets.

The individual practices of different scribes are also reflected in the layout of the manuscript pages. This fact is used by De Stefano et al. [4] for hand prediction based on layout features, and the method is applied to a 12th century Caroline minuscule multi-hand manuscript.

These methods use different clustering procedures to generate codebooks and pseudo-alphabets of recurring writing elements, whose distribution is used to characterize the individual features of handwriting. The task of deciding which hand to propose is in many systems left to a nearest-neighbor classifier ([9], [2], [1], [6], [11]). A multi-layer perceptron was used in [4].

According to an overview of nine approaches (Brink et al. [1]), hand identification systems achieve accuracy rates between 62% and 97% for modern handwriting, when several hundred hands are included in the data. The authors add that “the numbers cannot be well compared because of differences in dataset material, required level of human interference, and number of writers”. The results reported by Brink et al. [1] for medieval manuscript data amount to lower accuracy rates for smaller sets of scribes, suggesting that hand prediction for medieval handwriting is a more difficult task. An explanation for this may be that medieval handwriting, in particular the items that have been preserved, typically represents the standardized craft of educated scribes, whereas modern handwriting is an informal and non-professional mode of expression, which allows various personal idiosyncrasies to shine through.

III. PROPOSED METHOD

A. Input data

The proposed method operates with two kinds of data: First, there are the manuscript images for which the scribal hand is to be predicted. Secondly, the generation of hand predictions relies on a dataset of letter images, cut out from manuscript pages. Each image in this dataset is labeled with a scribe and a character (letter type) tag. It will be assumed that a dataset representing only a few letter types will be sufficient.

B. Preprocessing steps

The hand (and character) prediction process takes a manuscript image as input without requiring any human intervention. In the experiments reported in section IV, image data are processed at a resolution of 8 pixels/mm. (This means that the images are resized from the original 13–21 pixels/mm.) Different choices of processing resolution might generate different rounding effects.

The text lines of the manuscript pages are assumed to be roughly parallel to the x-axis of the images, and the image processing preserves the coordinate systems given by the source images. Pixels belonging to the black (camera table) margins of each manuscript image are automatically identified as such (i.e. as large dark areas) and ignored in the further analysis targeting the parchment page area. Writing foreground and parchment background are separated (“binarized”) by means of a customized version of the Otsu [8] algorithm, which works quite well for manuscripts with as good contrast as the ones selected for the experiments reported here.

C. Letter segmentation

Connected components of foreground, presumed to form letters and letter sequences, are identified by means of connected component labeling. These components are split into pieces presumed to correspond to single letters. Vertical cuts are proposed where the horizontal pixel projection profile is thinnest, looking from a window of a certain size (corresponding to 2 mm), but not thicker than a certain amount of foreground (corresponding to 1 mm). Any segment between these cuts is proposed as a letter candidate (to be classified), if its width is between 90% of the narrowest and 111% of the widest image in the letter dataset. This condition was assumed to filter out many and mostly non-letter connected components, while admitting all the letter proposals which are relevant for hand prediction in an even-sized style like the writing studied in section IV. The letter segmentation module essentially maps each manuscript image to a set of bounding boxes. The next step in the classification process will further filter out bad and irrelevant proposals.

D. Classification (hand, character) of letter instances

Each image (assumed to correspond to a single letter) is represented by a feature vector, which is computed with reference to the minimal bounding box enclosing the foreground pixels. The features characterize the image in the following terms (where a and b are two constants):

• Distribution of foreground pixels as captured by a grid of a × b subrectangles over the bounding box, with the total number of foreground pixels as divisor: r_n = f_n/f, where f_n is the number of foreground pixels in the subrectangle n, 1 ≤ n ≤ ab, as illustrated in figure 1.

• Bounding box proportions: p = w/(w + h), where w and h are the width and height, respectively, of the bounding box.

• Foreground density: d = f/(wh), where f is the number of foreground pixels.

As we see, this gives us a sequence of ab + 2 features (r_1, ..., r_ab, p, d). Furthermore, all features carry values in the interval [0.0, 1.0], and can be understood as quotients.

Fig. 1. The grid of a × b rectangles corresponding to the features (here 5 × 6 = 30) used to capture the distribution of foreground (ink) in the bounding boxes (enclosing letters). The image is binarized as a step in the computation of these features.

As a step before the computation of these feature values, each separate connected element containing less than 20% of the foreground pixels in the letter bounding box is removed. The purpose of this is to eliminate, for instance, overhanging parts from adjoining letters, which might have been cut off by the bounding box, and other small detached foreground items.

If we compare these features with traditional scholarly criteria for hand identification, guided by the overview given by Stokes [10], we can note that they in particular will capture aspects such as:

• Form: the morphology of the letters.

• Modulus: the proportions of the letters.

• Weight: the differences in thickness between different kinds of line.

Cosine similarity, defined by Jones and Furnas [7] as in equation (1), is used as the similarity metric. The features are given equal weight.

\[
\mathrm{similarity}(I, J) = \frac{\sum_{i=1}^{n} I_i \times J_i}{\sqrt{\sum_{i=1}^{n} I_i^{2} \times \sum_{i=1}^{n} J_i^{2}}} \tag{1}
\]

Given the feature model and the similarity measure, the hand (and character) prediction is performed by instance-based nearest neighbor classification. This means that it returns the prediction that an image is of the same type (hand and character) as the most similar instance in the letter dataset.

E. Predicting the hand of a manuscript page

As mentioned above, the system uses a set of letter images representing known scribal hands (and characters) for hand prediction. The letter recognition process generates, for each manuscript page, a set of letter predictions (character, hand, bounding box) ranked by the similarity score. See table I for an example from the data set used in section IV. Only letter predictions whose score is 0.985 or higher are considered. The hand that receives the largest number of votes from the top 29 letter predictions (or all of them if their number is smaller than that) is returned as the hand prediction for the page. If a toplist gives the same largest number of votes to several hands, voting based on a one item shorter toplist will decide instead. This procedure will consequently produce a hand prediction for a manuscript page if at least one letter is recognized.

TABLE I. AN EXAMPLE OF THE TOP 24 LETTER HITS FOR A MANUSCRIPT PAGE (COD. SANG. 186, P. 103). (CHARACTER CLASSIFICATION CORRECT IN ALL CASES, AND HAND IN 17 CASES.)

p, 186   d, 186   b, 186   d, 186   p, 562   d, 186   q, 186   b, 186
d, 186   b, 186   p, 186   p, 562   b, 186   p, 562   p, 186   d, 186
p, 557   p, 562   d, 186   p, 562   d, 186   d, 565   b, 186   d, 186

IV. EVALUATION

A. Manuscript data

The method proposed here was evaluated on five well-preserved manuscripts written in the style known as the Caroline (or Carolingian) minuscule, see table II. The books were all written at the Abbey of St. Gall (Switzerland) and still belong to the St. Gall Stiftsbibliothek. They have been published by the e-codices website hosted by the University of Fribourg. High-quality digital reproductions are found at http://www.e-codices.unifr.ch/en/list/one/csg/0112, etc. The highest resolution images provided by the website (“x-large”, “converted into JPEG files [...] and minimally processed to improve legibility”) have served as manuscript page data. (Pages from Cod. Sang. 562 are also included in the dataset compiled by Fischer et al. [5].) The use of the e-codices images is regulated by a Creative Commons License.

According to palaeographers’ verdicts, quoted in the metadata at the e-codices website, each of these codicological units represents a single scribe (“hand”). It will be assumed that we are dealing with five different scribes, i.e. in the present context, one for each codex number. Four pages (from Cod. Sang. 557) without any minuscule writing were removed from the page dataset. There are between 2 and 24 lines of minuscule writing on each manuscript page.
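The letter-level classifier of section III-D can be illustrated with a short sketch. It is written in Python rather than the Java of the actual implementation, and several details are assumptions made only for illustration: the grid is taken to have a columns and b rows, each pixel is assigned to exactly one grid cell by integer division, the removal of small connected elements (the 20% rule) is left out, and the function names `feature_vector`, `cosine_similarity` and `classify` are hypothetical.

```python
import math


def feature_vector(img, a=5, b=6):
    """Compute (r_1, ..., r_ab, p, d) for a binarized letter image.

    img is a list of rows of 0/1 values, where 1 marks foreground (ink).
    Assumption: a is the number of grid columns and b the number of rows.
    """
    ink_rows = [y for y, row in enumerate(img) if any(row)]
    ink_cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    if not ink_rows:
        raise ValueError("image contains no foreground pixels")

    # Minimal bounding box enclosing the foreground pixels.
    top, bottom = ink_rows[0], ink_rows[-1]
    left, right = ink_cols[0], ink_cols[-1]
    w = right - left + 1
    h = bottom - top + 1

    counts = [0] * (a * b)  # f_n for each subrectangle n
    f = 0                   # total number of foreground pixels
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            if img[y][x]:
                col = (x - left) * a // w  # grid column, 0 .. a-1
                row = (y - top) * b // h   # grid row, 0 .. b-1
                counts[row * a + col] += 1
                f += 1

    r = [c / f for c in counts]  # r_n = f_n / f
    p = w / (w + h)              # bounding box proportions
    d = f / (w * h)              # foreground density
    return r + [p, d]


def cosine_similarity(u, v):
    """Equation (1): dot product divided by the root of the product of squared norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    return dot / norm if norm else 0.0


def classify(vec, letter_dataset):
    """Nearest neighbor prediction.

    letter_dataset is a list of (hand, character, feature_vector) triples.
    Returns (score, hand, character) of the most similar reference letter.
    """
    return max(
        (cosine_similarity(vec, ref), hand, char)
        for hand, char, ref in letter_dataset
    )
```

In this sketch a letter candidate would be classified with `classify(feature_vector(candidate), letter_dataset)`, and the returned score is the value that the page-level voting of section III-E compares against the 0.985 threshold.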

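The page-level voting of section III-E can be sketched in the same way. The threshold 0.985 and the toplist length 29 are taken from the text. The reading of “a one item shorter toplist” as dropping the lowest-ranked hit and voting again is an assumption, as is the hypothetical function name `predict_page_hand`. The sketch is in Python and is not the author's Java code.

```python
from collections import Counter


def predict_page_hand(hits, min_score=0.985, top_n=29):
    """Predict the hand behind a page from letter-level hits.

    hits is an iterable of (similarity_score, hand) pairs, one per letter
    proposal. Returns the winning hand, or None if no hit reaches min_score.
    """
    # Keep only sufficiently similar hits, best first, and cut to the toplist.
    ranked = sorted(
        (hit for hit in hits if hit[0] >= min_score),
        key=lambda hit: hit[0],
        reverse=True,
    )
    toplist = ranked[:top_n]

    while toplist:
        votes = Counter(hand for _, hand in toplist).most_common()
        # A unique winner exists if only one hand received votes or the
        # best hand has strictly more votes than the runner-up.
        if len(votes) == 1 or votes[0][1] > votes[1][1]:
            return votes[0][0]
        # Tie (assumed interpretation): drop the lowest-ranked hit and revote.
        toplist = toplist[:-1]
    return None
```

Because a toplist of length one always has a unique winner, the loop returns a hand whenever at least one hit passes the threshold, which matches the statement that a prediction is produced if at least one letter is recognized.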
The Caroline minuscule is the most important book style of the early medieval period in Europe, well-known for being easier to read than the later Gothic styles for both visual and linguistic reasons. Strictly speaking, the term minuscule is used for both a kind of letter (corresponding to lowercase) and a writing style (in which minuscule letters are used in the body text). The scribes were generally skilled craftsmen, who produced a highly standardized and formal kind of handwriting. However, even if the Caroline minuscule is not a cursive style, many of the letters are connected.

TABLE II. THE FIVE CODICOLOGICAL UNITS (EACH REPRESENTING ONE “HAND”) PROVIDING DATA FOR THE PRESENT STUDY: TEXT EXAMPLE, SHELFMARK, DATE (CENTURY), NUMBER OF PAGES INCLUDED IN THE DATASET, AND THEIR DIVISION (INTO LETTER SOURCE PAGES, TUNING DATA, AND EVALUATION DATA).

Shelfmark, unit                   Date              Pages   Division
Cod. Sang. 112                    9th c.              322   1+32+289
Cod. Sang. 186 (unit p. 3–146)    Early 9th c.        143   1+14+128
Cod. Sang. 557                    9th c.              270   2+27+241
Cod. Sang. 562 (unit p. 3–93)     Late 9th c.          91   2+10+79
Cod. Sang. 565 (unit p. 3–222)    10th or 11th c.     220   2+22+196
All five                                             1046   8+105+933

B. Letter data and development

The four characters b, d, p, and q were assumed to be useful for the present task, in being distinctive of different hands and relatively easy targets for the segmentation procedure. Instances of these were cut out manually from one or two suitable (representative and visually high-contrast) pages from each of the five codicological units. The selection of character instances and determination of bounding boxes around them consequently represent the manual input to the method proposed here. Only letters of the “normal” minuscule form were used for hand prediction. (Majuscule letters in different styles, used for emphasis and in titles, are also common in the Caroline books.) Furthermore, only the “half-uncial” d with straight upright (like the d examples in table II) was used, excluding its “uncial” allograph. The letter dataset used for the present study contained a total of 436 items, 59–108 letter images for each hand, and 9–33 instances of each hand-character combination.

The data subset formed by every tenth page (position 1, 11, 21, etc., in the file name alphabetical order) of the codices was used during the development process for design and parameter tuning. The remaining pages were left unseen until the final evaluation. Table II shows the distribution of the pages into the letter source subset, the tuning dataset, and the evaluation dataset.

C. Using the letter dataset to decide on a feature model

Different letter classification models were evaluated during the development stage using the letter dataset. The accuracy of the letter classifier (based on the feature scheme and the similarity metric) with respect to hand and character prediction can be computed by applying it in a leave-one-out manner, i.e. by predicting the type of each letter instance by comparing it to every other letter in the letter dataset (containing 436 items).

The choices for a × b were optimized with regard to hand prediction to 5 × 6 for feature models using only (r_1, ..., r_ab) (see section III-D). The addition of the p and d features jointly further improved the performance, thus justifying the feature model (r_1, ..., r_30, p, d), which was used in the further experiments.

This model yields an accuracy rate of 92.4% (403 correct) for hand prediction and 99.8% (435 correct) for character (b, d, p, or q) prediction, when evaluated in the leave-one-out manner.

D. Performance of the classifier on automatically extracted letters

As can be expected, if we remember that the letter dataset contains manually selected and excerpted “good” character instances, the letter-level classifier gives lower accuracy scores for hand prediction on single, automatically extracted letters (see table I for examples). The performance on those letters extracted for the voting procedure (i.e. those which have been classified by the nearest neighbor classifier with a similarity score of at least 0.985 and which are among the 29 top hits for a page) is shown in table III, with the scores computed separately for the five codicological units. We see some interesting differences, which will be discussed below. Table IV shows the accuracy scores for each predicted character in the corresponding way. As character location ground truth data were not available, an exact quantitative evaluation of the performance of the letter classifier for character prediction cannot be provided. However, a manual inspection of the extracted letters, which were saved as images, suggests that its accuracy rate lies above 99% (some caution should be observed when it comes to reading letters deprived of their context). Most errors seem to relate to overprediction of b, for instance, when h and li were the correct readings. Perhaps this explains why, as table IV shows, hand prediction gives a lower accuracy score for letters predicted to be instances of b than in the other cases. The differences shown in table IV could, if some speculation is excused, indicate that the execution of some characters is more sensitive to the personal writing practice of individual scribes.

TABLE III. THE PERFORMANCE OF THE HAND CLASSIFIER ON TOPLISTED, AUTOMATICALLY EXTRACTED LETTERS, BY CODICOLOGICAL UNIT/HAND, GENERATED IN THE ANALYSIS OF THE UNSEEN SUBSET OF THE DATA.

Hand/unit         Total   Correct   Accuracy
Cod. Sang. 112     8337      7214      86.5%
Cod. Sang. 186     3712      2649      71.4%
Cod. Sang. 557     6816      5466      80.2%
Cod. Sang. 562     2291      1872      81.7%
Cod. Sang. 565     5684      5206      91.6%
All five          26840     22407      83.5%

TABLE IV. THE PERFORMANCE OF THE HAND CLASSIFIER ON TOPLISTED, AUTOMATICALLY EXTRACTED LETTERS, LIKE TABLE III, BUT BY PREDICTED CHARACTER.

Character (pred.)   Total   Correct (hand)   Accuracy
b                    6339             4799      75.7%
d                    9864             8763      88.8%
p                    7231             6221      86.0%
q                    3406             2624      77.0%
All four            26840            22407      83.5%

E. Performance of the hand classifier for manuscript pages

The performance scores for the page-level hand classification system are based on its predictions for the “unseen” manuscript pages left as evaluation data. The results are summarized in table V. For three of the five codicological units the classifier gives a 100.0% accuracy rate. For Cod. Sang. 112 there are just two errors. For the pages from Cod. Sang. 186 the performance is considerably worse, with an error rate around 12%. This unit exhibits a striking variation in the thickness of the letters’ lines (see e.g. p. 15), and this might explain why the classifier runs into considerably more difficulties there. The errors for Cod. Sang. 186 correspond, as can be expected, to the lower letter-level accuracy rates, as we see in table III.

TABLE V. THE PERFORMANCE OF THE HAND PREDICTION SYSTEM APPLIED TO SINGLE PAGES OF THE FIVE CODICOLOGICAL UNITS/HANDS, EVALUATED ON THE UNSEEN SUBSET OF THE DATA.

Hand/unit        Total   Correct   Incorrect                       Accuracy
Cod. Sang. 112     289       287   2 (C.S. 186)                       99.3%
Cod. Sang. 186     128       112   15 (C.S. 562) + 1 (C.S. 565)       87.5%
Cod. Sang. 557     241       241   0                                 100.0%
Cod. Sang. 562      79        79   0                                 100.0%
Cod. Sang. 565     196       196   0                                 100.0%
All five           933       915   18                                 98.1%

F. Using a smaller and balanced letter dataset

Given that the letter dataset (see section IV-B) has a composition reflecting an opportunistic compilation procedure, there is a possibility that the effects reported in the previous section are due to its containing an unbalanced number of instances for different hand-character combinations. In order to investigate this possibility, the evaluation procedure was repeated using a dataset containing exactly 9 instances of each hand-character combination, i.e. a subset of 180 letters of the original 436. This improves the performance somewhat (but not in a statistically significant way), giving us only 14 misclassifications, which is better than the 18 misclassifications we got before, see tables V and VI.

TABLE VI. LIKE TABLE V, BUT USING A SMALLER AND BALANCED LETTER DATASET.

Hand/unit        Total   Correct   Incorrect                       Accuracy
Cod. Sang. 112     289       287   1 (C.S. 557) + 1 (C.S. 565)        99.3%
Cod. Sang. 186     128       117   11 (C.S. 562)                      91.4%
Cod. Sang. 557     241       241   0                                 100.0%
Cod. Sang. 562      79        79   0                                 100.0%
Cod. Sang. 565     196       196   0                                 100.0%
All five           933       919   14                                 98.5%

V. CONCLUSION

The main purpose of the system described here is to predict which scribe, among a set of known scribes, has produced a manuscript page. The system does this by using a partial OCR technique for extracting letters. This involves a letter segmentation method and a letter classifier.

The system uses a simple and somewhat naive letter segmentation method. Its good performance is probably due to the clearly separated lines and good contrast between ink and parchment which are seen in the Caroline manuscripts studied here. They are thus less challenging segmentation-wise than many other examples of writing.

A nearest neighbor classifier based on a letter shape feature model and the cosine similarity measure performs both hand and character prediction for the extracted letters. The process is supervised by relying on a set of cut-out letters representing known characters and scribal hands. The hand prediction for whole pages is based on voting, using the letter-level hand proposals.

The page-level hand prediction method was evaluated on Caroline manuscript pages from five different codicological units (five different hands) and reached an accuracy rate above 99% for four of them and 87% (which improved to 91% when the size of the letter dataset was reduced) for the “hardest” one. Even if it remains to be seen how well a classifier based on these principles would perform on manuscript collections representing a larger number of hands, the present performance figures are promising. The remarkable accuracy level of roughly 70%–90% with which the hand of single, automatically extracted letters can be predicted is a new kind of result.

The feature model used here is very simple, yet sensitive both to the individual variation among scribes executing the Caroline script and to features distinguishing its different characters. No learning process, apart from the computation of feature vectors, is involved in the classifier construction, since it is based on a nearest neighbor search procedure.

The Java implementation of the present hand prediction system requires, on an ordinary laptop, a few seconds to process a manuscript page. Most of that time is spent on connected component labeling and letter segmentation, for which only a naive implementation has been used.

The present “literate” system represents a new approach to the issue of hand prediction. Its letter-based mode of operation makes it potentially useful for the purposes of digital palaeography. An analysis which processes handwriting as a linear sequence of linguistic forms (graphemes, characters) allows us to derive a justification for each hand prediction which would be comprehensible to a traditional palaeographer. Such an account would point to the pairwise similarities between specific letters, one in the manuscript under scrutiny, and one in a reference (letter source) manuscript. As a by-product, it automatically excerpts character instances which could be subjected to further palaeographical analysis, for instance, by tools of the kind proposed by Ciula [3] and Stokes [10].

Admittedly, several components of the present proposal would have a hard time coping with examples of handwriting which are more difficult than the well-preserved Caroline codices studied here. Discolored parchment, faded ink, bleed-through, and similar problems are challenges to any binarization method, and they necessitate more sophisticated solutions than Otsu’s algorithm. Furthermore, more cursive handwriting styles and manuscripts exhibiting overlapping between the lines would make letter extraction a more demanding business than it is for the typically well-spaced and clearly articulated Caroline minuscule style.

The main disadvantage of the present kind of approach, if we look at things from the perspective of the “illiterate” systems discussed in section II, is that it requires a manually produced dataset of letters. A set of 436 images (covering all combinations of five codicological units and four characters) like the one used here would take a couple of hours to compile. This suggests that it would be worthwhile to explore the possibility of combining the letter-based approach for hand classification with automatic extraction of reference letters. One way of doing this would be to use, for instance, the letters already collected, to extract letters from a new collection of manuscripts. A system based on the hand prediction method proposed here could then be used to predict which pages were most likely to have been written by the same hand. It could also be used to search for those manuscript pages which are script-wise most similar to a given page. The extraction-voting classification method could also be used with more arbitrary writing elements of the kind identified in the codebook-based methods mentioned in section II, without requiring that these elements correspond to characters.

ACKNOWLEDGMENTS

The author would like to thank Anders Brun, Fredrik Wahlberg, Lasse Mårtensson, Tomas Wilkinson, and Kalyan Ram Ayyalasomayajula for helpful discussions on the system reported here.

REFERENCES

[1] A. A. Brink, J. Smit, M. L. Bulacu, and L. R. B. Schomaker, Writer identification using directional ink-trace width measurements, Pattern Recognition 45, 162–171, 2012.
[2] M. Bulacu and L. Schomaker, Text-independent writer identification and verification using textural and allographic features, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 29(4), 701–717, 2007.
[3] A. Ciula, The Palaeographical Method under the Light of a Digital Approach, in F. Fischer, C. Fritze, and G. Vogeler (Eds.), Kodikologie und Paläographie im digitalen Zeitalter – Codicology and Palaeography in the Digital Age, Schriften des Instituts für Dokumentologie und Editorik 2, BoD, Norderstedt, 219–234, 2009.
[4] C. De Stefano, F. Fontanella, M. Maniaci, and A. Scotto di Freca, A method for scribe distinction in medieval manuscripts using page layout features, in G. Maino and G. L. Foresti (Eds.), ICIAP 2011, Part I, LNCS 6978, Springer-Verlag, Berlin, Heidelberg, 393–402, 2011.
[5] A. Fischer, V. Frinken, A. Fornés, and H. Bunke, Transcription Alignment of Latin Manuscripts using Hidden Markov Models, Proceedings of the 2011 Workshop on Historical Document Imaging and Processing, 29–36, 2011.
[6] R. Jain and D. Doermann, Writer Identification Using an Alphabet of Contour Gradient Descriptors, 12th International Conference on Document Analysis and Recognition (ICDAR), 550–554, 2013.
[7] W. P. Jones and G. W. Furnas, Pictures of Relevance: A Geometric Analysis of Similarity Measures, Journal of the American Society for Information Science and Technology 38(6), 420–442, 1987.
[8] N. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man and Cybernetics 9(1), 62–66, 1979.
[9] L. Schomaker, M. Bulacu, and K. Franke, Automatic writer identification using fragmented connected-component contours, in F. Kimura and H. Fujisawa (Eds.), Ninth IWFHR, Tokyo, 185–190, 2004.
[10] P. Stokes, Computer-Aided Palaeography, Present and Future, in F. Fischer, C. Fritze, and G. Vogeler (Eds.), Kodikologie und Paläographie im digitalen Zeitalter – Codicology and Palaeography in the Digital Age, Schriften des Instituts für Dokumentologie und Editorik 2, BoD, Norderstedt, 309–338, 2009.
[11] Y. Tang, X. Wu, and W. Bu, Offline Text-independent Writer Identification Using Stroke Fragment and Contour Based Features, 2013 International Conference on Biometrics (ICB), 2013.
