A Generalized Approach to Optical Mark Recognition

International Conference on Computer and Communication Technologies (ICCCT'2012) May 26-27, 2012 Phuket

Surbhi Gupta, Geetila Singla, and Parvinder Singh Sandhu

Surbhi Gupta is Assistant Professor, Department of Computer Science, Rayat Institute of Engineering & Information Technology.
Geetila Singla is Master Student (CSE Dept.), Rayat & Bahra Institute of Engineering & Biotechnology, Mohali, India.
Dr. Parvinder Singh Sandhu is Professor, Department of Computer Science & Engg., Rayat & Bahra Institute of Engineering & Biotechnology, Mohali, India.

Abstract: OMR is an automated process of capturing data in the form of bubbles, squares, and tick marks. Although the technology conventionally relies on dedicated OMR scanners, these have some disadvantages. The present work proposes to automate the same process using machine vision for exam evaluation. Optical Mark Recognition (OMR), also called “mark sensing”, is a scanning method in which data is entered into a computer system via marks made in predefined positions on a form. OMR is therefore best suited to discrete data, where values fall into a limited number of categories, for example sex, occupation, or religion. OMR detects the absence or presence of a mark, but not the shape of the mark. Forms are scanned through an OMR scanner. The forms contain small circles, referred to as bubbles, or boxes that are filled in by the respondent. OMR reads marks made with pencil or ballpoint pen in the predefined positions on the questionnaire sheet, and judges the existence of a mark by its depth (darkness) on the sheet.

I. INTRODUCTION

OMR is a technology which uses hardware to detect the presence or absence of marks. The process is entirely automated, although it requires the use of specialist answer sheets, each of which is capable of holding 75 answers. A number of schools use an OMR system called Multiquest from Speedwell to do their optical mark recognition. The most common use of optical mark recognition is to process student responses to a multiple choice exam, or responses to a questionnaire or feedback form. Typically the questions are provided on paper, and students mark their responses onto special pre-printed forms. These forms are then read automatically. Some departments (e.g., BTO, Economics, Medical School) already make extensive use of OMR systems; others, which do not have sufficient demand to justify dedicated equipment of their own, use the service provided by Information Services to process smaller numbers of examinations or feedback forms. Exams processed in this way can usually be returned to departments within a couple of days. Detailed analytical results are provided. OMR technology has revolutionized the whole process. The questionnaire might be a department's student evaluation or an organization's survey for information; it is useful for all.

This piece of research work is an effort to computerize the evaluation of multiple choice question response sheets of an examination. The main task is to detect the presence and absence of dark marks in an image and to extract information depending upon these marks. A number of software and hardware products on the present market are professionally used to process such images; here, however, the aim is to develop suitable software that detects marks and prepares results according to the needs.

A. Advantages of OMR
1. It is economical when the number of documents is large enough to justify designing and printing them.
2. The user can only make marks and cannot write any information.

B. Disadvantages
1. Documents for mark readers are complicated to design.
2. Input of the data to the computer is slow.
3. It is difficult for a computer to check marked data.
4. The person putting the marks on the document has to follow the instructions.

C. Applications of OMR
1. Exam evaluation
2. Automated attendance
3. Marking
4. Voting and community surveys

II. REVIEW OF LITERATURE

A. Pattern Recognition

Pattern recognition is an important field of computer science concerned with recognizing patterns, particularly visual and sound patterns. It is central to voice recognition, handwriting recognition, and optical character recognition (OCR). As OCR is the root technology for OMR, it is important to study OCR.

Schantz [12] observed that Optical Character Recognition has been around, in one form or another, for a good 200 years. The process of transforming an image of printed text into a text code, thereby making it machine-readable, found its earliest incarnation in US patents for reading aids for the blind in the early 1800s. Srihar [13] describes that modern OCR technology is said to have been born in 1951 with M. Sheppard's invention, GISMO, A Robot Reader-Writer. In 1954, J. Rainbow developed a prototype machine that was able to read uppercase typewritten output at the “fantastic” speed of one character per minute. Several companies, including IBM, Recognition Equipment Inc., Farrington, Control Data, and Optical Scanning Corporation, marketed OCR systems by 1967.

Nartker et al [11] mentioned that an accuracy rate exceeding 98% is often cited as necessary for document conversion to be more efficient. Accuracy can be affected by a number of factors such as hardware and software variables, scan resolution, paper quality, and typeface clarity.

Haigh [5] provided an overview of Optical Character Recognition as it pertains to library digitization activities. He suggests that the decision to run OCR should be based on the projected use of the document. Factors that affect accuracy and throughput rates within an OCR operation are provided as a framework for determining the cost-effectiveness of this process compared with its alternatives, which include retaining the document as an image and re-keying the original into electronic form. According to him, the OCR process of turning an image into computer-editable text involves five discrete processes: identification of text and image blocks in the image, character recognition, word identification/recognition, correction, and formatting output.

Lopresti et al [9] explained the process of optical mark recognition with reference to Remark Office OMR 3.0, made by Principia Products. They reported that, for years, people who do statistical analysis have been designing questionnaires, getting them filled out by respondents or interviewers, and then somehow wrestling the data into a computer. With the first computers, much of the data was input by creating decks of punched cards. While this process allowed one to create the necessary computer files, it was subject to input error and thus had to be verified. At about the same time another technology came into use that allows multiple-choice forms to be read.

Kia [8] mentioned that Optical Mark Recognition (OMR) is used for standardized testing as well as course enrollment and attendance in education. Human resource departments across industries use OMR for applications such as benefits enrollment, employee testing, change of employee status, payroll deductions, and user training.

Zahniser [14] noted that the US government has utilized OCR technology in the US Post Office system for close to three decades to automate the mail handling process and improve its efficiency.

Bergeron [2] explained that OMR technology could read marks that have been made in predefined positions (e.g., […] special training is needed to use the paper forms, and the automatic-reading process eliminates keying-in errors and greatly reduces clerical and turnaround time. Accuracy approaches 100% for most OMR systems, compared with 98% for optical character recognition and 95% for intelligent character recognition.

Dillman [4] studied the impact of OMR forms on response rates and found it to be a relevant issue. One possible disadvantage of OMR surveys is that they may suppress response rates. This can occur for several reasons. OMR surveys are often combined with other cost-cutting measures (e.g., no follow up), so their low response rates may simply be an artifact of other choices about survey administration. OMR forms generally have one standard ink color that provides limited visual appeal, creating a disincentive for response. Moreover, these forms are more tedious to fill out. Rather than simply reading through the survey and checking off or circling responses, the respondent must carefully fill in a circle or “bubble” for each answer.

Moore [10] mentioned that OCR and ICR are character-based recognition systems. OCR recognizes machine-printed text, while ICR (Intelligent Character Recognition) recognizes machine-printed and hand-printed characters. OCR has typically been used where selected information on a pre-printed form is to be read. Mark Sense and OMR recognize the presence or absence of a mark in a specific area of a specially designed form. The exact meaning of the mark depends on the form's design.

B. Graphic File Format

A bitmap is a graphical object used to create, manipulate, and store images as files on a disk. The BMP format is widely compatible with existing Windows programs, especially older programs.

Hetzl [6] described the .bmp file format as the standard format for Windows 3.0. It may use compression and is (by itself) not capable of storing animation. There are two classes of bitmaps:
• Device-independent bitmaps (DIB): this file format was designed to ensure that bitmapped graphics created using one application can be loaded and displayed in another application, retaining the same appearance as the original.
• Device-dependent bitmaps (DDB): also known as GDI bitmaps, these were the only bitmaps available in early versions of 16-bit Microsoft Windows (prior to version 3.0).

Bourke [3] explained that BMP files are an historic (but still commonly used) file format for the historic operating system called "Windows". BMP images can range from black and white (1 bit per pixel) up to 24-bit color (16.7 million colors).
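The relationship between bit depth and colour count mentioned above can be illustrated with a minimal Python sketch that reads the relevant fields from a BMP header. This is not the authors' implementation; the function names (`row_stride`, `make_bmp_header`, `read_bmp_info`) are hypothetical, and the sketch assumes an uncompressed file with a 40-byte BITMAPINFOHEADER (or a later header that begins with the same fields).

```python
import struct


def row_stride(width: int, bits_per_pixel: int) -> int:
    """Bytes per pixel row; BMP rows are padded to a multiple of 4 bytes."""
    return ((bits_per_pixel * width + 31) // 32) * 4


def make_bmp_header(width: int, height: int, bits_per_pixel: int) -> bytes:
    """Build the 54 header bytes of an uncompressed BMP (illustration only)."""
    # Indexed formats (8 bpp or less) carry a palette of 4-byte entries.
    palette_entries = 2 ** bits_per_pixel if bits_per_pixel <= 8 else 0
    pixel_offset = 54 + 4 * palette_entries
    image_size = row_stride(width, bits_per_pixel) * abs(height)
    file_header = struct.pack("<2sIHHI", b"BM", pixel_offset + image_size,
                              0, 0, pixel_offset)
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1,
                              bits_per_pixel, 0, image_size, 0, 0, 0, 0)
    return file_header + info_header


def read_bmp_info(data: bytes) -> dict:
    """Extract size and bit depth from the start of a BMP file."""
    if len(data) < 54 or data[:2] != b"BM":
        raise ValueError("not a BMP file with a full 54-byte header")
    (header_size,) = struct.unpack_from("<I", data, 14)
    if header_size < 40:
        raise ValueError("older core header layout is not handled here")
    width, height = struct.unpack_from("<ii", data, 18)
    (bits_per_pixel,) = struct.unpack_from("<H", data, 28)
    return {
        "width": width,
        "height": abs(height),  # a negative height means top-down row order
        "bits_per_pixel": bits_per_pixel,
        "colors": 2 ** bits_per_pixel,
        "row_stride": row_stride(width, bits_per_pixel),
    }


if __name__ == "__main__":
    for bpp in (1, 8, 24):
        print(read_bmp_info(make_bmp_header(10, 5, bpp)))
```

With 1 bit per pixel the format distinguishes 2 colours, and with 24 bits per pixel it distinguishes 2^24 = 16,777,216, which is the "16.7 million colors" figure. The 4-byte row padding matters when scanned answer sheets are read pixel by pixel.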

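The mark-sensing principle described in the abstract, where a mark is judged present when a predefined position on the sheet is dark enough, can be sketched on such pixel data as follows. This is a minimal illustration under stated assumptions rather than the method of the paper: the image is taken to be a grayscale grid (0 = black, 255 = white), each bubble is a rectangular region lying fully inside the image, and the names `dark_fraction` and `read_marks` as well as both threshold values are hypothetical.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int, int, int]  # (top, left, height, width) in pixels
Image = List[List[int]]             # grayscale rows, 0 = black, 255 = white


def dark_fraction(image: Image, region: Region, dark_threshold: int = 128) -> float:
    """Fraction of pixels in the region darker than dark_threshold."""
    top, left, height, width = region
    dark = 0
    for row in image[top:top + height]:
        for value in row[left:left + width]:
            if value < dark_threshold:
                dark += 1
    return dark / (height * width)


def read_marks(image: Image,
               layout: Dict[str, Dict[str, Region]],
               min_fill: float = 0.5) -> Dict[str, List[str]]:
    """For each question, list the choices whose bubble is mostly dark.

    Only presence or absence is tested; the shape of the mark is ignored.
    """
    return {
        question: [choice for choice, region in choices.items()
                   if dark_fraction(image, region) >= min_fill]
        for question, choices in layout.items()
    }


if __name__ == "__main__":
    # A 4 x 8 white sheet with one filled 2 x 2 bubble (choice A).
    sheet = [[255] * 8 for _ in range(4)]
    for r in (1, 2):
        for c in (1, 2):
            sheet[r][c] = 0
    layout = {"Q1": {"A": (1, 1, 2, 2), "B": (1, 5, 2, 2)}}
    print(read_marks(sheet, layout))
```

Returning a list of choices per question, rather than a single answer, leaves room for the evaluation step to treat blank and multiply-marked questions separately.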