CSI Communications
www.csi-india.org | ISSN 0970-647X | Volume No. 36 | Issue No. 2 | May 2012 | ₹ 50/-

Cover Story: Desi Language Computing - on the Rise 5
Cover Story: “Correcting” SMS Text Automatically 9
Research Front: Approximate/Fuzzy String Matching using Mutation Probability Matrices 12
Article: Emails and Web Pages in Local Languages 14
Article: A Speech-to-Text System 18
Article: Opinion Mining and Sentiment Analysis 22
Article: Telemedicine in the State of Maharashtra: A Case Study 24
Practitioner Workbench: Programming.Tips() » Passing Variable Number of Arguments in C 29
Practitioner Workbench: Programming.Learn (“Python”) » Plotting with Python 30
CIO Perspective: Managing Technology » Business Information Systems: Underlying Architectures 31

CSI Communications | May 2012 | www.csi-india.org

CSI Communications Contents

Volume No. 36 • Issue No. 2 • May 2012

Cover Story
5 Desi Language Computing - on the Rise (Hareesh N Nampoothiri)
9 “Correcting” SMS Text Automatically (Deepak P and L Venkata Subramaniam)

Research Front
12 Approximate/Fuzzy String Matching using Mutation Probability Matrices (Sajilal Divakaran and Achuthsankar S Nair)

Articles
14 Emails and Web Pages in Local Languages (M Jayalakshmi)
18 A Speech-to-Text System (Nishant Allawadi and Parteek Kumar)
22 Opinion Mining and Sentiment Analysis (Jaganadh G)
24 Telemedicine in the State of Maharashtra: A Case Study (Randhir Kumar, Dr. P K Choudhary, and S M F Pasha)

Technical Trends
27 Extending WEKA Framework for Learning New Algorithms (Satyam Maheshwari and Sunil Joshi)

Practitioner Workbench
29 Programming.Tips() » Passing Variable Number of Arguments in C (Dr. Debasish Jana)
30 Programming.Learn (“Python”) » Plotting with Python (Umesh P)

CIO Perspective
31 Managing Technology » Business Information Systems: Underlying Architectures (Dr. R M Sonar)

Security Corner
35 Information Security » Cyber Crimes on/by Children (Adv. Prashant Mali)
36 IT Act 2000 » Prof. IT Law Demystifies Technology Law Issues: Issue No. 2 (Mr. Subramaniam Vutha)

PLUS
37 ICT@Society: Graphic Texting (Achuthsankar S Nair)
38 Brain Teaser (Dr. Debasish Jana)
39 Ask an Expert (Dr. Debasish Jana)
40 Happenings@ICT: ICT News Briefs in April 2012 (H R Mohan)
41 CSI Report (Prof. Dipti Prasad Mukherjee and Dr. Dharm Singh)
43 CSI News

Editorial Board
Chief Editor: Dr. R M Sonar
Editors: Dr. Debasish Jana, Dr. Achuthsankar Nair
Resident Editor: Mrs. Jayshree Dhere
Advisors: Dr. T V Gopal, Mr. H R Mohan
Published by: Executive Secretary Mr. Suchit Gogwekar, for Computer Society of India
Design, Print and Dispatch by: CyberMedia Services Limited

Please note: CSI Communications is published by Computer Society of India, a non-profit organization. Views and opinions expressed in the CSI Communications are those of individual authors, contributors and advertisers and they may differ from policies and official statements of CSI. These should not be construed as legal or professional advice. The CSI, the publisher, the editors and the contributors are not responsible for any decisions taken by readers on the basis of these views and opinions. Although every care is being taken to ensure genuineness of the writings in this publication, CSI Communications does not attest to the originality of the respective authors’ content.

© 2012 CSI. All rights reserved. Instructors are permitted to photocopy isolated articles for non-commercial classroom use without fee. For any other copying, reprint or republication, permission must be obtained in writing from the Society. Copying for other than personal use or internal reference, or of articles or columns not owned by the Society without explicit permission of the Society or the copyright owner is strictly prohibited.

Published by Suchit Gogwekar for Computer Society of India at Unit No. 3, 4th Floor, Samruddhi Venture Park, MIDC, Andheri (E), Mumbai-400 093. Tel. : 022-2926 1700 • Fax : 022-2830 2133 • Email : [email protected] Printed at GP Offset Pvt. Ltd., Mumbai 400 059.

Know Your CSI

Executive Committee (2012-13/14) »

President: Mr. Satish Babu, [email protected]
Vice-President: Prof. S V Raghavan, [email protected]
Hon. Secretary: Mr. S Ramanathan, [email protected]
Hon. Treasurer: Mr. V L Mehta, [email protected]
Immd. Past President: Mr. M D Agrawal, [email protected]

Nomination Committee (2012-2013) Dr. D D Sarma Mr. Bipin V Mehta Mr. Subimal Kundu

Regional Vice-Presidents
Region - I: Mr. R K Vyas. Delhi, Punjab, Haryana, Himachal Pradesh, Jammu & Kashmir, Uttar Pradesh, Uttaranchal and other areas in Northern India.
Region - II: Prof. Dipti Prasad Mukherjee. Assam, Bihar, West Bengal, North Eastern States and other areas in East & North East India.
Region - III: Mr. Anil Srivastava. Gujarat, Madhya Pradesh, Rajasthan and other areas in Western India.
Region - IV: Mr. Sanjeev Kumar. Jharkhand, Chattisgarh, Orissa and other areas in Central & South Eastern India.
Region - V: Prof. D B V Sarma. Karnataka and Andhra Pradesh.
Region - VI: Mr. C G Sahasrabudhe. Maharashtra and Goa.
Region - VII: Mr. Ramasamy S. Tamil Nadu, Pondicherry, Andaman and Nicobar, Kerala, Lakshadweep.
Region - VIII: Mr. Pramit Makoday. International Members.

Division Chairpersons, National Student Coordinator & Publication Committee Chairman
Division-I : Hardware (2011-13): Dr. C R Chakravarthy, [email protected]
Division-II : Software (2012-14): Dr. T V Gopal, [email protected]
Division-III : Applications (2011-13): Dr. Debesh Das, [email protected]
Division-IV : Communications (2012-14): Mr. Sanjay Mohapatra, [email protected]
Division-V : Education and Research (2011-13): Dr. N L Sarda, [email protected]
National Student Coordinator: Mr. Ranga Raj Gopal
Publication Committee Chairman: Prof. R K Shyamsundar

Important links on CSI website »
Structure & Organisation http://www.csi-india.org/web/csi/structure
ExecCom Transacts http://www.csi-india.org/web/csi/execcom-transacts1
National, Regional & State Students Coordinators http://www.csi-india.org/web/csi/structure/nsc
News & Announcements archive http://www.csi-india.org/web/csi/announcements
Statutory Committees http://www.csi-india.org/web/csi/statutory-committees
Collaborations http://www.csi-india.org/web/csi/collaborations
Join Now http://www.csi-india.org/web/csi/join
Renew Membership http://www.csi-india.org/web/csi/renew
Member Eligibility http://www.csi-india.org/web/csi/eligibility
Member Benefits http://www.csi-india.org/web/csi/benifits
Subscription Fees http://www.csi-india.org/web/csi/subscription-fees
Forms Download http://www.csi-india.org/web/csi/forms-download
BABA Scheme http://www.csi-india.org/web/csi/baba-scheme
Publications http://www.csi-india.org/web/csi/publications
CSI Communications* http://www.csi-india.org/web/csi/info-center/communications
Adhyayan* http://www.csi-india.org/web/csi/adhyayan
R & D Projects http://csi-india.org/web/csi/1204
Technical Papers http://csi-india.org/web/csi/technical-papers
Tutorials http://csi-india.org/web/csi/tutorials
Course Curriculum http://csi-india.org/web/csi/course-curriculum
Training Program (CSI Education Products) http://csi-india.org/web/csi/training-programs
Travel support for International Conference http://csi-india.org/web/csi/travel-support
eNewsletter* http://www.csi-india.org/web/csi/enewsletter
Current Issue http://www.csi-india.org/web/csi/current-issue
Archives http://www.csi-india.org/web/csi/archives
Policy Guidelines http://www.csi-india.org/web/csi/helpdesk
Events http://www.csi-india.org/web/csi/events1
President’s Desk http://www.csi-india.org/web/csi/infocenter/president-s-desk

CSI Divisions and their respective web links
Division-Hardware http://www.csi-india.org/web/csi/division1
Division Software http://www.csi-india.org/web/csi/division2
Division Application http://www.csi-india.org/web/csi/division3
Division Communications http://www.csi-india.org/web/csi/division4
Division Education and Research http://www.csi-india.org/web/csi/division5

List of SIGs and their respective web links
SIG-Artificial Intelligence http://www.csi-india.org/web/csi/csi-sig-ai
SIG-eGovernance http://www.csi-india.org/web/csi/csi-sig-egov
SIG-FOSS http://www.csi-india.org/web/csi/csi-sig-foss
SIG-Software Engineering http://www.csi-india.org/web/csi/csi-sig-se
SIG-DATA http://www.csi-india.org/web/csi/csi-sigdata
SIG-Distributed Systems http://www.csi-india.org/web/csi/csi-sig-ds
SIG-Humane Computing http://www.csi-india.org/web/csi/csi-sig-humane
SIG-Information Security http://www.csi-india.org/web/csi/csi-sig-is
SIG-Web 2.0 and SNS http://www.csi-india.org/web/csi/sig-web-2.0
SIG-BVIT http://www.csi-india.org/web/csi/sig-bvit
SIG-WNs http://www.csi-india.org/web/csi/sig-fwns
SIG-Green IT http://www.csi-india.org/web/csi/sig-green-it
SIG-HPC http://www.csi-india.org/web/csi/sig-hpc
SIG-TSSR http://www.csi-india.org/web/csi/sig-tssr

Other Links
Forums http://www.csi-india.org/web/csi/discuss-share/forums
Blogs http://www.csi-india.org/web/csi/discuss-share/blogs
Communities* http://www.csi-india.org/web/csi/discuss-share/communities
CSI Chapters http://www.csi-india.org/web/csi/chapters
Calendar of Events http://www.csi-india.org/web/csi/csi-eventcalendar

* Access is for CSI members only.

Important Contact Details »
For queries, correspondence regarding Membership, contact [email protected]

President’s Message
Satish Babu
From : [email protected]
Subject : President’s Desk
Date : 1st May, 2012

Dear Members

CSI organized its customary joint ExeCom on 31st March and 1st April, 2012 where the 2011-12 ExeCom demitted office and the new ExeCom took charge. The ExeCom meeting held on 1st April, 2012, discussed several important policy matters and also started the process of constitution of the statutory committees that would steer the activities of CSI during the year. These yearly start-up processes would be completed latest by the month of May, so that they can get going with their business.

WITFOR: One of the first events of the year that was supported by CSI, was the 5th IFIP World IT Forum (WITFOR), held in New Delhi during 16th-18th April, 2012. The Conference, attended by over 950 delegates and over 80 speakers from India and abroad, was organized in partnership with the Department of Electronics and Information Technology (DEITY), Government of India. The National Organizing Committee of the Forum was headed by the Union Minister of Communications & IT, Mr. Kapil Sibal, who inaugurated the Forum at Vigyan Bhawan. The speakers at the Conference also included the Minister of State for Communications & IT, Mr. Sachin Pilot. The 2-day event focused on the developmental opportunities offered by digital technologies in the areas of agriculture, education, e-Gov, and health.

Nashik Chapter’s 25th Anniversary: It is a pleasure to note that CSI’s Nashik Chapter is entering their 25th year of activity in 2012. One of the very active chapters of CSI, Nashik Chapter has been privileged to carry out a number of important activities for its members and other stakeholders, and also contribute to the national leadership of CSI. I wish the Nashik Chapter, its leaders, and members many more years of adding value to the CSI community and to society at large.

Chapter AGMs and New Office Bearers: In most chapters of CSI, the Annual General Meetings have been conducted and the new chapter Office Bearers have taken charge. CSI is keen that all chapter Office Bearers - especially those new to CSI - get adequate support when they require it, particularly about the conduct of the business of the chapter and for the conduct of events. The key resources for support are your Regional Vice President and the CSI HQ. Kindly contact your RVPs and the CSI HQ Helpdesk (helpdesk@csi-india.org) for any aspect where you need support.

CSI Events during April: I convey my sincere appreciation to organizers of following events that took place during the month of April, 2012.

• 4th International Conference on Human Computer Interaction held during 18th-21st April, 2012 at Symbiosis Institute of Design (SID), Pune, organized by IFIP TC-13. Many thanks to Prof. Anirudh Joshi.

• RACSS-2012: International Conference on Recent Advances in Computing and Software Systems held during 25th-27th April, 2012 at Dept. of CSE, SSN College of Engineering, Chennai. I convey my sincere thanks to the joint organization committee of CSI Chennai Chapter & Division IV, IEEE Madras Section, and IEEE CS.

As we get going with the current year, it is important to plan for different events for the year, in particular Conferences, which form an important segment of our activities, and also contribute to the financial stability of CSI. The formal call for proposals for events will be put forth shortly, and I request you to start the process of planning events in your locations.

Membership Growth: Membership growth is a high-priority area for CSI. While the growth in student membership is satisfactory, the growth in professional and institutional membership has potential for improvement. We are examining different mechanisms to enhance professional membership and attract the new IT professional to CSI. One of the means of doing this is to join hands with other societies, including international societies, to provide additional value to our members. Another mechanism being explored is the use of social media to build a more accessible community. We hope to put in place some of these steps over the next two months for stimulating membership growth.

With greetings

Satish Babu
President

Editorial
Rajendra M Sonar, Achuthsankar S Nair, Debasish Jana and Jayshree Dhere
Editors

Dear Fellow CSI Members,

It is a pleasure to bring to you this CSIC issue with a cover story on ‘Linguistic Computing’. Computers have affairs with both programming languages and natural languages. With the wider penetration of ICT in society, especially in the form of mobile phones, the affair with natural languages is becoming more central. While in the case of the programming languages it was the programmer who was struggling, in case of natural language computing, the challenge is really for the computer.

In a country like India, which is a linguistic cauldron, the problem of linguistic computing is amplified. Organised efforts are on in India towards this end. Technology Development for Indian Languages (TDIL) programme launched by the Ministry of Communication & Information Technology (MC&IT), Govt. of India aims at developing systems to facilitate human-machine interaction without language barrier; creating and accessing multilingual knowledge resources; and integrating them to develop innovative user products and services. The programme also promotes language technology standardization through participation in ISO, UNICODE, World-Wide-Web consortium (W3C) and BIS (Bureau of Indian Standards). Of course, Google is an important player in the scene as the whole world and its languages are of concern to it.

In this issue we have an assortment of articles that range from basic settings and services related to the use of language on the web and in mobile phones to selected microscopic applications such as sentiment analysis. (We suppose that readers have noted that the cover page depicts the CSI web site translated into various Indian languages by on-line tools).

Hareesh Nampoothiri in his cover story article titled “Desi Language Computing on the Rise” introduces basic desi-language settings and services in computers and mobile phones. Another cover story article on “ ‘Correcting’ SMS Text Automatically” by P. Deepak and L. Venkata Subramaniam of IBM Research provides insight into challenges posed by unusual abbreviations, shortening and omissions, textese or SMS language to conventional electronic processing of text.

Research Front column brings an article titled “Approximate/Fuzzy String Matching using Mutation Probability Matrices” by Sajilal D and Achuthsankar S Nair. The article addresses fuzzy/approximate string matching in Indian languages. Three other articles on the cover topic are specialised articles in the Articles section. Article on “Emails and Web Pages in Local Languages” by M. Jayalakshmi supplements and complements the first cover story article. Mr. Nishant Allawadi and Prof. Parteek Kumar of Thapar University in an article titled “Speech-to-Text System”, present speech to text conversion using Hidden Markov Model (HMM). Concept of sentiment analysis is introduced briefly by Jaganadh G in his article titled “Opinion Mining and Sentiment Analysis”.

Articles section also includes an article titled "Telemedicine in the State of Maharashtra: A Case Study" by S M F Pasha, Randhir Kumar and Dr. P K Choudhary based on their paper submitted at SEARCC 2011. Technical Trends section is enriched with an article on “Extending WEKA Framework for Learning New Algorithms” by Mr. Satyam Maheshwari and Mr. Sunil Joshi.

Practitioner Workbench column has a section titled Programming.Tips() and it provides an interesting write-up on “Passing Variable Number of Arguments in C” by Dr Debasish Jana. The other section called Programming.Learn("Python") under Practitioner Workbench includes information about "Plotting with Python".

Managing Technology section of the CIO Perspective column includes an article titled “Business Information Systems: Underlying Architectures” by Dr. RM Sonar. It is the third article in the series of articles on Business Information Systems. It throws light on various types of architecture starting from single-tier to web-based multi-tier architecture and discusses key benefits and key issues of the respective systems.

Information Security section of the Security Corner feature has an article titled “Cyber Crimes on/by Children” written by Advocate Prashant Mali. The article starts with two cases and then goes about explaining how a child can be at risk in cyber space and how computing platform can be used for committing crime by children. The IT Act section under Security Corner comes with an article by Advocate Mr. Subramaniam Vutha, wherein he demystifies technology law and provides inputs on electronic (Internet-based) contract.

Our ICT@Society covers a curio theme "Graphic Texting". As usual there are other regular features such as Brain Teaser, Ask an Expert and Happenings@ICT. CSI Reports and CSI News are about various region, SIG, chapter and student branch events.

Please note that we welcome your feedback, contributions and suggestions at [email protected].

With warm regards,
Rajendra M Sonar, Achuthsankar S Nair, Debasish Jana and Jayshree Dhere
Editors

Cover Story
Desi Language Computing - on the Rise
Hareesh N Nampoothiri
University of Kerala, Thiruvananthapuram

English was the first language that got placed in modern computer systems and naturally got accommodated exclusively, to the disadvantage of the other world languages. From the mnemonics used in assembly language, to the programming language keywords, to commands, English embedded itself. Some early programming languages like COBOL almost sounded like the English of non-native speakers of the language. It is easy to weave an Anglo-centric conspiracy story, but in all fairness to the professionals of yesteryear, it must be remembered that computers were not foreseen then as gadgets that ordinary citizens all over the world would own. As the popularity of notebooks, netbooks, and mobile devices shot up, the language problem began to take centre stage and naturally multiple solutions began to emerge.

Perhaps the turning point in language computing is the emergence of Unicode. Unicode is simply a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems[1]. It set the stage for an organized treatment of a large number of linguistic computing issues. Even though the first version of Unicode was introduced in October 1991, it became popular only in the last decade. As of now, Unicode supports a long list of languages including Indian languages such as Bengali, Hindi, Kannada, Malayalam, Oriya, Tamil, Telugu etc. Now software developers come up with different language packs for different regions and computers are becoming truly desi in this aspect. An example is Microsoft's CLIP (Captions Language Interface Pack) for Visual Studio 2010, for which the author was associated with developing a language interface pack.

Apart from reaching a wider audience through incorporating as many languages as possible, Unicode also opens a wide range of possibilities for developers and service providers to come up with language-based tools and applications for the common man. It is not surprising that Google is the one in the lead, tapping the possibilities in this sector. We introduce below a few of the language-based tools from Google.

Google Translate
Google Translate is a free translation service from Google, which provides instant translations between 65 different languages (as of Apr 2012) including some of the major Indian languages like Bengali, Gujarati, Hindi, Tamil, Telugu, and Urdu. Google Translate enables the users to translate words, paragraphs of text, or a whole website (using the Translator Toolkit) from one language to another. According to Google the service aims to make information universally accessible and useful, regardless of the language in which it’s written[3].

How does it work?
Google describes the working of Google Translate as follows: When Google Translate generates a translation, it looks for patterns in hundreds of millions of documents to help decide on the best translation for you. By detecting patterns in documents that have already been translated by human translators, Google Translate can make intelligent guesses as to what an appropriate translation should be. This process of seeking patterns in large amounts of text is called "statistical machine translation". Since the translations are generated by machines, not all translation will be perfect. The more human-translated documents that Google Translate can analyse in a specific language, the better the translation quality will be. This is why translation accuracy will sometimes vary across languages[3].

In Practice
Let's see how it becomes useful in practice by trying to translate a simple paragraph from English to Hindi (Fig. 1). Of course, it

What is Unicode?
In the early days, there were many different encoding systems for characters used in computers. These encoding systems used to conflict with one another. That is, two encoding systems may use the same number to represent two different characters or they may use different numbers for the same character. As a result, any given computer was required to support many different encoding systems, and even then the chances of data getting corrupted were very high. To solve this issue, Unicode provides a unique number for every character irrespective of the platform, application, or language.

The Unicode Standard has been adopted by most of the leading players of the industry such as Apple, Microsoft, Oracle, IBM, Sun etc. It is also required by modern standards such as XML, Java, JavaScript, WML etc. It is supported in many operating systems (including Linux distributions), all modern browsers, most of the recent versions of office suites, and many other applications. The Unicode Consortium, a non-profit organization, is dedicated to developing, extending, and promoting use of the Unicode standard. According to them the advantage of using Unicode is: Incorporating Unicode into client-server or multi-tiered applications and websites offers significant cost savings over the use of legacy character sets. Unicode enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption[2].
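The idea of "a unique number for every character" can be seen directly in most modern programming languages. The following is a minimal sketch in Python, not part of the original article; the helper name `describe` and the three sample characters are chosen here purely for illustration. It prints the Unicode code point, the official character name, and the UTF-8 byte sequence for one Latin, one Devanagari, and one Malayalam letter.

```python
import unicodedata

# Sample characters (illustrative choice): Latin A, Devanagari MA, Malayalam MA.
samples = ["A", "\u092e", "\u0d2e"]

def describe(ch):
    """Return (code point label, Unicode character name, UTF-8 bytes) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8"))

for ch in samples:
    label, name, raw = describe(ch)
    # The same code point identifies the character on every platform;
    # only the byte encoding (here UTF-8) is a storage detail.
    print(ch, label, name, raw)
```

Whatever the platform or application, the Devanagari letter in this sketch is always identified by the same number, which is what lets text travel between systems without being mistaken for a different character.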

does not produce a grammatically correct translation, but it does produce a useful text in Hindi. Apart from providing the translation of the text, it also provides the phonetic rendition of the text in English. One can hear the translated text by clicking the speaker icon.

Fig. 1: Google Translator Page: http://translate.google.com/

There is also an option to rate the resulting translation by clicking the tick mark. One can rate a particular translation as Helpful, Not helpful, or Offensive. The tool also offers alternative translations and an option to re-order blocks of words for reconstructing the translated sentence (Fig. 2).

Fig. 2: The tool suggests alternative translation when the user clicks and holds on a block of words

So what about translating from one Indian language to English? For that we need to type-in the text in the required Indian language. There is another tool from Google, Google Transliteration (still in labs), that will help you to type in other languages without learning the actual keys corresponding to the alphabets of that particular language. Here we will type 'mera bhArath mahaan' to get 'मेरा भारत महान' in Hindi. The transliteration window (Fig. 3) provides required options to edit and format the text. Google provides a transliteration API that helps the developers to enable transliteration facilities in their websites. The transliteration API is incorporated in Google Translate as well. When a language other than English is selected in Google Translate source window, an option to enable phonetic typing will be available. By enabling the option, one can directly type-in the required text in the Translate source window itself. Another option is to copy-paste the typed text from Google Transliteration window.

Fig. 3: Google Transliteration window

Note: Apart from Google Transliteration, there are many online and offline tools available that will help you to type-in text in Indian languages. For Windows-based systems one may use Indic Input 2 (for Windows Vista / 7) or Indic Input 1 (for Windows XP). By installing this tool, one can type-in text in any text editor (such as Notepad, Wordpad, LibreOffice Writer etc.) by enabling the phonetic keyboard and selecting the appropriate language. The tool can be downloaded freely from the BhashaIndia website. URL: http://bhashaindia.com/Downloads/

Here are some amusing translation examples: the lyrics of a Hindi film song (Fig. 4) and our national anthem (Fig. 5). When the Hindi film song lyrics are translated, the tool produces acceptable results but the translation for the national anthem is amusing, to say the least. In short, for simple functional sentences it produces better translations and for creative writings (such as poems) the results may not be of utility.

Developers can integrate the application in the websites and it

automatically translates the website to another language according to the choice selected by the user (Fig. 6). Even though the tool does not produce acceptable results all the time, it will be useful in translating websites to local languages (or foreign languages) using the Translator Toolkit provided by Google. At least the users will get some idea about the contents of the website instead of seeing the website in some alien language.

Fig. 4: Hindi film song lyrics translated to English

Fig. 5: National Anthem translated from Bengali to English

Fig. 6: Sample website with Google Translate enabled using the API. When the user scrolls over the text, the original text will be displayed as a tool-tip dialogue

What if the website does not provide a translation option by default? Still, it is possible to view the website in a language of your choice. For example, Computer Society of India website does not have an option to switch between languages. But still it is possible to display the website in Hindi or in any one of the 65 languages provided by Google Translator. If you wish to see the CSI website in Hindi, enter the following URL in the address bar:

http://translate.google.com/translate?tl=hi&u=http://www.csi-india.org

The tl (target language) parameter corresponds to the language of your choice (hi for Hindi, ta for Tamil, bn for Bengali and so on) and u is the URL of the website you wish to translate. The translated version of the CSI website is shown in Fig. 7. Alternatively, you may install Google Toolbar or get a bookmark for your language from the Tools and Resources page. URL: http://translate.google.com/translate_tools

Fig. 7: CSI website translated to Hindi

Mobiles & Tablets Too Go Desi!
It is not happening with computers alone. Most of the modern mobile devices (smartphones, tablets etc.) boast the power of computers we had three decades back. Apple Lisa[4] (released in Jan 1983), the first personal computer which offered GUI, had the processing power of Motorola 68000 @ 5 MHz. Now the medium range smartphone, Motorola Defy, has an 800 MHz processor. If the memory of Apple Lisa was 1 MB RAM (In Lisa 2 only Apple introduced 10MB internal hard disk drive!), Motorola Defy has 512 MB RAM, 2 GB internal storage, and it supports microSDHC upto 32 GB! The tablets currently available in the market are even more powerful and we may consider them as minicomputers, only difference being the lack of input devices like keyboard and mouse (Of course, they permit to add them too via Bluetooth or USB!).

Mobile devices are becoming more popular and the manufacturers are trying to reach mass public by incorporating local language support in their mobile devices. Clearly, the 'desification' is not going to happen in computers alone but it will extend to mobile devices as well. Many of the devices produced by various cell phone/tablet manufacturers like Nokia, Sony, Samsung, LG, Motorola etc. already allow the users to select a language for the phone interface. Entering and displaying Indic languages directly in mobile devices (for sending messages, for contact details, for writing notes etc.) is still in the development stages. Apple, the leading mobile device manufacturer, provides local language support in their iPhones and iPads based on iOS mobile operating system. Even though many of the other devices from various manufacturers do not have native support for Unicode, there are device specific work-arounds available for incorporating Unicode functionality in those mobile devices, especially for devices based on Android platform. Android-based devices from Samsung, LG etc. come with support for Indian languages by default. In some mobiles, in the keypad itself, the Hindi alphabets are printed along with English alphabets to make entering the text as easy as possible.

Fig. 8 shows a low-end Android mobile phone from LG using Google Translate. The text produced is then copy-pasted to a message and sent. If the party receiving the message has a mobile device with Unicode support, then the text will be rendered correctly or else the receiver will get a series of squares instead of the actual message.

It is very obvious that developments in Indian language computing have moved very much to web and mobile platform rather than as stand-alone applications on PCs. The demand for these tools now arises from the common man and not from business or universities. That explains the vibrancy of this field in the current times.
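The translate URL described earlier, with its tl (target language) and u (page address) parameters, can also be assembled programmatically. The following is a minimal sketch in Python under stated assumptions: the base address and parameter names are taken from the article's own example, the function name `translated_page_url` is hypothetical, and whether the service still accepts this URL form today has not been verified. Unlike the hand-typed URL in the text, the sketch percent-encodes the page address, which is the usual practice when one URL is passed inside another.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base address and parameter names (tl, u) come from the article's 2012 example;
# treat them as assumptions rather than a documented, current API.
TRANSLATE_BASE = "http://translate.google.com/translate"

def translated_page_url(page_url, target_lang):
    """Build a translate URL: tl = target language code, u = page to translate."""
    query = urlencode({"tl": target_lang, "u": page_url})
    return f"{TRANSLATE_BASE}?{query}"

url = translated_page_url("http://www.csi-india.org", "hi")
print(url)

# Reading the parameters back shows that nothing was lost by the encoding step.
print(parse_qs(urlsplit(url).query))
```

Changing the second argument to another language code, such as bn for Bengali, would produce the corresponding URL for that language.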

References
[1] Wikipedia - Unicode. http://en.wikipedia.org/wiki/Unicode
[2] What is Unicode? http://www.unicode.org/standard/WhatIsUnicode.html
[3] About Google Translate. http://translate.google.com/about/intl/en_ALL/
[4] Wikipedia - Apple Lisa. http://en.wikipedia.org/wiki/Apple_Lisa

Fig. 8: Hindi text displayed on an Android mobile phone

About the Author

Hareesh N Nampoothiri is a visual design consultant with more than a decade of experience. He has worked with government organizations such as C-DIT, C-DAC, and the University of Kerala, as well as with private organizations. Currently, he is doing interdisciplinary research on ethnic elements in visual design in computer media. He is the author of two books on graphic design and a regular contributor to leading technology magazines including CSI Communications. Kathakali, blogging, and photography are his passions. He has directed a documentary feature on Kathakali and an educational video production for IGNOU, New Delhi.

Cover Story

“Correcting” SMS Text Automatically

Deepak P* and L Venkata Subramaniam**
* IBM Research - India, Bangalore; [email protected]
** IBM Research - India, New Delhi; [email protected]

Abstract
With the rapidly increasing penetration of mobile phones and microblogging, texting language is fast becoming the language of the youth. Characterized by unusual abbreviations, shortening, and omissions, textese or SMS language poses a challenge to conventional electronic processing of text. In this article, we present an overview of recent work on automatically cleaning SMS text.

Introduction
SMS language, also called textese, is becoming increasingly popular with widespread usage of SMS and microblogging sites to share information. Normalization of text written in such lingo, i.e. conversion to its clean version, is a necessary prerequisite to enable electronic processing of such text. Conversion of SMSes to non-noisy versions would aid improved speech synthesis to help visually impaired mobile phone users. Clean SMSes can be accurately translated automatically, thus enabling seamless SMS communication between users of different natural languages[2].

Noise in text is defined as any kind of difference in the surface form of an electronic text from the intended, correct, or original text[6]. Under such a definition, SMS language would qualify as very noisy. The types of noise in SMS text have been classified into various categories[1,6] such as character deletion, phonetic substitution, and word deletion. Common categories of noise and their examples at the word or phrase level are tabulated in Fig. 1. Many a time, combinations of noise categories may be used to shorten long words. For example, tomorrow may often be transformed to 2mro using a combination of phonetic substitution (“to” transformed to “2”) and character deletion. The same word may be transformed by different users to different kinds of noisy variants. The single word, tomorrow, was observed to manifest in 16 different forms[3,7] in a corpus of a thousand SMSes; a few of them are illustrated in Fig. 2.

SMS normalization refers to the task of converting SMS text that could be noisy into its intended non-noisy form. Thus, an SMS normalization technique could potentially transform the noisy SMS itll b gud 2 c u tonite to the clean version it will be good to see you tonight. Most SMS normalization techniques need a set of noisy SMSes and their clean versions, which may have to be manually generated, referred to as the training set. A machine learning algorithm then works on such pairs to learn a model. This learning process is illustrated in Fig. 3. A simplistic learner may simply learn a set of conditional probabilities as a model, with p(w'|w) denoting the probability that the noisy word w is actually a variant of the non-noisy word w':

$$p(w' \mid w) = \frac{\#\,\text{SMSes where } w \text{ and } w' \text{ occur in the noisy and clean version, respectively}}{\#\,\text{SMSes where } w \text{ occurs in the noisy version}}$$

The normalization phase uses the learned model to normalize (clean) a noisy input SMS and output the clean SMS. Our simple model could be used to replace each word, w, in the noisy SMS by the word v such that p(v|w) is maximum among the conditional probabilities involving w, i.e. p(.|w). An illustration of the normalization phase appears in Fig. 4. State-of-the-art techniques use more sophisticated models than the simple formulation of conditional probabilities outlined above. We will outline techniques that use statistical machine translation (SMT) and spelling correction-based models in the remainder of the paper.

Statistical Machine Translation
We now use a toy example to illustrate how a simple SMT model may be used to learn the mappings between words and their noisy variants.

Type of Noise            Example
Character deletion       “message” → “msg”
Phonetic substitution    “to” → “2”
Abbreviation             “laugh out loud” → “lol”
Informal usage           “going to” → “gonna”
Word deletion            “driving back home” → “drivin hm”

Fig. 1: Types of noise in SMS text

2moro    tomm      tomo
tomoro   tomorow   2mro
tomra    tomorrow  tom
morrow   tomora    tomrw

Fig. 2: Noisy variants of “tomorrow”

[Noisy SMS, clean SMS] pairs → Learner → Learned model
“btw, r u goin 4 d movie” / “by the way, are you going for the movie?”
“itll b gud 2 c u tonite” / “it will be good to see you tonight”
“lemme no wen u gt thr” / “let me know when you get there”

Fig. 3: Learning process

SMS1: [ma, my] [hse, house] = 0.5
SMS1: [ma, house] [hse, my] = 0.5
SMS2: [ma, my] [buk, book] = 0.5
SMS2: [ma, book] [buk, my] = 0.5

Table 1: Initial word alignment configuration
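The simplistic conditional-probability learner described in the Introduction can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `learn_conditional_probs` and `normalize` are hypothetical, SMSes are assumed to be whitespace-separated strings, and the two training pairs are the toy SMSes used later in the article.

```python
from collections import Counter, defaultdict

def learn_conditional_probs(pairs):
    """Estimate p(w'|w) by counting co-occurrences over (noisy, clean) SMS pairs."""
    noisy_count = Counter()               # number of SMSes whose noisy version contains w
    joint_count = defaultdict(Counter)    # number of SMSes with w (noisy) and w' (clean)
    for noisy_sms, clean_sms in pairs:
        clean_words = set(clean_sms.split())
        for w in set(noisy_sms.split()):
            noisy_count[w] += 1
            for v in clean_words:
                joint_count[w][v] += 1
    return {w: {v: c / noisy_count[w] for v, c in joint_count[w].items()}
            for w in noisy_count}

def normalize(sms, model):
    """Replace each noisy word w by the clean word v that maximizes p(v|w)."""
    cleaned = []
    for w in sms.split():
        candidates = model.get(w)
        # Words never seen in training are left unchanged (an assumption of this sketch).
        cleaned.append(max(candidates, key=candidates.get) if candidates else w)
    return " ".join(cleaned)

pairs = [("ma hse", "my house"), ("ma buk", "my book")]
model = learn_conditional_probs(pairs)
print(model["ma"])             # "my" gets 1.0; "house" and "book" get 0.5 each
print(normalize("ma", model))
```

With so little data, a word such as hse co-occurs equally often with my and house, so this model cannot tell the two apart. That ambiguity is one reason more sophisticated models, such as the SMT approach, are preferred.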

The mappings are learned from the training set of SMS pairs. Consider two hypothetical noisy SMSes, ma hse and ma buk, which map to their correct variants my house and my book respectively. We will not make any assumptions on the preservation of word ordering in the noisy variant of the clean SMS. Thus, we have two possible word alignments for the [ma hse, my house] pair, which we will initialize to being equally likely. A word alignment for a training SMS is a mapping from each word in the noisy version to a word in the clean version. Such an initial configuration of SMS word alignments is depicted in Table 1.

Now, we will use these word alignment probabilities to populate the word-to-word mapping probabilities between the noisy word vocabulary [ma, hse, buk] and the correct vocabulary [my, house, book]. Since the mapping [ma, my] occurs in two different alignments, each with confidence 0.5, we will initialize the mapping to have a confidence of 1.0. Similarly, all pairs are initialized to the sum of confidences of all alignments in which they occur. Such a matrix is then normalized column-wise so that the values for each word in the target vocabulary (i.e. the vocabulary of clean SMSes) sum to unity. This process of creation and normalization of the word-word mapping probability table is illustrated in Table 2.

        my     house   book
ma      1.0    0.5     0.5
hse     0.5    0.5     0.0
buk     0.5    0.0     0.5

Column-wise normalization:

        my     house   book
ma      0.50   0.50    0.50
hse     0.25   0.50    0.00
buk     0.25   0.00    0.50

Table 2: Populated word-word table

In an iterative style, the word-mapping probabilities may now be used to compute refined word alignments for the training SMSes. The confidence of each alignment is computed as the product of the word mappings contained in the alignment. Thus, the {[ma,house] [hse,my]} alignment of SMS1 is assigned a confidence of 0.125 (the product of 0.50 from [ma,house] and 0.25 from [hse,my]). These are then normalized so that the confidences of all alignments for a single SMS sum to unity. Table 3 illustrates this process of refinement of word alignment confidences.

SMS1: [ma, my] [hse, house] = 0.50 * 0.50 = 0.250
SMS1: [ma, house] [hse, my] = 0.50 * 0.25 = 0.125
SMS2: [ma, my] [buk, book] = 0.50 * 0.50 = 0.250
SMS2: [ma, book] [buk, my] = 0.50 * 0.25 = 0.125

Normalization of word-alignment probabilities per training SMS:

SMS1: [ma, my] [hse, house] = 0.250/(0.250+0.125) = 0.67
SMS1: [ma, house] [hse, my] = 0.125/(0.250+0.125) = 0.33
SMS2: [ma, my] [buk, book] = 0.250/(0.250+0.125) = 0.67
SMS2: [ma, book] [buk, my] = 0.125/(0.250+0.125) = 0.33

Table 3: Modified word alignments for SMSes

These can then be used to estimate new word-word mapping probabilities, followed by estimation of new alignment probabilities; such an iterative process leads to a final converged matrix approximately of the form shown in Table 4. Thus, an iterative sequence of estimating word-alignment probabilities and word-word mappings enables us to drill down to the correct mappings [ma → my, hse → house, buk → book], which can then be used to convert a new SMS to its clean version in a word-by-word manner. Though such a simplistic translation model (called IBM Model 1) is very popular, sophisticated SMT models that can learn many-to-many mappings between words are often used to achieve more accurate mappings.

        my     house   book
ma      0.99   0.00    0.00
hse     0.00   0.99    0.00
buk     0.00   0.00    0.99

Table 4: Converged word-mapping probabilities

Noisy SMS: “wot a match, luvd evry bit o it” → Model applier (uses the learned model) → Cleaned SMS: “what a match, loved evry bit of it”

Fig. 4: Normalization process

SMT-based Approaches to SMS Normalization
The SMT paradigm has been found to be the most effective among the various paradigms that have been tried for SMS normalization. An adaptation of the traditional SMT models[2] was first used for SMS normalization to learn phrase-based alignments between the SMS and a candidate clean text. This uses a phrase-based model instead of the word-based model described above and learns mappings between phrases in clean text and phrases in SMSes using an iterative approach. A comparative study of SMS normalization approaches[5] finds that SMT-based systems are significantly less error-prone than other approaches. Even in cases where a training set of noisy and clean SMS pairs is unavailable, the machine translation paradigm has been used[4] by creating a pseudo-translation model.
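The toy iteration walked through in Tables 1 to 4 can be sketched as follows. This is a minimal sketch under stated assumptions rather than a full IBM Model 1 implementation: like the toy example, it enumerates only one-to-one alignments (permutations) of equally long noisy and clean SMSes, and the function name `learn_word_mappings` is hypothetical.

```python
from collections import defaultdict
from itertools import permutations

def learn_word_mappings(pairs, iterations=20):
    """Alternate between word-word tables and alignment confidences (toy example)."""
    # Table 1: every one-to-one alignment of a training pair starts equally likely.
    alignments = []
    for noisy, clean in pairs:
        perms = list(permutations(clean))
        alignments.append([(tuple(zip(noisy, p)), 1.0 / len(perms)) for p in perms])

    table = {}
    for _ in range(iterations):
        # Table 2 (top): sum the confidences of all alignments containing each link.
        counts = defaultdict(float)
        for sms_alignments in alignments:
            for links, conf in sms_alignments:
                for link in links:
                    counts[link] += conf
        # Table 2 (bottom): column-wise normalization, one column per clean word.
        column_total = defaultdict(float)
        for (noisy_word, clean_word), value in counts.items():
            column_total[clean_word] += value
        table = {(n, c): v / column_total[c] for (n, c), v in counts.items()}
        # Table 3: alignment confidence = product of its links, normalized per SMS.
        refined = []
        for sms_alignments in alignments:
            scored = []
            for links, _ in sms_alignments:
                score = 1.0
                for link in links:
                    score *= table[link]
                scored.append((links, score))
            total = sum(score for _, score in scored)
            refined.append([(links, score / total) for links, score in scored])
        alignments = refined
    return table

pairs = [(["ma", "hse"], ["my", "house"]), (["ma", "buk"], ["my", "book"])]
first = learn_word_mappings(pairs, iterations=1)
print(first[("ma", "my")], first[("hse", "my")])   # 0.5 and 0.25, as in Table 2
final = learn_word_mappings(pairs)
print(round(final[("hse", "house")], 2))           # approaches 1, as in Table 4
```

Keys of the returned table are (noisy word, clean word) pairs, so each column of Table 2 corresponds to all keys sharing the same clean word.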

This pseudo-translation model is based on heuristic estimation of SMS word to clean word mappings.

Hidden Markov Models for SMS Normalization
Another paradigm that has been explored for SMS normalization is to model omissions and noisy variations explicitly. Towards this, an HMM-based word model[3] is constructed for each word in a training set of words. A hidden Markov model may be considered as a set of interconnected states, each of which may emit certain values based on its output probabilities, which are then seen in the output. In the formulation proposed by Choudhury et al.[3], the noisy variant of a word is considered to be emitted from the word's HMM.

Consider the word today; the ordered set of graphemes within it is [`t`,`o`,`d`,`a`,`y`] whereas the corresponding set of phonemes is [/T/, /AH/, /D/, /AY/]. Fig. 5(a) represents an HMM constructed out of the graphemes (characters, in our context). This is represented as a linear sequence of hidden states, each state corresponding to a token in the grapheme set. In a non-noisy version, each HMM state would emit the corresponding token; thus, a left-to-right HMM would always emit the correct word. However, since noise is what is to be modeled, each state is formulated to be able to emit either the corresponding grapheme, any other token (represented by ‘@’ in the figure), or nothing at all (represented as ε). A similar phonemic HMM is represented in Fig. 5(b). The transformation of a phoneme to a grapheme is itself noisy, and thus, the emission set only includes the graphemes that could possibly map to the phoneme associated with the state. The “to” part in “today” may be transformed to the numeral “2” due to phonemic similarity, and Fig. 5(b) shows how that is accounted for in the phonemic HMM.

The graphemic and phonemic HMMs are cross-linked intuitively to produce a single HMM as shown in Fig. 5(c) (emission graphemes are omitted in the figure to reduce clutter). Each clean word, along with its noisy variants, is used as a training corpus to learn the transition probabilities and emission probabilities. For example, at the end of the training, state G1 may have an emission probability distribution [‘T’:0.8, ε:0.1, @:0.1] and an onward state transition distribution [G2: 0.6, P2: 0.4]. Such learnt HMMs are then post-processed and harnessed using standard techniques to decode the “clean” version from a noisy word. Such word-level cleansing is aggregated to achieve normalization of SMS text to its clean version.

Fig. 5: Word HMMs for SMS normalization. (a) Graphemic path: states G1 to G5 for ‘T’, ‘O’, ‘D’, ‘A’, ‘Y’ between start state S0 and end state S6, each also able to emit ‘@’. (b) Phonemic path: states P1 to P4 for /T/, /AH/, /D/, /AY/, with an additional state S1 emitting “2”. (c) Cross-linked HMM combining the graphemic and phonemic paths.

Summary
With the increasing popularity of the SMS language through SMSes and microblogging websites, cleansing SMS text is a prerequisite for effective development and deployment of services such as text-to-speech and automatic translation. There has been a lot of interest of late in developing techniques to cleanse SMS text. In this article, we outlined the problem of normalization of SMSes to their intended clean versions, and briefly surveyed various techniques that have been developed for the purpose. We specifically focused on the usage of machine translation models, a popular paradigm for accurate decoding of SMS text.

References
[1] AiTi Aw, et al. (2006). “A Phrase-Based Statistical Model for SMS Text Normalization”, Proceedings of COLING/ACL Conference, Sydney, Australia.
[2] Brown, P, et al. (1993). “The mathematics of statistical machine translation: parameter estimation”, Computational Linguistics, 19(2), 263-311.
[3] Choudhury, M, et al. (2007). “Investigation and modeling of the structure of texting language”, 1st Intl. Workshop on Analytics for Noisy Unstructured Text Data, Hyderabad, India.
[4] Contractor, D, et al. (2010). “Unsupervised cleansing of noisy text”, Proceedings of the COLING Conference, Beijing, China.
[5] Kobus, C, et al. (2008). “Normalizing SMS: are two metaphors better than one?” Proceedings of the COLING Conference, Manchester.
[6] Venkata Subramaniam, L, et al. (2009). “A survey of types of text noise and techniques to handle noisy text”, Proceedings of the Third Workshop on Analytics for Noisy Unstructured Text Data, Barcelona, Spain.
[7] Venkata Subramaniam, L (2010). “Noisy Text Analytics”, Tutorial at the NAACL HLT Conference, Los Angeles, USA.
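Returning to the graphemic word HMM of Fig. 5(a), the idea that each state may emit its own grapheme, some other token, or nothing can be sketched as a small dynamic program. This is an illustrative simplification, not the model of Choudhury et al.: the function name `variant_likelihood` is hypothetical, all states share the same assumed emission probabilities (0.8, 0.1, 0.1, echoing the example distribution for G1), transitions are strictly left to right, and the phonemic path is ignored.

```python
def variant_likelihood(word, variant, p_match=0.8, p_other=0.1, p_skip=0.1):
    """Probability that a left-to-right graphemic HMM for `word` emits `variant`."""
    n, m = len(word), len(variant)
    # f[i][j]: probability of having passed i states while emitting j characters
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(n):
        for j in range(m + 1):
            if f[i][j] == 0.0:
                continue
            # The state emits nothing (epsilon).
            f[i + 1][j] += f[i][j] * p_skip
            if j < m:
                # The state emits its own grapheme, or some other token ('@').
                emit = p_match if variant[j] == word[i] else p_other
                f[i + 1][j + 1] += f[i][j] * emit
    return f[n][m]

for candidate in ["today", "tdy", "xyz"]:
    print(candidate, variant_likelihood("today", candidate))
```

Under these assumptions a plausible abbreviation such as tdy scores higher than an unrelated string of the same length, which is the behaviour a decoder needs in order to pick the clean word whose HMM best explains a noisy token.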

About the Authors

Deepak P is currently with the Information Management group at IBM Research - India, Bangalore. He received a B.Tech degree in computer science and engineering from Cochin University, and an M.Tech in the same discipline from IIT Madras, India. He is currently pursuing his PhD with the department of computer science and engineering at IIT Madras. His main research interests are in the area of data mining, similarity search, case-based reasoning and information retrieval.

L Venkata Subramaniam received the BE degree in electronics and communication engineering from Mysore University, the MS degree in electrical engineering from Washington University, St. Louis, and the PhD degree in electronics from IIT Delhi. He presently manages the Information Processing and Analytics group in IBM Research - India, New Delhi. His research interests include machine learning, natural language processing, speech processing and their applications to data analytics.

Research Front

Approximate/Fuzzy String Matching using Mutation Probability Matrices

Sajilal Divakaran* and Achuthsankar S Nair**
* FTMS School of Computing, Kuala Lumpur
** University of Kerala

We consider the approximate/fuzzy string matching problem in the Malayalam language and propose a log-odds scoring matrix for score-based alignment. We report a pilot study designed and conducted to collect statistics about what we have termed the “accepted mutation probabilities” of characters in Malayalam, as they naturally occur. Based on the statistics, we show how a scoring matrix can be produced for Malayalam which can be used effectively in numeric scoring for approximate/fuzzy string matching. Such a scoring matrix would enable search engines to widen the search operation in Malayalam. This being a first attempt, we point out a large number of areas in which further research and consequent improvement are required. We limit ourselves to a chosen set of consonant characters, and the matrix we report is a prototype for further improvement.

Keywords - approximate string matching, fuzzy string matching, scoring matrix, Malayalam Computing, Language Computing.

Introduction
Linguistic computing issues in non-English languages are generally addressed with less depth and breadth, especially for languages which have a small user base. Malayalam, one such language, is one of the four major Dravidian languages, with a rich literary tradition. The native language of the South Indian state of Kerala and the Lakshadweep Islands off the west coast of India, Malayalam is spoken by 4% of India’s population. While Malayalam is integrated fairly well with computers, its user base may not generate huge market interest, and so such fine issues of language computing for Malayalam remain unaddressed and unattended.

If we were to search Google for information on the senior author of this paper, Achuthsankar, and gave the query as Achutsankar or Achudhsankar, in both cases Google would land us correctly on the official web page of the author. This “Did you mean” feature of Google is managed by the Google-diff-match-patch[4]. The match part of the algorithm uses a technique known as approximate string matching or fuzzy pattern matching[10]. The close/fuzzy match to any query that is received by the search engine is routine and obvious to the English language user. However, when a non-English language such as Malayalam is used to query Google, the same facility is not seen in action. When the word പതിനായിരം (Pathinaayiram - Malayalam word for the number ten thousand) is used as a query in Google Malayalam search, we are directed to documents that contain a similar word (Payinaayiaram - a common mispronunciation of the original word) but not the word പയിനായിരം. This is because approximate/fuzzy string matching has not been addressed in Malayalam. In this paper we make preliminary attempts toward addressing this very special issue of approximate/fuzzy string matching in Malayalam.

Approximate/Fuzzy String Matching
The field described as approximate or fuzzy string matching in computer science has been firmly established since the 1980s. Patrick & Geoff[5] define the approximate string matching problem as follows: Given a string s drawn from some set S of possible strings (the set of all strings composed of symbols drawn from some alphabet A), find a string t which approximately matches this string, where t is in a subset T of S. The task is either to find all those strings in T that are “sufficiently like” s, or the N strings in T that are “most like” s. One of the important requirements for analyzing similarity is to have a scientifically derived measure of similarity. The soundex system of Odell and Russell[13] is perhaps one of the earliest attempts to use such a measure. It uses a soundex code of one letter and three digits. These have been used successfully in hospital databases and airline reservation systems[8]. The Damerau-Levenshtein metric[2] proposed a measure: the smallest number of operations (insertions, deletions, substitutions, or reversals) needed to change one string into another. This metric can be used with standard optimization techniques[14] to derive the optimal score for each string matching and thereby choose matches in the order of closeness.

Approximate or fuzzy string matching is in vogue not only in natural languages but also in artificial languages. In fact, approximate string matching has been developed into a fine art in computational sciences such as bioinformatics. Bioinformatics deals mainly with bio sequences derived from DNA, RNA, and amino acid sequences[9]. Dynamic programming algorithms (the Needleman–Wunsch and Smith–Waterman algorithms)[11], which enable fast approximate string matching using carefully crafted scoring matrices, are in great use in bioinformatics. The equivalent of Google for the modern biologist is the basic local alignment search tool (BLAST)[1], which uses scoring matrices such as point accepted mutation matrices (PAM)[3] and BLOcks of Amino Acid SUbstitution Matrix (BLOSUM)[6]. To the best of the knowledge of the authors, such a scoring system is not in existence for any natural language including English. Recently an attempt has been made in this direction for the English language[7]. The statistics for accepted mutation in English were cleverly derived based on already designed Google searches.

In the case of Malayalam, statistics of character mutations are not easily derivable from any corpus or any existing search engines or other language computing tools. Hence, data for this needs to be generated to go ahead with the development of a scoring matrix system.

We will now describe the generation of primary data of natural mutation in Malayalam.

Occurrence and Mutation Probabilities
Malayalam has a set of 51 characters, and basic statistics of their occurrence and mutation are required for developing a scoring matrix. The occurrence probabilities are available, derived from corpora of considerable size in 1971 and again in 2003[12]. We describe here only a subset of characters in view of economy of space. In Table 1, we give the probabilities of one set of consonants, which we have extracted from a small test corpus of Malayalam text derived from periodicals.

ക       ഖ       ഗ       ഘ       ങ       k
0.606   0.009   0.044   0.004   0.039   0.297

Table 1: Occurrence probabilities of a set of selected Malayalam consonants

We then designed and conducted a study to extract the character mutation probabilities. We selected 150 words that cover all the chosen consonant characters. A dictation was administered among a small group of school children (N=30). The observed mistakes (natural mutations) are tabulated in Table 2 as probabilities.

It is noted that the sample size of N=30 is inadequate for a linguistic study of this kind. However, as already highlighted, this paper reports a pilot study to demonstrate proof of the concept. Moreover, the sample size can be made larger once the research community vets the approach put forward by us.

      ക      ഖ      ഗ      ഘ      ങ      k
ക    0.85   0.25   0.45   0.07   0      0.10
ഖ    0      0.55   0      0      0      0
ഗ    0.06   0.04   0.47   0.09   0      0
ഘ    0      0.01   0      0.85   0      0
ങ    0      0      0      0      0      0
k     0.08   0.11   0.08   0      0      0.90

Table 2: Probability of natural mistakes (natural mutation probabilities) of chosen set of consonant characters

Log-odds Scoring Matrix
It is possible to use Table 2 itself for scoring string matches. However, it might be unwieldy in practice. For long strings we will need to multiply probabilities, which might result in numeric underflow. Hence, we will use a logarithmic transformation. Another transformation that we will use is to convert from probability to odds. The odds can be defined as the ratio of the probability of occurrence of an event to the probability that it does not occur. If the probability of an event is p, then the odds are p/(1-p). We will however not use this formula directly, but define odds for any given match i-j as:

[Equation: log-odds score for the match i-j, defined in terms of pij and pj with a multiplier of 10]

In the above equation, pij is the probability that character i mutates to character j and pj is the probability of natural occurrence of character j. Thus the negative score for a mutation of a less frequently occurring character will be larger in this scheme. The multiplier 10 is used just to bring the scores to a convenient range. Table 3 shows the log-odds scores thus derived using the occurrence probabilities and mutation probabilities given in Tables 1 and 2. These can be used to score approximate matches and select the most similar one.

      ക      ഖ      ഗ      ഘ      ങ      k
ക    2      15     10     11     -30    -4
ഖ    -30    18     -30    -30    -30    -30
ഗ    -16    6      11     13     -30    -30
ഘ    -30    3      -30    23     -30    -30
ങ    -30    -30    -30    -30    -30    -30
k     -9     11     0.08   -30    -30    5

Table 3: Log-odds probability of natural mistakes (mutation probabilities) of chosen set of consonant characters (We set the score corresponding to 0 as -30. It may be noted that the diagonal elements are strongest in each respective column.)

Results, Discussions, and Conclusion
The prototype scoring matrix we have designed above can be demonstrated to be capable of scoring approximate matches and can therefore be a means of selecting the closest match. We will demonstrate this with an example of scoring four approximate matches for the word കk. Table 4 lists the scores for the four different matches, and the exact match scores best. The next best match as per the new scoring scheme is കക.

Query          കk      കk      കk      കk
Match          കk      കഖ     കഘ     കക
Scores         2 + 5   2 - 30  2 - 30  2 - 4
Total Score    7       -28     -28     -2

Table 4: Demonstrating use of scoring matrix in Table 3 on sample approximate string matches

Our demonstration has been on a chosen set of consonant characters, but it can be expanded to cover all Malayalam characters. For demonstrating more general words, a scoring matrix for vowels is essential. We have computed the same and will be reporting it in a forthcoming publication. During our studies, we also noticed that the grouping of characters as done conventionally may not suit our studies. For example, we found that the character ഹ is a possible mutation for ക, very rarely, even though they are not grouped together conventionally. A regrouping based on natural mutations is work that we see as requiring attention.

To the best of our knowledge, our work is a unique proposition for the Malayalam language, which can be incorporated into Malayalam search engines. We would like to reiterate that our work is in the prototype stage. Neither the size of the corpus nor the number of subjects in the survey is substantial. The authors hope to expand the work with a sizable database from which statistics are extracted, so that the scoring matrix can be made more reliable. We also propose to validate the scoring approach with sample trials involving language experts.

References
[1] Altschul, S F, et al. (1990). “Basic local alignment search tool”, Journal of Molecular Biology, 215(3), 403-410.
[2] Damerau, F J (1964). “A technique for computer detection and correction of spelling errors”, Communications of the ACM, 7(3), 171-176.
[3] Dayhoff, M O, et al. (1978). “A model of Evolutionary Change in Proteins”, Atlas of protein sequence and structure, 5(3), 345-358.
[4] Google-diff-match-patch, [Online]. Available: http://code.google.com/p/google-diff-match-patch/, Accessed on 20 Jan. 2012.

Continued on Page 37

Article

M Jayalakshmi
Formerly of Vikram Sarabhai Space Centre, Dept of Space, Govt of India

Emails and Web Pages in Local Languages

Emails, text chats, and instant messages become more personal, and at times more impressive, if they are received in the most familiar local language. The same is true of online news and local language web pages. Those who are less literate in English than in their local language feel more comfortable with emails and web pages in the local language script than with the corresponding English versions. Here, local language is used in the Indian context only. Let us look into some of the specific language tools and the languages they support.

To read or write text in a local language script, the required fonts must be present on your computer (PC). Windows, Macintosh, and Linux operating systems can use true-type fonts, which are available via downloadable installers. Installation needs to be done only once. Some web browsers also have to be set to the utf-8 encoding format. Nothing further is required for reading.

In order to create, edit, and upload (send) texts in local languages, some language converters have to be installed or available on your PC. A number of language support tools, offline and online, free as well as non-free, are available on the net. This article addresses some of the basic tools that need to be set up on your PC for this purpose. There are keyboard maps and virtual keyboards, supported by office software packages and supplied with the OS, for typing directly into the editor to create or update documents in any language. But this will be a cumbersome process unless one is conversant with typing and editing in that particular language. Moreover, the fonts generated by this process may not be web fonts, and hence readability will be lost. To overcome this, further software conversions and processing may be required to make them web loadable. There are some simpler shortcuts that avoid these processes by sticking to typing on the familiar English keyboard itself.

A number of online and offline transliteration (language conversion according to sound) tools are available free on the net in the form of html web pages with multiple text-boxes, like window panes. One can type English alphanumeric characters (lower and upper cases in combination) according to the sound of the local language character to be produced. This is called phoneme transliteration.

In the left window (the English language editing window) you can type and edit the characters according to the target language phonetics (character sound), and in the right or bottom pane the vernacular characters will be generated simultaneously. For example, the typed text (in the left column) will be rendered as follows:

sa ri ga ma pa dha ni sa - स रि ग म प ध नि स    Devanagari
sa ri ga ma pa Dha ni sa - स रि ग म प ध नि स    Hindi
sa ri ga ma pa Dha ni sa - സ രി ഗ മ പ ധ നി സ    Malayalam
sa ri ga ma pa dha ni sa - ஸ ரி க ம ப த னி ஸ    Tamil
sa ri ga ma pa Dha ni sa - స రి గ మ ప ధ ని స    Telugu
sa ri ga ma pa Dha ni sa - ಸ ರಿ ಗ ಮ ಪ ಧ ನಿ ಸ    Kannada
sa ri ga ma pa Dha ni sa - স রি গ ম প ধ নি স    Bangla
sa ri ga ma pa Dha ni sa - ସ ରି ଗ ମ ପ ଧ ନି ସ    Oriya
sa ri ga ma pa Dha ni sa - ਸ ਰਿ ਗ ਮ ਪ ਧ ਨਿ ਸ    Punjabi

After you complete the partial or full editing of the English phonetics corresponding to a local language text, the local language characters will appear in the text-box (right window pane) in a Unicode font. You can copy the local language text thus generated and paste it into the new mail editing area of the email client, the html editing area of the web inbox, the message-box of a chat line, or the web page editing window. Now you are ready to upload and dispatch the vernacular script. This is the basic principle used for local language web page creation too.

To generate the vowel accents of local language sounds or the compound characters of a particular language alphabet, a sequence of English characters may have to be typed at times. The guidelines for this will generally be available on the transliteration tool's web page itself. But not all tools support all languages. Indian languages generally have a maximum of 15 vowel sounds and 36 consonants. There are compound letters formed by combinations of consonants and vowels. Most of these patterns are handled by these tools. Still, a few will be left out and have to be addressed separately.

In languages like Malayalam, where certain words end in half sounds (“Chillaksharam”), these can also be entered through such alphanumeric character sequences or from virtual keyboards of Unicode characters installed on your system. The Department of Information Technology, Government of India has accepted Unicode encoding for fonts as the Indian standard in this regard.

Set Up Your System for Local Language Use
If you are using the Linux operating system, the installation procedure is as follows:
1. Download the font file from the site
2. Run the command: tar -xvzf Hindi.tar.gz
3. This will create the directory "Hindi"
4. Go into the directory "Hindi"
5. Run the file FontInstaller.sh, give the command: ./FontInstaller.sh
6. Now restart your X server

The font is now installed on your machine. In Fedora, you can also create a new directory, say “myfonts”, in /usr/share/fonts/ and copy the required font into “myfonts”.

Windows 2000, Windows XP, and Windows Vista have inbuilt support for Unicode encoding at the operating system level, but the feature needs to be enabled.

Windows VISTA
• Go to the Control Panel and then click on Regional and Language Options. Choose the Country - India.
• Click on the Keyboards and Languages tab and choose the Hindi keyboard.
• EN will appear in the system tray. Left click on the EN or press the ALT+SHIFT keys and choose the language to type.

With the enabling of Unicode in your system, the INSCRIPT keyboard driver and Unicode supported Mangal and Unicode MS fonts will be installed in the system.

To download the other keyboard drivers, such as Typewriter/Remington, Phonetic/Roman, Platform-free and browser-free Open type fonts, fonts converter, keyboard tutor for learning the INSCRIPT Typing, Hindi version of Indian Open Office, and other software free of cost visit the site www.ildc.in
• Choose the language (Hindi)
• Click on the 'Download' for the required software and driver
• A zip file will get downloaded
• After unzipping the file, run the .exe of that software

Option - 3 Open-type fonts
Option - 4 Keyboard Drivers
Option - 5 Fonts Converter

Unicode can be enabled in the Windows 2000 and later version Operating Systems as under. You should first install Windows Files for display of Indic languages.

Enable Indic for Windows XP & above
1. Go to Start -> Control Panel -> Regional & Language Options -> Languages Tab -> (Tick the Install files for complex scripts...) and click OK.
2. Click OK (Figure Below).
3. You will require the Windows XP CD to enable Indic.

Again go to Control Panel >> Regional and Language Option >> click on the Language Tab, click on Details, and click on Add to select the language of your choice. From the System tray click on EN and select the language for typing.

Enable Indic for Windows 2000
1. Go to Start -> Settings -> Control Panel -> Regional Options -> Languages -> Indic (tick the Indic) and click OK.
2. Click OK (Figure Below).
3. You will require the Windows 2000 CD to enable Indic.

Again go to Regional Options and click on Input Locales. Add those languages in which you want to type. From the System tray click on EN and select the language for typing.

Unicode Fonts
Unicode is a map, a chart of all of the characters, letters, symbols, punctuation marks etc. necessary for writing all of the world's languages.

Graphemes are the basic building blocks of a written script. Grapheme is a synonym for a character. In English, there is one-to-one correspondence between a character and its glyphs (ornamental marks). Glyphs in a font should comprise a unified design entity. Font represents the graphical form of a script. Fonts are therefore formed with a collection of graphemes and glyphs.

Phonemes are the basic building blocks of phonetics of a language. Graphemes form an abstract conceptual layer in between physically conceivable glyphs and phonemes. Unicode Consortium is standardizing the character sets of the world languages. Character sets of 30+ languages are currently standardized under Unicode.

A font spanning many Unicode ranges can be helpful in several practical applications. For instance, it can provide some scripts and characters that are hard to find, ease installation of base support for many languages, facilitate documents mixing symbols and language scripts, and improve appearance of web pages with mixed symbols and scripts.

Those who use Windows OS (only NT), 2000, and XP can take advantage of Unicode. In these operating systems, it is possible to read, type, print etc. using Unicode mappings, provided of course that you have the appropriate font and keyboard drivers. With the other Windows (95, 98, me), typing in Unicode is not really possible. Unicode also works on recent Mac operating systems.

Virtual Keyboards & Character Maps
The combinations of consonants and vowels that render the different phonetics may be produced by successive hits of key strokes as given below (Fig. 1). This can be done faster with transliteration packages, generally available as html forms as given in the subsequent figures (Fig. 3).

Fig. 1: Key strokes for rendering phonetics (+ implies successive hits)
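The principle behind such transliteration packages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Roman spelling scheme and the very small letter tables below are hypothetical, chosen to cover just "namaskaara" and "sa ri ga ma pa dha ni sa", and are not the scheme of any particular tool named in this article. The sketch shows the three cases a Devanagari converter has to handle: a consonant followed by "a" (the inherent vowel, nothing is added), a consonant followed by another vowel (a vowel sign is added), and a consonant followed by another consonant (a virama joins them into a compound letter).

```python
# Hypothetical, deliberately tiny Roman-to-Devanagari tables (assumptions for this sketch).
VIRAMA = "\u094d"  # sign that suppresses the inherent "a" of a consonant
CONSONANTS = {
    "k": "\u0915", "g": "\u0917", "dh": "\u0927", "n": "\u0928",
    "p": "\u092a", "m": "\u092e", "r": "\u0930", "s": "\u0938",
}
# Each vowel maps to (independent letter, dependent vowel sign).
VOWELS = {
    "a": ("\u0905", ""),         # inherent vowel: no sign after a consonant
    "aa": ("\u0906", "\u093e"),
    "i": ("\u0907", "\u093f"),
}


def tokenize(text):
    """Split Roman text into known units, preferring two-letter units ("aa", "dh")."""
    tokens, i = [], 0
    while i < len(text):
        for size in (2, 1):
            chunk = text[i:i + size]
            if chunk in CONSONANTS or chunk in VOWELS:
                tokens.append(chunk)
                i += len(chunk)
                break
        else:
            raise ValueError(f"no mapping for {text[i]!r} in this sketch")
    return tokens


def transliterate(word):
    """Convert one Roman word to Devanagari using the tables above."""
    tokens = tokenize(word)
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in CONSONANTS:
            out.append(CONSONANTS[tok])
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt in VOWELS:
                out.append(VOWELS[nxt][1])  # vowel sign (empty for "a")
                i += 2
            else:
                out.append(VIRAMA)          # consonant cluster or final half sound
                i += 1
        else:
            out.append(VOWELS[tok][0])      # vowel with no preceding consonant
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("namaskaara"))
    print(" ".join(transliterate(w) for w in "sa ri ga ma pa dha ni sa".split()))
```

A real converter needs complete tables for each language and extra rules for the patterns that the article notes are left out by most tools.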

Fig. 2: A typical keyboard character map for Devanagari font

Language Converters
There are a number of free language converters available in Windows and LINUX. The following list refers to a few of them.

Offline Converters
1. Indian language converter (ILC) - Bengali, Hindi, Kannada, Malayalam, Oriya, Punjabi, Sanskrit, Telugu, and Tamil.
2. Scripto0.2.0 - Gujarathi, Gurumukhi, Hindi, Malayalam
3. Keraleeyam, Varamozhi, mozhi, Madhuri - Malayalam
4. Baraha - Kannada, Hindi, Marathi, Sanskrit, Tamil, Telugu, Malayalam, Gujarati, Gurumukhi, Bengali, Assamese, Manipuri, and Oriya languages.
5. Hindi Editor For The Unicode™ Standard - Hindi

Online Converters
Google mails have a built-in transliteration facility. The language of choice may be selected from a list box in the html text creation of mails.
• Aksharamala
• Bangla Unicode Converter
• Devanagari Editor etc.

Some of the other online URLs are
• http://www.translatorindia.com
• http://www.tamilcube.com
• http://unicode.org/resources/online-tools.html

Transliteration
A typical transliteration software package ILC downloaded from the Internet will look like the following:

Fig. 5

Fig. 3: A typical Hindi transliteration page (panel title "Transliterate to Hindi"; the box "Type your text here" holds the input "namaskaara" and the box "See your results here" shows the Hindi result)

Fig. 4

Conversion Guidelines

Conclusion
Setting up your PC for local language reading and writing, installation of fonts, and language converters is a one-time activity. These installations are to be done only once for any typical local language. The rest of the work of reading, typing, editing, and uploading scripts is as easy as for any English language text.

Some of the Unicode fonts for Indian languages are:
1. Windows: Arial Unicode MS, Akshar Unicode, ALPHABETUM Unicode, Aparajita, JanaHindi, JanaMarathi, JanaSanskrit, Kalimati, Kanjirowa, Kokila, Sans, Mangal, Raghindi, Roman Unicode, Sanskrit 2003, Santipur OT, Saraswati5, shiDeva, SHREE-DV0726-OT, SiddhiUni, Sun-ExtA, Thyaka Rabison, TITUS Cyberbit Basic, Uttara, Chrysanthi Unicode, CN-Arial, Ekushey Azad, Ekushey Durga, Ekushey Puja, Ekushey Punarbhaba, Ekushey Saraswatii, Ekushey Sharifa, Ekushey Sumit, Free, Likhan, Mitra Mono, Mukti, Mukti Narrow, Raga, Roman Unicode, Rupali, Saraswati5, SolaimanLipi, Sun-ExtA, UniBangla, Vrinda, aakar, Chrysanthi Unicode, CN - Arial, Code2000, padma, Rekha etc.
2. Macintosh OS 9: Devanagari MT, Devanagari MTS
3. Linux: GNU FreeFont, Devanagari, Lohit Malayalam, Latha, Valluvar etc.

More specifically, these are the fonts for typical Indian language scripts:
1. Hindi - Akshar, Cdac - GIST Surekh, Gargi (Gargi.ttf), JanaHindi (RKJanaHindi.TTF), JanaMarathi (RVJanaMarathi.TTF), Mangal (mangal.ttf), Raghindi (raghu.ttf), Sanskrit 2003 (Sanskrit2003.ttf), Shusha Fonts, Mangal font Mangal.ttf, Hindi for Devanagari, Arial Unicode
2. Malayalam - Kartika, Arial Unicode, GNU FreeFont, Lohit Malayalam, Meera, dyuthi, rachana, suruma, raghu, Anjali old lipi, ML-Nila
3. Tamil - Akshar Unicode (akshar.ttf), Arial Unicode MS (arialuni.ttf), JanaTamil (RRJanaTamil.ttf), Latha (latha.ttf), ThendralUni (Thendral Uni.ttf), TheneeUni (TheneeUni.ttf), VaigaiUni (VaigaiUni.ttf)
4. Telugu - Akshar Unicode (akshar.ttf), Code2000 (code2000.ttf), Gautami (gautami.ttf), Pothana2000 (Pothana2000.ttf), Vemana2000 (Vemana.ttf)
5. Kannada - Akshar Unicode (Akshar.ttf), Arial Unicode MS (arialuni.ttf), JanaKannada (ROJanaKannada.TTF from JanaKannada.zip), Kedage, Mallige (Malige-n.TTF), RaghuKannada (RORaghuKannada_ship.ttf), Saraswati5 (SaraswatiNormal.ttf and SaraswatiBold.ttf), Tunga (Tunga.ttf)
6. Bengali - Arial Unicode MS

All true-type Unicode fonts are portable in LINUX system.

Bibliography
[1] Baraha - Free Indian Language Software - Typing Software, http://www.baraha.com
[2] Indian language transliteration | Indian language unicode, http://vikku.info/indian-language-unicode-converter
[3] The Indian Language Converter, http://www.yash.info/indianLanguageConverter (code available as ilc.zip; the code used is free for use).
[4] GNU FreeFont: Why Unicode fonts? http://www.gnu.org/software/freefont/articles

About the Author
M.M Jayalakshmi is a retired scientist/engineer from the Vikram Sarabhai Space Centre, Dept of Space, Govt of India. She was the Webmaster of the VSSC intranet and Head of its Enterprise Software Section, Computer Division. Her expertise is in the areas of 1. Computational Numerical Software in Avionics sub-Systems, 2. Microprocessor based On-board computers & Telemetry Systems, 3. Quality Assessment of Launch Vehicle Mission Software, and development of applications software for the VSSC intranet. She can be contacted at [email protected].

Article

A Speech-to-Text System

Nishant Allawadi* and Parteek Kumar**
* Masters Student, Thapar University, Patiala
** Assistant Professor, CSED, Thapar University, Patiala

Abstract: Speech-to-Text (STT) can be described as a system which converts speech into text. This paper discusses the applications of STT systems in health care instruments, banking devices, aircraft devices, robotics etc. It discusses existing systems such as the SOPC based Speech-to-Text architecture, the architecture for a Hindi Speech Recognition System using HTK, and Phonetic Speech Analysis for Speech to Text Conversion. It presents the architecture of the Speech-to-Text system and provides a tutorial to implement an STT system. In this, it describes four phases of development of the STT system, namely, data preparation, monophone HMM creation, tied-state triphone HMM creation and execution with Julius. The first phase is used for processing of raw data for further use. The second phase is used for the training of the system using monophones. The third phase is used for the training of the system using triphones. The final phase explains the execution of the system. The paper also highlights the futuristic applications of the Speech-to-Text system.

Keywords: HMM, monophones, dictionary and triphones.

Introduction
A Speech-to-Text (STT) system is a system for conversion of speech into text. It takes speech as input and divides it into small segments. These small segments are sounds, known as monophones. It extracts the feature vectors of the monophones and matches them with stored feature vectors[1]. A Hidden Markov Model (HMM) is used to find the most probable result, and the system gives out the text for the input speech. The system is developed by re-estimating the feature vectors at each step of training using HMM Tool Kit (HTK) commands. The HMM is a result of the attempt to model speech generation statistically. It is the most successful and commonly used speech model in speech recognition[2].

This paper is divided into six sections. The second section discusses the applications. The third section highlights the existing STT systems. The architecture of the STT system is described in the fourth section. The fifth section describes the implementation of the STT system. The conclusion is given in the sixth section.

Applications of the Speech-to-Text System
STT systems are applicable in hospitals for Health Care Instruments[6]. In banking, STT is implemented in input devices where credit card numbers are given as speech input. It is widely used in aircraft systems, where pilots give audio commands to manage operations in the flight. Mobile phones use STT in many applications, such as writing text messages by speech input, e-mail documentation, mobile games commands, music player song selection etc. STT systems are used in computers for writing text documents. They are also used for opening, closing and operating various applications in computers. Battle Management command centres require rapid access to and control of large, rapidly changing information databases. Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format. Human-machine interaction by voice has the potential to be very useful in these environments. Robotics is a new emerging field where inputs are given in speech format to robots. The robot processes the speech input command and performs actions accordingly[3].

Existing Speech-to-Text Systems
A number of systems have been proposed world-wide for Speech-to-Text conversion.

A System-on-Programmable-Chip (SOPC) based Speech-to-Text architecture has been proposed by Murugan and Balaji. This speech-to-text system uses isolated word recognition with a vocabulary of ten words (digits 0 to 9) and statistical modeling (HMM) for machine speech recognition. They used the Matlab tool for recording speech in this process. The training steps have been performed using PC-based C programs. The resulting HMM models are loaded onto a Field-programmable gate array (FPGA) for the recognition phase. The uttered word is recognized based on maximum likelihood estimation.

An architecture for a Hindi Speech Recognition System using HTK has been proposed by Kumar and Aggarwal[7]. The proposed system was built as a speech recognition system for the Hindi language. The Hidden Markov Model Toolkit (HTK) has been used to develop the system. The proposed architecture has four phases, namely, preprocessing, feature extraction, model generation and pattern classification. The system recognizes isolated words using an acoustic word model. The system was trained for 30 Hindi words. Training data was collected from eight speakers. The developers reported an accuracy of 94.63%.

Phonetic Speech Analysis for Speech to Text Conversion has been given by Bapat and Nagalkar[4]. Their work aimed at generating phonetic codes of the uttered speech in a training-less, human independent manner. The proposed system has four phases, namely, end point detection, segmenting speech into phonemes, phoneme class identification and phoneme variant identification in the class identified. The proposed system uses differentiation, zero-crossing calculation and FFT operations.

Fig. 1: Speech-to-Text Conversion Architecture (Data Preparation, Monophones HMM Creation, Tied-State Triphones HMM Creation and Execution with Julius; inputs are the wav files, prompts, .grammar file, .voca file and vocabulary; intermediate files include MFC files, dict, words.mlf, phones.mlf, monophones, hmmdefs, macros, tiedlist, .dfa and .dict; the output is text)

Architecture of Speech-to-Text System
The conversion process of speech to text is divided into four phases, namely, Data preparation, Monophones HMM creation, Tied-state triphones HMM creation and Execution with Julius interface as given in Fig. 1. The description of each of these phases is given in subsequent sections.

Implementation of Speech-to-Text System
The four phases shown in Fig. 1 are implemented as described in the following subsections[5].

Data Preparation
This phase is used to prepare the data for processing in subsequent phases. It requires the grammar file, speech files, vocabulary file and training text file as raw input for processing. The processing of these files is explained below.

Grammar files
In this phase, the grammar of the language in the form of rules is provided in the .grammar file and the words are provided in the .voca file. The .grammar file is used to define the recognition rules. The .voca file is used to define the actual words in each word category and their pronunciation information. The description of the .grammar file is given in (1).

    S : NS_B SENT NS_E
    SENT: CALL                …(1)

The description of the .voca file is given in (2).

    % NS_B
    sil
    % NS_E
    sil
    % CALL
    ADVICE ae d v ay s
    BOY b oy                  …(2)

As given in (1), S refers to the start symbol of the input, while NS_B indicates the beginning silence and NS_E indicates the ending silence, both represented by the sil monophone. The data to be recognized is given by the keyword SENT, which refers to CALL as given in (1). The details of CALL are provided in the .voca file as given in (2). CALL provides the recognition of words with their monophone combinations. For example, ADVICE has the monophone combination "sil ae d v ay s sil". The .grammar file and .voca file are compiled to generate a dictionary file and a finite automata file, namely the .dict and .dfa files, respectively. These files are required at the time of execution of the system as shown in Fig. 2.

Fig. 2: Process of creation of .dfa file and .dict file (the .grammar and .voca files are compiled into the .dfa and .dict files)

Training text file
The training text file is named prompts. It contains a list of words that are to be recorded and the names of the corresponding audio files in which they are to be stored. The description of this file is given in (3).

    */sample1 ADVICE BOY CHARLIE DOOR KICK MAID NURSE ONCE RULE TARGET
    */sample2 TARGET RULE ONCE NURSE MAID KICK DOOR CHARLIE BOY ADVICE
    */sample3 ADVICE ADVICE BOY BOY CHARLIE CHARLIE DOOR DOOR KICK KICK    …(3)

Speech files
Speech files are stored in .wav format. These files are recorded by a recording tool like audacity. The training text, written in the prompts file, is recorded and saved in these files.

Vocabulary file
This file contains a sorted collection of commonly used words of a language along with their combination of monophones. This file is used as a reference to create a dictionary for the training words. A snapshot of this file is given in (4).

    ABACK [ABACK] ax b ae k
    ABACUS [ABACUS] ae b ax k ax s
    ABALON [ABALON] ae b ax l aa n    …(4)

Creation of phones.mlf and dictionary file
In the data preparation phase, the wordlist and words.mlf files are created from the prompts file. The wordlist file contains all the unique words of the prompts file. The words.mlf file contains the same text as the prompts file with each word on a new line. The wordlist file is used, with the help of the vocabulary file, to create the monophones0 and dictionary files. The dictionary file contains all the training words with their corresponding monophone combinations. The monophones0 file contains the list of all unique monophones without sp, the short-pause monophone; a monophones1 file that also includes sp is generated in this process. The dictionary and words.mlf files generate the phones0.mlf file as given in Fig. 3.

Fig. 3: Master Label File Creation (prompts is processed by Word List Creation and Master Label File Creation into wordlist and words.mlf; wordlist and vocabulary are processed by Dictionary Creation into dictionary and monophones; dictionary and words.mlf are processed by Phoneme Master Label File Creation into phones.mlf)

Creation of MFC files
The .mfc files are created from the .wav files by using the HCopy command of HTK with the help of a configuration file, config[8]. These .mfc files contain the feature vectors for the .wav files and are used in subsequent phases for training.
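The file relationships of the data preparation phase can be sketched in Python. HTK-based workflows normally perform these steps with HTK tools and helper scripts, so the code below is only an illustration of what the files contain, not the authors' tooling. All function names are hypothetical; the prompts and vocabulary formats follow (3) and (4), and the master label file layout (a "#!MLF!#" header, a quoted label-file pattern per utterance, one word per line, and a closing "." line) is the standard HTK MLF layout.

```python
def parse_prompts(lines):
    """Map each utterance name (e.g. '*/sample1') to its list of words."""
    prompts = {}
    for line in lines:
        parts = line.split()
        if parts:
            prompts[parts[0]] = parts[1:]
    return prompts


def make_wordlist(prompts):
    """Sorted list of the unique words that occur in the prompts."""
    return sorted({word for words in prompts.values() for word in words})


def make_words_mlf(prompts):
    """Word-level master label file: one word per line for every utterance."""
    out = ["#!MLF!#"]
    for utterance, words in prompts.items():
        out.append(f'"{utterance}.lab"')
        out.extend(words)
        out.append(".")
    return "\n".join(out) + "\n"


def parse_vocabulary(lines):
    """Read lines such as 'ABACK [ABACK] ax b ae k' into {word: [monophones]}."""
    vocabulary = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            vocabulary[parts[0]] = parts[2:]
    return vocabulary


def make_dictionary(wordlist, vocabulary):
    """Pronunciations for the training words plus the list of unique monophones."""
    missing = [word for word in wordlist if word not in vocabulary]
    if missing:
        raise KeyError(f"words missing from the vocabulary file: {missing}")
    dictionary = {word: vocabulary[word] for word in wordlist}
    monophones = sorted({p for phones in dictionary.values() for p in phones})
    return dictionary, monophones


if __name__ == "__main__":
    prompts = parse_prompts(["*/sample1 ADVICE BOY", "*/sample2 BOY ADVICE"])
    vocabulary = parse_vocabulary(["ADVICE [ADVICE] ae d v ay s", "BOY [BOY] b oy"])
    wordlist = make_wordlist(prompts)
    dictionary, monophones = make_dictionary(wordlist, vocabulary)
    print(wordlist)
    print(make_words_mlf(prompts), end="")
    print(dictionary)
    print(monophones)
```

The monophone list produced here corresponds to monophones0; the silence and short-pause entries are handled separately in the training phases.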

Monophone HMM Creation
This phase is used to create a well-trained set of single-gaussian monophone HMMs. This phase requires a prototype for the HMM, .mfc files, the configuration file, the monophone files and the phones.mlf file for creating the HMM. Each HMM file follows the prototype given in the proto file. There are a number of monophones in the HMM file. Generally, each monophone has five states. Here, state 1 and state 5 are the opening and closing states, while states 2, 3 and 4 hold the values of the means and variances for the corresponding monophone. This phase is further divided into three sub-phases, namely, creating flat start monophones and re-estimation, fixing the silence models and realigning the training data as given in Fig. 4.

Fig. 4: Monophone Creation and Training (Creating Flat Start Monophones and Re-estimating, Fixing Silence Models, Realigning Training Data; the files involved are config, proto, monophones0, monophones1, mfc files, phones0.mlf, phones1.mlf, aligned.mlf, hmmdefs and macros)

Creating Flat Start Monophones and Re-estimation
In this sub-phase, the HMM file is created manually by using default global values of means and variances. These default values are calculated by the HCompV command of HTK with the help of the .mfc files and the config file[8]. These values are re-estimated three times using the HERest command with the help of the previously generated HMM file, .mfc files, config file, phones.mlf file and monophones0 file[8].

Fixing the Silence Models
This sub-phase is used to make the model more robust by absorbing various impulsive noises in the training data. This is done by including the short pause monophone in the HMM file and linking it with the sil monophone. In order to do this, a temporary copy of the sil model is created and saved with the name sp. It has 5 states, where state 1 and state 5 are the opening and closing states. State 2 and state 4 are removed from the sp model and only state 3 is kept. The HHEd command is used to tie the sp model with the central state of the sil model with the help of the monophones1 file. The script file for this operation is given in (5).

    AT 2 4 0.2 {sil.transP}
    AT 4 2 0.2 {sil.transP}
    AT 1 3 0.3 {sp.transP}
    TI silst {sil.state[3],sp.state[2]}    …(5)

In this manner, short pauses between spoken words are treated as silence. In order to retrain the system, the HERest command is used two times with the help of the previously generated HMM file, .mfc files, config file, phones.mlf file and monophones1 file[8].

Realigning the Training Data
In case of multiple pronunciations of a word in the dictionary, this sub-phase selects the best possible pronunciation. In order to do this, the HVite command is used with the words.mlf file, monophones1 file, dict file, config file and previously generated HMM file, and the result is saved in a new transcript file, i.e., aligned.mlf. In order to retrain the system, the HERest command is used two times with the newly created aligned.mlf and monophones1[8].

Tied-State Triphones HMM Creation
This phase has two important sub-phases, triphones creation and training, and tied-state triphones creation and training, as shown in Fig. 5.

Fig. 5: Triphones Creation and Training (Creating Triphones from Monophones and Training, Creating Tied-State Triphones and Training; the files involved are config, aligned.mlf, mfc files, wintri.mlf, triphones, stats, tiedlist, hmmdefs and macros)

Triphones Creation and Training
In this sub-phase, triphones are created. A triphone is a combination of three monophones. This greatly improves the recognition accuracy, because now the system looks to match a specific sequence of three sounds together rather than only one sound. For example, ADVICE has triphones as given in (6).

    sil
    ae+d
    ae-d+v
    d-v+ay
    v-ay+s
    ay-s                      …(6)

The HLEd command is used to create triphones as given in (6). It requires two files, aligned.mlf and a script, as shown in (7).

    WB sp
    WB sil
    TC                        …(7)

As the system has been updated by including the triphones file, the HERest command is used two times to train the system with triphones.
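The word-internal triphone labels in (6) can be reproduced with a short Python function. This is a sketch of the naming convention only (left-centre+right, with sil and sp treated as word boundaries that provide no context), not a replacement for the HLEd command; the function name is hypothetical. Unlike the listing in (6), the sketch also keeps the trailing sil label of the monophone sequence "sil ae d v ay s sil".

```python
WORD_BOUNDARIES = {"sil", "sp"}  # labels that are kept as-is and give no context


def to_triphones(monophones):
    """Convert a monophone sequence to word-internal triphone labels."""
    labels = []
    for i, phone in enumerate(monophones):
        if phone in WORD_BOUNDARIES:
            labels.append(phone)
            continue
        left = monophones[i - 1] if i > 0 else None
        right = monophones[i + 1] if i + 1 < len(monophones) else None
        label = phone
        if left is not None and left not in WORD_BOUNDARIES:
            label = f"{left}-{label}"    # left context, e.g. ae-d
        if right is not None and right not in WORD_BOUNDARIES:
            label = f"{label}+{right}"   # right context, e.g. ae-d+v
        labels.append(label)
    return labels


if __name__ == "__main__":
    print(to_triphones("sil ae d v ay s sil".split()))
```

Because a triphone is defined by its neighbours, the number of distinct models grows quickly with the vocabulary, which is why the next sub-phase ties states so that rare triphones can share training data.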

Tied-State Triphones Creation and Training
In this sub-phase, different triphone states are tied together in order to share the data and to make the system more robust. In order to tie states, the HHEd command is used with the previously generated HMM file and the triphones file. This command creates the tiedlist file that is used in further training of the system. Since the system has been updated with the new tiedlist file, the HERest command is used to retrain the system two times with the newly created tiedlist file.

Execution with Julius Interface
Julius is an interface used to execute the STT system. Julius requires four files: the .dfa file, the .dict file, the previously generated HMM file and the tiedlist file. The first two files, the .dfa file and the .dict file, have already been created in phase 1, and the HMM file and tiedlist file have been created in phase 3. In order to execute the system, these files are passed as parameters in its configuration file, i.e., julian.conf, as given in (8).

    -hlist tiedlist
    -h hmm15/hmmdefs
    -dfa sample.dfa
    -v sample.dict
    -smpFreq 48000            …(8)

The julian command is used to execute the system. It requires julian.conf and mic as parameters. After execution of this command the system prompts the user to speak the sentence as given in Fig. 6. Now the speaker can speak the input sentence and the system will give its corresponding text.

Fig. 6: System Execution

Conclusion
A Speech-to-Text system for a small vocabulary can be developed by using HTK commands. As discussed in the architecture, there are four phases in the development of the STT system. The STT system discussed above is speaker dependent. To make this system speaker independent, an adaptation technique is required.

An STT system for other languages can also be developed by using the monophones of that language and training the system with training text of that language.

References
[1] A. Kemble Kimberlee, "An Introduction to Speech Recognition (Unpublished work style)," unpublished.
[2] Aymen M., Abdelaziz A., Halim S., Maaref H., "Hidden Markov Models for automatic speech recognition", in International Conference on Communications, Computing and Control Applications (CCCA), Hammamet, Tunisia, 2011, pp. 1-6.
[3] Balaganesh M., Logashanmugam E., Aadhitya C.S., Manikandan R., in International Conference on Emerging Trends in Robotics and Communication Technologies (INTERACT), Chennai, India, 2010, pp. 12-15.
[4] Bapat Abhijit V., Nagalkar Lalit K., "Phonetic Speech Analysis for Speech to Text Conversion", in IEEE Region 10 Colloquium and the Third International Conference on Industrial and Information Systems, Kharagpur, India, 2008, pp. 1-4.
[5] "Create Speaker Dependent Acoustic Model Using Your Voice", http://www.voxforge.org/home/dev/acousticmodels/windows/create.
[6] Grasso Michael A., "The Long-Term Adoption of Speech Recognition in Medical Applications", in 16th IEEE Symposium Computer-Based Medical Systems, New York, NY, USA, 2003, pp. 257-262.
[7] Kumar Kuldeep and Aggarwal R.K., "Hindi Speech Recognition System using HTK", J. of International Journal of Computing and Business Research, vol. 2, pp. 3-7, 2011.
[8] Steve Young, Gunnar Evermann, Mark Gales, Thomas Hain, Dan Kershaw, Xunying (Andrew) Liu, Gareth Moore, Julian Odell, Dave Ollason, Dan Povey, Valtcho Valtchev and Phil Woodland, The HTK Book, Cambridge University Engineering Department, 2009.

About the Authors
Nishant Allawadi is pursuing a Master of Engineering in Computer Science at Thapar University, Patiala. He received his Bachelor of Technology degree from Guru Jambheshwar University of Science and Technology, Hisar (Haryana) in the year 2010. He is doing his ME thesis in the field of Natural Language Processing.

Parteek Kumar is Assistant Professor in the Department of Computer Science and Engineering at Thapar University, Patiala. He has more than thirteen years of academic experience. He earned his B.Tech degree from SLIET and his MS from BITS Pilani. He is pursuing his Ph.D in the area of Natural Language Processing at Thapar University. He has published more than 50 research papers and articles in Journals, Conferences and Magazines of repute. He has undergone various faculty development programmes from industries like Sun Microsystems, TCS and Infosys. He has co-authored six books including Simplified Approach to DBMS. He is acting as Co-PI for the research Project on Development of Indradhanush: An Integrated WordNet for Bengali, Gujarati, Kashmiri, Konkani, Oriya, Punjabi and Urdu, sponsored by the Department of Information Technology, Ministry of Communication and Information Technology, Govt. of India.

Article

Opinion Mining and Sentiment Analysis

Jaganadh G
Consultant in Text Analytics and Free and Open Source Software

Introduction
It is human to have an opinion on whatever one experiences in life. Opinion is expressed with the help of language, either written or spoken. Human beings have mined opinion in a natural way ever since they started living as social beings. All their adventures, new procurements etc. were subject to opinion mining. Before wearing new apparel for a public function, buying some appliances or watching a movie, people solicited opinions from friends, family and others. They mined the entire opinion collection with the world's most complex opinion mining system, the "human brain". When human beings entered a consumer oriented world, corporate and non-corporate establishments started producing, selling and advertising their products/services. Corporate establishments used media to advertise their service/product, which eventually led to word-of-mouth advertisement and sales opportunities. Non-corporate establishments depended almost entirely on word-of-mouth publicity. In both scenarios people bought and experienced the product, and then expressed their opinions on it. These opinions were key factors in determining the market of services and products. Corporate establishments were keen to understand customer opinions and to derive Business Intelligence from them. So they conducted surveys to understand customer needs, satisfaction and dissatisfaction. Consolidated reports on such surveys helped them to improve the product and the marketing strategy, and even to withdraw a product from the market to avoid loss and manage reputation. This can be called a second generation of Opinion Mining. The advent of Web 2.0 based technologies and tools opened a vast window to express and share opinions. Thus the opinions reached a wide audience across the globe. Also, the opinions expressed through web platforms such as social media (Twitter, Facebook etc.) created the opportunity for real-time sharing of opinion, which leads to real-time market ups and downs for corporate and similar entities. Deriving Business Intelligence from the heavy flow of consumer opinion in real-time gave birth to a new field of study in Natural Language Processing and Computational Linguistics. This field is called "Opinion Mining". Sentiment Analysis, Sentiment Mining, Opinion Mining, Review Mining, Opinion Detection, Sentiment Detection, Subjectivity Detection, Polarity Classification, Semantic Orientation, Appraisal Extraction etc. refer to the same state of the art. The current article aims to give a brief introduction to Opinion Mining, its technical aspects and its business applications in the real world.

Opinion
To get a deeper insight into the art, let us see the definition, structure and social role of opinion.

We are surrounded by opinions more than facts in our life. The Oxford Dictionary defines opinion as (a) 'a view or judgment formed about something, not necessarily based on fact or knowledge' (b) 'a statement of advice by an expert on a professional matter'. Opinion more or less results from the state of mind when we experience something in our day to day life. Based on his or her socio-cultural standard, a person expresses the sentiment/opinion with the help of linguistic units appropriate to the mental state, experience and situation. This expression may be an appraisal or a negative comment, up to the extreme of using sarcasm or un-parliamentary words. It is also quite natural that people compare things when expressing an opinion. There is another kind of opinion, which comes from experts or experienced people. In social life we call them trustworthy sources of opinion. They provide comparative and structured opinion on the topics on which we seek advice. Such people are there in the online community too. In terms of business and marketing strategy we can call them 'influence leaders' or 'influencers'. We can categorise such influencers as trustworthy and non-trustworthy influencers too, because some are biased people. Now we can observe a structure for the opinion: an opinion requires an object (a brand/product such as a mobile/movie etc.), an opinion holder who experiences the object and expresses the opinion, and the opinion.

Opinion Mining/Sentiment Analysis
Information contained in any text document can be either subjective or objective or both. Subjective text mostly contains positive or negative opinions, while objective text contains facts. So the art of Opinion Mining and Sentiment Analysis tries to identify the subjectivity and objectivity of a text and further identifies the polarity of subjective text. The polarity of a text will be either positive or negative or a mix of both. The polarity of objective text is considered neutral. In short, Sentiment Analysis is the automated extraction of subjective content from digital text and the prediction of its polarity, such as positive or negative. It aims to explore the attitude of the person who created the text. It uses Natural Language Processing and Machine Learning principles to spot linguistic structures that determine polarity.

Detecting Sentiment from Text
We can perform three levels of sentiment analysis over a subjective text: document level sentiment analysis, sentence level sentiment analysis, and faceted or feature level sentiment analysis. Document level sentiment analysis aims to detect the sentiment of the whole document. It is quite obvious that there is little chance that a single document contains 100% positive or negative sentiment. But still the sentiment analysis predicts the predominant sentiment expressed in the document (for example, predicting the polarity of a full length review from http://www.rottentomatoes.com/). Sentence level sentiment prediction aims to identify the polarity of a given sentence in a text. Faceted sentiment analysis aims to predict the polarity of sentences or phrases which deal with attributes of the object under question (such as predicting the sentiment of features related to a mobile phone from a textual review).

There are different ways to identify and predict sentiment from text. They are lexicon based, Natural Language Processing based and Machine Learning based techniques. There is no harm in trying hybrid approaches to obtain the

results. In the lexicon-based approach, a pre-populated list of words with sentiment probabilities is used to spot key sentiment indicators. The approach is quite straightforward: read a text, consult the lexicon to find the probability values, sum the probabilities and pick the class with the highest total. Similarly, we can use a pre-populated list of positive and negative words to predict the sentiment. A combination of linguistic rules and Natural Language Processing techniques can also be used to spot opinion indicators and predict the sentiment. Generally, such rules find adjective-noun sequences and examine context rules to get the polarity. For example, in the sentence “Service of XYZ mobile phone is not good”, 'good' is a positive word but the presence of the negation 'not' reverses the polar nature of the word. In other words, a simple combination of a negative and a positive word creates a negative expression. Adjectives or adjective-noun sequences can be identified with POS tagging, chunking or parsing. Once the chunks are identified we can apply rules to identify the polarity. In the machine learning based approach, sample data is collected to train a selected algorithm. The collected data is manually classified by polarity value. The trained models are then used along with the algorithms to predict the sentiment. Since the article is very short, details of the process involved in each methodology are omitted deliberately. I hope I can cover them in a later note.

Challenges in Sentiment Analysis

Language is the most wonderful, dynamic and mysterious phenomenon in the universe. Language and its structure are the primary challenge in Sentiment Analysis, especially the language or “slanguage” used on social networks like Twitter and Facebook. As in society, there are false influencers or false opinion leaders who work for money. Identification of such false influencers and spam content is another major challenge in this area. There are other interesting challenges in sentiment analysis, such as identification of sarcasm and using deep semantic and pragmatic concepts to determine fine-grained emotion expressed in text. Even though industry has adopted it as a technology to earn from, there are still open-ended issues and challenges to be resolved.

Business Applications

Sentiment Analysis will be the most widely adopted art from Natural Language Processing in Business and Business Intelligence applications. The popularity of social networks and the high volume of user generated content, especially subjective content, caused the heavy demand to adopt sentiment analysis in business applications. Since it mainly deals with consumer-centric content, the art can be called “Marketing Research 3.0”. Sentiment Analysis helps corporates to get customer opinion in real time. This real-time information helps them to design new marketing strategies, improve product features and predict chances of product failure. It is not applied only in consumer-centric applications. It can be used in politics and diplomacy to get a clear picture of people's mentality about election campaigns, strategic policies and bills. Sentiment Analysis can even predict the effectiveness of “viral marketing” and the chances of ups and downs in stock prices.

There are a good number of commercial as well as free sentiment analysis services. Radiant6, Sysomos, Viralhealt, Lexalytics, AiAiO Labs, etc. are some of the top commercial players in the field. Some free tools, like twittersentiment.appspot.com, exist too.

I Would Like to Develop a Sentiment Analysis System !!

It is not rocket science. Even you can develop a sentiment analysis system. There are lots of Free and Open Source tools available for performing Natural Language Processing and Machine Learning tasks. A vast amount of consumer generated text data, prepared for the sentiment analysis task, is also available on the internet. Tools like GATE, NLTK, Apache Mahout, Weka, Rapidminer, KNIME, OpenNLP etc. can be used to develop your own sentiment analysis system.

References

[1] Bing Liu (2010). "Sentiment Analysis and Subjectivity". Handbook of Natural Language Processing, Second Edition (editors: N. Indurkhya and F. J. Damerau), 2010.
[2] Peter Turney (2002). "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews". Proceedings of the Association for Computational Linguistics (ACL). pp. 417–424.
[3] Bo Pang; Lillian Lee and Shivakumar Vaithyanathan (2002). "Thumbs up? Sentiment Classification using Machine Learning Techniques". Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 79–86.
[4] Michelle de Haaff (2010). Sentiment Analysis, Hard But Worth It!, CustomerThink, retrieved 2010-03-12.
[5] Lipika Dey, S K Mirajul Haque (2008). "Opinion Mining from Noisy Text Data". Proceedings of the second workshop on Analytics for noisy unstructured text data, pp. 83-90.
[6] Minqing Hu; Bing Liu (2004). "Mining and Summarizing Customer Reviews". Proceedings of KDD 2004.
[7] Pang, Bo; Lee, Lillian (2008). Opinion Mining and Sentiment Analysis. Now Publishers Inc.
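The lexicon-based approach described in this article (read a text, look each word up in a lexicon, sum the values and pick the highest class) can be sketched in a few lines of Python. This is only a minimal illustration: the tiny lexicon, its probability values, the negation list and the function name predict_sentiment are all made up for the example and do not come from any real sentiment resource or from the tools named above.

```python
# Hypothetical toy lexicon: word -> (P(positive), P(negative)).
# The values are invented for illustration only.
LEXICON = {
    "good": (0.9, 0.1),
    "great": (0.95, 0.05),
    "bad": (0.1, 0.9),
    "poor": (0.15, 0.85),
    "slow": (0.3, 0.7),
}
NEGATIONS = {"not", "no", "never"}


def predict_sentiment(text):
    """Sum per-class scores over lexicon words and return the larger class."""
    pos_total = neg_total = 0.0
    negate = False
    for token in text.lower().split():
        word = token.strip(".,!?\"'")
        if word in NEGATIONS:
            negate = True  # flip the polarity of the next lexicon word
            continue
        if word in LEXICON:
            pos, neg = LEXICON[word]
            if negate:
                pos, neg = neg, pos
            pos_total += pos
            neg_total += neg
            negate = False
    if pos_total == neg_total:
        return "neutral"
    return "positive" if pos_total > neg_total else "negative"


print(predict_sentiment("Service of XYZ mobile phone is not good"))
```

The negation handling here is the crudest possible rule (swap the scores of the next lexicon word); a real system would rely on POS tagging or chunking, as the article notes.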

About the Author

Jaganadh G is a Natural Language Processing and Machine Learning Developer and Researcher with experience in Sentiment Analysis, Information Extraction, Machine Translation, Spell checker Development, Automatic Speech Recognition (ASR), Text to Speech System (TTS), Internationalization of Domain Names (IDN), Localization, Perl and Python programming. Experienced in preparing software documentation according to ISO and IEEE standards. Well versed in GNU/Linux operating system. A smart Computational Linguist with abilities in developing algorithms for Machine Translation and related NLP field. (365Media Pvt. Ltd., Project Lead (NLP), Coimbatore, Tamilnadu, India, AU-KBC Research Centre, Chennai, C-DAC, C-DIT, Rashtriya Sanskrit Vidyapeeth, Thirupathi, Andhrapradesh)

Article

Randhir Kumar*, Dr. P K Choudhary**, and S M F Pasha***
* PhD Candidate of AISSR, at University of Amsterdam (The Netherlands)
** HOD, University Department of Sociology, Ranchi University, Ranchi
*** Assistant Manager, Computer Society of India

Telemedicine in the State of Maharashtra: A Case Study

Abstract: The Government of Maharashtra telemedicine project was operationalised in the year 2007 and since then it has taken a path to expand its outreach and number of beneficiaries. This instance provides an example of how modern ICT can be gainfully used for benefitting the masses, who till now were deprived of advanced medical care. The attempt of this case study is to document the path taken by the Health Ministry of Maharashtra in implementing telemedicine successfully.

Key Words: NRHM, EHR, Specialist End, Patient End, Teleradiology

Introduction

Telemedicine is an umbrella term which involves all medical activity having an element of distance (Wootton, 1998). Although telemedicine has been practiced for hundreds of years by means of letters (see Wootton, n.d), with the advancement of Information and Communication Technology there has been a manifold increase in using telemedicine as a tool for delivering medical treatment. Telemedicine not only includes real-time consultation between patient and expert, but also has the element of getting medical advice on prerecorded medical data, such as in the case of 'teleradiology' or 'telepathology' (see footnote 1). A more sophisticated model has been using it extensively for providing health care benefits to the unprivileged people. These interventions are usually taken up in the form of welfare projects involving substantial investment, coordination and planning.

The Government of Maharashtra launched its pilot project on Telemedicine in the year 2007, with one Specialist node at KEM Hospital, Parel, Mumbai and 5 sub-district hospitals. The prime target areas for this intervention were tribal areas such as those of Sindhudurg, Nandurbar, Beed and Satara. The second phase of expansion involved participation of 5 specialist nodes, 23 district hospitals and 4 sub-district hospitals.

The Maharashtra State Telemedicine project is a part of a larger initiative undertaken by the Government of India and the World Health Organisation. Under the banner of the National Rural Health Mission (NRHM), Telemedicine is one of the key initiatives to improve the health services for the rural people of India.

The General Framework of Telemedicine Project in Maharashtra

The overall network of Telemedicine in Maharashtra can be classified under two broad subheadings, viz.
1. Specialist End
2. Patient End

Specialist End: The Specialist end consists of five medical colleges. The medical colleges that have been developed as the specialist end are KEM Hospital Mumbai, B. J. Medical College Pune, GMC Aurangabad, GMC Nagpur and Sir J. J. Hospital Mumbai. Nanavati Hospital at Mumbai has been made an honorary specialist centre.

The J. J. Hospital at Mumbai has a dual role to play. It acts as the main server centre for coordinating between the specialist centers and patient centers. Additionally, it also provides consultation service for referred patients through teleconference.

Fig. 1: An overview of Telemedicine framework in Maharashtra. SH stands for the five Specialist Hospitals which provide consultation services. DH stands for District Hospitals (27 in number), each of which has 4 sub-district hospitals. Each sub-district hospital further has several primary health centers (not depicted in the figure).

Patient End: The patient end consists of 27 district hospitals of Maharashtra (see Annexure 1). Furthermore, 4 sub-district hospitals in each district act as centers where patients from nearby areas come for consulting the doctors. All the district and sub-district hospitals are equipped with a modern state-of-the-art telecommunication network system for carrying out teleconferences. The sub-district hospitals are further sub-divided into Regional Hospital (RH) and Primary Health Centre (PHC).

The diagrammatic representation of the present set up has been depicted in Fig. 1.

Annexure 1: Names of the districts where Telemedicine has been implemented.
Thane, Bombay, Alibagh, Nashik, Pune, Satara, Ratnagiri, Osmanabad, Latur, Bid, Ahmednagar, Parbhani, Jalna, Aurangabad, Jalgaon, Buldana, Amravati, Wardha, Nagpur, Chandrapur, Gadchiroli, Bhandara, Gondia, Nandurbar, Hingoli, Washim, Sindhudurg

Technical Support: The first phase of telemedicine was technically supported by the Indian Space Research Organisation (ISRO), which provided its expertise in network connectivity. Initially there were serious troubles with internet connectivity

1 Radiology is a specialized medical branch which uses imaging technologies (X-Ray, MRI, CT Scan etc.) to identify and treat anomalies in the human body. Pathology involves the identification of diseases based on laboratory analysis.

as many times the connection would be snapped. Later, this trouble was solved by using dedicated leased lines of fiber optic cable having a high bandwidth capacity. Thereafter a medical equipment supplier company, “Progonosis”, provided facilities for video conferencing along with other basic medical equipment such as scanners, BP apparatus etc.

Management Structure

The whole project has a Mission Managing Director (MD) under whom there are several Joint Directors followed by Assistant Directors. All three positions together form the top management, which makes the critical decisions in the implementation of the overall project. Additionally, independent consultants are hired for giving their expertise from time to time.

The ground level day-to-day operations are taken care of by the coordinators and facility managers of technical support services. Each district has a nodal officer who is responsible for the overall day-to-day operation of the telemedicine project in their district. Other than these managerial and support staff, a whole set of dedicated doctors at both the Specialist and Patient End are involved in the consultation and treatment of patients. The doctors are not paid any extra by the government for consulting patients through telemedicine. However, an honorary sum of Rs. 100/- and Rs. 300/- is paid per patient to the doctors of the District Hospital and Specialist Hospital respectively.

The Motto of Telemedicine

The primary motive of implementing a pan-state telemedicine network was to provide better access to super-specialty medical care to the residents of remote areas, who either do not have sufficient time or lack enough resources to travel to big cities for advanced treatment. Highlighting the present medical system, the Nodal Officer of the Mumbai area, Ms. Sandhya Tayde, apprised that “The areas targeted for telemedicine intervention had a poor access to trained doctors or medical staff. Furthermore, due to the distance factor and cost involved in seeking a first hand specialist opinion was both time consuming and costly affair. We using telemedicine have tried to reduce the time of intervention and cost and improve the quality of treatment by getting specialist opinion at their place of residence only.” In a way this was a positive development for rural folk who did not have an idea of how and where to go for a particular type of disease or illness. Furthermore, in serious life-threatening illnesses such as cancer, lives can be saved by early detection and timely intervention.

Another key beneficial feature of Telemedicine intervention is its ability to build and maintain a central database having all the details pertaining to the patient's medical history and the treatment administered to him/her. This means that there is one centralized monitoring hub from where all the data can be accessed from any remote location at a given point of time. This also means that digitized patient data related to X-Ray, CT scan, Pathology reports etc. are easily accessible, and opinions from different specialists can be sought before deciding on a particular course of treatment. It also ensures completeness and correctness of information, and past data records are often utilized by specialists for better management of health care services. The Telemedicine system in Maharashtra has been equipped to seamlessly capture and upload patient information, waveforms and images from a remote location to a centralized server and get expert opinion or review instantly within the network (intranet) or at a later point of time. An Electronic Health Record (EHR) is generated for each patient and is archived in digital format. During cardiac arrest or other emergencies, the ECG and other relevant data can be instantly transmitted and the doctors at the remote location can suggest a course of action based on the live data.

Other than consulting and archiving medical data, Telemedicine has been innovatively used in Maharashtra to train and develop medical staff personnel at the patient end.

Table 1: Specialty-wise patients referred and opinions received for the same in the year 2010-11. Source: Arogya Bhavan, CST Mumbai.

S.No.  Specialty       Patients referred from district   Opinions received from specialty centers
                       (April 2010 to March 2011)        (April 2010 to March 2011)
1      Medicine        1059                              1032
2      Surgery         344                               316
3      OBGY            146                               207
4      Pediatrics      393                               387
5      Cardiology      65                                51
6      Neurology       45                                44
7      Anesthesia      28                                29
8      Chest           25                                23
9      Ophthalmology   24                                24
10     Skin VD         85                                83
11     ENT             76                                43
12     Orthopedics     278                               287
13     Psychiatry      40                                40
14     Radiology       1301                              1400
15     Ayurvedic       68                                30
16     Unani           155                               160
17     Forensic        38                                36

The CME (Continuing Medical Education) division is very proactive in disseminating the latest knowledge or medical cases to the staff. CME is organised at regular intervals, and along with technical knowledge, sessions on various attitude and behavioral skills are delivered, which in turn helps in creating improved clinical performance and professional development. Additionally, via tele-conferencing between medical colleges and district hospitals, computer-specific skill sets are imparted to equip the medical professionals to troubleshoot minor technical problems.

Impact and outreach of Telemedicine in Maharashtra

Telemedicine drastically reduced the time taken for seeking expert advice. According to Ms. Tayde, earlier the wait period for a patient to get an appointment with a specialist was on average three months. However, the wait time has now reduced drastically, as they can divert the patient's digital information to any expert who is willing to handle the case. The junior doctors involved in the district hospital also learn in this whole process of referring

the cases and having a discussion with the specialist over tele-conference. At times special rural camps on community health and ophthalmology are organised through telemedicine equipment mounted on mobile vans.

Specialist services are extended through telemedicine in 30 areas of medicine, which is quite broad. The key and most used specialist services are related to cardiology, dermatology, pathology, ophthalmology, ENT, surgery (consultation), neurology and medicine. The data related to the number of patients referred in the year 2010-11 has been summarized in Table 1. Table 2 summarizes the total number of patients referred and opinions received for them, for the years 2008-11.

Table 2: The number of patients referred through telemedicine and expert opinions received for the referred cases. Source: Arogya Bhavan, CST, Mumbai.

Year      Patients referred   Opinions received
2008-09   538                 448
2009-10   3640                3739
2010-11   4230                4195
Total     8408                8382

Thus one can observe from the table above that telemedicine has been quite popular among its end users and has been catering to the service needs of the poor and unprivileged rural people residing in remote areas of Maharashtra.

Conclusion and the way ahead

The development in telecommunication technology has given birth to modern telemedicine, which has found its way into improving the health services for the unprivileged masses. Maharashtra has successfully implemented telemedicine across its districts in two phases. In the first and second phases of the project, all the district and sub-divisional hospitals have been linked with the state medical colleges. Now the Maharashtra government is planning to implement phase 3 of the project, which proposes to link all the Primary Health Care Centers (PHC, primary level) to medical colleges (tertiary level). This means creation of a complete network of primary (PHC), secondary (District Hospitals) and tertiary (Medical Colleges) levels for ensuring proper and better care of the patients. This network is expected to reduce mortality and morbidity, thus saving more lives by ensuring continuity of care throughout the network.

The present setup of the Telemedicine network in Maharashtra is one of the largest in India. Telemedicine intervention has been successful in reducing travel by patients and therefore saving their costs involved in travel, food and accommodation, along with pay loss due to taking leave from regular work. It has also meant less flocking of patients to the specialty hospitals, and the doctors can give their opinion by looking at the digitized data of patients at their convenience. Telemedicine has also reduced the cost involved in training and development of medical staff for Primary Health Care centers. Therefore, telemedicine is a perfect instance where the amalgamation of technology and social cause has resulted in the welfare of deprived masses.

References

[1] Wootton R. (1998) Telemedicine in the National Health Service, J R Soc Med. Vol. 91, No. 12, pp. 614-21.
[2] Wootton R. (n.d) ‘Telemedicine’ in Lock S, Dunea G, Pearn J, (eds.) Illustrated Companion to Medicine, UK: Oxford University Press (in press).

About the Authors

Randhir Kumar is a PhD candidate of AISSR (Amsterdam Institute of Social Science Research) at University of Amsterdam (The Netherlands). He secured his Masters degree in 'Globalisation and Labour Studies' from Tata Institute of Social Sciences (Mumbai); after which he worked as a Research Associate in Personnel Management and Industrial Relations Area of IIM (Ahmedabad).

Dr. P K Choudhary (Double MA, PhD and NET JRF) is a HOD of University Department of Sociology, Ranchi University, Ranchi. Having more than 17 years of Experience in Research and Teaching at University level, he is Program Committee chair for various national and International Conferences. He has written several articles and books on various societal issues. Considering his knowledge and expertise, State and Central Governments have given him additional authority to lead various Development Projects of Jharkhand.

S M Fahimuddin Pasha is an Assistant Manager at Computer Society of India. He has done an MA in Globalization and Labour from Tata Institute of Social Sciences (Mumbai) and an MA in Sociology from Ranchi University. He is on the verge of completing his PhD in Industrial Sociology. He is also a Researcher with the International Institute of Social History (Amsterdam, The Netherlands) and an invitee to the University of Leipzig (Germany) to address the issues of 'Detorization of Working Class'.

Technical Trends

Satyam Maheshwari* and Sunil Joshi**
* Assistant Professor, computer applications in SATI Degree, Vidisha (MP)
** Assistant Professor, computer applications in SATI Degree, Vidisha (MP)

Extending WEKA Framework for Learning New Algorithms

Waikato Environment for Knowledge Analysis (WEKA) is a collection of state-of-the-art machine learning algorithms and data preprocessing tools. It is designed so that you can quickly try out existing methods on new datasets in flexible ways. It provides extensive support for the whole process of experimental data mining, including preparing the input data, evaluating learning schemes statistically, and visualizing the input data and the result of learning. WEKA was developed at the University of Waikato in New Zealand and is open source software, written in Java and issued under the General Public License[2]. It runs on almost any platform and has been tested under Linux, Windows, and Macintosh operating systems. Recently an article was published in CSI Communications which showed an application of WEKA to bio-inspired algorithms[1]. The authors emphasized the MLP classifier using genetic algorithm and fuzzy logic. They gave information about the existing framework. In this article, we extend the existing framework of WEKA so that we can add a new classifier or cluster and then train on the dataset with the new algorithms.

The key features of WEKA’s success are as follows:
1. It is open source and freely available;
2. It provides many different algorithms for data mining and machine learning;
3. It is platform-independent; and
4. It is up-to-date, with new algorithms being added as they appear in the research literature.

WEKA[3] supports several standard data mining tasks, more specifically, data preprocessing, clustering, classification, regression, visualization, and feature selection. All of WEKA's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally numeric or nominal attributes, though other types of attributes are also supported). The easiest way to use WEKA is through a graphical user interface called the Explorer. The other user interfaces to WEKA are Experimenter, KnowledgeFlow, and Simple CLI. The Experimenter gives access to all of its facilities using menu selection and form filling. The KnowledgeFlow provides an alternative to the Explorer for showing how data flows through the system. It also allows the design and execution of configurations for streamed data processing. The Simple CLI is a command line interface for executing WEKA commands.

Fig. 1: Existing snapshot of WEKA

The main interface, Explorer, has several panels that give access to the main components of the workbench. The “Preprocess” panel has facilities for importing data from a database, a comma-separated values (CSV) file etc., and for preprocessing this data using a so-called filtering algorithm. These filters can be used to transform the data (e.g. turning numeric attributes into discrete ones) and make it possible to delete instances and attributes according to specific criteria. The “Classify” panel enables the user to apply classification and regression algorithms (indiscriminately called classifiers in WEKA) to the resulting dataset; to estimate the accuracy of the resulting predictive model; and to visualize erroneous predictions, ROC curves, or the model itself (if the model is amenable to visualization, e.g. a decision tree). The “Associate” panel provides access to association rule learners that attempt to identify all important interrelationships between various attributes in the data. The “Cluster” panel gives access to the clustering techniques in WEKA, e.g. the simple k-means algorithm. There is also an implementation of the expectation maximization algorithm for learning a mixture of normal distributions. The next panel, “Select attributes”, provides algorithms for identifying the most

predictive attributes in a dataset. The last panel, “Visualize”, shows a scatter plot matrix, where individual scatter plots can be selected, enlarged and analyzed further using various selection operators. WEKA can handle a number of file formats, including the ever-popular CSV (which can be exported from any spreadsheet program). WEKA prefers, however, to work with ARFF files, which are basically CSV files with some header information tacked on.

Suppose we want to implement a special-purpose learning algorithm, i.e. add a new classifier or cluster which is not included in the existing WEKA GUI, investigate a new learning scheme, or learn more about the inner workings of an induction algorithm by actually programming it ourselves; we then integrate a new workspace in WEKA.

WEKA can be extended to include elementary learning schemes for research and educational purposes. Fig. 1 shows the existing framework of WEKA. To add a new classifier in WEKA, we follow these steps:

1. Create a new folder in the Windows directory hierarchy, e.g. C:\SmWork\classifiers
2. To enable or disable dynamic class discovery, the relevant file to edit is GenericPropertiesCreator.props (GPC). This file can be obtained from the weka.jar or weka-src.jar archive. These archives can be opened with an archive manager that can handle ZIP files; navigate to the weka/gui directory, where the GPC file is located. All that is required is to change the UseDynamic property in this file from false to true (for enabling it) or the other way round (for disabling it). After changing the file, just place it in the home directory. For generating the GOE file, we need to execute the following command:
   java weka.gui.GenericPropertiesCreator %USERPROFILE%\GenericPropertiesCreator.props %USERPROFILE%\GenericObjectEditor.props
3. Remove weka.jar from the CLASSPATH.
4. Edit the GenericPropertiesCreator.props file in the home directory and set UseDynamic to false.
5. Add SmWork/classifiers in GenericPropertiesCreator.props and GenericObjectEditor.props.
6. Run the command
   java -classpath c:\progra~1\weka-3-6\weka.jar;c:\SmWork\classifiers weka.gui.GUIChooser

Now we can write our new Java code, compile it, and then copy the class file into the specified folder. Fig. 2 shows a snapshot of a newly added classifier. Similarly, we can extend WEKA for cluster and association as well.

Fig. 2: Snapshot of WEKA displaying newly added classifier

References

[1] Goli, B and Govindan, G (2011). WEKA - A powerful free software for implementing Bio-inspired Algorithms, CSI Communications, 35(9), 09-11.
[2] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten (2009); The WEKA Data Mining Software: An Update; SIGKDD Explorations, Volume 11, Issue 1.
[3] Witten, I H and Frank, E (2005). Data Mining: Practical Machine Learning Tools and Techniques, San Francisco: Morgan Kaufmann.
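The remark that ARFF files are basically CSV files with some header information tacked on can be made concrete with a small Python sketch. It is an illustration only, not part of WEKA: the function name csv_to_arff, the sample data and the typing rule (a column is numeric if every value parses as a number, otherwise nominal) are assumptions made for this example, and it assumes attribute names and values contain no spaces or commas.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert small CSV text (header row first) into ARFF text.

    Columns whose values all parse as numbers become numeric attributes;
    every other column becomes a nominal attribute listing its values.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    # Header section: relation name followed by one line per attribute.
    lines = ["@relation " + relation, ""]
    for index, name in enumerate(header):
        column = [row[index] for row in data]
        if all(is_number(v) for v in column):
            lines.append("@attribute %s numeric" % name)
        else:
            values = sorted(set(column))
            lines.append("@attribute %s {%s}" % (name, ",".join(values)))

    # Data section: the CSV rows, unchanged.
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


sample = "temperature,humidity,play\n85,85,no\n80,90,no\n83,86,yes\n"
print(csv_to_arff(sample, "weather"))
```

The printed text starts with an @relation line, lists one @attribute line per column, and then repeats the CSV rows under @data, which is the header-plus-data layout the article refers to.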

About the Authors

Satyam Maheshwari received the MTech degree in Computer Technology and Applications from RGPV Bhopal. Since 2003, he has been an Assistant Professor in the department of computer applications in SATI Degree, Vidisha (MP). His research interest is classification of imbalanced datasets in Data Mining. He is a Member of IEEE, CSI, and ISTE.

Sunil Joshi received the MCA degree in 2001 from SATI Vidisha. Since 2005, he has been an Assistant Professor in the department of computer applications in SATI Degree, Vidisha (MP). Currently he is pursuing a PhD degree in frequent pattern mining at University of RGPV. He is a member of IEEE and ISTE.

Practitioner Workbench

Dr. Debasish Jana
Editor, CSI Communications

Programming.Tips() » Passing Variable Number of Arguments in C

Ever wondered how printf or scanf is declared in C or C++? Why do I raise this? Because printf and scanf are functions that can take a variable number of arguments. For example, you could write:

    printf("%d %c", someinteger, somechar);

where someinteger and somechar are of int and char types respectively:

    int someinteger;
    char somechar;

We could have decided to print only one integer, as below:

    printf("%d", someinteger);

Or simply just a string:

    printf("Hi There");

In the above three examples, printf takes three, two and one argument respectively. If you look closely, you will notice that in all three cases the first argument is a character string. In the 1st case, the second argument is an integer (int) and the third argument is a character (char). In the 2nd case, the second argument is an integer (int) and there is no third argument. In the 3rd case, there is no second or third argument either.

But C does not support function overloading, so we cannot expect that so many different variants of printf (and scanf and similar functions) are declared. C++ inherited these functions from C, so in C++ printf/scanf take a similar form.

In C, there is a syntax for optional parameters: triple dots, i.e. "...". This allows passing a list of variables as defined in the format string (first argument). Thus, the same function can be used to print things like this:

    int someinteger;
    char somechar;
    printf("%d %c", someinteger, somechar);
    printf("%d", someinteger);
    printf("Hi There");

In fact, printf is a function with the following signature:

    int printf(const char *format, ...);

This means that it requires at least one argument, a character string, followed by zero or more arguments (which can be of several different types). The return value (int) signifies how many characters have been printed. The number and type of the arguments are determined by the format string.

The C header file stdarg.h contains facilities for stepping through a list of function arguments of unknown number and type. The important ones are given below:

    void va_start(va_list ap, lastarg);

This initialization macro is to be called once before any unnamed argument is accessed. ap must be declared as a local variable, and lastarg is the last named parameter of the function.

    type va_arg(va_list ap, type);

This produces a value that has the type (type) and value of the next unnamed argument. It modifies ap.

    void va_end(va_list ap);

This must be called once after the arguments have been processed and before the function exits.

An example program follows:

    #include <iostream>
    #include <stdarg.h>
    using namespace std;

    int sum( int first, ... );

    int main()
    {
        // Call with 3 integers
        // (-1 is used as terminator).
        cout << "sum is: "
             << sum( 2, 3, 4, -1 )
             << endl;

        // Call with 4 integers
        cout << "sum is: "
             << sum( 5, 7, 9, 11, -1 )
             << endl;

        // Call with no integer : just -1 terminator
        cout << "sum is: "
             << sum( -1 )
             << endl;
        return 0;
    }

    // Returns the sum of a variable list of
    // integers
    int sum( int first, ... )
    {
        int s = 0, i = first;
        va_list marker;

        // Initialize variable arguments
        va_start(marker, first);
        while( i != -1 )
        {
            s += i;
            i = va_arg( marker, int);
        }
        va_end( marker ); // reset variable arguments
        return s;
    }

The output when the program is run is given below:

    sum is: 9
    sum is: 32
    sum is: 0

Do you have some interesting programming tips to share? They could be in any programming language or software tool. Share them with us. Send your summarized write-up to CSI Communications with the subject line ‘Programming Tips’ at email address [email protected]

CSI Communications | May 2012 | 29

Practitioner Workbench
Umesh P
Department of Computational Biology and Bioinformatics, University of Kerala

Programming.Learn (“Python”) » Plotting with Python

Snakes are becoming popular among pet lovers as they are easy to care for, exotic, and do not need to be fed daily like a dog or cat. Corn snake, Ball python, California King snake, Milk snake, Boa constrictor etc. are popular pet snakes. Among pythons, the Ball python is considered to be one of the best pets for beginners. Ball pythons are docile and about 5 feet long. In some countries, there are online stores that deliver snakes on payment.

Matplotlib is an object-oriented plotting library for Python. It offers a MATLAB/Scilab-like application programming interface (API) and produces accurate, high-quality figures, which can be used for publication purposes. Matplotlib contains the pylab interface, which is the set of functions provided by matplotlib.pylab to plot graphs. matplotlib.pyplot is a collection of command-style functions that make matplotlib work like MATLAB. To start a plotting experiment, we first need to import matplotlib.pyplot.

>>> import matplotlib.pyplot as plt

Here the library matplotlib.pyplot is imported under the short name plt, so that later calls can refer to the module easily.

>>> import matplotlib.pyplot as plt
>>> plt.plot([1,2,3,4], [4,3,2,1])
>>> plt.axis([0,5,0,5])
>>> plt.show()

The plot function accepts the plotting points as two arrays holding the x and y coordinates respectively. By default, pyplot joins the points with straight line segments. If you need only a scatter diagram of the points, try the following code:

>>> plt.plot([1,2,3,4], [4,3,2,1], 'ro')

You can plot the graph using different colors and styles by passing a format argument after the data in the plot function.

>>> from pylab import *
>>> x = arange(1.,10.,0.1)
>>> y = x*x
>>> plot(x,y,'g--')
>>> show()

After plotting the graph, you need to call the show() command to view it. Here you will get a green dashed line graph; try r for red, y for yellow etc. We can specify marker shapes with short codes such as s for square, ^ for triangle etc.

>>> plot(x,y,'rs') # Red square
>>> plot(x,y,'g^') # Green triangle

Standard mathematical functions can also be plotted. Let us plot a sine curve:

>>> from pylab import *
>>> x = arange(0.,10.,0.1) # to define x values
>>> y = sin(x) # function definition
>>> plot(x,y) # to plot
>>> grid(True) # to show graph in grid
>>> show() # to show the plot

Pylab combines pyplot with numpy functionality in a single namespace. If you import matplotlib.pyplot directly instead, you need to import numpy as well to define arrays.
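The closing remark about importing numpy alongside matplotlib can be sketched as follows. This is a minimal version of the sine example written without the pylab star import, assuming numpy and matplotlib are installed. The function name make_sine_figure and the axis labels are illustrative choices, and the Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: file-only backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def make_sine_figure():
    """Build the sine plot using explicit numpy and pyplot names."""
    x = np.arange(0.0, 10.0, 0.1)   # x values
    y = np.sin(x)                   # function definition
    fig, ax = plt.subplots()        # one figure holding one set of axes
    ax.plot(x, y, "g--")            # green dashed line
    ax.grid(True)                   # show the graph on a grid
    ax.set_xlabel("x")
    ax.set_ylabel("sin(x)")
    return fig
```

Calling make_sine_figure().savefig("sine.png") writes the plot to a file (the file name is arbitrary). For interactive use, drop the matplotlib.use line and call plt.show() after building the figure.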

CIO Perspective
Dr. R M Sonar
Chief Editor, CSI Communications

Managing Technology » Business Information Systems: Underlying Architectures

The previous article covered the basic elements of a system such as input, processing, and output. Interfaces facilitate an interactive environment to get input into a system and present output in a variety of forms such as reports. Processing involves a) execution of business logic implemented through programming languages and b) management of required data: storage, access, and manipulation. In short, software that implements ISs can be logically divided into three layers based on functionality: interfaces (presentation services), core business logic, and data services as shown in Fig. 1. Table 1 describes these layers. The components which implement the functionality of those layers can be coupled either tightly or loosely. Loose coupling brings a) greater flexibility in developing and deploying components separately in a networked environment in distributed fashion, b) flexibility in interconnecting heterogeneous systems and platforms, and c) better scalability and maintenance of information systems. ISs which have all these layers managed by a single computer program are called single-tier systems, while ISs that have separate programs/systems to implement individual functionality are called three-tier systems. In some systems, business logic may be implemented using multiple programs/systems; these are called n-tier systems (refer Fig. 2).

Fig. 1: Logical separation of tasks (tiers) in IS (interfaces, business logic, data services)

Table 1: Functionality implemented by layers and components

Layer: Interfaces
Functionality: Takes care of presentation services. Facilitates input, validation, and output.
Components/Types: Text-based data entry interfaces, GUI-based (windows) interactive forms, IVR, SMS, WAP and web-based forms, unstructured supplementary service data (USSD), static and interactive reports, and dashboards and multimedia interfaces.

Layer: Business logic
Functionality: Execution of core processing logic.
Components/Types: Core modules, functions, procedures, APIs (libraries), web services, stored procedures etc.

Layer: Data services
Functionality: Defining data models; creation, storage, access, and manipulation of the data required.
Components/Types: File handling and management, data stores, database management systems (DBMS), XML storage and access etc.

Single-tier Systems

The program that implements the IS takes care of interfaces, business logic, and data services as shown in Fig. 3. The components of all these layers are tightly connected to each other. Examples of such systems include reservation systems which are developed in languages like COBOL and deployed under centralized mainframe environments. As shown in Fig. 3, thin clients are just devices with no processing capabilities (called dumb terminals) that are used for input (e.g. data entry) and to display information. Many independent ISs that were developed in languages like C were single tier, where the program manages user interfaces, processing as well as file handling. Decision support systems developed using desktop productivity tools like MS Excel, which manage user interfaces, processing as well as data inside the same Excel workbook, are also examples of single-tier systems.

Key benefits
• These are centralized systems where all functionalities are tightly connected and implemented in a single information system (monolithic). They are easier to support and maintain.
• These are secure systems as there are only limited entry points to the system. The users have to access the system through the interfaces provided, and typically these are dumb terminals with no other devices/systems connected to them.
• They are computationally efficient because most of them are written in core programming languages, with no overheads of other software like database servers.

Key issues
• Users have limited choice while accessing data.
• A lot of explicit and exhaustive coding is required as the program that implements the IS has to manage all functionalities.
• More dependence on the vendor for support, especially if the systems providing customization capabilities are not based on open standards.
• Disadvantages of conventional file handling.
• Since most of these systems are based on centralized computing, failure of such systems can cause major disruption in services.

Client/Server Systems

In client/server systems (the server is referred to as the database server), interfaces are taken care of by client machines (usually desktops) and data services by the DBMS as shown in Fig. 4. Client machines interact with database systems in a loosely coupled manner. The client machines send requests (or send DB commands) to the database systems; the database systems respond to each request and send the required data or execute the requested command. The business logic is split into two parts: client side and server side. Since the majority of business logic is implemented at client side, the

clients are typically fat clients (machines requiring higher computing resources). ISs developed using tools such as VB (Visual Basic) as front-end and Oracle as back-end fall under this category. All installations of such ISs at every deployment location (such as branch offices) need a database server and client machines connected over a local area network. Client/server systems can be further enhanced to have better ROI using thin-client (GUI-based) technologies like the ones from vendors such as Citrix. Fig. 5 shows an example. In such architectures, instead of many fat-client machines only a few client machines (even only one) are used where application processing is done. Operating systems like Windows 2000 allow multiple instances of the IS to run on the same machine. Using such thin-client technologies, these ISs can be accessed by many users over thin clients. The fat-client machines are then typically server machines, often simply called servers. Such technologies drastically reduce support and maintenance efforts as there is no need to install interfaces and business logic on many fat-client machines. Only one instance is shared amongst many users through thin clients. This is a form of virtualization.

Fig. 2: Computing architectures (single-tier, client/server, and web-based n-tier, placed along two axes: flexibility, personalization, access, and ROI; and distributed computing, modularity, open standard, and scalability)

Fig. 3: Single-tier systems (thin clients, e.g. dumb terminals, provide the interface; the mainframe runs business logic and data services, i.e. file handling, over data files in file storage)

Key benefits
• These are distributed systems normally deployed in a LAN environment where many client machines are connected to a common database server. They use various resources: client side, server side as well as network.
• Data services are managed by database servers which take care of data storage, access, and manipulation. They take care of concurrency, redundancy, security, and consistency of data. Most database servers use standard query languages to access and manipulate data.
• Database systems are loosely coupled; end users have a greater degree of freedom in accessing data and creating customized reports based on requirement.
• There is the option of using various database management systems and client-side development tools.

Key issues
• These systems are deployed in a networked environment; if not properly configured, security can be an issue as there can be multiple entry points into the systems. For example, users can have direct access to data in the database server. The database administrator needs to set proper access rights and controls based on users and their roles.
• Scalability can be an issue, especially when the number of clients increases. Load on the database server increases with the number of clients accessing it, as it manages an exclusive session for each one.
• Such a system is difficult to manage, especially to support and maintain, when deployed on a large scale at different locations. Even a small change in the user interface requires updating client components at all locations.
• Dependence on the database, especially if a lot of business logic is implemented at the database server.
• If systems are not properly designed, developed, and configured, this may lead to inefficient use of network bandwidth; for example, a lot of data exchange between client and server.

Fig. 4: Client/server systems (client PCs act as fat clients holding the interface and part of the business logic; the DB server (DBMS) holds the rest of the business logic and the data services; the two are connected over a network)

Fig. 5: Thin client based client/server systems (thin clients, e.g. Citrix, provide the interface and connect over a network to a terminal server acting as fat client, which connects over a network to the DB server)

Web-based N-tier Systems

In web-based systems, the functionalities of all three layers are separated, run on different machines/devices, and are loosely coupled. They are deployed under Internet, intranet (an Internet-like setup within the organization using all technologies, protocols, and standards that are used on the Internet), and extranet (extending the intranet setup to outside stakeholders like business partners, dealers, vendors, agents etc.) environments. In such ISs, interface functionality is taken care of by client machines/devices, business logic by the web server (which stores and delivers web pages), and data services by database servers. However, in some cases part of the business logic is moved to the DB server. Business logic can be split across multiple servers like a web server and an application server (which takes care of specific functional requirements like CRM). The client can be a desktop machine, thin-client machine, smart device supporting a browser, or any device that supports Internet connectivity (refer Fig. 6). The computational resource requirements at client side depend upon the functionality to be executed there. Some clients require more processing power (e.g. rich Internet applications (RIA)), and some applications need to install components like ActiveX etc. However, many web-based information systems just need a browser to access them from the client machine.

Fig. 6: Web-based n-tier systems (thin or rich clients such as desktops, laptops, smart devices, and thin clients provide the interface and connect over Internet/intranet/extranet to one or more web servers and an application server for business logic, which connect over a network to DB servers, possibly in many instances, for data services)

Key benefits
• These are completely distributed systems and use optimal resources: client side, server side, and Internet/intranet and extranet. However, core business logic and database services can be centralized.
• Database systems are loosely coupled; end users have a greater degree of freedom in accessing data and creating customized reports based on requirement.
• Since components are loosely coupled, these systems are highly scalable (load balancing is possible by deploying many servers) and accessible.
• These systems are based on open standards and can interconnect different systems.
• Core business logic as well as interfaces can be designed and implemented at granular/component level (e.g. as web services, mashups etc.), thereby increasing reuse; new systems can be built with relatively less effort using service-oriented architecture (SOA).

Key issues
• Since these systems are highly distributed, openly accessible, have multiple entry points, and interconnect many systems, they are vulnerable to attack. If systems are not properly configured, they can face security threats.
• Dependence on network connectivity.

Table 2 shows examples of how components in the three layers are implemented in single-tier, client/server, and web-based systems.

Cloud-based Systems

The Internet has evolved from a platform that delivered web content to a platform that performs a variety of computing services. Instead of managing ISs and IT infrastructure on premise, organizations are outsourcing them to third-party vendors called cloud vendors. Vendors do not sell their software, platforms, or infrastructure as products and solutions but as services. Client organizations do not need to buy them but use and access them on demand. This is equivalent to renting a car instead of owning it. Cloud vendors offer various service models: software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). Fig. 7 shows the basic architecture of cloud computing. The client organizations can choose services based on their requirements. There is a great deal of flexibility in selecting subscription models based on functional and technical requirements.

Key benefits
• Client organizations neither need to own IT infrastructure and resources nor need to maintain and support them. They do not have to deal with constant changes in technologies. This gives better ROI.
• Cloud vendors make resources available based on requirement and demand, which also improves ROI. Since cloud vendors provide services to many clients, they can have economies of scale.
• Systems and platforms can be tested before renting/subscribing.
• There are many players who are part of the cloud vendor ecosystem (e.g. independent software vendors, developer and expert communities); the client organizations can take advantage of the same.
• Different service/subscription models can be opted for depending upon requirement.
• It offers point-to-point and seamless connectivity to the client firm and all its stakeholders like employees and business partners in the ecosystem. For example, employees can access email directly from cloud services (e.g. Gmail) instead of connecting to/accessing it from a corporate email server. Similarly, business partners can access the same system from the cloud that the organization accesses (instead of accessing it from the firm’s IT data center). There is no need even for extranet-like environments.

Key issues
• Security is one of the major concerns for client organizations as services are offered on a shared basis and executed remotely.
• Lock-in cost can increase in case the cloud vendor uses proprietary technologies.
• A cross-country legal framework is needed to enforce service-level agreements (SLA) between client organizations and cloud vendors.
• Dependence on availability of Internet connectivity and the required bandwidth.

Summary

Many client and software firms are opting for n-tier computing architectures, and there is a clear shift toward building and using cloud infrastructures and services. IS/IT has moved from highly distributed systems to centralized architectures (like core

Table 2: Implementation of various layers: some examples

Layer: Interface
Single-tier: Developed using core programming language/tools.
Client/server: GUI forms (e.g. Visual Basic). Interfaces are tightly integrated to the client information system.
Web-based n-tier systems: Web forms/pages. Interfaces are loosely integrated and downloaded from the web server.

Layer: Business logic
Single-tier: Through core programming logic/supported by tool.
Client/server: Implemented using languages like VB and partially at server side using DB programming languages.
Web-based n-tier systems: Implemented using core programming and scripting languages.

Layer: Data services
Single-tier: The program which implements the IS configures, accesses, and manipulates data files.
Client/server: Managed by database servers. Server-side business logic is implemented using DB programming languages such as PL/SQL in Oracle (commonly referred to as SPs: stored procedures).
Web-based n-tier systems: Managed by database servers, data stores, and XML files.
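The separation of interface, business logic, and data services that the two tables summarize can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the account example and the names DataServices, BusinessLogic, and withdraw_screen are hypothetical, and an in-memory SQLite database stands in for a DBMS. Each layer talks only to the layer below it, so any one of them could be replaced or moved to another machine without rewriting the others.

```python
import sqlite3


class DataServices:
    """Data services layer: storage, access, and manipulation of data."""

    def __init__(self):
        # Assumption: an in-memory SQLite database stands in for a DBMS.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)"
        )

    def add_account(self, name, balance):
        self.conn.execute("INSERT INTO accounts VALUES (?, ?)", (name, balance))
        self.conn.commit()

    def get_balance(self, name):
        row = self.conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError("unknown account: " + name)
        return row[0]

    def set_balance(self, name, balance):
        self.conn.execute(
            "UPDATE accounts SET balance = ? WHERE name = ?", (balance, name)
        )
        self.conn.commit()


class BusinessLogic:
    """Business logic layer: processing rules only, no SQL and no formatting."""

    def __init__(self, data):
        self.data = data

    def withdraw(self, name, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        balance = self.data.get_balance(name)
        if amount > balance:
            raise ValueError("insufficient funds")
        new_balance = balance - amount
        self.data.set_balance(name, new_balance)
        return new_balance


def withdraw_screen(logic, name, amount_text):
    """Interface layer: validate raw input, call the logic, present output."""
    try:
        amount = float(amount_text)
    except ValueError:
        return "Error: amount must be a number"
    try:
        new_balance = logic.withdraw(name, amount)
    except (ValueError, KeyError) as exc:
        return "Error: " + str(exc.args[0])
    return "%s: new balance %.2f" % (name, new_balance)
```

In a single-tier system all three pieces would live in one tightly coupled program; in a client/server or web-based system the DataServices role would typically be played by a separate database server, and withdraw_screen by a client form or web page.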

Continued on Page 36

Security Corner
Adv. Prashant Mali [BSc (Physics), MSc (Comp Science), LLB]
Cyber Law Expert
Email: [email protected]

Information Security » Cyber Crimes on/by Children

I would like to start this article with two distinct cases I am handling: one in which the child is the prey of cyber crime and another where the child has committed cyber crime.

Case One: This child is 14 years old. The biggest mistake she made was that she used to write online about every single disagreement or fight she had with her mother or father on a daily basis. Moreover, she used to underline her loneliness further with a small poem. A cyber criminal befriended her and used her loneliness as a sword to sexually abuse the girl. The girl is in deep mental trauma and the family is in distress. Even though we traced the cyber criminal, the larger question still remains.

Case Two: This standard IX boy suffering from dyslexia was abandoned by his girlfriend, studying in standard VIII. Moreover, the girl often taunted him with being impotent. The boy decided to take revenge on her and, using the girl’s photograph, made a fake profile of her on Facebook. Further, he went ahead and wrote her actual mobile number with a comment that “I am a prostitute. Please call”. The girl started receiving hundreds of unsolicited calls. The case was investigated and the boy was arrested for his cyber crime.

Children use the Internet for everything these days, from homework to keeping in touch with friends. Chat rooms, message boards, forums, instant messages, and Facebook have changed the way the world talks to each other. Thanks to these new communication portals, it is now possible to be in contact with people from all over the world instantly. While the majority of people on the Internet are simply using it for research or as a form of entertainment, there are some who use the World Wide Web as a way to stalk and hunt prey. These cyber criminals are considered by most to be psychologically ill and in need of help. While that is true, these pedophiles are also extremely manipulative; they not only know how to find their prey, but are also experts at isolating those innocent members of online communities in order to get what they want.

According to a recently released survey by the online security technology firm McAfee, 62% of children shared personal information online and 39% of parents were unaware of what their children do online. The survey says 58% of the children polled shared their home address on the Internet, while 12% have been victims of some kind of cyber threat.

Technique of Cyber Criminals

Cyber criminals often use a tactic called "grooming". The first step in this process is finding a victim. This can be done in a chat room or by reading blogs. The criminal will often look for something to share with the victim. It could be a birthday or a favorite sport; anything will do. This is simply done to initiate communication. The next thing you know, emails are being exchanged and a friendship has started. The next step in the "grooming" process is to create a wedge between the victims and their parents, guardians, or protectors of any sort. This can be done by waiting for the right moment. Perhaps an email from the victim describes a disagreement with a parent, or a blog tells of an argument. This is the perfect opportunity for the cyber criminal to become a friend and ally. Before you know it, the relationship has developed into a trust where the predator is always on the victim's side no matter what. Eventually this leads to a face-to-face meeting where the actual crime takes place.

It is extremely important for parents to be completely aware of their children's actions on the Internet. What seems like a simple friendship to a child could be a predator catching their prey.

What are the Different Signals that Your Child is at Risk on Internet?

1. Your child spends large amounts of time online, especially at night.
2. You find pornography on your child's computer.
3. Your child receives phone calls from men you don't know or is making calls, sometimes long distance, to numbers you don't recognize.
4. Your child receives mail, gifts, or packages from someone you don't know.
5. Your child turns the computer monitor off or quickly changes the screen on the monitor when you come into the room.
6. Your child becomes withdrawn from the family.
7. Your child is using an online account belonging to someone else.

Children Can Commit Cyber Crimes in the Following Ways

Using the Computer as a Target (Using a Computer to Attack Other Computers)

Did you know that the majority of cyber crimes in this category are committed by children? In April 2012, a teenager was arrested for creating a devastating computer worm. How did he learn to do this? A simple Internet search will reveal all the tools necessary to create viruses and hack into others’ computers. Hacking can take a variety of forms, ranging from stealing passwords and classified information to vandalizing websites. Unauthorized entry into an information system through hacking or viruses has serious legal consequences. Talk with your child about the ethical and legal implications of hacking, which attracts up to 3 years of imprisonment and a penalty of Rs. 5 lakhs in India.

The Computer as a Weapon (Using a Computer to Commit Real World Crimes)

Take, for instance, email. Children believe email is harmless because they don’t see the impact on the person who receives it. A growing trend with the use of email and Facebook is harassment; children are saying things to other children, both at school and in other communities, that they would never say face-to-face. Parents need to teach their children about appropriate communication through email and Facebook.

The Computer as an Accessory (Using a Computer to Store Illegal Files or Information)

The Internet is a useful tool for finding information in a quick and convenient way. Even though much of this information is available for everyone to use, many products and services found online are not permitted to be reproduced or downloaded, especially music and purchasable programs. Popular peer-to-peer software programs make it easy to share copyrighted material and actually encourage downloading. However, it is a violation of copyright law to take music or software from the Internet without the permission of the owner. It is easy for children to understand why theft in the real world is wrong, but it is difficult for them to understand theft of intellectual property. Teach your children not to download pirated or counterfeit material. Downloading illegal material attracts provisions of the IT Act, 2000 as well as of the Copyright Act.

Cyber parenting is the need of the hour; schools and colleges should take initiatives to make parents aware of the current issues, crimes, and the law of the land. I do my bit by conducting free workshops in schools and classes, but a major awareness drive by the Government is the need of the hour.

Security Corner
Mr. Subramaniam Vutha
Advocate
Email: [email protected]

IT Act 2000 » Prof. IT Law Demystifies Technology Law Issues: Issue No. 2

The Basics of an Electronic [Internet-based] Contract

IT Person: Prof. I. T. Law, it is a pleasure to meet you again. I look forward to an enlightening discussion with you on Technology law issues that people like me should know.

Prof. IT Law: I enjoy talking to you too. What topic should we discuss today?

IT Person: How about electronic contracts?

Prof. IT Law: Yes, that is a fundamental issue in electronic commerce. All commercial dealings over the Internet are in the form of electronic contracts. However, it is so easy to engage in buying or selling over the Internet that we may sometimes overlook the fact that we are getting into electronic contracts.

IT Person: Can you give me some examples, please?

Prof. IT Law: Well, think of the air tickets you buy over the Internet, products you buy on Flipkart or Snapdeal, or train tickets or bus tickets.

IT Person: Yes, I understand. Those are the obvious contractual transactions we engage in.

Prof. IT Law: There are other contracts that are not so obvious to most people. For example, when you access a website, you agree to their terms and conditions and that is also in the nature of an electronic contract.

IT Person: But I do not sign anything there. On the other hand, when I buy something I click on the BUY button or something like that.

Prof. IT Law: When you browse a site you have, by that very action of browsing, accepted the terms and conditions for accessing that site.

IT Person: But I do not ever read the terms and conditions.

Prof. IT Law: Like millions of others. But that does not mean you have not agreed to the “access terms” of that site. Moreover, it also does not mean that you have no binding electronic contract with that site or its owners.

IT Person: This is confusing. Please explain in a way I can understand.

Prof. IT Law: In terms of contract law, you can accept an offer in many ways. For instance, on a website for sale of products you can accept an offer by ordering a book or a bag. On a website that provides mere information, you can accept their offer of information by merely browsing the site. Thus, your acceptance can be indicated by the mere action of browsing the site, which results in a contract that binds you to its terms.

IT Person: But accepting a contract by just doing something rather than signing off sounds a little incomplete to me.

Prof. IT Law: If the law were not so flexible we would have had to sign documents for every deal we do. For any contract, you need an offer from one party and an acceptance of the offer by another party. Over the Internet that happens all the time.

IT Person: That is interesting.

Prof. IT Law: Yes. In a future meeting we shall discuss how an offer and an acceptance is actually made over the Internet, and the issues that should be kept in mind in electronic commerce.

IT Person: I shall look forward to that. Talking to you is always so stimulating.

Continued from Page 34

banking solutions) deployed at data centers. Now such centrally deployed systems are likely to move to cloud infrastructure. Such shifts are helping client organizations to get rid of managing IT systems, resources, and infrastructure. This has also changed the roles and functionalities of IT/IS personnel. Such a paradigm shift is going to bring some issues and challenges such as security and privacy of confidential data, dependence or lock-in on cloud providers, management and enforcement of SLAs, and cross-country legalities. However, there are initiatives like private clouds to take care of some of these issues and challenges.

Fig. 7: Basic cloud computing framework (desktops, laptops, smart devices, and existing IT setups reach cloud services over the Internet; the cloud offers a shared interface, shared application, shared platform, and shared infrastructure, built on web/real-time servers, application/platform servers, DB servers, and network infrastructure)

Bibliography

[1] Laudon, Kenneth C., and Laudon, Jane P. (2012). Management Information Systems: Managing the Digital Firm, 12th edn., Pearson Education.
[2] James O'Brien, George Marakas, and Ramesh Behl. (2010). Management Information Systems, 9th edn., Tata McGraw Hill.
[3] Henry C. Lucas Jr. (2008). Information Technology: Strategic Decision Making For Managers, Wiley India.
[4] http://www.citrix.com/ accessed in April 2012.

ICT@Society
Achuthsankar S Nair
Editor, CSI Communications

Graphic Texting

When you hear the words 'computer art' you might start thinking about the wonders of computer graphics, from Adobe Photoshop to dazzling image processing and morphing software. There was a time when all that computers could handle was plain text. People who used 'line printers' in those days would know how far the computer was from graphics.

Well, even when there was no computer and the typewriter was the king of text processing, strange forms of art used to be practiced with these machines. Such 'typewriter art' is believed to have existed since the 1890s. Expert typists could create a close image of the Mona Lisa by clever over-typing.

During the 1950s, some computers even adopted this method to produce graphics from text printers. Those days are fortunately gone, but the art of the keyboard has been reborn on computers in a big way. Joan G. Stark of Cleveland, Ohio, one of the leading ASCII artists, could surprise anyone with the immense creativity she brings out of the computer keyboard.

(ASCII, or American Standard Code for Information Interchange, is a number coding scheme for computer keyboard characters, used since the 1960s. For example, when you type the character 'a' on the keyboard, the number code 97 is what is stored inside the PC. In practice, ASCII is simply a reference to the set of characters that you can see on the keyboard.)

The smileys that we often stick into e-mails are miniature ASCII art. However, Stark's variety of ASCII art is not limited to a single line. Some pieces, like the fire-spitting dragon, can fill a screen. She seems to have picked up her liking for keyboard art while playing with her father's office typewriter during her childhood. After hearing about ASCII art five years ago, she has been churning out exciting artwork. All she uses is Notepad and, of course, her wonderful imagination. Links to her works are available on her wiki page. URL: http://en.wikipedia.org/wiki/Joan_Stark.

Take a fresh look at the computer keyboard before you visit her site. Do the keys (,), “’,’,-, = look capable of creating any art? Now prepare for the pleasant surprise in the links available on her wiki page. Her web site also has plenty of resources for would-be ASCII artists. Her own works are classified into birds, cats, zoo animals etc. She has dated, titled and initialed most of her exhibits.

A history of the art, an account of her personal experiments with it, tips for beginners and links to related sites are available in the External links section of Stark's wiki page. Joan is a mother of four, and she introduces her children on the web site, not surprisingly, through ASCII art.

[Figure: Fire-breathing Dragon by Joan G. Stark]
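The character-to-number mapping described in the ASCII note can be checked with a few lines of C++. This is only a minimal sketch: the helper names `ascii_codes` and `smiley` are made up for illustration, and the code assumes an ASCII-compatible character set, which is what practically every current platform uses.

```cpp
#include <string>
#include <vector>

// Returns the numeric code stored for each character of `text`.
// Assumption: the platform uses an ASCII-compatible character set.
std::vector<int> ascii_codes(const std::string& text) {
    std::vector<int> codes;
    for (unsigned char ch : text) {
        codes.push_back(static_cast<int>(ch));
    }
    return codes;
}

// Builds a one-line piece of ASCII art (a smiley) from plain keyboard characters.
std::string smiley() {
    return std::string(":") + "-" + ")";
}
```

Calling `ascii_codes("a")` yields a single value, 97, matching the example in the text; the smiley is nothing more than the three characters with codes 58, 45 and 41 placed side by side.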


Brain Teaser
Dr. Debasish Jana, Editor, CSI Communications

Crossword » Test your Knowledge on Linguistic Computing

The solution to the crossword, with the name of the first provider of an all-correct solution, will appear in the next issue. Send your answers to CSI Communications at email address [email protected] with subject: Crossword Solution - CSIC May 2012

[Crossword grid]

CLUES

ACROSS
1. Determine the part of speech for each word from a sentence (10)
4. Yahoo's text and web page language translation tool (9)
5. A set of parameters defining user's language, country etc. (6)
8. The study of how meaning is affected by context (10)
11. A formal system in mathematical logic for expressing computation by way of variable binding and substitution (6)
13. A database engine for annotated or analyzed text (6)
15. A lexical database for the English language (7)
16. Name of lemma that helps to tell that a language is not regular (7)
17. Vocabulary of a language (7)
21. Type of machine learning task to infer a function from training data (10)
23. Abbreviation formed from the initial parts in a word or a phrase (7)
25. Meaning encoded in a language expression (9)
26. A multilingual dictionary for language translations on Windows (6)
28. Process of analyzing a text as a sequence of tokens (words) (7)
29. A very important data structure (4)

DOWN
2. A company dealing with language translation software (7)
3. Phrase structure grammar (11)
6. ISO standard markup framework for natural language processing (3)
7. A variation of finite automaton (8)
9. The study of the nature, structure, and variation of language (11)
10. A variant form of a morpheme (9)
12. Microsoft's language translation service (4)
14. A search algorithm for traversing or searching a tree structure or alike (10)
18. Interaction between computers and humans (3)
19. The study of the origin and history of individual words (9)
20. Rules that describe formation of correct sentence in a language (7)
21. One of the oldest machine translation companies (7)
22. A variety of a language peculiar to a particular region (7)
24. Abbreviation of processing natural language (3)
27. An international scientific and professional society dealing with computational linguistics (3)

Solution to April 2012 crossword

[Solution grid]

"I am a failure as a computational linguist! My son sends me an SMS "U R 2 YY 4 ME" and none of my algorithms could crack it. His friends all are able to read it as "You are too wise for me""

Congratulations to Ms. P Deepa (Chennai), Dr. Suresh Kumar (Faridabad), Er. Aruna Devi (Mysore), Dr. T Revathi (Sivakasi) and Mr. S K Khatri (New Delhi) for getting ALMOST ALL correct answers to April month’s crossword.

Ask an Expert
Dr. Debasish Jana, Editor, CSI Communications

Your Question, Our Answer

“Take up one idea. Make that one idea your life - think of it, dream of it, live on that idea. Let the brain, muscles, nerves, every part of your body, be full of that idea, and just leave every other idea alone. This is the way to success.” ~ Swami Vivekananda

Subject: C++ example

Sir, I have a couple of questions in C++ about which I have doubts. Could you please answer these questions for me? I will be very grateful if you kindly do so.
1. A data member of a class cannot be declared as friend. Why?
2. What should the overloaded operator [ ] return?
3. Can virtual function be declared as a static member of a class?
4. Should destructors be declared virtual as a good programming practice?
5. An overloaded function can have default arguments?
Thanks.
Sourideb Bhattacharya
Student, BE (Instrumentation & Electronics Engineering) 3rd year
Jadavpur University, Kolkata

A. Here are the answers to your questions:

1. Friendship is a grant of access rights. A data member cannot access other data or call a function; only functions can. So there is no point in giving access rights to data.

2. The overloaded operator [] is meant for accessing a single element in a list of multiple elements, like an array. So the return type should be a reference to the element type, e.g. int & or, in template form, T&. The code snippet in template form is given below:

    template <class T>
    class Array
    {
    private:
        T *data;
        int size;
    public:
        Array(int s) {
            size = s;             // remember the element count
            data = new T[size];
        }
        ~Array() {
            if (data) {
                delete [] data;
            }
        }
        T& operator [] (int indx) {
            if ((indx < 0) || (indx > size - 1)) {
                .... raise exception ...
            }
            else {
                return data[indx];
            }
        }
    };

Here, the overloaded operator [] returns the actual data element by reference. Otherwise, we could not use the element as a modifiable value, as in:

    Array<int> a(10); // int array of 10 elements
    a[0] = 4;         // assign 4 to first element of array

This would not have been possible if the operator returned by value. That would have created a copy of the actual element, and the value would have been assigned to the copy, leaving the original content unassigned.

3. No. A static member belongs to the class type and not to an object. Dynamic binding is applicable only to objects: the function that runs depends on the dynamic type of the object behind the pointer (X or Y, depending on whether the object was created with new X; or new Y; where Y is a subclass of X). Since the virtual function mechanism needs an object, static members cannot be virtual.

    class X
    {
    public:
        virtual void vf() {
            cout << "X::vf" << endl;
        }
    };
    class Y : public X
    {
    public:
        void vf() {
            cout << "Y::vf" << endl;
        }
    };

Now, if we have a program snippet as below:

    X * p = new Y();
    p->vf();

This will print Y::vf and not X::vf. Here, the virtual function vf is applicable to objects of type X and Y and requires an object instance to be called on. Static won’t do.

4. Yes, always. Otherwise, in a situation where Y is a subclass of X and you have X* p = new Y, delete p would not call Y's destructor if X's destructor was not declared as virtual.

5. Yes. But overloaded operators cannot (the function call operator () is the exception).
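Answers 2 to 4 can be tried out with the sketch below. It is an illustration under stated assumptions, not the column's own code: the index type is `std::size_t` rather than `int`, the elided "raise exception" step is filled in with `std::out_of_range`, and `call_log` and `run_through_base_pointer` are hypothetical helpers that record calls instead of printing with `cout`, so that the order of events can be inspected.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical fixed-size array, modelled on the Array template in answer 2.
template <class T>
class Array {
public:
    explicit Array(std::size_t s) : size_(s), data_(new T[s]()) {}
    ~Array() { delete[] data_; }
    Array(const Array&) = delete;             // copying is left out of this sketch
    Array& operator=(const Array&) = delete;

    // Returning T& lets callers assign through the subscript: a[0] = 4;
    T& operator[](std::size_t indx) {
        if (indx >= size_) {
            throw std::out_of_range("Array index out of range");
        }
        return data_[indx];
    }

private:
    std::size_t size_;
    T* data_;
};

// Log of calls, so virtual dispatch and destruction order can be observed.
inline std::string& call_log() {
    static std::string log;
    return log;
}

class X {
public:
    virtual ~X() { call_log() += "~X "; }     // virtual destructor (answer 4)
    virtual void vf() { call_log() += "X::vf "; }
};

class Y : public X {
public:
    ~Y() override { call_log() += "~Y "; }
    void vf() override { call_log() += "Y::vf "; }
};

// Calls vf() and deletes a Y through an X*, returning what was executed.
std::string run_through_base_pointer() {
    call_log().clear();
    X* p = new Y();
    p->vf();      // dynamic binding selects Y::vf
    delete p;     // runs ~Y and then ~X, because ~X is virtual
    return call_log();
}
```

With `Array<int> a(10); a[0] = 4;` the stored element itself changes, because the subscript hands back a reference. If `~X` were not virtual, deleting a `Y` through an `X*` would be undefined behaviour in standard C++, which is the situation answer 4 warns about.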

Send your questions to CSI Communications with subject line ‘Ask an Expert’ at email address [email protected]

Happenings@ICT
H R Mohan, AVP (Systems), The Hindu, Chennai
Email: [email protected]

ICT News Briefs in April 2012

The following are the ICT news and headlines of interest in April 2012. They have been compiled from various news and Internet sources including the financial dailies - The Hindu, Business Line, Economic Times.

Voices & Views
• In a couple of years, 85% of people will be using smartphones - Microsoft India Chairman.
• Chinese hacker is responsible for cyber-attacks on Government of India, military research organisations and shipping companies - Trend Micro.
• Global handset shipments will increase 29% from 1.7 billion in 2012 to 2.2 billion in 2016 of which smartphone to touch 1 billion - ABI Research.
• Emerging markets to spend $1.22 trillion (representing 31% of the worldwide total) on IT in 2012 - Gartner.
• Indian enterprise software market will grow 13% in 2012 with revenue of $3.22 billion – Gartner.
• Tablet sales touch 4.75 lakh in 2011 - CyberMedia Research.
• The US Citizenship and Immigration Services has received about 22,000 petitions (against the cap of 65,000) for H-1B work visas in the first four days.
• Computer Society of India to promote free software - CSI president Satish Babu.
• ‘I warned Raja against advancing 2G cut-off date' - Ex-Telecom Secretary.
• Media tablet sales will double this year globally to 12 crore units from 6 crore units in 2011 - Gartner.
• The Indian logistics industry is estimated at $130 billion and is expected to grow to $385 billion in the next four to five years - Mr P. Srikanth Reddy, Chairman, Four Soft.
• Publishers reach settlement with US Justice Dept on e-book pricing.
• By 2015, the market for ‘big data' technology and services globally will reach $16.9 billion up from $3.2 billion in 2010. Every day, 2.5 quintillion bytes of data are created – IDC.
• Social networking sites should set up servers in India – Rajasthan CM Gehlot.
• Karnataka's IT exports zoomed nearly 50% to touch Rs. 1.3 lakh crore in 2011-12.
• ‘Data breach costs Indian organisations Rs. 5.35 crore annually’ – Symantec.

Telecom, Govt, Policy, Compliance
• Govt will help fund buys of foreign firms with high-end cyber security technology.
• Aakash-II, sub $40 Android tablet launch likely in May – Sibal.
• Supreme Court rejects 2G operators’ review petition.
• Airtel rolls out 4G at Kolkata, to offer high speed Internet services.
• DoT panel sees merit in merger of BSNL, MTNL.
• Centre may clear Karnataka's plan to set up IT investment region at an estimated investment of Rs. 90,000 crore. The project would generate about 1.1 million direct and 2.7 million indirect jobs.
• AICTE and Microsoft announced the implementation of Microsoft Live@edu for all the technical colleges in India.
• TRAI wants licensing powers under new unified regime.
• The future of Aakash tablet hangs in balance as Datawind and QUAD Electronics have locked horns over alleged violation agreements.
• Mobile ARPUs start rising for first time in many years.
• DoT asks telcos to comply with new tower radiation norms.
• TRAI sets quality norms for mobile banking services.
• TRAI makes one ‘per second’ plan mandatory.
• Prospects brighten for silicon wafer fab units as global firms offer support.
• TRAI sets base price for 2G spectrum at 10 times 2008 rate with price varying between Rs. 3,622 and Rs. 14,480 crore per megahertz of airwaves.
• Panel set up to frame norms for telecom firms for issuing SIM cards.
• TRAI launches online facility (www.tccms.gov.in) to monitor consumer complaints.

IT Manpower, Staffing and Top Moves
• Cyrus Mistry and O.P. Bhatt (Ex. SBI Chairman) join TCS board.
• Progress Software to help engineering colleges in setting up incubation centres in Hyderabad.
• Potential job losses in telcos 'enormous' - HR Experts.
• Hiring of NRI professionals up 5% in Jan-Mar 2012.
• Infosys BPO to recruit 13,000 across 18 locations. Also plans to hire 35,000 people this fiscal.
• Steelwedge Software to raise India headcount from 180 to 1150 by 2016.
• Walmart Labs to hire 200 engineers.
• Uninor employees take to the streets to save company.
• TCS employee addition at all-time high with a gross addition of 70,400 employees in the year ending March 2012.
• 150 Bangalore staff hit in Yahoo!'s 2,000 cut globally.
• SingTel Global (India), to expand its operations in five more cities, including Jaipur and Ahmedabad, and double its workforce by 2014.
• Tata Elxsi to increase headcount at Bangalore lab.
• TCS chief, Mr N. Chandrasekaran, to assume the office of Chairman of Nasscom.
• IT companies step up hiring of engineering graduates. The average salary increased by about 10% compared to last year and in the range of Rs. 3.05 lakh to Rs. 3.25 lakh per annum.

Company News: Tie-ups, Joint Ventures, New Initiatives
• Cisco is considering to set up a manufacturing and services unit in Maharashtra.
• Wipro asks component vendors to disclose emission data as part of Green IT initiatives.
• HCL Info launches operations in Qatar.
• Micromax joins the tablet war with its FunBook priced at Rs. 6,499.
• Facebook’s mobile app now in seven Indian languages.
• Local search engine hudku.com launched.
• Reliance emerges as the first telecom operator in the country to offer tablets on both the 3G and CDMA networks after launching the CDMA tablet.
• Kaspersky comes out with suggestions on how to protect your Mac OS. Will be useful to 10 crore Mac OS X users around the world.
• Facebook buys Instagram – smartphone photo sharing application for $1 billion.
• Four Soft bets big on cloud-based product for logistics sector.
• MonsterIndia launches app for mobiles.
• HP unveils ‘converged cloud' services.
• Wipro to provide tech services for San Francisco Marathon.
• Green Platinum rating for Infosys.
• Now, ‘Google Drive' to take on rivals' cloud storage service.
• Samsung overtakes Nokia to become top selling phone brand globally.
• Zenith launches TigerCloud.

CSI Report
Prof. Dipti Prasad Mukherjee* and Dr. Dharm Singh**
* RVP, Region II
** Member SIG-e-Agriculture, CSI

* CSI Region II Meeting at Kolkata

A regional meet of the office bearers of different chapters of Region II was organized at the Indian Statistical Institute, Kolkata on Sunday, March 25, 2012. Representatives from the Patna (Prof. A K Nayak), Durgapur (Prof. Asish Mukhopadhyay), Siliguri (Dr. Ardhendu Mandal) and Kolkata (the current and incoming chairmen Mr. Sushanta Sinha and Dr. Debasish Jana) chapters were present. The meeting was also attended by the CSI Secretary Prof. H R Vishwakarma, Division III Chair Prof. Debesh Das, the regional student coordinator Prof. Phalguni Mukherjee and national nomination committee member Mr. Subimal Kundu.

Prof. Dipti Prasad Mukherjee, Regional Vice-President, Region II, welcomed the gathering and urged the participants to increase CSI activity in Eastern India. Prof. H R Vishwakarma discussed the encouraging growth of CSI membership across India except in the eastern region. The problems faced by smaller chapters like Durgapur and Siliguri were discussed in detail. The possibility of obtaining some seed funds from the CSI headquarters and A-category chapters for smaller chapters was explored at length. A number of senior CSI members present at the get-together expressed their concerns regarding the image of CSI and suggested more quality programs for enhancing the CSI brand value. A set of activities was planned for the Patna, Durgapur and Siliguri chapters. The meeting ended on a positive note of leveraging the potential of Region II in expanding the reach of CSI.

** Special Interest Group on e-Agriculture
Annual Report: 1 April 2011 to 31 March 2012

Background
The Special Interest Group on e-Agriculture was formed in January 2011. The indirect benefits of IT in empowering the Indian farmer are significant and remain to be exploited. The Indian farmer urgently requires timely and reliable sources of information for taking decisions. At present, the farmer depends on decision inputs trickling down from conventional sources, which are slow and unreliable. The changing environment faced by Indian farmers makes information not merely useful, but necessary to remain competitive. ICT will be of great importance for the 60 percent of the population that depends on agriculture, as a part of rural development that is isolated from the urban sector, thereby bridging the digital divide.

Objectives
• To transform technological intervention to increase agriculture production and productivity by ICT.
• To empower the farmers to take quality decisions that will improve agriculture and allied activities.
• To research and develop strategy of ICT application in agriculture and allied activities.

Activities: Events – 2011-2012

Host Institute: SIG-WNs, SIG-e Agriculture, Div IV, Udaipur Chapter, CSI, IEI, WFEO, CTAE, TINJR and Co-Sponsored by IEEE Delhi Section
Conference and Theme: A three days International Conference on Emerging Trends in Networks and Computer Communications (ETNCC2011) was organized
Date and Location: 22-24 April, 2011 at the CTAE Udaipur, India.

Host Institute: SIG-WNs, SIG-e Agriculture, Udaipur chapter and MPUAT
Conference and Theme: Motivational and expert series of lectures. Speakers: Dr. S. Reisman, President, IEEE Computer Society (Cyber lecturer), Dr. Dharm Singh, Convenor SIG-WNs CSI, Dr. YC Bahtt, Convenor SIG-e-Agriculture
Date and Location: 5th May 2011, CTAE, Udaipur

Host Institute: Udaipur Chapter, SIG-WNs, SIG-e-Agriculture, IEI-ULC, CTAE and SGI
Conference and Theme: First CSI Rajasthan State IT Convention and National Conference with Celebration of World Telecommunication and Information Society Day 2011 on “WTISD 2011: Better life in rural communities with ICTs”
Date and Location: May 17-19, 2011, SIGs Campus Udaipur

Host Institute: SIG-WNs and e-Agriculture CSI, IEI ULC and TINJR
Conference and Theme: National Seminar on IP Multimedia Communications
Date and Location: October 14-15, 2011, Udaipur

Host Institute: IEI, SIG-WNs & e-Agriculture CSI, CTAE and TINJR
Conference and Theme: All India Seminar on Information and Communication Technology for Integrated Rural Development
Date and Location: February 11-12, 2012

Peer recognition achieved within India/globally
This group is a new one and is presently taking up some projects on the research and development side to develop electronic planters for precision farming. More collaborative work is envisaged once the activities are strengthened.

Plans 2012-13
1. Technical Session in 26th National Convention of Agricultural Engineers in January 2013.
2. Seminar exclusively on theme of e-Agriculture planned at end of 2012.

Dr. R. Srinivasan, Past President and Fellow of CSI, has been appointed as Professor Emeritus at SRM University. Currently he is also serving as Dean, Research & PG Studies at RNS Institute of Technology, Bangalore. Dr. Srinivasan is a Member of IEEE, Member of IEEE Computer Society, Fellow of IETE (India) and Life Member of ISTE.

CSI Journal of COMPUTING
ISSN 2277-6702 e-ISSN 2277-7091
www.csijournal.org

Dear CSI Fraternity,

CSI has launched the ‘CSI Journal of Computing’, with truly original papers from the vibrant community of academia, industrial researchers, innovators, and entrepreneurs around the world. The first issue was released by the Honorable Chief Minister of Maharashtra, Shri Prithviraj Chavan, on the CSI Foundation Day 2012 at Mumbai. The Journal covers topics related to Computer Science, Information Technology, several boundary areas among these and other fields. It is managed by an International Editorial Board. Initially each volume will have four issues.

Contents of Vol. 1, No. 1, March 2012
• Efficient Face Recognition using Local Active Pixel Pattern (LAPP) for Mobile Environment: Mallikarjuna Rao G, Praveen Kumar, Vijaya Kumari G, and Babu G R
• Scalable Lock-Free FIFO Queues using Efficient Elimination Techniques: V V N Pavan Kumar and K Gopinath
• Direct Approach for Machine Translation from Punjabi to Hindi: Gurpreet Singh Josan and Gurpreet Singh Lehal
• Markov Modeling in Hindi Speech Recognition System: A Review: R K Aggarwal and M Dave
• The Genome Question: Moore vs. Jevons: B Mishra
• Hash Based Key Indexing: A New Approach to Rainbow Table Generation: Deepika Dutta Mishra, C S R C Murthy, A K Bhattacharjee, and R S Mundada
• Bioinformatics for Next Generation Sequencing: Srinivas Aluru

Subscription rates
Individual: CSI member Rs. 400/Volume or US$20/-; Non CSI member Rs. 800/Volume or US$25/-
Library: CSI member Rs. 600/Volume or US$50/-; Non CSI member Rs. 1000/Volume or US$75/-

For bulk discounts and other related information you may contact Mr. SM Fahimuddin Pasha, ([email protected]) Coordinator.

I invite you all to reserve your copy as soon as possible through www.csijournal.org/subscription. Looking forward to your paper contributions to the Journal and subscriptions.

Advertisements and Sponsorships
To make the Journal and publications from CSI vibrant and offer Open Access for the community with a minimal subscription for the print versions, we solicit sponsorships for the journal. Note that the open access version offers very affordable advertisements. For advertisement rates, please refer to www.csijournal.org (also on the cover pages of the journal). Here are some of the varieties of Sponsorship possibilities for Software Houses, Universities, and Government organizations.

Sponsorships: Rate and Numbers/year, Benefits

Platinum: Rs. 100,000/- (Numbers: two)
a. Online advertisement of 1 full page - whole year
b. Half page - printed version

Gold: Rs. 75,000/- (Numbers: four)
a. Online advertisement of 1/2 page - whole year
b. 1/4 page - printed version

Silver: Rs. 50,000/- (Numbers: eight)
a. Online advertisement of 1/4 page - whole year
b. One column (1/8 page) - printed version

Institutional Memberships: Rs. 25,000/-
a. The member institution's name will be carried on the web as well as in the printed version
b. 1/4 page online advertisements - whole year

CSI has vibrant distributorship across the country with 66 chapters, 385 student branches, and over 80,000 memberships across the country. Looking forward to generous sponsorships and institutional memberships from the community to keep CSI publications vibrant.

Satish Babu
President, Computer Society of India

Prof. R K Shyamasundar (TIFR)
Editor-in-Chief, CSI Journal of Computing
Chairman, CSI Publication

CSI News

From CSI Chapters » Please check detailed news at: http://www.csi-india.org/web/csi/chapternews-May2012

GHAZIABAD (REGION I)
Speaker(s): Dr. Pankaj Jalote, Mr. Sunil Asthana, Mr. Amit Goenka, Mr. Navneet B Gupta
Topic: 7 April 2012: 10th National IT Seminar “Recent Trends in Software Technologies (RTST-2012)”
Gist: Dr. Jalote discussed the definition of engineering, especially software engineering, and the skills required. He discussed the roles of the science-based researcher and the engineering researcher, the abilities of a researcher, and the difference between research & research manager. Mr. Sunil Asthana spoke about developments in the IT industry and covered various aspects of Mobile Commerce, Mobility, Mobile Applications, and Cloud Computing. There were two technical sessions: Emerging Trends & SIG Role in Software development, and Recent Advances in Software testing, maintenance, and quality assurance.
Photo: Dr. Pankaj Jalote, delivering the talk during inaugural session of RTST-2012 (L to R: sitting) Dr. A K Puri, Sh. Sunil Asthana, Dr. Vineet Kansal, and Dr. Rabins Porwal

GWALIOR (REGION III)
Speaker(s): Jayu S Bhide
Topic: 1 to 3 March 2012 and 14 March 2012: A program on “HAM Radio”
Gist: A program on HAM Radio was jointly conducted by I.P.S. College Gwalior & CSI Gwalior Chapter from 1st to 3rd March and later on 12th March by R.J.I.T. Teknanpur. Mr. Jayu S. Bhide spoke and organized a live demonstration of HAM Radio. Attendees learnt how to set up the HAM Station. Students asked questions regarding security and operation of HAM Radio and the speaker answered the queries.
Photo: Mr. J S Bhide and students, during HAM Radio practical

CUTTACK (REGION IV)
Speaker(s): Dr. Lalit Mohan Patnaik, Mr. Sushant Panda
Topic: 5-7 March 2012: Conference and Student Convention on “Cloud Computing”
Gist: The objective of the conference was to provide an overview of Cloud Computing, its evolution, when and why to use cloud services, some major market players and what they provide, and to familiarize the participants with the software and services available in the Cloud public domain. The first day was an Industry day, the second day was devoted to technical workshops, and on the third day selected R&D papers were presented by the conference participants.
Photo: Inauguration of the Conference (L to R): Mr. Sanjay Mohapatra, Prof. (Dr.) R Misra, Prof. (Dr.) L M Patnaik, IISC Bangalore, Er. S Rout, Mr. Sushant Panda, and Dr. K C Patra

BANGALORE (REGION V)
Speaker(s): Mr. Srikantan Moorthy, Sr. VP & Group Head, Education & Research, Infosys Technologies
Topic: 17 March 2012: i3 for i3 Club Launch @ Infosys Campus, Bangalore
Gist: Mr. Srikantan Moorthy delivered the keynote address on “Top Employability Parameters”. Participants took up three key topics mentioned by Mr. Moorthy, viz. A. Building competency among faculty, B. Building competency among students, and C. Improving industry interaction.
Photo: Participants in group discussion

COIMBATORE (REGION VII)
Speaker(s): Dr. Narasimha Murthy K Bhatta and Mr. Mahesh Kolar
Topic: 10 March 2012: Industry Interaction Day on “Future of Indian IT Sector: Trends, Opportunities and Challenges”
Gist: A technical session on ‘Cloud Computing’ was handled by Dr. Narasimha Murthy K Bhatta. The second technical session, on ‘Mobile Technologies’, was delivered by Mr. Mahesh Kolar. A panel discussion was held on the theme “Future of Indian IT Sector: Trends, Opportunities and Challenges”. Various trends, opportunities and challenges of the Indian IT industry were discussed by the panel members.
Photo: (L to R) Mr. Ashok Bakthavathsalam, Mr. Mahesh Kolar, Mr. R Shekar, Prof. S Balasubramanian, Mr. Kumar Krishnasami, Dr. Narasimha Murthy K Bhatta, and Mrs. Maya Sreekumar

TIRUCHIRAPPALLI (REGION VII)
Speaker(s): Prof. S Ravimaran, Mr. Ramachandran, Dr. S Selvakumar
Topic: 15 March 2012: National Level Technical Symposium on “Emerging Trends in Computing, Informatics and its applications - COMBLAZE 2k12”
Gist: Mr. Ramachandran highlighted the importance of communication skills and hard work. He advised the student community to upgrade their skills continuously. Dr. S. Selvakumar briefed the audience on cyberspace security, network security and technology updates in this cyber era. He explained various security requisites and security measures. He explained the techniques with real world scenarios, the latest tools and software for security, and mentioned several resources and references to learn more on the subject.
Photo: Dr. Selvakumar at workshop

From Student Branches » http://www.csi-india.org/web/csi/chapternews-May2012

ABES ENGINEERING COLLEGE, GHAZIABAD (REGION-I)
Topic: 24 March 2012: An intra-college technical paper presentation competition (Techsurge-2012)
Gist: An intra-college technical paper presentation competition was organized in collaboration with the Ghaziabad Chapter. The objective was to create awareness among students about emerging technologies and encourage them to take up research on related subjects. Approximately 130 students from different courses participated in this activity. A total of 21 papers were presented during Techsurge-2012.
Photo: Guests on dais at ABES College, Ghaziabad

DR. ZAKIR HUSAIN INSTITUTE, PATNA (REGION-II)
Speaker(s): Prof. (Dr.) A K Nayak and Dr. M N Hoda
Topic: 26 March 2012: One-day Seminar on “Twenty First Century Professionals: Industry Expectations”
Gist: In his Inaugural Address, Prof. (Dr.) A. K. Nayak advised students to have dedication, devotion & determination to achieve excellence in the profession. Prof. Hoda stressed that quality of computer education is the need of the hour for catering to industry demand. He told students to make a sincere effort to develop effective ability within themselves, since students passing out are not meeting the expectations of organizations.
Photo: The dignitaries sitting on the dais during the workshop

SARDAR VALLABHBHAI PATEL INSTITUTE OF TECHNOLOGY (SVIT), VASAD (REGION-III)
Speaker(s): Prof. Virendra Ingle and Prof. Rinku Chavada
Topic: 15-16 March 2012: Workshop on "Android-based Mobile Application Development"
Gist: The workshop covered topics like introduction to Android, the anatomy of Android applications, UI screen elements and layout, and Android data and storage APIs as well as Location-based Services APIs.
Photo: Participants at the workshop

R.V. COLLEGE OF ENGINEERING, BANGALORE (REGION-V)
Speaker(s): Mr. Partha and Dr. S Sathyanarayana
Topic: 19 March 2012: Motivational Talk on “Software Testing - Career”
Gist: It was an occasion for distributing certificates to students who cleared the “Software Testing Certification Examination”. Mr. Partha said that people look at testing with a different mindset, that we need to think from the perspective of the customer, and that testers are better coders. Dr. S Sathyanarayana advised the participants to make better use of opportunities.
Photo: Participants attending motivational talk on “Software Testing”

ANURADHA ENGINEERING COLLEGE, BULDHANA (REGION-VI)
Speaker(s): Prof. Avinash S Kapse and Dr. S V Agarkar
Topic: 28 February 2012: National Science Day Celebration “Project Exhibition & Debate Competitions”
Gist: Prof. Avinash S Kapse talked about the importance of projects in the globalization of knowledge & about projects needed by society. Dr. S V Agarkar gave guidance to students and answered their queries. Students explained their projects.
Photo: Inaugural Session: (L to R) Prof. Avinash Kapse, Dr. S V Agarkar, Shri. Siddheshwarji Wanere, and Students

Speaker(s): Mr. N B Mapari, Prof. Avinash S Kapse, Dr. S V Agarkar, and Prof. K H Walse
Topic: 3-4 March 2012: Two-days Workshop on “Understanding and using Android platform”
Gist: Prof. Avinash S Kapse talked about the importance of the workshop & appealed to students to improve their personality. He suggested use of the Android technology in future life. He also spoke about globalization of knowledge. Dr. S V Agarkar spoke about the importance of Android, its applications & technology. Prof. K. H. Walse talked about the importance of Android technology in future.
Photo: Inaugural Session: (L to R) Mr. D G Vyawahare, Dr. Bhattachrayya, Dr. S V Agarkar, Prof. Avinash Kapse, Prof K H Walse, and Mr. Dhaval Gulhane

CSICSC I CoCCommunicationsmmunications | MayMay 22012012 | 45 SPEAKER(S) TOPIC AND GIST K. K. WAGH INSTITUTE OF ENGINEERING EDUCATION & RESEARCH, NASHIK (REGION-VI) 13-14 March 2012: National Level Technical Symposium "Equinox 2k12" Various events conducted such as - • CODE-COGS: Programming Contest, • SPIDER -WEB: Web Designing Contest • TECHNO HUNT: Project Competition • SCRATCH YOUR BRAIN: Aptitude & Group Discussion • NET CONNECT: Networking Workshop • WORLD WAR III: Robo Wars. Chief Guest Mr. Piyush Somani, Prof. Dr. S S Sane, Faculti es, and Student Member

17 March 2012: International Conference on “Emerging Trends in Computer Science and information Technology-2012 (ETCSIT-2012)” Professionals, academic researchers presented and discussed their conceptual and experimental work. The conference provided a forum for eminent academicians, technologist, scientists and researchers to exchange their ideas on the latest developments and future trends in Computer Science and IT. ETCSIT-2012 also provided a platform for UG and PG students & encouraged them to preset their work based on fi nal year project.

(L to R): Prof. N M Shahabe, Dr. Uday Wad, Dr. Parvati Rajan, Dr. Bhargave, Prof. Dr. S S Sane, Mr. Shekhar Paranjape, Prof. S M Kamalapur, Prof. M B Jhade

MET’S INSTITUTE OF ENGINEERING, NASHIK (REGION-VI)

Dr. M U Kharat, Mr. Shirode, and Dr. V P Wani
9-10 February 2012: Student Convention
For the first time in 12 years, the CSI Regional Convention for Region VI was held. Mr. Shirode enlightened students with his experiences of all-round engineering and his 360 degree principle for looking at the world. Dr. V P Wani, with his motivating words, asked the students to give 100% effort in whatever competition they participate in and to make the competition tougher. During the Convention, IT Quiz, Paper Presentation, Circuit Trap, Website Design, and Group Discussion contests were organized.

(L to R): Dr. Shirish S Sane, Dr. V P Wani, Mr. Shirode, Mr. Anil Shukla, Mr. Mangesh Pisolkar, and Prof. Aruna Deogire

20 March 2012: Project on “MLearning Framework for Multiple Platforms”
The MLearning project won first prize in the CSI-Discover Thinking National Project Student Contest and Expo 2012. Arpeet Kale, Saurabh Rawal, Jaspreet Kaur Kohli, and Komal Bafna, students of Computer Engineering, developed this mobile application. They developed a framework that will deliver engineering education on mobiles through high-quality 2D-3D animations, interactive learning content, and many more such features.

Winners: Arpeet Kale and Jaspreet Kaur Kohli with Dr. Trimurthi and other Judges

GOVERNMENT ENGINEERING COLLEGE (GEC), BARTON HILL, TRIVANDRUM (REGION-VII)

Mr. Nabeel Koya A, Dr. K C Chandrasekharan Nair, and Mr. Shibin George
15 February 2012: One-day Technical Festival "Inceptra 2012"
Mr. Nabeel Koya deliberated on Cyber Security and Forensics in the current scenario. Dr. K C Chandrasekharan Nair talked about Student Entrepreneurship and the opportunities open to students. Mr. Shibin George conducted a general quiz competition. Events included Bug Hunt, a technical competition ranging from cryptography to debugging; LOL Codes, a coding test on rare and useful programming languages; and Cascade Coding, a challenge on parallel programming. Competitions on Project Presentation and Gaming were also conducted as part of the festival.

(L to R): Mr. Anand Kumar, Prof. Jayaprakash P, Prof. G Ramachandran, Dr. Sheela S, Prof. Balu John, and Ms. Sreelakshmi G S

JYOTHI ENGINEERING COLLEGE (JEC), THRISSUR, KERALA (REGION-VII)

Dr. Gylson and Mr. Chaitany Khanpur
24-25 February 2012: Two-day Workshop on "Cloud Computing"
Principal Dr. Gylson Thomas inaugurated the two-day National Workshop on "Cloud Computing". Mr. Chaitany Khanpur gave an in-depth, interactive class on cloud computing, starting from the basics of cloud computing and grid computing. Students also got a hands-on session on implementing a private cloud.

During the workshop

KALASALINGAM UNIVERSITY, TAMILNADU (REGION-VII)

Dr. Maluk Ahamed and Dr. Kalaiselvi
28-29 March 2012: Digital Dreams ’12 – National Level Technical Symposium
Dr. Maluk advised students to acquire knowledge about their field by attending symposiums and seminars and stressed the importance of maintaining quality standards. In the Symposium, 51 papers were presented. Themes were Distributed Computing, Network Technology, Image Processing, and AI techniques. Dr. M A Maluk Ahamed delivered a lecture on Distributed Computing and Dr. Kalaiselvi spoke on “Medical Imaging”. Other events included Technical Quiz, C-Debugging, Trailer Presentation, Web Designing, Situation Manager, and Treasure Hunt.

Dr. M A Maluk Mohammed releases the souvenir of Technical Symposium

MAR BASELIOS COLLEGE OF ENGINEERING (MBCET), TRIVANDRUM (REGION-VII)

24 February 2012: Intercollegiate Code Debugging Contest “Neosoft”
The competition consisted of two rounds: the prelims and the final round. The prelim was a written round, testing the logical and analytical skills of the participants. The final round was a practical round consisting of three questions, testing the logic, innovative thinking, and teamwork of the participating teams.

Code Debugging competition in progress

NATIONAL ENGINEERING COLLEGE (NEC), KOVILPATTI (REGION-VII)

Mr. M K Anand
22 March 2012: Inaugural Function – “National Conference NACCA’12”
Mr. M K Anand inaugurated the conference and addressed the gathering. In his speech, he advised the students not only to look for jobs but also to concentrate on self-employment with innovative ideas. The inaugural session was followed by technical sessions in which advanced topics like Grid Computing, Mobile Computing, Soft Computing, and Distributed Computing were presented.

Release of Conference Proceedings by Chief Guest Mr. M K Anand (L to R): Ms. E Siva Sankari, Dr. D Manimegalai, Dr. P Subburaj, Mr. M K Anand, Dr. Kn. K S K Chockalingam, and Mr N BalaSubramanian

The following new student branches were opened, as detailed below:

REGION I
Model Institute of Engineering and Technology (MIET), Jammu - The first CSI student branch in Jammu & Kashmir was inaugurated on 24th March, 2012. On this occasion, a CSI convention on Disaster Management and e-Governance was organized. Two projects by MIET students were showcased on the occasion: “Social Network Promoting Social Responsibility” by Sajan Sridhar and “Election Management” by Sumit Gupta. Prof. Ankur Gupta described several IT initiatives at MIET, including the filing of 3 patents; in-house development of 2 IT products; and 3 open source IT projects pertaining to learning management, campus ERP, and admission management systems.

REGION III
NRI Institute of Technology and Management (NRIITM), Gwalior - Dr. S K Gumasta gave an inaugural speech on the occasion of the opening of a new student branch at NRIITM on 17th February, 2012. A seminar was jointly organized by NRIITM, Gwalior and the CSI Gwalior chapter on this occasion.

REGION V
REVA Institute of Technology and Management (RITM), Bangalore - The inauguration of the REVA CSI Student Branch was held on 11th February, 2012. The Chief Guest of the function was T N Seetharamu, who inaugurated the student chapter. On the occasion, Mr. Suman Kumar delivered a talk on “Android – The Mobile Technology”, which was attended by a large number of students, faculty, and staff members of the college.

REGION VI
Institute of Management and Entrepreneurship Development (IMED), Pune - On 29th March, 2012, the inaugural ceremony of “IMED-Student Chapter-CSI” was held in the presence of Dr. M S Prasad and Dr. M V Shitole. The Chief Guest of the ceremony was Mr. C G Sahasrabuddhe. Mr. Amit Dangle was guest of honor.

REGION VII
S. Veerasamy Chettiar College of Engineering and Technology, Tirunelveli - The inaugural function of the student branch was organized on 29th February, 2012. The Chairman, Dr. V Murugaiah, presided over the function. Mr. Y Kathiresan spoke on the occasion on “Personal Effectiveness”.

CSI BRINGS MEMBERS AND OPPORTUNITY TOGETHER

Join CSI

Computer Society of India is the recognized association for Information and Communications Technology (ICT) professionals, attracting a large and active membership from all levels of the industry. A member of the Computer Society of India is the public voice of the ICT profession and the guardian of professional ethics and standards in the ICT industry. We also work closely with other industry associations, government bodies, and academia to ensure that the benefits of IT advancement ultimately percolate down to every single citizen of India. Membership demonstrates IT professionalism and gives a member the status and recognition deserved. Learn more at www.csi-india.org

I am interested in the work of CSI. Please send me information on how to become an individual/institutional* member.

Name: ______
Position held: ______
Address: ______
City: ______
Postal Code: ______
Telephone: ______
Mobile: ______
Fax: ______
Email: ______

*[Delete whichever is not applicable]

Interested in joining CSI? Please send your details in the above format to the following email address: [email protected]

CSI Calendar 2012

Prof. S V Raghavan
Vice President & Chair, Conference Committee, CSI

Date / Event Details & Organizers / Contact Information

May 2012 Events

22-26 May 2012
Workshop on Configuring and Administering Microsoft SharePoint 2010
CSI Mumbai Chapter
Mr. Abraham Koshy, [email protected]

24-27 May 2012
Certificate Course on PMP (Project Management) 4.0 (36 Hours of PDUs)
CSI Mumbai Chapter
Mr. Abraham Koshy, [email protected]

26-27 May 2012
Two-Day Workshop on "Secure Computing Systems"
CSI Division II [Software] and Military College of Telecommunication Engineering [MCTE], Mhow
Dr. T V Gopal, [email protected]

June 2012 Events

8-12 June 2012
Hands-on Workshop on Microsoft SharePoint 2010, Application Development
CSI Mumbai Chapter
Mr. Abraham Koshy, [email protected]

13 June 2012
Software Process Information Network (SPIN) Meet on the topic of Advance Agile Methodology (Scrum etc)
CSI Mumbai Chapter
Mr. Abraham Koshy, [email protected]

21-24 June 2012
Certificate Course on PMP (Project Management) 4.0 (36 Hours of PDUs)
CSI Mumbai Chapter
Mr. Abraham Koshy, [email protected]

July 2012 Events

26-28 July 2012
International Conference on Advances in Cloud Computing (ACC-2012)
CSI, Bangalore Chapter and CSI Division I
Dr. Anirban Basu, [email protected]
Dr. C R Chakravarthy, [email protected]

August 2012 Events

31 Aug-1 Sep 2012
3rd International Conference on Transforming Healthcare with IT
CSI Division II (Software), Hyderabad
www.transformhealth-it.org
Dr. T V Gopal, [email protected]

September 2012 Events

5-7 September 2012
International Conference on Software Engineering (CONSEG 2012)
CSI Division II (Software), Indore
www.conseg2012.org
Dr. T V Gopal, [email protected]

13-14 September 2012
Global Science and Technology Forum Business Intelligent Summit and Awards
CSI Division II (Software), Singapore
www.globalstf.org/bi-summit
Dr. T V Gopal, [email protected]

November 2012 Events

29 Nov-1 Dec 2012
Third International Conference on Emerging Applications of Information Technology (EAIT 2012)
CSI Kolkata Chapter Event at Kolkata, URL: https://sites.google.com/site/csieait2012/
D P Mukherjee/Debasish Jana/Pinakpani Pal/R T Goswami, [email protected]

December 2012 Events

1-2 December 2012
47th Annual National Convention of CSI (CSI 2012)
Organized by CSI Kolkata Chapter, URL: http://csi-2012.org/
Subimal Kundu/D P Mukherjee/Phalguni Mukherjee/J K Mandal, [email protected]

14-16 December 2012
International Conference on Management of Data (COMAD-2012)
SIGDATA, CSI, Pune Chapter and CSI Division II
Mr. C G Sahasrabudhe, Shekhar_sahasrabudhe@persistent.co.in

Please send your event news to [email protected]. Low-resolution photos and news without gist will not be published. Please send only 1 photo per event. Kindly note that only news received on or before the 20th of a month will be considered for publishing in the CSIC of the following month.

Registered with Registrar of News Papers for India - RNI 31668/78
Regd. No. MH/MR/N/222/MBI/12-14
Posting Date: 10 & 11 every month. Posted at Patrika Channel Mumbai-I

If undelivered return to: Samruddhi Venture Park, Unit No.3, 4th floor, MIDC, Andheri (E), Mumbai-400 093

47th Annual National Convention of the Computer Society of India Organized by The Kolkata Chapter December 1-2, 2012, Science City, Kolkata In conjunction with 2012 Third International Conference on Emerging Applications of Information Technology (EAIT-2012)

Call for Paper and Participation

Convention Theme: Intelligent Infrastructure
Convention Event: International Conference on Intelligent Infrastructure

The Computer Society of India Kolkata Chapter (CSIKC) cordially invites you to participate in the 47th Annual National Convention of CSI. While this event will follow the glorious footsteps of previous conventions, it will still be a unique event focussing on the theme of Intelligent Infrastructure.

CSI and CSI Kolkata Chapter: Formed in 1965, the CSI has been instrumental in guiding the Indian IT industry since its formative years. CSIKC is the oldest chapter, and the first CSI Annual National Convention was held in Kolkata at the Indian Statistical Institute in 1965. To commemorate the achievement of CSI, CSIKC will host CSI-2012. The event will comprise Plenary Sessions, Paper Presentations, and Panel Discussions.

Intelligent Infrastructure: Compelling changes in society and nature require unprecedented fusion between the physical and the virtual worlds. Today’s society is a complex system of systems; it is a combination of economic development, public safety, healthcare, energy and utilities, transportation, education, and various other systems. The function of intelligent infrastructure is to model as well as manage these complex interconnected systems based on a greater understanding of the interconnectivity and utilisation of the latest developments in ICT. The inter-disciplinary nature of intelligent infrastructure provides a great deal of opportunity for creative approaches to problem solving. The International Conference on Intelligent Infrastructure in CSI-2012 aims to provide a platform for fruitful deliberations on this theme of the hour.

The theme includes (but is not limited to) the following topics:
• Intelligent Infrastructure Applications
  ° Precision Agriculture and Smart Growth Systems
  ° Smart Grids and Wide Area Measurement Systems
  ° Intelligent Building Automation Systems
  ° Intelligent Energy and Water Management Systems
  ° Intelligent Manufacturing, Healthcare, Transportation Systems
• Intelligent Infrastructure Technologies
  ° Smart Structures, Federated Devices, Sensor Signal Processing and Modelling
  ° Miniature Wireless Sensors and Networks, Nanoscale Sensors
  ° Security Issues in Smart Infrastructures, Smart GIS
  ° Computational and Machine Intelligence Tools
• Intelligent Infrastructure Platforms
  ° Sensor Web-enablement, Sensor Data Analytics
  ° Management of Big Data and Associated Development Technologies
  ° Next Generation Data Centre Technologies for the Exascale Era

Proceedings: Original unpublished research articles, development notes, and position papers aligned with the theme of the convention will be published in the Proceedings of the International Conference on Intelligent Infrastructure. The author instructions for paper submission are available at http://csi-2012.org/.

Journal Special Issues: Extended versions of the selected papers presented in the conference will be published in Journals. CSI Journal of Computing (ISSN: 2277-7091) will publish a special issue on Intelligent Infrastructure after FAST TRACK review of selected papers from the conference.

Advisory Committee: R N Lahiri, Chair
Event Chair: Subimal Kundu
Organizing Committee: D P Mukherjee, Chair; S Sinha, Co-Chair
Program Committee: P Mukherjee, Chair; J K Mandal, Co-Chair
Finance Committee: R T Goswami, Chair; D Dutta, Co-Chair
Convention Committee: S Daspal, D P Sinha, S Roychowdhury, Avik Bose, Anirudhha Nag, Prashant Verma, Gurudas Nag, Gautam Hajra, Md Aliullah, Chinmay Ghosh, T Chattopadhyay, Subir Lahiri, Debasish Jana, Pinakpani Pal

Convention Website: http://csi-2012.org/
Paper Submission: Aug 30, 2012
Paper Acceptance: Sept 30, 2012

Please contact:
CSI Kolkata Chapter
5 Lala Lajpat Rai Sarani (Elgin Road), 4th Floor, Kolkata 700 020
Phone: 2281-4458
Telefax: 2280-2035
Email: [email protected]
Web: http://csi-kolkata.org/

Conference in Conjunction: 2012 Third International Conference on Emerging Applications of Information Technology (EAIT-2012), Nov 29 – Dec 01, 2012, Indian Statistical Institute, Kolkata
EAIT-2012 Website: https://www.sites.google.com/site/csieait2012

Media Partner for twin mega events EAIT-2012 and CSI-2012