Multilingualism for Cultural Diversity and Universal Access in Cyberspace: an Asian Perspective

Dr. Om Vikas
Senior Director & Head, Human Centered Computing Division
Ministry of Communications & Information Technology
6 C.G.O. Complex, Lodhi Road, New Delhi – 110003
Email: [email protected]
Website: http://tdil.mit.gov.in
Tel/Fax: +91-11-2436 3076

Om Vikas, Multilingualism for Cultural Diversity and Universal Access in Cyberspace: an Asian Perspective, UNESCO, 6-7 May 2005

Content

1 Catching up the knowledge wave

2 World divides digitally

3 Developing countries in Asia Pacing Up Slow

4 Digital Divide as They Behold

5 Technology Races to Human Brain

6 Linguistic and Cultural Diversity

7 Languages spoken in Asian Countries

8 Linguistic Scenario in India

9 Major Multilingual Technology Initiatives

10 Language Technology Initiatives in Asia

10.1 ICT in Local Languages

10.2 Digital Libraries

11 Technology Development for Indian Languages

11.1 Implementation Strategy

11.2 Industry Consortium CoILTech

11.3 Whither in Seven Initiatives

11.4 Media Lab Asia

11.5 Cultural Informatics

11.6 E-Readiness in Indian States

12 Beacon to Steps Ahead

13 Recommendations

14 Summing Up

Reference


Keywords: Multilingualism, local language computing, IT for masses, digital divide, cultural diversity, localization, language technology initiatives, e-governance, e-rural prosperity.

Abstract

The kaleidoscope shows a diversity of colors and patterns. The world is full of bio-diversity, linguistic diversity, cultural diversity and so on. On the trail of the industrial revolution, a new knowledge-based society is emerging. Superior technologies become available that add speed, productivity and quality. Issues of affordability, availability and access haunt the nations that trail behind. ICT, as an enabling technology, brings hope. Improved ICTs appear rapidly, with lower prices, higher performance and smaller size. This has fueled the process of transforming diversity into polarity – the digital divide – which is sprawling and threatening to become a menace to all. “IT for masses” is the new recipe to arrest the digital divide.

An increase in PC penetration and in the number of Internet users is often considered a sign of bridging the digital divide. But the knowledge divide will persist unless the efficacy of ICT utilization improves. ICT efficacy will depend on the ease of handling ICT appliances in the local language and of communicating seamlessly with other communities. This may necessitate a shift in focus from “Bridging Digital Divide” to “Digital Unite and Knowledge for all”, and also from “being Consumers” to “becoming Creators”.

The prevailing localisation efforts are top-down, influenced by western culture and ethos; global icons suppress local content. But localisation must be bottom-up, local-to-global: it must handle locale-specific issues, use reusable components, and look for new markets and cost models. Developing nations should not remain mere recipients of content localized by others; they must become localizers and make their content accessible to all. Middleware technology helps in a rapidly changing technology environment with a variety of applications, and steps need to be taken to localise middleware. Unicode, for instance, has emerged as a consistent character encoding scheme for the living languages of the world. Cross-Lingual Information Retrieval will facilitate access to multilingual information resources. Universal access will depend on the affordability of ICT appliances. We detail the multilingual scenario of India, with relevant references to language policies and local language computing initiatives in other Asian countries.

The new order of the knowledge-based society should be Rise, Raise and Race, which implies promoting collectivism, collaboration for innovation, and striving towards universalization of creativity.


1. Catching up the Knowledge Wave or Trailing Behind?

With the advent of new technologies – the writing system (3000 BC), the printing press (1450 AD), the steam engine (1780 AD), electricity (1890) and the personal computer (1970) – societies were rapidly transformed. Knowledge increased. The societies that participated actively in the process of knowledge generation became advanced. Disparity in the sharing of knowledge is distancing societies from one another.

Knowledge defies economic principle of scarcity. Knowledge is not scarce in traditional sense. The more you use it and pass it on, the more it proliferates. It is "infinitely expansible" or "non-rival in consumption". It can be replicated cheaply and consumed over and over again. Knowledge is more difficult to measure than traditional inputs such as steel or labour. Future prosperity of rich economies will depend both on their ability to innovate and on their ability to adjust to change.

The 20th century brought unprecedented erosion in the knowledge of world communities. Of an estimated 10,000 world languages in 1900, about 6,700 languages were surveyed in 2000. By the middle of the 21st century, almost all of the world's many ecosystems will be occupied by people who have no indigenous language capable of describing, using, or conserving the diversity that remains. Two percent of the world's languages are becoming extinct every year. There is a worldwide, unquantifiable erosion of cultural participation, knowledge and innovation. With the loss of a language, we lose art and ideas, scientific information and technological innovation capacity. World-level literacy is improving. More people can read than ever before, but fewer people create stories. We have moved from being creators to being consumers at the very time when technology could have amplified our creative capacities.

According to a UNESCO study (1999) of 65 languages for which data was available for both 1980 and 1994, 49 of the languages (75 percent) had experienced a real decline in the number of works translated from them into other languages. The proportion for English rose from 43 percent in 1980 to over 57 percent in 1994. The share held by the top four translated languages (English, Spanish, French and German) rose from 65 percent in 1980 to 81 percent in 1994. The UNESCO study also shows that cultural erosion is not confined to the collapse in translated languages. There is also a collapse in quality. According to a UNESCO study of the world’s 140 most published authors, 90 out of 140 were English writers in 1994, compared to 64 out of 140 in 1980. There is a collapse in authorship, translation and quality in other languages.

Over 25 million pages of research in science and technology are added every year. Most of these are in English, a few in European languages and Japanese, and a negligible number in the rest. On the Internet, more than two-thirds of the content is in English alone. This is a natural consequence of the fact that English has become the lingua franca of science and technology, and that research is conducted and published in English-speaking advanced countries. This has resulted in an "Innovation Divide". Innovation traits set roots through . Knowledge can be communicated from one language to another, and can grow indigenously. Knowledge acquisition, absorption, communication, and generation are key processes in catching up the knowledge wave.

In the 20th century, we had the potential to use technology to liberate creativity and extend cultural participation. Instead, we used the technologies to curtail participation and to control creativity. [Development Dialogue 1999: 1-2]

In 1960, the world’s poorest countries (20 percent of world population) accounted for 4 percent of global exports; by 1990 their share slipped to barely 1 percent. Predictions that the ‘poor might not always be with us’ have not come true. By 1998, percentage of absolute poor in the world (income below US $1 per day) was at 24 percent and the trend line had turned upward. Optimistic forecasts of gains of technology now seem illusory. Are we winning or losing? Is the world losing more knowledge than it is gaining? [Development Dialogue, 1999]

2. World Divides Digitally

ICT Indicators for Information Access

All figures are for 2003. Columns:
(a) Internet hosts per 10,000 inhabitants
(b) Internet users per 10,000 inhabitants
(c) PCs per 100 inhabitants
(d) Cellular mobile subscribers per 100 inhabitants
(e) Cellular mobile subscribers as % of total telephone subscribers
(f) Total population (millions)
(g) Population density (per km2)
(h) Telephone subscribers per 100 inhabitants

Country              (a)      (b)    (c)     (d)   (e)      (f)   (g)     (h)
(name missing)         -    17.98   0.78    1.01  64.8   135.12   938    1.56
Bhutan             13.41   204.27   1.36    1.09  24.1     0.79    16    4.52
Cambodia            0.58    24.75   0.23    3.52  93.2    14.14    78    3.78
(name missing)      1.28   632.48   2.76   21.48  50.7   1256.9   131   42.38
India               0.82   174.86   0.72    2.47  34.8   1056.8   334    7.10
Iran                0.76   723.66   9.05    5.09  18.8    66.33    40   27.06
Israel            643.87  3014.05  24.26   96.07  67.7     6.77   306  141.89
Japan            1015.68  4826.87  38.22   67.90  59.0   127.62   338  115.09
Korea             797.62  6096.99  55.80   70.09  56.6    47.93   487  123.93
Lao P.D.R.          1.65    33.46   0.35    1.98  61.7     5.68    24    3.21
(name missing)     42.90  3440.95  16.69   44.20  70.9    25.17    76   62.36
Mali                0.17    23.52   0.14    2.25  81.2    10.86     9    1.03
(name missing)         -     5.26   0.56    0.12  15.5    53.22    78    0.81
Nepal               0.39    34.48   0.37    0.21  11.9    23.68   167    1.78
Pakistan            1.01   102.77   0.42    1.75  39.7   149.58   186    4.42
Philippines         3.45   440.38   2.77   26.95  86.7    81.10   270   31.07
Singapore        1155.31  5087.65  62.20   85.25  65.4     4.20  6147  130.28
Sri Lanka           0.98   130.45   1.70    7.27  59.7    19.16   292   12.17
(name missing)   1228.55  8830.00  47.14  114.14  65.9    22.60   628  173.22
(name missing)     16.44  1105.19   3.98   39.42  79.0    63.08   123   49.91
United States    5577.84  5558.01  65.98   54.58  46.7   290.81    31  116.96


ICT Indicators and PPP (Purchasing Power Parity) are compared here below for underdeveloped, developing and advanced nations.

                          Tele density   Cell phone density   PC penetration   PPP
Advanced Nations          50-70 %        30-75 %              30-60 %          100 U
Developing Nations        20-30 %        4-7 %                0.5-2 %          5-10 U
Underdeveloped Nations                                                         << 1 U

Sprawling Digital Divide!

In comparison to advanced nations, PPP is around 10 percent for developing nations and less than 1 percent for underdeveloped nations. For rapid penetration of ICT, PPP is a key factor in evolving an action plan during the catch-up phase of economic development, and the affordable cost may be determined on this basis. For example, a $400 PC may be a low-cost PC in an advanced nation, but it must cost less than $40 in developing nations. Communication technology will soon be suitable. However, computer technology may pose some problems in input and output, and in the representation and manipulation of information in non-Roman scripts.
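The affordability argument above amounts to scaling a reference price by relative purchasing power. The following is a minimal sketch of that arithmetic, assuming the unit scale used in the comparison (advanced nations = 100 U); the function name `affordable_price` is hypothetical and introduced only for illustration.

```python
def affordable_price(reference_price, local_ppp_units, reference_ppp_units=100.0):
    """Scale a price that counts as low cost in an advanced nation to a
    locally affordable price, in proportion to relative purchasing power (U).

    The linear scaling is an assumption made for illustration.
    """
    if reference_ppp_units <= 0:
        raise ValueError("reference_ppp_units must be positive")
    return reference_price * local_ppp_units / reference_ppp_units


# A $400 PC in an advanced nation (100 U) versus a developing nation (10 U)
print(affordable_price(400, 10))  # 40.0
# An underdeveloped nation at 1 U; the text gives "<< 1 U", so this is an upper bound
print(affordable_price(400, 1))   # 4.0
```

Under this assumption the $40 target for developing nations follows directly, and the target for underdeveloped nations would fall below $4.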

The price and the language processing ability will determine ICT efficacy in a local situation.

Linguistic Divide on Internet is obvious with the following statistics:

Latin alphabet users make up 39 % of the global population and enjoy 84 % of Internet access.

Hanzi users (in CJK) make up 22 % of the global population and enjoy 13 % of Internet access.

Arabic script users make up 9 % of the global population and enjoy 1.2 % of Internet access.

Users of Brahmi-origin (Indic) scripts make up 22 % of the world population but have just 0.3 % of Internet access.

More than 65% of the content on the Internet is in English. [according to IBM’s Web Fountain analysis, 2003]
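One way to read the script-wise statistics above is as a ratio of Internet-access share to population share, where 1.0 would mean access in proportion to population. The sketch below uses only the percentages quoted in the text; the ratio itself and the name `representation_ratio` are illustrative assumptions, not a measure defined in the sources cited.

```python
# (percent of world population, percent of Internet access), as quoted in the text
SHARES = {
    "Latin": (39.0, 84.0),
    "Hanzi (CJK)": (22.0, 13.0),
    "Arabic": (9.0, 1.2),
    "Brahmi-origin (Indic)": (22.0, 0.3),
}


def representation_ratio(population_pct, access_pct):
    """Internet-access share divided by population share."""
    return access_pct / population_pct


for script, (population_pct, access_pct) in SHARES.items():
    ratio = representation_ratio(population_pct, access_pct)
    print(f"{script:24s} {ratio:.3f}")
```

On these figures Latin-script users are over-represented by a factor of about two, while users of Brahmi-origin scripts have well under one fiftieth of a proportional share.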

An ITU study of 178 nations (Nov 2003) ranked each country by a Digital Access Index (DAI) determined by factors such as education, affordability of Internet access, and the proportion of Internet users with a high-speed connection, in addition to the raw availability of bandwidth. The most interesting factor was affordability, defined as the cost of accessing the Internet as a percentage of a country’s gross national income per capita. The worst in affordability, the Congo, was 5000 times worse than the best.

A world divided by a common Internet (Ref: IEEE Spectrum, Feb 2004). In a study of 178 nations, ITU calculated the Digital Access Index (DAI) based on education, access affordability, high-speed connectivity users, bandwidth availability, etc. Typical rankings are:

Country        DAI rank    Country        DAI rank    Country      DAI rank
Sweden                1    Finland               8    Japan              15
Denmark               2    Taiwan                9    Australia          19
Iceland               3    Canada               10    Russia             63
South Korea           4    USA                  11    China              84
Norway                5    UK                   12    India             119
Netherlands           6    Switzerland          13    Nigeria           165
Hong Kong             7    Singapore            14    Niger             178

(Source: IEEE Spectrum, Feb 2004)

With the intervention of UNESCO and the language technology initiatives of developing countries, use of non-English languages on the Internet is expected to outpace that of English and to predominate by 2010.


Ray of Hope: Potential use of non-English languages on the Internet will increase drastically by 2010.

[Figure: bar chart of Internet users (in millions, scale 0 to 500) in 2003 and 2010 for English, Japanese, Chinese, French, Spanish, German and Indian languages]

3. Developing countries in Asia Pacing Up Slow

There are 45 countries in Asia (Ref. ITU, 2005), including the Arab states. There are 17 countries in East Asia and the Pacific: Brunei, Cambodia, China, Fiji, Indonesia, Hong Kong, Lao, Malaysia, Samoa, Singapore, Myanmar, Mongolia, Papua New Guinea, Philippines, Thailand, Vietnam. There are 8 countries in South Asia: Bangladesh, Bhutan, India, Iran, Maldives, Nepal, Pakistan, Sri Lanka.

To illustrate their position vis-à-vis that of developed nations, we may consider economic indicators and competitiveness indicators.

Economic Indicators (2002-2003)

Country       Population (Mn)   GNP ($Bn)   Per capita GNP ($)   PPP ($)   Rank
World         6,054             31815       5080                 7570      -
India         1027              4986        480                  2570      81
America       282               10405.3     35060                35060     2
Japan         127               4215.5      33550                26070     14
Germany       82                1719.5      22670                26220     12
France        59                1315.8      25500                25000     20
Britain       60                1486        25250                25870     16
China         1261              1210.7      940                  4390      66
Pakistan      145               59          410                  1940      89
Bangladesh    136               49          360                  1720      91
SriLanka      19                16          840                  3390      74
Thailand      63                122         1980                 6680      45
Malaysia      25                86          3540                 8280      39
Singapore     4                 86          20690                23090     19

[Source : Tata Economic Service 2003-04]

Competitiveness Indicators

Country    Competitiveness Rank   Work ethics   Technology Absorption   Management   Infrastructure
India      49                     53            38                      37           58
America    1                      17            1                       1            2
Japan      21                     1             3                       16           16
Germany    15                     8             12                      6            5
France     22                     19            20                      9            6
Britain    9                      30            29                      19           22
China      41                     51            54                      56           46

[Source : Tata Economic Services 2001]

PPP (Purchasing power parity) relates to affordability. PPP is about 10% or less for developing countries.
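The "about 10% or less" figure can be checked against the PPP column of the Economic Indicators table by expressing each country's PPP as a percentage of America's. The sketch below copies those PPP values; choosing America as the reference and the helper name `relative_ppp` are assumptions made for illustration.

```python
# PPP per capita (US$) copied from the Economic Indicators table (2002-2003)
PPP_USD = {
    "America": 35060,
    "India": 2570,
    "China": 4390,
    "Pakistan": 1940,
    "Bangladesh": 1720,
    "SriLanka": 3390,
    "Thailand": 6680,
    "Malaysia": 8280,
}


def relative_ppp(country, reference="America"):
    """PPP of `country` as a percentage of the reference country's PPP."""
    return 100.0 * PPP_USD[country] / PPP_USD[reference]


for name in PPP_USD:
    print(f"{name:12s} {relative_ppp(name):6.1f} %")
```

On these figures India, Pakistan, Bangladesh and Sri Lanka fall below 10 percent of the reference, while China, Thailand and Malaysia lie somewhat above it.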

Demographic Trends

                            Total population   Annual pop.       Urban population   Population     Population
                            (Bn)               growth rate (%)   (%)                under 15 (%)   above 65 (%)
                            1999     2015      1999-2015         1999     2015      1999           1999
Developing Countries        4.6      5.7       1.4               38.9     47.6      33.1           5.0
East-Asia & pacific         1.8      2.1       0.8               34.5     44.0      27.3           6.1
South Asia                  1.4      1.76      1.5               29.9     38.2      35.5           4.5
Least Developed Countries   0.6      0.9       2.4               25.4     35.1      43.2           3.1
world                       5.86     7.05      1.2               46.5     53.2      30.2           6.9
High Human Dev.             1.05     1.12      0.4               78.3     82.1      19.3           13.7
Mid Human Dev.              3.99     4.71      1.0               41.4     49.6      30.3           5.8
Low Human Dev.              0.82     1.22      2.5               30.4     40.6      43.8           3.1

Human Development Index (1999)

                            HDI     Life expectancy   Education   GDP     GDP per capita
                                    index             index       index   PPP (US$)
Developing Countries        0.65    0.66              0.69        0.59    3,530
East-Asia & pacific         0.72    0.74              0.81        0.41    1,170
South Asia                  0.56    0.63              0.54        0.61    3,950
Least Developed Countries   0.44    0.45              0.47        0.52    2,280
world                       0.72    0.70              0.71        0.71    6,980
High Human Dev.             0.91    0.87              0.96        0.91    23,410
Mid Human Dev.              0.68    0.70              0.75        0.61    3,850
Low Human Dev.              0.44    0.46              0.41        0.41    1,200


The UNDP Human Development Report 2001 describes the calculation of the HDI from three dimension indices: the life expectancy index, the education index and the GDP index. These correspond to three dimensions (a long and healthy life, knowledge, and a decent standard of living) and use, respectively, the indicators life expectancy at birth, adult literacy rate together with gross enrollment ratio, and GDP per capita (PPP US$). Salient observations in the HD Report are as follows:

• Throughout history, technology has been a powerful tool for human development and poverty reduction. ICT provides powerful new ways for citizens to demand accountability from their governments and in the use of public resources.

• Technology is created in response to market pressures, not the needs of poor people, who have little purchasing power. Technology is also unevenly diffused within countries.

• Developing countries may gain especially high rewards from new technologies, but they also face especially severe challenges in managing the risks. Just as the steam engine and electricity enhanced physical power to make possible the industrial revolution, digital and genetic breakthroughs are enhancing brain power.

• The technology revolution and globalization are creating a network age, and that is changing how technology is created and diffused. In the network age, every country needs the capacity to understand and adapt global technologies for local needs.

• All countries, even the poorest, need to implement policies that encourage innovation, access and development of advanced skills.

• National policies will not be sufficient to compensate for global market failures. New international initiatives and the fair use of global rules are needed to channel new technologies towards the most urgent needs of the world’s poor people. Policy, not charity, will determine whether new technologies become a tool for human development everywhere. Commitments under the TRIPS (Trade-Related Aspects of Intellectual Property Rights) agreement of the WTO to promote technology transfer to developing countries are paper promises, often neglected in implementation. They must be brought to light.
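The HDI calculation summarised before these observations can be sketched in a few lines. The sketch assumes the goalposts of the 2001 report's technical note as commonly stated (life expectancy 25 to 85 years, literacy and enrollment 0 to 100 percent, GDP per capita 100 to 40,000 PPP US$ on a logarithmic scale); treat these constants and all function names as assumptions rather than a definitive restatement of the UNDP method.

```python
import math


def dimension_index(value, minimum, maximum):
    """Generic dimension index: position of value between two goalposts."""
    return (value - minimum) / (maximum - minimum)


def life_expectancy_index(years):
    # Assumed goalposts: 25 and 85 years
    return dimension_index(years, 25.0, 85.0)


def education_index(adult_literacy_pct, gross_enrolment_pct):
    # Assumed weights: two-thirds adult literacy, one-third gross enrollment
    return (2.0 * adult_literacy_pct / 100.0 + gross_enrolment_pct / 100.0) / 3.0


def gdp_index(gdp_per_capita_ppp_usd):
    # Assumed goalposts: 100 and 40,000 PPP US$, compared on a log scale
    return dimension_index(
        math.log(gdp_per_capita_ppp_usd), math.log(100.0), math.log(40000.0)
    )


def hdi(life_idx, edu_idx, gdp_idx):
    """HDI as the simple average of the three dimension indices."""
    return (life_idx + edu_idx + gdp_idx) / 3.0


# GDP per capita values taken from the Human Development Index (1999) table
print(round(gdp_index(6980), 2))   # 0.71, the table's GDP index for the world
print(round(gdp_index(23410), 2))  # 0.91, the table's GDP index for High Human Dev.
print(round(hdi(0.87, 0.96, 0.91), 2))
```

Under these assumptions the logarithmic GDP index reproduces the GDP index column of the table for the world and for the high human development group.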

Development is about expanding the choices people have to lead lives that they value. In developing countries, there are serious deviations in many aspects of life: health, education, income poverty.


The success or failure of individuals and nations, as well as the prosperity of mankind, depends on whether we can wisely develop our human resources. Technological advance has contributed greatly to the acceleration of human progress in the past several centuries.

Today’s technological transformations are intertwined with another Transformation - Globalisation – and together they are creating the network age.

4. Digital Divide as They Behold

Perception                         Developed Countries                Developing Countries
Why discussed?                     Desire to capture larger markets   Fear of lagging behind in economic race
Policy                             Information explosion              Localization
Results                            Increasing use of English and      Erosion of local language and culture
                                   thrust of western culture
Consumer nature                    “substitute the old”               “Upgrade the Old”
Technology development preferred   IPR-Centric                        Open technology
Low cost PC                        $400                               Less than $40
Access cost                        100 U                              Less than 10 U
  Reason: PPP (15:1)               34260 (USA)                        2400 (India)
          GNP (75:1)               24260                              460
Focus                              • Digital divide                   • Digital Unite
                                   • Universal access to              • Universalisation of creativity
                                     Information
                                   • Wider control                    • Share the Knowledge

Low affordability means low ICT penetration and sprawling Digital Divide

How can technology convert erupting “digital divide” into “digital unite”?

5. Technology Races to Human Brain

Prof. Raj Reddy of Carnegie Mellon University predicts that 10 years from now we shall be getting, at the same cost, 100 times the processing power, 1000 times the storage and 10,000 times the bandwidth. ICT will be affordable, easy to use and pervasive.

Ray Kurzweil, an informatics guru, predicts that within 10 years a 1000-dollar computer will be able to perform more than one trillion calculations a second, and that well within the first quarter of the 21st century a similarly priced computer will match the human brain.

The Interspace represents the third wave in the ongoing evolution of the Global Information Infrastructure, driven by rapid advances in computing and Communication Technology.

The technological progress of knowledge exchange - from e-mail in Arpanet (1965-1985) to document browsing in the Internet (1985-2000) to concept navigation in the forthcoming Interspace (2000-2010) - has occurred in three waves, each building on the previous one.

The convergence of computing and networking is more evident in the phenomenal growth of the World Wide Web. Gordon Moore, co-founder of Intel Corporation, postulated in 1965 that the microprocessor chip would double in performance (as defined by the number of transistors on a chip) every 18 months, that is, a compound annual growth rate of about 58 percent. Historically, the semiconductor industry has kept pace by continuously shrinking feature size to increase the number of transistors on a chip, thus increasing the speed of the circuits. Beyond 2006, physical barriers, ultimately including atomic properties, will come to the fore with aggressive device shrinkage.
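The link between an 18-month doubling period and a compound annual growth rate of roughly 58 percent is a short calculation, sketched below. The function names are hypothetical; the only inputs are the 18-month doubling period and the 10-year horizon mentioned in this section.

```python
def cagr_from_doubling(months_to_double):
    """Compound annual growth rate implied by a given doubling period."""
    return 2.0 ** (12.0 / months_to_double) - 1.0


def growth_multiple(annual_rate, years):
    """Total growth factor after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


rate = cagr_from_doubling(18)
print(f"annual growth: {rate:.1%}")
print(f"growth over 10 years: {growth_multiple(rate, 10):.1f}x")
```

Compounded over ten years, this rate gives a factor of roughly one hundred, which is consistent with the hundredfold gain in processing power at constant cost predicted earlier in this section.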

6. Linguistic and Cultural Diversity

UNESCO takes into due consideration the existence of multilingual communities and encourages linguistic and cultural diversity.

UNESCO has adopted the Recommendation on the promotion and use of Multilingualism and universal access to cyberspace, which focuses on 4 priority areas : (i) development and promotion of multilingual content and systems, (ii) access to networks and services, (iii) development of public domain content; and (iv) reaffirming and promoting the fair balance between the interests of rights-holders and the public interest.

About 6000 world languages survive today.

During MT Summit VII, September 1999, Prof. Hozumi Tanaka presented the estimated ranking of mother-tongue-based populations shown in the following table [TANAKA, 1999]:

Table 2. Language-wise world population

Language      Population in 2050 (Billion)   Population in 1996 (Billion)
Chinese       1.384                          1.113
Hindi/Urdu    0.556                          0.316
English       0.508                          0.372
Spanish       0.486                          0.304
Arabic        0.482                          0.201
Portuguese    0.248                          0.165
Bengali       0.229                          0.125
Russian       0.132                          0.155
Japanese      0.108                          0.123
German        0.091                          0.102
Malay         0.080                          0.047
French        0.076                          0.070

This ranking suggests that Chinese, Hindi, English, Spanish and Arabic will still remain the top major languages in 2050.

Art : Asian art also has its classical and folk traditions which are still vibrantly alive, and covers the gamut of genres from painting to sculpture to handicrafts. Visual art has been influenced by developments elsewhere in the world. Contemporary art is steadily developing its own language in trying to interpret the ethos in as many media as possible.

Dance : Indian dance has an unbroken tradition of over 2000 years, with themes drawn from mythology, legends and classical literature. It also can be broadly divided into folk/tribal dances which have many regional variations, and the classical dances, which are based on ancient texts and have rigid rules of presentation.

Most of the classical dance forms seem to have evolved from a single generic source. The Natyashastra is a Sanskrit compilation of the basic components of Indian music, dance and drama. Some of the major classical dance traditions are Bharata Natyam, Kathak, Odissi, Manipuri, Kuchipudi, Mohiniattam and Kathakali.

Music : There are two broad systems of classical music that dominate the scene today. Hindustani music of northern India and Carnatic music which is more popular in the southern states of Kerala, Tamil Nadu, Andhra Pradesh and Karnataka.

There are many unique regional forms of music which can be classified as: devotional and ritual music, seasonal songs and dance, functional and social items, ballads and narrative forms.


Literature : Indian literature can date its origins to the oral tradition of the Vedas and the great epics of India, which are still an integral part of daily life. Poetry, drama, fiction, non-fiction and all other literary styles have a substantial corpus in each of India's major languages and in quite a few dialects, while the oral tradition also continues through folk songs and dramas. Theatre in India also has ancient historical roots, though classical theatre is performed very rarely nowadays, having been overtaken by a vibrant tradition of folk theatre (including puppet and shadow theatre) and modern professional theatre.

Medicine and Engineering : Ayurveda, India's indigenous form of medicine, is still popular throughout the country and indeed the world.

Indian mathematicians are credited with introducing the concept of zero to the world. There is plenty of evidence that ancient India had a very strong interest in astronomy. Archaeologists have found ancient India to have a very high degree of town planning in terms of straight roads, sophisticated sewage systems, granary storage and public baths.

7. Languages Spoken in Asian Countries:-

Afghanistan:- Aimaq, Arabic, Tajiki, Ashkun, Azerbaijani, Balochi, Brahui, Dari, Darwazi, Farsi, Gawar-Bati, Gujari, Hazaragi, Jakati, Kamviri, Karakalpak, Kazakh, Kirghiz, Malakhel, Mogholi, Pashayi, Pashto, Sanglechi-Ishkashimi, Tangshewi, Tatar, Tirahi, Turkmen, Uyghur, Uzbek, Waigali, Wakhi, Warduji, Wotapuri-Katarqalai [Script(s) in Use : Arabic]

Bangladesh:- Achik Kata, Bangla, Chakma, Garo, Khasia, Magh, Manipuri, Munda, Oraon, Oyar, Santali, Kachhari, Kuki, Tipra, Malpahadi, Mikir, Shadri and Hajang [Script(s) in Use : Bangla]

Bhutan:- Bhutanese, chhokey, Sharchopkha, or Tsangla, Khen, Bumthangkha, Nepali, or Lhotsam, Dzongkha, Nepali [Script(s) in Use : Devanagari]

China:- Achang, Adi, Akha, Atsi, Cantonese, Chinese, Chinese: Chihli, Chinese: Foochow, Chinese: Hainan, Chinese: Hakka, Chinese: Hangchow, Chinese: Hankow, Chinese: Hinghua, Chinese: Kiaotung, Chinese: Kienning, Chinese: Kienyang, Chinese: Kinhwa, Chinese: Kuoyu, Chinese: Nanking, Chinese: Ningpo, Chinese: Sankiang, Chinese: Shanghai, Chinese: Shantung, Chinese: Shaowu, Chinese: Chinese: Soochow, Chinese: Swatow, Chinese: Taichow, Chinese: Wenchow, Chinese: Wenli Easy, Chinese: Wenli High, Chung-Chia, Guoyu, Hmong Daw, Hmong Njua, Kadu, Kalmyk, Kazakh, Keh-Deo, Kopu, Korean, Lahu, Lahuli: Bunan, Lahuli: Tnan, Laka, Lawa, Lisu: Central, Lisu: Eastern, Lu, Maru, Mandarin, Miao: Ch’uan, Miao: He, Miao: Hwa, Mien, Mongolian: Inner, Na-His, Nanai, Nosu, Nung, Rawang, Riang Lang, Shan: Yunnanese, Tai: Dam, Tai: White, Tai: Ya, Taiwanese, Tibetan, Tuvin, Uighur, Wa [Script(s) in Use : Chinese]

Hong Kong:- Cantonese, Chinese [Script(s) in Use : Chinese]

India:- Andamanese, Angika, Assamese, Avadhi, Baltistani, Bengali, Bhojpuri, Bhutia, Brijbhasha, Dardi, Dogri, Garo, Gujarati, Haryanvi, Hindi, Kannada, Kashmiri, Khasi, Konkani, Kuki, Ladakhi, Lepcha, Maithili, Malayalam, Marathi, Marwadi, Mizo, Naga, Nepali, Oriya, Pahadi, Punjabi, Rajasthani, Sanskrit, Santhali, Sindhi, Tamil, Telugu, Tripuri, Urdu [Script(s) in Use : Devanagari, Bangla, Gujarati, Gurumukhi, Kannada, Oriya, Malayalam, Telugu, Tamil, Perso-Arabic]

Indonesia:- Acehnese, Bahasa Indonesia (or Indonesian), Balinese, Banjarese, Buginese, Dairi Batak, Javanese, Lampung, Madurese, Makassarese, Malay, Minangkabau, Rejang, Sasakm, Sundanese, and Toba Batak. [Script(s) in Use : Balinese and Latin]

Iran:- Alviri, Nal Khaliji, Gulf Arabic, Nal Mesopotamian Gelet Arabic, Persian, Teimurtash, Vidari [Script(s) in Use : Arabic]

Lao:- Lao [Script(s) in Use : Lao]

Malaysia:- Malay, Bahasa Melayu [Script(s) in Use : Arabic, Rumi]

Maldives:- Sinhala, Maldivian Dhivehi [Script(s) in Use : Dives]

Myanmar:- Achang, Akha, Anal, Anu, Arakanesem, Blang, Burmese, Chak, Chaungtha, Chin Asho, Chin Bawm, Chin Bualkhaw, Chin Chinbon, Chin Daai, Chin Falam, Chin Haka, Chin Khumi, Chin Khumi Awa, Chin Mara, Chin Mro, Chin Mun, Chin Ngawn, Danau, Gangte, Hmong Njua, Hpon, Hrangkhol, Intha, Jingpho, Kado, Karen, Kayah, Khamti, Khmu, Kiorr, Lahu, Lama, Lamkang, Laopang, Lashi, Lisu, Lopi, Lui, Mahei, Maru, Meitei, Mizo, , Mon, Naga, Norra, Nung, Palaung, Palu, Pankhu, Parauk, Purum, Pyen, Ralte, Rawang, Riang, Samtao, Sansu, Shan, Tai Loi, Taman, , Tavoyan, Wa, Welaung, Wewaw, Yangbye, Yinchia, Yos, Zaiwa, Zome, Zyphe [Script(s) in Use : Burmese]

Nepal:- Gurung, Hindi, Nepali, Newari, Limbu, Gorkha [Script(s) in Use : Nepal Script/Devanagari ]

Pakistan:- Punjabi, Sindhi, Siraiki, Pakhtu or Pashto, Balochi, Hindko, Brahui, , Urdu [Script(s) in Use : Arabic]

Singapore:- Cantonese, Chinese: Hainan, Javanese, Madurese, Malay, Malayalam, Sindhi, Sinhala, Taiwanese, Tamil, Thai [Script(s) in Use : Chinese]

Srilanka:- Sinhala and Tamil [Script(s) in Use : Grantha]

Thailand:- Akha, Hmong Daw, Hmong Njua, Karen: Bghai, Khmer, Khmer: Northern, Khmu', Khun, Kuy, Lahu, Lawa, Lu, Mal, Malay: Pattani, Mien, Mon, Salong Sgaw Kayin, Shan, Tai: Dam, Thai, Thai: Northern, Urak Lawoi' [Script(s) in Use : Old Khmer]

Vietnam:- Bahnar, Bru, Cantonese, Chinese, Cham: Eastern, Chrau, Jeh, Koho, Mien, Nung, Pacoh, Rade, Stieng, Tho, Vietnamese [Script(s) in Use : Chinese]

Note: Bold indicates the official language.

8. Linguistic Scenario in India

India is a democratic country with a population of over 1 billion. About 1650 dialects are spoken by different communities. The division of the country into states on a linguistic basis ensures use of the official language of each state in governance and education. There are 22 constitutionally approved languages, which are used in different states for citizen interface. There are 10 Indic scripts in vogue. All of these languages are well developed and rich in content. They follow similar script and language grammars, and their alphabetic order is similar. Some languages use a common script, especially Devanagari. Hindi written in the Devanagari script is the official language of the Union Government. English is also used for government notifications and communications. The twenty-two constitutional Indian languages are listed here with their scripts in parentheses: Hindi (Devanagari), Konkani (Devanagari), Marathi (Devanagari), Nepali (Devanagari), Sanskrit (Devanagari), Sindhi (Devanagari/Urdu), Kashmiri (Devanagari/Urdu); Assamese (Assamese), Manipuri (Manipuri), Bangla (Bengali), Oriya (Oriya), Gujarati (Gujarati), Punjabi (Gurumukhi), Telugu (Telugu), Kannada (Kannada), Tamil (Tamil), Malayalam (Malayalam), Urdu (Urdu), Bodo (Assamese), Santhali (Bengali), Maithili (Devanagari) and Dogri (Devanagari). India’s average literacy level is 65.4 percent (Census 2001). Less than 5 percent of people can read or write English, so over 95 percent of the population is normally deprived of the benefits of English-based Information Technology.
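The language-to-script list above is effectively a small lookup table, and inverting it shows how many languages share each script. The sketch below copies the pairs from the list; the dictionary layout and the helper name `languages_by_script` are illustrative assumptions, and the script labels are kept exactly as the list names them.

```python
from collections import defaultdict

# Language -> script(s), copied from the list of 22 constitutional languages
LANGUAGE_SCRIPTS = {
    "Hindi": ["Devanagari"],
    "Konkani": ["Devanagari"],
    "Marathi": ["Devanagari"],
    "Nepali": ["Devanagari"],
    "Sanskrit": ["Devanagari"],
    "Sindhi": ["Devanagari", "Urdu"],
    "Kashmiri": ["Devanagari", "Urdu"],
    "Assamese": ["Assamese"],
    "Manipuri": ["Manipuri"],
    "Bangla": ["Bengali"],
    "Oriya": ["Oriya"],
    "Gujarati": ["Gujarati"],
    "Punjabi": ["Gurumukhi"],
    "Telugu": ["Telugu"],
    "Kannada": ["Kannada"],
    "Tamil": ["Tamil"],
    "Malayalam": ["Malayalam"],
    "Urdu": ["Urdu"],
    "Bodo": ["Assamese"],
    "Santhali": ["Bengali"],
    "Maithili": ["Devanagari"],
    "Dogri": ["Devanagari"],
}


def languages_by_script(mapping):
    """Invert the language -> scripts mapping into script -> languages."""
    grouped = defaultdict(list)
    for language, scripts in mapping.items():
        for script in scripts:
            grouped[script].append(language)
    return dict(grouped)


for script, languages in sorted(languages_by_script(LANGUAGE_SCRIPTS).items()):
    print(f"{script:12s} {len(languages):2d}  {', '.join(languages)}")
```

Grouping this way makes the point about shared scripts concrete: nine of the twenty-two languages can be written in Devanagari.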

Interestingly, most of the Indian languages owe their origin to Sanskrit; hence they share a rich cultural heritage and treasure of knowledge. Indic scripts originated from the Brahmi script. As an example of dialectal variety, there are typically 19 prominent dialects or variations of Hindi in use in different regions, e.g., Marwari, Jaipuri, Brijabhasa, Khari Boli, Avadhi, Chhatisgarhi, Bihari, Maithli, Bhojpuri, Magahi, Garhavali, Kumaunni.

Fig. Indic Scripts Origin

Indian scripts may look different in shape, but they follow a similar alphabetic order. They are Brahmi-based, and their script grammar is also similar. The alphabet consists of vowels and consonants, ordered on the basis of phonetic utterance. What you write is what you speak: the pronunciation of a word is the concatenation of the pronunciations of its letters. Vowels and consonants have distinct shapes. A pure consonant is a virtual consonant without a vowel sound. When a vowel follows a pure consonant, its modified shape may attach on top of, beside or below the consonant. This modified vowel grapheme is called a MATRA or vowel modifier. Consonants can also combine with one another.

Fig. Language-wise distribution of Indian languages (in percentage). Categories shown: Hindi, Bengali, Marathi, Urdu, Gujarati, Oriya, Punjabi, Assamese, Telugu, Tamil, Kannada, Malayalam, Other Indo-Aryan Languages, Other, Austro-Asiatic, Tibeto-Burmese, Semito-Hamitic, Languages not identified. Source: http://www.ciil.org/languages/map4.html


Characteristics of Indian languages may be summarised as below:
• What You Speak Is What You Write (WYSIWYW)
• Script grammar describes transformation rules
• Relatively word-order-free
• Common phonetic based alphabet
• Common concept terms (from Sanskrit)

Script Grammar: Roman-based scripts are linear; they have as many glyphs as letters. Indic scripts are Brahmi-based, non-linear and complex, and the number of glyphs is much larger than the number of letters in the alphabet. Vowels and consonants may occur independently. A vowel takes a new shape on combining with a consonant and is placed around the consonant or syllable on the right, left, top or bottom. A consonant may combine with another consonant to form a conjunct; the preceding consonant, the following consonant, or both may change shape and combine non-linearly. This gives rise to a large number of glyphs.

क (ka), कि (ki), की (kii), कु (ku), कू (kuu), र् + व = र्व (rva), क् + ष = क्ष (ksha), (rtsvim)

- A consonant cluster (CC) results in consonant fusion or a new conjunct:
क् + व् + आ = क्वा    k + v + aa = kvaa (fusion)
क् + ष् + अ = क्ष    k + s’h + a = ks’ha (conjunct)
प् + ऋ + थ् + व् + ई = पृथ्वी    p + r’i + th + v + ii = pr’ithvii
- A vowel sign (matra) may split on either side of a consonant in some Indic scripts, e.g.
Tamil: ொ (Tamil Vowel Sign O), ௌ (Tamil Vowel Sign AU)
Bengali: ো (Bengali Vowel Sign O), ৌ (Bengali Vowel Sign AU)
- CCCV is also seen in Devanagari:
मर्त्य स्वास्थ्य    martya swaasthya    CV CCCV  CCV CCCV
- The consonant ‘r’ (र्) also changes its shape differently when it precedes or succeeds a consonant:
‘r’ precedes: पूर्व (puu rva)
‘r’ succeeds: चक्र (cha kra)
- A nasal consonant combines only with a character in the same group:
पन्त  पम्प  पङ्गु    panta  pampa  panggu
- Nasal consonants may also transform into anuswar (ं):
पंत  पंगु    panta  pangu
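The script grammar described above can be observed directly in encoded text. The following is a minimal Python sketch, assuming Unicode-encoded Devanagari; the helper name describe and the variable names are illustrative only. It shows that a conjunct is stored as a linear sequence (consonant, virama, consonant) and that a matra is stored after its consonant even when it is drawn to the left of it.

```python
import unicodedata

def describe(text):
    """Return the Unicode character names that make up a string, in stored order."""
    return [unicodedata.name(ch) for ch in text]

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA: turns the preceding letter into a pure consonant

# Conjunct ksha: pure consonant k (KA + VIRAMA) followed by SSA, rendered as one glyph.
ksha = "\u0915" + VIRAMA + "\u0937"

# Matra: vowel sign I is stored after KA, although it is drawn to the left of it.
ki = "\u0915" + "\u093F"

for label, s in (("ksha", ksha), ("ki", ki)):
    # Print the rendered string, its length in code points, and its components.
    print(label, s, len(s), describe(s))
```

The non-linear shaping is therefore left to the font and rendering engine, while the stored text remains a simple phonetic sequence.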

Language Policy

There are 22 constitutionally recognized Indian languages; Hindi is the official language of the Union of India. English continues to be the language used in governance and the judiciary. All parliamentary proceedings and government reports submitted to parliamentarians must be bilingual, in Hindi and English. The Department of Official Languages (DOL), Government of India, is entrusted with the responsibility of promoting the use of Hindi in all Government and public sector offices. As per a DOL notification, computers must be procured with bilingual processing software and keyboard. INSCRIPT is the standard keyboard layout, and software vendors invariably ensure INSCRIPT keyboard support. The notification is, however, not strictly enforced; a policy of persuasion is adopted. The Central Hindi Directorate under the Ministry of HRD deals with script standardization, the UNESCO interface and writers' promotion issues. The Commission for Scientific & Technical Terminology of the Ministry of HRD undertakes the task of developing glossaries and terminologies for various subjects in science and technology. An IT terminology and a Pan-Indian glossary have been developed.

Four unique features of Indian linguistic culture are:
1. Antiquity: there are many puristic traditions, e.g. Sanskrit and Pure Tamil.
2. Ubiquity: the values of language, its preservation and its tradition are shared throughout the Indic area, and multilingualism is pervasive. Linguistic diversity is a product of culture.
3. Orality: there are complicated methods of oral transmission of languages. The ability to memorize is highly valued in the culture in many ways.
4. Diversity: India's linguistic diversity is not viewed as an impediment to modernization; it can be seen as a resource.

States have agreed upon a "three-language formula": three languages are taught at the secondary school level, namely English, the local language and Hindi. In Hindi-speaking areas, the third language is another Indian language, Sanskrit or a European language. The three-language formula is in consonance with traditional linguistic diversity.

India is a multiethnic and multicultural country. Tribal groups belonging to different races (viz., Indo-Aryan, Dravidian, Austric, Sino-Tibetan, Tibeto-Burman) and practicing different belief systems have very effectively shaped and enriched their languages, making them unique. But owing to their small populations, lack of political power and poor socio-economic conditions, their languages have not entered the domains of modern life such as education, administration and mass media. They survive in spoken form in their respective communities.

The Central Institute of Indian Languages (www.ciil.org) has conducted studies to document these tribal languages for their richness in folklore, folk medicine and indigenous knowledge systems. 108 tribal languages are listed on the site. Bodo and Santali were recently, in Dec 2004, added to the list of scheduled languages by the Government of India. There is yet another dimension to language endangerment in India: a few small tribal groups are biologically endangered, and their populations may shrink further and become extinct very soon. 12 out of the 108 tribal languages are considered endangered. These are Andamanese, Bondo, Diday, Jarawa, Kota, Kurukh, Onge, Sentinelese, Shompen, Singpho, Sulung and Toda.

State languages are promoted by the respective state governments. State-level school education programs are invariably in the state language, which is the mother tongue of the majority of people in the state. A state may have more than one official language. For example, Uttar Pradesh has Hindi as the first official language and Urdu as the second. Sikkim has Nepali as the first official language, with Lepcha and Limbu as the second and third official languages of the state.

All e-governance projects are intended to be implemented with citizen interface in the state official languages.

9. Major Multilingual Technology Initiatives

• Initiative B@bel: Public domain information is a global public good. With this in mind, UNESCO's main goal is to redefine universal access to information in all languages in cyberspace by encouraging (1) the development of tools (translation mechanisms, terminology, protocols, etc.) that will facilitate multilingual communication in cyberspace; (2) the promotion of fair allocation of public resources to public information providers; and (3) the promotion of access to multilingual public domain information and knowledge.

The programme "Initiative B@bel" proposes to do this by implementing concrete activities at national and international levels, with the objective to develop multilingualism on the information networks and to encourage full partnership between governments, industry and civil society.

Website: http://www.unesco.org/webworld/babel

• HLT (Human Language Technologies) programme of the European Union aims at developing competitive technologies to facilitate seamless trade and commerce, and access to educational and health care aids, across its 11 European languages. The central aim was to promote the use of Telematics Applications through language technologies so as to facilitate communication in and between different European languages. HLT actions initially addressed three intertwined areas: multilingual communication, natural interactivity and cross-lingual information management. Website: http://www.hltcentral.org/

• TIDES (Translingual Information Detection, Extraction and Summarization). The mission of the TIDES programme of the USA is to develop technology that enables the use of English to locate, access and utilize network-accessible text documents in other languages, without requiring any knowledge of the target languages. This will require advances in the component technologies of information retrieval, translation, document understanding, information extraction and summarization.

Website: http://www.darpa.mil/ito/research/tides/index.html

• UNL (Universal Networking Language). UNL is being developed by the UN University, Tokyo. It covers 17 world languages, including Hindi. It aims at the development of enconverter and deconverter software for each language, converting into and from UNL using a concept-based dictionary and knowledge base. This will be used for on-line translation of documents and for promoting information exchange without language barriers, in the cause of peace among nations.

Website: http://www.unl.ias.unu.edu/, E-mail: [email protected]

• Reference websites on Language Technology programmes in other countries:
China: http://www.flce.org/china.htm
Canada: http://www.chass.utoronto.ca/~cla-acl/
USA: http://cslu.cse.ogi.edu/, http://www-csli.stanford.edu/
Japan: http://www.sal.tohoku.ac.jp/, http://www.links.nectec.or.th/orchid/
Russia: http://reenie.utexas.edu/reenie/countries/Russia/russia.html
Europe: http://cslu.cse.ogi.edu/, http://www.cordis.lu/
India: http://tdil.mit.gov.in/

10. Language Technology Initiatives in Asia

10.1 ICT in Local Languages

Countries such as China, Japan, Korea, India, Pakistan and Iran have taken initiatives towards evolving standards for encoding schemes and keyboard layouts, and developing fonts and basic information processing tools.

Encoding schemes were influenced by the approach that ASCII followed and, where possible, were designed to coexist with English within an 8-bit extension of ASCII, aiming at bilingual processing.

CJK encodings covered vowels, consonants and C-V syllables along with a Han character repertoire of 70,207 characters. Arabic encoding had over 210 characters.

Indic scripts were encoded separately, with fewer than 120 characters each. ISCII (Indian Script Code for Information Interchange) was evolved with one-to-one alphabetic correspondence across the scripts. Note that the Indic scripts are phonetically based and similar in alphabet.

Unicode unified all these different encoding schemes into a single 2-byte code.

Unicode covers South Asian scripts including Devanagari, Bengali, Gurumukhi, Gujarati, Kannada, Malayalam, Oriya, Tamil, Telugu, Sinhala, Tibetan, Limbu and, recently, Lepcha. Southeast Asian scripts covered include Thai, Lao, Tai Le, Myanmar, Khmer and the Philippine scripts (Tagalog, Hanunoo, Buhid and Tagbanwa). East Asian scripts included in Unicode are Han ideographs, Hiragana, Katakana, Hangul, Bopomofo and Yi.
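The one-to-one alphabetic correspondence of ISCII carries over to Unicode, whose major Indic blocks follow the ISCII layout, so that corresponding letters sit at the same offset within each 128-code-point block. A minimal Python sketch of this idea is given below. The function name shift_script and the table BLOCK_START are illustrative assumptions, and the mapping is only approximate, because not every position is filled in every script (Tamil, for example, has no aspirated consonants).

```python
import unicodedata

# Start of each 128-code-point Unicode block for nine Indic scripts.
BLOCK_START = {
    "devanagari": 0x0900, "bengali": 0x0980, "gurmukhi": 0x0A00,
    "gujarati": 0x0A80, "oriya": 0x0B00, "tamil": 0x0B80,
    "telugu": 0x0C00, "kannada": 0x0C80, "malayalam": 0x0D00,
}

def shift_script(text, source, target):
    """Move each character of the source block to the same offset in the target block.

    Characters outside the source block, or whose target position is
    unassigned in Unicode, are left unchanged.
    """
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + 0x80:
            candidate = chr(cp - src + dst)
            # unicodedata.name returns the default "" for unassigned code points
            out.append(candidate if unicodedata.name(candidate, "") else ch)
        else:
            out.append(ch)
    return "".join(out)

# Devanagari KA, MA, LA moved to the Bengali block
print(shift_script("\u0915\u092E\u0932", "devanagari", "bengali"))
```

This is a script conversion of letters only, not a translation; spelling conventions differ between languages, so the result should be treated as a rough transliteration.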

Complexities in Script Rendering

Sample Vedic Sanskrit text


The Arabic script is used for the Arabic language and has been extended to represent other languages such as Persian, Urdu, Sindhi, Kashmiri and Kurdish. The Arabic script is cursive, written in styles such as Naskh and Nastaliq, and runs from right to left.

Complexities in Naskh & Nastaliq script rendering
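Right-to-left layout is one of the properties a rendering system has to determine before it can display such text. As a minimal Python sketch, and not the full Unicode bidirectional algorithm, the base direction of a string can be guessed from its first strongly directional character. The function name base_direction and the left-to-right default are assumptions made for illustration.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # right-to-left letters and Arabic letters
LTR_CLASSES = {"L"}        # left-to-right letters

def base_direction(text):
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            return "rtl"
        if bidi in LTR_CLASSES:
            return "ltr"
    # Assumption: fall back to left-to-right when no strong character is found.
    return "ltr"

print(base_direction("\u0627\u0631\u062F\u0648"))  # the word "Urdu" in Arabic script
print(base_direction("Hindi"))
```

Cursive joining and the choice between Naskh and Nastaliq shapes are handled separately by the font and shaping engine; the sketch covers direction only.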

The Asian Forum for Standardization of IT (AFSIT) was launched in 1987 at CICC (Centre of International Co-operation for Computerization) with the support of AIST of Japan and in partnership with 9 countries: China, Korea, India, Indonesia, Japan, Malaysia, Philippines, Singapore and Thailand.

Under SEARCC (South East Asia Regional Computer Confederation), SRIG-MLC (Special Regional Interest Group on Multilingual Computing) aims at creating a resource sharing mechanism among MLC experts in the region.

LOP (Language Observatory Project), initiated at Nagaoka Univ. of Technology, Japan, aims at crawling the web to produce a statistical profile of the languages, scripts and character codes in use. It will then be possible to answer questions such as: How many different languages and scripts are in use in cyberspace? How many web pages are in a given language? What is the trend in Unicode replacing local encoding schemes on the web? LOP may become a watchdog that monitors the use of different local languages on the web.
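A statistical profile of scripts of the kind described here can be approximated, for text that is already decoded to Unicode, by counting letters per script. The Python sketch below is an illustration only and is not the method used by the Language Observatory Project. It assumes that the first word of a Unicode character name (for example LATIN, DEVANAGARI, ARABIC, CJK) is a usable stand-in for the script; the function name script_profile is hypothetical.

```python
import unicodedata
from collections import Counter

def script_profile(text):
    """Count alphabetic characters by script, using the first word of each
    Unicode character name as a rough script label."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, punctuation and combining signs
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts

sample = "abc \u0915\u0916 \u0627"
print(script_profile(sample))
```

Identifying the language (as opposed to the script) of a page, and detecting legacy local encodings, require further statistical techniques that this sketch does not attempt.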

Languages written in Latin script are mostly represented on the web as character-encoded text, whereas non-Latin scripts still face problems in being represented in encoded form; one often has to download fonts to view these scripts.

A study by the Language Observatory Project in Japan reveals that the first 100 languages show a log-linear relation between speakers' population and the rank of the language. Chinese ranks first. Japanese, at the 10th rank, has a speakers' population one tenth that of Chinese. Turkmen ranks 100th, with a speakers' population one hundredth that of Chinese. Beyond the 100th rank, the corresponding population is much lower, leading down to endangered languages of very small communities.
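The relation quoted above can be checked with a little arithmetic: if rank 10 corresponds to one tenth and rank 100 to one hundredth of the top population, the slope on a log-log scale is -1, i.e. population is roughly proportional to 1/rank. The short Python sketch below works this out from the two quoted points only; the function names are illustrative, and the result is a reading of the figures in the text rather than a fit to the project's data.

```python
import math

def loglog_slope(rank_a, pop_a, rank_b, pop_b):
    """Slope of the straight line through two (rank, population) points on log-log axes."""
    return (math.log10(pop_b) - math.log10(pop_a)) / (math.log10(rank_b) - math.log10(rank_a))

def relative_population(rank, slope=-1.0):
    """Population relative to the top-ranked language under a pure power law."""
    return rank ** slope

# Populations expressed relative to the top-ranked language (Chinese = 1).
slope = loglog_slope(10, 1 / 10, 100, 1 / 100)
print(slope, relative_population(50))
```

Under this reading, a language at rank 50 would have about one fiftieth of the speakers of the top-ranked language.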

Distribution of speakers' population by script grouping is as follows:

               Latin     Cyrillic  Arabic    Hanzi     Indic     Others
In millions    2,238     451       462       1,085     844       102
(% of total)   (43.18%)  (8.70%)   (8.92%)   (20.94%)  (16.29%)  (1.97%)
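The percentage row follows from the population row. As a small Python sketch (variable and function names are illustrative), the shares can be recomputed from the figures in millions; they agree with the published percentages to within about 0.01 of a percentage point.

```python
# Speakers' population by script grouping, in millions (figures from the table above).
SPEAKERS_MILLIONS = {
    "Latin": 2238, "Cyrillic": 451, "Arabic": 462,
    "Hanzi": 1085, "Indic": 844, "Others": 102,
}

# Percentages as printed in the table.
PUBLISHED_PERCENT = {
    "Latin": 43.18, "Cyrillic": 8.70, "Arabic": 8.92,
    "Hanzi": 20.94, "Indic": 16.29, "Others": 1.97,
}

def percentage_shares(counts):
    """Return each group's share of the total, in percent."""
    total = sum(counts.values())
    return {group: 100.0 * value / total for group, value in counts.items()}

shares = percentage_shares(SPEAKERS_MILLIONS)
for group, share in shares.items():
    print(f"{group:9s} {share:6.2f}  (published {PUBLISHED_PERCENT[group]:.2f})")
```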

The Govt. of India has been a voting member of the Unicode Consortium since 2001. The Government of Pakistan has also become a member of the Unicode Consortium.

The Iran Telecom Research Centre (ITRC) has initiated a plan to support and promote Persian in the digital world.

The CICC (Center for International Cooperation for Computerization) project aims at multilingual computing with 9 member countries: China, Korea, India, Indonesia, Japan, Malaysia, Philippines, Singapore and Thailand. The Asian Forum for Standardization of Information Technology (AFSIT), set up in 1987, was renamed in 2002 as the Asian Forum for Information Technology (AFIT), with a focus on internationalization issues.

International Institute for Software Technology of UN University in Macau (UNU/IIST) is one of the Research & Training Centers (RTC) of United Nations University (UNU). It is fundamentally concerned with the software technology needs of developing countries and is the first International institute devoted to this subject.

Asian countries are gradually shifting towards Open Source Technologies because they reduce cost, reduce security risks, give freedom to modify and customize according to needs, give freedom to redistribute and replicate, prevent lock-in to a single vendor, promote open standards (thus easing interoperability), and prevent potential international harassment over software piracy. HP, IBM, RedHat and Intel are developing Linux-based solutions in Asian languages. IBM is the biggest investor in Open Source Technologies, having committed US $ 2. Bn for Linux-based solutions and outsourcing in Asia.

It is observed that the lack of freely downloadable fonts on the web hampers the growth of Internet content in a language. In India, insistence on the use of the national standard and of proprietary fonts was detrimental to the promotion of local language computing. C-DAC and Modular Ltd., both in Pune, had over 95% share of Indic fonts, and these were not Unicode compliant. In late 2004, a radical decision was made to adopt the Unicode encoding scheme and to develop and deploy OpenType fonts in the public domain.

Thiru Dayanidhi Maran, Minister for Communication & IT, Govt. of India, launched on 15 April 2005 the Tamil TrueType fonts (100) and OpenType fonts (120) along with an Open Office suite, dictionary, Internet browser, e-mail client, and TTS & OCR engines for public use, free of charge, downloadable from www.ildc.in and also available on CD. He has announced the launch of fonts and basic information processing tools for each Indian language in a phased manner. The next launch is scheduled for Hindi in June 2005. This will greatly boost Internet content in Indian languages. Further, major country-wide initiatives of e-governance, e-learning, e-health and e-rural prosperity will be implemented with localization support. This is a major initiative, and Asian countries trailing behind in ICT may follow it.

Language Technology initiatives in some of the Asian countries are given below:

Bangladesh

Bangladesh is ranked 93 out of 102 countries on the Networked Readiness Index. Telephone density is only about 0.5%. Access to technology is low owing to non-affordability, lack of local and social issues on the web, very few websites in the local language (Bengali), illiteracy, and social constraints on women users. The Grameen Phone scheme made a major breakthrough in the field of rural telecommunication. Bangladesh Computer Samity (BCS) is the national association that lobbies the government for greater support. Adoption of Open Source Technologies by the government can greatly enhance opportunities for the local software industry to provide high-end solutions.

Bhutan

Bhutan faces a set of unique challenges owing to its history of isolation and underdevelopment, harsh and difficult terrain, scattered habitat and land-lockedness. ICT will help in overcoming these challenges and deterrents to development. Bhutan entered the ICT revolution only in 1999. The Ministry of Info. & Communications, established in 2003, is responsible for planning ICT infrastructure, implementing national networking, and promoting R&D on emerging technologies.

Three significant initiatives have been taken: (i) drafting of the Bhutan ICT Policy and Strategy (BIPS), (ii) drafting of the Bhutan ICT and Media Act, and (iii) instituting ICT units in all Government Ministries.

BIPS identifies 5 strategic areas: Infrastructure, Human Capacity, Policy, Enterprise, and Content & Applications. Under the Asia-Pacific Broadband initiative, fibre optic cables have been laid from the border with India to the capital city. East-West fibre along the lateral highway will cover about 600 Km. All power grids will be connected with fibre. By 2007, all 20 districts will be covered. Deterrent factors remain the lack of qualified ICT professionals, a weak private sector and a lack of resources.

China

China is emerging as a major world economy. There are a number of significant ICT programs supporting the Chinese language. Chinese computing made its initial strides under the CICC project with Japanese funding. Chinese-language websites of ministries, foreign missions and major industries are becoming common. Windows 95/98/ME, 2000/NT, CE, 2003 and XP support the Chinese language. Handheld computers and PDAs are equipped with Chinese language support. The Zhong Heng Chinese Computing Technology Research Institute has been set up. Java application tools make it possible to read Chinese text files and to create GIF files to display Chinese text. Chinese FontShop creates Chinese characters.

The Chinese (Big-5) Browsing Environment for DOS, Windows, Mac & Unix; the Multi-Localization Enhancement of NCSA Mosaic for X 2.4; and Netscape 2.0/3.0/4.0 under Unix support the Chinese language. Freeware created by SuoWei is at Download.com.cn. The China Information Security certificate centre gives hackers' news, tools and experiences. Chinasite.com is the most popular website directory and index for Chinese-related resources on the Internet.

The Chinese government is a prime example of a state that has provided considerable impetus to the local software industry by making a significant shift towards the adoption of Open Source Technologies, with Linux as the de facto operating platform. Motorola and TurboLinux are developing a Chinese-language version of the Linux OS. Intel is developing the Linux Standard Base. An IA-64 Linux project is under way between HP, IBM and RedHat to port the Linux kernel to IA-64, enabling Chinese. Sun Microsystems has put out the OSS Star Office as an alternative to Microsoft Office. IBM is the biggest investor in Open Source Technologies.

OCR, Text-to-Speech and Speech-to-Text engines have been developed for Chinese. IBM is developing a Speech-to-Speech Translation System from Chinese to English and vice versa. China has a large blind population, and Chinese computing for the blind is being improved continuously.

India

The IT Act 2000 provides a legal framework for the acceptance of electronic records and digital signatures. It covers e-Governance and e-Commerce transactions.

During FY 2004-05, IT Software & Services exports were US$ 17.2 Bn, with a 34% growth rate, and ITES-BPO exports grew to US$ 3.6 Bn, with a 44% growth rate. The IT industry's contribution accounts for 4.1% of national GDP. e-Governance is being promoted on a massive scale to provide citizen-centric services in local languages. The three elements of the national e-Governance plan are Data Centers, State Wide Area Networks (SWANs) and Common Services Centers (CSCs). SWANs will be connected to NICNET through gateways to enable inter-state connectivity. An e-Readiness Assessment study is carried out annually for states and union territories. The e-Governance plan seeks to implement 25 mission-mode projects so as to create a citizen-centric and business-centric environment for e-Governance. Specific success stories in some states may be listed as below:

• Land Records • Transport Department • Property Registration • Municipalities • Treasuries • Agriculture • Gram Panchayats

Ten such mission-mode, citizen-centric projects will be implemented in close collaboration with state governments.

Eight projects on Income Tax, Passport Visa, Insurance, DCA, National Citizen Database, Central Excise, Pensions and Banking will be implemented directly by the Central Government. Seven projects relating to Integrated Services (EDI, E-BIZ, Common Service Centers, India Portal, EG Gateway, E-procurement, E-Courts) will be implemented by the ministries in the Central Government. The National Informatics Center (NIC) provides the network backbone and e-Governance support to the Central Government, State Governments, UT Administrations, Districts and other Government bodies.

A new .IN Internet domain name policy framework aims at adopting a liberal and market-friendly approach to registering a large number of .IN domain names. About 75,000 registrations were made in the very first week of its opening to the public.

It would be possible to have domain names in Indian languages also. There are 22 Constitutionally recognized Indian Languages. Ten scripts are in vogue.
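One way such names can be carried over the existing ASCII-only domain name system is the IDNA convention, in which each non-ASCII label is converted to an ASCII-compatible form beginning with "xn--". The Python sketch below illustrates this using the standard library's idna codec; it is offered as an assumption about how Indian-language names could be handled, not as a description of the .IN registry's actual policy, and the function names are hypothetical.

```python
def to_ascii_label(label):
    """Convert a Unicode domain label to its ASCII-compatible (IDNA) form."""
    return label.encode("idna").decode("ascii")

def to_unicode_label(ascii_label):
    """Convert an ASCII-compatible (IDNA) label back to Unicode."""
    return ascii_label.encode("ascii").decode("idna")

# The word "Bharat" in Devanagari, written with escapes for clarity.
bharat = "\u092D\u093E\u0930\u0924"

ascii_form = to_ascii_label(bharat)
print(ascii_form)                    # an ASCII label starting with "xn--"
print(to_unicode_label(ascii_form))  # the original Devanagari label
```

The user types and sees the name in the local script, while resolvers and registries exchange only the ASCII form.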

Community Information Centres (CICs) have been set up in hilly, far-flung and rural areas to bring the benefits of ICT for socio-economic development by providing broadband connectivity. There are 467 CICs in the NE states, 135 in J&K, 41 Vidya Vahini CICs in schools in the Andaman & Nicobar Islands, and 30 in Lakshadweep. 328 CICs will be set up in Uttaranchal in entrepreneurship mode. CICs are interfaced with SWANs and NICNET.

"Broadband Economy : Vision 2010" recommends a target of 20 Mn subscribers by the year 2010. The Internet subscriber base was 4.2 Mn as of Dec 2004.

Malaysia

Jawi, the traditional Malay writing system, uses the Arabic script. Now Rumi, the Latin-based script, is used. Islamic banking and financial systems promote the use of Jawi in on-line business transactions through computers. MosqueNet connects mosques in the state and supports Jawi. JAWINET connects selected schools with computers supporting Jawi, which enables e-mail in Jawi. The multilingual browser Tango is able to display multilingual text including Jawi. A bigger project, called Digital Jawi, has recently been launched.

Myanmar

The Myanmar language using Unicode is supported on Windows and Linux. Myanmar OCR works well for numbers, but not for text. Myanmar voice recognition is too slow. A Myanmar spell-checker (beta version) works well on old versions of Windows. Binary sort using Unicode Myanmar text has been successfully developed by Solveware. GeoComp Myanmar has also developed a sorting engine using its own font.

Nepal

Nepal's population lives in the mountains (7.5%), hills (44.5%) and Terai (48%). Telephone density is 1.6% (0.4% in rural areas). The Min. of Information & Communication (MoIC) is responsible for policy for the telecommunication sector, and the Min. of S&T for policy for the IT sector and for implementing plans to enhance ICT. The telecom policy came into force in 1999 and the IT policy in 2000. In 2004, a revised IT policy and the E-Transaction Act were announced. The focus is on IT as a priority sector, a one-window policy, private sector participation, rural Internet facilities, e-Education, e-Health, technology transfer and cyber law.

The national ICT policy encourages the private sector to take part in the socio-economic development of rural areas (88%). There are 22 ISPs. The East-West Nepal optical fibre Indo-Nepal project will cover 841 Km. An adequate number of trained personnel, cyber law, a long-term vision, broadband connectivity, sharing of resources and knowledge, and regional recognition and establishment of a certification system are necessary for improving ICT penetration.

Computerisation of the Madan Puraskar Pustakalaya (MPP) is a major initiative. MPP is the principal archive of books and periodicals in Nepali languages.

Pakistan

The Center for Research in Urdu Language Processing (CRULP), at the National University of Computer & Emerging Sciences (NUCES), was established specifically to conduct research and development for computing in Urdu and the regional languages of Pakistan, in speech processing, computational linguistics and script processing. The Urdu Localization Project, sponsored by the Ministry of Information Technology, is part of the E-Government Initiative. It has three components, each of which involves the development of a technology and an application that uses this technology. These components are Lexicon, Machine Translation, and Urdu Text-to-Speech Synthesizer.

Thailand

National IT Policy Framework: the IT 2000 policy rests on 3 pillars, namely (i) National Information Infrastructure, (ii) Human Resource and (iii) Good Governance, to achieve the goals of sustainable economic power in S/E Asia, social equity & prosperity, and an environment-friendly society. Pilot projects to materialize the goal of each pillar of IT 2000 include: (i) SchoolNet: a national school information action program, with 5.6 Mbps Internet connectivity and 2 Mbps between schools; (ii) E-Government Initiative: development of the Government Information Network; and (iii) IT Law Development: legal infrastructure to support widespread use of IT in the country.

Teledensity : 12.87% ; Mobile density : 26.04; 22 ISPs; Internet users 5.64%; PC penetration 2.43%; over 7000 domains under .th TLD. Broadband connectivity is very low. computing is being promoted for ICT penetration. Pilot Telecenters project aims at enhancing market opportunity and learning opportunity to bridge digital divide.

The Min. of ICT launched a low-cost PC (US$ 250 for a desktop PC and US$ 500 for a laptop). The Thai PC captured a 75% share of the total PC market.

IT 2010, the new policy framework, set the key development objectives to exploit the benefits of ICT to move Thailand to a "Knowledge-based Society and Economy". The focus is not on "technology" per se, but rather on good use of ICT for socio-economic development. The cross-cutting principles of IT 2010 are (i) building human capital, (ii) promoting innovation, and (iii) investing in info-infrastructure and promoting the information industry. Three specific development goals are:

To achieve these goals, 5 flagship programs are identified: e-Society, e-Education, e-Government, e-Commerce and e-Industry.

MNCs are progressively providing local language support.

Windows XP supports 17 major European languages, 8 major Slavic languages, CJK, Vietnamese, Perso-Arabic, Malay, Indonesian, Indian Languages: Gujarati, Hindi, Kannada, Marathi, Punjabi, Tamil, Telugu.

Linux has begun supporting more and more Asian languages.

Google supports 104 languages including Indian Languages; these are Hindi, Bengali, Marathi, Kannada, Malayalam, Tamil, Telugu, Sindhi, Oriya, Punjabi, Urdu.

Internet browsers – Internet Explorer, Netscape, Mozilla, Firefox - have begun supporting Asian languages in phased manner.

Multilingual Technology products: A multilingual technology survey divided products into three major groups: Multilingual Workflow Systems, Translation Tools and Engineering Tools. These are visualized as parts of a modern transportation bridge:

• The foundation is made up of engineering tools and computer resources: font libraries, character sets, and internationalization and localization routines that help form the basis of translation and localization processes.
• The horizontal girders are comprised of translation tools such as dictionaries, translation memory databases, machine translation products or desktop suites of these tools. All of these help to form the road that carries localized software, e-mail, data, documents and other translated materials.
• The bridge's suspension structure represents the emerging array of multilingual workflow systems, including content management, project management and workflow monitoring. Together these products are changing the nature of communications by creating a high-speed bridge across the multilingual gulf.

Multilingual Workflow Systems are at the websites :

• www.alchemysoftware.ie • www.sdlintl.com • www.trados.com • www.globalsight.com • www.polardesign.com • www.langtech.co.uk • www.multilizer.com • www.star-transit.com • www.obtree.com • www.telelinguasoftware.com

Translation tools and suites are at the websites: • www.abbyyusa.com • www.ilsp.gr • www.translate.com • www.apptek.com • www.isc.com.au • www.synthema.it • www.cjkware.com • www.itrblackjack.com • www.systransoft.com • www.multilingo.com • www.jdedwards.com • www.terminotix.com • www.bridgeterm.com • www.lexicool.com • www.tm-systems.com • www.cypresoft.com • www.logomedia.net • www.tranexp.com • www.ectaco.com • www.metatexis.com • www.translationwave.com • www.ectaco.com • www.multicorpora.com • www.twinbridge.com • www.ewgate.com • www.phatware.com • www.ultralingua.com • www.irisusa.com • www.practiline.com • www.tdil.mit.gov.in • www.ibm.com/websphere • www.e-promt.com • www.ildc.in

Engineering Tools and Computer Resources are available at the websites :

• www.agfamonotype.com • www.convera.com • www.margi.com • www.visloc.com • www.em2-solutions.com • www.polderland.nl • www.atia.com/languagestudio • www.everlastingsystems.com • www.schaudin.com • www.babylon.com • www.fontlab.com • www.whippleware.com • www.bantam.co.nz • www.helicon.co.at • www.wizart.com • www.basistech.com • www.idcglobal.com • www.zicorp.com • www.champollion.net • www.langoo.com • www.connexor.com • www.lingoport.com

Most of these run on Windows. Some support multiple platforms: Windows, Macintosh, Unix/Linux, Solaris and IBM AIX.

10.2 Digital Libraries - digitizing multilingual contents

Six major projects were launched during 1994-1998 under DLI (Digital Library Initiative) funded by the NSF, DARPA and NASA in the USA.

European research in Digital Libraries is funded by the European Union as well as national sources.

Since 1995, D-Lib research has become a national grand challenge in several countries in Asia. Most projects can be classified into the following categories:

• Nationwide D-Lib initiatives and special-purpose digital libraries: for example, the Library 2000 Project in Singapore (to link all library resources) and the Financial Digital Library at the University of Hong Kong (to serve the needs of the HK stock market and its users).
• Digital museums and historical document digitization: for example, the Digital Museum Project of the National Taiwan University and the digitization of the art collection of the Palace Museum in Taipei by IBM.
• Local language and multilingual information retrieval: for example, the Net Compass Project of Tsinghua University in China, Chinese Information Retrieval at the Academia Sinica, Taiwan, and New Zealand's multilingual project.

Local language processing and historical cultural content could be the most immediate Asian contribution to the international DL community. An Asia Digital Library consortium is fostering long-term collaboration and projects in DL-related topics in Asia (www.cyberlib.net/adl).

The New Zealand D-Lib (http://www.nzdl.org) currently offers about 20 collections, varying in size from a few documents up to 10 million documents and several gigabytes of text. The documents are written in many different languages, including English, French, German, Arabic, Maori, Portuguese and Swahili.

The DLI (Digital Library of India) initiative was launched in September 2003 by the President of India. The DLI portal (http://www.dli.ernet.in) is operational. By March 2005, 100,000 books (~3.2 million pages) had been scanned and cropped in various languages, viz. English, Telugu, Tamil, Sanskrit, Kannada and Hindi. There are 4 regional mega centers and 20 scanning centers. The mega centers are each responsible for content development of around 14 million pages, resulting in a total of 56 million pages, and the scanning centers would contribute about 15 million pages. Hence 250,000 books are targeted. The mega centers will develop requisite access technologies such as Cross-Lingual Information Access, Multilingual Crawler, OCR with workflow, Multimedia Interface for people with disabilities, Automatic Search Indexing tools, Multilingual and multi-modal authoring tools, and Text summarisation, with a focus on nine languages to begin with: Hindi, Marathi, Punjabi, Bengali, Assamese, Sanskrit, Telugu, Kannada and Malayalam. DLI is being implemented in close collaboration with the UDL (Universal Digital Library) project (http://www.ulib.org) at Carnegie Mellon University.

Research challenges include Input (scanning, digitizing, OCR), Metadata creation, Data representation, Navigation and search, Multilingual issues, Output (voice, pictures, virtual reality).

11. Technology Development for Indian Languages

National excellence in this millennium shall be determined by the extent to which Information Technology can deliver its potential in local languages. In a country like India, communication that overcomes the language barrier is crucial to the growth of society and to preventing the Digital Divide.

India was aware of the technological changes and the local constraints. Development of Language Technology in India may be categorized in three phases:

• 1975-1990 : A-Technology Phase : Focus was on Adaptation Technologies; abstraction of requisite technological designs and competence building in R&D institutions.
• 1990-2005 : B-Technology Phase : Focus was on developing Basic Technologies: generic information processing tools, interface technologies and cross-compatibility conversion utilities. The TDIL (Technology Development for Indian Languages) programme was initiated.
• 2005-2020 : C-Technology Phase : Focus is on developing Cognitive Technologies in the context of convergence of computing, communication and content technologies. Collaborative technology development with public-private partnership is being encouraged to realise this.

Government spending during FY 1991-FY 2000 was about US$ 1.5 million, and during FY 2001-FY 2004 it was about US$ 5 million, covering all Indian languages.

The magnitude of funding per language is minuscule compared to language technology projects in the USA and Europe. For example, funding for US projects on language technology during FY 1991-FY 2000 was about US$ 330 million, and during FY 2001-FY 2004 it was about US$ 207 million. The projects were largely English-centric.
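The size of the gap can be made explicit with a small calculation. The sketch below uses only the four totals quoted above; the dictionary and function names are illustrative. It compares totals, so the gap per language is wider still, since the Indian figures cover all Indian languages while the US projects were largely English-centric.

```python
# Funding totals quoted in the text, in millions of US dollars.
INDIA_FUNDING = {"FY1991-2000": 1.5, "FY2001-2004": 5.0}
US_FUNDING = {"FY1991-2000": 330.0, "FY2001-2004": 207.0}


def funding_ratio(period):
    """Return how many times larger the US total is than the Indian total."""
    return US_FUNDING[period] / INDIA_FUNDING[period]


for period in INDIA_FUNDING:
    ratio = funding_ratio(period)
    print(f"{period}: US funding was about {ratio:.0f} times the Indian figure")
```

On these figures the ratio narrowed between the two periods but remained well above an order of magnitude.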

Multilingual and non-English-speaking countries and regions such as Europe, Canada, China, Russia, Korea and Japan have felt an urgent need to develop suitable language technology solutions and to extend cooperation for collaborative development.

In FY 2000-2001, a mission-mode project on Technology Development for Indian Languages (TDIL) was launched with the vision statement “Digital unite and knowledge for all”, to be achieved through the mission of enabling communication without language barrier and moving up the knowledge chain.

Mission Objectives
• To develop information processing tools to facilitate human machine interaction in Indian languages and to create and access multilingual knowledge resources/content. To promote the use of information processing tools for language studies and research.
• To consolidate technologies thus developed for Indian languages and integrate these to develop innovative user products and services.

Major Initiatives
• Knowledge Resources (Parallel Corpora, Multilingual Libraries/Dictionaries, lexical resources)
• Knowledge Tools (Portals, Language Processing Tools, Translation Memory Tools)
• Translation Support Systems (Machine Translation, Multilingual Information Access, Cross Language Information Retrieval)
• Human Machine Interface System (Optical Character Recognition Systems, Voice Recognition Systems, Text-to-Speech System)
• Localization (Local language enabling of OS, middleware, and translocalizing content)
• Language Technology Human Resource Development (Manpower Development in Knowledge Engineering, Computational Linguistics, Localization Technology)
• Standardization (ISCII, Unicode, XML Interchange File Format, Corpus Annotation, Indian Script Font Code (INSFOC), Open Lexicon Interchange, Term Base eXchange (TBX), Searchable Romanisation Scheme, etc.)

Long Term Goals
• Speech to Speech Translation
• Knowledge Management in Multilingual Environment
• Intelligent Cognitive Systems

Thirteen Resource Centers (RCs) for Language Technology Solutions were set up in premier institutions in various states, covering all Indian languages. This led to competition and innovation in language technology. CoIL-Net centers were set up in the Hindi-speaking states to promote the use of IT in Hindi for their socio-economic development. Regional Groupings (RG) of RCs facilitated sharing of results for similar languages.

A Language Technology Business Meet (LTBM) was organized to showcase prototype technologies and to facilitate dialogue between LT developers in academia and industry for possible transfer of technology or collaborative development aimed at productization. 43 Technology Handshakes were signed during the LTBM in November 2001.

[Figure: Language Technology Map. Resource Centres & CoIL-Net Centres for Indian Language Technology Solutions]


11.1 Implementation Strategy

In order to spur collaborative development of technologies for all Indian languages innovatively and consistently, the focus has been on consensus goals, collaboration, peer competition, open source software, benchmarking, test & evaluation, industry interface, and dissemination.

A Project Planning Matrix (PPM) was evolved through consensus. PPM details:
• Why : overall goal
• What – p : project purpose
• What – r : project results/outputs
• How – a : activities to perform
• Which : important assumptions
• How – e : indicators (test & evaluation)
• Where : means of verification (quantity, quality, time, place)

The second ZOPP workshop was held in March 2002. Discussion points included:

• Expectations: networking, sharing, goals, collaboration with industry, institutional issues

• Killer Applications : Parallel Corpora (text & speech), OCR, MAT

• Tools : dictionary, Spell checker, Morph Analyser, Fonts & Conversion Utilities

• Future Directions
– Generic applications with open architecture, interoperability, flexibility & scalability (all Resource Centres to develop)
– Cross Lingual Information Retrieval (top 5 world languages, e.g. Chinese, Spanish, Russian, Japanese, English, and Indian languages)
– Concept based Networking Language (CNL)
– Speech-to-Speech Translation

It was agreed that each Resource Centre will develop a portal, basic information processing tools, an Indian language to Hindi e-dictionary, Indian language to Hindi MAT, and OCR, and will organize training programs and IT localization clinics. Some Resource Centres volunteered to take on specialised areas and share generic technologies with the rest for rapid development. For instance, OCR (ISI & UoH); CNL (IISc., IIT-B & AU); Portal; Search Engine (UoH, C-DAC/Trivandrum); Linguistic Data Resource (IISc.-B & IIT-B); Open Source Software (IIT-K & AU).

A Technology Innovation Audit of the sponsored projects is essential in order to promote standardization and sharing of technologies. Audit steps may include:
• Concept, design and implementation audit
• Alpha testing with peer developers
• Beta testing with a small number of potential users
• Certification of IL software (IS:14639-1998 standard for software evaluation)

Peer review of the projects and enforced beta testing of products or services yield satisfactory results; the culture of collaborative technology development is also strengthened.

Technology management focuses on consolidation and on integrating innovations into products and services.

Public Domain/General Public License (GPL) approach is encouraged for faster development and rapid spread. IT localization clinics promote wider dissemination and organize internship training.

Bilateral/International cooperation in Language Technology and Applications will be encouraged.

Academic institutions are good at research and technology development, but they are often averse to productising the technology. They prefer publishing papers and are often reluctant to share their code. Mechanisms for protecting intellectual property rights (IPR) in their ideas and products are also not very conducive, and university rules vary.

Researchers in academia want to go for that last 2% of performance, but they need to be reminded that it is better to get a sufficient solution out fast and then continue to enhance it. Test and evaluation of their technologies is difficult due to lack of documentation and non-adherence to industry practices.

11.2 Consortium on Innovation & Language Technology (COILTech)

The MAIT Consortium on Innovation & Language Technology (COILTech) has, since its inception in September 2001, been actively co-ordinating various activities with industry and the TDIL (Technology Development for Indian Languages) programme. The consortium today has active participation from both Indian and multinational companies. Its broad objectives are:
• To promote industry participation in collaborative R&D in language technology and OSS development.
• To evolve consensus on industry standards, benchmarks, and certification of Language Technology products.
• To collectively interface with government and academia.
• To conduct market surveys, organize technology shows, promote technology transfers and expand the market collectively.

11.3 Whither in Seven Initiatives ?

Standardisation
• INSCRIPT for Indian script keyboard layout (1988-1991).
• The 8-bit ISCII (Indian Script Standard Code for Information Interchange) was developed in 1988 and revised in 1991. The Indic script blocks of Unicode are based on ISCII-1988. The Dept. of IT (Govt. of India) is a voting member of the Unicode Consortium. Unicode 4.0 incorporates most of the suggestions. A draft standard for Vedic Vagvarna has been proposed to Unicode. A Unicode workshop was organised in October 2004 in Delhi.
• INSFOC (Indian Standard for Font Code) and Indian Script to Roman Transliteration (INSROT) are ready.
• The Dept. of Info. Tech. and 4 regional units of C-DAC have recently become affiliate members of W3C (World Wide Web Consortium).
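One practical consequence of the ISCII heritage is that the major Indic scripts occupy parallel 128-code-point blocks in Unicode, so a letter usually sits at the same offset in each block. The following is a minimal sketch of that property, not an implementation of INSROT or of any TDIL tool. The block start values are those of the Unicode Standard; the function name and the fallback behaviour are assumptions made for illustration.

```python
import unicodedata

# First code point of each script's block in the Unicode Standard.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80  # each of these blocks spans 128 code points


def shift_script(text, source, target):
    """Naively map characters to the same offset in another Indic block.

    Characters outside the source block are kept as they are. A character is
    also kept unchanged when the corresponding position in the target block
    is unassigned.
    """
    src, dst = BLOCK_START[source], BLOCK_START[target]
    result = []
    for ch in text:
        code_point = ord(ch)
        if src <= code_point < src + BLOCK_SIZE:
            candidate = chr(code_point - src + dst)
            # unicodedata.name returns the default ("") for unassigned points.
            result.append(candidate if unicodedata.name(candidate, "") else ch)
        else:
            result.append(ch)
    return "".join(result)


print(shift_script("\u0915\u092e\u0932", "devanagari", "telugu"))
```

Such offset mapping is only a starting point. Scripts differ in their letter inventories (Tamil, for instance, leaves many positions unassigned), and usable transliteration needs language-specific rules on top.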

Knowledge Resources
• Tagged corpora of texts in machine-readable form have been developed [at CIIL, Mysore] (3 million words -> 10 million words). These serve as a basic research facility for linguists and computer scientists, along with tools for word-level tagging, word count, letter count, frequency count, and spell checkers in various Indian languages.
• Hindi Vishwakosh, UN selected countries dictionary, Bharat Bhasha Kosh, Pan-Indian Dictionary, SAARC dictionary, English to Hindi dictionary, Sanskrit to Hindi dictionary, and Bilingual (English, Hindi)
• Parallel corpora (1 million pages) comprising books and their translations in 11 Indian languages.
• Hindi & Marathi Wordnets and concept-to-Hindi lexicon ontologies have been developed. Wordnets for regional languages such as Gujarati, Marathi, Oriya, Tamil and Sanskrit are also proposed. This linguistic resource is essential for building machine translation systems and speech applications.
• English-to-regional-language dictionaries are being web-enabled. Regional-language-to-Hindi e-dictionaries are under development. Thesauri are also being made available.
• Tutorials for learning Indian languages, and Sanskrit through some Indian languages, are available on CD-ROM from Resource Centres.
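The word count, letter count and frequency count tools mentioned for the corpora can be illustrated in a few lines. This is a minimal sketch and not the CIIL tools themselves. It assumes that words are separated by whitespace and that punctuation such as the danda can be recognised by its Unicode general category; the function names are illustrative.

```python
from collections import Counter
import unicodedata


def _is_punctuation(ch):
    """True for characters in the Unicode punctuation categories (P*)."""
    return unicodedata.category(ch).startswith("P")


def word_frequencies(text):
    """Count whitespace-separated words, ignoring punctuation at word edges."""
    words = []
    for token in text.split():
        while token and _is_punctuation(token[0]):
            token = token[1:]
        while token and _is_punctuation(token[-1]):
            token = token[:-1]
        if token:
            words.append(token)
    return Counter(words)


def letter_frequencies(text):
    """Count letters and combining marks (Unicode categories L* and M*)."""
    return Counter(ch for ch in text if unicodedata.category(ch)[0] in "LM")


# Toy two-sentence Hindi sample, invented for illustration only.
sample = "राम घर गया। राम आया।"
print(word_frequencies(sample).most_common(2))
print(letter_frequencies(sample).most_common(3))
```

Counting code points treats vowel signs as separate items. A linguistically motivated letter count would group them into aksharas, which needs script-specific segmentation rules.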

Knowledge Tools
• Java-based solutions for displaying Web documents through negotiation and dynamic rendering.
• Search engine for indexing and searching Indian script HTML documents on the Linux platform.
• Bulletin Board System in major Indian languages.
• Shabdhabodha analyzes the semantic and syntactic structure of Sanskrit sentences. A Sanskrit word processor to handle special Sanskrit constructs is under development. Sanskrit Authoring System, including a Sanskrit word processor. Desika is a software package for understanding plain and accented written Sanskrit texts.
• Leap Office 2000 on Windows. iLEAP is an Internet-ready IL word processor. ISM (Isfoc Script Manager) is a font-based interface to Windows. GIST SDK supports development of applications on Windows. iPlugin enables development of IL applications such as text processing, e-mail messaging, chat, calendar, scheduler of events, etc. N-Trans transliterates nouns from and into Indian languages. LIPS decoder sub-titles TV programmes in Indian languages simultaneously. MOVE enables on-line video editing in TV broadcast. Teleprompter is a newsroom IL reading facility in TV broadcast studios. DVD Authoring tool enables authoring of movies on DVD. UTRANS enables the Urdu-speaking community to read popular Hindi text, and PUTRANS converts Punjabi text into Urdu.
• Font-based multilingual packages, word processors and transcription facilities; font-based Indian script enabling DTP packages; database packages; data entry packages; e-mailing systems; and IL packages. Indian language support is also becoming available on operating systems: Windows 2000 and Linux.
• Basic information processing tools such as Text Editor, Morphological Analyser, Spell Checker, and plugins for MS Word, Access etc. are available for major Indian languages.
• Office suites that include Word Processor, Database Management, Spreadsheet and Power Point have been developed for Tamil, Hindi, Telugu and Malayalam. Mail Server, Chat Server, Simple Translators and Search Engines are available for all the major Indian languages.

Translation Support Systems
• Mantra is a machine-aided translation system (English to Hindi) for Government notifications. [at C-DAC]
• Anusaraka provides rapid translation, as a language accessor, from other Indian languages to Hindi. [UoH-IITK]
• Matra, a machine-aided translation system (English to Hindi), has been developed, with a prototype Vaakya system for a web-based translation service for English news stories to Hindi. [at NCST, Mumbai]
• Anglabharati (E -> H), an online machine-aided translation system (English to Hindi), is being developed for the officialese, public health and agriculture domains. Anubharati (H -> E) is a nascent prototype machine-aided translation system from Hindi to English. [at IIT, Kanpur] The online machine-aided translation system, integrated with TTS, is available at http://anglahindi.iitk.ac.in. It has a total of 5 X (30,000 root words) and follows a hybrid of rule-based and example-based approaches. [IIT-K]
• Shakti (http://shakti.iiit.net) at IIIT Hyderabad: machine-aided translation system for English to Hindi, and English to Telugu & Marathi.

Human Machine Interface Systems
• A Continuous Speech Recognition system for Hindi is being developed at IBM. It was tested after training with pre-recorded speech.
• An alpha version of "Hindi Vani", a PC-based unlimited-vocabulary text-to-speech conversion software for Hindi, has been developed. The quality of speech is being improved in terms of pitch, tone and intonation, with on-line screen reading capabilities. [CEERI, New Delhi] The Speech Technologies group at IIT Madras is also developing technologies for Indian languages. The text-to-speech system for Hindi, "Vaachak", produces acceptable natural speech in Hindi. [at Lucknow] Text-to-speech systems for Assamese, Tamil, Malayalam, Bangla, Telugu and Kannada are being developed. Speech corpora are being envisaged for developing automatic speech recognition systems.
• Optical Character Recognition software for Hindi, Marathi, Bangla, Oriya, Malayalam, Punjabi, Telugu and Tamil has been developed with accuracy above 97%. OCRs for Assamese, Kannada and Gujarati have accuracy in the 90%-95% range. These are being enhanced.
• Line and dot matrix printers were enabled for printing Devanagari [at Lipi Data Systems & Transmetic Systems; TVSE]. Bilingual computer-compatible electronic teleprompters were manufactured [at Abacus, CMC, HTL, AEM, Data byte Equipments]. The GIST terminal, which allows use of Indic scripts in a UNIX environment, was developed [at C-DAC].

Human Resource Development in Language Technology : An independent study by Frost & Sullivan and MAIT projected that the Language Technology market will grow to US$ 64 million by the year 2005, with 79% annual growth over the next three years. The expansion of the Language Technology market has created considerable demand for specialized human resources with expertise in both linguistics and computing. At present, human resources in the area of Language Technology and Knowledge Engineering are available in only a few academic institutions, and these institutions are unable to meet the demand for trained manpower. A rough estimate suggests that only about 350 trained professionals are presently available in the domain of Language Technology, while demand is projected to be 1200 in 3 years' time, assuming the projected growth rate.
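The manpower gap can be restated as a required growth rate. The sketch below takes the two figures quoted above (about 350 trained professionals now, 1200 needed in three years) and computes the compound annual growth in trained manpower that would close the gap. Compound growth is an assumption made here for illustration; the study's own method is not stated.

```python
def implied_annual_growth(current, target, years):
    """Compound annual growth rate needed to go from current to target."""
    return (target / current) ** (1.0 / years) - 1.0


def project(current, rate, years):
    """Value reached after compounding `rate` for `years` years."""
    return current * (1.0 + rate) ** years


rate = implied_annual_growth(current=350, target=1200, years=3)
print(f"Trained manpower would need to grow by about {rate:.0%} per year")
for year in range(1, 4):
    print(f"Year {year}: about {project(350, rate, year):.0f} professionals")
```

The trained pool would thus have to grow by roughly half each year for three years, which is the scale of training capacity the handful of institutions mentioned above would need to supply.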

Moreover, the growth of ITeS and e-governance in India will require software professionals with expertise in Language Technology to develop innovative software applications and services in all the languages of India. It is evident that, from the perspective of localization, developing human resources in Computational Linguistics and Knowledge Engineering is necessary.

Localisation of Linux : Increasing use of open source software is proposed. Indix2 is a localized version of Linux enabling six languages: Hindi, Marathi, Sanskrit, Tamil, Telugu and Kannada. Work is progressing to enable other Indian languages as well.

Evaluation & Benchmarking : STQC (Standardisation, Testing and Quality Control), under the Ministry of Communications & Information Technology, has been entrusted with the task of third-party evaluation of language technologies and products. This will enforce discipline on the part of technology developers and facilitate dialogue with industry for technology transfer and commercialisation.

OCR standards for Indian languages are not available. Therefore the specifications for testing were worked out jointly by the testing organization and the development organisation. The testing was carried out from the users' point of view, not the developers', and was purely black-box in nature.

The testing was based on the parameters indicated in the product specifications, such as Accuracy, Speed, Noise Reduction, Skew Angle Correction & detection, File Format Support, Configuration Testing and Installation.

The input documents for scanning were selected to validate the product specifications. Six different books (of different sizes, with different fonts and different paper quality) were selected. Only two-tone (black and white) books of offset/laser print quality, and photocopied pages from these books, were used as input documents. For testing the "Noise Reduction" feature, photocopied documents with salt-and-pepper noise and with blurs and smudges were used.
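The text does not say how the accuracy parameter was computed in these tests. A common black-box definition, assumed here purely for illustration, is character accuracy: one minus the edit distance between the OCR output and a hand-verified reference transcription, divided by the length of the reference. A minimal sketch of that measure:

```python
def edit_distance(reference, recognized):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(recognized) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        current = [i]
        for j, rec_ch in enumerate(recognized, start=1):
            cost = 0 if ref_ch == rec_ch else 1
            current.append(min(
                previous[j] + 1,         # reference character missing in output
                current[j - 1] + 1,      # extra character in output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, recognized):
    """Fraction of the reference reproduced correctly, floored at zero."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, recognized)
    return max(0.0, 1.0 - errors / len(reference))


print(character_accuracy("abcd", "abxd"))
```

Under this definition, an accuracy above 97% means fewer than three character errors per hundred reference characters. For Indic scripts the unit of comparison (code point or akshara) changes the figure and would have to be fixed in the test specification.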

Similar test specifications and test data were prepared for testing MT systems. Other language Technology products are also being tested. This facility is available for language technology products from private sector as well.

Dissemination : Information on Indian language technologies is disseminated through:
• TDIL Quarterly : VishwaBharat@tdil : Language Technology Flash (ISSN 0972-6454). It has been brought out in print and on the web since January 2001.
• TDIL Web Site : http://tdil.mit.gov.in contains information on various TDIL activities and achievements, and provides access to a variety of content and downloads in Hindi and other Indian languages. Free downloads include keyboard drivers & fonts, basic word processors and spell checkers.

Hits to the TDIL website are increasing: 2.027 Mn in 2003; 3.08 Mn in 2004; 1.47 Mn during Jan-April 2005.
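Because the 2005 figure covers only four months, it has to be annualized before it can be compared with the earlier years. The sketch below does this under the simplifying assumption, made here for illustration, that traffic is spread evenly over the year.

```python
# Hits in millions, as quoted in the text.
hits_by_year = {2003: 2.027, 2004: 3.08}
hits_jan_apr_2005 = 1.47
months_observed = 4

# Year-on-year growth from 2003 to 2004.
growth_2004 = hits_by_year[2004] / hits_by_year[2003] - 1.0

# Straight-line projection of the four-month figure to twelve months.
annualized_2005 = hits_jan_apr_2005 * 12 / months_observed

print(f"Growth 2003 to 2004: {growth_2004:.0%}")
print(f"Annualized 2005 estimate: {annualized_2005:.2f} Mn")
```

On this rough projection, 2005 would again exceed the previous year's total, which is consistent with the rising trend stated above.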

Impact on MNCs & Indian Industry

As a sequel to progressive achievements of TDIL mission, Government and industry initiatives follow.

Web Sites Supporting Indian Languages
• Web Dunia : www.epatra.com supports 12 languages
• www.indiainfo.com supports 11 Indian languages
• www.satyamonline.com supports 11 Indian languages
• Mithi.com : www.mailjol.com supports 12 languages
• Langoo : www.langoo.com supports 12 languages
• C-DAC : www.cdac.in with multilingual support
• www.inoman.net
• www.cbharati.biz
• www.rediffmail.com

Media (to illustrate)
• BBC Hindi : www.bbc.co.uk/hindi
• Voice of America : www.voa.gov/hindi
• Navbharattimes : www.navbharattimes.indiatimes.com
• Manorama (Malayalam) : www.manoramaonline.com

MNC Products with Indian Language Support
• The Google search engine supports Hindi, Bengali, Marathi, Tamil & Telugu.
• Microsoft supports Indic scripts on Windows 2000 and Office XP.
• Oracle 8i & 9i RDBMS supports Hindi, Tamil and Gujarati.
• Lotus supports Hindi.
• Open Office (of Star Office) by Sun supports Hindi.
• Google & Yahoo support search in Hindi.
• Free and open source software is being localized.

11.4 Media Lab Asia

The goal is to bring the benefits of innovation and technology to the common man. Towards this end, a network of research labs has been established on the campuses of the IITs at Mumbai, Delhi, Chennai, Kanpur & Kharagpur and at MIT. Many research projects are underway with promising results.

There are four key focus areas for research:
I. Bits for All: Establishing leadership in low-cost rural communication.
II. World Computer: Building low-cost computing devices, accessible and affordable to the common man.
III. Tomorrow's Tools: Creating language interfaces and low-cost sensors.
IV. Digital Village: Where the consolidation of the three research topics is embodied in field research across the country, and in delivering value to the masses.

11.5 Cultural Informatics

The Cultural Informatics Laboratory (CIL) (http://www.ignca.org or http://www.ignca.nic.in) was established in 1994 with a UNDP-assisted multimedia documentation project titled “Strengthening National Facility for Interactive Multimedia Documentation of Cultural Resources”. Under the guidance of subject experts, the team was trained in interactive multimedia documentation and in-depth analysis of cultural information. This expertise may be used to demonstrate how cultural heritage can be recreated virtually, in a holistic and integrated perception of culture.

Amongst the areas where the project has broken new ground are creation of synergies between the disciplines of arts and information technology leading to usage, development and demonstration of new technology and cultural documentation. New design models, development processes and reusable software tools specially targeted at high quality multimedia content creation have been conceived, evolved and applied in some already completed and many ongoing projects.

Kalasampada: Digital Library – Resource for Indian Cultural Heritage (DL-RICH) is a project sponsored by the Ministry of Communication and Information Technology (MCIT). The project aims to use multimedia computer technology to develop a software package that integrates a variety of cultural information and helps users to interact with and explore the subject, available as image, audio, text, graphics, animation and video, on a computer in a non-linear mode, by a click of the mouse.

Kalasampada, a unique project of its kind, will enable students, scholars, artists and the research and scientific community to access and view, from a single window, materials including over a couple of lakh manuscripts, over a lakh slides, thousands of rare books, photographs, audio and video, along with highly researched publications of IGNCA.

The system aims at being a digital repository of content and information with a user-friendly interface. The knowledge base thus created will help scholars to explore and visualize the information stored in multiple layers. This will provide a new dimension in the study of Indian art and culture, in an integrated way, while giving due importance to each medium.

Kalasampada: (DL-RICH) received GOLDEN ICON: Award for Exemplary Implementation for e-Governance Initiatives under the category : "Best Documented Knowledge and Case Study" from the Department of Administrative Reforms and Public Grievances, Government of India.

CoIL-Net (Content Development in Indian Language Network), sponsored by MCIT, aims at enhancing access to cultural resources using digital technology.

The digital library shall offer carefully selected, thoughtfully compiled and contextually integrated multimedia content on the cultural heritage, folk literature and lifestyle of the Hindi-speaking region, especially the states of Uttar Pradesh, Madhya Pradesh, Bihar, Rajasthan, Chattisgarh, Uttarakhand and Jharkhand.

CD-ROMs are available from IGNCA [Available on Devadasi, Murtesvara, Rock Art; in the offing : Caves of Ajanta, Gita Govinda, Two Pilgrims Devnarayana and in pipeline : Agnicayana, Brahdeshwara, Man & Mask, Visvarupa.]

From BHU (http://www.abhyuday.org/), CD-ROMs are available in Hindi on Pracheen Bharat mein bhautiki, Pragya BHU magazine edition in Hindi, Kashi ki vibhutiyan, Paryavaran chetana, Upchar, Nitishtakam, Mandi Bhav, Madan Mohan Malviya Jivan Parichay, and Nitishtakam Shlokas in Sanskrit with their meaning in Hindi.

From Bansthali Vidyapith, Rajasthan, CD-ROMs are available on IT in Hindi, an English/Hindi Dictionary of IT Terms, and Courseware. Similarly, e-content is available in major Indian languages on CD-ROMs and on the web.

11.6 E-Readiness in Indian States

India’s E-Readiness Assessment Report 2004 classifies the 35 states and union territories in India as Leaders (5), Aspiring Leaders (6), Expectants (3), Average Achievers (6), Below Average Achievers (6) and Least Achievers (9).

Nobel Laureate Prof. Amartya Sen’s “Capacity Enhancement” model is worth considering during the formulation and evaluation of projects. In this approach, indirect importance lies in enhanced productivity, increased economic growth and demographic influence, while direct importance lies in human freedom, well-being and quality of life. It treats human beings as ends/goals and not just as a means to higher income.

Human-centric development requires that ICT, which is a pervasive, cross-cutting, disintermediating and enabling technology, promote people’s participation and facilitate seamless communication without language barrier.

A project should also meet the criteria of sustainability/scalability/Profitability.

We may cite the following projects, which are considered successful and may be replicated with support for local languages.

(1) e-Choupal (http://www.itcportal.com/agri_exports/e-choupal_new.htm) : a unique blend of ICT and second-generation reforms; an initiative by the private company ITC. The vision for the project is to set up 20,000 choupals in 15 states, covering 100,000 villages and serving 25 million farmers, by 2010. Transactions projected for the year 2010 are valued at US$ 2.5 billion.

The e-Choupals, information centers linked to the Internet, seamlessly connect farmers with global markets. They have helped link the largest labor force with the mandis, the international markets and the final consumer at much reduced transaction costs. ICT facilitates disintermediation through the creation of an alternative development paradigm that skips the formation of co-operatives and self-help groups and replaces them with the network society. It exemplifies the fact that ICT could be, and is, an enabler of development goals. The e-Choupal project thus brings out the concept of profitable rural.

(2) e-Seva : (http://www.esevaonline.com/) Public delivery system for citizen services. The goal of e-Seva is to establish a SMART (Simple, Moral, Accountable, Responsive and Transparent) government. To that end, e-Seva centers are located within reasonable proximity of all citizens and act as one-stop shops that provide services and information from departments and agencies of the State and Central Governments and local bodies in an efficient, reliable, transparent and integrated manner. The ultimate aim is to eliminate face-to-face interaction between the government and the citizen, which has many drawbacks.

e-Seva hopes to leverage Internet technology to eliminate barriers to enterprise information management and to provide citizens in Andhra Pradesh with richer self-service over the Web, 24 hours a day, 365 days a year.

(3) Akshaya : (http://www.akshaya.net/) Access, skill sets and relevant content in the local language. One of the primary differences between Akshaya and other projects is its scale of operations. It covers the 33 million population of Kerala and aims at making 6.5 million people e-literate by the year 2005.

The Akshaya project leverages the comparative advantages of the state of Kerala: its high rate of literacy and progressive social framework, along with an already existing advanced telecom infrastructure. It is hoped that the project will create a network society of computer literates in order to leverage the social power of the state in a more meaningful way.

(4) Bhoomi : (http://www.revdept-01.kar.nic.in) Effective governance for the marginalized sections. Bhoomi is an e-governance project for the computerized delivery of 20 million rural land records to 6.7 million farmers through 177 Government-owned kiosks in the Indian state of Karnataka. It has eliminated red tape and corruption in the issue of land title records and is fast becoming the backbone for credible IT-enabled Government services for the rural population, thus bringing relief to the marginalized sections.

(5) GyanDoot : (http://www.gyandoot.nic.in/) Community participation and delivery of services. It was developed as a Community Network with backup LAN. The main areas covered under the scheme are e-governance, e-commerce, e-education and e-health services. Community participation is encouraged in the planning, funding, implementation and management of the network, in which 21 community-owned and 19 privately owned kiosks are run in rural areas using optical fibre cable (OFC) and wireless in local loop (WiLL) technology. It is empowering inhabitants in around 23 districts in India through ICT, by virtue of the adoption of the network by the downtrodden segments of society. It won the Stockholm Challenge Award in 2000. It was also adjudged a Best Practice by UNDP, WEF, ADB, World Bank, GOI, TRAI, PCI, Time Magazine, etc.

(6) Gyanaudyog : (http://www.itinhindi.org/WebSite/Htmls/) Empowering women for ITeS enterprises. A pilot project at Banasthali Vidyapith in Rajasthan aims to promote entrepreneurial skills among educated unemployed women of rural India so that they can establish IT-enabled service SOHOs (small office/home office units). Two weeklong workshops enable each participant to design a project covering technical, financial and personnel requirements and a prospective market analysis. The center provides techno-business support: technology mentoring, information on financial support, and market information. This model is being adopted by the state government of Rajasthan.

(7) AGMARKET : (http://www.agmarket.nic.in) Agricultural Marketing Information System. The AGMARKET project has networked 735 agricultural produce wholesale markets (APWMs), 75 State Agricultural Marketing Boards/Directorates and DMI regional offices. A further 2,000 markets will be added within the next 2 years. This is an effort to bring rural prosperity, rural empowerment and digital inclusion, and to foster rural enterprise.

12. Beacon to Steps Ahead

Phase II of the TDIL Mission may focus on the following.

Core Technologies :

Language processing tools for Indian languages, comprising fonts, keyboard drivers, text editor, spell checker, tri-lingual dictionaries, messaging system, code/file conversion utilities, machine translation system and OCR.

Core Technologies will be assured for all Indian languages in a phased manner.
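To illustrate what a code conversion utility among these core tools might involve, the following minimal sketch converts text from one Indian script to another by shifting code points between Unicode blocks. It relies on the fact that the Unicode blocks for the major Indian scripts share a common ISCII-derived layout. The function name `convert_script` and the block table are assumptions made for this illustration and are not taken from any TDIL tool; a practical converter would need script-specific rules for characters that have no counterpart in the target script.

```python
import unicodedata

# First code point of each 128-character Unicode block for some Indian scripts.
BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def convert_script(text, source, target):
    """Hypothetical helper: map characters of `source` script onto `target` script.

    Characters outside the source block, and characters whose shifted code
    point is unassigned in the target block, are left unchanged.
    """
    src = BLOCK_START[source]
    dst = BLOCK_START[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            candidate = chr(cp - src + dst)
            # unicodedata.name returns the default (None) for unassigned code points
            if unicodedata.name(candidate, None) is not None:
                ch = candidate
        out.append(ch)
    return "".join(out)


# Example: the word "Bharat" written in Devanagari, converted to Bengali script.
print(convert_script("\u092d\u093e\u0930\u0924", "devanagari", "bengali"))
```

The block-offset approach is only a first approximation: it handles the shared core of consonants and vowel signs, but spelling conventions differ between languages and are not addressed here.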

Futuristic Technologies :
• Cross-lingual Information Retrieval
• Speech to Speech Translation
• Knowledge Management in Multilingual Environment

Thrust Application Areas :
• E-governance
• E-learning
• E-rural prosperity

High-impact applications may include :
• Voice-enabled multilingual Rural Kiosks
• Voice-enabled Multilingual Telephone-based Information Access System for Railways & Tourism
• WebBharati for Trans-lingual Web Access
• Multi-modal Digital Library Creation & Retrieval System

13. Recommendations

UNESCO may take the following steps to fill the knowledge gaps of unreached nations.

1. Language Policy & Monitoring Linguistic Diversity : People's participation is essential in evolving a language policy that ensures holistic development of individuals and communities. Leading people from various disciplines (art, literature, health, industry, education policy) may discuss and evolve guidelines. The requisite technologies should be ensured in order to amplify productivity and creativity. The Language Observatory Project (LOP) may carry out statistical analysis of multilingual content flow on the web. This data will be useful in monitoring linguistic diversity and the growth of languages.
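The kind of statistical analysis mentioned above can be sketched in a few lines. The example below counts the letters in a text sample by script, using the first word of each character's Unicode name as a rough script label. This is only an illustrative assumption about how such a survey could start, not a description of the Language Observatory Project's actual method; the function names are hypothetical, and identifying a script is not the same as identifying a language.

```python
import unicodedata
from collections import Counter


def script_of(ch):
    """Approximate the script of a character from its Unicode name.

    For example, 'DEVANAGARI LETTER KA' gives 'DEVANAGARI'. This is a
    heuristic, not an official Unicode script property lookup.
    """
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def count_scripts(text):
    """Count letters per script; digits, marks, spaces and punctuation are skipped."""
    return Counter(script_of(ch) for ch in text if ch.isalpha())


# Example: a mixed Hindi/English sample.
sample = "\u0928\u092e\u0938\u094d\u0924\u0947 hello"
for script, count in count_scripts(sample).most_common():
    print(script, count)
```

Applied to a large crawl of web pages, counts of this kind would give a first, coarse picture of which scripts are represented on the web and how that changes over time.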

2. Standardization : UNESCO may facilitate block membership for developing and less developed nations in prominent standardization bodies such as the Unicode Consortium and the World Wide Web Consortium (W3C). These bodies should also hold discussion meetings in these countries. Standardisation is needed for encoding schemes, font coding, glyph representation, e-content metadata, XLIFF (XML Localisation Interchange File Format), TermBase eXchange (TBX) and Open Lexicon Interchange Format (OLIF).

Accessible Web Design contributes to better design for other users.
• Web sites and applications should be designed so that people with disabilities and constrained vocabulary can perceive, understand, navigate and interact with them;
• Web browsers, authoring tools and media players should be designed to support the production of accessible web content and Web sites, and to be used effectively by people from diverse cultures.
• Multi-modality increases the usability of web sites in different situations:
– Low bandwidth (images are slow to download);
– Noisy environments (difficult to hear the audio);
– Screen-glare (difficult to see the screen);
– Driving (eyes and hands are "busy").
• Redundant text/audio/video can support:
– Different learning styles; low literacy levels; second-language access.
• Captioning of audio files supports:
– Better machine indexing of content; faster searching of content.
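One of the points above, redundant text for images on low-bandwidth or non-visual access, can be checked mechanically. The sketch below, which uses only Python's standard `html.parser` module, lists the images on a page that carry no alternative text. The class and function names are assumptions made for this illustration. Note that it also flags images with an empty `alt` attribute, which can be legitimate for purely decorative images, so its output is a list for human review rather than a verdict.

```python
from html.parser import HTMLParser


class AltTextChecker(HTMLParser):
    """Collects the src of every <img> tag that has no non-empty alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            attributes = dict(attrs)
            alt = attributes.get("alt")
            if alt is None or not alt.strip():
                self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html):
    """Hypothetical helper: return the src values of images lacking alt text."""
    checker = AltTextChecker()
    checker.feed(html)
    checker.close()
    return checker.missing


page = (
    '<p><img src="map.png" alt="Map of Kerala">'
    '<img src="chart.png">'
    '<img src="logo.png" alt=" "/></p>'
)
print(images_missing_alt(page))
```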

3. Affordable ICT for All : UNESCO may prepare suggestive guidelines for technology pricing with due consideration of purchasing power parity (PPP). ICT should be affordable, easy to use in the local language, and income enabling. A Basic Information Processing Kit (BIPK) should be available to all:

BIPK-1 : Unicode-compliant OpenType fonts (10), text editor, morphological analyser, Internet browser, dictionary (30,000 words), spell checker, code conversion utilities, an open office suite of word processor, spreadsheet, presentation and drawing tools, and an e-mailing system; and

BIPK-2 : OCR, TTS, text summarizer, simple MT systems, grammar checker, and high-impact application software.

These BIPKs may be made available for each language with the cooperation of countries that have competence in multilingual computing.

The UNESCO Free Software Directory may also include multilingual software and linguistic resources from other countries.

4. Transcreation Initiative to Raise Capacity : Transcreation means content creation based on reference content in another language, with suitable adaptation to suit local linguistic and cultural conditions. It is worth mentioning that over 25 million pages are added every year in modern S&T. This is creating an ever-widening knowledge gap that hampers progress. A collaborative initiative for transcreation of modern (relevant) knowledge into local languages needs to be taken up to bridge the knowledge gap and foster scientific temper.

5. Localization Initiative : UNESCO may encourage countries to become localizers and to make the available content accessible to all. Localization of rapidly growing knowledge into local languages may be a mission-mode program for stepping into the emerging knowledge-based society. There is a growing market for localization. Capacity enhancement in this field will generate wealth and motivate youth towards entrepreneurial innovation. UNESCO may promote the setting up of Localization Resource Centers covering clusters of languages of deprived nations. Such a center will ensure the building up of Basic Information Processing Kits and organize training programs in localization and content management technologies. The center may also offer a PG Diploma and encourage research in the field of Localization Technology. The center may also become an affiliate member of standardization bodies (Unicode and W3C) and thus encourage the participation of less developed nations. Linguistic Data Resources for less known or endangered languages may be developed in standard formats. Websites of the national government, and those of MNCs and foreign missions, should be made available in the local languages of the country.

6. Digital Library Initiative : Knowledge for All. UNESCO may promote digital libraries in developing and less developed nations and ensure free access to major digital libraries in the USA, Europe, India, China, Japan, New Zealand, etc. Digital library software should be in the public domain. Mobile/on-demand digital libraries may enable people to access relevant knowledge. Technology development may be initiated for scanning, cropping, OCR with workflow, document summarization, cross-lingual information retrieval, text-to-natural-speech, and a multilingual DL interface for handicapped persons.

14. Summing Up

Localization is inevitable in bridging the digital divide. Localization is the technological fusion of language and culture. We may notice developments on the language front, such as cross-lingual information retrieval; on the culture front, such as concept-based communication across different cultures; and on the technology front, such as the convergence of technologies. The fusion of language, culture and technology (LCT) leads to the development of new technology for Universal Knowledge Access. Innovation in Indian language technology began in academia, and even today it is being sustained there. Innovation leaders like IBM's Mark Dean observe that researchers always want to go for that last 2% of performance, but they need to be reminded that it is better to get a sufficient solution out fast and then continue to enhance it. In India, the TDIL mission focuses on application-oriented research and technology development, evaluation and benchmarking, integrating language technologies into innovative products and services, open source technologies (architecture, standards, systems), test beds, IT localisation clinics, collaborative development with a 'raise the neighbor' approach, and international cooperation. Awareness about language technology is growing in Asian countries.

The challenge ahead is to make the transition to the emerging knowledge-based society, realizing the New Order of Knowledge-based Society: Rise, Raise and Race. This implies:
- Promote collectivism rather than individualism
- Think globally and act locally
- Collaborate for innovation
- Strive towards universalisation of creativity


REFERENCES

1. Development Dialogue, 1999:1-2, Dag Hammarskjold Centre
2. World Culture Diversity, UNESCO, 1995
3. World Culture Report - Culture, Creativity and Markets, UNESCO, 1998
4. Human Development Report 2001: making new technologies work for human development, UNDP, 2001
5. Localisation Focus, September 2002 (LRC, Ireland)
6. VishwaBharat@tdil : Quarterly Journal of Indian Language Technologies, Min. of Communications & IT, India
7. TDIL Website : http://tdil.mit.gov.in
8. Indian Script Standard Code for Information Interchange (ISCII), IS 13194:1991
9. TDIL Data Center : http://www.ildc.in
10. IGNCA Cultural Informatics Lab : http://www.ignca.org, http://www.ignca.nic.in
11. Central Institute of Indian Languages, Mysore : http://www.ciil.org
12. India : E-Readiness Assessment Report 2004 for States / Union Territories, Department of Information Technology, Ministry of Communications & Information Technology, Government of India
13. IEEE Spectrum, Feb 2004, pp. 50-51
14. "Digital Access Index", International Telecommunication Union, 19 November 2003, http://www.itu.int/ITU-D//ict/dai/index.html
15. Multilingual Computing & Technology, April/May 2004, http://www.multilingual.com
16. Erik Brynjolfsson, "The Productivity Paradox of Information Technology: Review and Assessment", Communications of the ACM, Dec 1993
17. Tanaka, Hozumi, "What should we do next for MT system development?", MT Summit VII, 13-17 September 1999, Kent Ridge Digital Labs, Singapore
18. Roach, S.S., "Service under Siege - the restructuring imperative", Harvard Business Review, Sept-Oct 1991
19. Claudio Menezes, "Towards the promotion of Multilingualism in and beyond Cyberspace", 2nd Language Observatory Workshop, Tokyo, 22-23 Feb 2005
20. M.Z.A. Rozan, Y. Mikami, A.Z.A. Bakar & Om Vikas, "IT for Multilingual Education : Languages Observatory as a Monitoring Hub", SEARC, 2005 (communicated)
21. M. Salehie, et al., "Localisation and Internationalisation in Iran", Localisation Focus, March 2004
22. Hema Sharma, "Website Analysis : Usage of Indian Languages on the WWW vis-a-vis Other Global Languages", VishwaBharat@tdil, April 2003
23. Om Vikas, "Annals of Indian Language Computing", International Conf. on Universal Knowledge & Language-2002, Nov 25-29, 2002, Goa
24. Om Vikas, "Digital Libraries in Knowledge based societies : prospects and issues", Re-engineering Library Services, IASLIC Golden Jubilee, Jadavpur University, December 2004
