A Framework for Developing Cross-Browser Data Intensive Arabic Web Applications
Conference Paper · October 2012 · DOI: 10.1109/ICCTA.2012.6523547

Mahmoud Youssef, Nourhan Hamdi, and Salma Rayan
Business Information Systems Department
Arab Academy for Science, Technology, and Maritime Transport
Alexandria, Egypt

Abstract— The frequent encounter of incorrectly functioning Arabic Websites, especially those of large businesses and governments, calls for a clear and applied framework to help practitioners develop properly functioning Websites. The issue of Website internationalization has been addressed in a plethora of standards, guidelines, good practices, and tutorials. However, these standards, with their formal language, may not be directly comprehensible to the practitioner. In addition, guidelines and tutorials are usually designed to address one issue, requiring the practitioner to integrate knowledge from different sources. Moreover, these standards, guidelines, and tutorials are limited to the Web technologies themselves, excluding other parts of the Web application architecture. In this paper, we propose a comprehensive step-by-step framework addressed to the practitioner. The framework integrates knowledge from different standards and technologies and calls attention to issues that could be overlooked in designing and implementing data-intensive Arabic Web applications.

Keywords: Internationalization; Arabic; Data-intensive; Web applications

I. INTRODUCTION

While the discussion of Web application internationalization and localization might sound like an antiquated topic, the frequent encounter of incorrectly functioning Arabic Websites, especially those of large businesses and governments (see Figure 1), calls for a clear and applied framework to help practitioners develop properly functioning Arabic Websites.

Figure 1. Improperly functioning Website of a major bank

The impact of properly functioning Arabic data-intensive Websites can be seen on different fronts. From a societal perspective, it enables the spread of e-commerce and e-government applications in the Arab world, with their associated benefits. Moreover, it enhances trust in the organizations that own these Websites [19]. From a technical perspective, it enables correct integration within Web 2.0 mashups, the handling of future application needs such as linked data, and integration with other components of distributed architectures such as the Service-Oriented Architecture (SOA).

The needs to present information in languages written in scripts other than Latin, and to offer more than one language at the same time, are addressed in computer applications, and specifically in Web applications, under the headings globalization, internationalization (i18n), and localization (l10n), which we explain below.

Globalization refers to the ability of an e-commerce Website to address a global community, taking into consideration the diverse needs of this community.

Internationalization refers to the design of a Website so that it can be adapted to different countries, taking into consideration their various languages, scripts, and cultures [21].

Localization refers to the adaptation of a Website to a specific locale, such as Arabic/Egypt. This may affect, among other factors, numeric, date, and time formats, currency format, collation and sort order, and the calendar system.

A basic requirement of a Website is to perform properly across different Web browsers. While browsers are expected to render content consistently, experience has shown that a Website may behave differently in different browsers. At the time of writing, statistics show that Google Chrome, Mozilla Firefox (FF), Microsoft Internet Explorer (IE), and Apple Safari are the dominant browsers in the market [22]. In this research, we examine the proposed framework against these browsers.

A data-intensive Website is characterized by its dynamic nature, which typically involves an architecture of several interacting layers. Displaying Arabic textual information correctly on the user interface requires proper interchange of data among these layers as well as proper representation of the data at each layer.

To address the needs for internationalization and localization, the research community, through its standardization organizations, has developed a plethora of standards (e.g., [7], [8], [9]). However, these standards, with their formal language, may not be directly comprehensible to the practitioner. To make such knowledge accessible, many guidelines and tutorials (e.g., [10], [16], [20]) were developed, yet they are usually designed to address one issue, requiring the practitioner to internalize knowledge from different sources. Moreover, these standards, guidelines, and tutorials are frequently limited to the Web technologies themselves, excluding other parts of the architecture.

In this paper, we propose a comprehensive step-by-step framework addressed to the practitioner that integrates knowledge from different standards and technologies and calls attention to issues that could be overlooked in designing and implementing data-intensive Arabic Web applications. Throughout the discussion, we strive to provide solutions that are applicable to different development environments; however, we conducted our experiments mostly using open-source applications and tools.

We limit our work to issues related to enforcing proper Arabic text representation, transport, and processing. As such, we exclude other issues such as calendar systems, time zones, and date, time, and currency formats.

The rest of the paper is organized as follows. Section 2 provides background, Section 3 presents related work, Section 4 introduces the proposed solution, and Section 5 concludes the paper.

II. PRELIMINARIES

Most Web standards emphasize the use of the globally accepted Unicode character encoding and the preservation of that encoding throughout the different layers of the architecture. In this section, we describe the characteristics of Arabic, the historical development of Arabic character sets, and the current status of the top Arabic Websites.

A. Arabic Language Representation

When displaying characters, it is important to distinguish between the characters themselves and their visual

Interestingly, numbers in Arabic, whether written with Indic or Arabic numerals, are displayed left-to-right, adding to the complexity.

1) Arabic Character Sets History

In order for computers to process and interchange textual data correctly, character encoding must be standardized. For the English language, the American Standard Code for Information Interchange (ASCII) character set has been the standard representation since the early days of computing. ASCII used 7 bits per character, which allowed only 128 characters. These characters included the upper- and lower-case English alphabet, the digits 0 to 9, and punctuation symbols. As such, ASCII allowed the representation of English text only.

With large amounts of information already stored in ASCII and the need to represent other languages, 8-bit character sets that extend ASCII were developed and standardized; they were referred to as Extended ASCII. Since 8-bit representations allowed 256 characters (code points) and the first 128 of them were always occupied by ASCII, the other 128 code points, referred to as the extended set, were used to represent one or more languages or character graphics. For example, the ISO-8859-1 character set extends ASCII with the characters needed for Western European languages.

Operating systems used Extended ASCII character sets to provide multilingual support and referred to them as "code pages". Each user was able to add other locales by selecting their code pages. However, since only one code page can be active at a time in an application, it was impossible to use languages from different code pages concurrently. The problem was not solved until the development of Unicode [18], presented in the next section.

Arabic character set encoding went through several developments. In 1981, the CUDAR-U encoding appeared as the first Arabic character set; it used 7 bits per character. In 1982, the Arab Standards and Metrology Organization (ASMO) produced its first character set standard, ASMO-449, which also used 7 bits per character. ASMO-449 then became the basis for all subsequent standard sets, playing a role similar to that of ASCII for Latin characters; however, like ASCII, it allowed the representation of only one language, Arabic in this case. In 1986, the ASMO-708 standard appeared; it uses 8 bits per character. Later, it became the international standard ISO-8859-6 and became the code page for the Arabic Macintosh