Chamilo LMS - Feature #272 UTF-8 Native Support 03/12/2009 00:06 - Carlos Vargas
Status: Feature implemented
Start date: 03/12/2009
Priority: Normal
Due date:
Assignee: Ivan Tcholakov
% Done: 100%
Category:
Estimated time: 0.00 hour
Target version: 1.8.7 beta
Spent time: 0.00 hour
Complexity: Normal
SCRUM pts - complexity: 100

Description

An e-mail from cristi hypermedia.eu
---
Hello,

We’re sorry for the delay, but we’ve been pretty busy here at Hypermedia :)

I could send you the whole version that I’ve modified from Dokeos 1.6.3, but there are many modifications (customizations, in fact) that were needed at that moment. If you want the whole version I modified, just say so and it will be no problem for me to send it. I just think that it will be way too complicated to follow all the modifications, because I cannot remember all the things I’ve changed :D

I mean no offense if there are too many explanations, but I wrote the document this way so that maybe we can use it to show a “non-technical” person how to migrate.

Here is the roadmap I followed (actually I wrote it afterwards) for the UTF-8 conversion of Dokeos:

I’ve used the MySQL command line here because it’s faster. If you want to use phpMyAdmin, I think you should be careful which ‘MySQL charset’ and which ‘MySQL connection collation’ you are using. I will try this sometime, but for the smoothest results my guess is that:
- before making the dump you have to use MySQL charset latin1 and MySQL connection collation latin1_swedish_ci (the default for the MySQL tables)
- before loading the dump (converted to UTF-8) you have to change MySQL charset to utf8 and MySQL connection collation to utf8_general_ci
It might work in any case, but I think this is the easiest way.

1. Database:

Currently the whole Dokeos database uses the latin1_swedish_ci collation in MySQL. There are a few ways of converting the whole database. Many say to convert all your ‘text’ fields (varchar included) to binary and then run queries.
But I have this way of doing it, which is much “nicer” and VERY MUCH FASTER.

- Create a dump with the latin1 charset:
    mysqldump --default-character-set=latin1 -u root -p dokeos_main > dokeos_latin1.sql
  Remove (or replace) all the charset-related directives. For example, you could replace ‘DEFAULT CHARSET=latin1’ with ‘DEFAULT CHARSET=utf8’ throughout the file. Or, if you create your new database with default collation utf8_general_ci, you could just remove every ‘DEFAULT CHARSET=latin1’ from the file; when you import it, MySQL will then use the database default charset for all the tables.
- Convert the dump file from latin1 (iso-8859-1) to utf-8. There are several ways of doing this, but the nicest I have found so far is the iconv library (on Linux it is installed by default, I guess; for Windows I downloaded it from http://gnuwin32.sourceforge.net/packages/libiconv.htm). The command below uses iso-8859-7, the Greek charset, as in our installation:
    iconv -f iso-8859-7 -t utf-8 dokeos_latin1.sql > dokeos_utf8.sql
- Create a new database with collation utf8_general_ci (dokeos_utf in the import command below).
- Import the new file with the utf-8 charset into the new database:
    mysql --default-character-set=utf8 -u root -p dokeos_utf < dokeos_utf8.sql

Check with phpMyAdmin, using a utf8 connection and collation, that texts from all languages are OK (Greek is just an example, because it uses non-Latin characters and that’s what we’ve been using here in Cyprus).

2. Language files convert from iso-8859-1 to utf-8:

I used the iconv library again. For the Greek files I used:
    iconv.exe -f iso-8859-7 -t utf-8 language_file.inc.php > utf/language_file.inc.php
- all the language files in claroline/lang
- problems in chat.inc.php: remove special characters or convert them to HTML entities (I have to clear up what this was about sometime); it seems there are some Greek special characters that are not “fully” supported for conversion.
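A minimal sketch of how the per-file iconv call above could be applied to a whole language directory. The function name convert_lang_dir, its arguments, and the iso-8859-1 default are assumptions made for illustration and are not part of the original procedure; for the Greek files described above the source charset would be iso-8859-7.

```shell
#!/bin/sh
# Hypothetical helper (not from the original e-mail): convert every
# *.inc.php file in a directory to UTF-8 with iconv, writing the results
# to a separate output directory so the originals stay untouched.
# Usage: convert_lang_dir SRC_DIR OUT_DIR [SOURCE_CHARSET]
convert_lang_dir() {
    src_dir=$1
    out_dir=$2
    from_charset=${3:-iso-8859-1}   # assumption: latin1 unless told otherwise

    mkdir -p "$out_dir" || return 1

    for f in "$src_dir"/*.inc.php; do
        [ -e "$f" ] || continue     # the glob matched nothing
        name=$(basename "$f")
        # iconv exits non-zero on input it cannot convert; stop there so
        # a half-written file is never mistaken for a good conversion.
        if ! iconv -f "$from_charset" -t utf-8 "$f" > "$out_dir/$name"; then
            echo "conversion failed: $f" >&2
            rm -f "$out_dir/$name"
            return 1
        fi
    done
}
```

With hypothetical paths, the Greek files might be converted with: convert_lang_dir claroline/lang/greek utf/greek iso-8859-7. Any file that fails to convert would still need to be cleaned up by hand.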
Additional language conversions:
- ctools/wiki/greek.inc.php
- claroline/ctools/glossary/lang/greek/glossarie.inc.php
- egnosis/home/home_*_greek.html - files with texts from the home page
- ctools/poll/calendar/popcalendar.js - Greek month names
I did these conversions because we used those plugins and tools in the platform.

3. Modify code-files in claroline directory:

Notice that line numbers might not be the same; I told you I made many modifications on the platform.

There is a catch here. Search for all mysql_connect() calls and add SET NAMES and SET CHARACTER SET.
- inc\claro_init_global_inc.php - added lines 64,65 (after mysql_select_db()):
    mysql_query("SET NAMES 'utf8'", $dokeos_database_connection);
    mysql_query("SET CHARACTER SET 'utf8'", $dokeos_database_connection);
- ctools\glossary\glossary.php - line 291: added function utf8_substr() and used it everywhere in the document instead of substr()
- plugin/CoolSearch/ - implemented separate language files for English and Greek using UTF8
- announcements/announcements.php - line 619: UTF-8 encoding for email (instead of iso-8859-1)
- admin/user_import.php - UTF-8 charset for the imported file (?!)
- admin/user_export.php - UTF-8 charset for the exported file
- auth/inscription_second.php - utf-8 charset for the email sent to the new user
- calendar/calendar.php - charset of the calendar page
- inc/claro_init_header.inc.php - set default charset to UTF-8
- inc/phpmailer/class.phpmailer.php - line 36: set utf-8 encoding for mails
- inc/phpmailer/phpdoc/phpmailer.html - description of phpmailer class - line 526: default encoding of mails is utf-8
- inc/phpmailer/test/phpmailer_test.php - lines 63,198: set mail's encoding to utf-8
- lang/english/trad4all.inc.php - line 18: $charset='utf-8';
- lang/(english|greek)/tracking.inc.php - add $status_lang - array with possible status of a lesson
- learnpath/learnpath_functions.inc.php - line 2301: set charset of the manifest file to utf-8
- online/header_frame.inc.php - line 67: set page encoding to utf-8
- online/online.php - line 46: set page encoding to utf-8
- scorm/closesco.php - line 49: set page encoding to utf-8
- scorm/contents.php - line 68: set page encoding to utf-8
- scorm/opensco.php - line 51: set page encoding to utf-8
- tracking/courseLog.php - line 60: set page encoding to utf-8; line 108: if $status_lang is not imported from the language file, put it in English
- tracking/userLog.php - lines 81,85: set default translations encoding to utf-8; line 67: if $status_lang is not imported from the language file, put it in English
- work/student.html - line 4: set page encoding to utf-8

Also, there are some characters that are not displayed correctly in utf8.
- CHANGE special characters that are displayed garbled in UTF8 to HTML entities:
  - &laquo; and &raquo; instead of « and »
  - calendar/agenda.inc.php
  - calendar/myagenda.php
- CHANGE CREATE DATABASE syntax - DEFAULT CHARACTER SET utf8 for all tables created from within php scripts:
  - claroline/calendar/myagenda.php - line 113
  - claroline/inc/lib/add_course.lib.inc.php - line 199
  - claroline/install/install_db.inc.php - line 127, 143, 161, 179
  - leave CREATE TABLE syntax as it is for now
- CHANGE CREATE TABLE syntax DEFAULT CHARSET = utf8:
  - claroline/inc/lib/add_course.lib.inc.php
- ctools/poll/calendar/popcalendar.js - month names
- phpbb/reply.php - line 526: replaced htmlentities() function with htmlspecialchars()

Here is a list of the PROBLEMATIC PHP functions when working with UTF8 strings:
- String functions - they all have equivalents in the mbstring extension (prefixed with 'mb_'):
  - strcut()
  - strlen()
  - strpos()
  - strrpos()
  - strtolower()
  - strtoupper()
  - substr_count()
  - substr() - I used a hand-built utf8_substr() function for now

The best way to fix this problem is to test whether the mbstring extension for PHP is installed and, if it is, work with the equivalent string functions. If it is not installed, we could use the iconv() functions from PHP to convert strings on the fly, but of course this will slow things down. However, there might be many cases when you cannot use mbstring even if you want to (hosting companies).

In Dokeos I used:
    function utf8_substr($str,$from,$len){
        // Skip $from UTF-8 characters, then capture up to $len characters.
        // A character is one ASCII byte, or a lead byte plus continuation bytes.
        return preg_replace('#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,'.$from.'}'.
            '((?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,'.$len.'}).*#s',
            '$1',$str);
    }
And that pretty much solved everything, because the rest of the functions are not used in a “harmful” way for utf8 strings.

value of mbstring.func_overload   original function   overloaded function
1                                 mail()              mb_send_mail()
2                                 strlen()            mb_strlen()
2                                 strpos()            mb_strpos()
2                                 strrpos()           mb_strrpos()
2                                 substr()            mb_substr()
2                                 strtolower()        mb_strtolower()
2                                 strtoupper()        mb_strtoupper()
2                                 substr_count()      mb_substr_count()
4                                 ereg()              mb_ereg()
4                                 eregi()             mb_eregi()
4                                 ereg_replace()      mb_ereg_replace()
4                                 eregi_replace()     mb_eregi_replace()
4                                 split()             mb_split()

I think this is all. If I remember something else I will send other emails.
Also, if you have any questions, or if you want me to help in this direction, I could install a CVS system, grab the latest files from the repository, and convert all the language files, database creation files etc. Just let me know what your plans are and we can decide together how we can help each other :)

Awaiting to hear from you.
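The dump rewrite from section 1 (replace the charset directives and convert the file with iconv) can be sketched as a small shell function. The name convert_dump and its arguments are hypothetical. The sketch converts first and rewrites the directives second; the result is the same as the order given in the e-mail because the directive text is plain ASCII. The mysqldump and mysql calls are left out, since they need a live server.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original e-mail.
# Usage: convert_dump IN_FILE OUT_FILE [SOURCE_CHARSET]
convert_dump() {
    in_file=$1
    out_file=$2
    from_charset=${3:-iso-8859-1}   # assumption: latin1 unless told otherwise

    tmp_file=$(mktemp) || return 1

    # Convert the bytes of the dump to UTF-8; iconv exits non-zero on
    # input it cannot convert.
    if ! iconv -f "$from_charset" -t utf-8 "$in_file" > "$tmp_file"; then
        rm -f "$tmp_file"
        return 1
    fi

    # Point the table definitions at utf8 instead of latin1.
    LC_ALL=C sed 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/g' \
        "$tmp_file" > "$out_file"
    status=$?

    rm -f "$tmp_file"
    return "$status"
}
```

With the file names from section 1, this would sit between the dump and the import: convert_dump dokeos_latin1.sql dokeos_utf8.sql iso-8859-7.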