Office européen des brevets
(11) Publication number: 0 643 356 A2

(12) EUROPEAN PATENT APPLICATION

(21) Application number: 94306555.7
(51) Int. Cl.6: G06F 17/21
(22) Date of filing: 07.09.94
(30) Priority: 09.09.93 CA 2105847
(43) Date of publication of application: 15.03.95 Bulletin 95/11
(84) Designated Contracting States: DE FR GB
(71) Applicant: International Business Machines Corporation, Old Orchard Road, Armonk, N.Y. 10504 (US)
(72) Inventor: Storisteanu, Adrian, 140 Wells Street, Toronto, Ontario (CA)
(72) Inventor: Wang, Zemin, 15 Maresfield Drive, Scarborough, Ontario (CA)
(74) Representative: Lloyd, Richard Graham, IBM (UK) Ltd, UK Intellectual Property Department, Hursley Park, Winchester, Hampshire SO21 2JN (GB)

(54) Method of editing text in column sensitive environments.

(57) A method and editor for editing a line of text comprising mixed single-byte character set (SBCS) and double-byte character set (DBCS) text while maintaining the columnar integrity of the text after the edit point. The method and editor provide an extended cursor to indicate to the user the text which will be affected by the editing operation. The method and editor are operable on computer systems which require Shift Out (SO) - Shift In (SI) control characters, on systems which support emulated SO - SI control characters and on systems which do not.

Jouve, 18, rue Saint-Denis, 75001 PARIS

FIELD OF THE INVENTION

The present invention relates to text editors which support double-byte character sets (DBCS). More specifically, the present invention relates to text editors for mixed DBCS and single-byte character set (SBCS) text in systems with SO - SI control characters or with emulated SO - SI control characters or without SO - SI control characters, and to a method, for use therein, of accomplishing text replacement while maintaining the columnar integrity of the text.

DESCRIPTION OF THE PRIOR ART

Characters in computer systems are typically stored as a number code which is interpreted by the computer system in accordance with a code table. Common examples of such codes are the ASCII code, which is used in most Roman alphabet (A to Z) based personal computers, and the EBCDIC code used in many mainframe computers. Typically, each character defined in the ASCII or EBCDIC code requires a single byte of storage space and thus up to 256 (2^8) unique characters (including alphanumerics, symbols and control or other reserved characters) may be defined.

When other, non-Roman alphabet character sets are desired, for example Japanese Kanji or Chinese pictograph characters, more than 256 unique characters may be required. To enable the necessary number of characters to be defined, double-byte character sets (DBCS) have been employed wherein each character is defined by two bytes of storage space. This allows over 65,000 (2^16) unique characters to be defined.

In countries that use ideographic characters like China and Japan, it is important that DBCS and SBCS text can be intermixed in a single document. For example, a database program may be originally written in English and a non-English version may be sold wherein Roman alphabet prompts and other user interface information is replaced with non-Roman alphabet prompts (i.e. - Japanese Kanji characters), allowing mixed DBCS and SBCS text to be stored in the database program.

Previously, when producing such a program or text file with mixed SBCS and DBCS characters on an EBCDIC terminal, a user would switch the terminal from SBCS mode to DBCS mode by employing a Shift Out (SO) - Shift In (SI) sequence which places a DBCS identifier before (SO) and after (SI) the DBCS characters. When the computer encounters a SO identifier (control character), it knows that the next byte is the first byte of a two-byte DBCS character and not a SBCS character. The computer continues to interpret the data as two-byte DBCS characters until a SI is encountered.

For example, to place a Chinese pictograph in a text string, the user would Shift Out (SO) of SBCS mode by pressing a Shift Out key (or predefined Shift Out key sequence such as ALT-ESC) on the terminal, compose the desired pictograph character by pressing a predefined key sequence or by entering a predefined number code such as 8248 (comprising two two-digit hexadecimal numbers, 82 and 48, representing the two bytes identifying the DBCS character) and then re-enter SBCS mode by pressing a Shift In key or key sequence. Depending upon the particular display and computer system in use, the pictograph may be displayed on screen or representative code numbers may be displayed in its place. Further, a representative symbol may be displayed indicating the location of the SO and SI control characters.

More recently, personal computer operating systems such as the IBM OS/2 operating system (IBM and OS/2 are registered trade marks) have provided support which allows DBCS characters to be stored without requiring a SO - SI sequence. In such systems, the first byte of the DBCS character is a byte which is not used within the SBCS character set and thus identifies the following byte as a member of a DBCS character set. Several different first bytes (each not used in the SBCS character set) may be used to refer to several different 'pages' of 256 unique DBCS characters. The actual first byte values selected vary between languages and systems. For example, in one IBM system the values 129 through 252 (which are not used for Roman alphabet characters) are the first bytes of DBCS character sets for Chinese pictographs.

Depending upon the particular system in use, DBCS characters may be displayed as two, two-digit hex numbers (e.g. 8248) or as the actual pictogram. Additionally, depending upon whether SO - SI characters are required, not required, or emulated, the DBCS may be preceded and followed by SO - SI symbols as appropriate.

Despite the above-mentioned capabilities of computer systems, a problem exists with text containing a mixture of SBCS and DBCS characters in that, for many applications, the columnar positioning of the text is critical. For example, when programming in programming languages like RPG, RPGII or Fortran, the position of command and data text on a line (i.e. - its columnar positioning) is crucial to the correct interpretation of that text within the program. When the text is a mixture of SBCS characters and DBCS characters it is difficult and onerous to replace (edit) the text while maintaining its columnar integrity.

Some examples of these difficulties are: replacing a pictograph delimited by SO and SI, which occupies four bytes (one for the Shift Out control character, two for the DBCS character code and one for the Shift In control character) with a SBCS character which requires a single byte; replacing "A", which requires one byte, with a pictograph which requires two bytes of storage space (or as many as four bytes if SO and SI control characters are required). In each example, the columnar integrity of the text on the line, after the site of the replacement, will be affected.

Text editors capable of editing mixed SBCS and DBCS text do not presently maintain the columnar integrity (i.e. - columnar positioning) of text on a line after an editing operation has been performed. Thus, the user must manually adjust the text to maintain columnar integrity by manually inserting spaces or deleting characters and a failure to do so, or doing so incorrectly, results in those subsequent characters being moved out of their desired columnar positions leading to the program misinterpreting the command and/or data text on that line. Further, in systems wherein a fixed line length is set, an edit operation may be aborted by the system as it would result in the maximum line length being exceeded, even in situations where the maximum line length is exceeded as an intermediate editing step and the final line length would be acceptable. It can be most confusing and frustrating to a user to attempt an edit operation which is refused by the system, for no apparent reason, especially when the same edit was performed on a preceding (albeit shorter) line without difficulty.

Also, when a replace edit operation is performed, it is often difficult for the user to determine the extent to which the edit operation will affect subsequent text. For example, replacing the "A" in "CABLE" with a DBCS character such as a pictograph will result in the "B" after the "A" also being overwritten to give "C[pictograph]LE". When SO - SI characters are present in the text to be edited, replacement operations become still more confusing and difficult to implement properly.

It is desirable therefore, to provide a method of editing mixed SBCS and DBCS text which maintains columnar integrity of the edited text. It is also desired to provide an indication to the user of the extent to which a character replacement will affect subsequent text on a line and to avoid the problems of exceeding maximum line lengths.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a novel method of editing mixed character set text which maintains columnar integrity of the text. It is a further object to provide a novel text editor for editing mixed character set text which maintains columnar integrity of the text.

According to one aspect of the present invention, there is provided a method of editing at least one character of a line of text containing mixed DBCS and SBCS characters while maintaining the columnar integrity of the text, comprising the steps of:
(i) positioning an edit cursor on a character to be replaced;
(ii) determining a present Class of the editor;
(iii) determining a present State of the text to be edited at the edit cursor location;
(iv) selecting a predefined Action associated with the present State;
(v) displaying an extended cursor indicating the text to be affected by the replacement according to the predefined Action;
(vi) composing a replacement text string comprising a replacement character and, if defined by the predefined Action, at least one of a character indicating the start of DBCS characters, a character indicating the end of DBCS characters and a blank character;
(vii) replacing the affected text with the replacement text string; and
(viii) repositioning the edit cursor immediately after the replacement character.

According to another aspect of the present invention, there is provided a text editor for editing a line of SBCS and DBCS mixed text and maintaining the columnar integrity of the text during the edit operation, comprising:
means for positioning an edit cursor on a character to be replaced;
means for determining the Class of the editor;
means for determining the State of the character to be replaced, and the Action associated with the determined State;
means to provide a visual indication of the extent of the line of text to be affected by the replacement according to the associated Action;
means to compose a replacement text string for the text to be affected comprising at least a replacement character and, if defined by the associated Action, at least one of a character indicating the start of DBCS characters, a character indicating the end of DBCS characters and a blank character;
means to replace said text to be affected with said replacement text string; and
means to reposition the edit cursor to the character immediately after the replacement character.

BRIEF DESCRIPTION OF THE DRAWINGS

A preferred embodiment of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is a flowchart showing the operational steps of the present invention;
Figure 2 shows the four possible Classes of a computer system in accordance with the present invention;
Figure 3 shows a table of defined Classes, States and Actions of the present invention, along with examples of each Action.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Figure 1 shows a flow chart of the steps performed in a preferred embodiment of a text editor embodying the present invention. As the editor allows for the editing (replacement) of SBCS and DBCS mixed text on various computer systems, the first step 100 is for the editor to determine the Class of the system it is operating on.

The four possible Classes are shown in Figure 2 wherein:
NS - indicates a system which does not maintain SO - SI symbols, the system's keyboard being in SBCS mode;
ND - indicates a system which does not maintain SO - SI symbols, the system's keyboard being in DBCS mode;
SS - indicates a system with SO - SI requirements, or which emulates SO - SI characters, the system's keyboard being in SBCS mode; and
SD - indicates a system with SO - SI requirements, or which emulates SO - SI characters, the system's keyboard being in DBCS mode.

An example of a system which does not maintain SO - SI symbols would be an ASCII-based system such as a PC running the IBM OS/2 Presentation Manager (Presentation Manager is a registered trade mark) wherein the mixed text was not intended for subsequent use on systems requiring SO - SI control characters for handling DBCS text. An example of a system with emulated SO - SI requirements is the above-mentioned OS/2 Presentation Manager system wherein the mixed text is intended for subsequent use on systems that require SO - SI characters. An example of a system with SO - SI requirements would be an EBCDIC-based system, such as a terminal for use with an IBM AS/400 system (AS/400 is a registered trade mark). In all cases, the keyboard would be switched between SBCS or DBCS mode in a suitable manner. As shown in the Figure, a change in keyboard mode or a change in whether the system is in a SOSI maintenance mode results in a change of the system's Class.

If the editor is intended to be used on different systems and/or terminals, appropriate calls may be made to operating system functions to determine the present Class of the system. Alternatively, if the editor is intended to be used on a single type of system or terminal, the determination of whether the system has SO - SI requirements would be inherent and operating system calls would only need to be performed to determine the current keyboard mode.

Once the Class of the system has been determined, the next required step 104 is to determine the State of the system. Each of the above-listed Classes has two or more possible States defined depending upon the type of text the edit cursor is located on and, in some cases, the type of text one or more characters after the edit cursor. The mnemonics for the States are defined according to the format "AABBB", wherein: AA represents the current Class of the system and BBB indicates, depending upon the State, the type of one or more characters at or adjacent the present edit cursor location. In the description below: D indicates a DBCS character; S indicates a SBCS character; I indicates the Shift In character; O indicates the Shift Out character; and X indicates any type of character (don't care).

There are four sets of elements used in determining the State defined for a particular Class:
Sd is the set of all the first bytes of DBCS characters defined (in code page based systems, this would be the set of code page identifiers and in other systems this would be determined by the presence of the character in a sequence delimited by SO - SI characters);
Ss is the set of all the valid SBCS characters, excepting the SO and SI characters;
Si is the SI character; and
So is the SO character.

To determine the particular State within a Class, a function is employed which returns the character at, or after, the edit cursor position and the identity of the set the character belongs to. For convenience, this function is referred to herein as the line(i) function, wherein line(0) returns the character (and its set) at the present location of the edit cursor, line(1) returns the character (and its set) immediately after the edit cursor, line(2) returns the second character (and its set) after the edit cursor, etc. The method of determining a particular State is described below.

Class NS has two States defined for it, namely NSD and NSS. State NSD is defined when line(0) is an element of set Sd (i.e. - the edit cursor is located on the first byte of a DBCS character). State NSS is defined when line(0) is not an element of set Sd (i.e. - the edit cursor is not on the first byte of a DBCS character).

Class ND has three States defined for it, namely
NDD, NDSD and NDSS. State NDD is defined when line(0) is an element of set Sd; State NDSD is defined when line(0) is not an element of set Sd and line(1) is an element of set Sd; and State NDSS is defined when neither line(0) nor line(1) are elements of set Sd.

The States defined for Classes SS and SD are determined in a somewhat more complex manner. Class SS has seven States defined for it, namely: SSOI; SSODI; SSOD; SSDI; SSDDI; SSDD; and SSS. In determining the particular State defined for Class SS, the edit cursor is advanced by the editor until line(0) is not an element of set Si. Once line(0) is not an element of set Si, a determination of the defined State is performed in accordance with the following table:
if line(0) is an element of set So
    then if line(1) is an element of set Si, State SSOI is defined;
    else if line(3) is an element of set Si, State SSODI is defined;
    else State SSOD is defined,
if line(0) is an element of set Sd
    then if line(2) is an element of set Si, State SSDI is defined;
    else if line(4) is an element of set Si, State SSDDI is defined;
    else State SSDD is defined,
if line(0) is not an element of set So and line(0) is not an element of set Sd, then State SSS is defined.

Similarly, Class SD has nine States defined for it and the edit cursor is advanced by the editor until line(0) is not an element of set So. The determination of the particular defined State is then performed in accordance with the following table:
if line(0) is an element of set Si
    then if line(1) is an element of set So, State SDIO is defined;
    else if line(2) is an element of set Ss, State SDIS is defined;
    else if line(3) is an element of set Si, State SDISOI is defined;
    else State SDISOD is defined,
if line(0) is an element of set Sd, State SDD is defined;
else if line(4) is an element of set Si, State SDSXXXI is defined;
else if line(4) is an element of set Sd, State SDSXXXD is defined;
else if line(2) is an element of set So, State SDSSO is defined;
else State SDS is defined.

Once the particular State has been determined, step 108 comprises selecting the corresponding Action in accordance with the table in Figure 3. The mnemonic for each Action is defined according to the format "12T", wherein: 1 indicates the number of bytes of text to be highlighted by an extended cursor; 2 indicates the number of bytes the edit cursor is to be moved to the right after text replacement has been accomplished; and T indicates the replacement text which may be any combination of one or more DBCS and SBCS characters, the SI control character, the SO control character and blanks (indicated by a 'B').

The table of Figure 3 lists the Action defined for each Class and State and shows an example text string before (START TEXT) and after (EDITED TEXT) an edit has been performed. The START TEXT column shows text strings for which the Class and State have been determined in accordance with the edit cursor position. The defined Action for the particular Class and State is performed at steps 112, 116 and 120. Step 112 comprises displaying the extended cursor which visually identifies the text characters which will be affected by the edit operation. In Figure 3, the extended cursor is shown by shading of the affected characters and the present position of the edit cursor (i.e. - the line(0) position) is the leftmost character covered by the extended cursor. It will be understood by those of skill in the art that the method of displaying the extended cursor is not particularly limited and any suitable visual indication such as reverse video or a colour change may be employed.

The REP. TEXT column shows the text which is input at the keyboard and which is combined with any necessary control characters and/or blanks to form the replacement text string which, at step 116, is inserted to replace the text indicated by the extended cursor. Once the replacement text string is inserted, the new, updated position of the edit cursor is calculated at step 120. The EDITED TEXT column shows the result of the edit and the new edit cursor position is indicated by the underscore character (i.e. - '_').

It will be apparent to those of skill in the art that the use of the underscore character in Figure 3 is merely for clarity and that, in actual use, once the new updated edit cursor position is determined, the editor would repeat the above-mentioned steps of determining the Class, State and defined Action based upon the type of text at the new edit cursor position (line(0)) and a new extended cursor would be displayed.

The Actions shown in Figure 3 vary from the trivial, such as the straight replacement of SBCS text (State NSS), to the quite complex, such as the replacement of a SBCS character with a DBCS character where the SBCS character to be replaced precedes a single DBCS character which is surrounded by SO and SI control characters (State SDSSO). In the first case, which is essentially the standard overwrite mode of most text editors, the extended cursor highlights the single character to be affected, the character is replaced and the edit cursor is moved to the next character. In the second case, the extended cursor highlights the three characters to be affected, a SO control character is placed at the edit cursor position (line(0)) by the editor, the DBCS replacement character from the keyboard is placed at the next two positions (line(1) and line(2)) and the edit cursor is updated by moving it three spaces to the right onto the second DBCS character.

Two Actions of particular interest are those defined for States SDSXXXI and SDSXXXD. These particular States include state reductions which provide a degree of optimization in determining the system's State. Specifically, if the system is in Class SD and line(0) is an element of Ss and line(4) is an element of Si or line(4) is an element of Sd then there are two optimizations which are possible and these optimizations are derived as follows:
if line(4) is an element of Si then there are three possible configurations of line(1), line(2) and line(3). Specifically, provided valid DBCS and SBCS text strings are considered, these three characters can be SSO, OIO, or OD and each results in the same defined action, namely 53ODIB;
if line(4) is an element of Sd there are only the same three possible configurations of line(1), line(2) and line(3) and each results in the same defined action, namely 63ODIBO.
Thus, if the Class of the System is SD and line(0) is an element of Ss, the system need not check line(1), line(2) and line(3) but only needs to check line(4) to determine whether the State is SDSXXXI or SDSXXXD.

It will be apparent to those of skill in the art that the editor will continuously monitor the present Class of the system and will redetermine the appropriate State and Action as required. For example, once the edit has been performed in the SDSSO State example of Figure 3, the new edit cursor position is on the first byte of the DBCS character that follows the replacement character. Since the keyboard of the system is still in DBCS mode and the new Class and State (based upon the new edit cursor position) of the system would be SDD, the extended cursor would be displayed by the system over that DBCS character. If however, the user wished to replace that DBCS character with a SBCS character, the keyboard mode of the system would be switched to SBCS mode and the editor would re-determine the Class, State and Action. Specifically, the system would determine that the State was now SSDI and the extended cursor would be displayed accordingly. The replacement text would then be read from the keyboard as it is input and the replacement of the text and repositioning of the edit cursor would proceed as described previously.

The present invention provides a novel and useful method and a text editor for editing SBCS and DBCS mixed text while maintaining the columnar integrity of the text. The present invention will operate with computer systems requiring SO - SI control characters, with systems which support emulated SO - SI control characters, and with those which do not. The present invention also provides a convenient visual indication of the text which will be affected by an edit operation through the display of an extended cursor.

Claims

1. A method of editing at least one character of a line of text containing mixed DBCS and SBCS characters while maintaining the columnar integrity of the text, comprising the steps of:
(i) positioning an edit cursor on a character to be replaced;
(ii) determining a present Class of the editor;
(iii) determining a present State of the text to be edited at the edit cursor location;
(iv) selecting a predefined Action associated with the present State;
(v) displaying an extended cursor indicating the text to be affected by the replacement according to the predefined Action;
(vi) composing a replacement text string comprising a replacement character and, if defined by the predefined Action, at least one of a character indicating the start of DBCS characters, a character indicating the end of DBCS characters and a blank character;
(vii) replacing the affected text with the replacement text string; and
(viii) repositioning the edit cursor immediately after the replacement character.

2. A text editor for editing a line of SBCS and DBCS mixed text and maintaining the columnar integrity of the text during the edit operation, comprising:
means for positioning an edit cursor on a character to be replaced;
means for determining the Class of the editor;
means for determining the State of the character to be replaced, and the Action associated with the determined State;
means to provide a visual indication of the extent of the line of text to be affected by the replacement according to the associated Action;
means to compose a replacement text string for the text to be affected comprising at least a replacement character and, if defined by the associated Action, at least one of a character indicating the start of DBCS characters, a character indicating the end of DBCS characters and a blank character;
means to replace said text to be affected with said replacement text string; and
means to reposition the edit cursor on the character immediately after the replacement character.


100  Determine The Present Class Of The System
104  Determine The Present State Of The Text
108  Determine The Appropriate Action
112  Display Extended Cursor
116  Insert Replacement Text String
120  Move Edit Cursor To Next Active Location

FIG. 1
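
Steps 112 to 120 of Figure 1 can be illustrated with a minimal Python sketch. It is not the editor of the application. The byte-tag model (one letter per byte of the line: 'S' for an SBCS byte, 'O' for Shift Out, 'I' for Shift In, 'D' and 'd' for the first and second byte of a DBCS character), the names `parse_action` and `apply_action`, and the treatment of a blank as an ordinary SBCS byte are all assumptions made for illustration. Only the "12T" Action format and the Action 33OD for State SDSSO are taken from the description.

```python
# Hypothetical byte-tag model: each letter stands for one byte of the line.
# 'S' = SBCS byte, 'O' = Shift Out, 'I' = Shift In,
# 'D'/'d' = first/second byte of a DBCS character (assumed notation).
# Maps each symbol of the T part of an Action to the byte tags it occupies.
WIDTH_TAGS = {"S": "S", "B": "S", "O": "O", "I": "I", "D": "Dd"}


def parse_action(action):
    """Split a "12T" mnemonic into (highlight bytes, cursor move, T)."""
    return int(action[0]), int(action[1]), action[2:]


def apply_action(tags, cursor, action):
    """Replace the highlighted bytes and return (new_tags, new_cursor)."""
    highlight, move, text = parse_action(action)
    replacement = "".join(WIDTH_TAGS[symbol] for symbol in text)
    # Columnar integrity: the replacement string must fill exactly the
    # bytes covered by the extended cursor (step 112).
    if len(replacement) != highlight:
        raise ValueError("replacement does not match extended cursor width")
    end = cursor + highlight
    if end > len(tags):
        raise ValueError("extended cursor runs past the end of the line")
    new_tags = tags[:cursor] + replacement + tags[end:]  # step 116
    return new_tags, cursor + move                       # step 120


# State SDSSO, Action 33OD: the cursor is on an SBCS byte, two bytes
# before a Shift Out that opens a single DBCS character.
before = "SSSODdIS"
after, new_cursor = apply_action(before, 1, "33OD")
print(after, new_cursor)
```

In this model the line keeps its length, so every byte after the extended cursor stays in its column, and the cursor lands on the first byte of the DBCS character that followed the replaced text.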

Classes: NS, ND, SS, SD

Keyboard Mode = DBCS:  NS -> ND,  SS -> SD
Keyboard Mode = SBCS:  ND -> NS,  SD -> SS
SOSI:                  NS -> SS,  ND -> SD
NO SOSI:               SS -> NS,  SD -> ND

FIG. 2
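
The four Classes of Figure 2 follow from two independent facts about the system, which a short Python sketch can make explicit. The function name `determine_class` and its two parameters are hypothetical; the description only says that operating system calls may be used to obtain this information.

```python
def determine_class(sosi_maintained, keyboard_mode):
    """Return the Class mnemonic 'NS', 'ND', 'SS' or 'SD'.

    sosi_maintained: True when SO - SI characters are required or emulated.
    keyboard_mode:   'SBCS' or 'DBCS' (assumed spelling of the two modes).
    """
    if keyboard_mode not in ("SBCS", "DBCS"):
        raise ValueError("keyboard_mode must be 'SBCS' or 'DBCS'")
    first = "S" if sosi_maintained else "N"          # SOSI / NO SOSI
    second = "S" if keyboard_mode == "SBCS" else "D"  # keyboard mode
    return first + second


# Switching the keyboard mode or the SOSI maintenance mode changes the Class.
print(determine_class(False, "SBCS"), determine_class(True, "DBCS"))
```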

CLASS  STATE     ACTION
NS     NSS       11S
       NSD       21SB
ND     NDSS      22D
       NDSD      32DB
       NDD       22D
SS     SSS       11S
       SSOI      21SB
       SSODI     41SBBB
       SSOD      31SBO
       SSDI      32ISB
       SSDDI     52ISBBB
       SSDD      42ISBO
SD     SDS       43ODI
       SDSSO     33OD
       SDSXXXI   53ODIB
       SDSXXXD   63ODIBO
       SDIS      32DI
       SDISOI    42DIB
       SDISOD    52DIBO
       SDIO      22D
       SDD       22D

(The START TEXT, REP. TEXT and EDITED TEXT example columns of the figure rely on pictographs and shading and are not reproduced.)

FIG. 3
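
The State column of Figure 3 is selected by the decision tables of the detailed description, which can be sketched in Python as follows. This is an illustration under assumptions, not the editor of the application: the line is modelled as a string of byte tags ('S' SBCS, 'O' Shift Out, 'I' Shift In, 'D' first byte and 'd' second byte of a DBCS character), line(i) is taken to index bytes, positions past the end of the line are treated as SBCS blanks, and the names `line` and `determine_state` are hypothetical.

```python
def line(tags, cursor, i):
    """Tag of the byte i positions after the cursor ('S' past the end)."""
    pos = cursor + i
    return tags[pos] if pos < len(tags) else "S"


def determine_state(cls, tags, cursor):
    """Return (State, cursor); the cursor may be advanced for SS and SD."""
    if cls == "NS":
        return ("NSD" if line(tags, cursor, 0) == "D" else "NSS"), cursor
    if cls == "ND":
        if line(tags, cursor, 0) == "D":
            return "NDD", cursor
        if line(tags, cursor, 1) == "D":
            return "NDSD", cursor
        return "NDSS", cursor
    if cls == "SS":
        while line(tags, cursor, 0) == "I":   # skip Shift In bytes
            cursor += 1
        first = line(tags, cursor, 0)
        if first == "O":
            if line(tags, cursor, 1) == "I":
                return "SSOI", cursor
            if line(tags, cursor, 3) == "I":
                return "SSODI", cursor
            return "SSOD", cursor
        if first == "D":
            if line(tags, cursor, 2) == "I":
                return "SSDI", cursor
            if line(tags, cursor, 4) == "I":
                return "SSDDI", cursor
            return "SSDD", cursor
        return "SSS", cursor
    if cls == "SD":
        while line(tags, cursor, 0) == "O":   # skip Shift Out bytes
            cursor += 1
        first = line(tags, cursor, 0)
        if first == "I":
            if line(tags, cursor, 1) == "O":
                return "SDIO", cursor
            if line(tags, cursor, 2) == "S":
                return "SDIS", cursor
            if line(tags, cursor, 3) == "I":
                return "SDISOI", cursor
            return "SDISOD", cursor
        if first == "D":
            return "SDD", cursor
        # Only line(4) is checked first, as in the SDSXXXI/SDSXXXD reduction.
        if line(tags, cursor, 4) == "I":
            return "SDSXXXI", cursor
        if line(tags, cursor, 4) == "D":
            return "SDSXXXD", cursor
        if line(tags, cursor, 2) == "O":
            return "SDSSO", cursor
        return "SDS", cursor
    raise ValueError("unknown Class: " + cls)


print(determine_state("SD", "SSSODdIS", 1))
```

Distinguishing 'D' from 'd' matters in this model because set Sd contains only first bytes: in the SDSSO example line(4) is the second byte of the DBCS character, so neither line(4) test matches and the line(2) test decides the State.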
