Sorting Your Linguistic Data Inside the Oracle9i Database

Contents

Introduction
The World of Sorting Rules
    Western European Languages
    Asian Ideographs
ISO/IEC 14651 – International String Ordering
Sorting Inside the Oracle Database
    Binary Sort
    Monolingual Linguistic Sort
    Multilingual Linguistic Sort
    Linguistic Sort Parameters
    Using a Linguistic Index
    Requirements for Using Linguistic Indexes
    Case Insensitive Search
    GENERIC_BASELETTER Sort
    Customization of Linguistic Sorts
Summary

INTRODUCTION

Sorting character strings can be an extremely intricate operation, the complexity of which may not be apparent to most users. Different languages have their own sorting rules: some languages are collated according to the letter sequence in the alphabet, some according to the number of strokes in a character, and some are even ordered by the pronunciation of the words. Treatment of letter accents differs among languages as well.

Sorting is not straightforward for day-to-day English usage either. If you look up words in an English dictionary, you will probably find that upper and lower case characters are mixed together in terms of ordering. Looking for names in the telephone directory, you may find that certain words are treated the same; for example, the prefixes Mac and Mc are grouped together.

Sorting can be further complicated when you need to sort data from more than one language. How do you handle the case where one language demands that a given letter be sorted after the letter z, while another language requires the same letter to be collated before the letter a?

In the rest of this paper, we will use the term 'linguistic sort' (or collation) to describe the culturally expected ordering of the elements of a text list; such a list is considered sorted in a given language. A native speaker of the language expects to locate an element of the list relative to other elements based on this ordering. For example, English-speaking users would expect a word beginning with B to come after all words starting with A and before all words beginning with C in an ordered list of English words.

Presenting data that is not sorted in the linguistic sequence your users are accustomed to can make searching for information difficult and time consuming. With the Oracle9i database, we have expanded our coverage of binary sorts and linguistic sorts, and added the new multilingual linguistic sorts to meet the demands of customers who need to search and sort data in multiple languages. This paper outlines the basic concept of linguistic sort, provides examples of how sort properties influence the collation order of different languages, and explains how you can customize the way your data is sorted inside an Oracle database.

THE WORLD OF SORTING RULES

The section above mentioned some of the intricacies involved when sorting data in a culturally expected order. This section goes over them in more detail, showing that every writing script (e.g. Latin, Greek, Cyrillic, etc.) has an ordering of its own, and that some writing scripts have many conflicting orderings for the different languages that they support.

Western European Languages

In German, ß is one character, but it is treated as the two-letter sequence 'ss' for ordering purposes. German speakers are also accustomed to treating ü, ä and ö as equivalent to ue, ae and oe respectively. These three characters are sorted in the same order as the vowel pairs.

Spain traditionally treats ch and ll, as well as ñ, as letters of their own, ordered after c, l and n respectively. For example, the following Spanish words would be sorted as listed:

    cabalmente, caballa, cantina, caña, clamar, curador, chácara
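The traditional Spanish ordering can be sketched in a few lines of Python. This is an illustration only, not Oracle's implementation: the alphabet table and the names `strip_vowel_accents` and `traditional_spanish_key` are assumptions made for this example, and accents on vowels are simply ignored.

```python
import unicodedata

# Traditional Spanish alphabet: "ch", "ll" and "ñ" are letters of their own.
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
            "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w",
            "x", "y", "z"]
RANK = {letter: position for position, letter in enumerate(ALPHABET)}


def strip_vowel_accents(word):
    """Lower-case the word and drop accents, but keep ñ as a distinct letter."""
    result = []
    for ch in unicodedata.normalize("NFC", word).lower():
        if ch == "ñ":
            result.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            result.append("".join(c for c in decomposed
                                  if not unicodedata.combining(c)))
    return "".join(result)


def traditional_spanish_key(word):
    """Return a list of alphabet positions, treating ch and ll as one letter."""
    text = strip_vowel_accents(word)
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("ch", "ll"):
            key.append(RANK[pair])
            i += 2
        else:
            # Characters outside the alphabet sort after every letter.
            key.append(RANK.get(text[i], len(ALPHABET)))
            i += 1
    return key


words = ["chácara", "curador", "clamar", "caña", "cantina", "caballa", "cabalmente"]
print(sorted(words, key=traditional_spanish_key))
```

Sorting with this key reproduces the word list above, whereas a plain code-point sort places caballa before cabalmente and chácara before clamar.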

Recently, this traditional Spanish sorting practice has been replaced with the modern Spanish sort, which removes the special status of ch and ll.

In the Danish alphabet, three additional letters, namely æ, ø and å, sort after z. Danish also treats the letter combination aa as a separate letter, ordered last in the alphabet after å, and considers the letters ü and y to be variants of the same letter. Due to these rules, the cities Zürich, Aarfit and Årbus all appear after Zyrardow in a list of cities.

In French, accented vowels are first treated like unaccented vowels from the point of view of their base sorting order. After the text is sorted without considering accents, it is further sorted so that within each vowel set the order is: no accent, acute accent, grave accent, circumflex accent, followed by umlaut. To make this even more complex, accented vowels are evaluated from right to left, while non-accented base characters are weighted from left to right. Thus Èdit comes before Edít when collating the two strings using a French linguistic sort.

Accented letters introduce further complexity, as they can have different meanings across languages and sometimes even within the same language. Complications arise when dealing with multinational coding schemes, such as Unicode, that support several languages, since these languages may contain conflicting alphabetic rules for the same letter. For example, the letter ä (a with an umlaut) is sorted before b in German, but it is sorted after z in Swedish. Thus, if the sorted text belongs to multiple languages, it usually cannot be sorted correctly from the point of view of all of them.
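The French rule, base letters compared left to right and accents compared right to left, can be sketched as follows. This is a simplified illustration under stated assumptions: the accent ranking table and the names `split_letter`, `french_key` and `forward_key` are invented for the example and are not Oracle's sort definition.

```python
import unicodedata

# Accent order within one vowel: none, acute, grave, circumflex, umlaut.
ACCENT_RANK = {"": 0, "\u0301": 1, "\u0300": 2, "\u0302": 3, "\u0308": 4}


def split_letter(ch):
    """Split one character into (base letter, accent rank)."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0].lower()
    marks = decomposed[1:]
    # Unknown marks sort after the five accents listed above.
    return base, ACCENT_RANK.get(marks, len(ACCENT_RANK))


def french_key(word):
    """Base letters left to right, then accents right to left."""
    pairs = [split_letter(ch) for ch in unicodedata.normalize("NFC", word)]
    bases = [base for base, _ in pairs]
    accents = [accent for _, accent in pairs]
    return (bases, accents[::-1])


def forward_key(word):
    """Same weights, but accents compared left to right, for contrast."""
    bases, reversed_accents = french_key(word)
    return (bases, reversed_accents[::-1])


pair = ["Edít", "Èdit"]
print(sorted(pair, key=french_key))   # accents scanned from the right
print(sorted(pair, key=forward_key))  # accents scanned from the left
```

With the backward scan, the first accent difference found is on the i, so Èdit sorts first. A forward scan meets the È first and gives the opposite order.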

Asian Ideographs

As you can see, cultural expectations vary a great deal between languages, and the previous examples cover only Western European languages that use the same Latin writing script. If dealing with the letters of an alphabet is quite a challenge, then working with Asian ideographs can be an extremely daunting task. For the Chinese language alone, there are tens of thousands of characters. The sorting rules also vary between the Simplified (used in China and Singapore) and Traditional (used in Taiwan, Hong Kong and Macau) Chinese writing scripts.

Here are some examples of the most common collation methods used for sorting Chinese characters:

Stroke count: This is one of the most common ways to sort Chinese characters. Chinese characters are composed of a radical (base element) and zero or more components. Both the radical and the components are composed of strokes. The number of strokes varies between characters. Characters with identical stroke counts are further sorted by the radical.

This example illustrates how three Chinese characters are ordered based on their stroke counts:

[Figure: three Chinese characters ordered by stroke count]

However, the stroke count for a given Chinese character may differ between its simplified and traditional forms. Hence it is common to have two distinct sorting algorithms to handle both writing styles.

The example below shows how the two forms of the Chinese character 'grass' are treated differently in terms of stroke counts:

[Table: the Simplified Chinese and Traditional Chinese forms of the character 'grass' and the number of strokes in each]

Radical: This ordering is found in most Chinese dictionaries. It is similar to the stroke count ordering, but here the radical ordering takes priority over the stroke count. Although it is a more traditional method of sorting Chinese characters, it can be quite cumbersome even for native Chinese speakers, since it may not be intuitive to locate the correct radical for a given Chinese character. The radicals are ordered by their own stroke counts. For characters with identical radical order, the stroke count of the whole character is used.

This example illustrates how three Chinese characters are ordered based on their radical stroke counts:

[Figure: three Chinese characters ordered by radical stroke count]

However, some radicals change shape when they are bound with other character components to form the final character. The order of a radical is always based on its original form.

The example below shows the order of four characters based on the original stroke count of their radicals. Names in parentheses show the Unicode codes of these characters:

[Figure: four characters ordered by the original stroke count of their radicals, with their Unicode codes]

To add even more complexity to sorting by radical, just as in the stroke count sort, some Chinese characters may have a different variation of the radical in the simplified and traditional forms of Chinese. Hence it is common to have two distinct sorting algorithms to handle the differences between the two writing styles.

The example below demonstrates the radical differences between Simplified Chinese (SC) and Traditional Chinese (TC) for the same set of characters:

[Figure: radical differences between SC and TC for the same set of characters]
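The difference between the stroke count and radical orderings comes down to which attribute is compared first. The sketch below is illustrative only. The six characters, their total stroke counts and their Kangxi radical numbers are sample data chosen for this example. Using the Kangxi radical number as the radical order is an assumption; that numbering is itself arranged by radical stroke count.

```python
# (Kangxi radical number, total stroke count) for a few simple characters.
# Sample data for illustration; a real sort uses a full character table.
CHARACTERS = {
    "一": (1, 1),
    "二": (7, 2),
    "人": (9, 2),
    "三": (1, 3),
    "口": (30, 3),
    "大": (37, 3),
}


def stroke_order_key(ch):
    """Stroke count first; the radical breaks ties."""
    radical, strokes = CHARACTERS[ch]
    return (strokes, radical)


def radical_order_key(ch):
    """Radical first; the total stroke count breaks ties."""
    radical, strokes = CHARACTERS[ch]
    return (radical, strokes)


by_stroke = sorted(CHARACTERS, key=stroke_order_key)
by_radical = sorted(CHARACTERS, key=radical_order_key)
print("stroke count order:", by_stroke)
print("radical order:     ", by_radical)
```

In this sample, 三 follows 人 in the stroke count order but moves up next to 一 in the radical order, because both share the radical 一.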

Pronunciation (Pinyin): Pin…

ISO/IEC 14651 – INTERNATIONAL STRING ORDERING

A Québecer, Alain LaBonté, was surprised and puzzled by the inconsistent behavior with which the different commercial sort programs dealt with French homograph words, even though the expected order was intuitive to native French speakers. He realized the need for an international standard that would allow the definition of a universal methodology for multi-script ordering.

Later, the project for an international string ordering standard, ISO/IEC 14651, was created. This international standard provides a method for ordering text data worldwide, and provides a Common Template Table whose tailoring meets the requirements of a given language and culture while retaining universal properties for other scripts.

The Common Template Table requires some tailoring in different local environments. However, conformance to this International Standard requires that all deviations from the Template, called 'deltas', be declared in order to document result discrepancies. The Standard describes a method to order text data independently of context.

SORTING INSIDE THE ORACLE DATABASE

Text is conventionally sorted inside a database according to the binary codes used to encode the characters. Typically, this does not produce a sort order that is linguistically correct. In some cases it can be correct, if the given encoding scheme specifies all the characters in ascending binary value according to the appropriate alphabetic convention. Unfortunately, most encoding schemes do not follow any such convention, and even if they do, it is not possible for them to cover the more complex sorting scenarios discussed previously.

To overcome this limitation, Oracle provides linguistic sorting. A linguistic sort handles the complex sorting requirements of the different languages and cultures. It enables text in any character encoding scheme to be sorted according to specific linguistic conventions, independent of the binary values of the characters.

Three types of database sorts are supported in Oracle9i:

- Binary sort
- Monolingual linguistic sort
- Multilingual linguistic sort

Binary Sort

The most common way to sort character data is to order it by the numeric binary codes defined by the character encoding scheme. This is achieved by using a binary sort. It is the fastest form of sorting in the database, because no special processing has to be done on the sorted values, and database indexes created directly on the sorted columns (standard indexes) are usually smaller than the linguistic indexes described later in this paper, requiring fewer disk reads to search them.

Binary sort offers reasonable results for the English alphabet, since both the ASCII and EBCDIC standards define the letters from A to Z in ascending numeric value order. However, this is not perfect, since uppercase letters and lowercase letters are grouped separately. In ASCII, uppercase letters appear before any lowercase letters, whereas in EBCDIC it is the reverse.

Results generated from binary sorts can vary depending on the ordering of the characters within any given character set. When characters used in languages other than English are present, binary sorts usually do not produce reasonable results. For example, an ascending ORDER BY query returns the characters C, E, b, Ë in that order, because each preceding character has a lower numeric code in the character encoding scheme than the one after it. A binary sort cannot present linguistically meaningful data when the language sorting rules are complex.

The binary sort is named BINARY. Oracle9i also supports the following quasi-binary sorts: UNICODE_BINARY, ASCII7, EBCDIC, BIG5, GBK and HKSCS. These sorts are binary in that they sort data according to the binary codes of characters in the AL16UTF16, US7ASCII, WE8EBCDIC, ZHT16BIG5, ZHS16GBK and ZHT16HKSCS character sets, respectively. However, they are not equivalent to the BINARY sort, because the sorted strings have to be converted to the correct character set before comparison. This means that standard BINARY indexes cannot be used to satisfy queries for data ordered by these sorts, and linguistic indexes have to be defined, much as for the monolingual linguistic sorts.
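The encoding dependence of a binary sort is easy to reproduce outside the database. The sketch below uses Python's built-in code-point ordering as a stand-in for an ASCII or Latin-1 binary sort, and the standard `cp500` codec as one example of an EBCDIC encoding. The choice of `cp500` and the sample words are assumptions made for this illustration.

```python
def ebcdic_key(text):
    """Sort key: the bytes of the text in one EBCDIC code page (cp500)."""
    return text.encode("cp500")


letters = ["b", "Ë", "C", "E"]
# Python compares strings by code point, which matches a binary sort
# in ASCII/Latin-1 for these characters.
binary_order = sorted(letters)
print(binary_order)

words = ["apple", "Zebra"]
ascii_order = sorted(words)                   # uppercase before lowercase
ebcdic_order = sorted(words, key=ebcdic_key)  # lowercase before uppercase
print(ascii_order)
print(ebcdic_order)
```

The same two words come back in opposite orders, which is why a binary sort is only as meaningful as the layout of the underlying character set.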

Monolingual Linguistic Sort

To produce a (localized) sort sequence that adheres to a specific linguistic convention, another sort technique must be used, one that can sort characters independently of their binary codes in the character encoding scheme. This technique is called a linguistic sort. A linguistic sort operates by replacing characters with numeric values (also known as 'sort keys') that reflect each character's proper linguistic order for a given language. Monolingual linguistic sort was introduced in an earlier Oracle release.

The sort key is constructed by breaking down each letter of the alphabet into two components: a major value and a minor value. Usually, letters with the same appearance (or base letter) have the same major value; the minor value is used to differentiate diacritic and case variants of the same base letter.

The following table shows sample letters and their major and minor values:

[Table: major and minor values for the sample letters a, A, ä, Ä and B]

Character strings are compared in two steps for monolingual linguistic sorts. First, the algorithm generates sort keys for the compared values, and then it compares the sort keys byte by byte until they differ or one runs out. The shorter key, or the one with the lower differing byte, is considered smaller.

The sort keys are generated by concatenating the major values of all characters of the string, followed by a zero value, and then by all the minor values. This way, the major values determine the base sort order, and the minor values are compared to further order the strings only when the major values are exactly the same.

If a character such as a space or hyphen should not be considered when comparing strings, then only its minor value is used in the sort key. The minor value is not skipped, so that strings differing only by ignorable characters are not considered equal.

The result of the SQL NLSSORT function is the sort key, in the RAW datatype.

Oracle uses named linguistic sorts to specify how character data should be sorted. In most cases, the names of the linguistic sorts are the language names. For some languages, additional (extended) linguistic sorts are defined. Conventionally, these linguistic sorts are named with a preceding 'X'. For example, Oracle supports two Spanish monolingual linguistic sorts: SPANISH for Modern Spanish and XSPANISH for Traditional Spanish.

Extended linguistic sorts are designed to accommodate language-specific special cases:

- Sorting of digraphs as a single character, e.g. the Spanish ll and ch.
- Converting single characters into digraphs for sorting purposes, e.g. the German sharp s 'ß' is treated as ss.

The following is a complete list of monolingual sorts supported in Oracle9i:

    ARABIC                FINNISH        PUNCTUATION
    ARABIC_MATCH          FRENCH         XPUNCTUATION
    ARABIC_ABJ_SORT       XFRENCH        ROMANIAN
    ARABIC_ABJ_MATCH      GERMAN         RUSSIAN
    BENGALI               XGERMAN        SLOVAK
    BULGARIAN             GERMAN_DIN     XSLOVAK
    CANADIAN FRENCH       XGERMAN_DIN    SLOVENIAN
    CATALAN               GREEK          XSLOVENIAN
    XCATALAN              HEBREW         SPANISH
    CROATIAN              HUNGARIAN      XSPANISH
    XCROATIAN             XHUNGARIAN     SWEDISH
    CZECH                 ICELANDIC      SWISS
    XCZECH                INDONESIAN     XSWISS
    CZECH_PUNCTUATION     ITALIAN        THAI_DICTIONARY
    XCZECH_PUNCTUATION    JAPANESE       THAI_TELEPHONE
    DANISH                LATIN          TURKISH
    XDANISH               LATVIAN        XTURKISH
    DUTCH                 LITHUANIAN     UKRAINIAN
    XDUTCH                MALAY          VIETNAMESE
    EEC_EURO              NORWEGIAN      WEST_EUROPEAN
    EEC_EUROPA3           POLISH         XWEST_EUROPEAN
    ESTONIAN
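The sort key construction described above (all major values, a zero separator, then all minor values) can be sketched as follows. The weights in the table are hypothetical values chosen for this illustration and are not Oracle's actual values. The name `sort_key` is likewise an assumption and does not reproduce the byte layout returned by NLSSORT.

```python
# Hypothetical (major, minor) weights; real Oracle values may differ.
# A major value of None marks an ignorable character (minor value only).
WEIGHTS = {
    "a": (15, 5), "A": (15, 10), "ä": (15, 15), "Ä": (15, 20),
    "b": (20, 5), "B": (20, 10),
    "-": (None, 1),
}


def sort_key(text):
    """Concatenate all major values, a zero byte, then all minor values."""
    majors = bytes(WEIGHTS[c][0] for c in text if WEIGHTS[c][0] is not None)
    minors = bytes(WEIGHTS[c][1] for c in text)
    return majors + b"\x00" + minors


# Python compares bytes objects byte by byte, and a key that is a prefix of
# another is the smaller one, which matches the comparison rule in the text.
letters = sorted(["b", "Ä", "a", "ä", "A"], key=sort_key)
print(letters)

# The hyphen contributes no major value, but its minor value keeps the
# two strings from comparing as equal.
print(sort_key("ab").hex(), sort_key("a-b").hex())
```

Because every major value precedes the separator, 'äb' sorts before 'b' even though ä has a higher binary code than b.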

Multilingual Linguistic Sort

Monolingual linguistic sort is useful when you are comparing and sorting data in one language; however, it cannot sort data across multiple languages or writing systems. Oracle9i provides multilingual linguistic sorts so that you can sort data in more than one language in a single sort. With the development of the Internet, more and more companies are now transforming their businesses into global businesses, and it is becoming common for customers to have data in different languages in a single global database. Multilingual linguistic sort helps customers with multilingual data to accurately search and organize information in any language.

Oracle's multilingual linguistic sort is based on the ISO/IEC 14651 and Unicode Collation Algorithm standards. To handle the complex multilingual sorting requirements, and at the same time to provide greater flexibility for modification, multilingual linguistic sorts are evaluated at three levels of precision, as defined by these two standards:

Primary level: A primary level sort distinguishes between base characters, such as the difference between the characters a and b. If a character is an ignorable character, then it is assigned a primary level order (or weight) of zero, which means it can be ignored during sorting comparison at this particular level. An example of an ignorable character is the dash character '-'. If it is ignored, then the word 'multi-lingual' can be treated the same as 'multilingual'.

Secondary level: This level is used to distinguish between the different diacritics for a given base character. For example, the character Ö differs from the character O only because it has a diacritic. Thus, Ö and O are the same on the primary level because they have the same base character O, but differ on the secondary level. The secondary level may be specified as considering characters starting from the end of the string toward its beginning. This is required for the French sort order.

Tertiary level: A tertiary level sort distinguishes between the casing (upper and lower case) of characters that do not differ on the primary and secondary levels.

The following example illustrates how a list is sorted based on all three levels. At the primary level, 'resume' is ordered before 'resumes'; at the secondary level, strings without diacritics come before strings with diacritics; and at the tertiary level, lower case characters are sorted before upper case ones:

    resume
    Resume
    résumé
    Résumé
    resumes
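The three levels can be sketched as a key made of three parallel lists that are compared in order. This is a simplified illustration and not Oracle's GENERIC_M definition: the function name `multilevel_key` is an assumption, the dash is simply dropped as an ignorable character, and diacritics are ranked by their combining mark alone.

```python
import unicodedata


def multilevel_key(text):
    """Return (primary, secondary, tertiary) weights for a string."""
    primary, secondary, tertiary = [], [], []
    for ch in unicodedata.normalize("NFC", text):
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        if base == "-":
            continue  # ignorable character: contributes nothing here
        primary.append(base.lower())    # level 1: base letter
        secondary.append(marks)         # level 2: diacritic ("" sorts first)
        tertiary.append(base.isupper()) # level 3: lower case before upper
    return (primary, secondary, tertiary)


words = ["resumes", "Résumé", "Resume", "résumé", "resume"]
ordered = sorted(words, key=multilevel_key)
print(ordered)
```

Tuples are compared element by element, so the secondary list is consulted only when the primary lists are equal, and the tertiary list only when both of the others are equal.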

This three-level architecture enables the Oracle database to handle languages that have complex sorting rules and to provide linguistic sorting support for databases with multilingual data. Linguistic sorts for Asian languages such as Chinese, Japanese and Korean are now available for the first time. In fact, for Chinese we now offer three varieties of sorts, based on the number of strokes, Pinyin and radicals.

Oracle's multilingual linguistic sorts are all created based on the GENERIC_M template; their names are postfixed with '_M' to indicate that they are multilingual sorts based on the ISO/IEC 14651 standard.

The following is a complete list of multilingual sorts supported in Oracle9i:

    GENERIC_M     JAPANESE_M            SPANISH_M
    CANADIAN_M    KOREAN_M              TCHINESE_RADICAL_M
    DANISH_M      SCHINESE_STROKE_M     TCHINESE_STROKE_M
    FRENCH_M      SCHINESE_PINYIN_M     THAI_M
                  SCHINESE_RADICAL_M

For example, Oracle9i supports a monolingual French sort called 'XFRENCH' and a multilingual French sort called 'FRENCH_M'.

Linguistic Sort Parameters

NLS parameters are used to determine the locale-specific behavior of SQL queries. Most NLS parameters can be configured at the database session level. Switching cultural conventions in a database session is vital for applications that support multiple languages; it allows users with different locale requirements to connect to the same single database.

The parameter NLS_SORT governs the linguistic sort property of the user's SQL session:

    Parameter type     String
    Syntax             NLS_SORT = {BINARY | linguistic sort}
    Default value      Derived from NLS_LANGUAGE
    Parameter scope    Initialization Parameter, Environment Variable and ALTER SESSION
    Range of values    BINARY or any valid linguistic sort definition name

The parameter NLS_SORT specifies the collating sequence for ORDER BY queries. If the value is BINARY, then the collating sequence is based on the numeric codes of the characters in the underlying encoding scheme; depending on the datatype, this is the binary sequence order of either the database character set or the national character set.

If the value is a named linguistic sort, sorting is based on the order of the defined sort. Most (but not all) languages supported by the NLS_LANGUAGE parameter also support a linguistic sort with the same name.

Note: NLS_SORT is implicitly defined by another parameter called NLS_LANGUAGE. The value of NLS_SORT may change if the value of NLS_LANGUAGE is altered within a given user session.

The table below lists the default NLS_SORT value for each of the Oracle languages:

    NLS_LANGUAGE              NLS_SORT
    AMERICAN                  binary
    ARABIC                    ARABIC
    ASSAMESE                  binary
    BANGLA                    binary
    BENGALI                   BENGALI
    BRAZILIAN PORTUGUESE      WEST_EUROPEAN
    BULGARIAN                 BULGARIAN
    CANADIAN FRENCH           CANADIAN FRENCH
    CATALAN                   CATALAN
    CROATIAN                  CROATIAN
    CZECH                     CZECH
    DANISH                    DANISH
    DUTCH                     DUTCH
    EGYPTIAN                  ARABIC
    ENGLISH                   binary
    ESTONIAN                  ESTONIAN
    FINNISH                   FINNISH
    FRENCH                    FRENCH
    GERMAN                    GERMAN
    GERMAN DIN                GERMAN
    GREEK                     GREEK
    GUJARATI                  binary
    HEBREW                    HEBREW
    HINDI                     binary
    HUNGARIAN                 HUNGARIAN
    ICELANDIC                 ICELANDIC
    INDONESIAN                INDONESIAN
    ITALIAN                   WEST_EUROPEAN
    JAPANESE                  binary
    KANNADA                   binary
    KOREAN                    binary
    LATIN AMERICAN SPANISH    SPANISH
    LATVIAN                   LATVIAN
    LITHUANIAN                LITHUANIAN
    MALAY                     MALAY
    MALAYALAM                 binary
    MARATHI                   binary
    MEXICAN SPANISH           WEST_EUROPEAN
    NORWEGIAN                 NORWEGIAN
    ORIYA                     binary
    POLISH                    POLISH
    PORTUGUESE                WEST_EUROPEAN
    PUNJABI                   binary
    ROMANIAN                  ROMANIAN
    RUSSIAN                   RUSSIAN
    SIMPLIFIED CHINESE        binary
    SLOVAK                    SLOVAK
    SLOVENIAN                 SLOVENIAN
    SPANISH                   SPANISH
    SWEDISH                   SWEDISH
    TAMIL                     binary
    TELUGU                    binary
    THAI                      THAI_DICTIONARY
    TRADITIONAL CHINESE       binary
    TURKISH                   TURKISH
    UKRAINIAN                 UKRAINIAN
    VIETNAMESE                VIETNAMESE

In general, the binary sort requires less system overhead, because standard Oracle indexes, built according to the binary order of the keys, are smaller than the linguistic indexes described in the section "Using a Linguistic Index".

The examples below illustrate the differences between a binary sort, a monolingual Swedish linguistic sort and a multilingual GENERIC_M linguistic sort.

Note: The NLS_LANG environment variable can also influence the NLS_SORT behavior. NLS_SORT will be changed to the default value assigned to a given NLS_LANGUAGE parameter, as defined by the language component of the NLS_LANG environment variable. It is recommended that users set the NLS_SORT parameter explicitly, to ensure that the correct linguistic sequence is being used for a given session.

Example 1. Binary Sort

    ALTER SESSION SET NLS_SORT=BINARY;

    SELECT product_name
    FROM product
    ORDER BY product_name;

    PRODUCT NAME
    ------------
    Antenne
    Lcd
    aerial
    Ähre
    ächzen

Example 2. Monolingual Swedish Sort

    ALTER SESSION SET NLS_SORT=SWEDISH;

    SELECT product_name
    FROM product
    ORDER BY product_name;

    PRODUCT NAME
    ------------
    aerial
    Antenne
    Lcd
    ächzen
    Ähre

Example 3. Multilingual GENERIC_M Sort

    SELECT product_name
    FROM product
    ORDER BY NLSSORT(product_name, 'NLS_SORT=GENERIC_M');

    PRODUCT NAME
    ------------
    ächzen
    aerial
    Ähre
    Antenne
    Lcd

When using comparison operators, characters are compared according to their binary codes in the designated encoding scheme. A character is greater than another if it has a higher binary code. Since the binary sequence of characters may not match the linguistic sequence for a particular language, such comparisons may not be 'linguistically correct'. The SQL NLSSORT function allows such comparisons to reflect linguistic conventions.

Example 4. BINARY comparison

    ALTER SESSION SET NLS_SORT=GERMAN;

    SELECT product_name
    FROM product
    WHERE product_name > 'Antenne'
    ORDER BY product_name;

    PRODUCT NAME
    ------------
    ächzen
    aerial
    Ähre
    Lcd

Example 5. GERMAN linguistic sensitive comparison

    ALTER SESSION SET NLS_SORT=GERMAN;

    SELECT product_name
    FROM product
    WHERE NLSSORT(product_name) > NLSSORT('Antenne')
    ORDER BY product_name;

    PRODUCT NAME
    ------------
    Lcd

Note: The SQL NLSSORT function has to be added on both sides of the comparison operator.
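The contrast between Examples 4 and 5 can be reproduced with a small sketch in which a key function plays the role of NLSSORT on both sides of the comparison. The function `linguistic_key` is a hypothetical stand-in that ignores accents and case first; it is not the GERMAN sort definition.

```python
import unicodedata


def linguistic_key(text):
    """Stand-in for NLSSORT: base letters first, original text as tie-breaker."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.lower(), text)


products = ["Antenne", "Lcd", "aerial", "Ähre", "ächzen"]

# Binary comparison: plain string comparison by character code.
binary_matches = sorted(
    (p for p in products if p > "Antenne"), key=linguistic_key)

# Linguistic comparison: the key is applied to both operands.
linguistic_matches = [
    p for p in products if linguistic_key(p) > linguistic_key("Antenne")]

print(binary_matches)
print(linguistic_matches)
```

With the binary test, every lowercase or accented name counts as greater than 'Antenne'. With the key applied to both sides, only 'Lcd' does.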

Using the NLSSORT function in SQL statements can be cumbersome. The NLS parameter NLS_COMP indicates that comparisons must be linguistically sensitive, according to the NLS_SORT session parameter:

    Parameter type     String
    Syntax             NLS_COMP = {BINARY | ANSI}
    Default value      BINARY
    Parameter scope    Initialization Parameter, Environment Variable and ALTER SESSION
    Range of values    BINARY or ANSI

For example, "Example 5. GERMAN linguistic sensitive comparison" above can be rewritten as:

    ALTER SESSION SET NLS_SORT=GERMAN;
    ALTER SESSION SET NLS_COMP=ANSI;

    SELECT product_name
    FROM product
    WHERE product_name > 'Antenne'
    ORDER BY product_name;

    PRODUCT NAME
    ------------
    Lcd

Note: NLS_COMP and NLS_SORT affect the following SQL operations only: WHERE, ORDER BY, START WITH, IN/NOT IN, BETWEEN, CASE WHEN and HAVING. The other SQL operators compare in binary mode only. To enable linguistically sensitive comparisons, NLSSORT functions must be added to these operators.

Using a Linguistic Index

Linguistic sorting is language-specific and requires more data processing than binary sorting. Binary sorting is fast because it follows the order of the character set encoding. When data in multiple languages is stored in the database, you may want your applications to collate a result set returned from a SELECT statement with an ORDER BY clause using different linguistic sequences, depending on the language being used, and without the performance penalty associated with linguistic sorting. This can be accomplished by using linguistic indexes. A linguistic index is a variation of the function-based index.

There are three approaches to setting up linguistic indexes to sort your language data:

1. Build a linguistic index for each language that the application needs to support. This approach offers simplicity but requires more disk space. For each index, the rows in languages other than the one on which the index is built are collated together at the end of the sequence. The following example builds linguistic indexes for sorting French and German data:

    CREATE INDEX french_index ON product
      (NLSSORT(product_name, 'NLS_SORT=FRENCH'));

    CREATE INDEX german_index ON product
      (NLSSORT(product_name, 'NLS_SORT=GERMAN'));

The index is selected by the SQL optimizer based on the NLS_SORT session parameter or the arguments of the NLSSORT function specified in the ORDER BY clause. For example, if the session variable NLS_SORT is set to FRENCH, french_index will be selected, and when it is set to GERMAN, german_index will be used.

2. Build a single linguistic index for all languages using a multilingual linguistic sort such as GENERIC_M or FRENCH_M. This index collates characters according to the character rules defined in the ISO/IEC 14651 standard.

    CREATE INDEX generic_index ON product
      (NLSSORT(product_name, 'NLS_SORT=GENERIC_M'));

The index is automatically picked up if the NLS_SORT session parameter, or the argument of the NLSSORT function you specified in the ORDER BY clause, is equal to the sort name used in the index definition. This approach is useful if the languages that you need to support are covered by a given multilingual linguistic sort.

3. Build a single linguistic index for all languages. This can be accomplished by including a language column in your table (such as LANG_COL in the example SQL below) to be used as a parameter of the NLSSORT function. The language column contains the NLS_SORT values for the data in the column on which the index is built. The following example builds a single linguistic index for multiple languages. With this index, the rows with the same value for NLS_SORT are collated correctly relative to each other. Rows with different values cannot be meaningfully compared.

    CREATE INDEX nls_index ON product
      (NLSSORT(product_name, 'NLS_SORT=' || LANG_COL));

The nls_index will be used only if the query explicitly specifies NLSSORT(product_name, 'NLS_SORT=' || LANG_COL) in the ORDER BY clause.

Note: When creating an index, the total length of the index key cannot exceed a certain value. This value depends primarily on the DB_BLOCK_SIZE. If an attempt is made to create an index with a key larger than the maximum value, an "ORA-1450 maximum key length exceeded" error is raised. The maximum allowable length of the index key is 758 for a 2K block, 1578 for a 4K block, 3218 for an 8K block and 6498 for a 16K block.

As with other function-based indexes, composite linguistic indexes are also supported. For example:

    CREATE INDEX german_index ON product
      (NLSSORT(product_name, 'NLS_SORT=GERMAN'),
       NLSSORT(company_name, 'NLS_SORT=GERMAN'));

In fact, the rule-based optimizer can use the non-functional prefix of a composite linguistic index, if there is one. That is, if the above index were modified to include the model number:

    CREATE INDEX german_index ON product
      (model_number,
       NLSSORT(product_name, 'NLS_SORT=GERMAN'),
       NLSSORT(company_name, 'NLS_SORT=GERMAN'));

then a rule-based query can take advantage of this composite linguistic index as well.
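Conceptually, a linguistic index stores the rows in sort-key order ahead of time, so that an ORDER BY with a matching sort name can read them back without computing keys again. The toy model below illustrates only that idea. The `Table` class, the `SORTS` registry and the sort name DEMO_LINGUISTIC are all hypothetical and say nothing about how Oracle stores or selects its indexes.

```python
import unicodedata


def base_letter_key(text):
    """Stand-in for NLSSORT: base letters first, original text as tie-breaker."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.lower(), text)


# Hypothetical registry of sort names and their key functions.
SORTS = {"BINARY": lambda text: text, "DEMO_LINGUISTIC": base_letter_key}


class Table:
    def __init__(self, rows):
        self.rows = list(rows)
        self.indexes = {}  # sort name -> row positions in key order

    def create_index(self, sort_name):
        """Compute the keys once and remember the resulting row order."""
        key = SORTS[sort_name]
        self.indexes[sort_name] = sorted(
            range(len(self.rows)), key=lambda i: key(self.rows[i]))

    def order_by(self, sort_name):
        """Use a matching index if one exists, otherwise sort on the fly."""
        if sort_name in self.indexes:
            return [self.rows[i] for i in self.indexes[sort_name]], "index"
        return sorted(self.rows, key=SORTS[sort_name]), "sort"


product = Table(["Antenne", "Lcd", "aerial", "Ähre", "ächzen"])
product.create_index("DEMO_LINGUISTIC")
print(product.order_by("DEMO_LINGUISTIC"))
print(product.order_by("BINARY"))
```

The index is used only when the requested sort name matches the one it was built with, mirroring the requirement that NLS_SORT match the index definition.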

Requirements for Using Linguistic Indexes

Whether you decide to use a single linguistic index or multiple linguistic indexes, some requirements must be met for the linguistic index to be used:

1. Set the QUERY_REWRITE_ENABLED session parameter to TRUE.
2. Set QUERY_REWRITE_INTEGRITY=TRUSTED (or greater).
3. Ensure that the COMPATIBLE flag is set to 8.1.0.0.0 or greater.

These three initialization parameter settings are required for all function-based indexes.

4. Set NLS_SORT appropriately.

The fourth requirement is that the NLS_SORT parameter for the query should indicate the linguistic definition that was specified in the index CREATE statement. It can be specified implicitly (if it was constant) or directly as the second argument to the NLSSORT function.

5. Set the optimizer mode to FIRST_ROWS.

Use the cost-based optimizer with the optimizer mode set to FIRST_ROWS, because function-based indexes are not recognized by the rule-based optimizer.

For customers running on pre-Oracle9i R2 databases, you will also need to specify an extra dummy WHERE clause to trigger the function-based index:

    WHERE NLSSORT(column_name) IS NOT NULL

when you want to use ORDER BY column_name, where column_name is the column with the linguistic index. This is necessary only when you use an ORDER BY clause. This dummy WHERE clause is not needed in 9iR2 if column_name has already been defined as a NOT NULL column.

Here is an example of how to create a linguistic index called NLS_GENERIC, based on the multilingual linguistic sort GENERIC_M, on the PRODUCT table:

    ALTER SESSION SET QUERY_REWRITE_ENABLED=TRUE;
    ALTER SESSION SET QUERY_REWRITE_INTEGRITY=TRUSTED;

    CREATE INDEX NLS_GENERIC ON product
      (NLSSORT(product_name, 'NLS_SORT=GENERIC_M'));

    ALTER SESSION SET NLS_SORT=GENERIC_M;
    ALTER SESSION SET OPTIMIZER_MODE=FIRST_ROWS;

    SELECT * FROM product
    WHERE NLSSORT(product_name) IS NOT NULL  -- Not needed in 9iR2
    ORDER BY product_name;

The WHERE ... IS NOT NULL clause is not needed in 9iR2 if product_name has been defined as a NOT NULL column.

Note: Even if all the linguistic index requirements are met, it is possible that the optimizer will choose not to use the linguistic index because there are cheaper plans available. Adding the hint /*+ index(table indexname) */ will cause the cost-based optimizer to be used, and will cause the index to be used if there is any legal query plan that allows it.

Case Insensitive Search

The SQL NLS_UPPER and NLS_LOWER functions perform linguistically sensitive casing of strings based on a given linguistic sort definition. This allows us to perform case-insensitive searches regardless of the language being used.

    SELECT product_name
    FROM product
    WHERE NLS_UPPER(product_name, 'NLS_SORT = XGERMAN') = 'GROSSE';

    PRODUCT NAME
    ------------
    große
    Große
    GROSSE

Just as for linguistic sorts, a function-based index can be built to improve the performance of case-insensitive searches. For example:

    CREATE INDEX case_insensitive_index ON product
      (NLS_UPPER(product_name));

This index will be utilized whenever an NLS_UPPER() string comparison is performed on the column product_name:

    SELECT * FROM PRODUCT
    WHERE NLS_UPPER(product_name) = 'GROSSE';
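The effect of linguistically aware uppercasing can be seen with Python's `str.upper()`, which applies the Unicode full case mapping and therefore expands ß to SS. It is used here only as a stand-in for NLS_UPPER with the XGERMAN sort. The sample product names are assumptions made for the illustration.

```python
products = ["große", "Große", "GROSSE", "Gross"]

# str.upper() follows Unicode case mapping, under which "ß" becomes "SS",
# so all spellings of the word collapse to the same uppercase form.
matches = [name for name in products if name.upper() == "GROSSE"]
print(matches)
```

A byte-wise uppercase that leaves ß unchanged would match only the last spelling in the result, 'GROSSE'.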


GENERIC_BASELETTER Sort

Instead of using the SQL NLS_UPPER and NLS_LOWER functions to perform case-insensitive queries, an alternative approach available in Oracle9i is to use the linguistic sort GENERIC_BASELETTER. GENERIC_BASELETTER groups all characters together based on their base letter values; this is achieved by ignoring their case and diacritic differences.

Here is an example of a GENERIC_BASELETTER query:

    ALTER SESSION SET NLS_COMP=ANSI;
    ALTER SESSION SET NLS_SORT=GENERIC_BASELETTER;

    SELECT * FROM PRODUCT
    WHERE PRODUCT_NAME = 'database';

    DATABASE
    Database
    database
    dätäbase

Note: A GENERIC_BASELETTER search is not a linguistically sensitive search; it is not based on any specific language. The sort defines only the base letters of the underlying characters; hence it simulates the behavior of a case-insensitive and accent-insensitive linguistic sort.
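A base-letter comparison of this kind can be approximated by removing diacritics and case before comparing. The sketch below is an approximation for illustration only: the function name `base_letters` is an assumption, and Unicode decomposition is used in place of Oracle's GENERIC_BASELETTER definition.

```python
import unicodedata


def base_letters(text):
    """Reduce a string to lower-case base letters, dropping diacritics."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


products = ["DATABASE", "Database", "database", "dätäbase", "datebase"]
matches = [p for p in products if base_letters(p) == base_letters("database")]
print(matches)
```

All four spellings of 'database' match, while 'datebase' does not, because it differs in a base letter rather than in case or accent.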

Customization of Linguistic Sorts

A comprehensive collection of linguistic sorts is provided in Oracle9i to meet the demands of our customers with increasingly diverse language needs. However, there will always be new sorting requirements, due to changes in cultural conventions or as the result of emerging industry standards (such as the ISO and Unicode standards). Sometimes it may be necessary to customize a linguistic sort so that it is consistent with the approach adopted by other vendors, in order to be compatible with other software products across different platforms.

Oracle Locale Builder is a GUI tool introduced in Oracle9i for configuring locale data definitions. It provides an easy-to-use graphical interface through which users can view existing Oracle locale definitions, customize them or create new definitions. For linguistic sort definitions, Locale Builder gives a graphical representation of the sort order that is intuitive for both viewing and customization purposes. Oracle Locale Builder supports the rearrangement of characters in different sequence orders and provides customization of contracting and expanding characters, context-sensitive characters, and supplementary characters.

[Figure: Screen shot of the Oracle Locale Builder]

Please refer to the chapter "Customizing Locale Data" in the Oracle9i Database Globalization Support Guide for more information.

SUMMARY

Sorting your data in a linguistically sensitive manner is an important part of data processing in a globally deployed application. Presenting data that is not sorted in the linguistic sequence your users are accustomed to can make searching for information difficult and time consuming.

With the Oracle9i database, Oracle has expanded the coverage of binary sorts and linguistic sorts, and introduced the new multilingual linguistic sorts to meet the demands of customers who need to search and sort data in multiple languages. Oracle continues to enhance linguistic sort coverage and provides conformance to international standards by supporting the ISO/IEC 14651 international string ordering standard and the Unicode collation standard (UCA).

The Oracle9i database offers a comprehensive selection of linguistic sorts, supporting both monolingual and multilingual linguistic sorts. Customers with special requirements that go beyond the extensive set of linguistic sorts provided by Oracle9i have the flexibility of customizing and defining their own linguistic sorts by using the Oracle Locale Builder, a new, easy-to-use GUI tool that allows them to view existing linguistic sorts and create new ones.

Sorting your data inside the Oracle database
August 2002
Author: Simon Law
Contributing Authors: Sergiusz Wolicki, Claire Ho, Barry Trute

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries: Phone: +1.650.506.7000 Fax: +1.650.506.7200 www.oracle.com

Oracle Corporation provides the software that powers the internet.

Oracle is a registered trademark of Oracle Corporation. Various product and service names referenced herein may be trademarks of Oracle Corporation. All other product and service names mentioned may be trademarks of their respective owners.

Copyright © 2000 Oracle Corporation All rights reserved.