Sorting Your Linguistic Data Inside the Oracle9i Database

Contents
    Introduction
    The World of Sorting Rules
        Western European Languages
        Asian Ideographs
    ISO/IEC 14651 – International String Ordering
    Sorting inside the Oracle Database
        Binary Sort
        Monolingual Linguistic Sort
        Multilingual Linguistic Sort
        Linguistic Sort Parameters
        Using a Linguistic Index
        Requirements for Using Linguistic Indexes
        Case Insensitive Search
        GENERIC_BASELETTER Sort
        Customization of Linguistic Sorts
    Summary

INTRODUCTION

Sorting character strings can be an extremely intricate operation, the complexity of which may not be apparent to most users. Different languages have their own sorting rules: some languages are collated according to the letter sequence in the alphabet, some according to the number of strokes in a character, and some are even ordered by the pronunciation of the words. Treatment of letter accents differs among languages as well.

Sorting is not straightforward for day-to-day English usage either. If you look up words in an English dictionary, you will probably find that upper and lower case characters are mixed together in terms of ordering. Looking for names in the telephone directory, you may find that certain words are treated the same; for example, the prefixes Mac and Mc are grouped together.

Sorting is further complicated when you need to sort data from more than one language: how do you handle the case when one language demands that a given letter be sorted after the letter z, while the same letter in another language must be collated before the letter a?

In the rest of this paper, we use the term ‘linguistic sort’, or collation, to describe the culturally expected ordering of the elements of a text list when the list is considered sorted in a given language. A native speaker of the language expects to locate an element of the list relative to other elements based on this ordering. For example, English-speaking users expect a word beginning with B to come after all words starting with A and before all words beginning with C in an ordered list of English words.

Presenting data that is not sorted in the linguistic sequence your users are accustomed to can make searching for information difficult and time consuming.
With the Oracle9i database, we have expanded our coverage of binary sorts and linguistic sorts, and added the new multilingual linguistic sorts to meet the demands of customers who need to search and sort data in multiple languages. This paper outlines the basic concept of linguistic sort, provides examples of how sort properties influence the collation order of different languages, and explains how you can customize the way your data is sorted inside an Oracle database.

THE WORLD OF SORTING RULES

The section above mentioned some of the intricacies involved when sorting data in a culturally expected order. This section goes over them in more detail, showing that every writing script (e.g. Latin, Greek, Cyrillic) has an ordering of its own, and that some writing scripts have many conflicting orderings for the different languages they support.

Western European Languages

In German, ß is one character, but it is treated as a double s, ‘ss’, for ordering purposes. German speakers are also accustomed to treating ü, ä and ö as equivalent to ue, ae and oe respectively. These three characters are sorted in the same order as the vowel pairs.

Spain traditionally treats ch and ll, as well as ñ, as letters of their own, ordered after c, l and n respectively. For example, the following Spanish words would be sorted as listed:

    cabalmente, caballa, cantina, caña, clamar, curador, chácara

Recently, this traditional Spanish sorting practice has been replaced with the modern Spanish sort, which removes the special status of ch and ll.

In the Danish alphabet, three additional letters, namely æ, ø and å, sort after z. Danish also treats the letter combination aa as a separate letter, ordered last in the alphabet after å, and considers the letters ü and y to be variants of the same letter. Due to these rules, the cities Zürich, Aarfit and Årbus all appear after Zyrardow in a list of cities.

In French, accented vowels are first treated like unaccented vowels from the point of view of their base sorting order. After the text is sorted without considering accents, it is further sorted so that within each vowel set the order is: no accent, acute accent, grave accent, circumflex accent, followed by umlaut. To make this even more complex, accented vowels are evaluated from right to left, while non-accented base characters are weighted from left to right. Thus Èdit comes before Edít when collating the two strings using a French linguistic sort.

Accented letters introduce further complexity, as they can have different meanings across languages and sometimes even within the same language. Complications arise when dealing with multinational coding schemes such as Unicode that support several languages, since these languages may contain conflicting alphabetic rules for the same letter. For example, the letter ä (a with an umlaut) is sorted before b in German, but it is sorted after z in Swedish. Thus, if the sorted text belongs to multiple languages, it usually cannot be sorted correctly from the point of view of all of them.

Asian Ideographs

As you can see, cultural expectations vary a great deal between languages, and the previous examples are just for Western European languages using the same Latin writing script. If dealing with the few dozen letters of an alphabet is quite a challenge, then working with Asian ideographs can be an extremely daunting task. For the Chinese language alone, there are many thousands of characters. The sorting rules also vary between the Simplified (used in China and Singapore) and Traditional (used in Taiwan, Hong Kong and Macau) Chinese writing scripts.

Here are some examples of the most common collation methods used for sorting Chinese characters:

Stroke count: This is one of the most common ways to sort Chinese characters. Chinese characters are composed of a radical (base element) and zero or more components. Both the radical and the components are composed of strokes. The number of strokes varies between characters. Characters with identical stroke counts are further sorted by the radical.

    [Figure: three Chinese characters ordered by their stroke counts]

However, the number of strokes for a given Chinese character may differ between its simplified and traditional forms. Hence it is common to have two distinct sorting algorithms to handle both writing styles.

    [Figure: the simplified and traditional forms of the Chinese character ‘grass’ and the number of strokes in each]
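The stroke-count rule described above (stroke count first, radical as the tie-breaker) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the small lookup table, with its stroke counts and Kangxi radical numbers, is supplied for this sketch and is not Oracle data, and the function names are hypothetical. A real implementation would need a complete character database, with separate tables for the simplified and traditional forms.

```python
# Illustrative data only: character -> (total stroke count, Kangxi radical number).
# These few entries are assumptions made for the sketch, not an Oracle sort table.
STROKE_DATA = {
    "一": (1, 1),
    "二": (2, 7),
    "人": (2, 9),
    "三": (3, 1),
    "大": (3, 37),
    "木": (4, 75),
}


def stroke_count_key(ch):
    """Sort key: stroke count first, radical number as the tie-breaker."""
    strokes, radical = STROKE_DATA[ch]
    return (strokes, radical)


def sort_by_stroke_count(chars):
    """Return the characters in stroke-count order."""
    return sorted(chars, key=stroke_count_key)


if __name__ == "__main__":
    print(sort_by_stroke_count(["木", "大", "人", "三", "二", "一"]))
```

Because the key is a tuple, two characters with the same stroke count (such as 二 and 人 in the sample data) are ordered by their radical, which mirrors the tie-breaking rule in the text.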
Radical: This ordering is found in most Chinese dictionaries. It is similar to the stroke count ordering, but here the radical ordering takes priority over the stroke count. Although it is a more traditional method of sorting Chinese characters, it can be quite cumbersome even for native Chinese speakers, since locating the correct radical for a given Chinese character may not be intuitive. Radicals are ordered by their own stroke count. For characters whose radicals have identical order, the stroke count of the whole character is used.

    [Figure: three Chinese characters ordered by their radical stroke counts]

However, some radicals change shape when they are bound with other character components to form the final character. The order of a radical is always based on its original form.

    [Figure: four characters ordered by the original stroke count of their radicals; names in parentheses show the Unicode codes of these characters]

To add even more complexity to sorting by radical, just as in the stroke count sort, some Chinese characters have a different variation of the radical in the simplified and traditional forms of Chinese. Hence it is common to have two distinct sorting algorithms to handle the differences between the two writing styles.

    [Figure: radical differences between Simplified Chinese (SC) and Traditional Chinese (TC) for the same set of characters]

Pronunciation (Pinyin): […]

ISO/IEC 14651 – INTERNATIONAL STRING ORDERING

In the mid-[…]s, a Quebecer, Alain LaBonté, was surprised and puzzled by the inconsistent behavior with which different commercial sort programs dealt with French homograph words, even though their ordering was intuitive to native French speakers. He realized the need for an international standard that would allow the definition of a universal methodology for multi-script ordering.

In […], the project for an international string ordering standard, ISO 14651, was created. This international standard provides a method for ordering text data worldwide, and provides a Common Template Table, whose tailoring meets the requirements of a given language and culture while retaining universal properties
for other scripts.

The Common Template Table requires some tailoring in different local environments. However, conformance to this International Standard requires that all deviations from the Template, called deltas, be declared in order to document result discrepancies. The Standard describes a method to order text data independently of context.

SORTING INSIDE THE ORACLE DATABASE

Text is conventionally sorted inside a database according to the binary codes used to encode the characters. Typically, this does not produce a sort order that is linguistically correct. In some cases it can be correct, if the given encoding scheme specifies all the characters in ascending binary value according to the appropriate alphabetic convention. Unfortunately, most encoding schemes do not follow any such convention, and even if they do, they cannot cover the more complex sorting scenarios discussed previously.

To overcome this limitation, Oracle provides linguistic sorting. A linguistic sort handles the complex sorting requirements of different languages and cultures. It enables text in any character encoding scheme to be sorted according to specific linguistic conventions, independent of the binary values of the characters.

Three types of database sorts are supported in Oracle9i:

    Binary sort
    Monolingual linguistic sort
    Multilingual linguistic sort

Binary Sort

The most common way to sort character data is to order it by the numeric binary codes of the characters, as defined by the character encoding scheme. This is achieved by using the binary sort. It is the fastest form of sorting in the database, because no special processing has to be done on the sorted values, and because database indexes created directly on the sorted columns (standard indexes) are usually smaller than the linguistic indexes described later in this paper, requiring fewer disk reads to search them.

Binary sort offers reasonable results for the English alphabet, since both the ASCII and EBCDIC standards define the letters from A to Z in ascending numeric value order. However, this is not perfect, since uppercase letters and lowercase letters are grouped separately. In ASCII, uppercase letters appear before any lowercase letters, whereas in EBCDIC it is the reverse.
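The effect of binary ordering is easy to reproduce outside the database. The following Python sketch sorts strings by the bytes of their encoded form, which is what a binary sort amounts to. The helper name `binary_sort` is hypothetical, and the choice of UTF-8 and of EBCDIC code page 500 (Python's `cp500` codec) is an assumption made for illustration, not a statement about any particular database character set.

```python
def binary_sort(strings, encoding="utf-8"):
    """Order strings by the byte values of their encoded form."""
    return sorted(strings, key=lambda s: s.encode(encoding))


if __name__ == "__main__":
    letters = ["b", "A", "a", "B"]
    # ASCII-compatible encoding: uppercase letters sort before lowercase ones.
    print(binary_sort(letters))
    # EBCDIC (code page 500): lowercase letters sort before uppercase ones.
    print(binary_sort(letters, encoding="cp500"))
    # Accented letters end up after all unaccented ASCII letters.
    print(binary_sort(["aerial", "Ähre", "Antenne", "ächzen", "Lcd"]))
```

The last call uses the product names from the SQL examples later in this paper. In UTF-8 byte order they come out as Antenne, Lcd, aerial, Ähre, ächzen, which is not an order a German or Swedish reader would expect.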
Results generated from binary sorts can vary depending on the ordering of the characters within a given character set. When characters used in languages other than English are present, binary sorts usually do not produce reasonable results. For example, an ascending ORDER BY query returns the characters C, E, b, Ë, â in that order, because each preceding character has a lower numeric code in the character encoding scheme than the one after it. A binary sort cannot present linguistically meaningful data when the language sorting rules are complex.

The binary sort is named BINARY. Oracle9i also supports six quasi-binary sorts: UNICODE_BINARY, ASCII7, EBCDIC, BIG5, GBK and HKSCS. These sorts are binary in that they sort data according to the binary codes of characters in the corresponding Unicode, 7-bit ASCII, EBCDIC, Big5, GBK and HKSCS character sets, respectively. However, they are not equivalent to the BINARY sort, because the sorted strings have to be converted to the correct character set before comparison. This means that standard BINARY indexes cannot be used to satisfy queries for data ordered by these sorts, and linguistic indexes have to be defined, much as for the monolingual linguistic sorts.

Monolingual Linguistic Sort

To produce a (localized) sort sequence that adheres to a specific linguistic convention, another sort technique must be used, one that can sort characters independently of their binary codes in the character encoding scheme. This technique is called a linguistic sort. A linguistic sort operates by replacing characters with numeric values, also known as ‘sort keys’, that reflect each character’s proper linguistic order for a given language. Monolingual linguistic sort was first introduced in an earlier Oracle release.

The sort key is constructed by breaking down each letter of the alphabet into two components: a major value and a minor value. Usually, letters with the same appearance (or base letter) have the same major value; the minor value is used to differentiate diacritic and case variants of the same base letter.

The following table shows sample letters and their major and minor values:

    Letter    Major value    Minor value
    a         …              …
    A         …              …
    ä         …              …
    Ä         …              …
    b         …              …
    B         …              …

Character strings are compared in two steps for monolingual linguistic sorts. First,
the algorithm generates sort keys for the compared values, and then it compares the sort keys byte by byte until they differ or one of them ends. The shorter key, or the one with the lower differing byte, is considered smaller.

The sort keys are generated by concatenating the major values of all characters of the string, followed by a zero value, and then by all the minor values. This way the major values determine the base sort order, and the minor values are compared to further order the strings only when the strings have exactly the same major values.

If a character, such as a space or hyphen, should not be considered when comparing strings, then only its minor value is used for the sort key. The minor value is not skipped, so that strings differing only by ignorable characters are not considered equal.

The result of the SQL NLSSORT function is the sort key, in the RAW data type.

Oracle uses named linguistic sorts to specify how character data should be sorted. In most cases the names of the linguistic sorts are the language names. For some languages, additional (extended) linguistic sorts are defined. Conventionally, these linguistic sorts are named with a preceding "X". For example, Oracle supports two Spanish monolingual linguistic sorts: SPANISH for Modern Spanish and XSPANISH for Traditional Spanish.

Extended linguistic sorts are designed to accommodate language-specific special cases:

    Sorting of digraphs as a single character, e.g. the Spanish ll and ch
    Converting single characters into digraphs for sorting purposes, e.g. the German sharp s ‘ß’ is treated as ss

The following is a complete list of monolingual sorts supported in Oracle9i:

    ARABIC                 FINNISH         PUNCTUATION
    ARABIC_MATCH           FRENCH          XPUNCTUATION
    ARABIC_ABJ_SORT        XFRENCH         ROMANIAN
    ARABIC_ABJ_MATCH       GERMAN          RUSSIAN
    BENGALI                XGERMAN         SLOVAK
    BULGARIAN              GERMAN_DIN      XSLOVAK
    CANADIAN FRENCH        XGERMAN_DIN     SLOVENIAN
    CATALAN                GREEK           XSLOVENIAN
    XCATALAN               HEBREW          SPANISH
    CROATIAN               HUNGARIAN       XSPANISH
    XCROATIAN              XHUNGARIAN      SWEDISH
    CZECH                  ICELANDIC       SWISS
    XCZECH                 INDONESIAN      XSWISS
    CZECH_PUNCTUATION      ITALIAN         THAI_DICTIONARY
    XCZECH_PUNCTUATION     JAPANESE        THAI_TELEPHONE
    DANISH                 LATIN           TURKISH
    XDANISH                LATVIAN         XTURKISH
    DUTCH                  LITHUANIAN      UKRAINIAN
    XDUTCH                 MALAY           VIETNAMESE
    EEC_EURO               NORWEGIAN       WEST_EUROPEAN
    EEC_EUROPA3            POLISH          XWEST_EUROPEAN
    ESTONIAN

Multilingual Linguistic Sort

Monolingual linguistic sort is useful when you are comparing and sorting data in one language; however, it cannot sort data across multiple languages or writing systems. Oracle9i provides multilingual linguistic sorts so that you can sort data in more than one language in a single sort. With the development of the Internet, more and more companies are transforming their businesses into global businesses, and it is becoming common for customers to have data in different languages in a single global database. Multilingual linguistic sort helps customers with multilingual data to accurately search and organize information in any language.

Oracle’s multilingual linguistic sort is based on the ISO/IEC 14651 and Unicode Collation Algorithm standards. To handle the complex multilingual sorting requirements, and at the same time to provide greater flexibility for modification, multilingual linguistic sorts are evaluated at three levels of precision, as defined by these two standards:

    Primary level – A primary level sort distinguishes between base characters, such as the difference between the characters a and b. If a character is an ignorable character, it is assigned a primary level order (or weight) of zero, which means it can be ignored during sorting comparison at this particular level. An example of an ignorable character is the dash character ‘-’. If it is ignored, then the word ‘multi-lingual’ can be treated the same as ‘multilingual’.

    Secondary level – This level is used to distinguish between the different diacritics for a given base character. For example, the character Ö differs from the character O only because it has a diacritic. Thus, Ö and O are the same on the primary level because they have the same base character O,
    but differ on the secondary level. The secondary level may be specified as considering characters starting from the end of the string toward its beginning. This is required for the French sort order.

    Tertiary level – A tertiary level sort distinguishes between casing (upper and lower case) of characters that do not differ on the primary and secondary levels.

The following example illustrates how a list is sorted based on all three levels. At the primary level, ‘resume’ is ordered before ‘resumes’; at the secondary level, strings without diacritics come before strings with diacritics; and at the tertiary level, lower case characters are sorted before upper case ones:

    resume
    Resume
    résumé
    Résumé
    resumes

This three-level architecture enables the Oracle database to handle languages that have complex sorting rules, and to provide linguistic sorting support for databases with multilingual data. Linguistic sorts for Asian languages, such as Chinese, Japanese and Korean, are now available for the first time. In fact, for Chinese we now offer three varieties of sorts, based on the number of strokes, Pinyin, or radicals.

Oracle’s multilingual linguistic sorts are all created based on the GENERIC_M template; their names are postfixed with ‘_M’ to indicate that they are multilingual sorts based on the ISO 14651 standard.

The following is a complete list of multilingual sorts supported in Oracle9i:

    GENERIC_M      JAPANESE_M             SPANISH_M
    CANADIAN_M     KOREAN_M               TCHINESE_RADICAL_M
    DANISH_M       SCHINESE_STROKE_M      TCHINESE_STROKE_M
    FRENCH_M       SCHINESE_PINYIN_M      THAI_M
                   SCHINESE_RADICAL_M

For example, Oracle9i supports a monolingual French sort called ‘XFRENCH’.

Linguistic Sort Parameters

NLS parameters are used to determine the locale-specific behavior of SQL queries. Most NLS parameters can be configured at the database session level. Switching cultural conventions in a database session is vital for applications that support multiple languages; it allows users with different locale requirements to connect to the same single database.

The parameter NLS_SORT governs the linguistic sort property of the user’s SQL session.

    Parameter type      String
    Syntax              NLS_SORT = {BINARY | linguistic sort}
    Default value       Derived from NLS_LANGUAGE
    Parameter scope     Initialization Parameter, Environment Variable and ALTER SESSION
    Range of values     BINARY or any valid linguistic sort definition name

Note: NLS_SORT is implicitly defined by another parameter called NLS_LANGUAGE. The value of NLS_SORT may change if the value of NLS_LANGUAGE is altered within a given user session.

The parameter NLS_SORT specifies the collating sequence for ORDER BY queries. If the value is BINARY, then the collating sequence is based on the numeric codes of the characters in the underlying encoding scheme; depending on the data type, this is the binary sequence order of either the database character set or the national character set.

If the value is a named linguistic sort, sorting is based on the order of the defined sort. Most (but not all) languages supported by the NLS_LANGUAGE parameter also support a linguistic sort with the same name.

The table below lists the default NLS_SORT value for each of the Oracle languages:

    NLS_LANGUAGE              NLS_SORT
    AMERICAN                  binary
    ARABIC                    ARABIC
    ASSAMESE                  binary
    BANGLA                    binary
    BENGALI                   BENGALI
    BRAZILIAN PORTUGUESE      WEST_EUROPEAN
    BULGARIAN                 BULGARIAN
    CANADIAN FRENCH           CANADIAN FRENCH
    CATALAN                   CATALAN
    CROATIAN                  CROATIAN
    CZECH                     CZECH
    DANISH                    DANISH
    DUTCH                     DUTCH
    EGYPTIAN                  ARABIC
    ENGLISH                   binary
    ESTONIAN                  ESTONIAN
    FINNISH                   FINNISH
    FRENCH                    FRENCH
    GERMAN                    GERMAN
    GERMAN DIN                GERMAN
    GREEK                     GREEK
    GUJARATI                  binary
    HEBREW                    HEBREW
    HINDI                     binary
    HUNGARIAN                 HUNGARIAN
    ICELANDIC                 ICELANDIC
    INDONESIAN                INDONESIAN
    ITALIAN                   WEST_EUROPEAN
    JAPANESE                  binary
    KANNADA                   binary
    KOREAN                    binary
    LATIN AMERICAN SPANISH    SPANISH
    LATVIAN                   LATVIAN
    LITHUANIAN                LITHUANIAN
    MALAY                     MALAY
    MALAYALAM                 binary
    MARATHI                   binary
    MEXICAN SPANISH           WEST_EUROPEAN
    NORWEGIAN                 NORWEGIAN
    ORIYA                     binary
    POLISH                    POLISH
    PORTUGUESE                WEST_EUROPEAN
    PUNJABI                   binary
    ROMANIAN                  ROMANIAN
    RUSSIAN                   RUSSIAN
    SIMPLIFIED CHINESE        binary
    SLOVAK                    SLOVAK
    SLOVENIAN                 SLOVENIAN
    SPANISH                   SPANISH
    SWEDISH                   SWEDISH
    TAMIL                     binary
    TELUGU                    binary
    THAI                      THAI_DICTIONARY
    TRADITIONAL CHINESE       binary
    TURKISH                   TURKISH
    UKRAINIAN                 UKRAINIAN
    VIETNAMESE                VIETNAMESE

In general, the binary sort requires less system overhead; this is because standard Oracle indexes, built according to the binary order of the keys, are smaller than the linguistic indexes described in the section “Using a Linguistic Index”.

Note: The NLS_LANG environment variable can also influence the NLS_SORT behavior. NLS_SORT is changed to the default value assigned to the given NLS_LANGUAGE setting, as listed in the table above.

The examples below illustrate the differences between a binary sort, a monolingual Swedish linguistic sort and a multilingual GENERIC_M linguistic sort.

Example 1. Binary Sort

    PRODUCT NAME
    ------------
    Antenne
    Lcd
    aerial
    Ähre
    ächzen

Example 2. Monolingual Swedish Sort

    ALTER SESSION SET NLS_SORT=SWEDISH;

    SELECT product_name
    FROM product
    ORDER BY product_name;

    PRODUCT NAME
    ------------
    aerial
    Antenne
    Lcd
    ächzen
    Ähre

Example 3. Multilingual GENERIC_M Sort

    SELECT product_name
    FROM product
    ORDER BY NLSSORT(product_name, 'NLS_SORT=GENERIC_M');

    PRODUCT NAME
    ------------
    ächzen
    aerial
    Ähre
    Antenne
    Lcd

When comparison operators are used, characters are compared according to their binary codes in the designated encoding scheme. A character is greater than another if it has a higher binary code. Since the binary sequence of characters may not match the linguistic sequence for a particular language, such comparisons may not be "linguistically correct". The SQL NLSSORT function allows such comparisons to reflect linguistic conventions.

Example 4. BINARY comparison

    ALTER SESSION SET NLS_SORT=GERMAN;

    SELECT product_name
    FROM product
    WHERE product_name > 'Antenne'
    ORDER BY product_name;

    PRODUCT NAME
    ------------
    ächzen
    aerial
    Ähre
    Lcd

Example 5. GERMAN linguistic sensitive comparison

    ALTER SESSION SET NLS_SORT=GERMAN;

    SELECT product_name
    FROM product
    WHERE NLSSORT(product_name) > NLSSORT('Antenne')
    ORDER BY product_name;

    PRODUCT NAME
    ------------
    Lcd

Note: The SQL NLSSORT function has to be added on both sides of the comparison operator.

Using the NLSSORT function in SQL statements can be cumbersome. The NLS parameter NLS_COMP indicates that comparisons must be linguistically sensitive, according to the NLS_SORT session parameter.

    Parameter type      String
    Syntax              NLS_COMP = {BINARY | ANSI}
    Default value       BINARY
    Parameter scope     Initialization Parameter, Environment Variable and ALTER SESSION
    Range of values     BINARY or ANSI

For example, Example 5 (GERMAN linguistic sensitive comparison) above can be rewritten as:

    ALTER SESSION SET NLS_SORT=GERMAN;
    ALTER SESSION SET NLS_COMP=ANSI;

    SELECT product_name
    FROM product
    WHERE product_name > 'Antenne'
    ORDER BY product_name;

    PRODUCT NAME
    ------------
    Lcd

Note: NLS_COMP and NLS_SORT affect the following SQL operations only: WHERE, ORDER BY, START WITH, IN/NOT IN, BETWEEN, CASE WHEN and HAVING.
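The difference between the binary comparison in Example 4 and the linguistic comparison in Example 5 can be mimicked outside the database with a toy multi-level key. The Python sketch below is not Oracle's GERMAN sort or its sort-key format. It is a simplified stand-in that compares base letters first, then diacritics, then case, using the standard `unicodedata` module. The names `toy_key` and `greater_than` are hypothetical, and the input is assumed to be precomposed (NFC) text with one code point per letter.

```python
import unicodedata


def toy_key(s):
    """Three-level key: base letters, then diacritics, then case.

    Assumes precomposed (NFC) input, one code point per letter.
    """
    primary, secondary, tertiary = [], [], []
    for ch in s:
        decomposed = unicodedata.normalize("NFD", ch)
        base = decomposed[0]
        primary.append(base.lower())
        secondary.append(len(decomposed) > 1)  # True if the letter carries a diacritic
        tertiary.append(base.isupper())        # True for upper case
    return ("".join(primary), tuple(secondary), tuple(tertiary))


def greater_than(names, bound, key=None):
    """Return the names that compare greater than bound, sorted by the same key."""
    key = key or (lambda s: s)  # default: code point ("binary") comparison
    return sorted((n for n in names if key(n) > key(bound)), key=key)


if __name__ == "__main__":
    products = ["Antenne", "Lcd", "aerial", "Ähre", "ächzen"]
    print(greater_than(products, "Antenne"))           # binary comparison
    print(greater_than(products, "Antenne", toy_key))  # toy linguistic comparison
    print(sorted(["resumes", "Résumé", "résumé", "Resume", "resume"], key=toy_key))
```

With the binary comparison, four product names are greater than 'Antenne', the same four rows that Example 4 returns. With the toy key only Lcd qualifies, as in Example 5. The last call reproduces the resume/résumé ordering shown for the three-level multilingual sort.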
SQL operators not covered by NLS_COMP and NLS_SORT compare in binary mode only. To enable linguistically sensitive comparison for them, NLSSORT function calls must be added to their operands.

Using a Linguistic Index

Linguistic sorting is language specific and requires more data processing than binary sorting. Binary sorting is fast because it follows the order of the character set encoding. When data in multiple languages is stored in the database, you may want your applications to collate a result set returned from a SELECT statement using the ORDER BY clause with different linguistic sequences, depending on the language being used, and without the performance penalty associated with linguistic sorting. This can be accomplished by using linguistic indexes. A linguistic index is a variation of the function-based index.

There are three approaches to setting up linguistic indexes to sort your language data:

1. Build a linguistic index for each language that the application needs to support. This approach offers simplicity but requires more disk space. For each index, the rows in languages other than the one on which the index is built are collated together at the end of the sequence. The following example builds linguistic indexes for sorting French and German data:

    CREATE INDEX french_index ON product
        (NLSSORT(product_name, 'NLS_SORT=FRENCH'));

    CREATE INDEX german_index ON product
        (NLSSORT(product_name, 'NLS_SORT=GERMAN'));

The index is selected by the SQL optimizer based on the NLS_SORT session parameter or the arguments of the NLSSORT function specified in the ORDER BY clause. For example, if the session variable NLS_SORT is set to FRENCH, french_index is selected, and when it is set to GERMAN, german_index is used.

2. Build a single linguistic index for all languages using a multilingual linguistic sort such as GENERIC_M or FRENCH_M. This index collates characters according to the character rules defined in the ISO 14651 standard.

    CREATE INDEX generic_index ON product
        (NLSSORT(product_name, 'NLS_SORT=GENERIC_M'));

The index is automatically picked up if the NLS_SORT session parameter, or the argument of the NLSSORT function you specified in the ORDER BY clause, is equal to the sort name used in the index definition. This approach is useful if the
languages that you need to support are covered by a given multilingual linguistic sort.

3. Build a single linguistic index for all languages. This can be accomplished by including a language column in your table (such as LANG_COL in the example SQL below) to be used as a parameter of the NLSSORT function. The language column contains the NLS_SORT values for the data in the column on which the index is built. The following example builds a single linguistic index for multiple languages. With this index, rows with the same value for NLS_SORT are collated correctly relative to each other. Rows with different values cannot be meaningfully compared.

    CREATE INDEX nls_index ON product
        (NLSSORT(product_name, 'NLS_SORT=' || LANG_COL));

Note: When creating an index, the total length of the index key cannot exceed a certain value. This value depends primarily on DB_BLOCK_SIZE. If an attempt is made to create an index with a key larger than the maximum value, an “ORA-1450 maximum key length exceeded” error is raised. The maximum allowable length of the index key is 758 for a 2K block, 1578 for a 4K block, 3218 for an 8K block and 6498 for a 16K block.
The nls_index will be used only if the query explicitly specifies NLSSORT(product_name, 'NLS_SORT=' || LANG_COL) in the ORDER BY clause.

As with other function-based indexes, composite linguistic indexes are also supported. For example:

    CREATE INDEX german_index ON product
        (NLSSORT(product_name, 'NLS_SORT=GERMAN'),
         NLSSORT(company_name, 'NLS_SORT=GERMAN'));

In fact, the rule-based optimizer can use the non-functional prefix of a composite linguistic index if there is one. That is, if the above index were modified to include the model number:

    CREATE INDEX german_index ON product
        (model_number,
         NLSSORT(product_name, 'NLS_SORT=GERMAN'),
         NLSSORT(company_name, 'NLS_SORT=GERMAN'));

then a rule-based query could take advantage of this composite linguistic index as well.

Requirements for Using Linguistic Indexes

Whether you decide to use a single linguistic index or multiple linguistic indexes, some requirements must be met for the linguistic index to be used:

1. Set the QUERY_REWRITE_ENABLED session parameter to TRUE.

2. Set QUERY_REWRITE_INTEGRITY=TRUSTED (or greater).

3. Ensure that the COMPATIBLE flag is set to […] or greater.

   These three initialization parameter settings are required for all function-based indexes.

4. Set NLS_SORT appropriately.

   The fourth requirement is that the NLS_SORT parameter for the query must indicate the linguistic definition that was specified in the index CREATE statement. It can be specified implicitly (if it was constant) or directly as the second argument to the NLSSORT function.

5. Set the optimizer mode to FIRST_ROWS.

   Use the cost-based optimizer with the optimizer mode set to FIRST_ROWS, because function-based indexes are not recognized by the rule-based optimizer.

For customers running on pre-Oracle9iR2 databases, you will also need to specify an extra dummy WHERE clause to trigger the function-based index:

    WHERE NLSSORT(column_name) IS NOT NULL

when you want to use ORDER BY column_name, where column_name is the column with the linguistic index. This is necessary only when you use an ORDER BY clause. This dummy WHERE clause is not needed in 9iR2 if column_name has already been defined as a NOT NULL column.

Here is an example of how to create a linguistic index called NLS_GENERIC, based on the multilingual linguistic sort GENERIC_M, on the PRODUCT table:

    ALTER SESSION SET QUERY_REWRITE_ENABLED=TRUE;
    ALTER SESSION SET QUERY_REWRITE_INTEGRITY=TRUSTED;

    CREATE INDEX NLS_GENERIC ON product
        (NLSSORT(product_name, 'NLS_SORT=GENERIC_M'));

    ALTER SESSION SET NLS_SORT=GENERIC_M;
    ALTER SESSION SET OPTIMIZER_MODE=FIRST_ROWS;

    SELECT * FROM product
    WHERE NLSSORT(product_name) IS NOT NULL  -- Not needed in 9iR2
    ORDER BY product_name;

The WHERE clause is not needed in 9iR2 if product_name has been defined as a NOT NULL column.

Note: Even if all the linguistic index requirements are met, the optimizer may still choose not to use the linguistic index because cheaper plans are available. Adding the hint /*+ index(table indexname) */ causes the cost-based optimizer to be used, and causes the index to be used if there is any legal query plan that allows it.

Case Insensitive Search

The SQL NLS_UPPER and NLS_LOWER functions perform linguistically sensitive casing of strings based on a given linguistic sort definition. This allows us to perform case insensitive searches regardless of the language being used.

    SELECT product_name
    FROM product
    WHERE NLS_UPPER(product_name, 'NLS_SORT = XGERMAN') = 'GROSSE';

    PRODUCT NAME
    ------------
    große
    Große
    GROSSE

Just as for linguistic sorts, a function-based index can be built to improve the performance of case insensitive searches. For example:

    CREATE INDEX case_insensitive_index ON product
        (NLS_UPPER(product_name));

This index is used whenever an NLS_UPPER() string comparison is performed on the column product_name:

    SELECT * FROM PRODUCT
    WHERE NLS_UPPER(product_name) = 'GROSSE';

GENERIC_BASELETTER Sort

Instead of using the SQL NLS_UPPER and NLS_LOWER functions to perform case insensitive queries, an alternative approach available in Oracle9i is to use the linguistic sort GENERIC_BASELETTER. GENERIC_BASELETTER groups all characters together based on their base letter values; this is achieved by ignoring their case and diacritic differences.

Note: A GENERIC_BASELETTER search is not a linguistically sensitive search; it is not based on any specific language. The sort defines only the base letters of the underlying characters; hence it simulates the behavior of a case and accent insensitive linguistic sort.

Here is an example of a GENERIC_BASELETTER query and the product names it matches:

    ALTER SESSION SET NLS_COMP=ANSI;
    ALTER SESSION SET NLS_SORT=GENERIC_BASELETTER;

    SELECT * FROM PRODUCT
    WHERE PRODUCT_NAME = 'database';

    DATABASE
    Database
    database
    dätäbase

Customization of Linguistic Sorts

A comprehensive collection of linguistic sorts is provided in Oracle9i to meet the demands of our customers with increasingly diverse language needs. However, there will always be new sorting requirements, due to changes in cultural conventions or as the result of emerging industry standards (such as the ISO and Unicode standards). Sometimes it may also be necessary to customize a linguistic sort so that it is consistent with the approach adopted by other vendors, in order to be compatible with other software products across different platforms.

Oracle Locale Builder is a GUI tool introduced in Oracle9i for configuring locale data definitions. It provides an easy-to-use graphical interface through which users can view existing Oracle locale definitions, customize them, or create new definitions. For linguistic sort definitions, Locale Builder gives a graphical representation of the sort order that is intuitive for both viewing and customization purposes. Oracle Locale Builder supports the rearrangement of characters in different sequence orders and provides customization of contracting and expanding characters, context sensitive characters, and supplementary characters.

    [Figure: Screen shot of the Oracle Locale Builder]
Please refer to the chapter Customizing Locale Data in the Oracle9i Database Globalization Support Guide for more information.

SUMMARY

Sorting your data in a linguistically sensitive manner is an important part of data processing in a globally deployed application. Presenting data that is not sorted in the linguistic sequence your users are accustomed to can make searching for information difficult and time consuming.

With the Oracle9i database, Oracle has expanded the coverage of binary sorts and linguistic sorts, and introduced the new multilingual linguistic sorts to meet the demands of customers who need to search and sort data in multiple languages. Oracle continues to enhance linguistic sort coverage and provides conformance to international standards by supporting the ISO 14651 international string ordering standard and the Unicode collation standard (UCA).

The Oracle9i database offers a comprehensive selection of monolingual and multilingual linguistic sorts. Customers with special requirements that go beyond the extensive set of linguistic sorts provided by Oracle9i have the flexibility of customizing and defining their own linguistic sorts by using the Oracle Locale Builder, a new easy-to-use GUI tool that allows them to view existing linguistic sorts and create new ones.

Sorting your data inside the Oracle database
August 2002
Author: Simon Law
Contributing Authors: Sergiusz Wolicki, Claire Ho, Barry Trute

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
www.oracle.com

Oracle Corporation provides the software that powers the internet.

Oracle is a registered trademark of Oracle Corporation. Various product and service names referenced herein may be trademarks of Oracle Corporation. All other product and service names mentioned may be trademarks of their respective owners.

Copyright © 2000 Oracle Corporation
All rights reserved.