Bitmap Indexes 3 - Bitmap Join Indexes (Sept 2003)

In recent articles, I have described the functionality of bitmap indexes, and the bitmap star transformation. In this article we move on to the bitmap join index an option just introduced in Oracle 9.2

In the previous article in this short loop access path through the primary series on bitmap indexes, I described keys on the outlying tables. how ordinary bitmap indexes are used in the bitmap star Apparently, according to a whitepaper transformation. In this article we look I found recently on Metalink, Oracle at the effect, and costs, of using the actually owns the copyright to some new bitmap join index in an attempt aspects of this extremely cute strategy. to make the process even more efficient. It's fantastic - what's the problem There are three possible inefficiencies Figure 1 shows us a simple example of in this strategy. (Which, I have to add, the way in which several tables might seem to be very small compared to the be related in a query that Oracle would massive benefit that the technology address through the bitmap star can introduce). transformation. First - if the queries reaching the system are typically about some

Dim 1 attribute of a dimension, but never acquire data about that dimension, then we are adding work by travelling Dim 4 Facts Dim 2 Parent Y through the dimension and perhaps merging many bitmap index sections corresponding to the selected primary Dim 3 keys from that dimension.

Parent X Secondly - if the queries reaching the system are typically aimed at the parent tables (the outer layers of the Figure 1 Simple Snowflake Schema. snowflake) then we are doing a lot of work tracking back and forth through As we saw in the last article, the query the dimension tables. represented by Figure 1 would be addressed by a two-stage process. Finally - bitmap indexes on columns First Oracle would visit all the of higher 'distinct cardinality' can be necessary outlying tables (the relatively large; which leads to a waste dimension tables and their parents) to of buffer space loading them, and a collect a set of primary keys from each waste of memory and CPU if we have of the dimensions, which it could use to merge lots of small bitmap ranges. to access the four bitmap indexes into (See my first paper on bitmap the fact table. indexes to find out why the myth about 'bitmaps for low-cardinality After ANDing the four bitmaps, to columns is misleading') locate the minimum set of relevant fact rows, Oracle would 'turn around' So - what has Oracle got to offer those to revisit the dimension tables and of us whose systems are described by their parents - possibly using a nested the conditions above? The answer is bitmap join indexes. What is a bitmap join index ? create bitmap index pe_work_idx As its name suggests, this is a bitmap on people(wo.name) index, but an index that is described from through a join query. towns wo, people pe where Amazingly, it is an index where the pe.id_town_work = wo.id indexed values come from one table, ; but the bitmaps point to another table. Figure 3 Creating a basic bitmap join index.

Let's clarify this with a simple concrete Imagine that I have also noticed that example. Figure 2 reproduces the SQL queries about where people live are I used to build my first demonstration always based on the name of the of a star transformation. state they live in, and not on the name of the town they live in. So the bitmap create table states( id number(3,0), index on column (id_town_home) is name varchar2(30), even less appropriate, and could be constraint st_pk primary key(id) ); replaced by a bitmap join index that allows a query to jump straight into the create table towns( id number(3,0), people table, bypassing both the name varchar2(30), states the towns tables completely. id_state number(3,0) not null Figure 4 gives the definition for this references states, index: constraint to_pk primary key(id) ); create bitmap index pe_home_st_idx create table people( on people(st.name) id_town_work number(3,0) from not null references towns, states st, id_town_home number(3,0) towns ho, not null people pe references towns, where {other identifying columns} {fact columns} ho.id_state = st.id ); and pe.id_town_home = ho.id ; create bitmap index pe_work_idx on people(id_town_work); Figure 4 Creating a more subtle bitmap join index. create bitmap index pe_home_idx on people(id_town_home); You will probably find that the index Figure 2 Part of a star (snowflake) schema. pe_work_id is the same size as the index it has replaced, but there is a Imagine now, that I have observed that chance that the pe_home_st_idx will all the queries are about people who be significantly smaller than the work in particular towns, and these original pe_home_idx. However, the towns are always referenced by name amount of space saved is extremely and not acquired by searching through dependent on the data distribution and attributes of the town. I may decide the typical number of (in this case) that the bitmap index on column towns per state. (id_town_work) is sub-optimal, and should be replaced by a bitmap join In a test case with 4,000,000 rows, index that allows a query to jump 500 towns, and 17 states, with straight into the people table, maximum scattering of data, the bypassing the towns table completely. pe_home_st_idx index dropped from Figure 3 shows how I would define this 12MB to 9MB so the saving was not index. tremendous. On the other hand, when I rigged the distribution of data to The query very specifically selects emulate the scenario of loading data columns only from the people table. one state at a time, the index sizes Note how the execution plan doesn't were 8MB and 700K respectively. reference the towns table or the states table at all. It factors these out These tests, however, revealed an completely and resolves the rows important issue. Even in the more required by the query purely through dramatic space-saving case, the time the two bitmap join indexes. to create the bitmap join index was much greater than the time taken to The query: create the simple bitmap index. select pe.{some facts} The index creation statements took 12 from minutes 24 seconds in one case, and states st, towns wt, 4 minutes 30 seconds in the other; towns ho, compared to a base time of one people pe minute 10 seconds for the simple where index. st.name = 'Midlands' and ho.id_state = st.id and pe.id_town_home = ho.id Remember, after all, that the index and wt.name = 'London' definition is a join of three tables. The and pe.id_town_work = wt.id execution path Oracle used to create ; the index appears in Figure 5. The execution plan INDEX BUILD PE_HOME_ST_IDX (non unique) table access (by index rowid) of people BITMAP CONSTRUCTION bitmap conversion (to rowids) SORT (order by) bitmap and HASH JOIN bitmap index (single value) of pe_work_idx HASH JOIN bitmap index (single value) of pe_home_st_idx TABLE ACCESS TEST_USER STATES (full) TABLE ACCESS TEST_USER TOWNS (full) TABLE ACCESS PEOPLE (full) Figure 6 Querying through a bitmap join index.

Figure 5 Execution path for create index. There is a little oddity that may cause confusion when you run your favourite As you can see, this example hashes bit of SQL to describe these indexes. the two small tables and pass the Try executing: larger table through them. We are not writing an intermediate hash result to select table_name, column_name from user_ind_columns disc and rehashing it, so most of the where index_name = 'PE_WORK_IDX'; work must be due to the sort that takes place before the bitmap construction. The results come back as: Presumably this could be optimised somewhat by using larger amounts of TABLE_NAME COLUMN_NAME memory, but it is an important point to TOWNS NAME test before you go live on a full-scale system. Oracle is telling you the truth - the index on the people table is an index At the end of the day, though, the most on the towns.name column. But if important question is whether or not you've got code that assumes the these indexes work. So let's execute a table_name in user_ind_columns query for people, searching on home always matches the table_name in state, and work town. (See figure 6 for user_indexes, you will find that your the test query, and its execution plan). reports 'lose' bitmap join indexes. (In passing, the view user_indexes will have the value YES in the column join_index for any bitmap join upd_joinindex, which explain plan indexes). cannot yet cope with but which is known to Oracle as the operation Issues "bitmap join index update". Second The mechanism is not perfect - even the undocumented hint cardinality(), though it may offer significant benefits which is telling the cost based in special cases. optimizer to assume that the table aliased as T26763 will return exactly "Join back" still takes place - even if one row. And finally you will notice that you think it should be unnecessary. this SQL is nearly a copy of our For example, if you changed the query definition of index pe_home_st_idx, in Figure 6 to select the home state, but with the addition of table called and the work town, (the two columns SYS.L$15 - what is this strange table? actually stored in the index, and supplied in the where clause) Oracle A little digging (with the help of would still join back through all the sql_trace) demonstrates the fact that tables to report these values. Of every time you create a bitmap join course, since the main benefit comes index, Oracle needs at least a couple from reducing the cost of getting into of global temporary tables in the the fact (people) table, it is possible SYS schema to support that index. In that this little excess will be pretty fact there will be one global irrelevant in most cases. temporary table for each table referenced in the query that defines More importantly, you will recall my the index. warning in the previous articles about the dangers of mixing bitmap indexes These global temporary tables and data updates. This problem is appear with names like L$nnn, and even more significant in the case of are declared as 'on commit preserve bitmap join indexes. Try inserting a rows'. You don't have to worry about single row into the people table with space being used in system sql_trace set to true, and you will find tablespace, of course, as global some surprising recursive SQL going temporary tables allocate space in on behind the scenes - see figure 7 for the user's temporary tablespace only one of the two strange SQL as needed. Unfortunately if you drop statements that take place as part of the index (as you may decide to this single row insert. whenever applying a batch update), Oracle does not seem to drop all the UPD_JOININDEX global temporary table definitions. TEST_USER.PE_HOME_ST_IDX AS On the surface, this seems to be SELECT merely a bit of a nuisance, and not /*+ CARDINALITY(T26763, 1) */ much of a threat - however you may, T26759.NAME, T26763.L$ROWID like me, wonder what impact this might FROM have on the data dictionary if you are TEST_USER.STATES T26759, TEST_USER.TOWNS T26761, dropping and recreating bitmap join SYS.L$15 T26763 indexes on numerous tables on a WHERE daily basis. T26761.ID_STATE = T26759.ID AND T26763.ID_TOWN_HOME = T26761.ID If you pursue bitmap join indexes further through the use of sql_trace - Figure 7 A recursive update to a bitmap join indexes. and it is a good idea to do so before you put them into production, you will There are three new things in this one also see accesses to tables sys.log$, statement alone. First the command sys.jijoin$, sys.jifrefreshsql$ and sequence sys.log$sequence. These Bitmap join indexes may, in special are objects which are part of the cases, reduce index sizes and CPU infrastructure for maintaining the consumption quite significantly at bitmap indexes. jirefreshsql$, for query time. example, holds all the pieces of SQL text that might be used to update a Bitmap join indexes may take much bitmap join index when you change more time to build than similar simple data in the underlying tables (you need bitmap indexes. Make sure it's worth a different piece of SQL for each table the cost. referenced in the index definition). Be warned every time that Oracle gives The overheads involved when you you a new, subtle, piece of modify data covered by bitmap join functionality, there is usually a price to indexes can be very large - the pay somewhere. It is best to know indexes should almost certainly be something about the price before you dropped/invalidated and adopt the functionality. recreated/rebuilt as part of any update process - but watch out for the Conclusion problem in the previous paragraph. This article only scratches the surface of how you may make use of bitmap There are still some anomalies relating join indexes. It hasn't touched on to Oracle's handling of bitmap join issues with partitioned tables, or on indexes, particularly the clean-up indexes that include columns from process after they have been dropped. multiple tables, or many of the other areas which are worthy of careful References investigation. However it has Oracle 9i Release 2 Datawarehousing highlighted four key points. Guide. Chapter 6.

Jonathan Lewis is a freelance consultant with more than 17 years experience of Oracle. He specialises in physical database design and the strategic use of the Oracle database engine, is author of 'Practical Oracle 8I - Designing Efficient Databases' published by Addison-Wesley, and is one of the best-known speakers on the UK Oracle circuit. Further details of his published papers, presentations and seminars can be found at www.jlcomp.demon.co.uk, which also hosts The Co- operative Oracle Users' FAQ for the Oracle-related Usenet newsgroups.