UpBit: Scalable In-Memory Updatable Bitmap Indexing
  Manos Athanassoulis, Harvard University
  Talk at CS165, September 26th, 2016
Indexing for Analytical Workloads
  [Diagram: Column A and its bitmap index, one bitvector per value: A=1, A=2, A=3]
  Specialized indexing
    compact representation of the query result
    the query result is readily available
  Bitvectors
    can leverage fast Boolean operators
    bitwise AND/OR/NOT is faster than looping over metadata
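To make the slide concrete, here is a minimal sketch of a bitmap index over one column. It assumes Python integers as uncompressed bitvectors (row 0 is the least significant bit); the column data and the helper names `build_bitmap_index` and `rows_of` are made up for illustration and are not from the talk.

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Return {value: bitvector}; bit i is set iff column[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)

def rows_of(bitvector):
    """List the row ids whose bit is set."""
    return [i for i in range(bitvector.bit_length()) if (bitvector >> i) & 1]

column_a = [3, 1, 2, 3, 2, 1, 3, 2]  # made-up example data
index = build_bitmap_index(column_a)

# Equality predicate (A = 2): the stored bitvector already is the result.
eq_2 = index[2]

# Disjunction (A = 1 OR A = 3): one bitwise OR instead of a per-row loop.
one_or_three = index[1] | index[3]

# Negation (A != 2): bitwise NOT, masked to the number of rows.
all_rows = (1 << len(column_a)) - 1
not_2 = ~index[2] & all_rows
```

With this data, `rows_of(eq_2)` lists rows 2, 4 and 7, and `not_2` equals `one_or_three` because the domain has only three values.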
Bitmap Indexing Limitations
  [Diagram: Column A and its bitvectors A=1, A=2, A=3]
  Index size
    space-inefficient for large domains
    addressed by bitvector encoding/compression
    core idea: run-length encoding in prior work
  but updating encoded bitvectors is very inefficient
  [Diagram: an encoded bitvector (a run of zeros followed by an ending pattern). Update? The bitvector must be decoded, the bit flipped, and the result re-encoded.]
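The decode, flip, re-encode cycle can be sketched with a toy run-length encoding. This is an illustration only: it uses plain `[bit, run_length]` pairs rather than any specific word-aligned scheme used by real bitmap indexes, and the function names are hypothetical.

```python
def rle_encode(bits):
    """Encode a list of 0/1 values as [bit, run_length] pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return runs

def rle_decode(runs):
    """Expand [bit, run_length] pairs back into a list of 0/1 values."""
    bits = []
    for b, n in runs:
        bits.extend([b] * n)
    return bits

def update_encoded(runs, row):
    """Flip one bit the naive way: decode everything, flip, re-encode."""
    bits = rle_decode(runs)
    bits[row] ^= 1
    return rle_encode(bits)

encoded = rle_encode([0] * 13 + [1, 0, 1])  # a long run of zeros, then a pattern
updated = update_encoded(encoded, 5)        # flipping row 5 splits the zero run
```

A single flipped bit touches the whole bitvector and can also make the encoding longer, since one run of zeros becomes three runs.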
Goal
  Bitmap indexing with efficient reads and updates
Prior Work: Bitmap Indexing and Deletes
  Update Conscious Bitmaps (UCB), SSDBM 2007
  [Diagram: bitvectors A=1, A=2, A=3 plus an existence bitvector (EB)]
  efficient deletes by invalidation: existence bitvector (EB)
  reads? bitwise AND with EB
  updates? delete-then-append
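The three bullets above can be sketched as a toy model. It assumes uncompressed Python integers as bitvectors and ignores any mapping from logical to physical rows; the class name `UCBSketch` is hypothetical and this is not the UCB implementation.

```python
class UCBSketch:
    """Toy model: value bitvectors (VB) plus one existence bitvector (EB)."""

    def __init__(self, column):
        self.n = 0    # number of physical rows, including invalidated ones
        self.vb = {}  # value -> bitvector
        self.eb = 0   # bit i set iff physical row i is still valid
        for value in column:
            self.append(value)

    def append(self, value):
        row = self.n
        self.vb[value] = self.vb.get(value, 0) | (1 << row)
        self.eb |= 1 << row
        self.n += 1
        return row

    def delete(self, row):
        # Delete by invalidation: only the EB changes.
        self.eb &= ~(1 << row)

    def update(self, row, new_value):
        # Update = delete-then-append; the new version gets a new row.
        self.delete(row)
        return self.append(new_value)

    def query(self, value):
        # Every read pays a bitwise AND with the EB.
        return self.vb.get(value, 0) & self.eb

ucb = UCBSketch([3, 1, 2])
new_row = ucb.update(1, 2)  # row 1 is invalidated, value 2 is appended as row 3
```

Note that the value bitvector of the old value is never touched: the stale bit in `vb[1]` is hidden only because the read ANDs it with the EB.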
Prior Work: Limitations
  [Figure: UCB latency (ms) for updates (U) and reads (R) as the number of applied updates grows, broken down into: bitwise AND with EB, decode VB, update and encode EB, decode EB. Mixed update/read workload.]
  read cost increases with #updates
    why? the bitwise AND with EB is the bottleneck
  updating EB is costly for a large #updates
  UCB performance does not scale with #updates
    single auxiliary bitvector
    repetitive bitwise operations
Bitmap Indexing for Reads & Updates
  distribute the update cost
  efficient random accesses in compressed bitvectors
  query-driven re-use of the results of bitwise operations
Design Element 1: update bitvectors
  [Diagram: value bitvector A=1 next to its update bitvector (UB)]
  one UB per value of the domain
  initialized to 0s
  the current value is the XOR of the value bitvector and its UB
  every update flips a bit on a UB
  result: distribute the update burden
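A small worked example of the XOR rule, assuming Python integers as uncompressed bitvectors (row 0 is the least significant bit). The bit patterns are made up for illustration.

```python
def current_bitvector(vb, ub):
    """The up-to-date bitvector is the XOR of the value bitvector and its UB."""
    return vb ^ ub

vb_a2 = 0b00110  # rows 1 and 2 hold A=2 in the stored value bitvector
ub_a2 = 0b00000  # update bitvector starts as all zeros
before = current_bitvector(vb_a2, ub_a2)

ub_a2 ^= 1 << 2  # row 2 stops being A=2: flip its bit in the UB
after_remove = current_bitvector(vb_a2, ub_a2)

ub_a2 ^= 1 << 4  # row 4 becomes A=2: flip its bit in the UB
after_add = current_bitvector(vb_a2, ub_a2)
```

An all-zero UB leaves the value bitvector unchanged, and each flipped UB bit toggles exactly one row in the result, so the stored value bitvector itself is never rewritten.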
Updating UpBit
  [Diagram: bitvectors A=1, A=2, A=3, each with its UB]
  update row 2 to 1:
    1. find the old value of row 2 (A=2)
    2. flip the bit of row 2 in the UB of A=2
    3. flip the bit of row 2 in the UB of A=1
  can we speed up step 1?
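The three steps can be sketched as follows, assuming uncompressed Python integers as bitvectors and 0-based row ids. The names `find_value` and `update`, and the example data, are hypothetical. Step 1 here probes every value, which is the part the slide asks to speed up.

```python
def find_value(vb, ub, row):
    """Step 1: find the value whose current bitvector has the bit of `row` set."""
    for value in vb:
        if ((vb[value] ^ ub[value]) >> row) & 1:
            return value
    raise KeyError(row)

def update(vb, ub, row, new_value):
    old_value = find_value(vb, ub, row)  # step 1
    if old_value == new_value:
        return
    ub[old_value] ^= 1 << row            # step 2: flip the bit in the old value's UB
    ub[new_value] ^= 1 << row            # step 3: flip the bit in the new value's UB

# Column [1, 3, 2, 2, 3]: row 2 holds A=2 (made-up data).
vb = {1: 0b00001, 2: 0b01100, 3: 0b10010}
ub = {1: 0, 2: 0, 3: 0}
update(vb, ub, row=2, new_value=1)
```

After the call, only two UB bits have changed; the value bitvectors in `vb` are untouched.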
Design Element 2: fence pointers
  efficient access of compressed bitvectors
  [Diagram: fence pointers mark selected row positions inside a compressed bitvector]
Updating UpBit (with fence pointers)
  [Diagram: bitvectors A=1, A=2, A=3, each with its UB; a fence pointer leads to row 2]
  update row 2 to 1:
    1. find the old value of row 2 (A=2) using fence pointers
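One way to picture fence pointers is as a small side array that lets a lookup jump into the middle of a compressed bitvector instead of decoding from the start. The sketch below assumes a toy run-length encoding of `(bit, run_length)` pairs and one fence pointer every `granularity` rows; the function names and the layout are assumptions for illustration, not the layout used by UpBit.

```python
def rle_encode(bits):
    """Encode a list of 0/1 values as (bit, run_length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs

def build_fence_pointers(runs, granularity):
    """For rows 0, g, 2g, ...: record (run index, first row of that run)."""
    fences = []
    start = 0
    next_fence = 0
    for i, (_, length) in enumerate(runs):
        end = start + length
        while next_fence < end:
            fences.append((i, start))
            next_fence += granularity
        start = end
    return fences

def get_bit(runs, fences, granularity, row):
    """Jump to the nearest fence at or before `row`, then walk forward."""
    i, start = fences[row // granularity]
    while start + runs[i][1] <= row:
        start += runs[i][1]
        i += 1
    return runs[i][0]

bits = [0] * 10 + [1] * 3 + [0] * 7 + [1]  # made-up bitvector with 21 rows
runs = rle_encode(bits)
fences = build_fence_pointers(runs, granularity=8)
```

Denser fence pointers shorten the walk inside `get_bit` but take more memory, which is the granularity trade-off raised on the tuning slide.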
Querying
Querying UpBit
  [Diagram: bitvectors A=1, A=2, A=3, each with its UB]
  A = 2: return the XOR of A=2 and its UB
  can we re-use the result?
Design Element 3: query-driven merging
  [Diagram: on a query for A=2, the XOR of A=2 and its UB becomes the new A=2 and the UB is reset]
  maintain high compressibility of the UB
  query-driven merging: merge on a query for A=2
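Query-driven merging can be sketched as follows: the XOR computed for a read is written back as the new value bitvector and the UB is reset to zeros. The sketch assumes uncompressed Python integers as bitvectors, updates only to values already in the domain, and a per-value counter with a `merge_threshold` parameter; the class and parameter names are hypothetical.

```python
class UpBitSketch:
    """Toy model of value bitvectors, update bitvectors, and merge-on-read."""

    def __init__(self, column, merge_threshold=2):
        self.vb = {}
        for row, value in enumerate(column):
            self.vb[value] = self.vb.get(value, 0) | (1 << row)
        self.ub = {value: 0 for value in self.vb}
        self.pending = {value: 0 for value in self.vb}  # flips since last merge
        self.merge_threshold = merge_threshold

    def _flip(self, value, row):
        self.ub[value] ^= 1 << row
        self.pending[value] += 1

    def update(self, row, new_value):
        old_value = next(v for v in self.vb
                         if ((self.vb[v] ^ self.ub[v]) >> row) & 1)
        if old_value != new_value:
            self._flip(old_value, row)
            self._flip(new_value, row)

    def query(self, value):
        result = self.vb[value] ^ self.ub[value]
        if self.pending[value] >= self.merge_threshold:
            # Re-use the XOR just computed: it becomes the new value
            # bitvector, and the UB goes back to all zeros.
            self.vb[value] = result
            self.ub[value] = 0
            self.pending[value] = 0
        return result

idx = UpBitSketch([1, 3, 2, 2, 3], merge_threshold=1)
idx.update(2, 1)      # row 2 changes from A=2 to A=1
first = idx.query(2)  # triggers the merge for A=2 only
```

Only the queried value is merged; the UB of A=1 keeps its pending flip until some query reads A=1.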
UpBit supports very efficient updates
  [Figure: update latency (ms) for In-place, UCB, and UpBit, and read latency (ms) for Read-optimized, UCB, and UpBit, at three update percentages. Workload: queries with a varying % of updates.]
  updates: 15-29x faster than UCB, 51-115x faster than in-place
  reads: only 8% read overhead over optimal, 3x faster reads than UCB
UpBit offers robust reads
  [Figure: read latency (ms) versus # updates (thousands) for UCB, UpBit, and Ideal, under a mixed update/read workload]
More details
  Tuning: how frequently to merge the UB into the index?
  Tuning: what is the optimal granularity of fence pointers?
  Optimizations: multi-threaded reads and updates
  Performance: full query analysis (scientific data and TPC-H)
Tuning
  when to merge? (during reads)
  in-place updates? (fence pointers)
  how frequently to merge?
    [Figure: update latency (ms) and read latency (ms) versus merging threshold]
  how frequent fence pointers?
    [Figure: read latency, update latency, and memory (space overhead ratio) for different fence pointer lengths, including no fence pointers]
Memory Consumption
  [Figure: space overhead (MB) of UpBit, UpBit-FP, UCB, and In-place, compressed and uncompressed, broken down into value bitmaps, update bitmaps, fence pointers, and existence bitmap]
  update bitvectors: small memory footprint when compressed
UpBit vs. scan
  [Figure: latency (ms) versus selectivity for UpBit (range query), UpBit (equality query), and Scan]
  we have the classical crossover of index vs. scan
  the crossover shifts when the query needs to OR multiple bitvectors (range)
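Why a range query costs more than an equality query on a bitmap index can be seen in a short sketch: the range result is the OR of one bitvector per value in the range, while a scan does one pass over the column regardless of the range width. The helper names and the data are made up for illustration, and uncompressed Python integers stand in for bitvectors.

```python
from functools import reduce
from operator import or_

def build_bitmap_index(column):
    """Return {value: bitvector}; bit i is set iff column[i] == value."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index

def range_query_index(index, lo, hi):
    """OR together the bitvectors of every value in [lo, hi]."""
    return reduce(or_, (bv for v, bv in index.items() if lo <= v <= hi), 0)

def range_query_scan(column, lo, hi):
    """One pass over the column, testing the predicate row by row."""
    result = 0
    for row, value in enumerate(column):
        if lo <= value <= hi:
            result |= 1 << row
    return result

column = [5, 1, 4, 2, 3, 5, 2, 4]  # made-up example data
index = build_bitmap_index(column)
by_index = range_query_index(index, 2, 4)  # ORs three bitvectors
by_scan = range_query_scan(column, 2, 4)
```

Both paths return the same bitvector; the index path does more bitwise work as the range covers more values.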
UpBit: achieving scalable updates
  distribute the update burden: update bitvectors
  efficient bitvector accesses: fence pointers
  avoid redundant bitwise operations: query-driven merging of the UB
  Thanks!
  http://daslab.seas.harvard.edu/rum/
Building Access Methods
  every access method is optimizing for the tradeoff between Reads, Updates, and Memory
  [Diagram: Tree Index and Bitmap Index placed between Reads, Updates, and Memory]
  this balance forms a three-way tug of war