| $PostgreSQL: pgsql/src/backend/access/gist/README,v 1.4 2008/03/20 17:55:14 momjian Exp $ |
| |
| GiST Indexing |
| ============= |
| |
| This directory contains an implementation of GiST indexing for Postgres. |
| |
| GiST stands for Generalized Search Tree. It was introduced in the seminal paper |
| "Generalized Search Trees for Database Systems", 1995, Joseph M. Hellerstein, |
| Jeffrey F. Naughton, Avi Pfeffer: |
| |
| http://www.sai.msu.su/~megera/postgres/gist/papers/gist.ps |
| |
| and implemented by J. Hellerstein and P. Aoki in an early version of |
| PostgreSQL (more details are available from The GiST Indexing Project |
| at Berkeley at http://gist.cs.berkeley.edu/). As a "university" |
| project it had a limited number of features and was in rare use. |
| |
| The current implementation of GiST supports: |
| |
| * Variable length keys |
| * Composite keys (multi-key) |
| * provides NULL-safe interface to GiST core |
| * Concurrency |
| * Recovery support via WAL logging |
| |
| The support for concurrency implemented in PostgreSQL was developed based on |
| the paper "Access Methods for Next-Generation Database Systems" by |
| Marcel Kornaker: |
| |
| http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/access-methods-for-next-generation.pdf.gz |
| |
| The original algorithms were modified in several ways: |
| |
| * They should be adapted to PostgreSQL conventions. For example, the SEARCH |
| algorithm was considerably changed, because in PostgreSQL function search |
| should return one tuple (next), not all tuples at once. Also, it should |
| release page locks between calls. |
| * Since we added support for variable length keys, it's not possible to |
| guarantee enough free space for all keys on pages after splitting. User |
| defined function picksplit doesn't have information about size of tuples |
| (each tuple may contain several keys as in multicolumn index while picksplit |
| could work with only one key) and pages. |
| * We modified original INSERT algorithm for performance reason. In particular, |
| it is now a single-pass algorithm. |
| * Since the papers were theoretical, some details were omitted and we |
| have to find out ourself how to solve some specific problems. |
| |
| Because of the above reasons, we have to revised interaction of GiST |
| core and PostgreSQL WAL system. Moreover, we encountered (and solved) |
| a problem of uncompleted insertions when recovering after crash, which |
| was not touched in the paper. |
| |
| Search Algorithm |
| ---------------- |
| |
| Function gettuple finds a tuple which satisfies the search |
| predicate. It store their state and returns next tuple under |
| subsequent calls. Stack contains page, its LSN and LSN of parent page |
| and currentposition is saved between calls. |
| |
| gettuple(search-pred) |
| if ( firsttime ) |
| push(stack, [root, 0, 0]) // page, LSN, parentLSN |
| currentposition=0 |
| end |
| ptr = top of stack |
| while(true) |
| latch( ptr->page, S-mode ) |
| if ( ptr->page->lsn != ptr->lsn ) |
| ptr->lsn = ptr->page->lsn |
| currentposition=0 |
| if ( ptr->parentlsn < ptr->page->nsn ) |
| add to stack rightlink |
| else |
| currentposition++ |
| end |
| |
| while(true) |
| currentposition = find_first_match( currentposition ) |
| if ( currentposition is invalid ) |
| unlatch( ptr->page ) |
| pop stack |
| ptr = top of stack |
| if (ptr is NULL) |
| return NULL |
| break loop |
| else if ( ptr->page is leaf ) |
| unlatch( ptr->page ) |
| return tuple |
| else |
| add to stack child page |
| end |
| currentposition++ |
| end |
| end |
| |
| |
| Insert Algorithm |
| ---------------- |
| |
| INSERT guarantees that the GiST tree remains balanced. User defined key method |
| Penalty is used for choosing a subtree to insert; method PickSplit is used for |
| the node splitting algorithm; method Union is used for propagating changes |
| upward to maintain the tree properties. |
| |
| NOTICE: We modified original INSERT algorithm for performance reason. In |
| particularly, it is now a single-pass algorithm. |
| |
| Function findLeaf is used to identify subtree for insertion. Page, in which |
| insertion is proceeded, is locked as well as its parent page. Functions |
| findParent and findPath are used to find parent pages, which could be changed |
| because of concurrent access. Function pageSplit is reccurrent and could split |
| page by more than 2 pages, which could be necessary if keys have different |
| lengths or more than one key are inserted (in such situation, user defined |
| function pickSplit cannot guarantee free space on page). |
| |
| findLeaf(new-key) |
| push(stack, [root, 0]) //page, LSN |
| while(true) |
| ptr = top of stack |
| latch( ptr->page, S-mode ) |
| ptr->lsn = ptr->page->lsn |
| if ( exists ptr->parent AND ptr->parent->lsn < ptr->page->nsn ) |
| unlatch( ptr->page ) |
| pop stack |
| else if ( ptr->page is not leaf ) |
| push( stack, [get_best_child(ptr->page, new-key), 0] ) |
| unlatch( ptr->page ) |
| else |
| unlatch( ptr->page ) |
| latch( ptr->page, X-mode ) |
| if ( ptr->page is not leaf ) |
| //the only root page can become a non-leaf |
| unlatch( ptr->page ) |
| else if ( ptr->parent->lsn < ptr->page->nsn ) |
| unlatch( ptr->page ) |
| pop stack |
| else |
| return stack |
| end |
| end |
| end |
| |
| findPath( stack item ) |
| push stack, [root, 0, 0] // page, LSN, parent |
| while( stack ) |
| ptr = top of stack |
| latch( ptr->page, S-mode ) |
| if ( ptr->parent->page->lsn < ptr->page->nsn ) |
| push stack, [ ptr->page->rightlink, 0, ptr->parent ] |
| end |
| for( each tuple on page ) |
| if ( tuple->pagepointer == item->page ) |
| return stack |
| else |
| add to stack at the end [tuple->pagepointer,0, ptr] |
| end |
| end |
| unlatch( ptr->page ) |
| pop stack |
| end |
| |
| findParent( stack item ) |
| parent = item->parent |
| latch( parent->page, X-mode ) |
| if ( parent->page->lsn != parent->lsn ) |
| while(true) |
| search parent tuple on parent->page, if found the return |
| rightlink = parent->page->rightlink |
| unlatch( parent->page ) |
| if ( rightlink is incorrect ) |
| break loop |
| end |
| parent->page = rightlink |
| latch( parent->page, X-mode ) |
| end |
| newstack = findPath( item->parent ) |
| replace part of stack to new one |
| return findParent( item ) |
| end |
| |
| pageSplit(page, allkeys) |
| (lkeys, rkeys) = pickSplit( allkeys ) |
| if ( page is root ) |
| lpage = new page |
| else |
| lpage = page |
| rpage = new page |
| if ( no space left on rpage ) |
| newkeys = pageSplit( rpage, rkeys ) |
| else |
| push newkeys, union(rkeys) |
| end |
| if ( no space left on lpage ) |
| push newkeys, pageSplit( lpage, lkeys ) |
| else |
| push newkeys, union(lkeys) |
| end |
| return newkeys |
| |
| |
| placetopage(page, keysarray) |
| if ( no space left on page ) |
| keysarray = pageSplit(page, [ extract_keys(page), keysarray]) |
| last page in chain gets old NSN, |
| original and others - new NSN equals to LSN |
| if ( page is root ) |
| make new root with keysarray |
| end |
| else |
| put keysarray on page |
| if ( length of keysarray > 1 ) |
| keysarray = [ union(keysarray) ] |
| end |
| end |
| |
| insert(new-key) |
| stack = findLeaf(new-key) |
| keysarray = [new-key] |
| ptr = top of stack |
| while(true) |
| findParent( ptr ) //findParent latches parent page |
| keysarray = placetopage(ptr->page, keysarray) |
| unlatch( ptr->page ) |
| pop stack; |
| ptr = top of stack |
| if (length of keysarray == 1) |
| newboundingkey = union(oldboundingkey, keysarray) |
| if (newboundingkey == oldboundingkey) |
| unlatch ptr->page |
| break loop |
| end |
| end |
| end |
| |
| Authors: |
| Teodor Sigaev <teodor@sigaev.ru> |
| Oleg Bartunov <oleg@sai.msu.su> |