src/backend/access/gist/README - hawq - Git at Google

 $PostgreSQL: pgsql/src/backend/access/gist/README,v 1.4 2008/03/20 17:55:14 momjian Exp $

 GiST Indexing
 =============

 This directory contains an implementation of GiST indexing for Postgres.

 GiST stands for Generalized Search Tree. It was introduced in the seminal paper
 "Generalized Search Trees for Database Systems", 1995, Joseph M. Hellerstein,
 Jeffrey F. Naughton, Avi Pfeffer:

     http://www.sai.msu.su/~megera/postgres/gist/papers/gist.ps

 and implemented by J. Hellerstein and P. Aoki in an early version of
 PostgreSQL (more details are available from The GiST Indexing Project
 at Berkeley at http://gist.cs.berkeley.edu/). As a "university"
 project it had a limited number of features and was in rare use.

 The current implementation of GiST supports:

   * Variable length keys
   * Composite keys (multi-key)
   * provides NULL-safe interface to GiST core
   * Concurrency
   * Recovery support via WAL logging

 The support for concurrency implemented in PostgreSQL was developed based on
 the paper "Access Methods for Next-Generation Database Systems" by
 Marcel Kornaker:

     http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/access-methods-for-next-generation.pdf.gz

 The original algorithms were modified in several ways:

 * They should be adapted to PostgreSQL conventions. For example, the SEARCH
   algorithm was considerably changed, because in PostgreSQL function search
   should return one tuple (next), not all tuples at once. Also, it should
   release page locks between calls.
 * Since we added support for variable length keys, it's not possible to
   guarantee enough free space for all keys on pages after splitting. User
   defined function picksplit doesn't have information about size of tuples
   (each tuple may contain several keys as in multicolumn index while picksplit
   could work with only one key) and pages.
 * We modified original INSERT algorithm for performance reason. In particular,
   it is now a single-pass algorithm.
 * Since the papers were theoretical, some details were omitted and we
   have to find out ourself how to solve some specific problems.

 Because of the above reasons, we have to revised interaction of GiST
 core and PostgreSQL WAL system. Moreover, we encountered (and solved)
 a problem of uncompleted insertions when recovering after crash, which
 was not touched in the paper.

 Search Algorithm
 ----------------

 Function gettuple finds a tuple which satisfies the search
 predicate. It store their state and returns next tuple under
 subsequent calls. Stack contains page, its LSN and LSN of parent page
 and currentposition is saved between calls.

 gettuple(search-pred)
 	if ( firsttime )
 		push(stack, [root, 0, 0]) // page, LSN, parentLSN
 		currentposition=0
 	end
 	ptr = top of stack
 	while(true)
 		latch( ptr->page, S-mode )
 		if ( ptr->page->lsn != ptr->lsn )
 			ptr->lsn = ptr->page->lsn
 			currentposition=0
 			if ( ptr->parentlsn < ptr->page->nsn )
 				add to stack rightlink
 		else
 			currentposition++
 		end

 		while(true)
 			currentposition = find_first_match( currentposition )
 			if ( currentposition is invalid )
 				unlatch( ptr->page )
 				pop stack
 				ptr = top of stack
 				if (ptr is NULL)
 					return NULL
 				break loop
 			else if ( ptr->page is leaf )
 				unlatch( ptr->page )
 				return tuple
 			else
 				add to stack child page
 			end
 			currentposition++
 		end
 	end


 Insert Algorithm
 ----------------

 INSERT guarantees that the GiST tree remains balanced. User defined key method
 Penalty is used for choosing a subtree to insert; method PickSplit is used for
 the node splitting algorithm; method Union is used for propagating changes
 upward to maintain the tree properties.

 NOTICE: We modified original INSERT algorithm for performance reason. In
 particularly, it is now a single-pass algorithm.

 Function findLeaf is used to identify subtree for insertion. Page, in which
 insertion is proceeded, is locked as well as its parent page. Functions
 findParent and findPath are used to find parent pages, which could be changed
 because of concurrent access. Function pageSplit is reccurrent and could split
 page by more than 2 pages, which could be necessary if keys have different
 lengths or more than one key are inserted (in such situation, user defined
 function pickSplit cannot guarantee free space on page).

 findLeaf(new-key)
 	push(stack, [root, 0]) //page, LSN
 	while(true)
 		ptr = top of stack
 		latch( ptr->page, S-mode )
 		ptr->lsn = ptr->page->lsn
 		if ( exists ptr->parent AND ptr->parent->lsn < ptr->page->nsn )
 			unlatch( ptr->page )
 			pop stack
 		else if ( ptr->page is not leaf )
 			push( stack, [get_best_child(ptr->page, new-key), 0] )
 			unlatch( ptr->page )
 		else
 			unlatch( ptr->page )
 			latch( ptr->page, X-mode )
 			if ( ptr->page is not leaf )
 				//the only root page can become a non-leaf
 				unlatch( ptr->page )
 			else if ( ptr->parent->lsn < ptr->page->nsn )
 				unlatch( ptr->page )
 				pop stack
 			else
 				return stack
 			end
 		end
 	end

 findPath( stack item )
 	push stack, [root, 0, 0] // page, LSN, parent
 	while( stack )
 		ptr = top of stack
 		latch( ptr->page, S-mode )
 		if ( ptr->parent->page->lsn < ptr->page->nsn )
 			push stack, [ ptr->page->rightlink, 0, ptr->parent ]
 		end
 		for( each tuple on page )
 			if ( tuple->pagepointer == item->page )
 				return stack
 			else
 				add to stack at the end [tuple->pagepointer,0, ptr]
 			end
 		end
 		unlatch( ptr->page )
 		pop stack
 	end

 findParent( stack item )
 	parent = item->parent
 	latch( parent->page, X-mode )
 	if ( parent->page->lsn != parent->lsn )
 		while(true)
 			search parent tuple on parent->page, if found the return
 			rightlink = parent->page->rightlink
 			unlatch( parent->page )
 			if ( rightlink is incorrect )
 				break loop
 			end
 			parent->page = rightlink
 			latch( parent->page, X-mode )
 		end
 		newstack = findPath( item->parent )
 		replace part of stack to new one
 		return findParent( item )
 	end

 pageSplit(page, allkeys)
 	(lkeys, rkeys) = pickSplit( allkeys )
 	if ( page is root )
 		lpage = new page
 	else
 		lpage = page
 	rpage = new page
 	if ( no space left on rpage )
 		newkeys = pageSplit( rpage, rkeys )
 	else
 		push newkeys, union(rkeys)
 	end
 	if ( no space left on lpage )
 		push newkeys, pageSplit( lpage, lkeys )
 	else
 		push newkeys, union(lkeys)
 	end
 	return newkeys


 placetopage(page, keysarray)
 	if ( no space left on page )
 		keysarray = pageSplit(page, [ extract_keys(page), keysarray])
 		last page in chain gets old NSN,
 		original and others - new NSN equals to LSN
 		if ( page is root )
 			make new root with keysarray
 		end
 	else
 		put keysarray on page
 		if ( length of keysarray > 1 )
 			keysarray = [ union(keysarray) ]
 		end
 	end

 insert(new-key)
 	stack = findLeaf(new-key)
 	keysarray = [new-key]
 	ptr = top of stack
 	while(true)
 		findParent( ptr ) //findParent latches parent page
 		keysarray = placetopage(ptr->page, keysarray)
 		unlatch( ptr->page )
 		pop stack;
 		ptr = top of stack
 		if (length of keysarray == 1)
 			newboundingkey = union(oldboundingkey, keysarray)
 			if (newboundingkey == oldboundingkey)
 				unlatch ptr->page
 				break loop
 			end
 		end
 	end

 Authors:
 	Teodor Sigaev	<teodor@sigaev.ru>
 	Oleg Bartunov   <oleg@sai.msu.su>
	$PostgreSQL: pgsql/src/backend/access/gist/README,v 1.4 2008/03/20 17:55:14 momjian Exp $

	GiST Indexing
	=============

	This directory contains an implementation of GiST indexing for Postgres.

	GiST stands for Generalized Search Tree. It was introduced in the seminal paper
	"Generalized Search Trees for Database Systems", 1995, Joseph M. Hellerstein,
	Jeffrey F. Naughton, Avi Pfeffer:

	http://www.sai.msu.su/~megera/postgres/gist/papers/gist.ps

	and implemented by J. Hellerstein and P. Aoki in an early version of
	PostgreSQL (more details are available from The GiST Indexing Project
	at Berkeley at http://gist.cs.berkeley.edu/). As a "university"
	project it had a limited number of features and was in rare use.

	The current implementation of GiST supports:

	* Variable length keys
	* Composite keys (multi-key)
	* provides NULL-safe interface to GiST core
	* Concurrency
	* Recovery support via WAL logging

	The support for concurrency implemented in PostgreSQL was developed based on
	the paper "Access Methods for Next-Generation Database Systems" by
	Marcel Kornaker:

	http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/access-methods-for-next-generation.pdf.gz

	The original algorithms were modified in several ways:

	* They should be adapted to PostgreSQL conventions. For example, the SEARCH
	algorithm was considerably changed, because in PostgreSQL function search
	should return one tuple (next), not all tuples at once. Also, it should
	release page locks between calls.
	* Since we added support for variable length keys, it's not possible to
	guarantee enough free space for all keys on pages after splitting. User
	defined function picksplit doesn't have information about size of tuples
	(each tuple may contain several keys as in multicolumn index while picksplit
	could work with only one key) and pages.
	* We modified original INSERT algorithm for performance reason. In particular,
	it is now a single-pass algorithm.
	* Since the papers were theoretical, some details were omitted and we
	have to find out ourself how to solve some specific problems.

	Because of the above reasons, we have to revised interaction of GiST
	core and PostgreSQL WAL system. Moreover, we encountered (and solved)
	a problem of uncompleted insertions when recovering after crash, which
	was not touched in the paper.

	Search Algorithm
	----------------

	Function gettuple finds a tuple which satisfies the search
	predicate. It store their state and returns next tuple under
	subsequent calls. Stack contains page, its LSN and LSN of parent page
	and currentposition is saved between calls.

	gettuple(search-pred)
	if ( firsttime )
	push(stack, [root, 0, 0]) // page, LSN, parentLSN
	currentposition=0
	end
	ptr = top of stack
	while(true)
	latch( ptr->page, S-mode )
	if ( ptr->page->lsn != ptr->lsn )
	ptr->lsn = ptr->page->lsn
	currentposition=0
	if ( ptr->parentlsn < ptr->page->nsn )
	add to stack rightlink
	else
	currentposition++
	end

	while(true)
	currentposition = find_first_match( currentposition )
	if ( currentposition is invalid )
	unlatch( ptr->page )
	pop stack
	ptr = top of stack
	if (ptr is NULL)
	return NULL
	break loop
	else if ( ptr->page is leaf )
	unlatch( ptr->page )
	return tuple
	else
	add to stack child page
	end
	currentposition++
	end
	end


	Insert Algorithm
	----------------

	INSERT guarantees that the GiST tree remains balanced. User defined key method
	Penalty is used for choosing a subtree to insert; method PickSplit is used for
	the node splitting algorithm; method Union is used for propagating changes
	upward to maintain the tree properties.

	NOTICE: We modified original INSERT algorithm for performance reason. In
	particularly, it is now a single-pass algorithm.

	Function findLeaf is used to identify subtree for insertion. Page, in which
	insertion is proceeded, is locked as well as its parent page. Functions
	findParent and findPath are used to find parent pages, which could be changed
	because of concurrent access. Function pageSplit is reccurrent and could split
	page by more than 2 pages, which could be necessary if keys have different
	lengths or more than one key are inserted (in such situation, user defined
	function pickSplit cannot guarantee free space on page).

	findLeaf(new-key)
	push(stack, [root, 0]) //page, LSN
	while(true)
	ptr = top of stack
	latch( ptr->page, S-mode )
	ptr->lsn = ptr->page->lsn
	if ( exists ptr->parent AND ptr->parent->lsn < ptr->page->nsn )
	unlatch( ptr->page )
	pop stack
	else if ( ptr->page is not leaf )
	push( stack, [get_best_child(ptr->page, new-key), 0] )
	unlatch( ptr->page )
	else
	unlatch( ptr->page )
	latch( ptr->page, X-mode )
	if ( ptr->page is not leaf )
	//the only root page can become a non-leaf
	unlatch( ptr->page )
	else if ( ptr->parent->lsn < ptr->page->nsn )
	unlatch( ptr->page )
	pop stack
	else
	return stack
	end
	end
	end

	findPath( stack item )
	push stack, [root, 0, 0] // page, LSN, parent
	while( stack )
	ptr = top of stack
	latch( ptr->page, S-mode )
	if ( ptr->parent->page->lsn < ptr->page->nsn )
	push stack, [ ptr->page->rightlink, 0, ptr->parent ]
	end
	for( each tuple on page )
	if ( tuple->pagepointer == item->page )
	return stack
	else
	add to stack at the end [tuple->pagepointer,0, ptr]
	end
	end
	unlatch( ptr->page )
	pop stack
	end

	findParent( stack item )
	parent = item->parent
	latch( parent->page, X-mode )
	if ( parent->page->lsn != parent->lsn )
	while(true)
	search parent tuple on parent->page, if found the return
	rightlink = parent->page->rightlink
	unlatch( parent->page )
	if ( rightlink is incorrect )
	break loop
	end
	parent->page = rightlink
	latch( parent->page, X-mode )
	end
	newstack = findPath( item->parent )
	replace part of stack to new one
	return findParent( item )
	end

	pageSplit(page, allkeys)
	(lkeys, rkeys) = pickSplit( allkeys )
	if ( page is root )
	lpage = new page
	else
	lpage = page
	rpage = new page
	if ( no space left on rpage )
	newkeys = pageSplit( rpage, rkeys )
	else
	push newkeys, union(rkeys)
	end
	if ( no space left on lpage )
	push newkeys, pageSplit( lpage, lkeys )
	else
	push newkeys, union(lkeys)
	end
	return newkeys


	placetopage(page, keysarray)
	if ( no space left on page )
	keysarray = pageSplit(page, [ extract_keys(page), keysarray])
	last page in chain gets old NSN,
	original and others - new NSN equals to LSN
	if ( page is root )
	make new root with keysarray
	end
	else
	put keysarray on page
	if ( length of keysarray > 1 )
	keysarray = [ union(keysarray) ]
	end
	end

	insert(new-key)
	stack = findLeaf(new-key)
	keysarray = [new-key]
	ptr = top of stack
	while(true)
	findParent( ptr ) //findParent latches parent page
	keysarray = placetopage(ptr->page, keysarray)
	unlatch( ptr->page )
	pop stack;
	ptr = top of stack
	if (length of keysarray == 1)
	newboundingkey = union(oldboundingkey, keysarray)
	if (newboundingkey == oldboundingkey)
	unlatch ptr->page
	break loop
	end
	end
	end

	Authors:
	Teodor Sigaev <teodor@sigaev.ru>
	Oleg Bartunov <oleg@sai.msu.su>