 src/backend/cdb/endpoint/README

 As a Greenplum cluster grows, the coordinator node increasingly becomes a
 performance bottleneck.

 The parallel retrieve cursor feature is designed to relieve the coordinator of
 this burden. Instead of gathering a query's results on the coordinator, it
 leaves them on the segments. Once the cursor has been declared, users can open
 retrieve-mode connections to the segments and fetch the result data in
 parallel.


 Parallel Retrieve Cursor
 ========================

 As with a normal cursor, a user declares a parallel retrieve cursor on the
 coordinator, but then retrieves the query results directly from each segment.

 A parallel retrieve cursor is declared with syntax similar to that of a normal
 cursor. However, some cursor operations (e.g. MOVE) are not supported for
 parallel retrieve cursors.

 #NOTE: Orca does not support PARALLEL RETRIEVE CURSOR for now. Greenplum
 automatically falls back to the Postgres optimizer.

 Endpoint
 ========

 Once a parallel retrieve cursor has been declared on the QD, a corresponding
 endpoint is created on each segment whose QE holds part of the query result.
 Each endpoint is backed by its own backend on the QE. These endpoints then
 serve as the sources from which the results are retrieved, in parallel, on
 each QE.
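 As a rough illustration of the fan-out this enables, the Python sketch below
 runs one worker per endpoint and concatenates what each returns. It is a
 model, not client code: fetch_endpoint() is a hypothetical stand-in for a real
 retrieve session, and the per-dbid rows are taken from the t1 distribution in
 the walk-through example below.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows held behind each endpoint, keyed by the QE's dbid.
# Consistent with the t1 walk-through: dbid 2 -> segment 0, 3 -> 1, 4 -> 2.
SEGMENT_RESULTS = {2: [2, 3, 4], 3: [0, 1], 4: [5]}


def fetch_endpoint(dbid):
    """Stand-in for one retrieve session draining one endpoint."""
    return list(SEGMENT_RESULTS[dbid])


def retrieve_in_parallel(dbids):
    """Fetch every endpoint concurrently and concatenate the results.

    pool.map() yields results in input order, so this model concatenates
    in dbid order; a real client should not assume any global row order
    across endpoints.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(dbids))) as pool:
        parts = list(pool.map(fetch_endpoint, dbids))
    return [row for part in parts for row in part]


if __name__ == "__main__":
    print(sorted(retrieve_in_parallel([2, 3, 4])))  # [0, 1, 2, 3, 4, 5]
```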


 Declare Parallel Retrieve Cursor
 ================================

 Before data can be retrieved from endpoints in parallel, a parallel retrieve
 cursor must be declared for the query.

 Syntax:
 DECLARE cursor_name PARALLEL RETRIEVE CURSOR FOR query;

 The cursor_name must be unique within the current transaction.

 Examples:

 postgres=# BEGIN;
 BEGIN
 postgres=# DECLARE c1 PARALLEL RETRIEVE CURSOR FOR SELECT * FROM t1;
 DECLARE
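 The uniqueness rule can be modeled as a set of names scoped to one
 transaction. This is a hypothetical Python sketch, not backend code; it
 assumes, as for ordinary non-holdable cursors, that names are released when
 the transaction ends, and it folds names to lower case the way unquoted SQL
 identifiers are folded.

```python
class CursorRegistry:
    """Hypothetical model of the cursor names declared in one transaction."""

    def __init__(self):
        self._names = set()

    def declare(self, name):
        # Unquoted identifiers fold to lower case, so C1 and c1 collide.
        key = name.lower()
        if key in self._names:
            raise ValueError(f'cursor "{key}" already exists')
        self._names.add(key)

    def close(self, name):
        self._names.discard(name.lower())

    def end_transaction(self):
        # Assumption: END or ABORT releases every cursor name.
        self._names.clear()


if __name__ == "__main__":
    txn = CursorRegistry()
    txn.declare("c1")
    try:
        txn.declare("C1")
    except ValueError as err:
        print(err)  # cursor "c1" already exists
```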


 List Parallel Retrieve Cursors and their endpoints
 ==================================================

 To retrieve the query results of a parallel retrieve cursor, the related
 endpoint information is needed before starting a retrieve session on the QEs.
 The UDF gp_get_endpoints() lists parallel retrieve cursors and their endpoint
 information. This UDF can be run on the coordinator only.

 A superuser can list the endpoint information of all users, but, for security
 reasons, a non-superuser can list only the current user's endpoint
 information.

 Definition: gp_get_endpoints()

 gp_get_endpoints() Columns:
 |-------------+-----------+------------------------------------------|
 | Column Name | Data Type | Description                              |
 |-------------+-----------+------------------------------------------|
 | dbid        | integer   | The QE's dbid                            |
 |-------------+-----------+------------------------------------------|
 | auth_token  | text      | Retrieve-session authentication token    |
 |-------------+-----------+------------------------------------------|
 | cursorname  | text      | Parallel retrieve cursor name            |
 |-------------+-----------+------------------------------------------|
 | sessionid   | integer   | The session where the cursor was created |
 |-------------+-----------+------------------------------------------|
 | hostname    | text      | The host to retrieve from                |
 |-------------+-----------+------------------------------------------|
 | port        | integer   | The port number to start the retrieve    |
 |             |           | session                                  |
 |-------------+-----------+------------------------------------------|
 | userid      | oid       | The oid of the user                      |
 |-------------+-----------+------------------------------------------|
 | state       | text      | One of the following states for this     |
 |             |           | endpoint:                                |
 |             |           | READY:                                   |
 |             |           | The endpoint is ready to be retrieved    |
 |             |           | ATTACHED:                                |
 |             |           | The endpoint has been attached by a      |
 |             |           | retrieve connection                      |
 |             |           | RETRIEVING:                              |
 |             |           | The endpoint is being retrieved          |
 |             |           | FINISHED:                                |
 |             |           | The endpoint has been fully retrieved    |
 |             |           | RELEASED:                                |
 |             |           | Due to errors, this endpoint has been    |
 |             |           | released and cannot be connected anymore |
 |-------------+-----------+------------------------------------------|
 | endpointname| text      | The identification of this endpoint      |
 |-------------+-----------+------------------------------------------|

 Examples:

 postgres=# select * from gp_get_endpoints();
  dbid |            auth_token            | cursorname | sessionid | hostname | port | userid | state |    endpointname
 ------+----------------------------------+------------+-----------+----------+------+--------+-------+--------------------
     2 | 75ebe7b49c3e09f35e017fc0181c62cf | c3         |       105 | host67   | 7002 |     10 | READY | c30000006900000005
     3 | 3ee0a64a465dde2813d4940961747c88 | c3         |       105 | host67   | 7003 |     10 | READY | c30000006900000005
     4 | 97a8eeee337798f718319c5234ea1440 | c3         |       105 | host67   | 7004 |     10 | READY | c30000006900000005
 (3 rows)

 A similar UDF, gp_get_session_endpoints(), shows only the endpoint
 information that belongs to the current session.
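 For scripting, the listing can be fetched in unaligned, tuples-only form (for
 example with psql -At -c "SELECT * FROM gp_get_endpoints()", whose default
 field separator is "|") and parsed as in the Python sketch below. The helper
 names are hypothetical, and the column order is assumed to match the example
 output above.

```python
COLUMNS = ("dbid", "auth_token", "cursorname", "sessionid", "hostname",
           "port", "userid", "state", "endpointname")
INT_COLUMNS = ("dbid", "sessionid", "port", "userid")


def parse_endpoints(text, sep="|"):
    """Parse `psql -At` output of gp_get_endpoints() into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(sep)
        if len(fields) != len(COLUMNS):
            raise ValueError(f"unexpected field count: {line!r}")
        row = dict(zip(COLUMNS, fields))
        for col in INT_COLUMNS:
            row[col] = int(row[col])
        rows.append(row)
    return rows


def group_by_cursor(rows):
    """Map each cursor name to the list of endpoint rows serving it."""
    groups = {}
    for row in rows:
        groups.setdefault(row["cursorname"], []).append(row)
    return groups


SAMPLE = """\
2|75ebe7b49c3e09f35e017fc0181c62cf|c3|105|host67|7002|10|READY|c30000006900000005
3|3ee0a64a465dde2813d4940961747c88|c3|105|host67|7003|10|READY|c30000006900000005
4|97a8eeee337798f718319c5234ea1440|c3|105|host67|7004|10|READY|c30000006900000005
"""

if __name__ == "__main__":
    for row in group_by_cursor(parse_endpoints(SAMPLE))["c3"]:
        print(row["hostname"], row["port"], row["state"])
```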

 Start A Retrieve Session
 ========================

 Once a parallel retrieve cursor has been declared, retrieve sessions can be
 started on each endpoint's host by using the endpoint's token as the session
 authentication password.

 gp_retrieve_conn=true must be set to start a retrieve session.

 Examples:

 $> PGOPTIONS='-c gp_retrieve_conn=true' psql -h host67 -p 7002 -d postgres -U user1
 Password: 75ebe7b49c3e09f35e017fc0181c62cf
 postgres=#

 Here "75ebe7b49c3e09f35e017fc0181c62cf", "host67" and "7002" are the token,
 hostname and port of endpoint "c30000006900000005" on dbid 2.

 Besides matching the user name and password, the user also needs the LOGIN
 privilege. Without it, the role is not allowed to log in, even with the
 correct password.

 Multiple parallel retrieve cursors can be retrieved from the same retrieve
 session, as long as their tokens are the same.
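 The same parameters can be expressed as a libpq keyword/value connection
 string, which suits drivers that accept one. The Python sketch below assumes
 an endpoint row shaped like a gp_get_endpoints() result (hostname, port,
 auth_token); retrieve_conninfo() is a hypothetical helper. The libpq options
 keyword carries the same setting that PGOPTIONS passes in the psql example.

```python
def quote_conninfo_value(value):
    """Quote a value for a libpq keyword/value connection string.

    libpq accepts any value in single quotes; backslashes and single quotes
    inside the value must be escaped with a backslash.
    """
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def retrieve_conninfo(endpoint, dbname, user):
    """Build the connection string for one endpoint's retrieve session."""
    params = {
        "host": endpoint["hostname"],
        "port": endpoint["port"],
        "dbname": dbname,
        "user": user,
        "password": endpoint["auth_token"],     # the endpoint's token
        "options": "-c gp_retrieve_conn=true",  # same as PGOPTIONS
    }
    return " ".join(f"{key}={quote_conninfo_value(val)}"
                    for key, val in params.items())


if __name__ == "__main__":
    ep = {"hostname": "host67", "port": 7002,
          "auth_token": "75ebe7b49c3e09f35e017fc0181c62cf"}
    print(retrieve_conninfo(ep, "postgres", "user1"))
```

 psql also accepts such a string in place of a database name.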


 Retrieve From Endpoint
 ======================

 In each retrieve session, the part of the query result held on that segment
 can be retrieved with the RETRIEVE statement and the corresponding endpoint
 name.

 Syntax:
 RETRIEVE { ALL | count } FROM ENDPOINT endpoint_name;

 "ALL" retrieves all remaining results from the endpoint. Otherwise, a
 positive integer must be given as "count" to specify how many rows to
 retrieve. An empty set is returned if the endpoint has no more tuples.

 Examples:

 postgres=# RETRIEVE ALL FROM ENDPOINT c30000006900000005;
  col1 | col2
 ------+------
  0    | 0
  1    | 1
 (2 rows)
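 The ALL/count semantics can be summarized with a small in-memory model. In
 this hypothetical Python sketch, EndpointModel stands in for one endpoint's
 result set; it is not how the backend implements RETRIEVE.

```python
class EndpointModel:
    """Hypothetical model of RETRIEVE { ALL | count } on one endpoint."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0  # index of the next row to hand out

    def retrieve(self, count="ALL"):
        if count == "ALL":
            end = len(self._rows)
        elif isinstance(count, int) and count > 0:
            end = min(self._pos + count, len(self._rows))
        else:
            raise ValueError("count must be ALL or a positive integer")
        batch = self._rows[self._pos:end]
        self._pos = end
        return batch  # an empty list once the endpoint is exhausted


if __name__ == "__main__":
    ep = EndpointModel([0, 1])
    print(ep.retrieve(1))      # [0]
    print(ep.retrieve("ALL"))  # [1]
    print(ep.retrieve(5))      # []
```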


 List Endpoints In Utility Session On Endpoint QE
 ================================================

 In a UTILITY-mode connection to a specific node (coordinator or segment), the
 UDF gp_get_segment_endpoints() lists the endpoints of all sessions on that
 node. As with gp_get_endpoints(), a superuser can see the endpoint information
 of all users, but, for security reasons, a non-superuser can see only its own
 endpoint information.

 Definition: gp_get_segment_endpoints()

 gp_get_segment_endpoints() Columns:
 |--------------+-----------+------------------------------------------|
 | Column Name  | Data Type | Description                              |
 |--------------+-----------+------------------------------------------|
 | auth_token   | text      | Retrieve session authentication token    |
 |--------------+-----------+------------------------------------------|
 | databaseid   | integer   | The database where the cursor is created |
 |--------------+-----------+------------------------------------------|
 | senderpid    | integer   | The process where the query results are  |
 |              |           | sent from                                |
 |--------------+-----------+------------------------------------------|
 | receiverpid  | integer   | The process where the results are being  |
 |              |           | received on                              |
 |--------------+-----------+------------------------------------------|
 | state        | text      | The state of the endpoint                |
 |              |           | See gp_get_endpoints() for more details  |
 |--------------+-----------+------------------------------------------|
 | dbid         | integer   | The QE's dbid                            |
 |--------------+-----------+------------------------------------------|
 | sessionid    | integer   | The session where the cursor was created |
 |--------------+-----------+------------------------------------------|
 | userid       | oid       | The user who created this endpoint       |
 |--------------+-----------+------------------------------------------|
 | endpointname | text      | The identification of this endpoint      |
 |--------------+-----------+------------------------------------------|
 | cursorname   | text      | Parallel retrieve cursor name            |
 |--------------+-----------+------------------------------------------|

 Examples:

 # Connect to the segment in utility mode
 $> PGOPTIONS="-c gp_role=utility" psql -h host67 -p 7002 -d postgres

 postgres=# select * from gp_get_segment_endpoints();
             auth_token            | databaseid | senderpid | receiverpid | state | dbid | sessionid | userid |    endpointname    | cursorname
 ----------------------------------+------------+-----------+-------------+-------+------+-----------+--------+--------------------+------------
  75ebe7b49c3e09f35e017fc0181c62cf |      13361 |      3854 |          -1 | READY |    2 |       105 |     10 | c30000006900000005 | c3
 (1 row)
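 The visibility rule shared by gp_get_endpoints() and
 gp_get_segment_endpoints() amounts to a simple filter. The Python function
 below is a hypothetical model of that rule, applied to rows shaped like the
 output above; the real check is performed inside the server.

```python
def visible_endpoints(rows, current_userid, is_superuser):
    """Hypothetical model: which endpoint rows a given user may see.

    Superusers see every row; any other user sees only the rows whose
    userid matches its own.
    """
    if is_superuser:
        return list(rows)
    return [row for row in rows if row["userid"] == current_userid]


if __name__ == "__main__":
    rows = [
        {"endpointname": "c30000006900000005", "userid": 10},
        {"endpointname": "hypothetical_other", "userid": 16384},
    ]
    print([r["endpointname"] for r in visible_endpoints(rows, 16384, False)])
```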

 Wait For Parallel Retrieve Cursor To Be Fully Retrieved
 =======================================================

 The UDF gp_wait_parallel_retrieve_cursor() blocks the coordinator session
 until all endpoints of the given parallel retrieve cursor have been fully
 retrieved, the timeout expires, or an error occurs. When an error occurs, the
 current transaction is aborted, and all parallel retrieve cursors and their
 endpoints in the current transaction are freed. If the function is
 interrupted by a signal, the transaction is aborted as well. When the timeout
 expires, it reports whether all retrieve operations have finished. Specify
 timeout_sec as 0 to check the status immediately, or as -1 to block until
 retrieval finishes or an error occurs.

 Definition: gp_wait_parallel_retrieve_cursor(cursor_name, timeout_sec)

 It returns true only if all endpoints have been fully retrieved, and false if
 the timeout expires first. In all other cases, an error is raised.

 Examples:

 postgres=# SELECT gp_wait_parallel_retrieve_cursor('c3', -1); <waiting...>
  gp_wait_parallel_retrieve_cursor
 -----------------------------------
  t
 (1 row)
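 The timeout semantics can be pictured as a polling loop. The Python sketch
 below is a hypothetical client-side model, not the UDF's implementation:
 get_states() stands in for reading the endpoint states (for example from
 gp_get_endpoints()), and RELEASED is treated as the error case. Unlike the
 real function, the model only raises an exception; it does not abort a
 transaction.

```python
import time


def wait_until_retrieved(get_states, timeout_sec, poll_interval=0.1):
    """Hypothetical model of gp_wait_parallel_retrieve_cursor() semantics.

    Returns True once every endpoint is FINISHED, False if the timeout
    expires first, and raises if any endpoint has been RELEASED.
    timeout_sec == 0 checks once; timeout_sec == -1 waits without limit.
    """
    deadline = None if timeout_sec == -1 else time.monotonic() + timeout_sec
    while True:
        states = get_states()
        if "RELEASED" in states:
            raise RuntimeError("an endpoint was released after an error")
        if all(state == "FINISHED" for state in states):
            return True
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


if __name__ == "__main__":
    snapshot = ["FINISHED", "ATTACHED", "READY"]  # as in the walk-through
    print(wait_until_retrieved(lambda: snapshot, 0))  # False
```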


 Error Handling In Retrieve Session
 ==================================

 When an error occurs in a retrieve session, its endpoint is removed from the
 QE. At that point, the other retrieve sessions can still retrieve from the
 other endpoints that belong to the same session. However, once
 gp_wait_parallel_retrieve_cursor() or CLOSE is executed on the QD, the error
 is delivered to the QD and treated as an error in the transaction, so the
 other retrieve sessions are canceled.

 If a retrieve session quits after retrieving only part of an endpoint's
 results, this is treated as a retrieve session error as well. The endpoint is
 removed, and an error is reported on the QD when the retrieve state is checked
 or waited on. CLOSE also reports an error in this case.


 Close Parallel Retrieve Cursor
 ==============================

 Use the CLOSE statement to close a parallel retrieve cursor and release its
 endpoints. Any related retrieve statements still in progress are canceled.

 Syntax:
 CLOSE cursorname;

 Examples:

 postgres=# CLOSE c3;
 CLOSE CURSOR


 Walk-Through Example
 ====================

 Assume a table named "t1" with an integer column "id", whose data is
 distributed as follows:

 postgres=# select gp_segment_id, * from t1;
  gp_segment_id | id
 ---------------+----
              0 |  2
              0 |  3
              0 |  4
              2 |  5
              1 |  0
              1 |  1
 (6 rows)

 Declare a parallel retrieve cursor on the coordinator to select all tuples
 from t1:

 -- Begin transaction first
 postgres=# BEGIN;
 BEGIN

 -- Create the parallel retrieve cursor for the intended query
 postgres=# DECLARE c1 PARALLEL RETRIEVE CURSOR FOR SELECT * FROM t1;
 DECLARE

 -- List endpoints to get the needed information to start retrieving
 -- sessions on segments
 postgres=# SELECT * FROM gp_get_endpoints();
  dbid |            auth_token            | cursorname | sessionid | hostname | port | userid | state |    endpointname
 ------+----------------------------------+------------+-----------+----------+------+--------+-------+--------------------
     2 | c5c116a13e2fdb8b436cdbc8e1bc7365 | c1         |        22 | host67   | 7002 |     10 | READY | c1000000160000000a
     3 | ecf4af4e0e50f2530afda2af178a5bf8 | c1         |        22 | host67   | 7003 |     10 | READY | c1000000160000000a
     4 | a522dc766936be8a88710fa137d52df2 | c1         |        22 | host67   | 7004 |     10 | READY | c1000000160000000a
 (3 rows)

 Now start a retrieve session on the first segment (i.e. host67:7002):

 # gp_retrieve_conn=true must be passed as a connection option.
 $> PGOPTIONS="-c gp_retrieve_conn=true" psql -h host67 -p 7002 -d postgres
 Password: c5c116a13e2fdb8b436cdbc8e1bc7365

 -- Retrieve all rows of this endpoint
 postgres=# RETRIEVE ALL FROM ENDPOINT c1000000160000000a;
  id
 ----
   2
   3
   4
 (3 rows)

 Now the state of endpoint c1000000160000000a for dbid 2 (host67:7002) should
 become "FINISHED" since all results on the segment have been retrieved.

 -- List endpoints on coordinator to check
 postgres=# SELECT * FROM gp_get_endpoints();
  dbid |            auth_token            | cursorname | sessionid | hostname | port | userid |  state   |    endpointname
 ------+----------------------------------+------------+-----------+----------+------+--------+----------+--------------------
     2 | c5c116a13e2fdb8b436cdbc8e1bc7365 | c1         |        22 | host67   | 7002 |     10 | FINISHED | c1000000160000000a
     3 | ecf4af4e0e50f2530afda2af178a5bf8 | c1         |        22 | host67   | 7003 |     10 | READY    | c1000000160000000a
     4 | a522dc766936be8a88710fa137d52df2 | c1         |        22 | host67   | 7004 |     10 | READY    | c1000000160000000a
 (3 rows)

 Start another retrieve session on host67:7003:

 $> PGOPTIONS="-c gp_retrieve_conn=true" psql -h host67 -p 7003 -d postgres
 Password: ecf4af4e0e50f2530afda2af178a5bf8

 -- Retrieve 1 row from this endpoint
 postgres=# RETRIEVE 1 FROM ENDPOINT c1000000160000000a;
  id
 ----
   0
 (1 row)

 The state of the endpoint first becomes "RETRIEVING" and then "ATTACHED".
 RETRIEVING means that a receiver is retrieving data from the endpoint. When a
 receiver has retrieved part, but not all, of the data, the state becomes
 ATTACHED, meaning that the endpoint is attached to a receiver. Once the
 receiver has retrieved all the data from the endpoint, the state becomes
 FINISHED.

 -- List endpoints on coordinator to check
 postgres=# SELECT * FROM gp_get_endpoints();
  dbid |            auth_token            | cursorname | sessionid | hostname | port | userid |  state   |    endpointname
 ------+----------------------------------+------------+-----------+----------+------+--------+----------+--------------------
     2 | c5c116a13e2fdb8b436cdbc8e1bc7365 | c1         |        22 | host67   | 7002 |     10 | FINISHED | c1000000160000000a
     3 | ecf4af4e0e50f2530afda2af178a5bf8 | c1         |        22 | host67   | 7003 |     10 | ATTACHED | c1000000160000000a
     4 | a522dc766936be8a88710fa137d52df2 | c1         |        22 | host67   | 7004 |     10 | READY    | c1000000160000000a
 (3 rows)

 -- Check on the coordinator connection whether retrieval has finished.
 postgres=# select gp_wait_parallel_retrieve_cursor('c1', 0);
  gp_wait_parallel_retrieve_cursor
 -----------------------------------
  f
 (1 row)

 If we decide not to retrieve the remaining query results, we can simply close
 the cursor now (on the coordinator connection).

 postgres=# CLOSE c1;
 CLOSE CURSOR

 -- All endpoints are gone
 postgres=# SELECT * FROM gp_get_endpoints();
  dbid | auth_token | cursorname | sessionid | hostname | port | userid | state | endpointname
 ------+------------+------------+-----------+----------+------+--------+-------+--------------
 (0 rows)

 -- End the transaction
 postgres=# END;
 END
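 The endpoint states seen in this walk-through can be summarized as a small
 state machine. The Python sketch below is a simplified, hypothetical model
 built only from the transitions described in this README (including the rule
 that quitting after a partial retrieve is an error); the backend may track
 additional intermediate steps.

```python
# Allowed transitions, as described in this README (simplified model).
ALLOWED = {
    "READY": {"RETRIEVING", "RELEASED"},
    "RETRIEVING": {"ATTACHED", "FINISHED", "RELEASED"},
    "ATTACHED": {"RETRIEVING", "RELEASED"},
    "FINISHED": set(),
    "RELEASED": set(),
}


class EndpointState:
    """Hypothetical model of one endpoint's state."""

    def __init__(self, total_rows):
        self.state = "READY"
        self.remaining = total_rows

    def _move(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise RuntimeError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

    def retrieve(self, count=None):
        """Simulate one RETRIEVE; count=None means ALL. Returns rows sent."""
        if self.state == "FINISHED":
            return 0  # nothing left: RETRIEVE yields an empty set
        self._move("RETRIEVING")
        taken = self.remaining if count is None else min(count, self.remaining)
        self.remaining -= taken
        self._move("FINISHED" if self.remaining == 0 else "ATTACHED")
        return taken

    def receiver_quit(self):
        """A retrieve session quitting after a partial retrieve is an error."""
        if self.state == "ATTACHED":
            self._move("RELEASED")


if __name__ == "__main__":
    ep = EndpointState(total_rows=2)  # like dbid 3 (host67:7003)
    ep.retrieve(1)
    print(ep.state)  # ATTACHED
    ep.receiver_quit()
    print(ep.state)  # RELEASED
```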

 Implementation Internals
 ========================
 The basic idea is that, for a declared parallel retrieve cursor, we create a
 distributed plan that does not gather the tuples to the coordinator as a
 normal plan would. Depending on the query, the top node of the plan may run on
 all segments, on a subset of segments, or on a single segment. The cursor
 backend (executor) process sends its output tuples to the separate retrieve
 connection through a shm_mq-based dest receiver.
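 The sender-to-receiver path behaves like a bounded producer/consumer queue.
 The Python sketch below is only an analogy: queue.Queue stands in for the
 shm_mq ring buffer, DONE is a hypothetical end-of-stream marker, and the
 sender blocks while the queue is full, much as a blocking shm_mq_send() waits
 for the receiver to make room.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker


def sender(q, rows):
    """Executor side: put() blocks while the bounded queue is full."""
    for row in rows:
        q.put(row)
    q.put(DONE)


def receiver(q):
    """Retrieve-connection side: drain tuples until end of stream."""
    out = []
    while True:
        item = q.get()
        if item is DONE:
            return out
        out.append(item)


def run_pipe(rows, capacity=2):
    q = queue.Queue(maxsize=capacity)  # small fixed capacity, like a ring
    producer = threading.Thread(target=sender, args=(q, rows))
    producer.start()
    result = receiver(q)
    producer.join()
    return result


if __name__ == "__main__":
    print(run_pipe(range(5)))  # [0, 1, 2, 3, 4]
```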

 The cursor backend process and the retrieve connection communicate through an
 endpoint, represented as EndpointData. An array of EndpointData
 (sharedEndpoints[]) lives in shared memory, where it is visible to all
 backends on the node. The cursor backend process allocates one endpoint, and
 the retrieve connection then attaches to that endpoint, which is why the
 retrieve connection must be authenticated.
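 The allocate/attach handshake can be sketched as a fixed-size slot table.
 This Python model is hypothetical: the real sharedEndpoints[] array holds
 EndpointData in C shared memory, and the field names, the token generator
 (secrets.token_hex(16), which happens to give 32 hex digits like the tokens
 shown earlier) and the error types here are illustrative assumptions. Tokens
 are compared with hmac.compare_digest() to avoid timing leaks.

```python
import hmac
import secrets


class SharedEndpoints:
    """Hypothetical model of a fixed-size sharedEndpoints[] slot table."""

    def __init__(self, size):
        self.slots = [None] * size

    def allocate(self, name, session_id):
        """Cursor backend side: claim a free slot and mint a token."""
        for index, slot in enumerate(self.slots):
            if slot is None:
                token = secrets.token_hex(16)
                self.slots[index] = {"name": name, "session_id": session_id,
                                     "token": token, "receiver_pid": None}
                return index, token
        raise RuntimeError("no free endpoint slot")

    def attach(self, name, token, receiver_pid):
        """Retrieve side: find the endpoint by name and check the token."""
        for slot in self.slots:
            if slot is not None and slot["name"] == name:
                if not hmac.compare_digest(slot["token"].encode(),
                                           token.encode()):
                    raise PermissionError("authentication failed")
                slot["receiver_pid"] = receiver_pid
                return slot
        raise LookupError(f"endpoint {name} not found")

    def free(self, index):
        self.slots[index] = None
```

 A multi-process implementation also needs locking around the table; this
 single-threaded model omits it.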