| src/backend/cdb/endpoint/README |
| |
| With the size of Greenplum cluster increasing, the performance bottleneck on |
| the coordinator node becomes more and more obvious. |
| |
| The parallel retrieve cursor feature is designed to reduce the heavy burdens of |
| the coordinator node. For a query, it can redirect the results to segments |
| instead of gathering them to the coordinator node. When it is executed, user |
| can setup retrieve mode connections to segments for retrieving result data in |
| parallel. |
| |
| |
| Parallel Retrieve Cursor |
| ======================== |
| |
| Like a normal cursor, user can declare a parallel retrieve cursor on |
| coordinator, then retrieve the query results on each segment directly. |
| |
| Parallel retrieve cursor has similar declaration and syntax as normal cursor |
| does. However, some cursor operations are not supported in parallel retrieve |
| cursor(e.g. MOVE). |
| |
| #NOTE: Orca doesn't support PARALLEL RETRIEVE CURSOR for now. Greenplum would |
| fall back to postgres optimizer automatically. |
| |
| Endpoint |
| ======== |
| |
| Once a parallel retrieve cursor has been declared on QD, a corresponding |
| endpoint will be created on the same segment whose QE contains the query |
| result. These endpoints are backed by individual backends on each QE. Then, |
| those endpoints can be used as the source, and results can be retrieved from |
| them in parallel on each QE. |
| |
| |
| Declare Parallel Retrieve Cursor |
| ================================ |
| |
| Before retrieving data from endpoints in parallel, a parallel retrieve cursor |
| variable needs to be created alongside with the query statement. |
| |
| Syntax: |
| DECLARE cursor_name PARALLEL RETRIEVE CURSOR FOR query; |
| |
| The cursor_name needs to be unique in current transaction. |
| |
| Examples: |
| |
| postgres=# BEGIN; |
| BEGIN |
| postgres=# DECLARE c1 PARALLEL RETRIEVE CURSOR FOR SELECT * FROM t1; |
| DECLARE |
| |
| |
| List Parallel Retrieve Cursors and their endpoints |
| ================================================== |
| |
| To retrieve the query results of a parallel retrieve cursor, the related |
| endpoint information is needed before start a retrieve session on QEs. The UDF |
| gp_get_endpoints() can be used to list parallel retrieve cursors and their |
| endpoints information. This UDF could be run on the coordinator only. |
| |
| For a superuser, it can list all endpoints information of all users', but for |
| non-superuser, it can only list the current user's endpoints information for |
| security reason. |
| |
| Definition: gp_get_endpoints() |
| |
| gp_get_endpoints() Columns: |
| |-------------+-----------+------------------------------------------| |
| | Column Name | Data Type | Description | |
| |-------------+-----------+------------------------------------------| |
| | dbid | integer | The QE's dbid | |
| |-------------+-----------+------------------------------------------| |
| | auth_token | text | Retrieve-session authentication token | |
| |-------------+-----------+------------------------------------------| |
| | cursorname | text | Parallel retrieve cursor name | |
| |-------------+-----------+------------------------------------------| |
| | sessionid | integer | The session where the cursor was created | |
| |-------------+-----------+------------------------------------------| |
| | hostname | text | The host to retrieve from | |
| |-------------+-----------+------------------------------------------| |
| | port | integer | The port number to start the retrieve | |
| | | | session | |
| |-------------+-----------+------------------------------------------| |
| | userid | oid | The oid of the user | |
| |-------------+-----------+------------------------------------------| |
| | state | text | One of the following state for this | |
| | | | endpoint: | |
| | | | READY: | |
| | | | The endpoint is ready to be retrieved | |
| | | | ATTACHED: | |
| | | | The endpoint has been attached by a | |
| | | | a retrieve connection | |
| | | | RETRIEVING: | |
| | | | The endpoint is being retrieved | |
| | | | FINISHED: | |
| | | | The endpoint has been fully retrieved | |
| | | | RELEASED: | |
| | | | Due to errors, this endpoint has been | |
| | | | released and cannot be connected anymore | |
| |-------------+-----------+------------------------------------------| |
| | endpointname| text | The identification of this endpoint | |
| |-------------+-----------+------------------------------------------| |
| |
| Examples: |
| |
| postgres=# select * from gp_get_endpoints(); |
| dbid | auth_token | cursorname | sessionid | hostname | port | userid | state | endpointname |
| ------+----------------------------------+------------+-----------+----------+------+--------+-------+-------------------- |
| 2 | 75ebe7b49c3e09f35e017fc0181c62cf | c3 | 105 | host67 | 7002 | 10 | READY | c30000006900000005 |
| 3 | 3ee0a64a465dde2813d4940961747c88 | c3 | 105 | host67 | 7003 | 10 | READY | c30000006900000005 |
| 4 | 97a8eeee337798f718319c5234ea1440 | c3 | 105 | host67 | 7004 | 10 | READY | c30000006900000005 |
| (3 rows) |
| |
| There is another similar gp_get_session_endpoints() that shows the endpoint |
| information that belong to this session only. |
| |
| Start A Retrieve Session |
| ======================== |
| |
| Once a parallel retrieve cursor has been declared, retrieve sessions can be |
| started on each endpoint's host by using the endpoint's token as the session |
| authentication password. |
| |
| gp_retrieve_conn=true needs to be set to start retrieve session. |
| |
| Examples: |
| |
| $> PGOPTIONS='-c gp_retrieve_conn=true' psql -h host67 -p 7002 -d postgres -U user1 |
| Password: 75ebe7b49c3e09f35e017fc0181c62cf |
| postgres=# |
| |
| Here the "75ebe7b49c3e09f35e017fc0181c62cf", "host67" and "7002" are the |
| endpoint "c30000006900000005"'s token, hostname and port. |
| |
| Besides the user and password should match, the user also needs login |
| permission. Otherwise this role is not allowed to login, even with correct |
| password. |
| |
| Multiple parallel retrieve cursors can be retrieved from the same retrieve |
| session as long as their token is the same. |
| |
| |
| |
| Retrieve From Endpoint |
| ====================== |
| |
| In each retrieve session, the query result on that segment can be |
| retrieved by using statement "RETRIEVE" and its corresponding endpoint |
| name. |
| |
| Syntax: |
| RETRIEVE { ALL | count } FROM ENDPOINT endpoint_name; |
| |
| Parameter "ALL" means to retrieve all the results from the endpoint. |
| Otherwise, a positive integer value needs to be provided as the "count" to |
| specify how many rows to retrieve. An empty set will be returned if no more |
| tuples for the endpoint. |
| |
| Examples: |
| |
| postgres=# RETRIEVE ALL FROM ENDPOINT c30000006900000005; |
| col1 | col2 |
| ------+------ |
| 0 | 0 |
| 1 | 1 |
| (2 rows) |
| |
| |
| List Endpoints In Utility Session On Endpoint QE |
| ================================================ |
| |
| It is possible to list all sessions' endpoints in the UTILITY connection to |
| specific endpoint (coordinator or segment node) by using UDF |
| gp_get_segment_endpoints(). Same as the UDF gp_get_endpoints(), a superuser can see the |
| endpoint information of all users, but non-superuser can see its endpoints |
| information only for security reason. |
| |
| Definition: gp_get_segment_endpoints() |
| |
| gp_get_segment_endpoints() Columns: |
| |--------------+-----------+------------------------------------------| |
| | Column Name | Data Type | Description | |
| |--------------+-----------+------------------------------------------| |
| | auth_token | text | Retrieve session authentication token | |
| |--------------+-----------+------------------------------------------| |
| | databaseid | integer | The database where the cursor is created | |
| |--------------+-----------+------------------------------------------| |
| | senderpid | integer | The process where the query results are | |
| | | | sent from | |
| |--------------+-----------+------------------------------------------| |
| | receiverpid | integer | The process where the results are being | |
| | | | received on | |
| |--------------+-----------+------------------------------------------| |
| | state | text | The state of the endpoint | |
| | | | See gp_get_endpoints() for more details | |
| |--------------+-----------+------------------------------------------| |
| | dbid | integer | The QE's dbid | |
| |--------------+-----------+------------------------------------------| |
| | sessionid | integer | The session where the cursor created | |
| |--------------+-----------+------------------------------------------| |
| | userid | oid | The user who created this endpoint | |
| |--------------+-----------+------------------------------------------| |
| | endpointname | text | The identification of this endpoint | |
| |--------------+-----------+------------------------------------------| |
| | cursorname | text | Parallel retrieve cursor name | |
| |--------------+-----------+------------------------------------------| |
| |
| Examples: |
| |
| # Connect the segment in utility mode |
| $> PGOPTIONS="-c gp_role=utility" psql -h host67 -p 7002 -d postgres |
| |
| postgres=# select * from gp_get_segment_endpoints(); |
| auth_token | databaseid | senderpid | receiverpid | state | dbid | sessionid | userid | endpointname | cursorname |
| ----------------------------------+------------+-----------+-------------+-------+------+-----------+--------+--------------------+------------ |
| 75ebe7b49c3e09f35e017fc0181c62cf | 13361 | 3854 | -1 | READY | 2 | 105 | 10 | c30000006900000005 | c3 |
| (1 row) |
| |
| Wait Parallel Retrieve Cursor To Be Fully Retrieved |
| =================================================== |
| |
| UDF gp_wait_parallel_retrieve_cursor() is designed to block until all the |
| endpoints have been fully retrieved for the given parallel retrieve cursor |
| until timeout happens. It will block the coordinator session until all the |
| relevant endpoints are fully retrieved unless timeout or error happens. When |
| error happens, the current transaction will be aborted, all parallel retrieve |
| cursors and their endpoints in current transaction will be freed. If it is |
| interrupted by signals, the transaction will be aborted as well. When timeout |
| happens, it will report whether all retrieve operations finish or not. Specify |
| timeout_sec as 0 to check the status immediately and specify timeout_sec as -1 |
| to block until retrieve finishes or error happens. |
| |
| Definition: gp_wait_parallel_retrieve_cursor(cursor_name, timeout_sec) |
| |
| It will only return true if all endpoints fully retrieved. In any other cases, |
| an error message will be thrown. |
| |
| Examples: |
| |
| postgres=# SELECT gp_wait_parallel_retrieve_cursor('c3', -1); <waiting...> |
| gp_wait_parallel_retrieve_cursor |
| ----------------------------------- |
| t |
| (1 row) |
| |
| |
| Error Handling In Retrieve Session |
| ================================== |
| |
| When error happens in a retrieve session, the endpoint will be removed from QE. |
| The other retrieve sessions can still retrieve from other endpoints which |
| belong to this session at this time. But if gp_wait_parallel_retrieve_cursor() |
| or CLOSE are executed on QD, the other retrieve sessions will be canceled since |
| the error is delivered to QD and treated as an error happens in the |
| transaction. |
| |
| When the endpoint is partially retrieved and then the corresponding retrieve |
| session quits, the case will be treated as a retrieve session error as well. |
| The endpoint will be removed and an error will be reported on QD when checking |
| or waiting on the retrieve state. "CLOSE" will also report error in this case. |
| |
| |
| Close Parallel Retrieve Cursor |
| ============================== |
| |
| Use "CLOSE" statement to close the parallel retrieve cursor and release |
| its endpoints. All the related on-going retrieve statements will be |
| canceled. |
| |
| Syntax: |
| CLOSE cursorname; |
| |
| Examples: |
| |
| postgres=# CLOSE C3 |
| CLOSE |
| |
| |
| Walk-Through Example |
| ==================== |
| |
| Assumes we have a table named "t1" with an integer column "id", and its |
| data distributed as following: |
| |
| postgres=# select gp_segment_id, * from t1; |
| gp_segment_id | id |
| ---------------+---- |
| 0 | 2 |
| 0 | 3 |
| 0 | 4 |
| 2 | 5 |
| 1 | 0 |
| 1 | 1 |
| (6 rows) |
| |
| To declare a parallel retrieve cursor on coordinator in order to select all |
| tuples from t1: |
| |
| -- Begin transaction first |
| postgres=# BEGIN; |
| BEGIN |
| |
| -- Create the parallel retrieve cursor for the intent query |
| postgres=# DECLARE c1 PARALLEL RETRIEVE CURSOR FOR SELECT * FROM t1; |
| DECLARE |
| |
| -- List endpoints to get the needed information to start retrieving |
| -- sessions on segments |
| postgres=# SELECT * FROM gp_get_endpoints(); |
| dbid | auth_token | cursorname | sessionid | hostname | port | userid | state | endpointname |
| ------+----------------------------------+------------+-----------+----------+------+--------+-------+-------------------- |
| 2 | c5c116a13e2fdb8b436cdbc8e1bc7365 | c1 | 22 | host67 | 7002 | 10 | READY | c1000000160000000a |
| 3 | ecf4af4e0e50f2530afda2af178a5bf8 | c1 | 22 | host67 | 7003 | 10 | READY | c1000000160000000a |
| 4 | a522dc766936be8a88710fa137d52df2 | c1 | 22 | host67 | 7004 | 10 | READY | c1000000160000000a |
| (3 rows) |
| |
| Now start a retrieve session on the first segment (i.e. host67:7002) |
| |
| # gp_retrieve_conn=true is needed as the pg option. |
| $> PGOPTIONS="-c gp_retrieve_conn=true" psql -h host67 -p 7002 -d postgres |
| Password: c5c116a13e2fdb8b436cdbc8e1bc7365 |
| |
| -- Retrieve all rows of this endpoint |
| postgres=# RETRIEVE ALL FROM ENDPOINT c1000000160000000a; |
| id |
| ---- |
| 2 |
| 3 |
| 4 |
| (3 rows) |
| |
| Now the state of endpoint c1000000160000000a for dbid 2 (host67:7002) should |
| become "FINISHED" since all results on the segment have been retrieved. |
| |
| -- List endpoints on coordinator to check |
| postgres=# SELECT * FROM gp_get_endpoints(); |
| dbid | auth_token | cursorname | sessionid | hostname | port | userid | state | endpointname |
| ------+----------------------------------+------------+-----------+----------+------+--------+----------+-------------------- |
| 2 | c5c116a13e2fdb8b436cdbc8e1bc7365 | c1 | 22 | host67 | 7002 | 10 | FINISHED | c1000000160000000a |
| 3 | ecf4af4e0e50f2530afda2af178a5bf8 | c1 | 22 | host67 | 7003 | 10 | READY | c1000000160000000a |
| 4 | a522dc766936be8a88710fa137d52df2 | c1 | 22 | host67 | 7004 | 10 | READY | c1000000160000000a |
| (3 rows) |
| |
| Start another retrieve session on host67:7003 |
| |
| $> PGOPTIONS="-c gp_retrieve_conn=true" psql -h host67 -p 7003 -d postgres |
| Password: 5f59f7f3338212f20d3bb1527fac9fcd |
| |
| -- Retrieve 1 row on this endpoint |
| postgres=# RETRIEVE 1 FROM ENDPOINT c1000000160000000a; |
| id |
| ---- |
| 0 |
| (1 row) |
| |
| The state of endpoint will firstly becomes "RETRIEVING", then becomes |
| "ATTACHED". RETRIEVING means that a receiver is retrieving data from the |
| endpoint. If a receiver finishes retrieving a part of data (but has not |
| retrieved all the data), the state becomes ATTACHED, means that this endpoint |
| has been attached by a receiver. If the receiver has retrieved all the data |
| from the endpoint, the state becomes FINISHED. |
| |
| -- List endpoints on coordinator to check |
| postgres=# SELECT * FROM gp_get_endpoints(); |
| dbid | auth_token | cursorname | sessionid | hostname | port | userid | state | endpointname |
| ------+----------------------------------+------------+-----------+----------+------+--------+----------+-------------------- |
| 2 | c5c116a13e2fdb8b436cdbc8e1bc7365 | c1 | 22 | host67 | 7002 | 10 | FINISHED | c1000000160000000a |
| 3 | ecf4af4e0e50f2530afda2af178a5bf8 | c1 | 22 | host67 | 7003 | 10 | ATTACHED | c1000000160000000a |
| 4 | a522dc766936be8a88710fa137d52df2 | c1 | 22 | host67 | 7004 | 10 | READY | c1000000160000000a |
| (3 rows) |
| |
| -- To check if the retrieving finished on the coordinator connection. |
| postgres =# select gp_wait_parallel_retrieve_cursor('c1', 0); |
| gp_wait_parallel_retrieve_cursor |
| ----------------------------------- |
| f |
| (1 row) |
| |
| We can just close the cursor now if we decide to not retrieve the |
| remaining query results (on the coordinator connection). |
| |
| postgres=# CLOSE c1; |
| CLOSE CURSOR |
| |
| -- All endpoints are gone |
| postgres=# SELECT * FROM gp_get_endpoints(); |
| dbid | auth_token | cursorname | sessionid | hostname | port | userid | state | endpointname |
| ------+------------+------------+-----------+----------+------+--------+-------+-------------- |
| (0 rows) |
| |
| -- End the transaction |
| coordinator=# END; |
| END |
| |
| Implementation internals |
| ==================== |
| The basic idea is that for a declared parallel retrieve cursor, we create the |
| distributed plan that does not gather the tuples to the coordinator as usual. |
| The top node in the plan might be on all segments or on partial segment or on |
| one segment, depending on the query itself. The cursor backend (executor) |
| process would output the tuples to the separate retrieve connection via the |
| shm_mq based dest receiver. |
| |
| The communication between the cursor backend process and the retrieve |
| connection is the endpoint that is represented as EndpointData. There is an |
| array of EndpointData (sharedEndpoints[]) in shared memory for the whole |
| cluster use. The cursor backend process would allocate one endpoint and the |
| retrieve connection would try to use that endpoint thus authentication of |
| retrieve connection is surely required. |