src/backend/optimizer/README.cbdb.parallel - cloudberry - Git at Google

 src/backend/optimizer/README.cbdb.parallel

 ---

 Apache Cloudberry parallel query is based on Postgres parallel.

 Most mechanisms are the same, refer to Parallel Query and Partial
 Paths in src/backend/optimizer/README.  The main difference is
 Postgres has a Gather/GatherMerge node which launches any number of
 workers to execute a plan as a leader process(PG style), but
 Apache Cloudberry doesn't have that.

 Apache Cloudberry treats all workers equally. They work together to
 execute a plan node with some sync mechanism to keep the right thing,
 ex: create a shared hash table etc.

 That's called CBDB style. CBDB style launches workers as non-parallel
 plan except that expand Gang size by factor if a top path node has
 parallel_workers > 1.

 The reasons we choose CBDB style but not PG style or mix them is complex.

 We encounter lots of problems when mixing them together and we don't
 have enough time to enable both and don't know how much the benefit we
 could have.

 1. PG style Gather/GatherMerge node will launch processes workers that
 lack GP's QE info like: distributed transaction, distributed snapshot,
 GP roles and much more. If we mixed them, there would be normal QE
 processes and worker processes launched by Gather node. They don't
 know each other.

 2. PG style Gather/GatherMerge locus issue.

 The locus of Gather node may be different from its child. Ex: a
 parallel scan on a hashed distributed table will have all data that
 should be hashed on same segments as a whole(Hashed locus), but each
 process has partial data(HashedWorkers locus).

 The Gather node should be Hashed locus in that situation. But things
 become complex when joining with other locus and if there is a Motion
 node below that.

 3. CBDB style could parallelize plan as late as possible until the final
 Gather(to QD or to QE in the middle), But PG style will Gather workers
 in apply_scanjoin_target_to_path. PG style can't generate the final
 scan/join target in parallel workers. This is PG's last opportunity to
 use any partial paths that exist.

 It will empty partial_pathlist, all paths are moved to pathlist that
 it couldn't participate in later parallel join as the outer path, ex:
 parallel_aware hash join with a shared table.  But CBDB style could keep
 partial path in partial_pathlist because we have a Gather Motion on
 the top.


 Parallel locus.

 Making locus compatible in parallel mode is more complex than a
 non-parallel plan.  We have to add Motions even though the
 distributions are the same, but paths have different parallel_workers.
 ex:

 ```
 create table t1(b int) with(parallel_workers=3) distributed by (b);
 create table t2(a int) with(parallel_workers=2) distributed by (a);
 gpadmin=# explain(costs off) select * from t1 right join t2 on t1.b = t2.a;
                             QUERY PLAN
 ------------------------------------------------------------------
  Gather Motion 6:1  (slice1; segments: 6)
    ->  Parallel Hash Left Join
          Hash Cond: (t2.a = t1.b)
          ->  Parallel Seq Scan on t2
          ->  Parallel Hash
                ->  Redistribute Motion 9:6  (slice2; segments: 9)
                      Hash Key: t1.b
                      Hash Module: 3
                      ->  Parallel Seq Scan on t1
 ```

 See function cdb_motion_for_parallel_join() for details.
	src/backend/optimizer/README.cbdb.parallel

	---

	Apache Cloudberry parallel query is based on Postgres parallel.

	Most mechanisms are the same, refer to Parallel Query and Partial
	Paths in src/backend/optimizer/README. The main difference is
	Postgres has a Gather/GatherMerge node which launches any number of
	workers to execute a plan as a leader process(PG style), but
	Apache Cloudberry doesn't have that.

	Apache Cloudberry treats all workers equally. They work together to
	execute a plan node with some sync mechanism to keep the right thing,
	ex: create a shared hash table etc.

	That's called CBDB style. CBDB style launches workers as non-parallel
	plan except that expand Gang size by factor if a top path node has
	parallel_workers > 1.

	The reasons we choose CBDB style but not PG style or mix them is complex.

	We encounter lots of problems when mixing them together and we don't
	have enough time to enable both and don't know how much the benefit we
	could have.

	1. PG style Gather/GatherMerge node will launch processes workers that
	lack GP's QE info like: distributed transaction, distributed snapshot,
	GP roles and much more. If we mixed them, there would be normal QE
	processes and worker processes launched by Gather node. They don't
	know each other.

	2. PG style Gather/GatherMerge locus issue.

	The locus of Gather node may be different from its child. Ex: a
	parallel scan on a hashed distributed table will have all data that
	should be hashed on same segments as a whole(Hashed locus), but each
	process has partial data(HashedWorkers locus).

	The Gather node should be Hashed locus in that situation. But things
	become complex when joining with other locus and if there is a Motion
	node below that.

	3. CBDB style could parallelize plan as late as possible until the final
	Gather(to QD or to QE in the middle), But PG style will Gather workers
	in apply_scanjoin_target_to_path. PG style can't generate the final
	scan/join target in parallel workers. This is PG's last opportunity to
	use any partial paths that exist.

	It will empty partial_pathlist, all paths are moved to pathlist that
	it couldn't participate in later parallel join as the outer path, ex:
	parallel_aware hash join with a shared table. But CBDB style could keep
	partial path in partial_pathlist because we have a Gather Motion on
	the top.


	Parallel locus.

	Making locus compatible in parallel mode is more complex than a
	non-parallel plan. We have to add Motions even though the
	distributions are the same, but paths have different parallel_workers.
	ex:

	```
	create table t1(b int) with(parallel_workers=3) distributed by (b);
	create table t2(a int) with(parallel_workers=2) distributed by (a);
	gpadmin=# explain(costs off) select * from t1 right join t2 on t1.b = t2.a;
	QUERY PLAN
	------------------------------------------------------------------
	Gather Motion 6:1 (slice1; segments: 6)
	-> Parallel Hash Left Join
	Hash Cond: (t2.a = t1.b)
	-> Parallel Seq Scan on t2
	-> Parallel Hash
	-> Redistribute Motion 9:6 (slice2; segments: 9)
	Hash Key: t1.b
	Hash Module: 3
	-> Parallel Seq Scan on t1
	```

	See function cdb_motion_for_parallel_join() for details.