versioned_docs/version-1.2/advanced/compute_node.md - doris-website - Git at Google

 ---
 {
     "title": "Compute Node",
     "language": "en"
 }
 ---

  <! --
  Licensed to the Apache Software Foundation (ASF) under one
  Or more contributor license agreements. See the NOTICE file
  Distributed with this work for additional information
  Regarding copyright ownership. The ASF licenses this file
  To you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  With the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  Software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  KIND, either expressed or implied. See the License for the
  Specific language governing permissions and limitations
  Under the License.
  -- >


 # Compute node

 <version since="1.2.1">
 </version>

 ## Scenario

 At present, Doris is a typical Share-Nothing architecture, which achieves very high performance by binding data and computing resources in the same node.
 With the continuous improvement of the performance for the Doris computing engine, more and more users have begun to use Doris to directly query data on data lake.
 This is a Share-Disk scenario that data is often stored on the remote HDFS/S3, and calculated in Doris.
 Doris will get the data through the network, and then completes the computation in memory.
 For these two mixed loads in one cluster, current Doris architecture will appear some disadvantages:
 1. Poor resource isolation, the response requirements of these two loads are different, and the hybrid deployment will have mutual effects.
 2. Poor disk usage, the data lake query only needs the computing resources, while doris binding the storage and computing and we have to expand them together, and cause a low utilization rate for disk.
 3. Poor expansion efficiency, when the cluster is expanded, Doris will start the migration of Tablet data, and this process will take a lot of time. And the data lake query load has obvious peaks and valleys, it need hourly flexibility.

 ## solution
 Implement a BE node role specially used for federated computing named `Compute node`.
 `Compute node` is used to handle remote federated queries such as this query of data lake.
 The original BE node type is called `hybrid node`, and this type of node can not only execute SQL query, but also handle tablet data storage.
 And the `Compute node` only can execute SQL query, it have no data on node.

 With the computing node, the cluster deployment topology will also change:
 - the `hybrid node` is used for the data calculation of the OLAP type table, the node is expanded according to the storage demand
 - the `computing node` is used for the external computing, and this node is expanded according to the query load.

   Since the compute node has no storage, the compute node can be deployed on an HDD disk machine with other workload or on a container.


 ## Usage of ComputeNode

 ### Configure
 Add configuration items to BE's configuration file `be.conf`:
 ```
  be_node_role = computation
 ```

 This defualt value of this is `mix`, and this is original BE node type. After setting to `computation`, the node is a computing node.

 You can see the value of the'NodeRole 'field through the `show backend\G` command. If it is'mix ', it is a mixed node, and if it is'computation', it is a computing node

 ```sql
 *************************** 1. row ***************************
               BackendId: 10010
                 Cluster: default_cluster
                      IP: 10.248.181.219
           HeartbeatPort: 9050
                  BePort: 9060
                HttpPort: 8040
                BrpcPort: 8060
           LastStartTime: 2022-11-30 23:01:40
           LastHeartbeat: 2022-12-05 15:01:18
                   Alive: true
    SystemDecommissioned: false
   ClusterDecommissioned: false
               TabletNum: 753
        DataUsedCapacity: 1.955 GB
           AvailCapacity: 202.987 GB
           TotalCapacity: 491.153 GB
                 UsedPct: 58.67 %
          MaxDiskUsedPct: 58.67 %
      RemoteUsedCapacity: 0.000
                     Tag: {"location" : "default"}
                  ErrMsg:
                 Version: doris-0.0.0-trunk-80baca264
                  Status: {"lastSuccessReportTabletsTime":"2022-12-05 15:00:38","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false}
 HeartbeatFailureCounter: 0
                NodeRole: computation
 ```

 ### Usage

 Add configuration items in fe.conf

 ```
 prefer_compute_node_for_external_table=true
 min_backend_num_for_external_table=3
 ```

 > For parameter description, please refer to: [FE configuration item](../admin-manual/config/fe-config.md)

 When using the [MultiCatalog](../lakehouse/multi-catalog/multi-catalog.md) function when querying, the query will be dispatched to the computing node first.

 ### some restrictions

 - Compute nodes are controlled by configuration items, so do not configure mixed type nodes, modify the configuration to compute nodes.

 ## TODO

 - Computational spillover: Doris inner table query, when the cluster load is high, the upper layer (outside TableScan) operator can be scheduled to the compute node.
 - Graceful offline:
   - When the compute node goes offline, the new task of the task is automatically scheduled to online nodes
   - the node go offline after all the old tasks on the node are completed
   - when the old task cannot be completed on time, the task can kill by itself
	---
	{
	"title": "Compute Node",
	"language": "en"
	}
	---

	<! --
	Licensed to the Apache Software Foundation (ASF) under one
	Or more contributor license agreements. See the NOTICE file
	Distributed with this work for additional information
	Regarding copyright ownership. The ASF licenses this file
	To you under the Apache License, Version 2.0 (the
	"License"); you may not use this file except in compliance
	With the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing,
	Software distributed under the License is distributed on an
	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
	KIND, either expressed or implied. See the License for the
	Specific language governing permissions and limitations
	Under the License.
	-- >


	# Compute node

	<version since="1.2.1">
	</version>

	## Scenario

	At present, Doris is a typical Share-Nothing architecture, which achieves very high performance by binding data and computing resources in the same node.
	With the continuous improvement of the performance for the Doris computing engine, more and more users have begun to use Doris to directly query data on data lake.
	This is a Share-Disk scenario that data is often stored on the remote HDFS/S3, and calculated in Doris.
	Doris will get the data through the network, and then completes the computation in memory.
	For these two mixed loads in one cluster, current Doris architecture will appear some disadvantages:
	1. Poor resource isolation, the response requirements of these two loads are different, and the hybrid deployment will have mutual effects.
	2. Poor disk usage, the data lake query only needs the computing resources, while doris binding the storage and computing and we have to expand them together, and cause a low utilization rate for disk.
	3. Poor expansion efficiency, when the cluster is expanded, Doris will start the migration of Tablet data, and this process will take a lot of time. And the data lake query load has obvious peaks and valleys, it need hourly flexibility.

	## solution
	Implement a BE node role specially used for federated computing named `Compute node`.
	`Compute node` is used to handle remote federated queries such as this query of data lake.
	The original BE node type is called `hybrid node`, and this type of node can not only execute SQL query, but also handle tablet data storage.
	And the `Compute node` only can execute SQL query, it have no data on node.

	With the computing node, the cluster deployment topology will also change:
	- the `hybrid node` is used for the data calculation of the OLAP type table, the node is expanded according to the storage demand
	- the `computing node` is used for the external computing, and this node is expanded according to the query load.

	Since the compute node has no storage, the compute node can be deployed on an HDD disk machine with other workload or on a container.


	## Usage of ComputeNode

	### Configure
	Add configuration items to BE's configuration file `be.conf`:
	```
	be_node_role = computation
	```

	This defualt value of this is `mix`, and this is original BE node type. After setting to `computation`, the node is a computing node.

	You can see the value of the'NodeRole 'field through the `show backend\G` command. If it is'mix ', it is a mixed node, and if it is'computation', it is a computing node

	```sql
	************************* 1. row *************************
	BackendId: 10010
	Cluster: default_cluster
	IP: 10.248.181.219
	HeartbeatPort: 9050
	BePort: 9060
	HttpPort: 8040
	BrpcPort: 8060
	LastStartTime: 2022-11-30 23:01:40
	LastHeartbeat: 2022-12-05 15:01:18
	Alive: true
	SystemDecommissioned: false
	ClusterDecommissioned: false
	TabletNum: 753
	DataUsedCapacity: 1.955 GB
	AvailCapacity: 202.987 GB
	TotalCapacity: 491.153 GB
	UsedPct: 58.67 %
	MaxDiskUsedPct: 58.67 %
	RemoteUsedCapacity: 0.000
	Tag: {"location" : "default"}
	ErrMsg:
	Version: doris-0.0.0-trunk-80baca264
	Status: {"lastSuccessReportTabletsTime":"2022-12-05 15:00:38","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false}
	HeartbeatFailureCounter: 0
	NodeRole: computation
	```

	### Usage

	Add configuration items in fe.conf

	```
	prefer_compute_node_for_external_table=true
	min_backend_num_for_external_table=3
	```

	> For parameter description, please refer to: [FE configuration item](../admin-manual/config/fe-config.md)

	When using the [MultiCatalog](../lakehouse/multi-catalog/multi-catalog.md) function when querying, the query will be dispatched to the computing node first.

	### some restrictions

	- Compute nodes are controlled by configuration items, so do not configure mixed type nodes, modify the configuration to compute nodes.

	## TODO

	- Computational spillover: Doris inner table query, when the cluster load is high, the upper layer (outside TableScan) operator can be scheduled to the compute node.
	- Graceful offline:
	- When the compute node goes offline, the new task of the task is automatically scheduled to online nodes
	- the node go offline after all the old tasks on the node are completed
	- when the old task cannot be completed on time, the task can kill by itself