<! -- Licensed to the Apache Software Foundation (ASF) under one Or more contributor license agreements. See the NOTICE file Distributed with this work for additional information Regarding copyright ownership. The ASF licenses this file To you under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance With the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, Software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied. See the License for the Specific language governing permissions and limitations Under the License. -- >
At present, Doris is a typical Share-Nothing architecture, which achieves very high performance by binding data and computing resources in the same node. With the continuous improvement of the performance for the Doris computing engine, more and more users have begun to use Doris to directly query data on data lake. This is a Share-Disk scenario that data is often stored on the remote HDFS/S3, and calculated in Doris. Doris will get the data through the network, and then completes the computation in memory. For these two mixed loads in one cluster, current Doris architecture will appear some disadvantages:
Implement a BE node role specially used for federated computing named Compute node. Compute node is used to handle remote federated queries such as this query of data lake. The original BE node type is called hybrid node, and this type of node can not only execute SQL query, but also handle tablet data storage. And the Compute node only can execute SQL query, it have no data on node.
With the computing node, the cluster deployment topology will also change:
the hybrid node is used for the data calculation of the OLAP type table, the node is expanded according to the storage demand
the computing node is used for the external computing, and this node is expanded according to the query load.
Since the compute node has no storage, the compute node can be deployed on an HDD disk machine with other workload or on a container.
Add configuration items to BE's configuration file be.conf:
be_node_role = computation
This defualt value of this is mix, and this is original BE node type. After setting to computation, the node is a computing node.
You can see the value of the‘NodeRole ‘field through the show backend\G command. If it is’mix ‘, it is a mixed node, and if it is’computation’, it is a computing node
*************************** 1. row *************************** BackendId: 10010 Cluster: default_cluster IP: 10.248.181.219 HeartbeatPort: 9050 BePort: 9060 HttpPort: 8040 BrpcPort: 8060 LastStartTime: 2022-11-30 23:01:40 LastHeartbeat: 2022-12-05 15:01:18 Alive: true SystemDecommissioned: false ClusterDecommissioned: false TabletNum: 753 DataUsedCapacity: 1.955 GB AvailCapacity: 202.987 GB TotalCapacity: 491.153 GB UsedPct: 58.67 % MaxDiskUsedPct: 58.67 % RemoteUsedCapacity: 0.000 Tag: {"location" : "default"} ErrMsg: Version: doris-0.0.0-trunk-80baca264 Status: {"lastSuccessReportTabletsTime":"2022-12-05 15:00:38","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false} HeartbeatFailureCounter: 0 NodeRole: computation
Add configuration items in fe.conf
prefer_compute_node_for_external_table=true min_backend_num_for_external_table=3
For parameter description, please refer to: FE configuration item
When using the MultiCatalog function when querying, the query will be dispatched to the computing node first.