This topic describes issues that may affect query execution in Druid, how to identify those issues, and strategies to resolve them.
In Druid's query processing, when the Broker sends a query to the data servers, the data servers process the query and push their intermediate results back to the Broker. Because calls from the Broker to the data servers are synchronous, the Jetty server on a data server can time out in certain cases, for example when the connection stays idle for longer than the server's maximum idle time.
When such a timeout occurs, the server interrupts the connection between the Broker and the data servers, which causes the query to fail with a channel disconnection error. For example:
{
  "error": {
    "error": "Unknown exception",
    "errorMessage": "Query[6eee73a6-a95f-4bdc-821d-981e99e39242] url[https://localhost:8283/druid/v2/] failed with exception msg [Channel disconnected] (through reference chain: org.apache.druid.query.scan.ScanResultValue[\"segmentId\"])",
    "errorClass": "com.fasterxml.jackson.databind.JsonMappingException",
    "host": "localhost:8283"
  }
}
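The query ID and host needed for the log search can be pulled out of such an error response programmatically. The following is a minimal sketch, assuming the response has the same shape as the example above; the function name `summarize_error` and the regular expression are illustrative and not part of Druid.

```python
import json
import re

# Error payload copied from the example above (raw string keeps the \" escapes).
ERROR_RESPONSE = r'''{ "error": { "error": "Unknown exception", "errorMessage": "Query[6eee73a6-a95f-4bdc-821d-981e99e39242] url[https://localhost:8283/druid/v2/] failed with exception msg [Channel disconnected] (through reference chain: org.apache.druid.query.scan.ScanResultValue[\"segmentId\"])", "errorClass": "com.fasterxml.jackson.databind.JsonMappingException", "host": "localhost:8283" } }'''


def summarize_error(payload: str) -> dict:
    """Extract the query ID and host from an error payload shaped like the example."""
    body = json.loads(payload)["error"]
    message = body["errorMessage"]
    # Assumption: the message starts with "Query[<id>]" as in the example.
    match = re.search(r"Query\[([0-9a-f-]+)\]", message)
    return {
        "query_id": match.group(1) if match else None,
        "host": body["host"],
        "channel_disconnected": "Channel disconnected" in message,
    }


summary = summarize_error(ERROR_RESPONSE)
print(summary["query_id"], summary["host"])
```

If the message format differs in your Druid version, adjust the pattern rather than relying on this one.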
Channel disconnection occurs for various reasons. To verify that the error is due to web server timeout, search for the query ID in the Historical logs. The query ID in the example above is 6eee73a6-a95f-4bdc-821d-981e99e39242. The "host" field in the error message above indicates the address (host and port) of the Historical in question. In the Historical logs, you will see a raised exception indicating Idle timeout expired:
2021-09-14T19:52:27,685 ERROR [qtp475526834-85[scan_[test_large_table]_6eee73a6-a95f-4bdc-821d-981e99e39242]] org.apache.druid.server.QueryResource - Unable to send query response. (java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 ms)
2021-09-14T19:52:27,685 ERROR [qtp475526834-85] org.apache.druid.server.QueryLifecycle - Exception while processing queryId [6eee73a6-a95f-4bdc-821d-981e99e39242] (java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 ms)
2021-09-14T19:52:27,686 WARN [qtp475526834-85] org.eclipse.jetty.server.HttpChannel - handleException /druid/v2/ java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 ms
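The log search described above can be scripted. The sketch below filters log lines that mention both the query ID and the text "Idle timeout expired". The helper name `find_idle_timeouts` is hypothetical, and the sample lines are the ones shown above.

```python
from typing import Iterable, List


def find_idle_timeouts(lines: Iterable[str], query_id: str) -> List[str]:
    """Return log lines that mention both the query ID and a Jetty idle timeout."""
    return [
        line.rstrip("\n")
        for line in lines
        if query_id in line and "Idle timeout expired" in line
    ]


# Log lines copied from the Historical log excerpt above.
SAMPLE_LOG = [
    "2021-09-14T19:52:27,685 ERROR [qtp475526834-85[scan_[test_large_table]_6eee73a6-a95f-4bdc-821d-981e99e39242]] org.apache.druid.server.QueryResource - Unable to send query response. (java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 ms)",
    "2021-09-14T19:52:27,685 ERROR [qtp475526834-85] org.apache.druid.server.QueryLifecycle - Exception while processing queryId [6eee73a6-a95f-4bdc-821d-981e99e39242] (java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 ms)",
    "2021-09-14T19:52:27,686 WARN [qtp475526834-85] org.eclipse.jetty.server.HttpChannel - handleException /druid/v2/ java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 300000/300000 ms",
]

hits = find_idle_timeouts(SAMPLE_LOG, "6eee73a6-a95f-4bdc-821d-981e99e39242")
print(len(hits))  # 2: the third line reports the timeout but carries no query ID
```

To scan a real log, pass an open file object instead of the list; the log file location depends on your deployment.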
To mitigate query failure due to web server timeout:
Increase the maximum idle time of the web server by setting the druid.server.http.maxIdleTime property in the historical/runtime.properties file. You must restart the Druid cluster for this change to take effect. See Configuration reference for more information on configuring the server.
If the data servers are slow to return results, look for causes such as large IN filters in the query or an under-scaled cluster. Analyze your Druid query metrics to determine the bottleneck.
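As an illustration of the configuration change, the sketch below builds the line to place in historical/runtime.properties. It assumes the property accepts an ISO 8601 duration such as PT5M, which matches the 300000 ms (5 minute) timeout in the log above; confirm the expected format in the Configuration reference before applying it. The helper name `max_idle_time_property` is hypothetical.

```python
def max_idle_time_property(minutes: int) -> str:
    """Build a runtime.properties line for the web server idle timeout.

    Assumption: the value is an ISO 8601 duration (for example PT5M).
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return f"druid.server.http.maxIdleTime=PT{minutes}M"


# Example: raise the idle timeout from 5 minutes to 10 minutes.
print(max_idle_time_property(10))
```

A larger timeout only hides slow queries, so it works best alongside the performance analysis described above.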