blob: 9324bf2ffcdeb7e1df8464028f7a22886bfe7ef7 [file] [log] [blame]
~~
~~ Licensed to the Apache Software Foundation (ASF) under one
~~ or more contributor license agreements. See the NOTICE file
~~ distributed with this work for additional information
~~ regarding copyright ownership. The ASF licenses this file
~~ to you under the Apache License, Version 2.0 (the
~~ "License"); you may not use this file except in compliance
~~ with the License. You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing,
~~ software distributed under the License is distributed on an
~~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~~ KIND, either express or implied. See the License for the
~~ specific language governing permissions and limitations
~~ under the License.
~~
Lens server monitoring
This section documents all the metrics available from lens server, admin rest end points and more on query statistics.
* Metrics
Lens server emits following metrics for query service
* Number of queued queries;
* Number of running queries;
* Number of finished queries in server's memory;
* Total number of accepted queries;
* Total number of successful queries;
* Total number of finished queries;
* Total number of failed queries;
* Total number of cancelled queries;
* Number of result formatting error
* Total number of opened sessions from the server start/restart
* Total number of closed sessions
* Number of active sessions
Lens server also emits following metrics for other services
* Number of exceptions
* Number of HTTP client error
* Number of HTTP error
* Number of HTTP server error
* Number of HTTP unknown error
* Number of HTTP request started
* Number of HTTP requests finished
* Number of statistics store errors
* Number of statistics log partition handler errors
* Number of statistics log file scanner errors
* Number of email notification errors
Lens server can be configured to emit metrics for resource methods. By default it's disabled, can be enabled by the
property <lens.server.enable.resource.method.metering>. Metrics for
resource methods are created lazily(as and when required) and consist of the following things:
* Number of hits
* Timer for successful executions.
* Timer for failed executions.
A timer can provide running averages, statistical values like mean/median/quartiles etc, histograms.
Lens server also emits jvm, gc, memory and thread level metrics.
Supported reporting methods for the metrics emitted are the following:
* Console reporting. Can be enabled by:
<lens.server.enable.console.metrics>
* CSV reporting. Can be configured by:
<lens.server.enable.csv.metrics>, <lens.server.metrics.csv.directory.path>
* Ganglia reporting. Can be configured by the parameters :
<lens.server.enable.ganglia.metrics>, <lens.server.metrics.ganglia.host>, <lens.server.metrics.ganglia.port>
* Graphite reporting. Can be configured by the parameters:
<lens.server.enable.graphite.metrics>, <lens.server.metrics.graphite.host>, <lens.server.metrics.graphite.port>
Reporting to the chosen reporting methods will happen periodically. That period can be configured by:
<lens.server.metrics.reporting.period>
** Critical Metrics
When resource method metering is enabled you would see different metrics upto 1000 being emitted and might
be confusing to admins - which one to look at.
Along with jvm, memory, thread count gauges, the following are some critical metrics that admin can monitor
* lens.gauges.org.apache.lens.server.api.query.QueryExecutionService.running-queries.value
* lens.gauges.org.apache.lens.server.api.query.QueryExecutionService.queued-queries.value
* lens.gauges.org.apache.lens.server.api.query.QueryExecutionService.finished-queries.value
For all timers, admin can look at mean or/and p99 values and exception.timer count. For example :
* lens.timers.org.apache.lens.server.metastore.MetastoreResource.getLatestDateOfCube.GET.exception.timer.count
* lens.timers.org.apache.lens.server.metastore.MetastoreResource.getLatestDateOfCube.GET.timer.mean
* lens.timers.org.apache.lens.server.metastore.MetastoreResource.getLatestDateOfCube.GET.timer.p99
* REST end points
Lens server provides admin endpoint at <host>:<port>/admin. It provides
end points for ping, metrics, threads and healthcheck.
* ping : admin/ping will respond with pong, if server is up
* metrics : admin/metrics will respond with all metrics in a text file, written in json
* healthcheck : admin/healthcheck is not implemented yet.
* threads : admin/threads will give a thread dump of the server
* Query Statistics:
Lens Server can be configured to emit query related statistics to a hive table <QueryExecutionStatistics>.The
statistics service can be configured by providing values to <lens.statistics.warehouse.dir> set to a HDFS location
where your query statistics log file will be persisted, <lens.statistics.db> the database which will contain all
statistics related tables and <lens.log.rollover.interval> time interval which service will be monitoring for rollover
in log file.The statistics can be disabled by setting, <lens.server.statistics.store.class> to empty string. The
statistics service works by monitoring for rollups of <query-stats.log> file and adds an appropriate partition based
on the rolled over file. The statistics can be queried using Hive queries.