| ~~ |
| ~~ Licensed to the Apache Software Foundation (ASF) under one |
| ~~ or more contributor license agreements. See the NOTICE file |
| ~~ distributed with this work for additional information |
| ~~ regarding copyright ownership. The ASF licenses this file |
| ~~ to you under the Apache License, Version 2.0 (the |
| ~~ "License"); you may not use this file except in compliance |
| ~~ with the License. You may obtain a copy of the License at |
| ~~ |
| ~~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~~ |
| ~~ Unless required by applicable law or agreed to in writing, |
| ~~ software distributed under the License is distributed on an |
| ~~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| ~~ KIND, either express or implied. See the License for the |
| ~~ specific language governing permissions and limitations |
| ~~ under the License. |
| ~~ |
| |
| Lens server monitoring |
| |
| This section documents the metrics emitted by the Lens server, its admin REST endpoints, and query statistics. |
| |
| * Metrics |
| |
| The Lens server emits the following metrics for the query service: |
| |
| * Number of queued queries |
| |
| * Number of running queries |
| |
| * Number of finished queries held in the server's memory |
| |
| * Total number of accepted queries |
| |
| * Total number of successful queries |
| |
| * Total number of finished queries |
| |
| * Total number of failed queries |
| |
| * Total number of cancelled queries |
| |
| * Number of result formatting errors |
| |
| * Total number of sessions opened since the server was started or restarted |
| |
| * Total number of closed sessions |
| |
| * Number of active sessions |
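The totals above are cumulative, so a ratio computed directly from them covers the whole period since the counters were last reset (presumably server start; this is an assumption). A minimal sketch of a windowed failure ratio follows. It assumes an admin samples the total finished and total failed counts at two points in time; the dictionary keys "finished" and "failed" and the function name are hypothetical placeholders, not names Lens registers.

```python
def window_failure_ratio(earlier, later):
    """Fraction of queries that finished between two snapshots and failed.

    `earlier` and `later` are hypothetical snapshots of the cumulative
    "total finished" and "total failed" counters, keyed "finished"/"failed".
    """
    finished = later["finished"] - earlier["finished"]
    failed = later["failed"] - earlier["failed"]
    # No queries finished in the window (or the counters were reset): report 0.0.
    if finished <= 0:
        return 0.0
    return failed / finished


# Example: 50 queries finished between the snapshots, 10 of them failed.
ratio = window_failure_ratio({"finished": 100, "failed": 2},
                             {"finished": 150, "failed": 12})
print(ratio)  # → 0.2
```

Differencing two snapshots keeps a recent burst of failures from being diluted by a long history of successful queries.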
| |
| The Lens server also emits the following metrics for other services: |
| |
| * Number of exceptions |
| |
| * Number of HTTP client errors |
| |
| * Number of HTTP errors |
| |
| * Number of HTTP server errors |
| |
| * Number of HTTP unknown errors |
| |
| * Number of HTTP requests started |
| |
| * Number of HTTP requests finished |
| |
| * Number of statistics store errors |
| |
| * Number of statistics log partition handler errors |
| |
| * Number of statistics log file scanner errors |
| |
| * Number of email notification errors |
| |
| |
| The Lens server can also emit metrics for resource methods. This is disabled by default and can be enabled with the |
| property <lens.server.enable.resource.method.metering>. Metrics for a resource method are created lazily (on demand) |
| and consist of the following: |
| |
| * Number of hits |
| |
| * Timer for successful executions |
| |
| * Timer for failed executions |
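Lens implements this metering inside the server itself. The Python sketch below only illustrates the idea of one hit counter plus separate success and failure timers per method. The registry dictionaries, the metered decorator and the get_latest_date method are all hypothetical. Entries appear only when a method is first called, which mirrors the lazy creation described above.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process registry. defaultdict creates an entry on the
# first call of each method, mirroring lazy metric creation.
hits = defaultdict(int)
success_times = defaultdict(list)   # durations (seconds) of calls that returned
failure_times = defaultdict(list)   # durations (seconds) of calls that raised


def metered(func):
    """Count every call and time successful and failed calls separately."""
    name = func.__name__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        hits[name] += 1
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            failure_times[name].append(time.perf_counter() - start)
            raise
        success_times[name].append(time.perf_counter() - start)
        return result

    return wrapper


@metered
def get_latest_date(cube):
    """Hypothetical resource method used only for the demonstration."""
    if not cube:
        raise ValueError("cube name is required")
    return "2015-01-01"


get_latest_date("sales")
try:
    get_latest_date("")
except ValueError:
    pass
print(hits["get_latest_date"],
      len(success_times["get_latest_date"]),
      len(failure_times["get_latest_date"]))  # → 2 1 1
```

Keeping failed calls in their own timer stops fast error responses from making the success latency look better than it is.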
| |
| |
| |
| A timer provides running averages, statistics such as mean, median and quartiles, and histograms. |
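To make those statistics concrete, here is a minimal sketch that summarises a list of recorded durations. It assumes durations in milliseconds and uses the nearest-rank method for percentiles over all samples. This is a simplification: a production timer (such as one from the Dropwizard Metrics library) typically keeps a sampled reservoir rather than every value. The function names are hypothetical.

```python
import math
import statistics


def percentile(sorted_values, p):
    """Nearest-rank percentile (0 < p <= 100) of a sorted, non-empty list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def timer_summary(durations_ms):
    """Summarise durations the way a metrics timer exposes them."""
    ordered = sorted(durations_ms)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p75": percentile(ordered, 75),
        "p99": percentile(ordered, 99),
    }


# Ten hypothetical call durations, one of them a slow outlier.
summary = timer_summary([12, 15, 11, 14, 13, 250, 12, 16, 13, 14])
print(summary)
```

Here the single 250 ms outlier pulls the mean up to 37 ms, while the median stays at 13.5 ms and p99 reports the 250 ms tail. This is why looking at both the mean and p99 of a timer, as suggested under Critical Metrics, is more informative than either alone.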
| |
| The Lens server also emits JVM, GC, memory and thread-level metrics. |
| |
| The following reporting methods are supported: |
| |
| * Console reporting, enabled by: |
| <lens.server.enable.console.metrics> |
| |
| * CSV reporting, configured by: |
| <lens.server.enable.csv.metrics>, <lens.server.metrics.csv.directory.path> |
| |
| * Ganglia reporting, configured by: |
| <lens.server.enable.ganglia.metrics>, <lens.server.metrics.ganglia.host>, <lens.server.metrics.ganglia.port> |
| |
| * Graphite reporting, configured by: |
| <lens.server.enable.graphite.metrics>, <lens.server.metrics.graphite.host>, <lens.server.metrics.graphite.port> |
| |
| Metrics are reported to the enabled reporters periodically. The reporting period can be configured by: |
| <lens.server.metrics.reporting.period> |
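As an illustration of how these properties fit together, the sketch below renders a Graphite reporting setup as a Hadoop-style configuration document. It assumes the server reads overrides from such a file (assumed here to be lens-site.xml; check your installation). The property names come from this page. The host, port and period values are placeholders, and the unit of the reporting period is not stated on this page.

```python
import xml.etree.ElementTree as ET


def build_site_config(properties):
    """Render a dict as a Hadoop-style <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Property names are from this page; the values are placeholders.
graphite_reporting = {
    "lens.server.enable.graphite.metrics": "true",
    "lens.server.metrics.graphite.host": "graphite.example.com",
    "lens.server.metrics.graphite.port": 2003,
    "lens.server.metrics.reporting.period": 10,
}

site_xml = build_site_config(graphite_reporting)
print(site_xml)
```

The same helper works for the console, CSV and Ganglia properties. Swap in the corresponding names from the list above.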
| |
| ** Critical Metrics |
| |
| When resource method metering is enabled, the server can emit up to 1000 different metrics, which can make it hard |
| for admins to know which ones to watch. |
| |
| Along with the JVM, memory and thread count gauges, the following are some critical metrics that admins can monitor: |
| |
| * lens.gauges.org.apache.lens.server.api.query.QueryExecutionService.running-queries.value |
| |
| * lens.gauges.org.apache.lens.server.api.query.QueryExecutionService.queued-queries.value |
| |
| * lens.gauges.org.apache.lens.server.api.query.QueryExecutionService.finished-queries.value |
| |
| For all timers, admins can look at the mean and/or p99 values and the exception.timer count. For example: |
| |
| * lens.timers.org.apache.lens.server.metastore.MetastoreResource.getLatestDateOfCube.GET.exception.timer.count |
| |
| * lens.timers.org.apache.lens.server.metastore.MetastoreResource.getLatestDateOfCube.GET.timer.mean |
| |
| * lens.timers.org.apache.lens.server.metastore.MetastoreResource.getLatestDateOfCube.GET.timer.p99 |
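A simple way to act on these names is to compare a snapshot of reported values against per-metric limits. The sketch below assumes the values have already been collected into a dictionary keyed by the flattened names shown above (for example, from your Graphite or CSV output). The threshold numbers and function name are hypothetical, and the unit of the timer value depends on how your reporter converts durations.

```python
# Hypothetical alert limits keyed by the metric names listed above.
THRESHOLDS = {
    "lens.gauges.org.apache.lens.server.api.query.QueryExecutionService.queued-queries.value": 50,
    "lens.gauges.org.apache.lens.server.api.query.QueryExecutionService.running-queries.value": 100,
    "lens.timers.org.apache.lens.server.metastore.MetastoreResource.getLatestDateOfCube.GET.timer.p99": 2000,
}


def breached(snapshot, thresholds=THRESHOLDS):
    """Return, sorted, the metric names whose value exceeds its limit.

    Metrics missing from the snapshot are skipped rather than reported.
    """
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in snapshot and snapshot[name] > limit
    )


QES = "lens.gauges.org.apache.lens.server.api.query.QueryExecutionService"
sample = {
    QES + ".queued-queries.value": 75,
    QES + ".running-queries.value": 20,
}
print(breached(sample))
```

Skipping absent metrics matters here because resource-method timers are created lazily and may not exist until the method has been called.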
| |
| * REST endpoints |
| |
| The Lens server provides an admin endpoint at <host>:<port>/admin, which exposes |
| endpoints for ping, metrics, threads and healthcheck. |
| |
| * ping: admin/ping responds with pong if the server is up |
| |
| * metrics: admin/metrics responds with all metrics as a JSON document |
| |
| * healthcheck: admin/healthcheck is not implemented yet |
| |
| * threads: admin/threads returns a thread dump of the server |
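A minimal Python sketch of a client for these endpoints, using only the standard library. The base URL is a placeholder for your <host>:<port>, and the function names are hypothetical. gauge_values assumes the JSON layout produced by the Dropwizard Metrics servlet, with gauges under a top-level "gauges" key, each holding a "value" field. Verify this against your server's actual output.

```python
import json
import urllib.request


def admin_get(base_url, path, timeout=5):
    """GET <base_url>/admin/<path> and return the body as text."""
    url = f"{base_url.rstrip('/')}/admin/{path}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def is_up(base_url):
    """True if admin/ping answers with 'pong'; False on any network error."""
    try:
        return admin_get(base_url, "ping").strip() == "pong"
    except OSError:  # URLError, HTTPError and socket timeouts are all OSErrors
        return False


def gauge_values(metrics_json):
    """Map each gauge name to its value in an admin/metrics JSON document.

    Assumes a layout like {"gauges": {"<name>": {"value": ...}}}.
    """
    doc = json.loads(metrics_json)
    return {name: g.get("value") for name, g in doc.get("gauges", {}).items()}


# Usage against a live server (placeholder address):
#   base = "http://localhost:9999"
#   if is_up(base):
#       print(gauge_values(admin_get(base, "metrics")))
```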
| |
| * Query Statistics |
| |
| The Lens server can be configured to emit query statistics to a Hive table, <QueryExecutionStatistics>. The |
| statistics service is configured with <lens.statistics.warehouse.dir>, the HDFS location where the query statistics |
| log files are persisted; <lens.statistics.db>, the database that contains all statistics tables; and |
| <lens.log.rollover.interval>, the interval at which the service checks the log file for rollover. Statistics can be |
| disabled by setting <lens.server.statistics.store.class> to an empty string. The statistics service works by |
| monitoring the <query-stats.log> file for rollovers and adding a partition for each rolled-over file. The statistics |
| can then be queried with Hive. |
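The rollover-to-partition flow can be sketched as follows. Everything specific here is a hypothetical assumption for illustration: the rolled-over file naming scheme (query-stats.log.<YYYY-MM-DD-HH-mm>), the partition column dt, the warehouse path and the helper names. The generated statement is plain HiveQL, not the statistics service's own code.

```python
import os
import re
import tempfile

# Hypothetical naming scheme for rolled-over files.
ROLLED = re.compile(r"^query-stats\.log\.(\d{4}-\d{2}-\d{2}-\d{2}-\d{2})$")


def new_partitions(log_dir, known):
    """Return keys of rolled-over stats files not yet registered as partitions.

    The active query-stats.log file does not match the pattern and is ignored.
    """
    keys = []
    for entry in sorted(os.listdir(log_dir)):
        match = ROLLED.match(entry)
        if match and match.group(1) not in known:
            keys.append(match.group(1))
    return keys


def add_partition_sql(table, key, location):
    """HiveQL that registers a directory as the partition for one rollover key."""
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (dt='{key}') LOCATION '{location}'")


with tempfile.TemporaryDirectory() as log_dir:
    for name in ("query-stats.log",
                 "query-stats.log.2015-01-01-10-00",
                 "query-stats.log.2015-01-01-11-00"):
        open(os.path.join(log_dir, name), "w").close()
    pending = new_partitions(log_dir, known={"2015-01-01-10-00"})

print(pending)  # → ['2015-01-01-11-00']
print(add_partition_sql("QueryExecutionStatistics", pending[0],
                        "/hypothetical/warehouse/dir/2015-01-01-11-00"))
```

Using ADD IF NOT EXISTS keeps the step idempotent, so rescanning a file that was already registered does no harm.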
| |