| //// |
| /** |
| * |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| */ |
| //// |
| |
| [[slow_log_responses_from_systable]] |
| ==== Get Slow/Large Response Logs from System table hbase:slowlog |
| |
| The above section provides details about Admin APIs: |
| |
| * get_slowlog_responses |
| * get_largelog_responses |
| * clear_slowlog_responses |
| |
| All of the above APIs access online in-memory ring buffers from |
| individual RegionServers and accumulate logs from ring buffers to display |
| to end user. However, since the logs are stored in memory, after RegionServer is |
| restarted, all the objects held in memory of that RegionServer will be cleaned up |
| and previous logs are lost. What if we want to persist all these logs forever? |
| What if we want to store them in such a manner that operator can get all historical |
| records with some filters? e.g get me all large/slow RPC logs that are triggered by |
| user1 and are related to region: |
| cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf. ? |
| |
| If we have a system table that stores such logs in increasing (not so strictly though) |
| order of time, it can definitely help operators debug some historical events |
| (scan, get, put, compaction, flush etc) with detailed inputs. |
| |
| Config which enabled system table to be created and store all log events is |
| `hbase.regionserver.slowlog.systable.enabled`. |
| |
| The default value for this config is `false`. If provided `true` |
| (Note: `hbase.regionserver.slowlog.buffer.enabled` should also be `true`), |
| a cron job running in every RegionServer will persist the slow/large logs into |
| table hbase:slowlog. By default cron job runs every 10 min. Duration can be configured |
| with key: `hbase.slowlog.systable.chore.duration`. By default, RegionServer will |
| store upto 1000(config key: `hbase.regionserver.slowlog.systable.queue.size`) |
| slow/large logs in an internal queue and the chore will retrieve these logs |
| from the queue and perform batch insertion in hbase:slowlog. |
| |
| hbase:slowlog has single ColumnFamily: `info` |
| `info` contains multiple qualifiers which are the same attributes present as |
| part of `get_slowlog_responses` API response. |
| |
| * info:call_details |
| * info:client_address |
| * info:method_name |
| * info:param |
| * info:processing_time |
| * info:queue_time |
| * info:region_name |
| * info:response_size |
| * info:server_class |
| * info:start_time |
| * info:type |
| * info:username |
| |
| And example of 2 rows from hbase:slowlog scan result: |
| [source] |
| ---- |
| |
| \x024\xC1\x03\xE9\x04\xF5@ column=info:call_details, timestamp=2020-05-16T14:58:14.211Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest) |
| \x024\xC1\x03\xE9\x04\xF5@ column=info:client_address, timestamp=2020-05-16T14:58:14.211Z, value=172.20.10.2:57347 |
| \x024\xC1\x03\xE9\x04\xF5@ column=info:method_name, timestamp=2020-05-16T14:58:14.211Z, value=Scan |
| \x024\xC1\x03\xE9\x04\xF5@ column=info:param, timestamp=2020-05-16T14:58:14.211Z, value=region { type: REGION_NAME value: "hbase:meta,,1" } scan { column { family: "info" } attribute { name: "_isolationle |
| vel_" value: "\x5C000" } start_row: "cluster_test,33333333,99999999999999" stop_row: "cluster_test,," time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks |
| : true max_result_size: 2097152 reversed: true caching: 10 include_stop_row: true readType: PREAD } number_of_rows: 10 close_scanner: false client_handles_partials: true client_ |
| handles_heartbeats: true track_scan_metrics: false |
| \x024\xC1\x03\xE9\x04\xF5@ column=info:processing_time, timestamp=2020-05-16T14:58:14.211Z, value=18 |
| \x024\xC1\x03\xE9\x04\xF5@ column=info:queue_time, timestamp=2020-05-16T14:58:14.211Z, value=0 |
| \x024\xC1\x03\xE9\x04\xF5@ column=info:region_name, timestamp=2020-05-16T14:58:14.211Z, value=hbase:meta,,1 |
| \x024\xC1\x03\xE9\x04\xF5@ column=info:response_size, timestamp=2020-05-16T14:58:14.211Z, value=1575 |
| \x024\xC1\x03\xE9\x04\xF5@ column=info:server_class, timestamp=2020-05-16T14:58:14.211Z, value=HRegionServer |
| \x024\xC1\x03\xE9\x04\xF5@ column=info:start_time, timestamp=2020-05-16T14:58:14.211Z, value=1589640743732 |
| \x024\xC1\x03\xE9\x04\xF5@ column=info:type, timestamp=2020-05-16T14:58:14.211Z, value=ALL |
| \x024\xC1\x03\xE9\x04\xF5@ column=info:username, timestamp=2020-05-16T14:58:14.211Z, value=user2 |
| \x024\xC1\x06X\x81\xF6\xEC column=info:call_details, timestamp=2020-05-16T14:59:58.764Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest) |
| \x024\xC1\x06X\x81\xF6\xEC column=info:client_address, timestamp=2020-05-16T14:59:58.764Z, value=172.20.10.2:57348 |
| \x024\xC1\x06X\x81\xF6\xEC column=info:method_name, timestamp=2020-05-16T14:59:58.764Z, value=Scan |
| \x024\xC1\x06X\x81\xF6\xEC column=info:param, timestamp=2020-05-16T14:59:58.764Z, value=region { type: REGION_NAME value: "cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf." } scan { a |
| ttribute { name: "_isolationlevel_" value: "\x5C000" } start_row: "cccccccc" time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks: true max_result_size: 2 |
| 097152 caching: 2147483647 include_stop_row: false } number_of_rows: 2147483647 close_scanner: false client_handles_partials: true client_handles_heartbeats: true track_scan_met |
| rics: false |
| \x024\xC1\x06X\x81\xF6\xEC column=info:processing_time, timestamp=2020-05-16T14:59:58.764Z, value=24 |
| \x024\xC1\x06X\x81\xF6\xEC column=info:queue_time, timestamp=2020-05-16T14:59:58.764Z, value=0 |
| \x024\xC1\x06X\x81\xF6\xEC column=info:region_name, timestamp=2020-05-16T14:59:58.764Z, value=cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf. |
| \x024\xC1\x06X\x81\xF6\xEC column=info:response_size, timestamp=2020-05-16T14:59:58.764Z, value=211227 |
| \x024\xC1\x06X\x81\xF6\xEC column=info:server_class, timestamp=2020-05-16T14:59:58.764Z, value=HRegionServer |
| \x024\xC1\x06X\x81\xF6\xEC column=info:start_time, timestamp=2020-05-16T14:59:58.764Z, value=1589640743932 |
| \x024\xC1\x06X\x81\xF6\xEC column=info:type, timestamp=2020-05-16T14:59:58.764Z, value=ALL |
| \x024\xC1\x06X\x81\xF6\xEC column=info:username, timestamp=2020-05-16T14:59:58.764Z, value=user1 |
| ---- |
| |
| Operator can use ColumnValueFilter to filter records based on region_name, username, |
| client_address etc. |
| |
| Time range based queries will also be very useful. |
| Example: |
| [source] |
| ---- |
| scan 'hbase:slowlog', { TIMERANGE => [1589621394000, 1589637999999] } |
| ---- |