| ~~ Licensed to the Apache Software Foundation (ASF) under one or more |
| ~~ contributor license agreements. See the NOTICE file distributed with |
| ~~ this work for additional information regarding copyright ownership. |
| ~~ The ASF licenses this file to You under the Apache License, Version 2.0 |
| ~~ (the "License"); you may not use this file except in compliance with |
| ~~ the License. You may obtain a copy of the License at |
| ~~ |
| ~~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~~ |
| ~~ Unless required by applicable law or agreed to in writing, software |
| ~~ distributed under the License is distributed on an "AS IS" BASIS, |
| ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| ~~ See the License for the specific language governing permissions and |
| ~~ limitations under the License. |
| ~~ |
| |
| Data Model |
| |
| Chukwa Adaptors emit data in <Chunks>. A Chunk is a sequence of bytes, |
| with some metadata. Several metadata fields are set automatically by the Agent or |
| Adaptors. Two of them require user intervention: <cluster name> and |
| <datatype>. Cluster name is specified in <conf/chukwa-agent-conf.xml>, |
| and is global to each Agent process. Datatype describes the expected format |
| of the data collected by an Adaptor instance, and it is specified when that |
| instance is started. |
| |
| The following table lists the Chunk metadata fields: |
| |
| *-------------*-------------------------------------+----------------------------------------------------------: |
| | Field | Meaning | Source | |
| *-------------*-------------------------------------+----------------------------------------------------------: |
| | Source | Hostname where Chunk was generated | Automatic | |
| *-------------*-------------------------------------+----------------------------------------------------------: |
| | Cluster | Cluster host is associated with | Specified by user in agent config | |
| *-------------*-------------------------------------+----------------------------------------------------------: |
| | Datatype | Format of output | Specified by user when Adaptor started | |
| *-------------*-------------------------------------+----------------------------------------------------------: |
| | Sequence ID | Offset of Chunk in stream | Automatic, initial offset specified when Adaptor started | |
| *-------------*-------------------------------------+----------------------------------------------------------: |
| | Name | Name of data source | Automatic, chosen by Adaptor | |
| *-------------*-------------------------------------+----------------------------------------------------------: |
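| |
| As a rough illustration of the table above, the metadata fields can be pictured |
| as a small record type. This is only a sketch: the class and attribute names |
| are hypothetical, the sample values are made up, and it is not Chukwa's own |
| Chunk implementation. |

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Illustrative stand-in for a Chukwa Chunk (hypothetical, not the real class)."""
    source: str    # hostname where the Chunk was generated (automatic)
    cluster: str   # cluster name from the agent configuration (user-specified)
    datatype: str  # expected data format, given when the Adaptor is started
    seq_id: int    # stream offset just past the last byte of this Chunk
    name: str      # name of the data source, chosen by the Adaptor
    data: bytes    # the payload bytes


# All values below are invented for the example.
chunk = Chunk(
    source="host1.example.com",
    cluster="demo",
    datatype="SysLog",
    seq_id=100,
    name="/var/log/messages",
    data=b"x" * 100,
)
```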
| |
| Conceptually, each Adaptor emits a semi-infinite stream of bytes, numbered |
| starting from zero. The sequence ID specifies how many bytes each Adaptor has |
| sent, including the current Chunk. So if an Adaptor emits a Chunk containing |
| the first 100 bytes from a file, the sequence ID of that Chunk will be 100, |
| and the Chunk holding the second hundred bytes will have sequence ID 200. This may seem a |
| little peculiar, but TCP sequence numbers count bytes in a similar way. |
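| |
| The arithmetic can be sketched in a few lines. The function name and its |
| parameters are hypothetical and chosen only for illustration; the sketch |
| computes a running total of bytes sent, which is what the sequence ID records. |

```python
from itertools import accumulate


def sequence_ids(chunk_sizes, initial_offset=0):
    """Return each chunk's sequence ID: bytes sent so far, including that chunk."""
    # accumulate(..., initial=...) yields the starting offset first, so drop it.
    return list(accumulate(chunk_sizes, initial=initial_offset))[1:]


ids = sequence_ids([100, 100, 50])
print(ids)  # [100, 200, 250]
```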
| |
| Adaptors need to take sequence ID as a parameter so that they can resume |
| correctly after a crash, and not send redundant data. When starting adaptors, |
| it's usually safe to specify 0 as an ID, but it's sometimes useful to specify |
| something else. For instance, a nonzero ID lets you tail only the second |
| half of a file. |
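| |
| A minimal sketch of this idea follows, assuming a plain file as the data |
| source. The function <read_chunks> and its parameters are hypothetical and |
| are not part of Chukwa; the sketch only shows how a starting offset determines |
| both where reading begins and which sequence IDs the emitted Chunks carry. |

```python
import os
import tempfile


def read_chunks(path, initial_offset=0, chunk_size=4):
    """Yield (sequence_id, data) pairs, starting initial_offset bytes into the file."""
    with open(path, "rb") as f:
        f.seek(initial_offset)
        seq_id = initial_offset
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            # The sequence ID counts every byte up to and including this chunk.
            seq_id += len(data)
            yield seq_id, data


# Write a 16-byte file, then read only its second half.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789abcdef")
    path = tmp.name

half = os.path.getsize(path) // 2
second_half = list(read_chunks(path, initial_offset=half))
os.remove(path)
print(second_half)  # [(12, b'89ab'), (16, b'cdef')]
```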
| |
| HBase Schema |
| |
| * Metrics |
| |
| The <chukwa> table stores time series data. |
| |
| ** Row Key |
| |
| *------*------*------------*------------* |
| | | Day | Metric MD5 | Source MD5 | |
| *------*------*------------*------------* |
| | Size | 2 | 6 | 6 | |
| *------*------*------------*------------* |
| |
| The row key is 14 bytes long. The first 2 bytes are the day of the year. |
| The next 6 bytes are the MD5 signature of the metric name. The last 6 bytes are |
| the MD5 signature of the data source. This arrangement helps Chukwa |
| partition data evenly across regions based on time. |
| |
| It also provides a condensed store for data from the same day |
| and the same source. |
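| |
| The layout can be sketched as follows. Several details here are assumptions |
| made for illustration and are not confirmed by this document: the day is |
| packed as a big-endian unsigned 16-bit integer, each 6-byte signature is the |
| first 6 bytes of the MD5 digest, and names are encoded as UTF-8. The function |
| name and the sample metric and host names are hypothetical. |

```python
import hashlib
import struct
from datetime import date


def row_key(day, metric_name, source):
    """Build a 14-byte key: 2-byte day of year + 6-byte metric MD5 + 6-byte source MD5."""
    # Assumption: big-endian unsigned short for the day of the year.
    day_bytes = struct.pack(">H", day.timetuple().tm_yday)
    # Assumption: the 6-byte signature is the leading 6 bytes of the MD5 digest.
    metric_md5 = hashlib.md5(metric_name.encode("utf-8")).digest()[:6]
    source_md5 = hashlib.md5(source.encode("utf-8")).digest()[:6]
    return day_bytes + metric_md5 + source_md5


key = row_key(date(2015, 2, 1), "SystemMetrics.cpu.user", "host1.example.com")
print(len(key))  # 14
```

| Under these assumptions, keys for the same day and metric share their first |
| 8 bytes and differ only in the source portion. |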
| |
| ** Column Family |
| |
| The column families of the <chukwa> table are: |
| |
| *---------------*-----------------------------------------------------------------: |
| | Column Family | Description | |
| *---------------*-----------------------------------------------------------------: |
| | t | Time series data. Column name is timestamp. Value is a string | |
| *---------------*-----------------------------------------------------------------: |
| | a | Annotation, string tags associated with time series data. | |
| *---------------*-----------------------------------------------------------------: |
| |
| * Metadata |
| |
| The <chukwa_metadata> table is designed to store point lookup data, for example a small |
| amount of data describing the metric name mapping for the <chukwa> table. It is also used to |
| store JSON blobs of dashboard data. |
| |
| ** Row Key |
| |
| *----------------*------------------------------------------------------------------: |
| | Row Key | Description | |
| *----------------*------------------------------------------------------------------: |
| [Metrics Group]| Metrics group name. Loading this row key fetches all metric | |
| | names in the group. | |
| *----------------*------------------------------------------------------------------: |
| | chart_meta | All charts are stored in this row. | |
| *----------------*------------------------------------------------------------------: |
| dashboard_meta | All dashboards are stored in this row. | |
| *----------------*------------------------------------------------------------------: |
| | widget_meta | All widgets are stored in this row. | |
| *----------------*------------------------------------------------------------------: |
| |
| ** Special Row |
| |
| *----------------*------------------------------------------------------------------: |
| | chart_meta | Cell contains the rendering option and metric series name in | |
| | | a JSON blob | |
| *----------------*------------------------------------------------------------------: |
| | dashboard_meta | Cell describes one dashboard view | |
| *----------------*------------------------------------------------------------------: |
| | widget_meta | Cell describes title and URL of a dashboard widget | |
| *----------------*------------------------------------------------------------------: |
| |
| ** Column Family |
| |
| *---------------*-------------------------------------------------------------------: |
| | Column Family | Description | |
| *---------------*-------------------------------------------------------------------: |
| k | Key. Each entry has a fixed structure describing the key type | |
| | and the MD5 signature of the key used in the chukwa table. | |
*---------------*-------------------------------------------------------------------: |
| c | Column family for storing JSON blobs for special rows. It is | |
| | used to store dashboard, chart, and widget metadata. | |
| *---------------*-------------------------------------------------------------------: |
| |
| The k column family currently supports the following key types: |
| |
| *----------*----------------------------------------------------: |
| | Type | Description | |
| *----------*----------------------------------------------------: |
| | metric | This key is a metric name. | |
| *----------*----------------------------------------------------: |
| | source | This key is a source name. | |
| *----------*----------------------------------------------------: |