Available since Apache Kylin v2.3.0
To better support self-monitoring, a set of system Cubes are created under a system project named “KYLIN_SYSTEM”. Currently there are five Cubes: three for query metrics (“METRICS_QUERY”, “METRICS_QUERY_CUBE”, “METRICS_QUERY_RPC”) and two for job metrics (“METRICS_JOB”, “METRICS_JOB_EXCEPTION”).
Create a configuration file named SCSinkTools.json in the KYLIN_HOME directory.
For example:
[
  [
    "org.apache.kylin.tool.metrics.systemcube.util.HiveSinkTool",
    {
      "storage_type": 2,
      "cube_desc_override_properties": [
        "java.util.HashMap",
        {
          "kylin.cube.algorithm": "INMEM",
          "kylin.cube.max-building-segments": "1"
        }
      ]
    }
  ]
]
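The nested-array layout above is Jackson's wrapper-array encoding of polymorphic types: each inner array pairs a sink class name with its property map, and cube_desc_override_properties likewise pairs the concrete map class with its entries. As a minimal sketch (reusing the example configuration verbatim), the structure can be read back in Python:

```python
import json

# The example SCSinkTools.json content from above. Each sink is encoded as
# a [class name, properties] pair (Jackson's wrapper-array style).
config_text = """
[
  [
    "org.apache.kylin.tool.metrics.systemcube.util.HiveSinkTool",
    {
      "storage_type": 2,
      "cube_desc_override_properties": [
        "java.util.HashMap",
        {
          "kylin.cube.algorithm": "INMEM",
          "kylin.cube.max-building-segments": "1"
        }
      ]
    }
  ]
]
"""

for class_name, props in json.loads(config_text):
    # The override list is itself a [class name, entries] pair; index 1
    # holds the actual property map.
    overrides = props["cube_desc_override_properties"][1]
    print(class_name)
    print(props["storage_type"], overrides)
```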
Run the following command in KYLIN_HOME folder to generate related metadata:
./bin/kylin.sh org.apache.kylin.tool.metrics.systemcube.SCCreator \
-inputConfig SCSinkTools.json \
-output <output_folder>
This command generates the related metadata under the directory <output_folder> (in this guide, system_cube is used as the <output_folder>).
Run the following command to create the source Hive tables:
hive -f <output_folder>/create_hive_tables_for_system_cubes.sql
This command creates the related Hive tables.
Then upload the metadata to HBase with the following command:
./bin/metastore.sh restore <output_folder>
Finally, reload the metadata in the Kylin web UI.
A set of system Cubes will then appear under the system project “KYLIN_SYSTEM”.
Once the system Cubes are created, they need to be built regularly.
Create a shell script that builds a system Cube by calling org.apache.kylin.tool.job.CubeBuildingCLI.
For example:
{% highlight Groff markup %}
#!/bin/bash

dir=$(dirname ${0})
export KYLIN_HOME=${dir}/../

CUBE=$1
INTERVAL=$2
DELAY=$3

CURRENT_TIME_IN_SECOND=`date +%s`
CURRENT_TIME=$((CURRENT_TIME_IN_SECOND * 1000))
END_TIME=$((CURRENT_TIME - DELAY))
END=$((END_TIME - END_TIME % INTERVAL))

ID="$END"
echo "building for ${CUBE}_${ID}" >> ${KYLIN_HOME}/logs/build_trace.log
sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube ${CUBE} --endTime ${END} > ${KYLIN_HOME}/logs/system_cube_${CUBE}_${END}.log 2>&1 &
{% endhighlight %}
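The script computes END by stepping back DELAY milliseconds from the current time and then rounding down to a multiple of INTERVAL, so every build ends exactly on an interval boundary. Here is a minimal sketch of the same arithmetic in Python, using hypothetical fixed values in place of `date +%s`:

```python
# Hypothetical sample values mirroring the script's arguments:
# a 1-hour interval and a 20-minute delay, both in milliseconds.
INTERVAL = 3600000             # $2 in the script
DELAY = 1200000                # $3 in the script
CURRENT_TIME = 1500000000000   # fixed sample epoch millis instead of `date +%s`

end_time = CURRENT_TIME - DELAY        # step back by the delay
end = end_time - end_time % INTERVAL   # round down to an interval boundary

print(end)             # 1499997600000, an exact multiple of INTERVAL
print(end % INTERVAL)  # 0: aligned to the interval boundary
```

The rounding guarantees that consecutive runs produce adjacent, non-overlapping segments even if the script is triggered at slightly irregular times.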
Then run this shell script regularly. For example, add cron jobs as follows:
{% highlight Groff markup %}
0 */2 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_QA 3600000 1200000
20 */2 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_CUBE_QA 3600000 1200000
40 */4 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_RPC_QA 3600000 1200000
30 */4 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_JOB_QA 3600000 1200000
50 */12 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_JOB_EXCEPTION_QA 3600000 1200000
{% endhighlight %}
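Note that the minute fields above (0, 20, 40, 30, 50) are staggered so that no two system Cube builds start at the same minute. A small sketch (a simplified check, not a full cron parser) that reads the minute field of each entry:

```python
# The five cron entries from above, reduced to schedule + cube name.
entries = [
    "0 */2 * * * KYLIN_HIVE_METRICS_QUERY_QA",
    "20 */2 * * * KYLIN_HIVE_METRICS_QUERY_CUBE_QA",
    "40 */4 * * * KYLIN_HIVE_METRICS_QUERY_RPC_QA",
    "30 */4 * * * KYLIN_HIVE_METRICS_JOB_QA",
    "50 */12 * * * KYLIN_HIVE_METRICS_JOB_EXCEPTION_QA",
]

# The first field of a cron line is the minute; only that field is read here.
minutes = [int(line.split()[0]) for line in entries]
print(minutes)  # [0, 20, 40, 30, 50]
assert len(set(minutes)) == len(minutes), "two builds would start at the same minute"
```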
For all of these Cubes, admins can query at four time granularities, from the highest level to the lowest, as follows:
This Cube collects query metrics at the highest level. The details are as follows:
This Cube collects query metrics at the lowest level. For a query, the related aggregation and filtering can be pushed down to each RPC target server; the robustness of the RPC target servers is the foundation for serving queries well. The details are as follows:
This Cube collects query metrics at the Cube level. The most important metrics are cuboid-related, and they serve the Cube planner. The details are as follows:
In Kylin, there are mainly three types of jobs:
This Cube collects job metrics. The details are as follows:
This Cube collects job exception metrics. The details are as follows: