This is not only a step-by-step guide; it also provides background knowledge and explanations. If you are a developer who does any of the following things, this guide is for you. To be specific, this guide will help you to:
Read the background part.
to be updated
Folder Name | Comment |
---|---|
build | Scripts for building, packaging, running Kylin |
dev-support | Scripts and guides for contributors to develop/debug/test, for committers to release/publish website |
kystudio | Frontend source code, mainly using Vue.js |
src | Backend source code, mainly using Java & Scala, managed by Maven |
pom.xml | Project definition |
README.md | General guide to Kylin 5 project |
LICENSE | A mandatory file required by the ASF |
NOTICE | A mandatory file required by the ASF |
Module Name | Brief Description | Tags |
---|---|---|
Core Common | todo | Core |
Core Metadata | todo | Core |
Core Metrics | todo | Core |
Core Job | todo | Core |
Core Storage | todo | Core |
Query Common | todo | Core |
Local Data Cache | Improve query performance by caching parquet files in spark executor's disk/memory | Add-on |
Spark Common | todo | |
Query Engine Spark | todo | |
Hive Source | todo | |
Build Engine | todo | Core |
Distributed Lock Extension | todo | Add-on |
Build Engine Spark | todo | |
Query | Translates SQL text into logical/physical plans and optimizes them using Apache Calcite. | Core |
Streaming SDK | Not ready. Used to parse Kafka messages in a custom way. | Add-on, Not-Ready-Module |
Streaming | Not ready. Makes Apache Kafka a data source for Kylin 5. | Not-Ready-Module |
Tool | Various tools for metadata backup, diagnosis, etc. | Tool |
Common Service | todo | |
Datasource Service | todo | |
Modeling Service | todo | |
Data Loading Service | todo | |
Query Service | todo | |
Common Server | todo | |
Job Service | todo | |
Streaming Service | todo | Not-Ready-Module |
Data Loading Server | todo | |
Query Server | todo | |
Metadata Server | todo | |
REST Server | Main entry point of the Kylin process, including Spring config files. | Spring |
Datasource SDK | Not ready. A framework for adding data sources to Kylin 5. | Add-on, Not-Ready-Module |
JDBC Source | Not ready. Makes some RDBMSs a data source for Kylin 5. | Not-Ready-Module |
Integration Test | Major code for Integration Test | Testing |
Integration Test Spark | Some code for Integration Test | Testing |
Source Assembly | Used to create the jars for the build engine in the spark-submit command. | Build |
Integration Test Server | Some code for Integration Test | Testing |
Data loading Booter | For micro-service deployment such as k8s. Processes build/refresh index/segment requests. | Micro-service |
Query Booter | For micro-service deployment such as k8s. Processes query requests. | Micro-service |
Common Booter | For micro-service deployment such as k8s. Processes metadata CRUD requests. | Micro-service |
JDBC Driver | Connect to Kylin using JDBC, for SQL clients or BI tools | Tool |
Component | Version | Comment/Link |
---|---|---|
JDK | JDK8 | n/a |
Apache Maven | 3.5+ | n/a |
IntelliJ IDEA | IntelliJ IDEA 2023.2.2 (Community Edition) | n/a |
Docker Desktop (for Mac) | 4.22.1 (118664) | n/a |
NodeJs | latest | https://nodejs.org/en/download/ |
nvm | latest | https://github.com/nvm-sh/nvm |
After installing nvm, please run nvm install 12.14.0 to install the correct version of Node.js for kystudio.
I checked the software versions on my laptop (MacBook Pro, 15-inch, 2018) using the following commands. You can try them yourself.
```
(base) xiaoxiang.yu@XXYU-MBP ~ % uname -a
Darwin XXYU-MBP.local 22.6.0 Darwin Kernel Version 22.6.0: Wed Jul 5 22:21:56 PDT 2023; root:xnu-8796.141.3~6/RELEASE_X86_64 x86_64
(base) xiaoxiang.yu@XXYU-MBP ~ % java -version
java version "1.8.0_301"
Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)
(base) xiaoxiang.yu@XXYU-MBP ~ % mvn -version
Apache Maven 3.8.2 (ea98e05a04480131370aa0c110b8c54cf726c06f)
Maven home: /Users/xiaoxiang.yu/LacusDir/lib/apache-maven-3.8.2
Java version: 1.8.0_301, vendor: Oracle Corporation, runtime: /Library/Java/JavaVirtualMachines/jdk1.8.0_301.jdk/Contents/Home/jre
Default locale: en_CN, platform encoding: UTF-8
OS name: "mac os x", version: "10.16", arch: "x86_64", family: "mac"
(base) xiaoxiang.yu@XXYU-MBP ~ % nvm -v
0.39.1
(base) xiaoxiang.yu@XXYU-MBP ~ % nvm list
->     v12.14.0
       v16.16.0
         system
default -> 12.14.0 (-> v12.14.0)
(base) xiaoxiang.yu@XXYU-MBP ~ % node -v
v12.14.0
(base) xiaoxiang.yu@XXYU-MBP ~ % docker -v
Docker version 24.0.5, build ced0996
(base) xiaoxiang.yu@XXYU-MBP ~ % date
Wed Sep 20 11:15:45 CST 2023
```
Make sure these ports are available.
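To spot a conflict early, the sketch below reports whether a TCP port is already bound. It assumes `lsof` is available (true on macOS and most Linux distros), and the port numbers in the loop are illustrative only, not the guide's actual list.

```shell
# Hedged sketch: report whether a TCP port is already bound locally.
# lsof exits 0 when something is listening on the port.
port_in_use() {
  lsof -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
}

# Illustrative ports only; substitute the ports this guide requires.
for port in 7070 8080 3306; do
  if port_in_use "$port"; then
    echo "port $port: in use"
  else
    echo "port $port: free"
  fi
done
```

If a port is reported as in use, stop the conflicting process before starting the sandbox.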
```
kylin@worker-03:~$ docker -v
Docker version 20.10.17, build 100c701
kylin@worker-03:~$ uname -a
Linux worker-03 5.4.0-135-generic #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
kylin@worker-03:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:           47Gi       2.8Gi        37Gi       2.0Mi       8.0Gi        44Gi
Swap:          8.0Gi          0B       8.0Gi
```
Term | Comment |
---|---|
Build Engine | |
Job Engine | |
Query Engine | |
Metadata | |
This guide was verified using this verified tag on 2023-09-22 by xxyu, on macOS with the recommended software. Since future commits may change behavior or break something, you are advised to go through this guide using the commit that was verified.
git clone https://github.com/apache/kylin.git --single-branch --branch ide-run-2023 demo-kylin5-local-run
export PROJECT_DIR=~/demo-kylin5-local-run
Attention:
- The root path of the source code is written as $PROJECT_DIR in the following doc.
- All following commands are executed in $PROJECT_DIR.
mvn clean install -DskipTests
Why do you need to run mvn install/package? The maven-shade-plugin is configured in $PROJECT_DIR/src/assembly, and the shaded jar it produces ($PROJECT_DIR/src/assembly/target/kylin-assembly-5.0.0-SNAPSHOT-job.jar) is required in a later step. This process may take 10-20 minutes.
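Before moving on, it can help to confirm that the shaded job jar actually exists. The sketch below is a minimal helper, using the jar path quoted in this guide (your SNAPSHOT version may differ); the demo runs against a temporary stand-in file rather than a real build output.

```shell
# Hedged sketch: fail fast if the shaded job jar is missing after the build.
check_job_jar() {
  if [ -f "$1" ]; then
    echo "job jar present"
  else
    echo "job jar missing"
    return 1
  fi
}

# Real usage (path from the guide; adjust the version to your checkout):
#   check_job_jar "$PROJECT_DIR/src/assembly/target/kylin-assembly-5.0.0-SNAPSHOT-job.jar"

# Deterministic demo against a temporary stand-in file:
tmp_jar=$(mktemp)
check_job_jar "$tmp_jar"   # prints "job jar present"
rm -f "$tmp_jar"
```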
```
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache Kylin 5 5.0.0-SNAPSHOT:
[INFO]
[INFO] Apache Kylin 5 ..................................... SUCCESS [ 19.699 s]
[INFO] Kylin - Core Common ................................ SUCCESS [02:10 min]
[INFO] Kylin - Core Metadata .............................. SUCCESS [01:45 min]
[INFO] Kylin - Core Metrics ............................... SUCCESS [ 27.640 s]
[INFO] Kylin - Core Job ................................... SUCCESS [ 27.562 s]
[INFO] Kylin - Core Storage ............................... SUCCESS [  5.791 s]
[INFO] Kylin - Query Common ............................... SUCCESS [ 17.910 s]
[INFO] Kylin - Local Data Cache ........................... SUCCESS [ 33.482 s]
[INFO] Kylin - Spark Common ............................... SUCCESS [01:14 min]
[INFO] Kylin - Query Engine Spark ......................... SUCCESS [02:04 min]
[INFO] Kylin - Hive Source ................................ SUCCESS [ 11.103 s]
[INFO] Kylin - Build Engine ............................... SUCCESS [  3.453 s]
[INFO] Kylin - Distributed Lock Extension ................. SUCCESS [  6.703 s]
[INFO] Kylin - Build Engine Spark ......................... SUCCESS [01:45 min]
[INFO] Kylin - Query ...................................... SUCCESS [ 34.497 s]
[INFO] Kylin - Streaming SDK .............................. SUCCESS [  3.539 s]
[INFO] Kylin - Streaming .................................. SUCCESS [ 57.014 s]
[INFO] Kylin - Tool ....................................... SUCCESS [ 28.052 s]
[INFO] Kylin - Common Service ............................. SUCCESS [ 26.044 s]
[INFO] Kylin - Datasource Service ......................... SUCCESS [ 26.970 s]
[INFO] Kylin - Modeling Service ........................... SUCCESS [ 29.045 s]
[INFO] Kylin - Data Loading Service ....................... SUCCESS [ 21.783 s]
[INFO] Kylin - Query Service .............................. SUCCESS [ 24.721 s]
[INFO] Kylin - Common Server .............................. SUCCESS [ 11.979 s]
[INFO] Kylin - Job Service ................................ SUCCESS [ 11.572 s]
[INFO] Kylin - Streaming Service .......................... SUCCESS [  8.576 s]
[INFO] Kylin - Data Loading Server ........................ SUCCESS [  9.981 s]
[INFO] Kylin - Query Server ............................... SUCCESS [ 22.769 s]
[INFO] Kylin - Metadata Server ............................ SUCCESS [ 12.105 s]
[INFO] Kylin - REST Server ................................ SUCCESS [ 16.878 s]
[INFO] Kylin - Datasource SDK ............................. SUCCESS [  6.534 s]
[INFO] Kylin - JDBC Source ................................ SUCCESS [ 16.812 s]
[INFO] Kylin - Integration Test ........................... SUCCESS [ 26.048 s]
[INFO] Kylin - Integration Test Spark ..................... SUCCESS [ 17.564 s]
[INFO] Kylin - Source Assembly ............................ SUCCESS [ 51.755 s]
[INFO] Kylin - Integration Test Server .................... SUCCESS [ 15.409 s]
[INFO] Kylin - Data loading Booter ........................ SUCCESS [  6.654 s]
[INFO] Kylin - Query Booter ............................... SUCCESS [  5.126 s]
[INFO] Kylin - Common Booter .............................. SUCCESS [  4.365 s]
[INFO] Kylin - JDBC Driver ................................ SUCCESS [  7.785 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 20:02 min
[INFO] Finished at: 2023-09-20T11:25:25+08:00
[INFO] ------------------------------------------------------------------------
```
cd kystudio
npm install
bash build/release/download-spark.sh

The result of ls -al build/spark/jars | wc -l should be 282. By default, SPARK_HOME points to $PROJECT_DIR/build/spark. You are free to move the spark directory to another place.
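As a sanity check on the download step, the sketch below counts entries in a jars directory. Note the guide's expected 282 comes from `ls -al | wc -l`, which also counts the `total`, `.` and `..` lines; `ls -1` here counts plain entries only, so the numbers differ slightly. The demo runs against a temporary directory rather than a real Spark distribution.

```shell
# Hedged sketch: count the entries in a jars directory.
count_jars() {
  ls -1 "$1" | wc -l | tr -d ' '
}

# Real usage:
#   count_jars "$PROJECT_DIR/build/spark/jars"

# Deterministic demo with a temporary directory:
d=$(mktemp -d)
touch "$d/a.jar" "$d/b.jar" "$d/c.jar"
count_jars "$d"   # prints 3
rm -rf "$d"
```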
(e.g. HashFunctionTest). Set DOCKER_HOST to the correct host/ip:

export DOCKER_HOST=ssh://kylin@worker-03
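A malformed DOCKER_HOST tends to fail later with confusing errors, so a quick format check can save time. This is a hedged sketch; the accepted schemes (ssh://, tcp://, unix://) are the ones Docker itself understands.

```shell
# Hedged sketch: sanity-check a DOCKER_HOST value before running docker compose.
check_docker_host() {
  case "$1" in
    ssh://*|tcp://*|unix://*) echo "ok: $1" ;;
    *) echo "unexpected DOCKER_HOST format: $1" ;;
  esac
}

check_docker_host "ssh://kylin@worker-03"   # prints "ok: ssh://kylin@worker-03"
```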
docker compose -f "${PROJECT_DIR}/dev-support/contributor/sandbox/docker-compose.yml" up
docker compose -f "${PROJECT_DIR}/dev-support/contributor/sandbox/docker-compose.yml" ps
docker exec -it mysql bash
mysql -uroot -proot
CREATE DATABASE IF NOT EXISTS kylin;
docker exec -it namenode bash
hadoop dfs -mkdir -p '/kylin/spark-history'
docker cp ${PROJECT_DIR}/src/examples/sample_cube/data datanode:/tmp/ssb
docker cp ${PROJECT_DIR}/src/examples/sample_cube/create_sample_ssb_tables.sql hiveserver:/tmp/
docker exec datanode bash -c "hdfs dfs -mkdir -p /tmp/sample_cube/data \
    && hdfs dfs -put /tmp/ssb/* /tmp/sample_cube/data/"
docker exec hiveserver bash -c "hive -e 'CREATE DATABASE IF NOT EXISTS SSB' \
    && hive --hivevar hdfs_tmp_dir=/tmp --database SSB -f /tmp/create_sample_ssb_tables.sql"
The official IntelliJ IDEA references are https://www.jetbrains.com/help/idea/run-debug-configuration.html and https://www.jetbrains.com/help/idea/run-debug-configuration-java-application.html#more_options.
KYLIN_HOME=$PROJECT_DIR
KYLIN_CONF=$PROJECT_DIR/dev-support/contributor/sandbox/conf
SPARK_HOME=/Users/xiaoxiang.yu/LacusDir/kyspark
HADOOP_CONF_DIR=$PROJECT_DIR/dev-support/contributor/sandbox/conf
HADOOP_USER_NAME=root
- Main class: org.apache.kylin.rest.BootstrapServer
- Classpath: module kylin-server, with the option INCLUDE_PROVIDED_SCOPE added
- Working directory: module kylin-server, or %MODULE_WORKING_DIR%
- Modify $PROJECT_DIR/src/server/src/resources/log4j2.xml to change the log level etc. Add -Dlog4j2.debug to the VM options if you need to debug.
$PROJECT_DIR/dev-support/contributor/sandbox/conf/kylin.properties
Class Name | Desc |
---|---|
KylinPrepareEnvListener | |
AppInitializer | |
BootstrapServer | |
KylinPropertySourceConfiguration |
docker compose -f "${PROJECT_DIR}/dev-support/contributor/sandbox/docker-compose.yml" down
rm -rf /Applications/IntelliJ\ IDEA\ CE.app
rm -rf ~/Library/Application\ Support/JetBrains/IdeaIC*
rm -rf ~/Library/Caches/JetBrains/IdeaIC*
rm -rf ~/.m2/repository/org/apache/kylin
rm -rf $PROJECT_DIR
// todo
This is usually caused by Hive.
```
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
	at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3904) ~[hive-exec-2.3.9.jar:2.3.9]
```
Check whether spark.sql.hive.metastore.jars points to the correct path in $PROJECT_DIR/dev-support/contributor/sandbox/conf/kylin.properties.
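One way to verify is to print the property and eyeball the path it points at. Below is a hedged sketch: the property key is the one mentioned in this guide, but the demo runs against a temporary stand-in file with a purely illustrative value, not the real sandbox config.

```shell
# Hedged sketch: print the metastore-jars setting from a properties file
# so you can confirm the path it references actually exists.
show_metastore_jars() {
  grep 'spark.sql.hive.metastore.jars' "$1" || echo "property not set"
}

# Real usage:
#   show_metastore_jars "$PROJECT_DIR/dev-support/contributor/sandbox/conf/kylin.properties"

# Deterministic demo with a stand-in properties file (illustrative value):
f=$(mktemp)
printf 'spark.sql.hive.metastore.jars=/opt/hive/lib/*\n' > "$f"
show_metastore_jars "$f"   # prints the property line
rm -f "$f"
```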
When you submit multiple jobs at the same time, your PC might suffer from poor performance.
```
ERROR [local_run] [http-nio-7070-exec-9] execution.NExecutableManager : get sample data from hdfs log file [/kylin/kylin_metadata/local_run/job_tmp/4edaf665-430a-8cc0-620e-b9f7ae2458bc-98cbd8b9-fee2-44d4-0383-b8b603348e94/01//execute_output.json.1695364459595.log] failed!
org.apache.hadoop.hdfs.CannotObtainBlockLengthException: Cannot obtain block length for LocatedBlock{BP-258157769-172.22.0.6-1695103912779:blk_1073742602_1778; getBlockSize()=26457; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[172.22.0.3:50010,DS-ca2d8134-89bd-49d1-af55-680cea736111,DISK]]} of /kylin/kylin_metadata/local_run/job_tmp/4edaf665-430a-8cc0-620e-b9f7ae2458bc-98cbd8b9-fee2-44d4-0383-b8b603348e94/01/execute_output.json.1695364459595.log
	at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:470) ~[hadoop-hdfs-client-2.10.1.jar:?]
```
todo
```
Caused by: java.io.InvalidClassException: org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat; local class incompatible: stream classdesc serialVersionUID = 8961733539262042287, local class serialVersionUID = -27198871445502271
```
You are using different versions of Spark; unify them (1. SPARK_HOME 2. spark.version in $PROJECT_DIR/pom.xml).
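To compare the two versions, you can extract `<spark.version>` from pom.xml and check it against the distribution under SPARK_HOME. A hedged sketch follows; the 3.2.0 value in the demo is illustrative, not Kylin's actual pinned version, and the demo runs against a stand-in pom fragment.

```shell
# Hedged sketch: read the first <spark.version> value out of a pom.xml.
pom_spark_version() {
  sed -n 's/.*<spark.version>\(.*\)<\/spark.version>.*/\1/p' "$1" | head -n 1
}

# Real usage:
#   pom_spark_version "$PROJECT_DIR/pom.xml"

# Deterministic demo with a stand-in pom fragment (illustrative version):
f=$(mktemp)
printf '<properties><spark.version>3.2.0</spark.version></properties>\n' > "$f"
pom_spark_version "$f"   # prints 3.2.0
rm -f "$f"
```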