| commit | a175622355136a742e2f835d25e262645abfdeb2 | |
|---|---|---|
| author | ZhouJinsong <zhoujinsong0505@163.com> | Thu Oct 20 17:26:24 2022 +0800 |
| committer | GitHub <noreply@github.com> | Thu Oct 20 17:26:24 2022 +0800 |
| tree | f289bdda1a235d1ec1a06a2768cc7d2d460e8aac | |
| parent | 755c87d8c7f1130cab8b1e7b01396df2b24d303a | |
Release 0.3.2 (#500) * release version 0.3.2
Welcome to Arctic. Arctic is a streaming lakehouse system open-sourced by NetEase. It adds real-time capabilities on top of Iceberg and Hive and provides out-of-the-box, stream-batch unified metadata services for DataOps, making data lakes more usable and practical.
Arctic is a streaming lakehouse service built on top of the Apache Iceberg table format. With Arctic, users can benefit from optimized CDC, streaming updates, and fresh OLAP on engines such as Flink, Spark, and Trino. Combined with the efficient offline processing capabilities of data lakes, Arctic can serve more scenarios where streaming and batch are fused. In addition, self-optimization, concurrent conflict resolution, and standard management tools reduce the burden of data lake management and optimization on users.
Arctic services are provided by deploying AMS (the Arctic meta service), which can be considered a replacement for HMS (Hive Metastore), or an HMS for Iceberg. Arctic uses Iceberg as its base table format, but instead of modifying the Iceberg implementation, it uses Iceberg as a library. Arctic's open overlay architecture helps large-scale offline data lakes upgrade quickly to real-time data lakes without compatibility concerns about the original data lakes, so that they can serve scenarios such as real-time analysis, real-time risk control, real-time training, and feature engineering.
Arctic contains the following modules:

- arctic-core contains core abstractions and common implementations for other modules
- arctic-flink is the module for integrating with Apache Flink (use arctic-flink-runtime for a shaded version)
- arctic-spark is the module for integrating with Apache Spark (use arctic-spark-runtime for a shaded version)
- arctic-trino provides query integration with Trino and is built on JDK 11
- arctic-optimizing exposes the optimizing container/group API and provides a default implementation
- arctic-ams is the Arctic meta service module
  - ams-api contains the AMS Thrift API
  - ams-dashboard is the dashboard frontend for AMS
  - ams-server is the backend server for AMS

Arctic is built using Maven with Java 1.8 and Java 11 (Java 11 is needed only for the Trino module).

Building the Trino module requires a toolchains.xml file in the ${user.home}/.m2/ directory with the following content:
<?xml version="1.0" encoding="UTF-8"?>
<toolchains>
  <toolchain>
    <type>jdk</type>
    <provides>
      <version>11</version>
      <vendor>sun</vendor>
    </provides>
    <configuration>
      <jdkHome>${yourJdk11Home}</jdkHome>
    </configuration>
  </toolchain>
</toolchains>
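
The toolchains file above can also be generated from a script rather than written by hand. The following is a minimal sketch, not part of Arctic or Maven: `write_toolchains` is a hypothetical helper name, and the JDK path passed to it is whatever your local JDK 11 installation happens to be. It writes the same XML shown above, substituting the JDK home, and overwrites any existing toolchains.xml in the target directory.

```shell
# Sketch: generate the Maven toolchains.xml described above.
# write_toolchains is a hypothetical helper, not an Arctic or Maven command.
# $1: path to a JDK 11 installation
# $2: target directory (optional, defaults to ~/.m2)
write_toolchains() {
  local jdk_home="$1"
  local m2_dir="${2:-$HOME/.m2}"

  if [ -z "$jdk_home" ]; then
    echo "usage: write_toolchains <jdk11-home> [m2-dir]" >&2
    return 1
  fi

  mkdir -p "$m2_dir" || return 1

  # The heredoc delimiter is unquoted so that ${jdk_home} is expanded.
  # Note: this overwrites an existing toolchains.xml in the target directory.
  cat > "$m2_dir/toolchains.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<toolchains>
  <toolchain>
    <type>jdk</type>
    <provides>
      <version>11</version>
      <vendor>sun</vendor>
    </provides>
    <configuration>
      <jdkHome>${jdk_home}</jdkHome>
    </configuration>
  </toolchain>
</toolchains>
EOF
}
```

For example, `write_toolchains /path/to/jdk-11` writes the file to `~/.m2/toolchains.xml`. If you already maintain a toolchains.xml with other entries, merge the `<toolchain>` element by hand instead of running the helper.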

- To build and run tests: `mvn package -P toolchain`
- To build without running tests: `mvn -DskipTests package -P toolchain`

Arctic supports multiple processing engines, as listed below:

| Processing Engine | Version |
|---|---|
| Flink | 1.12.x, 1.14.x and 1.15.x |
| Spark | 2.3, 3.1 |
| Trino | 380 |
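
The table above can be turned into a quick pre-flight check before deploying against a given engine. This is a minimal sketch under stated assumptions: `is_supported_engine` is a hypothetical helper, not an Arctic command, and it assumes that "2.3" and "3.1" for Spark cover any patch release in those lines, while Trino is matched only against the exact version 380.

```shell
# Sketch: check an engine name and version against the support table above.
# is_supported_engine is a hypothetical helper, not part of Arctic.
# $1: engine name (flink, spark, or trino; case-insensitive)
# $2: engine version string
is_supported_engine() {
  local engine version
  engine="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  version="$2"

  case "$engine:$version" in
    # Flink 1.12.x, 1.14.x and 1.15.x
    flink:1.12|flink:1.12.*|flink:1.14|flink:1.14.*|flink:1.15|flink:1.15.*)
      return 0 ;;
    # Spark 2.3 and 3.1 (assumption: any patch release of these lines)
    spark:2.3|spark:2.3.*|spark:3.1|spark:3.1.*)
      return 0 ;;
    # Trino 380 only
    trino:380)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

A call such as `is_supported_engine flink 1.14.3 && echo "supported"` returns success for versions listed in the table and a non-zero status otherwise.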
Visit https://arctic.netease.com/ch/docker-quickstart/ to quickly explore what Arctic can do.
If you are interested in lakehouses and data lake formats, you are welcome to join our community. We welcome organizations, teams, and individuals to grow with us, and we sincerely hope that open source helps users make better use of data lake formats.
Join the Arctic WeChat group: add "kllnn999" as a friend on WeChat and mention "Arctic lover".