import { InstallationEnvs } from '../components/TableData.jsx';
To successfully install StreamPark, the following environments need to be prepared:

<InstallationEnvs />
You can download the latest version of StreamPark directly from the official website. The version used in this document is 2.1.5, available at: https://streampark.apache.org/download
After the download is complete, extract the package:
```shell
# Extract the StreamPark installation package
tar -zxvf apache-streampark_2.12-2.1.5-incubating-bin.tar.gz
```
The extracted directory will look like this:
```
├── bin
│    ├── startup.sh                  // Startup script
│    ├── shutdown.sh                 // Shutdown script
│    └── ......
├── conf
│    ├── config.yaml                 // Project configuration file
│    └── logback-spring.xml          // Logging configuration file
├── lib
│    └── *.jar                       // Project JAR files
├── logs                             // Log directory
├── script
│    ├── data
│    │    ├── mysql-data.sql         // MySQL initialization data
│    │    └── pgsql-data.sql         // PostgreSQL initialization data
│    ├── schema
│    │    ├── mysql-schema.sql       // MySQL DDL (table creation) script
│    │    └── pgsql-schema.sql       // PostgreSQL DDL (table creation) script
│    └── upgrade
│         ├── mysql
│         │    ├── 1.2.3.sql         // SQL script for upgrading to version 1.2.3
│         │    ├── 2.0.0.sql         // SQL script for upgrading to version 2.0.0
│         │    └── ......
│         └── pgsql
│              └── ......
└── temp                             // Temporary directory for internal use, do not delete
```
Navigate to the bin directory under the installation directory and start the program:
```shell
# Go to the bin directory under the installation directory
cd bin
# Start the program
./startup.sh
```
:::danger Warning
After starting the program, you may encounter the following error:

```
streampark.workspace.local: "/tmp/streampark" is an invalid path, please reconfigure in xxx/conf/config.yaml
```
:::
This is because the local workspace directory for StreamPark does not exist or is not set to a valid path. The solution is simple: you can use the default configuration and create the streampark directory under /tmp:
```shell
mkdir -p /tmp/streampark
```
Alternatively, configure a valid local path in the streampark.workspace.local property in the conf/config.yaml file.
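For example, the relevant section of conf/config.yaml would look like the following sketch (the path shown is illustrative; any writable directory on the local machine works):

```yaml
streampark:
  workspace:
    # Local workspace directory; it must exist and be writable by the StreamPark process
    local: /opt/streampark_workspace   # illustrative path, adjust to your environment
```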
Run startup.sh again, and the program should start successfully.
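To confirm from the terminal, you can follow the log output (a sketch; the exact log file name under logs/ is an assumption and may differ by version):

```shell
# Follow the service log; the file name "streampark.out" is an assumption
tail -f logs/streampark.out
```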
Visit StreamPark at: http://127.0.0.1:10000
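If the page does not load, you can first check that the port responds from a shell (assuming curl is available):

```shell
# Print the HTTP status code returned by the StreamPark web UI
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:10000
```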
:::tip Tip
Username and password: admin / streampark
:::
Log in with these credentials, and you should see Login successful!
At this point, StreamPark is up and running. The installation above uses the default local H2 database. Next, let's look at how to integrate an external database (MySQL or PostgreSQL).
If you want to use an external database, you need to modify the conf/config.yaml file in the installation directory. The core modification is as follows (using MySQL as an example):
```yaml
datasource:
  dialect: mysql                       # Change this to mysql; the default is h2, supports mysql and pgsql
  h2-data-dir: ~/streampark/h2-data    # Only used when dialect is h2; for mysql/pgsql, fill in the connection info below instead
  username:                            # Username for the data source connection
  password:                            # Password for the data source connection
  url:                                 # JDBC connection URL, e.g., jdbc:mysql://localhost:3306/streampark?useUnicode=true&characterEncoding=UTF-8&useJDBCCompliantTimezoneShift=true&useLegacyDatetimeCode=false&serverTimezone=GMT%2B8
```
Next, connect to the external database manually and run the table creation script (location: installation_directory/script/schema/mysql-schema.sql), followed by the data initialization script (location: installation_directory/script/data/mysql-data.sql).
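With the mysql command-line client, this could look like the following sketch (host, user, and database name are assumptions; the database name streampark matches the example JDBC URL above):

```shell
# Create the target database; the name must match the one in your JDBC URL
mysql -h localhost -u root -p -e "CREATE DATABASE IF NOT EXISTS streampark DEFAULT CHARACTER SET utf8mb4;"
# Run the table creation (DDL) script, then the data initialization script
mysql -h localhost -u root -p streampark < script/schema/mysql-schema.sql
mysql -h localhost -u root -p streampark < script/data/mysql-data.sql
```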
Once the above steps are complete, start the service, and it will automatically connect and use the external database.
If you need to deploy jobs to YARN, you must configure the Hadoop environment variables. If you are using a Hadoop environment installed via CDH, the relevant environment variables can be configured as follows:
```shell
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop   # Hadoop installation directory
export HADOOP_CONF_DIR=/etc/hadoop/conf
export HIVE_HOME=$HADOOP_HOME/../hive
export HBASE_HOME=$HADOOP_HOME/../hbase
export HADOOP_HDFS_HOME=$HADOOP_HOME/../hadoop-hdfs
export HADOOP_MAPRED_HOME=$HADOOP_HOME/../hadoop-mapreduce
export HADOOP_YARN_HOME=$HADOOP_HOME/../hadoop-yarn
```
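A quick sanity check that these variables resolve correctly (a sketch; it assumes the Hadoop client binaries are present and HDFS is reachable):

```shell
# Print the Hadoop version using the configured HADOOP_HOME
$HADOOP_HOME/bin/hadoop version
# List the HDFS root to confirm connectivity
hdfs dfs -ls /
```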
:::tip Tip
StreamPark will automatically read the Hadoop configuration from the environment variables, connect to Hadoop, upload resources to HDFS, and deploy jobs to YARN.
:::
In addition, you may need to modify the conf/config.yaml file (for example, to specify the Hadoop user or Kerberos authentication settings). The core modifications are as follows:
```yaml
streampark:
  workspace:
    # Root path for storing resources in HDFS
    remote: hdfs:///streampark/
  proxy:
    # Hadoop YARN proxy URL, e.g., a Knox process address: https://streampark.com:8443/proxy/yarn
    yarn-url:
  yarn:
    # Authentication type, default is simple; also supports kerberos
    http-auth: 'simple'
    # HADOOP_USER_NAME for Flink on YARN or Spark on YARN
    hadoop-user-name: hdfs

# If Kerberos authentication is enabled, configure the following:
security:
  kerberos:
    login:
      debug: false
      enable: false
      keytab:
      krb5:
      principal:
```
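For reference, a filled-in Kerberos section might look like the following (all values are placeholders; substitute your own keytab, krb5.conf, and principal):

```yaml
security:
  kerberos:
    login:
      debug: false
      enable: true                                      # turn Kerberos login on
      keytab: /etc/security/keytabs/streampark.keytab   # placeholder keytab path
      krb5: /etc/krb5.conf                              # placeholder krb5 configuration path
      principal: streampark@EXAMPLE.COM                 # placeholder principal
```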
Finally, here is a detailed reference for the config.yaml configuration, which also helps when integrating features such as SSO or LDAP:
```yaml
########################################### Logging Configuration ###########################################
logging:
  level:
    root: info

########################################### Basic Service Configuration ###########################################
server:
  port: 10000
  session:
    ttl: 2h                  # User login session expiration time; users are logged out automatically after this period
  undertow:                  # Undertow server settings
    buffer-size: 1024
    direct-buffers: true
    threads:
      io: 16
      worker: 256

########################################### Database Configuration ###########################################
datasource:
  dialect: h2                          # Database dialect, supports h2, mysql, pgsql
  h2-data-dir: ~/streampark/h2-data    # Only used when dialect is h2
  username:
  password:
  url:                                 # Database connection URL, e.g., jdbc:mysql://localhost:3306/streampark?......

########################################### Project Configuration ###########################################
streampark:
  ## Workspace configuration: local and HDFS workspaces for different types of resources
  workspace:
    local: /tmp/streampark
    remote: hdfs:///streampark/
  ## Proxy settings
  proxy:
    lark-url:                # Feishu (Lark) proxy URL
    yarn-url:                # YARN proxy URL
  ## YARN-related configuration
  yarn:
    http-auth: 'simple'      # Authentication method
    hadoop-user-name: hdfs   # Hadoop username
  ## Project management settings
  project:
    max-build: 16            # Maximum number of concurrent project builds
  ## Development interface settings
  openapi:
    white-list:              # Whitelist for the open APIs

########################################### Kerberos Authentication Configuration ###########################################
security:
  kerberos:
    login:
      debug: false
      enable: false
      keytab:
      krb5:
      principal:
    ttl: 2h
```
We look forward to your further exploration!