OpenTelemetry (OTel) is a vendor-neutral, open source observability framework for instrumenting, generating, collecting, and exporting telemetry data such as traces, metrics, and logs. OpenTelemetry defines a set of standards and protocols for observability that are widely adopted by the observability community and vendors, and it has gradually become the de facto standard in the field.
A major goal of OpenTelemetry is to enable easy instrumentation of your applications and systems, regardless of programming language, infrastructure, or runtime environment. The backend (storage) and frontend (visualization) of telemetry data are intentionally left to other tools. Doris, as a storage backend integrated with OpenTelemetry, provides high-performance, low-cost, unified observability data storage and analysis capabilities. The overall architecture is as follows:
Download the OpenTelemetry Collector Contrib installation package from the OpenTelemetry official Release page, for example:
https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.132.2/otelcol-contrib_0.132.2_linux_amd64.tar.gz
Extract the package to obtain the otelcol-contrib executable file.
The core configuration of the OpenTelemetry Collector Doris Exporter is as follows:
| Configuration | Description |
|---|---|
| endpoint | Doris FE HTTP address in the format host:port, e.g., "127.0.0.1:8030" |
| mysql_endpoint | Doris FE MySQL address in the format host:port, e.g., "127.0.0.1:9030" |
| username | Doris username with write permissions for the corresponding database tables |
| password | Password for the Doris user |
| database | Target Doris database name |
| table.logs | Doris table name for logs data (default: otel_logs) |
| table.traces | Doris table name for traces data (default: otel_traces) |
| table.metrics | Doris table name for metrics data (default: otel_metrics) |
| create_schema | Whether to automatically create Doris database tables (default: true) |
| history_days | Retention days for historical data in auto-created tables (default: 0, meaning permanent retention) |
| create_history_days | Initial partition days for auto-created tables (default: 0, meaning no partitions created) |
| label_prefix | Doris Stream Load Label prefix. The final label is {label_prefix}_{db}_{table}_{yyyymmdd_hhmmss}_{uuid} (default: open_telemetry) |
| headers | Headers for Doris Stream Load in YAML map format |
| log_progress_interval | Interval (in seconds) for logging throughput (default: 10, set to 0 to disable) |
For more configurations, refer to:
https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/dorisexporter
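To make the `label_prefix` setting concrete, the sketch below (plain Python for illustration, not the exporter's actual code) builds a Stream Load label following the documented format `{label_prefix}_{db}_{table}_{yyyymmdd_hhmmss}_{uuid}`:

```python
# Illustrative only: compose a Stream Load label in the documented format
# {label_prefix}_{db}_{table}_{yyyymmdd_hhmmss}_{uuid}.
import uuid
import re
from datetime import datetime

def make_label(label_prefix: str, db: str, table: str) -> str:
    ts = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{label_prefix}_{db}_{table}_{ts}_{uuid.uuid4()}"

# Default prefix is open_telemetry; db/table names are placeholders.
label = make_label("open_telemetry", "otel", "otel_logs")
print(label)
```

Because the label embeds a timestamp and a UUID, each Stream Load request gets a unique label, which is what Doris uses to deduplicate retried loads.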
This example demonstrates TEXT log collection using Doris FE logs.
1. Data
FE log files are typically located in the fe/log/fe.log file under the Doris installation directory. These are standard Java program logs, including fields such as timestamps, log levels, thread names, code locations, and log content. In addition to normal logs, there are also exception logs with stack traces. Since stack traces span multiple lines, the log collection and storage process must combine the main log and stack trace into a single log entry.
Example logs:
```
2024-07-08 21:18:01,432 INFO (Statistics Job Appender|61) [StatisticsJobAppender.runAfterCatalogReady():70] Stats table not available, skip
2024-07-08 21:18:53,710 WARN (STATS_FETCH-0|208) [StmtExecutor.executeInternalQuery():3332] Failed to run internal SQL: OriginStatement{originStmt='SELECT * FROM __internal_schema.column_statistics WHERE part_id is NULL ORDER BY update_time DESC LIMIT 500000', idx=0}
org.apache.doris.common.UserException: errCode = 2, detailMessage = tablet 10031 has no queryable replicas. err: replica 10032's backend 10008 does not exist or not alive
	at org.apache.doris.planner.OlapScanNode.addScanRangeLocations(OlapScanNode.java:931) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.planner.OlapScanNode.computeTabletInfo(OlapScanNode.java:1197) ~[doris-fe.jar:1.2-SNAPSHOT]
```
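A quick way to sanity-check the two regular expressions used in the `filelog` receiver configuration: the multiline start pattern decides whether a line begins a new log entry, and the parser regex splits an entry into time, severity, and message. This is plain Python for verification only, not collector code:

```python
# Verify the multiline start pattern and parser regex against sample FE logs.
import re

# Same patterns as in the filelog receiver configuration.
LINE_START = re.compile(r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}')
PARSER = re.compile(
    r'^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) '
    r'(?P<severity>INFO|WARN|ERROR) (?P<message>.*)'
)

main_line = ("2024-07-08 21:18:01,432 INFO (Statistics Job Appender|61) "
             "[StatisticsJobAppender.runAfterCatalogReady():70] Stats table not available, skip")
stack_line = "    at org.apache.doris.planner.OlapScanNode.addScanRangeLocations(OlapScanNode.java:931)"

# The main log line starts a new entry; the stack-trace line does not,
# so the receiver merges it into the preceding entry.
print(bool(LINE_START.match(main_line)))   # True
print(bool(LINE_START.match(stack_line)))  # False

m = PARSER.match(main_line)
print(m.group("time"), m.group("severity"))
```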
2. OpenTelemetry Configuration
The log collection configuration file (e.g., opentelemetry_java_log.yml) consists of three main parts, corresponding to the ETL process:
Example configuration (opentelemetry_java_log.yml):
```yaml
# 1. Receivers – Read raw data
# The `filelog` receiver reads log files from the local filesystem.
# The `multiline` configuration merges stack traces with the main log by matching timestamps.
receivers:
  filelog:
    include:
      - /path/to/fe.log
    start_at: beginning
    multiline:
      # Match a leading timestamp as the start of a new log entry
      line_start_pattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}'
    operators:
      # Parse logs
      - type: regex_parser
        regex: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (?P<severity>INFO|WARN|ERROR) (?P<message>.*)'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S,%f'
        severity:
          parse_from: attributes.severity
          mapping:
            trace: TRACE
            debug: DEBUG
            info: INFO
            warn: WARN
            error: ERROR
            fatal: FATAL

# 2. Processors – Transform data
# The `batch` processor groups logs before sending.
processors:
  batch:
    send_batch_size: 100000  # Number of logs per batch (recommended batch size: 100MB–1GB)
    timeout: 10s

# 3. Exporters – Output data
# The `doris` exporter sends data to Doris using the Stream Load HTTP interface (JSON format by default).
exporters:
  doris:
    endpoint: http://localhost:8030  # FE HTTP address
    mysql_endpoint: localhost:9030   # FE MySQL address
    database: doris_db_name
    username: doris_username
    password: doris_password
    table:
      logs: otel_logs
    create_schema: true  # Automatically create schema (if false, tables must be created manually)
    history_days: 10
    create_history_days: 10
    timezone: Asia/Shanghai
    timeout: 60s  # HTTP Stream Load client timeout
    log_response: true
    sending_queue:
      enabled: true
      num_consumers: 20
      queue_size: 1000
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
    headers:
      load_to_single_tablet: "true"

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]
      exporters: [doris]
```
3. Running OpenTelemetry
```shell
./otelcol-contrib --config config/opentelemetry_java_log.yml
```

When `log_response` is true, the collector logs display the Stream Load request/response details:

```
2025-08-18T00:33:22.543+0800 info dorisexporter@v0.132.0/exporter_logs.go:181 log response: {
    "TxnId": 52,
    "Label": "open_telemetry_otel_otel_logs_20250818003321_498bb8ec-040c-4982-9eb4-452b15129782",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 50355,
    "NumberLoadedRows": 50355,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 31130235,
    "LoadTimeMs": 680,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 3,
    "ReadDataTimeMs": 106,
    "WriteDataTimeMs": 653,
    "ReceiveDataTimeMs": 11,
    "CommitAndPublishTimeMs": 23
}
```

Progress logs (every 10 seconds by default) show throughput:

```
2025-08-18T00:05:00.017+0800 info dorisexporter@v0.132.0/progress_reporter.go:63 [LOG] total 11 MB 18978 ROWS, total speed 0 MB/s 632 R/s, last 10 seconds speed 1 MB/s 1897 R/s
```
This example demonstrates JSON log collection using GitHub Events Archive data.
1. Data
GitHub Events Archive contains archived user activity events in JSON format, downloadable from https://www.gharchive.org/. For example, to download data from January 1, 2024, at 15:00:
```shell
wget https://data.gharchive.org/2024-01-01-15.json.gz
```
Below is a sample entry (formatted for readability; actual data is one JSON object per line):
```json
{
  "id": "37066529221",
  "type": "PushEvent",
  "actor": {
    "id": 46139131,
    "login": "Bard89",
    "display_login": "Bard89",
    "gravatar_id": "",
    "url": "https://api.github.com/users/Bard89",
    "avatar_url": "https://avatars.githubusercontent.com/u/46139131?"
  },
  "repo": {
    "id": 780125623,
    "name": "Bard89/talk-to-me",
    "url": "https://api.github.com/repos/Bard89/talk-to-me"
  },
  "payload": {
    "repository_id": 780125623,
    "push_id": 17799451992,
    "size": 1,
    "distinct_size": 1,
    "ref": "refs/heads/add_mvcs",
    "head": "f03baa2de66f88f5f1754ce3fa30972667f87e81",
    "before": "85e6544ede4ae3f132fe2f5f1ce0ce35a3169d21"
  },
  "public": true,
  "created_at": "2024-04-01T23:00:00Z"
}
```
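A small check (plain Python, not collector code) that the strptime-style timestamp layout given to the `json_parser` operator, `'%Y-%m-%dT%H:%M:%SZ'`, actually matches the `created_at` field of a GitHub event:

```python
# Parse one archive line and its created_at timestamp using the same
# layout string that the json_parser operator is configured with.
import json
from datetime import datetime

# One JSON object per line, as in the GitHub Events Archive files.
event_line = '{"id": "37066529221", "type": "PushEvent", "public": true, "created_at": "2024-04-01T23:00:00Z"}'
event = json.loads(event_line)

ts = datetime.strptime(event["created_at"], "%Y-%m-%dT%H:%M:%SZ")
print(event["type"], ts.isoformat())  # PushEvent 2024-04-01T23:00:00
```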
2. OpenTelemetry Configuration
The key difference from the TEXT log configuration is the json_parser operator in the filelog receiver, which parses each line as JSON. Extracted fields are used for subsequent processing.
Example configuration (opentelemetry_json_log.yml):
```yaml
receivers:
  filelog:
    include:
      - /path/to/2024-01-01-15.json
    start_at: beginning
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.created_at
          layout: '%Y-%m-%dT%H:%M:%SZ'

processors:
  batch:
    send_batch_size: 100000  # Logs per batch (recommended: 100MB–1GB total size)
    timeout: 10s

exporters:
  doris:
    endpoint: http://localhost:8030  # FE HTTP address
    mysql_endpoint: localhost:9030   # FE MySQL address
    database: doris_db_name
    username: doris_username
    password: doris_password
    table:
      logs: otel_logs
    create_schema: true  # Auto-create schema (if false, manually create tables)
    history_days: 10
    create_history_days: 10
    timezone: Asia/Shanghai
    timeout: 60s  # HTTP Stream Load client timeout
    log_response: true
    sending_queue:
      enabled: true
      num_consumers: 20
      queue_size: 1000
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
    headers:
      load_to_single_tablet: "true"

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch]
      exporters: [doris]
```
3. Running OpenTelemetry
```shell
./otelcol-contrib --config config/opentelemetry_json_log.yml
```
1. OpenTelemetry Configuration
Create the configuration file otel_trace.yml as follows:
```yaml
receivers:
  otlp:  # OTLP protocol, receives data from the OpenTelemetry Java Agent
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    send_batch_size: 100000  # Number of traces per batch (recommended batch size: 100MB–1GB)
    timeout: 10s

exporters:
  doris:
    endpoint: http://localhost:8030  # FE HTTP address
    database: doris_db_name
    username: doris_username
    password: doris_password
    table:
      traces: doris_table_name
    create_schema: true  # Auto-create schema (if false, tables must be created manually)
    mysql_endpoint: localhost:9030  # FE MySQL address
    history_days: 10
    create_history_days: 10
    timezone: Asia/Shanghai
    timeout: 60s  # HTTP Stream Load client timeout
    log_response: true
    sending_queue:
      enabled: true
      num_consumers: 20
      queue_size: 1000
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
    headers:
      load_to_single_tablet: "true"

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [doris]
```
2. Run OpenTelemetry
```shell
./otelcol-contrib --config otel_trace.yml
```
3. Application Integration with OpenTelemetry SDK
Here we demonstrate integration using a Spring Boot example application (from the official demo), which returns “Hello World!” for the “/” endpoint.
Download the OpenTelemetry Java Agent.
Before starting the application, set these environment variables (no code modification needed):
export JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS} -javaagent:/your/path/to/opentelemetry-javaagent.jar" # Path to OpenTelemetry Java Agent export OTEL_JAVAAGENT_LOGGING="none" # Disable OTEL logs to avoid interference with application logs export OTEL_SERVICE_NAME="myproject" export OTEL_TRACES_EXPORTER="otlp" # Use OTLP protocol to send trace data export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317" # OpenTelemetry Collector address java -jar myproject-0.0.1-SNAPSHOT.jar