This article introduces practices for storing and analyzing Traces, one of the core types of observability data. For an overview of the complete observability solution, refer to Overview. For resource evaluation, cluster deployment, and optimization, refer to Log.
Trace data has distinctive write and query patterns, so tailoring the table configuration to them at creation time can significantly improve performance. Create your table according to the key guidelines below:
Partitioning and Sorting
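Use range partitioning on the time field and enable dynamic partitioning so that daily partitions are created and dropped automatically.
Use service_name and a time field of type DATETIME as the sort keys (DUPLICATE KEY(service_name, timestamp) in the example below); this speeds up queries for the traces of a specific service over a given time period several-fold.
Bucketing
Use the RANDOM bucketing strategy (DISTRIBUTED BY RANDOM BUCKETS 250 in the example below). Combined with loading each batch into a single tablet (the load_to_single_tablet header in the collector configuration), this improves batch write efficiency.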
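Compaction
Use the time_series compaction policy ("compaction_policy" = "time_series" in the example below) to reduce write amplification, which matters for resource usage under high-throughput ingestion.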
VARIANT Data Type
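Use the semi-structured VARIANT type for extensible Trace fields such as span_attributes and resource_attributes. VARIANT automatically splits JSON data into sub-columns for storage, which improves compression, reduces storage space, and speeds up filtering and analysis on sub-columns.
Indexing
Create inverted indexes on frequently queried fields, as in the example below.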
For fields that need phrase queries, enable the support_phrase option in the inverted index properties. If phrase queries are not needed, set it to false to reduce storage usage.
Storage
Hot and cold data can be tiered to object storage. The example below uses the log_s3 object storage resource and the log_policy_3day storage policy to move data older than 3 days ("cooldown_ttl" = "259200", that is, 3 days in seconds) to S3.

CREATE DATABASE log_db;
USE log_db;

-- Not required for compute-storage separation mode
CREATE RESOURCE "log_s3"
PROPERTIES (
    "type" = "s3",
    "s3.endpoint" = "your_endpoint_url",
    "s3.region" = "your_region",
    "s3.bucket" = "your_bucket",
    "s3.root.path" = "your_path",
    "s3.access_key" = "your_ak",
    "s3.secret_key" = "your_sk"
);

-- Not required for compute-storage separation mode
CREATE STORAGE POLICY log_policy_3day
PROPERTIES (
    "storage_resource" = "log_s3",
    "cooldown_ttl" = "259200"
);

CREATE TABLE trace_table
(
    service_name VARCHAR(200),
    timestamp DATETIME(6),
    service_instance_id VARCHAR(200),
    trace_id VARCHAR(200),
    span_id STRING,
    trace_state STRING,
    parent_span_id STRING,
    span_name STRING,
    span_kind STRING,
    end_time DATETIME(6),
    duration BIGINT,
    span_attributes VARIANT,
    events ARRAY<STRUCT<timestamp:DATETIME(6), name:STRING, attributes:MAP<STRING, STRING>>>,
    links ARRAY<STRUCT<trace_id:STRING, span_id:STRING, trace_state:STRING, attributes:MAP<STRING, STRING>>>,
    status_message STRING,
    status_code STRING,
    resource_attributes VARIANT,
    scope_name STRING,
    scope_version STRING,
    INDEX idx_timestamp(timestamp) USING INVERTED,
    INDEX idx_service_instance_id(service_instance_id) USING INVERTED,
    INDEX idx_trace_id(trace_id) USING INVERTED,
    INDEX idx_span_id(span_id) USING INVERTED,
    INDEX idx_trace_state(trace_state) USING INVERTED,
    INDEX idx_parent_span_id(parent_span_id) USING INVERTED,
    INDEX idx_span_name(span_name) USING INVERTED,
    INDEX idx_span_kind(span_kind) USING INVERTED,
    INDEX idx_end_time(end_time) USING INVERTED,
    INDEX idx_duration(duration) USING INVERTED,
    INDEX idx_span_attributes(span_attributes) USING INVERTED,
    INDEX idx_status_message(status_message) USING INVERTED,
    INDEX idx_status_code(status_code) USING INVERTED,
    INDEX idx_resource_attributes(resource_attributes) USING INVERTED,
    INDEX idx_scope_name(scope_name) USING INVERTED,
    INDEX idx_scope_version(scope_version) USING INVERTED
)
ENGINE = OLAP
DUPLICATE KEY(service_name, timestamp)
PARTITION BY RANGE(timestamp) ()
DISTRIBUTED BY RANDOM BUCKETS 250
PROPERTIES (
    "compression" = "zstd",
    "compaction_policy" = "time_series",
    "inverted_index_storage_format" = "V2",
    "dynamic_partition.enable" = "true",
    "dynamic_partition.create_history_partition" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-30",
    "dynamic_partition.end" = "1",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "250",
    "dynamic_partition.replication_num" = "2", -- Not required for compute-storage separation
    "replication_num" = "2", -- Not required for compute-storage separation
    "storage_policy" = "log_policy_3day" -- Not required for compute-storage separation
);
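To see how the partitioning choices pay off, the hypothetical helper below (not part of Doris) models the daily partitions this table keeps and which of them a time-range filter can touch. It assumes that Doris names DAY partitions as the prefix plus yyyyMMdd, that the dynamic_partition.start and dynamic_partition.end offsets are both inclusive, and that the filter has the form timestamp >= start AND timestamp < end. A query that also filters on service_name then only reads the matching key range inside those few partitions.

```python
from datetime import date, datetime, timedelta

# Values copied from the trace_table PROPERTIES above.
DYN_START = -30                     # "dynamic_partition.start"
DYN_END = 1                         # "dynamic_partition.end"
PREFIX = "p"                        # "dynamic_partition.prefix"
COOLDOWN_TTL_S = 3 * 24 * 60 * 60   # "cooldown_ttl" = "259200", i.e. 3 days

def partition_name(day: date) -> str:
    """Name of the daily partition for `day`: prefix + yyyyMMdd (assumed format)."""
    return PREFIX + day.strftime("%Y%m%d")

def live_partitions(today: date) -> list[str]:
    """Partitions assumed to exist on `today`: offsets DYN_START..DYN_END, inclusive."""
    return [partition_name(today + timedelta(days=off))
            for off in range(DYN_START, DYN_END + 1)]

def pruned_partitions(start: datetime, end: datetime) -> list[str]:
    """Daily partitions a filter `timestamp >= start AND timestamp < end` can touch."""
    if end <= start:
        return []
    last_day = (end - timedelta(microseconds=1)).date()  # end is exclusive
    span_days = (last_day - start.date()).days
    return [partition_name(start.date() + timedelta(days=i))
            for i in range(span_days + 1)]

# A 4-hour window crossing midnight touches only two of the daily partitions.
print(pruned_partitions(datetime(2024, 6, 1, 22), datetime(2024, 6, 2, 2)))
# ['p20240601', 'p20240602']
```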
Doris provides an open, general-purpose Stream Load HTTP API that integrates with Trace collection systems such as OpenTelemetry.
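As a minimal sketch of what such an integration sends, the Python function below (a hypothetical helper using only the standard library) builds a Stream Load request that carries spans as newline-delimited JSON. The URL pattern /api/{db}/{table}/_stream_load, HTTP Basic authentication, and the format, read_json_by_line, and load_to_single_tablet headers are Stream Load parameters; the row fields in the example are illustrative. The function only builds the request: the FE answers a Stream Load with a redirect to a BE, which urllib does not follow for PUT, so in practice a client that re-sends credentials on redirect (for example curl --location-trusted) or a collector exporter does the actual sending.

```python
import base64
import json
import urllib.request

def build_stream_load_request(fe_http: str, db: str, table: str,
                              user: str, password: str,
                              rows: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a Stream Load PUT request with rows as JSON lines."""
    # One compact JSON object per line; keys are matched to column names.
    body = "\n".join(json.dumps(r, separators=(",", ":")) for r in rows).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{fe_http}/api/{db}/{table}/_stream_load",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "format": "json",                # body is JSON...
            "read_json_by_line": "true",     # ...with one object per line
            "load_to_single_tablet": "true", # same header the collector config sets
        },
    )

req = build_stream_load_request(
    "http://localhost:8030", "log_db", "trace_table", "root", "",
    [{"trace_id": "t1", "span_id": "s1", "service_name": "myproject"}],
)
print(req.get_method(), req.full_url)
```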
Here we use a Spring Boot example application integrated with the OpenTelemetry Java SDK. The example application comes from the official demo, which returns a simple “Hello World!” string for requests to the path “/”.
Download the OpenTelemetry Java Agent. The advantage of the Java Agent is that the existing application needs no modifications. For other languages and integration methods, see Language APIs & SDKs or Zero-code Instrumentation on the official OpenTelemetry website.
Download and extract the OpenTelemetry Collector. Download the package whose name starts with “otelcol-contrib”, which includes the Doris Exporter.
Create the otel_demo.yaml configuration file as follows. For more details, refer to the Doris Exporter documentation.
receivers:
  otlp: # OTLP protocol, receiving data sent by the OpenTelemetry Java Agent
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    send_batch_size: 100000 # Number of records per batch; recommended batch size between 100MB-1GB
    timeout: 10s

exporters:
  doris:
    endpoint: http://localhost:8030 # FE HTTP address
    database: doris_db_name
    username: doris_username
    password: doris_password
    table:
      traces: doris_table_name
    create_schema: true # Whether to auto-create schema; manual table creation is needed if set to false
    mysql_endpoint: localhost:9030 # FE MySQL address
    history_days: 10
    create_history_days: 10
    timezone: Asia/Shanghai
    timeout: 60s # Timeout for HTTP stream load client
    log_response: true
    sending_queue:
      enabled: true
      num_consumers: 20
      queue_size: 1000
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
    headers:
      load_to_single_tablet: "true"

service:
  pipelines:
    traces: # Wire the receiver, processor, and exporter together
      receivers: [otlp]
      processors: [batch]
      exporters: [doris]
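The batch processor's send_batch_size counts items (spans, for a traces pipeline), while the 100MB-1GB recommendation is in bytes, so a suitable value depends on how large your encoded spans are. The back-of-the-envelope helper below is a hypothetical sketch that assumes decimal megabytes and an average span size you would measure yourself.

```python
def batch_record_range(avg_span_bytes: int,
                       min_bytes: int = 100 * 10**6,   # 100 MB
                       max_bytes: int = 1000 * 10**6,  # 1 GB
                       ) -> tuple[int, int]:
    """Span counts per batch that keep a batch between min_bytes and max_bytes."""
    if avg_span_bytes <= 0:
        raise ValueError("avg_span_bytes must be positive")
    low = -(-min_bytes // avg_span_bytes)  # ceiling division
    high = max_bytes // avg_span_bytes
    return low, high

# Hypothetical average of 1 KB per encoded span.
print(batch_record_range(1000))  # (100000, 1000000)
```

With spans averaging about 1 KB, the configured send_batch_size of 100000 sits at the lower end of the recommended range; larger spans allow a smaller value.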
Start the OpenTelemetry Collector with this configuration:
./otelcol-contrib --config otel_demo.yaml
Before starting the application, set a few environment variables; no code changes are required.
export JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS} -javaagent:/your/path/to/opentelemetry-javaagent.jar" # Path to OpenTelemetry Java Agent
export OTEL_JAVAAGENT_LOGGING="none" # Disable Otel logs to prevent interference with application logs
export OTEL_SERVICE_NAME="myproject"
export OTEL_TRACES_EXPORTER="otlp" # Send trace data using OTLP protocol
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc" # Port 4317 is the gRPC receiver; Java Agent 2.x defaults to http/protobuf
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317" # Address of the OpenTelemetry Collector
java -jar myproject-0.0.1-SNAPSHOT.jar
Running curl localhost:8080 will trigger a call to the hello service. The OpenTelemetry Java Agent will automatically generate Trace data and send it to the OpenTelemetry Collector, which then writes the Trace data to the Doris table (default is otel.otel_traces) via the configured Doris Exporter.
Trace querying typically uses visual query interfaces such as Grafana.
Filter by time range and service name to display Trace summaries, including latency distribution charts and detailed individual Traces.
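This service and time-range filter lines up with the table design above: service_name and timestamp are the leading sort keys, and the time range prunes daily partitions. The sketch below builds a parameterized query of that shape for a DB-API driver that uses the %s placeholder style (for example PyMySQL, since Doris speaks the MySQL protocol). It illustrates the access pattern only and is not the SQL Grafana actually generates; the function name and selected columns are assumptions.

```python
import re
from datetime import datetime

# Accept "table" or "db.table"; identifiers cannot be bound as parameters.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")

def trace_list_query(table: str, service: str, start: datetime, end: datetime,
                     limit: int = 100) -> tuple[str, tuple]:
    """Parameterized SQL listing one service's spans with start <= timestamp < end."""
    if not _IDENT.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    sql = (
        "SELECT trace_id, span_id, span_name, timestamp, duration, status_code "
        f"FROM {table} "
        "WHERE service_name = %s AND timestamp >= %s AND timestamp < %s "
        f"ORDER BY timestamp DESC LIMIT {int(limit)}"
    )
    return sql, (service, start, end)

sql, params = trace_list_query("otel.otel_traces", "myproject",
                               datetime(2024, 6, 1), datetime(2024, 6, 2))
print(sql)
```

The returned pair can be passed to cursor.execute(sql, params); the table name is validated rather than bound because SQL identifiers cannot be passed as parameters.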
Click a Trace link to view its details.
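A Trace detail view is essentially the spans of one trace_id arranged as a tree by parent_span_id. The hypothetical sketch below shows that assembly step on rows already fetched from the table (for example with a WHERE trace_id = %s filter). It assumes the root span has an empty parent_span_id, and it also treats any span whose parent is missing from the result as a root.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    span_id: str
    parent_span_id: str  # "" for the root span (assumed)
    span_name: str
    start: int           # any sortable start value, e.g. timestamp in microseconds
    children: list["Span"] = field(default_factory=list)

def build_tree(spans: list[Span]) -> list[Span]:
    """Attach each span to its parent; return root spans ordered by start."""
    for s in spans:
        s.children.clear()  # make repeated calls idempotent
    by_id = {s.span_id: s for s in spans}
    roots = []
    for s in spans:
        parent = by_id.get(s.parent_span_id)
        if parent is None:
            roots.append(s)  # no parent in this trace: show it at the top level
        else:
            parent.children.append(s)
    for s in spans:
        s.children.sort(key=lambda c: c.start)
    return sorted(roots, key=lambda r: r.start)

def render(nodes: list[Span], depth: int = 0) -> list[str]:
    """Indented span names, one line per span, like a waterfall's left column."""
    lines = []
    for s in nodes:
        lines.append("  " * depth + s.span_name)
        lines.extend(render(s.children, depth + 1))
    return lines

spans = [
    Span("b", "a", "GET /", 5),
    Span("a", "", "HTTP GET", 0),
    Span("c", "b", "hello", 7),
    Span("d", "b", "db query", 6),
]
print("\n".join(render(build_tree(spans))))
```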