## 10.3.0
#### Project
* Bump up BanyanDB dependency version (server and java-client) to 0.9.0.
* Fix CVE-2025-54057, restrict and validate URLs for widgets.
* Fix `MetricsPersistentWorker`: remove the DataCarrier queue from the persistence process of `Hour/Day` dimension metrics.
This reduces memory cost and the persistence latency of `Hour/Day` dimension metrics.
* [Break Change] BanyanDB: support new Trace model.
#### OAP Server
* Implement self-monitoring for BanyanDB via OAP Server.
* BanyanDB: Support `hot/warm/cold` stages configuration.
* Fix the error when querying continuous profiling policies while the policy is already in the cache.
* Support `hot/warm/cold` stages TTL query in the status API and GraphQL API.
* PromQL Service: traffic queries support `limit` and regex matching.
* Fix an edge case of `HashCodeSelector` (`Integer#MIN_VALUE` causes `ArrayIndexOutOfBoundsException`).
* Support Flink monitoring.
* BanyanDB: Support `@ShardingKey` for Measure tags.
* BanyanDB: Support cold stage data query for metrics/traces/logs.
* Increase the idle check interval of the message queue to 200ms to reduce CPU usage under low load conditions.
* Limit the max attempts of DNS resolution of Istio ServiceEntry to 3, and do not wait for the first resolution result in case the DNS is not resolvable at all.
* Support analyzing waypoint metrics in the Envoy ALS receiver.
* Add Ztunnel component in the topology.
* [Break Change] Change `componentId` to `componentIds` in the K8SServiceRelation Scope.
* Adapt the mesh metrics when an ambient mesh is detected in the eBPF access log receiver.
* Add JSON format support for the `/debugging/config/dump` status API.
* Enhance status APIs to support multiple `accept` header values, e.g. `Accept: application/json; charset=utf-8`.
* Storage: separate `SpanAttachedEventRecord` for SkyWalking trace and Zipkin trace.
* [Break Change] BanyanDB: Set up new Group policy.
* Bump up commons-beanutils to 1.11.0.
* Refactor: simplify the `Accept` HTTP header processing.
* [Break Change] Storage: Move `event` from metrics to records.
* Remove string limitation in Jackson deserializer for ElasticSearch client.
* Fix `disable.oal` not taking effect.
* Enhance the stability of e2e PHP tests and update the PHP agent version.
* Add component ID for the `dameng` JDBC driver.
* BanyanDB: Support custom `TopN pre-aggregation` rules configuration in file `bydb-topn.yml`.
* refactor: implement OTEL handler with SPI for extensibility.
* chore: add `toString` implementation for `StorageID`.
* chore: add a warning log when connecting to ES takes too long.
* Fix the query time range in the metadata API.
* OAP gRPC-Client supports `Health Check`.
* [Break Change] `health_check_xx` metrics now report 1 for healthy and 0 for unhealthy.
* Bump up grpc to 1.70.0.
* BanyanDB: support new Index rule type `SKIPPING/TREE`, and update the record `log`'s `trace_id` indexType to `SKIPPING`.
* BanyanDB: remove `index-only` from tag setting.
* Fix the failure to analyze trace profiling spans in ES storage.
* Add UI dashboard for Ruby runtime metrics.
* Tracing Query Execution HTTP APIs: make the argument `service layer` optional.
* GraphQL API: metadata, topology, log and trace support querying by name.
* [Break Change] MQE function `sort_values` sorts according to the aggregation result and labels rather than the simple time series values.
* Self Observability: add `metrics_aggregation_queue_used_percentage` and `metrics_persistent_collection_cached_size` metrics for the OAP server.
* Optimize metrics aggregate/persistent workers: separate `OAL` and `MAL` workers and consumer pools. The dataflow signal drives the new MAL consumer.
The following table shows the pool size, driven mode, and queue size for each worker.
| Worker | poolSize | isSignalDrivenMode | queueChannelSize | queueBufferSize |
|-------------------------------|------------------------------------------|--------------------|------------------|-----------------|
| MetricsAggregateOALWorker | Math.ceil(availableProcessors * 2 * 1.5) | false | 2 | 10000 |
| MetricsAggregateMALWorker | availableProcessors * 2 / 8, at least 1 | true | 1 | 1000 |
| MetricsPersistentMinOALWorker | availableProcessors * 2 / 8, at least 1 | false | 1 | 2000 |
| MetricsPersistentMinMALWorker | availableProcessors * 2 / 16, at least 1 | true | 1 | 1000 |
* Bump up netty to 4.2.4.Final.
* Bump up commons-lang to 3.18.0.
* BanyanDB: support group `replicas` and `user/password` for basic authentication.
* BanyanDB: fix Zipkin query missing tag `QUERY`.
* Fix `IllegalArgumentException: Incorrect number of labels`: the tags in the `LogReportServiceHTTPHandler` and `LogReportServiceGrpcHandler` were inconsistent with `LogHandler`.
* BanyanDB: fix Zipkin query by `annotationQuery`.
* HTTP Server: Use the default shared thread pool rather than creating a new event loop thread pool for each server. Remove the `MAX_THREADS` from each server config.
* Optimize all Armeria HTTP Server(s) to share the `CommonPools` for the whole JVM.
In the `CommonPools`, the max number of threads is `processor * 2` for the `EventLoopGroup` and `200` for the `BlockingTaskExecutor`, whose threads can be recycled once they exceed the `keepAliveTimeMillis` (60000L by default).
Here is a summary of a thread dump taken without UI queries in a simple Kind environment deployed by the SkyWalking showcase:
| **Thread Type** | **Count** | **Main State** | **Description** |
|---------------------------------|-----------|-----------------------------|---------------------------------------------------------------------------------------------------------------------------------------|
| **JVM System Threads** | 12 | RUNNABLE/WAITING | Includes Reference Handler, Finalizer, Signal Dispatcher, Service Thread, C2/C1 CompilerThreads, Sweeper thread, Common-Cleaner, etc. |
| **Netty I/O Worker Threads** | 32 | RUNNABLE | Threads named "armeria-common-worker-epoll-*", handling network I/O operations. |
| **gRPC Worker Threads** | 16 | RUNNABLE | Threads named "grpc-default-worker-*". |
| **HTTP Client Threads** | 4 | RUNNABLE | Threads named "HttpClient-*-SelectorManager". |
| **Data Consumer Threads** | 47 | TIMED_WAITING (sleeping) | Threads named "DataCarrier.*", used for metrics data consumption. |
| **Scheduled Task Threads** | 10 | TIMED_WAITING (parking) | Threads named "pool-*-thread-*". |
| **ForkJoinPool Worker Threads** | 2 | WAITING (parking) | Threads named "ForkJoinPool-*". |
| **BanyanDB Processor Threads** | 2 | TIMED_WAITING (parking) | Threads named "BanyanDB BulkProcessor". |
| **gRPC Executor Threads** | 3 | TIMED_WAITING (parking) | Threads named "grpc-default-executor-*". |
| **JVM GC Threads** | 13 | RUNNABLE | Threads named "GC Thread#*" for garbage collection. |
| **Other JVM Internal Threads** | 3 | RUNNABLE | Includes VM Thread, G1 Main Marker, VM Periodic Task Thread. |
| **Attach Listener** | 1 | RUNNABLE | JVM attach listener thread. |
| **Total** | **158** | - | - |
* BanyanDB: make `BanyanDBMetricsDAO` output `scan all blocks` info log only when the model is not `indexModel`.
* BanyanDB: fix `BanyanDBMetricsDAO.multiGet` not working properly in `IndexMode`.
* BanyanDB: remove `@StoreIDAsTag`, and automatically create a virtual String tag `id` for the SeriesID in `IndexMode`.
* Remove method `appendMutant` from StorageID.
* Fix the OTLP log handler response error and the OTLP span conversion error.
* Fix the `service_relation` source layer in MQ entry span analysis.
* Fix metrics comparison in PromQL with the `bool` modifier.
* Add rate limiter for Zipkin trace receiver to limit maximum spans per second.
* Enable the `health-checker` module by default due to the latest UI changes. Change the default check period to 30s.
* Refactor Kubernetes coordinator to be more accurate about node readiness.
* Bump up netty to 4.2.5.Final.
* BanyanDB: fix log query missing the order by condition, and fix the missing service id condition when querying by instance id or endpoint id.
* Fix potential NPE in the `AlarmStatusQueryHandler`.
* Aggregate TopN Slow SQL by service dimension.
* BanyanDB: support adding a group prefix (namespace) for BanyanDB groups.
* BanyanDB: fix: a column set with `@BanyanDB.TimestampColumn` should not be indexed.
* OAP Self Observability: separate Trace analysis metrics by the label `protocol`, and add Zipkin span dropped metrics.
* BanyanDB: Move data write logic from BanyanDB Java Client to OAP and support observing metrics for write operations.
* Self Observability: add write latency metrics for BanyanDB and ElasticSearch.
* Fix the malfunctioning alarm feature of MAL metrics due to unknown metadata in L2 aggregate worker.
* Make MAL percentile align with OAL percentile calculation.
* Update Grafana dashboards for OAP observability.
* BanyanDB: fix query `getInstance` by instance ID.
* Support the pprof profiling feature bundled with the Go agent (0.7.0 release).
* Service and TCPService sources support analyzing TLS mode.
* Library-pprof-parser: feat: add PprofSegmentParser.
* Storage: feat: add languageType column to ProfileThreadSnapshotRecord.
* Feat: add Go profile analyzer.
* Get Alarm Runtime Status: support querying the running status for the whole cluster.
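
The pool size formulas in the worker table above can be made concrete with a small calculation. The following is a minimal sketch and not SkyWalking code: the class `WorkerPoolSizes` and its method names are hypothetical, and integer division is assumed for the `/ 8` and `/ 16` formulas, which the table does not specify.

```java
/** Hypothetical helper (not part of SkyWalking) that evaluates the pool size formulas from the table. */
final class WorkerPoolSizes {
    private WorkerPoolSizes() {
    }

    // MetricsAggregateOALWorker: Math.ceil(availableProcessors * 2 * 1.5)
    static int aggregateOal(int availableProcessors) {
        return (int) Math.ceil(availableProcessors * 2 * 1.5);
    }

    // MetricsAggregateMALWorker: availableProcessors * 2 / 8, at least 1
    // Assumption: integer division.
    static int aggregateMal(int availableProcessors) {
        return Math.max(1, availableProcessors * 2 / 8);
    }

    // MetricsPersistentMinOALWorker: availableProcessors * 2 / 8, at least 1
    static int persistentMinOal(int availableProcessors) {
        return Math.max(1, availableProcessors * 2 / 8);
    }

    // MetricsPersistentMinMALWorker: availableProcessors * 2 / 16, at least 1
    static int persistentMinMal(int availableProcessors) {
        return Math.max(1, availableProcessors * 2 / 16);
    }

    public static void main(String[] args) {
        // Evaluate the formulas for the machine this sketch runs on.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.printf(
            "cores=%d aggregateOAL=%d aggregateMAL=%d persistentMinOAL=%d persistentMinMAL=%d%n",
            cores, aggregateOal(cores), aggregateMal(cores),
            persistentMinOal(cores), persistentMinMal(cores));
    }
}
```

Under these assumptions, a host with 8 available processors would get pool sizes of 24, 2, 2, and 1 for the four workers, in table order.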
#### UI
* Implement self-monitoring for BanyanDB via UI.
* Enhance the trace `List/Tree/Table` graph to support displaying multiple refs of spans and distinguishing different parents.
* Fix: correct the same labels for metrics.
* Refactor: use the Fetch API instead of Axios.
* Support cold stage data for metrics, trace and log.
* Add route to status API `/debugging/config/dump` in the UI.
* Implement the Status API on Settings page.
* Bump vite from 6.2.6 to 6.3.6.
* Enhance async profiling by adding shorter and custom duration options.
* Fix selecting the wrong span to analyze in trace profiling.
* Correct the service list for legends in trace graphs.
* Correct endpoint topology data to avoid undefined values.
* Fix the snapshot charts failing to display.
* Bump vue-i18n from 9.14.3 to 9.14.5.
* Fix split queries for topology to avoid page crash.
* Self Observability ui-template: Add new panels for monitoring `metrics aggregation queue used percentage` and `metrics persistent collection cached size`.
* test: introduce and set up unit tests in the UI.
* test: implement comprehensive unit tests for components.
* refactor: optimize data types for widgets and dashboards.
* fix: avoid showing the wrong pop-up prompt in HTTP environments when using the copy function.
* refactor: rework the configuration view and implement the optional config for displaying the timestamp in the Log widget.
* test: implement unit tests for hooks and refactor some types.
* fix: share OAP proxy services for different endpoints and use health checked endpoints group.
* Optimize buttons in time picker component.
* Optimize the router system and implement unit tests for router.
* Bump element-plus from 2.9.4 to 2.11.0.
* Adapt new trace protocol and implement new trace view.
* Implement Trace page.
* Support collapsing and expanding for the event widget.
* UI-template: add BanyanDB and Elasticsearch write latency dashboards for OAP self observability.
#### Documentation
* BanyanDB: Add `Data Lifecycle Stages(Hot/Warm/Cold)` documentation.
* Add `SWIP-9 Support flink monitoring`.
* Fix `Metrics Attributes` menu link.
* Implement the Status API on Settings page.
* Fix: Add the prefix for http url.
* Enhance the async-profiling duration options.
* Enhance the TTL Tab on Setting page.
* Fix the snapshot charts in alarm page.
* Fix `Fluent Bit` dead links.
All issues and pull requests are [here](https://github.com/apache/skywalking/milestone/230?closed=1)