# SkyWalking Python Agent
Apache SkyWalking Python agent for distributed tracing, metrics, logging, and profiling.
## Project Structure
```
skywalking/                   # Main agent package
  __init__.py                 # Component, Layer, Kind enums
  config.py                   # All config via SW_* env vars (SW_AGENT_*, SW_PLUGIN_*)
  agent/                      # SkyWalkingAgent singleton, queue management, reporter threads
  trace/                      # context.py, span.py, carrier.py, segment.py, tags.py
  plugins/                    # ~37 auto-instrumentation plugins (sw_*.py)
  meter/                      # Counter, gauge, histogram, PVM metrics
  log/                        # Structured logging with trace context
  profile/                    # Thread/greenlet profiling
  client/                     # gRPC/HTTP/Kafka protocol clients (sync + async)
  bootstrap/                  # CLI (sw-python), sitecustomize loader, uwsgi hook
  utils/                      # Filters, comparators, atomic refs
  decorators.py               # @trace, @runnable for manual instrumentation
sw_python/                    # CLI entry point module
tests/
  unit/                       # Standard pytest unit tests
  plugin/                     # Docker-based integration tests per plugin
    base.py                   # TestPluginBase with validate() method
    conftest.py               # docker_compose fixture (testcontainers)
    docker-compose.base.yml   # Mock collector + agent base services
    Dockerfile.plugin         # Agent image for plugin testing
    data/                     # Data layer plugins (redis, mongo, mysql, etc.)
    http/                     # HTTP client plugins (requests, urllib3, httpx, etc.)
    web/                      # Web framework plugins (flask, django, fastapi, etc.)
  e2e/                        # End-to-end tests with SkyWalking infra-e2e
  orchestrator.py             # get_test_vector() for multi-version testing
tools/
  plugin_doc_gen.py           # Auto-generate docs/en/setup/Plugins.md
  config_doc_gen.py           # Auto-generate Configuration.md
```
## Python Version Support
- **Current (master):** Python 3.8 - 3.11 (tested in CI), declared >=3.8 <=3.13
- **In-progress (PR #374):** Dropping 3.8, adding 3.12 + 3.13 to CI matrix
- **Upcoming:** Python 3.14 support needed
## Build & Development
```bash
make env # Setup Poetry environment with all extras
make install # Install with all optional dependencies
make lint # Run flake8 + pylint
make fix # Auto-fix style (unify, flynt)
make test # Full test suite (unit + plugin docker tests)
make doc-gen # Regenerate Plugins.md + Configuration.md
make package # Build distribution wheel
make gen # Generate gRPC protobuf code
```
Uses Poetry for dependency management. Config in `pyproject.toml`.
## Plugin Configuration Options (config.py)
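Plugin-specific settings. Each option is read from an environment variable named after it, upper-cased and prefixed with `SW_` (for example, `plugin_http_ignore_method` is set via `SW_PLUGIN_HTTP_IGNORE_METHOD`):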
- `agent_disable_plugins`: comma-separated regex patterns to skip plugins
- `plugin_http_http_params_length_threshold`: max chars for HTTP params (default 1024)
- `plugin_http_ignore_method`: comma-delimited HTTP methods to ignore
- `plugin_sql_parameters_max_length`: max SQL param length (default 0 = disabled)
- `plugin_flask_collect_http_params`, `plugin_django_collect_http_params`, `plugin_fastapi_collect_http_params`, `plugin_sanic_collect_http_params`, `plugin_bottle_collect_http_params`: collect HTTP params per framework
- `plugin_pymongo_trace_parameters` / `plugin_pymongo_parameters_max_length`: MongoDB filter tracing
- `plugin_elasticsearch_trace_dsl`: trace ES DSL
- `plugin_celery_parameters_length`: max Celery param length (default 512)
- `plugin_grpc_ignored_methods`: comma-delimited gRPC methods to ignore
Filter functions: `config.ignore_http_method_check(method)`, `config.ignore_grpc_method_check(method)`
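
As a rough illustration of how a comma-delimited setting such as `plugin_http_ignore_method` can back a check like `config.ignore_http_method_check(method)`, here is a self-contained sketch. It is not the agent's code: `parse_ignore_list` and `make_ignore_check` are hypothetical helpers, and the case-insensitive comparison is an assumption.

```python
import os


def parse_ignore_list(raw: str) -> frozenset:
    """Split a comma-delimited setting into a set of upper-cased entries."""
    return frozenset(part.strip().upper() for part in raw.split(',') if part.strip())


def make_ignore_check(raw: str):
    """Build a predicate reporting whether an HTTP method should be skipped."""
    ignored = parse_ignore_list(raw)

    def check(method: str) -> bool:
        # Case-insensitive membership test (assumption, not confirmed agent behavior)
        return method.upper() in ignored

    return check


# Read the setting from the environment; an unset variable ignores nothing.
ignore_http_method_check = make_ignore_check(os.getenv('SW_PLUGIN_HTTP_IGNORE_METHOD', ''))

# Standalone demo with an explicit value instead of the environment
demo_check = make_ignore_check('options, head')
```

With the demo value, `demo_check('OPTIONS')` is true and `demo_check('GET')` is false, so a plugin could swap in a no-op span for the former.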
## Context & Carrier API Details
### get_context() Signatures
```python
context.new_entry_span(op: str, carrier: Optional[Carrier] = None, inherit: Optional[Component] = None) -> Span
context.new_exit_span(op: str, peer: str, component: Optional[Component] = None, inherit: Optional[Component] = None) -> Span
context.new_local_span(op: str) -> Span
context.active_span # property: current topmost active span
context.put_correlation(key, value) # max 3 elements, 128 chars each
context.get_correlation(key) -> str
```
### Carrier Format
- `Carrier` class with key `sw8`; its value encodes 8 `-`-separated parts: sample, trace_id, segment_id, span_id, service, instance, endpoint, client_address (string fields are base64-encoded; sample and span_id are plain)
- `SW8CorrelationCarrier` subclass with key `sw8-correlation` for cross-process correlation context
- Iterate carrier items: `for item in carrier: headers[item.key] = item.val`
- `carrier.is_valid` / `carrier.is_suppressed` for validation
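
To make the header layout concrete, the following sketch encodes and decodes an `sw8`-style value using only the standard library. It assumes the eight parts are joined with `-`, with string fields base64-encoded and the numeric `sample` and `span_id` left plain. `encode_sw8` and `decode_sw8` are hypothetical names, not part of the `Carrier` API.

```python
import base64

FIELD_ORDER = ('sample', 'trace_id', 'segment_id', 'span_id',
               'service', 'instance', 'endpoint', 'client_address')
# Fields assumed to be base64-encoded; the remaining two are plain integers
STRING_FIELDS = ('trace_id', 'segment_id', 'service', 'instance',
                 'endpoint', 'client_address')


def encode_sw8(fields: dict) -> str:
    """Join the eight parts with '-', base64-encoding the string fields."""
    parts = []
    for name in FIELD_ORDER:
        value = str(fields[name])
        if name in STRING_FIELDS:
            value = base64.b64encode(value.encode('utf-8')).decode('ascii')
        parts.append(value)
    return '-'.join(parts)


def decode_sw8(header: str) -> dict:
    """Reverse encode_sw8; raise ValueError when the part count is wrong."""
    parts = header.split('-')
    if len(parts) != len(FIELD_ORDER):
        raise ValueError(f'expected {len(FIELD_ORDER)} parts, got {len(parts)}')
    fields = {}
    for name, part in zip(FIELD_ORDER, parts):
        if name in STRING_FIELDS:
            fields[name] = base64.b64decode(part).decode('utf-8')
        else:
            fields[name] = int(part)
    return fields


example = {
    'sample': 1, 'trace_id': 'trace-abc', 'segment_id': 'seg-1', 'span_id': 3,
    'service': 'consumer', 'instance': 'inst-1', 'endpoint': '/users',
    'client_address': 'provider:9091',
}
header = encode_sw8(example)
```

Splitting on `-` is safe here because the standard base64 alphabet never contains that character, even when the raw IDs do.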
### URL Filter Utility
```python
from skywalking.utils.filter import sw_urlparse, sw_traceback
url_param = sw_urlparse(url) # Returns parsed URL with credentials stripped from netloc
```
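
For readers unfamiliar with the idea, credential stripping can be sketched with `urllib.parse` alone. This is an illustration of the technique, not the body of `sw_urlparse`; `strip_credentials` is a hypothetical name.

```python
from urllib.parse import urlparse, ParseResult


def strip_credentials(url: str) -> ParseResult:
    """Parse a URL and drop any 'user:password@' prefix from its netloc."""
    parsed = urlparse(url)
    # rpartition returns the whole netloc as the last element when no '@' is present
    host_part = parsed.netloc.rpartition('@')[2]
    return parsed._replace(netloc=host_part)


safe = strip_credentials('http://user:secret@example.com:8080/path?q=1')
```

Here `safe.netloc` is `example.com:8080` and `safe.geturl()` rebuilds the URL without the credentials, which keeps secrets out of span tags.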
## Plugin Development
### Plugin Contract
Every plugin is a module file `skywalking/plugins/sw_<name>.py` with:
```python
# Required module-level attributes
link_vector = ['https://docs.example.com/']
support_matrix = {
    'package-name': {
        '>=3.7': ['1.0', '2.0']  # Python version -> tested lib versions
    }
}
note = """"""


# Required function
def install():
    # Monkey-patch the target library
    ...
```
### Plugin Discovery
- `skywalking/plugins/__init__.py` uses `pkgutil.iter_modules()` to find all `sw_*.py` files
- Checks `agent_disable_plugins` config (regex patterns) to skip disabled plugins
- Calls `pkg_version_check(plugin)` for version validation
- Calls `plugin.install()` to execute monkey-patching
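
The discovery steps above can be sketched without the agent installed. In this sketch, stand-in objects replace the modules that `pkgutil.iter_modules()` would yield, and `install_plugins`, `is_disabled`, and the `version_ok` callback are hypothetical names. Matching patterns with `re.match` is an assumption about how the regex patterns are applied.

```python
import re
from types import SimpleNamespace


def is_disabled(name: str, patterns) -> bool:
    """True when any non-empty regex pattern matches the plugin module name."""
    return any(bool(p) and re.match(p, name) is not None for p in patterns)


def install_plugins(candidates: dict, disabled_patterns, version_ok=lambda plugin: True):
    """Install every eligible sw_* plugin and return the names that were installed."""
    installed = []
    for name, plugin in candidates.items():
        if not name.startswith('sw_'):
            continue  # not a plugin module
        if is_disabled(name, disabled_patterns):
            continue  # skipped via an agent_disable_plugins-style pattern
        if not version_ok(plugin):
            continue  # stand-in for the library version check
        if not callable(getattr(plugin, 'install', None)):
            continue  # violates the plugin contract
        plugin.install()
        installed.append(name)
    return installed


calls = []
fake_plugins = {
    'sw_flask': SimpleNamespace(install=lambda: calls.append('sw_flask')),
    'sw_redis': SimpleNamespace(install=lambda: calls.append('sw_redis')),
    'helper': SimpleNamespace(install=lambda: calls.append('helper')),
}
installed = install_plugins(fake_plugins, disabled_patterns=['sw_re.*'])
```

With the pattern `sw_re.*`, only `sw_flask` is installed: `sw_redis` is disabled and `helper` is not a plugin module.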
### Span Types & Usage
```python
from skywalking.trace.context import get_context, NoopContext
from skywalking.trace.span import NoopSpan
from skywalking.trace.carrier import Carrier
from skywalking.trace.tags import TagHttpMethod, TagHttpStatusCode
from skywalking import Layer, Component, config

context = get_context()

# Entry span (incoming request - web frameworks, message consumers)
carrier = Carrier()
for item in carrier:
    item.val = request.headers.get(item.key)
span = context.new_entry_span(op='/path', carrier=carrier)

# Exit span (outgoing call - HTTP clients, DB queries, cache ops)
span = context.new_exit_span(op='Redis/GET', peer='host:port', component=Component.Redis)

# Local span (internal operations)
span = context.new_local_span(op='process')

# All spans used as context managers
with span:
    span.layer = Layer.Http  # Http, Database, Cache, MQ, RPCFramework
    span.component = Component.Flask
    span.tag(TagHttpMethod('GET'))

    # For exit spans: inject carrier into outgoing headers
    carrier = span.inject()
    for item in carrier:
        headers[item.key] = item.val

    # Call original function
    result = original_func()
    span.tag(TagHttpStatusCode(200))
    if error:
        span.error_occurred = True
        span.raised()  # Captures traceback
```
### Instrumentation Patterns
**Pattern A - Method replacement (most common):**
```python
def install():
    from some_lib import SomeClass

    _original = SomeClass.method

    def _sw_method(this, *args, **kwargs):
        with get_context().new_exit_span(...) as span:
            span.layer = Layer.Http
            result = _original(this, *args, **kwargs)
            return result

    SomeClass.method = _sw_method
```
**Pattern B - wrapt.ObjectProxy (for C extensions like psycopg2, mysqlclient):**
```python
import wrapt


class ProxyCursor(wrapt.ObjectProxy):
    def execute(self, query, vars=None):
        with get_context().new_exit_span(...) as span:
            return self.__wrapped__.execute(query, vars)
```
**Pattern C - Async wrappers (aiohttp, httpx, asyncpg):**
```python
def install():
    from aiohttp import ClientSession

    _request = ClientSession._request

    async def _sw_request(this, method, url, **kwargs):
        with get_context().new_exit_span(...) as span:
            carrier = span.inject()
            headers = kwargs.get('headers') or {}
            for item in carrier:
                headers[item.key] = item.val
            kwargs['headers'] = headers
            res = await _request(this, method, url, **kwargs)
            return res

    ClientSession._request = _sw_request
```
**Pattern D - Framework interceptors (gRPC, middleware):**
Use framework-native interceptor/middleware APIs, create spans inside them.
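
As a minimal sketch of the interceptor idea, the WSGI middleware below wraps an application and records one entry-style span per request. It runs without the agent: `fake_entry_span` is a hypothetical stand-in for `get_context().new_entry_span(...)`, and `TracingMiddleware` is an illustrative name, not an existing plugin.

```python
from contextlib import contextmanager

recorded = []  # finished spans, kept only for this demo


@contextmanager
def fake_entry_span(op):
    """Hypothetical stand-in for an agent entry span; records a plain dict."""
    span = {'op': op, 'tags': {}, 'error': False}
    try:
        yield span
    except Exception:
        span['error'] = True
        raise
    finally:
        recorded.append(span)


class TracingMiddleware:
    """Wrap a WSGI app so every request runs inside a span."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        with fake_entry_span(environ.get('PATH_INFO', '/')) as span:
            span['tags']['http.method'] = environ.get('REQUEST_METHOD', 'GET')

            def _start_response(status, headers, exc_info=None):
                # Status strings look like '200 OK'; keep the numeric code
                span['tags']['http.status_code'] = status.split(' ', 1)[0]
                return start_response(status, headers, exc_info)

            return self.app(environ, _start_response)


def hello_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']


body = TracingMiddleware(hello_app)(
    {'PATH_INFO': '/ping', 'REQUEST_METHOD': 'GET'},
    lambda status, headers, exc_info=None: None,
)
```

One limitation of this simplified version: the span closes when the app returns, so a lazily generated response body would be produced outside the span.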
### Tags (`skywalking/trace/tags.py`)
| Category | Tags |
|----------|------|
| HTTP | TagHttpMethod, TagHttpURL, TagHttpStatusCode, TagHttpStatusMsg, TagHttpParams |
| Database | TagDbType, TagDbInstance, TagDbStatement, TagDbSqlParameters |
| Cache | TagCacheType, TagCacheOp, TagCacheCmd, TagCacheKey |
| MQ | TagMqBroker, TagMqTopic, TagMqQueue |
| gRPC | TagGrpcMethod, TagGrpcUrl, TagGrpcStatusCode |
| Celery | TagCeleryParameters |
### Component IDs (`skywalking/__init__.py`)
New plugins need a Component enum entry. IDs 7000+ are Python-specific.
External components (Redis=7, MongoDB=9, etc.) share IDs with other SkyWalking agents.
New Python-specific components increment from the last used ID.
### Noop Pattern
For ignored requests (e.g., filtered HTTP methods):
```python
span = NoopSpan(NoopContext()) if config.ignore_http_method_check(method) \
    else get_context().new_entry_span(op=path, carrier=carrier)
```
## Plugin Testing
### Test Structure
Each plugin test lives in `tests/plugin/{data|http|web}/sw_<name>/`:
```
sw_<name>/
  __init__.py
  test_<name>.py        # Pytest test class
  expected.data.yml     # Expected span snapshot
  docker-compose.yml    # Service definitions
  services/
    __init__.py
    provider.py         # Backend service using the plugin
    consumer.py         # Frontend service making requests
  requirements.txt      # Auto-generated from version param
```
### Test Pattern
```python
import pytest
import requests

from skywalking.plugins.sw_<name> import support_matrix
from tests.orchestrator import get_test_vector
from tests.plugin.base import TestPluginBase


@pytest.fixture
def prepare():
    # Request that triggers the consumer -> provider call chain
    return lambda *_: requests.get('http://0.0.0.0:9090/endpoint', timeout=5)


class TestPlugin(TestPluginBase):
    @pytest.mark.parametrize('version', get_test_vector(lib_name='<name>', support_matrix=support_matrix))
    def test_plugin(self, docker_compose, version):
        self.validate()
```
### Docker Compose
- Extends `../../docker-compose.base.yml` for collector + agent base image
- Provider runs on port 9091, consumer on 9090
- Services install the plugin lib via `pip install -r /app/requirements.txt`
- Use `sw-python run python3 /app/services/provider.py` to start with agent
- External services (Redis, Kafka, etc.) added as needed with healthchecks
### Expected Data Format (expected.data.yml)
```yaml
segmentItems:
  - serviceName: provider
    segmentSize: 1
    segments:
      - segmentId: not null
        spans:
          - operationName: /endpoint
            spanId: 0
            parentSpanId: -1
            spanLayer: Http # Http, Database, Cache, MQ, RPCFramework
            spanType: Entry # Entry, Exit, Local
            componentId: 7001 # Must match Component enum value
            peer: not null
            tags:
              - key: http.method
                value: GET
            startTime: gt 0
            endTime: gt 0
            skipAnalysis: false
```
Validation operators: `not null`, `gt 0`, exact string match.
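
The three operators can be illustrated with a small matcher. This is only a sketch of the semantics described above; the test harness's own comparison may differ, and `matches` and `check_span` are hypothetical names.

```python
def matches(expected, actual) -> bool:
    """Apply 'not null', 'gt N', or exact string comparison to one field."""
    if expected == 'not null':
        return actual is not None
    if isinstance(expected, str) and expected.startswith('gt '):
        try:
            return float(actual) > float(expected[3:])
        except (TypeError, ValueError):
            return False  # missing or non-numeric actual value
    # Fallback: compare string forms so YAML ints and bools match themselves
    return str(expected) == str(actual)


def check_span(expected_span: dict, actual_span: dict) -> list:
    """Return the keys whose actual value violates the expectation."""
    return [key for key, rule in expected_span.items()
            if not matches(rule, actual_span.get(key))]


expected = {'operationName': '/endpoint', 'spanId': 0,
            'peer': 'not null', 'startTime': 'gt 0'}
actual = {'operationName': '/endpoint', 'spanId': 0,
          'peer': 'provider:9091', 'startTime': 1700000000000}
problems = check_span(expected, actual)
```

An empty `problems` list means every scalar field satisfied its rule. Nested structures such as `tags` would need a recursive variant.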
### Running Tests
```bash
# Build plugin test image first
docker build --build-arg BASE_PYTHON_IMAGE=3.11-slim \
-t apache/skywalking-python-agent:latest-plugin --no-cache . \
-f tests/plugin/Dockerfile.plugin
# Run specific plugin test
poetry run pytest -v tests/plugin/web/sw_flask/
# Run all plugin tests
poetry run pytest -v $(bash tests/gather_test_paths.sh)
```
## Checklist for New Plugins
1. Add `skywalking/plugins/sw_<name>.py` with `link_vector`, `support_matrix`, `note`, `install()`
2. Add Component enum entry in `skywalking/__init__.py` (if new Python-specific component)
3. Add component ID in [the main SkyWalking repo](https://github.com/apache/skywalking/blob/master/oap-server/server-starter/src/main/resources/component-libraries.yml)
4. Add library to `pyproject.toml` plugins group: `poetry add <lib> --group plugins`
5. Create test directory: `tests/plugin/{data|http|web}/sw_<name>/`
6. Create test files: `__init__.py`, `test_<name>.py`, `expected.data.yml`, `docker-compose.yml`, `services/`
7. Run `make doc-gen` to regenerate Plugins.md
8. Verify with `make lint`
## All 35 Plugins
Web: sw_flask, sw_django, sw_fastapi, sw_sanic, sw_tornado, sw_bottle, sw_pyramid, sw_falcon
HTTP: sw_requests, sw_urllib3, sw_urllib_request, sw_aiohttp, sw_httpx, sw_http_server
Database: sw_pymysql, sw_mysqlclient, sw_psycopg, sw_psycopg2, sw_pymongo, sw_elasticsearch, sw_happybase, sw_neo4j, sw_asyncpg
Cache: sw_redis, sw_aioredis
MQ: sw_kafka, sw_rabbitmq, sw_celery, sw_pulsar, sw_confluent_kafka, sw_aiormq, sw_amqp
RPC: sw_grpc, sw_websockets
Other: sw_loguru (logging)
## License Header
All source files require Apache 2.0 license header. See any existing file for the exact format.
## CI
GitHub Actions in `.github/workflows/CI.yaml`:
- License check + lint
- Plugin doc generation check
- Plugin + unit tests: matrix of Python versions x test paths
- E2E tests: gRPC/HTTP/Kafka protocols, profiling scenarios
- Docker image builds for each Python version
## GitHub Actions Allow List
Apache enforces an allow list for third-party GitHub Actions. All third-party actions
must be pinned to an approved SHA from:
https://github.com/apache/infrastructure-actions/blob/main/approved_patterns.yml
## Skills
- `/new-plugin` Scaffold a complete new instrumentation plugin (code + tests + docker-compose + expected data)
- `/plugin-test` Build Docker images and run plugin/unit tests locally, mirroring CI