LLM Context Guide for Apache Superset

Apache Superset is a data visualization platform with a Flask/Python backend and a React/TypeScript frontend.

⚠️ CRITICAL: Ongoing Refactors (What NOT to Do)

These migrations are in progress - avoid the deprecated patterns below:

Frontend Modernization

  • NO any types - Use proper TypeScript types
  • NO JavaScript files - Convert to TypeScript (.ts/.tsx)
  • Use @superset-ui/core - Don't import Ant Design directly; prefer the Ant Design component wrappers from @superset-ui/core/components
  • Use antd theming tokens - Prefer antd tokens over legacy theming tokens
  • Avoid custom CSS and styles - Follow antd best practices and avoid custom styling whenever possible

Testing Strategy Migration

  • Prefer unit tests over integration tests
  • Prefer integration tests over end-to-end tests
  • Use Playwright for E2E tests - Migrating from Cypress
  • Cypress is deprecated - Will be removed once migration is completed
  • Use Jest + React Testing Library for component testing
  • Use test() instead of describe() - Follow the “avoid nesting when testing” principle

Backend Type Safety

  • Add type hints - All new Python code needs proper typing
  • MyPy compliance - Run pre-commit run mypy to validate
  • SQLAlchemy typing - Use proper model annotations
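As a minimal sketch of what fully typed backend code can look like (the function names and the owners mapping are hypothetical, not part of Superset), both helpers below annotate every parameter and return value, and the Optional result is narrowed before use so a type checker such as MyPy can verify it:

```python
from __future__ import annotations

from typing import Optional


def find_owner_email(owners: dict[int, str], owner_id: int) -> Optional[str]:
    """Return the email for ``owner_id``, or ``None`` if the owner is unknown."""
    return owners.get(owner_id)


def format_owner(owners: dict[int, str], owner_id: int) -> str:
    """Narrow the ``Optional`` before calling string methods on it."""
    email = find_owner_email(owners, owner_id)
    if email is None:
        return "unknown owner"
    return email.lower()
```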

UUID Migration

  • Prefer UUIDs over auto-incrementing IDs - New models should use UUID primary keys
  • External API exposure - Use UUIDs in public APIs instead of internal integer IDs
  • Existing models - Add UUID fields alongside integer IDs for gradual migration
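The gradual-migration idea can be sketched with a plain dataclass rather than a real SQLAlchemy model. ChartRecord and to_public_dict are hypothetical names used only for illustration: the integer ID stays available internally, while the public payload exposes only the UUID.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class ChartRecord:
    """Hypothetical record for illustration, not a real Superset model."""

    id: int  # internal auto-incrementing ID, kept for existing joins
    name: str
    # UUID added alongside the integer ID during the migration.
    uuid: UUID = field(default_factory=uuid4)

    def to_public_dict(self) -> dict[str, str]:
        """Build an API payload that exposes the UUID, never the integer ID."""
        return {"uuid": str(self.uuid), "name": self.name}
```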

Key Directories

superset/
├── superset/                    # Python backend (Flask, SQLAlchemy)
│   ├── views/api/              # REST API endpoints
│   ├── models/                 # Database models
│   └── connectors/             # Database connections
├── superset-frontend/src/       # React TypeScript frontend
│   ├── components/             # Reusable components
│   ├── explore/                # Chart builder
│   ├── dashboard/              # Dashboard interface
│   └── SqlLab/                 # SQL editor
├── superset-frontend/packages/
│   └── superset-ui-core/       # UI component library (USE THIS)
├── tests/                      # Python/integration tests
├── docs/                       # Documentation (UPDATE FOR CHANGES)
└── UPDATING.md                 # Breaking changes log

Code Standards

TypeScript Frontend

  • Avoid any types - Use proper TypeScript, reuse existing types
  • Functional components with hooks
  • @superset-ui/core for UI components (not direct antd)
  • Jest for testing (NO Enzyme)
  • Redux for global state where it already exists, hooks for local state

Python Backend

  • Type hints required for all new code
  • MyPy compliant - run pre-commit run mypy
  • SQLAlchemy models with proper typing
  • pytest for testing

Apache License Headers

  • New files require ASF license headers - When creating new code files, include the standard Apache Software Foundation license header
  • LLM instruction files are excluded - Files such as LLMS.md and CLAUDE.md are listed in .rat-excludes to avoid header token overhead

Code Comments

  • Avoid time-specific language - Don't use words like “now”, “currently”, or “today” in code comments, as they become outdated
  • Write timeless comments - Comments should remain accurate regardless of when they're read
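A short illustration of the difference, using a hypothetical constant and helper that do not exist in Superset:

```python
DEFAULT_ROW_LIMIT = 1000  # hypothetical constant, for illustration only


def clamp_row_limit(requested: int) -> int:
    # Time-bound (avoid): "We currently cap this at 1000 rows for now."
    # Timeless (prefer): cap the limit so one query cannot exhaust memory.
    return min(requested, DEFAULT_ROW_LIMIT)
```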

Documentation Requirements

  • docs/: Update for any user-facing changes
  • UPDATING.md: Add breaking changes here
  • Docstrings: Required for new functions/classes

Architecture Patterns

Security & Features

  • RBAC: Role-based access via Flask-AppBuilder
  • Feature flags: Control feature rollouts
  • Row-level security: SQL-based data access control

Test Utilities

Python Test Helpers

  • SupersetTestCase - Base class in tests/integration_tests/base_tests.py
  • @with_config - Config mocking decorator
  • @with_feature_flags - Feature flag testing
  • login_as(), login_as_admin() - Authentication helpers
  • create_dashboard(), create_slice() - Data setup utilities

TypeScript Test Helpers

  • superset-frontend/spec/helpers/testing-library.tsx - Custom render() with providers
  • createWrapper() - Redux/Router/Theme wrapper
  • selectOption() - Select component helper
  • React Testing Library - NO Enzyme (removed)

Test Database Patterns

  • Mock patterns: Use MagicMock() for config objects, avoid AsyncMock for synchronous code
  • API tests: Update expected columns when adding new model fields
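The mock guidance above can be sketched with the standard library alone. read_row_limit is a hypothetical synchronous helper, and ROW_LIMIT is used here only as an example key. The sketch shows why AsyncMock is a poor fit for synchronous code: calling it returns a coroutine instead of the stubbed value.

```python
import inspect
from typing import Mapping
from unittest.mock import AsyncMock, MagicMock


def read_row_limit(config: Mapping[str, int]) -> int:
    """Hypothetical synchronous helper that reads one config value."""
    return config.get("ROW_LIMIT", 0)


# MagicMock: the stubbed call returns the configured value directly.
sync_config = MagicMock()
sync_config.get.return_value = 500
sync_result = read_row_limit(sync_config)

# AsyncMock: the same call returns a coroutine, which synchronous code
# never awaits, so the configured value is never seen.
async_config = AsyncMock()
async_config.get.return_value = 500
async_result = read_row_limit(async_config)
returned_coroutine = inspect.iscoroutine(async_result)
async_result.close()  # close the coroutine to avoid a "never awaited" warning
```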

Running Tests

# Frontend
npm run test                           # All tests
npm run test -- filename.test.tsx     # Single file

# E2E Tests (Playwright - NEW)
npm run playwright:test                # All Playwright tests
npm run playwright:ui                  # Interactive UI mode
npm run playwright:headed              # See browser during tests
npx playwright test tests/auth/login.spec.ts  # Single file
npm run playwright:debug tests/auth/login.spec.ts  # Debug specific file

# E2E Tests (Cypress - DEPRECATED)
cd superset-frontend/cypress-base
npm run cypress-run-chrome             # All Cypress tests (headless)
npm run cypress-debug                  # Interactive Cypress UI

# Backend
pytest                                 # All tests
pytest tests/unit_tests/specific_test.py  # Single file
pytest tests/unit_tests/               # Directory

# If pytest fails with database/setup issues, ask the user to run the test environment setup

Environment Validation

Quick Setup Check (run this first):

# Verify Superset is running
curl -f http://localhost:8088/health || echo "❌ Setup required - see https://superset.apache.org/docs/contributing/development#working-with-llms"

If the health check fails, tell the user: “It appears you aren't set up properly. Please refer to the Working with LLMs section in the development docs for setup instructions.”
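Where curl is unavailable, the same check can be approximated from Python with the standard library. This is a sketch, not an official Superset utility: superset_is_healthy is a hypothetical helper, and it assumes the default URL from the command above.

```python
import http.client
import urllib.request

HEALTH_URL = "http://localhost:8088/health"  # default from the curl check


def superset_is_healthy(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError (4xx/5xx) are OSError subclasses, as are
        # connection refusals and timeouts.
        return False
```

A caller would print the setup message quoted above when the function returns False.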

Key Project Files:

  • superset-frontend/package.json - Frontend build scripts (npm run dev on port 9000, npm run test, npm run lint)
  • pyproject.toml - Python tooling (ruff, mypy configs)
  • requirements/ folder - Python dependencies (base.txt, development.txt)

SQLAlchemy Query Best Practices

  • Use the negation operator: ~Model.field instead of Model.field == False, to avoid ruff E712 errors
  • Example: ~Model.is_active instead of Model.is_active == False

Pull Request Guidelines

When creating pull requests:

  1. Read the current PR template: Always check .github/PULL_REQUEST_TEMPLATE.md for the latest format
  2. Use the template sections: Include all sections from the template (SUMMARY, BEFORE/AFTER, TESTING INSTRUCTIONS, ADDITIONAL INFORMATION)
  3. Follow PR title conventions: Use Conventional Commits
    • Format: type(scope): description
    • Example: fix(dashboard): load charts correctly
    • Types: fix, feat, docs, style, refactor, perf, test, chore
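The title format above can be checked mechanically. The following is a sketch rather than a project tool: is_valid_pr_title is a hypothetical helper, and it treats the scope as required because that is the format shown here (Conventional Commits itself allows the scope to be omitted).

```python
import re

# Types listed in the guideline above.
PR_TYPES = ("fix", "feat", "docs", "style", "refactor", "perf", "test", "chore")

# type(scope): description
PR_TITLE_RE = re.compile(
    rf"^(?P<type>{'|'.join(PR_TYPES)})\((?P<scope>[^()]+)\): (?P<description>\S.*)$"
)


def is_valid_pr_title(title: str) -> bool:
    """Return True if the title matches the type(scope): description format."""
    return PR_TITLE_RE.match(title) is not None
```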

Important: Always reference the actual template file at .github/PULL_REQUEST_TEMPLATE.md instead of using cached content, as the template may be updated over time.

Pre-commit Validation

Use pre-commit hooks for quality validation:

# Install hooks
pre-commit install

# IMPORTANT: Stage your changes first!
git add .                        # Pre-commit only checks staged files

# Quick validation (faster than --all-files)
pre-commit run                   # Staged files only
pre-commit run mypy              # Python type checking
pre-commit run prettier          # Code formatting
pre-commit run eslint            # Frontend linting

Important pre-commit usage notes:

  • Stage files first: Run git add . before pre-commit run to check only changed files (much faster)
  • Virtual environment: Activate your Python virtual environment before running pre-commit
    # Common virtual environment locations (yours may differ):
    source .venv/bin/activate      # if using .venv
    source venv/bin/activate       # if using venv
    source ~/venvs/superset/bin/activate  # if using a central location
    
    If you get a “command not found” error, ask the user which virtual environment to activate
  • Auto-fixes: Some hooks auto-fix issues (e.g., trailing whitespace). Re-run after fixes are applied

Common File Patterns

API Structure

  • /api.py - REST endpoints with decorators and OpenAPI docstrings
  • /schemas.py - Marshmallow validation schemas for OpenAPI spec
  • /commands/ - Business logic classes with @transaction() decorators
  • /models/ - SQLAlchemy database models
  • OpenAPI docs: Auto-generated at /swagger/v1 from docstrings and schemas

Migration Files

  • Location: superset/migrations/versions/
  • Naming: YYYY-MM-DD_HH-MM_hash_description.py
  • Utilities: Use helpers from superset.migrations.shared.utils for database compatibility
  • Pattern: Import utilities instead of raw SQLAlchemy operations
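To make the naming convention concrete, the sketch below builds a filename in the YYYY-MM-DD_HH-MM_hash_description.py shape. migration_filename is a hypothetical helper and the revision string is made up. In practice the migration tooling generates these names, so this only illustrates the expected result.

```python
import re
from datetime import datetime


def migration_filename(created: datetime, revision: str, description: str) -> str:
    """Build a name shaped like YYYY-MM-DD_HH-MM_hash_description.py."""
    # Lowercase the description and collapse anything non-alphanumeric to "_".
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"{created:%Y-%m-%d_%H-%M}_{revision}_{slug}.py"
```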

Platform-Specific Instructions


LLM Note: This codebase is actively modernizing toward full TypeScript and type safety. Always run pre-commit run to validate changes. Follow the ongoing refactors section to avoid deprecated patterns.