{
  "log": [
    {
      "commit": "5b7ad645bde69f380bb5387100fde5f72804fded",
      "tree": "62d5ef79ff5b3cd3c0db8fe1bd510987f6c283ef",
      "parents": [
        "ac436976c818af890b675b6497ecd725a3609eda"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Wed Jun 17 18:21:53 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 17 18:21:53 2026 -0400"
      },
      "message": "feat: accept native Python literals on date/time functions (#1563)\n\nWiden date/time scalar function signatures to accept native Python\n``str``/``int`` literals alongside ``Expr``:\n\n- ``date_bin``: ``stride``, ``source``, ``origin`` accept ``Expr | str``.\n- ``make_date``, ``make_time``: components accept ``Expr | int``.\n- ``to_date``, ``to_time``, ``to_timestamp``, ``to_timestamp_{millis,\n  micros,nanos,seconds}``, ``to_unixtime``: ``*formatters`` accept\n  ``Expr | str``.\n\nAdd ``coerce_to_expr_list`` public helper in ``datafusion.expr`` mirroring\n``coerce_to_expr`` / ``ensure_expr_list`` for variadic call sites.\n``date_bin`` uses ``Expr.string_literal`` directly because its planner\ncoerces ``Utf8`` (not ``Utf8View``) literals to ``Interval``/``Timestamp``.\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "ac436976c818af890b675b6497ecd725a3609eda",
      "tree": "3d3d6b67a6c71b8051b4f6eb3f4de89b440813ea",
      "parents": [
        "2d074713d1505256ed92a07e77c18ee43713a303"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Wed Jun 17 18:16:13 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 17 18:16:13 2026 -0400"
      },
      "message": "chore: update rust dependencies (#1604)\n\n* cargo update\n\n* fix broken test on main"
    },
    {
      "commit": "2d074713d1505256ed92a07e77c18ee43713a303",
      "tree": "7e54074583f8eed5c34e746a9c7d88a645fe2889",
      "parents": [
        "55fd2c925c95711ad7222a5a48b6691d482d6056"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Wed Jun 17 13:03:04 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Jun 17 13:03:04 2026 -0400"
      },
      "message": "feat: expose spark-compatible functions (#1564)\n\n* feat: expose Spark-compatible functions (#1482)\n\nAdd `datafusion.functions.spark` module exposing the upstream\n`datafusion-spark` crate\u0027s UDF/UDAF library (~87 functions across string,\nmath, datetime, hash, array, aggregate, bitwise, bitmap, conditional,\ncollection, conversion, json, map, url categories).\n\nFor DataFrame use, import the typed Python wrappers from\n`datafusion.functions.spark`. For SQL use, call\n`SessionContext.enable_spark_functions()` to register the Spark UDFs by\nname (overriding DataFusion built-ins of the same name with their Spark\nsemantics — NULL-propagating `concat`, 1-indexed `substring`, HALF_UP\n`round`, etc.).\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* chore: drop unused borrow_deref_ref allows\n\nSeven `#[allow(clippy::borrow_deref_ref)]` attributes on module\ndeclarations in `crates/core/src/lib.rs` had become stale — the only\nremaining lint hit was a redundant `\u0026*x.as_str()` pattern in\n`parse_file_compression_type`. Rewriting that call to\n`\u0026x.unwrap_or_default()` lets every allow come off, removing noise that\nnew modules were copying without need.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor: tighten spark_functions macros via expr_fn\n\nSwitch most spark wrappers from UDF-direct path (which forced\n`spark_udf_fixed!(name, fn_category::name, args...)` repetition) to a\n`spark_expr_fn!` macro that mirrors the existing `expr_fn!` macro in\n`functions.rs`, so calls collapse to `spark_expr_fn!(sha2, arg1\nbit_length);`.\n\nUDF-direct retained for genuinely variadic functions whose upstream\n`expr_fn` wrappers were generated with a single-`Expr` arm by\n`export_functions!` (concat, array, xxhash64, parse_url family, etc.) 
so\nthat the Python side keeps its `*args` ergonomics.\n\nAggregates collapse the same way via `spark_aggregate!` mirroring\n`aggregate_function!`. Net 173 lines removed.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: clarify spark functions cover DataFrame API too\n\nThe intro wording implied \"SQL functions\" only; the same wrappers are the\nprimary entry point for the DataFrame API as well.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: rewrite spark DataFrame intro for users\n\nReplace API-speak (\"Import the submodule\", \"Returned values are Expr\ninstances that compose\") with a concrete description of where users can\nactually drop these calls.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: defer spark function list to API reference\n\nHand-maintained category list would drift from the actual module as\nupstream `datafusion-spark` adds/removes functions. Replace with a\npointer to the AutoAPI-generated reference, which renders from the\nmodule itself.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* test: replace spark doctest skips with verified examples\n\n38 wrappers carried `# doctest: +SKIP` because outputs weren\u0027t verified at\nauthoring time. Run each with concrete inputs, capture actual outputs, and\ninline the values so the doctests execute and stay correct.\n\nCovers datetime (20), URL (5), bitmap (3), map (3), and remaining hash,\nJSON, math, string, conversion, and format_string cases. 
Net new doctest\ncoverage: 65 examples now run that were skipped before; total skipped\nacross the suite drops from 53 to 12.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor(spark): rename function params to match pyspark\n\nAlign positional parameter names in `functions.spark` with pyspark.sql.functions:\n- aggregate first positional → `col` (avg, try_sum, collect_list, collect_set)\n- unary `arg` → `col` across math/string/byte/datetime helpers\n- multi-arg renames: array_contains (col, value), array (*cols), shuffle (col),\n  array_repeat (col, count), slice (x, start, length), shiftleft/right/rightunsigned\n  (col, numBits), add_months (start, months), date_add/sub (start, days),\n  date_diff (end, start), date_trunc (format, timestamp), time_trunc (unit, time),\n  trunc (date, format), next_day (date, dayOfWeek), from/to_utc_timestamp\n  (timestamp, tz), sha2 (col, numBits), xxhash64 (*cols), map_from_arrays\n  (col1, col2), width_bucket (v, min, max, numBucket), substring (str, pos, len),\n  concat (*cols), elt (*inputs), is_valid_utf8/make_valid_utf8 (str)\n\nBodies updated to reference the new names; positional callers unaffected.\nThis finishes Category 1 / Category 4 (spark-side BOTH-bucket) renames from\nPYSPARK_ALIGNMENT_PLAN.md PR 1.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat(spark): make pyspark-optional params optional\n\nMatch pyspark\u0027s optional-parameter surface in the spark namespace:\n- make_dt_interval, make_interval: all parts default to zero (int32 0 / lit 0.0)\n- str_to_map: pair_delim defaults to \u0027,\u0027, key_value_delim defaults to \u0027:\u0027\n- round: scale defaults to 0 (HALF_UP rounding to nearest integer)\n- shuffle: accepts `seed` kwarg for pyspark parity; raises NotImplementedError\n  for non-None values until the Rust binding supports it\n- like, ilike: accept `escapeChar` for pyspark parity; same NotImplementedError\n  
guard; first positional renamed `string` → `str` to match pyspark\n\nceil/floor `scale\u003d` deferred — the underlying Rust expr_fn is single-arg.\n\nAdded a module-level `_ZERO_I32` literal to avoid rebuilding the pyarrow\nint32 zero scalar on every call.\n\nTests: positional-compat coverage for aggregates (`spark.avg(col)` etc.),\ndefaults-omitted cases for the optional-arg functions, and\nNotImplementedError cases for `shuffle(seed\u003d)` and `like/ilike(escapeChar\u003d)`.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor(spark): reshape varargs to match pyspark signatures\n\nReplace generic ``*args`` with explicit pyspark-style signatures:\n- json_tuple(col, *fields) — first positional is the JSON expr\n- format_string(format, *cols) — `format` is the printf template; a plain\n  ``str`` is auto-promoted to a literal\n- parse_url(url, partToExtract, key\u003dNone) — `key` is optional and only\n  meaningful with ``partToExtract\u003d\u0027QUERY\u0027``\n- try_parse_url(url, partToExtract, key\u003dNone) — same shape\n- url_decode(str), try_url_decode(str), url_encode(str) — single-argument\n  forms (multi-arg calls were always semantically wrong)\n\nTests cover the three-arg parse_url path and the plain-str format_string\nauto-promotion.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(skills): cover the new spark function namespace\n\n`functions.spark` mirrors `pyspark.sql.functions` and now ships on this\nbranch. 
Update every skill that references the function surface:\n\n- skills/datafusion_python/SKILL.md (user-facing): add an import\n  reference, a Core Abstractions row, and a \"Spark-Compatible Functions\"\n  subsection listing coverage by category, the SQL-vs-DataFrame usage\n  (`enable_spark_functions`), and the divergent-semantics table\n  (concat NULL, round HALF_UP, trunc) so callers know which namespace\n  to pick.\n- .ai/skills/check-upstream/SKILL.md: new area for the `datafusion-spark`\n  crate with the coverage policy (parity with pyspark, extras allowed\n  when positional pyspark calls still work). Hygiene check also now\n  spans `functions/spark.py`\u0027s `__all__`.\n- .ai/skills/audit-skill-md/SKILL.md: add `functions.spark` to the\n  surface table and a `spark-functions` scope so this audit also\n  validates the new subsection and divergent-semantics table.\n- .ai/skills/make-pythonic/SKILL.md: explicit scope note that the\n  spark namespace is a deliberate pyspark mirror — generic native-type\n  coercion does not apply there. Path references updated to the new\n  `functions/__init__.py` module layout.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(skills): drop references to PYSPARK_ALIGNMENT_PLAN.md\n\nThe plan file is a working document, not a committed artifact, so skills\nmust not point at it. Inline the one substantive reference (the\n\"deferred to follow-up PRs\" callout in make-pythonic) and drop the\ncross-cutting pointer from check-upstream.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(skills): make-pythonic also targets functions.spark\n\nPrevious guidance said to skip the spark namespace entirely. That was\nwrong: the spark namespace should also feel pythonic — it just carries\nthe extra constraint that every signature must remain compatible with\npyspark.sql.functions (parameter names, positional order, accepted input\ntypes). 
Pythonic widenings like `Expr → Expr | int` are on-brand there\nbecause pyspark itself accepts the int form.\n\nRewrite the scope section to spell out the compatibility rules (keep\nparameter names/order; widen input types, never narrow; extra kwargs\ndefault to None) and extend \"How to Identify Candidates\" to include\n`functions/spark.py`.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(skill): point at spark __all__ instead of enumerating\n\nEnumerating spark functions in the user-facing skill duplicates the\n__all__ list in python/datafusion/functions/spark.py and will drift the\nmoment a new function lands or is renamed. Replace the per-function\nlisting with a category summary and a discovery snippet that queries\nthe actual __all__ at runtime, which is the authoritative source.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(spark): use isoformat in spark_cast doctest\n\npyarrow tzinfo repr differs across versions (\u003cUTC\u003e vs\nzoneinfo.ZoneInfo(key\u003d\u0027UTC\u0027)), breaking the doctest on some platforms.\nisoformat is stable across versions.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(spark): fix map_from_entries doctest to use the right function\n\nThe example called map_from_arrays, so it never exercised\nmap_from_entries. Build an array-of-struct input and call the\ndocumented function.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(spark): list Spark-compatible aggregates in aggregations guide\n\nAdd avg, try_sum, collect_list, and collect_set under a dedicated\nSpark-Compatible Functions entry, with a note that the\ndatafusion.functions.spark namespace mirrors Spark semantics and may\ndiffer from the like-named built-ins. 
Adds a (spark-functions) anchor\nfor the cross-reference.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(spark): align if_ doctest with the single-value accessor style\n\nUse the same lit-based single-row select and [0].as_py() accessor as\nthe other wrappers instead of the lone to_pylist() call.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat(spark): accept native Python literals for literal-friendly args\n\nAudit the functions.spark namespace against pyspark.sql.functions and\nwiden arguments that pyspark types as a non-column literal so callers can\npass bare int/float/str instead of wrapping in lit():\n\n- int args: array_repeat count, slice start/length, shiftleft/shiftright/\n  shiftrightunsigned numBits, sha2 numBits, round scale, substring pos/len,\n  width_bucket numBucket\n- int32-coerced args (binding requires int32): add_months months,\n  date_add/date_sub days, space n, make_dt_interval/make_interval parts\n- float args: modulus/pmod operands; make_*_interval secs\n- str args: next_day dayOfWeek, date_trunc/trunc format, date_part field,\n  from_utc_timestamp/to_utc_timestamp tz, spark_cast type_str,\n  json_tuple *fields\n- Any: array_contains value, if_ if_true/if_false\n\nArguments that pyspark types as ColumnOrName (str means column name, not a\nliteral) are left as Expr to avoid diverging from pyspark semantics:\nilike/like pattern, parse_url partToExtract/key, str_to_map delimiters,\nbit_get pos, time_trunc unit.\n\nAlso rename str_to_map\u0027s delimiter params to pairDelim/keyValueDelim to\nmatch pyspark exactly (they were pair_delim/key_value_delim).\n\nAdd a coercion test matrix and update docstring examples to show the\nnative-literal calling convention.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat(spark): accept column-name str for ColumnOrName args\n\nFor arguments that pyspark types as ColumnOrName, a 
bare str means a\ncolumn name (not a literal). Widen these to Expr | str and resolve a str\nto a column reference via _to_raw_expr, matching pyspark semantics:\n\n- ilike/like pattern\n- parse_url/try_parse_url partToExtract and key\n- str_to_map pairDelim/keyValueDelim\n- bit_get pos\n- time_trunc unit\n\nDocument the column-name behavior in each docstring and add a test\nconfirming a bare str resolves to a per-row column value.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* fix: correct grammar in file_compression_type error message\n\n\"must one of\" → \"must be one of\".\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "55fd2c925c95711ad7222a5a48b6691d482d6056",
      "tree": "66354d0d6c3cb39aca20068a4651d2f33ac6f649",
      "parents": [
        "c0ac93b68fff606aed570aed09d72d749ba79bae"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Tue Jun 16 18:13:07 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Jun 16 18:13:07 2026 -0400"
      },
      "message": "docs: convert reStructuredText sources to MyST markdown (#1579)\n\n* docs: convert restructuredText sources to MyST markdown\n\nPhase 2 of the documentation-site refresh. Run `rst2myst convert` over\nevery human-authored .rst file under docs/source/ and remove the\noriginals. The result:\n\n- 33 .rst files become 33 .md files (user guide, contributor guide,\n  index, links).\n- Headings, paragraphs, hyperlinks, code blocks, admonitions, and\n  toctree directives all map cleanly to MyST syntax.\n- Cross-reference anchors round-trip through MyST as `(label)\u003d`\n  blocks. The converter kebab-cased the labels (e.g. `(io-csv)\u003d`),\n  but every `{ref}` target in the corpus still uses the underscore\n  form from the original RST (`{ref}\\`CSV \u003cio_csv\u003e\\``) and so do the\n  Python docstrings that AutoAPI pulls in. Rewrite the anchors back\n  to the underscore form so the existing references resolve.\n- 86 `{eval-rst}` blocks remain — they all wrap `.. ipython::`\n  directives, which have no first-class MyST equivalent. 
They render\n  identically and don\u0027t block the build.\n\nconf.py changes:\n\n- Enable `colon_fence` and `deflist` MyST extensions (rst-to-myst\n  emits these on a few files, particularly execution-metrics.md).\n- Keep `.rst` in `source_suffix` even though no human-authored RST\n  remains: sphinx-autoapi generates RST under autoapi/ at build time\n  and Sphinx needs the suffix registered to parse it.\n\nAGENTS.md: update the two .rst paths called out under \"Aggregate and\nWindow Function Documentation\" to point at the .md equivalents.\n\nVerified by building locally — `build succeeded`, no warnings, all\ninternal cross-references resolve, the ipython examples on the\nlanding page and basics page still execute.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: fix Apache license header format in converted markdown files\n\nRST-to-MD conversion emitted MyST `%` comment syntax with blank line\nbetween each header line, which renders as visible text. Replace with\ncanonical `\u003c!--- ... --\u003e` HTML comment block matching upstream\napache/datafusion and this repo\u0027s existing markdown files.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: fix broken cross-reference links in distributing-work\n\nThe RST -\u003e MyST conversion left two intra-page links as undefined\nreference-style links, which CommonMark renders as literal bracketed\ntext (no Sphinx warning, so the --fail-on-warning build still passed).\nPoint both at the auto-generated heading anchors instead.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: execute examples via myst-nb; native tables and validated refs\n\nRemoves the last RST-syntax islands from the converted MyST markdown so\nthe docs are markdown-native for both human and LLM authors.\n\nExecutable examples (A): replace IPython.sphinxext.ipython_directive with\nmyst-nb. The 83 `{eval-rst}` + `.. 
ipython:: python` blocks become native\n`{code-cell} ipython3` blocks, and the 14 pages that carry them gain\njupytext/kernelspec front matter so myst-nb runs them. conf.py routes .md\nthrough myst-nb with nb_execution_mode\u003d\"force\" and\nnb_execution_raise_on_error\u003dTrue, so a failing example now fails the build.\n\nmyst-nb gives each page its own kernel instead of the IPython directive\u0027s\nsingle namespace shared across all documents in build order. That isolation\nsurfaced expressions.md, which only ever worked by inheriting `col`/`lit`\nfrom an earlier-built page — it now imports them itself. It also changes the\nexecution working directory to each page\u0027s own folder, so build.sh symlinks\nthe example data next to every page that reads it by relative name and\nregisters the python3 kernel; CI now calls build.sh so it matches local.\n\nTables (B): the 3 `.. list-table::` directives become GFM markdown tables.\n\nCross-references (C): the two intra-page links in distributing-work.md that\nthe conversion left as undefined markdown references (and that built green\nwhile rendering literal brackets) become `{ref}` roles backed by explicit\n`(label)\u003d` targets, so a future break fails the build instead of shipping\nsilently.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: render DataFrame cell outputs as text, not the HTML widget\n\nmyst-nb prefers a cell\u0027s `_repr_html_` over its text repr. A datafusion\nDataFrame\u0027s HTML repr is a Jupyter-oriented widget — inline styles plus an\ninjected \u003cscript\u003e — that renders at the wrong width in the docs theme.\n\nSet nb_mime_priority_overrides so the html builder prefers text/plain. 
The\n35 cells that end in a bare DataFrame now show the same readable ASCII\ntable the old IPython directive produced, with no per-cell `.show()` edits\nand no dependence on the package-generated HTML staying theme-compatible.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(aggregations): use .alias() on grouping(), drop obsolete workaround\n\napache/datafusion#21411 is resolved — `.alias()` now works directly on a\n`grouping()` expression. Removed the note describing the limitation and the\nwith_column_renamed workaround in the rollup and grouping_sets examples,\naliasing the grouping columns inline instead. Verified on the current\nbranch: the aliased aggregates execute and produce the named columns.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: use a dark-mode variant of the logo\n\nThe header logo was the same SVG in both color modes; the light-colored\nwordmark was hard to read on the dark theme. Point the theme\u0027s image_dark\nat a new original_dark.svg whose wordmark uses light strokes.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: restore right-hand on-this-page TOC, collapsible\n\nThe theme refresh emptied secondary_sidebar_items, dropping the\non-this-page table of contents that the previous site showed. Bring it\nback on the right, wrapped in a native \u003cdetails\u003e so readers can fold it\naway on the longer guide pages. Adds a custom page-toc-collapsible\nsecondary-sidebar template and styles the \u003csummary\u003e toggle (no JS).\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: let readers hide the right TOC sidebar for full-width content\n\nFollow-up to restoring the on-this-page TOC: \"collapsible\" should hide the\nentire right-hand frame, not just fold the list. 
Replace the \u003cdetails\u003e\nwrapper with a floating toggle button (toc-toggle.js) that hides the whole\nsecondary sidebar via a body class; the flex article container then\nreclaims the width (its 60em cap is lifted while hidden). The preference is\nremembered across pages in localStorage, and the button is suppressed below\nthe theme\u0027s breakpoint where the sidebar is already collapsed.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* fix(deps): pin typing-extensions to one version so uv.lock parses in CI\n\nAdding the myst-nb docs stack pulled a newer typing-extensions only on\nPython \u003c 3.11, splitting it into two locked versions. Our own\n`typing-extensions; python_full_version \u003c \u00273.13\u0027` dependency then spanned\nthat split, which uv recorded as a multi-version edge without a `version`\nfield — a form older uv builds (the one in CI\u0027s pinned setup-uv) reject\nwith \"missing source field but has more than one matching package\".\n\nAdd a [tool.uv] constraint-dependencies pin of typing-extensions\u003e\u003d4.15.0\nso it resolves to a single version across all supported Pythons, removing\nthe fork and the under-specified edge. Relocked; uv lock --locked is clean\nand no multi-version package has a marker-only edge.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(deps): drop pickleshare and explicit ipython from docs group\n\nBoth were only needed by the old IPython.sphinxext.ipython_directive,\nwhich myst-nb replaced. pickleshare (IPython %store, abandoned 2018) has\nno remaining consumer. ipython is now pulled transitively by ipykernel\nand myst-nb, so the explicit floor is redundant. Relocked.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Apply reviewer\u0027s suggestion to fix CI error\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "c0ac93b68fff606aed570aed09d72d749ba79bae",
      "tree": "46d37d40b06b7436b4a3cfc0ce41a86b2fba0e73",
      "parents": [
        "fd15b03ecc227957b752429bf5b7c56c0f139f1e"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Mon Jun 15 16:23:05 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 15 16:23:05 2026 -0400"
      },
      "message": "feat: expose arrow_field, arrow_try_cast, cast_to_type, with_metadata (#1568)\n\n* feat: expose arrow_field, arrow_try_cast, cast_to_type, with_metadata\n\nAdds Python bindings for five scalar functions from\ndatafusion::functions::expr_fn that were not previously surfaced:\n\n- arrow_field: returns a struct describing an expression\u0027s Arrow field\n  (name, data_type, nullable, metadata).\n- arrow_try_cast: like arrow_cast but yields NULL on cast failure.\n- cast_to_type / try_cast_to_type: casts a value to the type of a\n  reference expression. These are exposed as a single Python entry\n  point cast_to_type(value, type_ref, *, try_cast\u003dFalse); the kwarg\n  switches between the strict and try variants.\n- with_metadata: attach Arrow field metadata; the inverse of\n  arrow_metadata. Accepts a dict[str, str] for ergonomics.\n\nUpdates skills/datafusion_python/SKILL.md to list the new functions\nand documents the cast_to_type kwarg behavior.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor: collapse try_cast_to_type into cast_to_type kwarg\n\nThe previous commit exposed cast_to_type and try_cast_to_type as two\nseparate pyo3 bindings and unified them in the Python wrapper via a\ntry_cast kwarg. That left try_cast_to_type in datafusion._internal\nwithout a matching public Python name, breaking\ntest_datafusion_missing_exports.\n\nMove the dispatch into the rust binding: cast_to_type now takes a\ntry_cast kwarg and selects between functions::expr_fn::cast_to_type\nand try_cast_to_type internally. Only one pyo3 binding is registered,\nso the wrapper-coverage check passes and the Python entrypoint is\nunchanged.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat: accept pyarrow DataType in arrow_try_cast\n\nMirrors arrow_cast: arrow_try_cast now accepts `pa.DataType` in addition\nto `str` and `Expr`. 
Adds `Expr.try_cast(pa.DataType)` PyO3 binding for\nthe pyarrow-type routing path.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* fix: guard with_metadata against empty dict and empty keys\n\nEmpty `metadata` dict now returns the input expression unchanged\n(previously bubbled an opaque DataFusion error about minimum arg\ncount). Empty keys raise `ValueError` to match the docstring contract.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: assert full struct shape in arrow_field doctest\n\nPrevious doctest set metadata on the input field but only checked the\nname — the metadata setup was dead. Now the example asserts the full\nreturned struct (name, data_type, nullable, metadata) so the demo\nshows what the function actually produces.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* test: add unit tests for arrow_try_cast, arrow_field, cast_to_type, with_metadata\n\nMirrors the existing test_arrow_cast pattern. Covers:\n- arrow_try_cast: string-syntax, pa.DataType, and null-on-failure paths\n- arrow_field: full returned struct shape (name, data_type, nullable, metadata)\n- cast_to_type: type-from-expr happy path and try_cast\u003dTrue null behavior\n- with_metadata: round-trip through arrow_metadata, empty-dict no-op, and\n  empty-key ValueError\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* test: parameterize arrow cast / try_cast tests\n\nFolds the previous four cast tests (arrow_cast + arrow_try_cast × str\n+ pyarrow target type) into a single parameterized test that runs both\nfunctions across all five target-type variants. Collapses the two\ncast_to_type tests (happy path + try_cast\u003dTrue) into one parameterized\ntest, and parameterizes arrow_try_cast null-on-failure over both\ntarget-type syntaxes. 
7 test functions, 19 cases — net less code, same\ncoverage.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: point cast_to_type at arrow_cast for static target types\n\nAdds a one-line cross-reference so users with a known target type\nreach for arrow_cast / arrow_try_cast instead of building a sentinel\nexpression to feed cast_to_type.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor: split cast_to_type into cast_to_type and try_cast_to_type\n\nReplace the try_cast bool flag with separate cast_to_type and\ntry_cast_to_type functions, matching upstream DataFusion and the\narrow_cast / arrow_try_cast pair. Also drop the redundant data_type\nparametrization on test_arrow_try_cast_null_on_failure, since the\nstr-vs-pyarrow distinction is already covered by test_arrow_cast_variants.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "fd15b03ecc227957b752429bf5b7c56c0f139f1e",
      "tree": "417451c057c4702b287f4427a16dd3a194ff35ed",
      "parents": [
        "3256138ecc57de2666df704909a371919ff6fbfd"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Mon Jun 15 14:01:13 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Jun 15 14:01:13 2026 -0400"
      },
      "message": "feat: expose array_compact, array_normalize, cosine_distance, inner_product (#1567)\n\n* feat: expose array_compact, array_normalize, cosine_distance, inner_product\n\nAdds Python bindings for four scalar functions from\ndatafusion::functions_nested::expr_fn that were not previously surfaced:\n\n- array_compact / list_compact: drop NULLs from an array.\n- array_normalize / list_normalize: L2-normalize a numeric array.\n- cosine_distance: 1 - cosine_similarity(a, b).\n- inner_product: dot product of two numeric arrays.\n\nImplementation routes each through the existing array_fn! macro in\ncrates/core/src/functions.rs, mirroring the other functions_nested\nwrappers. Python wrappers in python/datafusion/functions.py follow the\nestablished pattern with doctest examples; list_* aliases use the\none-line + See Also form per project convention.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: clarify array_normalize and cosine_distance docstrings\n\nExpand both docstrings with plain-English definitions, worked examples,\nranges, use cases, and behavior on edge cases (zero vector → NULL,\nlength-mismatched inputs fail). Adds a zero-vector example to\narray_normalize and an orthogonal-vector example to cosine_distance.\nUpdates the list_normalize alias summary to match.\n\nCo-Authored-By: Claude \u003cnoreply@anthropic.com\u003e\n\n* test: add alias-equivalence and length-mismatch tests for array fns\n\nPin the contracts the doctests don\u0027t cover:\nlist_compact/list_normalize must produce the same output as their\narray_* primaries, and cosine_distance/inner_product must reject\nlength-mismatched inputs at execution time.\n\n* feat: expose dot_product alias for inner_product\n\nMatch upstream DataFusion SQL alias surface (inner_product UDF\nregisters `dot_product` in its alias list). 
Also expand\n`inner_product` docstring with NULL/length-mismatch behavior to\nmatch peer distance fns added in this PR.\n\nCo-Authored-By: Claude Opus 4.7 \u003cnoreply@anthropic.com\u003e\n\n* test: fold dot_product alias check into parametrized test\n\nGeneralize test_array_function_aliases to accept multi-column data so\nthe dot_product/inner_product alias case fits, dropping the standalone\ntest_dot_product_alias_matches_inner_product.\n\nCo-Authored-By: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "3256138ecc57de2666df704909a371919ff6fbfd",
      "tree": "7ffc97b66a3033f2455f77d81a331331cb6d2f12",
      "parents": [
        "cd7506ad645bf7f3c01796cbb16ec5821d56d389"
      ],
      "author": {
        "name": "kosiew",
        "email": "kosiew@gmail.com",
        "time": "Sat Jun 13 23:40:08 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat Jun 13 23:40:08 2026 +0800"
      },
      "message": "Deprecate `Expr` temporal part arguments in date extraction and truncation functions (#1587)\n\nAdded a shared helper to emit a DeprecationWarning when an Expr is passed to a literal control argument.\n\nUpdated the implementations of:\n\ndate_part\ndatepart\nextract\ndate_trunc\ndatetrunc\nto warn when part is provided as an Expr.\n\nRefactored alias functions (datepart, extract, and datetrunc) to delegate through internal helper functions so warnings reference the user-facing function name correctly and avoid duplicate warning behavior.\n\nUpdated existing temporal function tests to use the preferred string-literal form for part.\n\nAdded targeted tests verifying:\n\nExpr inputs emit DeprecationWarning for date_part, datepart, and extract.\nExpr inputs emit DeprecationWarning for date_trunc and datetrunc.\nString literal inputs continue to work without emitting deprecation warnings."
    },
    {
      "commit": "cd7506ad645bf7f3c01796cbb16ec5821d56d389",
      "tree": "10445a59b69eba5d2c06297cec0891dfbabe5c0c",
      "parents": [
        "43395a46aa2597b8090248ecd9a10c601ebec0bf"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri Jun 12 16:45:40 2026 +0200"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jun 12 16:45:40 2026 +0200"
      },
      "message": "build(deps): batch dependabot dependency updates (#1589)\n\nCombine open Dependabot updates into a single PR.\n\nGitHub Actions:\n- github/codeql-action 4.35.4 -\u003e 4.36.2 (#1585)\n- astral-sh/setup-uv 8.1.0 -\u003e 8.2.0 (#1584)\n\nCargo:\n- chrono 0.4.44 -\u003e 0.4.45 (#1583)\n- log 0.4.30 -\u003e 0.4.32 (#1582)\n- uuid 1.23.1 -\u003e 1.23.2 (#1565)\n\nPython (uv.lock):\n- pyarrow 22.0.0 -\u003e 23.0.1 (#1580)\n- idna 3.10 -\u003e 3.15 (#1552)\n- pytest 8.3.4 -\u003e 9.0.3 (#1542)\n- pyjwt 2.10.1 -\u003e 2.12.0 (#1540)\n- pygments 2.19.1 -\u003e 2.20.0 (#1539)\n- requests 2.32.3 -\u003e 2.33.0 (#1538)\n- urllib3 2.3.0 -\u003e 2.7.0 (#1537)\n- pynacl 1.5.0 -\u003e 1.6.2 (#1536)\n- cryptography 44.0.0 -\u003e 46.0.7 (#1535)\n\nCo-authored-by: Claude Opus 4.8 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "43395a46aa2597b8090248ecd9a10c601ebec0bf",
      "tree": "553f644b0f04690547ee8ad293a095bb4e091e78",
      "parents": [
        "43df9f7b954c00cc40bd257317926eddb6d4092e"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri Jun 12 16:17:20 2026 +0200"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jun 12 16:17:20 2026 +0200"
      },
      "message": "Remove patch now that 54 is released (#1588)"
    },
    {
      "commit": "43df9f7b954c00cc40bd257317926eddb6d4092e",
      "tree": "88fa1d4dcd8da96c01a58e6bb4bc505258a48849",
      "parents": [
        "407298f37d9b5ccb4439cbe77ee148bf95e52de9"
      ],
      "author": {
        "name": "kosiew",
        "email": "kosiew@gmail.com",
        "time": "Sun Jun 07 21:22:06 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun Jun 07 15:22:06 2026 +0200"
      },
      "message": "feat(dataframe): update group_by to accept None and normalize to empty list (#1581)\n\n- Updated `group_by` method to accept `None` and normalize it to an empty list.\n- Improved docstring for clarity.\n- Added regression test in `test_dataframe.py` to verify that `None` equals an empty list.\n- Updated documentation to mention that `group_by\u003dNone` is now supported."
    },
    {
      "commit": "407298f37d9b5ccb4439cbe77ee148bf95e52de9",
      "tree": "de8b9b774e25944b8bce9916ef5d975faf54f3f4",
      "parents": [
        "bfa14f4ffa879c83acfab2f1d480d9ed474baf7d"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Sun Jun 07 09:19:37 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun Jun 07 15:19:37 2026 +0200"
      },
      "message": "Improve documentation site layout (#1578)\n\n* docs: refresh theme — pydata-sphinx-theme 0.16, top navbar, dark mode\n\nBump pydata-sphinx-theme 0.8.0 -\u003e 0.16 to enable the modern navbar slot\nAPI and dark/light theme switcher. Configure top navbar with logo,\nnav links, GitHub icon, and theme switcher in conf.py. Drop the custom\ndocs-sidebar.html override and the layout.html block that silenced the\nnavbar — both predate the slot API and conflict with the new theme.\nStrip CSS overrides that fought the old theme (--pst-header-height: 0,\nnavbar-brand sizing) and add a dark-mode variant for the inline code\ncolor and table-stripe shading. Fix the stale github_repo\n(\"arrow-datafusion-python\" -\u003e \"datafusion-python\") so future Edit-on-\nGitHub links resolve. Bump copyright year and project name.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: collapse navbar to section landing pages\n\nPrevious structure dumped every top-level toctree entry from index.rst\ninto the navbar, producing eight items including external URLs (\"Github\nand Issue Tracker\", \"Rust\u0027s API Docs\", ...) that wrapped to two lines\neach. Introduce user-guide/index.rst and contributor-guide/index.rst as\nsection landing pages with nested toctrees, then point index.rst at just\nthose two plus autoapi/index. The navbar now reads \"User Guide\",\n\"Contributor Guide\", \"API Reference\" — three single-line entries. Move\nthe external links into the index.rst body where they\u0027re discoverable\nwithout crowding navigation.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: restore external links lost in navbar restructure\n\nAdd Examples and Rust API as text links in the top navbar via the\npydata-sphinx-theme external_links option. Nest the code-of-conduct\nlink inside the Contributor Guide toctree so it appears alongside the\nother contributor pages. 
Drop the duplicate \"Further reading\" bullet\nlist from the landing page now that every link has a permanent home.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: render Rust API link as docs.rs icon next to GitHub\n\nMove the Rust API docs entry from external_links to icon_links and use\nthe fa-brands fa-rust gear mark. Now sits next to the GitHub icon in\nnavbar_end with matching visual weight instead of a wider text link.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: render sidebar nav on landing page\n\nThe default pydata-sphinx-theme sidebar-nav-bs starts at the current\ntop-level section, so the root index — which has no parent section —\nends up with an empty sidebar. The theme\u0027s layout also explicitly\nfilters sidebar-nav-bs out of the sidebar list when suppress_sidebar_\ntoctree() returns true (which it does for root pages), so simply\noverriding sidebar-nav-bs.html in templates doesn\u0027t help.\n\nAdd a sidebar-globaltoc.html template that calls Sphinx\u0027s toctree()\nglobal directly to render the full document tree, and wire it through\nhtml_sidebars under a name the theme\u0027s suppress filter doesn\u0027t strip.\nLanding page now shows User Guide / Contributor Guide / API Reference\nin the sidebar with the current section expanded on inner pages.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: render expandable chevrons in sidebar nav\n\nSwitch the sidebar toctree call from toctree() to generate_toctree_html\nwith collapse\u003dFalse, so nested \u003cul\u003es render into the DOM for every\nbranch. The pydata-sphinx-theme JS then wraps them in \u003cdetails\u003e with\nfa-chevron-down toggles, matching the datafusion-comet sidebar where\neach section with children can be expanded inline. 
show_nav_level\u003d1\nkeeps deeper levels collapsed on first load.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: expand sidebar to show level 2 entries by default\n\nBump show_nav_level 1 -\u003e 2 so the landing-page sidebar opens with\nUser Guide / Contributor Guide / API Reference already expanded to\ntheir immediate children. Deeper levels remain collapsed behind\nchevrons so the sidebar stays scannable.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: add Links sidebar section for external references\n\nRestore the \"Links\" sidebar heading that the previous site had —\nGitHub and Issue Tracker, Rust API Docs, Code of Conduct, Examples.\nImplemented as a second hidden toctree with :caption: Links so the\npydata-sphinx-theme sidebar renders the heading above the four\nexternal URLs. Drop Code of Conduct from the Contributor Guide\ntoctree since it now lives under Links instead.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: consolidate external URLs into a single Links nav item\n\nReplace the second hidden toctree (which expanded each external URL\ninto its own navbar entry) with a dedicated links.rst landing page,\nand add a single \"links\" entry to the main toctree. Top navbar now\nshows User Guide / Contributor Guide / API Reference / Links — four\nitems, no wrapping. Clicking Links opens the page that lists GitHub,\nRust API Docs, Code of Conduct, and Examples.\n\nDrop the external_links Examples entry from conf.py since the same\nURL now lives on the Links page.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: add favicon matching the main datafusion site\n\nDrop in the same favicon.svg the main datafusion.apache.org site\nuses (just the Apache DataFusion mark, no wordmark) and wire it\nthrough html_favicon. 
Browsers and bookmarks now show the project\nicon instead of the generic Sphinx page glyph.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: address Copilot review feedback on sidebar config\n\nTwo small follow-ups from the Copilot reviewer on #1578:\n\n- Append .html to the html_sidebars entry. Sphinx\u0027s Jinja loader\n  resolves both \"sidebar-globaltoc\" and \"sidebar-globaltoc.html\" to\n  the same template, but the explicit form is closer to the spelling\n  in the Sphinx docs and is harder to misread.\n- Update the inline comment in sidebar-globaltoc.html that still\n  claimed show_nav_level\u003d1 after we bumped it to 2 in conf.py. Now\n  describes the variable wiring instead of hard-coding a number that\n  has to be kept in sync with conf.py.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "bfa14f4ffa879c83acfab2f1d480d9ed474baf7d",
      "tree": "3e1a6711138678f3e5ee8bbe94cbb6681662bb96",
      "parents": [
        "23062f78ad55a7121517c5ab00742503e95c7342"
      ],
      "author": {
        "name": "Daniel Mesejo",
        "email": "mesejoleon@gmail.com",
        "time": "Fri Jun 05 18:09:17 2026 +0200"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jun 05 12:09:17 2026 -0400"
      },
      "message": "refactor(context): deduplicate register/read option-building logic (#1479)\n\n* refactor(context): deduplicate register/read option-building logic\n\nExtract shared helpers (convert_partition_cols, convert_file_sort_order,\nbuild_parquet/json/avro_options, convert_csv_options), standardize path\ntypes to \u0026str, and remove redundant intermediate variables.\n\n* refactor(context): accept PathBuf for path arguments in register/read methods\n\nChange path parameters from \u0026str to PathBuf in all register/read methods\n(register_listing_table, register_parquet, register_json, register_avro,\nregister_arrow, read_json, read_parquet, read_avro, read_arrow) so callers\ncan pass either a Python str or a pathlib.Path object. For register_csv and\nread_csv, which take \u0026Bound\u003cPyAny\u003e to handle lists, extract path elements as\nPathBuf rather than String for the same reason.\n\nAdd a path_to_str helper that converts PathBuf to \u0026str, returning an explicit\nerror for non-UTF-8 paths rather than silently corrupting them.\n\nAdd build_arrow_options helper to deduplicate register_arrow/read_arrow\noption-building logic, consistent with the existing parquet/json/avro helpers.\n\nCo-Authored-By: Claude Sonnet 4.6 \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Sonnet 4.6 \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "23062f78ad55a7121517c5ab00742503e95c7342",
      "tree": "3457b0f2adce40f8adbd74d1f6adf5bc899bf863",
      "parents": [
        "4a7761736f57376a9c769efd46a8b34e7f46b8f0"
      ],
      "author": {
        "name": "Nuno Faria",
        "email": "nunofpfaria@gmail.com",
        "time": "Fri Jun 05 17:02:21 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jun 05 12:02:21 2026 -0400"
      },
      "message": "fix: Skip `fork` and `forkserver` on `win32` (#1566)\n\n* fix: Skip fork and forkserver on win32\n\n* Fix fmt"
    },
    {
      "commit": "4a7761736f57376a9c769efd46a8b34e7f46b8f0",
      "tree": "378608186bf4d7d717970316835aca43c87768fc",
      "parents": [
        "d021e6afa8e08bee42fb9673ba811352c464bdf7"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri Jun 05 12:01:53 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Jun 05 12:01:53 2026 -0400"
      },
      "message": "feat: expose SessionContext.copied_config and parse_capacity_limit (#1570)\n\nAdds two small additions to SessionContext that mirror upstream:\n\n- copied_config(): returns a copy of the active SessionConfig wrapped\n  in the existing SessionConfig Python class. Useful when callers want\n  to seed a new context from another context\u0027s settings, or inspect\n  the current configuration without sharing mutable state.\n\n- parse_capacity_limit(config_name, limit): static helper that parses\n  size strings like \"100M\", \"1.5G\", \"512K\", or \"0\" into a byte count.\n  Useful when configuring a RuntimeEnvBuilder from human-friendly\n  inputs. Wraps SessionContext::parse_capacity_limit; the deprecated\n  parse_memory_limit is intentionally not exposed.\n\nThree other items from the same gap cluster (runtime_env,\ncopied_table_options, the deprecated parse_memory_limit) are not\nincluded here. The first two would require wrapping new Rust types\n(RuntimeEnv, TableOptions) whose surface is much larger than the\naccessors themselves; the third is deprecated upstream. Those are\nfiled as separate follow-up issues.\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "d021e6afa8e08bee42fb9673ba811352c464bdf7",
      "tree": "35dc626766ec9a8179a34206ed6bc35e442b4c92",
      "parents": [
        "af388667f8d8583abdd634fa343252ddb4a0da16"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Thu Jun 04 07:57:47 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Jun 04 07:57:47 2026 -0400"
      },
      "message": "feat: import user-defined physical optimizer rules over FFI (#1557)\n\n* feat: user-defined OptimizerRule and AnalyzerRule from Python\n\nExpose `SessionContext.add_optimizer_rule` and\n`SessionContext.add_analyzer_rule` symmetric with the existing\n`remove_optimizer_rule`. Each accepts a Python subclass of the new\n`datafusion.optimizer.OptimizerRule` / `AnalyzerRule` ABCs.\n\nImplementation:\n\n* New `crates/core/src/optimizer_rules.rs` wraps user Python instances\n  in `PyOptimizerRuleAdapter` / `PyAnalyzerRuleAdapter`, which\n  implement the upstream `OptimizerRule` / `AnalyzerRule` traits.\n* `OptimizerRule.rewrite(plan)` returns `None` for \"no change\" or a\n  new `LogicalPlan`. The adapter maps that to\n  `Transformed::no` / `Transformed::yes` so the upstream optimizer\u0027s\n  fixed-point loop terminates correctly.\n* `AnalyzerRule.analyze(plan)` must always return a `LogicalPlan`;\n  returning `None` surfaces a `DataFusionError::Execution` naming the\n  offending rule.\n* The upstream `\u0026dyn OptimizerConfig` / `\u0026ConfigOptions` arguments are\n  not surfaced to Python in this MVP; rules that need configuration\n  should capture it at construction time (for example by holding a\n  `SessionContext` reference) or be implemented in Rust.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat: import FFI physical optimizer rules; drop Python logical rules\n\nReplace the Python-defined OptimizerRule/AnalyzerRule approach with FFI-imported physical optimizer rules.\n\nThe Python logical-rule approach could observe plans but not transform them: there are no Python constructors for LogicalPlan node variants, so a rule could only return None or the input plan unchanged. 
The audience for custom rules also overlaps strongly with people who can write Rust.\n\nDataFusion exposes no FFI bridge for the logical OptimizerRule/AnalyzerRule traits, but it does export FFI_PhysicalOptimizerRule for the physical PhysicalOptimizerRule trait. This commit imports those instead.\n\nChanges:\n\n* Remove crates/core/src/optimizer_rules.rs, python/datafusion/optimizer.py, python/tests/test_optimizer.py, and the SessionContext.add_optimizer_rule / add_analyzer_rule methods. remove_optimizer_rule is unchanged (pre-existing).\n* New crates/core/src/physical_optimizer.rs reads a __datafusion_physical_optimizer_rule__ capsule and converts it via Arc\u003cdyn PhysicalOptimizerRule\u003e::from(\u0026FFI_PhysicalOptimizerRule).\n* SessionContext gains a physical_optimizer_rules constructor argument. Upstream offers no API to add physical rules to a live context, so they are appended to the builder at construction time only.\n* The datafusion-ffi-example crate gains MyPhysicalOptimizerRule, a counter-backed rule used by _test_physical_optimizer_rule.py to prove the rule fires over FFI during physical planning.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor: type physical_optimizer_rules with an Exportable Protocol\n\nReplace the `list[Any]` hint on the SessionContext `physical_optimizer_rules` argument with a `PhysicalOptimizerRuleExportable` Protocol, matching the existing `TableProviderExportable` / `*Exportable` pattern used for other FFI-capsule objects.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: reference PhysicalOptimizerRuleExportable in SessionContext docstring\n\nPoint the `physical_optimizer_rules` argument docs at the new\n`PhysicalOptimizerRuleExportable` Protocol instead of describing the duck type inline.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: move FFI capsule detail to 
PhysicalOptimizerRuleExportable\n\nThe PyCapsule / FFI_PhysicalOptimizerRule mechanics describe the Protocol, not the SessionContext constructor. Move that detail onto PhysicalOptimizerRuleExportable and leave the constructor argument docs focused on behavior.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: drop redundant comment in SessionContext constructor\n\nRemove the explanatory comment about FFI bridge availability; the same information already lives on PhysicalOptimizerRuleExportable.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: drop module-level doc comment from physical_optimizer\n\nSibling FFI-import modules (udf, udaf, catalog, table) carry no module-level docs, and the rst-style markup did not match Rust conventions. The function doc comment already states intent.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor: import physical optimizer rule via from_pycapsule! macro\n\nReplace the hand-written crates/core/src/physical_optimizer.rs with a `from_pycapsule!` invocation in the util crate, matching `physical_codec_from_pycapsule` and the other FFI capsule importers. 
The macro already handles the hasattr/getattr/cast/validate/pointer_checked sequence and the infallible `Arc::from(\u0026FFI)` conversion, so the dedicated module is no longer needed.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: trim PhysicalOptimizerRuleExportable docstring\n\nDrop the sentence about logical-rule FFI availability; it is background, not type-hint information, and keeps the Protocol docstring in line with the other *Exportable hints.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Minor refactor\n\n* refactor: register physical optimizer rules via live add method\n\nDrop the `physical_optimizer_rules` constructor argument on\n`SessionContext` and replace it with `add_physical_optimizer_rule`,\nmatching the existing `register_*` shape on the same class. The new\nmethod rebuilds the session state via `SessionStateBuilder::new_from_existing`\nso previously registered tables, UDFs, and catalogs are preserved.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* test: drop redundant FFI physical optimizer rule export test\n\nCoverage subsumed by test_ffi_physical_optimizer_rule_runs_during_planning,\nwhich exercises the same capsule export via add_physical_optimizer_rule.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "af388667f8d8583abdd634fa343252ddb4a0da16",
      "tree": "769758bc6940a3e523c79057b4b1fc1f5c67f1ad",
      "parents": [
        "3d4c56c0757ddc0372ced03ffa97a13ea50d1bd8"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri May 29 18:04:24 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 18:04:24 2026 -0400"
      },
      "message": "feat: expose lambda and higher-order array functions (#1561)\n\n* feat: expose lambda and higher-order array functions\n\nAdd a Pythonic API for DataFusion\u0027s higher-order array functions and the\nlambda expressions they consume.\n\n- Rust: lambda_, lambda_var, array_transform, and array_any_match pyfunctions,\n  plus a ResolveLambdaVariables analyzer rule so expression-builder plans\n  (which emit unresolved lambda variables) resolve before optimization.\n- Python: array_transform / array_any_match (with list_transform, any_match,\n  list_any_match aliases) accept either a Python callable or an explicit\n  lambda built with lambda_ / lambda_var. Callables are introspected so their\n  parameter names become the lambda parameters.\n- Tests and docs (expressions guide + agent skill), noting v1 limits: lambda\n  expressions are not serializable, and SQL arrow syntax needs the DuckDB\n  dialect.\n\n* test: fold lambda tests into pytest parameterization\n\nCombine the eight higher-order function result tests into a single\nparametrized test_higher_order_function_results, and the two to_lambda\nrejection tests into test_to_lambda_rejects_invalid_arg. Each case keeps\na readable id via pytest.param.\n\nCo-Authored-By: Claude \u003cnoreply@anthropic.com\u003e\n\n* feat: expose array_filter higher-order function\n\nAdd array_filter, the remaining lambda-based higher-order array function\nin DataFusion (alongside the already-exposed array_transform and\narray_any_match). Includes the list_filter alias matching upstream, tests,\nand documentation in the expressions guide and skill.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: emphasize lambda terminology, trim skill lambda section\n\nLead user-facing array-lambda docs with \"lambda function\" instead of\n\"higher-order function,\" which is less recognizable to users. 
Drop the\nalias list, serialization caveat, and DuckDB-dialect note from the skill\nto keep it lean; those details already live in the docstrings.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: broaden SQL lambda dialect coverage\n\nOther dialects (ClickHouse, Snowflake, Databricks) also enable lambda\nparsing via sqlparser-rs. Document the full set and recommend the\n``lambda x: x`` keyword form, since DuckDB will drop the ``x -\u003e x``\narrow form in v2.1. Parametrize the SQL test over the four dialects\nusing the keyword syntax.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "3d4c56c0757ddc0372ced03ffa97a13ea50d1bd8",
      "tree": "8f2db8f7e47dbb7331eeb46cbb3273b9e5b54f0c",
      "parents": [
        "0840763851e47a40676dc47b29f5f3a62b0127d7"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri May 29 16:53:19 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 16:53:19 2026 -0400"
      },
      "message": "feat: create free-threaded python wheels (#1553)\n\n* Initial commit for free threaded python support\n\n* ci: use uvx to run maturin in native wheel builds\n\nThe free-threaded matrix entries skip `uv sync` to avoid resolving\nproject dependencies against cp313t/cp314t (many dev deps lack\nfree-threaded wheels), so `uv run --no-project maturin` failed on\nmacOS/Windows with \"Failed to spawn: `maturin`\". Switch to\n`uvx maturin@1.8.1`, which runs maturin in an isolated tool env\nindependent of the project venv and matches the pin used by\nmaturin-action for manylinux builds.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* ci: resolve free-threaded interpreter path explicitly on Windows\n\nmaturin\u0027s `--interpreter python3.14t` fails on Windows because the\nfree-threaded build ships as plain `python.exe` (no `tN` suffix). Look\nup `sys.executable` of the python on PATH (which actions/setup-python\nprepends with the free-threaded install), assert\n`Py_GIL_DISABLED \u003d\u003d 1` so a misconfigured PATH can\u0027t silently build a\nGIL wheel, and normalize backslashes to forward slashes so the path\nsurvives re-expansion in the downstream `run:` line.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* build: enable PyO3 generate-import-lib for Windows free-threaded wheels\n\nWindows free-threaded Python does not expose `abiflags` in sysconfig,\nso PyO3\u0027s default Windows linkage path fails with \"A python 3\ninterpreter on Windows does not define abiflags in its sysconfig ಠ_ಠ\"\nwhen building cp31Xt wheels. Enabling the `generate-import-lib` PyO3\nfeature switches Windows builds to a generated import library\n(provided by the `python3-dll-a` crate) that does not depend on a\nfully populated sysconfig. 
It is a no-op on macOS and Linux and is\ncompatible with the existing `abi3` feature.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* ci: bump maturin to 1.13.3 for Windows free-threaded support\n\nmaturin 1.8.1 errors out on Windows free-threaded interpreters with\n\"A python 3 interpreter on Windows does not define abiflags in its\nsysconfig\" even when given a valid `python.exe`. Newer maturin\nreleases handle the missing abiflags gracefully for cp31Xt builds.\nBump both the `uvx maturin@` pin used for native macOS/Windows wheels\nand the `maturin-version` passed to PyO3/maturin-action for the\nmanylinux containers.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* ci: standardize wheel build job names as \"\u003cOS\u003e \u003carch\u003e (\u003ctag\u003e)\"\n\nThe mac/Windows matrix shared a single name template that prepended\n\"macOS arm64 \u0026 Windows\" to every entry, which got truncated in the\nGitHub UI sidebar and made it hard to tell macOS and Windows runs\napart. Rename all wheel build jobs to the same pattern so the OS,\narchitecture, and python tag are visible at a glance:\n\n- Linux x86_64 / arm64\n- macOS arm64 / x86_64\n- Windows x86_64\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* taplo fmt\n\n* build: move pygithub to release group to fix free-threaded wheel builds\n\npygithub pulls in cryptography via pyjwt[crypto]. cryptography 44.0.0\nships only abi3 wheels, which free-threaded interpreters cannot use, so\nuv builds it from sdist; its bundled PyO3 0.23.2 caps at Python 3.13 and\nfails on 3.14t. 
pygithub is only used by the manual release changelog\nscript, so move it out of the dev group into a new release group.\n\u0027uv sync --dev\u0027 (used by CI test jobs) no longer drags in cryptography.\n\n* ci: pin uv venv to setup-python interpreter for free-threaded jobs\n\nPassing a bare version like \u00273.13t\u0027 to \u0027uv venv --python\u0027 let uv fall\nback to a different system interpreter (3.12), creating a venv whose ABI\ndid not match the downloaded cp313t wheel and failing the install. Use\nthe python-path output from setup-python so the venv uses exactly the\ninterpreter that was set up.\n\n* taplo fmt\n\n* ci: set UV_PYTHON so uv sync keeps the free-threaded interpreter\n\nPinning only \u0027uv venv --python\u0027 was not enough: \u0027uv sync\u0027 ignores the\nexisting .venv, runs its own interpreter discovery, and recreated the\nvenv with the system 3.12, again mismatching the cp313t wheel. Set\nUV_PYTHON to the setup-python interpreter for the install and test\nsteps so every uv command (venv, sync, pip, run) uses it.\n\n* ci: run tests from the .venv, not the bare setup-python interpreter\n\nSetting UV_PYTHON on the test step pointed \u0027uv run --no-project pytest\u0027\nat the setup-python interpreter, which has no pytest installed, causing\n\u0027Failed to spawn: pytest\u0027. UV_PYTHON is only needed in the install step\nto build the .venv with the right interpreter; the test step must use\nthat .venv. Drop UV_PYTHON from the test step.\n\nCo-Authored-By: Claude \u003cnoreply@anthropic.com\u003e\n\n* ci: install datafusion wheel into the activated .venv\n\nSetting UV_PYTHON as a step env split the install across two\nenvironments: \u0027uv sync\u0027 populated .venv while \u0027uv pip install\u0027 targeted\nthe bare setup-python interpreter, so the datafusion wheel never landed\nin .venv and \u0027import datafusion\u0027 failed under pytest. 
Pin the\ninterpreter at \u0027uv venv --python\u0027, activate the venv, and pass --active\nto \u0027uv sync\u0027 so sync and pip install both target the same .venv.\n\nCo-Authored-By: Claude \u003cnoreply@anthropic.com\u003e\n\n* ci: point uv at the venv interpreter by path for free-threaded jobs\n\nActivating the venv and passing --active still let \u0027uv sync\u0027 run its own\ninterpreter discovery, which skips free-threaded builds and re-picked the\nsystem 3.12, recreating .venv and breaking the cp313t/cp314t wheel\ninstall. Pass the venv\u0027s own interpreter (.venv/bin/python) explicitly to\n\u0027uv sync\u0027, \u0027uv pip install\u0027, and \u0027uv run\u0027 so every step stays in the\nfree-threaded environment created by \u0027uv venv\u0027.\n\nCo-Authored-By: Claude \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "0840763851e47a40676dc47b29f5f3a62b0127d7",
      "tree": "c3b74841a5eadb2061f4d45f0f328f1889425a51",
      "parents": [
        "744dd23ff3bfeeafbf532d71c909e0aedb7b2194"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri May 29 13:30:39 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 13:30:39 2026 -0400"
      },
      "message": "feat: accept distinct kwarg on sum and avg (#1556)\n\n* feat: accept distinct kwarg on sum and avg\n\nUpstream exposes `sum_distinct` / `avg_distinct` / `count_distinct` as\nsibling functions that call the same underlying UDAF with\n`distinct: bool \u003d true`. The Rust binding side already routes\n`distinct\u003dSome(true)` through the aggregate builder for `sum`, `avg`,\nand `count` — but only `count` exposed the kwarg on the Python wrapper.\n\nAdd `distinct: bool \u003d False` to `sum()` and `avg()` mirroring the\nexisting `count()` signature, and update SKILL.md so the check-upstream\naudit does not re-flag the three upstream `*_distinct` shortcuts as\ngaps. The plan emitted by `sum(col, distinct\u003dTrue)` matches what\nupstream\u0027s `sum_distinct(col)` builds.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* test: fold sum/avg distinct tests into parameterized aggregation test\n\nMove the standalone test_sum_distinct_kwarg and test_avg_distinct_kwarg\nfrom test_functions.py into the existing test_aggregation::test_aggregation\nparameterization, matching how distinct is already covered for median,\narray_agg, count, and bit_xor.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: clarify distinct kwarg on sum and avg\n\nDrop the unhelpful \"upstream avg_distinct/sum_distinct shortcut\"\nreference in favor of describing the actual behavior.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: note sum/avg distinct argument-order breaking change\n\ndistinct is inserted before filter on sum and avg for consistency with\nthe other aggregate functions, breaking positional filter callers. 
Add a\nDataFusion 54.0.0 upgrade-guide entry covering the migration.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Update docs/source/user-guide/upgrade-guides.rst\n\nCo-authored-by: Nick \u003c24689722+ntjohnson1@users.noreply.github.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\nCo-authored-by: Nick \u003c24689722+ntjohnson1@users.noreply.github.com\u003e"
    },
    {
      "commit": "744dd23ff3bfeeafbf532d71c909e0aedb7b2194",
      "tree": "f136efd90adc2e879b5c6c7ee0868af5b455bf3e",
      "parents": [
        "7df58e531bb949785cae3cca488a1ae55cb6d478"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri May 29 09:25:16 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 09:25:16 2026 -0400"
      },
      "message": "chore: remove unused PyConfig (#1485)\n\n* Remove unused Config class that could not be connected to a SessionContext\n\nCloses #322.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Minor correction after merge main\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "7df58e531bb949785cae3cca488a1ae55cb6d478",
      "tree": "76528a4da5662f5fbd027a176bdbc24652f60f54",
      "parents": [
        "987228300b1c52215b4bb10a1cd5781c40648fbe"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri May 29 08:09:42 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 08:09:42 2026 -0400"
      },
      "message": "feat: pass calling SessionContext to Python UDTF callbacks (#1555)\n\n* feat: pass calling SessionContext to Python UDTF callbacks\n\nDataFusion 53 added `TableFunctionImpl::call_with_args(TableFunctionArgs)`\nwhere `TableFunctionArgs` carries both the positional expression\narguments and the calling `\u0026dyn Session`. The pure-Python UDTF path\npreviously discarded everything but the exprs.\n\nThread the session through when the user callback\u0027s signature opts in\nby declaring a `session` keyword parameter (or `**kwargs`). At call\ntime we downcast the `\u0026dyn Session` to its canonical `SessionState`\nimpl and build a fresh `SessionContext` over the same Arc-shared state,\nexposed to Python as a `datafusion.SessionContext` wrapper. Existing\ncallbacks whose signatures do not declare `session` continue to be\ncalled with the positional expression arguments only — no behavior\nchange for current users.\n\nNote: a UDTF body cannot drive a fresh `ctx.sql(...).collect()` on the\npassed-in session because the outer SQL execution already holds the\ntokio runtime. Use the session for metadata access (catalogs, UDF\nlookups, config) rather than nested DataFrame collection.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: clarify py_session_from_session downcast is defensive\n\nThe doc comment implied a foreign FFI session was a real input. No\ncurrent path reaches a pure-Python UDTF with a non-SessionState\nsession: the SQL planner and __call__ both hand a SessionState, and a\nForeignSession would only arrive via FFI-export of the UDTF, which\ndatafusion-python does not do. Reword to state the guard is defensive\nand rewrap the error string.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor: opt-in UDTF session injection via with_session flag\n\nReplaces signature sniffing with an explicit ``with_session\u003dTrue`` kwarg\non ``TableFunction`` / ``udtf``. 
Avoids name-based detection footguns\n(positional-only ``session`` params, accidental ``**kwargs`` opt-in,\nshadowing by unrelated params) and makes author intent visible at\nregistration. Also documents the feature in the UDTF user guide.\n\nRust field renamed ``accepts_session`` -\u003e ``inject_session_on_call`` to\nmatch the Python-side opt-in semantics.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* fix: reject with_session\u003dTrue for FFI UDTFs and qualify mutation docs\n\nRaise TypeError when with_session\u003dTrue is combined with an FFI-exported\ntable function (one exposing __datafusion_table_function__). The Rust\nFFI branch does not consult the flag, so it would silently be dropped;\nguard both TableFunction.__init__ and the udtf() convenience entry.\n\nQualify the doc claim that mutations through the injected session\npropagate to the caller: registry mutations do (shared Arc registries),\nbut config changes do not (SessionConfig is cloned). Mirror the caveat\nin TableFunction.__init__ per the user-guide caveats convention.\n\nCo-Authored-By: Claude Opus 4.7 \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "987228300b1c52215b4bb10a1cd5781c40648fbe",
      "tree": "a7c9ee949f08ca706a0b57e80d9ffc4edc689a74",
      "parents": [
        "0a9ca68ade8bbe06aaf67928bb8edc65670baa24"
      ],
      "author": {
        "name": "kosiew",
        "email": "kosiew@gmail.com",
        "time": "Fri May 29 17:42:10 2026 +0800"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 29 17:42:10 2026 +0800"
      },
      "message": "Export `to_datafusion_err` from the util crate root (#1487)\n\n* Re-export error helpers from crate root\n\nPublicly re-export curated error helpers, including\nto_datafusion_err, from the crate root. Add a\nregression test in crates/util/tests/root_exports.rs\nto ensure correct functionality in an integration-test\ncontext.\n\n* Restrict re-exported items in lib.rs\n\nMake only to_datafusion_err publicly re-exported from the crate\nroot. Keep PyDataFusionError and PyDataFusionResult as private\nimports for internal use, enhancing encapsulation and reducing\nexposure of non-essential components.\n\n* feat: remove unnecessary blank line in lib.rs to improve code formatting\n\n* rm root_exports.rs"
    },
    {
      "commit": "0a9ca68ade8bbe06aaf67928bb8edc65670baa24",
      "tree": "8c9a9acdca0b95cfad42961c19ffe43adc5d3063",
      "parents": [
        "baec559b0a7c85934338d6da80ffbe538004f4d4"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Thu May 28 17:09:33 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 17:09:33 2026 -0400"
      },
      "message": "docs: user guide + runnable examples for distributing expressions (#1547)\n\n* docs: user guide page + runnable examples for distributing expressions\n\nWraps up the Expr-pickle work with the user-facing material:\n\n* docs/source/user-guide/io/distributing_work.rst — new user guide\n  page covering the multiprocessing, Ray, and datafusion-distributed\n  patterns. Includes the Security section that is the canonical home\n  for the cloudpickle / pickle.loads threat model.\n* docs/source/user-guide/io/index.rst — toctree entry.\n* examples/multiprocessing_pickle_expr.py — runnable example: a\n  Pool.map of a closure-capturing UDF across processes, with worker\n  context registration in the initializer.\n* examples/ray_pickle_expr.py — Ray actor analogue.\n* examples/datafusion-ffi-example/python/tests/_test_pickle_strict_ffi.py\n  — exercises the strict-mode refusal end to end against an FFI\n  capsule scalar UDF (kept under the FFI example crate because the\n  test needs that crate\u0027s compiled artifacts).\n* examples/README.md — index entries for the new files.\n\nAlso tightens three docstrings that previously duplicated the\nsecurity warning so they point at the canonical Security section\ninstead:\n\n* PythonLogicalCodec::with_python_udf_inlining (rustdoc): one-line\n  summary plus a relative pointer to distributing_work.rst and the\n  upstream Python pickle module security warning.\n* SessionContext.with_python_udf_inlining: one-sentence summary plus\n  :doc: link to the user guide.\n* datafusion.ipc module docstring: cross-reference to the user guide\n  for the full pattern.\n\nThe crate-level codec.rs module rustdoc also updates \"pure-Python\nscalar UDFs\" to \"scalar / aggregate / window UDFs\" now that all three\nare covered.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: document Python-version and import portability caveats for inline UDFs\n\nReviewer feedback on the Expr-pickle PRs 
(#1544) asked that the\ncloudpickle portability caveats be discoverable on the user-facing\npage, not only in docstrings. The distributing_work.rst page is the\ndesignated canonical home for the distribution story, so add them here:\n\n* New \u0027Portability requirements for inline Python UDFs\u0027 subsection\n  covering the matching-Python-minor-version requirement and the\n  by-value vs by-reference import-capture rule (imported modules must\n  be importable on the worker).\n* Qualify the \u0027fully portable\u0027 Python-UDF bullet to point at the new\n  requirements.\n* Cross-reference the new subsection from the closure-capture note.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: restore version-byte and cloudpickle-cache rustdoc wording\n\nTwo codec.rs docstrings were reworded in PR4 in ways that dropped\ninformation:\n\n* try_encode_python_scalar_udf: restore the `DFPYUDF` family prefix +\n  version byte description of the payload framing (PR4 had collapsed it\n  to `DFPYUDF1` prefix, dropping the version-byte mention).\n* cloudpickle cached-handle comment: restore \"The encode/decode helpers\n  above\" wording.\n\n* docs: fix reversed tuple order in multiprocessing example docstring\n\nThe \u0027Worker layout\u0027 docstring described tasks as `(expr, label)` but\nthe code builds and unpacks them as `(label, expr)`. Correct the doc\nto match.\n\n* Respond to first batch of reviewer comments\n\n* docs: relocate and restructure distributing-work guide\n\nMove the page from user-guide/io/ to the top level of user-guide/ — distributing work is a runtime/operational concern, not a file-format topic, and the shorter \"Distributing work\" title fits the sidebar cleanly.\n\nRestructure the body to lead with the practical worker-setup pattern instead of the four-slot SessionContext taxonomy. 
The taxonomy survives at the bottom as a reference subsection; the worker-init example and portability rules now reach the reader before they need it. Also addresses reviewer NIT: wrap the `if __name__ \u003d\u003d \"__main__\":` guidance in a `.. note::` admonition and link to the Python multiprocessing docs.\n\nAdd a header paragraph to each runnable example pointing to the user-guide page so a reader who jumps straight to the example gets the surrounding context.\n\nCo-Authored-By: Claude Opus 4.7 \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "baec559b0a7c85934338d6da80ffbe538004f4d4",
      "tree": "d18a0c7b5b9385d49cd65dd5f8f68040c6042259",
      "parents": [
        "23f9179ad08189637f88218a8ea77a1222262abc"
      ],
      "author": {
        "name": "BharatDeva",
        "email": "bharatdevagir@gmail.com",
        "time": "Thu May 28 14:28:47 2026 -0500"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 15:28:47 2026 -0400"
      },
      "message": "fix: type scalar UDF returns as Arrow arrays (#1528)\n\nCo-authored-by: BharatDeva \u003c278575558+BharatDeva@users.noreply.github.com\u003e"
    },
    {
      "commit": "23f9179ad08189637f88218a8ea77a1222262abc",
      "tree": "36f85a95e9618d5cb3f0d1c1e2e9649270dc8c3c",
      "parents": [
        "56b1ceaae2a5023c6b3238999dc55fcee781aa24"
      ],
      "author": {
        "name": "BharatDeva",
        "email": "bharatdevagir@gmail.com",
        "time": "Thu May 28 08:34:43 2026 -0500"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu May 28 09:34:43 2026 -0400"
      },
      "message": "docs: document null-handling function arguments (#1527)\n\nCo-authored-by: BharatDeva \u003c278575558+BharatDeva@users.noreply.github.com\u003e"
    },
    {
      "commit": "56b1ceaae2a5023c6b3238999dc55fcee781aa24",
      "tree": "c8a4a4da02eceee853e6e63f96d11aa7fa7887ff",
      "parents": [
        "081325afe3ac97c6d2ef793a352a26a2634d2738"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Wed May 27 13:22:19 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 27 13:22:19 2026 -0400"
      },
      "message": "Bump DataFusion to 1321d60 (54.0.0) (#1562)\n\nUpdate the pinned DataFusion git rev to\n1321d60cc37ee487d1e7ce7f501357c3236b2542, which is DataFusion 54.0.0.\n\nBump the workspace dependency requirements from 53 to 54 so the\n[patch.crates-io] git overrides actually bind (cargo only applies a\npatch when its version satisfies the dependency requirement), and\nrefresh Cargo.lock accordingly.\n\nAdapt to the 54 API:\n- Remove the DatasetExec::apply_expressions override; apply_expressions\n  is no longer a member of the ExecutionPlan trait.\n- factorial now errors on negative input, so take abs() before\n  applying factorial in the parametrized expr test and update the\n  expected values.\n\nCo-authored-by: Claude \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "081325afe3ac97c6d2ef793a352a26a2634d2738",
      "tree": "060f3a8b66790c1b6d3f53426274d54b2fe9d6e5",
      "parents": [
        "fa021fef203c9194747b9ebf1c1f526867d407b1"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Tue May 26 15:05:53 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue May 26 15:05:53 2026 -0400"
      },
      "message": "feat: Python UDFs: per-session inlining toggle and strict refusal setting (#1546)\n\n* feat: per-session Python UDF inlining toggle + sender ctx + strict refusal\n\nAdds a per-session toggle that turns inline Python UDF encoding on or\noff, plus the supporting plumbing to make it usable through\npickle.dumps.\n\nCodec layer:\n  * PythonLogicalCodec / PythonPhysicalCodec gain a python_udf_inlining\n    bool (default true) and a with_python_udf_inlining(enabled) builder.\n    Each try_encode_udf{,af,wf} short-circuits to inner when the toggle\n    is off; each try_decode_udf{,af,wf} that recognizes a DFPY* magic\n    on a strict codec returns a clean Execution error instead of\n    invoking cloudpickle.loads. The refusal message names the UDF and\n    the wire family so an operator can see at a glance whether to\n    re-encode the bytes or register the UDF on the receiver.\n\nSession layer:\n  * PySessionContext::with_python_udf_inlining(enabled) returns a new\n    session whose stacked logical + physical codecs both carry the\n    toggle. The Arc\u003cSessionState\u003e is cloned (cheap), only the codec\n    pair is rebuilt, so registrations and config stay attached.\n  * SessionContext.with_python_udf_inlining(*, enabled) is the Python\n    wrapper. enabled is keyword-only because positional booleans at\n    the call site read as opaque.\n\nSender-side context:\n  * datafusion.ipc gains set_sender_ctx / get_sender_ctx /\n    clear_sender_ctx thread-locals. Expr.__reduce__ now consults\n    get_sender_ctx() to pick the codec for outbound pickles, which is\n    the only path through which a strict session affects pickle.dumps\n    (the protocol calls __reduce__ with no arguments). 
Without a\n    sender context the default codec is used.\n\nTests:\n  * test_pickle_expr.py picks up TestPythonUdfInliningToggle (covers\n    both directions of the toggle plus the explicit-ctx fast path),\n    TestWorkerCtxLifecycle (set/clear/threading), and\n    TestSenderCtxLifecycle.\n  * New test_pickle_multiprocessing.py + helpers exercise the full\n    driver -\u003e worker round-trip on a multiprocessing.Pool with set_*_ctx\n    installed in the worker initializer.\n  * CI workflow gets a 30-minute timeout-minutes backstop so a hung\n    pickle worker can\u0027t block the matrix indefinitely.\n\nUser-guide docs and the runnable examples land in PR4 of this series.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* update uv lock\n\n* docs: clarify Python UDF inlining docstring; drop unresolved :doc: refs\n\nRewrite with_python_udf_inlining docstring for readability and remove\nreferences to /user-guide/io/distributing_work, which does not exist\nyet. Keep security warning inline as a .. warning:: Security block,\nmatching the existing pattern in Expr.to_bytes / from_bytes /\n__reduce__. The central doc will land in a follow-on PR.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: add doctest examples for sender ctx + UDF inlining toggle\n\nPer CLAUDE.md, every Python function needs a docstring example.\nAdds examples to with_python_udf_inlining, set_sender_ctx,\nclear_sender_ctx, and get_sender_ctx. 
Also clarifies that\nwith_python_udf_inlining returns a new SessionContext and leaves\nthe original unchanged, matching the with_logical_extension_codec\npattern.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor: address review nits for UDF inlining toggle + sender ctx\n\n* codec: strict refusal routes through `read_framed_payload` so\n  malformed inline bytes surface their own diagnostic; the\n  \"inlining is disabled\" message now fires only when the payload\n  would have decoded.\n* codec: add summary line above `PythonPhysicalCodec::with_python_udf_inlining`\n  cross-link for rustdoc rendering.\n* expr: hoist `get_sender_ctx` import to module top; note that\n  `__reduce__` also drives `copy.copy` / `copy.deepcopy`.\n* context: accept `with_python_udf_inlining` positionally or as\n  kwarg (drop `*,`).\n* tests: replace size-ratio heuristic with semantic check for the\n  `DFPYUDF` family prefix; switch single-batch closure test to\n  `pool.apply`.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor: keyword-only inlining flag, skip GIL on prefix mismatch\n\n- `SessionContext.with_python_udf_inlining` now keyword-only (`*, enabled`)\n  to match the documented call style and the existing doctests/tests.\n- `refuse_if_inline` and the three `try_decode_python_*` decoders short-\n  circuit on a `starts_with(family)` check before `Python::attach`, so\n  plans whose UDFs are not Python-defined no longer pay a GIL acquisition\n  per decode call. 
Semantics preserved: `strip_wire_header` already\n  returns `Ok(None)` when the prefix does not match.\n- `datafusion.ipc` module docstring wraps the `set_sender_ctx` example in\n  `try`/`finally` and notes that the thread-local holds a strong\n  reference to the installed `SessionContext` until cleared.\n\nCo-Authored-By: Claude Opus 4.7 \u003cnoreply@anthropic.com\u003e\n\n* Add dev dependency\n\n* Add testing for CI failure\n\n* Additional debugging for mp tests in CI\n\n* Set path for workers\n\n* more path updates for unit tests\n\n* test(pickle): remove multiprocessing CI debug instrumentation\n\nMultiprocessing forkserver/spawn hang was diagnosed and fixed: workers\ncould not import `tests._pickle_multiprocessing_helpers` because\n`pytest --import-mode\u003dimportlib` does not add the test parent dir to\n`sys.path`. The fix (appending the parent dir to `sys.path` so it is\ninherited by mp workers without shadowing the installed `datafusion`\nwheel) is retained. This commit drops the diagnostic scaffolding that\nwas added to identify the hang point:\n\n- `_diag` + per-import / per-task log writes to /tmp\n- `snapshot_processes` and the `threading.Timer` that captured worker\n  state mid-hang\n- `diag_init` Pool initializer\n- \"Dump multiprocessing diagnostic log\" CI step\n\nPre-existing infrastructure is kept: per-test `@pytest.mark.timeout(120)`\n(backed by `pytest-timeout` dev dep) and the job-level\n`timeout-minutes: 30` backstop on the test matrix.\n\nCo-Authored-By: Claude Opus 4.7 \u003cnoreply@anthropic.com\u003e\n\n* Shorten rust side docstring since it\u0027s duplicative of the exposed python docstring\n\n* docs: clarify strict-mode refusal message and to_bytes inlining docs\n\nAddress PR review feedback:\n\n- codec.rs: rewrite strict-refusal error to present the two real\n  remediations (sender re-encode by-name + receiver register; or\n  receiver enables inlining, accepting cloudpickle risk) instead of\n  bundling registration with both-side 
inlining.\n- expr.py: qualify to_bytes docstring so Python UDF self-contained\n  behavior is conditional on with_python_udf_inlining being enabled.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: clarify with_python_udf_inlining enabled arg is required\n\nReword docstring to drop misleading \"(the default)\" claim. The\n`enabled` parameter is keyword-only and required — there is no\nargument default. Note instead that fresh sessions inline UDFs\nuntil the toggle overrides them (a session-level default, not an\nargument default).\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: demonstrate strict-mode refusal in with_python_udf_inlining docstring\n\nReplace placeholder isinstance check with a doctest that registers\na Python UDF, encodes an expression on the default session, then\nshows the strict session refusing to decode the inline payload.\nExercises the actual behavior the toggle controls.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: convert sender-ctx example to executable doctest\n\nReplace the code-block in the ipc module docstring that demonstrated\nset_sender_ctx with a doctest that actually runs. Worker-init example\nremains a code-block since it documents a Pool-initializer pattern\nthat does not fit naturally into a doctest.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: use \u0027thread-local sender context\u0027 as adjectival phrase\n\nBare \u0027thread-local\u0027 as a noun reads ambiguously next to the\n_local.ctx attribute name. 
Hyphenate as adjective with explicit\n\u0027sender context\u0027 noun so the referent is unambiguous.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: drop trailing clear_sender_ctx from set_sender_ctx example\n\nThe trailing cleanup call was test hygiene, not API teaching, and\nrisked implying callers must always pair set with clear. Adjacent\nclear_sender_ctx and get_sender_ctx doctests are self-contained\n(they explicitly set or clear before asserting), so removing the\ncleanup line does not affect doctest outcomes.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "fa021fef203c9194747b9ebf1c1f526867d407b1",
      "tree": "17fb28c1801c45c0b31022291191c84e8a2d0264",
      "parents": [
        "f43830480b754900b93a30c36e6280d0ce0577f1"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Tue May 26 11:49:45 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue May 26 11:49:45 2026 -0400"
      },
      "message": "Add details on caching to skill (#1521)"
    },
    {
      "commit": "f43830480b754900b93a30c36e6280d0ce0577f1",
      "tree": "584a6d50fa243d6fbe1c8863a2a77d371d887230",
      "parents": [
        "dac9ec6230dba8717d8a0d27de19a141600486b1"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Tue May 26 11:48:57 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue May 26 11:48:57 2026 -0400"
      },
      "message": "feat: expose variety of features from DF54 update (#1554)\n\n* refactor: migrate FFI example table function to call_with_args\n\nDataFusion 53 deprecated `TableFunctionImpl::call(args: \u0026[Expr])` in\nfavor of `call_with_args(args: TableFunctionArgs)`. `PyTableFunction`\nwas migrated in 5a64b0d; this brings the FFI example along so it no\nlonger relies on the deprecated entry point.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat: type SessionContext codec setters with exportable Protocols\n\nPR #1541 introduced `with_logical_extension_codec` /\n`with_physical_extension_codec` setters typed as `codec: Any`. The Rust\nextractors accept either a raw `PyCapsule` or any object exposing\n`__datafusion_logical_extension_codec__` /\n`__datafusion_physical_extension_codec__`.\n\nAdd `LogicalExtensionCodecExportable` / `PhysicalExtensionCodecExportable`\nProtocols in `python/datafusion/user_defined.py` (matching the existing\n`ScalarUDFExportable` pattern) and tighten both setter signatures to\n`Protocol | _PyCapsule`. Pure typing change; no runtime behavior diff.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat: accept variadic field path in get_field\n\nUpstream exposes both `get_field(expr, name)` and\n`get_field_path(expr, [names...])`, but both ultimately call the same\nscalar UDF with a base expression plus one or more name args. 
Collapse\nthe Python surface into a single variadic `get_field(expr, *names)`\nthat accepts either a one-step lookup or a path of names, dispatching\nthrough a single Rust binding.\n\nNote in `.ai/skills/check-upstream/SKILL.md` that `get_field_path` is\ncovered by the variadic form so future audits do not flag it as a gap.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat: SessionContext.read_batches / read_batch\n\nWrap upstream `SessionContext::read_batches`, which materializes a\nDataFrame directly from a sequence of `RecordBatch`es without\nregistering a named table. The single-batch convenience\n`SessionContext.read_batch` is implemented in pure Python by calling\n`read_batches([batch])`, so the Rust side only needs the one binding.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat: SessionContext UDF lookup helpers\n\nExpose `udf(name)` / `udaf(name)` / `udwf(name)` lookups symmetric with\nthe existing `register_udf` / `register_udaf` / `register_udwf` setters,\nplus `udfs()` / `udafs()` / `udwfs()` for enumerating registered\nfunction names. Looked-up functions come back as the same\n`ScalarUDF` / `AggregateUDF` / `WindowUDF` wrappers users already get\nfrom registration, so they can be called as expressions or re-registered\ninto a different session.\n\nReturns Vec\u003cString\u003e from the list helpers (sorted) rather than the raw\nHashSet upstream returns, so calling code gets a stable ordering.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* bump pre-commit so it stops failing CI checks\n\n* test: drop xfail on timestamp[s] parquet roundtrip\n\npyarrow.parquet promotes timestamp[s] to timestamp[ms] on write (apache/arrow#41382),\nso the read array never matched the input. 
Cast the expected array to timestamp[ms]\nin test_simple_select to assert DataFusion reads what Arrow actually stored.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* test: capture deprecation warning in repr_rows conflict case\n\nDataFrameHtmlFormatter(repr_rows\u003d..., max_rows\u003d...) fires the deprecation\nwarning before raising ValueError, but pytest.raises does not catch warnings.\nThe escaping warning surfaced in every pytest run. Wrap the call in both\npytest.raises and pytest.warns so the warning is asserted, not leaked.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(udf): document SessionContext UDF lookup with worked examples\n\nAdd Examples docstrings (doctest) for `udf` / `udaf` / `udwf` / `udfs` /\n`udafs` / `udwfs` that demonstrate the lookup pattern, including a\nlate-binding example where the function name comes from configuration.\nAdd tests covering config-driven dispatch and built-in UDAF / UDWF\nlookup so the documented patterns are exercised end-to-end.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor(udf): raise KeyError on UDF/UDAF/UDWF lookup miss\n\n`SessionContext.udf` / `udaf` / `udwf` previously surfaced upstream\n`DataFusionError::Plan` as a generic exception whose message (\"There\nis no UDF named ...\") is set by DataFusion and can drift between\nreleases. 
Pre-check membership via `udfs()` / `udafs()` / `udwfs()`\nand raise `PyKeyError` on miss so callers get the Pythonic\ndict-style lookup behavior and tests are no longer coupled to the\nupstream wording.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor(udf): add _from_internal classmethod to UDF wrappers\n\n`SessionContext.udf` / `udaf` / `udwf` previously constructed wrapper\nobjects by calling `__new__` directly and writing the private `_udf`\n/ `_udaf` / `_udwf` attribute from outside the owning module. Three\nnear-identical blocks coupled `context.py` to wrapper internals.\n\nAdd a `_from_internal` classmethod on each wrapper that takes an\nalready-constructed `df_internal` handle and returns a wrapper\nwithout re-running `__init__`. The lookup methods now collapse to a\nsingle call, the `__new__` bypass is documented on the wrapper class\nitself, and renaming the private field is a one-spot edit.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor: widen SessionContext.read_batches to accept any iterable\n\nThe underlying PyArrow FFI extractor for `Vec\u003cRecordBatch\u003e` requires a\nPython `list`, so the previous `list[pa.RecordBatch]` annotation was\naccurate but unnecessarily strict. Accept any\n`Iterable[pa.RecordBatch]` on the Python side and materialize to a\nlist before crossing the FFI boundary so callers can pass generators,\ntuples, or other iterables without manual conversion.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(context): trim codec docstrings, reference Exportable protocols\n\nDrop prose restatement of the type union for `with_logical_extension_codec`\nand `with_physical_extension_codec`. 
Keep the dunder name (not visible from\nthe type hint) and cross-link the `LogicalExtensionCodecExportable` /\n`PhysicalExtensionCodecExportable` protocols so Sphinx resolves them.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(udf): drop return-type cross-refs in udf/udaf/udwf docstrings\n\nThe `:py:class:` link back to the wrapper class shadowed the return type\nannotation and risked drifting if the class were moved. Replace with a\nplain backtick literal; surrounding contract prose is unchanged.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(functions): use F alias in get_field doctest\n\nThe doctest namespace already imports `datafusion.functions as F`,\nmaking `F.named_struct` / `F.get_field` shorter than the\n`dfn.functions.*` form.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "dac9ec6230dba8717d8a0d27de19a141600486b1",
      "tree": "2c4c4327e94248d28aeb205cfb44d8596b0849d7",
      "parents": [
        "afaeccbf27623e983bba594f466024d1f52c5a0f"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Wed May 20 18:58:32 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 20 18:58:32 2026 -0400"
      },
      "message": "feat: enable pickling for Python aggregate and window UDFs (#1545)\n\n* feat: inline encoding for Python aggregate and window UDFs\n\nExtends the PythonLogicalCodec / PythonPhysicalCodec inline encoding\nintroduced for scalar UDFs to also cover Python-defined aggregate and\nwindow UDFs. The cloudpickle tuple shape per family is:\n\n  DFPYUDA  (agg)     (name, accumulator_factory, input_schema_bytes,\n                      return_schema_bytes, state_schema_bytes,\n                      volatility_str)\n  DFPYUDW  (window)  (name, evaluator_factory, input_schema_bytes,\n                      return_schema_bytes, volatility_str)\n\nSame wire-framing as scalar (family magic + version byte + cloudpickle\nblob), same schema serde (arrow-rs native IPC), same cached cloudpickle\nhandle. The agg state schema is encoded as a full IPC schema so the\npost-decode UDF reports the same names + nullability + metadata as the\nsender — relevant for accumulators whose StateFieldsArgs consumers key\noff names rather than positional DataType.\n\nRequired restructuring two existing UDF impls so the codec can grab\nthe Python callable directly:\n\n* udaf.rs: replaces create_udaf + AccumulatorFactoryFunction closure\n  with a named PythonFunctionAggregateUDF that stores the Py\u003cPyAny\u003e\n  accumulator factory. Synthesizes state_{i} field names when the\n  Python constructor passes only Vec\u003cDataType\u003e; from_parts preserves\n  the full state schema on the decode side.\n* udwf.rs: renames MultiColumnWindowUDF -\u003e PythonFunctionWindowUDF,\n  drops the PartitionEvaluatorFactory PtrEq wrapper, stores the\n  Py\u003cPyAny\u003e evaluator directly. 
PartialEq and Hash get the same\n  pointer-identity fast path + debug-log exception handling already\n  on PythonFunctionScalarUDF.\n\nUser-facing surface:\n\n* AggregateUDF.name and WindowUDF.name properties (parallel to the\n  ScalarUDF.name shipped in PR1).\n* Existing UDAF/UDWF construction paths are unchanged.\n\nThe per-session with_python_udf_inlining toggle, sender-side context,\nstrict refusal, and user-guide docs land in PRs 3-4 of this series.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat: restore pub UDAF/UDWF helpers and document inline encoding\n\nRe-export `to_rust_accumulator`, `to_rust_partition_evaluator`, and\n`PythonFunctionWindowUDF` (with a `MultiColumnWindowUDF` alias) by\npromoting `udaf` and `udwf` to `pub mod` so prior downstream Rust\nconsumers keep their API surface after the inline-encoding refactor.\n\nAdds an end-to-end window UDF pickle round-trip test that runs the\ndecoded evaluator over a real session, mirroring the aggregate test.\n\nDocuments the cloudpickle-based shipping behavior of Python aggregate\nand window UDFs in the user-guide aggregations and windows pages.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* fix: address PR #1545 review feedback\n\n- Fix CountAcc.merge in pickle test: sum over states[0] (partition\n  counts), not over the list of state fields. 
The prior implementation\n  only added partition 0\u0027s count when merging across partitions.\n- Drive test_agg_udf_evaluates_after_roundtrip with a two-batch\n  DataFrame so merge actually runs and the round-tripped state-field\n  schema is exercised end-to-end.\n- Correct PY_AGG_UDF_FAMILY / PY_WINDOW_UDF_FAMILY doc comments and the\n  aggregate block comment to reference \"return schema bytes\" rather\n  than \"return type\" / \"return_type_bytes\" so the docs match the actual\n  on-wire layout.\n- Keep `udaf` and `udwf` modules private (matching `udf`) and\n  selectively re-export the helpers downstream Rust consumers rely on\n  (`to_rust_accumulator`, `to_rust_partition_evaluator`,\n  `PythonFunctionWindowUDF`, `MultiColumnWindowUDF`) instead of\n  exposing the whole module surface.\n- Rename codec helpers `*_agg_udf` -\u003e `*_udaf` and `*_window_udf` -\u003e\n  `*_udwf` for naming consistency with the Python public aliases.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "afaeccbf27623e983bba594f466024d1f52c5a0f",
      "tree": "88bff2377b13d0acef907ac81871225f7e27fcf1",
      "parents": [
        "8ba06e4147122962f356e11b73175379473f75a7"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Tue May 19 09:31:30 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue May 19 09:31:30 2026 -0400"
      },
      "message": "feat: enable pickling of most Expr except udaf and udwf (#1544)\n\n* feat: pickle support for Expr via inline scalar UDF encoding\n\nAdds Python-aware encoding to PythonLogicalCodec/PythonPhysicalCodec\nso a ScalarUDF defined in Python travels inside the serialized\nexpression (cloudpickled into fun_definition) instead of needing a\nmatching registration on the receiver. With that in place, Expr gains\n__reduce__ + classmethod from_bytes(buf, ctx\u003dNone) so pickle.dumps /\npickle.loads work end-to-end on expressions built from col, lit,\nbuilt-in functions, and Python scalar UDFs.\n\nWire format is framed as \u003cDFPYUDF magic, version byte, cloudpickle\ntuple\u003e; the version byte lets a too-new/too-old payload surface a\nclean Execution error instead of an opaque cloudpickle unpack\nfailure. Schema serde is via arrow-rs\u0027s native IPC (no pyarrow\nround-trip). Cloudpickle module handle is cached per-interpreter\nthrough PyOnceLock.\n\nWorker-side context resolution lives in a new datafusion.ipc module:\nset_worker_ctx / get_worker_ctx / clear_worker_ctx plus a private\n_resolve_ctx helper consulted by Expr.from_bytes. Priority is\nexplicit ctx \u003e worker ctx \u003e global SessionContext. FFI UDFs still\ntravel by name and require the matching registration on the\nreceiver\u0027s context.\n\nAggregate and window UDF inline encoding, the per-session\nwith_python_udf_inlining toggle, sender-side context, and the\nuser-guide docs land in follow-on PRs.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs(pickle): add cloudpickle security warnings, docstring examples, edge-case tests\n\nInline `.. 
warning::` blocks on `Expr.to_bytes`, `Expr.from_bytes`, and\n`Expr.__reduce__` so the cloudpickle / arbitrary-code-execution caveat is\nvisible at the public API surface in advance of the user-guide page that\nlands in PR 4.\n\nAdd doctest-style `Examples:` blocks to `datafusion.ipc` functions\n(`set_worker_ctx`, `clear_worker_ctx`, `get_worker_ctx`, `_resolve_ctx`),\n`ScalarUDF.name`, and the new `Expr` pickle methods, per CLAUDE.md.\n\nTighten `Expr.__reduce__` return annotation to\n`tuple[Callable[[bytes], Expr], tuple[bytes]]`.\n\nTests: multi-arg UDF round-trip (covers synthetic `arg_{i}` schema-field\nloop in the codec) plus malformed-bytes paths through `Expr.from_bytes`.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* as_any no longer in api\n\n* feat(pickle): stamp Python (major, minor) in UDF wire header\n\ncloudpickle bytecode is not portable across Python minor versions —\na payload produced on 3.11 fails to load on 3.12 with an opaque\nmarshal/unpickle error. Embed the sender\u0027s (major, minor) in the\nDFPYUDF wire header and reject mismatches at decode time with an\nactionable error that names both versions, instead of letting the\nfailure surface from inside cloudpickle.loads.\n\nHeader layout becomes:\n  DFPYUDF (7) | version (1) | py_major (1) | py_minor (1) | cloudpickle\n\nExtend the Security warnings on Expr.to_bytes / from_bytes /\n__reduce__ with a Portability section covering the cross-version\nconstraint and cloudpickle\u0027s by-value/by-reference behavior (the\ncallable inlines bytecode and closure cells, but imported names\ntravel by reference and must be importable on the receiver). 
Add\na matching Serialization model note to the datafusion.ipc module\ndocstring.\n\nNew tests:\n  - codec::wire_header_tests: py-major/minor mismatch, truncated\n    py-version bytes, round-trip with py-version\n  - test_pickle_expr::test_cross_version_error_message: patches the\n    py_minor byte inside an emitted payload and asserts the error\n    message identifies the version mismatch\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "8ba06e4147122962f356e11b73175379473f75a7",
      "tree": "8238b487946e3b83109ea7bf5d27a6d25aad1cb8",
      "parents": [
        "baef8f00e1de6a086fa1814c98e65d216fb16277"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Mon May 18 11:51:36 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon May 18 11:51:36 2026 -0400"
      },
      "message": "Update datafusion dependency to latest in preparation for DF54 (#1532)\n\n* feat: upgrade upstream DataFusion 53 → main (pre-54)\n\nBump workspace deps to apache/datafusion@3d06bedc (git pin) in\npreparation for the 54.0.0 release. Workspace package version moves\nto 54.0.0 to track the upstream major convention.\n\nCompile fixes:\n- Drop as_any impls (trait now has Any as supertrait) and use the\n  upstream-provided downcast_ref helper on dyn trait objects.\n- Reconcile FFI provider From conversions to drop redundant `+ Send`\n  on Arc\u003cdyn ...\u003e bounds.\n- Cast/TryCast: data_type → field.data_type() (FieldRef rename).\n- Stub match arms for new Expr::HigherOrderFunction / Lambda /\n  LambdaVariable and ScalarValue::ListView / LargeListView variants;\n  proper exposure deferred to PR 3 audit.\n- DatasetExec: partition_statistics returns Arc\u003cStatistics\u003e; add\n  required apply_expressions trait method.\n- Suppress TableFunctionImpl::call deprecation pending call_with_args\n  refactor that needs Session plumbing.\n\nUser-facing test updates for upstream behavior changes:\n- median / approx_median / approx_percentile_cont now return Float64.\n- String functions (concat_ws, lower, upper, repeat, reverse,\n  split_part, translate) return StringView when given StringView.\n- overlay appends past end-of-string rather than replacing the input.\n- arrays_zip / list_zip struct field names \"c0\"/\"c1\" → \"1\"/\"2\".\n- Filter on mismatched cast types now errors (was 0 matches).\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat: expose DataFrame.alias and tidy public API after DF53→54 audit\n\nCompanion to the upstream DataFusion 53 → main bump. 
The\ncheck-upstream audit (PR 3 of dev/release/upstream-sync.md) surfaced a\nsmall set of trivial wins; this commit ships them.\n\nTrivial wins:\n- DataFrame.alias(name) — wraps the logical plan in a SubqueryAlias.\n- functions.__all__: add `instr` and `position` (both were defined as\n  public defs but missing from `__all__`, so they didn\u0027t show up in\n  `from datafusion.functions import *` or generated docs).\n- top-level `datafusion.__all__`: re-export `TableProviderFactory` and\n  `TableProviderFactoryExportable` (previously only reachable via the\n  `datafusion.catalog` submodule).\n\nNon-trivial gaps surfaced by the audit (DataFrame.registry,\ninto_*/task_ctx, SessionContext extensibility surface, distinct-aware\naggregate variants, TableFunctionImpl::call_with_args migration, FFI\nProtocol pipeline gaps) are deferred — each warrants its own design\nand PR.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* taplo fmt\n\n* Update unit test to go along with https://github.com/apache/datafusion/pull/22133\n\n* docs: demonstrate alias via self-join in DataFrame.alias example\n\nPrior example called alias(\"t\") then to_pydict(), which did not show\nthe qualifier effect. Replace with a self-join that uses col(\"l.val\")\nand col(\"r.val\") so the disambiguation behavior is visible.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat: wrap higher-order, lambda, and lambda-variable Expr variants\n\nDataFusion 54 introduces Expr::HigherOrderFunction, Expr::Lambda, and\nExpr::LambdaVariable. PyExpr::to_variant previously errored on each\nwith py_unsupported_variant_err. 
Add PyHigherOrderFunction, PyLambda,\nand PyLambdaVariable wrappers, register them in the expr pymodule and\nre-export from python/datafusion/expr.py, and dispatch to_variant to\nthe new wrappers.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat: wire rex_type and rex_call_operands for new Expr variants\n\nMap HigherOrderFunction and Lambda to RexType::Call; LambdaVariable to\nRexType::Reference. In rex_call_operands return the args for\nHigherOrderFunction, the body for Lambda, and self for LambdaVariable\n(mirroring Column). In rex_call_operator return the underlying UDF\nname for HigherOrderFunction and the literal \"lambda\" for Lambda.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat: support LargeList/ListView/LargeListView in map_from_scalar_to_arrow\n\nThese ScalarValue variants all wrap Arc\u003c...Array\u003e, exposing the outer\nDataType via Array::data_type(), so we can mirror the existing\nScalarValue::List arm instead of returning PyNotImplementedError. This\nmakes Expr.types() work for plans that round-trip through SQL or proto\nwhere these scalar variants surface.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor: switch PyTableFunction to non-deprecated call_with_args\n\nDataFusion 53.0.0 deprecated TableFunctionImpl::call in favor of\ncall_with_args(args: TableFunctionArgs), which threads a Session\nreference alongside the exprs. 
Implement call_with_args on\nPyTableFunction (delegating to the FFI variant\u0027s call_with_args, or\nignoring the session for the pure-Python variant which doesn\u0027t use it)\nand have __call__ build a TableFunctionArgs from the global session.\nDrops both #[allow(deprecated)] attributes.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* build: revert workspace version to 53.0.0 and move DF overrides to [patch.crates-io]\n\nThe workspace version was prematurely bumped to 54.0.0 in the\nDF53→pre-54 upgrade. Restore it to 53.0.0 until we are actually\nready to cut the 54 release.\n\nThe same change had moved every datafusion-* dependency from a\ncrates.io version constraint to a direct git dep in\n[workspace.dependencies]. Switch them back to \"version \u003d \\\"53\\\"\" and\nmove the git rev overrides into [patch.crates-io] so the published\nmanifest will be patch-free.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* taplo format\n\n* test: sort FFI test results by partition key before equality compare\n\nMulti-partition `collect()` returns batches in execution-scheduling\norder, which is non-deterministic and differs between local and CI\nrunners. 
Sort by the first value of column 0 (unique per partition in\neach affected test) so the expected/actual comparison is stable.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Bump datafusion main commit\n\n* test: cover new DF54 expr wrappers, catalog factories, and DataFrame.alias\n\nAdd module-metadata checks for HigherOrderFunction, Lambda, LambdaVariable\nand the top-level TableProviderFactory / TableProviderFactoryExportable\nre-exports, plus a self-join regression test exercising the new\nDataFrame.alias() qualifier-based selection path.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "baef8f00e1de6a086fa1814c98e65d216fb16277",
      "tree": "53d7aaca6eeebed0d0f6662343b9b04a4dba6222",
      "parents": [
        "55870ff30c4a1f086e0b63de434ebf8b674ac110"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri May 15 09:00:21 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri May 15 09:00:21 2026 -0400"
      },
      "message": "Add support for logical and physical codecs (#1541)\n\n* feat: unify logical + physical proto codec stack via SessionContext\n\nIntroduces a single composable codec layer that every serializer reads\nfrom the session, replacing the hardcoded `DefaultLogicalExtensionCodec`\n/ `DefaultPhysicalExtensionCodec` calls scattered across PyLogicalPlan,\nPyExecutionPlan, and the Rust-wrapped Python provider plumbing.\n\nKey changes:\n\n* New `PythonLogicalCodec` and `PythonPhysicalCodec` (crates/core/src/codec.rs)\n  wrap any inner `LogicalExtensionCodec` / `PhysicalExtensionCodec`. Both\n  share a `DFPYUDF1` magic-prefix path for in-band cloudpickle encoding\n  of Python scalar UDFs, so an `ExecutionPlan` / `PhysicalExpr`\n  referencing a Python `ScalarUDF` round-trips through either layer.\n  Magic-prefix registry table (DFPYUDF1 in use; DFPYUDA1 / DFPYUDW1 /\n  DFPYPE1 reserved) documented in the module header.\n\n* `PySessionContext` stores `Arc\u003cPythonLogicalCodec\u003e` and\n  `Arc\u003cPythonPhysicalCodec\u003e` directly. FFI wrappers are built on demand\n  via `ffi_logical_codec()` / `ffi_physical_codec()` for capsule export\n  and downstream `RustWrappedPy*` consumers. Adds\n  `__datafusion_physical_extension_codec__` getter +\n  `with_physical_extension_codec` setter (symmetric with the logical\n  pair).\n\n* `PyLogicalPlan.to_proto` / `from_proto` renamed to `to_bytes` /\n  `from_bytes`, now reading the session\u0027s logical codec. `to_proto` /\n  `from_proto` survive as deprecated thin wrappers emitting\n  `DeprecationWarning`.\n\n* `PyExecutionPlan` gains the same `to_bytes` / `from_bytes` rename +\n  deprecated aliases, plus `__datafusion_execution_plan__` capsule\n  getter and `from_pycapsule` (ported from poc_ffi_query_planner).\n\n* New `PyPhysicalExpr` class with `to_bytes` / `from_bytes` /\n  `from_pycapsule` / `__datafusion_physical_expr__`. 
`from_bytes`\n  takes an input pyarrow Schema for column-reference resolution.\n\n* `datafusion-python-util` gains `from_pycapsule!` /\n  `try_from_pycapsule!` macros + `physical_codec_from_pycapsule`,\n  `task_context_from_pycapsule`, `create_physical_extension_capsule`\n  (ported from poc_ffi_query_planner).\n\n* `PythonFunctionScalarUDF` exposes `func()`, `input_fields()`,\n  `return_field()`, `volatility()`, `from_parts()` accessors needed\n  by the codec.\n\nPython wrapper updates: `LogicalPlan` / `ExecutionPlan` add\n`to_bytes` / `from_bytes` + deprecate `to_proto` / `from_proto`;\n`ExecutionPlan` adds capsule getter + `from_pycapsule`; new\n`PhysicalExpr` wrapper class exported from the top-level package;\n`SessionContext` exposes the physical codec capsule + setter.\n\nTest coverage in python/tests/test_plans.py: round-trip via new API,\ndeprecation warnings on old API, capsule protocol getters,\nsession-routed codec on both layers.\n\n`PyLogicalPlan` PyCapsule protocol is intentionally not added —\n`datafusion-ffi` does not expose `FFI_LogicalPlan`, so there is no\nstable cross-crate shape to publish. Round-tripping a `LogicalPlan`\ngoes through `to_bytes` / `from_bytes` only.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* test: FFI-example integration tests for codec + plan capsule APIs\n\nAdds four downstream-crate fixtures in `datafusion-ffi-example` so the\nnew PR1 surface can be tested with the same FFI-handoff pattern used\nfor table providers, UDFs, etc. Existing tests prove the API exists;\nthese tests prove it composes with code that lives in another crate.\n\nNew Rust types in `examples/datafusion-ffi-example/src/`:\n\n* `MyLogicalExtensionCodec` — delegates to\n  `DefaultLogicalExtensionCodec` and bumps atomic counters on the UDF\n  encode/decode entry points. Exported via\n  `__datafusion_logical_extension_codec__`. 
Installed onto a session\n  with `ctx.with_logical_extension_codec(my_codec)`.\n* `MyPhysicalExtensionCodec` — mirror for `PhysicalExtensionCodec`.\n* `MyExecutionPlan` — wraps a one-column `EmptyExec`, exposes\n  `__datafusion_execution_plan__`. Lets the receiver consume an\n  `ExecutionPlan` capsule that did not originate in\n  datafusion-python.\n* `MyPhysicalExpr` — wraps `Literal(Int32(42))`, exposes\n  `__datafusion_physical_expr__`. Same FFI handoff for physical\n  expressions.\n\nNew tests:\n\n* `_test_logical_extension_codec.py` — codec installs cleanly, the\n  session re-exports its capsule, and `try_encode_udf` fires on the\n  user codec when serializing a plan that references a `ScalarUDF`.\n  The decode counterpart is a round-trip check rather than a counter\n  assertion: when the UDF is in the receiver\u0027s function registry,\n  `parse_expr` resolves by name before consulting the codec.\n* `_test_physical_extension_codec.py` — symmetric.\n* `_test_execution_plan.py` — parametrized over typed-class vs\n  raw-capsule input; verifies `ExecutionPlan.from_pycapsule` consumes\n  the downstream capsule.\n* `_test_physical_expr.py` — same for `PhysicalExpr.from_pycapsule`.\n\nAPI changes forced by the new tests:\n\n* `PyLogicalPlan.to_bytes`, `PyExecutionPlan.to_bytes`,\n  `PyPhysicalExpr.to_bytes` now accept an optional `ctx` parameter.\n  When supplied, encoding routes through the session\u0027s installed\n  codec instead of a fresh default. `ctx\u003dNone` preserves the previous\n  default-codec behavior used by the deprecated `to_proto` shims.\n* The util `from_pycapsule!` / `try_from_pycapsule!` macros now\n  validate the capsule name via `pointer_checked(Some(c\"...\"))`\n  rather than `pointer_checked(None)`. 
The latter rejects named\n  capsules outright with CPython\u0027s \"incorrect name\" error.\n* `SessionContext.with_logical_extension_codec` and\n  `with_physical_extension_codec` now wrap the returned internal\n  context in `SessionContext` so the result has the full Python\n  surface. The pre-existing logical setter was returning a raw\n  internal object that lacked `sql()` and friends.\n\n`examples/datafusion-ffi-example/Cargo.toml` gains `datafusion` and\n`datafusion-proto` workspace dependencies for the new Rust impls.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor: tighten PR1 scope to codec plumbing only\n\nReview feedback pass. PR1 is now strictly the composable codec layer +\nsession routing + class-method serialization API. Anything that\ntouches actual Python UDF inline encoding or Python expression\nwrapping moves to PR2 alongside the pickle work.\n\nDropped:\n\n* `encode_python_scalar_udf` / `decode_python_scalar_udf` helpers\n  from `crates/core/src/codec.rs`, along with cloudpickle and pyarrow\n  imports. The wrapper codecs now delegate every method to `inner`.\n  `DFPYUDF1` magic constant is kept (marked `dead_code` for now) as a\n  reservation so PR2 has a single definition site.\n* `udf.rs` reverted to pre-PR1 shape. The codec no longer needs\n  `func()` / `input_fields()` / `volatility()` / `from_parts()`\n  accessors. Re-added by PR2 when scalar-UDF inlining lands.\n* `PyPhysicalExpr` class + Python wrapper + `__init__` export +\n  `MyPhysicalExpr` FFI fixture + `_test_physical_expr.py`. No\n  consumer in PR1 or PR2 plan documents; symmetry with\n  `PyExecutionPlan` is not enough to justify the surface area.\n* Rust-side `PyLogicalPlan::to_proto` / `from_proto` and\n  `PyExecutionPlan::to_proto` / `from_proto` deprecated wrappers.\n  The deprecation lives entirely in the Python wrapper layer, which\n  emits `DeprecationWarning` and forwards to `to_bytes` /\n  `from_bytes`. 
Less Rust duplication.\n* `PythonLogicalCodec::with_default_inner` /\n  `PythonPhysicalCodec::with_default_inner` — redundant with\n  `impl Default`. Logic moved into `Default::default`.\n* `PySessionContext::default_logical_codec` /\n  `default_physical_codec` helpers. Inlined as\n  `Arc::new(PythonLogicalCodec::default())` at the three call sites.\n\nTests (root: 1076, FFI example: 36) all green after the cuts.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* remove unuseful code comments\n\n* docs: rewrite codec module comments around purpose, not PR sequence\n\nThe previous doc-block framed PythonLogicalCodec / PythonPhysicalCodec\nin terms of \"PR1 delegates, PR2 will add encoding\" — useful for\nreview, useless for someone reading the code later.\n\nReframed in terms of what the codecs exist to do: encode Python-side\nplan references (pure-Python UDFs, etc.) into the proto wire format\nso plans can cross process boundaries without the receiver having to\npre-register every callable. The wrappers sit at the top of the\nsession\u0027s codec stack and delegate non-Python encoding to a\ncomposable inner codec.\n\nMagic-prefix registry table loses the \"reserved\" column. Doc still\nnotes that the in-module impls currently delegate and that\nencoder/decoder hooks land alongside the corresponding Python-side\nserialization work.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat(codec): forward every LogicalExtensionCodec /\nPhysicalExtensionCodec method to inner\n\nPythonLogicalCodec previously only overrode the four required methods\non the trait plus the scalar UDF pair, so the default trait impls\n(returning \"LogicalExtensionCodec is not provided\") shadowed any\ndownstream FFI codec for file formats, aggregate UDFs, and window\nUDFs. 
A user installing their own codec via\n`SessionContext.with_logical_extension_codec(...)` would silently\nlose access to its `try_*_file_format`, `try_*_udaf`,\n`try_*_udwf` implementations.\n\nForward every trait method to `inner` so the user-installed codec is\nfully reachable. Same change on the physical side, including\n`try_*_expr`, `try_*_udaf`, `try_*_udwf` — the corresponding\nPython-aware paths can layer on later by intercepting before\ndelegation.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: tighten codec dispatch test docstrings\n\nThe previous docstrings claimed the tests verify \"PythonLogicalCodec\ndelegates non-Python UDFs to the inner codec.\" That\u0027s\nforward-looking — the codecs currently delegate every UDF\nunconditionally, so the test would behave identically for Python and\nnon-Python UDFs.\n\nRewrite to describe what the test actually proves: the dispatch chain\n`PyLogicalPlan.to_bytes -\u003e session.logical_codec -\u003e PythonLogicalCodec\n-\u003e FFI -\u003e user impl` (and the physical mirror) forwards correctly,\nobservable via the user codec\u0027s atomic counter incrementing after one\nencode pass.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor(ffi-example): MyExecutionPlan emits real data via\nMemorySourceConfig\n\nWas a one-column `EmptyExec` stub useful only as a capsule-handoff\ntarget. Promoted to a minimal reference impl that a downstream Rust\ncrate can copy when exposing a custom `ExecutionPlan` to\ndatafusion-python: configurable `num_rows`, produces a single batch\nof sequential `Int32` values under column `value`, wrapped in\n`DataSourceExec` via `MemorySourceConfig::try_new_exec`. 
Header\ncomment explains the typical use case (remote backend, streaming\nsource, synthetic data generator) and the\n`__datafusion_execution_plan__` capsule shape downstream crates\nshould follow.\n\nTest asserts the schema-bearing plan survives the FFI hop: a\n`DataSourceExec` arrives with the expected partitioning and no\nchildren. Schema details are not surfaced through the FFI display\npath (only the wrapping `ForeignExecutionPlan` name + inner plan\nname appear), so the test does not assert the column name.\n\n`to_bytes` round-trip of an FFI-imported plan is not exercised:\nencoding requires a physical codec that knows how to serialize\n`ForeignExecutionPlan`, which the default codec does not. A\ndownstream user round-tripping such a plan must install their own\ncodec via `with_physical_extension_codec`. Documented in the test\nfile rather than asserted on.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor: drop dormant ExecutionPlan PyCapsule round-trip\n\n`PyExecutionPlan::from_pycapsule` and the matching\n`__datafusion_execution_plan__` exporter have no consumer in this\nrepo, on the POC `poc_ffi_query_planner` branch, or on any sibling\nbranch (`testing/datafusion-distributed`, `testing/ffi-library-marker`,\n`tmp/ffi-with-codecs`). The pair was wired up speculatively for FFI\nplan handoff that no Python code path actually performs today.\n\nDrop the whole capsule round-trip for `ExecutionPlan`:\n\n* Rust `PyExecutionPlan::from_pycapsule` and\n  `__datafusion_execution_plan__`.\n* Python `ExecutionPlan.from_pycapsule` and\n  `__datafusion_execution_plan__` wrappers.\n* `MyExecutionPlan` FFI fixture + `_test_execution_plan.py` + lib.rs\n  registration. 
Was solely a test fixture for the dropped path.\n* `test_execution_plan_pycapsule_protocol` in `python/tests/test_plans.py`.\n\n`PyExecutionPlan.to_bytes` / `from_bytes` survive — they encode\nthrough the session\u0027s physical codec and have real coverage.\nCapsule round-trip can be re-added when a concrete consumer\n(distributed worker, bridge library) lands.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* feat: PyExpr.to_bytes / from_bytes via session logical codec\n\nMirrors PyLogicalPlan / PyExecutionPlan: encode through the session\u0027s\ninstalled `LogicalExtensionCodec` (or a default-inner\n`PythonLogicalCodec` when no `ctx` is supplied), decode against the\nsession\u0027s function registry + codec via `parse_expr`.\n\nRust side calls `datafusion_proto::logical_plan::to_proto::serialize_expr`\nand `from_proto::parse_expr`. Python wrapper threads an optional\n`SessionContext` through.\n\nTests cover the session-routed roundtrip and the no-ctx default-codec\nencode path. Adds a third consumer of `session.logical_codec()`\nalongside `PyLogicalPlan` and the codec dispatch tests in the FFI\nexample, broadening coverage of the codec stack.\n\nThis is the last piece of the PR1 codec surface — follow-up pickle\nwork (`Expr.__reduce__`, worker-scoped context, multiprocessing) can\nbuild on this without bundling the byte-level serialization API.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* test(ffi-example): assert codec roundtrip restores plan output\n\nPR review feedback: weak `is not None` checks let regressions slip\npast. Mirror python/tests/test_plans.py — logical compares\n`df.collect() \u003d\u003d round_trip.collect()`; physical compares\n`str(original) \u003d\u003d str(restored)`.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "55870ff30c4a1f086e0b63de434ebf8b674ac110",
      "tree": "3b3c62ab32911e96aa773ee6374555ca40c0423c",
      "parents": [
        "db22a9240e5832d9a88b74c640e3b60abfbe52c1"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Wed May 13 10:28:12 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 13 10:28:12 2026 -0400"
      },
      "message": "build(deps): combined dependabot bumps (Cargo + workflows) (#1534)\n\nCombines the following dependabot PRs:\n\nCargo:\n- tokio 1.50.0 -\u003e 1.52.3 (#1530)\n- arrow-array 58.1.0 -\u003e 58.3.0 (#1522, also picks up 58.3.0)\n- arrow-schema 58.1.0 -\u003e 58.3.0 (#1523, also picks up 58.3.0)\n- datafusion 53.0.0 -\u003e 53.1.0 (#1515)\n- datafusion-catalog 53.0.0 -\u003e 53.1.0 (#1513)\n- datafusion-ffi 53.0.0 -\u003e 53.1.0 (#1512)\n- datafusion-proto 53.0.0 -\u003e 53.1.0 (#1511)\n- datafusion-common 53.0.0 -\u003e 53.1.0 (#1510)\n- mimalloc 0.1.48 -\u003e 0.1.50 (#1514)\n- uuid 1.23.0 -\u003e 1.23.1 (#1508)\n- rustls-webpki 0.103.10 -\u003e 0.103.13 (#1506)\n- rand 0.9.2 -\u003e 0.9.4 (#1495)\n\nGitHub Actions:\n- github/codeql-action 4.32.5 -\u003e 4.35.4 (#1531)\n- astral-sh/setup-uv 7.3.1 -\u003e 8.1.0 (#1500)\n\nEach individual PR was passing CI before this combined bump.\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "db22a9240e5832d9a88b74c640e3b60abfbe52c1",
      "tree": "995fb0241ff6c61d7ab14234e8068b92c71ee47a",
      "parents": [
        "13b2c47b0d5e348cea24b9264e87fd67666c56f1"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Wed May 06 15:06:53 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed May 06 15:06:53 2026 -0400"
      },
      "message": "docs: add upstream sync process documentation (#1524)\n\n* docs: add upstream sync process documentation\n\nDocument the three-PR workflow used to sync to a newer upstream\napache/datafusion version: bump crate deps + fix breakage, consolidate\ntransitive deps, then fill API and documentation gaps via\n/check-upstream. Cross-reference from dev/release/README.md.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: add audit-skill-md skill\n\nNew AI agent skill at .ai/skills/audit-skill-md/SKILL.md to keep the\nuser-facing skills/datafusion_python/SKILL.md in sync with the public\nPython API. Audits SessionContext, DataFrame, Expr, and functions\nsurfaces for new APIs not covered, stale mentions, examples that drifted\nfrom idiomatic style, and missing version notes. Wired into PR 3 of the\nupstream sync workflow documented in dev/release/upstream-sync.md.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: verify upstream sync completed before release\n\nAdd a checklist item to \"Preparing the main Branch\" pointing release\nmanagers at dev/release/upstream-sync.md so the crate bump, dependency\nconsolidation, and /check-upstream and /audit-skill-md passes are\nconfirmed done before the release branch is cut.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: scope upstream sync cargo update to datafusion family\n\nReplace `cargo update -p datafusion` with an explicit multi-`-p`\ninvocation listing every `datafusion-*` workspace dependency, so PR 1\nof the upstream-sync workflow refreshes only the datafusion family\nand leaves other transitives for PR 2 to consolidate.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: correct datafusion-* pin location in upstream sync\n\nPR 1 step 1 incorrectly stated downstream `datafusion-*` crates are\npinned in `crates/core/Cargo.toml`. 
Pins live in the root\n`[workspace.dependencies]`; per-crate manifests inherit via\n`workspace \u003d true`. Reword step 1 to point at the right file.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: restore workspace.package version bump in upstream sync\n\nPR 1 step 1 must also bump `[workspace.package].version` because the\n`datafusion-python` major version tracks the upstream `datafusion`\nmajor. The previous reword dropped that instruction. Reinstate it\nalongside the `[workspace.dependencies]` updates.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: align audit-skill-md description with body version phrasing\n\nFrontmatter description referenced \"requires upstream DataFusion vX\",\nbut the body of the skill settles on the `datafusion-python NN` form\n(consistent with the package/upstream-major equivalence). Switch the\ndescription to match so the skill speaks one language end to end.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: fold make-pythonic step into PR 3 of upstream sync\n\nAudit-skill-md documents the order\n`/check-upstream` -\u003e `/make-pythonic` (optional) -\u003e `/audit-skill-md`,\nbut PR 3 of the upstream-sync workflow only listed the first and last.\nInsert the make-pythonic pass as step 3 so signatures get aligned\nbefore the SKILL.md audit, avoiding example churn. 
Drops the orphan\ntrailing paragraph in favor of inline guidance on when to defer\nlarger reshapes to their own PR.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: drop literal Cargo.toml version from audit-skill-md inputs\n\nReplace literal `version \u003d \"53.0.0\"` example with a pointer to the\n`[workspace.package]` field plus an `NN.0.0` placeholder so the skill\nprose does not drift each major bump.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "13b2c47b0d5e348cea24b9264e87fd67666c56f1",
      "tree": "14579478ae62f6a4e95898402a17c5392dd7722b",
      "parents": [
        "c657dad97349c1113e843e3e15bb41f865e65a97"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Sun May 03 10:52:10 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun May 03 10:52:10 2026 -0400"
      },
      "message": "Update user documentation for AI agent skill usage (#1505)\n\n* docs: publish SKILL.md on the docs site via myst include\n\nAdds a new `skill` page that embeds the repo-root `SKILL.md` through the\nmyst `{include}` directive, so the agent-facing guide lives on the\npublished docs site without duplication. The page is wired into the\nUser Guide toctree. Implements PR 4a of the plan in #1394.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: publish llms.txt at docs site root\n\nAdds `docs/source/llms.txt` in llmstxt.org schema: a short description\nplus categorized links to the agent skill, user guide pages, DataFrame\nAPI reference, and example queries. `html_extra_path` in `conf.py`\ncopies it verbatim to the published site root so it resolves at\n`https://datafusion.apache.org/python/llms.txt`. Implements PR 4b of\nthe plan in #1394.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: add write-dataframe-code contributor skill\n\nAdds `.ai/skills/write-dataframe-code/SKILL.md`, a contributor-facing\nskill for agents working on this repo. It layers on top of the\nuser-facing repo-root SKILL.md with:\n\n- a TPC-H pattern index mapping idiomatic API usages to the query file\n  that demonstrates them,\n- an ad-hoc plan-comparison workflow for checking DataFrame translations\n  against a reference SQL query via `optimized_logical_plan()`, and\n- the project-specific docstring and aggregate/window documentation\n  conventions that CLAUDE.md already enforces for contributors.\n\nImplements PR 4c of the plan in #1394.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: add audit-skill-md skill\n\nAdds `.ai/skills/audit-skill-md/SKILL.md`, a contributor skill that\ncross-references the repo-root `SKILL.md` against the current public\nPython API (functions module, DataFrame, Expr, SessionContext, and\npackage-root re-exports). 
Reports two classes of drift:\n\n- new APIs exposed by the Python surface that are not yet covered in\n  the user-facing guide, and\n- stale mentions in the guide that no longer exist in the public API.\n\nThe skill is diff-only — it produces a report the user reviews before\nany edit to `SKILL.md`. Complements `check-upstream/`, which audits in\nthe opposite direction (upstream Rust features not yet exposed).\n\nImplements PR 4d of the plan in #1394.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: enrich RST pages with demos relocated from TPC-H rewrite\n\nMoves the illustrative patterns that #1504 removed from the TPC-H\nexamples into the common-operations docs, where they serve as\npattern-focused teaching material without cluttering the TPC-H\ntranslations:\n\n- expressions.rst gains a \"Testing membership in a list\" section\n  comparing `|`-compound filters, `in_list`, and `array_position` +\n  `make_array`, plus a \"Conditional expressions\" section contrasting\n  switched and searched `case`.\n- udf-and-udfa.rst gains a \"When not to use a UDF\" subsection\n  showing the compound-OR predicate that replaces a Python-side UDF\n  for disjunctive bucket filters (the Q19 case).\n- aggregations.rst gains a \"Building per-group arrays\" subsection\n  covering `array_agg(filter\u003d..., distinct\u003dTrue)` with\n  `array_length`/`array_element` for the single-value-per-group\n  pattern (the Q21 case).\n- Adds `examples/array-operations.py`, a runnable end-to-end\n  walkthrough of the membership and array_agg patterns.\n\nImplements PR 4e of the plan in #1394.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: wire new contributor skills and plan-comparison diagnostic into AGENTS.md\n\n- List the three contributor skills (`check-upstream`,\n  `write-dataframe-code`, `audit-skill-md`) under the Skills section so\n  agents know what tools they have before starting work.\n- 
Document the plan-comparison diagnostic workflow (comparing\n  `ctx.sql(...).optimized_logical_plan()` against a DataFrame\u0027s\n  `optimized_logical_plan()` via `LogicalPlan.__eq__`) for translating\n  SQL queries to DataFrame form. Points at the full write-up in the\n  `write-dataframe-code` skill rather than duplicating it.\n\n`CLAUDE.md` is a symlink to `AGENTS.md`, so the change lands in both.\n\nImplements PR 4f of the plan in #1394.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: rename aggregations.rst demo df to orders_df to avoid clobbering state\n\nThe \"Building per-group arrays\" block added in the previous commit\nreassigned `df` and `ctx` mid-page, which then broke the\nGrouping Sets examples further down that share the Pokemon `df`\nbinding (`col_type_1` etc. no longer resolved). Rename the demo\nDataFrame to `orders_df` and drop the redundant `ctx \u003d SessionContext()`\nso the shared state from the top of the page stays intact.\n\nVerified with `sphinx-build -W --keep-going` against the full docs\ntree.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: replace raw SKILL.md include with a human-written AI-assistants page\n\nThe previous approach embedded the repo-root `SKILL.md` on the docs\nsite via a myst `{include}`. That file is written for agents -- dense,\nskill-formatted, and not suited to a human browsing the User Guide. It\nalso relied on a fragile `:start-line:` offset to strip YAML\nfrontmatter.\n\nReplace it with `docs/source/ai-coding-assistants.md`, a short\nhuman-readable page that mirrors the README section added in #1503:\nwhat the skill is, how to install it via `npx skills` or a manual\npointer, and what kinds of things it covers. 
`SKILL.md` stays at the\nrepo root as the single source of truth; agents fetch the raw GitHub\nURL directly.\n\n`llms.txt` is updated to point its Agent Guide entry at\n`raw.githubusercontent.com/.../SKILL.md` and to include the new\nhuman-readable page as a secondary link. The User Guide toctree now\nreferences `ai-coding-assistants` in place of the removed `skill`\nstub.\n\nVerified with `sphinx-build -W --keep-going` against the full docs\ntree.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: drop redundant assistants list in ai-coding-assistants intro\n\nThe introduction and the \"Installing the skill\" section both enumerated\nthe same set of supported assistants. Drop the intro copy; the list\nthat matters is next to `npx skills add`, where it answers \"what does\nthis command actually configure?\"\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: convert ai-coding-assistants page from markdown to rst, shorten title\n\nEvery other page in `docs/source/user-guide` and the top-level\n`docs/source` is written in reStructuredText; the lone `.md` page was\nan inconsistency. Rewrite in rst so the ASF header matches the rest of\nthe tree, cross-references can use `:py:func:` roles if we ever add\nany, and myst is no longer required just to render this one page.\n\nAlso shorten the page title from \"Using DataFusion with AI Coding\nAssistants\" to \"Using AI Coding Assistants\" -- it already sits under\nthe DataFusion user guide so the product name is redundant.\n\nVerified with `sphinx-build -W --keep-going`.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: drop audit-skill-md skill\n\nThe skill as written pushed for every public method to be mentioned\nin `SKILL.md`, which is the wrong goal. 
`SKILL.md` is a distilled\nagent guide of idiomatic patterns and pitfalls, not an API reference\n-- autoapi-generated docs and module docstrings already provide full\nper-method coverage. An audit pressing for 100% method coverage would\nbloat the skill file into a stale copy of that reference.\n\nThe two checks with actual value (stale mentions in `SKILL.md`, and\ndrift between `functions.__all__` and the categorized function list)\nare small enough to be ad-hoc greps at release time and do not\nwarrant a dedicated skill.\n\nAlso remove references to the skill from `AGENTS.md` and the\n`write-dataframe-code` skill\u0027s \"Related\" section.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: drop write-dataframe-code skill\n\nA separate PR covers the same contributor-facing material (TPC-H\npattern index, plan-comparison workflow, docstring conventions),\nso this skill is redundant. Remove the skill directory and the\ncorresponding references in `AGENTS.md`, including the\nplan-comparison section that pointed at it.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: show Parquet pushdown plan diff in \"When not to use a UDF\"\n\nThe previous version of the section asserted that a UDF predicate\nblocks optimizer rewrites but did not show evidence. Replace the two\n`code-block` examples with an executable walkthrough that writes a\nsmall Parquet file, runs the same filter two ways, and prints the\nphysical plan for each.\n\nThe native-expression plan renders with three annotations on the\n`DataSourceExec` node that the UDF plan does not have:\n\n- `predicate\u003dbrand@1 \u003d A AND qty@2 \u003e\u003d 150` pushed into the scan\n- `pruning_predicate\u003d... brand_min@0 \u003c\u003d A AND ... 
qty_max@4 \u003e\u003d 150`\n  for row-group pruning via Parquet footer min/max stats\n- `required_guarantees\u003d[brand in (A)]` for bloom-filter / dictionary\n  skipping\n\nThe UDF form keeps only `predicate\u003dbrand_qty_filter(...)`: the scan\nhas to materialize every row group and call the Python callback.\n\nThe disjunctive-OR rewrite (previously the main example) stays at the\nend as the idiomatic alternative for multi-bucket filters.\n\nVerified with `sphinx-build -W --keep-going`.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: rework \"subsets within a group\" aggregation example\n\nRename the section from \"Building per-group arrays\" to \"Comparing subsets\nwithin a group\" so the heading matches the content. Rewrite the intro to\nlead with the problem (compare full group vs filtered subset), reframe\nthe worked example around partially failed orders, and replace the\ntrailing bullet list with a one-line walkthrough of the result.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: clarify \"When not to use a UDF\" intro\n\nRewrite the opening of the section to make three things clearer: the\ncontrast is with native DataFusion expressions (not Python in general),\nsome predicates genuinely feel easier to write as a Python loop and that\ntension is worth acknowledging, and predicate pushdown is a table-provider\nmechanism rather than a Parquet-only feature. 
Parquet stays as the\nconcrete demo.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: move ai-coding-assistants under user-guide/\n\nThe page was sitting at the top level of docs/source/ while every other\npage in the USER GUIDE toctree lives under docs/source/user-guide/.\nMove the file, update the toctree entry, and update the absolute URL\nin llms.txt to match the new path.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: replace AGENTS.md skill list with discovery instructions\n\nA static skill list in AGENTS.md goes stale as new skills are added\n(it already missed the make-pythonic skill that was merged separately).\nReplace the enumerated list with a pointer telling agents to list\n.ai/skills/ and read each SKILL.md frontmatter, so the catalog never\nhas to be hand-maintained.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: fix broken llms.txt link and stale otherwise xref\n\n- ai-coding-assistants.rst: use absolute https://datafusion.apache.org/python/llms.txt URL; the relative `llms.txt` resolved to /python/user-guide/llms.txt and 404\u0027d because html_extra_path publishes the file at the site root.\n- expressions.rst: drop the broken `:py:meth:~datafusion.expr.Expr.otherwise` xref (otherwise lives on CaseBuilder, not Expr) and spell the recommended replacement as `f.when(f.in_list(...), value).otherwise(default)`.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: update SKILL.md path after move to skills/datafusion_python/\n\nUpstream #1519 moved the root `SKILL.md` to `skills/datafusion_python/SKILL.md`\nso that consumers can install the skill without cloning the whole repo. 
Update\nall repo-internal links and external GitHub URLs in the docs site, README,\nAGENTS.md, and the package docstring to point at the new location.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "c657dad97349c1113e843e3e15bb41f865e65a97",
      "tree": "b2ee2e39947e7431734591fc11fdf8ad8a4049a9",
      "parents": [
        "e0284c6e788b6fc893495ed929b9badef1cf925c"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Wed Apr 29 07:35:21 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Apr 29 07:35:21 2026 -0400"
      },
      "message": "Move public skills to a directory to avoid downloading the whole repo (#1519)\n\n* Move to skill directory\n\n* Avoid moved skill with test"
    },
    {
      "commit": "e0284c6e788b6fc893495ed929b9badef1cf925c",
      "tree": "fa186f6dc053173f57633a247c3e59b6037fe262",
      "parents": [
        "03577163a057f791b19f30ce5130464a4a1c78a4"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri Apr 24 13:09:24 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Apr 24 13:09:24 2026 -0400"
      },
      "message": "feat: add AI skill to find and improve the Pythonic interface to functions (#1484)\n\n* feat: accept native Python types in function arguments instead of requiring lit()\n\nUpdate 47 functions in functions.py to accept native Python types (int, float,\nstr) for arguments that are contextually literals, eliminating verbose lit()\nwrapping. For example, users can now write split_part(col(\"a\"), \",\", 2) instead\nof split_part(col(\"a\"), lit(\",\"), lit(2)). All changes are backward compatible.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* fix: update alias function signatures to match pythonic primary functions\n\nUpdate instr and position (aliases of strpos) to accept Expr | str for\nthe substring parameter, matching the updated primary function signature.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: update make-pythonic skill to require alias type hint updates\n\nAlias functions that delegate to a primary function must have their type\nhints updated to match, even though coercion logic is only added to the\nprimary. 
Added a new Step 3 to the implementation workflow for this.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* fix: address review feedback on pythonic skill and function signatures\n\nUpdate SKILL.md to prevent three classes of issues: clarify that float\nalready accepts int per PEP 484 (avoiding redundant int | float that\nfails ruff PYI041), add backward-compat rule for Category B so existing\nExpr params aren\u0027t removed, and add guidance for inline coercion with\nmany optional nullable params instead of local helpers.\n\nReplace regexp_instr\u0027s _to_raw() helper with inline coercion matching\nthe pattern used throughout the rest of the file.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* refactor: add coerce_to_expr helpers and replace inline coercion patterns\n\nIntroduce coerce_to_expr() and coerce_to_expr_or_none() in expr.py as the\ncomplement to ensure_expr() — where ensure_expr rejects non-Expr values,\nthese helpers wrap them via Expr.literal(). Replaces ~60 inline isinstance\nchecks in functions.py with single-line helper calls, and updates the\nmake-pythonic skill to document the new pattern.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: add aggregate function literal detection to make-pythonic skill\n\nAdd Technique 1a to detect literal-only arguments in aggregate functions.\nUnlike scalar UDFs which enforce literals in invoke_with_args(), aggregate\nfunctions enforce them in accumulator() via get_scalar_value(),\nvalidate_percentile_expr(), or downcast_ref::\u003cLiteral\u003e(). Without this\ntechnique, the skill would incorrectly classify arguments like\napprox_percentile_cont\u0027s percentile as Category A (Expr | float) when they\nshould be Category B (float only). 
Updates the decision flow to branch on\nscalar vs aggregate before checking for literal enforcement.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: add window function literal detection to make-pythonic skill\n\nAdd Technique 1b to detect literal-only arguments in window functions.\nWindow functions enforce literals in partition_evaluator() via\nget_scalar_value_from_args() / downcast_ref::\u003cLiteral\u003e(), not in\ninvoke_with_args() (scalar) or accumulator() (aggregate). Updates the\ndecision flow to branch on scalar vs aggregate vs window.\n\nKnown window functions with literal-only arguments: ntile (n), lead/lag\n(offset, default_value), nth_value (n).\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* fix: use explicit None checks, widen numeric type hints, and add tests\n\nReplace 7 fragile truthiness checks (x.expr if x else None) with\nexplicit is not None checks to prevent silent None when zero-valued\nliterals are passed. Widen log/power/pow type hints to Expr | int | float\nwith noqa: PYI041 for clarity. Add unit tests for coerce_to_expr helpers\nand integration tests for pythonic calling conventions.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* chore: suppress FBT003 in tests and remove redundant noqa comments\n\nAdd FBT003 (boolean positional value) to the per-file-ignores for\npython/tests/* in pyproject.toml, and remove the 6 now-redundant\ninline noqa: FBT003 comments across test_expr.py and test_context.py.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* docs: replace static function lists with discovery instructions in skill\n\nReplace hardcoded \"Known aggregate/window functions with literal-only\narguments\" lists with instructions to discover them dynamically by\nsearching the upstream crate source. 
Keeps a few examples as validation\nanchors so the agent knows its search is working correctly.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* fix: make interrupt test reliable on Python 3.11\n\nPyThreadState_SetAsyncExc only delivers exceptions when the thread is\nexecuting Python bytecode, not while in native (Rust/C) code. The\nprevious test had two issues causing flakiness on Python 3.11:\n\n1. The interrupt fired before df.collect() entered the UDF, while the\n   thread was still in native code where async exceptions are ignored.\n2. time.sleep(2.0) is a single C call where async exceptions are not\n   checked — they\u0027re only checked between bytecode instructions.\n\nFix by adding a threading.Event so the interrupt waits until the UDF is\nactually executing Python code, and by sleeping in small increments so\nthe eval loop has opportunities to check for pending exceptions.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "03577163a057f791b19f30ce5130464a4a1c78a4",
      "tree": "2b02d946a60edddb9df94a0a9438b3e734acaf39",
      "parents": [
        "c8bb9f7d3876de97141d204740a6b99d5facd10f"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri Apr 24 11:47:06 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Apr 24 11:47:06 2026 -0400"
      },
      "message": "tpch examples: rewrite queries idiomatically and embed reference SQL (#1504)\n\n* tpch examples: add reference SQL to each query, fix Q20\n\n- Append the canonical TPC-H reference SQL (from benchmarks/tpch/queries/)\n  to each q01..q22 module docstring so readers can compare the DataFrame\n  translation against the SQL at a glance.\n- Fix Q20: `df \u003d df.filter(col(\"ps_availqty\") \u003e lit(0.5) * col(\"total_sold\"))`\n  was missing the assignment so the filter was dropped from the pipeline.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* tpch examples: rewrite non-idiomatic queries in idiomatic DataFrame form\n\nRewrite the seven TPC-H example queries that did not demonstrate the\nidiomatic DataFrame pattern. The remaining queries (Q02/Q11/Q15/Q17/Q22,\nwhich use window functions in place of correlated subqueries) already are\nidiomatic and are left unchanged.\n\n- Q04: replace `.aggregate([col(\"l_orderkey\")], [])` with\n  `.select(\"l_orderkey\").distinct()`, which is the natural way to express\n  \"reduce to one row per order\" on a DataFrame.\n- Q07: remove the CASE-as-filter on `n_name` and use\n  `F.in_list(col(\"n_name\"), [nation_1, nation_2])` instead. Drops a\n  comment block that admitted the filter form was simpler.\n- Q08: rewrite the switched CASE `F.case(...).when(lit(False), ...)` as a\n  searched `F.when(col(...).is_not_null(), ...).otherwise(...)`. That\n  mirrors the reference SQL\u0027s `case when ... then ... else 0 end` shape.\n- Q12: replace `array_position(make_array(...), col)` with\n  `F.in_list(col(\"l_shipmode\"), [...])`. Same semantics, without routing\n  through array construction / array search.\n- Q19: remove the pyarrow UDF that re-implemented a disjunctive predicate\n  in Python. Build the same predicate in DataFusion by OR-combining one\n  `in_list` + range-filter expression per brand. 
Keeps the per-brand\n  constants in the existing `items_of_interest` dict.\n- Q20: use `F.starts_with` instead of an explicit substring slice. Replace\n  the inner-join + `select(...).distinct()` tail with a semi join against\n  a precomputed set of excess-quantity suppliers so the supplier columns\n  are preserved without deduplication after the fact.\n- Q21: replace the `array_agg` / `array_length` / `array_element` pipeline\n  with two semi joins. One semi join keeps orders with more than one\n  distinct supplier (stand-in for the reference SQL\u0027s `exists` subquery),\n  the other keeps orders with exactly one late supplier (stand-in for the\n  `not exists` subquery).\n\nAll 22 answer-file comparisons and 22 plan-comparison diagnostics still\npass (`pytest examples/tpch/_tests.py`: 44 passed).\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* tpch examples: align reference SQL constants with DataFrame queries\n\nThe reference SQL embedded in each q01..q22 module docstring was carried\nover verbatim from ``benchmarks/tpch/queries/`` and uses a different set\nof TPC-H substitution parameters than the DataFrame examples\n(answer-file-validated at scale factor 1). 
Update each reference SQL to\nuse the substitution parameters the DataFrame uses, so both expressions\ndescribe the same query and would produce the same results against the\nsame data.\n\nConstants aligned:\n\n- Q01: ``90 days`` cutoff (DataFrame ``DAYS_BEFORE_FINAL \u003d 90``).\n- Q02: ``p_size \u003d 15``, ``p_type like \u0027%BRASS\u0027``, ``r_name \u003d \u0027EUROPE\u0027``.\n- Q04: base date ``1993-07-01`` (``3 month`` interval preserved per the\n  \"quarter of a year\" wording).\n- Q05: ``r_name \u003d \u0027ASIA\u0027``.\n- Q06: ``l_discount between 0.06 - 0.01 and 0.06 + 0.01``.\n- Q07: nations ``\u0027FRANCE\u0027`` / ``\u0027GERMANY\u0027``.\n- Q08: ``r_name \u003d \u0027AMERICA\u0027``, ``p_type \u003d \u0027ECONOMY ANODIZED STEEL\u0027``,\n  inner-case ``nation \u003d \u0027BRAZIL\u0027``.\n- Q09: ``p_name like \u0027%green%\u0027``.\n- Q10: base date ``1993-10-01`` (``3 month`` interval preserved).\n- Q11: ``n_name \u003d \u0027GERMANY\u0027``.\n- Q12: ship modes ``(\u0027MAIL\u0027, \u0027SHIP\u0027)``, base date ``1994-01-01``.\n- Q13: ``o_comment not like \u0027%special%requests%\u0027``.\n- Q14: base date ``1995-09-01``.\n- Q15: base date ``1996-01-01``.\n- Q16: ``p_brand \u003c\u003e \u0027Brand#45\u0027``, ``p_type not like \u0027MEDIUM POLISHED%\u0027``,\n  sizes ``(49, 14, 23, 45, 19, 3, 36, 9)``.\n- Q17: ``p_brand \u003d \u0027Brand#23\u0027``, ``p_container \u003d \u0027MED BOX\u0027``.\n- Q18: ``sum(l_quantity) \u003e 300``.\n- Q19: brands ``Brand#12`` / ``Brand#23`` / ``Brand#34`` with the matching\n  minimum quantities (1, 10, 20).\n- Q20: ``p_name like \u0027forest%\u0027``, base date ``1994-01-01``,\n  ``n_name \u003d \u0027CANADA\u0027``.\n- Q21: ``n_name \u003d \u0027SAUDI ARABIA\u0027``.\n- Q22: country codes ``(\u002713\u0027, \u002731\u0027, \u002723\u0027, \u002729\u0027, \u002730\u0027, \u002718\u0027, \u002717\u0027)``.\n\nInterval units (month / year) are preserved where the problem-statement\ntext reads \"given quarter\", 
\"given year\", \"given month\". Q01 keeps the\nliteral \"days\" unit because the TPC-H problem statement itself describes\nthe cutoff in days.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* tpch examples: apply SKILL.md idioms across all 22 queries\n\nSweep every q01..q22 example for idiomatic DataFrame style as described in\nthe repo-root SKILL.md:\n\n- ``col(\"x\") \u003d\u003d \"s\"`` in place of ``col(\"x\") \u003d\u003d lit(\"s\")`` on comparison\n  right-hand sides (auto-wrap applies).\n- Plain-name strings in ``select``/``aggregate``/``sort`` group/sort key\n  lists when the key is a bare column.\n- Drop redundant ``how\u003d\"inner\"`` and single-element ``left_on``/``right_on``\n  list wrapping on equi-joins.\n- Collapse chained ``.filter(a).filter(b)`` runs into ``.filter(a, b)``\n  and chained ``.with_column`` runs into ``.with_columns(a\u003d..., b\u003d...)``.\n- ``df.sort_by(...)`` or plain-name ``df.sort(...)`` when no null-placement\n  override is needed.\n- ``F.count_star()`` in place of ``F.count(col(\"x\"))`` whenever the SQL\n  reads ``count(*)``.\n- ``F.starts_with(col, lit(prefix))`` and ``~F.starts_with(...)`` in place\n  of substring-prefix equality/inequality tricks.\n- ``F.in_list(col, [lit(...)])`` in place of ``~F.array_position(...).\n  is_null()`` and in place of disjunctions of equality comparisons.\n- Searched ``F.when(cond, x).otherwise(y)`` in place of switched\n  ``F.case(bool_expr).when(lit(True/False), x).end()`` forms.\n- Semi-joins as the DataFrame form of ``EXISTS`` (Q04); anti-joins as\n  ``NOT EXISTS`` (Q22 was already using this idiom).\n- Whole-frame window aggregates as the DataFrame stand-in for a SQL\n  scalar subquery (Q11/Q15/Q17/Q22).\n\nIndividual query fixes of note:\n\n- Q16 — add the secondary sort keys (``p_brand``, ``p_type``, ``p_size``)\n  that the TPC-H spec requires but the original DataFrame omitted.\n- Q22 — drop a stray ``df.show()`` mid-pipeline; replace the 
0-based\n  substring slice with ``F.left(col(\"c_phone\"), lit(2))``.\n- Q14 — rewrite the promo/non-promo factor split as a searched CASE inside\n  ``F.sum(...)`` so the DataFrame expression matches the reference SQL\n  shape exactly.\n\nAll 22 answer-file comparisons still pass at scale factor 1.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* tpch examples: more idiomatic aggregate FILTER, string funcs, date handling\n\nAdditional sweep of the TPC-H DataFrame examples informed by comparing\nagainst a fresh set of SKILL.md-only generations under\n``examples/tpch/agentic_queries/``:\n\n- Q02: ``F.ends_with(col(\"p_type\"), lit(TYPE_OF_INTEREST))`` in place of\n  ``F.strpos(col, lit) \u003e 0``. The reference SQL is ``p_type like \u0027%BRASS\u0027``,\n  which is an ends_with check, not contains. ``F.strpos \u003e 0`` returned the\n  correct rows on TPC-H data by coincidence but is semantically wrong.\n- Q09: ``F.contains(col(\"p_name\"), lit(part_color))`` in place of\n  ``F.strpos(col, lit) \u003e 0``. The SQL is ``p_name like \u0027%green%\u0027``.\n- Q08, Q12, Q14: use the ``filter`` keyword on ``F.sum`` / ``F.count`` —\n  the DataFrame form of SQL ``sum(...) FILTER (WHERE ...)`` — instead of\n  wrapping the aggregate input in ``F.when(cond, x).otherwise(0)``. Q08\n  also reorganises to inner-join the supplier\u0027s nation onto the regional\n  sales, which removes the previous left-join + ``F.when(is_not_null, ...)``\n  dance.\n- Q15: compute the grand maximum revenue as a separate scalar aggregate\n  and ``join_on(...)`` on equality, instead of the whole-frame window\n  ``F.max`` + filter shape. 
Simpler plan, same result.\n- Q16: ``F.regexp_like(col, pattern)`` in place of\n  ``F.regexp_match(col, pattern).is_not_null()``.\n- Q04, Q05, Q06, Q07, Q08, Q10, Q12, Q14, Q15, Q20: store both the start\n  and the end of the date window as plain ``datetime.date`` objects and\n  compare with ``lit(end_date)``, instead of carrying the start date +\n  ``pa.month_day_nano_interval`` and adding them at query-build time.\n  Drops unused ``pyarrow`` imports from the files that no longer need\n  Arrow scalars.\n\nAll 22 answer-file comparisons still pass at scale factor 1.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "c8bb9f7d3876de97141d204740a6b99d5facd10f",
      "tree": "e8b2602165c4dc1e7208d56b3fc45f5d6332573e",
      "parents": [
        "8741d30cd812e4668f3f9187b56f12ce2de0d6e7"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri Apr 24 07:57:11 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Apr 24 07:57:11 2026 -0400"
      },
      "message": "docs: add README section for AI coding assistants (#1503)\n\nPoints users to the repo-root SKILL.md via the npx skills registry or a\nmanual AGENTS.md / CLAUDE.md pointer. Implements PR 1c of the plan in #1394.\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "8741d30cd812e4668f3f9187b56f12ce2de0d6e7",
      "tree": "d7fa7ec580a8ce057c76934d8a7ef01f3e710ce2",
      "parents": [
        "8a5d783c7e418bfbbd95e48a2d9cacafea6162c7"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Thu Apr 23 22:01:01 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Apr 23 22:01:01 2026 -0400"
      },
      "message": "docs: enrich module docstrings and add doctest examples (#1498)\n\n* Enrich module docstrings and add doctest examples\n\nExpands the module docstrings for `functions.py`, `dataframe.py`,\n`expr.py`, and `context.py` so each module opens with a concept summary,\ncross-references to related APIs, and a small executable example.\n\nAdds doctest examples to the high-traffic `DataFrame` methods that\npreviously lacked them: `select`, `aggregate`, `sort`, `limit`, `join`,\nand `union`. Optional parameters are demonstrated with keyword syntax,\nand examples reuse the same input data across variants so the effect of\neach option is easy to see.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Use distinct group sums in aggregate docstring example\n\nChange the score data from [1, 2, 3] to [1, 2, 5] so the grouped\nresult produces [3, 5] instead of [3, 3], removing ambiguity about\nwhich total belongs to which team.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Align module-docstring examples with SKILL.md idioms\n\nDrop the redundant lit() in the dataframe.py module-docstring filter\nexample and use a plain string group key in the aggregate() doctest, so\nboth examples model the style SKILL.md recommends. Also document the\nsort(\"a\") string form and sort_by() shortcut in SKILL.md\u0027s sorting\nsection.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "8a5d783c7e418bfbbd95e48a2d9cacafea6162c7",
      "tree": "08f42ac0dedff8563163d297b8c6d13c95d97eba",
      "parents": [
        "40309978c920bd123a4c7b764a2ddfdb97758607"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Thu Apr 23 19:05:06 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Apr 23 19:05:06 2026 -0400"
      },
      "message": "Skills require the header to be the first thing in the file which conflicts with the RAT check. Make an exception for this file. (#1501)"
    },
    {
      "commit": "40309978c920bd123a4c7b764a2ddfdb97758607",
      "tree": "c70a137ffbb797b52787dac6f858e97af8de001f",
      "parents": [
        "60d8b5dbb5e409cd9ce7692972420e955b8a802e"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Thu Apr 23 18:28:55 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Apr 23 18:28:55 2026 -0400"
      },
      "message": "Add SKILL.md and enrich package docstring (#1497)\n\n* Add AGENTS.md and enrich __init__.py module docstring\n\nAdd python/datafusion/AGENTS.md as a comprehensive DataFrame API guide\nfor AI agents and users. It ships with pip automatically (Maturin includes\neverything under python-source \u003d \"python\"). Covers core abstractions,\nimport conventions, data loading, all DataFrame operations, expression\nbuilding, a SQL-to-DataFrame reference table, common pitfalls, idiomatic\npatterns, and a categorized function index.\n\nEnrich the __init__.py module docstring from 2 lines to a full overview\nwith core abstractions, a quick-start example, and a pointer to AGENTS.md.\n\nCloses #1394 (PR 1a)\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Clarify audience of root vs package AGENTS.md\n\nThe root AGENTS.md (symlinked as CLAUDE.md) is for contributors working\non the project. Add a pointer to python/datafusion/AGENTS.md which is\nthe user-facing DataFrame API guide shipped with the package. Also add\nthe Apache license header to the package AGENTS.md.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add PR template and pre-commit check guidance to AGENTS.md\n\nDocument that all PRs must follow .github/pull_request_template.md and\nthat pre-commit hooks must pass before committing. 
List all configured\nhooks (actionlint, ruff, ruff-format, cargo fmt, cargo clippy, codespell,\nuv-lock) and the command to run them manually.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Remove duplicated hook list from AGENTS.md\n\nLet the hooks be discoverable from .pre-commit-config.yaml rather than\nmaintaining a separate list that can drift.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Fix AGENTS.md: Arrow C Data Interface, aggregate filter, fluent example\n\n- Clarify that DataFusion works with any Arrow C Data Interface\n  implementation, not just PyArrow.\n- Show the filter keyword argument on aggregate functions (the idiomatic\n  HAVING equivalent) instead of the post-aggregate .filter() pattern.\n- Update the SQL reference table to show FILTER (WHERE ...) syntax.\n- Remove the now-incorrect \"Aggregate then filter for HAVING\" pitfall.\n- Add .collect() to the fluent chaining example so the result is clearly\n  materialized.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Update agents file after working through the first tpc-h query using only the text description\n\n* Add feedback from working through each of the TPC-H queries\n\n* Address Copilot review feedback on AGENTS.md\n\n- Wrap CASE/WHEN method-chain examples in parentheses and assign to a\n  variable so they are valid Python as shown (Copilot #1, #2).\n- Fix INTERSECT/EXCEPT mapping: the default distinct\u003dFalse corresponds to\n  INTERSECT ALL / EXCEPT ALL, not the distinct forms. Updated both the\n  Set Operations section and the SQL reference table to show both the\n  ALL and distinct variants (Copilot #4).\n- Change write_parquet / write_csv / write_json examples to file-style\n  paths (output.parquet, etc.) to match the convention used in existing\n  tests and examples. 
Note that a directory path is also valid for\n  partitioned output (Copilot #5).\n\nVerified INTERSECT/EXCEPT semantics with a script:\n  df1.intersect(df2)                -\u003e [1, 1, 2]  (\u003d INTERSECT ALL)\n  df1.intersect(df2, distinct\u003dTrue) -\u003e [1, 2]     (\u003d INTERSECT)\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Use short-form comparisons in AGENTS.md examples\n\nDrop lit() on the RHS of comparison operators since Expr auto-wraps raw\nPython values, matching the style the guide recommends (Copilot #3, #6).\n\nUpdates examples in the Aggregation, CASE/WHEN, SQL reference table,\nCommon Pitfalls, Fluent Chaining, and Variables-as-CTEs sections, plus\nthe __init__.py quick-start snippet. Prose explanations of the rule\n(which cite the long form as the thing to avoid) are left unchanged.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Move user guide from python/datafusion/AGENTS.md to SKILL.md\n\nThe in-wheel AGENTS.md was not a real distribution channel -- no shipping\nagent walks site-packages for AGENTS.md files. Moving to SKILL.md at the\nrepo root, with YAML frontmatter, lets the skill ecosystems (npx skills,\nClaude Code plugin marketplaces, community aggregators) discover it.\n\nUpdate the pointers in the contributor AGENTS.md and the __init__.py\nmodule docstring accordingly. 
The docstring now references the GitHub\nURL since the file no longer ships with the wheel.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Address review feedback: doctest, streaming, date/timestamp\n\n- Convert the __init__.py quick-start block to doctest format so it is\n  picked up by `pytest --doctest-modules` (already the project default),\n  preventing silent rot.\n- Extract streaming into its own SKILL.md subsection with guidance on\n  when to prefer execute_stream() over collect(), sync and async\n  iteration, and execute_stream_partitioned() for per-partition streams.\n- Generalize the date-arithmetic rule from Date32 to both Date32 and\n  Date64 (both reject Duration at any precision, both accept\n  month_day_nano_interval), and note that Timestamp columns differ and\n  do accept Duration.\n- Document the PyArrow-inherited type mapping returned by\n  to_pydict()/to_pylist(), including the nanosecond fallback to\n  pandas.Timestamp / pandas.Timedelta and the to_pandas() footgun where\n  date columns come back as an object dtype.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Distinguish user guide from agent reference in module docstring\n\nThe docstring pointed readers at SKILL.md as a \"comprehensive guide,\" but\nSKILL.md is written in a dense, skill-oriented format for agents — humans\nare better served by the online user guide. Put the online docs first as\nthe primary reference and label the SKILL.md link as the agent reference.\n\nCo-Authored-By: Claude Opus 4.7 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "60d8b5dbb5e409cd9ce7692972420e955b8a802e",
      "tree": "f2364ca93f408363a0929fe25e103226f63ac1d1",
      "parents": [
        "2715a32e939d17222c18e8adacf85ee45da464b9"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Tue Apr 14 03:31:01 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Apr 14 03:31:01 2026 -0400"
      },
      "message": "Fix error on show() with an explain plan (#1492)"
    },
    {
      "commit": "2715a32e939d17222c18e8adacf85ee45da464b9",
      "tree": "91262ce078f88250bfbdd6424445f043859fc2ca",
      "parents": [
        "398980d1edbb8ad6d9744236f2dfe0c6ab4b4665"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Tue Apr 14 03:27:00 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Apr 14 03:27:00 2026 -0400"
      },
      "message": "chore: update release documentation (#1494)\n\n* Update release documentation\n\n* Minor change to workflow because release start at 1"
    },
    {
      "commit": "398980d1edbb8ad6d9744236f2dfe0c6ab4b4665",
      "tree": "17cd377ee141d16fdfee93979a0db7286fc6921f",
      "parents": [
        "8a7efead43cff8dc7515e27e53da7545100e25a7"
      ],
      "author": {
        "name": "Zeel Desai",
        "email": "72783325+zeel2104@users.noreply.github.com",
        "time": "Mon Apr 13 09:24:56 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Apr 13 09:24:56 2026 -0400"
      },
      "message": "Support None comparisons for null expressions (#1489)\n\n* Support None comparisons for null expressions\n\n* Fold None comparison coverage into relational expr test"
    },
    {
      "commit": "8a7efead43cff8dc7515e27e53da7545100e25a7",
      "tree": "64ed2a5076a48e7f737e18b046d3cd43ac04aeb6",
      "parents": [
        "00b24572c98a257f06ff026a90c07634a86204d4"
      ],
      "author": {
        "name": "Shreyesh",
        "email": "shreyesh.arangath@gmail.com",
        "time": "Mon Apr 13 03:34:35 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Apr 13 06:34:35 2026 -0400"
      },
      "message": "Add Python bindings for accessing ExecutionMetrics (#1381)\n\n* feat: add Python bindings for accessing ExecutionMetrics\n\n* test: imporve tests\n\n* first round of reviews\n\n* plan caching\n\n* address some concerns\n\n* merge and address comments\n\n* fix Ci issues\n\n* attempt to fix lint\n\n* fix build\n\n* fix docstring\n\n* address some more comments\n\n---------\n\nCo-authored-by: ShreyeshArangath \u003cshryeyesh.arangath@gmail.com\u003e"
    },
    {
      "commit": "00b24572c98a257f06ff026a90c07634a86204d4",
      "tree": "6e4c531777ff78afe3c9c507d2397517ec59a8b6",
      "parents": [
        "1be838bb47f04bcf4d1a0f65e3e6958aa9366f3f"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Mon Apr 13 06:33:29 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Apr 13 06:33:29 2026 -0400"
      },
      "message": "ci: disable symbol export on Windows verification (#1486)\n\n* Set rust flags on windows release verification\n\n* Forward flag to linker\n\n* Switch to msvc rust toolchain\n\n* Revert \"Switch to msvc rust toolchain\"\n\nThis reverts commit 9879fc7dbe066098445b9600087e665435b58f8a."
    },
    {
      "commit": "1be838bb47f04bcf4d1a0f65e3e6958aa9366f3f",
      "tree": "947f2e392faa9002020930092d7d1cee9dde83a2",
      "parents": [
        "3585c11eed778810e3317c56c2c25a8cdc29be5b"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Sun Apr 12 21:24:39 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun Apr 12 21:24:39 2026 -0400"
      },
      "message": "Release 53.0.0 (#1491)\n\n* Update version number and changelog\n\n* minor: set version number on dependency to publish to crates.io\n\n* taplo fmt"
    },
    {
      "commit": "3585c11eed778810e3317c56c2c25a8cdc29be5b",
      "tree": "3b4c9265fa211bdd0cdfbe8f2dbe2d345bdcf83a",
      "parents": [
        "ecd14c10aff67169f2bfe1b7f86ff07621088dd0"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Thu Apr 09 07:38:59 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Apr 09 07:38:59 2026 -0400"
      },
      "message": "minor: remove deprecated interfaces (#1481)\n\n* udf module has been deprecated since DF47. html_formatter module has been deprecated since DF48.\n\n* database has been deprecated since DF48\n\n* select_columns has been deprecated since DF43\n\n* unnest_column has been deprecated since DF42\n\n* display_name has been deprecated since DF42\n\n* window() has been deprecated since DF50\n\n* serde functions have been deprecated since DF42\n\n* from_arrow_table and tables have been deprecated since DF42\n\n* RuntimeConfig has been deprecated since DF44\n\n* Update user documentation to remove deprecated function\n\n* update tpch examples for latest function uses\n\n* Remove unnecessary options in example\n\n* update rendering for the most recent dataframe_formatter instead of the deprecated html_formatter"
    },
    {
      "commit": "ecd14c10aff67169f2bfe1b7f86ff07621088dd0",
      "tree": "03d3e29a3a4a7933cff0ffe591c9fa042b9c48e2",
      "parents": [
        "aa3b1948c3a49d14395093287a6e93354229c539"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Wed Apr 08 11:11:48 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Apr 08 11:11:48 2026 -0400"
      },
      "message": "Add missing SessionContext utility methods (#1475)\n\n* Add missing SessionContext utility methods\n\nExpose upstream DataFusion v53 utility methods: session_start_time,\nenable_ident_normalization, parse_sql_expr, execute_logical_plan,\nrefresh_catalogs, remove_optimizer_rule, and table_provider. The\nadd_optimizer_rule and add_analyzer_rule methods are omitted as the\nOptimizerRule and AnalyzerRule traits are not yet exposed to Python.\nCloses #1459.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Raise KeyError from table_provider for consistency with table()\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add docstring examples for new SessionContext utility methods\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* update docstring\n\n* Address PR review feedback for SessionContext utility methods\n\n- Improve docstring examples to show actual output instead of asserts\n- Use doctest +SKIP for non-deterministic session_start_time output\n- Fix table_provider error mapping: outer async error is now RuntimeError\n- Strengthen tests: validate RFC 3339 with fromisoformat, test both\n  optimizer rule removal paths, exact string match for parse_sql_expr,\n  verify enable_ident_normalization with dynamic state change\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Fix test_session_start_time failure on Python 3.10\n\ndatetime.fromisoformat() only supports up to 6 fractional-second\ndigits (microseconds) on Python 3.10. Truncate nanosecond precision\nbefore parsing.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "aa3b1948c3a49d14395093287a6e93354229c539",
      "tree": "8e2124d5f4029861b1fb54297f257294d742718b",
      "parents": [
        "46f9ab8fcad03913234ce29e5075644c1ecdb9b7"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Wed Apr 08 09:22:28 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Apr 08 09:22:28 2026 -0400"
      },
      "message": "Add missing registration methods (#1474)\n\n* Add missing SessionContext read/register methods for Arrow IPC and batches\n\nAdd read_arrow, read_empty, register_arrow, and register_batch methods to\nSessionContext, exposing upstream DataFusion v53 functionality. The write_*\nmethods and read_batch/read_batches are already covered by DataFrame.write_*\nand SessionContext.from_arrow respectively. Closes #1458.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Remove redundant read_empty Rust binding, make Python read_empty an alias for empty_table\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add pathlib.Path and empty batch tests for Arrow IPC and register_batch\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Make test_read_empty more robust with length and num_rows checks\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add examples to docstrings for new register/read methods\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Empty table actually returns record batch of length one but there are no columns\n\n* Add optional argument examples to register_arrow and read_arrow docstrings\n\nDemonstrate schema\u003d and file_extension\u003d keyword arguments in the\ndocstring examples for register_arrow and read_arrow, following project\nguidelines for optional parameter documentation.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Simplify read_empty docstring to use alias pattern\n\nFollow the same See Also alias convention used in functions.py since\nread_empty is a simple alias for empty_table.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Remove shared ctx from doctest namespace, use inline SessionContext\n\nAvoid shared SessionContext state across doctests by having 
each\ndocstring example create its own ctx instance, matching the pattern\nused throughout the rest of the codebase.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Remove redundant import pyarrow as pa from docstrings\n\nThe pa alias is already provided by the doctest namespace in\nconftest.py, so inline imports are unnecessary.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "46f9ab8fcad03913234ce29e5075644c1ecdb9b7",
      "tree": "008f7dcfb9c5d7ed35fb5e3bbfdab371eb6aa865",
      "parents": [
        "52932128d353e417ddae2c5ff3f14135cb806f7e"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Tue Apr 07 15:03:38 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Apr 07 15:03:38 2026 -0400"
      },
      "message": "Add missing deregister methods to SessionContext (#1473)\n\n* Add deregister methods to SessionContext for UDFs and object stores\n\nExpose upstream DataFusion deregister methods (deregister_udf, deregister_udaf,\nderegister_udwf, deregister_udtf, deregister_object_store) in both the Rust\nPyO3 bindings and Python wrappers, closing the gap identified in #1457.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Fix deregister tests to expect ValueError instead of RuntimeError\n\nDataFusion raises ValueError for planning errors when a deregistered\nfunction is used in a query.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Replace .unwrap() with proper error propagation in object store methods\n\nUrl::parse() can fail on invalid input. Use .map_err() to convert\nthe error into a Python exception instead of panicking.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Minor move of import statement\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "52932128d353e417ddae2c5ff3f14135cb806f7e",
      "tree": "b88ea32d565a6b1708831f95a42034c7028a50e4",
      "parents": [
        "898d73de20346bba7241907bb18cba47da53e9a9"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Tue Apr 07 14:58:09 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Apr 07 14:58:09 2026 -0400"
      },
      "message": "Add missing Dataframe functions (#1472)\n\n* Add missing DataFrame methods for set operations and query\n\nExpose upstream DataFusion DataFrame methods that were not yet\navailable in the Python API. Closes #1455.\n\nSet operations:\n- except_distinct: set difference with deduplication\n- intersect_distinct: set intersection with deduplication\n- union_by_name: union matching columns by name instead of position\n- union_by_name_distinct: union by name with deduplication\n\nQuery:\n- distinct_on: deduplicate rows based on specific columns\n- sort_by: sort by expressions with ascending order and nulls last\n\nNote: show_limit is already covered by the existing show(num) method.\nexplain_with_options and with_param_values are deferred as they require\nexposing additional types (ExplainOption, ParamValues).\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add ExplainFormat enum and format option to DataFrame.explain()\n\nExtend the existing explain() method with an optional format parameter\ninstead of adding a separate explain_with_options() method. 
This keeps\nthe API simple while exposing all upstream ExplainOption functionality.\n\nAvailable formats: indent (default), tree, pgjson, graphviz.\n\nThe ExplainFormat enum is exported from the top-level datafusion module.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add DataFrame.window() and unnest recursion options\n\nExpose remaining DataFrame methods from upstream DataFusion.\nCloses #1456.\n\n- window(*exprs): apply window function expressions and append results\n  as new columns\n- unnest_column/unnest_columns: add optional recursions parameter for\n  controlling unnest depth via (input_column, output_column, depth)\n  tuples\n\nNote: drop_columns is already exposed as the existing drop() method.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Update docstring\n\nCo-authored-by: Copilot \u003c175728472+Copilot@users.noreply.github.com\u003e\n\n* Improve docstrings and test robustness for new DataFrame methods\n\nClarify except_distinct/intersect_distinct docstrings, add deterministic\nsort to test_window, add sort_by ascending verification test, and add\nsmoke tests for PGJSON and GRAPHVIZ explain formats.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Consolidate new DataFrame tests into parametrized tests\n\nCombine set operation tests (except_distinct, intersect_distinct,\nunion_by_name, union_by_name_distinct) into a single parametrized\ntest_set_operations_distinct. 
Merge sort_by tests and convert\nexplain format tests to parametrized form.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add doctest examples to new DataFrame method docstrings\n\nAdd \u003e\u003e\u003e style usage examples for window, explain, except_distinct,\nintersect_distinct, union_by_name, union_by_name_distinct, distinct_on,\nsort_by, and unnest_columns to match existing docstring conventions.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Improve error messages, tests, and API hygiene from PR review\n\n- Provide actionable error message for invalid explain format strings\n- Remove recursions param from deprecated unnest_column (use unnest_columns)\n- Add null-handling test case for sort_by to verify nulls-last behavior\n- Add format-specific assertions to explain tests (TREE, PGJSON, GRAPHVIZ)\n- Add deep recursion test for unnest_columns with depth \u003e 1\n- Add multi-expression window test to verify variadic *exprs\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Consolidate window and unnest tests into parametrized tests\n\nCombine test_window and test_window_multiple_expressions into a single\nparametrized test. 
Merge unnest recursion tests into one parametrized\ntest covering basic, explicit depth 1, and deep recursion cases.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Address PR review feedback for DataFrame operations\n\n- Use upstream parse error for explain format instead of hardcoded options\n- Fix sort_by to use column name resolution consistent with sort()\n- Use ExplainFormat enum members directly in tests instead of string lookup\n- Merge union_by_name_distinct into union_by_name(distinct\u003dFalse) for a\n  more Pythonic API\n- Update check-upstream skill to note union_by_name_distinct coverage\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add DataFrame.column(), col(), and find_qualified_columns() methods\n\nExpose upstream find_qualified_columns to resolve unqualified column\nnames into fully qualified column expressions. This is especially\nuseful for disambiguating columns after joins.\n\n- find_qualified_columns(*names) on Rust side calls upstream directly\n- DataFrame.column(name) and col(name) alias on Python side\n- Update join and join_on docstrings to reference DataFrame.col()\n- Add \"Disambiguating Columns with DataFrame.col()\" section to joins docs\n- Add tests for qualified column resolution, ambiguity, and join usage\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Merge union_by_name and union_by_name_distinct into a single method with distinct flag\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* converting into a python dict loses a column when the names are identical\n\n* Consolidate except_all/except_distinct and intersect/intersect_distinct into single methods with distinct flag\n\nFollows the same pattern as union(distinct\u003d) and union_by_name(distinct\u003d).\nAlso deprecates union_distinct() in favor of union(distinct\u003dTrue).\n\nCo-Authored-By: Claude Opus 4.6 (1M 
context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\nCo-authored-by: Copilot \u003c175728472+Copilot@users.noreply.github.com\u003e"
    },
    {
      "commit": "898d73de20346bba7241907bb18cba47da53e9a9",
      "tree": "2a462b3bbae08b97d637c90a91c9786236553f1b",
      "parents": [
        "d07fdb3ef7d211920f40d0106fa50161c0bf20ce"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Tue Apr 07 09:01:36 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Apr 07 09:01:36 2026 -0400"
      },
      "message": "Add missing aggregate functions (#1471)\n\n* Add missing aggregate functions: grouping, percentile_cont, var_population\n\nExpose upstream DataFusion aggregate functions that were not yet\navailable in the Python API. Closes #1454.\n\n- grouping: returns grouping set membership indicator (rewritten by\n  the ResolveGroupingFunction analyzer rule before physical planning)\n- percentile_cont: computes exact percentile using continuous\n  interpolation (unlike approx_percentile_cont which uses t-digest)\n- var_population: alias for var_pop\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Fix grouping() distinct parameter type for API consistency\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Improve aggregate function tests and docstrings per review feedback\n\nAdd docstring example to grouping(), parametrize percentile_cont tests,\nand add multi-column grouping test case.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add GroupingSet.rollup, .cube, and .grouping_sets factory methods\n\nExpose ROLLUP, CUBE, and GROUPING SETS via the DataFrame API by adding\nstatic methods on GroupingSet that construct the corresponding Expr\nvariants. Update grouping() docstring and tests to use the new API.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Remove _GroupingSetInternal alias, use expr_internal.GroupingSet directly\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Parametrize grouping set tests for rollup and cube\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add grouping sets documentation and note grouping() alias limitation\n\nAdd user documentation for GroupingSet.rollup, .cube, and\n.grouping_sets with Pokemon dataset examples. 
Document the upstream\nalias limitation (apache/datafusion#21411) in both the grouping()\ndocstring and the aggregation user guide.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add grouping sets note to DataFrame.aggregate() docstring\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Address PR review feedback: add quantile_cont alias and simplify examples\n\n- Add quantile_cont as alias for percentile_cont (matches upstream)\n- Replace pa.concat_arrays batch pattern with collect_column() in docstrings\n- Add percentile_cont, quantile_cont, var_population to docs function list\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Accept string column names in GroupingSet factory methods\n\nGroupingSet.rollup(), .cube(), and .grouping_sets() now accept both\nExpr objects and string column names, consistent with DataFrame.aggregate().\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add agent instructions to keep aggregation/window docs in sync\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* dfn is already available globally\n\n* Remove unnecessary import on doctest\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "d07fdb3ef7d211920f40d0106fa50161c0bf20ce",
      "tree": "84d6cf140850de88bd310be10347a2f351960437",
      "parents": [
        "99bc9602dd077c924685f1fc6e54e6feb3429302"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Mon Apr 06 08:54:30 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Apr 06 08:54:30 2026 -0400"
      },
      "message": "Add missing scalar functions  (#1470)\n\n* Add missing scalar functions: get_field, union_extract, union_tag, arrow_metadata, version, row\n\nExpose upstream DataFusion scalar functions that were not yet available\nin the Python API. Closes #1453.\n\n- get_field: extracts a field from a struct or map by name\n- union_extract: extracts a value from a union type by field name\n- union_tag: returns the active field name of a union type\n- arrow_metadata: returns Arrow field metadata (all or by key)\n- version: returns the DataFusion version string\n- row: alias for the struct constructor\n\nNote: arrow_try_cast was listed in the issue but does not exist in\nDataFusion 53, so it is not included.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add tests for new scalar functions\n\nTests for get_field, arrow_metadata, version, row, union_tag, and\nunion_extract.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Accept str for field name and type parameters in scalar functions\n\nAllow arrow_cast, get_field, and union_extract to accept plain str\narguments instead of requiring Expr wrappers. Also improve\narrow_metadata test coverage and fix parameter shadowing.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Accept str for key parameter in arrow_metadata for consistency\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add doctest examples and fix docstring style for new scalar functions\n\nReplace Args/Returns sections with doctest Examples blocks for\narrow_metadata, get_field, union_extract, union_tag, and version to\nmatch existing codebase conventions. Simplify row to alias-style\ndocstring with See Also reference. 
Document that arrow_cast accepts\nboth str and Expr for data_type.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Support pyarrow DataType in arrow_cast\n\nAllow arrow_cast to accept a pyarrow DataType in addition to str and\nExpr. The DataType is converted to its string representation before\nbeing passed to DataFusion. Adds test coverage for the new input type.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Document bracket syntax shorthand in get_field docstring\n\nNote that expr[\"field\"] is a convenient alternative when the field\nname is a static string, and get_field is needed for dynamic\nexpressions. Add a second doctest example showing the bracket syntax.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Fix arrow_cast with pyarrow DataType by delegating to Expr.cast\n\nUse the existing Rust-side PyArrowType\u003cDataType\u003e conversion via\nExpr.cast() instead of str() which produces pyarrow type names\nthat DataFusion does not recognize.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Clarify when to use arrow_cast vs Expr.cast in docstring\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "99bc9602dd077c924685f1fc6e54e6feb3429302",
      "tree": "ccaa7c9a5fb8012dd2c005de2997b35ece6f0382",
      "parents": [
        "ff15648c5dca6b41d3f6146c6c36c97e605f8561"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Mon Apr 06 07:47:13 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Apr 06 07:47:13 2026 -0400"
      },
      "message": "Add missing array functions (#1468)\n\n* Add missing array/list functions and aliases (#1452)\n\nAdd new array functions from upstream DataFusion v53: array_any_value,\narray_distance, array_max, array_min, array_reverse, arrays_zip,\nstring_to_array, and gen_series. Add corresponding list_* aliases and\nmissing list_* aliases for existing functions (list_empty, list_pop_back,\nlist_pop_front, list_has, list_has_all, list_has_any). Also add\narray_contains/list_contains as aliases for array_has, generate_series\nas alias for gen_series, and string_to_list as alias for string_to_array.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add unit tests for new array/list functions and aliases\n\nTests cover all functions and aliases added in the previous commit:\narray_any_value, array_distance, array_max, array_min, array_reverse,\narrays_zip, string_to_array, gen_series, generate_series,\narray_contains, list_contains, list_empty, list_pop_back,\nlist_pop_front, list_has, list_has_all, list_has_any, and list_*\naliases for the new functions.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Improve array function APIs: optional params, better naming, restore comment\n\n- Make null_string optional in string_to_array/string_to_list\n- Make step optional in gen_series/generate_series\n- Rename second_array to element in array_contains/list_has/list_contains\n- Restore # Window Functions section comment in __all__\n- Add tests for optional parameter variants\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Consolidate array/list function tests using pytest parametrize\n\nReduce 26 individual tests to 14 test functions with parametrized\ncases, eliminating boilerplate while maintaining full coverage.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Move list alias tests into existing test_array_functions 
parametrize block\n\nMerge standalone tests for list_empty, list_pop_back, list_pop_front,\nlist_has, array_contains, list_contains, list_has_all, and list_has_any\ninto the existing parametrized test_array_functions block alongside\ntheir array_* counterparts.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Merge test_array_any_value into parametrized test_any_value_aliases\n\nUse the richer multi-row dataset (including all-nulls case) for both\narray_any_value and list_any_value via the parametrized test.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add arrays_overlap and list_overlap as aliases for array_has_any\n\nThese aliases match the upstream DataFusion SQL-level aliases, completing\nthe set of missing array functions from issue #1452.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add docstring examples for optional params in string_to_array and gen_series\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Update AGENTS file to demonstrate preferred method of documenting python functions\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "ff15648c5dca6b41d3f6146c6c36c97e605f8561",
      "tree": "9ef8154af5770402b050e0f06705ce10370401be",
      "parents": [
        "8a35caea9ed01492742738f161fa5b4459d69402"
      ],
      "author": {
        "name": "Nuno Faria",
        "email": "nunofpfaria@gmail.com",
        "time": "Sun Apr 05 13:29:32 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sun Apr 05 08:29:32 2026 -0400"
      },
      "message": "minor: Fix pytest instructions in README (#1477)"
    },
    {
      "commit": "8a35caea9ed01492742738f161fa5b4459d69402",
      "tree": "580704c9bf189ddbf4a5f3caa327a9161b86c883",
      "parents": [
        "16feeb136737ae45fac39f7a82cca2d88fd6224b"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Sat Apr 04 12:20:31 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat Apr 04 12:20:31 2026 -0400"
      },
      "message": "Add missing map functions (#1461)\n\n* Add map functions (make_map, map_keys, map_values, map_extract, map_entries, element_at)\n\nCloses #1448\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add unit tests for map functions\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Remove redundant pyo3 element_at function\n\nelement_at is already a Python-only alias for map_extract,\nso the Rust binding is unnecessary.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Change make_map to accept a Python dictionary\n\nmake_map now takes a dict for the common case and also supports\nseparate keys/values lists for column expressions. Non-Expr keys\nand values are automatically converted to literals.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Make map the primary function with make_map as alias\n\nmap() now supports three calling conventions matching upstream:\n- map({\"a\": 1, \"b\": 2}) — from a Python dictionary\n- map([keys], [values]) — two lists that get zipped\n- map(k1, v1, k2, v2, ...) 
— variadic key-value pairs\n\nNon-Expr keys and values are automatically converted to literals.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Improve map function docstrings\n\n- Add examples for all three map() calling conventions\n- Use clearer descriptions instead of jargon (no \"zipped\" or \"variadic\")\n- Break map_keys/map_values/map_extract/map_entries examples into\n  two steps: create the map column first, then call the function\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Remove map() in favor of make_map(), fix docstrings, add validation\n\n- Remove map() function that shadowed Python builtin; make_map() is now\n  the sole entry point for creating map expressions\n- Fix map_extract/element_at docstrings: missing keys return [None],\n  not an empty list (matches actual upstream behavior)\n- Add length validation for the two-list calling convention\n- Update all tests and docstring examples accordingly\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Consolidate map function tests into parametrized groups\n\nReduce boilerplate by combining make_map construction tests and map\naccessor function tests into two @pytest.mark.parametrize groups.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Docstring update\n\nCo-authored-by: Nuno Faria \u003cnunofpfaria@gmail.com\u003e\n\n* Docstring update\n\nCo-authored-by: Nuno Faria \u003cnunofpfaria@gmail.com\u003e\n\n* Simplify test for readability\n\nCo-authored-by: Nuno Faria \u003cnunofpfaria@gmail.com\u003e\n\n* Simplify test for readability\n\nCo-authored-by: Nuno Faria \u003cnunofpfaria@gmail.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\nCo-authored-by: Nuno Faria \u003cnunofpfaria@gmail.com\u003e"
    },
    {
      "commit": "16feeb136737ae45fac39f7a82cca2d88fd6224b",
      "tree": "10fb40bbeab421b23d25bade5da9985dc359c0fe",
      "parents": [
        "0b6ea95a3d304a774bbe512bb70fbca332aa5426"
      ],
      "author": {
        "name": "Kevin Liu",
        "email": "kevinjqliu@users.noreply.github.com",
        "time": "Fri Apr 03 12:47:31 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Apr 03 15:47:31 2026 -0400"
      },
      "message": "Reduce peak memory usage during release builds to fix OOM on manylinux runners (#1445)\n\n* adjust swap to 8gb\n\n* modify profile.release"
    },
    {
      "commit": "0b6ea95a3d304a774bbe512bb70fbca332aa5426",
      "tree": "8c1e2c3ae2ea105c730d84dc47c1dfe3f720c8de",
      "parents": [
        "645d261ce3bc0b3b610c8d82422042b3e573e793"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri Apr 03 15:43:28 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Apr 03 15:43:28 2026 -0400"
      },
      "message": "Add missing conditional functions (#1464)\n\n* Add missing conditional functions: greatest, least, nvl2, ifnull (#1449)\n\nExpose four conditional functions from upstream DataFusion that were\nnot yet available in the Python bindings.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add unit tests for greatest, least, nvl2, and ifnull functions\n\nTests cover multiple data types (integers, strings), null handling\n(all-null, partial-null), multiple arguments, and ifnull/nvl equivalence.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Use standard alias docstring pattern for ifnull\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* remove unused df fixture and fix parameter shadowing\n\n* Refactor conditional function tests into parametrized test suite\n\nReplace separate test functions for coalesce, greatest, least, nvl,\nnvl2, ifnull with a single parametrized test using a shared fixture.\nAdds coverage for nvl, nullif (previously untested), datetime and\nboolean types, literal fallbacks, and variadic calls.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "645d261ce3bc0b3b610c8d82422042b3e573e793",
      "tree": "4f05826255e2f5fe8d2367e8137c7057d9dd50a9",
      "parents": [
        "be8dd9d08fd284cf1747a2c1b965d9c95fff117c"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri Apr 03 13:51:43 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Apr 03 13:51:43 2026 -0400"
      },
      "message": "Add missing string function `contains` (#1465)\n\n* Add missing `contains` string function\n\nExpose the upstream DataFusion `contains(string, search_str)` function\nwhich returns true if search_str is found within string (case-sensitive).\n\nNote: the other functions from #1450 (instr, position, substring_index)\nalready exist — instr and position are aliases for strpos, and\nsubstring_index is exposed as substr_index.\n\nCloses #1450\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add unit test for contains string function\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Update python/datafusion/functions.py\n\nCo-authored-by: Nuno Faria \u003cnunofpfaria@gmail.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\nCo-authored-by: Nuno Faria \u003cnunofpfaria@gmail.com\u003e"
    },
    {
      "commit": "be8dd9d08fd284cf1747a2c1b965d9c95fff117c",
      "tree": "3c5f66c1cfc4a2631f8255aa1b31928c766ae2d3",
      "parents": [
        "0113a6ee55cc61f9ebd897ae8cfc9213f560e468"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri Apr 03 09:37:00 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Apr 03 09:37:00 2026 -0400"
      },
      "message": "Add AI skill to check current repository against upstream APIs (#1460)\n\n* Initial commit for skill to check upstream repo\n\n* Add instructions on using the check-upstream skill\n\n* Add FFI type coverage and implementation pattern to check-upstream skill\n\nDocument the full FFI type pipeline (Rust PyO3 wrapper → Protocol type →\nPython wrapper → ABC base class → exports → example) and catalog which\nupstream datafusion-ffi types are supported, which have been evaluated as\nnot needing direct exposure, and how to check for new gaps.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Update check-upstream skill to include FFI types as a checkable area\n\nAdd \"ffi types\" to the argument-hint and description so users can invoke\nthe skill with `/check-upstream ffi types`. Also add pipeline verification\nstep to ensure each supported FFI type has the full end-to-end chain\n(PyO3 wrapper, Protocol, Python wrapper with type hints, ABC, exports).\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Move FFI Types section alongside other areas to check\n\nSection 7 (FFI Types) was incorrectly placed after the Output Format and\nImplementation Pattern sections. 
Move it to sit after Section 6\n(SessionContext Methods), consistent with the other checkable areas.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Replace static FFI type list with dynamic discovery instruction\n\nThe supported FFI types list would go stale as new types are added.\nReplace it with a grep instruction to discover them at check time,\nkeeping only the \"evaluated and not requiring exposure\" list which\ncaptures rationale not derivable from code.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Make Python API the source of truth for upstream coverage checks\n\nFunctions exposed in Python (e.g., as aliases of other Rust bindings)\nwere being falsely reported as missing because they lacked a dedicated\n#[pyfunction] in Rust. The user-facing API is the Python layer, so\ncoverage should be measured there.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add exclusion list for DataFrame methods already covered by Python API\n\nshow_limit is covered by DataFrame.show() and with_param_values is\ncovered by SessionContext.sql(param_values\u003d...), so neither needs\nseparate exposure.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Move skills to .ai/skills/ for tool-agnostic discoverability\n\nMoves the canonical skill definitions from .claude/skills/ to .ai/skills/\nand replaces .claude/skills with a symlink, so Claude Code still discovers\nthem while other AI agents can find them in a tool-neutral location.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add AGENTS.md for tool-agnostic agent instructions with CLAUDE.md symlink\n\nAGENTS.md points agents to .ai/skills/ for skill discovery. 
CLAUDE.md\nsymlinks to it so Claude Code picks it up as project instructions.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Make README upstream coverage section tool-agnostic\n\nRemove Claude Code references and update skill path from .claude/skills/\nto .ai/skills/ to match the new tool-neutral directory structure.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add GitHub issue lookup step to check-upstream skill\n\nWhen gaps are identified, search open issues at\napache/datafusion-python before reporting. Existing issues are\nlinked in the report rather than duplicated.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Require Python test coverage in issues created by check-upstream skill\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add license text\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "0113a6ee55cc61f9ebd897ae8cfc9213f560e468",
      "tree": "c6fcb67b7f267985b667c1794dae89647233a4fe",
      "parents": [
        "24994099e41a4e933f883557e2bce1a963bac0ea"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Thu Apr 02 17:47:47 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Apr 02 17:47:47 2026 -0400"
      },
      "message": "Add missing datetime functions (#1467)\n\n* Add missing datetime functions: make_time, current_timestamp, date_format\n\nCloses #1451. Adds make_time Rust binding and Python wrapper, and adds\ncurrent_timestamp (alias for now) and date_format (alias for to_char)\nPython functions.\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n* Add unit tests for make_time, current_timestamp, and date_format\n\nCo-Authored-By: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e\n\n---------\n\nCo-authored-by: Claude Opus 4.6 (1M context) \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "24994099e41a4e933f883557e2bce1a963bac0ea",
      "tree": "1ffac2a348ed161d1843b85caf710a8e35f94156",
      "parents": [
        "73a9d53a37f6ce864b68dda1b07e92a0fed8c8ba"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Tue Mar 31 14:09:16 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Mar 31 14:09:16 2026 -0400"
      },
      "message": "ci: update codespell paths (#1469)\n\n* Update path so it works well with pre-commit\n\n* Prefix path with asterisk so we get matching in both CI and pre-commit\n\n* Update paths for codespell"
    },
    {
      "commit": "73a9d53a37f6ce864b68dda1b07e92a0fed8c8ba",
      "tree": "677778d5c94e1fd3017e3900a7e5cb93e38bd0c1",
      "parents": [
        "5be412b6a691a57bb2246e6726751fe9e8916035"
      ],
      "author": {
        "name": "Kevin Liu",
        "email": "kevinjqliu@users.noreply.github.com",
        "time": "Tue Mar 31 01:57:32 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Mar 31 16:57:32 2026 +0800"
      },
      "message": "CI: Add CodeQL workflow for GitHub Actions security scanning (#1408)\n\n* CI: Add CodeQL workflow for GitHub Actions security scanning\n\n* Update .github/workflows/codeql.yml"
    },
    {
      "commit": "5be412b6a691a57bb2246e6726751fe9e8916035",
      "tree": "57e8625b67701e99b914a7761d80b350b6cc4b73",
      "parents": [
        "ad8d41f2b5faff9a35aeeb340a24480c8ccb6eff"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Sun Mar 29 18:34:48 2026 -0400"
      },
      "committer": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Sun Mar 29 18:34:48 2026 -0400"
      },
      "message": "Pin rust toolchain to apache allowlist sha\n"
    },
    {
      "commit": "ad8d41f2b5faff9a35aeeb340a24480c8ccb6eff",
      "tree": "7049ed5f3f8e65f7fba2ca600c4174163812f182",
      "parents": [
        "acd9a8dcdd1015497835ae3c9a49e4bf5961d719"
      ],
      "author": {
        "name": "Daniel Mesejo",
        "email": "mesejoleon@gmail.com",
        "time": "Sat Mar 28 14:35:32 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Sat Mar 28 09:35:32 2026 -0400"
      },
      "message": "chore: enforce uv lockfile consistency in CI and pre-commit (#1398)\n\n* chore: enforce uv lockfile consistency in CI and pre-commit\n\n  Add --locked flag to uv sync in CI to fail if uv.lock is out of sync,\n  and add the uv-lock pre-commit hook to automatically keep uv.lock\n  up to date when pyproject.toml changes.\n\n* chore: add missing --locked calls"
    },
    {
      "commit": "acd9a8dcdd1015497835ae3c9a49e4bf5961d719",
      "tree": "e80663b744d7e39ded240e2f168bd6dfee532828",
      "parents": [
        "8c6a481b43b322a80990ff6d793d1a921218f567"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Fri Mar 27 14:19:18 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Mar 27 14:19:18 2026 -0400"
      },
      "message": "Complete doc string examples for functions.py (#1435)\n\n* Verify all non-alias functions have doc string\n\n* Move all alias for statements to see also blocks and confirm no examples\n\n* Fix google doc style for all examples\n\n* Remove builtins use\n\n* Add coverage for optional filter\n\n* Cover optional argument examples for window and value functions\n\n* Cover optional arguments for scalar functions\n\n* Cover array and aggregation functions\n\n* Make examples different\n\n* Make format more consistent\n\n* Remove duplicated df definition"
    },
    {
      "commit": "8c6a481b43b322a80990ff6d793d1a921218f567",
      "tree": "4a7db74b096cf185a52343fcc47ed56e27a922bb",
      "parents": [
        "4b215724565cec4257ed9dfa25271c5481c9f7b4"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri Mar 27 12:44:54 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Mar 27 12:44:54 2026 -0400"
      },
      "message": "chore: update dependencies (#1447)\n\n* Cargo lock update\n\n* Update download-artifact\n\n* Update upload-artifact to v7\n\n* Update cargo toml to latest deps"
    },
    {
      "commit": "4b215724565cec4257ed9dfa25271c5481c9f7b4",
      "tree": "250328694f80cceabdbf7c6ab5be2027f16c7110",
      "parents": [
        "75d07ce706fcbda423ad90222aa3dacccb7a5766"
      ],
      "author": {
        "name": "Topias Pyykkönen",
        "email": "43851547+toppyy@users.noreply.github.com",
        "time": "Fri Mar 27 17:17:44 2026 +0200"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Mar 27 11:17:44 2026 -0400"
      },
      "message": "Add a working, more complete example of using a catalog (docs) (#1427)\n\n* Add a working, more complete example of using a catalog\n\n* the default schema is \u0027public\u0027, not \u0027default\u0027\n\n* in-memory table instead of imaginary csv for standalone example\n\n* typo fix\n\nCo-authored-by: Kevin Liu \u003ckevinjqliu@users.noreply.github.com\u003e\n\n* minor c string fix after merge\n\n---------\n\nCo-authored-by: Kevin Liu \u003ckevinjqliu@users.noreply.github.com\u003e\nCo-authored-by: Tim Saucer \u003ctimsaucer@gmail.com\u003e"
    },
    {
      "commit": "75d07ce706fcbda423ad90222aa3dacccb7a5766",
      "tree": "a4da6f4876c6b8b0a31f680feae70ab31400ebf8",
      "parents": [
        "207fc16d62e2f64b687798741b33964aad9b5b7e"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri Mar 27 10:29:23 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Mar 27 10:29:23 2026 -0400"
      },
      "message": "Implement configuration extension support (#1391)\n\n* Implement config options\n\n* Update examples and tests\n\n* pyo3 update\n\n* Add docstring\n\n* rat\n\n* Update examples/datafusion-ffi-example/python/tests/_test_config.py\n\nCo-authored-by: Copilot \u003c175728472+Copilot@users.noreply.github.com\u003e\n\n* Update crates/core/src/context.rs\n\nCo-authored-by: Copilot \u003c175728472+Copilot@users.noreply.github.com\u003e\n\n* Update crates/core/src/context.rs\n\nCo-authored-by: Copilot \u003c175728472+Copilot@users.noreply.github.com\u003e\n\n---------\n\nCo-authored-by: Copilot \u003c175728472+Copilot@users.noreply.github.com\u003e"
    },
    {
      "commit": "207fc16d62e2f64b687798741b33964aad9b5b7e",
      "tree": "a7e1f3ea5a16ad6676727265f495b5fa433eec53",
      "parents": [
        "6cea061abeca55bbe1a53e3c07ad62145d3ac809"
      ],
      "author": {
        "name": "Thomas Tanon",
        "email": "thomas@pellissier-tanon.fr",
        "time": "Fri Mar 27 14:39:20 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Mar 27 09:39:20 2026 -0400"
      },
      "message": "Remove validate_pycapsule (#1426)\n\nThe Bound\u003c\u0027_, PyCapsule\u003e::pointer_checked does the same validation and is already used across the codebase"
    },
    {
      "commit": "6cea061abeca55bbe1a53e3c07ad62145d3ac809",
      "tree": "505297528a70f192ce2ba72290e68d6e98343cee",
      "parents": [
        "876646d67771261cfd9a57c721bece0d95b9740c"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Fri Mar 27 09:29:09 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Mar 27 09:29:09 2026 -0400"
      },
      "message": "Update remaining existing examples to make testable/standalone executable (#1437)\n\n* Move example to doctestable examples for context.py\n\n* Add more standard dafusion namespaces to reduce clutter\n\n* Update project to use ruff compatible with pre-commit version\n\n* Resolve ruff errors for newer version but just ignore them\n\n* Convert dataframe examples to doctestable. Found bug in dropping A\n\n* Move expr.py to doctestable examples\n\n* Move user_defined.py to doctestable examples"
    },
    {
      "commit": "876646d67771261cfd9a57c721bece0d95b9740c",
      "tree": "28928f5553b6ff4fb9dd0c8a83c10586a0e4989c",
      "parents": [
        "e09c93bbe5c7d78c3752adc9158f3ff012d0c4cd"
      ],
      "author": {
        "name": "Kevin Liu",
        "email": "kevinjqliu@users.noreply.github.com",
        "time": "Fri Mar 27 06:08:57 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Mar 27 09:08:57 2026 -0400"
      },
      "message": "docs: clarify DataFusion 52 FFI session-parameter requirement for provider hooks (#1439)\n\n* mention new session arg\n\n* flow better\n\n* smaller change"
    },
    {
      "commit": "e09c93bbe5c7d78c3752adc9158f3ff012d0c4cd",
      "tree": "df5e0198eb5a5f77fe1a0345ed8711b23756fd8f",
      "parents": [
        "1397c5d6444e370a0feee69231fb8bc92c778d5f"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Thu Mar 26 15:45:05 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Mar 26 15:45:05 2026 -0400"
      },
      "message": "ci: add swap during build, use tpchgen-cli (#1443)\n\n* restrict number of rustc jobs during build stage\n\n* temporarily run release build on PR\n\n* Change optimization setting for substrait\n\n* Add swap during release build\n\n* Remove temporary checks to build in PR\n\n* Try using tpchgen-cli for test files. commit answers\n\n* taplo fmt\n\n* do not run rat on data files\n\n* ci needs ./ in path\n\n* add no-project to uv run\n\n* Temporary debug lines to figure out what is happening in CI\n\n* filter null value during aggregation instead now that https://github.com/apache/datafusion/issues/21011 is closed"
    },
    {
      "commit": "1397c5d6444e370a0feee69231fb8bc92c778d5f",
      "tree": "7c839c97aa7ab159547864b8362d5cfa07d0e550",
      "parents": [
        "0c33524dc05091cf0bd5b510417e5b3e2ee48922"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Tue Mar 24 08:05:42 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Mar 24 08:05:42 2026 -0400"
      },
      "message": "bump datafusion to release version (#1441)"
    },
    {
      "commit": "0c33524dc05091cf0bd5b510417e5b3e2ee48922",
      "tree": "43df4600cd3ebc4879c13af1f8b7acaa01a35abf",
      "parents": [
        "85a3595444e7946dc4eaa166cb4843bee2bf2f07"
      ],
      "author": {
        "name": "Kevin Liu",
        "email": "kevinjqliu@users.noreply.github.com",
        "time": "Tue Mar 24 05:04:39 2026 -0700"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Mar 24 08:04:39 2026 -0400"
      },
      "message": "pin setup-uv (#1438)"
    },
    {
      "commit": "85a3595444e7946dc4eaa166cb4843bee2bf2f07",
      "tree": "760d65a3c1f20e59d88f434f99f4d3f64fec6f20",
      "parents": [
        "4e51fa8935799343c973e9cd306f42d278620d42"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Wed Mar 18 22:06:31 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Mar 19 10:06:31 2026 +0800"
      },
      "message": "Add docstring examples for Aggregate window functions (#1418)\n\n* Add docstring examples for Aggregate window functions\n\nAdd example usage to docstrings for Aggregate window functions to improve documentation.\n\nCo-Authored-By: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e\n\n* Remove for example for example docstring\n\n* Actually remove all for example calls in favor of docstrings\n\n* Remove builtins\n\n* Make google docstyle\n\n* Fix bad merge leading to duplicate xample\n\n---------\n\nCo-authored-by: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "4e51fa8935799343c973e9cd306f42d278620d42",
      "tree": "a2099d91f544fcec9a10f04887d5065233b986e8",
      "parents": [
        "3c5013dd57369c55aaf5a463797b73f1d65f3d8a"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Wed Mar 18 02:01:44 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Mar 18 14:01:44 2026 +0800"
      },
      "message": "Add docstring examples for Scalar string functions (#1423)\n\n* Add docstring examples for Scalar string functions\n\nAdd example usage to docstrings for Scalar string functions to improve documentation.\n\nCo-Authored-By: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e\n\n* Remove examples for aliases\n\n---------\n\nCo-authored-by: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "3c5013dd57369c55aaf5a463797b73f1d65f3d8a",
      "tree": "a173d047281e680037b047b2a9ab956b3405728a",
      "parents": [
        "74b32214fb2c9a06f72cd0495b19fee5d5a3047b"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Wed Mar 18 01:58:30 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Mar 18 13:58:30 2026 +0800"
      },
      "message": "Add docstring examples for Scalar array/list functions (#1420)\n\n* Add docstring examples for Scalar array/list functions\n\nAdd example usage to docstrings for Scalar array/list functions to improve documentation.\n\nCo-Authored-By: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e\n\n* Remove examples from all aliases, maybe we should just remove the aliases for simple api surface\n\n---------\n\nCo-authored-by: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "74b32214fb2c9a06f72cd0495b19fee5d5a3047b",
      "tree": "2fa3274158c7854408e6ebe03c4db5a8151a3ed0",
      "parents": [
        "f01f30c6332e40208e9f943a163a66e3d2781d08"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Wed Mar 18 01:52:23 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Mar 18 13:52:23 2026 +0800"
      },
      "message": "Add docstring examples for Aggregate statistical and regression functions (#1417)\n\n* Add docstring examples for Aggregate statistical and regression functions\n\nAdd example usage to docstrings for Aggregate statistical and regression functions to improve documentation.\n\nCo-Authored-By: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e\n\n* Simplify covar\n\n* Make sure everything is google doc style\n\n---------\n\nCo-authored-by: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "f01f30c6332e40208e9f943a163a66e3d2781d08",
      "tree": "989ad5ef60fa594b9b8b10d5158c9e8437a46845",
      "parents": [
        "3dfd6ee5d9ba7de0896f195cef5bc16b4d5f0dd0"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Wed Mar 18 01:51:06 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Mar 18 13:51:06 2026 +0800"
      },
      "message": "Add docstring examples for Scalar temporal functions (#1424)\n\n* Add docstring examples for Scalar temporal functions\n\nAdd example usage to docstrings for Scalar temporal functions to improve documentation.\n\nCo-Authored-By: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e\n\n* Remove examples for aliases\n\n* Fix claude\u0027s attempt to cheat with sql\n\n* Make examples follow google docstyle\n\n---------\n\nCo-authored-by: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "3dfd6ee5d9ba7de0896f195cef5bc16b4d5f0dd0",
      "tree": "2c95a68808117dc3fba60254907b5609ed21335c",
      "parents": [
        "93f4c34bf5a4afae2547d5ccb677143d1833ebf0"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Tue Mar 17 15:58:34 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Mar 17 15:58:34 2026 -0400"
      },
      "message": "Fix CI errors on main (#1432)\n\n* Do not run check for patches on main, just release candidates\n\n* It is not necessary to pull submodules. It\u0027s only slowing down CI"
    },
    {
      "commit": "93f4c34bf5a4afae2547d5ccb677143d1833ebf0",
      "tree": "b791f414441679efa788d187f1aaadc63c08820e",
      "parents": [
        "e524121c8a68171d1031db0487ec13a547871c42"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Tue Mar 17 02:14:42 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Mar 17 14:14:42 2026 +0800"
      },
      "message": "Add docstring examples for Aggregate basic and bitwise/boolean functions (#1416)\n\n* Add docstring examples for Aggregate basic and bitwise/boolean functions\n\nAdd example usage to docstrings for Aggregate basic and bitwise/boolean functions to improve documentation.\n\nCo-Authored-By: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e\n\n* Add tighter bound on approx_distinct for small sizes\n\n---------\n\nCo-authored-by: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "e524121c8a68171d1031db0487ec13a547871c42",
      "tree": "ff2519ced5b962699ba0797ec8c206a39b101b28",
      "parents": [
        "b9a958e3893a9a208d67aac314a9ede97b370679"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Tue Mar 17 02:13:40 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Mar 17 14:13:40 2026 +0800"
      },
      "message": "Add docstring examples for Common utility functions (#1419)\n\n* Add docstring examples for Common utility functions\n\nAdd example usage to docstrings for Common utility functions to improve documentation.\n\nCo-Authored-By: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e\n\n* Don\u0027t add examples for aliases\n\n* Parameters back to args\n\n* Examples to google doc style\n\n---------\n\nCo-authored-by: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "b9a958e3893a9a208d67aac314a9ede97b370679",
      "tree": "a1fa23f725a95389e826fe6a79752677410a3168",
      "parents": [
        "89751b552e8c5388e9cc994acadf1de5b896422f"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Tue Mar 17 02:13:18 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Mar 17 14:13:18 2026 +0800"
      },
      "message": "Add docstring examples for Scalar math functions (#1421)\n\n* Add docstring examples for Scalar math functions\n\nAdd example usage to docstrings for Scalar math functions to improve documentation.\n\nCo-Authored-By: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e\n\n* Fix copy past error on name\n\n* Remove example from alias\n\n* Examples google docstyle\n\n---------\n\nCo-authored-by: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "89751b552e8c5388e9cc994acadf1de5b896422f",
      "tree": "10ff116c84173f252c84b521ec9f5852aa2ae84c",
      "parents": [
        "21990b0bb01599fb67dbd8686c907e5f810aace3"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Tue Mar 17 02:12:28 2026 -0400"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Tue Mar 17 14:12:28 2026 +0800"
      },
      "message": "Add docstring examples for Scalar regex, crypto, struct and other (#1422)\n\n* Add docstring examples for Scalar regex, crypto, struct and other functions\n\nAdd example usage to docstrings for Scalar regex, crypto, struct and other functions to improve documentation.\n\nCo-Authored-By: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e\n\n* Fix typo\n\n* Fix docstring already broken that I added an example to\n\n* Add sha outputs\n\n* clarify struct results\n\n* Examples should follow google docstyle\n\n---------\n\nCo-authored-by: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "21990b0bb01599fb67dbd8686c907e5f810aace3",
      "tree": "b70f5fec8ab315add514b001dbc1e26146d8270b",
      "parents": [
        "9af1681f203ec2a21b64371a1dc6361641ffb2f9"
      ],
      "author": {
        "name": "Paul J. Davis",
        "email": "paul.joseph.davis@gmail.com",
        "time": "Mon Mar 16 07:07:50 2026 -0500"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Mar 16 08:07:50 2026 -0400"
      },
      "message": "feat: Add FFI_TableProviderFactory support (#1396)\n\n* feat: Add FFI_TableProviderFactory support\n\nThis wraps the new FFI_TableProviderFactory APIs in datafusion-ffi.\n\n* Address PR comments\n\n* Add support for Python based TableProviderFactory\n\nThis adds the ability to register Python based TableProviderFactory\ninstances to the SessionContext.\n\n* Correction after rebase\n\n---------\n\nCo-authored-by: Tim Saucer \u003ctimsaucer@gmail.com\u003e"
    },
    {
      "commit": "9af1681f203ec2a21b64371a1dc6361641ffb2f9",
      "tree": "8dbd8e0b20021110c5e1a3f75ab67e1dea47c643",
      "parents": [
        "1160d5a91d586927dc6e466829965770c3fa299a"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Mon Mar 16 12:05:00 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Mar 16 07:05:00 2026 -0400"
      },
      "message": "Create workspace with core and util crates (#1414)\n\n* Break up the repository into a workspace with three crates\n\n* We have a workspace cargo lock file now so this is not needed\n\n* Cleanup\n\n* These files should be redundant because of the build.rs file\n\n* More moving around of utils to clean up\n\n* Add note on how to run FFI example tests\n\n* Add back in dep removed during rebase\n\n* taplo fmt\n\n* Since we have a workspace we know the example version is in sync so we do not need this test\n\n* Add description, homepage, and repository to Cargo.toml\n\nCo-authored-by: Kevin Liu \u003ckevinjqliu@users.noreply.github.com\u003e\n\n* Add description, homepage, and repository to Cargo.toml\n\nCo-authored-by: Kevin Liu \u003ckevinjqliu@users.noreply.github.com\u003e\n\n* Add description, homepage, and repository to Cargo.toml\n\nCo-authored-by: Kevin Liu \u003ckevinjqliu@users.noreply.github.com\u003e\n\n* Removed unused include\n\nCo-authored-by: Kevin Liu \u003ckevinjqliu@users.noreply.github.com\u003e\n\n---------\n\nCo-authored-by: Kevin Liu \u003ckevinjqliu@users.noreply.github.com\u003e"
    },
    {
      "commit": "1160d5a91d586927dc6e466829965770c3fa299a",
      "tree": "fe0c2acef8aa576601071750c7e28741cd3d8616",
      "parents": [
        "d322b7b7bfd527370f03854717661488737c9f8b"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Wed Mar 11 12:01:18 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Mar 11 07:01:18 2026 -0400"
      },
      "message": "Add docstring examples for Scalar trigonometric functions (#1411)\n\n* Add docstring examples for Scalar trigonometric functions\n\nAdd example usage to docstrings for Scalar trigonometric functions to improve documentation.\n\nCo-Authored-By: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e\n\n* Remove weird artifact\n\n* Move conftest so it doesn\u0027t get packaged in release\n\n---------\n\nCo-authored-by: Claude Opus 4.6 \u003cnoreply@anthropic.com\u003e"
    },
    {
      "commit": "d322b7b7bfd527370f03854717661488737c9f8b",
      "tree": "37023b01b33266caf5cc6064f2a6d56adc7cb74c",
      "parents": [
        "f914fc854a54ba133ee8eb0c3cb0e9845a5d7f7f"
      ],
      "author": {
        "name": "Daniel Mesejo",
        "email": "mesejoleon@gmail.com",
        "time": "Mon Mar 09 07:52:47 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Mar 09 14:52:47 2026 +0800"
      },
      "message": "feat: feat: add to_time, to_local_time, to_date functions (#1387)\n\n* feat: add to_time, to_local_time, to_date, to_char functions\n\nAdditionally fix conditional on formatters (since it is *args it cannot be None)\nRefactor name to avoid possible collision with f.\n\n* address comments in PR\n\n* chore: add tests for today"
    },
    {
      "commit": "f914fc854a54ba133ee8eb0c3cb0e9845a5d7f7f",
      "tree": "e6dc27e290f558ca3478c18f50120a02996d0e30",
      "parents": [
        "8ef2cd75d984758b3ae2db43629666da1a7bee19"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Fri Mar 06 21:12:56 2026 -0500"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Mar 06 21:12:56 2026 -0500"
      },
      "message": "Catch warnings in FFI unit tests (#1410)\n\n* Failed pytest in ffi crate when warnings are generated\n\n* Bump DF53 version"
    },
    {
      "commit": "8ef2cd75d984758b3ae2db43629666da1a7bee19",
      "tree": "bb9069da7f788023205ad827b3d87f4a9d492d92",
      "parents": [
        "231ed2b1d375fefe9aa01cdc8ae41c620c772f76"
      ],
      "author": {
        "name": "Nuno Faria",
        "email": "nunofpfaria@gmail.com",
        "time": "Fri Mar 06 16:11:42 2026 +0000"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Fri Mar 06 11:11:42 2026 -0500"
      },
      "message": "Upgrade to DataFusion 53 (#1402)\n\n* Upgrade to DataFusion 53\n\n* Fix fmt\n\n* Fix fmt\n\n* Fix docs\n\n* Bump datafusion rev to 53.0.0\n\n* Bump ffi example datafusion commit to the same as main repo\n\n---------\n\nCo-authored-by: Tim Saucer \u003ctimsaucer@gmail.com\u003e"
    },
    {
      "commit": "231ed2b1d375fefe9aa01cdc8ae41c620c772f76",
      "tree": "f4fe40a7e1e912afb92cd6b9530a1c782ec856e5",
      "parents": [
        "7b630ee893d6b81ae7a0f2c35b77dab723567b13"
      ],
      "author": {
        "name": "Nick",
        "email": "24689722+ntjohnson1@users.noreply.github.com",
        "time": "Thu Mar 05 10:42:20 2026 -0500"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Mar 05 10:42:20 2026 -0500"
      },
      "message": "Enable doc tests in local and CI testing (#1409)\n\n* Turn on doctests\n\n* Fix existing doc examples\n\n* Remove stale referenece to rust-toolchain removed in #1383, surpised pre-commit didn\u0027t flag for anyone else"
    },
    {
      "commit": "7b630ee893d6b81ae7a0f2c35b77dab723567b13",
      "tree": "60130a9a39f2678310f80785c80ff5d8924ddb12",
      "parents": [
        "0c1499cddea5fa20c13728b0c2726aea4fbd1b08"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Wed Mar 04 10:24:32 2026 -0500"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Mar 04 10:24:32 2026 -0500"
      },
      "message": "Add check for crates.io patches to CI (#1407)\n\n"
    },
    {
      "commit": "0c1499cddea5fa20c13728b0c2726aea4fbd1b08",
      "tree": "33c8d3d3a3a71aa5e3b5fd318dd49a9bc6159f1d",
      "parents": [
        "e42775c2fcfe8929df0874414ba2bcd6bbea174c"
      ],
      "author": {
        "name": "Kevin Liu",
        "email": "kevinjqliu@users.noreply.github.com",
        "time": "Mon Mar 02 08:21:37 2026 -0500"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Mar 02 08:21:37 2026 -0500"
      },
      "message": "fix: satisfy rustfmt check in lib.rs re-exports (#1406)\n\n"
    },
    {
      "commit": "e42775c2fcfe8929df0874414ba2bcd6bbea174c",
      "tree": "dc00636ab2c095427c71c2ed6191101be8476d47",
      "parents": [
        "57a50faebb93365a56f337e53120ca215c03774b"
      ],
      "author": {
        "name": "dario curreri",
        "email": "48800335+dariocurr@users.noreply.github.com",
        "time": "Thu Feb 26 15:13:38 2026 +0100"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Thu Feb 26 09:13:38 2026 -0500"
      },
      "message": "ci: update pre-commit hooks, fix linting, and refresh dependencies (#1385)\n\n* ci: update pre-commit hooks and fix linting issues\n\n* Update Ruff version in pre-commit configuration to v0.15.1.\n* Add noqa comments to suppress specific linting warnings in various files.\n* Update regex patterns in test cases for better matching.\n\n* style: correct indentation in GitHub Actions workflow file\n\n* Adjusted indentation for the enable-cache option in the test.yml workflow file to ensure proper YAML formatting.\n\n* refactor: reorder imports in indexed_field.rs for clarity\n\n* Adjusted the order of imports in indexed_field.rs to improve readability and maintain consistency with project conventions.\n\n* build: update dependencies in Cargo.toml and Cargo.lock\n\n* Bump versions of several dependencies including tokio, pyo3-log, prost, uuid, and log to their latest releases.\n* Update Cargo.lock to reflect the changes in dependency versions.\n\n* style: format pyproject.toml for consistency\n\n* Adjusted formatting in pyproject.toml for improved readability by aligning lists and ensuring consistent indentation.\n* Updated dependencies and configuration settings for better organization.\n\n* style: remove noqa comments for import statements\n\n* Cleaned up import statements in multiple files by removing unnecessary noqa comments, enhancing code readability and maintaining consistency across the codebase.\n\n* style: simplify formatting in pyproject.toml\n\n* Streamlined list formatting in pyproject.toml for improved readability by removing unnecessary line breaks and ensuring consistent structure across sections.\n* No functional changes were made; the focus was solely on code style and organization."
    },
    {
      "commit": "57a50faebb93365a56f337e53120ca215c03774b",
      "tree": "f7ca24cd93ebf0be125113a2f62abd2fa532a613",
      "parents": [
        "22086650c8df8b7bc382130c9560f76762dbe6c0"
      ],
      "author": {
        "name": "Kevin Liu",
        "email": "kevinjqliu@users.noreply.github.com",
        "time": "Wed Feb 25 16:34:55 2026 -0500"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Wed Feb 25 16:34:55 2026 -0500"
      },
      "message": "Allow running \"verify release candidate\" github workflow on Windows (#1392)\n\n* run for windows\n\n* readme"
    },
    {
      "commit": "22086650c8df8b7bc382130c9560f76762dbe6c0",
      "tree": "4a6818b9a4f57808858d63a81ffc4de3453fa942",
      "parents": [
        "44a3eb353960e96d16013139d30bb588b7c901db"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Mon Feb 23 08:36:12 2026 -0500"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Feb 23 08:36:12 2026 -0500"
      },
      "message": "Add workflow to verify release candidate on multiple systems (#1388)\n\n* add workflow\n\n* add protoc\n\n* update coverage\n\n* upgrade\n\n* newline\n\n* add a note about manual trigger\n\n* add a section to release about manually running the matrix\n\n* more details\n\n---------\n\nCo-authored-by: Kevin Liu \u003ckevin.jq.liu@gmail.com\u003e"
    },
    {
      "commit": "44a3eb353960e96d16013139d30bb588b7c901db",
      "tree": "2311dfd754396d4a4b6a3a5847ac0af55b0b5773",
      "parents": [
        "d87c6e8049c165158071460f4550546fdc5c42c6"
      ],
      "author": {
        "name": "Tim Saucer",
        "email": "timsaucer@gmail.com",
        "time": "Mon Feb 23 07:29:03 2026 -0500"
      },
      "committer": {
        "name": "GitHub",
        "email": "noreply@github.com",
        "time": "Mon Feb 23 07:29:03 2026 -0500"
      },
      "message": "Merge release 52.0.0 into main (#1389)\n\n* Update version number to 52.0.0\n\n* Update changelog for 52.0.0"
    }
  ],
  "next": "d87c6e8049c165158071460f4550546fdc5c42c6"
}
