[SPARK-49729][SQL][DOCS] Forcefully check `usage` and correct the non-standard writing of 4 expressions
### What changes were proposed in this pull request?
This PR:
- enforces a check on each expression's `usage` string during SQL doc generation
- corrects the non-standard `usage` strings of 4 expressions (`shiftleft`, `shiftright`, `shiftrightunsigned`, `between`)
### Why are the changes needed?
1. When an expression's `usage` string is non-standard, its description may be omitted from the generated documentation, as happened with `shiftleft`:
https://spark.apache.org/docs/preview/sql-ref-functions-builtin.html
- Before (note: the function oddly appears only under `examples` and not in the `Conditional Functions` catalog)
<img width="906" alt="image" src="https://github.com/user-attachments/assets/b7713cda-a8ea-4367-82d6-252efbac1c47">
- After
<img width="904" alt="image" src="https://github.com/user-attachments/assets/ebbb1bc7-74a8-4acf-9d18-ec80b881aeb7">
2. With this check, a non-standard `usage` string fails directly in GA, so it can be corrected promptly instead of being silently omitted. See `Manually check` below.
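The omission in item 1 follows from how the doc script splits `usage`. Below is a minimal sketch, assuming `_FUNC_` has already been replaced by the function name before the split (the updated `describe function` expectation in this patch suggests so). `split_usage` is a hypothetical helper for illustration, not part of `gen-sql-functions-docs.py`; it only mirrors the script's `re.split` call.

```python
import re


def split_usage(func_name, usage):
    """Split a usage string into (signature, description) pairs.

    Hypothetical helper mirroring the re.split call in the doc script.
    Returns an empty list when no signature containing func_name is found.
    """
    parts = re.split(r"(.*%s.*) - " % func_name, usage.strip())
    if len(parts) <= 1:
        # No "<signature containing func_name> - " prefix matched.
        return []
    it = iter(parts[1:])
    # Pair each signature with the description that follows it.
    return list(zip(it, it))


# Old form: the signature uses "<<", so "shiftleft" never matches.
print(split_usage("shiftleft", "base << exp - Bitwise left shift."))
# New form, with _FUNC_ assumed to be already substituted.
print(split_usage("shiftleft", "base shiftleft exp - Bitwise left shift."))
```

In the script itself, the empty case now prints an error and aborts the build instead of silently producing no table rows.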
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- Pass GA.
- Manually check:
```text
The usage of between is not standardized, please correct it. Refer to: `AesDecrypt`
------------------------------------------------
Jekyll 4.3.3 Please append `--trace` to the `build` command
for any additional information or backtrace.
------------------------------------------------
/Users/panbingkun/Developer/spark/spark-community/docs/_plugins/build_api_docs.rb:184:in `build_sql_docs': SQL doc generation failed (RuntimeError)
from /Users/panbingkun/Developer/spark/spark-community/docs/_plugins/build_api_docs.rb:225:in `<top (required)>'
from <internal:/Users/panbingkun/.rbenv/versions/3.3.2/lib/ruby/site_ruby/3.3.0/rubygems/core_ext/kernel_require.rb>:37:in `require'
from <internal:/Users/panbingkun/.rbenv/versions/3.3.2/lib/ruby/site_ruby/3.3.0/rubygems/core_ext/kernel_require.rb>:37:in `require'
from /Users/panbingkun/Developer/spark/spark-community/docs/.local_ruby_bundle/ruby/3.3.0/gems/jekyll-4.3.3/lib/jekyll/external.rb:57:in `block in require_with_graceful_fail'
from /Users/panbingkun/Developer/spark/spark-community/docs/.local_ruby_bundle/ruby/3.3.0/gems/jekyll-4.3.3/lib/jekyll/external.rb:55:in `each'
from /Users/panbingkun/Developer/spark/spark-community/docs/.local_ruby_bundle/ruby/3.3.0/gems/jekyll-4.3.3/lib/jekyll/external.rb:55:in `require_with_graceful_fail'
from /Users/panbingkun/Developer/spark/spark-community/docs/.local_ruby_bundle/ruby/3.3.0/gems/jekyll-4.3.3/lib/jekyll/plugin_manager.rb:96:in `block in require_plugin_files'
```
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #48179 from panbingkun/SPARK-49729.
Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Between.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Between.scala
index deec1ab..c226e48 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Between.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Between.scala
@@ -21,7 +21,7 @@
// scalastyle:off line.size.limit
@ExpressionDescription(
- usage = "Usage: input [NOT] BETWEEN lower AND upper - evaluate if `input` is [not] in between `lower` and `upper`",
+ usage = "input [NOT] _FUNC_ lower AND upper - evaluate if `input` is [not] in between `lower` and `upper`",
examples = """
Examples:
> SELECT 0.5 _FUNC_ 0.1 AND 1.0;
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
index 00274a1..ddba820 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
@@ -1293,7 +1293,7 @@
* @param right number of bits to left shift.
*/
@ExpressionDescription(
- usage = "base << exp - Bitwise left shift.",
+ usage = "base _FUNC_ exp - Bitwise left shift.",
examples = """
Examples:
> SELECT shiftleft(2, 1);
@@ -1322,7 +1322,7 @@
* @param right number of bits to right shift.
*/
@ExpressionDescription(
- usage = "base >> expr - Bitwise (signed) right shift.",
+ usage = "base _FUNC_ expr - Bitwise (signed) right shift.",
examples = """
Examples:
> SELECT shiftright(4, 1);
@@ -1350,7 +1350,7 @@
* @param right the number of bits to right shift.
*/
@ExpressionDescription(
- usage = "base >>> expr - Bitwise unsigned right shift.",
+ usage = "base _FUNC_ expr - Bitwise unsigned right shift.",
examples = """
Examples:
> SELECT shiftrightunsigned(4, 1);
diff --git a/sql/gen-sql-functions-docs.py b/sql/gen-sql-functions-docs.py
index bb813cf..4be9966 100644
--- a/sql/gen-sql-functions-docs.py
+++ b/sql/gen-sql-functions-docs.py
@@ -39,6 +39,10 @@
}
+def _print_red(text):
+ print('\033[31m' + text + '\033[0m')
+
+
def _list_grouped_function_infos(jvm):
"""
Returns a list of function information grouped by each group value via JVM.
@@ -126,7 +130,13 @@
func_name = "\\" + func_name
elif (info.name == "when"):
func_name = "CASE WHEN"
- usages = iter(re.split(r"(.*%s.*) - " % func_name, info.usage.strip())[1:])
+ expr_usages = re.split(r"(.*%s.*) - " % func_name, info.usage.strip())
+ if len(expr_usages) <= 1:
+ _print_red("\nThe `usage` of %s is not standardized, please correct it. "
+ "Refer to: `AesDecrypt`" % (func_name))
+ os._exit(-1)
+ usages = iter(expr_usages[1:])
+
for (sig, description) in zip(usages, usages):
result.append(" <tr>")
result.append(" <td>%s</td>" % sig)
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
index 594c097..1405103 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
@@ -246,7 +246,7 @@
checkKeywordsExist(sql("describe function `between`"),
"Function: between",
- "Usage: input [NOT] BETWEEN lower AND upper - " +
+ "input [NOT] between lower AND upper - " +
"evaluate if `input` is [not] in between `lower` and `upper`")
checkKeywordsExist(sql("describe function `case`"),