[SPARK-57726][SQL] Fix NPE in AttributeReference.hashCode when the attribute name is null
### What changes were proposed in this pull request?
`AttributeReference.hashCode` computes `name.hashCode()` directly:
```scala
override def hashCode: Int = {
  var h = 17
  h = h * 37 + name.hashCode() // NPE if name == null
  ...
}
```
This throws a `NullPointerException` when the attribute has a null name. This PR makes the name contribution null-safe by using `Objects.hashCode(name)`, which returns 0 for a null argument (`java.util.Objects` is already imported).
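The null-safe pattern can be sketched in isolation as follows. This is a minimal illustration, not the Spark code: `Attr` and `NullSafeHashDemo` are hypothetical names, and the class has only two fields.

```scala
import java.util.Objects

// Hypothetical stand-in for illustration only; this is not Spark's AttributeReference.
final class Attr(val name: String, val nullable: Boolean) {
  override def equals(other: Any): Boolean = other match {
    case a: Attr => name == a.name && nullable == a.nullable // Scala's == is null-safe
    case _ => false
  }

  override def hashCode: Int = {
    var h = 17
    h = h * 37 + Objects.hashCode(name) // contributes 0 when name is null
    h = h * 37 + nullable.hashCode()
    h
  }
}

object NullSafeHashDemo {
  def main(args: Array[String]): Unit = {
    val a = new Attr(null, nullable = true)
    val b = new Attr(null, nullable = true)
    val set = new java.util.HashSet[Attr]()
    set.add(a) // invokes a.hashCode, which does not throw
    println(a == b)
    println(set.contains(b))
  }
}
```

`java.util.HashSet` is used here because it hashes every element it stores, so the insertion exercises `hashCode` directly.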
### Why are the changes needed?
`AttributeReference.equals` already compares names null-safely (`name == ar.name`), but `hashCode` does not. An `AttributeReference` can carry a null name: for example, `StructField` permits a null name, which flows into `AttributeReference` via `DataTypeUtils.toAttribute`. When such an attribute is placed in a hash-based collection (`HashSet`/`HashMap`, or via `distinct`/`toSet`), `hashCode` throws an NPE.
This was noticed during review of #56831 (SPARK-57725), which fixes a related but distinct null-name NPE in `AttributeSeq`'s case-insensitive name maps. The two issues are independent and are addressed separately.
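For illustration, the pre-fix mismatch between a null-safe `equals` and a throwing `hashCode` can be reproduced outside Spark. This is a sketch under assumptions: `UnsafeAttr` and `UnsafeHashDemo` are hypothetical names that only mirror the pattern and are not part of Catalyst.

```scala
import scala.util.Try

// Hypothetical stand-in that mirrors the pre-fix pattern; this is not Spark code.
final class UnsafeAttr(val name: String) {
  override def equals(other: Any): Boolean = other match {
    case a: UnsafeAttr => name == a.name // null-safe comparison
    case _ => false
  }

  override def hashCode: Int = 17 * 37 + name.hashCode() // NPE when name == null
}

object UnsafeHashDemo {
  // Returns the equals result and the outcome of inserting into a hash-based set.
  def run(): (Boolean, Try[Boolean]) = {
    val a = new UnsafeAttr(null)
    val b = new UnsafeAttr(null)
    val set = new java.util.HashSet[UnsafeAttr]()
    (a == b, Try(set.add(a))) // add calls a.hashCode, which throws
  }
}
```

The two instances compare equal, yet the insertion fails with a `NullPointerException`. A hash-based collection is needed to see this, because the failure only occurs when `hashCode` is actually invoked.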
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added a test in `NamedExpressionSuite` asserting that `AttributeReference.hashCode` does not throw on a null-named attribute and that the equals/hashCode contract holds:
```
build/sbt 'catalyst/testOnly *NamedExpressionSuite'
```
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor
Closes #56832 from MaxGekk/fix-attrref-hashcode-npe.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
index d27f140..c7a2165 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
@@ -312,7 +312,9 @@
override def hashCode: Int = {
// See http://stackoverflow.com/questions/113511/hash-code-implementation
var h = 17
- h = h * 37 + name.hashCode()
+ // Use Objects.hashCode to stay null-safe: an AttributeReference can carry a null name
+ // (e.g. from a StructField built with a null name), and equals already treats name nullably.
+ h = h * 37 + Objects.hashCode(name)
h = h * 37 + dataType.hashCode()
h = h * 37 + nullable.hashCode()
h = h * 37 + metadata.hashCode()
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NamedExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NamedExpressionSuite.scala
index 3e6f40f..362313c 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NamedExpressionSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NamedExpressionSuite.scala
@@ -64,4 +64,17 @@
assert(!alias.metadata.contains(nonInheritableMetadataKey))
assert(alias.metadata.contains("key"))
}
+
+ test("SPARK-57726: AttributeReference.hashCode is null-safe when the name is null") {
+ // An AttributeReference can carry a null name (e.g. from a StructField built with a null
+ // name). hashCode must not throw a NullPointerException so the attribute can be used in
+ // hash-based collections; equals is already null-safe on the name.
+ val exprId = NamedExpression.newExprId
+ val a1 = AttributeReference(null, IntegerType)(exprId)
+ val a2 = AttributeReference(null, IntegerType)(exprId)
+ a1.hashCode()
+ assert(a1 == a2)
+ assert(a1.hashCode() == a2.hashCode())
+ assert(Set(a1).contains(a2))
+ }
}