To throw a user-facing error or exception, developers should specify a standardized SQLSTATE, an error condition, and message parameters rather than an arbitrary error message.
This guide will describe how to do this.
The error hierarchy is as follows:
1. Error state / SQLSTATE
2. Error condition
3. Error sub-condition

The error state / SQLSTATE itself consists of two parts:
1. Error class
2. Error sub-class
Acceptable values for these various error parts are defined in the following files:
The terms error class, state, and condition come from the SQL standard.
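For example:

- Error state / SQLSTATE: 42K01 (Class: 42; Sub-class: K01)
  - Error condition: DATATYPE_MISSING_SIZE
  - Error condition: INCOMPLETE_TYPE_DEFINITION
    - Error sub-condition: ARRAY
    - Error sub-condition: MAP
    - Error sub-condition: STRUCT
- Error state / SQLSTATE: 42604 (Class: 42; Sub-class: 604)
  - Error condition: INVALID_ESCAPE_CHAR
  - Error condition: AS_OF_JOIN
    - Error sub-condition: TOLERANCE_IS_NON_NEGATIVE
    - Error sub-condition: TOLERANCE_IS_UNFOLDABLE
    - Error sub-condition: UNSUPPORTED_DIRECTION

Unfortunately, we have historically used the term “error class” inconsistently to refer both to a proper error class like 42 and also to an error condition like DATATYPE_MISSING_SIZE.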
Fixing this will require renaming SparkException.errorClass to SparkException.errorCondition and making similar changes to ErrorClassesJsonReader and other parts of the codebase. We will address this in SPARK-47429. Until that is complete, we will have to live with the fact that a string like DATATYPE_MISSING_SIZE is called an “error condition” in our user-facing documentation but an “error class” in the code.
For more details, please see SPARK-46810.
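1. Check if the error is an internal error. If true, use the error condition INTERNAL_ERROR and skip to step 4.
2. Check if an appropriate error condition already exists in error-conditions.json. If true, use the error condition and skip to step 4.
3. Add a new condition to error-conditions.json. If the new condition requires a new error state, add the new error state to error-states.json.
4. Check if the exception type already extends SparkThrowable. If true, skip to step 6.
5. Mix SparkThrowable into the exception.
6. Throw the exception with the error condition and message parameters. If the same exception is thrown in several places, create a utility function in a central place such as QueryCompilationErrors.scala to instantiate the exception.

Throw with arbitrary error message: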
throw new TestException("Problem A because B")
error-conditions.json

"PROBLEM_BECAUSE" : {
  "message" : ["Problem <problem> because <cause>"],
  "sqlState" : "XXXXX"
}
SparkException.scala
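// Needed for .asJava (Scala 2.13+)
import scala.jdk.CollectionConverters._

class SparkTestException(
    errorClass: String,
    messageParameters: Map[String, String])
  extends TestException(SparkThrowableHelper.getMessage(errorClass, messageParameters))
  with SparkThrowable {

  // SparkThrowable exposes the parameters as a Java map
  override def getMessageParameters: java.util.Map[String, String] =
    messageParameters.asJava

  // Called errorClass in the code, but this is the error condition
  override def getErrorClass: String = errorClass
}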
Throw with error condition and message parameters:
throw new SparkTestException("PROBLEM_BECAUSE", Map("problem" -> "A", "cause" -> "B"))
To access error fields, catch exceptions that extend org.apache.spark.SparkThrowable and access:
- Error condition with getErrorClass
- SQLSTATE with getSqlState

try {
  ...
} catch {
  // exists (rather than forall) so that a null SQLSTATE does not match
  case e: SparkThrowable if Option(e.getSqlState).exists(_.startsWith("42")) =>
    warn("Syntax error")
}
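The access pattern can be tried out without a Spark dependency. The following is a minimal sketch only: `DemoThrowable`, `DemoException`, and `ErrorFields.classify` are hypothetical stand-ins invented for illustration, with `DemoThrowable` mirroring just the two getters named in this section rather than the real `SparkThrowable` interface.

```scala
// Hypothetical stand-in for org.apache.spark.SparkThrowable, so that the
// sketch runs without Spark on the classpath.
trait DemoThrowable extends Throwable {
  def getErrorClass: String
  def getSqlState: String // may be null when no SQLSTATE is assigned
}

// Hypothetical exception type that carries an error condition and a SQLSTATE.
final class DemoException(errorClass: String, sqlState: String, message: String)
    extends Exception(message) with DemoThrowable {
  override def getErrorClass: String = errorClass
  override def getSqlState: String = sqlState
}

object ErrorFields {
  // Runs the body and maps any DemoThrowable to a coarse label
  // based on the class part of its SQLSTATE.
  def classify(body: => Unit): String =
    try {
      body
      "ok"
    } catch {
      // exists returns false for a null SQLSTATE, so such errors fall through.
      case e: DemoThrowable if Option(e.getSqlState).exists(_.startsWith("42")) =>
        s"syntax error or access rule violation: ${e.getErrorClass}"
      case e: DemoThrowable =>
        s"other categorized error: ${e.getErrorClass}"
    }
}
```

Wrapping the SQLSTATE in `Option` guards against a null value; whether a missing SQLSTATE should match the guard is a design choice, and this sketch chooses not to match.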
Error conditions are a succinct, human-readable representation of the error category.
Uncategorized errors can be assigned to a legacy error condition with the prefix _LEGACY_ERROR_TEMP_ and an unused sequential number, for instance _LEGACY_ERROR_TEMP_0053.
You should not introduce new uncategorized errors. Instead, convert them to proper errors whenever encountering them in new code.
Note: Though the proper term for this field is an “error condition”, it is called errorClass in the codebase due to an unfortunate accident of history. For more details, please refer to SPARK-46810.
Error messages provide a descriptive, human-readable representation of the error. The message format accepts string parameters written in HTML-tag-like syntax, e.g. <relationName>.
The values passed to the message should not themselves be messages. They should be runtime values, keywords, identifiers, or other values that are not translated.
The quality of the error message should match the Apache Spark error message guidelines.
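To make the parameter syntax concrete, here is a minimal sketch of how a template with `<name>` placeholders could be filled in. `MessageTemplate.render` is a hypothetical helper written for illustration; it is not Spark's implementation, and the set of characters it accepts in a parameter name is an assumption.

```scala
import scala.util.matching.Regex

object MessageTemplate {
  // Matches placeholders such as <relationName>.
  // The allowed name characters are an assumption of this sketch.
  private val Placeholder: Regex = "<([a-zA-Z0-9_-]+)>".r

  // Replaces each <name> with its value and fails fast on a missing parameter.
  def render(template: String, params: Map[String, String]): String =
    Placeholder.replaceAllIn(template, m => {
      val name = m.group(1)
      val value = params.getOrElse(name,
        throw new IllegalArgumentException(s"Missing message parameter: $name"))
      // Escape '$' and '\' so the value is inserted literally.
      Regex.quoteReplacement(value)
    })
}
```

For instance, `MessageTemplate.render("Problem <problem> because <cause>", Map("problem" -> "A", "cause" -> "B"))` yields `Problem A because B`. Failing on a missing parameter, rather than leaving the placeholder in place, is one possible design choice.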
SQLSTATE is a mandatory, portable error identifier shared across SQL engines. It comprises a 2-character class followed by a 3-character sub-class. Spark prefers to reuse existing SQLSTATEs, preferably ones used by multiple vendors. For extensions, Spark claims the K** sub-class range. If a new class is needed, it will also claim the K0 class.
Internal errors should use the XX class. You can subdivide internal errors by component. For example, the existing XXKD0 is used for an internal analyzer error.
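The structure described above can be illustrated with a short sketch. The `SqlState` case class and its methods are hypothetical names introduced here, not part of Spark, and the sketch reads "the K** sub-class range" as any sub-class whose first character is K.

```scala
// Hypothetical helper, not part of Spark: a SQLSTATE split into its two parts.
final case class SqlState(errorClass: String, subClass: String) {
  // Internal errors use the XX class.
  def isInternal: Boolean = errorClass == "XX"

  // Assumption: the K** range means sub-classes that start with K.
  def isSparkExtension: Boolean = subClass.startsWith("K")
}

object SqlState {
  // Returns None unless the input is exactly five characters long:
  // a 2-character class followed by a 3-character sub-class.
  def parse(sqlState: String): Option[SqlState] =
    Option(sqlState)
      .filter(_.length == 5)
      .map(s => SqlState(s.take(2), s.drop(2)))
}
```

With this sketch, `SqlState.parse("42K01")` returns `Some(SqlState("42", "K01"))`, and `SqlState.parse("XXKD0").exists(_.isInternal)` is `true`.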
The SQLSTATEs in error-states.json are collated from: