Spark SQL and DataFrames support the following data types:
`ByteType`
: Represents 1-byte signed integer numbers. The range of numbers is from `-128` to `127`.
`ShortType`
: Represents 2-byte signed integer numbers. The range of numbers is from `-32768` to `32767`.
`IntegerType`
: Represents 4-byte signed integer numbers. The range of numbers is from `-2147483648` to `2147483647`.
`LongType`
: Represents 8-byte signed integer numbers. The range of numbers is from `-9223372036854775808` to `9223372036854775807`.
`FloatType`
: Represents 4-byte single-precision floating point numbers.
`DoubleType`
: Represents 8-byte double-precision floating point numbers.
`DecimalType`
: Represents arbitrary-precision signed decimal numbers. Backed internally by `java.math.BigDecimal`. A `BigDecimal` consists of an arbitrary-precision integer unscaled value and a 32-bit integer scale.
`StringType`
: Represents character string values.
`BinaryType`
: Represents byte sequence values.
`BooleanType`
: Represents boolean values.
`TimestampType`
: Represents values comprising the fields year, month, day, hour, minute, and second.
`DateType`
: Represents values comprising the fields year, month, and day.
`ArrayType(elementType, containsNull)`
: Represents values comprising a sequence of elements with the type of `elementType`. `containsNull` indicates whether elements in an `ArrayType` value can have `null` values.
`MapType(keyType, valueType, valueContainsNull)`
: Represents values comprising a set of key-value pairs. The data type of keys is described by `keyType` and the data type of values is described by `valueType`. For a `MapType` value, keys are not allowed to have `null` values. `valueContainsNull` indicates whether values of a `MapType` value can have `null` values.
`StructType(fields)`
: Represents values with the structure described by a sequence of `StructField`s (`fields`).
`StructField(name, dataType, nullable)`
: Represents a field in a `StructType`. The name of a field is indicated by `name`. The data type of a field is indicated by `dataType`. `nullable` indicates whether values of this field can have `null` values.

In Scala, all Spark SQL data types are located in the package `org.apache.spark.sql.types`. You can access them with:
{% include_example data_types scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
In Java, all Spark SQL data types are located in the package `org.apache.spark.sql.types`. To access or create a data type, use the factory methods provided in `org.apache.spark.sql.types.DataTypes`.
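Several entries in the list above correspond to JVM primitives, and `DecimalType` is described as backed by `java.math.BigDecimal`. As a minimal sketch that uses only the JDK (no Spark classes; the class name `JvmTypeFacts` and its helpers `describe` and `rebuild` are hypothetical names chosen for illustration), the following prints the integer ranges and takes a `BigDecimal` apart into its unscaled value and scale:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Illustration only: these are JDK types, not Spark's DataTypes API.
final class JvmTypeFacts {

    // Hypothetical helper: exposes the two parts a BigDecimal is made of.
    static String describe(BigDecimal d) {
        BigInteger unscaled = d.unscaledValue(); // arbitrary-precision integer
        int scale = d.scale();                   // 32-bit integer scale
        return "unscaled=" + unscaled + ", scale=" + scale;
    }

    // Hypothetical helper: rebuilds the decimal, value = unscaled * 10^(-scale).
    static BigDecimal rebuild(BigInteger unscaled, int scale) {
        return new BigDecimal(unscaled, scale);
    }

    public static void main(String[] args) {
        // Ranges of the JVM primitives with the same byte widths as the list above.
        System.out.println("byte:  " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int:   " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long:  " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);

        BigDecimal price = new BigDecimal("123.45");
        System.out.println(price + " -> " + describe(price));
        // Round trip: the two parts are enough to recover the original value.
        System.out.println(rebuild(price.unscaledValue(), price.scale()).equals(price));
    }
}
```

For `123.45`, the unscaled value is `12345` and the scale is `2`, so the number is stored as 12345 × 10⁻².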
In Python, all Spark SQL data types are located in the package `pyspark.sql.types`. You can access them with:
{% highlight python %}
from pyspark.sql.types import *
{% endhighlight %}
There is special handling for not-a-number (NaN) when dealing with `float` or `double` types that does not exactly match standard floating point semantics. Specifically:
Operations performed on numeric types (with the exception of `decimal`) are not checked for overflow. This means that when an operation overflows, the result is the same as what that operation returns in a Java or Scala program (e.g., if the sum of two integers exceeds the maximum representable value, the result is a negative number).
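The wraparound described above can be reproduced in plain Java. The sketch below uses only the JDK and does not call Spark; `OverflowDemo`, `uncheckedAdd`, and `overflows` are hypothetical names for illustration. `Math.addExact` is a JDK method included purely for contrast, to show what a checked addition looks like, and is not presented as something Spark applies.

```java
// Illustration only: plain JVM arithmetic, no Spark involved.
final class OverflowDemo {

    // Ordinary int addition: on overflow the result silently wraps around.
    static int uncheckedAdd(int a, int b) {
        return a + b;
    }

    // Reports whether a + b would overflow, using the JDK's checked addition.
    static boolean overflows(int a, int b) {
        try {
            Math.addExact(a, b); // throws ArithmeticException on overflow
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int sum = uncheckedAdd(Integer.MAX_VALUE, 1);
        System.out.println(sum);                               // -2147483648
        System.out.println(sum == Integer.MIN_VALUE);          // true
        System.out.println(overflows(Integer.MAX_VALUE, 1));   // true
        System.out.println(overflows(1, 2));                   // false
    }
}
```

Adding `1` to the largest `int` yields the smallest `int`, which is the negative result the paragraph above refers to.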