| .. Licensed to the Apache Software Foundation (ASF) under one |
| .. or more contributor license agreements. See the NOTICE file |
| .. distributed with this work for additional information |
| .. regarding copyright ownership. The ASF licenses this file |
| .. to you under the Apache License, Version 2.0 (the |
| .. "License"); you may not use this file except in compliance |
| .. with the License. You may obtain a copy of the License at |
| .. |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| .. |
| .. Unless required by applicable law or agreed to in writing, software |
| .. distributed under the License is distributed on an "AS IS" BASIS, |
| .. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| .. See the License for the specific language governing permissions and |
| .. limitations under the License. |
| |
| .. highlight:: cql |
| |
| .. _cql-functions: |
| |
| .. Need some intro for UDF and native functions in general and point those to it. |
| .. _udfs: |
| .. _native-functions: |
| |
| Functions |
| --------- |
| |
| CQL supports 2 main categories of functions: |
| |
| - the :ref:`scalar functions <scalar-functions>`, which simply take a number of values and produce an output with it. |
| - the :ref:`aggregate functions <aggregate-functions>`, which are used to aggregate multiple rows results from a |
| ``SELECT`` statement. |
| |
| In both cases, CQL provides a number of native "hard-coded" functions as well as the ability to create new user-defined |
| functions. |
| |
| .. note:: By default, the use of user-defined functions is disabled by default for security concerns (even when |
| enabled, the execution of user-defined functions is sandboxed and a "rogue" function should not be allowed to do |
| evil, but no sandbox is perfect so using user-defined functions is opt-in). See the ``enable_user_defined_functions`` |
| in ``cassandra.yaml`` to enable them. |
| |
| A function is identifier by its name: |
| |
| .. productionlist:: |
| function_name: [ `keyspace_name` '.' ] `name` |
| |
| .. _scalar-functions: |
| |
| Scalar functions |
| ^^^^^^^^^^^^^^^^ |
| |
| .. _scalar-native-functions: |
| |
| Native functions |
| ~~~~~~~~~~~~~~~~ |
| |
| Cast |
| ```` |
| |
| The ``cast`` function can be used to converts one native datatype to another. |
| |
| The following table describes the conversions supported by the ``cast`` function. Cassandra will silently ignore any |
| cast converting a datatype into its own datatype. |
| |
| =============== ======================================================================================================= |
| From To |
| =============== ======================================================================================================= |
| ``ascii`` ``text``, ``varchar`` |
| ``bigint`` ``tinyint``, ``smallint``, ``int``, ``float``, ``double``, ``decimal``, ``varint``, ``text``, |
| ``varchar`` |
| ``boolean`` ``text``, ``varchar`` |
| ``counter`` ``tinyint``, ``smallint``, ``int``, ``bigint``, ``float``, ``double``, ``decimal``, ``varint``, |
| ``text``, ``varchar`` |
| ``date`` ``timestamp`` |
| ``decimal`` ``tinyint``, ``smallint``, ``int``, ``bigint``, ``float``, ``double``, ``varint``, ``text``, |
| ``varchar`` |
| ``double`` ``tinyint``, ``smallint``, ``int``, ``bigint``, ``float``, ``decimal``, ``varint``, ``text``, |
| ``varchar`` |
| ``float`` ``tinyint``, ``smallint``, ``int``, ``bigint``, ``double``, ``decimal``, ``varint``, ``text``, |
| ``varchar`` |
| ``inet`` ``text``, ``varchar`` |
| ``int`` ``tinyint``, ``smallint``, ``bigint``, ``float``, ``double``, ``decimal``, ``varint``, ``text``, |
| ``varchar`` |
| ``smallint`` ``tinyint``, ``int``, ``bigint``, ``float``, ``double``, ``decimal``, ``varint``, ``text``, |
| ``varchar`` |
| ``time`` ``text``, ``varchar`` |
| ``timestamp`` ``date``, ``text``, ``varchar`` |
| ``timeuuid`` ``timestamp``, ``date``, ``text``, ``varchar`` |
| ``tinyint`` ``tinyint``, ``smallint``, ``int``, ``bigint``, ``float``, ``double``, ``decimal``, ``varint``, |
| ``text``, ``varchar`` |
| ``uuid`` ``text``, ``varchar`` |
| ``varint`` ``tinyint``, ``smallint``, ``int``, ``bigint``, ``float``, ``double``, ``decimal``, ``text``, |
| ``varchar`` |
| =============== ======================================================================================================= |
| |
| The conversions rely strictly on Java's semantics. For example, the double value 1 will be converted to the text value |
| '1.0'. For instance:: |
| |
| SELECT avg(cast(count as double)) FROM myTable |
| |
| Token |
| ````` |
| |
| The ``token`` function allows to compute the token for a given partition key. The exact signature of the token function |
| depends on the table concerned and of the partitioner used by the cluster. |
| |
| The type of the arguments of the ``token`` depend on the type of the partition key columns. The return type depend on |
| the partitioner in use: |
| |
| - For Murmur3Partitioner, the return type is ``bigint``. |
| - For RandomPartitioner, the return type is ``varint``. |
| - For ByteOrderedPartitioner, the return type is ``blob``. |
| |
| For instance, in a cluster using the default Murmur3Partitioner, if a table is defined by:: |
| |
| CREATE TABLE users ( |
| userid text PRIMARY KEY, |
| username text, |
| ) |
| |
| then the ``token`` function will take a single argument of type ``text`` (in that case, the partition key is ``userid`` |
| (there is no clustering columns so the partition key is the same than the primary key)), and the return type will be |
| ``bigint``. |
| |
| Uuid |
| ```` |
| The ``uuid`` function takes no parameters and generates a random type 4 uuid suitable for use in ``INSERT`` or |
| ``UPDATE`` statements. |
| |
| .. _timeuuid-functions: |
| |
| Timeuuid functions |
| `````````````````` |
| |
| ``now`` |
| ####### |
| |
| The ``now`` function takes no arguments and generates, on the coordinator node, a new unique timeuuid (at the time where |
| the statement using it is executed). Note that this method is useful for insertion but is largely non-sensical in |
| ``WHERE`` clauses. For instance, a query of the form:: |
| |
| SELECT * FROM myTable WHERE t = now() |
| |
| will never return any result by design, since the value returned by ``now()`` is guaranteed to be unique. |
| |
| ``minTimeuuid`` and ``maxTimeuuid`` |
| ################################### |
| |
| The ``minTimeuuid`` (resp. ``maxTimeuuid``) function takes a ``timestamp`` value ``t`` (which can be `either a timestamp |
| or a date string <timestamps>`) and return a *fake* ``timeuuid`` corresponding to the *smallest* (resp. *biggest*) |
| possible ``timeuuid`` having for timestamp ``t``. So for instance:: |
| |
| SELECT * FROM myTable |
| WHERE t > maxTimeuuid('2013-01-01 00:05+0000') |
| AND t < minTimeuuid('2013-02-02 10:00+0000') |
| |
| will select all rows where the ``timeuuid`` column ``t`` is strictly older than ``'2013-01-01 00:05+0000'`` but strictly |
| younger than ``'2013-02-02 10:00+0000'``. Please note that ``t >= maxTimeuuid('2013-01-01 00:05+0000')`` would still |
| *not* select a ``timeuuid`` generated exactly at '2013-01-01 00:05+0000' and is essentially equivalent to ``t > |
| maxTimeuuid('2013-01-01 00:05+0000')``. |
| |
| .. note:: We called the values generated by ``minTimeuuid`` and ``maxTimeuuid`` *fake* UUID because they do no respect |
| the Time-Based UUID generation process specified by the `RFC 4122 <http://www.ietf.org/rfc/rfc4122.txt>`__. In |
| particular, the value returned by these 2 methods will not be unique. This means you should only use those methods |
| for querying (as in the example above). Inserting the result of those methods is almost certainly *a bad idea*. |
| |
| Time conversion functions |
| ````````````````````````` |
| |
| A number of functions are provided to “convert” a ``timeuuid``, a ``timestamp`` or a ``date`` into another ``native`` |
| type. |
| |
| ===================== =============== =================================================================== |
| Function name Input type Description |
| ===================== =============== =================================================================== |
| ``toDate`` ``timeuuid`` Converts the ``timeuuid`` argument into a ``date`` type |
| ``toDate`` ``timestamp`` Converts the ``timestamp`` argument into a ``date`` type |
| ``toTimestamp`` ``timeuuid`` Converts the ``timeuuid`` argument into a ``timestamp`` type |
| ``toTimestamp`` ``date`` Converts the ``date`` argument into a ``timestamp`` type |
| ``toUnixTimestamp`` ``timeuuid`` Converts the ``timeuuid`` argument into a ``bigInt`` raw value |
| ``toUnixTimestamp`` ``timestamp`` Converts the ``timestamp`` argument into a ``bigInt`` raw value |
| ``toUnixTimestamp`` ``date`` Converts the ``date`` argument into a ``bigInt`` raw value |
| ``dateOf`` ``timeuuid`` Similar to ``toTimestamp(timeuuid)`` (DEPRECATED) |
| ``unixTimestampOf`` ``timeuuid`` Similar to ``toUnixTimestamp(timeuuid)`` (DEPRECATED) |
| ===================== =============== =================================================================== |
| |
| Blob conversion functions |
| ````````````````````````` |
| A number of functions are provided to “convert” the native types into binary data (``blob``). For every |
| ``<native-type>`` ``type`` supported by CQL (a notable exceptions is ``blob``, for obvious reasons), the function |
| ``typeAsBlob`` takes a argument of type ``type`` and return it as a ``blob``. Conversely, the function ``blobAsType`` |
| takes a 64-bit ``blob`` argument and convert it to a ``bigint`` value. And so for instance, ``bigintAsBlob(3)`` is |
| ``0x0000000000000003`` and ``blobAsBigint(0x0000000000000003)`` is ``3``. |
| |
| .. _user-defined-scalar-functions: |
| |
| User-defined functions |
| ~~~~~~~~~~~~~~~~~~~~~~ |
| |
| User-defined functions allow execution of user-provided code in Cassandra. By default, Cassandra supports defining |
| functions in *Java* and *JavaScript*. Support for other JSR 223 compliant scripting languages (such as Python, Ruby, and |
| Scala) can be added by adding a JAR to the classpath. |
| |
| UDFs are part of the Cassandra schema. As such, they are automatically propagated to all nodes in the cluster. |
| |
| UDFs can be *overloaded* - i.e. multiple UDFs with different argument types but the same function name. Example:: |
| |
| CREATE FUNCTION sample ( arg int ) ...; |
| CREATE FUNCTION sample ( arg text ) ...; |
| |
| User-defined functions are susceptible to all of the normal problems with the chosen programming language. Accordingly, |
| implementations should be safe against null pointer exceptions, illegal arguments, or any other potential source of |
| exceptions. An exception during function execution will result in the entire statement failing. |
| |
| It is valid to use *complex* types like collections, tuple types and user-defined types as argument and return types. |
| Tuple types and user-defined types are handled by the conversion functions of the DataStax Java Driver. Please see the |
| documentation of the Java Driver for details on handling tuple types and user-defined types. |
| |
| Arguments for functions can be literals or terms. Prepared statement placeholders can be used, too. |
| |
| Note that you can use the double-quoted string syntax to enclose the UDF source code. For example:: |
| |
| CREATE FUNCTION some_function ( arg int ) |
| RETURNS NULL ON NULL INPUT |
| RETURNS int |
| LANGUAGE java |
| AS $$ return arg; $$; |
| |
| SELECT some_function(column) FROM atable ...; |
| UPDATE atable SET col = some_function(?) ...; |
| |
| CREATE TYPE custom_type (txt text, i int); |
| CREATE FUNCTION fct_using_udt ( udtarg frozen ) |
| RETURNS NULL ON NULL INPUT |
| RETURNS text |
| LANGUAGE java |
| AS $$ return udtarg.getString("txt"); $$; |
| |
| User-defined functions can be used in ``SELECT``, ``INSERT`` and ``UPDATE`` statements. |
| |
| The implicitly available ``udfContext`` field (or binding for script UDFs) provides the necessary functionality to |
| create new UDT and tuple values:: |
| |
| CREATE TYPE custom_type (txt text, i int); |
| CREATE FUNCTION fct\_using\_udt ( somearg int ) |
| RETURNS NULL ON NULL INPUT |
| RETURNS custom_type |
| LANGUAGE java |
| AS $$ |
| UDTValue udt = udfContext.newReturnUDTValue(); |
| udt.setString("txt", "some string"); |
| udt.setInt("i", 42); |
| return udt; |
| $$; |
| |
| The definition of the ``UDFContext`` interface can be found in the Apache Cassandra source code for |
| ``org.apache.cassandra.cql3.functions.UDFContext``. |
| |
| .. code-block:: java |
| |
| public interface UDFContext |
| { |
| UDTValue newArgUDTValue(String argName); |
| UDTValue newArgUDTValue(int argNum); |
| UDTValue newReturnUDTValue(); |
| UDTValue newUDTValue(String udtName); |
| TupleValue newArgTupleValue(String argName); |
| TupleValue newArgTupleValue(int argNum); |
| TupleValue newReturnTupleValue(); |
| TupleValue newTupleValue(String cqlDefinition); |
| } |
| |
| Java UDFs already have some imports for common interfaces and classes defined. These imports are: |
| |
| .. code-block:: java |
| |
| import java.nio.ByteBuffer; |
| import java.util.List; |
| import java.util.Map; |
| import java.util.Set; |
| import org.apache.cassandra.cql3.functions.UDFContext; |
| import com.datastax.driver.core.TypeCodec; |
| import com.datastax.driver.core.TupleValue; |
| import com.datastax.driver.core.UDTValue; |
| |
| Please note, that these convenience imports are not available for script UDFs. |
| |
| .. _create-function-statement: |
| |
| CREATE FUNCTION |
| ``````````````` |
| |
| Creating a new user-defined function uses the ``CREATE FUNCTION`` statement: |
| |
| .. productionlist:: |
| create_function_statement: CREATE [ OR REPLACE ] FUNCTION [ IF NOT EXISTS] |
| : `function_name` '(' `arguments_declaration` ')' |
| : [ CALLED | RETURNS NULL ] ON NULL INPUT |
| : RETURNS `cql_type` |
| : LANGUAGE `identifier` |
| : AS `string` |
| arguments_declaration: `identifier` `cql_type` ( ',' `identifier` `cql_type` )* |
| |
| For instance:: |
| |
| CREATE OR REPLACE FUNCTION somefunction(somearg int, anotherarg text, complexarg frozen<someUDT>, listarg list) |
| RETURNS NULL ON NULL INPUT |
| RETURNS text |
| LANGUAGE java |
| AS $$ |
| // some Java code |
| $$; |
| |
| CREATE FUNCTION IF NOT EXISTS akeyspace.fname(someArg int) |
| CALLED ON NULL INPUT |
| RETURNS text |
| LANGUAGE java |
| AS $$ |
| // some Java code |
| $$; |
| |
| ``CREATE FUNCTION`` with the optional ``OR REPLACE`` keywords either creates a function or replaces an existing one with |
| the same signature. A ``CREATE FUNCTION`` without ``OR REPLACE`` fails if a function with the same signature already |
| exists. |
| |
| If the optional ``IF NOT EXISTS`` keywords are used, the function will |
| only be created if another function with the same signature does not |
| exist. |
| |
| ``OR REPLACE`` and ``IF NOT EXISTS`` cannot be used together. |
| |
| Behavior on invocation with ``null`` values must be defined for each |
| function. There are two options: |
| |
| #. ``RETURNS NULL ON NULL INPUT`` declares that the function will always |
| return ``null`` if any of the input arguments is ``null``. |
| #. ``CALLED ON NULL INPUT`` declares that the function will always be |
| executed. |
| |
| Function Signature |
| ################## |
| |
| Signatures are used to distinguish individual functions. The signature consists of: |
| |
| #. The fully qualified function name - i.e *keyspace* plus *function-name* |
| #. The concatenated list of all argument types |
| |
| Note that keyspace names, function names and argument types are subject to the default naming conventions and |
| case-sensitivity rules. |
| |
| Functions belong to a keyspace. If no keyspace is specified in ``<function-name>``, the current keyspace is used (i.e. |
| the keyspace specified using the ``USE`` statement). It is not possible to create a user-defined function in one of the |
| system keyspaces. |
| |
| .. _drop-function-statement: |
| |
| DROP FUNCTION |
| ````````````` |
| |
| Dropping a function uses the ``DROP FUNCTION`` statement: |
| |
| .. productionlist:: |
| drop_function_statement: DROP FUNCTION [ IF EXISTS ] `function_name` [ '(' `arguments_signature` ')' ] |
| arguments_signature: `cql_type` ( ',' `cql_type` )* |
| |
| For instance:: |
| |
| DROP FUNCTION myfunction; |
| DROP FUNCTION mykeyspace.afunction; |
| DROP FUNCTION afunction ( int ); |
| DROP FUNCTION afunction ( text ); |
| |
| You must specify the argument types (:token:`arguments_signature`) of the function to drop if there are multiple |
| functions with the same name but a different signature (overloaded functions). |
| |
| ``DROP FUNCTION`` with the optional ``IF EXISTS`` keywords drops a function if it exists, but does not throw an error if |
| it doesn't |
| |
| .. _aggregate-functions: |
| |
| Aggregate functions |
| ^^^^^^^^^^^^^^^^^^^ |
| |
| Aggregate functions work on a set of rows. They receive values for each row and returns one value for the whole set. |
| |
| If ``normal`` columns, ``scalar functions``, ``UDT`` fields, ``writetime`` or ``ttl`` are selected together with |
| aggregate functions, the values returned for them will be the ones of the first row matching the query. |
| |
| Native aggregates |
| ~~~~~~~~~~~~~~~~~ |
| |
| .. _count-function: |
| |
| Count |
| ````` |
| |
| The ``count`` function can be used to count the rows returned by a query. Example:: |
| |
| SELECT COUNT (*) FROM plays; |
| SELECT COUNT (1) FROM plays; |
| |
| It also can be used to count the non null value of a given column:: |
| |
| SELECT COUNT (scores) FROM plays; |
| |
| Max and Min |
| ``````````` |
| |
| The ``max`` and ``min`` functions can be used to compute the maximum and the minimum value returned by a query for a |
| given column. For instance:: |
| |
| SELECT MIN (players), MAX (players) FROM plays WHERE game = 'quake'; |
| |
| Sum |
| ``` |
| |
| The ``sum`` function can be used to sum up all the values returned by a query for a given column. For instance:: |
| |
| SELECT SUM (players) FROM plays; |
| |
| Avg |
| ``` |
| |
| The ``avg`` function can be used to compute the average of all the values returned by a query for a given column. For |
| instance:: |
| |
| SELECT AVG (players) FROM plays; |
| |
| .. _user-defined-aggregates-functions: |
| |
| User-Defined Aggregates |
| ~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| User-defined aggregates allow the creation of custom aggregate functions. Common examples of aggregate functions are |
| *count*, *min*, and *max*. |
| |
| Each aggregate requires an *initial state* (``INITCOND``, which defaults to ``null``) of type ``STYPE``. The first |
| argument of the state function must have type ``STYPE``. The remaining arguments of the state function must match the |
| types of the user-defined aggregate arguments. The state function is called once for each row, and the value returned by |
| the state function becomes the new state. After all rows are processed, the optional ``FINALFUNC`` is executed with last |
| state value as its argument. |
| |
| ``STYPE`` is mandatory in order to be able to distinguish possibly overloaded versions of the state and/or final |
| function (since the overload can appear after creation of the aggregate). |
| |
| User-defined aggregates can be used in ``SELECT`` statement. |
| |
| A complete working example for user-defined aggregates (assuming that a keyspace has been selected using the ``USE`` |
| statement):: |
| |
| CREATE OR REPLACE FUNCTION averageState(state tuple<int,bigint>, val int) |
| CALLED ON NULL INPUT |
| RETURNS tuple |
| LANGUAGE java |
| AS $$ |
| if (val != null) { |
| state.setInt(0, state.getInt(0)+1); |
| state.setLong(1, state.getLong(1)+val.intValue()); |
| } |
| return state; |
| $$; |
| |
| CREATE OR REPLACE FUNCTION averageFinal (state tuple<int,bigint>) |
| CALLED ON NULL INPUT |
| RETURNS double |
| LANGUAGE java |
| AS $$ |
| double r = 0; |
| if (state.getInt(0) == 0) return null; |
| r = state.getLong(1); |
| r /= state.getInt(0); |
| return Double.valueOf(r); |
| $$; |
| |
| CREATE OR REPLACE AGGREGATE average(int) |
| SFUNC averageState |
| STYPE tuple |
| FINALFUNC averageFinal |
| INITCOND (0, 0); |
| |
| CREATE TABLE atable ( |
| pk int PRIMARY KEY, |
| val int |
| ); |
| |
| INSERT INTO atable (pk, val) VALUES (1,1); |
| INSERT INTO atable (pk, val) VALUES (2,2); |
| INSERT INTO atable (pk, val) VALUES (3,3); |
| INSERT INTO atable (pk, val) VALUES (4,4); |
| |
| SELECT average(val) FROM atable; |
| |
| .. _create-aggregate-statement: |
| |
| CREATE AGGREGATE |
| ```````````````` |
| |
| Creating (or replacing) a user-defined aggregate function uses the ``CREATE AGGREGATE`` statement: |
| |
| .. productionlist:: |
| create_aggregate_statement: CREATE [ OR REPLACE ] AGGREGATE [ IF NOT EXISTS ] |
| : `function_name` '(' `arguments_signature` ')' |
| : SFUNC `function_name` |
| : STYPE `cql_type` |
| : [ FINALFUNC `function_name` ] |
| : [ INITCOND `term` ] |
| |
| See above for a complete example. |
| |
| ``CREATE AGGREGATE`` with the optional ``OR REPLACE`` keywords either creates an aggregate or replaces an existing one |
| with the same signature. A ``CREATE AGGREGATE`` without ``OR REPLACE`` fails if an aggregate with the same signature |
| already exists. |
| |
| ``CREATE AGGREGATE`` with the optional ``IF NOT EXISTS`` keywords either creates an aggregate if it does not already |
| exist. |
| |
| ``OR REPLACE`` and ``IF NOT EXISTS`` cannot be used together. |
| |
| ``STYPE`` defines the type of the state value and must be specified. |
| |
| The optional ``INITCOND`` defines the initial state value for the aggregate. It defaults to ``null``. A non-\ ``null`` |
| ``INITCOND`` must be specified for state functions that are declared with ``RETURNS NULL ON NULL INPUT``. |
| |
| ``SFUNC`` references an existing function to be used as the state modifying function. The type of first argument of the |
| state function must match ``STYPE``. The remaining argument types of the state function must match the argument types of |
| the aggregate function. State is not updated for state functions declared with ``RETURNS NULL ON NULL INPUT`` and called |
| with ``null``. |
| |
| The optional ``FINALFUNC`` is called just before the aggregate result is returned. It must take only one argument with |
| type ``STYPE``. The return type of the ``FINALFUNC`` may be a different type. A final function declared with ``RETURNS |
| NULL ON NULL INPUT`` means that the aggregate's return value will be ``null``, if the last state is ``null``. |
| |
| If no ``FINALFUNC`` is defined, the overall return type of the aggregate function is ``STYPE``. If a ``FINALFUNC`` is |
| defined, it is the return type of that function. |
| |
| .. _drop-aggregate-statement: |
| |
| DROP AGGREGATE |
| `````````````` |
| |
| Dropping an user-defined aggregate function uses the ``DROP AGGREGATE`` statement: |
| |
| .. productionlist:: |
| drop_aggregate_statement: DROP AGGREGATE [ IF EXISTS ] `function_name` [ '(' `arguments_signature` ')' ] |
| |
| For instance:: |
| |
| DROP AGGREGATE myAggregate; |
| DROP AGGREGATE myKeyspace.anAggregate; |
| DROP AGGREGATE someAggregate ( int ); |
| DROP AGGREGATE someAggregate ( text ); |
| |
| The ``DROP AGGREGATE`` statement removes an aggregate created using ``CREATE AGGREGATE``. You must specify the argument |
| types of the aggregate to drop if there are multiple aggregates with the same name but a different signature (overloaded |
| aggregates). |
| |
| ``DROP AGGREGATE`` with the optional ``IF EXISTS`` keywords drops an aggregate if it exists, and does nothing if a |
| function with the signature does not exist. |