blob: 125d88cb96d947ba914371a817843a2ffd2206d0 [file] [log] [blame]
= Using Arithmetic Operators in Cassandra 4.0
:page-layout: single-post
:page-role: blog-post
:page-post-date: December 21, 2021
:page-post-author: Benjamin Lerer
:description: The Apache Cassandra Community
:keywords:
image:blog/using-arithmetic-operators-in-cassandra-4.0-unsplash-bekky-bekks.jpg[operations]
Support for arithmetic operators was added to CQL in Cassandra 4.0 and this blog post describes the important things to know about CQL arithmetics operators, and how we addressed the non-trivial challenges around return types and types inference.
=== Operations on Numbers
Cassandra supports five arithmetic operators on numbers:
// adding a table and assuming symbols aren’t an issue. Fingers crossed.
.Arithmetic Operators in 4.0
[cols="1,1"]
|===
|Operator |Description
|+ |addition
|- |negates operand/subtraction
|* |multiplication
|/ |division
|% |returns the remainder of a division
|===
Those operators can be used in the `SELECT` `INSERT`, `UPDATE,` and `DELETE` statements to perform some computations. For example, if you wish to know the percentage of completion of the currently running compactions, you can query the `sstable_tasks` system view using the following query:
[source,java]
----
SELECT keyspace_name,
table_name,
Task_id,
(progress * 100) / total AS percentage_of_completion
FROM system_views.sstable_tasks
WHERE kind = 'compaction'
ALLOW FILTERING;
----
=== The Important Role of Return Types
An important thing to take into account when using operators is the return type of the operation. In Cassandra the return type is based on the type of the operands and can be determined using the following table:
.Return Type Based on Operand in Apache Cassandra
[cols="1s,1,1,1,1,1,1,1,1,1,stripes=even"]
|===
|Left/Right |tinyint |smallint |int |bigint |counter |float |double |varint |decimal
|tinyint |tinyint |smallint |int |bigint |bigint |float |double |varint |decimal
|smallint |smallint |smallint |int |bigint |bigint |float |double |varint |decimal
|int |int |int |int |bigint |bigint |float |double |varint |decimal
|bigint |bigint |bigint |bigint |bigint |bigint |double |double |varint | decimal
|counter |bigint |bigint |bigint |bigint |bigint |double |double |varint |decimal
|float |float |float |float |double |double |float |double |decimal |decimal
|double |double |double |double |double |double |double |double |decimal |decimal
|varint |varint |varint |varint |varint |varint |decimal |decimal |varint |decimal
|decimal |decimal |decimal |decimal |decimal |decimal |decimal |decimal |decimal |decimal
|===
Based on that table you can see that if the `percentage_of_completion` had been computed as `(progress / total) * 100` instead of as `(progress * 100) / total` the output will always have been a `bigint` equal to `0` or `1`. As `progress` and `total` are both `bigints` the output of the division would have been of type `bigint` and equal to 0 if `progress` was smaller than `total`.
=== Type Inference
Of course, literals also play a role in the selection of the return type. Cassandra will try to infer the type of the literals to determine how the operation must be performed.
In an expression like `percentage_of_completion`, the only information that C* has to infer the type is its value.
If the literal is an integer, C* will look at it and consider it as an `int`, a `bigint` or a `varint` if the type can hold this value. For example, in our computation, 100 will be considered as an `int` because the value can fit in a 32-bit signed integer.
For floats, C* will consider them as either `doubles` or `decimals`.
If the `percentage_of_completion` had been computed as `(progress * 100.0) / total`, 100.0 would have been considered as a `double` and the `percentage_of_completion` type would also have been a `double`.
Now, in some cases, Cassandra knows the return type of the expression and can use it to infer the type of the literals.
If for example we have a table like:
[source,java]
----
CREATE TABLE myTable (pk int, c tinyint, v text, PRIMARY KEY(pk, c));
----
Cassandra will be able to prepare the following query without problem.
[source,java]
----
SELECT * FROM myTable WHERE pk = 2 AND c = 1 + ?;
----
As the `c` column is of type `tinyint`, C* can guess that the binding parameter and the literal would be of the `tinyint` type as only the sum of 2 `tinyints` can return a `tinyint`.
But if you try to prepare the following query:
[source,java]
----
SELECT * FROM myTable WHERE pk = ? + 1 AND c = 1;
----
Cassandra will return an error saying:
[source,java]
----
Ambiguous '+' operation with args ? and 1: use type casts to disambiguate
----
The problem here is that the binding parameter could have 3 different types (`tinyint`, `smallint` or `int`) and C* does not know which one the user intend to use.
=== Hints and Casting to the Rescue
To solve type inference problems related to binding parameters Cassandra has support for *type hints*. A type hint is a way to explicitly tell Cassandra what will be the type of the parameter.
For example in our previous example if the binding parameter will be an `int`, the query can be modified as follow:
[source,java]
----
SELECT * FROM myTable WHERE pk = (int) ? + 1 AND c = 1;
----
Type hints can also be used to control the type of a literal. If we go back to the *sstable_tasks* query. We could have changed the type of the `percentage_of_completion` to `decimal` by using a `decimal` type hint in front of the literal:
[source,java]
----
(progress * (decimal) 100) / total AS percentage_of_completion
----
Of course, the same result could have been reached by casting progress into a decimal:
[source,java]
----
(CAST(progress AS decimal) * 100) / total AS percentage_of_completion
----
or
[source,java]
----
(CAST(progress AS decimal) / total) * 100 AS percentage_of_completion
----
as the operation between a `decimal` and a `bigint` will result in a decimal.
=== Operator Precedence
`*`, `/` and `%` operators have a higher precedence level than `+` and `-` operator. By consequence, they will be evaluated before the other two. If two operators in an expression have the same precedence level, they will be evaluated left to right based on their positions in the expression.
Parentheses can be used to modify the order in which the operations must be performed within an expression. Everything within parentheses will be evaluated first to yield a single value before that value can be used by any operator outside the parentheses.
For example, it is possible to compute the percentage remaining before completion for the compactions as:
[source,java]
----
(total - progress) * 100 / total AS remaining_percentage
----
=== Operations on Timestamps and Dates
In version 4.0, CQL also supports the addition or subtraction of durations from `timestamps` and `dates`.
If you are working with time-series data and use the following table to store your sensor data:
[source,java]
----
CREATE TABLE sensor_data (
sensor text,
day date,
ts timeuuid,
value double,
primary key((sensor, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC)
----
You can use the following query to retrieve some statistics on the data from the previous day:
[source,java]
----
SELECT sensor, day, min(value), max(value), avg(value)
FROM sensor_data
WHERE sensor = ? AND day = currentdate() - 1d;
----
You can express durations as `(quantity unit)+` like `12h30m` where the unit can be:
* `y`: years (12` months)
* `mo`: months (1 month)
* `w`: weeks (7 days)
* `d`: days (1 day)
* `h`: hours (3,600,000,000,000 nanoseconds)
* `m`: minutes (60,000,000,000 nanoseconds)
* `s`: seconds (1,000,000,000 nanoseconds)
* `ms`: milliseconds (1,000,000 nanoseconds)
* `us` or `µs` : microseconds (1000 nanoseconds)
* `ns`: nanoseconds (1 nanosecond)
=== What About Daylight Savings and Leap Seconds?
Internally the timestamp and date types store information in UTC time. As UTC does not change with a change of seasons arithmetic operations on timestamps and dates are safe and will always return the expected results. However, be aware that the Java libraries used internally by Cassandra, ignore leap seconds.