{% include JB/setup %}
In a notebook, to enable the Cassandra interpreter, click on the Gear icon and select Cassandra
In a paragraph, use %cassandra to select the Cassandra interpreter and then input all commands.
To access the interactive help, type HELP;
The Cassandra interpreter accepts the following commands
This interpreter is compatible with any CQL statement supported by Cassandra. Ex:
INSERT INTO users(login,name) VALUES('jdoe','John DOE'); SELECT * FROM users WHERE login='jdoe';
Each statement should be separated by a semi-colon ( ; ) except the special commands below:
Multi-line statements as well as multiple statements on the same line are also supported as long as they are separated by a semi-colon. Ex:
USE spark_demo; SELECT * FROM albums_by_country LIMIT 1; SELECT * FROM countries LIMIT 1; SELECT * FROM artists WHERE login='jlennon';
Batch statements are supported and can span multiple lines, as well as DDL(CREATE/ALTER/DROP) statements:
BEGIN BATCH INSERT INTO users(login,name) VALUES('jdoe','John DOE'); INSERT INTO users_preferences(login,account_type) VALUES('jdoe','BASIC'); APPLY BATCH; CREATE TABLE IF NOT EXISTS test( key int PRIMARY KEY, value text );
CQL statements are case-insensitive (except for column names and values). This means that the following statements are equivalent and valid:
INSERT INTO users(login,name) VALUES('jdoe','John DOE'); Insert into users(login,name) vAlues('hsue','Helen SUE');
The complete list of all CQL statements and versions can be found below:
It is possible to add comments between statements. Single line comments start with the hash sign (#) or double slashes (//). Multi-line comments are enclosed between /** and **/. Ex:
#Single line comment style 1 INSERT INTO users(login,name) VALUES('jdoe','John DOE'); //Single line comment style 2 /** Multi line comments **/ Insert into users(login,name) vAlues('hsue','Helen SUE');
The interpreters is shipped with a built-in syntax validator. This validator only checks for basic syntax errors.
All CQL-related syntax validation is delegated directly to Cassandra
Most of the time, syntax errors are due to missing semi-colons between statements or typo errors.
To make schema discovery easier and more interactive, the following commands are supported:
The schema objects (cluster, keyspace, table, type, function and aggregate) are displayed in a tabular format. There is a drop-down menu on the top left corner to expand objects details. On the top right menu is shown the Icon legend.
Sometimes you want to be able to pass runtime query parameters to your statements.
Those parameters are not part of the CQL specs and are specific to the interpreter.
Below is the list of all parameters:
Some parameters only accept restricted values:
Please note that you should not add semi-colon ( ; ) at the end of each parameter statement
Some examples:
CREATE TABLE IF NOT EXISTS spark_demo.ts( key int PRIMARY KEY, value text ); TRUNCATE spark_demo.ts; // Timestamp in the past @timestamp=10 // Force timestamp directly in the first insert INSERT INTO spark_demo.ts(key,value) VALUES(1,'first insert') USING TIMESTAMP 100; // Select some data to make the clock turn SELECT * FROM spark_demo.albums LIMIT 100; // Now insert using the timestamp parameter set at the beginning(10) INSERT INTO spark_demo.ts(key,value) VALUES(1,'second insert'); // Check for the result. You should see 'first insert' SELECT value FROM spark_demo.ts WHERE key=1;
Some remarks about query parameters:
- many query parameters can be set in the same paragraph
- if the same query parameter is set many time with different values, the interpreter only take into account the first value
- each query parameter applies to all CQL statements in the same paragraph, unless you override the option using plain CQL text (like forcing timestamp with the USING clause)
- the order of each query parameter with regard to CQL statement does not matter
For performance reason, it is better to prepare statements before-hand and reuse them later by providing bound values.
This interpreter provides 3 commands to handle prepared and bound statements:
Example:
@prepare[statement-name]=... @bind[statement-name]=’text’, 1223, ’2015-07-30 12:00:01’, null, true, [‘list_item1’, ’list_item2’] @bind[statement-name-with-no-bound-value] @remove_prepare[statement-name]
You can use the syntax “@prepare[statement-name]=SELECT...” to create a prepared statement. The statement-name is mandatory because the interpreter prepares the given statement with the Java driver and saves the generated prepared statement in an internal hash map, using the provided statement-name as search key.
Please note that this internal prepared statement map is shared with all notebooks and all paragraphs because there is only one instance of the interpreter for Cassandra
If the interpreter encounters many @prepare for the same statement-name (key), only the first statement will be taken into account.
Example:
@prepare[select]=SELECT * FROM spark_demo.albums LIMIT ? @prepare[select]=SELECT * FROM spark_demo.artists LIMIT ?
For the above example, the prepared statement is SELECT * FROM spark_demo.albums LIMIT ?
. `SELECT * FROM spark_demo.artists LIMIT ? is ignored because an entry already exists in the prepared statements map with the key select.
In the context of Zeppelin, a notebook can be scheduled to be executed at regular interval, thus it is necessary to avoid re-preparing many time the same statement (considered an anti-pattern).
Once the statement is prepared (possibly in a separated notebook/paragraph). You can bind values to it:
@bind[select_first]=10
Bound values are not mandatory for the @bind statement. However if you provide bound values, they need to comply to some syntax:
It is possible to use the @bind statement inside a batch:
BEGIN BATCH @bind[insert_user]='jdoe','John DOE' UPDATE users SET age = 27 WHERE login='hsue'; APPLY BATCH;
To avoid for a prepared statement to stay forever in the prepared statement map, you can use the @remove_prepare[statement-name] syntax to remove it. Removing a non-existing prepared statement yields no error.
Instead of hard-coding your CQL queries, it is possible to use Zeppelin dynamic form syntax to inject simple value or multiple choices forms.
The legacy mustache syntax ( {{ }} ) to bind input text and select form is still supported but is deprecated and will be removed in future releases.
Legacy The syntax for simple parameter is: {{input_Label=default value}}. The default value is mandatory because the first time the paragraph is executed, we launch the CQL query before rendering the form so at least one value should be provided.
The syntax for multiple choices parameter is: {{input_Label=value1 | value2 | … | valueN }}. By default the first choice is used for CQL query the first time the paragraph is executed.
Example:
{% raw %} #Secondary index on performer style SELECT name, country, performer FROM spark_demo.performers WHERE name=‘${performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}’ AND styles CONTAINS ‘${style=Rock}’; {% endraw %}
In the above example, the first CQL query will be executed for performer=‘Sheryl Crow’ AND style=‘Rock’. For subsequent queries, you can change the value directly using the form.
Please note that we enclosed the ${ } block between simple quotes ( ' ) because Cassandra expects a String here. We could have also use the ${style=‘Rock’} syntax but this time, the value displayed on the form is ‘Rock’ and not Rock.
It is also possible to use dynamic forms for prepared statements:
{% raw %}
@bind[select]=='${performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}', '${style=Rock}'
{% endraw %}
It is possible to execute many paragraphs in parallel. However, at the back-end side, we’re still using synchronous queries. Asynchronous execution is only possible when it is possible to return a Future
value in the InterpreterResult
. It may be an interesting proposal for the Zeppelin project.
Recently, Zeppelin allows you to choose the level of isolation for your interpreters (see Interpreter Binding Mode ).
Long story short, you have 3 available bindings:
Using the shared binding, the same com.datastax.driver.core.Session
object is used for all notes and paragraphs. Consequently, if you use the USE keyspace name; statement to log into a keyspace, it will change the keyspace for all current users of the Cassandra interpreter because we only create 1 com.datastax.driver.core.Session
object per instance of Cassandra interpreter.
The same remark does apply to the prepared statement hash map, it is shared by all users using the same instance of Cassandra interpreter.
When using scoped binding, in the same JVM Zeppelin will create multiple instances of the Cassandra interpreter, thus multiple com.datastax.driver.core.Session
objects. Beware of resource and memory usage using this binding !
The isolated mode is the most extreme and will create as many JVM/com.datastax.driver.core.Session
object as there are distinct notes.
To configure the Cassandra interpreter, go to the Interpreter menu and scroll down to change the parameters. The Cassandra interpreter is using the official Cassandra Java Driver and most of the parameters are used to configure the Java driver
Below are the configuration parameters and their default value.
3.0 (Zeppelin {{ site.ZEPPELIN_VERSION }}) :
@requestTimeOut
runtime optionALTER
statements2.0 (Zeppelin {{ site.ZEPPELIN_VERSION }}) :
1.0 (Zeppelin 0.5.5-incubating) :
If you encounter a bug for this interpreter, please create a JIRA ticket and ping me on Twitter at @doanduyhai