blob: c300f7ae661b824c731d4ded0acaa48115670d4b [file] [log] [blame] [view]
---
layout: page
title: Samza SQL
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
### Table Of Contents
- [Introduction](#introduction)
- [Code Examples](#code-examples)
- [Key Concepts](#key-concepts)
- [SQL Representation](#sql-representation)
- [SQL Grammar](#sql-grammar)
- [UDFs](#udfs)
- [UDF Polymorphism](#udf-polymorphism)
- [Known Limitations](#known-limitations)
### Introduction
Samza SQL allows you to define your stream processing logic
declaratively as a a SQL query. This allows you to create streaming
pipelines without Java code or configuration unless you require
user-defined functions (UDFs).
### Code Examples
The [Hello Samza](https://github.com/apache/samza-hello-samza/tree/master/src/main/java/samza/examples/sql)
SQL examples demonstrate how to use Samza SQL API.
- The [filter](https://github.com/apache/samza-hello-samza/tree/master/src/main/java/samza/examples/sql/samza-sql-filter)
demonstrates filtering and insert Samza SQL job.
- The [Case-When](https://github.com/apache/samza-hello-samza/tree/master/src/main/java/samza/examples/sql/samza-sql-casewhen/src/main/sql)
shows how to use ```CASE WHEN``` statement, along with ```UDF``` to identify qualifying events.
- The [join](https://github.com/apache/samza-hello-samza/tree/master/src/main/java/samza/examples/sql/samza-sql-stream-table-join)
demonstrates how to peform a stream-table join. Please note that join operation is currently not
fully cooked, and we are actively working on stabilizing it.
- The [group by](https://github.com/apache/samza-hello-samza/tree/master/src/main/java/samza/examples/sql/samza-sql-groupby)
show how to do group by. Similar to Join, Group By is being actively stabilized.
### Key Concepts
Each Samza SQL job consists of one or more Samza SQL statements.
Each statement represents a single streaming pipeline.
#### SQL Representation
Support for SQL internally uses Apache Calcite, which provides
SQL parsing and query planning. The query is automatically translated
to Samza's high level API and runs on Samza's execution engine.
The mapping from SQL to the Samza's high level API is a simple
deterministic one-to-one mapping. For example, ```Select```, i.e.,
projections, maps to a filter operation, while ```from``` maps to
a scan(s) and join(s) - if selecting from multiple streams and tables
- operators, and so on.
The table below lists the supported SQL operations.
Operation | Syntax hints | Comments
--- | --- | ---
PROJECTION | SELECT/INSERT/UPSERT | See [SQL Grammar](#sql-grammar) below
FILTERING | WHERE expression |See [SQL Grammar](#sql-grammar) below
UDFs | udf_name(args) | In both SELECT and WHERE clause
JOIN | [LEFT/RIGHT] JOIN .. ON .. | Stream-table inner, left- or right-outer joins. Currently not fully stable.
AGGREGATION | COUNT ( ...) .. GROUP BY | Currently only COUNT is supported, using processing-time based window.
#### SQL Grammar
Samza SQL's grammar is a subset of capabilities supported by Calcite's SQL parser.
```
statement:
| insert
| query
query:
values
| {
select
}
insert:
( INSERT | UPSERT ) INTO tablePrimary
[ '(' column [, column ]* ')' ]
query
select:
SELECT
{ * | projectItem [, projectItem ]* }
FROM tableExpression
[ WHERE booleanExpression ]
[ GROUP BY { groupItem [, groupItem ]* } ]
projectItem:
expression [ [ AS ] columnAlias ]
| tableAlias . *
tableExpression:
tableReference [, tableReference ]*
| tableExpression [ NATURAL ] [ LEFT | RIGHT | FULL ] JOIN tableExpression [ joinCondition ]
joinCondition:
ON booleanExpression
| USING '(' column [, column ]* ')'
tableReference:
tablePrimary
[ [ AS ] alias [ '(' columnAlias [, columnAlias ]* ')' ] ]
tablePrimary:
[ TABLE ] [ [ catalogName . ] schemaName . ] tableName
values:
VALUES expression [, expression ]*
```
#### UDFs
In addition to existing SQL logical operations, Samza SQL
allows the user to extend its functionality by running
user-code through User Defined Functions (UDFs) as part of the Stream processing
pipeline corresponding to the SQL.
#### UDF Polymorphism
Since UDF's execute method takes an array of generic objects as
parameter, Samza SQL UDF framework is flexible enough to
support polymorphic udf functions with varying sets of arguments
as long as UDF implementations support them.
For example in the below sql statement, UDF will be passed an
object array of size 2 with first element containing id of type
"LONG" and second element name of type "String". The type of the
objects that are passed depends on the type of those fields in Samza
SQL message format.
{% highlight sql %}
select myudf(id, name) from identity.profile
{% endhighlight %}
### Known Limitations
Samza SQL only supports simple stateless queries including selections
and projections. We are actively working on supporting stateful operations
such as aggregations, windows and joins.