<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# AccessExpression Specification
This document specifies the format of an Apache Accumulo AccessExpression. An AccessExpression is an
encoding of a boolean expression that defines the attributes an entity requires to access specific
data.
## Concepts
* AccessExpression - A boolean expression detailing the attributes needed to access an object (e.g., a Key/Value pair in Accumulo).
* Authorizations - A set of attributes, typically associated with the entity trying to access an object.
* AccessEvaluator - An object that determines if an entity can access an object based on the
entity's Authorizations and the object's AccessExpression.
## Syntax
The formal definition of the AccessExpression UTF-8 string representation is provided by
the following [ABNF][1]:
```ABNF
access-expression = [expression] ; empty string is a valid access expression
expression = (access-token / paren-expression) [and-expression / or-expression]
paren-expression = "(" expression ")"
and-expression = "&" (access-token / paren-expression) [and-expression]
or-expression = "|" (access-token / paren-expression) [or-expression]
access-token = 1*( ALPHA / DIGIT / "_" / "-" / "." / ":" / slash )
access-token =/ DQUOTE 1*(utf8-subset / escaped) DQUOTE
utf8-subset = %x20-21 / %x23-5B / %x5D-7E / unicode-beyond-ascii ; utf8 minus '"' and '\'
unicode-beyond-ascii = %x0080-D7FF / %xE000-10FFFF
escaped = "\" DQUOTE / "\\"
slash = "/"
```
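Both `access-token` alternatives can be checked with regular expressions. The following minimal Python sketch is not part of any Accumulo API. `is_valid_access_token`, `UNQUOTED`, and `QUOTED` are hypothetical names. Python strings hold code points rather than UTF-8 bytes, so the `unicode-beyond-ascii` ranges are written here as code-point ranges.

```python
import re

# Unquoted form: 1*( ALPHA / DIGIT / "_" / "-" / "." / ":" / "/" ).
# ABNF's ALPHA and DIGIT are ASCII-only, hence the explicit ranges.
UNQUOTED = re.compile(r"[A-Za-z0-9_\-.:/]+")

# Quoted form: DQUOTE 1*(utf8-subset / escaped) DQUOTE.
# utf8-subset: printable ASCII except '"' (0x22) and '\' (0x5C), plus every
# non-ASCII code point outside the surrogate range D800-DFFF.
# escaped: a backslash followed by '"' or '\'.
QUOTED = re.compile(
    r'"(?:[\x20\x21\x23-\x5b\x5d-\x7e\u0080-\ud7ff\ue000-\U0010ffff]|\\["\\])+"'
)


def is_valid_access_token(token: str) -> bool:
    """Return True if `token` matches either access-token production."""
    return bool(UNQUOTED.fullmatch(token) or QUOTED.fullmatch(token))
```

For example, `is_valid_access_token('"abc!12"')` is true, while `is_valid_access_token('abc!12')` is false because `!` is only permitted inside quotes.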
### Examples of Proper Expressions
* `BLUE`
* `RED&BLUE`
* `RED&BLUE&GREEN`
* `(RED&BLUE)|(GREEN&(PINK|PURPLE))`
### Examples of Improper Expressions
* `&BLUE` : Must start with an access token or a paren expression.
* `(RED&BLUE)|` : An access token or paren expression must follow a `|`.
* `RED&BLUE|GREEN` : Once a `&` is seen, only `&` (not `|`) may follow at that level unless parentheses are used.
* `RED|BLUE&GREEN` : Once a `|` is seen, only `|` (not `&`) may follow at that level unless parentheses are used.
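The structural rules illustrated by these examples follow directly from the ABNF and can be checked with a small recursive-descent parser. The sketch below is a hypothetical Python illustration. `tokenize` and `is_valid` are not Accumulo APIs. It splits the input into parentheses, operators, and access-tokens, then accepts an operand followed by an optional run of a single operator kind, recursing into parentheses.

```python
import re

# One lexical unit: a parenthesis, an operator, an unquoted access-token,
# or a quoted access-token (escapes limited to \" and \\).
TOKEN = re.compile(
    r'[()&|]'
    r'|[A-Za-z0-9_\-.:/]+'
    r'|"(?:[\x20\x21\x23-\x5b\x5d-\x7e\u0080-\ud7ff\ue000-\U0010ffff]|\\["\\])+"'
)


def tokenize(text: str) -> list[str]:
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"invalid character at position {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


def is_valid(text: str) -> bool:
    """Check `text` against the access-expression ABNF."""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    if not tokens:
        return True  # the empty string is a valid access expression
    pos = 0

    def operand() -> None:
        # access-token / paren-expression
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("expected an operand")
        tok = tokens[pos]
        if tok == "(":
            pos += 1
            expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("expected ')'")
            pos += 1
        elif tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        else:
            pos += 1  # an access-token

    def expression() -> None:
        # An operand, then optionally a run of one operator kind only.
        nonlocal pos
        operand()
        if pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            while pos < len(tokens) and tokens[pos] in ("&", "|"):
                if tokens[pos] != op:
                    raise ValueError("mixed '&' and '|' without parentheses")
                pos += 1
                operand()

    try:
        expression()
    except ValueError:
        return False
    return pos == len(tokens)  # reject trailing input such as "RED)"
```

All four proper examples above are accepted and all four improper ones are rejected. Mixing `&` and `|` fails at the point where the second operator kind appears, which mirrors how `and-expression` and `or-expression` each recurse only into themselves.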
## Serialization
An AccessExpression is a UTF-8 string. It can be serialized using a byte array as long as it
can be deserialized back into the same UTF-8 string.
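In Python, for example, this round trip amounts to strict UTF-8 encoding and decoding. `serialize` and `deserialize` are hypothetical helper names used only for illustration.

```python
def serialize(expression: str) -> bytes:
    # The UTF-8 encoding of the expression string.
    return expression.encode("utf-8")


def deserialize(data: bytes) -> str:
    # Strict decoding: malformed UTF-8 raises UnicodeDecodeError rather than
    # silently yielding a different string.
    return data.decode("utf-8")


expr = 'RED&"h\u00e9llo"'
data = serialize(expr)
assert deserialize(data) == expr  # same UTF-8 string after the round trip
```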
## Evaluation
The evaluation process combines [set][2] existence checks with [boolean algebra][3]. Specifically,
AccessExpressions use:
* The symbol `&` for [logical conjunction][4] (`∧` in boolean algebra).
* The symbol `|` for [logical disjunction][5] (`∨` in boolean algebra).
When evaluating an AccessExpression, existence checks are done against an
entity's Authorizations. The following is the algorithm for evaluating an
AccessExpression.
1. For each access-token in an AccessExpression, check if it exists in the
entity's Authorizations. Replace the access-token with `true` if it
exists in the set and `false` otherwise.
2. Evaluate the resulting expression using boolean algebra. If the result is true, the entity can
access the data associated with the AccessExpression.
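The two steps can be sketched in Python as follows. This is a hypothetical illustration, not the Accumulo `AccessEvaluator`. It assumes the expression has already been checked against the ABNF, so its lexer accepts a looser set of quoted tokens and it does not reject mixed `&`/`|` runs.

```python
import re

# Parentheses, operators, unquoted tokens, and quoted tokens. The quoted
# branch is deliberately looser than the ABNF: input is assumed valid.
LEXEME = re.compile(r'[()&|]|[A-Za-z0-9_\-.:/]+|"(?:[^"\\]|\\["\\])+"')
OPERATORS = {"(", ")", "&", "|"}


def unquote(token: str) -> str:
    # Remove the surrounding quotes, then turn \" into " and \\ into \.
    if token.startswith('"'):
        return re.sub(r'\\(["\\])', r"\1", token[1:-1])
    return token


def evaluate(expression: str, authorizations: set[str]) -> bool:
    """Evaluate a syntactically valid AccessExpression."""
    if expression == "":
        return True  # the empty expression always evaluates to true
    lexemes = LEXEME.findall(expression)
    if "".join(lexemes) != expression:
        raise ValueError("expression contains characters outside the grammar")

    # Step 1: replace every access-token with the result of its existence check.
    terms = [lx if lx in OPERATORS else unquote(lx) in authorizations for lx in lexemes]

    # Step 2: evaluate the resulting boolean expression.
    pos = 0

    def operand() -> bool:
        nonlocal pos
        term = terms[pos]
        pos += 1
        if term == "(":
            value = expr()
            pos += 1  # consume the matching ")"
            return value
        return term  # True/False substituted in step 1

    def expr() -> bool:
        nonlocal pos
        value = operand()
        while pos < len(terms) and terms[pos] in ("&", "|"):
            op = terms[pos]
            pos += 1
            rhs = operand()  # always parse the right operand before combining
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    return expr()
```

Because the right operand is always parsed before it is combined, Python's short-circuiting `and`/`or` never skips over unparsed input.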
The following is an example of evaluating the AccessExpression
`RED&(BLUE|GREEN)` using boolean algebra for an entity with the Authorizations
`{RED,GREEN}`. In the example below, `RED ∈ {RED,GREEN}` asks whether
`RED` exists in the set `{RED,GREEN}`; it does, so the term is true.
1. RED ∈ {RED,GREEN} ∧ ( BLUE ∈ {RED,GREEN} ∨ GREEN ∈ {RED,GREEN} )
2. true ∧ ( false ∨ true )
Since `true ∧ ( false ∨ true )` is true, the entity with Authorizations of
`{RED,GREEN}` can access data labeled with the AccessExpression
`RED&(BLUE|GREEN)`. The AccessExpression `(RED&BLUE)|(GREEN&PINK)` is an
example of an AccessExpression that is false for an entity with Authorizations of
`{RED,GREEN}`. Its evaluation looks like the following in boolean algebra.
1. ( RED ∈ {RED,GREEN} ∧ BLUE ∈ {RED,GREEN} ) ∨ ( GREEN ∈ {RED,GREEN} ∧ PINK ∈
{RED,GREEN} )
2. ( true ∧ false ) ∨ ( true ∧ false )
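Both worked examples can be transcribed directly into Python, where `in` performs the set existence check and `and`/`or` play the roles of `∧`/`∨`. The variable names are illustrative only.

```python
auths = {"RED", "GREEN"}

# RED&(BLUE|GREEN)  ->  RED ∈ A ∧ ( BLUE ∈ A ∨ GREEN ∈ A )
first = "RED" in auths and ("BLUE" in auths or "GREEN" in auths)

# (RED&BLUE)|(GREEN&PINK)  ->  ( RED ∈ A ∧ BLUE ∈ A ) ∨ ( GREEN ∈ A ∧ PINK ∈ A )
second = ("RED" in auths and "BLUE" in auths) or ("GREEN" in auths and "PINK" in auths)

print(first, second)  # True False
```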
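An empty AccessExpression always evaluates to true, so any entity can access data associated
with it. Every non-empty AccessExpression contains at least one access-token, and for an entity
with empty Authorizations every access-token evaluates to `false`. Since `∧` and `∨` applied only
to `false` values yield `false`, such an entity can only access data associated with an empty
AccessExpression.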
## Escaping
Access tokens can only contain ASCII letters, digits, or the characters
`_`, `-`, `.`, `:`, or `/` unless quoted using `"`. Within quotes, the characters
`"` and `\` must be escaped by prefixing them with `\`. For example, to use `abc\xyz` as
an access-token, it would need to be quoted and escaped like `"abc\\xyz"`. When
checking if an access-token exists in the entity's Authorizations, it must
be unquoted and unescaped.
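Quoting and unquoting can be sketched as a pair of Python helpers. The names `quote` and `unquote`, and the choice to leave a token unquoted whenever the ABNF allows it, are assumptions for illustration. Backslashes must be escaped before quotes; reversing the order would double the backslashes just introduced for the quotes.

```python
import re

# Characters permitted in an unquoted access-token.
UNQUOTED_OK = re.compile(r"[A-Za-z0-9_\-.:/]+")


def quote(token: str) -> str:
    """Return `token` in a form usable inside an AccessExpression."""
    if not token:
        raise ValueError("access tokens cannot be empty")
    if UNQUOTED_OK.fullmatch(token):
        return token  # no quoting needed
    # Escape \ first, then ", so the added escapes are not re-escaped.
    escaped = token.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def unquote(token: str) -> str:
    """Reverse of quote(): the value to look up in the Authorizations."""
    if len(token) >= 2 and token[0] == token[-1] == '"':
        # Drop the quotes, then turn \" into " and \\ into \.
        return re.sub(r'\\(["\\])', r"\1", token[1:-1])
    return token
```

For instance, `quote(r"abc\xyz")` produces `"abc\\xyz"`, and `unquote` maps it back to `abc\xyz` for the Authorizations lookup.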
Evaluating `"abc!12"&"abc\\xyz"&GHI` for an entity with Authorizations of
`{abc\xyz,abc!12}` looks like the following in boolean algebra, which evaluates
to `false`.
1. abc!12 ∈ {abc\xyz,abc!12} ∧ abc\xyz ∈ {abc\xyz,abc!12} ∧ GHI ∈
{abc\xyz,abc!12}
2. true ∧ true ∧ false
Note that when checking whether `"abc\\xyz"` exists in the Authorizations, the token is first
unquoted and the `\` character is unescaped, so the value looked up is `abc\xyz`.
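Because this expression uses only `&`, it is true exactly when every unescaped access-token is in the Authorizations, that is, when the set of tokens is a subset of the Authorizations. The short Python check below uses tokens already unquoted by hand. The raw string `r"abc\xyz"` keeps the backslash literal, since `"\xyz"` in an ordinary Python string literal is a syntax error.

```python
# Tokens of "abc!12"&"abc\\xyz"&GHI after unquoting and unescaping.
required = {"abc!12", r"abc\xyz", "GHI"}
authorizations = {r"abc\xyz", "abc!12"}

# A conjunction of existence checks is true exactly when every required
# token is present, i.e. when `required` is a subset of `authorizations`.
print(required <= authorizations)  # False
print(required - authorizations)   # {'GHI'}
```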
[1]: https://www.rfc-editor.org/rfc/rfc5234
[2]: https://en.wikipedia.org/wiki/Set_(mathematics)
[3]: https://en.wikipedia.org/wiki/Boolean_algebra
[4]: https://en.wikipedia.org/wiki/Logical_conjunction
[5]: https://en.wikipedia.org/wiki/Logical_disjunction