AccessExpression Specification

This document specifies the format of an Apache Accumulo AccessExpression. An AccessExpression is an encoding of a boolean expression that defines the attributes an entity requires to access specific data.

Concepts

  • AccessExpression - A boolean expression detailing the attributes needed to access an object (e.g. Key/Value pair in Accumulo).
  • Authorizations - A set of attributes, typically attributed to the entity trying to access an object.
  • AccessEvaluator - An object that determines if an entity can access an object based on the entity‘s Authorizations and the object’s AccessExpression.

Syntax

The formal definition of the AccessExpression UTF-8 string representation is provided by the following ABNF:

access-expression       = [expression] ; empty string is a valid access expression

expression              =  (access-token / paren-expression) [and-expression / or-expression]

paren-expression        =  "(" expression ")"

and-expression          =  "&" (access-token / paren-expression) [and-expression]

or-expression           =  "|" (access-token / paren-expression) [or-expression]

access-token            = 1*( ALPHA / DIGIT / "_" / "-" / "." / ":" / slash )
access-token            =/ DQUOTE 1*(utf8-subset / escaped) DQUOTE

utf8-subset             = %x20-21 / %x23-5B / %x5D-7E / unicode-beyond-ascii ; utf8 minus '"' and '\'
unicode-beyond-ascii    = %x0080-D7FF / %xE000-10FFFF
escaped                 = "\" DQUOTE / "\\"
slash                   = "/"

Examples of Proper Expressions

  • BLUE
  • RED&BLUE
  • RED&BLUE&GREEN
  • (RED&BLUE)|(GREEN&(PINK|PURPLE))

Examples of Improper Expressions

  • &BLUE : Must start with an access token or a paren expression.
  • (RED&BLUE)| : An access token or paren expression must follow a |.
  • RED&BLUE|GREEN : Once a & is seen, then can only have & and not |, unless using parenthesis.
  • RED|BLUE&GREEN : Once a | is seen, then can only have |and not &, unless using parenthesis.

Serialization

An AccessExpression is a UTF-8 string. It can be serialized using a byte array as long as it can be deserialized back into the same UTF-8 string.

Evaluation

The evaluation process combines set existence checks with boolean algebra. Specifically, AccessExpressions use:

When evaluating an AccessExpression, existence checks are done against an entities Authorizations. The following is the algorithm for evaluation of an AccessExpression.

  1. For each access-token in an AccessExpression check if it exists in the entities Authorizations. Replace the access-token with true if it exists in the set and false otherwise.
  2. Evaluate the resulting expression using boolean algebra. If the result is true, the entity can access the data associated with the AccessExpression.

The following is an example of evaluating the AccessExpression RED&(BLUE|GREEN) using boolean algebra for an entity with the Authorizations {RED,GREEN}. In the example below RED ∈ {RED,GREEN} translates to does RED exist in the set {RED,GREEN} which it does, so it is true.

  1. RED ∈ {RED,GREEN} ∧ ( BLUE ∈ {RED,GREEN} ∨ GREEN ∈ {RED,GREEN} )
  2. true ∧ ( false ∨ true )

Since true ∧ ( false ∨ true ) is true then the entity with Authorizations of {RED,GREEN} can access data labeled with the AccessExpression RED&(BLUE|GREEN). The AccessExpression (RED&BLUE)|(GREEN&PINK) is an example of an AccessExpression that is false for an entity with Authorizations of {RED,GREEN} and it would look like the following using boolean algebra.

  1. ( RED ∈ {RED,GREEN} ∧ BLUE ∈ {RED,GREEN} ) ∨ ( GREEN ∈ {RED,GREEN} ∧ PINK ∈ {RED,GREEN} )
  2. ( true ∧ false ) ∨ ( true ∧ false )

An entity with empty Authorizations can only access data associated with an empty access expression. This is because an empty AccessExpression always evaluates to true.

Escaping

Access tokens can only contain alphanumeric characters or the characters _,-,.,:, or / unless quoted using ". Within quotes, the characters " and \ must escaped by prefixing them with \. For example, to use abc\xyz as an access-token, it would need to be quoted and escaped like "abc\\xyz". When checking if an access-token exists in the entities Authorizations, it must be unquoted and unescaped.

Evaluating "abc!12"&"abc\\xyz"&GHI for an entity with Authorizations of {abc\xyz,abc!12} looks like the following in boolean algebra which evaluates to false.

  1. abc!12 ∈ {abc\xyz,abc!12} ∧ abc\xyz ∈ {abc\xyz,abc!12} ∧ GHI ∈ {abc\xyz,abc!12}
  2. true ∧ true ∧ false

It's important to note that when verifying the existence of “abc\xyz” in the set of authorizations within the Authorizations object, the token is unquoted, and the \ character is unescaped.