name: Advanced Search route: /SearchAdvance menu: Documentation submenu: Search

import themen from ‘theme/styles/styled-colors’; import * as theme from ‘react-syntax-highlighter/dist/esm/styles/hljs’; import SyntaxHighlighter from ‘react-syntax-highlighter’;

Advanced Search

Background

Advanced Search in Atlas is also referred to as DSL-based Search.

Domain Specific Search (DSL) is a language with simple constructs that help users navigate Atlas data repository. The syntax loosely emulates the popular Structured Query Language (SQL) from relation database world.

Benefits of DSL:

  • Abstracts the implementation-level database constructs. This avoids the necessity of knowing about the underlying graph database constructs.
  • User are provided with an abstraction that helps them retrieve the data by just being aware of the types and their relationships within their dataset.
  • Allows for a way to specify the desired output.
  • Use of classifications is accounted for in the syntax.
  • Provides way to group and aggregate results.

We will be using the quick start dataset in the examples that follow. This dataset is comprehensive enough to be used to to demonstrate the various features of the language.

For details on the grammar, please refer to Atlas DSL Grammer on Github (Antlr G4 format).

Using Advanced Search

Within the Atlas UI, select Advanced in the Search pane on the left.

Notice that the Favorite Searches pane below the Search By Query box. Like Basic Search, it is possible to save the Advanced Searches as well.

Introduction to Domain Specific Language

DSL uses the familiar SQL-like syntax.

At a high-level a query has a from-where-select format. Additional keywords like grouby, orderby, limit can be used to added to affect the output. We will see examples of these below.

From Clause

Specifying the from clause is mandatory. Using the from keyword itself is optional. The value specified in the from clause acts as the source or starting point for the rest of the query to source its inputs.

Example: To retrieve all entities of type DB:

In the absence of where for filtering on the source, the dataset fetched by the from clause is everything from the database. Based on the size of the data present in the database, there is a potential to overwhelm the server. The query processor thus adds limit clause with a default value set. See the section on limit clause for details.

Where Clause

The where clause allows for filtering over the dataset. This achieved by using conditions within the where clause.

A conditions is identifier followed by an operator followed by a literal. Literal must be enclosed in single or double quotes. Example, name = “Sales”. An identifier can be name of the property of the type specified in the from clause or an alias.

Example: To retrieve entity of type Table with a specific name say time_dim:

It is possible to specify multiple conditions by combining them using and, or operators.

Example: To retrieve entity of type Table with name that can be either time_dim or customer_dim:

Filtering based on a list of values is done using by specifying the values in the square brackets. A value array is a list of values enclosed within square brackets. This is a simple way to specify an OR clause on an identifier.

Note that having several OR clauses on the same attribute may be inefficient. Alternate way is to use the value array as shown in the example below.

Example: The query in the example above can be written using a value array as shown below.

A condition that uses the LIKE operator, allows for filtering using wildcards like ‘*’ or ‘?’. Example: To retrieve entity of type Table whose name ends with ‘_dim’:

Additional forms of regular expressions can also be used.

Example: To retrieve DB whose name starts with R followed by has any 3 characters, followed by rt followed by at least 1 character, followed by none or any number of characters.

Example: To find all the columns in a Table.

Example: To find all the Tables for a column.

Using Date Literals

Dates used in literals need to be specified using the ISO 8601 format.

Dates in this format follow this notation:

  • yyyy-MM-ddTHH:mm:ss.SSSZ. Which means, year-month-day followed by time in hour-minutes-seconds-milli-seconds. Date and time need to be separated by ‘T’. It should end with ‘Z’.
  • yyyy-MM-dd. Which means, year-month-day.

Example: Date represents December 11, 2017 at 2:35 AM.

Example: To retrieve entity of type Table created within 2017 and 2018.

Using Boolean Literals

Properties of entities of type boolean can be used within queries.

Eample: To retrieve entity of type hdfs_path whose attribute isFile is set to true and whose name is Invoice.

Valid values for boolean literals are ‘true’ and ‘false’.

Existence of a Property

The has keyword can be used with or without the where clause. It is used to check existence of a property in an entity.

Example: To retreive entity of type Table with a property locationUri.

Select Clause

If you noticed the output displayed on the web page, it displays a tabular display, each row corresponding to an entity and columns are properties of that entity. The select clause allows for choosing the properties of entity that are of interest.

Example: To retrieve entity of type Table with few properties:

Example: To retrieve entity of type Table for a specific table with some properties.

To display column headers that are more meaningful, aliases can be added using the ‘as’ clause.

Example: To display column headers as ‘Owner’, ‘Name’ and ‘FullName’.

Note About Select Clauses

Given the complexity involved in using select clauses, these are the few rules to remember when using select clauses:

  • Works with all immediate attributes.
  • Works with Immediate attributes and aggregation on immediate attributes.
  • Referred attributes cannot be mixed with immediate attributes.

Example: To retrieve entity of type Table with name ‘Sales’ and display ‘name’ and ‘owner’ attribute of the referred entity DB.

Current implementation does not allow the following:

Classification-based Filtering

In order to retrieve entities based on classification, a query would use is or isa keywords.

Example: To retrieve all entities of type Table that are tagged with Dimension classification.

Since, from is optional and is (or isa) are equivalent, the following queries yield the same results:

The is and isa clauses can also be used in where condition like:

To search for all entities having a particular classification, simply use the name of the classification.

Example: To retrieve all entities that have Dimension classification.

###Non Primitive attribute Filtering In the discussion so far we looked at where clauses with primitive types. This section will look at using properties that are non-primitive types.

Relationship-based filtering

In this model, the DB is modeled such that it is aware of all the Table it contains. Table on the other hand is aware of existence of the DB but is not aware of all the other Table instances within the system. Each Table maintains reference of the DB it belongs to.

Similar structure exists within the hive data model.

Example: To retrieve all the instances of the Table belonging to a database named ‘Sales’:

Example: To retrieve all the instances of the Table belonging to a database named ‘Sales’ and whose column name starts with ‘customer’:

The entity Column is modeled in a similar way. Each Table entity has outward edges pointing to Column entity instances corresponding to each column within the table.

Example: To retrieve all the Column entities for a given Table.

The properties of each Column entity type are displayed.

Glossary Term-based Filtering

In order to retrieve entities based on glossary term, a query would use hasTerm keyword.

To search for entities having a particular glossary term, user needs to add a fully qualified name. i.e {termName}@{glossaryName}. In case the user adds only the term name, all the entities with particular term name will be returned, irrespective of which glossary it is in.

Example: To retrieve all entities of type Table having glossary term savingsAccount@Banking, below are the possible ways.

Example: To retrieve all entities of type Table having glossary term savingsAccount@Banking and whose name is ‘customer’.

Example: To retrieve all entities of type Table having glossary term savingsAccount@Banking or tagged with ‘Dimension’ classification and whose column name starts with ‘customer’.

Limit & Offset Clauses

Often a query yields large number of results. To limit the outcome of the query, the limit and offset clauses are used.

Example: To retrieve only the 5 entities from a result set.

The offset clauses retrieves results after the offset value.

Example: To retrieve only 5 entities from the result set after skipping the first 10.

The limit and offset clauses are usually specified in conjunction.

If no limit clause is specified in the query, a limit clause with a default limit (usually 100) is added to the query. This prevents the query from inadvertently fetching large number of results.

The offset clause is useful for displaying results in a user interface where few results from the result set are showing and more results are fetched as the user advances to next page.

Ordering Results

The orderby clause allows for sorting of results. Results are sorted in ascending order by default. Only immediate attributes can be used within this clause.

Ordering can be changed by using:

  • ASC Sort in ascending order. This is the default. If no ordering is specified after the orderby clause.
  • DESC Sort in descending order. This needs to be explicitly specified after the orderby clause.

Example: To retrieve the entities of type Column that are sorted in ascending order using the name property.

Example: Same results as above except that they are sorted in descending order.

Example: To retrieve the entities of type Column filtered with name and associated with ‘savingsAccount@Banking’ glossary term, that are sorted in ascending order using the name property.

Aggregate Functions

Let's look at aggregate functions:

  • sum: Adds (sums up) a value of the property specified, within the result set.
  • min: Finds the minimum value of the property specified, within a result set.
  • max: Finds the maximum value of the property specified, within a result set.
  • count: Finds the number of items specified by the group by clause.

These work only on immediate attributes.

Other examples of these in the Grouping Results section.

The count Keyword

Shows the number of items in a result set.

Example: To know how may entities of a type Column.

Example: To find the number of tables in a database.

Example: To find the number of terms associated with particular type ‘Table’.

The max Keyword

Using this keyword it is possible to retrieve the maximum value of a property for an entity.

Example: Get the most recently created value of the createTime property of the Table entity.

The min Keyword

Using this keyword it is possible to retrieve the minimum value of a property for an entity.

Example: Get the least recently created value of the createTime property of the Table entity.

Grouping Results

The groupby clause groups results within the result using specified property.

Example: To retrieve entity of type Table such that tables belonging to an owner are together (grouped by owner).

While groupby can work without select, if aggregate functions are used within select clause, using groupby clause becomes mandatory as aggregate functions operate on a group.

Example: To retrieve entity of type Table such we know the most recently created entity.

Example: To retrieve entity of type Table such we know the oldest entity.

Example: To know the number of entities owned by each owner.

Using System Attributes

Each type defined within Atlas gets few attributes by default. These attributes help with internal book keeping of the entities. All the system attributes are prefixed with ‘__’ (double underscore). This helps in identifying them from other attributes. Following are the system attributes:

  • __guid Each entity within Atlas is assigned a globally unique identifier (GUID for short).
  • __modifiedBy Name of the user who last modified the entity.
  • __createdBy Name of the user who created the entity.
  • __state Current state of the entity. Please see below for details.
  • __timestamp Timestamp (date represented as integer) of the entity at the time of creation.
  • __modificationTimestamp Timestamp (date represented as integer) of the entity at the time of last modification.

State of an Entity

Entity within Atlas can be in the following states:

  • ACTIVE This is the state of entities that when it is available and is used within the system. It can be retrieved by default by searches.
  • DELETED When an entity is deleted, it's state is marked as DELETED. Entity in this state does not show up in search results. Explicit request needs to be made to retrieve this entity.

Using System Attributes in Queries

Example: To retrieve all entities that are deleted.

Example: To retrieve entity GUIDs.

Example: To retrieve several system attributes.

Advanced Search REST API

Relevant models for these operations:

The V2 API

Get Results using DSL Search

ExampleSee Examples sections below.
URLapi/atlas/v2/search/dsl
MethodGET
URL Parametersquery: Query conforming to DSL syntax.
typeName: Type name of the entity to be retrived.
classification: Classification associated with the type or query.
limit: Maximum number of items in the result set.
offset: Starting index of the item in the result set.
Data ParametersNone
Success ResponseThe JSON will correspond to AtlasSearchResult.
Error ResponseErrors that are handled within the system will be returned as AtlasBaseException.
Method Signature@GET
@Path(“/dsl”)
@Consumes(Servlets.JSON_MEDIA_TYPE)
@Produces(Servlets.JSON_MEDIA_TYPE)

Examples

Implementation Approach

The general approach followed in implementation of DSL within Atlas can be enumerated in following steps:

  • Parser parses the incoming query for syntax.
  • Abstract syntax tree is generated by for a query that is parsed successfully.
  • Syntax tree is ‘walked’ using visitor pattern.
  • Each ‘visit’ within the tree adds a step in the Gremlin pipeline.
  • When done, the generated script is executed using Gremlin Script Engine.
  • Results generated be the query, if any, are processed and packaged in AtlasSearchResult structure.

Differences Between Master and Earlier Versions

The following clauses are no longer supported:

  • path
  • loop

Resources