import themen from ‘theme/styles/styled-colors’; import * as theme from ‘react-syntax-highlighter/dist/esm/styles/hljs’; import SyntaxHighlighter from ‘react-syntax-highlighter’;
Advanced Search in Atlas is also referred to as DSL-based Search.
Domain Specific Search (DSL) is a language with simple constructs that help users navigate Atlas data repository. The syntax loosely emulates the popular Structured Query Language (SQL) from relation database world.
Benefits of DSL:
We will be using the quick start dataset in the examples that follow. This dataset is comprehensive enough to be used to to demonstrate the various features of the language.
For details on the grammar, please refer to Atlas DSL Grammer on Github (Antlr G4 format).
Within the Atlas UI, select Advanced in the Search pane on the left.
Notice that the Favorite Searches pane below the Search By Query box. Like Basic Search, it is possible to save the Advanced Searches as well.
DSL uses the familiar SQL-like syntax.
At a high-level a query has a from-where-select format. Additional keywords like grouby, orderby, limit can be used to added to affect the output. We will see examples of these below.
Specifying the from clause is mandatory. Using the from keyword itself is optional. The value specified in the from clause acts as the source or starting point for the rest of the query to source its inputs.
Example: To retrieve all entities of type DB:
In the absence of where for filtering on the source, the dataset fetched by the from clause is everything from the database. Based on the size of the data present in the database, there is a potential to overwhelm the server. The query processor thus adds limit clause with a default value set. See the section on limit clause for details.
The where clause allows for filtering over the dataset. This achieved by using conditions within the where clause.
A conditions is identifier followed by an operator followed by a literal. Literal must be enclosed in single or double quotes. Example, name = “Sales”. An identifier can be name of the property of the type specified in the from clause or an alias.
Example: To retrieve entity of type Table with a specific name say time_dim:
It is possible to specify multiple conditions by combining them using and, or operators.
Example: To retrieve entity of type Table with name that can be either time_dim or customer_dim:
Filtering based on a list of values is done using by specifying the values in the square brackets. A value array is a list of values enclosed within square brackets. This is a simple way to specify an OR clause on an identifier.
Note that having several OR clauses on the same attribute may be inefficient. Alternate way is to use the value array as shown in the example below.
Example: The query in the example above can be written using a value array as shown below.
A condition that uses the LIKE operator, allows for filtering using wildcards like ‘*’ or ‘?’. Example: To retrieve entity of type Table whose name ends with ‘_dim’:
Additional forms of regular expressions can also be used.
Example: To retrieve DB whose name starts with R followed by has any 3 characters, followed by rt followed by at least 1 character, followed by none or any number of characters.
Example: To find all the columns in a Table.
Example: To find all the Tables for a column.
Dates used in literals need to be specified using the ISO 8601 format.
Dates in this format follow this notation:
Example: Date represents December 11, 2017 at 2:35 AM.
Example: To retrieve entity of type Table created within 2017 and 2018.
Properties of entities of type boolean can be used within queries.
Eample: To retrieve entity of type hdfs_path whose attribute isFile is set to true and whose name is Invoice.
Valid values for boolean literals are ‘true’ and ‘false’.
The has keyword can be used with or without the where clause. It is used to check existence of a property in an entity.
Example: To retreive entity of type Table with a property locationUri.
If you noticed the output displayed on the web page, it displays a tabular display, each row corresponding to an entity and columns are properties of that entity. The select clause allows for choosing the properties of entity that are of interest.
Example: To retrieve entity of type Table with few properties:
Example: To retrieve entity of type Table for a specific table with some properties.
To display column headers that are more meaningful, aliases can be added using the ‘as’ clause.
Example: To display column headers as ‘Owner’, ‘Name’ and ‘FullName’.
Given the complexity involved in using select clauses, these are the few rules to remember when using select clauses:
Example: To retrieve entity of type Table with name ‘Sales’ and display ‘name’ and ‘owner’ attribute of the referred entity DB.
Current implementation does not allow the following:
In order to retrieve entities based on classification, a query would use is or isa keywords.
Example: To retrieve all entities of type Table that are tagged with Dimension classification.
Since, from is optional and is (or isa) are equivalent, the following queries yield the same results:
The is and isa clauses can also be used in where condition like:
To search for all entities having a particular classification, simply use the name of the classification.
Example: To retrieve all entities that have Dimension classification.
###Non Primitive attribute Filtering In the discussion so far we looked at where clauses with primitive types. This section will look at using properties that are non-primitive types.
In this model, the DB is modeled such that it is aware of all the Table it contains. Table on the other hand is aware of existence of the DB but is not aware of all the other Table instances within the system. Each Table maintains reference of the DB it belongs to.
Similar structure exists within the hive data model.
Example: To retrieve all the instances of the Table belonging to a database named ‘Sales’:
Example: To retrieve all the instances of the Table belonging to a database named ‘Sales’ and whose column name starts with ‘customer’:
The entity Column is modeled in a similar way. Each Table entity has outward edges pointing to Column entity instances corresponding to each column within the table.
Example: To retrieve all the Column entities for a given Table.
The properties of each Column entity type are displayed.
In order to retrieve entities based on glossary term, a query would use hasTerm keyword.
To search for entities having a particular glossary term, user needs to add a fully qualified name. i.e {termName}@{glossaryName}. In case the user adds only the term name, all the entities with particular term name will be returned, irrespective of which glossary it is in.
Example: To retrieve all entities of type Table having glossary term savingsAccount@Banking, below are the possible ways.
Example: To retrieve all entities of type Table having glossary term savingsAccount@Banking and whose name is ‘customer’.
Example: To retrieve all entities of type Table having glossary term savingsAccount@Banking or tagged with ‘Dimension’ classification and whose column name starts with ‘customer’.
Often a query yields large number of results. To limit the outcome of the query, the limit and offset clauses are used.
Example: To retrieve only the 5 entities from a result set.
The offset clauses retrieves results after the offset value.
Example: To retrieve only 5 entities from the result set after skipping the first 10.
The limit and offset clauses are usually specified in conjunction.
If no limit clause is specified in the query, a limit clause with a default limit (usually 100) is added to the query. This prevents the query from inadvertently fetching large number of results.
The offset clause is useful for displaying results in a user interface where few results from the result set are showing and more results are fetched as the user advances to next page.
The orderby clause allows for sorting of results. Results are sorted in ascending order by default. Only immediate attributes can be used within this clause.
Ordering can be changed by using:
Example: To retrieve the entities of type Column that are sorted in ascending order using the name property.
Example: Same results as above except that they are sorted in descending order.
Example: To retrieve the entities of type Column filtered with name and associated with ‘savingsAccount@Banking’ glossary term, that are sorted in ascending order using the name property.
Let's look at aggregate functions:
These work only on immediate attributes.
Other examples of these in the Grouping Results section.
Shows the number of items in a result set.
Example: To know how may entities of a type Column.
Example: To find the number of tables in a database.
Example: To find the number of terms associated with particular type ‘Table’.
Using this keyword it is possible to retrieve the maximum value of a property for an entity.
Example: Get the most recently created value of the createTime property of the Table entity.
Using this keyword it is possible to retrieve the minimum value of a property for an entity.
Example: Get the least recently created value of the createTime property of the Table entity.
The groupby clause groups results within the result using specified property.
Example: To retrieve entity of type Table such that tables belonging to an owner are together (grouped by owner).
While groupby can work without select, if aggregate functions are used within select clause, using groupby clause becomes mandatory as aggregate functions operate on a group.
Example: To retrieve entity of type Table such we know the most recently created entity.
Example: To retrieve entity of type Table such we know the oldest entity.
Example: To know the number of entities owned by each owner.
Each type defined within Atlas gets few attributes by default. These attributes help with internal book keeping of the entities. All the system attributes are prefixed with ‘__’ (double underscore). This helps in identifying them from other attributes. Following are the system attributes:
Entity within Atlas can be in the following states:
Example: To retrieve all entities that are deleted.
Example: To retrieve entity GUIDs.
Example: To retrieve several system attributes.
Relevant models for these operations:
Get Results using DSL Search
Example | See Examples sections below. |
---|---|
URL | api/atlas/v2/search/dsl |
Method | GET |
URL Parameters | query: Query conforming to DSL syntax. |
typeName: Type name of the entity to be retrived. | |
classification: Classification associated with the type or query. | |
limit: Maximum number of items in the result set. | |
offset: Starting index of the item in the result set. | |
Data Parameters | None |
Success Response | The JSON will correspond to AtlasSearchResult. |
Error Response | Errors that are handled within the system will be returned as AtlasBaseException. |
Method Signature | @GET |
@Path(“/dsl”) | |
@Consumes(Servlets.JSON_MEDIA_TYPE) | |
@Produces(Servlets.JSON_MEDIA_TYPE) |
Examples
The general approach followed in implementation of DSL within Atlas can be enumerated in following steps:
The following clauses are no longer supported: