name: Trino Extractor
route: /TrinoExtractor
menu: Documentation
submenu: Tools

Trino Extractor

Overview

The Trino Extractor is a metadata extraction utility that integrates Trino with Apache Atlas. It discovers Trino catalogs, schemas, tables, and columns, extracts their metadata, and synchronizes it into Apache Atlas to support data governance and metadata management.

Key Features

Metadata Extraction

  • Comprehensive Discovery: Automatically discovers and extracts metadata from Trino catalogs, schemas, tables, and columns
  • JDBC-Based Connection: Uses the standard Trino JDBC driver for connectivity
  • Selective Extraction: Supports extracting a specific catalog, schema, or table by name
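The catalog, schema, table, and column hierarchy described above can be walked one level at a time with standard Trino `SHOW` statements. The sketch below only builds those statements for a given scope; whether the extractor issues these exact statements is an assumption, and the function names are hypothetical.

```python
def quote_ident(name):
    """Quote a Trino identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def discovery_statement(catalog=None, schema=None, table=None):
    """Return the Trino statement that lists the level below the given scope.

    Illustrative only: the extractor's real queries are not shown in this page.
    """
    if catalog is None:
        return "SHOW CATALOGS"
    if schema is None:
        return f"SHOW SCHEMAS FROM {quote_ident(catalog)}"
    scope = f"{quote_ident(catalog)}.{quote_ident(schema)}"
    if table is None:
        return f"SHOW TABLES FROM {scope}"
    return f"SHOW COLUMNS FROM {scope}.{quote_ident(table)}"


# Narrowing the scope mirrors selective extraction of one catalog or schema.
print(discovery_statement())
print(discovery_statement("hive_catalog"))
print(discovery_statement("hive_catalog", "sales_data"))
print(discovery_statement("hive_catalog", "sales_data", "customer_orders"))
```

Each statement could be run over any Trino JDBC or DB-API connection; the connection itself is left out so the sketch stays runnable without a cluster.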

Atlas Integration

  • Entity Management: Creates and updates Atlas entities for Trino metadata objects
  • Relationship Mapping: Establishes proper hierarchical relationships between catalogs, schemas, tables, and columns
  • Synchronization: Maintains consistency by removing Atlas entities that no longer exist in Trino
  • Connector Support: Specialized handling for Trino connectors whose metadata Atlas already captures through a dedicated hook, such as Hive and Iceberg
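The synchronization step above is, at its core, a set difference: anything Atlas holds that Trino no longer reports is a candidate for removal. A minimal sketch, assuming entities are identified by a unique name string; the name format and the function name are hypothetical and not taken from the extractor's code.

```python
def find_stale_entities(atlas_names, trino_names):
    """Return Atlas entity names that no longer have a counterpart in Trino."""
    # Entities known to Atlas but absent from the latest Trino listing are stale.
    return sorted(set(atlas_names) - set(trino_names))


# Hypothetical unique names of the form catalog.schema.table@namespace.
atlas_names = ["hive.sales.orders@cm", "hive.sales.returns@cm", "hive.hr.staff@cm"]
trino_names = ["hive.sales.orders@cm", "hive.hr.staff@cm"]

stale = find_stale_entities(atlas_names, trino_names)
print(stale)  # the one table that was dropped from Trino
```

Computing the difference before deleting anything also makes it easy to log or review the stale set first.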

Scheduling & Automation

  • Cron-based Scheduling: Supports automated periodic extraction using cron expressions
  • One-time Execution: Can be run as a single extraction job
  • Error Handling: Robust error handling with detailed logging
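The example schedule used elsewhere on this page, `0 0 2 * * ?`, has six fields and uses `?`, which matches the Quartz cron format (seconds first) rather than five-field Unix cron. Assuming that format applies here, the sketch below labels each field so an expression can be checked before it is used; the helper is hypothetical and does not validate field values.

```python
# Field order of a Quartz-style cron expression; a seventh "year" field is optional.
QUARTZ_FIELDS = ["second", "minute", "hour", "day_of_month", "month", "day_of_week"]


def describe_cron(expression):
    """Split a Quartz-style cron expression into named fields."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}: {expression!r}")
    fields = dict(zip(QUARTZ_FIELDS, parts))
    if len(parts) == 7:
        fields["year"] = parts[6]
    return fields


schedule = describe_cron("0 0 2 * * ?")
for name, value in schedule.items():
    print(f"{name:>12}: {value}")
```

Read this way, the example expression fires at 02:00:00 every day.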

Architecture

Quick Start

1. Configuration Setup

Set the connection and catalog properties in the atlas-trino-extractor.properties file; the available keys are listed under Configuration Properties.

2. Basic Execution

Configuration Properties

| Property | Description | Default | Example |
|---|---|---|---|
| atlas.rest.address | Atlas REST API endpoint | http://localhost:21000/ | https://atlas.company.com:21443/ |
| atlas.trino.jdbc.address | Trino JDBC URL | - | jdbc:trino://trino-server:8080/ |
| atlas.trino.jdbc.user | Trino username | - | admin |
| atlas.trino.jdbc.password | Trino password | "" | password123 |
| atlas.trino.namespace | Trino instance namespace | cm | production-cluster |
| atlas.trino.catalogs.registered | Comma-separated list of catalogs to extract | - | hive,iceberg,mysql |
| atlas.trino.catalog.hook.enabled.<catalog-name> | Whether an Atlas hook is enabled for this catalog | false | true |
| atlas.trino.catalog.hook.enabled.<catalog-name>.namespace | Namespace used in Atlas by this catalog's hook | cm | cm |
| atlas.trino.extractor.schedule | Cron expression for scheduled extraction | - | 0 0 2 * * ? |
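A quick way to sanity-check a properties file before running the extractor is to parse it and look for keys without a default. The sketch below is a simplified `key=value` parser, not a full Java properties reader. Treating the keys with no default (other than the schedule) as required is an assumption, and all function names are hypothetical.

```python
# Defaults copied from the table above.
DEFAULTS = {
    "atlas.rest.address": "http://localhost:21000/",
    "atlas.trino.jdbc.password": "",
    "atlas.trino.namespace": "cm",
}
# Assumption: keys listed without a default must be supplied by the user.
REQUIRED = (
    "atlas.trino.jdbc.address",
    "atlas.trino.jdbc.user",
    "atlas.trino.catalogs.registered",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_required(props):
    """List required keys that are absent or empty."""
    return [key for key in REQUIRED if not props.get(key)]


def registered_catalogs(props):
    """Split the comma-separated catalog list into names."""
    raw = props.get("atlas.trino.catalogs.registered", "")
    return [name.strip() for name in raw.split(",") if name.strip()]


def hook_enabled(props, catalog):
    """Per-catalog hook flag; defaults to false as in the table."""
    key = f"atlas.trino.catalog.hook.enabled.{catalog}"
    return props.get(key, "false").lower() == "true"


sample = """
# Example values from the configuration table
atlas.rest.address=https://atlas.company.com:21443/
atlas.trino.jdbc.address=jdbc:trino://trino-server:8080/
atlas.trino.jdbc.user=admin
atlas.trino.catalogs.registered=hive,iceberg,mysql
atlas.trino.catalog.hook.enabled.hive=true
"""

props = parse_properties(sample)
print("missing:", missing_required(props))
for catalog in registered_catalogs(props):
    print(catalog, "hook enabled:", hook_enabled(props, catalog))
```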

Command Line Usage

Available Options

| Option | Long Form | Description | Example |
|---|---|---|---|
| -c | --catalog | Extract specific catalog | -c hive_catalog |
| -s | --schema | Extract specific schema | -s sales_data |
| -t | --table | Extract specific table | -t customer_orders |
| -cx | --cronExpression | Schedule with cron expression | -cx "0 0 2 * * ?" |
| -h | --help | Display help information | -h |
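When the extractor is driven from another script, the options above can be assembled programmatically. A minimal sketch using only the short flags from the table; the helper name is hypothetical, and the launcher command that these arguments would be appended to is deliberately left out because it depends on the installation.

```python
import shlex


def build_extractor_args(catalog=None, schema=None, table=None, cron=None):
    """Build the argument list for the extractor from optional filters."""
    args = []
    for flag, value in (("-c", catalog), ("-s", schema), ("-t", table), ("-cx", cron)):
        if value:
            args += [flag, value]
    return args


args = build_extractor_args(
    catalog="hive_catalog",
    schema="sales_data",
    cron="0 0 2 * * ?",
)
# shlex.join quotes the cron expression so it survives a shell as one argument.
print(shlex.join(args))
```

Passing the list form directly to a process launcher avoids shell quoting issues with the cron expression, which contains spaces, `*`, and `?`.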

Connector-Specific Processing

Example: Hive Connector Integration

Benefits:

  • Links Trino entities with existing Hive entities
  • Maintains consistency between Hive and Trino metadata
  • Supports environments with Atlas Hive hooks
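Linking a Trino table to an existing Hive entity requires working out which Hive entity it corresponds to. Atlas Hive hooks identify a table by a qualified name of the form `db.table@namespace`, so one plausible approach is to derive that name from the Trino schema and table plus the hook namespace configured for the catalog. The sketch below is illustrative only; how the extractor actually resolves the link is not shown on this page, and the function names are hypothetical.

```python
def hive_table_qualified_name(schema, table, namespace):
    """Build a Hive-hook-style qualified name: db.table@namespace, lowercased."""
    return f"{schema}.{table}@{namespace}".lower()


def linked_hive_table(props, catalog, schema, table):
    """Return the Hive qualified name to link to, or None if no hook is enabled."""
    prefix = f"atlas.trino.catalog.hook.enabled.{catalog}"
    if props.get(prefix, "false").lower() != "true":
        return None
    # The hook namespace defaults to "cm", as in the configuration table.
    namespace = props.get(prefix + ".namespace", "cm")
    return hive_table_qualified_name(schema, table, namespace)


props = {"atlas.trino.catalog.hook.enabled.hive_catalog": "true"}
print(linked_hive_table(props, "hive_catalog", "sales_data", "customer_orders"))
print(linked_hive_table(props, "mysql", "sales_data", "customer_orders"))
```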

FAQ

Troubleshooting Questions

Q: Why are some entities not appearing in Atlas?

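A: Check that the catalog is listed in atlas.trino.catalogs.registered, that the configured Trino user has permission to read it, and that the extractor can reach both Trino and Atlas over the network. Review the logs for specific errors.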

Q: How do I handle large clusters with thousands of tables?

A: Use selective extraction, increase memory allocation, schedule during off-peak hours, and process catalogs individually.
