optiq-csv

Optiq adapter that reads CSV files.

Optiq-csv is a nice simple example of how to connect Optiq to your own data source and quickly get a full SQL/JDBC interface.

Download and build

You need Java (1.5 or higher; 1.7 preferred) and Maven (2 or higher).

$ git clone git://github.com/julianhyde/optiq-csv.git
$ cd optiq-csv
$ mvn compile

Run sqlline

Sqlline is a shell that connects to any JDBC data source and lets you execute SQL queries. Connect to Optiq and try out some queries.

$ ./sqlline
sqlline> !connect jdbc:optiq:model=target/test-classes/model.json admin admin
sqlline> !tables
sqlline> !describe emps
sqlline> SELECT * FROM emps;
sqlline> EXPLAIN PLAN FOR SELECT * FROM emps;
sqlline> !connect jdbc:optiq:model=target/test-classes/smart.json admin admin
sqlline> EXPLAIN PLAN FOR SELECT * FROM emps;
sqlline> SELECT depts.name, count(*)
. . . .> FROM emps JOIN depts USING (deptno)
. . . .> GROUP BY depts.name;
sqlline> VALUES char_length('hello, ' || 'world!');
sqlline> !quit

Advanced use

You can also register a CsvSchema as a schema within an Optiq instance, and then combine it with other data sources.
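
For example, a Java program can create the schema directly, without a model file. The sketch below is only illustrative: the OptiqConnection class, its getRootSchema() method, and the package names in the imports are assumptions about the Optiq API, while CsvSchemaFactory and its create method are the ones described in the tutorial below.

  // NOTE: the package locations of MutableSchema and OptiqConnection, and the
  // getRootSchema()/unwrap() calls, are assumptions about the Optiq API;
  // CsvSchemaFactory and its create() signature are from this project.
  import net.hydromatic.optiq.MutableSchema;
  import net.hydromatic.optiq.impl.csv.CsvSchemaFactory;
  import net.hydromatic.optiq.jdbc.OptiqConnection;

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.util.Collections;
  import java.util.Map;

  public class RegisterCsvSchema {
    public static void main(String[] args) throws Exception {
      // Open a connection with no model; we will add a schema programmatically.
      Connection connection = DriverManager.getConnection("jdbc:optiq:");
      OptiqConnection optiqConnection =
          connection.unwrap(OptiqConnection.class);
      MutableSchema rootSchema = optiqConnection.getRootSchema();
      // Invoke the same factory that a model file would invoke.
      Map<String, Object> operand =
          Collections.<String, Object>singletonMap(
              "directory", "target/test-classes/sales");
      new CsvSchemaFactory().create(rootSchema, "SALES", operand);
      // Queries on this connection can now refer to tables such as SALES.EMPS,
      // alongside any other schemas registered on the same connection.
      connection.close();
    }
  }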

You can write a “vanity JDBC driver” with a different name.

You can add optimizer rules and new implementations of relational operators to execute queries more efficiently.

More information

Tutorial

Optiq-csv is a fully functional adapter for Optiq that reads text files in CSV (comma-separated values) format. It is remarkable that a couple of hundred lines of Java code are sufficient to provide full SQL query capability.

But optiq-csv also serves as a template for building adapters to other data formats. It covers several important concepts:

  • user-defined schema using SchemaFactory and Schema interfaces;
  • declaring schemas in a model JSON file;
  • user-defined table using the Table interface;
  • determining the record type of a table;
  • a simple implementation of Table that enumerates all rows directly;
  • a more advanced implementation of Table that translates to relational operators using planner rules.

Without further ado, let's get started. Assuming you have already downloaded and built optiq-csv (using git clone and mvn install), we'll connect to Optiq using the sqlline utility, which is included in the optiq-csv github project.

  $ ./sqlline
  sqlline> !connect jdbc:optiq:model=target/test-classes/model.json admin admin

Execute a metadata query:

  sqlline> !tables
  +------------+--------------+-------------+---------------+----------+------+
  | TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  |  TABLE_TYPE   | REMARKS  | TYPE |
  +------------+--------------+-------------+---------------+----------+------+
  | null       | SALES        | DEPTS       | TABLE         | null     | null |
  | null       | SALES        | EMPS        | TABLE         | null     | null |
  | null       | metadata     | COLUMNS     | SYSTEM_TABLE  | null     | null |
  | null       | metadata     | TABLES      | SYSTEM_TABLE  | null     | null |
  +------------+--------------+-------------+---------------+----------+------+

As you can see, there are 4 tables in the system: tables EMPS and DEPTS in the current SALES schema, and COLUMNS and TABLES in the system “metadata” schema. The system tables are always present in Optiq, but the other tables are provided by the specific implementation of the schema; in this case, the EMPS and DEPTS tables are based on the EMPS.csv and DEPTS.csv files in the target/test-classes/sales directory.
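
The same information is available to a Java program through the standard JDBC metadata API; nothing Optiq-specific is needed. A minimal sketch (ListTables is our own class name; it assumes the optiq and optiq-csv jars are on the classpath and that the Optiq JDBC driver registers itself with DriverManager):

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;

  public class ListTables {
    public static void main(String[] args) throws Exception {
      // Connect using the same model file and credentials as the sqlline session.
      Connection connection =
          DriverManager.getConnection(
              "jdbc:optiq:model=target/test-classes/model.json", "admin", "admin");
      // Retrieve the table list, just as sqlline's !tables command does.
      ResultSet tables =
          connection.getMetaData().getTables(null, null, null, null);
      while (tables.next()) {
        System.out.println(
            tables.getString("TABLE_SCHEM") + "." + tables.getString("TABLE_NAME"));
      }
      connection.close();
    }
  }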

Let's execute some queries on those tables, to show that Optiq is providing a full implementation of SQL.

  sqlline> SELECT * FROM emps;
  +--------+--------+---------+---------+----------------+--------+-------+---+
  | EMPNO  |  NAME  | DEPTNO  | GENDER  |      CITY      | EMPID  |  AGE  | S |
  +--------+--------+---------+---------+----------------+--------+-------+---+
  | 100    | Fred   | 10      |         |                | 30     | 25    | t |
  | 110    | Eric   | 20      | M       | San Francisco  | 3      | 80    | n |
  | 110    | John   | 40      | M       | Vancouver      | 2      | null  | f |
  | 120    | Wilma  | 20      | F       |                | 1      | 5     | n |
  | 130    | Alice  | 40      | F       | Vancouver      | 2      | null  | f |
  +--------+--------+---------+---------+----------------+--------+-------+---+

JOIN and GROUP BY:

  sqlline> SELECT d.name, COUNT(*)
  . . . .> FROM emps AS e JOIN depts AS d ON e.deptno = d.deptno
  . . . .> GROUP BY d.name;
  +------------+---------+
  |    NAME    | EXPR$1  |
  +------------+---------+
  | Sales      | 1       |
  | Marketing  | 2       |
  +------------+---------+

The VALUES operator generates a single row:

  sqlline> VALUES CHAR_LENGTH('Hello, ' || 'world!');
  +---------+
  | EXPR$0  |
  +---------+
  | 13      |
  +---------+
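
Because Optiq presents itself as an ordinary JDBC driver, any of these queries can also be run from Java code. A minimal sketch (QueryDemo is our own class name; it assumes the optiq and optiq-csv jars are on the classpath and that the Optiq JDBC driver registers itself with DriverManager):

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class QueryDemo {
    public static void main(String[] args) throws Exception {
      Connection connection =
          DriverManager.getConnection(
              "jdbc:optiq:model=target/test-classes/model.json", "admin", "admin");
      Statement statement = connection.createStatement();
      // The same join-and-aggregate query that we ran in sqlline above.
      ResultSet resultSet = statement.executeQuery(
          "SELECT d.name, COUNT(*)\n"
          + "FROM emps AS e JOIN depts AS d ON e.deptno = d.deptno\n"
          + "GROUP BY d.name");
      while (resultSet.next()) {
        System.out.println(resultSet.getString(1) + ": " + resultSet.getLong(2));
      }
      resultSet.close();
      statement.close();
      connection.close();
    }
  }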

Now, how did Optiq find these tables? Remember, core Optiq does not know anything about CSV files. (As a “database without a storage layer”, Optiq doesn't know about any file formats.) Optiq knows about those tables because we told it to run code in the optiq-csv project. There are a couple of steps in that chain.

First, on the JDBC connect string we gave the path of a model in JSON format. Here is the model:

    {
      version: '1.0',
      defaultSchema: 'SALES',
      schemas: [
        {
          name: 'SALES',
          type: 'custom',
          factory: 'net.hydromatic.optiq.impl.csv.CsvSchemaFactory',
          operand: {
            directory: 'target/test-classes/sales'
          }
        }
      ]
    }

The model defines a single schema called ‘SALES’. The schema is powered by a plugin class, net.hydromatic.optiq.impl.csv.CsvSchemaFactory, which is part of the optiq-csv project and implements the Optiq interface SchemaFactory. Its create method instantiates a schema, passing in the directory argument from the model file:

  public Schema create(MutableSchema parentSchema, String name,
      Map<String, Object> operand) {
    // Read the arguments from the 'operand' section of the model file.
    String directory = (String) operand.get("directory");
    Boolean smart = (Boolean) operand.get("smart");
    // Create the schema, register it with its parent, and return it.
    final CsvSchema schema =
        new CsvSchema(
            parentSchema,
            new File(directory),
            parentSchema.getSubSchemaExpression(name, CsvSchema.class),
            smart != null && smart);
    parentSchema.addSchema(name, schema);
    return schema;
  }

Driven by the model, the schema factory instantiates a single schema called ‘SALES’. The schema is an instance of net.hydromatic.optiq.impl.csv.CsvSchema and implements the Optiq interface Schema.

A schema's job is to produce a list of tables. (It can also list sub-schemas and table-functions, but these are advanced features and optiq-csv does not support them.) The tables implement Optiq's Table interface. CsvSchema produces tables that are instances of CsvTable.

Here is the relevant code from CsvSchema:

  private Map<String, TableInSchema> map;

  public Collection<TableInSchema> getTables() {
    return computeMap().values();
  }

  /** Scans the schema's directory and creates a table for each ".csv" file,
   * computing the map only on first use. */
  private synchronized Map<String, TableInSchema> computeMap() {
    if (map == null) {
      map = new HashMap<String, TableInSchema>();
      // Find the files in the directory whose names end with ".csv".
      File[] files = directoryFile.listFiles(
          new FilenameFilter() {
            public boolean accept(File dir, String name) {
              return name.endsWith(".csv");
            }
          });
      for (File file : files) {
        // The table is named after the file, minus the ".csv" suffix.
        String tableName = file.getName();
        if (tableName.endsWith(".csv")) {
          tableName = tableName.substring(
              0, tableName.length() - ".csv".length());
        }
        // Deduce the table's row type (column names and types) from the file.
        final List<CsvFieldType> fieldTypes = new ArrayList<CsvFieldType>();
        final RelDataType rowType =
            CsvTable.deduceRowType(typeFactory, file, fieldTypes);
        // A "smart" table is translated to relational operators using planner
        // rules; a plain CsvTable simply enumerates its rows.
        final CsvTable table;
        if (smart) {
          table = new CsvSmartTable(this, tableName, file, rowType, fieldTypes);
        } else {
          table = new CsvTable(this, tableName, file, rowType, fieldTypes);
        }
        map.put(
            tableName,
            new TableInSchemaImpl(this, tableName, TableType.TABLE, table));
      }
    }
    return map;
  }

The schema scans the directory, finds all files whose names end with “.csv”, and creates a table for each. In this case, the directory is target/test-classes/sales and contains the files EMPS.csv and DEPTS.csv, which become the tables EMPS and DEPTS.

Note how we did not need to define any tables in the model; the schema generated the tables automatically. (It is also possible to define tables explicitly. We shall cover this later.)

Further topics (to be written): defining a custom schema; DML; defining views; calling conventions; statistics and cost; user-defined functions; defining tables explicitly; defining custom tables.