| --- |
| id: druid-lookups |
| title: "Cached Lookup Module" |
| --- |
| |
| <!-- |
| ~ Licensed to the Apache Software Foundation (ASF) under one |
| ~ or more contributor license agreements. See the NOTICE file |
| ~ distributed with this work for additional information |
| ~ regarding copyright ownership. The ASF licenses this file |
| ~ to you under the Apache License, Version 2.0 (the |
| ~ "License"); you may not use this file except in compliance |
| ~ with the License. You may obtain a copy of the License at |
| ~ |
| ~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~ |
| ~ Unless required by applicable law or agreed to in writing, |
| ~ software distributed under the License is distributed on an |
| ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| ~ KIND, either express or implied. See the License for the |
| ~ specific language governing permissions and limitations |
| ~ under the License. |
| --> |
| |
| ## Description |
| This Apache Druid module provides a per-lookup caching mechanism for JDBC data sources. |
| The main goal of this cache is to speed up access to high-latency lookup sources and to provide caching isolation for every lookup source. |
| Thus users can define different caching strategies and implementations per lookup, even if the source is the same. |
| This module can be used alongside other lookup modules, such as the global cached lookup module. |
| |
| To use this Apache Druid extension, [include](../../configuration/extensions.md#loading-extensions) `druid-lookups-cached-single` in the extensions load list. |
| |
| :::info |
| To use JDBC, you must add your database client JAR files to the extension's directory. |
| For Postgres, the connector JAR is already included. |
| See the MySQL extension documentation for instructions to obtain [MySQL](./mysql.md#install-mysql-connectorj) or [MariaDB](./mysql.md#install-mariadb-connectorj) connector libraries. |
| Copy or symlink the downloaded file to `extensions/druid-lookups-cached-single` under the distribution root directory. |
| ::: |
| |
| ## Architecture |
| Generally speaking, this module can be divided into two main components: the data fetcher layer and the caching layer. |
| |
| ### Data Fetcher layer |
| |
| The first part is the data fetcher layer API, `DataFetcher`, which exposes a set of fetch methods for fetching data from the actual lookup dimension source. |
| For instance, `JdbcDataFetcher` provides an implementation of `DataFetcher` that can be used to fetch key/value pairs from an RDBMS via a JDBC driver. |
| If you need a new type of data fetcher, all you need to do is implement the `DataFetcher` interface and load it via another Druid module. |
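| |
| For illustration, the following minimal sketch shows how the `jdbcDataFetcher` spec that configures this layer appears inside a lookup definition. The connection URI, table, and column names are placeholders; complete lookup specs are shown in the configuration section below. |
| |
| ```json |
| { |
|   "type": "jdbcDataFetcher", |
|   "connectorConfig": { |
|     "connectURI": "jdbc:mysql://localhost:3306/my_data_base", |
|     "user": "druid", |
|     "password": "druid" |
|   }, |
|   "table": "lookup_table_name", |
|   "keyColumn": "key_column_name", |
|   "valueColumn": "value_column_name" |
| } |
| ``` |
| |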
| ### Caching layer |
| |
| This extension comes with two different caching strategies. The first strategy is poll based and the second is load based. |
| |
| #### Poll lookup cache |
| |
| The poll cache strategy periodically fetches and swaps all the key/value pairs from the lookup source. |
| Hence, users should make sure that the cache can fit all the data. |
| The current implementation provides two types of poll cache: the first is on-heap (uses an immutable map), while the second uses a MapDB-based off-heap map. |
| Users can also implement a different lookup polling cache by implementing the `PollingCacheFactory` and `PollingCache` interfaces. |
| |
| #### Loading lookup |
| The loading cache strategy loads the key/value pair on request for the key itself; the general algorithm is load-if-absent. |
| Once the key/value pair is loaded, eviction occurs according to the cache eviction policy. |
| This module comes with two loading lookup implementations: the first is on-heap, backed by a Guava cache implementation; the second is an off-heap MapDB implementation. |
| Both implementations offer various eviction strategies. |
| As with the polling cache, developers can implement a new type of loading cache by implementing the `LookupLoadingCache` interface. |
| |
| ## Configuration and Operation |
| |
| |
| ### Polling Lookup |
| |
| **Note that the current implementation of `offHeapPolling` and `onHeapPolling` will create two caches: one to look up the value based on the key and the other to reverse look up the key from the value.** |
| |
| |Field|Type|Description|Required|default| |
| |-----|----|-----------|--------|-------| |
| |dataFetcher|JSON object|Specifies the lookup data fetcher type for fetching data.|yes|null| |
| |cacheFactory|JSON Object|Cache factory implementation.|no|onHeapPolling| |
| |pollPeriod|Period|Polling period.|no|null (poll once)| |
| |
| |
| ##### Example of Polling On-heap Lookup |
| This example demonstrates a polling cache that updates its on-heap cache every 10 minutes. |
| |
| ```json |
| { |
|   "type": "pollingLookup", |
|   "pollPeriod": "PT10M", |
|   "dataFetcher": { |
|     "type": "jdbcDataFetcher", |
|     "connectorConfig": { |
|       "connectURI": "jdbc:mysql://localhost:3306/my_data_base", |
|       "user": "druid", |
|       "password": "druid" |
|     }, |
|     "table": "lookup_table_name", |
|     "keyColumn": "key_column_name", |
|     "valueColumn": "value_column_name" |
|   }, |
|   "cacheFactory": { |
|     "type": "onHeapPolling" |
|   } |
| } |
| ``` |
| |
| ##### Example of Polling Off-heap Lookup |
| This example demonstrates an off-heap lookup that is cached once and never swapped (`pollPeriod == null`). |
| |
| ```json |
| { |
|   "type": "pollingLookup", |
|   "dataFetcher": { |
|     "type": "jdbcDataFetcher", |
|     "connectorConfig": { |
|       "connectURI": "jdbc:mysql://localhost:3306/my_data_base", |
|       "user": "druid", |
|       "password": "druid" |
|     }, |
|     "table": "lookup_table_name", |
|     "keyColumn": "key_column_name", |
|     "valueColumn": "value_column_name" |
|   }, |
|   "cacheFactory": { |
|     "type": "offHeapPolling" |
|   } |
| } |
| ``` |
| |
| |
| ### Loading Lookup |
| |
| |Field|Type|Description|Required|default| |
| |-----|----|-----------|--------|-------| |
| |dataFetcher|JSON object|Specifies the lookup data fetcher type to use in order to fetch data.|yes|null| |
| |loadingCacheSpec|JSON Object|Lookup cache spec implementation.|yes|null| |
| |reverseLoadingCacheSpec|JSON Object|Reverse lookup cache implementation.|yes|null| |
| |
| |
| ##### Example of Loading On-heap Guava |
| |
| Guava cache configuration spec. |
| |
| |Field|Type|Description|Required|default| |
| |-----|----|-----------|--------|-------| |
| |concurrencyLevel|int|Allowed concurrency among update operations|no|4| |
| |initialCapacity|int|Initial capacity size|no |null| |
| |maximumSize|long| Specifies the maximum number of entries the cache may contain.|no |null (infinite capacity)| |
| |expireAfterAccess|long| Specifies the eviction time after last read in milliseconds.|no |null (No read-time-based eviction when set to null)| |
| |expireAfterWrite|long| Specifies the eviction time after last write in milliseconds.|no |null (No write-time-based eviction when set to null)| |
| |
| ```json |
| { |
|   "type": "loadingLookup", |
|   "dataFetcher": { |
|     "type": "jdbcDataFetcher", |
|     "connectorConfig": { |
|       "connectURI": "jdbc:mysql://localhost:3306/my_data_base", |
|       "user": "druid", |
|       "password": "druid" |
|     }, |
|     "table": "lookup_table_name", |
|     "keyColumn": "key_column_name", |
|     "valueColumn": "value_column_name" |
|   }, |
|   "loadingCacheSpec": { |
|     "type": "guava" |
|   }, |
|   "reverseLoadingCacheSpec": { |
|     "type": "guava", |
|     "maximumSize": 500000, |
|     "expireAfterAccess": 100000, |
|     "expireAfterWrite": 10000 |
|   } |
| } |
| ``` |
| |
| ##### Example of Loading Off-heap MapDB |
| |
| The off-heap cache is backed by a [MapDB](http://www.mapdb.org/) implementation. MapDB uses direct memory as its memory pool; please take that into account when configuring the JVM direct memory limit. |
| |
| |Field|Type|Description|Required|default| |
| |-----|----|-----------|--------|-------| |
| |maxStoreSize|double|Maximum size of the store in GiB. If the store is larger, entries will start expiring.|no|0| |
| |maxEntriesSize|long|Specifies the maximum number of entries the cache may contain.|no|0 (infinite capacity)| |
| |expireAfterAccess|long|Specifies the eviction time after last read in milliseconds.|no|0 (No read-time-based eviction when set to 0)| |
| |expireAfterWrite|long|Specifies the eviction time after last write in milliseconds.|no|0 (No write-time-based eviction when set to 0)| |
| |
| |
| ```json |
| { |
|   "type": "loadingLookup", |
|   "dataFetcher": { |
|     "type": "jdbcDataFetcher", |
|     "connectorConfig": { |
|       "connectURI": "jdbc:mysql://localhost:3306/my_data_base", |
|       "user": "druid", |
|       "password": "druid" |
|     }, |
|     "table": "lookup_table_name", |
|     "keyColumn": "key_column_name", |
|     "valueColumn": "value_column_name" |
|   }, |
|   "loadingCacheSpec": { |
|     "type": "mapDb", |
|     "maxEntriesSize": 100000 |
|   }, |
|   "reverseLoadingCacheSpec": { |
|     "type": "mapDb", |
|     "maxStoreSize": 5, |
|     "expireAfterAccess": 100000, |
|     "expireAfterWrite": 10000 |
|   } |
| } |
| ``` |
| |
| ### JDBC Data Fetcher |
| |
| |Field|Type|Description|Required|default| |
| |-----|----|-----------|--------|-------| |
| |`connectorConfig`|JSON object|Specifies the database connection details. You can set `connectURI`, `user` and `password`. You can selectively allow JDBC properties in `connectURI`. See [JDBC connections security config](../../configuration/index.md#jdbc-connections-to-external-databases) for more details.|yes|| |
| |`table`|string|The table name to read from.|yes|| |
| |`keyColumn`|string|The column name that contains the lookup key.|yes|| |
| |`valueColumn`|string|The column name that contains the lookup value.|yes|| |
| |`streamingFetchSize`|int|Fetch size used in JDBC connections.|no|1000| |
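| |
| The following is a minimal sketch combining the fields above into a complete `jdbcDataFetcher` spec; the connection URI, credentials, table, and column names are placeholders, and `streamingFetchSize` is shown with its default value. |
| |
| ```json |
| { |
|   "type": "jdbcDataFetcher", |
|   "connectorConfig": { |
|     "connectURI": "jdbc:mysql://localhost:3306/my_data_base", |
|     "user": "druid", |
|     "password": "druid" |
|   }, |
|   "table": "lookup_table_name", |
|   "keyColumn": "key_column_name", |
|   "valueColumn": "value_column_name", |
|   "streamingFetchSize": 1000 |
| } |
| ``` |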