| .. Licensed to the Apache Software Foundation (ASF) under one |
| .. or more contributor license agreements. See the NOTICE file |
| .. distributed with this work for additional information |
| .. regarding copyright ownership. The ASF licenses this file |
| .. to you under the Apache License, Version 2.0 (the |
| .. "License"); you may not use this file except in compliance |
| .. with the License. You may obtain a copy of the License at |
| .. |
| .. http://www.apache.org/licenses/LICENSE-2.0 |
| .. |
| .. Unless required by applicable law or agreed to in writing, |
| .. software distributed under the License is distributed on an |
| .. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| .. KIND, either express or implied. See the License for the |
| .. specific language governing permissions and limitations |
| .. under the License. |
| |
| ================ |
| Snowflake Driver |
| ================ |
| |
| **Available for:** C/C++, GLib/Ruby, Go, Python, R |
| |
| The Snowflake Driver provides access to Snowflake Database Warehouses. |
| |
| Installation |
| ============ |
| |
| .. tab-set:: |
| |
| .. tab-item:: C/C++ |
| :sync: cpp |
| |
| For conda-forge users: |
| |
| .. code-block:: shell |
| |
| mamba install libadbc-driver-snowflake |
| |
| .. tab-item:: Go |
| :sync: go |
| |
| .. code-block:: shell |
| |
| go get github.com/apache/arrow-adbc/go/adbc/driver/snowflake |
| |
| .. tab-item:: Python |
| :sync: python |
| |
| .. code-block:: shell |
| |
| # For conda-forge |
| mamba install adbc-driver-snowflake |
| |
| # For pip |
| pip install adbc_driver_snowflake |
| |
| .. tab-item:: R |
| :sync: r |
| |
| .. code-block:: shell |
| |
| # install.packages("pak") |
| pak::pak("apache/arrow-adbc/r/adbcsnowflake") |
| |
| Usage |
| ===== |
| |
To connect to a Snowflake database, supply the "uri" parameter when
constructing the :cpp:class:`AdbcDatabase`.
| |
| .. tab-set:: |
| |
| .. tab-item:: C++ |
| :sync: cpp |
| |
| .. code-block:: cpp |
| |
| #include "adbc.h" |
| |
| // Ignoring error handling |
| struct AdbcDatabase database; |
| AdbcDatabaseNew(&database, nullptr); |
| AdbcDatabaseSetOption(&database, "driver", "adbc_driver_snowflake", nullptr); |
| AdbcDatabaseSetOption(&database, "uri", "<snowflake uri>", nullptr); |
| AdbcDatabaseInit(&database, nullptr); |
| |
| .. tab-item:: Python |
| :sync: python |
| |
| .. code-block:: python |
| |
| import adbc_driver_snowflake.dbapi |
| |
| with adbc_driver_snowflake.dbapi.connect("<snowflake uri>") as conn: |
| pass |
| |
| .. tab-item:: R |
| :sync: r |
| |
| .. code-block:: r |
| |
| library(adbcdrivermanager) |
| |
| # Use the driver manager to connect to a database |
| uri <- Sys.getenv("ADBC_SNOWFLAKE_TEST_URI") |
| db <- adbc_database_init(adbcsnowflake::adbcsnowflake(), uri = uri) |
| con <- adbc_connection_init(db) |
| |
| .. tab-item:: Go |
| :sync: go |
| |
| .. code-block:: go |
| |
| import ( |
| "context" |
| |
| "github.com/apache/arrow-adbc/go/adbc" |
| "github.com/apache/arrow-adbc/go/adbc/driver/snowflake" |
| ) |
| |
| func main() { |
| var drv snowflake.Driver |
| db, err := drv.NewDatabase(map[string]string{ |
| adbc.OptionKeyURI: "<snowflake uri>", |
| }) |
| if err != nil { |
| // handle error |
| } |
| defer db.Close() |
| |
| cnxn, err := db.Open(context.Background()) |
| if err != nil { |
| // handle error |
| } |
| defer cnxn.Close() |
| } |
| |
| URI Format |
| ---------- |
| |
The Snowflake URI should be in one of the following formats:

- ``user[:password]@account/database/schema[?param1=value1&paramN=valueN]``
- ``user[:password]@account/database[?param1=value1&paramN=valueN]``
- ``user[:password]@host:port/database/schema?account=user_account[&param1=value1&paramN=valueN]``
- ``host:port/database/schema?account=user_account[&param1=value1&paramN=valueN]``
| |
Alternatively, instead of providing a full URI, the configuration can be
supplied entirely through the other available options, or through a
combination of the URI and other options. If a URI is provided, it is parsed
first, and any explicitly provided options override the corresponding values
parsed from the URI.
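
For example, here is a minimal Python sketch of connecting with a URI while
overriding one of its parameters with an explicit option; the account,
credentials, database, and warehouse names are placeholders:

.. code-block:: python

   import adbc_driver_snowflake.dbapi

   # Hypothetical URI following the formats above.
   uri = "jdoe:secret@foobar/MY_DB/PUBLIC?warehouse=SMALL_WH"

   with adbc_driver_snowflake.dbapi.connect(
       uri,
       db_kwargs={
           # Explicit options override values parsed from the URI, so this
           # session should use LARGE_WH rather than SMALL_WH.
           "adbc.snowflake.sql.warehouse": "LARGE_WH",
       },
   ) as conn:
       pass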
| |
| Supported Features |
| ================== |
| |
| The Snowflake driver generally supports features defined in the ADBC API |
| specification 1.0.0, as well as some additional, custom options. |
| |
| Authentication |
| -------------- |
| |
Snowflake requires some form of authentication to be enabled. By default,
the driver attempts username/password authentication. The username and
password can be provided in the URI or via the ``username`` and ``password``
options on the :cpp:class:`AdbcDatabase`.
| |
Alternatively, other types of authentication can be specified and customized.
See "Client Options" below for details on all of the options.
| |
| SSO Authentication |
| ~~~~~~~~~~~~~~~~~~ |
| |
| Snowflake supports `single sign-on |
| <https://docs.snowflake.com/en/user-guide/admin-security-fed-auth-overview>`_. |
| If your account has been configured with SSO, it can be used with the |
| Snowflake driver by setting the following options when constructing the |
| :cpp:class:`AdbcDatabase`: |
| |
| - ``adbc.snowflake.sql.account``: your Snowflake account. (For example, if |
| you log in to ``https://foobar.snowflakecomputing.com``, then your account |
| identifier is ``foobar``.) |
| - ``adbc.snowflake.sql.auth_type``: ``auth_ext_browser``. |
- ``username``: your username. (This is likely your email address,
  e.g. ``jdoe@example.com``.)
| |
A new browser tab or window should appear where you can continue the login.
Once the login completes, you will have a working ADBC database/connection
object. Some users have reported needing other configuration options, such as
``adbc.snowflake.sql.region`` and ``adbc.snowflake.sql.uri.*`` (see below for
a listing).
| |
| .. tab-set:: |
| |
| .. tab-item:: Python |
| :sync: python |
| |
| .. code-block:: python |
| |
| import adbc_driver_snowflake.dbapi |
| # This will open a new browser tab, and block until you log in. |
| adbc_driver_snowflake.dbapi.connect(db_kwargs={ |
| "adbc.snowflake.sql.account": "foobar", |
| "adbc.snowflake.sql.auth_type": "auth_ext_browser", |
| "username": "jdoe@example.com", |
| }) |
| |
| .. tab-item:: R |
| :sync: r |
| |
| .. code-block:: r |
| |
| library(adbcdrivermanager) |
db <- adbc_database_init(
  adbcsnowflake::adbcsnowflake(),
  adbc.snowflake.sql.account = 'foobar',
  adbc.snowflake.sql.auth_type = 'auth_ext_browser',
  username = 'jdoe@example.com'
)
| # This will open a new browser tab, and block until you log in. |
| con <- adbc_connection_init(db) |
| |
| .. tab-item:: Go |
| :sync: go |
| |
| .. code-block:: go |
| |
| import ( |
| "context" |
| |
| "github.com/apache/arrow-adbc/go/adbc" |
| "github.com/apache/arrow-adbc/go/adbc/driver/snowflake" |
| ) |
| |
| func main() { |
| var drv snowflake.Driver |
| db, err := drv.NewDatabase(map[string]string{ |
| snowflake.OptionAccount: "foobar", |
| snowflake.OptionAuthType: snowflake.OptionValueAuthExternalBrowser, |
| adbc.OptionKeyUsername: "jdoe@example.com", |
| }) |
| if err != nil { |
| // handle error |
| } |
| defer db.Close() |
| |
| cnxn, err := db.Open(context.Background()) |
| if err != nil { |
| // handle error |
| } |
| defer cnxn.Close() |
| } |
| |
| |
| |
| Bulk Ingestion |
| -------------- |
| |
| Bulk ingestion is supported. The mapping from Arrow types to Snowflake types |
| is provided below. |
| |
| Bulk ingestion is implemented by writing Arrow data to Parquet file(s) and uploading (via PUT) to a temporary internal stage. |
| One or more COPY queries are executed in order to load the data into the target table. |
| |
In order for the driver to leverage this temporary stage, the user must have
the `CREATE STAGE <https://docs.snowflake.com/en/sql-reference/sql/create-stage>`_
privilege on the schema. In addition, the current database and schema for the
session must be set. If these are not set, the ``CREATE TEMPORARY STAGE`` command
executed by the driver can fail with the following error:
| |
| .. code-block:: sql |
| |
| CREATE TEMPORARY STAGE ADBC$BIND FILE_FORMAT = (TYPE = PARQUET USE_LOGICAL_TYPE = TRUE BINARY_AS_TEXT = FALSE) |
| CANNOT perform CREATE STAGE. This session does not have a current schema. Call 'USE SCHEMA' or use a qualified name. |
| |
| The following informal benchmark demonstrates expected performance using default ingestion settings:: |
| |
| Running on GCP e2-standard-4 (4 vCPU, 16GB RAM) |
| Snowflake warehouse size M, same GCP region as Snowflake account |
| Default ingestion settings |
| |
| TPC-H Lineitem (16 Columns): |
| Scale Factor 1 (6M Rows): 9.5s |
| Scale Factor 10 (60M Rows): 45s |
| |
The default ingestion settings should be well balanced for many real-world
configurations. If required, performance and resource usage may be tuned with
the following options on the :cpp:class:`AdbcStatement` object (a Python
sketch follows the list):
| |
``adbc.snowflake.rpc.ingest_writer_concurrency``
    Number of Parquet files to write in parallel. The default attempts to
    maximize workers based on the number of logical cores detected, but may
    need to be adjusted when running in a constrained environment. If set to
    0, the default value is used. Cannot be negative.

``adbc.snowflake.rpc.ingest_upload_concurrency``
    Number of Parquet files to upload in parallel. Greater concurrency can
    smooth out TCP congestion and help make use of available network
    bandwidth, but will increase memory utilization. Default is 8. If set to
    0, the default value is used. Cannot be negative.

``adbc.snowflake.rpc.ingest_copy_concurrency``
    Maximum number of COPY operations to run concurrently. Bulk ingestion
    performance is optimized by executing COPY queries while files are still
    being uploaded. Snowflake COPY speed scales with warehouse size, so
    smaller warehouses may benefit from setting this value higher to ensure
    long-running COPY queries do not block newly uploaded files from being
    loaded. Default is 4. If set to 0, only a single COPY query is executed
    as part of ingestion, once all files have finished uploading. Cannot be
    negative.

``adbc.snowflake.rpc.ingest_target_file_size``
    Approximate size of Parquet files written during ingestion. The actual
    size will be slightly larger, depending on the size of the
    footer/metadata. Default is 10 MB. If set to 0, file size is unlimited.
    Cannot be negative.
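
As an illustration, a bulk ingest from Python might look like the following.
``adbc_ingest`` is part of the ADBC DB-API layer; the
``adbc_statement.set_options`` call is an assumption about how statement
options are reached from a cursor, and the table name and option value are
placeholders:

.. code-block:: python

   import pyarrow as pa

   import adbc_driver_snowflake.dbapi

   data = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})

   with adbc_driver_snowflake.dbapi.connect("<snowflake uri>") as conn:
       cur = conn.cursor()
       # Assumption: statement options are set on the wrapped AdbcStatement.
       cur.adbc_statement.set_options(
           **{"adbc.snowflake.rpc.ingest_copy_concurrency": "8"}
       )
       # Writes Parquet files, PUTs them to a temporary internal stage, then
       # COPYs them into the newly created target table.
       cur.adbc_ingest("my_table", data, mode="create")
       conn.commit()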
| |
| Partitioned Result Sets |
| ----------------------- |
| |
| Partitioned result sets are not currently supported. |
| |
| Performance |
| ----------- |
| |
| When querying Snowflake data, results are potentially fetched in parallel from multiple endpoints. |
| A limited number of batches are queued per endpoint, though data is always |
| returned to the client in the order of the endpoints. |
| |
Two options control the buffering and concurrency behavior of result
fetching. They can only be set on the :cpp:class:`AdbcStatement` object (see
the sketch after this list):
| |
| ``adbc.rpc.result_queue_size`` |
| The number of batches to queue in the record reader. Defaults to 200. |
| Must be an integer > 0. |
| |
``adbc.snowflake.rpc.prefetch_concurrency``
    The number of concurrent streams being fetched from Snowflake at a time.
    Defaults to 10. Must be an integer > 0.
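
For example, both options might be set from Python before executing a large
query. This sketch assumes statement options are reachable through the
cursor's underlying ``adbc_statement``; the query is a placeholder:

.. code-block:: python

   import adbc_driver_snowflake.dbapi

   with adbc_driver_snowflake.dbapi.connect("<snowflake uri>") as conn:
       cur = conn.cursor()
       # Assumption: options are set on the wrapped AdbcStatement.
       cur.adbc_statement.set_options(**{
           "adbc.rpc.result_queue_size": "400",
           "adbc.snowflake.rpc.prefetch_concurrency": "4",
       })
       cur.execute("SELECT * FROM big_table")  # placeholder query
       # Stream the results as Arrow record batches.
       reader = cur.fetch_record_batch()
       for batch in reader:
           pass  # process each batch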
| |
| Transactions |
| ------------ |
| |
| Transactions are supported. Keep in mind that Snowflake transactions will |
| implicitly commit if any DDL statements are run, such as ``CREATE TABLE``. |
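
For example, in the Python DB-API bindings autocommit is disabled by default,
so explicit ``commit()`` or ``rollback()`` calls delimit transactions (the
statements below are placeholders):

.. code-block:: python

   import adbc_driver_snowflake.dbapi

   with adbc_driver_snowflake.dbapi.connect("<snowflake uri>") as conn:
       cur = conn.cursor()
       cur.execute("INSERT INTO my_table VALUES (1)")  # placeholder
       cur.execute("INSERT INTO my_table VALUES (2)")  # placeholder
       conn.commit()  # both inserts become visible atomically
       # Note: running DDL such as CREATE TABLE here would commit implicitly.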
| |
| Client Options |
| -------------- |
| |
The options used for creating a Snowflake database connection can be customized.
These options map 1:1 with the Snowflake `Config object
<https://pkg.go.dev/github.com/snowflakedb/gosnowflake#Config>`_ (a connection
example follows this list).
| |
| ``adbc.snowflake.sql.db`` |
| The database this session should default to using. |
| |
| ``adbc.snowflake.sql.schema`` |
| The schema this session should default to using. |
| |
| ``adbc.snowflake.sql.warehouse`` |
| The warehouse this session should default to using. |
| |
| ``adbc.snowflake.sql.role`` |
| The role that should be used for authentication. |
| |
| ``adbc.snowflake.sql.region`` |
| The Snowflake region to use for constructing the connection URI. |
| |
| ``adbc.snowflake.sql.account`` |
| The Snowflake account that should be used for authentication and building the |
| connection URI. |
| |
``adbc.snowflake.sql.uri.protocol``
    This should be either ``http`` or ``https``.
| |
| ``adbc.snowflake.sql.uri.port`` |
| The port to use for constructing the URI for connection. |
| |
| ``adbc.snowflake.sql.uri.host`` |
| The explicit host to use for constructing the URL to connect to. |
| |
``adbc.snowflake.sql.auth_type``
    Allows specifying alternative types of authentication. The allowed values are:

    - ``auth_snowflake``: general username/password authentication (the default).
    - ``auth_oauth``: use OAuth authentication for the Snowflake connection.
    - ``auth_ext_browser``: use an external browser to access a federated
      identity provider and perform SSO authentication.
    - ``auth_okta``: use a native Okta URL to perform SSO authentication via Okta.
    - ``auth_jwt``: use a provided JWT to perform authentication.
    - ``auth_mfa``: use a username and password with MFA.
| |
``adbc.snowflake.sql.client_option.auth_token``
    If using OAuth or another form of token-based authentication, this option
    explicitly specifies the token to use for the connection.
| |
| ``adbc.snowflake.sql.client_option.okta_url`` |
| If using ``auth_okta``, this option is required in order to specify the |
| Okta URL to connect to for SSO authentication. |
| |
``adbc.snowflake.sql.client_option.login_timeout``
    Specify login retry timeout *excluding* network roundtrip and reading
    http responses. Value should be formatted as described `here
    <https://pkg.go.dev/time#ParseDuration>`_, such as ``300ms``, ``1.5s``
    or ``1m30s``. Even though negative values are accepted, the absolute
    value of such a duration will be used.

``adbc.snowflake.sql.client_option.request_timeout``
    Specify request retry timeout *excluding* network roundtrip and reading
    http responses. Value should be formatted as described `here
    <https://pkg.go.dev/time#ParseDuration>`_, such as ``300ms``, ``1.5s``
    or ``1m30s``. Even though negative values are accepted, the absolute
    value of such a duration will be used.

``adbc.snowflake.sql.client_option.jwt_expire_timeout``
    JWT expiration will occur after this timeout. Value should be formatted
    as described `here <https://pkg.go.dev/time#ParseDuration>`_, such as
    ``300ms``, ``1.5s`` or ``1m30s``. Even though negative values are
    accepted, the absolute value of such a duration will be used.

``adbc.snowflake.sql.client_option.client_timeout``
    Specify timeout for network roundtrip and reading http responses. Value
    should be formatted as described `here
    <https://pkg.go.dev/time#ParseDuration>`_, such as ``300ms``, ``1.5s``
    or ``1m30s``. Even though negative values are accepted, the absolute
    value of such a duration will be used.
| |
| ``adbc.snowflake.sql.client_option.app_name`` |
| Allows specifying the Application Name to Snowflake for the connection. |
| |
| ``adbc.snowflake.sql.client_option.tls_skip_verify`` |
| Disable verification of the server's TLS certificate. Value should be ``true`` |
| or ``false``. |
| |
| ``adbc.snowflake.sql.client_option.ocsp_fail_open_mode`` |
| Control the fail open mode for OCSP. Default is ``true``. Value should |
| be either ``true`` or ``false``. |
| |
| ``adbc.snowflake.sql.client_option.keep_session_alive`` |
| Enable the session to persist even after the connection is closed. Value |
| should be either ``true`` or ``false``. |
| |
| ``adbc.snowflake.sql.client_option.jwt_private_key`` |
| Specify the RSA private key which should be used to sign the JWT for |
| authentication. This should be a path to a file containing a PKCS1 |
| private key to be read in and parsed. Commonly encoded in PEM blocks |
| of type "RSA PRIVATE KEY". |
| |
``adbc.snowflake.sql.client_option.jwt_private_key_pkcs8_value``
    Parses an encrypted or unencrypted PKCS #8 private key without having to
    read it from the file system. If the key is encrypted, the
    ``adbc.snowflake.sql.client_option.jwt_private_key_pkcs8_password`` value
    is required and is used to decrypt it.
| |
| ``adbc.snowflake.sql.client_option.jwt_private_key_pkcs8_password`` |
| Passcode to use when passing an encrypted PKCS #8 value. |
| |
``adbc.snowflake.sql.client_option.disable_telemetry``
    The Snowflake driver collects telemetry information, which can be
    disabled by setting this to ``true``. Value should be either ``true``
    or ``false``.
| |
``adbc.snowflake.sql.client_option.tracing``
    Set the logging level.
| |
| ``adbc.snowflake.sql.client_option.cache_mfa_token`` |
| When ``true``, the MFA token is cached in the credential manager. Defaults |
| to ``true`` on Windows/OSX, ``false`` on Linux. |
| |
| ``adbc.snowflake.sql.client_option.store_temp_creds`` |
| When ``true``, the ID token is cached in the credential manager. Defaults |
| to ``true`` on Windows/OSX, ``false`` on Linux. |
| |
``adbc.snowflake.sql.client_option.use_high_precision``
    When ``true``, fixed-point Snowflake columns with the type ``NUMBER``
    will be returned as ``Decimal128`` type Arrow columns, using the precision
    and scale of the ``NUMBER`` type. When ``false``, ``NUMBER`` columns
    with a scale of 0 will be returned as ``Int64`` typed Arrow columns, and
    columns with a non-zero scale will be returned as ``Float64`` typed Arrow
    columns. The default is ``true``.
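
For example, several of these options can be combined in a single Python
connection (all values below are placeholders):

.. code-block:: python

   import adbc_driver_snowflake.dbapi

   with adbc_driver_snowflake.dbapi.connect(db_kwargs={
       "adbc.snowflake.sql.account": "foobar",
       "username": "jdoe",
       "password": "hunter2",
       "adbc.snowflake.sql.warehouse": "MY_WH",
       "adbc.snowflake.sql.role": "ANALYST",
       "adbc.snowflake.sql.client_option.login_timeout": "1m30s",
       "adbc.snowflake.sql.client_option.use_high_precision": "false",
   }) as conn:
       pass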
| |
| |
| Metadata |
| -------- |
| |
| When calling :cpp:func:`AdbcConnectionGetTableSchema`, the returned Arrow Schema |
| will contain metadata on each field: |
| |
``DATA_TYPE``
    This will be a string containing the raw Snowflake data type of this
    column.

``PRIMARY_KEY``
    This will be either ``Y`` or ``N`` to indicate whether the column is a
    primary key.
| |
In addition, the schema on the stream of results from a query will contain
the following metadata keys on each field (a Python sketch covering both
sets of metadata follows the list):
| |
| ``logicalType`` |
| The Snowflake logical type of this column. Will be one of ``fixed``, |
| ``real``, ``text``, ``date``, ``variant``, ``timestamp_ltz``, ``timestamp_ntz``, |
| ``timestamp_tz``, ``object``, ``array``, ``binary``, ``time``, ``boolean``. |
| |
| ``precision`` |
| An integer representing the Snowflake precision of the field. |
| |
| ``scale`` |
| An integer representing the Snowflake scale of the values in this field. |
| |
| ``charLength`` |
| If a text field, this will be equivalent to the ``VARCHAR(#)`` parameter ``#``. |
| |
| ``byteLength`` |
| Will contain the length, in bytes, of the raw data sent back from Snowflake |
| regardless of the type of the field in Arrow. |
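
For example, both kinds of metadata can be inspected from Python. This sketch
assumes the DB-API ``adbc_get_table_schema`` helper; the table name and query
are placeholders:

.. code-block:: python

   import adbc_driver_snowflake.dbapi

   with adbc_driver_snowflake.dbapi.connect("<snowflake uri>") as conn:
       # Table schema metadata (DATA_TYPE / PRIMARY_KEY).
       schema = conn.adbc_get_table_schema("MY_TABLE")  # placeholder table
       for field in schema:
           meta = field.metadata or {}  # bytes-to-bytes mapping in PyArrow
           print(field.name, meta.get(b"DATA_TYPE"), meta.get(b"PRIMARY_KEY"))

       # Result-stream metadata (logicalType, precision, scale, ...).
       cur = conn.cursor()
       cur.execute("SELECT * FROM MY_TABLE")  # placeholder query
       reader = cur.fetch_record_batch()
       for field in reader.schema:
           meta = field.metadata or {}
           print(field.name, meta.get(b"logicalType"))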
| |
| Type Support |
| ------------ |
| |
Because Snowflake types do not necessarily map 1-to-1 to Arrow types, the
following is what should be expected when requesting data. Any conversions
indicated are done to ensure consistency of the stream of record batches.
(A sketch demonstrating the ``use_high_precision`` behavior follows the
table.)
| |
| .. list-table:: |
| :header-rows: 1 |
| |
| * - Snowflake Type |
| - Arrow Type |
| - Notes |
| |
* - integral types
  - decimal128(38, 0) or int64
  - All integral types in Snowflake are stored as numbers (``NUMBER(38, 0)``)
    for which neither precision nor scale can be specified. By default they
    are returned as ``decimal128(38, 0)``; when
    ``adbc.snowflake.sql.client_option.use_high_precision`` is ``false``,
    they are returned as ``int64``.
| |
| * - float/double |
| - float64 |
| - Snowflake does not distinguish between float or double. Both are 64-bit values. |
| |
* - decimal/numeric
  - decimal128
  - Snowflake will respect the precision/scale of the Arrow type. See the
    ``adbc.snowflake.sql.client_option.use_high_precision`` option for
    exceptions to this behavior.
| |
| * - time |
| - time64[ns] |
| - For ingestion, time32 can also be used. |
| |
| * - date |
| - date32 |
| - For ingestion, date64 can also be used. |
| |
* - | timestamp_ltz
    | timestamp_ntz
    | timestamp_tz
  - timestamp[ns]
  - The local time zone will be used, except for ``timestamp_ntz``, which
    does not represent an instant; in that case no time zone will be present
    in the type. Physical values will be UTC-normalized.
| |
| * - | variant |
| | object |
| | array |
| - string |
| - Snowflake does not provide information about nested |
| types. Values will be strings in a format similar to JSON that |
| can be parsed. The Arrow type will contain a metadata key |
| ``logicalType`` with the Snowflake field type. Arrow Struct and |
| Map types will be stored as objects when ingested. List types will |
| be stored as arrays. |
| |
| * - | geography |
| | geometry |
| - string |
| - There is no current canonical Arrow (extension) type for these |
| types, so they will be returned as the string values that |
| Snowflake provides. |
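
As an illustration of the ``use_high_precision`` behavior described above,
here is a hypothetical sketch comparing how the same ``NUMBER(38, 0)`` value
surfaces under each setting (the URI is a placeholder):

.. code-block:: python

   import adbc_driver_snowflake.dbapi

   for high_precision in ("true", "false"):
       with adbc_driver_snowflake.dbapi.connect(
           "<snowflake uri>",
           db_kwargs={
               "adbc.snowflake.sql.client_option.use_high_precision": high_precision,
           },
       ) as conn:
           cur = conn.cursor()
           cur.execute("SELECT 1 AS n")  # NUMBER(38, 0) in Snowflake
           tbl = cur.fetch_arrow_table()
           # Expect decimal128(38, 0) for "true" and int64 for "false".
           print(high_precision, tbl.schema.field(0).type)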