blob: f1d301bac3a1e0afb08d19a5542731eec05d76a2 [file] [log] [blame]
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Define custom functions\n",
"\n",
"This function loads custom Python functions into a table for use by deep learning algorithms.\n",
"\n",
"Custom functions can be useful if, for example, you need loss functions or metrics that are not built into the standard libraries. The functions to be loaded must be in the form of serialized Python objects created using Dill, which extends Python's pickle module to the majority of the built-in Python types.\n",
"\n",
"Custom functions are also used to return top k categorical accuracy rate in the case that you want a different k value than the default from Keras. This module includes a helper function to create the custom function automatically for a specified k.\n",
"\n",
"This method was added in MADlib 1.18.0.\n",
"\n",
"## <em>Warning</em>\n",
"<em>For security reasons there are controls on custom functions in MADlib. You must be a superuser to create custom functions because they could theoretically allow execution of any untrusted Python code. Regular users with MADlib USAGE permission can use existing custom functions but cannot create new ones or update existing ones.</em>\n",
"\n",
"## Table of contents\n",
"\n",
"<a href=\"#load_psycopg2\">1. Load object using psycopg2</a>\n",
"\n",
"<a href=\"#load_plpython\">2. Load object using a PL/Python function</a>\n",
"\n",
"<a href=\"#delete_object\">3. Delete object</a>\n",
"\n",
"<a href=\"#top_k\">4. Top k accuracy function</a>"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%load_ext sql"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Greenplum Database 5.x on GCP - via tunnel\n",
"%sql postgresql://gpadmin@localhost:8000/madlib\n",
" \n",
"# PostgreSQL local\n",
"#%sql postgresql://fmcquillan@localhost:5432/madlib"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>version</th>\n",
" </tr>\n",
" <tr>\n",
" <td>MADlib version: 1.18.0-dev, git revision: rel/v1.17.0-89-g14a91ce, cmake configuration time: Fri Mar 5 23:08:38 UTC 2021, build type: release, build system: Linux-3.10.0-1160.11.1.el7.x86_64, C compiler: gcc 4.8.5, C++ compiler: g++ 4.8.5</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(u'MADlib version: 1.18.0-dev, git revision: rel/v1.17.0-89-g14a91ce, cmake configuration time: Fri Mar 5 23:08:38 UTC 2021, build type: release, build system: Linux-3.10.0-1160.11.1.el7.x86_64, C compiler: gcc 4.8.5, C++ compiler: g++ 4.8.5',)]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql select madlib.version();\n",
"#%sql select version();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"load_psycopg2\"></a>\n",
"# 1. Load object using psycopg2\n",
"Psycopg is a PostgreSQL database adapter for the Python programming language. Note need to use the psycopg2.Binary() method to pass as bytes."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# import database connector psycopg2 and create connection cursor\n",
"import psycopg2 as p2\n",
"conn = p2.connect('postgresql://gpadmin@localhost:8000/madlib')\n",
"cur = conn.cursor()\n",
"\n",
"# import Dill and define functions\n",
"import dill\n",
"\n",
"# custom loss\n",
"def squared_error(y_true, y_pred):\n",
" import keras.backend as K \n",
" return K.square(y_pred - y_true)\n",
"pb_squared_error=dill.dumps(squared_error)\n",
"\n",
"# custom metric\n",
"def rmse(y_true, y_pred):\n",
" import keras.backend as K \n",
" return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))\n",
"pb_rmse=dill.dumps(rmse)\n",
"\n",
"# call load function\n",
"cur.execute(\"DROP TABLE IF EXISTS madlib.custom_function_table\")\n",
"cur.execute(\"SELECT madlib.load_custom_function('custom_function_table', %s,'squared_error', 'squared error')\", [p2.Binary(pb_squared_error)])\n",
"cur.execute(\"SELECT madlib.load_custom_function('custom_function_table', %s,'rmse', 'root mean square error')\", [p2.Binary(pb_rmse)])\n",
"conn.commit()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>description</th>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>squared_error</td>\n",
" <td>squared error</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>rmse</td>\n",
" <td>root mean square error</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(1, u'squared_error', u'squared error'),\n",
" (2, u'rmse', u'root mean square error')]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT id, name, description FROM madlib.custom_function_table ORDER BY id;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"load_plpython\"></a>\n",
"# 2. Load object using a PL/Python function"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n",
"Done.\n"
]
},
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"CREATE OR REPLACE FUNCTION custom_function_squared_error()\n",
"RETURNS BYTEA AS\n",
"$$\n",
"import dill\n",
"def squared_error(y_true, y_pred):\n",
" import tensorflow.keras.backend as K\n",
" return K.square(y_pred - y_true)\n",
"pb_squared_error=dill.dumps(squared_error)\n",
"return pb_squared_error\n",
"$$ language plpythonu;\n",
"CREATE OR REPLACE FUNCTION custom_function_rmse()\n",
"RETURNS BYTEA AS\n",
"$$\n",
"import dill\n",
"def rmse(y_true, y_pred):\n",
" import tensorflow.keras.backend as K\n",
" return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))\n",
"pb_rmse=dill.dumps(rmse)\n",
"return pb_rmse\n",
"$$ language plpythonu;"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n",
"1 rows affected.\n",
"1 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>load_custom_function</th>\n",
" </tr>\n",
" <tr>\n",
" <td></td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[('',)]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS madlib.custom_function_table;\n",
"SELECT madlib.load_custom_function('custom_function_table', \n",
" custom_function_squared_error(), \n",
" 'squared_error', \n",
" 'squared error');\n",
"\n",
"SELECT madlib.load_custom_function('custom_function_table', \n",
" custom_function_rmse(), \n",
" 'rmse', \n",
" 'root mean square error');"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>description</th>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>squared_error</td>\n",
" <td>squared error</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>rmse</td>\n",
" <td>root mean square error</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(1, u'squared_error', u'squared error'),\n",
" (2, u'rmse', u'root mean square error')]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT id, name, description FROM madlib.custom_function_table ORDER BY id;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"delete_object\"></a>\n",
"# 3. Delete object\n",
"Delete by id:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 rows affected.\n",
"1 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>description</th>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>rmse</td>\n",
" <td>root mean square error</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(2, u'rmse', u'root mean square error')]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT madlib.delete_custom_function( 'custom_function_table', 1);\n",
"SELECT id, name, description FROM madlib.custom_function_table ORDER BY id;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Delete by name:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>delete_custom_function</th>\n",
" </tr>\n",
" <tr>\n",
" <td></td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[('',)]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT madlib.delete_custom_function( 'custom_function_table', 'rmse');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since this was the last object in the table, if you delete it then the table will also be dropped."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"top_k\"></a>\n",
"# 4. Top k accuracy function\n",
"Load top 3 accuracy function followed by a top 10 accuracy function:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n",
"1 rows affected.\n",
"1 rows affected.\n",
"2 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>id</th>\n",
" <th>name</th>\n",
" <th>description</th>\n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>top_3_accuracy</td>\n",
" <td>returns top_3_accuracy</td>\n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>top_10_accuracy</td>\n",
" <td>returns top_10_accuracy</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(1, u'top_3_accuracy', u'returns top_3_accuracy'),\n",
" (2, u'top_10_accuracy', u'returns top_10_accuracy')]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS madlib.custom_function_table;\n",
"\n",
"SELECT madlib.load_top_k_accuracy_function('custom_function_table',\n",
" 3);\n",
"\n",
"SELECT madlib.load_top_k_accuracy_function('custom_function_table',\n",
" 10);\n",
"\n",
"SELECT id, name, description FROM madlib.custom_function_table ORDER BY id;"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.16"
}
},
"nbformat": 4,
"nbformat_minor": 1
}