| { |
| "cells": [ |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "# Transfer Learning Using Keras and MADlib\n", |
| "\n", |
| "This is a transfer learning example based on https://keras.io/examples/mnist_transfer_cnn/ \n", |
| "\n", |
| "To load images into tables we use the script called <em>madlib_image_loader.py</em> located at https://github.com/apache/madlib-site/tree/asf-site/community-artifacts/Deep-learning which uses the Python Imaging Library so supports multiple formats http://www.pythonware.com/products/pil/\n", |
| "\n", |
| "## Table of contents\n", |
| "<a href=\"#import_libraries\">1. Import libraries</a>\n", |
| "\n", |
| "<a href=\"#load_and_prepare_data\">2. Load and prepare data</a>\n", |
| "\n", |
| "<a href=\"#image_preproc\">3. Call image preprocessor</a>\n", |
| "\n", |
| "<a href=\"#define_and_load_model\">4. Define and load model architecture</a>\n", |
| "\n", |
| "<a href=\"#train\">5. Train</a>\n", |
| "\n", |
| "<a href=\"#transfer_learning\">6. Transfer learning</a>" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 1, |
| "metadata": { |
| "scrolled": true |
| }, |
| "outputs": [], |
| "source": [ |
| "%load_ext sql" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 2, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# Greenplum Database 5.x on GCP - via tunnel\n", |
| "%sql postgresql://gpadmin@localhost:8000/madlib\n", |
| " \n", |
| "# PostgreSQL local\n", |
| "#%sql postgresql://fmcquillan@localhost:5432/madlib" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 3, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "name": "stdout", |
| "output_type": "stream", |
| "text": [ |
| "1 rows affected.\n" |
| ] |
| }, |
| { |
| "data": { |
| "text/html": [ |
| "<table>\n", |
| " <tr>\n", |
| " <th>version</th>\n", |
| " </tr>\n", |
| " <tr>\n", |
| " <td>MADlib version: 1.18.0-dev, git revision: rel/v1.17.0-89-g14a91ce, cmake configuration time: Fri Mar 5 23:08:38 UTC 2021, build type: release, build system: Linux-3.10.0-1160.11.1.el7.x86_64, C compiler: gcc 4.8.5, C++ compiler: g++ 4.8.5</td>\n", |
| " </tr>\n", |
| "</table>" |
| ], |
| "text/plain": [ |
| "[(u'MADlib version: 1.18.0-dev, git revision: rel/v1.17.0-89-g14a91ce, cmake configuration time: Fri Mar 5 23:08:38 UTC 2021, build type: release, build system: Linux-3.10.0-1160.11.1.el7.x86_64, C compiler: gcc 4.8.5, C++ compiler: g++ 4.8.5',)]" |
| ] |
| }, |
| "execution_count": 3, |
| "metadata": {}, |
| "output_type": "execute_result" |
| } |
| ], |
| "source": [ |
| "%sql select madlib.version();\n", |
| "#%sql select version();" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "<a id=\"import_libraries\"></a>\n", |
| "# 1. Import libraries\n", |
| "From https://keras.io/examples/mnist_transfer_cnn/ import libraries and define some params" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 4, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "from __future__ import print_function\n", |
| "\n", |
| "import datetime\n", |
| "from tensorflow import keras\n", |
| "from tensorflow.keras.datasets import mnist\n", |
| "from tensorflow.keras.models import Sequential\n", |
| "from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten\n", |
| "from tensorflow.keras.layers import Conv2D, MaxPooling2D\n", |
| "from tensorflow.keras import backend as K\n", |
| "\n", |
| "now = datetime.datetime.now\n", |
| "\n", |
| "batch_size = 128\n", |
| "num_classes = 5\n", |
| "epochs = 5\n", |
| "\n", |
| "# input image dimensions\n", |
| "img_rows, img_cols = 28, 28\n", |
| "# number of convolutional filters to use\n", |
| "filters = 32\n", |
| "# size of pooling area for max pooling\n", |
| "pool_size = 2\n", |
| "# convolution kernel size\n", |
| "kernel_size = 3\n", |
| "\n", |
| "if K.image_data_format() == 'channels_first':\n", |
| " input_shape = (1, img_rows, img_cols)\n", |
| "else:\n", |
| " input_shape = (img_rows, img_cols, 1)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Others needed in this workbook" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 5, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "import pandas as pd\n", |
| "import numpy as np" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "<a id=\"load_and_prepare_data\"></a>\n", |
| "# 2. Load and prepare data\n", |
| "\n", |
| "First load MNIST data from Keras, consisting of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images." |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 6, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "name": "stdout", |
| "output_type": "stream", |
| "text": [ |
| "(4861, 28, 28)\n", |
| "(4861, 28, 28, 1)\n" |
| ] |
| } |
| ], |
| "source": [ |
| "# the data, split between train and test sets\n", |
| "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n", |
| "\n", |
| "# create two datasets one with digits below 5 and one with 5 and above\n", |
| "x_train_lt5 = x_train[y_train < 5]\n", |
| "y_train_lt5 = y_train[y_train < 5]\n", |
| "x_test_lt5 = x_test[y_test < 5]\n", |
| "y_test_lt5 = y_test[y_test < 5]\n", |
| "\n", |
| "x_train_gte5 = x_train[y_train >= 5]\n", |
| "y_train_gte5 = y_train[y_train >= 5] - 5\n", |
| "x_test_gte5 = x_test[y_test >= 5]\n", |
| "y_test_gte5 = y_test[y_test >= 5] - 5\n", |
| "\n", |
| "# reshape to match model architecture\n", |
| "print(x_test_gte5.shape)\n", |
| "x_train_lt5=x_train_lt5.reshape(len(x_train_lt5), *input_shape)\n", |
| "x_test_lt5 = x_test_lt5.reshape(len(x_test_lt5), *input_shape)\n", |
| "x_train_gte5=x_train_gte5.reshape(len(x_train_gte5), *input_shape)\n", |
| "x_test_gte5 = x_test_gte5.reshape(len(x_test_gte5), *input_shape)\n", |
| "print(x_test_gte5.shape)" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Load datasets into tables using image loader scripts called <em>madlib_image_loader.py</em> located at https://github.com/apache/madlib-site/tree/asf-site/community-artifacts/Deep-learning" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 7, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# MADlib tools directory\n", |
| "import sys\n", |
| "import os\n", |
| "madlib_site_dir = '/Users/fmcquillan/Documents/Product/MADlib/Demos/data'\n", |
| "sys.path.append(madlib_site_dir)\n", |
| "\n", |
| "# Import image loader module\n", |
| "from madlib_image_loader import ImageLoader, DbCredentials" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 8, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# Specify database credentials, for connecting to db\n", |
| "#db_creds = DbCredentials(user='gpadmin',\n", |
| "# host='35.239.240.26',\n", |
| "# port='5432',\n", |
| "# password='')\n", |
| "\n", |
| "db_creds = DbCredentials(user='gpadmin',\n", |
| " host='localhost',\n", |
| " port='8000',\n", |
| " password='')" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 9, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# Initialize ImageLoader (increase num_workers to run faster)\n", |
| "iloader = ImageLoader(num_workers=5, db_creds=db_creds)" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": 10, |
| "metadata": {}, |
| "outputs": [ |
| { |
| "name": "stdout", |
| "output_type": "stream", |
| "text": [ |
| "Done.\n" |
| ] |
| }, |
| { |
| "data": { |
| "text/plain": [ |
| "[]" |
| ] |
| }, |
| "execution_count": 10, |
| "metadata": {}, |
| "output_type": "execute_result" |
| }, |
| { |
| "name": "stdout", |
| "output_type": "stream", |
| "text": [ |
| "MainProcess: Connected to madlib db.\n", |
| "Executing: CREATE TABLE train_lt5 (id SERIAL, x REAL[], y TEXT[])\n", |
| "CREATE TABLE\n", |
| "Created table train_lt5 in madlib db\n", |
| "Spawning 5 workers...\n", |
| "Initializing PoolWorker-1 [pid 84275]\n", |
| "PoolWorker-1: Created temporary directory /tmp/madlib_5TU8FybuWQ\n", |
| "Initializing PoolWorker-2 [pid 84276]\n", |
| "PoolWorker-2: Created temporary directory /tmp/madlib_LjDRu2RVLy\n", |
| "Initializing PoolWorker-3 [pid 84277]\n", |
| "PoolWorker-3: Created temporary directory /tmp/madlib_ksuUrx0mOn\n", |
| "Initializing PoolWorker-4 [pid 84278]\n", |
| "PoolWorker-4: Created temporary directory /tmp/madlib_f2SlPjS13H\n", |
| "PoolWorker-5: Created temporary directory /tmp/madlib_8GA0SlnXzj\n", |
| "Initializing PoolWorker-5 [pid 84279]\n", |
| "PoolWorker-4: Connected to madlib db.\n", |
| "PoolWorker-5: Connected to madlib db.\n", |
| "PoolWorker-2: Connected to madlib db.\n", |
| "PoolWorker-1: Connected to madlib db.\n", |
| "PoolWorker-3: Connected to madlib db.\n", |
| "PoolWorker-5: Wrote 1000 images to /tmp/madlib_8GA0SlnXzj/train_lt50000.tmp\n", |
| "PoolWorker-2: Wrote 1000 images to /tmp/madlib_LjDRu2RVLy/train_lt50000.tmp\n", |
| "PoolWorker-4: Wrote 1000 images to /tmp/madlib_f2SlPjS13H/train_lt50000.tmp\n", |
| "PoolWorker-1: Wrote 1000 images to /tmp/madlib_5TU8FybuWQ/train_lt50000.tmp\n", |
| "PoolWorker-3: Wrote 1000 images to /tmp/madlib_ksuUrx0mOn/train_lt50000.tmp\n", |
| "PoolWorker-5: Removed temporary directory /tmp/madlib_8GA0SlnXzj\n", |
| "\n", |
| "Error in PoolWorker-5 while loading images\n", |
| "Traceback (most recent call last):\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 184, in _call_np_worker\n", |
| " iloader._write_tmp_file_and_load(data)\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 396, in _write_tmp_file_and_load\n", |
| " self._copy_into_db(f, data)\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 362, in _copy_into_db\n", |
| " self.db_cur.copy_from(f, table_name, sep='|', columns=['x','y'])\n", |
| "BadCopyFileFormat: array value must start with \"{\" or dimension information (seg0 10.128.0.41:40000 pid=18042)\n", |
| "CONTEXT: COPY train_lt5, line 1, column {{{0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}...\n", |
| "\n", |
| "\n", |
| "PoolWorker-5: Can't find temporary directory... exiting.\n", |
| "PoolWorker-5: Can't find temporary directory... exiting.\n", |
| "PoolWorker-2: Removed temporary directory /tmp/madlib_LjDRu2RVLy\n", |
| "PoolWorker-1: Removed temporary directory /tmp/madlib_5TU8FybuWQ\n", |
| "\n", |
| "Error in PoolWorker-1 while loading images\n", |
| "Error in PoolWorker-2 while loading images\n", |
| "\n", |
| "Traceback (most recent call last):\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 184, in _call_np_worker\n", |
| " iloader._write_tmp_file_and_load(data)\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 396, in _write_tmp_file_and_load\n", |
| " self._copy_into_db(f, data)\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 362, in _copy_into_db\n", |
| " self.db_cur.copy_from(f, table_name, sep='|', columns=['x','y'])\n", |
| "BadCopyFileFormat: array value must start with \"{\" or dimension information (seg0 10.128.0.41:40000 pid=18054)\n", |
| "CONTEXT: COPY train_lt5, line 2, column {{{0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}...\n", |
| "\n", |
| "Traceback (most recent call last):\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 184, in _call_np_worker\n", |
| " iloader._write_tmp_file_and_load(data)\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 396, in _write_tmp_file_and_load\n", |
| " self._copy_into_db(f, data)\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 362, in _copy_into_db\n", |
| " self.db_cur.copy_from(f, table_name, sep='|', columns=['x','y'])\n", |
| "BadCopyFileFormat: array value must start with \"{\" or dimension information (seg0 10.128.0.41:40000 pid=18046)\n", |
| "CONTEXT: COPY train_lt5, line 1, column {{{0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}...\n", |
| "\n", |
| "\n", |
| "\n", |
| "PoolWorker-1: Can't find temporary directory... exiting.\n", |
| "PoolWorker-2: Can't find temporary directory... exiting.\n", |
| "PoolWorker-4: Removed temporary directory /tmp/madlib_f2SlPjS13H\n", |
| "PoolWorker-3: Removed temporary directory /tmp/madlib_ksuUrx0mOn\n", |
| "\n", |
| "Error in PoolWorker-4 while loading images\n", |
| "Error in PoolWorker-3 while loading images\n", |
| "\n", |
| "Traceback (most recent call last):\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 184, in _call_np_worker\n", |
| " iloader._write_tmp_file_and_load(data)\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 396, in _write_tmp_file_and_load\n", |
| " self._copy_into_db(f, data)\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 362, in _copy_into_db\n", |
| " self.db_cur.copy_from(f, table_name, sep='|', columns=['x','y'])\n", |
| "BadCopyFileFormat: array value must start with \"{\" or dimension information (seg0 10.128.0.41:40000 pid=18050)\n", |
| "CONTEXT: COPY train_lt5, line 2, column {{{0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}...\n", |
| "\n", |
| "Traceback (most recent call last):\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 184, in _call_np_worker\n", |
| " iloader._write_tmp_file_and_load(data)\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 396, in _write_tmp_file_and_load\n", |
| " self._copy_into_db(f, data)\n", |
| " File \"/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.py\", line 362, in _copy_into_db\n", |
| " self.db_cur.copy_from(f, table_name, sep='|', columns=['x','y'])\n", |
| "BadCopyFileFormat: array value must start with \"{\" or dimension information (seg0 10.128.0.41:40000 pid=18058)\n", |
| "CONTEXT: COPY train_lt5, line 1, column {{{0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}...\n", |
| "\n", |
| "\n", |
| "\n", |
| "PoolWorker-4: Can't find temporary directory... exiting.\n", |
| "PoolWorker-3: Can't find temporary directory... exiting.\n", |
| "PoolWorker-5: Can't find temporary directory... exiting.\n", |
| "PoolWorker-1: Can't find temporary directory... exiting.\n", |
| "PoolWorker-2: Can't find temporary directory... exiting.\n", |
| "PoolWorker-4: Can't find temporary directory... exiting.\n", |
| "PoolWorker-3: Can't find temporary directory... exiting.\n", |
| "PoolWorker-5: Can't find temporary directory... exiting.\n", |
| "PoolWorker-1: Can't find temporary directory... exiting.\n", |
| "PoolWorker-2: Can't find temporary directory... exiting.\n", |
| "PoolWorker-4: Can't find temporary directory... exiting.\n", |
| "PoolWorker-3: Can't find temporary directory... exiting.\n", |
| "PoolWorker-5: Can't find temporary directory... exiting.\n", |
| "PoolWorker-1: Can't find temporary directory... exiting.\n", |
| "PoolWorker-2: Can't find temporary directory... exiting.\n", |
| "PoolWorker-4: Can't find temporary directory... exiting.\n", |
| "PoolWorker-3: Can't find temporary directory... exiting.\n", |
| "5 workers terminated.\n" |
| ] |
| }, |
| { |
| "ename": "BadCopyFileFormat", |
| "evalue": "array value must start with \"{\" or dimension information (seg0 10.128.0.41:40000 pid=18042)\nCONTEXT: COPY train_lt5, line 1, column {{{0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}...\n", |
| "output_type": "error", |
| "traceback": [ |
| "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", |
| "\u001b[0;31mBadCopyFileFormat\u001b[0m Traceback (most recent call last)", |
| "\u001b[0;32m<ipython-input-10-3c25ba51b8fc>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;31m# Save images to temporary directories and load into database\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0miloader\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mload_dataset_from_np\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx_train_lt5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_train_lt5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'train_lt5'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mappend\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6\u001b[0m \u001b[0miloader\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mload_dataset_from_np\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx_test_lt5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_test_lt5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'test_lt5'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mappend\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0miloader\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mload_dataset_from_np\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx_train_gte5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_train_gte5\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'train_gte5'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mappend\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", |
| "\u001b[0;32m/Users/fmcquillan/Documents/Product/MADlib/Demos/data/madlib_image_loader.pyc\u001b[0m in \u001b[0;36mload_dataset_from_np\u001b[0;34m(self, data_x, data_y, table_name, append, label_datatype)\u001b[0m\n\u001b[1;32m 523\u001b[0m \u001b[0;32mexcept\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mException\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 524\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mterminate_workers\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 525\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 526\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 527\u001b[0m \u001b[0mend_time\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtime\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtime\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", |
| "\u001b[0;31mBadCopyFileFormat\u001b[0m: array value must start with \"{\" or dimension information (seg0 10.128.0.41:40000 pid=18042)\nCONTEXT: COPY train_lt5, line 1, column {{{0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}, {0}...\n" |
| ] |
| } |
| ], |
| "source": [ |
| "# Drop tables\n", |
| "%sql DROP TABLE IF EXISTS train_lt5, test_lt5, train_gte5, test_gte5\n", |
| "\n", |
| "# Save images to temporary directories and load into database\n", |
| "iloader.load_dataset_from_np(x_train_lt5, y_train_lt5, 'train_lt5', append=False)\n", |
| "iloader.load_dataset_from_np(x_test_lt5, y_test_lt5, 'test_lt5', append=False)\n", |
| "iloader.load_dataset_from_np(x_train_gte5, y_train_gte5, 'train_gte5', append=False)\n", |
| "iloader.load_dataset_from_np(x_test_gte5, y_test_gte5, 'test_gte5', append=False)" |
| ] |
| }, |
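| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "As a quick sanity check (a suggested addition, not part of the original example), count the rows loaded into each table:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "%%sql\n", |
| "SELECT 'train_lt5' AS table_name, count(*) FROM train_lt5\n", |
| "UNION ALL SELECT 'test_lt5', count(*) FROM test_lt5\n", |
| "UNION ALL SELECT 'train_gte5', count(*) FROM train_gte5\n", |
| "UNION ALL SELECT 'test_gte5', count(*) FROM test_gte5;" |
| ] |
| }, |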
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "<a id=\"image_preproc\"></a>\n", |
| "# 3. Call image preprocessor\n", |
| "\n", |
| "Transforms from one image per row to multiple images per row for batch optimization. Also normalizes and one-hot encodes.\n", |
| "\n", |
| "Training dataset < 5" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": { |
| "scrolled": true |
| }, |
| "outputs": [], |
| "source": [ |
| "%%sql\n", |
| "DROP TABLE IF EXISTS train_lt5_packed, train_lt5_packed_summary;\n", |
| "\n", |
| "SELECT madlib.training_preprocessor_dl('train_lt5', -- Source table\n", |
| " 'train_lt5_packed', -- Output table\n", |
| " 'y', -- Dependent variable\n", |
| " 'x', -- Independent variable\n", |
| " 1000, -- Buffer size\n", |
| " 255 -- Normalizing constant\n", |
| " );\n", |
| "\n", |
| "SELECT * FROM train_lt5_packed_summary;" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Test dataset < 5" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "%%sql\n", |
| "DROP TABLE IF EXISTS test_lt5_packed, test_lt5_packed_summary;\n", |
| "\n", |
| "SELECT madlib.validation_preprocessor_dl('test_lt5', -- Source table\n", |
| " 'test_lt5_packed', -- Output table\n", |
| " 'y', -- Dependent variable\n", |
| " 'x', -- Independent variable\n", |
| " 'train_lt5_packed' -- Training preproc table\n", |
| " );\n", |
| "\n", |
| "SELECT * FROM test_lt5_packed_summary;" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Training dataset >= 5" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "%%sql\n", |
| "DROP TABLE IF EXISTS train_gte5_packed, train_gte5_packed_summary;\n", |
| "\n", |
| "SELECT madlib.training_preprocessor_dl('train_gte5', -- Source table\n", |
| " 'train_gte5_packed', -- Output table\n", |
| " 'y', -- Dependent variable\n", |
| " 'x', -- Independent variable\n", |
| " 1000, -- Buffer size\n", |
| " 255 -- Normalizing constant\n", |
| " );\n", |
| "\n", |
| "SELECT * FROM train_gte5_packed_summary;" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Test dataset >= 5" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "%%sql\n", |
| "DROP TABLE IF EXISTS test_gte5_packed, test_gte5_packed_summary;\n", |
| "\n", |
| "SELECT madlib.validation_preprocessor_dl('test_gte5', -- Source table\n", |
| " 'test_gte5_packed', -- Output table\n", |
| " 'y', -- Dependent variable\n", |
| " 'x', -- Independent variable\n", |
| " 'train_gte5_packed' -- Training preproc table\n", |
| " );\n", |
| "\n", |
| "SELECT * FROM test_gte5_packed_summary;" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "<a id=\"define_and_load_model\"></a>\n", |
| "# 4. Define and load model architecture\n", |
| "\n", |
| "Model with feature and classification layers trainable" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# define two groups of layers: feature (convolutions) and classification (dense)\n", |
| "feature_layers = [\n", |
| " Conv2D(filters, kernel_size,\n", |
| " padding='valid',\n", |
| " input_shape=input_shape),\n", |
| " Activation('relu'),\n", |
| " Conv2D(filters, kernel_size),\n", |
| " Activation('relu'),\n", |
| " MaxPooling2D(pool_size=pool_size),\n", |
| " Dropout(0.25),\n", |
| " Flatten(),\n", |
| "]\n", |
| "\n", |
| "classification_layers = [\n", |
| " Dense(128),\n", |
| " Activation('relu'),\n", |
| " Dropout(0.5),\n", |
| " Dense(num_classes),\n", |
| " Activation('softmax')\n", |
| "]\n", |
| "\n", |
| "# create complete model\n", |
| "model = Sequential(feature_layers + classification_layers)\n", |
| "\n", |
| "model.summary()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Load into model architecture table using psycopg2" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "import psycopg2 as p2\n", |
| "#conn = p2.connect('postgresql://gpadmin@35.239.240.26:5432/madlib')\n", |
| "conn = p2.connect('postgresql://gpadmin@localhost:8000/madlib')\n", |
| "cur = conn.cursor()\n", |
| "\n", |
| "%sql DROP TABLE IF EXISTS model_arch_library;\n", |
| "query = \"SELECT madlib.load_keras_model('model_arch_library', %s, NULL, %s)\"\n", |
| "cur.execute(query,[model.to_json(), \"feature + classification layers trainable\"])\n", |
| "conn.commit()\n", |
| "\n", |
| "# check model loaded OK\n", |
| "%sql SELECT model_id, name FROM model_arch_library;" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Model with feature layers frozen" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "# freeze feature layers\n", |
| "for l in feature_layers:\n", |
| " l.trainable = False\n", |
| "\n", |
| "model.summary()" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Load into transfer model architecture table using psycopg2" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "cur.execute(query,[model.to_json(), \"only classification layers trainable\"])\n", |
| "conn.commit()\n", |
| "\n", |
| "# check model loaded OK\n", |
| "%sql SELECT model_id, name FROM model_arch_library ORDER BY model_id;" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "<a id=\"train\"></a>\n", |
| "# 5. Train\n", |
| "Train the model for 5-digit classification [0..4] " |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "%%sql\n", |
| "DROP TABLE IF EXISTS mnist_model, mnist_model_summary;\n", |
| "\n", |
| "SELECT madlib.madlib_keras_fit('train_lt5_packed', -- source table\n", |
| " 'mnist_model', -- model output table\n", |
| " 'model_arch_library', -- model arch table\n", |
| " 1, -- model arch id\n", |
| " $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params\n", |
| " $$ batch_size=128, epochs=1 $$, -- fit_params\n", |
| " 5 -- num_iterations\n", |
| " );" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "View the model summary:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "%%sql\n", |
| "SELECT * FROM mnist_model_summary;" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Evaluate using test data" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "%%sql\n", |
| "DROP TABLE IF EXISTS mnist_validate;\n", |
| "\n", |
| "SELECT madlib.madlib_keras_evaluate('mnist_model', -- model\n", |
| " 'test_lt5_packed', -- test table\n", |
| " 'mnist_validate' -- output table\n", |
| " );\n", |
| "\n", |
| "SELECT * FROM mnist_validate;" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "<a id=\"transfer_learning\"></a>\n", |
| "# 6. Transfer learning" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Use UPDATE to load trained weights from previous run into the model library table:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "%%sql\n", |
| "UPDATE model_arch_library\n", |
| "SET model_weights = mnist_model.model_weights\n", |
| "FROM mnist_model\n", |
| "WHERE model_arch_library.model_id = 2;" |
| ] |
| }, |
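| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "To confirm the copy worked (a suggested check, not part of the original example), verify that model_id 2 now has non-NULL weights:" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "%%sql\n", |
| "SELECT model_id, name, (model_weights IS NOT NULL) AS has_weights\n", |
| "FROM model_arch_library\n", |
| "ORDER BY model_id;" |
| ] |
| }, |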
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Transfer: train dense layers for new classification task [5..9]" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "%%sql\n", |
| "DROP TABLE IF EXISTS mnist_transfer_model, mnist_transfer_model_summary;\n", |
| "\n", |
| "SELECT madlib.madlib_keras_fit('train_gte5_packed', -- source table\n", |
| " 'mnist_transfer_model',-- model output table\n", |
| " 'model_arch_library', -- model arch table\n", |
| " 2, -- model arch id\n", |
| " $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params\n", |
| " $$ batch_size=128, epochs=1 $$, -- fit_params\n", |
| " 5 -- num_iterations\n", |
| " );" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "View the model summary" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "%%sql\n", |
| "SELECT * FROM mnist_transfer_model_summary;" |
| ] |
| }, |
| { |
| "cell_type": "markdown", |
| "metadata": {}, |
| "source": [ |
| "Evaluate using test data" |
| ] |
| }, |
| { |
| "cell_type": "code", |
| "execution_count": null, |
| "metadata": {}, |
| "outputs": [], |
| "source": [ |
| "%%sql\n", |
| "DROP TABLE IF EXISTS mnist_transfer_validate;\n", |
| "\n", |
| "SELECT madlib.madlib_keras_evaluate('mnist_transfer_model', -- model\n", |
| " 'test_gte5_packed', -- test table\n", |
| " 'mnist_transfer_validate' -- output table\n", |
| " );\n", |
| "\n", |
| "SELECT * FROM mnist_transfer_validate;" |
| ] |
| } |
| ], |
| "metadata": { |
| "kernelspec": { |
| "display_name": "Python 2", |
| "language": "python", |
| "name": "python2" |
| }, |
| "language_info": { |
| "codemirror_mode": { |
| "name": "ipython", |
| "version": 2 |
| }, |
| "file_extension": ".py", |
| "mimetype": "text/x-python", |
| "name": "python", |
| "nbconvert_exporter": "python", |
| "pygments_lexer": "ipython2", |
| "version": "2.7.16" |
| } |
| }, |
| "nbformat": 4, |
| "nbformat_minor": 1 |
| } |