{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Transfer Learning Using Keras and MADlib\n",
"\n",
"This is a transfer learning example based on https://keras.io/examples/mnist_transfer_cnn/ \n",
"\n",
"To load images into tables we use the script <em>madlib_image_loader.py</em> located at https://github.com/apache/madlib-site/tree/asf-site/community-artifacts/Deep-learning. It uses the Python Imaging Library (http://www.pythonware.com/products/pil/), so it supports multiple image formats.\n",
"\n",
"## Table of contents\n",
"<a href=\"#import_libraries\">1. Import libraries</a>\n",
"\n",
"<a href=\"#load_and_prepare_data\">2. Load and prepare data</a>\n",
"\n",
"<a href=\"#image_preproc\">3. Call image preprocessor</a>\n",
"\n",
"<a href=\"#define_and_load_model\">4. Define and load model architecture</a>\n",
"\n",
"<a href=\"#train\">5. Train</a>\n",
"\n",
"<a href=\"#transfer_learning\">6. Transfer learning</a>"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"%load_ext sql"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Greenplum Database 5.x on GCP - via tunnel\n",
"%sql postgresql://gpadmin@localhost:8000/madlib\n",
" \n",
"# PostgreSQL local\n",
"#%sql postgresql://fmcquillan@localhost:5432/madlib"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 rows affected.\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
" <tr>\n",
" <th>version</th>\n",
" </tr>\n",
" <tr>\n",
" <td>MADlib version: 1.18.0-dev, git revision: rel/v1.17.0-89-g14a91ce, cmake configuration time: Fri Mar 5 23:08:38 UTC 2021, build type: release, build system: Linux-3.10.0-1160.11.1.el7.x86_64, C compiler: gcc 4.8.5, C++ compiler: g++ 4.8.5</td>\n",
" </tr>\n",
"</table>"
],
"text/plain": [
"[(u'MADlib version: 1.18.0-dev, git revision: rel/v1.17.0-89-g14a91ce, cmake configuration time: Fri Mar 5 23:08:38 UTC 2021, build type: release, build system: Linux-3.10.0-1160.11.1.el7.x86_64, C compiler: gcc 4.8.5, C++ compiler: g++ 4.8.5',)]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql select madlib.version();\n",
"#%sql select version();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"import_libraries\"></a>\n",
"# 1. Import libraries\n",
"Following https://keras.io/examples/mnist_transfer_cnn/, import libraries and define some parameters."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from __future__ import print_function\n",
"\n",
"import datetime\n",
"from tensorflow import keras\n",
"from tensorflow.keras.datasets import mnist\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten\n",
"from tensorflow.keras.layers import Conv2D, MaxPooling2D\n",
"from tensorflow.keras import backend as K\n",
"\n",
"now = datetime.datetime.now\n",
"\n",
"batch_size = 128\n",
"num_classes = 5\n",
"epochs = 5\n",
"\n",
"# input image dimensions\n",
"img_rows, img_cols = 28, 28\n",
"# number of convolutional filters to use\n",
"filters = 32\n",
"# size of pooling area for max pooling\n",
"pool_size = 2\n",
"# convolution kernel size\n",
"kernel_size = 3\n",
"\n",
"if K.image_data_format() == 'channels_first':\n",
" input_shape = (1, img_rows, img_cols)\n",
"else:\n",
" input_shape = (img_rows, img_cols, 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Other libraries needed in this notebook:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"load_and_prepare_data\"></a>\n",
"# 2. Load and prepare data\n",
"\n",
"First load the MNIST dataset from Keras, consisting of 60,000 28x28 grayscale training images of the 10 digits, along with a test set of 10,000 images."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(4861, 28, 28)\n",
"(4861, 28, 28, 1)\n"
]
}
],
"source": [
"# the data, split between train and test sets\n",
"(x_train, y_train), (x_test, y_test) = mnist.load_data()\n",
"\n",
"# create two datasets one with digits below 5 and one with 5 and above\n",
"x_train_lt5 = x_train[y_train < 5]\n",
"y_train_lt5 = y_train[y_train < 5]\n",
"x_test_lt5 = x_test[y_test < 5]\n",
"y_test_lt5 = y_test[y_test < 5]\n",
"\n",
"x_train_gte5 = x_train[y_train >= 5]\n",
"y_train_gte5 = y_train[y_train >= 5] - 5\n",
"x_test_gte5 = x_test[y_test >= 5]\n",
"y_test_gte5 = y_test[y_test >= 5] - 5\n",
"\n",
"# reshape to match model architecture\n",
"print(x_test_gte5.shape)\n",
"x_train_lt5=x_train_lt5.reshape(len(x_train_lt5), *input_shape)\n",
"x_test_lt5 = x_test_lt5.reshape(len(x_test_lt5), *input_shape)\n",
"x_train_gte5=x_train_gte5.reshape(len(x_train_gte5), *input_shape)\n",
"x_test_gte5 = x_test_gte5.reshape(len(x_test_gte5), *input_shape)\n",
"print(x_test_gte5.shape)"
]
},
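{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional sanity check (not part of the original Keras example): after the remapping above, both label arrays should contain exactly the values [0..4]."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Both subsets should have labels 0..4 after remapping\n",
"print(np.unique(y_train_lt5))\n",
"print(np.unique(y_train_gte5))"
]
},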
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the datasets into tables using the image loader script <em>madlib_image_loader.py</em> located at https://github.com/apache/madlib-site/tree/asf-site/community-artifacts/Deep-learning"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# MADlib tools directory\n",
"import sys\n",
"import os\n",
"madlib_site_dir = '/Users/fmcquillan/Documents/Product/MADlib/Demos/data'\n",
"sys.path.append(madlib_site_dir)\n",
"\n",
"# Import image loader module\n",
"from madlib_image_loader import ImageLoader, DbCredentials"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Specify database credentials, for connecting to db\n",
"#db_creds = DbCredentials(user='gpadmin',\n",
"# host='35.239.240.26',\n",
"# port='5432',\n",
"# password='')\n",
"\n",
"db_creds = DbCredentials(user='gpadmin',\n",
" host='localhost',\n",
" port='8000',\n",
" password='')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# Initialize ImageLoader (increase num_workers to run faster)\n",
"iloader = ImageLoader(num_workers=5, db_creds=db_creds)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Drop tables\n",
"%sql DROP TABLE IF EXISTS train_lt5, test_lt5, train_gte5, test_gte5\n",
"\n",
"# Save images to temporary directories and load into database\n",
"iloader.load_dataset_from_np(x_train_lt5, y_train_lt5, 'train_lt5', append=False)\n",
"iloader.load_dataset_from_np(x_test_lt5, y_test_lt5, 'test_lt5', append=False)\n",
"iloader.load_dataset_from_np(x_train_gte5, y_train_gte5, 'train_gte5', append=False)\n",
"iloader.load_dataset_from_np(x_test_gte5, y_test_gte5, 'test_gte5', append=False)"
]
},
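{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional check (assumes the four tables loaded successfully): the row counts should match the shapes of the NumPy arrays above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT 'train_lt5' AS tbl, count(*) FROM train_lt5\n",
"UNION ALL SELECT 'test_lt5', count(*) FROM test_lt5\n",
"UNION ALL SELECT 'train_gte5', count(*) FROM train_gte5\n",
"UNION ALL SELECT 'test_gte5', count(*) FROM test_gte5;"
]
},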
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"image_preproc\"></a>\n",
"# 3. Call image preprocessor\n",
"\n",
"The preprocessor transforms the data from one image per row to multiple images per row for efficient batching. It also normalizes the pixel values and one-hot encodes the class labels.\n",
"\n",
"Training dataset < 5"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS train_lt5_packed, train_lt5_packed_summary;\n",
"\n",
"SELECT madlib.training_preprocessor_dl('train_lt5', -- Source table\n",
" 'train_lt5_packed', -- Output table\n",
" 'y', -- Dependent variable\n",
" 'x', -- Independent variable\n",
" 1000, -- Buffer size\n",
" 255 -- Normalizing constant\n",
" );\n",
"\n",
"SELECT * FROM train_lt5_packed_summary;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Test dataset < 5"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS test_lt5_packed, test_lt5_packed_summary;\n",
"\n",
"SELECT madlib.validation_preprocessor_dl('test_lt5', -- Source table\n",
" 'test_lt5_packed', -- Output table\n",
" 'y', -- Dependent variable\n",
" 'x', -- Independent variable\n",
" 'train_lt5_packed' -- Training preproc table\n",
" );\n",
"\n",
"SELECT * FROM test_lt5_packed_summary;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Training dataset >= 5"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS train_gte5_packed, train_gte5_packed_summary;\n",
"\n",
"SELECT madlib.training_preprocessor_dl('train_gte5', -- Source table\n",
" 'train_gte5_packed', -- Output table\n",
" 'y', -- Dependent variable\n",
" 'x', -- Independent variable\n",
" 1000, -- Buffer size\n",
" 255 -- Normalizing constant\n",
" );\n",
"\n",
"SELECT * FROM train_gte5_packed_summary;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Test dataset >= 5"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS test_gte5_packed, test_gte5_packed_summary;\n",
"\n",
"SELECT madlib.validation_preprocessor_dl('test_gte5', -- Source table\n",
" 'test_gte5_packed', -- Output table\n",
" 'y', -- Dependent variable\n",
" 'x', -- Independent variable\n",
" 'train_gte5_packed' -- Training preproc table\n",
" );\n",
"\n",
"SELECT * FROM test_gte5_packed_summary;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"define_and_load_model\"></a>\n",
"# 4. Define and load model architecture\n",
"\n",
"Model with feature and classification layers trainable"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# define two groups of layers: feature (convolutions) and classification (dense)\n",
"feature_layers = [\n",
" Conv2D(filters, kernel_size,\n",
" padding='valid',\n",
" input_shape=input_shape),\n",
" Activation('relu'),\n",
" Conv2D(filters, kernel_size),\n",
" Activation('relu'),\n",
" MaxPooling2D(pool_size=pool_size),\n",
" Dropout(0.25),\n",
" Flatten(),\n",
"]\n",
"\n",
"classification_layers = [\n",
" Dense(128),\n",
" Activation('relu'),\n",
" Dropout(0.5),\n",
" Dense(num_classes),\n",
" Activation('softmax')\n",
"]\n",
"\n",
"# create complete model\n",
"model = Sequential(feature_layers + classification_layers)\n",
"\n",
"model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the model into the model architecture table using psycopg2:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import psycopg2 as p2\n",
"#conn = p2.connect('postgresql://gpadmin@35.239.240.26:5432/madlib')\n",
"conn = p2.connect('postgresql://gpadmin@localhost:8000/madlib')\n",
"cur = conn.cursor()\n",
"\n",
"%sql DROP TABLE IF EXISTS model_arch_library;\n",
"query = \"SELECT madlib.load_keras_model('model_arch_library', %s, NULL, %s)\"\n",
"cur.execute(query,[model.to_json(), \"feature + classification layers trainable\"])\n",
"conn.commit()\n",
"\n",
"# check model loaded OK\n",
"%sql SELECT model_id, name FROM model_arch_library;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Model with feature layers frozen"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# freeze feature layers\n",
"for l in feature_layers:\n",
" l.trainable = False\n",
"\n",
"model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the frozen-layer model into the same model architecture table using psycopg2:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cur.execute(query,[model.to_json(), \"only classification layers trainable\"])\n",
"conn.commit()\n",
"\n",
"# check model loaded OK\n",
"%sql SELECT model_id, name FROM model_arch_library ORDER BY model_id;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"train\"></a>\n",
"# 5. Train\n",
"Train the model on the 5-digit classification task [0..4]."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS mnist_model, mnist_model_summary;\n",
"\n",
"SELECT madlib.madlib_keras_fit('train_lt5_packed', -- source table\n",
" 'mnist_model', -- model output table\n",
" 'model_arch_library', -- model arch table\n",
" 1, -- model arch id\n",
" $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params\n",
" $$ batch_size=128, epochs=1 $$, -- fit_params\n",
" 5 -- num_iterations\n",
" );"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"View the model summary:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT * FROM mnist_model_summary;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Evaluate using test data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS mnist_validate;\n",
"\n",
"SELECT madlib.madlib_keras_evaluate('mnist_model', -- model\n",
" 'test_lt5_packed', -- test table\n",
" 'mnist_validate' -- output table\n",
" );\n",
"\n",
"SELECT * FROM mnist_validate;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"transfer_learning\"></a>\n",
"# 6. Transfer learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use UPDATE to copy the trained weights from the previous run into the model architecture table:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"UPDATE model_arch_library\n",
"SET model_weights = mnist_model.model_weights\n",
"FROM mnist_model\n",
"WHERE model_arch_library.model_id = 2;"
]
},
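{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally verify that the weights were copied (this assumes model_weights is stored as bytea, so octet_length() reports its size in bytes):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT model_id, name, octet_length(model_weights) AS weights_bytes\n",
"FROM model_arch_library ORDER BY model_id;"
]
},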
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Transfer learning: train the dense (classification) layers for the new classification task [5..9]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS mnist_transfer_model, mnist_transfer_model_summary;\n",
"\n",
"SELECT madlib.madlib_keras_fit('train_gte5_packed', -- source table\n",
" 'mnist_transfer_model',-- model output table\n",
" 'model_arch_library', -- model arch table\n",
" 2, -- model arch id\n",
" $$ loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy']$$, -- compile_params\n",
" $$ batch_size=128, epochs=1 $$, -- fit_params\n",
" 5 -- num_iterations\n",
" );"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"View the model summary"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT * FROM mnist_transfer_model_summary;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Evaluate using test data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS mnist_transfer_validate;\n",
"\n",
"SELECT madlib.madlib_keras_evaluate('mnist_transfer_model', -- model\n",
" 'test_gte5_packed', -- test table\n",
" 'mnist_transfer_validate' -- output table\n",
" );\n",
"\n",
"SELECT * FROM mnist_transfer_validate;"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.16"
}
},
"nbformat": 4,
"nbformat_minor": 1
}