blob: 71d87d8eb408eeffd12a848b48b768e02f3f5d5c [file] [log] [blame]
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Image Classification using Caffe VGG-19 model\n",
"\n",
"This notebook demonstrates importing VGG-19 model from Caffe to SystemML and use that model to do an image classification. VGG-19 model has been trained using ImageNet dataset (1000 classes with ~ 14M images). If an image to be predicted is in one of the class VGG-19 has trained on then accuracy will be higher.\n",
"We expect prediction of any image through SystemML using VGG-19 model will be similar to that of image predicted through Caffe using VGG-19 model directly."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Prerequisite:\n",
"1. SystemML Python Package\n",
"To run this notebook you need to install systeml 1.0 (Master Branch code as of 07/26/2017 or later) python package.\n",
"2. Caffe \n",
"If you want to verify results through Caffe, then you need to have Caffe python package or Caffe installed.\n",
"For this verification I have installed Caffe on local system instead of Caffe python package."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### SystemML Python Package information"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip show systemml"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### SystemML Build information\n",
"Following code will show SystemML information which is installed in the environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"from systemml import MLContext\n",
"ml = MLContext(sc)\n",
"print (\"SystemML Built-Time:\"+ ml.buildTime())\n",
"print(ml.info())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"scrolled": true
},
"outputs": [],
"source": [
"# Workaround for Python 2.7.13 to avoid certificate validation issue while downloading any file.\n",
"\n",
"import ssl\n",
"\n",
"try:\n",
" _create_unverified_https_context = ssl._create_unverified_context\n",
"except AttributeError:\n",
" # Legacy Python that doesn't verify HTTPS certificates by default\n",
" pass\n",
"else:\n",
" # Handle target environment that doesn't support HTTPS verification\n",
" ssl._create_default_https_context = _create_unverified_https_context"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Download model, proto files and convert them to SystemML format.\n",
"\n",
"1. Download Caffe Model (VGG-19), proto files (deployer, network and solver) and label file.\n",
"2. Convert the Caffe model into SystemML input format.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Download caffemodel and proto files \n",
"\n",
"\n",
"def downloadAndConvertModel(downloadDir='.', trained_vgg_weights='trained_vgg_weights'):\n",
" \n",
" # Step 1: Download the VGG-19 model and other files.\n",
" import errno\n",
" import os\n",
" import urllib\n",
"\n",
" # Create directory, if exists don't error out\n",
" try:\n",
" os.makedirs(os.path.join(downloadDir,trained_vgg_weights))\n",
" except OSError as exc: # Python >2.5\n",
" if exc.errno == errno.EEXIST and os.path.isdir(trained_vgg_weights):\n",
" pass\n",
" else:\n",
" raise\n",
" \n",
" # Download deployer, network, solver proto and label files.\n",
" urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/imagenet/vgg19/VGG_ILSVRC_19_layers_deploy.proto', os.path.join(downloadDir,'VGG_ILSVRC_19_layers_deploy.proto'))\n",
" urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/imagenet/vgg19/VGG_ILSVRC_19_layers_network.proto',os.path.join(downloadDir,'VGG_ILSVRC_19_layers_network.proto'))\n",
" urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/imagenet/vgg19/VGG_ILSVRC_19_layers_solver.proto',os.path.join(downloadDir,'VGG_ILSVRC_19_layers_solver.proto'))\n",
"\n",
" # Get labels for data\n",
" urllib.urlretrieve('https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/imagenet/labels.txt', os.path.join(downloadDir, trained_vgg_weights, 'labels.txt'))\n",
"\n",
" # Following instruction download model of size 500MG file, so based on your network it may take time to download file.\n",
" urllib.urlretrieve('http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_19_layers.caffemodel', os.path.join(downloadDir,'VGG_ILSVRC_19_layers.caffemodel'))\n",
"\n",
" # Step 2: Convert the caffemodel to trained_vgg_weights directory\n",
" import systemml as sml\n",
" sml.convert_caffemodel(sc, os.path.join(downloadDir,'VGG_ILSVRC_19_layers_deploy.proto'), os.path.join(downloadDir,'VGG_ILSVRC_19_layers.caffemodel'), os.path.join(downloadDir,trained_vgg_weights))\n",
" \n",
" return"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### PrintTopK\n",
"This function will print top K probabilities and indices from the result."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Print top K indices and probability\n",
"\n",
"def printTopK(prob, label, k):\n",
" print(label, 'Top ', k, ' Index : ', np.argsort(-prob)[0, :k])\n",
" print(label, 'Top ', k, ' Probability : ', prob[0,np.argsort(-prob)[0, :k]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Classify image using Caffe\n",
"Prerequisite: You need to have Caffe installed on a system to run this code. (or have Caffe Python package installed)\n",
"\n",
"This will classify image using Caffe code directly. \n",
"This can be used to verify classification through SystemML if matches with that through Caffe directly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"def getCaffeLabel(url, printTopKData, topK, size=(224,224), modelDir='trained_vgg_weights'):\n",
" import caffe\n",
"\n",
"\n",
" urllib.urlretrieve(url, 'test.jpg')\n",
" image = caffe.io.resize_image(caffe.io.load_image('test.jpg'), size)\n",
"\n",
" image = [(image * 255).astype(np.float)]\n",
"\n",
" deploy_file = 'VGG_ILSVRC_19_layers_deploy.proto'\n",
" caffemodel_file = 'VGG_ILSVRC_19_layers.caffemodel'\n",
"\n",
" net = caffe.Classifier(deploy_file, caffemodel_file)\n",
" caffe_prob = net.predict(image)\n",
" caffe_prediction = caffe_prob.argmax(axis=1)\n",
" \n",
" if(printTopKData):\n",
" printTopK(caffe_prob, 'Caffe', topK)\n",
"\n",
" import pandas as pd\n",
" labels = pd.read_csv(os.path.join(modelDir,'labels.txt'), names=['index', 'label'])\n",
" caffe_prediction_labels = [ labels[labels.index == x][['label']].values[0][0] for x in caffe_prediction ]\n",
" \n",
" return net, caffe_prediction_labels\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Classify images\n",
"\n",
"This function classify images from images specified through urls.\n",
"\n",
"###### Input Parameters: \n",
" urls: List of urls\n",
" printTokKData (default False): Whether to print top K indices and probabilities\n",
" topK: Top K elements to be displayed.\n",
" caffeInstalled (default False): If Caffe has been installed. If installed, then it will classify image (with top K probability and indices) based on printTopKData. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import urllib\n",
"from systemml.mllearn import Caffe2DML\n",
"import systemml as sml\n",
"\n",
"# Setting other than current directory causes \"network file not found\" issue, as network file\n",
"# location is defined in solver file which does not have a path, so it searches in current dir.\n",
"downloadDir = '.' # /home/asurve/caffe_models' \n",
"trained_vgg_weights = 'trained_vgg_weights'\n",
"\n",
"img_shape = (3, 224, 224)\n",
"size = (img_shape[1], img_shape[2])\n",
"\n",
"\n",
"def classifyImages(urls,printTokKData=False, topK=5, caffeInstalled=False):\n",
"\n",
" downloadAndConvertModel(downloadDir, trained_vgg_weights)\n",
" \n",
" vgg = Caffe2DML(sqlCtx, solver=os.path.join(downloadDir,'VGG_ILSVRC_19_layers_solver.proto'), input_shape=img_shape)\n",
" vgg.load(trained_vgg_weights)\n",
"\n",
" for url in urls:\n",
" outFile = 'inputTest.jpg'\n",
" urllib.urlretrieve(url, outFile)\n",
" \n",
" from IPython.display import Image, display\n",
" display(Image(filename=outFile))\n",
" \n",
" print (\"Prediction of above image to ImageNet Class using\");\n",
"\n",
" ## Do image classification through SystemML processing\n",
" from PIL import Image\n",
" input_image = sml.convertImageToNumPyArr(Image.open(outFile), img_shape=img_shape\n",
" , color_mode='BGR', mean=sml.getDatasetMean('VGG_ILSVRC_19_2014'))\n",
" print (\"Image preprocessed through SystemML :: \", vgg.predict(input_image)[0])\n",
" if(printTopKData == True):\n",
" sysml_proba = vgg.predict_proba(input_image)\n",
" printTopK(sysml_proba, 'SystemML BGR', topK)\n",
" \n",
" if(caffeInstalled == True):\n",
" net, caffeLabel = getCaffeLabel(url, printTopKData, topK, size, os.path.join(downloadDir, trained_vgg_weights))\n",
" print (\"Image classification through Caffe :: \", caffeLabel[0])\n",
"\n",
" print (\"Caffe input data through SystemML :: \", vgg.predict(np.matrix(net.blobs['data'].data.flatten()))[0])\n",
" \n",
" if(printTopKData == True):\n",
" sysml_proba = vgg.predict_proba(np.matrix(net.blobs['data'].data.flatten()))\n",
" printTopK(sysml_proba, 'With Caffe input data', topK)\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample API call to classify image\n",
"\n",
"There are couple of parameters to set based on what you are looking for.\n",
"1. printTopKData (default False): If this parameter gets set to True, then top K results (probabilities and indices) will be displayed. \n",
"2. topK (default 5): How many entities (K) to be displayed.\n",
"3. caffeInstalled (default False): If Caffe has installed. If not installed then verification through Caffe won't be done."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"printTopKData=False\n",
"topK=5\n",
"caffeInstalled=False\n",
"\n",
"\n",
"\n",
"urls = ['https://upload.wikimedia.org/wikipedia/commons/thumb/5/58/MountainLion.jpg/312px-MountainLion.jpg', 'https://s-media-cache-ak0.pinimg.com/originals/f2/56/59/f2565989f455984f206411089d6b1b82.jpg', 'http://i2.cdn.cnn.com/cnnnext/dam/assets/161207140243-vanishing-elephant-closeup-exlarge-169.jpg', 'http://wallpaper-gallery.net/images/pictures-of-lilies/pictures-of-lilies-7.jpg', 'https://cdn.pixabay.com/photo/2012/01/07/21/56/sunflower-11574_960_720.jpg', 'https://image.shutterstock.com/z/stock-photo-bird-nest-on-tree-branch-with-five-blue-eggs-inside-108094613.jpg', 'https://i.ytimg.com/vi/6jQDbIv0tDI/maxresdefault.jpg','https://cdn.pixabay.com/photo/2016/11/01/23/53/cat-1790093_1280.jpg']\n",
"\n",
"\n",
"classifyImages(urls,printTopKData, topK, caffeInstalled)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}