---
layout: post
title: Detecting objects with Groovy, the Deep Java Library (DJL), and Apache MXNet
date: '2022-08-01T00:00:00+00:00'
categories: groovy
---
<p>This blog post looks at using <a href="https://groovy-lang.org/" target="_blank">Apache Groovy</a> with the <a href="https://djl.ai/" target="_blank">Deep Java Library (DJL)</a>, backed by the <a href="https://mxnet.incubator.apache.org/" target="_blank">Apache MXNet</a> engine, to detect objects within an image. (Apache MXNet is an <a href="https://incubator.apache.org/" target="_blank">incubating project</a> at <a href="https://www.apache.org/" target="_blank">the ASF</a>.)</p>
<h3>Deep Learning</h3>
<p>Deep learning falls under the branches of <a href="https://en.wikipedia.org/wiki/Machine_learning" target="_blank">machine learning</a> and <a href="https://en.wikipedia.org/wiki/Artificial_intelligence" target="_blank">artificial intelligence</a>. It involves multiple layers (hence the "deep") of an <a href="https://en.wikipedia.org/wiki/Artificial_neural_network" target="_blank">artificial neural network</a>. There are lots of ways to configure such networks and the details are beyond the scope of this blog post, but we can give some basic details using a small example classifier (much simpler than the model we use later). Suppose we want to assign an item to one of three classes (say, species) based on four measured characteristics. We would have four input nodes, one for each measurement, and three output nodes, one for each possible <i>class</i> (<i>species</i>). We would also have one or more additional layers in between.</p>
<p><a href="https://blogs.apache.org/groovy/mediaresource/2206eccb-0c50-4091-a030-e3057517d810"><img src="https://blogs.apache.org/groovy/mediaresource/2206eccb-0c50-4091-a030-e3057517d810" style="width: 50%;" alt="deep_network.png"></a></p>
<p>Each node in this network mimics to some degree a neuron in the human brain. Again, we'll simplify the details. Each node has multiple inputs, which are given a particular weight, as well as an activation function which will determine whether our node "fires". Training the model is a process which works out what the best weights should be.</p>
<p><a href="https://blogs.apache.org/groovy/mediaresource/b5d32431-a273-481d-b0b5-169b0665b385"><img src="https://blogs.apache.org/groovy/mediaresource/b5d32431-a273-481d-b0b5-169b0665b385" style="width: 50%;" alt="deep_node.png"></a></p>
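<p>To make the idea of a node a little more concrete, here is a minimal sketch in plain Groovy of a single node: a weighted sum of its inputs plus a bias, passed through a sigmoid activation function. The names <code>sigmoid</code> and <code>neuron</code>, and all of the numbers, are illustrative assumptions; this is not how DJL or Apache MXNet implement their layers.</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono',monospace;font-size:9.6pt;">// Sigmoid activation: squashes any number into the range 0..1
def sigmoid = { z -&gt; 1 / (1 + Math.exp(-z)) }

// A single node: multiply each input by its weight, sum, add a bias, then activate
def neuron = { inputs, weights, bias -&gt;
    def z = [inputs, weights].transpose().sum { x, w -&gt; x * w } + bias
    sigmoid(z)
}

def inputs = [0.5, 0.1, 0.9, 0.3]     // four hypothetical input values
def weights = [0.4, -0.6, 0.2, 0.1]   // hypothetical; training would work these out
def output = neuron(inputs, weights, 0.05)

println "activation: $output"
println(output &gt; 0.5 ? 'node fires' : 'node does not fire')
</pre>
<p>Training would repeatedly adjust <code>weights</code> and the bias so that outputs like this one move closer to the expected answers.</p>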
<h3>Deep Java Library (DJL) &amp; Apache MXNet</h3>
<p>Rather than writing your own neural networks, libraries such as <a href="https://djl.ai/" target="_blank">DJL</a> provide high-level abstractions which automate to some degree the creation of the necessary neural network layers. DJL is engine agnostic, so it's capable of supporting different backends including Apache MXNet, PyTorch, TensorFlow and ONNX Runtime. We'll use the default engine, which for our application (at the time of writing) is Apache MXNet.</p><p><a href="https://mxnet.apache.org/" target="_blank">Apache MXNet</a> provides the underlying engine. It supports imperative and symbolic execution, distributed training of your models using multi-GPU or multi-host hardware, and multiple language bindings. Groovy is fully compatible with the Java binding.</p>
<h3>Using DJL with Groovy</h3>
<p>Groovy uses the Java binding. Consider looking at the DJL beginner tutorials for Java; they will work almost unchanged for Groovy.</p><p>For our example, the first thing we need to do is download the image we want to run the object detection model on:</p><pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono',monospace;font-size:9.6pt;">Path tempDir = Files.<span style="color:#9876aa;font-style:italic;">createTempDirectory</span>(<span style="color:#6a8759;">"resnetssd"</span>)<br><span style="color:#cc7832;">def </span>imageName = <span style="color:#6a8759;">'dog-ssd.jpg'<br></span>Path localImage = tempDir.resolve(imageName)<br><span style="color:#cc7832;">def </span>url = <span style="color:#cc7832;">new </span>URL(<span style="color:#6a8759;">"https://s3.amazonaws.com/model-server/inputs/</span>$imageName<span style="color:#6a8759;">"</span>)<br>DownloadUtils.<span style="color:#9876aa;font-style:italic;">download</span>(url, localImage, <span style="color:#cc7832;">new </span>ProgressBar())<br>Image img = ImageFactory.<span style="color:#9876aa;font-style:italic;">instance</span>.fromFile(localImage)<br></pre><p>We use a well-known, publicly available image. We'll store a local copy of the image in a temporary directory and we'll use a utility class that comes with DJL to provide a nice progress bar while the image is downloading.
DJL provides its own image classes, so we'll create an instance using the appropriate class from the downloaded image.</p><p>Next, we configure the criteria used to select and load our neural network model:</p><pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono',monospace;font-size:9.6pt;"><span style="color:#cc7832;">def </span>criteria = Criteria.<span style="color:#9876aa;font-style:italic;">builder</span>()<br>    .optApplication(Application.CV.<span style="color:#9876aa;font-style:italic;">OBJECT_DETECTION</span>)<br>    .setTypes(Image, DetectedObjects)<br>    .optFilter(<span style="color:#6a8759;">"backbone"</span>, <span style="color:#6a8759;">"resnet50"</span>)<br>    .optEngine(Engine.<span style="color:#9876aa;font-style:italic;">defaultEngineName</span>)<br>    .optProgress(<span style="color:#cc7832;">new </span>ProgressBar())<br>    .build()<br></pre><p>DJL supports numerous model <i>applications</i> including image classification, word recognition, sentiment analysis, linear regression, and others. We'll select <i>object detection</i>. This kind of application looks for the bounding boxes of known objects within an image. The <i>types</i> configuration option identifies that our input will be an image and the output will be detected objects. The <i>filter</i> option indicates that we will be using ResNet-50 (a 50-layer deep convolutional neural network often used as a backbone for many computer vision tasks). We set the <i>engine</i> to be the default engine, which happens to be Apache MXNet. We also configure an optional progress bar to provide feedback while our model is loading.</p><p>Now that we have our configuration sorted, we'll use it to load a model and then use the model to make object predictions:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono',monospace;font-size:9.6pt;"><span style="color:#cc7832;">def </span>detection = criteria.loadModel().withCloseable <span style="font-weight:bold;">{ </span>model <span style="font-weight:bold;">-&gt;<br></span><span style="font-weight:bold;"> </span>model.newPredictor().predict(img)<br><span style="font-weight:bold;">}<br></span>detection.items().each <span style="font-weight:bold;">{ </span>println it <span style="font-weight:bold;">}<br></span>img.drawBoundingBoxes(detection)<br></pre><p>For good measure, we'll draw the bounding boxes into our image.</p><p>Next, we save our image into a file and display it using Groovy's SwingBuilder.</p><pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono',monospace;font-size:9.6pt;">Path imageSaved = tempDir.resolve(<span style="color:#6a8759;">'detected.png'</span>)<br>imageSaved.withOutputStream <span style="font-weight:bold;">{ </span>os <span style="font-weight:bold;">-&gt; </span>img.save(os, <span style="color:#6a8759;">'png'</span>) <span style="font-weight:bold;">}<br></span><span style="color:#cc7832;">def </span>saved = ImageIO.<span style="color:#9876aa;font-style:italic;">read</span>(imageSaved.toFile())<br><span style="color:#cc7832;">new </span>SwingBuilder().edt <span style="font-weight:bold;">{<br></span><span style="font-weight:bold;"> </span>frame(<span style="color:#6a8759;">title</span>: <span style="color:#6a8759;">"</span>$detection.<span style="color:#9876aa;">numberOfObjects</span><span style="color:#6a8759;"> detected objects"</span>,<br> <span style="color:#6a8759;">size</span>: [saved.<span style="color:#9876aa;">width</span>, saved.<span style="color:#9876aa;">height</span>],<br> <span style="color:#6a8759;">defaultCloseOperation</span>: <span style="color:#9876aa;font-style:italic;">DISPOSE_ON_CLOSE</span>,<br> <span style="color:#6a8759;">show</span>: <span style="color:#cc7832;">true</span>) 
<span style="font-weight:bold;">{ </span>label(<span style="color:#6a8759;">icon</span>: imageIcon(<span style="color:#6a8759;">image</span>: saved)) <span style="font-weight:bold;">}<br></span><span style="font-weight:bold;">}<br></span></pre>
<h3>Building and running our application</h3>
<p>Our code is stored in a source file called <code>ObjectDetect.groovy</code>.</p><p>We used <a href="https://gradle.org/" target="_blank">Gradle</a> for our build file:</p><pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono',monospace;font-size:9.6pt;">apply <span style="color:#6a8759;">plugin</span>: <span style="color:#6a8759;">'groovy'<br></span>apply <span style="color:#6a8759;">plugin</span>: <span style="color:#6a8759;">'application'<br></span><span style="color:#6a8759;"><br></span>repositories <span style="font-weight:bold;">{<br></span><span style="font-weight:bold;"> </span>mavenCentral()<br><span style="font-weight:bold;">}<br></span><span style="font-weight:bold;"><br></span>application <span style="font-weight:bold;">{<br></span><span style="font-weight:bold;"> </span>mainClass = <span style="color:#6a8759;">'ObjectDetect'</span><br><span style="font-weight:bold;">}<br></span><span style="font-weight:bold;"><br></span>dependencies <span style="font-weight:bold;">{<br></span><span style="font-weight:bold;"> </span>implementation <span style="color:#6a8759;">"ai.djl:api:0.18.0</span><span style="color:#6a8759;">"<br></span><span style="color:#6a8759;"> </span>implementation <span style="color:#6a8759;">"org.apache.groovy:groovy:4.0.4</span><span style="color:#6a8759;">"<br></span><span style="color:#6a8759;"> </span>implementation <span style="color:#6a8759;">"org.apache.groovy:groovy-swing:4.0.4</span><span style="color:#6a8759;">"<br></span><span style="color:#6a8759;"> </span>runtimeOnly <span style="color:#6a8759;">"ai.djl:model-zoo:0.18.0</span><span style="color:#6a8759;">"<br></span><span style="color:#6a8759;"> </span>runtimeOnly <span style="color:#6a8759;">"ai.djl.mxnet:mxnet-engine:0.18.0</span><span style="color:#6a8759;">"<br></span><span style="color:#6a8759;"> </span>runtimeOnly <span style="color:#6a8759;">"ai.djl.mxnet:mxnet-model-zoo:0.18.0</span><span style="color:#6a8759;">"<br></span><span style="color:#6a8759;"> </span>runtimeOnly <span style="color:#6a8759;">"ai.djl.mxnet:mxnet-native-auto:1.8.0"<br></span><span style="color:#6a8759;"> </span>runtimeOnly <span style="color:#6a8759;">"org.apache.groovy:groovy-nio:4.0.4</span><span style="color:#6a8759;">"<br></span><span style="color:#6a8759;"> </span>runtimeOnly <span style="color:#6a8759;">"org.slf4j:slf4j-jdk14:1.7.36</span><span style="color:#6a8759;">"<br></span><span style="font-weight:bold;">}</span></pre>
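<p>The code snippets shown earlier omit their import statements. As a sketch, assuming the DJL 0.18.0 and Groovy 4.0.4 artifacts listed in the build file, the top of <code>ObjectDetect.groovy</code> would need imports along these lines (package names may differ in other DJL versions):</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono',monospace;font-size:9.6pt;">// DJL classes used to select and load the model and to handle images
import ai.djl.Application
import ai.djl.engine.Engine
import ai.djl.modality.cv.Image
import ai.djl.modality.cv.ImageFactory
import ai.djl.modality.cv.output.DetectedObjects
import ai.djl.repository.zoo.Criteria
import ai.djl.training.util.DownloadUtils
import ai.djl.training.util.ProgressBar

// Groovy and JDK classes used to save and display the result
import groovy.swing.SwingBuilder
import javax.imageio.ImageIO
import java.nio.file.Files
import java.nio.file.Path

import static javax.swing.WindowConstants.DISPOSE_ON_CLOSE
</pre>
<p>Groovy imports <code>java.net.*</code> by default, so <code>URL</code> needs no explicit import.</p>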
<p>We run the application with the Gradle <code>run</code> task:</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;"><span style="color:#4E9A06"><b>paulk@pop-os</b></span>:<span style="color:#3465A4"><b>/extra/projects/groovy-data-science</b></span>$ ./gradlew DLMXNet:run
<b>&gt; Task :DeepLearningMxnet:run</b>
Downloading: 100% |████████████████████████████████████████| dog-ssd.jpg
Loading: 100% |████████████████████████████████████████|
...
class: "car", probability: 0.99991, bounds: [x=0.611, y=0.137, width=0.293, height=0.160]
class: "bicycle", probability: 0.95385, bounds: [x=0.162, y=0.207, width=0.594, height=0.588]
class: "dog", probability: 0.93752, bounds: [x=0.168, y=0.350, width=0.274, height=0.593]
</pre>
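<p>The bounds in this output appear to be expressed as fractions of the image width and height rather than in pixels. As a rough sketch in plain Groovy, such a box could be scaled to pixel coordinates as follows. The <code>toPixels</code> helper and the 800x600 image size are hypothetical, chosen only for illustration; they are not part of DJL and are not the real dimensions of the downloaded image.</p>
<pre style="background-color:#2b2b2b;color:#a9b7c6;font-family:'JetBrains Mono',monospace;font-size:9.6pt;">// Scale a normalized bounding box (values between 0 and 1) to pixel coordinates.
// toPixels and the image size used below are illustrative assumptions.
def toPixels = { Map box, int imgWidth, int imgHeight -&gt;
    [x     : Math.round((box.x * imgWidth) as double),
     y     : Math.round((box.y * imgHeight) as double),
     width : Math.round((box.width * imgWidth) as double),
     height: Math.round((box.height * imgHeight) as double)]
}

// Bounds of the "dog" detection, copied from the output above
def dog = [x: 0.168, y: 0.350, width: 0.274, height: 0.593]
def pixels = toPixels(dog, 800, 600)
println pixels
</pre>
<p>In the application itself this conversion isn't needed, since <code>drawBoundingBoxes</code> handles the drawing for us.</p>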
<p>The displayed image looks like this:<br><img src="https://blogs.apache.org/groovy/mediaresource/b92cafbe-1866-4335-9c91-c3371253887e" style="width:50%;" alt="2022-08-01 21_28_33-3 detected objects.png"><br></p>
<h3>Further Information</h3><p>The full source code can be found in the following repo:<br><a href="https://github.com/paulk-asert/groovy-data-science/tree/master/subprojects/DeepLearningMxnet" target="_blank">https://github.com/paulk-asert/groovy-data-science/tree/master/subprojects/DeepLearningMxnet</a></p>
<h3>Conclusion</h3>
<p>We have examined using Apache Groovy, DJL and Apache MXNet to detect objects within an image. We used a pre-trained deep learning model, but we didn't need to get into the details of the model or its neural network layers. DJL and Apache MXNet did the heavy lifting for us. Groovy provided a simple coding experience for building our application.</p>