How to integrate image recognition in iOS apps

Glenn Wedin
4 min read · Feb 11, 2019
https://xkcd.com/1425/

The “virtually impossible” is in fact possible these days. At least the topic of this xkcd strip is. In this short tutorial I will go through how to build a CoreML model with Microsoft Custom Vision that can recognize objects in a photo, and how to use it in an iOS application. You will need a basic understanding of iOS development and Swift to follow this post.

First we need to create the CoreML model. Let’s start by having a look at how to use Microsoft Custom Vision.

Start by signing in to customvision.ai and creating a new project. Enter a name and a description, and set the project type to classification. Set the classification type to multilabel and the domain to General (compact). The reason for selecting a compact domain is that we are going to embed the model in our app. You can always change these settings later.

The next step is to upload images of the objects we want to be able to recognize and tag them with the name of the object. With multilabel classification we can use multiple tags on a single image. When we have a good amount of images we can start training the model. We can also check how well the model performs by using the quick test button, which lets you upload new images and see whether the trained model classifies them correctly.

Yes, my examples use baked goods…

Get the CoreML model

When we’re done training the model we need to download it and import it into our Xcode project. To download the model, go to the Performance tab and export it in CoreML format. You can then add the file to your Xcode project, where a class will be autogenerated for later use.
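To give a rough idea of what that autogenerated class looks like in use: it simply wraps the trained MLModel. The class name below is the author’s machine-generated one, mentioned again further down; yours will be named after your own exported .mlmodel file.

```swift
import CoreML

// The class Xcode generates from the .mlmodel file exposes the underlying MLModel.
// Replace _92ed7d5fe39f4438b1ef3e98e3ce80c3_1 with the name of your own exported model.
let trainedModel: MLModel = _92ed7d5fe39f4438b1ef3e98e3ce80c3_1().model
```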

Set up a camera device and view

We’ll start by creating a UIViewController. Its view is where we want to show the feed from our iPhone’s camera. To be able to handle the live video output, the view controller must conform to the AVCaptureVideoDataOutputSampleBufferDelegate protocol.

When the view is loaded we create an AVCaptureSession and define a device for inputting data to the session. In this case we select the default device for video. We also create an AVCaptureVideoDataOutput from which our captureSession will receive video frames; these frames become available in our captureOutput method. The last thing we do is create an AVCaptureVideoPreviewLayer so that we can preview the output from the camera in our app. The preview layer is added to our main view as a sublayer that fills the entire frame.

Defining input device, output and preview
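A minimal sketch of that setup could look like the following. The class name ViewController and the queue label “videoQueue” are placeholders of my choosing, and error handling is kept to a bare minimum.

```swift
import UIKit
import AVFoundation

class ViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {

    let captureSession = AVCaptureSession()

    override func viewDidLoad() {
        super.viewDidLoad()

        // Use the default video camera as the input device for the capture session
        guard let device = AVCaptureDevice.default(for: .video),
              let input = try? AVCaptureDeviceInput(device: device) else { return }
        captureSession.addInput(input)

        // Deliver video frames to captureOutput(_:didOutput:from:) on a background queue
        let output = AVCaptureVideoDataOutput()
        output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "videoQueue"))
        captureSession.addOutput(output)

        // Preview the camera feed as a sublayer that fills the entire view
        let previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
        previewLayer.frame = view.frame
        view.layer.addSublayer(previewLayer)

        captureSession.startRunning()
    }
}
```

Remember that the app also needs an NSCameraUsageDescription entry in Info.plist before iOS will grant access to the camera.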

Capturing the video output

To handle the output and do the image classification we must implement the captureOutput method defined by the AVCaptureVideoDataOutputSampleBufferDelegate protocol. To be able to analyse the pictures from the output, we get the CVPixelBuffer from the CMSampleBuffer.

We then need to specify the image analysis request we are going to perform for each frame. First we must create a VNCoreMLModel from the trained CoreML class we downloaded from Custom Vision. Your class name will be different from _92ed7d5fe39f4438b1ef3e98e3ce80c3_1!

Capture output from the camera and perform CoreML requests
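A sketch of that delegate method, including the result handling described below, might look like this. The tag “croissant”, the 0.9 confidence threshold and the UIAlertController are stand-ins of my choosing for the author’s own tag, threshold and info window, and the model class name will be different in your project.

```swift
import UIKit
import AVFoundation
import Vision

extension ViewController {

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Grab the pixel buffer for the current video frame
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // Wrap the autogenerated Custom Vision class in a Vision model
        guard let model = try? VNCoreMLModel(for: _92ed7d5fe39f4438b1ef3e98e3ce80c3_1().model) else { return }

        // Classify the frame and inspect the results
        let request = VNCoreMLRequest(model: model) { request, _ in
            guard let results = request.results as? [VNClassificationObservation] else { return }

            for result in results {
                // Only react to one specific tag, and only when the confidence is high enough.
                // "croissant" and 0.9 are example values; use your own tag and threshold.
                if result.identifier == "croissant" && result.confidence > 0.9 {
                    self.captureSession.stopRunning()

                    // Hand the result over to the main thread before touching the UI
                    DispatchQueue.main.async {
                        let alert = UIAlertController(title: "Match!",
                                                      message: "This looks like a \(result.identifier).",
                                                      preferredStyle: .alert)
                        alert.addAction(UIAlertAction(title: "OK", style: .default))
                        self.present(alert, animated: true)
                    }
                }
            }
        }

        // Run the request against the current frame
        try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
    }
}
```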

For each result we check whether it has a high enough confidence and whether it has an identifier. In this example I’m only checking for one specific identifier; the identifiers are the tags we added to the images in Custom Vision. If the confidence is high enough we can be pretty sure the identifier is correct, so we stop capturing and dispatch the result to the main thread. In this case I created just a simple info window with a message.

That is all the code needed to implement a camera, a live preview and image recognition in an iOS app. Your app will now be able to react when it recognizes the object in front of the camera.

