Creating your own Kotlin detector in TensorFlow
In this article I will show how to create a mobile object detector for one specific product, a Kotlin mild ketchup:
Not so long ago I was using OpenCV and its Java interface for object detection in one of my projects. I was implementing Haar feature-based cascade classifiers for different types of products. The generation of such a classifier involved the following steps:
- samples collection
- training
For collecting samples I generated as many distorted images of the identified products as possible. To generate a large number of samples (100) from a single image I used the opencv_createsamples utility:
opencv_createsamples -img [image_name.jpg] -num 100 -bg negatives.dat -vec samples.vec -maxxangle 0.6 -maxyangle 0 -maxzangle 0.3 -maxidev 100 -bgcolor 0 -bgthresh 0 -w 24 -h 24
where negatives.dat was just a list of paths to images that didn't contain the detected object.
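For reference, negatives.dat is simply a plain text file with one background image path per line, for example (hypothetical paths):
negatives/bg_0001.jpg
negatives/bg_0002.jpg
negatives/bg_0003.jpg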
For the training I used opencv_haartraining :
opencv_haartraining -data [classifier_dir_object_name] -vec samples.vec -bg negatives.dat -nstages 10 -precalcValBufSize 1024 -precalcIdxBufSize 1024 -minhitrate 0.995 -maxfalsealarm 0.5 -npos 100 -nneg 100 -w 24 -h 24 -mode ALL
Here samples.dat is a file that contains a list of image paths together with the coordinates of the objects they contain. Its format is the following:
[filename] [# of objects] [[x y width height] [… 2nd object] …]
picture001.jpg 1 140 100 45 45
picture002.jpg 2 100 200 50 50 50 30 25 25
The application ran on a server. The drawbacks were the cost of generating a valid classifier and the fact that I couldn't easily deploy the solution on mobile.
After TensorFlow was released I decided to use its object detection API. The cost of generating a properly working model is still high, but TensorFlow comes with a number of handy scripts and samples, which makes generating a mobile app with object detection relatively easy.
First we need to download and set up the TensorFlow environment and clone the appropriate repositories:
git clone --recursive https://github.com/tensorflow/tensorflow
git clone --recursive https://github.com/tensorflow/models
Then we need to collect some Kotlin ketchup images. It is a good idea to downscale them to some low resolution, e.g. 415 x 553 px, because I noticed that when using the original, large pictures I often got out-of-memory errors in the training phase. Even with only 60 training images (3456 x 4608 px each) I got OoME (both on my laptop with 16 GB RAM and an i7-4720HQ, and on Google Cloud ML Engine with a standard GPU).
We can use mogrify to downscale a bunch of images in one step:
find . -name "*.jpg" | xargs mogrify -resize 15%
I also used Gimp to extract smaller objects from large pictures, especially when a target was visible multiple times:
I used labelImg, an excellent free tool, to generate product bounding boxes in XML format.
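For reference, labelImg saves annotations in the Pascal VOC XML format; a single annotation looks roughly like this (the file name and coordinates below are made up):
<annotation>
  <filename>kotlin_001.jpg</filename>
  <size>
    <width>415</width>
    <height>553</height>
    <depth>3</depth>
  </size>
  <object>
    <name>kotlin</name>
    <bndbox>
      <xmin>120</xmin>
      <ymin>80</ymin>
      <xmax>260</xmax>
      <ymax>400</ymax>
    </bndbox>
  </object>
</annotation>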
Then I divided the resulting set of JPG and XML files into a training set (277 images) and a test set (87 images).
Then I ran a slightly modified version of the xml_to_csv.py script from the Raccoon dataset to generate CSV descriptions for my training and test datasets.
python3 xml_to_csv.py train
python3 xml_to_csv.py test
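The resulting CSV files describe one bounding box per row; assuming the modified script keeps the original Raccoon dataset columns, the output looks roughly like this (the data row below is made up):
filename,width,height,class,xmin,ymin,xmax,ymax
kotlin_001.jpg,415,553,kotlin,120,80,260,400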
Next I used an altered generate_tfrecord.py script (I added the ability to read labels from a file) to generate the TFRecord files needed in the TensorFlow training phase.
python3 generate_tfrecord.py -t train --labels_path=labels.txt --csv_input=train_labels.csv --output_path=train.record
In the next step I created a .pbtxt file containing just one entry for the detected label:
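A minimal label map for a single class looks like this (the id and name follow this article's setup; the exact file used may differ):
item {
  id: 1
  name: 'kotlin'
}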
And the last thing before training begins: the config file. It's quite complex, so I started from a sample config for a model pre-trained on the COCO dataset (as I'm using transfer learning). I changed num_classes to 1 (we're detecting only one 'kotlin' ketchup object) and used the 'ssd_mobilenet_v2' feature extractor.
In the matcher section I increased matched_threshold (to 0.75) and lowered unmatched_threshold (to 0.25). I also changed fixed_shape_resizer to 500 x 500. Without these changes I was getting too many false positives.
Then, to avoid out-of-memory errors, which unfortunately occurred too often, I decreased batch_size in the train_config section to 4. I also pointed fine_tune_checkpoint at the COCO pre-trained checkpoint and decreased num_steps to 10000 (to significantly lower the training time).
In the train_input_reader section I also decreased the input queue sizes, again to avoid OoME.
I set the number of steps to 10,000 (previously it was 200,000). Such a significant drop could affect the accuracy of the trained model. However, I noticed that between 9,000 and 10,000 steps there was no meaningful drop in loss (it always oscillated between 0.5 and 1.5), so I decided not to increase that value. Then I adjusted input_path and label_map_path in both the training and evaluation sections.
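Putting these changes together, the relevant fragments of the pipeline config look roughly like this (only the modified fields are shown, and the paths are placeholders rather than the exact values from my setup):
model {
  ssd {
    num_classes: 1
    image_resizer {
      fixed_shape_resizer {
        height: 500
        width: 500
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.75
        unmatched_threshold: 0.25
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v2'
    }
  }
}
train_config {
  batch_size: 4
  fine_tune_checkpoint: "pre-trained/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 10000
}
train_input_reader {
  tf_record_input_reader {
    input_path: "train.record"
  }
  label_map_path: "labels.pbtxt"
}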
Now let’s begin with the training phase:
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim (we have to set this path each time we open a new terminal)
python3 object_detection/train.py --logtostderr --pipeline_config_path=products.config --train_dir=~/training
This will generate a bunch of files in the training directory.
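For orientation, the training directory will typically contain TensorFlow checkpoint and event files along these lines (names and step numbers will vary):
checkpoint
graph.pbtxt
model.ckpt-10000.data-00000-of-00001
model.ckpt-10000.index
model.ckpt-10000.meta
events.out.tfevents.…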
Learning steps are recorded at specific checkpoints (see 'Recording summary at step' in the command output); one of these checkpoints will be used later to generate the frozen Protobuf model. If training fails for some reason, we can restart it later and (provided from_detection_checkpoint = true) it will continue from the latest recorded checkpoint. When training finishes (with the given config it can take as long as 8 hours), we need to export the generated graph to a file that can be used on mobile. We will use object_detection/export_inference_graph.py from the TensorFlow models project:
python3 object_detection/export_inference_graph.py --input_type image_tensor --pipeline_config_path=products.config --trained_checkpoint_prefix=model.ckpt-10000 --output_directory=~/products/out
As a result, in the products/out directory we will get a frozen model file, which we need to copy to the assets directory of the mobile app (I will describe the app in the next paragraph):
cp ~/products/out/frozen_inference_graph.pb ~/AndroidStudioProjects/KotlinDetector/assets/frozen_inference_graph.pb
Now, what's left is the mobile app. I tweaked a sample TensorFlow object detection app, extracting it and translating it into Kotlin (well, in what other language could it be implemented? :)). The application uses TensorFlowObjectDetectionAPIModel, which is a wrapper for frozen detection models. It in turn uses TensorFlowInferenceInterface, which is a wrapper over the TensorFlow API.
If you want to have your own detector, you need to substitute frozen_inference_graph.pb with your own file and change the label names in frozen_inference_labels.txt.
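As a rough sketch of how the detector is wired up (the class and method names come from the stock TensorFlow Android demo; the constants below are illustrative assumptions, not necessarily the exact values used in this project):

import android.content.res.AssetManager
import android.graphics.Bitmap

// Illustrative constants: the input size must match the model and the file names must match the assets
private const val TF_OD_API_INPUT_SIZE = 300
private const val TF_OD_API_MODEL_FILE = "file:///android_asset/frozen_inference_graph.pb"
private const val TF_OD_API_LABELS_FILE = "file:///android_asset/frozen_inference_labels.txt"

// Create the detector from the frozen graph and the label list stored in assets
fun createDetector(assets: AssetManager): Classifier =
    TensorFlowObjectDetectionAPIModel.create(
        assets, TF_OD_API_MODEL_FILE, TF_OD_API_LABELS_FILE, TF_OD_API_INPUT_SIZE)

// Run detection on a camera frame that has been scaled to the model's input size
fun detect(detector: Classifier, frame: Bitmap): List<Classifier.Recognition> =
    detector.recognizeImage(frame)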
And that's it! Voilà, here is the Kotlin detector in action:
Here you will find the code of the mobile app.
Here are the Python scripts I used for object detection.
Here is a ready-to-install .apk file for evaluating the app.