Let’s be clear, there is not a lot of commercial value in building a cow detector. But the same logic can be applied when recognising food, vegetables, flowers, traffic signs… There is also nothing wrong with having some fun when learning something new ;-)

Deep learning

If you have read my article about “Machine learning in web games” or have some insights in the history of neural networks, you may know that they have been around since 1957.

So while the idea of building artificial neural networks is nothing new, what we do see is a trend towards building “deep networks” that consist of a lot more (transformation and processing) layers between the input and output layer.

This results in capturing multiple levels of abstraction, which helps these systems to better (or deeper, if you want) understand, recognise and predict things.

A Convolutional Neural Network

For this article we will be building something called a “Convolutional Neural Network”. The structure of the network that I have used is not something that I invented. For a lot of problems, the structures of well-performing deep learning networks are public and freely reusable.

[Image: the structure of our convolutional network]

My cow detector, for example, is partially based on the CIFAR-10 example code available in TFLearn. TFLearn is a modular and transparent deep learning library built on top of TensorFlow, Google’s open source library for machine learning.

I’m not going into detail regarding this network or terms like stride, convolutional filters, pooling, dropout … There is a lot of good material available online that explains these things brilliantly.

I can personally recommend the video “How Convolutional Neural Networks work”.

Building a cow dataset

Let me start this section with a quote posted on Hacker News:

I find it so aggravating that nearly every last ML framework documents their CNN libraries in terms of canned MNIST datasets imported from the library in a preprocessed form.

It’s always left as a useless exercise for the reader to divine how to generate such a dataset from his/her own data

And that is a feeling I often share. The documentation of a lot of libraries seems to skim over the dataset part. For this article, I wanted to give some extra attention to how I built my cow dataset.

First I went to ImageNet to get a list of URLs containing cow images.

I used a simple script that reads the list of image URLs and downloads the images into a folder.


""" Creates training data

@author Glenn De Backer <glenn at simplicity dot be>
"""
import os
import os.path
import requests
import time

# only retry twice before moving on to the next item
requests.adapters.DEFAULT_RETRIES = 2

def download_data():
    """ Download (imagenet) data """

    # define where to find unprocessed cows images
    dir_unprocessed_cows = os.path.join('training_data', 'unprocessed_cows')

    # get download urls from cow.txt
    download_urls = open('cows.txt').readlines()

    # hold number of files downloaded
    download_counter = 1

    # iterate over urls
    for download_url in download_urls:
        print "Downloading and storing file %s" % download_url.strip()

        try:
            # download file
            req = requests.get(download_url.strip(), stream=True)
            target_path = os.path.join(dir_unprocessed_cows, '%s.jpg' \
                % download_counter)

            # store file locally
            with open(target_path, 'wb') as file_descriptor:
                for chunk in req.iter_content(1024):
                    file_descriptor.write(chunk)

            download_counter += 1

        except requests.exceptions.RequestException as exception:
            print "Skipping file %s" % download_url.strip()

        # wait 5 second so we don't hammer servers
        time.sleep(5)
 

if __name__ == "__main__":
    download_data()

This way I collected roughly 1000 pictures of cows. It may seem like a lot, but here it’s a case of “more is better than less”.

Other good sources of images are Google Image search and Wikimedia Commons. There is nothing more tedious than right-clicking and saving individual images, but we don’t need to when using a Chrome extension called Fatkun Batch Download Image.

This way I ended up with 3000 pictures of cows really quickly. It’s still not an optimal number (think ten thousand or more), but it is enough to start playing with.

The network expects images that are 32 pixels high and wide, but our cow images come in all kinds of sizes, so we need to convert them all to 32 by 32 pixels.

I’m not going to do this manually, as that would be too labour intensive. A solution here was to write a script using the Python Imaging Library.


""" Converts cow images to right dimensions

@author Glenn De Backer <glenn at simplicity dot be>
"""
from glob import glob
import os
import os.path
from PIL import Image

SIZE = 32, 32

# set directory
os.chdir('raw/cows')

# filter all jpg and png images
IMAGE_FILES = glob('*.jpg')
IMAGE_FILES.extend(glob('*.jpeg'))
IMAGE_FILES.extend(glob('*.png'))

IMAGE_COUNTER = 1

# iterate over files
for image_file in IMAGE_FILES:

    # open file and resize
    im = Image.open(image_file)
    im = im.resize(SIZE, Image.ANTIALIAS)

    #save locally
    output_filename = "%s.jpg" % IMAGE_COUNTER
    im.save(os.path.join('..', 'data', 'cows', output_filename), \
    "JPEG", quality=70)

    # increate image counter
    IMAGE_COUNTER = IMAGE_COUNTER + 1

if __name__ == "__main__":
    pass

But these are only pictures of cows; we are not yet at the point where we can train our neural network.

The system also needs examples of things that aren’t cows. I could repeat the process or, better yet, use a public dataset of images that don’t contain cows.

I already mentioned the CIFAR-10 dataset which consists of 60000 32x32 images in 10 classes, with 6000 images for each class. I used the pictures from the animal classes as negative (non-cow) examples.

For this, I got the CIFAR-10 dataset available on Kaggle and wrote a small script that extracts the correct images from it.

""" Creates others (based on cifar10) data

@author Glenn De Backer <glenn at simplicity dot be>
"""
import csv
import os
from PIL import Image

# CIFAR-10 classes that we want to keep
CLASSES = ['cat', 'dog', 'deer', 'bird', 'horse', 'frog']

IMAGE_COUNTER = 1

# open csv file containing id -> class
with open('raw/cifar10/trainLabels.csv', 'r') as f:
    READER = csv.reader(f)

    # iterate over rows
    for row in READER:
        # check if it's a class that we want to keep
        if row[1] in CLASSES:
            # load image and save as jpg
            im = Image.open('raw/cifar10/%s.png' % row[0])

            output_filename = "%s.jpg" % IMAGE_COUNTER
            im.save(os.path.join('data', 'others', output_filename), \
            "JPEG", quality=70)

            # increase image counter
            IMAGE_COUNTER = IMAGE_COUNTER + 1

And with that, we are basically done with the dataset part.
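
Before moving on to training, it doesn’t hurt to double check what we actually collected. Below is a small sanity check sketch; it assumes the final data/cows and data/others folders that the processed images end up in, and simply counts the examples per class.

""" Counts how many processed examples ended up in each class """
from glob import glob
import os.path

for label in ('cows', 'others'):
    count = len(glob(os.path.join('data', label, '*.jpg')))
    print("%s: %d images" % (label, count))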

Training our cow network

You can use TensorFlow from within Python or C++. For this project, I went for Python, as the C++ API is the less documented of the two.

class CowTrainer(object):
    """ Cow trainer """
    ...

    def train(self):
        """ Start training """
        # 1: build a list of image filenames
        self.build_image_filenames_list()

        # 2: use list information to init our numpy variables
        self.init_np_variables()

        # 3: Add images to our Tensorflow dataset
        self.add_tf_dataset(self.list_cow_files, 0)
        self.add_tf_dataset(self.list_noncow_files, 1)

        # 4: Process TF dataset
        self.process_tf_dataset()

        # 5: Setup image preprocessing
        self.setup_image_preprocessing()

        # 6: Setup network structure
        self.setup_nn_network()

        # 7: Train our deep neural network
        ...

Here we can see which steps we take when we train our cow detection network.

We start by building a simple list of image filenames and calculating how many files exist in our complete dataset.

def build_image_filenames_list(self):
  """ Get list of filenames for cows and non cows """
  self.list_cow_files = sorted(glob.glob(self.path_cow_images))
  self.list_noncow_files = sorted(glob.glob(self.path_non_cow_images))
  self.total_images_count = len(self.list_cow_files) + len(self.list_noncow_files)

We need the image count to initialize the NumPy variables in which we will store the image data and their labels.

def init_np_variables(self):
    """ Initialize NP datastructures """
    self.tf_image_data = np.zeros((self.total_images_count, self.image_size,
                                    self.image_size, 3), dtype='float64')

    self.tf_image_labels = np.zeros(self.total_images_count)

Next, we add the image data and labels by opening each image file and storing its contents together with the corresponding label.

def add_tf_dataset(self, list_images, label):
    """ Add tensorflow data we will pass to our network """
    # process list of images
    for image_file in list_images:
        try:
            # read, store image and label
            img = io.imread(image_file)
            self.tf_image_data[self.tf_data_counter] = np.array(img)
            self.tf_image_labels[self.tf_data_counter] = label

            # increase counter
            self.tf_data_counter += 1
        except Exception:
            # on error continue to the next image
            continue

At this point, we have the dataset but we need to split it into a test and training set.

def process_tf_dataset(self):
    """ Process our TF dataset """
    # split our tf set in a test and training part
    self.tf_x, self.tf_x_test, self.tf_y, self.tf_y_test = train_test_split(
        self.tf_image_data, self.tf_image_labels, test_size=0.1, random_state=42)

    # encode our labels
    self.tf_y = to_categorical(self.tf_y, 2)
    self.tf_y_test = to_categorical(self.tf_y_test, 2)
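
The to_categorical call one-hot encodes our labels: a 0 (cow label) becomes [1, 0] and a 1 (non cow label) becomes [0, 1], matching the two outputs of our network. A tiny sketch, assuming TFLearn’s data_utils module, makes this concrete:

from tflearn.data_utils import to_categorical

# label 0 (cow) becomes [1. 0.], label 1 (non cow) becomes [0. 1.]
print(to_categorical([0, 1, 0], 2))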

Next, we set up how we will be normalising our images. We will also use augmentation to synthesise new images by flipping or rotating some of the images in our set.


def setup_image_preprocessing(self):
    """ Setup image preprocessing """
    # normalization of images
    self.tf_img_prep = ImagePreprocessing()
    self.tf_img_prep.add_featurewise_zero_center()
    self.tf_img_prep.add_featurewise_stdnorm()

    # Randomly create extra image data by rotating and flipping images
    self.tf_img_aug = ImageAugmentation()
    self.tf_img_aug.add_random_flip_leftright()
    self.tf_img_aug.add_random_rotation(max_angle=30.)

Now we define our network structure, which is loosely based on the CIFAR-10 network.

def setup_nn_network(self):
  """ Setup neural network structure """

  # our input is an image of 32 pixels high and wide with 3 channels (RGB)
  # we will also preprocess and create synthetic images
  self.tf_network = input_data(shape=[None, self.image_size, self.image_size, 3],
                               data_preprocessing=self.tf_img_prep,
                               data_augmentation=self.tf_img_aug)

  # layer 1: convolution layer with 32 filters (each being 3x3x3)
  layer_conv_1 = conv_2d(self.tf_network, 32, 3, activation='relu',
                         name='conv_1')

  # layer 2: max pooling layer
  self.tf_network = max_pool_2d(layer_conv_1, 2)

  # layer 3: convolution layer with 64 filters
  layer_conv_2 = conv_2d(self.tf_network, 64, 3, activation='relu',
                         name='conv_2')

  # layer 4: Another convolution layer with 64 filters
  layer_conv_3 = conv_2d(layer_conv_2, 64, 3, activation='relu',
                         name='conv_3')

  # layer 5: Max pooling layer
  self.tf_network = max_pool_2d(layer_conv_3, 2)

  # layer 6: Fully connected 512 node layer
  self.tf_network = fully_connected(self.tf_network, 512, activation='relu')

  # layer 7: Dropout layer (removes neurons randomly to combat overfitting)
  self.tf_network = dropout(self.tf_network, 0.5)

  # layer 8: Fully connected layer with two outputs (cow or non cow class)
  self.tf_network = fully_connected(self.tf_network, 2, activation='softmax')

  # define how we will be training our network
  accuracy = Accuracy(name="Accuracy")
  self.tf_network = regression(self.tf_network, optimizer='adam',
                               loss='categorical_crossentropy',
                               learning_rate=0.0005, metric=accuracy)

Finally, we train our cow network and save it for later reuse.

    # 7: Train our deep neural network
    tf_model = DNN(self.tf_network, tensorboard_verbose=3,
                    checkpoint_path='model_cows.tfl.ckpt')

    tf_model.fit(self.tf_x, self.tf_y, n_epoch=100, shuffle=True,
                    validation_set=(self.tf_x_test, self.tf_y_test),
                    show_metric=True, batch_size=96,
                    snapshot_epoch=True,
                    run_id='model_cows')

    # 8: Save model
    tf_model.save('model_cows.tflearn')
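
To tie everything together, running the trainer could look something like the sketch below. The constructor arguments are hypothetical; they just need to end up in path_cow_images and path_non_cow_images, the glob patterns that build_image_filenames_list expects.

if __name__ == "__main__":
    # hypothetical arguments: glob patterns for our processed 32x32 images
    trainer = CowTrainer(path_cow_images='data/cows/*.jpg',
                         path_non_cow_images='data/others/*.jpg')
    trainer.train()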

Building our cow classifier

Our cow classifier is not really that different from our trainer. We set up our image normalisation and augmentation like we did when building our trainer, and the same goes for the network we will be using.

There are two methods that our trainer class doesn’t have.

class CowClassifier(object):
    """ Cow classifier """
    ...

    def load_model(self, model_path):
        """ Load model """
        self.tf_model = DNN(self.tf_network, tensorboard_verbose=0)
        self.tf_model.load(model_path)
    ...

The first one simply loads our trained network.

def predict_image(self, image_path):
    """ Predict image """
    # Load the image file
    img = scipy.ndimage.imread(image_path, mode="RGB")

    # Scale it to 32x32
    img = scipy.misc.imresize(img, (32, 32),
          interp="bicubic").astype(np.float32, casting='unsafe')

    # Predict
    return self.tf_model.predict([img])

This method feeds an image to our network and returns an array containing class probabilities.
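
Putting the classifier to work could then look something like the sketch below. It assumes the CowClassifier constructor sets up the preprocessing and network as described above; the image filename is hypothetical. Because we labelled cows as class 0 during training, the first probability in the result is the cow probability.

classifier = CowClassifier()
classifier.load_model('model_cows.tflearn')

# predict_image returns one [p_cow, p_non_cow] pair per image
result = classifier.predict_image('some_picture.jpg')
probability_cow = result[0][0]

if probability_cow > 0.5:
    print("Looks like a cow (%.0f%% sure)" % (probability_cow * 100))
else:
    print("Probably not a cow (%.0f%% sure)" % ((1 - probability_cow) * 100))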

Source

The complete source (including the cow dataset) is available on my GitHub account. The license is GPLv3.