Let’s be clear: there is not a lot of commercial value in building a cow detector. But the same logic applies when recognising food, vegetables, flowers, traffic signs… And there is nothing wrong with having some fun while learning something new ;-)
If you have read my article about “Machine learning in web games” or have some insight into the history of neural networks, you may know that they have been around since 1957.
So while the idea of building artificial neural networks is nothing new, what we do see is a trend towards building “deep networks” consisting of many more (transformation and processing) layers between the input and output layer.
This allows them to capture multiple levels of abstraction, which helps these systems to better (or deeper, if you want) understand, recognise and predict things.
A Convolutional Neural Network
For this article we will be building something called a “Convolutional Neural Network”. The structure of the network that I have used is not something I invented. For many problems, the structures of well-performing deep learning networks are public and freely reusable.
My cow detector, for example, is partially based on the CIFAR-10 code available in TFLearn. TFLearn is a modular and transparent deep learning library built on top of TensorFlow, Google’s open-source library for machine learning.
I’m not going into detail regarding this network or terms like stride, convolutional filters, pooling, dropout… There is a lot of good material available online that explains these things brilliantly.
I personally can recommend the video “How Convolutional Neural Networks work”.
Building a cow dataset
To start this section, a quote posted on Hacker News:
I find it so aggravating that nearly every last ML framework documents their CNN libraries in terms of canned MNIST datasets imported from the library in a preprocessed form.
It’s always left as a useless exercise for the reader to divine how to generate such a dataset from his/her own data
And that is a feeling I often share. The documentation of a lot of libraries seems to skim over the dataset part. For this article, I wanted to give some extra attention to how I built my cow dataset.
First I went to ImageNet to get a list of URLs containing cow images.
I used a simple script that reads the list of image URLs and downloads the images into a folder.
```python
""" Creates training data
@author Glenn De Backer <glenn at simplicity dot be>
"""
import os
import os.path
import requests
import time

# only retry two times before going to the next item
requests.adapters.DEFAULT_RETRIES = 2

def download_data():
    """ Download (imagenet) data """
    # define where to find unprocessed cow images
    dir_unprocessed_cows = os.path.join('training_data', 'unprocessed_cows')

    # get download urls from cows.txt
    download_urls = open('cows.txt').readlines()

    # hold number of files downloaded
    download_counter = 1

    # iterate over urls
    for download_url in download_urls:
        print "Downloading and storing file %s" % download_url.strip()

        try:
            # download file
            req = requests.get(download_url.strip(), stream=True)
            target_path = os.path.join(dir_unprocessed_cows, '%s.jpg' \
                % download_counter)

            # store file locally
            with open(target_path, 'wb') as file_descriptor:
                for chunk in req.iter_content(1024):
                    file_descriptor.write(chunk)

            download_counter += 1
        except requests.exceptions.RequestException as exception:
            print "Skipping file %s" % download_url.strip()

        # wait 5 seconds so we don't hammer servers
        time.sleep(5)

if __name__ == "__main__":
    download_data()
```
This way I collected roughly 1000 pictures of cows. It may seem like a lot, but here it’s a case of “more is better than less”.
Other good sources of images are Google Image search and Wikimedia Commons. There is nothing more tedious than right-clicking and saving individual images, but we don’t need to: a Chrome extension called Fatkun Batch Download Image does it for us.
This way I quickly ended up with 3000 pictures of cows. Still not an optimal number (think ten thousand or more), but enough to start playing with.
The network expects images of 32 pixels high and wide, but our cow images come in all different sizes, so we need to convert them all to 32×32.
I’m not going to do this manually, as that would be too labour-intensive. Instead, I wrote a script using the Python Imaging Library.
```python
""" Converts cow images to right dimensions
@author Glenn De Backer <glenn at simplicity dot be>
"""
from glob import glob
import os
import os.path

from PIL import Image

SIZE = 32, 32

# set directory
os.chdir('raw/cows')

# filter all jpg and png images
IMAGE_FILES = glob('*.jpg')
IMAGE_FILES.extend(glob('*.jpeg'))
IMAGE_FILES.extend(glob('*.png'))

IMAGE_COUNTER = 1

# iterate over files
for image_file in IMAGE_FILES:
    # open file and resize
    im = Image.open(image_file)
    im = im.resize(SIZE, Image.ANTIALIAS)

    # save locally
    output_filename = "%s.jpg" % IMAGE_COUNTER
    im.save(os.path.join('..', 'data', 'cows', output_filename), \
        "JPEG", quality=70)

    # increase image counter
    IMAGE_COUNTER = IMAGE_COUNTER + 1
```
But these are only pictures of cows; we are not yet at a point where we can train our neural network.
The system also needs examples of things that aren’t cows. I could repeat the process, or better yet, use a public dataset of images that don’t contain cows.
I already mentioned the CIFAR-10 dataset, which consists of 60,000 32×32 images in 10 classes, with 6000 images per class. I used the pictures from the animal classes as negative (non-cow) examples.
For this, I got the CIFAR-10 dataset available on Kaggle and wrote a small script that extracts the correct images from it.
```python
""" Creates others (based on cifar10) data
@author Glenn De Backer <glenn at simplicity dot be>
"""
import csv
import os

from PIL import Image

# CIFAR-10 classes that we want to keep
CLASSES = ['cat', 'dog', 'deer', 'bird', 'horse', 'frog']

IMAGE_COUNTER = 1

# open csv file containing id -> class
with open('raw/cifar10/trainLabels.csv', 'rb') as f:
    READER = csv.reader(f)
    next(READER)  # skip the header row

    # iterate over rows (each row holds [image id, class label])
    for row in READER:
        # check if it's a class that we want to keep
        if row[1] in CLASSES:
            # load image and save as jpg
            im = Image.open('raw/cifar10/%s.png' % row[0])
            output_filename = "%s.jpg" % IMAGE_COUNTER
            im.save(os.path.join('data', 'others', output_filename), \
                "JPEG", quality=70)

            # increase image counter
            IMAGE_COUNTER = IMAGE_COUNTER + 1
```
And with that, we are basically done with the dataset part.
Training our cow network
You can use TensorFlow from within Python or C++. For this project, I went with Python, as the C++ API is the less well-documented of the two.
```python
class CowTrainer(object):
    """ Cow trainer """
    ...

    def train(self):
        """ Start training """
        # 1: build a list of image filenames
        self.build_image_filenames_list()

        # 2: use list information to init our numpy variables
        self.init_np_variables()

        # 3: add images to our Tensorflow dataset
        self.add_tf_dataset(self.list_cow_files, 0)
        self.add_tf_dataset(self.list_noncow_files, 1)

        # 4: process TF dataset
        self.process_tf_dataset()

        # 5: setup image preprocessing
        self.setup_image_preprocessing()

        # 6: setup network structure
        self.setup_nn_network()

        # 7: train our deep neural network
        ...
```
Here we can see which steps we take when training our cow detection network.
We start by building a simple list of image filenames and calculating how many files our complete dataset contains.
```python
def build_image_filenames_list(self):
    """ Get list of filenames for cows and non cows """
    self.list_cow_files = sorted(glob.glob(self.path_cow_images))
    self.list_noncow_files = sorted(glob.glob(self.path_non_cow_images))
    self.total_images_count = len(self.list_cow_files) + \
        len(self.list_noncow_files)
```
We need the image count to initialise the NumPy arrays in which we will store the image data and their labels.
```python
def init_np_variables(self):
    """ Initialize NP datastructures """
    self.tf_image_data = np.zeros((self.total_images_count, self.image_size,
                                   self.image_size, 3), dtype='float64')
    self.tf_image_labels = np.zeros(self.total_images_count)
```
Next, we add the image data and labels by opening the image files and storing their contents.
```python
def add_tf_dataset(self, list_images, label):
    """ Add tensorflow data we will pass to our network """
    # process list of images
    for image_file in list_images:
        try:
            # read, store image and label
            img = io.imread(image_file)
            self.tf_image_data[self.tf_data_counter] = np.array(img)
            self.tf_image_labels[self.tf_data_counter] = label

            # increase counter
            self.tf_data_counter += 1
        except:
            # on error continue to the next image
            continue
```
At this point, we have the dataset, but we still need to split it into a training and a test set.
```python
def process_tf_dataset(self):
    """ Process our TF dataset """
    # split our tf set in a test and training part
    self.tf_x, self.tf_x_test, self.tf_y, self.tf_y_test = train_test_split(
        self.tf_image_data, self.tf_image_labels,
        test_size=0.1, random_state=42)

    # encode our labels
    self.tf_y = to_categorical(self.tf_y, 2)
    self.tf_y_test = to_categorical(self.tf_y_test, 2)
```
Next, we set up how we will normalise our images. We will also use augmentation to synthesise new images by flipping or rotating some of the images in our set.
```python
def setup_image_preprocessing(self):
    """ Setup image preprocessing """
    # normalization of images
    self.tf_img_prep = ImagePreprocessing()
    self.tf_img_prep.add_featurewise_zero_center()
    self.tf_img_prep.add_featurewise_stdnorm()

    # randomly create extra image data by rotating and flipping images
    self.tf_img_aug = ImageAugmentation()
    self.tf_img_aug.add_random_flip_leftright()
    self.tf_img_aug.add_random_rotation(max_angle=30.)
```
Now we define our network structure, which is loosely based on the CIFAR-10 network.
```python
def setup_nn_network(self):
    """ Setup neural network structure """
    # our input is an image of 32 pixels high and wide with 3 channels (RGB)
    # we will also preprocess and create synthetic images
    self.tf_network = input_data(shape=[None, self.image_size,
                                        self.image_size, 3],
                                 data_preprocessing=self.tf_img_prep,
                                 data_augmentation=self.tf_img_aug)

    # layer 1: convolution layer with 32 filters (each being 3x3x3)
    layer_conv_1 = conv_2d(self.tf_network, 32, 3, activation='relu',
                           name='conv_1')

    # layer 2: max pooling layer
    self.tf_network = max_pool_2d(layer_conv_1, 2)

    # layer 3: convolution layer with 64 filters
    layer_conv_2 = conv_2d(self.tf_network, 64, 3, activation='relu',
                           name='conv_2')

    # layer 4: another convolution layer with 64 filters
    layer_conv_3 = conv_2d(layer_conv_2, 64, 3, activation='relu',
                           name='conv_3')

    # layer 5: max pooling layer
    self.tf_network = max_pool_2d(layer_conv_3, 2)

    # layer 6: fully connected 512 node layer
    self.tf_network = fully_connected(self.tf_network, 512, activation='relu')

    # layer 7: dropout layer (removes neurons randomly to combat overfitting)
    self.tf_network = dropout(self.tf_network, 0.5)

    # layer 8: fully connected layer with two outputs (cow or non-cow class)
    self.tf_network = fully_connected(self.tf_network, 2, activation='softmax')

    # define how we will be training our network
    accuracy = Accuracy(name="Accuracy")
    self.tf_network = regression(self.tf_network, optimizer='adam',
                                 loss='categorical_crossentropy',
                                 learning_rate=0.0005, metric=accuracy)
```
Finally, we train our cow network and save it for later reuse.
```python
# 7: train our deep neural network
tf_model = DNN(self.tf_network, tensorboard_verbose=3,
               checkpoint_path='model_cows.tfl.ckpt')
tf_model.fit(self.tf_x, self.tf_y, n_epoch=100, shuffle=True,
             validation_set=(self.tf_x_test, self.tf_y_test),
             show_metric=True, batch_size=96,
             snapshot_epoch=True, run_id='model_cows')

# 8: save model
tf_model.save('model_cows.tflearn')
```
Building our cow classifier
Our cow classifier is not really that different from our trainer. We set up our image normalisation and augmentation just like we did when building our trainer, and the same goes for the network structure.
There are, however, two methods that our trainer class doesn’t have.
```python
class CowClassifier(object):
    """ Cow classifier """
    ...

    def load_model(self, model_path):
        """ Load model """
        self.tf_model = DNN(self.tf_network, tensorboard_verbose=0)
        self.tf_model.load(model_path)

    ...
```
One simply loads our trained network.
```python
def predict_image(self, image_path):
    """ Predict image """
    # load the image file
    img = scipy.ndimage.imread(image_path, mode="RGB")

    # scale it to 32x32
    img = scipy.misc.imresize(img, (32, 32),
                              interp="bicubic").astype(np.float32,
                                                       casting='unsafe')

    # predict
    return self.tf_model.predict([img])
```
This method feeds an image to our network and returns an array containing class probabilities.
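Those probabilities still need to be turned into a decision. A minimal sketch of that last step (the class order, cow first and non-cow second, follows the labels 0 and 1 we assigned in `add_tf_dataset`; the probability values here are made up for illustration):

```python
import numpy as np

# hypothetical result of predict_image(): one row of [cow, non-cow] probabilities
prediction = [[0.92, 0.08]]

CLASS_NAMES = ['cow', 'non-cow']
best_index = int(np.argmax(prediction[0]))
label = CLASS_NAMES[best_index]          # the most likely class
confidence = prediction[0][best_index]   # its probability
```

In practice you may also want to apply a threshold (say, only report “cow” when the probability exceeds 0.8) rather than always picking the argmax.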
The complete source (including the cow dataset) is available on my GitHub account. The license is GPLv3.