For some years I have been experimenting with computer vision and machine learning in general. I have written code for recognizing cards, beer labels, paintings, and numbers, and for using motion and drawings as the input of a simple game. In this article, I will describe how to create a basic program that can recognize dice using OpenCV.

When developing a roulette game a while ago, I talked with an ex-colleague (who worked for an online gambling company at the time) who pointed out that such companies use special software for randomization.

That made me think and also wonder. There is nothing more random than rolling real dice, so what if we could use them in a virtual game?

Creating small problems

There are multiple ways to skin a cat, and that is also the case when coming up with a solution for recognizing dice. This is by no means the only, or even the best, way to do it.

The way I approach these kinds of problems is by breaking them up into smaller ones. This reduces the overall complexity and makes it easier to refine the individual parts. As a bit of a perfectionist, I absolutely love working this way.

In this situation, we have at least two main problems: detecting the dice and counting the pips. But before we can tackle these, we need to get some data to work with.

Capturing frames

As I assume not everyone has OpenCV knowledge, I will start by showing how we can easily capture frames from a webcam.

C++

// open the default camera (index 0)
cv::VideoCapture videoCapture(0);

if(!videoCapture.isOpened()){
  return -1;
}

// set camera properties
videoCapture.set(CV_CAP_PROP_FRAME_WIDTH,1920);
videoCapture.set(CV_CAP_PROP_FRAME_HEIGHT,1080);

It’s not exactly rocket science going on here. We establish a connection with our camera, check that the communication works, and set the desired frame width and height.

I’m using a Logitech C920 camera on relatively fast hardware, so I opted to use the full resolution that is available to me.

C++

// datastructure that holds our frame
cv::Mat frame;

// capture a single video frame
videoCapture >> frame;

This is also relatively simple. We define a frame variable of the OpenCV type cv::Mat and use it to store the pixel data captured from our webcam.
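Putting these pieces together, a minimal end-to-end capture loop could look something like the sketch below. The window name and the 30 ms wait are arbitrary choices of mine.

C++

#include <opencv2/opencv.hpp>

int main(){

  // open the default camera (index 0)
  cv::VideoCapture videoCapture(0);

  if(!videoCapture.isOpened()){
    return -1;
  }

  // set camera properties
  videoCapture.set(CV_CAP_PROP_FRAME_WIDTH, 1920);
  videoCapture.set(CV_CAP_PROP_FRAME_HEIGHT, 1080);

  cv::Mat frame;

  while(true){

    // capture a single video frame and show it
    videoCapture >> frame;
    if(frame.empty()){
      break;
    }
    cv::imshow("webcam", frame);

    // stop as soon as any key is pressed
    if(cv::waitKey(30) >= 0){
      break;
    }
  }

  return 0;
}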

Detecting dice

The central idea in my solution for finding the dice is to create enough contrast between the dice and the background. This makes it easier to extract them.

To achieve this, I 3D printed a tray using black filament and use white dice. An evenly dark-coloured cloth should also work nicely if you don’t have access to a 3D printer.

The first step we need to take is capturing and storing a single frame.

capture frame

After capturing this frame, we keep an untouched colour copy for later (we will need it when counting the pips) and then discard the colour information, as it’s of no further use. That is easily done using the next few lines of code

C++

// keep an untouched copy of the captured frame; the pip counting
// step later on needs the original colour data
cv::Mat sourceFrame = frame.clone();

// convert Blue Green Red to Grayscale
cvtColor(frame, frame, CV_BGR2GRAY);

Pay attention to the fact that OpenCV uses the BGR format. It can be a bit of a nuisance when you deal with other frameworks, which in most cases follow the RGB pattern.

grayscale

At this stage, we still have too much information in our frame. There is a lot of stuff visible that we simply don’t need, e.g. cables, part of a cabinet, and so on. Fortunately, it’s possible to remove some of these elements.

OpenCV comes with a wide range of advanced algorithms for this kind of operation. In this case, I settled on capturing a static background frame (i.e. a frame without dice) before anything else. We can then calculate the absolute difference between this single background frame and the frames that contain our dice.
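As a rough sketch, capturing that background frame could be as simple as grabbing a single frame while no dice are in view and converting it to grayscale, so it matches the format of the frames we subtract it from. The variable name backgroundFrame matches the absdiff call below.

C++

// grab the static background once, before any dice are thrown
cv::Mat backgroundFrame;
videoCapture >> backgroundFrame;

// convert it to grayscale, like the frames we will compare it against
cvtColor(backgroundFrame, backgroundFrame, CV_BGR2GRAY);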

In the end, it’s not about removing the background perfectly (although that would help) but about reducing the amount of data we need to work with.

While it may sound difficult, it’s relatively easy to do in OpenCV, and in only a single line of code

C++

// calculate the absolute difference between
// our frame and the background frame
cv::absdiff(frame, backgroundFrame, frame);

background removal

After applying this operation, it’s clear that the dice is a lot easier to spot, but we still see some grayish parts shining through. This is something we need to deal with before we search for the contours of our dice.

This can easily be fixed by applying a threshold operation. According to Wikipedia, thresholding is the simplest method of image segmentation, and that is not a lie. OpenCV offers five thresholding operations; the one we will use is the binary threshold.

The way it works is really simple. Pixel intensities range from black (0) to white (255). If a pixel’s intensity is higher than a threshold we define, its new intensity is set to a value we also define; if it’s lower than the threshold, it becomes 0. With a threshold of 150 and a maximum of 255, for example, a pixel of 160 becomes 255 while a pixel of 120 becomes 0.

Threshold animation

C++

// threshold
cv::threshold(frame, frame, 150, 255, cv::THRESH_BINARY | CV_THRESH_OTSU );

So, in this case, we pass a threshold of 150 and request that all pixel intensities exceeding the threshold be set to 255, using a binary threshold. Note that because we also pass the Otsu flag, OpenCV computes the optimal threshold value automatically from the image histogram, and the 150 we pass is effectively ignored.

Threshold operation

Now that we have reduced the amount of information, there is only one step we need to take before we can start searching for our dice contour. That step is applying an edge detector.

What an edge detector does is find the boundaries of objects within an image. While there are multiple edge detectors you can choose from, I settled on the Canny algorithm.

C++

// applying canny edge filter
cv::Canny( frame, frame, 2, 2*2, 3, false );

Unfortunately, the explanation of the method parameters in the OpenCV documentation is often not that great or particularly clear.

This method is one that I personally treat as a black box, changing the parameters until I get a desirable result. A small sketch for doing that interactively follows below.
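For that kind of tuning, a throwaway sketch like the one below can help. It assumes frame holds the thresholded image from the previous step; the trackbar names and ranges are arbitrary.

C++

// two trackbars drive the Canny thresholds so we can tweak them live
int lowThreshold = 2;
int highThreshold = 4;

cv::namedWindow("canny");
cv::createTrackbar("low", "canny", &lowThreshold, 255);
cv::createTrackbar("high", "canny", &highThreshold, 255);

cv::Mat edges;

// rerun the edge detection until a key is pressed
while(cv::waitKey(30) < 0){
  cv::Canny(frame, edges, lowThreshold, highThreshold, 3, false);
  cv::imshow("canny", edges);
}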

Canny operation

We can now clearly see the edges of our dice. Now we have everything in place to search for the contours of our dice!

We will be using the findContours method, which is suited to finding contours in binary (black and white) images.

C++

std::vector<std::vector<cv::Point> > diceContours;
std::vector<cv::Vec4i> diceHierarchy;

cv::findContours( frame,
                  diceContours, diceHierarchy,
                  CV_RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE
);

First, we define two data structures and pass them to the findContours method.

Next, we set the mode to external so it only retrieves the outer contours. The parameter that follows defines the approximation method we want to use, and is set to the simple variant. Here, too, it’s a case of trial and error.

C++

// iterate over dice contours
for(size_t i = 0; i < diceContours.size(); i++){

  // get contour area
  double diceContourArea = cv::contourArea(diceContours[i]);

  // filter contours based on our dice size
  if (diceContourArea > 2000 && diceContourArea < 3500){

    // get bounding rectangle
    cv::Rect diceBoundsRect = cv::boundingRect( cv::Mat(diceContours[i]) );

    // set dice roi; we cut it out of the untouched colour copy,
    // not out of the processed (grayscale/edge) frame
    cv::Mat diceROI = sourceFrame(diceBoundsRect);

    // count number of pips
    int numberOfPips = countPips(diceROI);

  }

}

Afterwards, we iterate over the list of contours that was found and calculate each contour’s area using the contourArea method. We can use this to filter out everything that doesn’t fit the general size of the dice we are working with.

Then we calculate the bounding rectangle: the smallest upright box the contour fits in.

Bounding box

We use the coordinates of this rectangle to cut a region of interest out of our untouched colour frame and pass it to the method that will count the pips.

Counting the pips

The first thing we do is resize our image to a width and height of 150 pixels. This softens the image a bit, which helps when we need to detect blobs. We also discard the colour information, as it’s of no use here.

C++

// resize
cv::resize(dice, dice, cv::Size(150, 150));

// convert to grayscale
cvtColor(dice, dice, CV_BGR2GRAY);

Dice grayscale

This is an ideal situation (white on black) in which to apply a threshold, like we did when detecting our dice contours.

C++

// threshold
cv::threshold(dice, dice, 150, 255, cv::THRESH_BINARY | CV_THRESH_OTSU );

Dice threshold

In this case, we use the same values. Again, the Otsu flag means the actual threshold is determined automatically; everything that exceeds it will be set to 255.

C++

// floodfill
cv::floodFill(dice, cv::Point(0,0), cv::Scalar(255));
cv::floodFill(dice, cv::Point(0,149), cv::Scalar(255));
cv::floodFill(dice, cv::Point(149,0), cv::Scalar(255));
cv::floodFill(dice, cv::Point(149,149), cv::Scalar(255));

We apply a flood fill algorithm (using white) to the four corners, so the pips stand out more.

Dice floodfill

What we need to do now is count them, using something called a blob detector. Blob detection means detecting regions in an image that differ in properties such as colour or brightness. We can also filter blobs based on their size or shape.

Here it’s clear as day that the pips differ a lot in colour from their surroundings, so this is a perfect case for blob detection. We will also filter blobs based on their inertia ratio.

Inertia

Inertia ratio is a mathematical term that describes how elongated or round a shape is. A perfect circle has an inertia ratio of 1; for an ellipse, it falls between 0 and 1.
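For the curious, the ratio can be approximated from a contour’s second-order central moments: the two eigenvalues of the moment matrix describe the spread of the shape along its major and minor axes, and their ratio is 1 for a circle and approaches 0 for a line. The sketch below illustrates the idea; it mirrors what SimpleBlobDetector does internally, but is not guaranteed to match it exactly.

C++

// rough inertia ratio of a contour: the ratio of the smallest to the
// largest eigenvalue of its second-order central moments
// (1 = perfect circle, values near 0 = elongated shape)
double inertiaRatio(const std::vector<cv::Point> &contour){
  cv::Moments m = cv::moments(contour);
  double common = std::sqrt(std::pow(m.mu20 - m.mu02, 2) + 4.0 * m.mu11 * m.mu11);
  double eigMax = (m.mu20 + m.mu02 + common) / 2.0;
  double eigMin = (m.mu20 + m.mu02 - common) / 2.0;
  return eigMax > 0 ? eigMin / eigMax : 1.0;
}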

Inertia animation

The beauty of this approach is that when side pips are visible, their inertia ratio will be lower than that of the top ones. That makes it easy to detect and discard them!

C++

// search for blobs
cv::SimpleBlobDetector::Params params;

// filter by inertia (how elongated a shape is)
params.filterByInertia = true;
params.minInertiaRatio = 0.5;

// will hold our keypoints
std::vector<cv::KeyPoint> keypoints;

// create new blob detector with our parameters
cv::Ptr<cv::SimpleBlobDetector> blobDetector = cv::SimpleBlobDetector::create(params);

// detect blobs
blobDetector->detect(dice, keypoints);

// number of pips
std::cout << "Number of pips :" << keypoints.size() << std::endl;

To detect blobs, we use the SimpleBlobDetector class, which is relatively straightforward. The only hard part is defining our parameters.

We indicate that we would like to filter by inertia, and set the minimum inertia ratio our blobs need to have to 0.5. Pips will never be “drawn” perfectly (because of camera distortion), so we need to set the ratio to a value lower than 1.

After defining and passing our parameters, we can try to detect our blobs and store the results in our keypoints vector. We then use the size method to get the number of blobs (pips) it detected.
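As a reference, this is one way the fragments above could be assembled into the countPips method that our contour loop calls. It assumes the region of interest is passed in as a colour (BGR) image, as discussed earlier.

C++

int countPips(cv::Mat dice){

  // resize, which also softens the image a bit
  cv::resize(dice, dice, cv::Size(150, 150));

  // convert to grayscale
  cvtColor(dice, dice, CV_BGR2GRAY);

  // threshold (the Otsu flag picks the actual threshold value)
  cv::threshold(dice, dice, 150, 255, cv::THRESH_BINARY | CV_THRESH_OTSU);

  // flood fill the corners white so only the dark pips remain
  cv::floodFill(dice, cv::Point(0,0), cv::Scalar(255));
  cv::floodFill(dice, cv::Point(0,149), cv::Scalar(255));
  cv::floodFill(dice, cv::Point(149,0), cv::Scalar(255));
  cv::floodFill(dice, cv::Point(149,149), cv::Scalar(255));

  // detect the round blobs (the pips)
  cv::SimpleBlobDetector::Params params;
  params.filterByInertia = true;
  params.minInertiaRatio = 0.5;

  std::vector<cv::KeyPoint> keypoints;
  cv::Ptr<cv::SimpleBlobDetector> blobDetector = cv::SimpleBlobDetector::create(params);
  blobDetector->detect(dice, keypoints);

  return (int)keypoints.size();
}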

If you draw the number of pips detected and the bounding rectangle of the dice, the output could look like the following screenshot; a minimal sketch of the drawing code follows it.

Output
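For completeness, the overlay in that screenshot could be drawn with something like the lines below, where drawFrame is assumed to be a colour copy of the captured frame, and diceBoundsRect and numberOfPips come from the contour loop earlier.

C++

// draw the bounding rectangle of the dice
cv::rectangle(drawFrame, diceBoundsRect, cv::Scalar(0, 255, 0), 2);

// print the number of pips just above it
cv::putText(drawFrame, std::to_string(numberOfPips),
            cv::Point(diceBoundsRect.x, diceBoundsRect.y - 10),
            cv::FONT_HERSHEY_SIMPLEX, 0.9, cv::Scalar(0, 255, 0), 2);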

Improvements

The version described above is simpler than the one I developed for my YahtDice game.

So there are certainly improvements possible. The biggest one is using something called a watershed algorithm before detecting the dice contours.

It solves a problem that sometimes occurs when dice touch each other at a certain angle: the contour search then sees them as one big contour, and our area filter subsequently discards it. A rough sketch of the idea follows below.
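This sketch is not the exact code from the full version: thresholdedFrame and colourFrame are placeholders for the images from the earlier steps, and a production version would also seed the background explicitly.

C++

// distance to the nearest black pixel: highest in the middle of each die
cv::Mat dist;
cv::distanceTransform(thresholdedFrame, dist, cv::DIST_L2, 5);
cv::normalize(dist, dist, 0, 1.0, cv::NORM_MINMAX);

// pixels well inside a die become seed markers, one blob per die
cv::Mat seeds;
cv::threshold(dist, seeds, 0.5, 255, cv::THRESH_BINARY);
seeds.convertTo(seeds, CV_8U);

// label each seed region
cv::Mat markers;
cv::connectedComponents(seeds, markers);

// let the watershed grow the seeds back out to the die boundaries;
// touching dice end up with separate labels instead of one big contour
cv::watershed(colourFrame, markers);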

Source code

The source code for the simple version can be found on my GitHub account.