Thursday, June 7, 2012

Fast Array Manipulation in Numpy


This post is to explain how fast array manipulation can be done in Numpy. Since we are dealing with images in OpenCV, which are loaded as Numpy arrays, we are dealing with a little big arrays. So we need highly efficient method for fast iteration across this array.

For example, consider an image of size 500x500. If we want to access all the pixels, this itself becomes 250000 calculations. To deal with this, Numpy has got some pretty cool methods. I will explain two of them here, which I know.

For this, I take an example case: You have a 500x500 numpy array of random integers between 0 and 5, ie only 0,1,2,3,4 (just consider you got it as a result of some calculations). These integers actually correspond to different colors like below:

0 ---> Green, [0,255,0]
1 ---> Blue, [255,0,0] // Note that according to OpenCV standards, it is BGR, not RGB
2 ---> Red , [0,0,255]
3 ---> White, [255,255,255]
4 ---> Black, [0,0,0]

So you want to create another 500x500x3 array (or a color image) where integers in x is replaced by corresponding color value.

First of all we deal with our normal method, which is direct indexing method.

What we normally do? Yes, a double loop.

Method 1 : Direct element access
for i in x.rows:
    for j in x.cols:
        check what value at x[i,j]
        put corresponding color in y[i,j]
So that is given below:

First create necessary data, input array 'x', output array 'y', colors etc.

import numpy as np
import time

x = np.random.randint(0,5,(500,500))

green = [0,255,0]
blue = [255,0,0]
red = [0,0,255]
white = [255,255,255]
black = [0,0,0]

rows,cols = x.shape

y = np.zeros((rows,cols,3),np.uint8)  # for output

Now enter the loop:

for i in xrange(rows):
    for j in xrange(cols):
        k = x[i,j]

        if k==0:
            y[i,j] = green

        elif k==1:
            y[i,j] = blue

        elif k==2:
            y[i,j] = red

        elif k==3:
            y[i,j] = white

            y[i,j] = black

It took about 40-50 seconds to finish the work (I am considering only the loop, and the time depends on the system configuration. So better check at the comparison of results).

Method 2 : Using item() and itemsize()

We normally use k = x[i,j] or x[i,j] = k to read or write an array element. It is very simple, good for large arrays at a single step.

But this style is not at all good for cases like above, where, out of 250000 elements, select each one and modify each one separately. For that, Numpy has got a method to use, ie x.item() to access an element and x.itemset() to write an element. They are much faster than direct accessing. So next we implement our problem using these features ( Only loop portion is given, all others are same):

for i in xrange(rows):
    for j in xrange(cols):
        k = x.item(i,j)

        if k==0:            

 elif k==1:

        elif k==2:

        elif k==3:

(Don't be disappointed at the length of code, you will be happy when you see the performance.)

This method took nearly 5 seconds to complete the task. On my calculations, it is around 9-10x faster than the previous method. And that is good result, although length of code is a little problem.

But wait, there is a third method, called palette method.

Method 3 : Palette method

Here, there is no loop. Just three lines of code:

color = [green,blue,red,white,black]
color = np.array(color,np.uint8)
y = color[x]

Finished. See, you can considerably reduce the size of code a lot. And what about performance ? It took less than 0.2 seconds. Just compare the results:

Compared to first method, it is around 350x faster.
Compared to second method, it is around 30-40x faster.

Isn't it good, Reducing the code size to 3 lines, while speeding up the method by more than 300 times? (Truly saying, even I was shocked seeing the performance. I knew it would increase the speed, but never thought this much).

So, to understand what palette methods does and how to utilize it in image processing, we take another experiment with small sample of size 3x3.

Fist take an array of size 3x3 and elements includes only digits (0-9):

>>> a = np.random.randint(0,10,(3,3))
>>> a
array([[9, 8, 4],
       [9, 0, 8],
       [6, 6, 3]])

Next we make another array 'b'. ( You can consider it as the color array).

What should be its rows? It depends on how many color you need. In this example, 'a' has only 9 type of elements (ie digits from 0 to 9) and each corresponds to a color. So we need 9 rows here.

And how many columns ? Are you going for RGB color? Then let there be 3 columns. Or grayscale intensity? Then only one column is sufficient. Here, I take grayscale, so single column,or just an 1-dimensional array.

>>> b = np.random.randint(0,255,10)
>>> b
array([ 97, 177, 237,  29,  51, 230,  92, 198,   6,   7])

See, b[9] = 7. That exactly is happening in palette method. When you type b[a], it actually implies b[i for i in a], ie it takes each element of 'a' and subtitute for 'a' in b[a].

So what ? In our case, when we give c = b[a], it means, c[0,0] = b[ a[0,0] ], ie c[0,0] = b[9] = 7, since a[0,0]=9.
Similarly c[0,1] = b[ a[0,1] ]  ==> c[0,1] = b[8] = 6, and so on. So final result is as follows:

>>> c = b[a]
>>> c
array([[ 7,  6, 51],
       [ 7, 97,  6],
       [92, 92, 29]])

ie, replace every element in 'a' with element in 'b', of which index is decided by the value in 'a'.

Now we need a practical example from image processing. Best example is the PointPolygonTest in OpenCV. First, learn and understand the PointPolygonTest code.

That code, on running, took a minimum of 4.084 seconds (out of 10 runs). Now I removed the part under line 39 in that code and added code as follows, which is a implementation of palette method:

First rounded the values of 'res' to nearest integer.
res = np.int0(np.around(res))
Later, found minimum value in it and multiplied it with 255. Same with maximum also. They are to be used in calculation of color.

mini = res.min()
minie = 255.0/mini

maxi = res.max()
maxie = 255.0/maxi

Now create the image to draw the output. Remember, rows = maximum distance - minimum distance + 1 & columns = 3, for RGB values.
drawing = np.zeros((maxi-mini+1,3),np.uint8)

Now we add minimum distance to the 'res'. It is because, some values in 'res' are negative (distance to point outside contour). So when we apply palette method, negative values will be taken as indices which are not allowed. For that, we add minimum value to all elements in 'res', so that, in new 'res', minimum value is 0.

res = res+abs(mini)

Next part we define the color. For that, we need a single loop, which iterates all the values between res.minimum(mini) and res.maximum(maxi). So, instead of iterating over 160000 values in original method, we just iterate over only less than 300 values (in this case, maxi-mini ≈ ≈ 300). Then coloring scheme is same as in previous method.

for h,i in enumerate(xrange(mini,maxi+1)):
    if i<0:
    elif i>0:

Now finally apply the palette.

d = drawing[res]

This method took a maximum time of 0.08 seconds (out of 10 runs). That means, it increases the speed by more than 50X. That is a good improvement.

Finally, in this case, although both output look similar, they are not identical. There may be small variations due to rounding off. But it is just a shift of only one pixel and won't be a problem. Look at the results below:

Results in palette method.
Result in normal method

See any difference between both results ? ( If any, it will be negligible compared to performance)

Hope, you enjoyed it. Let me have your feedback.



  1. Hey Abid, Great post, I really enjoyed reading it! Here's my take on speeding up NumPy:

    1. Thank you. If I am right, you are the one who wrote the face recognition samples for OpenCV. right? Always wanted to say you,that was a great work. Nice to meet you finally.

  2. Hi! I have made a binary copy of my image with white objects and black background. I want to copy all those white pixels over to the source (color) image. Can I do that operation using palette?

    I currently do it by first getting all white pixels (pp = np.transpose ...) and then a for loop over the points in pp that takes that pixel from the binary image and write to the color image. It works but is very slow on a large image.