
NumPy's indexing and slicing notation explained

 6 years ago
source link: https://www.tuicool.com/articles/hit/vquueub

Recently I wrote a simple subroutine that iterates over chunks of a 2D array. But what is easy to describe turned out to be a little harder to put into code. The reason is Python's/NumPy's very "information-dense" array indexing notation, which can be confusing both to read and to write. So I went to the trouble of creating these graphics and explanations (mainly to grok it myself), so you can always look here if you get confused.

But first, let me explain what happens when you use Python's index notation on any object. (For the remainder of this post I'll be using Python 3; if you're using an older version, you're on your own.)

I encourage you to try these examples out for yourself in the REPL.

Python square bracket overloading

In Python, the square brackets can be overloaded for any class you define by implementing the __getitem__ method, and this is how NumPy's np.ndarray implements its indexing.

Here's an example for you:

>>> class MyClass:
...     def __getitem__(self, key):
...         print(key)
...
>>> myobj = MyClass()
>>> myobj[3]
3
>>> myobj[1,2,3]
(1, 2, 3)
>>> myobj[1,None,'mukduk']
(1, None, 'mukduk')
>>>
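To see exactly which key Python hands over, here is a small variant of MyClass that returns the key instead of printing it, so the results can be checked with assert. KeyEcho is a hypothetical name used only for this sketch; the behaviour it demonstrates (commas inside the brackets build a tuple) is plain Python.

```python
class KeyEcho:
    """Hypothetical helper: __getitem__ returns whatever key it receives."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()

# A single index is passed through unchanged.
assert echo[3] == 3
# Commas inside the brackets build a tuple, exactly as (1, 2, 3) would.
assert echo[1, 2, 3] == (1, 2, 3)
assert echo[1, 2, 3] == echo[(1, 2, 3)]
# A trailing comma turns a single index into a 1-tuple.
assert echo[3,] == (3,)
```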

Pretty easy, right? As a developer, your mind is probably already racing with all the different ways you could abuse this operator.

But remember that you also want to slice your arrays. What happens if we put a slice in there? To reiterate, a slice is a special kind of index notation that lets you access more than one index of the array at once.

Here is a visualization of Python's indexing for one-dimensional lists/arrays.

The array used for the visualization below is simply list(range(10)).

[Figure: index positions of list(range(10))]

The slice notation (which I'm going to revisit later) lets us define the start, stop and step size.

[Figure: slicing with start and stop]
The stop index marks the border, so your slice will not contain that element. Watch out for off-by-one errors.
[Figure: slicing with a step]
Here the step is shown.

So this gives us the following:

>>> a = list(range(10))
>>> a[1:5]
[1, 2, 3, 4]
>>> a[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
>>> a[1:5:1]
[1, 2, 3, 4]
>>> a[1:5]
[1, 2, 3, 4]
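One way to internalize the start/stop/step rules is to compare a slice with the positions produced by range(start, stop, step). This sketch only claims the equivalence for the non-negative, in-bounds values used here; negative or out-of-range values follow extra rules that range() does not apply.

```python
a = list(range(10))

# For non-negative, in-bounds start/stop, a[start:stop:step] selects
# exactly the positions range(start, stop, step) produces.
start, stop, step = 1, 8, 3
assert a[start:stop:step] == [a[i] for i in range(start, stop, step)]
print(a[start:stop:step])  # [1, 4, 7]

# The stop index is excluded, so a[1:5] holds 5 - 1 = 4 elements.
assert len(a[1:5]) == 4

# Omitted values fall back to defaults; a negative step walks backwards.
assert a[::2] == [0, 2, 4, 6, 8]
assert a[::-1] == a[9::-1]
```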

If it works in pure Python, there is no reason it shouldn't work in NumPy:

>>> import numpy as np
>>> b = np.arange(10)
>>> b
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b[2:8]
array([2, 3, 4, 5, 6, 7])
[...]
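A quick way to confirm that NumPy follows the same 1D slicing rules as lists is to apply identical slices to both and compare the results; .tolist() converts the NumPy result back to a plain list. The particular slices chosen here are arbitrary.

```python
import numpy as np

a = list(range(10))
b = np.arange(10)

# Each pair applies the same slice to the NumPy array and to the list.
pairs = [
    (b[2:8], a[2:8]),
    (b[::-1], a[::-1]),
    (b[-3:], a[-3:]),
    (b[1:9:2], a[1:9:2]),
]
for array_result, list_result in pairs:
    assert array_result.tolist() == list_result

print(b[2:8])  # [2 3 4 5 6 7]
```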

Now let's revisit how slices are handled by the square-bracket operator, i.e. __getitem__(self, key):

>>> myobj[1:2:3]
slice(1, 2, 3)
>>> myobj[1:None:3]
slice(1, None, 3)
>>> myobj[1::3] #empty 'slots' are filled with None
slice(1, None, 3)
>>> myobj[::]
slice(None, None, None)

The slice object simply wraps the three values (start, stop, step) in a single object.

Let's look at how Python passes indices that contain slices before we move on:
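Because a slice is an ordinary object, you can build one yourself with the built-in slice(start, stop, step) and pass it inside the brackets. Its start, stop and step attributes hold the three values, and slice.indices(length) replaces the None defaults with concrete positions for a sequence of that length. A short sketch:

```python
a = list(range(10))

# slice(1, None, 3) is what a[1::3] hands to __getitem__.
s = slice(1, None, 3)
assert a[s] == a[1::3] == [1, 4, 7]

# The three wrapped values are plain attributes.
assert (s.start, s.stop, s.step) == (1, None, 3)

# indices(len) resolves the None placeholders for a length-10 sequence.
assert s.indices(len(a)) == (1, 10, 3)
assert slice(None, None, -1).indices(len(a)) == (9, -1, -1)

# The resolved triple can drive range() to list the selected positions.
print(list(range(*s.indices(len(a)))))  # [1, 4, 7]
```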

>>> myobj[::]
slice(None, None, None)
>>> myobj[::,1]
(slice(None, None, None), 1)
>>> myobj[:,1]
(slice(None, None, None), 1)  # note that [::, 1] and [:, 1] are the same
>>> myobj[:,:,1,]
(slice(None, None, None), slice(None, None, None), 1)
>>> myobj[:1,1]
(slice(None, 1, None), 1)
>>> myobj[:1,2]
(slice(None, 1, None), 2)
>>> myobj[:1:,2]
(slice(None, 1, None), 2)
>>> myobj[:1:2,2]
(slice(None, 1, 2), 2)
>>>

Pay attention to the fact that [::] and [:] mean the same thing, as you can see above.
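NumPy receives exactly the same tuples. The sketch below, using an arbitrary 3x4 array, checks that [:, 1], [::, 1] and an explicitly built tuple (slice(None), 1) all select the same column, which is handy when an index has to be assembled in code.

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))

# arr[:, 1] passes the tuple (slice(None, None, None), 1) to __getitem__.
key = (slice(None), 1)
assert np.array_equal(arr[:, 1], arr[key])
assert np.array_equal(arr[::, 1], arr[:, 1])
print(arr[key])  # [1 5 9]

# Keys are ordinary tuples, so they can be built programmatically,
# here selecting rows 0-1 and columns 2-3 (stop indices excluded).
key = (slice(0, 2), slice(2, 4))
assert arr[key].tolist() == [[2, 3], [6, 7]]
```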

NumPy array shapes

NumPy arrays have shapes. A shape is a tuple of values that tells you how many dimensions the array has and how long each of those dimensions is.

The length of the shape tuple is the number of dimensions of the array, and each value stored in the tuple is the size of the corresponding dimension. A three-dimensional array would have shape (x,y,z). For example, an RGB image can be represented as a three-dimensional array: an image of width a and height b is usually stored rows first, with shape (b,a,3).

It's easier to show than to put into words.

[Figure: A 4x5 RGB image with one 4x5 array per channel.]
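To connect the figure to code: assuming the common rows-first layout (height, width, channels), three 4x5 channel arrays can be stacked along a new last axis to give a (4, 5, 3) image. The channel values in this sketch are made up purely for illustration.

```python
import numpy as np

height, width = 4, 5
# Made-up channel data: one 4x5 array per colour channel.
red = np.full((height, width), 255, dtype=np.uint8)
green = np.zeros((height, width), dtype=np.uint8)
blue = np.full((height, width), 128, dtype=np.uint8)

# Stacking along a new last axis gives shape (height, width, 3).
image = np.stack([red, green, blue], axis=-1)
assert image.shape == (4, 5, 3)
# len(shape) is the number of dimensions.
assert image.ndim == len(image.shape) == 3
# Every pixel now holds one value per channel.
assert image[0, 0].tolist() == [255, 0, 128]
```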

Here are some further examples:

>>> a = np.arange(0,20) 
>>> a.shape
(20,)
>>> np.array([]).shape #shape of empty array
(0,)
>>> np.ones((0,0))
array([], shape=(0, 0), dtype=float64)
>>> np.ones((0,0,0,0,0))
array([], shape=(0, 0, 0, 0, 0), dtype=float64)
>>> np.ones((4,2,3))
array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])
>>>

As an exercise, check in the REPL what np.zeros((20,1)), np.zeros((20,0)), np.zeros((20,20)) and np.zeros((1,20)) look like.
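When checking the exercise, two attributes besides shape are useful: ndim, which equals len(shape), and size, the product of the shape entries (the total number of elements). A minimal sketch over the exercise shapes, with the (20,) shape from above added for comparison:

```python
import math

import numpy as np

shapes = [(20,), (20, 1), (20, 0), (20, 20), (1, 20)]
for shape in shapes:
    arr = np.zeros(shape)
    # ndim is the length of the shape tuple.
    assert arr.ndim == len(shape)
    # size is the product of the dimensions; any 0 makes the array empty.
    assert arr.size == math.prod(shape)
    print(shape, arr.ndim, arr.size)
```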

Multidimensional Slicing in NumPy

So far we've figured out how to slice a 1D array in NumPy and in vanilla Python.

But what about two or more dimensions? It's pretty straightforward: the intuition is that you slice each dimension separately. That sentence will make more sense once you see how to slice a 2D array.

First some code:

>>> np.arange(36).reshape((6, 6))
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
>>> _[2:4,1:3]
array([[13, 14],
       [19, 20]])

The first dimension of a np.array is always the outermost array, the second dimension is the entries nested in that outer array, and so forth for higher dimensions. So the first slice applies to the first dimension, the next slice to the second dimension, and so on. If you don't want to slice anything in a particular dimension, just leave a : there. For example, to slice nothing at all in a three-dimensional array you would write [:,:,:] or, as shown earlier, [::,::,::].
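The "one slice per dimension" rule can be checked directly: slicing both axes at once gives the same result as slicing the rows first and then slicing the columns of what remains. This sketch reuses the 6x6 array from the example above and an arbitrary 2x3x4 array for the three-dimensional case.

```python
import numpy as np

a = np.arange(36).reshape((6, 6))

# Rows 2-3 on the first axis, columns 1-2 on the second (stops excluded).
both_at_once = a[2:4, 1:3]
# The same selection done in two steps: rows first, then columns.
rows_first = a[2:4]
step_by_step = rows_first[:, 1:3]
assert np.array_equal(both_at_once, step_by_step)
assert both_at_once.tolist() == [[13, 14], [19, 20]]

# Leaving ":" in every position selects everything.
cube = np.arange(24).reshape((2, 3, 4))
assert np.array_equal(cube[:, :, :], cube)
assert np.array_equal(cube[::, ::, ::], cube)
```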

In the following picture I'm slicing np.arange(20).reshape((4,5))[2:4,1:3]:

[Figure 5: 1) first slice on the first axis; 2) second slice on the remaining axis]

If you only want to access a single value, put one index per dimension into the square brackets, separated by commas, like so: a[1,2,3,4,5,6] (for a six-dimensional array).

For the array in Figure 5, the value at a[1,2] is 7.

As an exercise, try to extract the green channel from an RGB image. (Use skimage.io.imread to read an image file into a 3D NumPy array.)
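The Figure 5 example can be reproduced in a few lines. Note the difference in the results: a[1, 2] with one integer per axis returns a single value, while the slice returns a 2x2 sub-array.

```python
import numpy as np

a = np.arange(20).reshape((4, 5))

# One integer per axis selects a single element: row 1, column 2.
assert a[1, 2] == 7
# Equivalent to indexing the row first, then the column within that row.
assert a[1][2] == a[1, 2]

# The slice from Figure 5: rows 2-3, columns 1-2.
assert a[2:4, 1:3].tolist() == [[11, 12], [16, 17]]
```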
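One possible answer, sketched with a synthetic array instead of an image file so it runs without skimage: assuming the channels sit on the last axis in R, G, B order (as in typical RGB images), the green channel is index 1 of the third dimension.

```python
import numpy as np

# Synthetic 4x5 RGB image; a real one could come from skimage.io.imread.
image = np.zeros((4, 5, 3), dtype=np.uint8)
image[:, :, 1] = 200  # fill only the green channel

# Keep every row and column, pick channel 1 (green) on the last axis.
green = image[:, :, 1]
assert green.shape == (4, 5)
assert (green == 200).all()
# The red and blue channels were left at zero.
assert (image[:, :, 0] == 0).all() and (image[:, :, 2] == 0).all()
```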

Advanced Indexing

I personally think advanced indexing should not be used at all, because it's not very intuitive (IMO) and existing explanations are mediocre at best. I'll do my best, but here be dragons.

Advanced indexing lets us select an arbitrary series of elements by their indices, rather than a regular slice of them.

It is important to note that you have to put a list or an array of indices inside the brackets. If you put your values into the selection without the extra square brackets, you will trigger basic indexing as described in the previous sections, and each value will be treated as the index for a separate dimension.

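>>> import numpy as np
>>> b = np.arange(30)
>>> a = [1, 24, 5, 6, 7]
>>> x = b[a]
>>> y = b[[1, 24, 5, 6, 7]]  # the extra brackets are important
>>> np.array_equal(x, y)  # x == y would compare element by element
True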
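The extra brackets matter because of how __getitem__ receives its key: b[[1, 24]] passes a list, which triggers advanced indexing, whereas b[1, 24] passes the tuple (1, 24), which NumPy reads as one index per axis. Since b is one-dimensional, the latter raises IndexError. The helper raises_index_error below is hypothetical, written only for this check.

```python
import numpy as np

b = np.arange(30)

# A list inside the brackets selects those positions.
assert b[[1, 24]].tolist() == [1, 24]
# An integer array works the same way as a list.
assert b[np.array([1, 24])].tolist() == [1, 24]


def raises_index_error(arr, key):
    """Hypothetical helper: True if arr[key] raises IndexError."""
    try:
        arr[key]
    except IndexError:
        return True
    return False


# b[1, 24] and b[(1, 24)] are the same key: one index for each of two axes,
# but b has only one axis.
assert raises_index_error(b, (1, 24))
assert not raises_index_error(b, [1, 24])
```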

Now let's see what NumPy does with multidimensional advanced indexing. For each dimension of the array we supply a list of indices for that dimension, and NumPy pairs the lists up element by element. The index lists for all dimensions have to be the same length (or, more precisely, broadcastable to a common shape, as the second example below shows).

>>> a = np.arange(100, 0, -1).reshape((10, 10))
>>> a
array([[100,  99,  98,  97,  96,  95,  94,  93,  92,  91],
       [ 90,  89,  88,  87,  86,  85,  84,  83,  82,  81],
       [ 80,  79,  78,  77,  76,  75,  74,  73,  72,  71],
       [ 70,  69,  68,  67,  66,  65,  64,  63,  62,  61],
       [ 60,  59,  58,  57,  56,  55,  54,  53,  52,  51],
       [ 50,  49,  48,  47,  46,  45,  44,  43,  42,  41],
       [ 40,  39,  38,  37,  36,  35,  34,  33,  32,  31],
       [ 30,  29,  28,  27,  26,  25,  24,  23,  22,  21],
       [ 20,  19,  18,  17,  16,  15,  14,  13,  12,  11],
       [ 10,   9,   8,   7,   6,   5,   4,   3,   2,   1]])
>>> a[ [0,1,2,3,4,5,6,7,8,9], [9,8,7,6,5,4,3,2,1,0] ]
array([91, 82, 73, 64, 55, 46, 37, 28, 19, 10])
>>> a[ [0,1,2,3,4,5,6,7,8,9], [0] ] # broadcasting
array([100,  90,  80,  70,  60,  50,  40,  30,  20,  10])
>>>

As shown above, it is also possible to broadcast a single index (or several indices, in the higher-dimensional case) when an index repeats. In the second example the one-element list [0] is stretched to match the ten row indices, so the same column index is applied to every row.
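The sketch below rebuilds the 10x10 array from above and spells out both ideas: integer-array indexing pairs the row and column lists element by element, and a one-element list like [0] is broadcast to the length of the other index list. np.broadcast_arrays shows the index arrays NumPy effectively pairs up.

```python
import numpy as np

a = np.arange(100, 0, -1).reshape((10, 10))
rows = list(range(10))
cols = list(range(9, -1, -1))

# Pairwise selection: element k is a[rows[k], cols[k]].
picked = a[rows, cols]
assert picked.tolist() == [int(a[r, c]) for r, c in zip(rows, cols)]

# [0] has length 1, so it is broadcast against the ten row indices...
broadcast = a[rows, [0]]
# ...which is the same as writing the repeated column index out in full.
assert np.array_equal(broadcast, a[rows, [0] * 10])
_, col_idx = np.broadcast_arrays(np.array(rows), np.array([0]))
assert col_idx.tolist() == [0] * 10
```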
