
Numpy: Heart of scientific computing in Python | Sanrusha

 2 years ago
source link: https://medium.com/sanrusha-consultancy/bodacious-world-of-numpy-95c394d55d99


Heart of scientific computing in Python

Photo by Mathyas Kurmann on Unsplash

NumPy stands for Numerical Python.

If you aspire to become a Data Scientist or Machine Learning professional, you cannot ignore NumPy. In fact, it is a Python library you will have to know by heart.

I won’t go through the academic details of NumPy. You can find them on the NumPy site: https://numpy.org/

As a data scientist, I am more interested in making you aware of the practical uses of NumPy.

I hope you are using Jupyter Notebook on Anaconda, because I will provide a link to my Jupyter notebook containing all the commands I have written for this article.

Let’s begin!

Before we take a deeper dive, let’s develop a common understanding of the terms I am going to use in this article.

A single value is referred to as a scalar.

More than one value arranged in a one-dimensional array, either horizontally like [1 2 3 4] or vertically as a single column, is referred to as a vector.

A multidimensional array like [[1 2 3], [4 5 6], [7 8 9]] is referred to as a matrix.

Let’s start with scalars and one-dimensional arrays.

First, import the NumPy library (the module name is lowercase):

import numpy as np

Scalar

If you just want to generate a random integer using NumPy, you can do so by calling np.random.randint().

The code below prints a random integer from 0 up to, but not including, 17.

dnpintsc=np.random.randint(17)
print(dnpintsc)

Likewise, np.random.randn() returns a random number drawn from the standard normal distribution (mean 0, standard deviation 1).

dnpintsc=np.random.randn()
print(dnpintsc)

The output changes with every run, because the values are random.
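If you need the same "random" numbers on every run (for example, to reproduce a result), you can seed the generator. The sketch below uses a seed of 42, which is an arbitrary choice. The names first, second and rng are hypothetical. The sketch also shows np.random.default_rng, the Generator interface available in NumPy 1.17 and later, as an alternative to the legacy np.random functions used in this article.

```python
import numpy as np

# Seeding the legacy global generator makes np.random.* calls repeatable
# (42 is an arbitrary, hypothetical seed)
np.random.seed(42)
first = np.random.randint(17)
np.random.seed(42)
second = np.random.randint(17)
print(first == second)        # True: same seed, same value

# NumPy 1.17+ also offers Generator objects with their own seed
rng = np.random.default_rng(42)
print(rng.integers(17))       # random integer in [0, 17)
print(rng.standard_normal())  # standard normal sample, like np.random.randn()
```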


One Dimensional Array (Vector)

Let’s go through the best-known and most widely used NumPy features for one-dimensional arrays.

You can create a one-dimensional array by passing a list to the array function, like below.

dnpvector1=np.array([1,2,3,4,5,6,7])
print(dnpvector1)

Result:

[1 2 3 4 5 6 7]

If you already have a list defined, like below, you can pass that list to the array function to create a NumPy array.

list1=[1,2,4,7,8,9]
dnplistarr=np.array(list1)
dnplistarr

Result:

array([1, 2, 4, 7, 8, 9])

Conversely, if you want to convert a NumPy array to a Python list, call its tolist method.

dnplistarr.tolist()

Result:

[1, 2, 4, 7, 8, 9]
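One detail is worth knowing about tolist: it converts every element to a native Python type, while the built-in list() keeps NumPy scalar types. Here is a quick check. The names as_list and via_builtin are hypothetical.

```python
import numpy as np

list1 = [1, 2, 4, 7, 8, 9]
dnplistarr = np.array(list1)

as_list = dnplistarr.tolist()   # hypothetical name: elements become plain Python ints
via_builtin = list(dnplistarr)  # hypothetical name: elements stay NumPy integer scalars

print(type(as_list[0]))                        # <class 'int'>
print(isinstance(via_builtin[0], np.integer))  # True
print(as_list == list1)                        # True: the round trip is lossless here
```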

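The append function adds new values at the end of a NumPy array. It returns a new array rather than changing the original, which is why the result is assigned back to dnpvector1.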

dnpvector1=np.append(dnpvector1,10)
dnpvector1

Result:

array([ 1,  2,  3,  4,  5,  6,  7, 10])
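Because np.append returns a new array, forgetting to assign the result back is a common mistake. The sketch below uses the hypothetical names extended and matrix. It also shows that append accepts a list of values. Without an axis argument, append flattens a multidimensional array before appending.

```python
import numpy as np

dnpvector1 = np.array([1, 2, 3, 4, 5, 6, 7])

# np.append returns a new array; the original is left unchanged
extended = np.append(dnpvector1, [10, 11])  # hypothetical name
print(dnpvector1)  # [1 2 3 4 5 6 7]
print(extended)    # [ 1  2  3  4  5  6  7 10 11]

# Without an axis argument, multidimensional input is flattened first
matrix = np.array([[1, 2], [3, 4]])  # hypothetical example data
print(np.append(matrix, 5))          # [1 2 3 4 5]
```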

If you want to add a value at a specific position (called an index) in a NumPy array, use the insert function.

The code below inserts 100 at index 1 of dnpvector1.

dnpvector1=np.insert(dnpvector1,1,100)
dnpvector1

Result:

array([  1, 100,   2,   3,   4,   5,   6,   7,  10])

Note: NumPy array indexes start at 0. In the above case, the value at index 0 was 1 and the value at index 1 was 2. After 100 is inserted at index 1, the value 2 moves to index 2, and every later value shifts one position to the right.
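You can confirm the index shift described in the note by inspecting the neighbouring positions directly. This sketch rebuilds dnpvector1 as it was before the insert; the name shifted is hypothetical. It also passes a list of values to np.insert, which inserts them all starting at the given index.

```python
import numpy as np

dnpvector1 = np.array([1, 2, 3, 4, 5, 6, 7, 10])
shifted = np.insert(dnpvector1, 1, 100)  # hypothetical name

# The value that was at index 1 (the 2) now sits at index 2
print(shifted[1], shifted[2])  # 100 2

# Several values can be inserted at once, in order
print(np.insert(dnpvector1, 1, [100, 200]))  # [  1 100 200   2   3   4   5   6   7  10]
```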

Array values can be sorted using the sort function. np.sort returns a sorted copy and leaves dnpvector1 itself unchanged.

np.sort(dnpvector1)

Result:

array([  1,   2,   3,   4,   5,   6,   7,  10, 100])
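The difference between the np.sort function and the array's own sort method is easy to miss. The function returns a sorted copy, while the method sorts in place and returns None. Here is a minimal sketch, with sorted_copy as a hypothetical name.

```python
import numpy as np

dnpvector1 = np.array([1, 100, 2, 3, 4, 5, 6, 7, 10])

sorted_copy = np.sort(dnpvector1)  # hypothetical name: a new, sorted array
print(dnpvector1)                  # [  1 100   2   3   4   5   6   7  10] (unchanged)

dnpvector1.sort()                  # sorts in place; the method returns None
print(dnpvector1)                  # [  1   2   3   4   5   6   7  10 100]

print(sorted_copy[::-1])           # reversing gives descending order
```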

Values can be dropped/deleted from the array using the delete function, which, like append and insert, returns a new array.

The code below deletes the value at index 1 of dnpvector1.

dnpvector1=np.delete(dnpvector1,1)
dnpvector1

Result:

array([ 1,  2,  3,  4,  5,  6,  7, 10])
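np.delete also accepts a list of indexes. When you want to drop values by value rather than by position, a Boolean mask is often simpler. The sketch starts again from the array with 100 at index 1. The names without_first_two and without_100 are hypothetical.

```python
import numpy as np

dnpvector1 = np.array([1, 100, 2, 3, 4, 5, 6, 7, 10])

# Delete by several positions at once (hypothetical name)
without_first_two = np.delete(dnpvector1, [0, 1])
print(without_first_two)  # [ 2  3  4  5  6  7 10]

# Keep everything that is not 100 (hypothetical name)
without_100 = dnpvector1[dnpvector1 != 100]
print(without_100)        # [ 1  2  3  4  5  6  7 10]
```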

A new array can be created by concatenating values from more than one array using the concatenate function.

The code below concatenates dnpvector1 and dnpvector2, in that order, into a new array dnpvector3.

dnpvector2=np.array([101,102,103])
dnpvector3=np.concatenate((dnpvector1,dnpvector2), axis=0)
dnpvector3

Result:

array([  1,   2,   3,   4,   5,   6,   7,  10, 101, 102, 103])
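For one-dimensional arrays, axis=0 is the only axis. With matrices, the axis argument decides the direction of joining. Here is a sketch with two small hypothetical matrices, a and b.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])  # hypothetical example data, shape (2, 2)
b = np.array([[5, 6]])          # hypothetical example data, shape (1, 2)

# axis=0 stacks rows: the column counts must match
print(np.concatenate((a, b), axis=0))
# [[1 2]
#  [3 4]
#  [5 6]]

# axis=1 joins columns: the row counts must match (b.T has shape (2, 1))
print(np.concatenate((a, b.T), axis=1))
# [[1 2 5]
#  [3 4 6]]
```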

Wondering how to find the index of a value? This is very useful when your array holds millions of records.

You can use the where function for this.

The code below returns the index of the value 7 in dnpvector3. np.where returns a tuple containing one array of indexes per dimension. The integer dtype shown depends on your platform.

np.where(dnpvector3==7)

Result:

(array([6], dtype=int32),)
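Because np.where returns a tuple, you usually take its first element to get a plain array of indexes. The sketch below shows that several matches come back together. It also shows that np.where(condition, x, y) with three arguments selects values element by element instead of returning indexes. The name indexes is hypothetical.

```python
import numpy as np

dnpvector3 = np.array([1, 2, 3, 4, 5, 6, 7, 10, 101, 102, 103])

# The first element of the tuple holds the indexes (hypothetical name)
indexes = np.where(dnpvector3 == 7)[0]
print(indexes)                        # [6]

print(np.where(dnpvector3 > 100)[0])  # [ 8  9 10]: all matches at once

# Three arguments: 0 where the condition holds, the original value elsewhere
print(np.where(dnpvector3 > 100, 0, dnpvector3))  # [ 1  2  3  4  5  6  7 10  0  0  0]
```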

Do you want to create a one-dimensional array of numbers from 1 to 10 with an interval of 2? Use the arange function. It comes in very handy when plotting visualizations during data exploration and result analysis.

The code below creates array dnp1 with values from 1 to 10 (10 not included) with an interval of 2.

dnp1=np.arange(1,10,2)
dnp1

Result:

array([1, 3, 5, 7, 9])
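arange also accepts a single argument (the stop value) and negative steps. With a fractional step, floating-point rounding can make the number of elements surprising. That is one reason linspace is often preferred in that case. Here is a short sketch.

```python
import numpy as np

print(np.arange(5))          # [0 1 2 3 4]: start defaults to 0, step to 1
print(np.arange(10, 0, -3))  # [10  7  4  1]: a negative step counts down

# A float step works, but rounding can affect the element count in general
print(np.arange(0, 1, 0.25))  # four values: 0.0, 0.25, 0.5, 0.75
```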

How about creating an array with all zero values? Use the zeros function for that.

The code below creates a one-dimensional array containing ten zeros.

dnpzero=np.zeros(10)
print(dnpzero)

Result:

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Likewise, you can create an array filled with the number 1 using the ones function. I don’t think it needs a demonstration.
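Although the ones function hardly needs a demonstration, one detail is worth seeing. Both zeros and ones produce floating-point values by default, which is why the output above shows 0. rather than 0. The dtype parameter changes that, and passing a tuple creates a matrix.

```python
import numpy as np

print(np.ones(4))             # [1. 1. 1. 1.]: floats by default
print(np.ones(4, dtype=int))  # [1 1 1 1]
print(np.zeros((2, 3)))       # a tuple shape gives a 2x3 matrix of zeros
print(np.zeros(3).dtype)      # float64
```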

A very useful function in NumPy is linspace. It returns evenly spaced numbers over the range you provide: you give the start, the stop, and how many numbers you want. Let’s go through the example below.

To generate 20 equally spaced numbers between 1 and 5 (both included), run the code below.

dnplinespace=np.linspace(1,5,20)
print(dnplinespace)

Result:

[1.         1.21052632 1.42105263 1.63157895 1.84210526 2.05263158
 2.26315789 2.47368421 2.68421053 2.89473684 3.10526316 3.31578947
 3.52631579 3.73684211 3.94736842 4.15789474 4.36842105 4.57894737
 4.78947368 5.        ]

This function is very useful when drawing visualizations, for example to generate evenly spaced x-values for a plot.
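linspace spaces its values by (stop - start) / (num - 1). Here that is (5 - 1) / (20 - 1) = 4/19 ≈ 0.2105, which matches the gaps in the output above. Passing retstep=True returns that step alongside the values. The names values and step below are hypothetical.

```python
import numpy as np

values, step = np.linspace(1, 5, 20, retstep=True)  # hypothetical names
print(step)                      # about 0.2105, i.e. (5 - 1) / (20 - 1)
print(np.isclose(step, 4 / 19))  # True

# endpoint=False leaves out the stop value
print(np.linspace(1, 5, 4, endpoint=False))  # [1. 2. 3. 4.]
```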

Let’s create a one-dimensional array of sixteen random numbers using the randn function.

dnprandomarray=np.random.randn(16)
print(dnprandomarray)

Result:

[-1.76865566  0.44877883  1.65279147  0.9642689   1.14989971 -0.44664962
 -1.29255905 -2.04026537  0.4573302   1.24653905  1.23037704  0.02799583
  1.49594113 -0.34639749 -1.41667387  1.31684675]

The maximum and minimum values of an array can be found with the max and min methods.

print(dnprandomarray.max())
print(dnprandomarray.min())

Result:

1.6527914692877606
-2.0402653731995333

If you want to know the index of the maximum and minimum values, use the argmax and argmin methods.

print(dnprandomarray.argmax())
print(dnprandomarray.argmin())

Result:

2
7
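argmax and argmin return positions, so indexing the array with them gives back the maximum and minimum. On a matrix, the axis argument makes these methods work per column (axis=0) or per row (axis=1). The sketch uses seeded hypothetical data so the run is repeatable. The names i_max, i_min and matrix are also hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)            # hypothetical seed for repeatability
dnprandomarray = rng.standard_normal(16)

i_max = dnprandomarray.argmax()           # position of the largest value
i_min = dnprandomarray.argmin()           # position of the smallest value
print(dnprandomarray[i_max] == dnprandomarray.max())  # True
print(dnprandomarray[i_min] == dnprandomarray.min())  # True

matrix = np.array([[3, 9, 1], [7, 2, 8]])  # hypothetical example data
print(matrix.max(axis=0))     # [7 9 8]: maximum of each column
print(matrix.argmax(axis=1))  # [1 2]: position of the maximum in each row
```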

How do you find the size of an array? Use the size attribute.

The code below returns the number of elements in dnprandomarray.

dnprandomarray.size

Result:

16
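size counts all elements. That differs from len() on a matrix, because len() only counts the first dimension. Here is a quick comparison on the hypothetical arrays vector and matrix.

```python
import numpy as np

vector = np.arange(16)       # hypothetical 1-D example
matrix = np.zeros((5, 4))    # hypothetical 5x4 example

print(vector.size, len(vector))  # 16 16
print(matrix.size)               # 20: total number of elements (5 * 4)
print(len(matrix))               # 5: len counts only the rows
print(matrix.ndim)               # 2: number of dimensions
```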

Wondering how to find the value at a specific index? Just use array[index], like below.

print(dnprandomarray[5])

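Result:

The value at index 5, that is, the sixth element of the array printed above: -0.44664962 (print shows it with more decimal places than the array printout).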

If you want to fetch the values between two indexes, use array[firstindexposition:lastindexposition+1]. The end index of a slice is excluded.

The code below returns the values at indexes 1, 2, 3 and 4 of dnprandomarray.

print(dnprandomarray[1:5])

Result:

[0.44877883 1.65279147 0.9642689  1.14989971]
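Python-style slicing goes further than this. Negative indexes count from the end, an omitted start or end defaults to the beginning or end of the array, and a third number sets the step. The sketch uses predictable stand-in values in a hypothetical array named values instead of random numbers, so the output is easy to check.

```python
import numpy as np

values = np.arange(16) * 10   # hypothetical stand-in data: 0, 10, ..., 150

print(values[-1])    # 150: the last element
print(values[:3])    # [ 0 10 20]: an omitted start means index 0
print(values[::4])   # [  0  40  80 120]: every fourth value
print(values[-3:])   # [130 140 150]: the last three values
```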

Multidimensional Array (Matrix)

Very Good!

Now that you have had enough exposure to one-dimensional arrays, let’s build on that knowledge with multidimensional arrays.

While a one-dimensional array contains values either in one row (horizontal array) or one column (vertical array), a multidimensional array contains values in more than one row and more than one column. Parts of a two-dimensional array are selected with array[rowindexfrom:rowindexto,columnindexfrom:columnindexto].

Let’s create a 5x4 multidimensional array of random numbers and explore the world of multidimensional arrays. np.random.rand draws these numbers uniformly from the interval [0, 1).

dnprandom=np.random.rand(5,4)
print(dnprandom)

Result:

[[0.21067598 0.07644456 0.51538545 0.81564459]
 [0.66354799 0.75554928 0.74104759 0.29199204]
 [0.54019744 0.96360781 0.62939973 0.07646806]
 [0.29973016 0.76815988 0.3176048  0.1235475 ]
 [0.39714328 0.95994687 0.43036685 0.08273214]]

Let’s find the shape of this matrix using the shape attribute.

print(dnprandom.shape)

Result:

(5, 4)

Let’s find the data type of the matrix values using the dtype attribute.

print(dnprandom.dtype)

Result:

float64
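Besides shape and dtype, arrays expose ndim (the number of dimensions). You can change an array's layout with reshape or its data type with astype. The sketch uses the hypothetical names flat, back and ints, with a seeded Generator in place of np.random.rand.

```python
import numpy as np

rng = np.random.default_rng(1)   # hypothetical seed
dnprandom = rng.random((5, 4))   # Generator counterpart of np.random.rand(5, 4)

print(dnprandom.ndim)            # 2
flat = dnprandom.reshape(20)     # the same 20 values as a 1-D array
print(flat.shape)                # (20,)
back = flat.reshape(4, -1)       # -1 lets NumPy infer the missing dimension
print(back.shape)                # (4, 5)

ints = np.array([1.7, 2.2, -3.9]).astype(int)  # conversion truncates toward zero
print(ints)                      # [ 1  2 -3]
```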

Let’s print the value at row index 4 and column index 2, that is, the fifth row and third column, since indexing starts at 0.

print(dnprandom[4][2])
print(dnprandom[4,2])

Result:

0.4303668539112995
0.4303668539112995

NumPy numbers rows and columns starting from 0, so in this 5x4 matrix the row indexes run from 0 to 4 and the column indexes from 0 to 3. That is why calling 4,2 returns the value in the last row and second-to-last column.


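Also note that you can get the value at row index 4 and column index 2 by calling either array[4][2] or array[4,2]. Both return the same value. array[4][2] first extracts row 4 as a one-dimensional array and then indexes into it, whereas array[4,2] does it in a single step.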
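Negative indexes work on matrices too, and a colon selects a whole row or column. The sketch uses a predictable stand-in 5x4 matrix, m (a hypothetical name), built with arange and reshape.

```python
import numpy as np

m = np.arange(20).reshape(5, 4)  # hypothetical stand-in: 0..19, row by row

print(m[4, 2])    # 18: row index 4, column index 2
print(m[-1, -2])  # 18 again: last row, second-to-last column
print(m[1])       # [4 5 6 7]: a whole row
print(m[:, 2])    # [ 2  6 10 14 18]: a whole column
```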

What if you want to create a subset of this matrix?

The code below creates a subset of the above matrix with the values from rows 1 and 2 and columns 2 and 3.

print(dnprandom[1:3,2:4])

Result:

[[0.74104759 0.29199204]
 [0.62939973 0.07646806]]

Note that the end indexes, 3 from the row range 1:3 and 4 from the column range 2:4, are not included in the resulting subset.
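A slice such as dnprandom[1:3,2:4] is a view of the original data, not a copy. Writing into it therefore changes the original matrix. Here is a sketch with a stand-in matrix, m. The names m, sub and independent are hypothetical.

```python
import numpy as np

m = np.arange(20).reshape(5, 4)  # hypothetical stand-in matrix

sub = m[1:3, 2:4]      # rows 1-2, columns 2-3 (end indexes excluded)
print(sub)
# [[ 6  7]
#  [10 11]]

sub[0, 0] = -1         # writing through the view...
print(m[1, 2])         # -1: ...changes m too

independent = m[1:3, 2:4].copy()  # .copy() gives a separate array
independent[0, 0] = 99
print(m[1, 2])         # still -1
```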

Operations

Now that we understand NumPy arrays, let’s do some mathematical operations on them. These operations can be performed on single-dimensional as well as multidimensional arrays.

Let’s create a multidimensional array of integers.

dnpintmat=np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]])
print(dnpintmat)

Result:

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]
 [13 14 15]]

What would be the result of the below code?

print(dnpintmat>7)

It returns a Boolean array of the same shape, with True for all the values greater than 7 and False for the other values.

Result:

[[False False False]
 [False False False]
 [False  True  True]
 [ True  True  True]
 [ True  True  True]]

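What if you want to create a subset of the above array with all the values between 7 and 14 (both excluded)?

Run the code below. The & operator combines the two conditions element-wise, and the parentheses around each comparison are required.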

dnpintmat2=(dnpintmat>7)&(dnpintmat<14)
print(dnpintmat[dnpintmat2])

Result:

[ 8  9 10 11 12 13]
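Boolean masks combine with | (or) just as they do with &, and summing a mask counts its True values. Masks also work on the left-hand side of an assignment. The sketch rebuilds the same matrix with arange and reshape. The names mask and outer are hypothetical.

```python
import numpy as np

dnpintmat = np.arange(1, 16).reshape(5, 3)  # same values as the matrix above

mask = (dnpintmat > 7) & (dnpintmat < 14)   # hypothetical name; & binds tighter than >
print(mask.sum())                           # 6: each True counts as 1

outer = dnpintmat[(dnpintmat < 3) | (dnpintmat > 13)]  # | means "or"
print(outer)                                # [ 1  2 14 15]

dnpintmat[mask] = 0                         # masks also work for assignment
print(dnpintmat[2])                         # [7 0 0]
```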

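What if there is a need for broadcasting? For example, what if you want to replace all the values in the row at index 3 (the fourth row) of the dnpintmat array with 100?

You can do it by directly assigning the value, like below. The slice 3:4 selects that single row, and NumPy broadcasts the scalar 100 to every element in it.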

dnpintmat[3:4]=100
print(dnpintmat)

Result:

[[  1   2   3]
 [  4   5   6]
 [  7   8   9]
 [100 100 100]
 [ 13  14  15]]

Likewise, you can add a value to every element, as below.

This code adds 50 to each value of matrix dnpintmat. It returns a new array and leaves dnpintmat itself unchanged.

print(dnpintmat+50)

Result:

[[ 51  52  53]
 [ 54  55  56]
 [ 57  58  59]
 [150 150 150]
 [ 63  64  65]]
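Adding 50 works because NumPy broadcasts the scalar to every element. The same rule stretches an array along any dimension of length 1, or along a missing leading dimension. So a row of 3 values can be added to every row, and a column of 5 values can multiply every column. Shapes that cannot be aligned raise a ValueError. Here is a sketch with the hypothetical names row and col.

```python
import numpy as np

dnpintmat = np.arange(1, 16).reshape(5, 3)  # shape (5, 3), values 1..15

row = np.array([100, 200, 300])             # hypothetical, shape (3,): added to every row
print((dnpintmat + row)[0])                 # [101 202 303]

col = np.array([[1], [2], [3], [4], [5]])   # hypothetical, shape (5, 1): row i times col[i]
print((dnpintmat * col)[4])                 # [65 70 75]

try:
    dnpintmat + np.array([1, 2])            # shape (2,) cannot line up with 3 columns
except ValueError as err:
    print("ValueError:", err)
```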

Mathematical & Statistical Functions

You can directly call mathematical and statistical functions like sum, sqrt, log, exp, and std on NumPy arrays.

Below are some examples.

The sum function returns the total of all the array values.

print(np.sum(dnpintmat))

Result:

387

Likewise, std calculates the standard deviation of all the array values. By default this is the population standard deviation, which divides by n.

print(np.std(dnpintmat))

Result:

37.31880669760668
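sum and std also accept an axis argument to work per column (axis=0) or per row (axis=1). std takes ddof=1 if you need the sample standard deviation, which divides by n - 1 instead of n. The sketch works on the matrix as it stands after the row assignment above. The names population_std and sample_std are hypothetical.

```python
import numpy as np

# dnpintmat after the row at index 3 was set to 100
dnpintmat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [100, 100, 100], [13, 14, 15]])

print(np.sum(dnpintmat, axis=0))  # column totals: [125 129 133]
print(np.sum(dnpintmat, axis=1))  # row totals: [  6  15  24 300  42]
print(dnpintmat.mean())           # 25.8

population_std = np.std(dnpintmat)      # divides by n (the default, ddof=0)
sample_std = np.std(dnpintmat, ddof=1)  # divides by n - 1
print(population_std, sample_std)
```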

sqrt calculates the square root of each value in the matrix.

print(np.sqrt(dnpintmat))

Result:

[[ 1.          1.41421356  1.73205081]
 [ 2.          2.23606798  2.44948974]
 [ 2.64575131  2.82842712  3.        ]
 [10.         10.         10.        ]
 [ 3.60555128  3.74165739  3.87298335]]
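Other element-wise mathematical functions follow the same pattern as sqrt: they take an array and return a new array of the same shape. Here is a short sketch with a hypothetical array named values.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0])  # hypothetical example data

print(np.sqrt(values))         # [1. 2. 3.]
print(np.power(values, 2))     # [ 1. 16. 81.]
print(np.exp(np.log(values)))  # approximately [1. 4. 9.]: exp undoes log
```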

Congratulations!!

Now you know enough NumPy to start exploring data with it on your Machine Learning and Data Science journey.

Link to my Jupyter notebook with the above commands: https://github.com/srssingh/Machine-Learning/blob/master/Numpy%20for%20ML.ipynb



