Numpy: Heart of scientific computing in Python | Sanrusha
source link: https://medium.com/sanrusha-consultancy/bodacious-world-of-numpy-95c394d55d99
Heart of scientific computing in Python
NumPy stands for Numerical Python.
If you aspire to become a data scientist or machine learning professional, you cannot ignore NumPy. It is a vital Python library that you will need to know by heart.
I won’t go through the academic details of NumPy. You can find those details on the NumPy site: https://numpy.org/
As a data scientist, I am more interested in making you aware of the practical uses of NumPy.
I hope you are using Jupyter Notebook on Anaconda, because I am going to provide a link to my Jupyter notebook containing all the commands I have written for this article.
Let’s begin!
Before we take a deeper dive, let’s develop a common understanding of the terms I am going to use in this article.
A single value is also referred to as a scalar.
More than one value in a horizontal one-dimensional array format like [1 2 3 4], or in a vertical format with one value per row, is also referred to as a vector.
A two-dimensional array like [[1 2 3], [4 5 6], [7 8 9]] is also referred to as a matrix.
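To make these terms concrete, here is a minimal sketch showing how NumPy reports the number of dimensions and the shape in each case. The variable names are illustrative and do not come from the original notebook.

```python
import numpy as np

scalar = np.array(5)                      # a single value: zero dimensions
row_vector = np.array([1, 2, 3, 4])       # one-dimensional array
# NumPy stores a "vertical" vector as a 2-D array with a single column
column_vector = row_vector.reshape(4, 1)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # two-dimensional array

for name, arr in [("scalar", scalar), ("row_vector", row_vector),
                  ("column_vector", column_vector), ("matrix", matrix)]:
    print(name, "ndim =", arr.ndim, "shape =", arr.shape)
```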
Let’s go through the one-dimensional array.
First, import the NumPy library. Note that the module name is lowercase.
import numpy as np
Scalar
If you just want to generate a random integer using NumPy, you can do so by calling random.randint().
The code below will print a random integer from 0 to 16 (the upper bound 17 is excluded).
dnpintsc=np.random.randint(17)
print(dnpintsc)
Likewise, random.randn() will return a random floating-point number drawn from the standard normal distribution (mean 0, standard deviation 1).
dnpintsc=np.random.randn()
print(dnpintsc)
Of course, the output changes with every run.
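If you need the same random values on every run, for example to reproduce a result, one option is to seed the generator first. The sketch below uses np.random.seed; the seed value 42 and the variable names are arbitrary choices, not part of the original notebook.

```python
import numpy as np

np.random.seed(42)                   # fix the starting state of the generator
first_int = np.random.randint(17)    # integer from 0 to 16
first_float = np.random.randn()      # sample from the standard normal distribution

np.random.seed(42)                   # re-seeding replays the same sequence
second_int = np.random.randint(17)
second_float = np.random.randn()

# Both comparisons are True because the sequence was replayed
print(first_int == second_int, first_float == second_float)
```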
One Dimensional Array (Vector)
Let’s go through the best-known and most-used NumPy features for one-dimensional arrays.
You can create a one-dimensional array just by calling the array function and passing a list to it, like below.
dnpvector1=np.array([1,2,3,4,5,6,7])
print(dnpvector1)
Result:
[1 2 3 4 5 6 7]
If you already have a list defined like below, you can just pass that list to the array function to create a NumPy array.
list1=[1,2,4,7,8,9]
dnplistarr=np.array(list1)
dnplistarr
Result:
array([1, 2, 4, 7, 8, 9])
Likewise, if you want to convert a NumPy array to a Python list, you can do so by calling the tolist method.
dnplistarr.tolist()
Result:
[1, 2, 4, 7, 8, 9]
The append function returns a new array with the given values added at the end.
dnpvector1=np.append(dnpvector1,10)
dnpvector1
Result:
array([ 1, 2, 3, 4, 5, 6, 7, 10])
If you want to add a value at a specific position (called an index) in the NumPy array, you should use the insert function.
The code below will add 100 at index 1 in the NumPy array dnpvector1.
dnpvector1=np.insert(dnpvector1,1,100)
dnpvector1
Result:
array([ 1, 100, 2, 3, 4, 5, 6, 7, 10])
Note: NumPy array indexes start at 0. In the above case, the value at index 0 was 1 and the value at index 1 was 2. After the insert, 100 sits at index 1 and the value 2 has moved to index 2.
Array values can be sorted using the sort function.
np.sort(dnpvector1)
Result:
array([ 1, 2, 3, 4, 5, 6, 7, 10, 100])
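Note that np.sort returns a sorted copy and leaves the original array untouched. The sketch below shows the difference using a stand-in array named values, which holds the same numbers as dnpvector1 at this point; the names are illustrative.

```python
import numpy as np

values = np.array([1, 100, 2, 3, 4, 5, 6, 7, 10])

sorted_copy = np.sort(values)        # new array; values keeps its original order
print(values)
print(sorted_copy)

descending = np.sort(values)[::-1]   # reverse the sorted copy for descending order
print(descending)

values.sort()                        # the array method sorts in place instead
print(values)
```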
Values can be dropped/deleted from the array using the delete function.
The code below will delete the value at index 1 in the array dnpvector1.
dnpvector1=np.delete(dnpvector1,1)
dnpvector1
Result:
array([ 1, 2, 3, 4, 5, 6, 7, 10])
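One detail worth noting about append, insert, and delete: each returns a new array rather than changing the one you pass in, which is why the examples above assign the result back to dnpvector1. A minimal sketch with a small stand-in array (the names are illustrative):

```python
import numpy as np

original = np.array([1, 2, 3])

appended = np.append(original, 10)       # value added at the end
inserted = np.insert(original, 1, 100)   # 100 placed at index 1
deleted = np.delete(original, 1)         # value at index 1 removed

print(appended, inserted, deleted)
print(original)                          # unchanged by all three calls
```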
A new array can be created by concatenating values from more than one array using the concatenate function.
The code below concatenates the arrays dnpvector1 and dnpvector2, in that order, and creates a new array dnpvector3.
dnpvector2=np.array([101,102,103])
dnpvector3=np.concatenate((dnpvector1,dnpvector2), axis=0)
dnpvector3
Result:
array([ 1, 2, 3, 4, 5, 6, 7, 10, 101, 102, 103])
Are you wondering how to find the index of a value? This feature is very useful when you have an array dataset with millions of records.
You can use the where function for this.
The code below will return the index of the value 7 in the array dnpvector3.
np.where(dnpvector3==7)
Result:
(array([6], dtype=int32),)
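The result of where is a tuple holding one index array per dimension, so for a one-dimensional array you usually take its first element. The sketch below uses a hypothetical array named readings with a repeated value to show how all matching indexes come back.

```python
import numpy as np

readings = np.array([7, 3, 7, 10, 101, 7])   # hypothetical data with repeated values

matches = np.where(readings == 7)   # tuple with one index array per dimension
indexes = matches[0]                # for a 1-D array, take the first element
print(indexes)

# Guard against the case where nothing matches
first_index = int(indexes[0]) if indexes.size > 0 else None
print(first_index)
```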
Do you want to create a one-dimensional array of numbers from 1 to 10 with an interval of 2? You need to use the arange function. This function comes in very handy while plotting visualizations during data exploration and result analysis.
The code below will create the array dnp1 with values from 1 to 10 (10 not included) with an interval of 2.
dnp1=np.arange(1,10,2)
dnp1
Result:
array([1, 3, 5, 7, 9])
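A few more variations of arange may help here. This is a small sketch with illustrative variable names; in every case the stop value itself is excluded.

```python
import numpy as np

count_up = np.arange(5)            # start defaults to 0, step defaults to 1
odds = np.arange(1, 10, 2)         # start 1, stop 10 (excluded), step 2
quarters = np.arange(0, 1, 0.25)   # a fractional step also works
countdown = np.arange(10, 0, -3)   # a negative step counts downwards

print(count_up)
print(odds)
print(quarters)
print(countdown)
```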
How about creating an array with all zero values? Use the zeros function for that.
The code below will create a one-dimensional array containing ten zeros.
dnpzero=np.zeros(10)
print(dnpzero)
Result:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Likewise, you can create an array filled with the number 1 using the ones function. I don’t think it needs a demonstration.
A very useful function in NumPy is linspace. It creates a given number of evenly spaced values across the range provided. Let’s go through the below example.
If you want to print 20 numbers between 1 and 5 (5 included) with equal spacing, you need to run the below code.
dnplinespace=np.linspace(1,5,20)
print(dnplinespace)
Result:
[1. 1.21052632 1.42105263 1.63157895 1.84210526 2.05263158
2.26315789 2.47368421 2.68421053 2.89473684 3.10526316 3.31578947
3.52631579 3.73684211 3.94736842 4.15789474 4.36842105 4.57894737
4.78947368 5. ]
This function is very useful when drawing visualizations.
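The spacing that linspace picks is (stop - start) / (number of points - 1). The sketch below checks this with the optional retstep argument and also shows endpoint=False, which leaves the stop value out. The variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing between neighbouring values
points, step = np.linspace(1, 5, 20, retstep=True)
print(step)                  # equals (5 - 1) / (20 - 1)

gaps = np.diff(points)       # differences between neighbouring values
print(gaps)                  # all equal to step, up to floating-point rounding

# endpoint=False excludes the stop value, so the last point is below 5
open_ended = np.linspace(1, 5, 20, endpoint=False)
print(open_ended[-1])
```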
Let’s create a one-dimensional array of sixteen random numbers using randn function.
dnprandomarray=np.random.randn(16)
print(dnprandomarray)
Result:
[-1.76865566 0.44877883 1.65279147 0.9642689 1.14989971 -0.44664962
-1.29255905 -2.04026537 0.4573302 1.24653905 1.23037704 0.02799583
1.49594113 -0.34639749 -1.41667387 1.31684675]
The maximum and minimum values of the array can be found with the max and min functions.
print(dnprandomarray.max())
print(dnprandomarray.min())
Result:
1.6527914692877606
-2.0402653731995333
If you want to know the index of maximum and minimum values, you can use argmax and argmin functions.
print(dnprandomarray.argmax())
print(dnprandomarray.argmin())
Result:
2
7
How do you find the size of an array? Use the size attribute.
The code below will return the size of the dnprandomarray array.
dnprandomarray.size
Result:
16
Wondering how to find the value at a specific index? Just call array[index] like below.
print(dnprandomarray[5])
Result (the value at index 5 of the array printed above, shown to eight decimal places):
-0.44664962
If you want to fetch the values between two indexes, run array[firstindexposition:lastindexposition+1].
The code below will return the values at indexes 1, 2, 3 and 4 from the array dnprandomarray.
print(dnprandomarray[1:5])
Result:
[0.44877883 1.65279147 0.9642689 1.14989971]
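Slices also accept open ends, a step, and negative indexes. A short sketch using a stand-in array of integers so the positions are easy to follow (the names are illustrative):

```python
import numpy as np

values = np.array([10, 11, 12, 13, 14, 15, 16, 17])   # stand-in data

middle = values[1:5]        # indexes 1, 2, 3, 4 (stop index 5 is excluded)
first_three = values[:3]    # an omitted start means "from the beginning"
last_two = values[-2:]      # negative indexes count from the end
every_second = values[::2]  # the third slice part is the step
last = values[-1]           # single value at the last position

print(middle, first_three, last_two, every_second, last)
```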
Multidimensional Array (Matrix)
Very good!
Now that you have had enough exposure to the single-dimensional array, let’s enhance that knowledge with the multidimensional array.
While a single-dimensional array contains values either in one row (horizontal array) or one column (vertical array), a multidimensional array contains values in more than one row and more than one column. Elements of a two-dimensional array are selected with array[rowindexfrom:rowindexto,columnindexfrom:columnindexto].
Let’s create a 5x4 multidimensional array of random numbers and explore the world of multidimensional arrays.
dnprandom=np.random.rand(5,4)
print(dnprandom)
Result:
[[0.21067598 0.07644456 0.51538545 0.81564459]
[0.66354799 0.75554928 0.74104759 0.29199204]
[0.54019744 0.96360781 0.62939973 0.07646806]
[0.29973016 0.76815988 0.3176048 0.1235475 ]
[0.39714328 0.95994687 0.43036685 0.08273214]]
Let’s find the shape of this matrix using the shape attribute.
print(dnprandom.shape)
Result:
(5, 4)
Let’s find the data type of the values in this matrix using dtype.
print(dnprandom.dtype)
Result:
float64
Let’s print the value at row index 4 and column index 2, that is, the fifth row and third column.
print(dnprandom[4][2])
print(dnprandom[4,2])
Result:
0.4303668539112995
0.4303668539112995
NumPy numbers both rows and columns starting from 0. In a 5x4 matrix the rows are indexed 0 to 4 and the columns 0 to 3, which is why calling [4,2] returns the value in the last row and second-to-last column.
Also note that you can get this value by calling either array[4][2] or array[4,2]. Both return the same value.
How about if I want to create a subset of this matrix?
The code below will create a subset of the above matrix with the values from row indexes 1 and 2 and column indexes 2 and 3.
print(dnprandom[1:3,2:4])
Result:
[[0.74104759 0.29199204]
[0.62939973 0.07646806]]
Note that the 3 from row range 1:3 and the 4 from column range 2:4 are not included in the resulting data subset.
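The same indexing rules can be tried on a matrix with predictable values. The sketch below uses a hypothetical 5x4 matrix named grid that holds the numbers 0 to 19, so each result is easy to verify by eye.

```python
import numpy as np

grid = np.arange(20).reshape(5, 4)   # 5x4 stand-in matrix holding 0..19
print(grid)

single_a = grid[4][2]       # row index 4, then column index 2
single_b = grid[4, 2]       # same element selected in one step
row_1 = grid[1, :]          # entire row at index 1
col_2 = grid[:, 2]          # entire column at index 2
block = grid[1:3, 2:4]      # row indexes 1-2 and column indexes 2-3
last_value = grid[-1, -1]   # negative indexes count from the end

print(single_a, single_b, last_value)
print(row_1, col_2)
print(block)
```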
Operations
Now that we understand NumPy arrays, let’s do some mathematical operations on them. These operations can be performed on single-dimensional as well as multidimensional arrays.
Let’s create a multidimensional array
dnpintmat=np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]])
print(dnpintmat)
Result:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]
[13 14 15]]
What would be the result of the below code?
print(dnpintmat>7)
It will indicate True for all the values greater than 7 and False for other values.
Result:
[[False False False]
[False False False]
[False True True]
[ True True True]
[ True True True]]
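What if you want to create a subset of the above array with all the values between 7 and 14 (both excluded)?
Run the code below. The combined condition gives a boolean mask, which is then used to index the array.
dnpintmat2=(dnpintmat>7)&(dnpintmat<14)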
print(dnpintmat[dnpintmat2])
Result:
[ 8 9 10 11 12 13]
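A boolean mask can do more than select values. The sketch below rebuilds the same matrix under the illustrative name matrix and shows how to count the matches and how to invert the mask.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])

# Each comparison needs its own parentheses because & binds tighter than > and <
mask = (matrix > 7) & (matrix < 14)
selected = matrix[mask]     # 1-D array of the matching values
count = int(mask.sum())     # True counts as 1, so the sum is the number of matches
outside = matrix[~mask]     # ~ inverts the mask

print(selected, count)
print(outside)
```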
What if you need broadcasting? As an example, what if one wants to replace every value in the row at index 3 (the fourth row) of the dnpintmat array with 100?
It can be done by directly assigning the value to a slice, like below. NumPy broadcasts the single value across the whole row.
dnpintmat[3:4]=100
print(dnpintmat)
Result:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[100 100 100]
[ 13 14 15]]
Likewise, you can add a value to every element.
The code below will add 50 to each value of the matrix dnpintmat.
print(dnpintmat+50)
Result:
[[ 51 52 53]
[ 54 55 56]
[ 57 58 59]
[150 150 150]
[ 63 64 65]]
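Broadcasting also works between arrays of different shapes, not only with a single number. The following sketch uses a small 3x3 stand-in matrix and illustrative names to add a row of offsets and a column of offsets to the matrix.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

plus_scalar = matrix + 50                 # the scalar is applied to every element

row_offsets = np.array([10, 20, 30])      # shape (3,)
plus_row = matrix + row_offsets           # the row is added to each row of matrix

col_offsets = np.array([[100], [200], [300]])   # shape (3, 1)
plus_col = matrix + col_offsets           # the column is added to each column

modified = matrix.copy()                  # copy so the original stays intact
modified[1:2] = 0                         # a scalar assigned to a slice fills the row

print(plus_scalar)
print(plus_row)
print(plus_col)
print(modified)
```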
Mathematical & Statistical Functions
You can directly call mathematical and statistical functions like sum, sqrt, log, exp, and std on NumPy arrays.
Below are some examples.
sum function will return a total of all the array values.
print(np.sum(dnpintmat))
Result:
387
Likewise, std will calculate the standard deviation among the array values.
print(np.std(dnpintmat))
Result:
37.31880669760668
sqrt will calculate the square root of each value in the matrix.
print(np.sqrt(dnpintmat))
Result:
[[ 1. 1.41421356 1.73205081]
[ 2. 2.23606798 2.44948974]
[ 2.64575131 2.82842712 3. ]
[10. 10. 10. ]
[ 3.60555128 3.74165739 3.87298335]]
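Functions such as sum, mean, and std work on the whole array by default, but they also accept an axis argument to work per column or per row. A minimal sketch with a 3x3 stand-in matrix (the names are illustrative):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

total = np.sum(matrix)                 # all values together
column_sums = np.sum(matrix, axis=0)   # collapse the rows: one sum per column
row_sums = np.sum(matrix, axis=1)      # collapse the columns: one sum per row
row_means = np.mean(matrix, axis=1)    # average of each row

print(total)
print(column_sums)
print(row_sums)
print(row_means)
```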
Congratulations!!
Now you know enough NumPy to start exploring it on data in your journey of Machine Learning and Data Science.
Link to my Jupyter notebook with the above commands: https://github.com/srssingh/Machine-Learning/blob/master/Numpy%20for%20ML.ipynb
Reference
https://www.geeksforgeeks.org/python-numpy/