How does the order of the combination of numpy affect the speed of multiplicatio...
source link: https://www.codesd.com/item/how-does-the-order-of-the-combination-of-numpy-affect-the-speed-of-multiplication.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
How does the order of the combination of numpy affect the speed of multiplication?
How order of numpy array influence on multiplication speed? And how I can auto choose it depending on size of matrices?
The question initially comes from code using cudamat:
def test_mat():
#need to init cublas?
# cm.cublas_init()
n = 1024
for i in xrange(1,20): # 2^15 max or python fails
m= 2
m=m**i
# print m
print i
try:
t0= time.time()
# cpum1 = np.array(np.random.rand(n, m)*10, dtype=np.float32, order='C')
# cpum2 = np.array(np.random.rand(m, 1)*10, dtype=np.float32, order='C')
#CUDA need fortran order of array for speed?
cpum1 = np.array(np.random.rand(n, m)*10, dtype=np.float32, order='F')
cpum2 = np.array(np.random.rand(m, 1)*10, dtype=np.float32, order='F')
c = np.dot(cpum2.T, cpum1.T)
print (time.time()-t0)
t0= time.time()
gpum1 = cm.CUDAMatrix(cpum1)
gpum2 = cm.CUDAMatrix(cpum2)
gm = cm.dot(gpum2.T, gpum1.T)
gm.copy_to_host()
print (time.time()-t0)
except:
pass
# cm.cublas_shutdown()
print 'done'
here is some tests I have done, but I need some theoretical point of view.
def test_order(m,n):
#default
a = np.array(np.random.rand(m, n)*10, dtype=np.float32)
b = np.array(np.random.rand(n, m)*10, dtype=np.float32)
t0= time.time()
c = np.dot(a,b)
print (time.time()-t0)
#1
a = np.array(np.random.rand(m, n)*10, dtype=np.float32, order='C')
b = np.array(np.random.rand(n, m)*10, dtype=np.float32, order='C')
t0= time.time()
c = np.dot(a,b)
print (time.time()-t0)
#2
a = np.array(np.random.rand(m, n)*10, dtype=np.float32, order='C')
b = np.array(np.random.rand(n, m)*10, dtype=np.float32, order='F')
t0= time.time()
c = np.dot(a,b)
print (time.time()-t0)
#3
a = np.array(np.random.rand(m, n)*10, dtype=np.float32, order='F')
b = np.array(np.random.rand(n, m)*10, dtype=np.float32, order='C')
t0= time.time()
c = np.dot(a,b)
print (time.time()-t0)
#4
a = np.array(np.random.rand(m, n)*10, dtype=np.float32, order='F')
b = np.array(np.random.rand(n, m)*10, dtype=np.float32, order='F')
t0= time.time()
c = np.dot(a,b)
print (time.time()-t0)
print 'done'
m= 1024*10
n= 1024*1
7.125
7.14100003242
6.95299983025
8.14100003242
7.15600013733
m= 1024*1
n= 1024*10
0.718999862671
0.734000205994
0.641000032425
0.656000137329
0.655999898911
Here is the code testing peak memory usage:
import numpy as np
import time
from memory_profiler import profile
@profile
def test_order_():
m= 1024*1
n= 1024*10
#what used by default when c= np.dot(a,b)
c = np.array(np.zeros((m, m)), dtype=np.float32, order='C')
#c = np.array(np.zeros((m, m)), dtype=np.float32, order='F')
#1
a = np.array(np.random.rand(m, n)*10, dtype=np.float32, order='C')
b = np.array(np.random.rand(n, m)*10, dtype=np.float32, order='C')
t0= time.time()
c[:]= np.dot(a,b)
# np.dot(a,b,out= c) # only for C-Array !
print (time.time()-t0)
del a
del b
# del c
#2
a = np.array(np.random.rand(m, n)*10, dtype=np.float32, order='C')
b = np.array(np.random.rand(n, m)*10, dtype=np.float32, order='F')
t0= time.time()
c[:]= np.dot(a,b)
# np.dot(a,b,out= c) # only for C-Array !
print (time.time()-t0)
del a
del b
# del c
#3
a = np.array(np.random.rand(m, n)*10, dtype=np.float32, order='F')
b = np.array(np.random.rand(n, m)*10, dtype=np.float32, order='C')
t0= time.time()
c[:]= np.dot(a,b)
# np.dot(a,b,out= c) # only for C-Array !
print (time.time()-t0)
del a
del b
# del c
#4
a = np.array(np.random.rand(m, n)*10, dtype=np.float32, order='F')
b = np.array(np.random.rand(n, m)*10, dtype=np.float32, order='F')
t0= time.time()
c[:]= np.dot(a,b)
# np.dot(a,b,out= c) # only for C-Array !
print (time.time()-t0)
del a
del b
# del c
print 'done'
if __name__ == '__main__':
test_order_()
Also found some info about numpy.dot copy and fast_dot
The internal workings of dot are a little obscure, as it tries to use BLAS optimized routines, which sometimes require copies of arrays to be in Fortran order
Also some performance tips it's strange but I can't reproduce results each time I run example.(Maybe before reruns some data chaches?)
The performance depends on the underlying linear algebra library that you have.
# ORDER C-C
In [6]: %timeit a.dot(b)
10 loops, best of 3: 87.6 ms per loop
# ORDER C-F
In [8]: %timeit a.dot(b)
10 loops, best of 3: 87.8 ms per loop
# ORDER F-C
In [10]: %timeit a.dot(b)
10 loops, best of 3: 90.1 ms per loop
# ORDER F-F
In [12]: %timeit a.dot(b)
10 loops, best of 3: 90 ms per loop
I am using ATLAS compiled on this machine with SSE3, as seen by np.show_config()
. Re-running the computations show that there is no statistical difference between the two. And indeed, there is no difference because the library is doing a copy of the array before performing the product. Said copy takes 650 µs (including Python overhead), that is below the times you have. As the matrices grow, the dot product dominates, and you don't see the copy effect. If you use smaller matrices, the Python overhead masks any effect.
You can see the copies actually happening if you monitor the memory and use very large arrays.
Recommend
-
11
How does Virtual Reality affect our society, emotionally and ethically?As designers and engineers, we have to be aware that the risk of users suffering psychological trauma will steadily increase as V.R. technol...
-
5
What Is the Google Knowledge Graph (and Does It Affect SEO)?Michelle OfiweMar 31, 202110 min readThe Google Knowledge Graph is an enormous...
-
10
Does space affect the taste of wine and other tech newsDoes space affect the taste of wine and other tech newsChris Fox looks at some of the best technology news s...
-
4
-
4
News Does the motion of the solar system affect the microwave sky? 24 November 2004 ...
-
8
How does the mix order affect the derived class? advertisements Say, I have the following mixins that overlaps with each other by touching di...
-
6
Sort a NumPy Array in descending order in Python In this article we will learn how to sort a NumPy Array in descending order. Table Of Contents Given a NumPy Array we need to to...
-
2
This tutorial will discuss about unique ways to check if a numpy array is sorted in descending order. Table Of Contents Method 1: Using numpy.diff() We can use the numpy....
-
4
This tutorial will discuss about unique ways to check if a numpy array is sorted in ascending order. Table Of Contents Method 1: Using numpy.diff() We can use the numpy.d...
-
6
Understanding CPUs can help speed up Numba and NumPy code by Itamar Turner-TrauringLast updated 08 Jun 2023, originally created 07 Jun 2023 When you need to spe...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK