A: How to rewrite my code to work at high efficiency?
source link: https://www.codesd.com/item/a-how-to-rewrite-my-code-to-work-at-high-efficiency.html
I have a matrix (named rating) with dimensions n x 140000 and another matrix (named trust) with dimensions n x n, where n varies with the group and can be anywhere from 1 to 15000. For each column of rating, I need to multiply every row of trust element-wise by that column. For example:
trust =
    a1 a2 a3 a4 a5
    b1 b2 b3 b4 b5
    c1 c2 c3 c4 c5
    d1 d2 d3 d4 d5
    e1 e2 e3 e4 e5

rating =
    1 2 3 4 5 6 7 8
    2 5 7 8 9 2 1 6
    3 5 3 6 8 1 2 5
    4 7 8 2 4 5 6 7
    5 2 5 7 8 9 1 4

answer1 =
    a1.1 a2.2 a3.3 a4.4 a5.5
    b1.1 b2.2 b3.3 b4.4 b5.5
    c1.1 c2.2 c3.3 c4.4 c5.5
    d1.1 d2.2 d3.3 d4.4 d5.5
    e1.1 e2.2 e3.3 e4.4 e5.5

answer2 =
    a1.2 a2.5 a3.5 a4.7 a5.2
    b1.2 b2.5 b3.5 b4.7 b5.2
    c1.2 c2.5 c3.5 c4.7 c5.2
    d1.2 d2.5 d3.5 d4.7 d5.2
    e1.2 e2.5 e3.5 e4.7 e5.2
Here a1.1 means a1 * 1, so answer1 uses the 1st column of rating and answer2 the 2nd; answer3 must use the 3rd column, and so on. I then sum each row of answer1, answer2, ... and store the sums in a vector, and I store each vector in a list for future use.
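    LstsumRbyT <- vector("list", ncol(rating))  # preallocate the result list
    for (k in seq_len(ncol(rating))) {
      clmy <- rating[, k]                            # k-th column of rating
      answer <- sweep(trust, MARGIN = 2, clmy, '*')  # multiply column j of trust by clmy[j]
      sumtrustbyrating <- rowSums(answer)            # sum each row of the scaled matrix
      LstsumRbyT[[k]] <- sumtrustbyrating
    }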
It works perfectly if I limit ncol(rating) to a small value (about 100). But the actual data has 140000 columns, and the loop takes so long that I never get the final result. Please help me improve the performance of my code for a data set this large.
How about a matrix product? Or is that too slow?
    rating <- matrix(c(1, 2, 3, 4, 5, 2, 5, 5, 6, 3, 3, 4, 1, 2, 1), ncol = 3)
    trust <- matrix(rep(1:5, times = 5), 5, byrow = TRUE)
Running your code above yields
    LstsumRbyT
    [[1]]
    [1] 55 55 55 55 55

    [[2]]
    [1] 66 66 66 66 66

    [[3]]
    [1] 27 27 27 27 27
which is the same as
    trust %*% rating
         [,1] [,2] [,3]
    [1,]   55   66   27
    [2,]   55   66   27
    [3,]   55   66   27
    [4,]   55   66   27
    [5,]   55   66   27
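The reason the two agree: for rating column k, the loop computes sum over j of trust[i, j] * rating[j, k] for every row i, which is exactly entry (i, k) of trust %*% rating. The following is only a sketch to check that on made-up data and to get back the list of vectors the question asks for; the random inputs and the names loop_result, prod_result and prod_list are my own assumptions, not part of the question.

```r
# Hypothetical small inputs (not the asker's data), used only for the check.
set.seed(1)
n <- 4
trust  <- matrix(runif(n * n), n, n)
rating <- matrix(sample(1:5, n * 6, replace = TRUE), n, 6)

# Loop version: multiply column j of trust by rating[j, k], then sum each row.
loop_result <- vector("list", ncol(rating))
for (k in seq_len(ncol(rating))) {
  loop_result[[k]] <- rowSums(sweep(trust, MARGIN = 2, rating[, k], "*"))
}

# Matrix product: entry (i, k) is sum_j trust[i, j] * rating[j, k].
prod_result <- trust %*% rating

# Split the product into one vector per column to recover the list form.
prod_list <- lapply(seq_len(ncol(prod_result)), function(k) prod_result[, k])

# Compare the two results up to floating-point tolerance.
same <- isTRUE(all.equal(loop_result, prod_list))
print(same)
```

Keeping the result as a matrix and indexing columns with prod_result[, k] when needed avoids building the list at all, which may matter with 140000 columns.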
If this isn't fast enough, it could probably be improved a bit further with RcppArmadillo.
To add to the benchmarking discussion: if your for loop above is wrapped in a function f(), then I get

    microbenchmark(trust %*% rating, f())
    Unit: microseconds
                expr     min       lq      mean   median       uq      max neval cld
    trust %*% rating   1.418   1.7010   2.97663   2.7215   3.5965   14.452   100   a
                 f() 593.890 700.9775 764.00515 766.5535 792.6375 1511.104   100   b
which is quite a substantial speedup from the plain matrix product: roughly 280 times faster at the median on this small example.
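For anyone who wants to repeat the timing, here is one way f() might be set up. This is a sketch under assumptions: the body of f() is my guess at "the loop wrapped in a function", it uses the small example matrices from this answer, and it only runs the benchmark if the microbenchmark package happens to be installed. Timings will differ by machine.

```r
# Example data from the answer above.
rating <- matrix(c(1, 2, 3, 4, 5, 2, 5, 5, 6, 3, 3, 4, 1, 2, 1), ncol = 3)
trust  <- matrix(rep(1:5, times = 5), 5, byrow = TRUE)

# f() is an assumed wrapper around the question's loop; it returns the list.
f <- function() {
  out <- vector("list", ncol(rating))
  for (k in seq_len(ncol(rating))) {
    out[[k]] <- rowSums(sweep(trust, MARGIN = 2, rating[, k], "*"))
  }
  out
}

# Time both versions only if the microbenchmark package is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(trust %*% rating, f()))
}
```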