
A: How to rewrite my code to work at high efficiency?

 2 years ago
source link: https://www.codesd.com/item/a-how-to-rewrite-my-code-to-work-at-high-efficiency.html

I have a matrix (named rating) of dimension n x 140000 and another matrix (named trust) of dimension n x n, where n varies with the group and can range from 1 to 15000. I need to multiply the columns of trust element-wise by each column of rating in turn. For example:

trust=                         rating=
a1 a2 a3 a4 a5                 1 2 3 4 5 6 7 8
b1 b2 b3 b4 b5                 2 5 7 8 9 2 1 6
c1 c2 c3 c4 c5                 3 5 3 6 8 1 2 5
d1 d2 d3 d4 d5                 4 7 8 2 4 5 6 7
e1 e2 e3 e4 e5                 5 2 5 7 8 9 1 4

answer1=                       answer2=
a1.1 a2.2 a3.3 a4.4 a5.5       a1.2 a2.5 a3.5 a4.7 a5.2
b1.1 b2.2 b3.3 b4.4 b5.5       b1.2 b2.5 b3.5 b4.7 b5.2
c1.1 c2.2 c3.3 c4.4 c5.5       c1.2 c2.5 c3.5 c4.7 c5.2
d1.1 d2.2 d3.3 d4.4 d5.5       d1.2 d2.5 d3.5 d4.7 d5.2
e1.1 e2.2 e3.3 e4.4 e5.5       e1.2 e2.5 e3.5 e4.7 e5.2

answer3 is built the same way from the 3rd column of rating, and so on. I then sum each row of answer1, answer2, ... and store the row sums in a vector, and store each vector in a list for future use.

 # Preallocate the result list (one entry per column of rating)
 LstsumRbyT <- vector("list", ncol(rating))
 for (k in seq_len(ncol(rating))) {
   # Scale column j of trust by rating[j, k], then sum across each row
   answer <- sweep(trust, MARGIN = 2, rating[, k], '*')
   LstsumRbyT[[k]] <- rowSums(answer)
 }

It works perfectly if I reduce ncol(rating) to a small value (about 100). But the actual data has 140000 columns, and the loop takes so long that I never get the final result. Please help me improve the performance of my code for a huge data set.


How about a matrix product? Or is that too slow?

rating <- matrix(c(1, 2, 3, 4, 5, 2, 5, 5, 6, 3, 3, 4, 1, 2, 1), ncol = 3)
trust <- matrix(rep(1:5, times = 5), nrow = 5, byrow = TRUE)

Running your code above yields

LstsumRbyT
[[1]]
[1] 55 55 55 55 55

[[2]]
[1] 66 66 66 66 66

[[3]]
[1] 27 27 27 27 27

which is the same as

 trust %*% rating
     [,1] [,2] [,3]
[1,]   55   66   27
[2,]   55   66   27
[3,]   55   66   27
[4,]   55   66   27
[5,]   55   66   27
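
The two agree because row i of the swept matrix sums to sum over j of trust[i, j] * rating[j, k], which is exactly entry (i, k) of trust %*% rating. A minimal sketch that checks this on the toy data above and recovers the list form the question asks for; the helper name loop_version is hypothetical and only wraps the question's loop for comparison:

```r
# Toy data from the answer above
rating <- matrix(c(1, 2, 3, 4, 5, 2, 5, 5, 6, 3, 3, 4, 1, 2, 1), ncol = 3)
trust <- matrix(rep(1:5, times = 5), nrow = 5, byrow = TRUE)

# Hypothetical wrapper around the loop from the question
loop_version <- function(trust, rating) {
  out <- vector("list", ncol(rating))
  for (k in seq_len(ncol(rating))) {
    out[[k]] <- rowSums(sweep(trust, MARGIN = 2, rating[, k], "*"))
  }
  out
}

# One matrix product replaces the whole loop
prod_mat <- trust %*% rating

# Column k of the product is the k-th vector of the loop result
LstsumRbyT <- lapply(seq_len(ncol(prod_mat)), function(k) prod_mat[, k])

# Both approaches should give the same list of row-sum vectors
stopifnot(isTRUE(all.equal(LstsumRbyT, loop_version(trust, rating))))
```

If the list is not strictly required, keeping the result as a matrix and reading column k as prod_mat[, k] avoids the extra copy.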

If this isn't fast enough, it could probably be improved a bit with RcppArmadillo, I guess.

To add to the benchmarking discussion: if your for loop above is wrapped in a function f(), then I get

microbenchmark(trust %*% rating, f())
Unit: microseconds
             expr     min       lq      mean   median       uq      max neval cld
 trust %*% rating   1.418   1.7010   2.97663   2.7215   3.5965   14.452   100  a
              f() 593.890 700.9775 764.00515 766.5535 792.6375 1511.104   100   b

which is quite a substantial speedup for the plain matrix product: roughly 280 times faster at the median on this small example.
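
To reproduce a comparison like this, the loop has to live in a function. A sketch under a few assumptions: it reuses the toy rating and trust from above, f() is simply the question's loop with the list preallocated, and the microbenchmark package may or may not be installed (a base R fallback using system.time is included for that case):

```r
# Toy data from the answer above
rating <- matrix(c(1, 2, 3, 4, 5, 2, 5, 5, 6, 3, 3, 4, 1, 2, 1), ncol = 3)
trust <- matrix(rep(1:5, times = 5), nrow = 5, byrow = TRUE)

# The question's loop wrapped in a function (reads rating and trust from above)
f <- function() {
  LstsumRbyT <- vector("list", ncol(rating))
  for (k in seq_len(ncol(rating))) {
    LstsumRbyT[[k]] <- rowSums(sweep(trust, MARGIN = 2, rating[, k], "*"))
  }
  LstsumRbyT
}

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  # Same call as in the answer, with the number of runs made explicit
  print(microbenchmark::microbenchmark(trust %*% rating, f(), times = 100))
} else {
  # Base R fallback: time many repetitions of each expression
  print(system.time(for (i in 1:1000) trust %*% rating))
  print(system.time(for (i in 1:1000) f()))
}
```

The absolute numbers will differ by machine and by the BLAS library R is linked against, so treat the timings above as indicative only.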

