A: How to rewrite my code to work at high efficiency?

I have a matrix (named rating) of dimension n x 140000 and another matrix (named trust) of dimension n x n, where n varies from group to group and can be anywhere from 1 to 15000. I need to multiply each column of rating by trust. For example:

trust=                         rating=
a1 a2 a3 a4 a5                 1 2 3 4 5 6 7 8
b1 b2 b3 b4 b5                 2 5 7 8 9 2 1 6
c1 c2 c3 c4 c5                 3 5 3 6 8 1 2 5
d1 d2 d3 d4 d5                 4 7 8 2 4 5 6 7
e1 e2 e3 e4 e5                 5 2 5 7 8 9 1 4

answer1=                       answer2=
a1.1 a2.2 a3.3 a4.4 a5.5       a1.2 a2.5 a3.5 a4.7 a5.2
b1.1 b2.2 b3.3 b4.4 b5.5       b1.2 b2.5 b3.5 b4.7 b5.2
c1.1 c2.2 c3.3 c4.4 c5.5       c1.2 c2.5 c3.5 c4.7 c5.2
d1.1 d2.2 d3.3 d4.4 d5.5       d1.2 d2.5 d3.5 d4.7 d5.2
e1.1 e2.2 e3.3 e4.4 e5.5       e1.2 e2.5 e3.5 e4.7 e5.2

Likewise, answer3 is obtained by multiplying by the 3rd column of rating, and so on. Then I sum each row of answer1, answer2, ... and store the row sums in a vector. Finally, I store each vector in a list for future use.

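 LstsumRbyT <- vector("list", ncol(rating))   # preallocate the result list
 for (k in seq_len(ncol(rating))) {
   clmy <- as.matrix(rating[, k])              # k-th column of rating
   # multiply column j of trust by the j-th entry of clmy
   answer <- sweep(trust, MARGIN = 2, clmy, '*')
   sumtrustbyrating <- rowSums(answer)
   LstsumRbyT[[k]] <- sumtrustbyrating
 }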

It works perfectly if I change ncol(rating) to a small value (about 100). But the actual data has 140000 columns, and the loop takes so long that I never get the final result. Please help me improve the performance of my code for a huge data set.

How about a matrix product? Or is that too slow?

rating <- matrix(c(1, 2, 3, 4, 5, 2, 5, 5, 6, 3, 3, 4, 1, 2, 1), ncol = 3)
trust <- matrix(rep(1:5, 5), 5, byrow = TRUE)

Running your code above yields

LstsumRbyT
[[1]]
[1] 55 55 55 55 55

[[2]]
[1] 66 66 66 66 66

[[3]]
[1] 27 27 27 27 27

which is the same as

 trust %*% rating
     [,1] [,2] [,3]
[1,]   55   66   27
[2,]   55   66   27
[3,]   55   66   27
[4,]   55   66   27
[5,]   55   66   27
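
The equivalence is not a coincidence: sweep() scales column j of trust by the j-th entry of the rating column, and rowSums() then adds across j, which is the same row-by-column sum a matrix product computes. The sketch below checks this on the toy data and splits the product back into a list with one vector per rating column. The names loop_version, prod_mat and prod_list are made up here for illustration.

```r
rating <- matrix(c(1, 2, 3, 4, 5, 2, 5, 5, 6, 3, 3, 4, 1, 2, 1), ncol = 3)
trust <- matrix(rep(1:5, 5), 5, byrow = TRUE)

# The original loop, wrapped in a function (hypothetical name).
loop_version <- function(trust, rating) {
  out <- vector("list", ncol(rating))
  for (k in seq_len(ncol(rating))) {
    out[[k]] <- rowSums(sweep(trust, MARGIN = 2, rating[, k], "*"))
  }
  out
}

# One matrix product replaces the whole loop.
prod_mat <- trust %*% rating

# Split the product into a list with one vector per rating column,
# matching the list layout the loop produces.
prod_list <- lapply(seq_len(ncol(prod_mat)), function(k) prod_mat[, k])

# Both approaches should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(loop_version(trust, rating), prod_list)))
```

If the list layout is not actually needed downstream, keeping the result as a matrix and indexing its columns is likely simpler.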

If this isn't fast enough, it could probably be improved a bit with RcppArmadillo, I guess.

To add to the benchmarking discussion: if your for loop above is wrapped in a function f(), then I get

microbenchmark(trust %*% rating, f())
Unit: microseconds
             expr     min       lq      mean   median       uq      max neval cld
 trust %*% rating   1.418   1.7010   2.97663   2.7215   3.5965   14.452   100  a
              f() 593.890 700.9775 764.00515 766.5535 792.6375 1511.104   100   b

which is quite a substantial speedup with the normal matrix product: roughly 280 times faster at the median on this toy example.
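
One caveat for the full-size problem: with n = 15000 and 140000 columns, the dense result of trust %*% rating has 2.1e9 double entries, roughly 17 GB, which may not fit in memory. A possible workaround, sketched here under that assumption, is to multiply one block of columns at a time and reduce or save each block before moving on. The function multiply_in_chunks, its chunk_size argument and the handle_block callback are all hypothetical names, not part of any package.

```r
# Hypothetical helper: apply `handle_block` to trust %*% rating[, cols]
# for successive blocks of columns, so the full product is never held at once.
multiply_in_chunks <- function(trust, rating, chunk_size, handle_block) {
  starts <- seq(1, ncol(rating), by = chunk_size)
  lapply(starts, function(s) {
    cols <- s:min(s + chunk_size - 1, ncol(rating))
    block <- trust %*% rating[, cols, drop = FALSE]
    handle_block(block, cols)
  })
}

# Toy usage on random data: keep only the column sums of each block.
set.seed(1)
trust <- matrix(runif(25), 5, 5)
rating <- matrix(runif(5 * 8), 5, 8)
res <- multiply_in_chunks(trust, rating, chunk_size = 3,
                          handle_block = function(block, cols) colSums(block))

# The chunked result should match the one-shot product.
stopifnot(isTRUE(all.equal(unlist(res), colSums(trust %*% rating))))
```

In practice handle_block might write each block to disk (for example with saveRDS) instead of summarising it; what is appropriate depends on how the stored vectors are used later.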
