
R subassembly when treated as strings

 3 years ago
source link: https://www.codesd.com/item/r-subassembly-when-treated-as-strings.html


I have a large data frame in which I collapsed my vectors into strings (using lapply and toString) so that they would fit into the data frame. Now I can't check whether one column is a subset of the other. Is there a simple way to do this?

X <- data.frame(y=c("ABC","A"), z=c("ABC","A,B,C"))

 X
      y     z
1   ABC   ABC
2    A   A,B,C

all(X$y %in% X$z)
[1] FALSE

(X$y[1] %in% X$z[1])
[1] TRUE

(X$y[2] %in% X$z[2])
[1] FALSE

I need to treat each y and z string as a comma-separated vector again and then check, row by row, whether y is a subset of z.

In the example above, A is a subset of A,B,C. However, because both are stored as single strings, %in% compares the whole strings and the check fails.

In the example, y holds one value per row, while z holds one value in the first row and three in the second. The real data frame I'll be testing has 10,000 rows, with 1-5 values per row in y and 1-100 values per row in z. It looks like the values in y are always a subset of those in z, but I'd like to verify that.


df <- data.frame(y = c("ABC", "A"), z = c("ABC", "A,B,C"))

apply(df, 1, function(x) {              # operate row by row
  # split on "," plus optional spaces, since toString() joins with ", "
  y <- unlist(strsplit(x[1], ",\\s*"))  # elements of the y string
  z <- unlist(strsplit(x[2], ",\\s*"))  # elements of the z string
  all(y %in% z)                         # TRUE only if every element of y is in z
})

# [1] TRUE TRUE
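For a data frame of around 10,000 rows, it may be convenient to split each column once and then compare the pieces pairwise instead of going through apply(). The following is only a sketch under that assumption: the helper name is_subset and its sep argument are hypothetical and not part of the original answer. The default separator also accepts the ", " that toString() produces.

```r
# Hypothetical helper (not from the original answer): for each row, report
# whether every comma-separated item in y also occurs in z.
is_subset <- function(y, z, sep = ",\\s*") {
  # Split each string into its items; as.character() guards against factors
  y_parts <- strsplit(as.character(y), sep)
  z_parts <- strsplit(as.character(z), sep)
  # Pairwise comparison; vapply() guarantees a logical vector as the result
  vapply(seq_along(y_parts),
         function(i) all(y_parts[[i]] %in% z_parts[[i]]),
         logical(1))
}

df <- data.frame(y = c("ABC", "A"), z = c("ABC", "A,B,C"))
result <- is_subset(df$y, df$z)
print(result)
```

Wrapping the result in all(), as in all(is_subset(df$y, df$z)), would answer the question of whether y is a subset of z in every row.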

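# Case 2: with changed data, D in row 2 of y is not present in z, so that row is FALSE.
# You will need to decide whether every element of y must appear in z or a partial match is enough; the code can be adjusted accordingly.
df <- data.frame(y = c("ABC", "A,B,D"), z = c("ABC", "A,B,C"))
# [1]  TRUE FALSE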
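When a row comes back FALSE, it can help to see which items of y are absent from z. A minimal sketch using setdiff() follows; the function name missing_items and the variable absent are hypothetical names chosen for illustration, and the separator again assumes commas with optional spaces.

```r
# Hypothetical helper: for each row, list the items of y that are not in z.
missing_items <- function(y, z, sep = ",\\s*") {
  y_parts <- strsplit(as.character(y), sep)
  z_parts <- strsplit(as.character(z), sep)
  # setdiff() keeps the items of its first argument that are not in its second
  lapply(seq_along(y_parts),
         function(i) setdiff(y_parts[[i]], z_parts[[i]]))
}

df <- data.frame(y = c("ABC", "A,B,D"), z = c("ABC", "A,B,C"))
absent <- missing_items(df$y, df$z)
print(absent)
```

A row whose entry is empty is a full subset. The number of missing items per row is available through lengths(absent), which could serve as a basis for a partial-match rule.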

