R dcast error / Find irregular IDs in a dataframe

- September 15, 2013

My data frame looks like this:

id | value a | value b
1  |   a1    |   f
1  |   a2    |   n
1  |   a3    |   b
1  |   a4    |   s
2  |   a1    |   b
2  |   a2    |   g
2  |   a3    |   n
3  |   a1    |   f
3  |   a2    |   h
3  |   a3    |   j
3  |   a4    |   n

So I have 4 rows for each id. I am trying to use the dcast() function, which works only if all ids have the same number of rows. Id no. 2 is the error case in this example. Is there an easy way to find the ids that have more or fewer than 4 rows? Or is there a way to make dcast() ignore the error cases?

Originally I was trying to reshape the data frame into this:

id | a1 | a2 | a3 | a4
1  | f  | n  | b  | s
2  | b  | g  | n  | NA
3  | f  | h  | j  | n

Apparently the dcast() function from the reshape2 package doesn't work with irregular ids. It gives me the following message: 'Aggregation function missing: defaulting to length'. A smaller part of the dataset, which doesn't have irregular ids, works. Any ideas? Or maybe a thought on how to reshape the data frame without using dcast()? Thanks!
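For context, reshape2 prints 'Aggregation function missing: defaulting to length' when at least one combination of id and column key occurs more than once, so that a single output cell would receive several values. A minimal base R sketch for locating such combinations follows; the data frame `toy` and its columns `key` and `val` are made up for illustration and are not part of the original data.

```r
# Hypothetical toy data: id 2 carries the key "a1" twice
toy <- data.frame(
  id  = c(1L, 1L, 2L, 2L, 2L),
  key = c("a1", "a2", "a1", "a1", "a2"),
  val = c("f", "n", "b", "g", "n"),
  stringsAsFactors = FALSE
)

# Mark every row whose (id, key) pair occurs more than once,
# counting both the first occurrence and the repeats
keys   <- toy[c("id", "key")]
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

# Rows that would collide in a single cell of the wide table
dup_rows <- toy[is_dup, ]
print(dup_rows)
```

If `dup_rows` comes back empty for the real data, duplicated keys are probably not the cause and the problem lies elsewhere.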

I am working on a Mac with the following (package) versions:

sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] reshape2_1.2.1 plyr_1.7.1

loaded via a namespace (and not attached):
[1] stringr_0.6

The first column's values are integers, the others are character values.

sapply(x, class)
         id      fach01      f01_lp
  "integer" "character" "character"

As a reproducible example, I hope this helps (I used my original data frame). If I use the first 500 rows of my data frame, dcast() works fine; the problem occurs when I try to use the whole data frame of 140000 rows.

# Note: row.names as posted has 36 entries for 35 rows; it is kept as in the original post.
df <- structure(list(
  id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
         5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L,
         9L, 9L, 9L, 9L),
  a = c("2.lf", "1.lf", "3.pf", "4.pf", "3.pf", "1.lf", "2.lf", "3.pf",
        "4.pf", "1.lf", "2.lf", "3.pf", "1.lf", "4.pf", "2.lf", "1.lf",
        "2.lf", "4.pf", "3.pf", "1.lf", "3.pf", "2.lf", "4.pf", "3.pf",
        "4.pf", "1.lf", "2.lf", "4.pf", "2.lf", "3.pf", "1.lf", "1.lf",
        "2.lf", "3.pf", "4.pf"),
  b = c("mu/ku", "fs", "2.af", "nw", "de", "2.af", "ma", "fs", "2.af", "nw",
        "nw", "fs", "2.af", "bel", "nw", "fs", "bel", "bel", "nw", "de",
        "2.af", "2.af", "ma", "fs", "2.af", "ma", "nw", "de", "2.af",
        "ma", "nw", "mu/ku", "fs", "2.af", "nw")),
  .Names = c("id", "a", "b"),
  row.names = c("3", "5", "7", "10", "26", "29", "212", "213",
                "32", "35", "38", "39", "43", "44", "45", "48", "53", "56", "57",
                "59", "61", "65", "67", "68", "72", "75", "76", "77", "81", "86",
                "87", "88", "92", "93", "95", "98"),
  class = "data.frame")

In my original data frame the values a1 to a4 (here called 1.pf to 4.pf) are not in the right order. This is what I want from dcast() (same as above):

id | 1.pf | 2.pf | 3.pf | 4.pf
1  | f    | nw   | de   | s
2  | bel  | g    | n    | <NA>
3  | f    | nw   | bel  | n

Edit:

I didn't solve the dcast() problem, but I found a way to work around it (the reshape() function, which ships with base R in the stats package):

df <- reshape(df, idvar = 'id', varying = NULL, timevar = 'value a', direction = 'wide')
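To make the workaround concrete, here is a minimal sketch of the same reshape() call on the small example from the top of the question. The column names `value.a` and `value.b` and the final renaming step are assumptions added for illustration; the real data may need different names.

```r
# Small example from the question; id 2 has no "a4" row
long <- data.frame(
  id      = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  value.a = c("a1", "a2", "a3", "a4", "a1", "a2", "a3", "a1", "a2", "a3", "a4"),
  value.b = c("f", "n", "b", "s", "b", "g", "n", "f", "h", "j", "n"),
  stringsAsFactors = FALSE
)

# Long to wide: one row per id, one column per distinct value.a
wide <- reshape(long, idvar = "id", timevar = "value.a", direction = "wide")

# reshape() names the new columns "value.b.a1", ...; strip that prefix
names(wide) <- sub("^value\\.b\\.", "", names(wide))
print(wide)
```

The missing combination (id 2, a4) is filled with NA rather than raising an error, which is the behaviour asked for in the question.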

table() and which() answer the first question:

names(table(dfrm$id))[which(table(dfrm$id) < 4)]
#[1] "2"
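The one-liner above only catches ids with fewer than 4 rows. Since the question asks about ids with more or fewer than 4 rows, a small variation compares against the expected count with `!=`. The vector `ids` below is made-up sample data, with id 4 deliberately given five rows.

```r
# Hypothetical id column: id 2 has 3 rows, id 4 has 5 rows
ids <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4)

# Count the rows per id once and reuse the result
counts <- table(ids)

# Ids whose row count differs from the expected 4 in either direction
irregular <- names(counts)[counts != 4]
print(irregular)
```

Note that `irregular` holds the ids as character strings, because they are taken from the names of the table.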

As to the second question, maybe you should post the code that generates the error. At the moment it's not clear what you are trying (and failing) to do.

Edit:

If you convert the factor variables to character variables, you can get dcast() to return the right object, although the error I got was different from yours. I got the error with both reshape2 1.1 and reshape2 1.2.1 on R 2.14.1 on a Mac.

Edit2: It turned out to be a bug that was fixed in the newest version of plyr. There is no error with reshape2 1.2.1 running plyr 1.7. You should update the two packages and restart in a fresh session.

require(reshape2)
dfrm <- structure(list(
  id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  value.a = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L),
    .Label = c("   a1    ", "   a2    ", "   a3    ", "   a4    "), class = "factor"),
  value.b = structure(c(2L, 6L, 1L, 7L, 1L, 3L, 6L, 2L, 4L, 5L, 6L),
    .Label = c("   b", "   f", "   g", "   h", "   j", "   n", "   s"), class = "factor")),
  .Names = c("id", "value.a", "value.b"), class = "data.frame", row.names = c(NA, -11L))

dcast(dfrm, id ~ value.a)
# Using value.b as value column: use value_var to override.
# Error in names(data) <- array_names(res$labels[[2]]) :
#   'names' attribute [4] must be the same length as the vector [1]

# First tried removing the leading and trailing spaces with:
dfrm2 <- data.frame(lapply(dfrm, gsub, pattern = "^\\s+|\\s+$", replacement = ""))
# Still got the error. Try leaving the columns as "character" type.
dfrm2 <- data.frame(lapply(dfrm, gsub, pattern = "^\\s+|\\s+$", replacement = ""),
                    stringsAsFactors = FALSE)
str(dfrm2)
#-----------------
'data.frame':   11 obs. of  3 variables:
 $ id     : chr  "1" "1" "1" "1" ...
 $ value.a: chr  "a1" "a2" "a3" "a4" ...
 $ value.b: chr  "f" "n" "b" "s" ...

dcast(dfrm2, id ~ value.a)
#------------------
Using value.b as value column: use value_var to override.
  id a1 a2 a3   a4
1  1  f  n  b    s
2  2  b  g  n <NA>
3  3  f  h  j    n
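The factor-to-character step can also be done without running gsub() over every column, which turns the numeric id into a character column as a side effect. The sketch below is an alternative, not the answerer's code: it assumes R 3.2.0 or later (for trimws()), and `messy` is a made-up three-row sample.

```r
# Hypothetical sample with padded factor levels, as in the example above
messy <- data.frame(
  id      = c(1, 1, 2),
  value.a = factor(c("   a1    ", "   a2    ", "   a1    ")),
  value.b = factor(c("   f", "   n", "   b"))
)

clean <- messy

# Find the factor columns and convert only those to trimmed character vectors
is_fac <- vapply(clean, is.factor, logical(1))
clean[is_fac] <- lapply(clean[is_fac], function(col) trimws(as.character(col)))

str(clean)
```

Here the id column keeps its numeric type, so only the two value columns change.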


Kamlesh
