R dcast error / Find irregular IDs in a dataframe




My dataframe looks like this:

id | value a | value b
1 | a1 | f
1 | a2 | n
1 | a3 | b
1 | a4 | s
2 | a1 | b
2 | a2 | g
2 | a3 | n
3 | a1 | f
3 | a2 | h
3 | a3 | j
3 | a4 | n

So I have 4 rows for each id. I am trying to use the dcast() function, which works if all ids have the same number of rows. Id no. 2 is the error case in this example. Is there an easy way to find the ids that have more or fewer than 4 rows? Or is there a way to make dcast ignore the error cases?

Originally I was trying to reshape the dataframe like this:

id | a1 | a2 | a3 | a4
1 | f | n | b | s
2 | b | g | n | NA
3 | f | h | j | n

Apparently the dcast() function from the reshape2 package doesn't work with irregular ids. It gives me the following message: 'Aggregation function missing: defaulting to length'. With a smaller part of the dataset, which doesn't have irregular ids, it works. Any ideas? Or any thoughts on how to reshape the dataframe without using dcast? Thanks!

I am working on a Mac with the following (package) versions:

sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] reshape2_1.2.1 plyr_1.7.1

loaded via a namespace (and not attached):
[1] stringr_0.6

The values in the first column are integers; the others are character values.

sapply(x, class)
         id      fach01      f01_lp
  "integer" "character" "character"

As a reproducible example, I hope this helps (I used the original dataframe): if I use the first 500 rows of the dataframe, dcast() works fine; the problem occurs when I try to use the whole dataframe of 140000 rows.

df <- structure(list(
  id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
         5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L,
         9L, 9L, 9L, 9L),
  a = c("2.lf", "1.lf", "3.pf", "4.pf", "3.pf", "1.lf", "2.lf", "3.pf",
        "4.pf", "1.lf", "2.lf", "3.pf", "1.lf", "4.pf", "2.lf", "1.lf",
        "2.lf", "4.pf", "3.pf", "1.lf", "3.pf", "2.lf", "4.pf", "3.pf",
        "4.pf", "1.lf", "2.lf", "4.pf", "2.lf", "3.pf", "1.lf", "1.lf",
        "2.lf", "3.pf", "4.pf"),
  b = c("mu/ku", "fs", "2.af", "nw", "de", "2.af", "ma", "fs", "2.af",
        "nw", "nw", "fs", "2.af", "bel", "nw", "fs", "bel", "bel", "nw",
        "de", "2.af", "2.af", "ma", "fs", "2.af", "ma", "nw", "de",
        "2.af", "ma", "nw", "mu/ku", "fs", "2.af", "nw")),
  .Names = c("id", "a", "b"),
  row.names = c("3", "5", "7", "10", "26", "29", "212", "213", "32", "35",
                "38", "39", "43", "44", "45", "48", "53", "56", "57", "59",
                "61", "65", "67", "68", "72", "75", "76", "77", "81", "86",
                "87", "88", "92", "93", "95", "98"),
  class = "data.frame")

In the original dataframe the values a1 - a4 (here called 1.pf - 4.pf) are not in the right order, and I want to dcast them (same as above):

id | 1.pf | 2.pf | 3.pf | 4.pf
1 | f | nw | de | s
2 | bel | g | n | <NA>
3 | f | nw | bel | n

edit:

I didn't solve the dcast() problem, but I found a way to work around it, using the reshape() function (which, with these arguments, is the one that ships with base R in the stats package):

df <- reshape(df, idvar = 'id', varying = NULL, timevar = 'value a', direction = 'wide')
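A minimal sketch of that workaround on the small example from the top of the question. The data frame name `toy` and the syntactic column names `value.a` / `value.b` are assumptions made for illustration, not the names in the real data. With `direction = "wide"`, reshape() fills missing id/time combinations with NA instead of failing, which is why the id with only three rows is handled.

```r
# Toy data mirroring the example table; 'toy', 'value.a' and 'value.b'
# are hypothetical names chosen for this sketch.
toy <- data.frame(
  id      = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  value.a = c("a1", "a2", "a3", "a4", "a1", "a2", "a3", "a1", "a2", "a3", "a4"),
  value.b = c("f", "n", "b", "s", "b", "g", "n", "f", "h", "j", "n"),
  stringsAsFactors = FALSE
)

# One row per id, one column per level of value.a.
wide <- reshape(toy, idvar = "id", timevar = "value.a", direction = "wide")

# reshape() names the new columns value.b.a1, value.b.a2, ...;
# strip the prefix so they read a1 ... a4.
names(wide) <- sub("^value\\.b\\.", "", names(wide))
wide
```

Id 2 has no a4 row, so its a4 cell comes out as NA.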

table() answers your first question:

names(table(dfrm$id))[which(table(dfrm$id) < 4)]
#[1] "2"
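The line above only catches ids with fewer than 4 rows. A small variation, sketched here on hypothetical toy data (the names `toy`, `counts`, `irregular` and `regular_rows` are illustrative), flags ids with more or fewer than 4 rows and then drops them before reshaping:

```r
# Hypothetical data: id 2 has 3 rows, id 3 has 5 rows, ids 1 and 4 have 4.
toy <- data.frame(id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4))

counts <- table(toy$id)

# Ids whose row count differs from 4 (names() returns character,
# so convert back to numeric to compare with toy$id).
irregular <- as.numeric(names(counts)[counts != 4])
irregular

# Keep only the rows that belong to regular ids.
regular_rows <- toy[!toy$id %in% irregular, , drop = FALSE]
```

Whether to drop the irregular ids or keep them and accept NA cells depends on what the downstream analysis needs.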

As for the second question, maybe you should post the code that generates the error. At the moment it's not clear what you are trying (and failing) to do.
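One thing that may be worth checking in the meantime: as far as I know, reshape2 prints 'Aggregation function missing: defaulting to length' when at least one id/variable combination occurs in more than one row, not when an id has too few rows. A minimal base-R sketch for locating such combinations, on hypothetical toy data (the names `toy`, `key` and `dup` are illustrative only):

```r
# Hypothetical data: id 1 has two rows for a == "a2".
toy <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  a  = c("a1", "a2", "a2", "a1", "a2", "a3"),
  b  = c("f", "n", "x", "b", "g", "n"),
  stringsAsFactors = FALSE
)

# The columns that would sit on either side of the dcast formula (id ~ a).
key <- toy[c("id", "a")]

# TRUE for every row whose id/a pair occurs more than once,
# including the first occurrence.
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

toy[dup, ]
```

If this returns no rows on the full data, duplicated combinations are not the cause and something else is going on.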

edit:

If you convert the factor variables to character variables, you can get dcast to return the right object, although the error I got is different from yours. I got that error with both reshape2 1.1 and reshape2 1.2.1 on R 2.14.1 on a Mac.

edit2: It turned out to be a bug that was fixed in the newest version of plyr. There is no error with reshape2 1.2.1 running plyr 1.7. You should update the two packages and restart in a fresh session.

require(reshape2)

dfrm <- structure(list(
  id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  value.a = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L),
    .Label = c(" a1 ", " a2 ", " a3 ", " a4 "), class = "factor"),
  value.b = structure(c(2L, 6L, 1L, 7L, 1L, 3L, 6L, 2L, 4L, 5L, 6L),
    .Label = c(" b", " f", " g", " h", " j", " n", " s"), class = "factor")),
  .Names = c("id", "value.a", "value.b"), class = "data.frame",
  row.names = c(NA, -11L))

dcast(dfrm, id ~ value.a)
# Using value.b as value column: use value_var to override.
# Error in names(data) <- array_names(res$labels[[2]]) :
#   'names' attribute [4] must be the same length as the vector [1]

# First tried removing the leading and trailing spaces with:
dfrm2 <- data.frame(lapply(dfrm, gsub, patt = "^\\s+|\\s+$", rep = ""))
# Still got the error. Try leaving the columns as "character" type.
dfrm2 <- data.frame(lapply(dfrm, gsub, patt = "^\\s+|\\s+$", rep = ""),
                    stringsAsFactors = FALSE)
str(dfrm2)
#-----------------
# 'data.frame': 11 obs. of 3 variables:
#  $ id     : chr "1" "1" "1" "1" ...
#  $ value.a: chr "a1" "a2" "a3" "a4" ...
#  $ value.b: chr "f" "n" "b" "s" ...

dcast(dfrm2, id ~ value.a)
#------------------
# Using value.b as value column: use value_var to override.
#   id a1 a2 a3   a4
# 1  1  f  n  b    s
# 2  2  b  g  n <NA>
# 3  3  f  h  j    n
