time series - R: grouping/splitting a dataset by categories in combination with apply.weekly() -
intro
I am not an R expert yet, so please excuse this question if I should be embarrassed to ask it. In another question I asked on Stack Overflow, I got helpful comments on how to aggregate irregular daily data of an xts object to weekly values with the apply.weekly() function. Unfortunately, I did not find a function like tapply(), ddply(), by() or aggregate() that allows splitting by categories and also works with the apply.weekly() function.
my data
This is an illustration dataset. I already posted it in the other question; for illustration purposes I am taking the liberty of posting it here again:
example <- as.data.frame(structure(
  c(" 1", " 2", " 1", " 2", " 1", " 1", " 2", " 1", " 2", " 1", " 2", " 3", " 1", " 1", " 2", " 2", " 3", " 1", " 2", " 2", " 1", " 2", " 1", " 1", " 2", NA, " 2", NA, NA, " 1", " 3", " 1", " 3", " 3", " 2", " 3", " 3", " 3", " 2", " 2", " 2", " 3", " 3", " 3", " 2", " 2", " 3", " 3", " 3", " 3", " 1", " 2", " 1", " 2", " 2", " 1", " 2", " 1", " 2", " 2", " 2", " 3", " 1", " 1", " 2", " 2", " 3", " 3", " 2", " 2", " 1", " 2", " 1", " 1", " 2", NA, " 2", NA, NA, " 1", " 3", " 2", " 3", " 2", " 0", " 3", " 3", " 3", " 2", " 0", " 2", " 3", " 3", " 3", " 0", " 2", " 2", " 3", " 3", " 0", "12", " 5", " 9", "14", " 5", "tra", "tra", "man", "inf", "agc", "07-2011", "07-2011", "07-2011", "07-2011", "07-2011"),
  .indexCLASS = c("POSIXlt", "POSIXt"), .indexTZ = "", class = c("xts", "zoo"), .indexFORMAT = "%U-%Y",
  index = structure(c(1297642226, 1297672737, 1297741204, 1297748893, 1297749513), tzone = "", tclass = c("POSIXlt", "POSIXt")),
  .Dim = c(5L, 23L),
  .Dimnames = list(NULL, c("rev_sit", "prof_sit", "emp_nr_sit", "inv_sit", "ord_home_sit", "ord_abr_sit", "emp_cost_sit", "usage_cost_sit", "tax_cost_sit", "gov_cost_sit", "rev_exp", "prof_exp", "emp_nr_exp", "inv_exp", "ord_home_exp", "ord_abr_exp", "emp_cost_exp", "usage_cost_exp", "tax_cost_exp", "gov_cost_exp", "land", "nace", "index"))))
The columns
"rev_sit", "prof_sit", "emp_nr_sit", "inv_sit", "ord_home_sit", "ord_abr_sit", "emp_cost_sit", "usage_cost_sit", "tax_cost_sit", "gov_cost_sit", "rev_exp", "prof_exp", "emp_nr_exp", "inv_exp", "ord_home_exp", "ord_abr_exp", "emp_cost_exp", "usage_cost_exp", "tax_cost_exp", "gov_cost_exp"
refer to questions in a survey. There are three possible answers, coded "1", "2", and "3".
The columns
"land", "nace"
are categories with 16 and 8 unique factor levels, respectively.
My goal is to count the occurrences of "1", "2", and "3" for each week and for each combination of the category factors in "nace" and "land". I thought I would create binary vectors for each possible answer {1,2,3} beforehand (example_1, example_2, example_3) and then apply something like:
apply.weekly(example_1, function(d){ddply(d,list(example$nace,example$land),sum)})
But this doesn't work, neither with ddply nor with aggregate, by, etc.
my goal
My unprofessional workaround is not to create a time series but a date vector example$date, given that the time column is coded weekly via %V, and then to use, e.g.:
tapply(example_1[,5], list(example$date,example$nace,example$land),sum)
which I would of course have to do for every one of the 20 questions displayed above. The output for, e.g., example_1 looks like this:
week1, nace1.land1, nace1.land2, nace1.land3, ..., nace1.land16, nace2.land1,..,nace8.land16
week2, nace1.land1, nace1.land2, nace1.land3, ..., nace1.land16, nace2.land1,..,nace8.land16
... ...
weekn, nace1.land1, nace1.land2, nace1.land3, ..., nace1.land16, nace2.land1,..,nace8.land16
I would have to do the same for answer 2 (example_2) and answer 3 (example_3), and for each of the 20 questions, which would produce 16*8*3*20 = 7680 columns. That is extreme, and in addition the result of this method is not a time series and is not ordered correctly by week.
summary
So, can anyone teach me or give me a hint on how to use the function apply.weekly() in combination with functions like tapply(), ddply(), by(), split(), unstack(), etc., or any other method to accomplish the grouping described above? Every hint is appreciated. I am so frustrated that I am thinking about abandoning my R experiment and changing back to Stata, where many things are much more intuitive with collapse(), by(), etc. But don't get me wrong: I am keen to learn, so please help me!
I would add a "week" column, as you suggest, and convert the data to tall format before processing. You can convert the result to a time series afterwards, if needed.
library(reshape2)
d <- melt(example, id.vars=c("land", "nace", "index"))
# Apparently, you want one of the following
dcast( d, land + nace + index ~ value, length )
dcast( d, land + nace + index + variable ~ value, length )
dcast( d, land + nace + index ~ variable + value, length )
Equivalently, you can use ddply:
library(reshape2)  # for melt()
library(plyr)
d <- melt(example, id.vars=c("land", "nace", "index"))
ddply(
  d, c("land", "nace", "index", "value"),
  summarize,
  number=length(value)  # The argument "value" does not play any role: only the group size is counted
)
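For readers without reshape2 or plyr, the same tall-format counting idea can be sketched in base R. The data frame `toy`, its `week` column and the question columns `q1` and `q2` are made up for illustration and are not the dataset from the question; this is a minimal sketch, not a replacement for the calls above.

```r
# Toy survey data (hypothetical; not the question's dataset)
toy <- data.frame(
  week = c("2011-07", "2011-07", "2011-07", "2011-08"),
  land = c("12", "12", "5", "12"),
  nace = c("tra", "tra", "man", "tra"),
  q1   = c(1, 2, 1, 1),
  q2   = c(3, 3, NA, 2),
  stringsAsFactors = FALSE
)

# Tall format: one row per (respondent, question) pair
questions <- c("q1", "q2")
tall <- do.call(rbind, lapply(questions, function(q) {
  data.frame(toy[c("week", "land", "nace")],
             variable = q, value = toy[[q]],
             stringsAsFactors = FALSE)
}))

# Count each answer code per week, category combination and question.
# table() drops rows with NA answers by default.
counts <- as.data.frame(
  table(tall[c("week", "land", "nace", "variable", "value")]),
  responseName = "number", stringsAsFactors = FALSE
)

# Keep only the combinations that actually occur
counts <- counts[counts$number > 0, ]
print(counts)
```

Each remaining row of `counts` is one combination of week, land, nace, question and answer code, with `number` holding how often it occurs.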
Your index column contains the number of the week in the current year together with the year (%U-%Y): that only works well if all dates are within the same calendar year. It may be safer to use an actual date instead of the week number, for instance, the Sunday at the start of the current week. This also makes it easier to turn the result into a time series.
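# "%u" is the weekday number (1 = Monday, ..., 7 = Sunday),
# so subtracting it moves each date back to a Sunday
week_start <- function(u) as.Date(u) - as.numeric(format(u, "%u"))
example$index <- week_start( as.POSIXct(rownames(example)) )
# The following may also work.
example$index <- format( as.POSIXct(rownames(example)), "%G-%V" )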
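To see what a week-start helper of this kind returns, here is a small base-R sketch on made-up dates (the dates are illustrative assumptions, not taken from the question's data). Because `%u` runs from 1 (Monday) to 7 (Sunday), each Monday-to-Sunday week is labelled by the Sunday that precedes it, and a week that straddles New Year keeps a single label.

```r
# Same helper as in the answer: move each date back to a Sunday
week_start <- function(u) as.Date(u) - as.numeric(format(u, "%u"))

# Hypothetical dates: a Monday and a Sunday of one week,
# then two days on either side of a year boundary
days <- as.Date(c("2011-02-14", "2011-02-20", "2012-12-31", "2013-01-01"))
starts <- week_start(days)

# One label per week, even across the change of year
print(data.frame(day = days, week_start = starts))

# Because the labels are real dates, counts per week sort chronologically
print(table(format(starts)))
```

Since the labels are `Date` values, they can be used directly as the index when the aggregated result is turned back into a time series.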
r time-series xts categorization