time series - R: grouping/splitting a dataset by categories in combination with apply.weekly()

- May 15, 2011

intro

I am not an R expert yet, so please excuse a question that I should perhaps be embarrassed to ask. In another question I asked on Stack Overflow, I got helpful comments on how to aggregate irregular daily data in an xts object to weekly values with the apply.weekly() function. Unfortunately, I didn't find a function like tapply(), ddply(), by() or aggregate() that allows splitting by categories and works with the apply.weekly() function.

my data

This is an illustration dataset. I already posted it in the other question; for illustration purposes I am taking the liberty of posting it here again:

library(xts)
example <- as.data.frame(structure(c(" 1", " 2", " 1", " 2", " 1", " 1", " 2", " 1", " 2",
" 1", " 2", " 3", " 1", " 1", " 2", " 2", " 3", " 1", " 2", " 2",
" 1", " 2", " 1", " 1", " 2", NA, " 2", NA, NA, " 1", " 3", " 1",
" 3", " 3", " 2", " 3", " 3", " 3", " 2", " 2", " 2", " 3", " 3",
" 3", " 2", " 2", " 3", " 3", " 3", " 3", " 1", " 2", " 1", " 2",
" 2", " 1", " 2", " 1", " 2", " 2", " 2", " 3", " 1", " 1", " 2",
" 2", " 3", " 3", " 2", " 2", " 1", " 2", " 1", " 1", " 2", NA,
" 2", NA, NA, " 1", " 3", " 2", " 3", " 2", " 0", " 3", " 3",
" 3", " 2", " 0", " 2", " 3", " 3", " 3", " 0", " 2", " 2", " 3",
" 3", " 0", "12", " 5", " 9", "14", " 5", "tra", "tra", "man",
"inf", "agc", "07-2011", "07-2011", "07-2011", "07-2011", "07-2011"
), .indexCLASS = c("POSIXlt", "POSIXt"), .indexTZ = "", class = c("xts",
"zoo"), .indexFORMAT = "%U-%Y", index = structure(c(1297642226,
1297672737, 1297741204, 1297748893, 1297749513), tzone = "", tclass = c("POSIXlt",
"POSIXt")), .Dim = c(5L, 23L), .Dimnames = list(NULL, c("rev_sit",
"prof_sit", "emp_nr_sit", "inv_sit", "ord_home_sit", "ord_abr_sit",
"emp_cost_sit", "usage_cost_sit", "tax_cost_sit", "gov_cost_sit",
"rev_exp", "prof_exp", "emp_nr_exp", "inv_exp", "ord_home_exp",
"ord_abr_exp", "emp_cost_exp", "usage_cost_exp", "tax_cost_exp",
"gov_cost_exp", "land", "nace", "index"))))

the columns

"rev_sit", "prof_sit", "emp_nr_sit", "inv_sit", "ord_home_sit", "ord_abr_sit", "emp_cost_sit", "usage_cost_sit", "tax_cost_sit", "gov_cost_sit","rev_exp", "prof_exp", "emp_nr_exp", "inv_exp", "ord_home_exp","ord_abr_exp", "emp_cost_exp", "usage_cost_exp","tax_cost_exp","gov_cost_exp",

refer to questions in a survey. There are three answer options, coded "1", "2", and "3".

the columns

"land", "nace"

are categories with 16 and 8 unique factor levels, respectively.

my goal

My goal is to count the occurrences of "1", "2", and "3" for each week and each combination of the category factors in "nace" and "land". I thought I would create binary vectors for each answer option {1,2,3} beforehand (example_1, example_2, example_3) and then apply something like:

apply.weekly(example_1, function(d){ddply(d,list(example$nace,example$land),sum)})

But this doesn't work, neither with ddply nor with aggregate, by, etc.

my workaround

My unprofessional workaround was not to create a time series at all, but a date vector example$date, derived from the given time column and coded weekly via %V, and then to use, e.g.:

tapply(example_1[,5], list(example$date,example$nace,example$land),sum)

which of course I would have to do for every one of the 20 questions displayed above. For example_1, the result looks like this:

week1, nace1.land1, nace1.land2, nace1.land3, ..., nace1.land16, nace2.land1, ..., nace8.land16
week2, nace1.land1, nace1.land2, nace1.land3, ..., nace1.land16, nace2.land1, ..., nace8.land16
...
weekn, nace1.land1, nace1.land2, nace1.land3, ..., nace1.land16, nace2.land1, ..., nace8.land16

I would have to do the same for answer 2 (example_2) and answer 3 (example_3), and for each of the 20 questions, which results in 16*8*3*20 = 7680 columns. That is extreme, and in addition the product of this method is not a time series and is not ordered correctly by week.

summary

So can anyone teach me, or give me a hint, how to use the function apply.weekly() in combination with functions like tapply(), ddply(), by(), split(), unstack() etc., or any other method to accomplish the grouping described above? Every hint is appreciated. I am so frustrated that I am thinking of abandoning my R experiment and changing back to Stata, where many things are much more intuitive with collapse(), by() etc. But don't get me wrong: I am keen to learn, so please help me!

I would add a "week" column, as you suggest, and convert the data to tall format before processing. You can convert it to a time series afterwards, if needed.

library(reshape2)
d <- melt(example, id.vars=c("land", "nace", "index"))
# apparently, you want one of the following
dcast( d, land + nace + index ~ value, length )
dcast( d, land + nace + index + variable ~ value, length )
dcast( d, land + nace + index ~ variable + value, length )

Equivalently, you can use ddply:

library(plyr)
library(reshape2)  # for melt()
d <- melt(example, id.vars=c("land", "nace", "index"))
ddply( d,
  c("land", "nace", "index", "value"),
  summarize,
  number=length(value)  # the argument "value" does not play any role
)
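For readers without reshape2 or plyr installed, the same tall-format counting idea can be sketched in base R. This is only an illustration under stated assumptions: the data frame `toy`, its values, and the helper names `id_cols`, `q_cols`, `tall`, `counts` and `wide` are made up for this sketch, and `aggregate()` and `table()` are swapped in for `ddply()` and `dcast()`.

```r
# Hypothetical toy data: two survey questions, answers coded "1" to "3"
toy <- data.frame(
  land     = c("12", "5", "9", "14", "5"),
  nace     = c("tra", "tra", "man", "inf", "agc"),
  index    = rep("07-2011", 5),
  rev_sit  = c("1", "2", "1", "2", "1"),
  prof_sit = c("1", "2", "3", "2", "1"),
  stringsAsFactors = FALSE
)

id_cols <- c("land", "nace", "index")
q_cols  <- setdiff(names(toy), id_cols)

# Tall format: one row per (respondent, question) pair
tall <- do.call(rbind, lapply(q_cols, function(q) {
  data.frame(toy[id_cols], variable = q, value = toy[[q]],
             stringsAsFactors = FALSE)
}))

# Long result: number of answers per land / nace / week / answer code
tall$n <- 1L
counts <- aggregate(n ~ land + nace + index + value, data = tall, FUN = sum)

# Wide result: one row per land.nace.week group, one column per answer code
wide <- table(group  = paste(tall$land, tall$nace, tall$index, sep = "."),
              answer = tall$value)
```

Here `counts` only lists combinations that actually occur, while `wide` also shows zero counts for answer codes a group never gave.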

Your index column contains the number of the week within the current year (format %U-%Y): this will only work if all your dates are within the same calendar year. It may be safer to use an actual date instead of the week number, for instance the Sunday at the start of the current week. This also makes it easier to turn the result into a time series.

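# "%u" is the weekday number (1 = Monday, ..., 7 = Sunday), so each
# Monday-to-Sunday week is labelled with the Sunday just before it
week_start <- function(u) as.Date(u) - as.numeric(format(u, "%u"))
example$index <- week_start( as.POSIXct(rownames(example)) )
# the following may also work (ISO 8601 year and week number)
example$index <- format( as.POSIXct(rownames(example)), "%G-%V" )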
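To see what this week labelling does, here is a small base-R check. It is a sketch under stated assumptions: the timestamps in `stamps` are made up, and the time zone is fixed to UTC so that `as.Date()` and `format()` agree on the calendar day.

```r
# Same helper as in the answer: subtract the weekday number (1 = Monday,
# ..., 7 = Sunday) from the date
week_start <- function(u) as.Date(u) - as.numeric(format(u, "%u"))

# Hypothetical timestamps: Monday, Tuesday, Sunday, and the following Monday
stamps <- as.POSIXct(c("2011-02-14 00:10:26", "2011-02-15 03:40:04",
                       "2011-02-20 18:00:00", "2011-02-21 12:00:00"),
                     tz = "UTC")

weeks <- week_start(stamps)

# Number of observations per week, keyed by a real date
per_week <- table(as.character(weeks))
```

The first three timestamps fall into the week labelled 2011-02-13 and the last one into the week labelled 2011-02-20, so a Sunday is counted as the last day of its week. Because the labels are real dates, they stay unique and sort correctly across calendar years.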

r time-series xts categorization
