I'm having trouble understanding data.table's `rollends` argument when doing a rolling join.
The docs say:
A logical vector length 2 (a single logical is recycled) indicating whether values falling before the first value or after the last value for a group should be rolled as well.
If rollends[2]=TRUE, it will roll the last value forward. TRUE by default for LOCF and FALSE for NOCB rolls.
If rollends[1]=TRUE, it will roll the first value backward. TRUE by default for NOCB and FALSE for LOCF rolls.
Now for a confusing example. Here, I build a table of commercials and 2 different tables of sales.
    library(data.table)

    # commercials
    commercials <- data.table(commercialid=c("c1","c2","c3","c4"), commercialdate=as.Date(c("2014-1-1","2014-4-1","2014-7-1","2014-9-15")))
    commercials[, rolldate:=commercialdate] # add a column, rolldate, equal to commercialdate
    setkey(commercials, "rolldate")

    commercials
       commercialid commercialdate   rolldate
    1:           c1     2014-01-01 2014-01-01
    2:           c2     2014-04-01 2014-04-01
    3:           c3     2014-07-01 2014-07-01
    4:           c4     2014-09-15 2014-09-15

    # sales1 (a single sale before any of the commercials)
    sales1 <- data.table(saleid=c("s0"), saledate=as.Date(c("2010-12-31")))
    sales1[, rolldate:=saledate]
    setkey(sales1, "rolldate")

    sales1
       saleid   saledate   rolldate
    1:     s0 2010-12-31 2010-12-31

    # sales2 (a sale before any of the commercials and a sale after commercial c1)
    sales2 <- data.table(saleid=c("s0", "s1"), saledate=as.Date(c("2010-12-31", "2014-2-1")))
    sales2[, rolldate:=saledate]
    setkey(sales2, "rolldate")

    sales2
       saleid   saledate   rolldate
    1:     s0 2010-12-31 2010-12-31
    2:     s1 2014-02-01 2014-02-01
Now for the rolling joins:
    sales1[commercials, roll=TRUE, rollends=c(TRUE, FALSE)]
       saleid saledate   rolldate commercialid commercialdate
    1:     NA     <NA> 2014-01-01           c1     2014-01-01
    2:     NA     <NA> 2014-04-01           c2     2014-04-01
    3:     NA     <NA> 2014-07-01           c3     2014-07-01
    4:     NA     <NA> 2014-09-15           c4     2014-09-15

    sales2[commercials, roll=TRUE, rollends=c(TRUE, FALSE)]
       saleid   saledate   rolldate commercialid commercialdate
    1:     s0 2010-12-31 2014-01-01           c1     2014-01-01
    2:     NA       <NA> 2014-04-01           c2     2014-04-01
    3:     NA       <NA> 2014-07-01           c3     2014-07-01
    4:     NA       <NA> 2014-09-15           c4     2014-09-15
Questions:
- Why was sale s0 mapped to c1 in the second join but not in the first?
- Is there a better/different explanation of what `rollends` is doing?

Oh, and I'm using the development version, 1.9.7.
In the first case, `sales1[commercials, roll=TRUE, rollends=c(TRUE, FALSE)]`, the `2014-01-01` row in `commercials` falls after `2010-12-31`. So the prevailing value has to be carried forward. But it falls on the end, i.e., after the last value of `sales1`, and you've provided `rollends[2] = FALSE`. So it doesn't get rolled forward.
In the second case, `sales2[commercials, roll=TRUE, rollends=c(TRUE, FALSE)]`, the `2014-01-01` row in `commercials` falls in between `2010-12-31` and `2014-02-01`. So there's no effect of `rollends` for this row, since it doesn't fall on either end. The last value (s0) gets rolled forward.
All the other values fall outside of `sales2`, after its last value. So the `rollends` argument comes into play, and `rollends[2] = FALSE` means the prevailing values won't be rolled forward.
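
To see the effect of `rollends[2]` more directly, here is a minimal sketch that reruns the joins from the question with `rollends[2] = TRUE` (which is the default for a `roll=TRUE` join). The result names `no_end`, `with_end1` and `with_end2` are just illustrative, and I'm assuming the same data as in the question.

```r
library(data.table)

# Same data as in the question
commercials <- data.table(
  commercialid   = c("c1", "c2", "c3", "c4"),
  commercialdate = as.Date(c("2014-01-01", "2014-04-01", "2014-07-01", "2014-09-15"))
)
commercials[, rolldate := commercialdate]
setkey(commercials, rolldate)

sales1 <- data.table(saleid = "s0", saledate = as.Date("2010-12-31"))
sales1[, rolldate := saledate]
setkey(sales1, rolldate)

sales2 <- data.table(
  saleid   = c("s0", "s1"),
  saledate = as.Date(c("2010-12-31", "2014-02-01"))
)
sales2[, rolldate := saledate]
setkey(sales2, rolldate)

# rollends[2] = FALSE: nothing is carried forward past the last sale,
# so every commercial (all after 2010-12-31) gets NA
no_end <- sales1[commercials, roll = TRUE, rollends = c(TRUE, FALSE)]

# rollends[2] = TRUE: the last sale is carried forward past the end
with_end1 <- sales1[commercials, roll = TRUE, rollends = c(FALSE, TRUE)]
with_end2 <- sales2[commercials, roll = TRUE, rollends = c(FALSE, TRUE)]

no_end$saleid     # all NA
with_end1$saleid  # s0 for every commercial
with_end2$saleid  # s0 for c1, then s1 rolled forward for c2..c4
```

So with `rollends[2] = TRUE`, the commercials that fall after the last sale pick up that last sale; with `FALSE`, only commercials that fall between two sales get a match.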