I have a data.table with many events for different customers ("clients") and want to split the events at each gap ("missing event") of the same customer.
E.g., suppose I have monthly event data: a missing event for one or more months is a "gap", while events in several successive months belong to the same group:
    library(data.table)
    library(lubridate)  # ymd()

    dt <- data.table(client.no = c(rep("client_a", 3), rep("client_b", 5), rep("client_c", 2)),
                     event.date = ymd(20160101, 20160201, 20160301,
                                      20151201, 20160101, 20160301, 20160501, 20160601,
                                      20140701, 20150101))
with dt being:

        client.no event.date
     1:  client_a 2016-01-01
     2:  client_a 2016-02-01
     3:  client_a 2016-03-01
     4:  client_b 2015-12-01
     5:  client_b 2016-01-01
     6:  client_b 2016-03-01
     7:  client_b 2016-05-01
     8:  client_b 2016-06-01
     9:  client_c 2014-07-01
    10:  client_c 2015-01-01
The result shall have a group number that is the same for each row of the same group, e.g.:

        client.no event.date group.no
     1:  client_a 2016-01-01        1
     2:  client_a 2016-02-01        1
     3:  client_a 2016-03-01        1
     4:  client_b 2015-12-01        1
     5:  client_b 2016-01-01        1
     6:  client_b 2016-03-01        2
     7:  client_b 2016-05-01        3
     8:  client_b 2016-06-01        3
     9:  client_c 2014-07-01        1
    10:  client_c 2015-01-01        2
It is not required that the group number resets to 1 for each client (but it would be nice).
You can assume that the events are ordered within each client and that there are no duplicated event dates within the same client.
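You can use cumsum over a flag that marks every gap of more than 31 days between consecutive events, computed per client: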
    dt[, z := cumsum(c(1, diff(event.date) > 31)), by = client.no]
Output:

        client.no event.date z
     1:  client_a 2016-01-01 1
     2:  client_a 2016-02-01 1
     3:  client_a 2016-03-01 1
     4:  client_b 2015-12-01 1
     5:  client_b 2016-01-01 1
     6:  client_b 2016-03-01 2
     7:  client_b 2016-05-01 3
     8:  client_b 2016-06-01 3
     9:  client_c 2014-07-01 1
    10:  client_c 2015-01-01 2
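
The 31-day threshold works here because every event falls on the first of a month. If the events could fall on other days of the month, one possible variant is to compare calendar months instead of day counts. The following is only a sketch under that assumption; `month_index` is a hypothetical helper introduced for illustration, and the result column is named `group.no` as in the question.

```r
library(data.table)

# Same sample data as in the question, built with base as.Date()
dt <- data.table(
  client.no  = c(rep("client_a", 3), rep("client_b", 5), rep("client_c", 2)),
  event.date = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01",
                         "2015-12-01", "2016-01-01", "2016-03-01",
                         "2016-05-01", "2016-06-01",
                         "2014-07-01", "2015-01-01"))
)

# Hypothetical helper: consecutive calendar months get consecutive integers
# (POSIXlt stores years since 1900 and months as 0-11)
month_index <- function(d) {
  lt <- as.POSIXlt(d)
  lt$year * 12L + lt$mon
}

# A difference of more than 1 in the month index means at least one month
# is missing, so a new group starts at that row; cumsum() turns the flags
# into running group numbers that restart at 1 for each client.
dt[, group.no := cumsum(c(1L, diff(month_index(event.date)) > 1L)),
   by = client.no]

print(dt)
```

On the sample data this yields the same group numbers as the `diff(event.date) > 31` version. Like the original, it relies on the events being ordered within each client.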