R - Group data.table dates into groups by consecutive time intervals (split by gaps)


I have a data.table with many events for different customers ("clients"), and I want to split the events at each gap ("missing event") of the same customer.

E.g. suppose I have monthly event data. A missing event for one or more months is a "gap", while events in several successive months belong to the same group:

    library(data.table)
    library(lubridate)   # ymd()

    dt <- data.table(client.no  = c(rep("client_a", 3), rep("client_b", 5), rep("client_c", 2)),
                     event.date = ymd(20160101, 20160201, 20160301,
                                      20151201, 20160101, 20160301, 20160501, 20160601,
                                      20140701, 20150101))

This gives dt:

        client.no event.date
     1:  client_a 2016-01-01
     2:  client_a 2016-02-01
     3:  client_a 2016-03-01
     4:  client_b 2015-12-01
     5:  client_b 2016-01-01
     6:  client_b 2016-03-01
     7:  client_b 2016-05-01
     8:  client_b 2016-06-01
     9:  client_c 2014-07-01
    10:  client_c 2015-01-01

The result should contain a group number that is the same for every row of the same group, e.g.:

        client.no event.date group.no
     1:  client_a 2016-01-01        1
     2:  client_a 2016-02-01        1
     3:  client_a 2016-03-01        1
     4:  client_b 2015-12-01        1
     5:  client_b 2016-01-01        1
     6:  client_b 2016-03-01        2
     7:  client_b 2016-05-01        3
     8:  client_b 2016-06-01        3
     9:  client_c 2014-07-01        1
    10:  client_c 2015-01-01        2

It is not required that the group number resets to 1 for each client (but it would be nice).

You can assume that the events are ordered within each client and that there are no duplicated event dates within the same client.

You can use cumsum over a flag that marks every step of more than 31 days between consecutive events:

    dt[, z := cumsum(c(1, diff(event.date) > 31)), by = client.no]

Output:

        client.no event.date z
     1:  client_a 2016-01-01 1
     2:  client_a 2016-02-01 1
     3:  client_a 2016-03-01 1
     4:  client_b 2015-12-01 1
     5:  client_b 2016-01-01 1
     6:  client_b 2016-03-01 2
     7:  client_b 2016-05-01 3
     8:  client_b 2016-06-01 3
     9:  client_c 2014-07-01 1
    10:  client_c 2015-01-01 2
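
To see why the cumsum one-liner produces these numbers, it may help to take it apart for a single client. The following is only a sketch in base R using the client_b dates from the question. The intermediate names `gaps` and `new.group` are made up for illustration and are not part of the answer's code.

```r
# Dates for client_b from the example (base R only, no packages needed)
event.date <- as.Date(c("2015-12-01", "2016-01-01", "2016-03-01",
                        "2016-05-01", "2016-06-01"))

# Step 1: days between consecutive events (one element shorter than the input)
gaps <- as.numeric(diff(event.date))

# Step 2: TRUE wherever more than 31 days passed, i.e. at least one month was skipped
# (new.group is a hypothetical helper name used only in this sketch)
new.group <- gaps > 31

# Step 3: the first event always opens group 1; each TRUE after that bumps
# the running total, so all rows between two gaps share the same number
group.no <- cumsum(c(1, new.group))

print(data.frame(event.date, group.no))
```

The threshold of 31 relies on the events being monthly and falling on the same day of the month, since no calendar month is longer than 31 days. For data with a different spacing the threshold would presumably have to be adjusted.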