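I have scoured the web for a way to achieve what I need without luck, so here goes. A while ago I discovered the package dplyr, which seems to have potential. I am thinking that this package can do what I want, I just don't know how. Here is a small subset of my data, which should be representative of the problem.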
    dummy <- structure(list(
      time = structure(1:20, .Label = c(
        "2015-03-25 12:24:00", "2015-03-25 21:08:00", "2015-03-25 21:13:00",
        "2015-03-25 21:47:00", "2015-03-26 03:08:00", "2015-04-01 20:30:00",
        "2015-04-01 20:34:00", "2015-04-01 20:42:00", "2015-04-01 20:45:00",
        "2015-09-29 18:26:00", "2015-09-29 19:11:00", "2015-09-29 21:21:00",
        "2015-09-29 22:03:00", "2015-09-29 22:38:00", "2015-09-30 00:48:00",
        "2015-09-30 01:38:00", "2015-09-30 01:41:00", "2015-09-30 01:45:00",
        "2015-09-30 01:47:00", "2015-09-30 01:49:00"), class = "factor"),
      id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
             2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
      station = c(1L, 1L, 1L, 2L, 3, 4L, 4L, 4L, 4L,
                  5L, 5L, 6L, 6L, 5, 5, 5L, 7, 7, 7L, 7)),
      .Names = c("time", "id", "station"),
      class = "data.frame", row.names = c(NA, -20L))
I wish to evaluate the rows within the time column, conditional on the id and station columns. Specifically, I would like a function (dplyr?) that evaluates each time row and compares it to the previous time (row-1) and the next time (row+1). If the time of the current row is within 1 hour of the time of the previous and/or next row, and the id and station of the current row match those of the previous and/or next row, then I would like to add a 1 in a new column, otherwise a 0.
How would I achieve this using dplyr?
The expected outcome should look like this:
                      time id station new.value
    1  2015-03-25 12:24:00  1       1         0
    2  2015-03-25 21:08:00  1       1         1
    3  2015-03-25 21:13:00  1       1         1
    4  2015-03-25 21:47:00  1       2         0
    5  2015-03-26 03:08:00  1       3         0
    6  2015-04-01 20:30:00  1       4         1
    7  2015-04-01 20:34:00  1       4         1
    8  2015-04-01 20:42:00  1       4         1
    9  2015-04-01 20:45:00  1       4         1
    10 2015-09-29 18:26:00  2       5         1
    11 2015-09-29 19:11:00  2       5         1
    12 2015-09-29 21:21:00  2       6         1
    13 2015-09-29 22:03:00  2       6         1
    14 2015-09-29 22:38:00  2       5         0
    15 2015-09-30 00:48:00  2       5         1
    16 2015-09-30 01:38:00  2       5         1
    17 2015-09-30 01:41:00  2       7         1
    18 2015-09-30 01:45:00  2       7         1
    19 2015-09-30 01:47:00  2       7         1
    20 2015-09-30 01:49:00  2       7         1
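Here is an option using difftime() inside dplyr's mutate() function. First, we use a group_by() operation to make sure the comparison happens within each unique combination of id and station. difftime() can be used to calculate the difference between two times; here the units are set to hours for convenience. The lag() and lead() functions, also from the dplyr package, shift the selected column backward or forward by one row. Combining them with the vectorised difftime(), we can calculate the time difference between the current row and the previous/next row. The default = Inf argument fills in the missing neighbour at the start or end of each group, so that comparison evaluates to FALSE rather than NA. We use abs() to make sure the result is an absolute value, and the condition < 1 to make sure the difference is within an hour. Finally, as.integer() converts the logical values (TRUE or FALSE) to 1 or 0 correspondingly.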
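To see how these pieces behave in isolation, here is a minimal sketch on a toy vector of three timestamps. The vector `t` and the helper names `gap_prev`, `gap_next`, `close_prev` and `close_next` are made up for illustration, and the sketch handles the missing neighbour with an explicit `is.na()` check instead of a `default` value, which is a swapped-in variation on the same idea rather than the exact call used in this answer.

```r
library(dplyr)

# Three timestamps: the first two are 5 minutes apart, the third is hours later.
t <- as.POSIXct(c("2015-03-25 21:08:00", "2015-03-25 21:13:00",
                  "2015-03-26 03:08:00"), tz = "UTC")

# lag()/lead() shift the vector by one position; the ends are padded with NA.
gap_prev <- abs(as.numeric(difftime(t, lag(t),  units = "hours")))
gap_next <- abs(as.numeric(difftime(t, lead(t), units = "hours")))

# Treat a missing neighbour as "not close" rather than NA.
close_prev <- !is.na(gap_prev) & gap_prev < 1
close_next <- !is.na(gap_next) & gap_next < 1

# 1 if either neighbour is less than an hour away, otherwise 0.
flag <- as.integer(close_prev | close_next)
flag
```

The first two timestamps flag each other, while the third has no neighbour within an hour.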
    library(dplyr)
    dummy %>%
      group_by(id, station) %>%
      mutate(new.value = as.integer(
        abs(difftime(time, lag(time, default = Inf), units = "hours")) < 1 |
        abs(difftime(time, lead(time, default = Inf), units = "hours")) < 1))

    Source: local data frame [20 x 4]
    Groups: id, station [7]

                      time    id station new.value
                    (time) (int)   (dbl)     (int)
    1  2015-03-25 12:24:00     1       1         0
    2  2015-03-25 21:08:00     1       1         1
    3  2015-03-25 21:13:00     1       1         1
    4  2015-03-25 21:47:00     1       2         0
    5  2015-03-26 03:08:00     1       3         0
    6  2015-04-01 20:30:00     1       4         1
    7  2015-04-01 20:34:00     1       4         1
    8  2015-04-01 20:42:00     1       4         1
    9  2015-04-01 20:45:00     1       4         1
    10 2015-09-29 18:26:00     2       5         1
    11 2015-09-29 19:11:00     2       5         1
    12 2015-09-29 21:21:00     2       6         1
    13 2015-09-29 22:03:00     2       6         1
    14 2015-09-29 22:38:00     2       5         0
    15 2015-09-30 00:48:00     2       5         1
    16 2015-09-30 01:38:00     2       5         1
    17 2015-09-30 01:41:00     2       7         1
    18 2015-09-30 01:45:00     2       7         1
    19 2015-09-30 01:47:00     2       7         1
    20 2015-09-30 01:49:00     2       7         1
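The printed result shows time as a date-time column, whereas the dput() in the question stores it as a factor, so a conversion step is presumably needed first. Below is a self-contained sketch under that assumption: it rebuilds the sample data with data.frame(), converts time with as.POSIXct() (the "UTC" time zone is an assumption, since the question does not state one), and replaces default = Inf with coalesce(..., FALSE), which may be more portable across dplyr versions. The helper columns gap_prev and gap_next are hypothetical names used only here.

```r
library(dplyr)

# Sample data from the question, with time kept as plain character strings.
dummy <- data.frame(
  time = c("2015-03-25 12:24:00", "2015-03-25 21:08:00", "2015-03-25 21:13:00",
           "2015-03-25 21:47:00", "2015-03-26 03:08:00", "2015-04-01 20:30:00",
           "2015-04-01 20:34:00", "2015-04-01 20:42:00", "2015-04-01 20:45:00",
           "2015-09-29 18:26:00", "2015-09-29 19:11:00", "2015-09-29 21:21:00",
           "2015-09-29 22:03:00", "2015-09-29 22:38:00", "2015-09-30 00:48:00",
           "2015-09-30 01:38:00", "2015-09-30 01:41:00", "2015-09-30 01:45:00",
           "2015-09-30 01:47:00", "2015-09-30 01:49:00"),
  id = rep(c(1L, 2L), times = c(9, 11)),
  station = c(1, 1, 1, 2, 3, 4, 4, 4, 4, 5, 5, 6, 6, 5, 5, 5, 7, 7, 7, 7),
  stringsAsFactors = FALSE
)

result <- dummy %>%
  # Assumed time zone; adjust to wherever the data was recorded.
  mutate(time = as.POSIXct(time, tz = "UTC")) %>%
  group_by(id, station) %>%
  mutate(
    # Absolute gap in hours to the previous/next row within the same group.
    gap_prev = abs(as.numeric(difftime(time, lag(time),  units = "hours"))),
    gap_next = abs(as.numeric(difftime(time, lead(time), units = "hours"))),
    # A missing neighbour (NA) counts as "not within an hour".
    new.value = as.integer(coalesce(gap_prev < 1, FALSE) |
                           coalesce(gap_next < 1, FALSE))
  ) %>%
  ungroup() %>%
  select(-gap_prev, -gap_next)

result$new.value
```

Because of group_by(), "previous" and "next" mean the neighbouring rows within the same id and station, not necessarily the physically adjacent rows of the data frame. For this sample, new.value agrees with the expected column in the question.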