I am working with a transcript of a show, and I want to extract the text of each speaker and store it in its own vector. The data looks like this:

    bob: blah blah blah blah
    trudy: blah blah
    bob: idea of text on new line
    don't know how extract correct vector
    trudy: blah blah blah

... and so on.

I imagine I need to use a combination of readLines and grep, but I'm not sure how to implement it.
Interesting problem. I'm not sure exactly what output you need, but this should give you an idea.

    text <- "bob: blah blah blah blah
    trudy: bleh bleh
    bob: idea of text on new line
    don't know how extract correct vector
    trudy: bleh bleh bleh
    bob: durrh!!!"

    # Replace line feeds with spaces so continuation lines stay with their speaker
    text <- gsub(pattern = "\\n", replacement = " ", x = text)

    # Split the string into words to find the order in which bob / trudy speak
    who <- strsplit(x = text, split = " ")[[1]]
    who <- who[who %in% c("bob:", "trudy:")]

    # Split the string on "bob: " and "trudy: "; the first piece is the empty
    # string before the first speaker tag, so drop it
    dialog <- strsplit(x = text, split = "(bob: )|(trudy: )", perl = TRUE)[[1]][-1]

    # Create the 2 final vectors
    bob <- trimws(dialog[which(who == "bob:")])
    trudy <- trimws(dialog[which(who == "trudy:")])

Results:

    > bob
    [1] "blah blah blah blah"
    [2] "idea of text on new line don't know how extract correct vector"
    [3] "durrh!!!"
    > trudy
    [1] "bleh bleh"      "bleh bleh bleh"
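
If the transcript lives in a file and has more than two speakers, the same idea can be wrapped in a small function. This is only a sketch: `split_by_speaker` and the `speakers` argument are names I made up for illustration, and it assumes you know the speaker names in advance and that each one appears as a tag of the form `name:`.

```r
# Hypothetical helper: return a list with one character vector per speaker.
# Assumes every speaker tag looks like "name:" and that the names in
# `speakers` contain no regex metacharacters.
split_by_speaker <- function(lines, speakers) {
  # Join all lines so that continuation lines stay with their speaker
  text <- paste(lines, collapse = " ")

  # Pattern matching any speaker tag, e.g. "(bob|trudy):"
  tag <- paste0("(", paste(speakers, collapse = "|"), "):")

  # Who speaks, in order of appearance (strip the trailing colon)
  who <- regmatches(text, gregexpr(tag, text, perl = TRUE))[[1]]
  who <- sub(":$", "", who)

  # What was said: the pieces between tags; the first piece is whatever
  # precedes the first tag (usually ""), so drop it
  said <- trimws(strsplit(text, tag, perl = TRUE)[[1]][-1])

  # Group the utterances by speaker
  split(said, factor(who, levels = speakers))
}

# In practice `lines` would come from readLines() on your transcript file;
# an inline vector is used here so the example runs on its own.
lines <- c(
  "bob: blah blah blah blah",
  "trudy: bleh bleh",
  "bob: idea of text on new line",
  "don't know how extract correct vector",
  "trudy: bleh bleh bleh",
  "bob: durrh!!!"
)
out <- split_by_speaker(lines, speakers = c("bob", "trudy"))
```

Here `out$bob` and `out$trudy` hold the same vectors as above. One caveat: strsplit drops a trailing empty piece, so a transcript that ends with a bare speaker tag and no text after it would leave `who` one element longer than `said`.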