I'm using Elasticsearch, and I'm running a cron job every 10 minutes to find newly created/updated data in my DB and sync it to Elasticsearch. However, I want to use `bulk` to sync instead of making an arbitrary number of requests to update/create documents in an index. I'm using the elasticsearch.js library created by Elasticsearch.
I face 2 challenges that I'm uncertain how to handle:
- How do I use `bulk` to update a document if it exists and create it if it doesn't, within one `bulk` request, without knowing whether it exists in the index?
- How do I format a large amount of JSON to run through `bulk` to update/create documents, given that the `bulk` API expects the body to be formatted a certain way?
The best option when trying to stream data in from a SQL database is to use Logstash's JDBC input to do it for you (see the documentation). It can handle the syncing for you.
Not all SQL schemes make that easy, so on to your specific questions:
> How do I use bulk to update a document if it exists and create it if it doesn't, within bulk, without knowing whether it exists in the index?
Bulk accepts 4 different types of sub-requests, which behave differently from what you might expect coming from the SQL world:
- `index`
- `create`
- `update`
- `delete`
The first, `index`, is the most commonly used option. It means that you want to index (the verb) something into the Elasticsearch index (the noun). However, if a document already exists in the index with the same `_id`, then it will replace it. The rest are a bit more obvious.
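As a rough illustration of why `index` covers the "update if it exists, create if it doesn't" case, here is a minimal sketch that builds a bulk body made only of `index` actions. The index name `my-index`, the row shape, and the helper name `build_index_body` are assumptions made up for this example; the point is that the body is the same whether or not a document with that `_id` already exists.

```python
import json


def build_index_body(index_name, rows):
    """Build a newline-delimited bulk body of `index` actions (hypothetical helper)."""
    lines = []
    for row in rows:
        # Action line: `index` adds the document, or replaces one with the same _id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(row["id"])}}))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical rows, standing in for whatever the cron job pulled from the DB.
rows = [
    {"id": 1, "name": "first"},
    {"id": 2, "name": "second"},
]
body = build_index_body("my-index", rows)
print(body)
```

Each document takes two lines: the action with its metadata, then the source.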
Each one of the sub-requests behaves like the individual option it is associated with (so `update` is an `UpdateRequest` under the hood, `delete` is a `DeleteRequest`, and `index` is an `IndexRequest`). In the case of `create`, it is a specialization of `index`, which says "add it if it doesn't exist, but fail if it does exist".
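To make the difference between the four sub-requests concrete, the sketch below shows the lines each one contributes to a bulk body. The helper name `bulk_lines` and the index name are invented for illustration; the shapes themselves (no source line for `delete`, a `doc` wrapper for a partial `update`) follow the bulk API format.

```python
import json


def bulk_lines(op, index_name, doc_id, source=None):
    """Return the bulk body lines for one sub-request (hypothetical helper)."""
    if op not in ("index", "create", "update", "delete"):
        raise ValueError(f"unsupported bulk operation: {op}")
    action = json.dumps({op: {"_index": index_name, "_id": doc_id}})
    if op == "delete":
        # delete has an action line only, no source line.
        return [action]
    if source is None:
        raise ValueError(f"{op} needs a source document")
    if op == "update":
        # update sends a partial document wrapped in "doc".
        return [action, json.dumps({"doc": source})]
    # index and create are followed by the full document.
    return [action, json.dumps(source)]


for op in ("index", "create", "update", "delete"):
    print("\n".join(bulk_lines(op, "my-index", "1", {"name": "first"})))
```

Note that a plain `update` fails for a document that is missing, which is one reason `index` is usually the simpler choice for a sync job like this.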
> How do I format a large amount of JSON to run through bulk to update/create documents, given that the bulk API expects the body to be formatted a certain way?
You should be using either the Logstash approach or any of the existing client language libraries, such as the Python client, which should work well with cron. The clients will take care of the formatting for you. One for your preferred language most likely exists.
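As a sketch of what "the client takes care of the formatting" can look like with the Python client: its bulk helper accepts plain action dictionaries, so the cron job only has to turn rows into dicts. The function name `to_actions`, the index name, the row shape, and the `localhost` URL in the comment are assumptions for this example, and the client call is left commented out so the snippet runs without a cluster.

```python
def to_actions(index_name, rows):
    """Yield one action dict per row, in the shape the Python client's bulk helper accepts."""
    for row in rows:
        yield {
            "_op_type": "index",      # create the document, or replace an existing one
            "_index": index_name,
            "_id": str(row["id"]),
            "_source": row,
        }


# Hypothetical rows, standing in for the newly created/updated DB records.
rows = [
    {"id": 1, "name": "first"},
    {"id": 2, "name": "second"},
]
actions = list(to_actions("my-index", rows))
print(actions)

# With the Python client installed, the generator could be handed over directly
# (assumed cluster URL; the helper batches the requests itself):
#   from elasticsearch import Elasticsearch, helpers
#   helpers.bulk(Elasticsearch("http://localhost:9200"), to_actions("my-index", rows))
```

Because `to_actions` is a generator, a large result set never has to be held in memory as one formatted body.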