I'm looking for a way to read ENTIRE files so that every file is read entirely into a single String. I want to pass a pattern of JSON text files on gs://my_bucket/*/*.json and have a ParDo process each and every file in its entirety.
What's the best approach to this?
I am going to give the most generally useful answer, even though there are special cases [1] where you might do something different.
I think what you want to do is to define a new subclass of FileBasedSource and use Read.from(<source>). Your source will also include a subclass of FileBasedReader; the source contains the configuration data and the reader does the actual reading.

I think a full description of the API is best left to the Javadoc, but I will highlight the key override points and how they relate to your needs:
- FileBasedSource#isSplittable(): you will want to override this and return false. This indicates that there is no intra-file splitting.
- FileBasedSource#createForSubrangeOfFile(String, long, long): override this to return a sub-source for just the file specified.
- FileBasedSource#createSingleFileReader(): override this to produce a FileBasedReader for the current file (the method should assume the source has already been split down to the level of a single file).
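To make that concrete, here is a minimal sketch of such a source. It assumes the Dataflow SDK 1.x FileBasedSource API; the class name WholeFileSource, the constructor arguments, and the exact method signatures are my assumptions, so check the Javadoc for your SDK version:

```java
import com.google.cloud.dataflow.sdk.coders.Coder;
import com.google.cloud.dataflow.sdk.coders.StringUtf8Coder;
import com.google.cloud.dataflow.sdk.io.FileBasedSource;
import com.google.cloud.dataflow.sdk.options.PipelineOptions;

/** Emits each matched file's full contents as a single String element. */
class WholeFileSource extends FileBasedSource<String> {

  // Created from the original file pattern, e.g. "gs://my_bucket/*/*.json".
  public WholeFileSource(String fileOrPatternSpec) {
    // The min bundle size is largely irrelevant here because isSplittable() is false.
    super(fileOrPatternSpec, Long.MAX_VALUE);
  }

  // Created by createForSubrangeOfFile() for one concrete file.
  private WholeFileSource(String fileName, long start, long end) {
    super(fileName, Long.MAX_VALUE, start, end);
  }

  @Override
  protected boolean isSplittable() {
    // No intra-file splitting: each file becomes exactly one record.
    return false;
  }

  @Override
  protected FileBasedSource<String> createForSubrangeOfFile(String fileName, long start, long end) {
    // Since isSplittable() is false, the requested range is always the whole file.
    return new WholeFileSource(fileName, start, end);
  }

  @Override
  protected FileBasedReader<String> createSingleFileReader(PipelineOptions options) {
    return new WholeFileReader(this); // reader sketched below
  }

  @Override
  public Coder<String> getDefaultOutputCoder() {
    return StringUtf8Coder.of();
  }
}
```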
To implement the reader:

- FileBasedReader#startReading(...): override this to do nothing; the framework will already have opened the file for you, and it will close it.
- FileBasedReader#readNextRecord(): override this to read the entire file as a single element.
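A matching reader sketch, under the same SDK assumptions (WholeFileReader and its internals are hypothetical):

```java
import com.google.cloud.dataflow.sdk.io.FileBasedSource;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.NoSuchElementException;

/** Reads the single file backing its source into one String record. */
class WholeFileReader extends FileBasedSource.FileBasedReader<String> {

  private ReadableByteChannel channel;
  private String current;

  public WholeFileReader(WholeFileSource source) {
    super(source);
  }

  @Override
  protected void startReading(ReadableByteChannel channel) {
    // The framework has already opened the file and will close it;
    // just remember the channel so readNextRecord() can drain it.
    this.channel = channel;
  }

  @Override
  protected boolean readNextRecord() throws IOException {
    if (current != null) {
      return false; // the one and only record has already been produced
    }
    // Drain the whole channel into a single String.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
    while (channel.read(buffer) != -1) {
      buffer.flip();
      bytes.write(buffer.array(), 0, buffer.limit());
      buffer.clear();
    }
    current = new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    return true;
  }

  @Override
  public String getCurrent() throws NoSuchElementException {
    if (current == null) {
      throw new NoSuchElementException();
    }
    return current;
  }

  @Override
  protected long getCurrentOffset() {
    // The single record starts at the beginning of the file.
    return 0;
  }
}
```

You would then read your pattern with something like the following, where p is your Pipeline:

```java
PCollection<String> jsonFiles =
    p.apply(Read.from(new WholeFileSource("gs://my_bucket/*/*.json")));
```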
[1] One example of an easy special case is when you have a small number of files, you can expand them prior to job submission, and they all take roughly the same amount of time to process. Then you can simply use Create.of(expand(<glob>)) followed by ParDo(<read the file>).
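A rough sketch of that special case, again against the Dataflow SDK 1.x API. Here expand(...) and readFile(...) are hypothetical helpers you would supply yourself: expand lists the concrete file names for the glob before submission, and readFile fetches one file's contents as a String.

```java
// Assumes an existing Pipeline p.
List<String> fileNames = expand("gs://my_bucket/*/*.json");  // resolve the glob up front

PCollection<String> wholeFiles = p
    .apply(Create.of(fileNames))
    .apply(ParDo.of(new DoFn<String, String>() {
      @Override
      public void processElement(ProcessContext c) throws Exception {
        // Read the entire file named by the element and emit its contents as one String.
        c.output(readFile(c.element()));
      }
    }));
```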