I'm looking for a way to read entire files, so that every file is read completely into a single `String`. I want to pass a glob of JSON text files on gs://my_bucket/*/*.json, and have a ParDo then process each and every file in its entirety.
What's the best approach to this?
I'm going to give the most generally useful answer, even though there are special cases [1] where you might do something different.
I think what you want to do is to define a new subclass of `FileBasedSource` and use `Read.from(<source>)`. Your source will also include a subclass of `FileBasedReader`; the source contains the configuration data and the reader actually does the reading.
I think a full description of the API is best left to the Javadoc, but I will highlight the key override points and how they relate to your needs:
- `FileBasedSource#isSplittable()` you will want to override and return `false`. This will indicate that there is no intra-file splitting.
- `FileBasedSource#createForSubrangeOfFile(String, long, long)` you will override to return a sub-source for just the file specified.
- `FileBasedSource#createSingleFileReader()` you will override to produce a `FileBasedReader` for the current file (the method should assume it is already split to the level of a single file).
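The override points above can be sketched roughly as follows. This is an outline only, not drop-in code: package names and some signatures (notably `createForSubrangeOfFile`) have changed between SDK versions, and `WholeFileReader` is a hypothetical name for the companion `FileBasedReader` subclass.

```java
// Package names vary by SDK version (com.google.cloud.dataflow.sdk.* in the
// Dataflow 1.x SDK, org.apache.beam.sdk.* in Beam); shown here for Beam.
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.FileBasedSource;
import org.apache.beam.sdk.options.PipelineOptions;

class WholeFileSource extends FileBasedSource<String> {
  private static final long MIN_BUNDLE_SIZE = 1; // irrelevant: we never split a file

  public WholeFileSource(String fileOrPattern) {
    super(fileOrPattern, MIN_BUNDLE_SIZE);
  }

  private WholeFileSource(String fileName, long start, long end) {
    super(fileName, MIN_BUNDLE_SIZE, start, end);
  }

  @Override
  protected boolean isSplittable() {
    return false; // no intra-file splitting: each file becomes one record
  }

  @Override
  protected FileBasedSource<String> createForSubrangeOfFile(
      String fileName, long start, long end) {
    // Since isSplittable() is false, the only "subrange" ever requested
    // is the whole file.
    return new WholeFileSource(fileName, start, end);
  }

  @Override
  protected FileBasedReader<String> createSingleFileReader(PipelineOptions options) {
    return new WholeFileReader(this); // the reader subclass, sketched separately
  }

  @Override
  public Coder<String> getDefaultOutputCoder() {
    return StringUtf8Coder.of(); // each element is the file's full contents
  }
}
```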
To implement the reader:
- `FileBasedReader#startReading(...)` you will override to do nothing; the framework will have already opened the file for you, and it will close it.
- `FileBasedReader#readNextRecord()` you will override to read the entire file as a single element.
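A rough sketch of that reader is below, under the same caveats: package names and exact signatures differ between SDK versions, and `WholeFileSource` is the hypothetical source subclass from above. The framework hands `startReading` an already-open channel; `readNextRecord` drains it into one string exactly once.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.NoSuchElementException;
import org.apache.beam.sdk.io.FileBasedSource;

class WholeFileReader extends FileBasedSource.FileBasedReader<String> {
  private ReadableByteChannel channel;
  private String current;
  private boolean done;

  public WholeFileReader(WholeFileSource source) {
    super(source);
  }

  @Override
  protected void startReading(ReadableByteChannel channel) {
    // The framework has already opened the file; just keep the channel.
    // It will also close the file for us.
    this.channel = channel;
  }

  @Override
  protected boolean readNextRecord() throws IOException {
    if (done) {
      return false; // the single whole-file record was already emitted
    }
    // Drain the entire channel into one string.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
    while (channel.read(buf) >= 0) {
      buf.flip();
      out.write(buf.array(), 0, buf.limit());
      buf.clear();
    }
    current = new String(out.toByteArray(), StandardCharsets.UTF_8);
    done = true;
    return true;
  }

  @Override
  public String getCurrent() throws NoSuchElementException {
    if (current == null) {
      throw new NoSuchElementException();
    }
    return current;
  }

  @Override
  protected long getCurrentOffset() {
    return 0; // one record per file, so offset tracking is trivial
  }
}
```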
[1] One example of an easy special case is when you have a small number of files, you can expand the glob prior to job submission, and every file takes about the same amount of time to process. Then you can just use `Create.of(expand(<glob>))` followed by `ParDo(<read the file>)`.
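For that special case, the expand-then-read steps can be illustrated with plain `java.nio` alone, outside any pipeline (the `Create.of` / `ParDo` wiring is omitted, and `expand` / `readWholeFile` are hypothetical helper names, not SDK methods; a real pipeline would expand a `gs://` glob, not a local directory):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ExpandAndRead {
  // Expand a glob like "*.json" under a directory into concrete paths,
  // the way you would expand the glob before job submission.
  static List<Path> expand(Path dir, String glob) throws IOException {
    List<Path> paths = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
      for (Path p : stream) {
        paths.add(p);
      }
    }
    return paths;
  }

  // Read one file entirely into a single string -- what the ParDo would do.
  static String readWholeFile(Path p) throws IOException {
    return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("demo");
    Files.write(dir.resolve("a.json"), "{\"id\": 1}".getBytes(StandardCharsets.UTF_8));
    Files.write(dir.resolve("b.txt"), "not matched".getBytes(StandardCharsets.UTF_8));
    for (Path p : expand(dir, "*.json")) {
      System.out.println(p.getFileName() + " -> " + readWholeFile(p));
    }
  }
}
```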