can 1 explain me how secondary sorting works in hadoop ?
why must 1 use groupingcomparator
, how work in hadoop ?
i going through link given below , got doubt on how groupcomapator works.
can 1 explain me how grouping comparator works?
http://www.bigdataspeak.com/2013/02/hadoop-how-to-do-secondary-sort-on_25.html
grouping comparator
once data reaches reducer, data grouped key. since have composite key, need make sure records grouped solely natural key. accomplished writing custom grouppartitioner. have comparator object considering yearmonth field of temperaturepair class purposes of grouping records together.
public class yearmonthgroupingcomparator extends writablecomparator { public yearmonthgroupingcomparator() { super(temperaturepair.class, true); } @override public int compare(writablecomparable tp1, writablecomparable tp2) { temperaturepair temperaturepair = (temperaturepair) tp1; temperaturepair temperaturepair2 = (temperaturepair) tp2; return temperaturepair.getyearmonth().compareto(temperaturepair2.getyearmonth()); } }
here results of running our secondary sort job:
new-host-2:sbin bbejeck$ hdfs dfs -cat secondary-sort/part-r-00000
190101 -206
190102 -333
190103 -272
190104 -61
190105 -33
190106 44
190107 72
190108 44
190109 17
190110 -33
190111 -217
190112 -300
while sorting data value may not common need, it’s nice tool have in pocket when needed. also, have been able take deeper @ inner workings of hadoop working custom partitioners , group partitioners. refer link also..what use of grouping comparator in hadoop map reduce