Mapreduce secondary sorting _ mapreduce secondary sorting principle

About what is secondary sorting

In the mapreduce operation, the shuffle phase is sorted by key value multiple times. However, after shuffle grouping, the order of the values â€‹â€‹of the same key value is undefined (as shown below). If you want the value to be sorted at this time, this requirement is a secondary sort.

By default, the result of the Map output will sort the Key by default, but sometimes you need to sort the Key and also sort the Value. In this case, you need to use the second sort.

Mapreduce secondary sorting analysis

We divide the secondary order into the following stages.

Map start phase

In the Map phase, the InputFormat defined by job.seTInputFormatClass() is used to split the input dataset into small data blocks, and InputFormat provides a RecordReader implementation. Here we use TexTInputFormat, which provides a RecordReader that takes the line number of the text as the Key and the text of this line as the Value. This is why the input of the custom Mapper is "LongWritable, Text". Then call the map method of the custom Mapper, and input the "LongWritable, Text" key-value pairs to the mapper's map method.

The final stage of Map

At the end of the Map phase, the output of the Mapper is partitioned by calling job.setParTITIonerClass(), and each partition is mapped to a Reducer. The key comparison function class sorting set by job.setSortComparatorClass() is called in each partition. As you can see, this is itself a secondary sort. If the Key comparison function class is not set by job.setSortComparatorClass(), then the compareTo() method implemented by Key is used.

Reduce stage

In the Reduce phase, after the reduce() method accepts all map output mapped to this Reduce, it also calls the Key comparison function class set by the job.setSortComparatorClass() method to sort all the data. Then start constructing a Value iterator corresponding to the Key. In this case, you need to use grouping, use the job.setGroupingComparatorClass() method to set the grouping function class. As long as the two keys of this comparator are the same, they belong to the same group, their Value is placed in a Value iterator, and the Key of this iterator uses the first Key of all Keys belonging to the same group. The last step is to enter the Reducer's reduce() method. The input to the reduce() method is all Key and its Value iterator. Also note that the input and output types must be the same as those declared in the custom Reducer.

Mapreduce secondary sorting _ mapreduce secondary sorting principle ,

Indoor Gaming Projector

Shenzhen Happybate Trading Co.,LTD , https://www.szhappybateprojectors.com