source link: https://www.codesd.com/item/map-map-reduction-reduction-final-release.html

Map -> Map -> Reduce -> Reduce -> Final Result


Recently I read a paper that proposes an algorithm for mining maximum contiguous patterns from DNA data. The proposed method, which sounds quite interesting, uses the following MapReduce model: map -> map -> reduce -> reduce. That is, the first map phase runs and its output becomes the input to the second map phase. The second map phase's output is the input to the first reduce phase, whose output in turn feeds the second reduce phase; finally, the results are flushed to HDFS. Although it seems like an interesting method, the paper doesn't mention how it was implemented. My question is: how do you implement this sort of MapReduce chaining?
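To make the dataflow concrete, here is a minimal Python simulation of the four phases. The 3-mer counting logic is an illustrative stand-in I made up, not the paper's algorithm; the point is only that each phase's output feeds the next, with a shuffle (group-by-key) before each reduce:

```python
from collections import defaultdict

def map1(seq):
    # First map: emit every 3-mer of a sequence with count 1.
    return [(seq[i:i + 3], 1) for i in range(len(seq) - 2)]

def map2(kv):
    # Second map: normalize keys to uppercase.
    k, v = kv
    return (k.upper(), v)

def reduce1(key, values):
    # First reduce: sum the counts for one key.
    return (key, sum(values))

def reduce2(key, values):
    # Second reduce: keep only k-mers seen at least twice.
    total = sum(values)
    return (key, total) if total >= 2 else None

def shuffle(pairs):
    # Group values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return sorted(groups.items())

def pipeline(records):
    pairs = [map2(kv) for r in records for kv in map1(r)]   # map -> map
    mid = [reduce1(k, vs) for k, vs in shuffle(pairs)]      # reduce 1
    out = (reduce2(k, vs) for k, vs in shuffle(mid))        # reduce 2
    return [kv for kv in out if kv is not None]             # -> HDFS

print(pipeline(["acgtac", "ACGT"]))  # [('ACG', 2), ('CGT', 2)]
```

In a real Hadoop job the shuffle steps are done by the framework, not by user code; the sketch only shows where the two grouping barriers sit in the chain.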


I think there are two ways to handle your case:

  1. Merge the two map functions into a single map task that runs them as two internal phases, and do the same with the two reduce functions in a single reduce task.

  2. Split the map-map-reduce-reduce pipeline into two Hadoop jobs: the first job runs the two maps, with the second map task recast as that job's reduce task; the second job runs the two reduces, with the first reduce task recast as that job's map task. If you end up submitting several Hadoop jobs that depend on one another, you could use Oozie to manage the workflow.
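As a concreteness check for option 2, here is a self-contained Python sketch of the two-job split (the k-mer counting functions are an illustrative stand-in, not the paper's code). The second map runs in job 1's reduce slot, and the first reduce runs in job 2's map slot, emitting a partial result per record that job 2's shuffle and reduce then merge; this recasting works here only because the first reduce's aggregation (a sum) is associative:

```python
from collections import defaultdict

def map1(seq):
    # First map: emit every 3-mer of a sequence with count 1.
    return [(seq[i:i + 3], 1) for i in range(len(seq) - 2)]

def map2(kv):
    # Second map: normalize keys to uppercase.
    k, v = kv
    return (k.upper(), v)

def reduce1(key, values):
    # First reduce: sum counts; associative, so partial groups are fine.
    return (key, sum(values))

def reduce2(key, values):
    # Second reduce: merge partial sums, keep k-mers seen at least twice.
    total = sum(values)
    return (key, total) if total >= 2 else None

def shuffle(pairs):
    # Group values by key, as Hadoop's shuffle does.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return sorted(groups.items())

def job1(records):
    # Job 1: map1 in the map slot; map2 recast as the reduce, applied to
    # each pair of every shuffled group. The returned pairs stand in for
    # the intermediate files written to HDFS between the two jobs.
    pairs = [kv for r in records for kv in map1(r)]
    return [map2((k, v)) for k, vs in shuffle(pairs) for v in vs]

def job2(pairs):
    # Job 2: reduce1 recast as the map. A Hadoop map sees one record at
    # a time, so reduce1 emits a partial result per record; the shuffle
    # regroups them and reduce2 merges the partials.
    partials = [reduce1(k, [v]) for k, v in pairs]
    out = (reduce2(k, vs) for k, vs in shuffle(partials))
    return [kv for kv in out if kv is not None]

print(job2(job1(["acgtac", "ACGT"])))  # [('ACG', 2), ('CGT', 2)]
```

On a real cluster the two jobs would be submitted separately, with job 2 reading job 1's HDFS output directory; Oozie (or simply calling `Job.waitForCompletion` on the first job before submitting the second) handles the dependency.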

