
MapReduce例子

source link: https://zhangslob.github.io/2019/12/30/MapReduce%E4%BE%8B%E5%AD%90/

Posted on 2019-12-30

| Category: Big Data

This is 崔斯特's 110th original article.

Hard work, perseverance.

I've recently been reading Programming Hive (《Hive编程指南》) and tried my first hands-on MapReduce example, so I'm writing it down here.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * @author zhujian on 2019/12/29.
 */
public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                // Emit (word, 1) for each token. Note: writing value here instead
                // of word would emit the whole line as the key.
                context.write(word, ONE);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count"); // new Job(conf, name) is deprecated
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

We extend the two classes Mapper and Reducer, override map() and reduce() with our own processing logic, and finally wire the map and reduce classes into the job driver.
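The data flow can be sketched without a cluster. The class below is a hypothetical local stand-in (not part of Hadoop) that mimics what the framework does: the map phase emits (word, 1) pairs for each line, and the shuffle/reduce phase groups by key and sums the counts, just like the Reducer above:

```java
import java.util.*;

public class LocalWordCount {
    // A local simulation of the MapReduce data flow: tokenize each line
    // into (word, 1) pairs ("map"), then group by key and sum ("reduce").
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line); // map: tokenize
            while (tokenizer.hasMoreTokens()) {
                // merge() plays the role of shuffle + reduce: per-key sum
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("hello world", "hello hive")));
        // prints {hello=2, hive=1, world=1}
    }
}
```

On a real cluster the grouping happens across machines during the shuffle, but the per-key summation is exactly this.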

The same functionality written in HQL is much simpler:

CREATE TABLE docs (line STRING);
LOAD DATA INPATH 'docs' OVERWRITE INTO TABLE docs;
CREATE TABLE word_counts AS
SELECT word, count(1) AS count
FROM (
  SELECT explode(split(line, '\\s')) AS word
  FROM docs
) w
GROUP BY word
ORDER BY word;
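One detail worth flagging: Hive's split() takes a Java regular expression, which is why the whitespace pattern must be written in its escaped form. Plain Java's String.split behaves the same way, as this small sketch shows:

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        // String.split takes a regex; "\\s+" matches one or more whitespace
        // characters, the same pattern semantics Hive's split() uses.
        String[] words = "hello  world\thive".split("\\s+");
        System.out.println(Arrays.toString(words)); // prints [hello, world, hive]
    }
}
```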

so simple

