
MapReduce例子

source link: https://zhangslob.github.io/2019/12/30/MapReduce%E4%BE%8B%E5%AD%90/

Posted on 2019-12-30

| Category: Big Data

This is 崔斯特's 110th original article.

Hard work, perseverance.

I've recently been reading Programming Hive (《Hive编程指南》) and tried my first hands-on MapReduce example, so I'm writing it down here.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.StringTokenizer;

/**
 * @author zhujian on 2019/12/29.
 */
public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                // Emit (word, 1) for each token. Note: writing value here instead
                // of word would emit the whole line as the key.
                context.write(word, ONE);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count"); // new Job(conf, name) is deprecated
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

We extend the two classes Mapper and Reducer, override map() and reduce() with our own processing logic, and finally wire the map and reduce classes into the job driver.
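The data flow can be sketched without a cluster. The class below is a hypothetical local stand-in (not part of Hadoop) that mimics what the framework does: the map phase emits (word, 1) pairs for each line, and the shuffle/reduce phase groups by key and sums the counts, just like the Reducer above:

```java
import java.util.*;

public class LocalWordCount {
    // A local simulation of the MapReduce data flow: tokenize each line
    // into (word, 1) pairs ("map"), then group by key and sum ("reduce").
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line); // map: tokenize
            while (tokenizer.hasMoreTokens()) {
                // merge() plays the role of shuffle + reduce: per-key sum
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("hello world", "hello hive")));
        // prints {hello=2, hive=1, world=1}
    }
}
```

On a real cluster the grouping happens across machines during the shuffle, but the per-key summation is exactly this.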

The same functionality written in HQL is much simpler:

CREATE TABLE docs (line STRING);
LOAD DATA INPATH 'docs' OVERWRITE INTO TABLE docs;
CREATE TABLE word_counts AS
SELECT word, count(1) AS count
FROM (
  SELECT explode(split(line, '\\s')) AS word
  FROM docs
) w
GROUP BY word
ORDER BY word;
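One detail worth flagging: Hive's split() takes a Java regular expression, which is why the whitespace pattern must be written in its escaped form. Plain Java's String.split behaves the same way, as this small sketch shows:

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        // String.split takes a regex; "\\s+" matches one or more whitespace
        // characters, the same pattern semantics Hive's split() uses.
        String[] words = "hello  world\thive".split("\\s+");
        System.out.println(Arrays.toString(words)); // prints [hello, world, hive]
    }
}
```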

so simple

