Code implementation of MapReduce in go-zero

Project: GitHub - zeromicro/go-zero: A cloud-native Go microservices framework with cli tool for productivity.

1. Background

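In microservice development, the API gateway exposes RESTful APIs to the outside world. The data behind an API often depends on other services, and a complex API may depend on several of them.
Although each dependency is usually fast on its own, calling several of them serially makes the total latency of the API grow to roughly the sum of the individual latencies.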

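So how can this be optimized? The first idea is to call the dependencies concurrently, which brings the total time down to roughly that of the slowest dependency. The Go standard library provides sync.WaitGroup for coordinating concurrent work, but in real business scenarios, if one of several dependencies fails we want to return immediately rather than wait for all of them to finish. In addition, assigning results to shared variables under a WaitGroup often requires a lock, and every dependency function needs matching Add and Done calls, which is easy for newcomers to get wrong.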
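To make the WaitGroup pain points concrete, here is a minimal sketch that uses only the standard library. The helper name fetchAll and the dependency names are hypothetical and exist only for illustration; the point is the amount of bookkeeping (Add/Done, a mutex, first-error tracking) and the fact that Wait still blocks until the slowest call returns.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fetchAll (hypothetical helper) calls every dependency concurrently and
// collects the results by name.
func fetchAll(deps map[string]func() (string, error)) (map[string]string, error) {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	results := make(map[string]string, len(deps))

	for name, call := range deps {
		wg.Add(1) // must be paired with exactly one Done
		go func(name string, call func() (string, error)) {
			defer wg.Done()
			v, err := call()

			mu.Lock() // results and firstErr are shared, so writes need a lock
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			results[name] = v
		}(name, call)
	}

	wg.Wait() // blocks until every call has returned, even after an error
	if firstErr != nil {
		return nil, firstErr
	}
	return results, nil
}

func main() {
	res, err := fetchAll(map[string]func() (string, error){
		"user":  func() (string, error) { return "alice", nil },
		"order": func() (string, error) { return "", errors.New("order service down") },
	})
	fmt.Println(res, err)
}
```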

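Against this background, the go-zero framework provides a concurrency utility called MapReduce that works out of the box.

MapReduce is a software architecture proposed by Google for parallel computation over large data sets. The MapReduce tool in go-zero borrows the idea behind that architecture.

In go-zero, the MapReduce tool is mainly used to process batches of data concurrently in order to improve service performance.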

2. Core code

2.1 Basic concepts

generate: produces the data


type (
	// GenerateFunc is used to let callers send elements into source.
	GenerateFunc func(source chan<- interface{})
)

mapper: processes the data produced by generate


 
type (
	// MapFunc is used to do element processing and write the output to writer.
	MapFunc func(item interface{}, writer Writer)
	// MapperFunc is used to do element processing and write the output to writer,
	// use cancel func to cancel the processing.
	MapperFunc func(item interface{}, writer Writer, cancel func(error))
)

reducer: aggregates the data processed by mapper and returns the result


 
type (
	// ReducerFunc is used to reduce all the mapping output and write to writer,
	// use cancel func to cancel the processing.
	ReducerFunc func(pipe <-chan interface{}, writer Writer, cancel func(error))
	// VoidReducerFunc is used to reduce all the mapping output, but no output.
	// Use cancel func to cancel the processing.
	VoidReducerFunc func(pipe <-chan interface{}, cancel func(error))
)
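As an illustration of how the three function types fit together, the sketch below declares local copies of them and drives them one stage at a time, with no concurrency. The Writer interface is an assumption (it is not quoted above), and runSequential, sliceWriter, and totalLength are hypothetical names; this is not how go-zero executes the stages.

```go
package main

import (
	"errors"
	"fmt"
)

// Local mirrors of the quoted types so the sketch compiles on its own.
// Writer is assumed to be a single-method interface.
type (
	Writer       interface{ Write(v interface{}) }
	GenerateFunc func(source chan<- interface{})
	MapperFunc   func(item interface{}, writer Writer, cancel func(error))
	ReducerFunc  func(pipe <-chan interface{}, writer Writer, cancel func(error))
)

// sliceWriter (hypothetical) appends every written value to a slice.
type sliceWriter struct{ items []interface{} }

func (w *sliceWriter) Write(v interface{}) { w.items = append(w.items, v) }

// runSequential runs generate, mapper, and reducer one after another.
func runSequential(generate GenerateFunc, mapper MapperFunc, reducer ReducerFunc) (interface{}, error) {
	var firstErr error
	cancel := func(err error) {
		if firstErr == nil {
			firstErr = err
		}
	}

	source := make(chan interface{})
	go func() {
		defer close(source)
		generate(source)
	}()

	mapped := &sliceWriter{}
	for item := range source { // keeps draining so generate never blocks
		mapper(item, mapped, cancel)
	}
	if firstErr != nil {
		return nil, firstErr
	}

	pipe := make(chan interface{}, len(mapped.items))
	for _, v := range mapped.items {
		pipe <- v
	}
	close(pipe)

	result := &sliceWriter{}
	reducer(pipe, result, cancel)
	if firstErr != nil || len(result.items) == 0 {
		return nil, firstErr
	}
	return result.items[0], nil
}

// totalLength sums the lengths of all words and cancels on an empty word.
func totalLength(words []string) (interface{}, error) {
	return runSequential(
		func(source chan<- interface{}) {
			for _, w := range words {
				source <- w
			}
		},
		func(item interface{}, writer Writer, cancel func(error)) {
			s := item.(string)
			if s == "" {
				cancel(errors.New("empty word"))
				return
			}
			writer.Write(len(s))
		},
		func(pipe <-chan interface{}, writer Writer, cancel func(error)) {
			total := 0
			for v := range pipe {
				total += v.(int)
			}
			writer.Write(total)
		},
	)
}

func main() {
	fmt.Println(totalLength([]string{"go", "zero"}))
}
```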

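source channel
An unbuffered channel used for communication between generate and mapper.
Data produced by generate is written to the source channel, and mapper reads from the source channel to process it.
collector channel
A buffered channel whose buffer length equals the number of workers (options.workers).
mapper writes its processed data to the collector channel, and reducer reads from collector to process it.
output channel
An unbuffered channel that carries the final result produced by reducer.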
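The three channels can be pictured with a small standard-library pipeline. This is a simplified sketch, not go-zero's code: it uses a fixed set of worker goroutines rather than one goroutine per item, and sumOfSquares is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
)

// sumOfSquares wires source, collector, and output the way the text describes.
func sumOfSquares(nums []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	source := make(chan int)             // unbuffered: generate -> mapper
	collector := make(chan int, workers) // buffered to the worker count: mapper -> reducer
	output := make(chan int)             // unbuffered: reducer -> caller

	// generate: write every item to source, then close it.
	go func() {
		defer close(source)
		for _, n := range nums {
			source <- n
		}
	}()

	// mappers: read from source, write results to collector.
	go func() {
		var wg sync.WaitGroup
		for i := 0; i < workers; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				for n := range source {
					collector <- n * n
				}
			}()
		}
		wg.Wait()
		close(collector) // no mapper will write any more
	}()

	// reducer: a single goroutine aggregates collector into one value.
	go func() {
		sum := 0
		for v := range collector {
			sum += v
		}
		output <- sum
	}()

	return <-output
}

func main() {
	fmt.Println(sumOfSquares([]int{1, 2, 3, 4}, 2))
}
```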

2.2 Code

2.2.1 buildSource

buildSource runs the user-defined generate function in a new goroutine to produce data and returns an unbuffered channel.
mapper reads its input from that channel.


func buildSource(generate GenerateFunc, panicChan *onceChan) chan interface{} {
	source := make(chan interface{})
	go func() {
		defer func() {
			if r := recover(); r != nil {
				panicChan.write(r)
			}
			close(source)
		}()
 
		generate(source)
	}()
 
	return source
}
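The recover-then-forward pattern in buildSource can be tried in isolation. The sketch below is a standard-library approximation: runGuarded is a hypothetical name, and a plain buffered channel stands in for go-zero's onceChan.

```go
package main

import "fmt"

// runGuarded runs fn in a new goroutine and reports a panic value on the
// returned channel instead of crashing the process. The channel is closed
// once fn has finished, whether or not it panicked.
func runGuarded(fn func()) <-chan interface{} {
	panics := make(chan interface{}, 1) // buffer of 1 so the send never blocks
	go func() {
		defer func() {
			if r := recover(); r != nil {
				panics <- r
			}
			close(panics)
		}()

		fn()
	}()

	return panics
}

func main() {
	if v, ok := <-runGuarded(func() { panic("generate failed") }); ok {
		fmt.Println("recovered:", v)
	}
}
```

The caller decides what to do with the forwarded value; mapReduceWithPanicChan in section 2.2.4 re-panics on the calling goroutine.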

2.2.2 executeMappers

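The main flow (mapReduceWithPanicChan, see 2.2.4) starts executeMappers in its own goroutine. executeMappers consumes the data produced by generate; each item is handled in a separate goroutine, and the result is written to collector.
The maximum mapper concurrency defaults to 16 and can be changed with WithWorkers.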


 
func executeMappers(mCtx mapperContext) {
	var wg sync.WaitGroup
	defer func() {
		wg.Wait()
		close(mCtx.collector)
		drain(mCtx.source)
	}()
 
	var failed int32
	pool := make(chan lang.PlaceholderType, mCtx.workers)
	writer := newGuardedWriter(mCtx.ctx, mCtx.collector, mCtx.doneChan)
	for atomic.LoadInt32(&failed) == 0 {
		select {
		case <-mCtx.ctx.Done():
			return
		case <-mCtx.doneChan:
			return
		case pool <- lang.Placeholder:
			item, ok := <-mCtx.source
			if !ok {
				<-pool
				return
			}
 
			wg.Add(1)
			go func() {
				defer func() {
					if r := recover(); r != nil {
						atomic.AddInt32(&failed, 1)
						mCtx.panicChan.write(r)
					}
					wg.Done()
					<-pool
				}()
 
				mCtx.mapper(item, writer)
			}()
		}
	}
}
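The pool channel above acts as a counting semaphore: a slot is taken before a goroutine starts and given back when it ends. A minimal standard-library sketch of that idea follows; squareAll is a hypothetical name, and the peak counter exists only to make the concurrency limit observable.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// squareAll starts one goroutine per item but lets at most `workers` of them
// run at the same time. It also returns the highest number of goroutines that
// were observed running simultaneously.
func squareAll(items []int, workers int) ([]int, int32) {
	if workers < 1 {
		workers = 1
	}
	var (
		wg      sync.WaitGroup
		running int32
		peak    int32
	)
	squares := make([]int, len(items))
	pool := make(chan struct{}, workers) // one slot per allowed goroutine

	for i, item := range items {
		pool <- struct{}{} // blocks while all slots are taken
		wg.Add(1)
		go func(i, item int) {
			defer func() {
				atomic.AddInt32(&running, -1)
				<-pool // give the slot back
				wg.Done()
			}()

			n := atomic.AddInt32(&running, 1)
			for { // record a new peak if n exceeds the current one
				old := atomic.LoadInt32(&peak)
				if n <= old || atomic.CompareAndSwapInt32(&peak, old, n) {
					break
				}
			}
			squares[i] = item * item // each goroutine writes its own index
		}(i, item)
	}

	wg.Wait()
	return squares, atomic.LoadInt32(&peak)
}

func main() {
	squares, peak := squareAll([]int{1, 2, 3, 4, 5}, 2)
	fmt.Println(squares, "peak concurrency:", peak)
}
```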

2.2.3 reducer

reducer consumes the data in the collector channel in a single goroutine and writes the final result to the output channel, from which it is returned.


	
 
// Excerpt from mapReduceWithPanicChan (see 2.2.4); closeOnce, output,
// collector, writer, cancel and panicChan are declared there.

// if done is closed, all mappers and reducer should stop processing
done := make(chan lang.PlaceholderType)
finish := func() {
	closeOnce.Do(func() {
		close(done)
		close(output)
	})
}

go func() {
	defer func() {
		drain(collector)
		if r := recover(); r != nil {
			panicChan.write(r)
		}
		finish()
	}()

	reducer(collector, writer, cancel)
}()
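finish can be reached from more than one place (the reducer's deferred function and cancel), and closing an already closed channel panics, which is why the close calls are wrapped in sync.Once. A small standard-library sketch of that guard, using the hypothetical type stopper:

```go
package main

import (
	"fmt"
	"sync"
)

// stopper closes its done channel exactly once, no matter how many
// goroutines call Stop.
type stopper struct {
	once sync.Once
	done chan struct{}
}

func newStopper() *stopper {
	return &stopper{done: make(chan struct{})}
}

// Stop is safe to call repeatedly and concurrently.
func (s *stopper) Stop() {
	s.once.Do(func() {
		close(s.done)
	})
}

// Stopped reports whether Stop has been called.
func (s *stopper) Stopped() bool {
	select {
	case <-s.done:
		return true
	default:
		return false
	}
}

func main() {
	s := newStopper()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Stop() // without sync.Once the second close would panic
		}()
	}
	wg.Wait()
	fmt.Println("stopped:", s.Stopped())
}
```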

2.2.4 Main goroutine


 
// mapReduceWithPanicChan maps all elements from source, and reduce the output elements with given reducer.
func mapReduceWithPanicChan(source <-chan interface{}, panicChan *onceChan, mapper MapperFunc,
	reducer ReducerFunc, opts ...Option) (interface{}, error) {
	options := buildOptions(opts...)
	// output is used to write the final result
	output := make(chan interface{})
	defer func() {
		// reducer can only write once, if more, panic
		for range output {
			panic("more than one element written in reducer")
		}
	}()
 
	// collector is used to collect data from mapper, and consume in reducer
	collector := make(chan interface{}, options.workers)
	// if done is closed, all mappers and reducer should stop processing
	done := make(chan lang.PlaceholderType)
	writer := newGuardedWriter(options.ctx, output, done)
	var closeOnce sync.Once
	// use atomic.Value to avoid data race
	var retErr errorx.AtomicError
	finish := func() {
		closeOnce.Do(func() {
			close(done)
			close(output)
		})
	}
	cancel := once(func(err error) {
		if err != nil {
			retErr.Set(err)
		} else {
			retErr.Set(ErrCancelWithNil)
		}
 
		drain(source)
		finish()
	})
 
	go func() {
		defer func() {
			drain(collector)
			if r := recover(); r != nil {
				panicChan.write(r)
			}
			finish()
		}()
 
		reducer(collector, writer, cancel)
	}()
 
	go executeMappers(mapperContext{
		ctx: options.ctx,
		mapper: func(item interface{}, w Writer) {
			mapper(item, w, cancel)
		},
		source:    source,
		panicChan: panicChan,
		collector: collector,
		doneChan:  done,
		workers:   options.workers,
	})
 
	select {
	case <-options.ctx.Done():
		cancel(context.DeadlineExceeded)
		return nil, context.DeadlineExceeded
	case v := <-panicChan.channel:
		panic(v)
	case v, ok := <-output:
		if err := retErr.Load(); err != nil {
			return nil, err
		} else if ok {
			return v, nil
		} else {
			return nil, ErrReduceNoOutput
		}
	}
}

3. Usage examples

Functions provided by the MapReduce tool:

func Finish(fns ...func() error) error
Handles a fixed number of dependencies and returns an error; if any of them returns an error, Finish returns immediately.
func FinishVoid(fns ...func())
Similar to Finish, but without an error return value.
func ForEach(generate GenerateFunc, mapper ForEachFunc, opts ...Option)
func MapReduce(generate GenerateFunc, mapper MapperFunc, reducer ReducerFunc, opts ...Option) (interface{}, error)
func MapReduceChan(source <-chan interface{}, mapper MapperFunc, reducer ReducerFunc, opts ...Option) (interface{}, error)
func MapReduceVoid(generate GenerateFunc, mapper MapperFunc, reducer VoidReducerFunc, opts ...Option) error
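To show the calling pattern of Finish without pulling in the framework, the sketch below uses a simplified standard-library stand-in with the same signature. The lowercase finish, errOrderDown, and loadPage are hypothetical names, and the stand-in is not go-zero's implementation; it only imitates the documented behaviour of returning as soon as one function reports an error.

```go
package main

import (
	"errors"
	"fmt"
)

var errOrderDown = errors.New("order service down")

// finish is a simplified stand-in for Finish: it runs every function
// concurrently and returns the first error it sees without waiting for
// the remaining functions.
func finish(fns ...func() error) error {
	errs := make(chan error, len(fns)) // buffered so late goroutines never block
	for _, fn := range fns {
		go func(fn func() error) {
			errs <- fn()
		}(fn)
	}
	for range fns {
		if err := <-errs; err != nil {
			return err // remaining calls finish in the background
		}
	}
	return nil
}

// loadPage fetches two independent pieces of data concurrently. Each function
// writes only its own variable, so no lock is needed.
func loadPage(failOrder bool) (user, order string, err error) {
	err = finish(
		func() error {
			user = "alice"
			return nil
		},
		func() error {
			if failOrder {
				return errOrderDown
			}
			order = "order-42"
			return nil
		},
	)
	if err != nil {
		return "", "", err
	}
	return user, order, nil
}

func main() {
	fmt.Println(loadPage(false))
	fmt.Println(loadPage(true))
}
```

With the real library the call site would look the same, with the framework's Finish in place of the local stand-in.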

Flowchart: [figure not included]

Original article: https://blog.csdn.net/qq_16399991/article/details/125730714