【Apache Kafka 3.2】Source Code Analysis: How KafkaProducer Sends Messages

KafkaProducer message-sending flow

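ProducerInterceptors intercept the record;
Serializer serializes the record's key and value;
Partitioner chooses a suitable partition for the record;
RecordAccumulator collects records so that they can be sent in batches;
Sender fetches records from the RecordAccumulator;
Sender builds a ClientRequest;
The ClientRequest is handed to NetworkClient, which prepares it for sending;
NetworkClient places the request in the KafkaChannel's send buffer;
Network I/O is performed and the request is sent;
When the response arrives, the ClientRequest's callback is invoked;
The callback of the ProducerBatch (called RecordBatch in older versions) is invoked, which finally invokes the callback registered on each record.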

Important fields in KafkaProducer

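// Identifier of this producer (client.id)
private final String clientId;
// Visible for testing
// Metrics registry of this producer
final Metrics metrics;
private final KafkaProducerMetrics producerMetrics;

// Partition selector: routes each record to a suitable partition according to some strategy
private final Partitioner partitioner;
// Maximum request size (max.request.size); a single record, including its headers and
// serialized key and value, must not exceed it either
private final int maxRequestSize;
// Total memory available for buffering records waiting to be sent (buffer.memory)
private final long totalMemorySize;
// Metadata of the whole Kafka cluster
private final ProducerMetadata metadata;
// Collects and buffers records until the Sender thread sends them
private final RecordAccumulator accumulator;
// The Sender task that sends records; it implements Runnable and runs in ioThread
private final Sender sender;
// The thread that runs the Sender task, called the "Sender thread"
private final Thread ioThread;
// Compression algorithm
private final CompressionType compressionType;
private final Sensor errors;
private final Time time;
// Serializers for keys and values
private final Serializer<K> keySerializer;
private final Serializer<V> valueSerializer;
// Producer configuration; the configured classes (serializers, partitioner, interceptors, ...)
// are instantiated from it via reflection
private final ProducerConfig producerConfig;
// Maximum time send() may block, e.g. while waiting for cluster metadata (max.block.ms)
private final long maxBlockTimeMs;
// ProducerInterceptors: intercept or modify records before they are sent, and pre-process
// acknowledgements before the user callback runs
private final ProducerInterceptors<K, V> interceptors;
private final ApiVersions apiVersions;
private final TransactionManager transactionManager;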

Key steps

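    public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
        // intercept the record, which can be potentially modified; this method does not throw exceptions
        // 1. Let the interceptors intercept or modify the record
        ProducerRecord<K, V> interceptedRecord = this.interceptors.onSend(record);
        return doSend(interceptedRecord, callback);
    }

    // Abridged: exception handling, transaction checks, the abortForNewBatch retry and some local
    // variables (nowMs, timestamp, headers, interceptCallback, remainingWaitMs) are omitted
    private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
        // 2. Get the Kafka cluster metadata; if it is missing, this wakes the Sender thread to refresh
        //    the cluster metadata held in Metadata and blocks until it arrives
        ClusterAndWaitTime clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), nowMs, maxBlockTimeMs);
        Cluster cluster = clusterAndWaitTime.cluster;

        // 3. Serialize the key and value with Serializer.serialize()
        byte[] serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
        byte[] serializedValue = valueSerializer.serialize(record.topic(), record.headers(), record.value());

        // 4. Choose a suitable partition for the record with partition()
        int partition = partition(record, serializedKey, serializedValue, cluster);
        TopicPartition tp = new TopicPartition(record.topic(), partition);

        // 5. Append the record to the RecordAccumulator with RecordAccumulator.append()
        RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey,
                serializedValue, headers, interceptCallback, remainingWaitMs, true, nowMs);

        // 6. Wake the Sender thread, which sends the records buffered in the RecordAccumulator
        if (result.batchIsFull || result.newBatchCreated)
            this.sender.wakeup();
        return result.future;
    }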


ProducerInterceptors

Kafka cluster metadata

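The metadata records which partitions each topic has, which node hosts the leader replica of each partition, which nodes host its follower replicas, which replicas are in the ISR, and the network address and port of each of these nodes.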

Kafka wraps this cluster data in three classes: Node, TopicPartition and PartitionInfo.

Node represents one node (broker) of the cluster and records its id, host, port and rack;
TopicPartition identifies one partition of a topic;
PartitionInfo describes one partition in detail.

public class PartitionInfo {
    // Topic name
    private final String topic;
    // Partition number
    private final int partition;
    // Node hosting the leader replica
    private final Node leader;
    // Nodes hosting all replicas
    private final Node[] replicas;
    // Nodes hosting the replicas in the ISR
    private final Node[] inSyncReplicas;
    // Nodes hosting offline replicas
    private final Node[] offlineReplicas;
}

public final class TopicPartition implements Serializable {
    private static final long serialVersionUID = -613627415771699627L;

    // Lazily cached hash code
    private int hash = 0;
    // Partition number
    private final int partition;
    // Topic name
    private final String topic;
}

This metadata is stored in the Cluster class:

public final class Cluster {

    private final boolean isBootstrapConfigured;
    // List of nodes in the Kafka cluster
    private final List<Node> nodes;
    private final Set<String> unauthorizedTopics;
    private final Set<String> invalidTopics;
    private final Set<String> internalTopics;
    private final Node controller;
    // Maps TopicPartition to PartitionInfo
    private final Map<TopicPartition, PartitionInfo> partitionsByTopicPartition;
    // Maps topic name to the PartitionInfo of all its partitions
    private final Map<String, List<PartitionInfo>> partitionsByTopic;
    // Maps topic name to PartitionInfo, but only for partitions that currently have a leader
    private final Map<String, List<PartitionInfo>> availablePartitionsByTopic;
    // Maps node ID to the PartitionInfo of the partitions whose leader is on that node
    private final Map<Integer, List<PartitionInfo>> partitionsByNode;
    // Maps broker ID to Node, for fast lookup by broker ID
    private final Map<Integer, Node> nodesById;
    private final ClusterResource clusterResource;
    private final Map<String, Uuid> topicIds;
    private final Map<Uuid, String> topicNames;
}

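The fields of these four classes (Node, TopicPartition, PartitionInfo and Cluster) are declared private final, apart from a lazily cached hash code in some of them (such as TopicPartition's hash field), and only query methods are exposed. Their objects are therefore immutable and, as a result, thread-safe.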
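To make the immutability argument concrete, here is a minimal Java sketch of the same pattern. TopicPartitionKey is a hypothetical class, not Kafka's TopicPartition: the fields that define equality are final, there are no setters, and the hash code is cached lazily. The unsynchronized cache is a benign race (as with String.hashCode()): two threads may both compute the hash, but they always compute the same value.

```java
import java.util.Objects;

/** Hypothetical immutable (topic, partition) key, loosely modeled on TopicPartition. */
final class TopicPartitionKey {
    private final String topic;
    private final int partition;
    // Cached hash; 0 means "not computed yet". Racy but benign: every thread computes the same value.
    private int hash = 0;

    TopicPartitionKey(String topic, int partition) {
        this.topic = Objects.requireNonNull(topic, "topic");
        this.partition = partition;
    }

    String topic() {
        return topic;
    }

    int partition() {
        return partition;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h != 0)
            return h;
        h = 31 * topic.hashCode() + partition;
        hash = h;
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof TopicPartitionKey))
            return false;
        TopicPartitionKey other = (TopicPartitionKey) o;
        return partition == other.partition && topic.equals(other.topic);
    }

    @Override
    public String toString() {
        return topic + "-" + partition;
    }
}
```

Because an instance can never change after construction, it can be shared between the producer's threads and used as a HashMap key without any locking.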

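Metadata wraps the Cluster object (in 3.2 it is held inside a MetadataCache) and also records when the cluster data was last updated, version numbers, whether an update is needed, and so on.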

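public class Metadata implements Closeable {
    private final Logger log;
    // Backoff between two refresh attempts, to avoid pointless rapid retries
    private final long refreshBackoffMs;
    // Metadata that has not been refreshed within this time is forcibly refreshed
    private final long metadataExpireMs;
    // Update version: incremented by 1 on every successful update; used to tell whether the metadata has changed
    private int updateVersion;  // bumped on every metadata response
    // Request version: incremented by 1 whenever a new topic is added
    private int requestVersion; // bumped on every new topic addition
    // Time of the last refresh attempt
    private long lastRefreshMs;
    // Time of the last successful refresh
    private long lastSuccessfulRefreshMs;
    private KafkaException fatalException;
    // Invalid topics
    private Set<String> invalidTopics;
    // Topics this client is not authorized to access
    private Set<String> unauthorizedTopics;
    // Cached metadata, including the Cluster object
    private MetadataCache cache = MetadataCache.empty();
    // Whether all topics need updating; for a producer, "all topics" means the topics it has recently sent to
    private boolean needFullUpdate;
    // Whether some topics need updating; for a producer, these are the topics it has just started sending to
    private boolean needPartialUpdate;
    // Listeners notified of metadata updates
    private final ClusterResourceListeners clusterResourceListeners;
    private boolean isClosed;
    // Last seen leader epoch of each partition
    private final Map<TopicPartition, Integer> lastSeenLeaderEpochs;
}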

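The cluster metadata held in Metadata is updated while the Sender thread runs. On the sending side, waitOnMetadata() requests that update and blocks until it arrives: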

    private ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long nowMs, long maxWaitMs) throws InterruptedException {
        // Get the cached cluster information
        Cluster cluster = metadata.fetch();
        // Reject topics the cluster has reported as invalid
        if (cluster.invalidTopics().contains(topic))
            throw new InvalidTopicException(topic);

        // Add the topic to the metadata's topic list with a fresh expiry time (now + metadata idle time).
        // If the topic was not in the list yet, it is recorded as a new topic and an update is requested:
        // requestVersion is incremented, lastRefreshMs is set to 0 and needPartialUpdate is set to true
        metadata.add(topic, nowMs);

        // Number of partitions of the topic
        Integer partitionsCount = cluster.partitionCountForTopic(topic);
        // The topic is known and the requested partition (if any) is in range: return the cached cluster
        if (partitionsCount != null && (partition == null || partition < partitionsCount))
            return new ClusterAndWaitTime(cluster, 0);

        // Getting here means the cached metadata does not know the topic or the requested partition is
        // out of range, so the metadata must be refreshed. Keep waiting for metadata updates until they
        // contain the topic and partition we need, or the maximum wait time is exceeded
        long remainingWaitMs = maxWaitMs; // remaining wait time
        long elapsed = 0; // time spent waiting so far
        do {
            if (partition != null) {
                log.trace("Requesting metadata update for partition {} of topic {}.", partition, topic);
            } else {
                log.trace("Requesting metadata update for topic {}.", topic);
            }
            metadata.add(topic, nowMs + elapsed);
            // If the topic is still a new topic, request a partial update for new topics, otherwise
            // a full update; either way the current updateVersion is returned
            int version = metadata.requestUpdateForTopic(topic);
            // Wake the Sender thread; its NetworkClient will then send a MetadataRequest
            sender.wakeup();
            try {
                // Block until updateVersion is greater than the version above, or the remaining time runs out
                metadata.awaitUpdate(version, remainingWaitMs);
            } catch (TimeoutException ex) {
                // Rethrow with original maxWaitMs to prevent logging exception with remainingWaitMs
                throw new TimeoutException(
                        String.format("Topic %s not present in metadata after %d ms.",
                                topic, maxWaitMs));
            }
            // Get the latest cluster information from the cache
            cluster = metadata.fetch();
            elapsed = time.milliseconds() - nowMs;
            // Give up once the total wait exceeds maxWaitMs
            if (elapsed >= maxWaitMs) {
                throw new TimeoutException(partitionsCount == null ?
                        String.format("Topic %s not present in metadata after %d ms.",
                                topic, maxWaitMs) :
                        String.format("Partition %d of topic %s with partition count %d is not present in metadata after %d ms.",
                                partition, topic, partitionsCount, maxWaitMs));
            }
            metadata.maybeThrowExceptionForTopic(topic);
            remainingWaitMs = maxWaitMs - elapsed;
            partitionsCount = cluster.partitionCountForTopic(topic); // re-read the partition count
        } while (partitionsCount == null || (partition != null && partition >= partitionsCount));

        return new ClusterAndWaitTime(cluster, elapsed);
    }
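The loop above relies on a version handshake: the caller captures the current updateVersion through requestUpdateForTopic(), wakes the Sender, and waits in awaitUpdate() until the version moves past the captured value. The sketch below shows that pattern with plain wait()/notifyAll(). VersionedMetadata is hypothetical; Kafka's real Metadata/ProducerMetadata classes track much more state and are structured differently.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the "request update, then wait for a newer version" handshake. */
final class VersionedMetadata {
    private int updateVersion = 0;
    private boolean needUpdate = false;

    /** Producer thread: flag that an update is needed and return the version to wait past. */
    synchronized int requestUpdate() {
        needUpdate = true;
        return updateVersion;
    }

    /** Sender thread: called after a metadata response has been applied. */
    synchronized void update() {
        needUpdate = false;
        updateVersion++;
        notifyAll(); // wake every thread blocked in awaitUpdate()
    }

    synchronized boolean needsUpdate() {
        return needUpdate;
    }

    synchronized int updateVersion() {
        return updateVersion;
    }

    /** Blocks until updateVersion > lastVersion; returns false if maxWaitMs elapses first. */
    synchronized boolean awaitUpdate(int lastVersion, long maxWaitMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
        while (updateVersion <= lastVersion) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0)
                return false;
            wait(remainingMs); // releases the lock; the loop re-checks after spurious wake-ups
        }
        return true;
    }
}
```

Capturing the version before waking the updater matters: if the update completes before the caller starts waiting, the version has already moved past the captured value and awaitUpdate() returns immediately instead of missing the notification.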

Serializer & Deserializer

Partitioner

In KafkaProducer.partition(), the partition number set in the ProducerRecord's partition field takes priority. If ProducerRecord.partition is not set, the partition is chosen by Partitioner.partition().

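DefaultPartitioner.partition() chooses a partition when the ProducerRecord does not specify one. If the record has no key, it uses the sticky partitioning strategy of StickyPartitionCache.partition(): a partition is picked at random and reused for as long as possible, and only when a new batch has to be created does the cache switch to another partition. If the record has a key, the serialized key is hashed (murmur2) and the hash is taken modulo the number of partitions, which spreads different keys across partitions and sends records with the same key to the same partition (as long as the partition count does not change).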
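The two strategies can be sketched as follows. This is a standalone sketch, not Kafka's DefaultPartitioner: murmur2 is intended to follow the algorithm in Kafka's Utils.murmur2, but check it against the real class before relying on it for compatibility, and the sticky part (stickyPartition/nextStickyPartition) is a simplified, hypothetical stand-in for StickyPartitionCache that ignores partition availability.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch of key-hash and sticky partition selection. */
final class PartitionerSketch {
    private PartitionerSketch() {}

    /** 32-bit murmur2 hash of the serialized key. */
    static int murmur2(byte[] data) {
        int length = data.length;
        final int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int length4 = length / 4;
        for (int i = 0; i < length4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Mix in the remaining 1 to 3 bytes (intentional fall-through)
        switch (length % 4) {
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    /** Clear the sign bit so that the modulo below is never negative. */
    static int toPositive(int n) {
        return n & 0x7fffffff;
    }

    /** Keyed records: the same key bytes always map to the same partition. */
    static int partitionForKey(byte[] keyBytes, int numPartitions) {
        return toPositive(murmur2(keyBytes)) % numPartitions;
    }

    // Keyless records: the current sticky partition of each topic
    private static final ConcurrentHashMap<String, Integer> STICKY = new ConcurrentHashMap<>();

    static int stickyPartition(String topic, int numPartitions) {
        return STICKY.computeIfAbsent(topic, t -> ThreadLocalRandom.current().nextInt(numPartitions));
    }

    /** Called when a new batch is needed: switch to a different random partition. */
    static int nextStickyPartition(String topic, int numPartitions) {
        return STICKY.compute(topic, (t, prev) -> {
            if (prev == null || numPartitions < 2)
                return ThreadLocalRandom.current().nextInt(numPartitions);
            int next = ThreadLocalRandom.current().nextInt(numPartitions - 1);
            return next >= prev ? next + 1 : next; // uniform over all partitions except prev
        });
    }
}
```

The sticky approach trades per-record spreading for fuller batches: keyless records pile into one partition's batch until a new batch is needed, which is what produces larger, more efficient requests.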

RecordAccumulator analysis

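KafkaProducer can send messages synchronously or asynchronously, but both modes share the same asynchronous implementation underneath; a synchronous send is simply send() followed by a blocking Future.get(). When the main thread calls KafkaProducer.send(), the record is first placed in the RecordAccumulator and send() can return right away. At that point the record has not actually been sent to Kafka; it is only buffered in the RecordAccumulator. Business threads keep appending records to the RecordAccumulator through KafkaProducer.send(), and once certain conditions are met, the Sender thread is woken up to send the records held in the RecordAccumulator.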
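A toy model of "asynchronous underneath, synchronous on top". ToyAsyncProducer is hypothetical and far simpler than KafkaProducer (no batching, no network, offsets are just a counter), but it shows why a synchronous send is nothing more than an asynchronous send whose Future is waited on.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical toy: send() only buffers; a background "sender" thread completes the futures. */
final class ToyAsyncProducer implements AutoCloseable {
    private static final class Pending {
        final byte[] payload; // what a real sender would write to the network
        final CompletableFuture<Long> result = new CompletableFuture<>();

        Pending(byte[] payload) {
            this.payload = payload;
        }
    }

    private final BlockingQueue<Pending> buffer = new LinkedBlockingQueue<>();
    private final Thread sender;

    ToyAsyncProducer() {
        sender = new Thread(() -> {
            long nextOffset = 0;
            try {
                while (true) {
                    Pending p = buffer.take();       // block until something is buffered
                    p.result.complete(nextOffset++); // pretend the broker acked it at this offset
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // close() stops the loop
            }
        }, "toy-sender");
        sender.setDaemon(true);
        sender.start();
    }

    /** Asynchronous: returns as soon as the record is buffered. */
    CompletableFuture<Long> send(byte[] payload) {
        Pending p = new Pending(payload);
        buffer.add(p);
        return p.result;
    }

    /** Synchronous: the same asynchronous path plus a blocking wait on the future. */
    long sendSync(byte[] payload) throws InterruptedException, ExecutionException {
        return send(payload).get();
    }

    @Override
    public void close() {
        sender.interrupt();
    }
}
```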

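The RecordAccumulator class has many fields; the one that deserves attention is batches, a ConcurrentMap keyed by TopicPartition whose values are Deques. TopicPartition holds the topic and partition, which identify the topic a record belongs to and the partition it should be sent to. The Deque is a double-ended queue of ProducerBatch objects, each of which stores a batch of records waiting to be sent. Through its MemoryRecordsBuilder, a ProducerBatch holds a reference to a DataOutputStream, which is where records ultimately end up. As the MemoryRecordsBuilder constructor shows, this DataOutputStream wraps (possibly through a compression stream) a ByteBufferOutputStream, which is an in-memory buffer, so writing record data to the DataOutputStream means writing it into that buffer.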
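getOrCreateDeque() must be safe when several threads append to a brand-new partition at the same moment. A minimal sketch of the idea, assuming a hypothetical BatchesByPartition class and using ConcurrentHashMap.putIfAbsent (Kafka itself uses its own copy-on-write map for batches): only one deque wins, and every caller receives that same deque. The deque itself is not thread-safe, which is why append() synchronizes on it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of the batches map: one deque of batches per partition. */
final class BatchesByPartition<K, B> {
    private final ConcurrentMap<K, Deque<B>> batches = new ConcurrentHashMap<>();

    /** Lock-free lookup; only the first caller for a partition installs the deque. */
    Deque<B> getOrCreateDeque(K tp) {
        Deque<B> d = batches.get(tp);
        if (d != null)
            return d;
        d = new ArrayDeque<>();
        Deque<B> previous = batches.putIfAbsent(tp, d);
        return previous == null ? d : previous; // lost the race: use the winner's deque
    }

    int partitionCount() {
        return batches.size();
    }
}
```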

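    public RecordAppendResult append(TopicPartition tp,
                                     long timestamp,
                                     byte[] key,
                                     byte[] value,
                                     Header[] headers,
                                     Callback callback,
                                     long maxTimeToBlock,
                                     boolean abortOnNewBatch,
                                     long nowMs) throws InterruptedException {
        // Abridged: the appendsInProgress counter and a few checks are omitted
        ByteBuffer buffer = null;
        try {
            // check if we have an in-progress batch
            // 1. Find the deque for this partition, creating it if it does not exist
            Deque<ProducerBatch> dq = getOrCreateDeque(tp);
            synchronized (dq) {  // lock the deque
                if (closed)
                    throw new KafkaException("Producer closed while send in progress");
                // 2. Try to append the record to the last ProducerBatch in the deque; return if that works
                RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq, nowMs);
                if (appendResult != null)
                    return appendResult;
            }      // release the lock

            // A new batch is needed; with abortOnNewBatch the caller re-runs partitioning first
            if (abortOnNewBatch)
                return new RecordAppendResult(null, false, false, true);

            // 3. Size of the new batch: the larger of batch.size and the record's estimated upper-bound size
            byte maxUsableMagic = apiVersions.maxUsableProduceMagic();
            int size = Math.max(this.batchSize, AbstractRecords.estimateSizeInBytesUpperBound(maxUsableMagic, compression, key, value, headers));

            // 4. Allocate that much memory from the BufferPool (may block for up to maxTimeToBlock)
            buffer = free.allocate(size, maxTimeToBlock);

            synchronized (dq) {
                // 5. With the deque locked again, retry appending to the last batch
                RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq, nowMs);
                if (appendResult != null) {
                    return appendResult;
                }
                // 6. Still no room: create a new batch backed by the allocated buffer
                MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
                ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, nowMs);
                // 7. Append the record to the new batch
                FutureRecordMetadata future = Objects.requireNonNull(batch.tryAppend(timestamp, key, value, headers,
                        callback, nowMs));
                // 8. Add the new batch to the tail of the deque
                dq.addLast(batch);
                incomplete.add(batch);

                // The buffer now belongs to the batch, so it must not be released below
                buffer = null;
                return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true, false);
            }
        } finally {
            // Return the buffer to the BufferPool if it ended up unused
            if (buffer != null)
                free.deallocate(buffer);
        }
    }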

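Why lock twice and retry
The main reason is that requesting a new ByteBuffer from the BufferPool may block. Suppose all of the appending above were done inside a single synchronized block, and consider this scenario: thread 1 sends a large record and needs new space from the BufferPool, but the pool is short on memory, so thread 1 waits on the BufferPool while still holding the Deque's lock. Thread 2 sends a small record that would fit into the remaining space of the last ProducerBatch in the Deque, but because thread 1 has not released the Deque's lock, thread 2 has to wait as well. If there are many threads like thread 2, this causes a lot of unnecessary blocking and lowers throughput. The second tryAppend() under the lock is needed because another thread may have added a batch with free space while this thread was allocating; in that case the freshly allocated buffer is returned to the pool.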
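A compact sketch of the two-phase append described above. TwoPhaseAppender is hypothetical: a Semaphore with one permit per byte stands in for BufferPool, and batches are plain ByteBuffers. The point is that the potentially blocking allocation happens with no deque lock held, and that the second attempt under the lock gives unused memory back, as RecordAccumulator.append() does in its finally block. The return value reports whether a new batch was created, which is one of the conditions for waking the Sender.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of RecordAccumulator-style two-phase appends for a single partition. */
final class TwoPhaseAppender {
    /** A batch is just a fixed-size buffer here. */
    static final class Batch {
        final ByteBuffer buffer;

        Batch(ByteBuffer buffer) {
            this.buffer = buffer;
        }

        boolean tryAppend(byte[] record) {
            if (buffer.remaining() < record.length)
                return false; // no room: caller must create a new batch
            buffer.put(record);
            return true;
        }
    }

    private final Deque<Batch> dq = new ArrayDeque<>();
    private final int batchSize;
    private final Semaphore memory; // stands in for BufferPool: one permit per byte

    TwoPhaseAppender(int batchSize, int totalMemory) {
        this.batchSize = batchSize;
        this.memory = new Semaphore(totalMemory);
    }

    /** Returns true if a new batch was created. */
    boolean append(byte[] record) throws InterruptedException {
        synchronized (dq) {                  // phase 1: try the last batch under the lock
            Batch last = dq.peekLast();
            if (last != null && last.tryAppend(record))
                return false;
        }
        int size = Math.max(batchSize, record.length);
        memory.acquire(size);                // may block, but no deque lock is held
        synchronized (dq) {                  // phase 2: another thread may have added a batch meanwhile
            Batch last = dq.peekLast();
            if (last != null && last.tryAppend(record)) {
                memory.release(size);        // the reserved memory is not needed after all
                return false;
            }
            Batch batch = new Batch(ByteBuffer.allocate(size));
            batch.tryAppend(record);         // always fits: size >= record.length
            dq.addLast(batch);
            return true;
        }
    }

    int batchCount() {
        synchronized (dq) {
            return dq.size();
        }
    }
}
```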

    public FutureRecordMetadata tryAppend(long timestamp, byte[] key, byte[] value, Header[] headers, Callback callback, long now) {
        // 1. Check whether there is room for the new record
        if (!recordsBuilder.hasRoomFor(timestamp, key, value, headers)) {
            return null;
        } else {
            // 2. Write the record into the ByteBuffer through MemoryRecordsBuilder (compressed if configured)
            this.recordsBuilder.append(timestamp, key, value, headers);
            // 3. Track the largest record size seen in this batch
            this.maxRecordSize = Math.max(this.maxRecordSize, AbstractRecords.estimateSizeInBytesUpperBound(magic(),
                    recordsBuilder.compressionType(), key, value, headers));
            // 4. Update lastAppendTime
            this.lastAppendTime = now;
            // 5. All records in a batch share one ProduceRequestResult (produceFuture); the Sender thread
            //    completes it when the batch's response arrives, which completes each record's future
            FutureRecordMetadata future = new FutureRecordMetadata(this.produceFuture, this.recordCount,
                                                                   timestamp,
                                                                   key == null ? -1 : key.length,
                                                                   value == null ? -1 : value.length,
                                                                   Time.SYSTEM);
            // we have to keep every future returned to the users in case the batch needs to be
            // split to several new batches and resent.
            thunks.add(new Thunk(callback, future));
            this.recordCount++;
            return future;
        }
    }

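The Sender thread is woken up when a batch is full or a new batch has been created (result.batchIsFull || result.newBatchCreated in doSend()). Note that batchIsFull is computed as dq.size() > 1 || batch.isFull(), so it is also true whenever the deque already holds more than one batch.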

MemoryRecordsBuilder

Each MemoryRecordsBuilder stores its messages in an underlying ByteBuffer; we will look at how KafkaProducer manages these ByteBuffers in more detail later. MemoryRecordsBuilder wraps the ByteBuffer in a ByteBufferOutputStream, which extends OutputStream so that data can be written as a stream. ByteBufferOutputStream can also grow the underlying ByteBuffer automatically.
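The auto-expanding stream can be sketched like this. GrowableByteBufferOutputStream is a hypothetical stand-in for Kafka's ByteBufferOutputStream, and the 1.1 growth factor is an assumption: when a write does not fit, a larger buffer is allocated and the bytes written so far are copied over, so a wrapping DataOutputStream never sees a BufferOverflowException.

```java
import java.io.OutputStream;
import java.nio.ByteBuffer;

/** Hypothetical OutputStream over a ByteBuffer that reallocates when it runs out of space. */
final class GrowableByteBufferOutputStream extends OutputStream {
    private static final float REALLOCATION_FACTOR = 1.1f; // assumed growth factor
    private ByteBuffer buffer;

    GrowableByteBufferOutputStream(int initialCapacity) {
        this.buffer = ByteBuffer.allocate(initialCapacity);
    }

    @Override
    public void write(int b) {
        ensureRemaining(1);
        buffer.put((byte) b);
    }

    @Override
    public void write(byte[] bytes, int off, int len) {
        ensureRemaining(len);
        buffer.put(bytes, off, len);
    }

    /** Grow to max(capacity * factor, position + needed) and copy what was written so far. */
    private void ensureRemaining(int needed) {
        if (needed <= buffer.remaining())
            return;
        int newCapacity = Math.max((int) (buffer.capacity() * REALLOCATION_FACTOR), buffer.position() + needed);
        ByteBuffer bigger = ByteBuffer.allocate(newCapacity);
        buffer.flip(); // limit = number of bytes written so far
        bigger.put(buffer);
        buffer = bigger;
    }

    int position() {
        return buffer.position();
    }

    int capacity() {
        return buffer.capacity();
    }
}
```

Wrapped in a java.io.DataOutputStream, this gives the "write to a stream, land in a ByteBuffer" behaviour described above.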

Sender analysis

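The Sender runs in a single thread (ioThread). Its main job is to take the batches collected in the RecordAccumulator, group them by destination node, and hand them to the network client (NetworkClient) to be sent.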

    private long sendProducerData(long now) {
        // 1. Get the Kafka cluster metadata from Metadata
        Cluster cluster = metadata.fetch();
        // 2. From what is buffered in the RecordAccumulator, find the nodes we can send to (a ReadyCheckResult)
        RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);

        // 3. Some partitions have no known leader: add their topics and force a metadata update
        if (!result.unknownLeaderTopics.isEmpty()) {
            for (String topic : result.unknownLeaderTopics)
                this.metadata.add(topic, now);
            this.metadata.requestUpdate();
        }
        Iterator<Node> iter = result.readyNodes.iterator();
        long notReadyTimeout = Long.MAX_VALUE;
        // 4. Check whether each node is ready from the network I/O point of view;
        //    nodes that are not ready are removed from readyNodes
        while (iter.hasNext()) {
            Node node = iter.next();
            if (!this.client.ready(node, now)) {
                iter.remove();
                notReadyTimeout = Math.min(notReadyTimeout, this.client.pollDelayMs(node, now));
            }
        }

        // 5. Drain the batches and group them by broker node
        Map<Integer, List<ProducerBatch>> batches = this.accumulator.drain(cluster, result.readyNodes, this.maxRequestSize, now);
        // Track the drained batches as in flight (sent, awaiting a response) so that they can be expired later
        addToInflightBatches(batches);

        // Ordering guarantee (enabled when max.in.flight.requests.per.connection = 1): mute every drained
        // partition, i.e. add it to the muted set, so that nothing more is sent to it while a batch is in flight.
        // ready() skips muted partitions, and a partition is unmuted when its response arrives.
        if (guaranteeMessageOrder) {
            // Mute all the partitions drained
            for (List<ProducerBatch> batchList : batches.values()) {
                for (ProducerBatch batch : batchList)
                    this.accumulator.mutePartition(batch.topicPartition);
            }
        }

        // 6. Collect batches (buffered or in flight) that exceeded the delivery timeout; the full code
        //    then fails them, which releases their ByteBuffer space
        List<ProducerBatch> expiredInflightBatches = getExpiredInflightBatches(now);
        List<ProducerBatch> expiredBatches = this.accumulator.expiredBatches(now);
        expiredBatches.addAll(expiredInflightBatches);
        // ... (abridged: failing the expired batches and computing pollTimeout are omitted)

        // 7. Create and send one produce request per node
        sendProduceRequests(batches, now);
        return pollTimeout;
    }

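In short, sending involves the following steps:

Drop the broker nodes that are not ready to receive data;
Group the records by broker node;
Create one produce request per node from the grouped batches and hand it to the network client for sending.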
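Grouping by broker is what RecordAccumulator.drain() does. Below is a simplified, hypothetical sketch: partitions are plain strings, a leader map replaces Cluster, and only two of drain()'s rules are kept. At most one batch is taken per partition per drain, and batches stop being added for a node once the request would exceed maxRequestSize (at least one batch is always sent). The real method also skips muted partitions and rotates its starting partition between calls.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of drain(): turn per-partition deques into per-node batch lists. */
final class DrainSketch {
    static final class Batch {
        final String partition;
        final int sizeInBytes;

        Batch(String partition, int sizeInBytes) {
            this.partition = partition;
            this.sizeInBytes = sizeInBytes;
        }
    }

    private DrainSketch() {}

    static Map<Integer, List<Batch>> drain(Map<String, Integer> leaderOf,
                                           Map<String, Deque<Batch>> batches,
                                           Set<Integer> readyNodes,
                                           int maxRequestSize) {
        Map<Integer, List<Batch>> result = new HashMap<>();
        for (int node : readyNodes) {
            List<Batch> ready = new ArrayList<>();
            int size = 0;
            for (Map.Entry<String, Deque<Batch>> e : batches.entrySet()) {
                Integer leader = leaderOf.get(e.getKey());
                Batch first = e.getValue().peekFirst();
                // only partitions led by this node that have something to send
                if (leader == null || leader != node || first == null)
                    continue;
                // keep one request under maxRequestSize, but always send at least one batch
                if (size + first.sizeInBytes > maxRequestSize && !ready.isEmpty())
                    break;
                ready.add(e.getValue().pollFirst()); // at most one batch per partition per drain
                size += first.sizeInBytes;
            }
            result.put(node, ready);
        }
        return result;
    }
}
```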

NetworkClient

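Sender.sendProduceRequests() wraps the batches destined for each node into a ClientRequest;
NetworkClient.send() stores the serialized request (a Send) in the send field of the node's KafkaChannel;
NetworkClient.poll() sends out what is held in KafkaChannel.send, and also handles responses from the server, timed-out requests, user-defined Callbacks, and so on.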

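The Sender calls three NetworkClient methods: ready() checks whether the connection to a node is ready so that records can be sent to it, send() adds data to a channel, and poll() actually sends the data and collects the responses.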

ready

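Connections are created by the Selector, and ready() decides whether a connection needs to be initiated. This means the Kafka producer client does not connect to the brokers when it starts; it connects lazily, when a record actually needs to be sent. If we want to make sure that Kafka is up and running when our service starts, one effective approach is to create a test topic and send a record to it at startup: if the send succeeds, the service starts normally; otherwise startup fails, which forces us to check the Kafka cluster.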

    @Override
    public boolean ready(Node node, long now) {
        if (node.isEmpty())
            throw new IllegalArgumentException("Cannot connect to empty node " + node);

        if (isReady(node, now))
            return true;

        // No usable connection to this node (never connected, or disconnected and the
        // reconnect backoff has elapsed): initiate a new connection
        if (connectionStates.canConnect(node.idString(), now))
            // if we are interested in sending to a node and we don't have a connection to it, initiate one
            initiateConnect(node, now);

        return false;
    }

    @Override
    public boolean isReady(Node node, long now) {
        // Ready only if no metadata update is due right now and a request can be sent to the node
        return !metadataUpdater.isUpdateDue(now) && canSendRequest(node.idString(), now);
    }

    // The connection state and the channel are both ready, and the in-flight limit allows another request
    private boolean canSendRequest(String node, long now) {
        return connectionStates.isReady(node, now) && selector.isChannelReady(node) &&
                inFlightRequests.canSendMore(node);
    }

    // In InFlightRequests
    public boolean canSendMore(String node) {
        Deque<NetworkClient.InFlightRequest> queue = requests.get(node);
        // No in-flight requests, or the newest one has been completely written to the socket
        // and the number of in-flight requests is below the maximum
        return queue == null || queue.isEmpty() ||
                (queue.peekFirst().send.completed() && queue.size() < this.maxInFlightRequestsPerConnection);
    }

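isReady() checks the preconditions for sending:

No metadata update is due right now;
The connection to the node is in the READY state and not throttled, so data can be sent;
The node's KafkaChannel is ready (for example, any TLS/SASL handshake has finished), so buffers can be written to it;
The node can take another in-flight request: none are pending, or the newest one has been completely written and fewer than max.in.flight.requests.per.connection requests are awaiting a response.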

connect

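When ready() finds that a connection needs to be initiated, it starts connecting right away. Before connecting, the socket is configured as non-blocking, with TCP_NODELAY (no Nagle delay) and other settings.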

    @Override
    public void connect(String id, InetSocketAddress address, int sendBufferSize, int receiveBufferSize) throws IOException {
        // Make sure no channel is registered under this id yet
        ensureNotRegistered(id);
        SocketChannel socketChannel = SocketChannel.open();
        SelectionKey key = null;
        try {
            // Configure the channel: non-blocking mode, keep-alive, TCP_NODELAY, send/receive buffer sizes
            configureSocketChannel(socketChannel, sendBufferSize, receiveBufferSize);
            // Start connecting; in non-blocking mode this usually returns false
            boolean connected = doConnect(socketChannel, address);
            // Register the channel with the NIO selector, interested in OP_CONNECT
            key = registerChannel(id, socketChannel, SelectionKey.OP_CONNECT);

            // Connected immediately: OP_CONNECT will not fire, so remember the key and clear its interest set
            if (connected) {
                // OP_CONNECT won't trigger for immediately connected channels
                log.debug("Immediately connected to node {}", id);
                immediatelyConnectedKeys.add(key);
                key.interestOps(0);
            }
        } catch (IOException | RuntimeException e) {
            // On failure: forget the key, remove the channel and close the socket
            if (key != null)
                immediatelyConnectedKeys.remove(key);
            channels.remove(id);
            socketChannel.close();
            throw e;
        }
    }
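The same non-blocking connect sequence can be reproduced with plain JDK NIO. NonBlockingConnect is a hypothetical helper, not part of Kafka: it opens a non-blocking SocketChannel, sets keep-alive and TCP_NODELAY, starts connect(), and, unless the connection completes immediately, waits for OP_CONNECT and calls finishConnect(). Kafka instead returns right away and finishes the connection later inside Selector.poll(); the blocking wait here only keeps the example self-contained.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

/** Hypothetical sketch of a non-blocking connect with plain JDK NIO (not Kafka's Selector). */
final class NonBlockingConnect {
    private NonBlockingConnect() {}

    static SocketChannel connect(Selector selector, InetSocketAddress address, long timeoutMs) throws IOException {
        SocketChannel channel = SocketChannel.open();
        try {
            channel.configureBlocking(false);                           // never block the I/O thread
            channel.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
            channel.setOption(StandardSocketOptions.TCP_NODELAY, true); // disable Nagle's algorithm
            boolean connected = channel.connect(address);               // usually false: handshake in progress
            SelectionKey key = channel.register(selector, connected ? 0 : SelectionKey.OP_CONNECT);
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (!connected) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0)
                    throw new IOException("connect to " + address + " timed out");
                selector.select(remaining);
                // OP_CONNECT fired: finish the connection (throws if it was refused)
                if (selector.selectedKeys().remove(key) && key.isConnectable())
                    connected = channel.finishConnect();
            }
            key.interestOps(0); // connected; a real client would now watch OP_READ / OP_WRITE
            return channel;
        } catch (IOException | RuntimeException e) {
            channel.close(); // clean up on failure, as Kafka's connect() does
            throw e;
        }
    }
}
```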

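InFlightRequest is a static nested class of NetworkClient that stores requests which have been sent, or are being sent, but have not yet received a response.
InFlightRequest objects (kept per node in InFlightRequests) serve two purposes in NetworkClient:
1. Keeping the requests that were sent, so that when a response comes back the corresponding callback can be run;
2. Limiting what is outstanding per node: a new request is only allowed once the previous one has been completely written, so at most one request per node is being written at any time, and at most max.in.flight.requests.per.connection requests can be awaiting a response.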
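The bookkeeping behind both roles can be sketched as a per-node deque. InFlightSketch is hypothetical, but it follows the conventions visible in canSendMore(): new requests are added at the head, the newest (head) request must be completely written before another is allowed, and responses, which come back in request order on a connection, complete the oldest request at the tail.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-node in-flight bookkeeping (not Kafka's InFlightRequests class). */
final class InFlightSketch {
    static final class Request {
        final int correlationId;
        final Runnable callback; // run when the response for this request is handled
        boolean fullySent;       // set once every byte has been written to the socket

        Request(int correlationId, Runnable callback) {
            this.correlationId = correlationId;
            this.callback = callback;
        }
    }

    private final Map<String, Deque<Request>> requests = new HashMap<>();
    private final int maxInFlightPerConnection;

    InFlightSketch(int maxInFlightPerConnection) {
        this.maxInFlightPerConnection = maxInFlightPerConnection;
    }

    /** The newest request goes to the head of the node's deque. */
    void add(String node, Request request) {
        requests.computeIfAbsent(node, n -> new ArrayDeque<>()).addFirst(request);
    }

    /** Same rule as canSendMore(): newest request fully written and the limit not reached. */
    boolean canSendMore(String node) {
        Deque<Request> queue = requests.get(node);
        return queue == null || queue.isEmpty()
                || (queue.peekFirst().fullySent && queue.size() < maxInFlightPerConnection);
    }

    /** Responses arrive in request order, so the oldest request (tail) completes first. */
    Request completeNext(String node) {
        return requests.get(node).pollLast();
    }
}
```

The caller that handles a response would take the request returned by completeNext() and run its callback.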

send

    private void doSend(ClientRequest clientRequest, boolean isInternalRequest, long now, AbstractRequest request) {
        String destination = clientRequest.destination();
        RequestHeader header = clientRequest.makeHeader(request.version());
        if (log.isDebugEnabled()) {
            log.debug("Sending {} request with header {} and timeout {} to node {}: {}",
                clientRequest.apiKey(), header, clientRequest.requestTimeoutMs(), destination, request);
        }
        // Build the Send object holding the serialized request
        Send send = request.toSend(header);
        // Track the request in inFlightRequests until its response arrives
        // (canSendMore() also consults this queue)
        InFlightRequest inFlightRequest = new InFlightRequest(
                clientRequest,
                header,
                isInternalRequest,
                request,
                send,
                now);
        this.inFlightRequests.add(inFlightRequest);
        // Hand the Send to the Selector, which attaches it to the node's KafkaChannel
        selector.send(new NetworkSend(clientRequest.destination(), send));
    }

So send() does two things: it creates the Send object and, through the Selector, attaches it to the channel; and it creates an InFlightRequest that waits for the response.

poll

poll() is the NetworkClient method that the Sender thread calls after send(). Its main job is to actually send the data over the network and to process whatever comes back from it.

    @Override
    public List<ClientResponse> poll(long timeout, long now) {
        ensureActive();
        // Sends aborted because of an unsupported version or a disconnection are handled immediately,
        // without going through the Selector poll below
        if (!abortedSends.isEmpty()) {
            // If there are aborted sends because of unsupported version exceptions or disconnects,
            // handle them immediately without waiting for Selector#poll.
            List<ClientResponse> responses = new ArrayList<>();
            handleAbortedSends(responses);
            completeResponses(responses);
            return responses;
        }
        // Start a metadata update if one is due; returns the time until the next update
        long metadataTimeout = metadataUpdater.maybeUpdate(now);
        try {
            // Perform the network I/O through the Selector
            this.selector.poll(Utils.min(timeout, metadataTimeout, defaultRequestTimeoutMs));
        } catch (IOException e) {
            log.error("Unexpected error during I/O", e);
        }

        // process completed actions
        long updatedNow = this.time.milliseconds();
        List<ClientResponse> responses = new ArrayList<>();
        // Completed sends
        handleCompletedSends(responses, updatedNow);
        // Completed receives (responses)
        handleCompletedReceives(responses, updatedNow);
        // Disconnections
        handleDisconnections(responses, updatedNow);
        // Newly established connections
        handleConnections();
        // Send ApiVersionsRequests on new connections
        handleInitiateApiVersionRequests(updatedNow);
        // Connection timeouts
        handleTimedOutConnections(responses, updatedNow);
        // Request timeouts
        handleTimedOutRequests(responses, updatedNow);
        // Invoke the callbacks of all responses collected above
        completeResponses(responses);

        return responses;
    }


Selector

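This Selector is not the JDK's java.nio.channels.Selector but Kafka's own org.apache.kafka.common.network.Selector class (which wraps a JDK NIO selector internally). Its main job is to listen for network read and write events. When a KafkaProducer is created, a new Selector is created and passed to the NetworkClient.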

Selector.send() simply sets the send field of the KafkaChannel; the core of Selector.poll() is the actual reading and writing.
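The "one pending send per channel" rule and the OP_WRITE interest toggling can be sketched with plain NIO. OneSendChannel is hypothetical, not KafkaChannel: setSend() refuses a second pending send and adds OP_WRITE to the key's interest set; when the poll loop sees the key as writable, it writes as much as the socket accepts and removes OP_WRITE once the buffer has been drained.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

/** Hypothetical channel wrapper (not Kafka's KafkaChannel): at most one pending send. */
final class OneSendChannel {
    private final SocketChannel socket;
    private final SelectionKey key;
    private ByteBuffer send; // the pending send, or null when idle

    OneSendChannel(SocketChannel socket, SelectionKey key) {
        this.socket = socket;
        this.key = key;
    }

    /** Like setSend(): reject a second send and start watching for OP_WRITE. */
    void setSend(ByteBuffer data) {
        if (send != null)
            throw new IllegalStateException("previous send is still in progress");
        send = data;
        key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
    }

    /** Called by the poll loop when the key is writable; returns true once the send is complete. */
    boolean onWritable() throws IOException {
        if (send == null)
            return false;
        socket.write(send);   // may write only part of the buffer
        if (send.hasRemaining())
            return false;     // keep OP_WRITE and continue on a later poll
        send = null;
        key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE); // nothing left to write
        return true;
    }
}
```

Dropping OP_WRITE when idle matters: a connected socket is almost always writable, so leaving the interest set would make select() return immediately on every poll.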

send()

    public void send(NetworkSend send) {
        String connectionId = send.destinationId();
        KafkaChannel channel = openOrClosingChannelOrFail(connectionId);
        // The channel is being closed: record a failed send instead
        if (closingChannels.containsKey(connectionId)) {
            // ensure notification via `disconnected`, leave channel in the state in which closing was triggered
            this.failedSends.add(connectionId);
        } else {
            try {
                // Attach the send to the channel (this also registers interest in OP_WRITE)
                channel.setSend(send);
            } catch (Exception e) {
                // update the state for consistency, the channel will be discarded after `close`
                channel.state(ChannelState.FAILED_SEND);
                // ensure notification via `disconnected` when `failedSends` are processed in the next poll
                this.failedSends.add(connectionId);
                close(channel, CloseMode.DISCARD_NO_NOTIFY);
                if (!(e instanceof CancelledKeyException)) {
                    log.error("Unexpected exception during send, closing connection {} and rethrowing exception {}",
                            connectionId, e);
                    throw e;
                }
            }
        }
    }

poll()

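poll() is more involved, because it has to handle many kinds of events. At its core, it repeatedly selects the keys that are ready and performs the corresponding operations on them.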


    @Override
    public void poll(long timeout) throws IOException {
        if (timeout < 0)
            throw new IllegalArgumentException("timeout should be >= 0");

        boolean madeReadProgressLastCall = madeReadProgressLastPoll;
        // Clear the results of the previous poll first
        clear();

        boolean dataInBuffers = !keysWithBufferedRead.isEmpty();
        // Channels connected immediately, or buffered data waiting to be read: do not block in select()
        if (!immediatelyConnectedKeys.isEmpty() || (madeReadProgressLastCall && dataInBuffers))
            timeout = 0;

        // Memory was short before and is available again: unmute channels and reset the flag
        if (!memoryPool.isOutOfMemory() && outOfMemory) {
            //we have recovered from memory pressure. unmute any channel not explicitly muted for other reasons
            log.trace("Broker no longer low on memory - unmuting incoming sockets");
            for (KafkaChannel channel : channels.values()) {
                if (channel.isInMutableState() && !explicitlyMutedChannels.contains(channel)) {
                    channel.maybeUnmute();
                }
            }
            outOfMemory = false;
        }

        /* check ready keys */
        long startSelect = time.nanoseconds();
        // Number of keys that are ready
        int numReadyKeys = select(timeout);
        long endSelect = time.nanoseconds();
        this.sensors.selectTime.record(endSelect - startSelect, time.milliseconds());
        // There are ready keys, channels that connected immediately, or buffered data to read
        if (numReadyKeys > 0 || !immediatelyConnectedKeys.isEmpty() || dataInBuffers) {
            // The keys that are ready
            Set<SelectionKey> readyKeys = this.nioSelector.selectedKeys();

            // Poll from channels that have buffered data (but nothing more from the underlying socket)
            // First process data left in buffers by the previous read
            if (dataInBuffers) {
                keysWithBufferedRead.removeAll(readyKeys); //so no channel gets polled twice
                Set<SelectionKey> toPoll = keysWithBufferedRead;
                keysWithBufferedRead = new HashSet<>(); //poll() calls will repopulate if needed
                pollSelectionKeys(toPoll, false, endSelect);
            }

            // Poll from channels where the underlying socket has more data
            // Then process the keys reported ready by select()
            pollSelectionKeys(readyKeys, false, endSelect);
            // Clear all selected keys so that they are included in the ready count for the next select
            readyKeys.clear();
            // Finally process channels that connected immediately (added in connect())
            pollSelectionKeys(immediatelyConnectedKeys, true, endSelect);
            immediatelyConnectedKeys.clear();
        } else {
            madeReadProgressLastPoll = true; //no work is also "progress"
        }

        long endIo = time.nanoseconds();
        this.sensors.ioTime.record(endIo - endSelect, time.milliseconds());
        // Close channels that were delayed and are now ready to be closed
        completeDelayedChannelClose(endIo);

        // Close the oldest connection if it has been idle for too long
        // we use the time at the end of select to ensure that we don't close any connections that
        // have just been processed in pollSelectionKeys
        maybeCloseOldestConnection(endSelect);
    }

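poll() therefore handles three kinds of ready events: data left over from the previous read, keys reported ready by select(), and connections that were established immediately.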

Original article: https://blog.csdn.net/mashaokang1314/article/details/126892336