Kafka provides a very simple client API. All you need is a single Maven dependency:
- <dependency>
-     <groupId>org.apache.kafka</groupId>
-     <artifactId>kafka_2.13</artifactId>
-     <version>3.4.0</version>
- </dependency>
Then you can use Kafka's Producer class to send messages quickly.
- import java.util.Properties;
- import java.util.concurrent.CountDownLatch;
- import java.util.concurrent.ExecutionException;
-
- import org.apache.kafka.clients.producer.Callback;
- import org.apache.kafka.clients.producer.KafkaProducer;
- import org.apache.kafka.clients.producer.Producer;
- import org.apache.kafka.clients.producer.ProducerConfig;
- import org.apache.kafka.clients.producer.ProducerRecord;
- import org.apache.kafka.clients.producer.RecordMetadata;
-
- public class MyProducer {
-     private static final String BOOTSTRAP_SERVERS = "worker1:9092,worker2:9092,worker3:9092";
-     private static final String TOPIC = "disTopic";
-
-     public static void main(String[] args) throws ExecutionException, InterruptedException {
-         // PART 1: set the producer properties
-         Properties props = new Properties();
-         // Kafka broker addresses
-         props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
-         // Serializer class for the key
-         props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
-         // Serializer class for the value
-         props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
-
-         Producer<String, String> producer = new KafkaProducer<>(props);
-         CountDownLatch latch = new CountDownLatch(5);
-         for (int i = 0; i < 5; i++) {
-             // PART 2: build the message
-             ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC, Integer.toString(i), "MyProducer" + i);
-             // PART 3: send the message
-             // Fire-and-forget: do not care about the broker's acknowledgement.
-             producer.send(record);
-             System.out.println("message " + i + " sent");
-             // Synchronous send: the current thread blocks until the broker acknowledges.
-             RecordMetadata recordMetadata = producer.send(record).get();
-             String topic = recordMetadata.topic();
-             int partition = recordMetadata.partition();
-             long offset = recordMetadata.offset();
-             String message = recordMetadata.toString();
-             System.out.println("message:[" + message + "] sent with topic:" + topic + "; partition:" + partition + "; offset:" + offset);
-             // Asynchronous send: the call returns immediately; the callback fires once the broker responds.
-             producer.send(record, new Callback() {
-                 @Override
-                 public void onCompletion(RecordMetadata recordMetadata, Exception e) {
-                     if (null != e) {
-                         System.out.println("message send failed, " + e.getMessage());
-                         e.printStackTrace();
-                     } else {
-                         String topic = recordMetadata.topic();
-                         long offset = recordMetadata.offset();
-                         String message = recordMetadata.toString();
-                         System.out.println("message:[" + message + "] sent with topic:" + topic + "; offset:" + offset);
-                     }
-                     latch.countDown();
-                 }
-             });
-         }
-         // Only close the producer after all callbacks have completed.
-         latch.await();
-         producer.close();
-     }
- }
Overall, building a Producer takes three steps: set the producer properties, build the message, and send it (fire-and-forget, synchronously, or asynchronously).
Next, you can use Kafka's Consumer class to consume messages quickly.
- import java.time.Duration;
- import java.util.Arrays;
- import java.util.Properties;
-
- import org.apache.kafka.clients.consumer.Consumer;
- import org.apache.kafka.clients.consumer.ConsumerConfig;
- import org.apache.kafka.clients.consumer.ConsumerRecord;
- import org.apache.kafka.clients.consumer.ConsumerRecords;
- import org.apache.kafka.clients.consumer.KafkaConsumer;
-
- public class MyConsumer {
-     private static final String BOOTSTRAP_SERVERS = "worker1:9092,worker2:9092,worker3:9092";
-     private static final String TOPIC = "disTopic";
-
-     public static void main(String[] args) {
-         // PART 1: set the consumer properties
-         Properties props = new Properties();
-         // Kafka broker addresses
-         props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
-         // Every consumer must belong to a consumer group
-         props.put(ConsumerConfig.GROUP_ID_CONFIG, "test");
-         // Deserializer class for the key
-         props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
-         // Deserializer class for the value
-         props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
-
-         Consumer<String, String> consumer = new KafkaConsumer<>(props);
-         consumer.subscribe(Arrays.asList(TOPIC));
-         while (true) {
-             // PART 2: poll messages, with a 100 ms timeout
-             ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
-             // PART 3: process the messages
-             for (ConsumerRecord<String, String> record : records) {
-                 System.out.println("offset = " + record.offset() + "; key = " + record.key() + "; value = " + record.value());
-             }
-             // Commit the offsets so the messages are not delivered again.
-             consumer.commitSync();    // Synchronous commit: wait until the offsets are committed before consuming the next batch.
-             // consumer.commitAsync(); // Asynchronous commit: move on to the next batch right after sending the commit request, without waiting for the broker's confirmation.
-         }
-     }
- }
Overall, the Consumer also follows three steps: set the consumer properties, poll messages from the broker, and process the messages (then commit the offsets).
The Kafka clients essentially always run through these three broad steps. In practice, the biggest variable is choosing suitable properties for the producer and the consumer; these properties largely determine how the client program behaves.
Official Kafka configuration reference: Apache Kafka
Every Consumer must specify a GROUP_ID_CONFIG property, which identifies the consumer group the Consumer belongs to. Its description reads:
- public static final String GROUP_ID_CONFIG = "group.id";
- // Roughly: a unique string identifying the consumer group. It is required if the consumer either uses the group
- // management functionality via subscribe(topic) or uses the Kafka-based offset management strategy.
- public static final String GROUP_ID_DOC = "A unique string that identifies the consumer group this consumer belongs to. This property is required if the consumer uses either the group management functionality by using subscribe(topic) or the Kafka-based offset management strategy.";
So there are both Kafka-managed offsets on the broker side and offsets cached on the consumer side, and it is useful to know how to check what the group has actually committed to Kafka.

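You can also observe both views from client code. The sketch below is not part of the original example; it assumes a KafkaConsumer like the one in MyConsumer above, with the partition already assigned to it:
- import java.util.Collections;
- import org.apache.kafka.clients.consumer.Consumer;
- import org.apache.kafka.clients.consumer.OffsetAndMetadata;
- import org.apache.kafka.common.TopicPartition;
-
- public class OffsetProbe {
-     // Print the consumer's local position and the group's committed offset for one partition.
-     public static void printOffsets(Consumer<String, String> consumer, String topic, int partition) {
-         TopicPartition tp = new TopicPartition(topic, partition);
-         // Local progress cached inside this consumer instance (requires the partition to be assigned to it).
-         long position = consumer.position(tp);
-         // Offset the consumer group has committed to Kafka; null if nothing has been committed yet.
-         OffsetAndMetadata committed = consumer.committed(Collections.singleton(tp)).get(tp);
-         System.out.println("position=" + position + ", committed=" + (committed == null ? "none" : committed.offset()));
-     }
- }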
To check a consumer group's offset progress from the command line:
./kafka-consumer-groups.sh --bootstrap-server worker1:9092 --describe --group test

The producer interceptor mechanism lets the client intercept messages before the producer sends them to the Kafka cluster, and even modify the message content.
This is controlled by a Producer parameter: INTERCEPTOR_CLASSES_CONFIG.
- public static final String INTERCEPTOR_CLASSES_CONFIG = "interceptor.classes";
- public static final String INTERCEPTOR_CLASSES_DOC = "A list of classes to use as interceptors. "
-     + "Implementing the org.apache.kafka.clients.producer.ProducerInterceptor interface allows you to intercept (and possibly mutate) the records "
-     + "received by the producer before they are published to the Kafka cluster. By default, there are no interceptors.";
Following this description, we can define our own interceptor implementation class:
- import java.util.Map;
-
- import org.apache.kafka.clients.producer.ProducerInterceptor;
- import org.apache.kafka.clients.producer.ProducerRecord;
- import org.apache.kafka.clients.producer.RecordMetadata;
-
- public class MyInterceptor implements ProducerInterceptor<String, String> {
-     // Triggered when a message is sent
-     @Override
-     public ProducerRecord<String, String> onSend(ProducerRecord<String, String> producerRecord) {
-         System.out.println("producerRecord : " + producerRecord.toString());
-         return producerRecord;
-     }
-
-     // Triggered when the broker's response is received
-     @Override
-     public void onAcknowledgement(RecordMetadata recordMetadata, Exception e) {
-         System.out.println("acknowledgement recordMetadata:" + recordMetadata.toString());
-     }
-
-     // Triggered when the connection is closed
-     @Override
-     public void close() {
-         System.out.println("producer closed");
-     }
-
-     // Receives the configuration entries
-     @Override
-     public void configure(Map<String, ?> map) {
-         System.out.println("=====config start======");
-         for (Map.Entry<String, ?> entry : map.entrySet()) {
-             System.out.println("entry.key:" + entry.getKey() + " === entry.value: " + entry.getValue());
-         }
-         System.out.println("=====config end======");
-     }
- }
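As a slightly more practical variant, the sketch below (the class name and value format are invented for illustration, not from the original text) shows an interceptor that uniformly stamps each record value with the send time:
- import java.util.Map;
- import org.apache.kafka.clients.producer.ProducerInterceptor;
- import org.apache.kafka.clients.producer.ProducerRecord;
- import org.apache.kafka.clients.producer.RecordMetadata;
-
- public class SendTimeInterceptor implements ProducerInterceptor<String, String> {
-     @Override
-     public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
-         // Prepend the send timestamp to the value; a POJO-based project would set a field instead.
-         String stampedValue = System.currentTimeMillis() + "|" + record.value();
-         return new ProducerRecord<>(record.topic(), record.partition(), record.key(), stampedValue, record.headers());
-     }
-
-     @Override
-     public void onAcknowledgement(RecordMetadata metadata, Exception exception) { }
-
-     @Override
-     public void close() { }
-
-     @Override
-     public void configure(Map<String, ?> configs) { }
- }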
Then specify the interceptor class on the producer (separate multiple interceptor classes with commas):
props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG,"com.roy.kfk.basic.MyInterceptor");
The interceptor mechanism is not used very often; it mainly fits cross-cutting needs such as uniformly adding a timestamp to every message. For example, if you use Kafka to carry POJOs, an interceptor can stamp each one with a time attribute. But so far we have only sent String messages through Kafka; can Kafka carry POJO messages at all? That is where the message serialization mechanism below comes in.
In the earlier simple example, the Producer specified two properties, KEY_SERIALIZER_CLASS_CONFIG and VALUE_SERIALIZER_CLASS_CONFIG. Both have matching documentation constants in ProducerConfig.
- public static final String KEY_SERIALIZER_CLASS_CONFIG = "key.serializer";
- public static final String KEY_SERIALIZER_CLASS_DOC = "Serializer class for key that implements the org.apache.kafka.common.serialization.Serializer interface.";
- public static final String VALUE_SERIALIZER_CLASS_CONFIG = "value.serializer";
- public static final String VALUE_SERIALIZER_CLASS_DOC = "Serializer class for value that implements the org.apache.kafka.common.serialization.Serializer interface.";
These two parameters tell the producer how to serialize the key and the value of a message into binary data. In Kafka's message model, the key and the value play different roles.
If no key is provided, Kafka picks a Partition automatically, for example in a round-robin fashion (the exact default strategy is described in the partitioner section below).
If a key is provided, the declared Serializer turns the key into a byte[] array, the key is then hashed, and the hash is used to choose the Partition. This guarantees that messages with the same key always land in the same Partition.
Since the producer serializes messages, the consumer naturally has to deserialize them when pulling. So the Consumer has two matching deserialization configs:
- public static final String KEY_DESERIALIZER_CLASS_CONFIG = "key.deserializer";
- public static final String KEY_DESERIALIZER_CLASS_DOC = "Deserializer class for key that implements the org.apache.kafka.common.serialization.Deserializer interface.";
- public static final String VALUE_DESERIALIZER_CLASS_CONFIG = "value.deserializer";
- public static final String VALUE_DESERIALIZER_CLASS_DOC = "Deserializer class for value that implements the org.apache.kafka.common.serialization.Deserializer interface.";
Kafka already ships implementation classes for the common basic data types. But if you need a custom message format, such as your own POJO, you have to implement the concrete classes yourself.
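As a rough illustration, the sketch below shows what a custom Serializer/Deserializer pair might look like for a hypothetical UserPojo; the class and the naive "name,age" encoding are made up for this example, and a real project would more likely use JSON, Avro or Protobuf:
- import java.nio.charset.StandardCharsets;
- import org.apache.kafka.common.serialization.Deserializer;
- import org.apache.kafka.common.serialization.Serializer;
-
- // A hypothetical POJO carried as the message value.
- class UserPojo {
-     String name;
-     int age;
-     UserPojo(String name, int age) { this.name = name; this.age = age; }
- }
-
- class UserPojoSerializer implements Serializer<UserPojo> {
-     @Override
-     public byte[] serialize(String topic, UserPojo user) {
-         if (user == null) return null;
-         // Naive "name,age" text encoding, purely for illustration.
-         return (user.name + "," + user.age).getBytes(StandardCharsets.UTF_8);
-     }
- }
-
- class UserPojoDeserializer implements Deserializer<UserPojo> {
-     @Override
-     public UserPojo deserialize(String topic, byte[] data) {
-         if (data == null) return null;
-         String[] parts = new String(data, StandardCharsets.UTF_8).split(",", 2);
-         return new UserPojo(parts[0], Integer.parseInt(parts[1]));
-     }
- }
They would then be registered through VALUE_SERIALIZER_CLASS_CONFIG on the producer and VALUE_DESERIALIZER_CLASS_CONFIG on the consumer, exactly like the String implementations in the earlier examples.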
Once you understand the two mechanisms above, a natural question is how messages get routed. It really splits into two related questions: how does the Producer decide which Partition a message goes to, and how does a Consumer decide which Partitions it reads from?
Neither question is hard: a quick look through the relevant Config classes gives the answers.
First, on the Producer side, you can specify a Partitioner to decide how messages are assigned to Partitions.
- public static final String PARTITIONER_CLASS_CONFIG = "partitioner.class";
- private static final String PARTITIONER_CLASS_DOC = "A class to use to determine which partition to be send to when produce the records. Available options are:"
-     + " If not set, the default partitioning logic is used. "
-     + "This strategy will try sticking to a partition until at least " + BATCH_SIZE_CONFIG + " bytes is produced to the partition. It works with the strategy: "
-     + "If no partition is specified but a key is present, choose a partition based on a hash of the key. "
-     + "If no partition or key is present, choose the sticky partition that changes when at least " + BATCH_SIZE_CONFIG + " bytes are produced to the partition. "
-     + "org.apache.kafka.clients.producer.RoundRobinPartitioner: This partitioning strategy is that "
-     + "each record in a series of consecutive records will be sent to a different partition (no matter if the 'key' is provided or not), "
-     + "until we run out of partitions and start over again. Note: There's a known issue that will cause uneven distribution when new batch is created. "
-     + "Please check KAFKA-9965 for more detail. "
-     + "Implementing the org.apache.kafka.clients.producer.Partitioner interface allows you to plug in a custom partitioner.";
This explains how Kafka uses a concrete implementation of the Partitioner interface to decide which Partition a message goes to based on its key. You can even implement your own partitioning strategy quite easily.
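For instance, here is a minimal sketch of a custom Partitioner (the class name and the routing rule are invented for illustration): keyed messages are hashed much like the default logic, while key-less messages are pinned to partition 0.
- import java.util.Map;
- import org.apache.kafka.clients.producer.Partitioner;
- import org.apache.kafka.common.Cluster;
- import org.apache.kafka.common.utils.Utils;
-
- public class MyPartitioner implements Partitioner {
-     @Override
-     public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
-         int partitionCount = cluster.partitionsForTopic(topic).size();
-         if (keyBytes == null) {
-             // No key: always route to partition 0, just to make the example observable.
-             return 0;
-         }
-         // With a key: hash the serialized key, similar in spirit to the default hashing logic.
-         return Utils.toPositive(Utils.murmur2(keyBytes)) % partitionCount;
-     }
-
-     @Override
-     public void close() { }
-
-     @Override
-     public void configure(Map<String, ?> configs) { }
- }
It would be registered on the producer just like the interceptor:
- props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "com.roy.kfk.basic.MyPartitioner");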
To avoid overwhelming the brokers with high-frequency requests, the Kafka producer does not ship messages one by one. Instead it adds a local cache, collects messages there, and sends them in batches. This kind of caching is a very common technique in high-concurrency systems.
Kafka's message caching mechanism involves two key components inside KafkaProducer: the accumulator and the sender.
- // 1. The record accumulator
- int batchSize = Math.max(1, config.getInt(ProducerConfig.BATCH_SIZE_CONFIG));
- this.accumulator = new RecordAccumulator(logContext, batchSize, this.compressionType, lingerMs(config), retryBackoffMs, deliveryTimeoutMs, partitionerConfig, metrics, PRODUCER_METRIC_GROUP_NAME, time, apiVersions, transactionManager, new BufferPool(this.totalMemorySize, batchSize, metrics, time, PRODUCER_METRIC_GROUP_NAME));
- // 2. The sender thread
- this.sender = newSender(logContext, kafkaClient, this.metadata);
The RecordAccumulator is the producer's message accumulator: every message the KafkaProducer sends is first cached in the RecordAccumulator and then shipped to the Kafka brokers in batches.
Inside the RecordAccumulator, a Deque (double-ended queue) is maintained per Partition, so these Deques essentially map one-to-one to the Partitions of the Topic on the Kafka broker side. Each Deque holds a number of ProducerBatch objects. Every message sent by the KafkaProducer is routed, according to its key, to the corresponding Deque and stored inside one of its ProducerBatches. The routing rule is exactly what the Partitioner component above implements.

Two parameters matter most here:
- // Size of the RecordAccumulator buffer
- public static final String BUFFER_MEMORY_CONFIG = "buffer.memory";
- private static final String BUFFER_MEMORY_DOC = "The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are "
-     + "sent faster than they can be delivered to the server the producer will block for " + MAX_BLOCK_MS_CONFIG + " after which it will throw an exception. "
-     + "This setting should correspond roughly to the total memory the producer will use, but is not a hard bound since "
-     + "not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if "
-     + "compression is enabled) as well as for maintaining in-flight requests.";
-
- // Size of each batch within the buffer
- public static final String BATCH_SIZE_CONFIG = "batch.size";
- private static final String BATCH_SIZE_DOC = "The producer will attempt to batch records together into fewer requests whenever multiple records are being sent"
-     + " to the same partition. This helps performance on both the client and the server. This configuration controls the "
-     + "default batch size in bytes. "
-     + "No attempt will be made to batch records larger than this size. "
-     + "Requests sent to brokers will contain multiple batches, one for each partition with data available to be sent. "
-     + "A small batch size will make batching less common and may reduce throughput (a batch size of zero will disable "
-     + "batching entirely). A very large batch size may use memory a bit more wastefully as we will always allocate a "
-     + "buffer of the specified batch size in anticipation of additional records. "
-     + "Note: This setting gives the upper bound of the batch size to be sent. If we have fewer than this many bytes accumulated "
-     + "for this partition, we will 'linger' for the linger.ms time waiting for more records to show up. "
-     + "This linger.ms setting defaults to 0, which means we'll immediately send out a record even the accumulated "
-     + "batch size is under this batch.size setting.";
These descriptions mention a few other parameters as well, for example MAX_BLOCK_MS_CONFIG, which defaults to 60 seconds.
Next, the sender is a dedicated thread inside KafkaProducer used to send messages. As you can see, every KafkaProducer instance has its own sender thread, which is responsible for shipping the messages in the RecordAccumulator to Kafka.

The Sender does not flush everything cached in the RecordAccumulator at once; it takes only part of the messages each time. It picks up ProducerBatches whose accumulated size has reached BATCH_SIZE_CONFIG. Of course, if traffic is light and a ProducerBatch stays below BATCH_SIZE_CONFIG for a long time, the Sender will not wait forever: it waits at most LINGER_MS_CONFIG and then reads the messages out of the ProducerBatch anyway. LINGER_MS_CONFIG defaults to 0.
The Sender then groups the messages it has read into per-Broker queues; the requests sitting in these queues are called in-flight requests. They are sent to the corresponding Kafka Broker one after another and are removed from the queue only once the Broker's response arrives. These queues are not unbounded either: at most MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION (default 5) requests can be outstanding per connection.
The whole point of the producer cache is to pack messages together and reduce network IO frequency. So even in the Sender's in-flight queues, messages are not sent to the Broker one by one but as whole batches, which means the messages within a batch have no fixed ordering relative to each other.
The main parameters involved are:
- public static final String LINGER_MS_CONFIG = "linger.ms";
- private static final String LINGER_MS_DOC = "The producer groups together any records that arrive in between request transmissions into a single batched request. "
-     + "Normally this occurs only under load when records arrive faster than they can be sent out. However in some circumstances the client may want to "
-     + "reduce the number of requests even under moderate load. This setting accomplishes this by adding a small amount "
-     + "of artificial delay—that is, rather than immediately sending out a record, the producer will wait for up to "
-     + "the given delay to allow other records to be sent so that the sends can be batched together. This can be thought "
-     + "of as analogous to Nagle's algorithm in TCP. This setting gives the upper bound on the delay for batching: once "
-     + "we get " + BATCH_SIZE_CONFIG + " worth of records for a partition it will be sent immediately regardless of this "
-     + "setting, however if we have fewer than this many bytes accumulated for this partition we will 'linger' for the "
-     + "specified time waiting for more records to show up. This setting defaults to 0 (i.e. no delay). Setting " + LINGER_MS_CONFIG + "=5, "
-     + "for example, would have the effect of reducing the number of requests sent but would add up to 5ms of latency to records sent in the absence of load.";
-
- public static final String MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION = "max.in.flight.requests.per.connection";
- private static final String MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION_DOC = "The maximum number of unacknowledged requests the client will send on a single connection before blocking."
-     + " Note that if this configuration is set to be greater than 1 and enable.idempotence is set to false, there is a risk of"
-     + " message reordering after a failed send due to retries (i.e., if retries are enabled); "
-     + " if retries are disabled or if enable.idempotence is set to true, ordering will be preserved."
-     + " Additionally, enabling idempotence requires the value of this configuration to be less than or equal to " + MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION_FOR_IDEMPOTENCE + "."
-     + " If conflicting configurations are set and idempotence is not explicitly enabled, idempotence is disabled. ";
Finally, the Sender completes the IO requests to Kafka, and receives Kafka's responses, through a Selector component.
- //org.apache.kafka.clients.producer.KafkaProducer#doSend
- if (result.batchIsFull || result.newBatchCreated) {
- log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), appendCallbacks.getPartition());
- this.sender.wakeup();
- }
The producer cache is one of Kafka's most important optimizations for handling huge message volumes, and tuning these parameters matters a lot for cluster performance. For example, if your message bodies are large, consider increasing batch.size so batching stays effective. If the producer really has a very large number of messages to send, consider increasing buffer.memory to avoid blocking when the cache fills up. And if the producer feels slow, you can consider raising max.in.flight.requests.per.connection to increase send throughput.
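For reference, such tuning is just a matter of producer properties. The sketch below extends the MyProducer configuration; the concrete numbers are arbitrary examples rather than recommendations:
- // Larger batches help when message bodies are large (default batch.size is 16384 bytes).
- props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);
- // Wait up to 20 ms for a batch to fill before sending (default linger.ms is 0).
- props.put(ProducerConfig.LINGER_MS_CONFIG, 20);
- // A bigger overall send buffer helps with very high message volume (default buffer.memory is 32 MB).
- props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 64 * 1024 * 1024L);
- // More unacknowledged requests per connection can raise throughput; the default is already 5,
- // and values above 5 conflict with idempotence and may reorder messages on retry.
- props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);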
After the Producer sends a message to the Broker, how do we know whether it actually reached the Broker?
This is an important mechanism in day-to-day development, a favorite interview topic, and something countless tutorials have hyped up, so it is worth a brief look here.
It all comes down to a rather unremarkable Producer property: ACKS_CONFIG.
- public static final String ACKS_CONFIG = "acks";
- private static final String ACKS_DOC = "The number of acknowledgments the producer requires the leader to have received before considering a request complete. This controls the "
-     + " durability of records that are sent. The following settings are allowed: "
-     + "acks=0 If set to zero then the producer will not wait for any acknowledgment from the"
-     + " server at all. The record will be immediately added to the socket buffer and considered sent. No guarantee can be"
-     + " made that the server has received the record in this case, and the retries configuration will not"
-     + " take effect (as the client won't generally know of any failures). The offset given back for each record will"
-     + " always be set to -1. "
-     + "acks=1 This will mean the leader will write the record to its local log but will respond"
-     + " without awaiting full acknowledgement from all followers. In this case should the leader fail immediately after"
-     + " acknowledging the record but before the followers have replicated it then the record will be lost. "
-     + "acks=all This means the leader will wait for the full set of in-sync replicas to"
-     + " acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica"
-     + " remains alive. This is the strongest available guarantee. This is equivalent to the acks=-1 setting. "
-     + "Note that enabling idempotence requires this config value to be 'all'."
-     + " If conflicting configurations are set and idempotence is not explicitly enabled, idempotence is disabled.";
This official explanation is more accurate and more detailed than any third-party material. If you understand the Topic partition model, the property is easy to grasp. Its main purpose is message durability, which matters especially for Topics with a large replication factor.
You can verify in the sample code that with acks=0 the sender gets no partition or offset data back.
In production, acks=0 is rarely used because its reliability is too poor. acks=1 is the most widely used setting, typically for logs and other data where losing the odd record is acceptable. acks=-1 is used for sensitive data, for example anything money-related.
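In code this is a single producer property; a quick sketch of the three settings (pick exactly one):
- // acks=0: fire-and-forget; RecordMetadata will report the offset as -1.
- props.put(ProducerConfig.ACKS_CONFIG, "0");
- // acks=1: leader-only acknowledgement.
- props.put(ProducerConfig.ACKS_CONFIG, "1");
- // acks=all (equivalent to -1): wait for all in-sync replicas; required when idempotence is enabled.
- props.put(ProducerConfig.ACKS_CONFIG, "all");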
When acks is set to all (or -1), Kafka does not actually insist that every replica has written the data before responding. The broker-side parameter min.insync.replicas controls how many replicas the Leader Partition must see the write on before it returns a response to the Producer. This parameter is set in the broker configuration (typically config/server.properties).
- min.insync.replicas
- When a producer sets acks to "all" (or "-1"), min.insync.replicas specifies the minimum number of replicas that must acknowledge a write for the write to be considered successful. If this minimum cannot be met, then the producer will raise an exception (either NotEnoughReplicas or NotEnoughReplicasAfterAppend).
- When used together, min.insync.replicas and acks allow you to enforce greater durability guarantees. A typical scenario would be to create a topic with a replication factor of 3, set min.insync.replicas to 2, and produce with acks of "all". This will ensure that the producer raises an exception if a majority of replicas do not receive a write.
-
- Type: int
- Default: 1
- Valid Values: [1,...]
- Importance: high
- Update Mode: cluster-wide
As analyzed earlier, when the Producer's acks is set to 1 or -1, every send needs the RecordMetadata returned from the Broker, which means two trips across the network: the request and the acknowledgement.

To keep messages safe, these two network trips would have to behave idempotently for every message. But networks are unreliable, and under high concurrency there is no way to guarantee that. If the first step succeeds but the acknowledgement never comes back, the Producer concludes the send failed and retries. The number of retries is controlled by ProducerConfig.RETRIES_CONFIG and defaults to Integer.MAX_VALUE.
Now the problem appears: the Producer may send the same message to the Broker several times. How does Kafka guarantee that no matter how many duplicate sends the Producer issues, the Broker keeps the message exactly once instead of storing multiple copies? This is the producer idempotence problem.
Let's first look at how Kafka itself describes the idempotence property.
- public static final String ENABLE_IDEMPOTENCE_CONFIG = "enable.idempotence";
- public static final String ENABLE_IDEMPOTENCE_DOC = "When set to 'true', the producer will ensure that exactly one copy of each message is written in the stream. If 'false', producer "
-     + "retries due to broker failures, etc., may write duplicates of the retried message in the stream. "
-     + "Note that enabling idempotence requires " + MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION + " to be less than or equal to " + MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION_FOR_IDEMPOTENCE
-     + " (with message ordering preserved for any allowable value), " + RETRIES_CONFIG + " to be greater than 0, and "
-     + ACKS_CONFIG + " must be 'all'. "
-     + "Idempotence is enabled by default if no conflicting configurations are set. "
-     + "If conflicting configurations are set and idempotence is not explicitly enabled, idempotence is disabled. "
-     + "If idempotence is explicitly enabled and conflicting configurations are set, a ConfigException is thrown.";
This description mentions two other parameters, listed here as well:
- // max.in.flight.requests.per.connection should be less than or equal to 5 when idempotence producer enabled to ensure message ordering
- private static final int MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION_FOR_IDEMPOTENCE = 5;
-
- /** max.in.flight.requests.per.connection */
- public static final String MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION = "max.in.flight.requests.per.connection";
- private static final String MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION_DOC = "The maximum number of unacknowledged requests the client will send on a single connection before blocking."
-     + " Note that if this config is set to be greater than 1 and enable.idempotence is set to false, there is a risk of"
-     + " message re-ordering after a failed send due to retries (i.e., if retries are enabled)."
-     + " Additionally, enabling idempotence requires this config value to be less than or equal to " + MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION_FOR_IDEMPOTENCE + "."
-     + " If conflicting configurations are set and idempotence is not explicitly enabled, idempotence is disabled.";
As you can see, Kafka has a whole set of designs around producer idempotence; the descriptions above just never explain how idempotence is actually implemented.
To understand it, first distinguish the three delivery semantics of distributed data transfer: at-least-once, at-most-once, and exactly-once.
For example, when you deposit 100 yuan at a bank, the bank usually turns the deposit action into a message, publishes it to an MQ, and lets other systems update your account balance and perform related business actions. The safety of that MQ message has to be designed in layers. First, the deposit message must definitely reach the MQ: if one send fails, retry until it succeeds. That is at-least-once. If this cannot be guaranteed, you certainly would not accept it. Second, no matter how many times the deposit message is sent, the bank must record it at most once: recording it fewer times is bad enough, but recording it more than once is unacceptable. That is at-most-once; without it the bank cannot accept the scheme either. Finally, for both sides to be satisfied, the deposit message must be recorded exactly once, no more and no less. That is exactly-once.
So, broadly speaking, at-least-once prevents data loss but not duplication, while at-most-once prevents duplication but not loss. Both are flawed, but both are relatively easy to implement. Sensitive business data, however, often must be neither duplicated nor lost, which requires exactly-once semantics, and supporting exactly-once takes a very careful design.
Back to the Producer-to-Broker scenario: for at-most-once semantics, setting acks to 0 is enough, and no idempotence problem exists. For at-least-once semantics, set acks to 1 or -1, which guarantees the message is written to the Leader Partition at least once, but not only once. What about exactly-once? That is where the idempotence property comes in.
To guarantee exactly-once semantics on the send path, Kafka introduces a few extra concepts: each producer is assigned a PID (producer ID), and every message carries a monotonically increasing sequence number per partition, so the Broker can use the PID, partition and sequence number together to detect and drop duplicates.
With idempotence turned on, the Broker therefore persists at most one copy of each message per send, which gives at-most-once behavior. Combined with the at-least-once behavior provided by setting acks to 1 or -1, as analyzed before, this yields exactly-once semantics overall.
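Enabling idempotence is again just producer configuration. A minimal sketch that satisfies the constraints quoted above:
- // Enable idempotent writes (also the default in recent client versions when no conflicting configs are set).
- props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
- // Required companions: acks=all, retries > 0, and at most 5 in-flight requests per connection.
- props.put(ProducerConfig.ACKS_CONFIG, "all");
- props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
- props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);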
On top of idempotence, Kafka also provides a producer transaction mechanism, which lets a batch of messages either all become visible or all be rolled back. The core transaction API on the producer is:
- // 1. Initialize transactions
- void initTransactions();
- // 2. Begin a transaction
- void beginTransaction() throws ProducerFencedException;
- // 3. Commit the transaction
- void commitTransaction() throws ProducerFencedException;
- // 4. Abort the transaction (similar to rolling it back)
- void abortTransaction() throws ProducerFencedException;
Example:
- import java.util.Properties;
- import java.util.concurrent.ExecutionException;
-
- import org.apache.kafka.clients.producer.KafkaProducer;
- import org.apache.kafka.clients.producer.Producer;
- import org.apache.kafka.clients.producer.ProducerConfig;
- import org.apache.kafka.clients.producer.ProducerRecord;
-
- public class TransactionErrorDemo {
-
-     private static final String BOOTSTRAP_SERVERS = "worker1:9092,worker2:9092,worker3:9092";
-     private static final String TOPIC = "disTopic";
-
-     public static void main(String[] args) throws ExecutionException, InterruptedException {
-         Properties props = new Properties();
-         // Kafka broker addresses
-         props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
-         // Transaction ID
-         props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "111");
-         // Serializer class for the key
-         props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
-         // Serializer class for the value
-         props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
-
-         Producer<String, String> producer = new KafkaProducer<>(props);
-         producer.initTransactions();
-         producer.beginTransaction();
-         for (int i = 0; i < 5; i++) {
-             ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC, Integer.toString(i), "MyProducer" + i);
-             // Asynchronous send.
-             producer.send(record);
-             if (i == 3) {
-                 // Aborting here rolls back the whole batch of messages sent in this transaction.
-                 System.out.println("error");
-                 producer.abortTransaction();
-                 // Stop sending: further sends without a new beginTransaction() would fail.
-                 break;
-             }
-         }
-         System.out.println("message sent");
-         try {
-             Thread.sleep(10000);
-         } catch (Exception e) {
-             e.printStackTrace();
-         }
-         // producer.commitTransaction();
-         producer.close();
-     }
- }
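For contrast with the error demo above, here is a hedged sketch of the normal transactional send flow, assuming the same properties (including TRANSACTIONAL_ID_CONFIG) as TransactionErrorDemo:
- Producer<String, String> producer = new KafkaProducer<>(props);
- producer.initTransactions();
- try {
-     producer.beginTransaction();
-     for (int i = 0; i < 5; i++) {
-         producer.send(new ProducerRecord<>(TOPIC, Integer.toString(i), "MyProducer" + i));
-     }
-     // The five messages become visible to read_committed consumers only after this commit.
-     producer.commitTransaction();
- } catch (org.apache.kafka.common.KafkaException e) {
-     // A send failure inside the transaction: roll back the whole batch.
-     // (Fatal errors such as ProducerFencedException should instead just close the producer.)
-     producer.abortTransaction();
- }
- producer.close();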