Working with Apache Kafka

For development work, it can be installed locally or a docker container can be used.

brew install kafka
Docker Container
- wurstmeister/kafka. separate images for Kafka and Zookeper. Run with Dockr Compose.
- spotify/docker-kafka. Container includes both Zookeeper and Kafka.
- landoop/fast-data-dev. Provides a convenient UI and can used with Cloudera Bigdata distribution.
- confluentinc/cp-kafka. Official distribution from Confluent.

Basic Demo

Start Docker Container locally
Create a Topic
Produce to a Topic
Consume from Topic

Figure 1. Console Producer and Consumer

Start Spotify/Kafka Docker container

docker run \
-p 2181:2181 \
-p 9092:9092 \
--env ADVERTISED_HOST=127.0.0.1 \
--env ADVERTISED_PORT=9092 \
--name wmskafka \
--detach \
spotify/kafka

Create a Topic

Run bash in the container.

-i interactive flag to keep stdin open
-t allocate a terminal

docker exec -it wmskafka bash

cd /opt/kafka_2.11-0.10.1.0/bin

./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 2 --topic wmsdemo

Number of partitions is the degree of parallelism for reads and writes.

Produce and Consume

Produce to a topic

./kafka-console-producer.sh --broker-list localhost:9092 --topic wmsdemo

Open a new terminal for the consumer

docker exec -it wmskafka bash

cd /opt/kafka_2.11-0.10.1.0/bin

./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic wmsdemo --from-beginning

Enter text in producer terminal for output in the consumer terminal.

Java API

Execute Producer

java -cp target/ProducerDemo.jar com.demo.ProducerExample 127.0.0.1 9092 wmsdemo \
1000 10 100 50

parameters are
1. Broker host name
2. Broker port
3. Topic name
4. Number of messages to send
5. Number of messages between thread waits
6. Wait time to slow down publishing
7. execute a call back every N messages

Execute Consumer

java -cp target/ConsumerDemo.jar com.demo.ConsumerExample 127.0.0.1 9092 wmsdemo

parameters are
1. Broker host name
2. Broker port
3. Topic name

Sync Kafka

An MQ Connector could be used to write from the current queue to a Kafka topic. Given the unknowns, it may be practical to write directly to a Kafka topic.

Figure 2. Sync Gateway

Split topics

Available options for splitting Topic
* Kafka Streams
* Spark Streaming
* Handled by Producer

Figure 3. Streams

Topic Criteria
* Batch Processing
* Real Time
* Message Ordering

Spring

Kafka is supported by Spring Boot and Spring integration.This would allow reuse of existing code base.

Failures

Transaction Support

Kafka supports atomic writes across mulitple partitions with the Transaction API.

producer.initTransaction();
try {
	producer.beginTransaction
	producer.send(record1);
	producer.send(record1);
	....
	producer.commitTransaction();
}
catch (KafkaException e) {
	producer.abortTransaction();
}

Idempotent Producer

Apache Kafka and Streams provide Exacltly-once semantics. This ensures exactly once message delivery per partition.

Downstream processing failures

There are multiple approaches to handling failures in downstream processing.
Messages are persisted and can be read from the last offset.
Messages can be written to a new topic for later processing.
Circuit Breaker implementation in Spring cloud libraries provides Hystrix and Consul. This can be used in combination with Spring Actuator that we currently have.

Testing

Spring Boot comes with an embedded Kafka and can be used for testing.
Kafka API provides MockProducer and MockConsumer to mock interactions with Kafka cluster.
There is also an Open Source Junit Kafka wrapper from Salesforce.

Development effort

Understanding the configuration knobs
Design of Topics and Partitions
Processing of the message can reuse exisiting code
Cutting edge always has surprises - unknown unknowns.