Apache Kafka is an open-source distributed event streaming and stream-processing platform written in Java, built to handle demanding real-time data feeds. It is inherently scalable, offering high throughput and availability: a cluster can grow from a single node for testing to hundreds of nodes in production serving large volumes of data. However, when you expand a cluster, Kafka does not automatically rearrange existing topics across the new brokers.
In this tutorial, you’ll learn how to expand your Kafka cluster by adding a new node and properly migrating topic partitions to the new node, ensuring maximum resource utilization. You’ll learn how to achieve that manually using the provided script, as well as automatically with Kafka Cruise Control, a daemon for automatically optimizing the inner processes of a Kafka cluster. You’ll also learn how to aggregate your event data using ksqlDB, a database that seamlessly operates on top of Kafka topics.
To complete this tutorial, you’ll need:

- A three-node Kafka cluster set up as part of the prerequisites, plus an additional node to add to it, with domain names pointing at the nodes, referred to as kafkaX.your_domain throughout. You can purchase a domain name on Namecheap, get one for free on Freenom, or use the domain registrar of your choice.

In this step, you’ll learn how to add nodes as brokers to your KRaft Kafka cluster. With KRaft, the nodes themselves can organize and perform administrative tasks without the overhead of depending on Apache ZooKeeper, freeing you from the additional dependency. You’ll also learn how to use the new broker by migrating topics to it.
After completing the prerequisites, you will have a Kafka cluster consisting of three nodes. Before expanding the cluster with one more node, you’ll create a new topic.
On the fourth node, as user kafka, navigate to the directory where Kafka is installed (~/kafka) and run the following command:
./bin/kafka-topics.sh --bootstrap-server kafka1.your_domain:9092 --create --topic new-topic
The output will be:
Output
Created topic new-topic.
First, navigate to the directory where Kafka resides and open its configuration file for editing by running:
nano config/kraft/server.properties
Find the following lines:
...
# The role of this server. Setting this puts us in KRaft mode
process.roles=broker,controller
# The node id associated with this instance's roles
node.id=1
# The connect string for the controller quorum
controller.quorum.voters=1@localhost:9093
...
Modify them to look like this:
...
# The role of this server. Setting this puts us in KRaft mode
process.roles=broker
# The node id associated with this instance's roles
node.id=4
# The connect string for the controller quorum
controller.quorum.voters=1@kafka1.your_domain:9093
...
By default, the process.roles parameter configures the node to act as both broker and controller, making it suitable both for receiving and serving data and for performing administrative tasks. In this case, you’ll configure it to act only as a broker.
node.id specifies the node’s ID in the cluster. Each node in the cluster must have a unique ID, regardless of its role. Here, you set it to 4.
controller.quorum.voters maps controller node IDs to their respective addresses and ports for communication. This is where you specify the addresses of all controller nodes in the cluster. Remember to replace kafka1.your_domain with the domain name that points to the first Kafka node.
Next, find the following lines in the file:
...
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
# Name of listener used for communication between brokers.
inter.broker.listener.name=PLAINTEXT
# Listener name, hostname and port the broker will advertise to clients.
# If not set, it uses the value for "listeners".
advertised.listeners=PLAINTEXT://:9092
...
Modify them to look like this:
...
listeners=PLAINTEXT://kafka4.your_domain:9092
# Name of listener used for communication between brokers.
inter.broker.listener.name=PLAINTEXT
# Listener name, hostname and port the broker will advertise to clients.
# If not set, it uses the value for "listeners".
advertised.listeners=PLAINTEXT://kafka4.your_domain:9092
...
Because this node is not a controller, you remove the CONTROLLER reference from listeners. You also specify the broker’s domain name explicitly, so that clients have a resolvable address to connect to. You have to set each node’s own domain name this way on every node in your cluster.
You’ve now configured the basic parameters for connecting this node to the first one, forming a proper cluster. You’ll now configure the replication factors to ensure redundancy across the cluster.
In the same file, find the num.partitions setting:
...
num.partitions=1
...
Set it to 6 as you did for the other nodes as part of the prerequisites:
...
num.partitions=6
...
Next, find the following lines:
...
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
...
These parameters set the replication factor for the internal topics that track topic metadata (consumer offsets and transaction states). To ensure redundancy, set them to at least 2, while not exceeding the number of brokers in the cluster:
...
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=2
...
When you’re done, save and close the file. If you’ve modified the replication parameters shown above, you’ll have to do the same on each node of your cluster, as well as add the proper domain name to listeners.
After completing the configuration, you’ll have to reinitialize the log storage on the new node. First, delete the existing log files by running:
rm -rf /home/kafka/kafka-logs/*
On the first node, show the existing cluster ID by running:
cat /home/kafka/kafka-logs/meta.properties
The output will be similar to this:
...
node.id=1
directory.id=Mvbt8cwgTAwUGRTwMscO-g
version=1
cluster.id=i-hvtE_3Tg6RMKc3G6oOyg
Note the cluster.id. On the fourth node, store that ID in a variable named KAFKA_CLUSTER_ID:
KAFKA_CLUSTER_ID="your_cluster_id"
Then, create the log storage by running:
./bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties
The output will be similar to this:
Output
...
Formatting /home/kafka/kafka-logs with metadata.version 3.7-IV4.
With this, you’ve completely configured the fourth node. Restart Kafka by running:
sudo systemctl restart kafka
After a minute, check that the cluster has expanded by listing cluster metadata using kcat:
kcat -b kafka1.your_domain -L
The output will detail four nodes and topics:
Metadata for all topics (from broker 1: kafka1.your_domain:9092/1):
4 brokers:
broker 1 at kafka1.your_domain:9092
broker 2 at kafka2.your_domain:9092 (controller)
broker 3 at kafka3.your_domain:9092
broker 4 at kafka4.your_domain:9092
3 topics:
topic "new-topic" with 6 partitions:
partition 0, leader 3, replicas: 3, isrs: 3
partition 1, leader 1, replicas: 1, isrs: 1
partition 2, leader 2, replicas: 2, isrs: 2
partition 3, leader 3, replicas: 3, isrs: 3
partition 4, leader 2, replicas: 2, isrs: 2
partition 5, leader 1, replicas: 1, isrs: 1
topic "__CruiseControlMetrics" with 1 partitions:
partition 0, leader 1, replicas: 1, isrs: 1
topic "__consumer_offsets" with 50 partitions:
partition 0, leader 1, replicas: 1, isrs: 1
partition 1, leader 1, replicas: 1, isrs: 1
partition 2, leader 1, replicas: 1, isrs: 1
partition 3, leader 1, replicas: 1, isrs: 1
partition 4, leader 1, replicas: 1, isrs: 1
...
You’ve successfully added a new node to your Kafka cluster. You can repeat this step for as many nodes as you need, but they won’t be used automatically. You’ll now learn how to rearrange topics across the expanded cluster.
In this step, you’ll learn how to migrate topics between cluster nodes using Cruise Control, as well as manually using the provided script for rearranging partitions across brokers.
As part of the Kafka installation, you are provided with the kafka-reassign-partitions.sh script, which allows you to create and execute plans for moving topics between cluster brokers. In the previous step, you created new-topic, whose partitions are spread across only the first three brokers.
To make use of the new node, you’ll have to manually instruct Kafka to redistribute the topic across all four brokers. The script determines exactly from which broker(s) the partitions should be retrieved; you only have to specify the brokers on which they should be available.
The script accepts a JSON file detailing which topics to move. You’ll store the code in a file named topics-to-migrate.json. Create and open it for editing:
nano topics-to-migrate.json
Add the following lines:
{
"topics": [
{
"topic": "new-topic"
}
],
"version": 1
}
Here, you reference new-topic under topics. Save and close the file when you’re done.
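If you need to migrate many topics at once, writing this file by hand gets tedious, and you can generate it with a short script instead. The following Python sketch (illustrative, not part of the Kafka distribution) produces the same JSON structure:

```python
import json

# Topics you want to include in the reassignment; extend as needed.
topics = ["new-topic"]

# Build the structure kafka-reassign-partitions.sh expects.
doc = {"topics": [{"topic": t} for t in topics], "version": 1}

print(json.dumps(doc, indent=2))
```

Redirecting this output to topics-to-migrate.json yields the same file you created above.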
Since new-topic should be balanced across all brokers, run the script to generate the migration plan:
./bin/kafka-reassign-partitions.sh --bootstrap-server kafka1.your_domain:9092 --topics-to-move-json-file topics-to-migrate.json --broker-list "1,2,3,4" --generate
Here, you pass in topics-to-migrate.json and the list of brokers, instructing the script to --generate a new partition assignment.
The output will look like this:
Output
Current partition replica assignment
{"version":1,"partitions":[{"topic":"new-topic","partition":0,"replicas":[3],"log_dirs":["any"]},{"topic":"new-topic","partition":1,"replicas":[1],"log_dirs":["any"]},{"topic":"new-topic","partition":2,"replicas":[2],"log_dirs":["any"]},{"topic":"new-topic","partition":3,"replicas":[3],"log_dirs":["any"]},{"topic":"new-topic","partition":4,"replicas":[2],"log_dirs":["any"]},{"topic":"new-topic","partition":5,"replicas":[1],"log_dirs":["any"]}]}
Proposed partition reassignment configuration
{"version":1,"partitions":[{"topic":"new-topic","partition":0,"replicas":[3],"log_dirs":["any"]},{"topic":"new-topic","partition":1,"replicas":[4],"log_dirs":["any"]},{"topic":"new-topic","partition":2,"replicas":[1],"log_dirs":["any"]},{"topic":"new-topic","partition":3,"replicas":[2],"log_dirs":["any"]},{"topic":"new-topic","partition":4,"replicas":[3],"log_dirs":["any"]},{"topic":"new-topic","partition":5,"replicas":[4],"log_dirs":["any"]}]}
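Under the hood, --generate stripes partitions across the target brokers. The idea can be sketched in a few lines of Python; note that this is an illustrative round-robin assignment, not Kafka's exact algorithm, which also randomizes the starting broker:

```python
import json

def propose_reassignment(topic, num_partitions, brokers, replication_factor=1):
    # Stripe partitions across brokers round-robin; each replica of a
    # partition lands on the next broker in the list.
    partitions = []
    for p in range(num_partitions):
        replicas = [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        partitions.append({
            "topic": topic,
            "partition": p,
            "replicas": replicas,
            "log_dirs": ["any"] * replication_factor,
        })
    return {"version": 1, "partitions": partitions}

plan = propose_reassignment("new-topic", 6, [1, 2, 3, 4])
print(json.dumps(plan))
```

With six partitions and four brokers, every broker receives at least one partition, just as in the proposed layout above.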
The script generated the current and proposed partition layouts. You’ll store the second plan in a file named migration-plan.json, so create and open it for editing:
nano migration-plan.json
Add the proposed partition configuration:
{"version":1,"partitions":[{"topic":"new-topic","partition":0,"replicas":[3],"log_dirs":["any"]},{"topic":"new-topic","partition":1,"replicas":[4],"log_dirs":["any"]},{"topic":"new-topic","partition":2,"replicas":[1],"log_dirs":["any"]},{"topic":"new-topic","partition":3,"replicas":[2],"log_dirs":["any"]},{"topic":"new-topic","partition":4,"replicas":[3],"log_dirs":["any"]},{"topic":"new-topic","partition":5,"replicas":[4],"log_dirs":["any"]}]}
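Before executing a plan, it's worth sanity-checking how it spreads replicas across brokers. One quick way (an illustrative snippet, not part of the Kafka tooling) is to count replica placements per broker:

```python
import json
from collections import Counter

# The proposed plan from migration-plan.json.
plan = json.loads('{"version":1,"partitions":['
                  '{"topic":"new-topic","partition":0,"replicas":[3],"log_dirs":["any"]},'
                  '{"topic":"new-topic","partition":1,"replicas":[4],"log_dirs":["any"]},'
                  '{"topic":"new-topic","partition":2,"replicas":[1],"log_dirs":["any"]},'
                  '{"topic":"new-topic","partition":3,"replicas":[2],"log_dirs":["any"]},'
                  '{"topic":"new-topic","partition":4,"replicas":[3],"log_dirs":["any"]},'
                  '{"topic":"new-topic","partition":5,"replicas":[4],"log_dirs":["any"]}]}')

# Count how many partition replicas land on each broker.
replica_counts = Counter(b for p in plan["partitions"] for b in p["replicas"])
print(dict(replica_counts))
```

Each of the four brokers holds at least one partition, so the plan is reasonably balanced.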
Save and close the file, then apply it by running:
./bin/kafka-reassign-partitions.sh --bootstrap-server kafka1.your_domain:9092 --reassignment-json-file migration-plan.json --execute
The output will be similar to this:
Output
Current partition replica assignment
{"version":1,"partitions":[{"topic":"new-topic","partition":0,"replicas":[3],"log_dirs":["any"]},{"topic":"new-topic","partition":1,"replicas":[1],"log_dirs":["any"]},{"topic":"new-topic","partition":2,"replicas":[2],"log_dirs":["any"]},{"topic":"new-topic","partition":3,"replicas":[3],"log_dirs":["any"]},{"topic":"new-topic","partition":4,"replicas":[2],"log_dirs":["any"]},{"topic":"new-topic","partition":5,"replicas":[1],"log_dirs":["any"]}]}
Save this to use as the --reassignment-json-file option during rollback
Successfully started partition reassignments for new-topic-0,new-topic-1,new-topic-2,new-topic-3,new-topic-4,new-topic-5
The migration of partitions is now in progress, and you can monitor it by passing in --verify:
./bin/kafka-reassign-partitions.sh --bootstrap-server kafka1.your_domain:9092 --reassignment-json-file migration-plan.json --verify
Once the migration is done, the output will be:
Output
Status of partition reassignment:
Reassignment of partition new-topic-0 is completed.
Reassignment of partition new-topic-1 is completed.
Reassignment of partition new-topic-2 is completed.
Reassignment of partition new-topic-3 is completed.
Reassignment of partition new-topic-4 is completed.
Reassignment of partition new-topic-5 is completed.
Clearing broker-level throttles on brokers 1,2,3,4
Clearing topic-level throttles on topic new-topic
You can now list the layout of new-topic using kcat:
kcat -b kafka1.your_domain:9092 -L
You’ll see that it’s evenly spread across all four brokers:
Output
...
topic "new-topic" with 6 partitions:
partition 0, leader 3, replicas: 3, isrs: 3
partition 1, leader 4, replicas: 4, isrs: 4
partition 2, leader 1, replicas: 1, isrs: 1
partition 3, leader 2, replicas: 2, isrs: 2
partition 4, leader 3, replicas: 3, isrs: 3
partition 5, leader 4, replicas: 4, isrs: 4
...
You’ve seen how to use the included Kafka partition migration script to rebalance the partitions across brokers in the cluster. You’ll now learn how to balance Kafka brokers using Cruise Control automatically.
Alternatively, you can use Cruise Control to easily arrange all topics to any number of new brokers without having to manually specify them or generate accompanying plans.
As part of the prerequisites, you’ve expanded the capacity.json configuration file to cover the three nodes. Since you now have four, you’ll need to add the fourth broker to the file. Navigate to the cruise-control directory, then open the file for editing by running:
nano config/capacity.json
Add configuration for the fourth broker:
...
{
"brokerId": "3",
"capacity": {
"DISK": "500000",
"CPU": "100",
"NW_IN": "50000",
"NW_OUT": "50000"
},
"doc": ""
},
{
"brokerId": "4",
"capacity": {
"DISK": "500000",
"CPU": "100",
"NW_IN": "50000",
"NW_OUT": "50000"
},
"doc": ""
}
...
Save and close the file when you’re done. Then, run Cruise Control in a separate terminal:
./kafka-cruise-control-start.sh config/cruisecontrol.properties
Finally, add the broker with ID 4 using cccli by running:
cccli -a localhost:9090 add-broker 4
The output will be long and will look similar to this:
Output
Starting long-running poll of http://localhost:9090/kafkacruisecontrol/add_broker?brokerid=4&allow_capacity_estimation=False&dryrun=True
Optimization has 19 inter-broker replica(0 MB) moves, 0 intra-broker replica(0 MB) moves and 15 leadership moves with a cluster model of 1 recent windows and 100.000% of the partitions covered.
Excluded Topics: [].
Excluded Brokers For Leadership: [].
Excluded Brokers For Replica Move: [].
Counts: 4 brokers 185 replicas 5 topics.
On-demand Balancedness Score Before (71.541) After(71.541).
Provision Status: OVER_PROVISIONED.
Provision Recommendation: [RackAwareGoal] Remove at least 1 rack with brokers. [ReplicaDistributionGoal] Remove at least 4 brokers.
...
Cluster load after adding broker [4]:
...
As this is a cluster with minimal traffic, Cruise Control recommends downsizing it.
List cluster info using kcat again, and you’ll see that new-topic has been balanced across the brokers with no manual input necessary:
Output
...
topic "new-topic" with 6 partitions:
partition 0, leader 3, replicas: 3, isrs: 3
partition 1, leader 4, replicas: 4, isrs: 4
partition 2, leader 2, replicas: 2, isrs: 2
partition 3, leader 3, replicas: 3, isrs: 3
partition 4, leader 2, replicas: 2, isrs: 2
partition 5, leader 4, replicas: 4, isrs: 4
...
In this section, you’ve seen how to migrate topic partitions to new brokers in the cluster. You can manually specify the topics to rearrange and the brokers where they should be available, and pass them to the provided kafka-reassign-partitions.sh script. You can also rely on Cruise Control, which automatically rearranges all topics to the new broker with no further input necessary; you just have to provide the ID of the new broker. In the next step, you’ll learn how to process events in Kafka using an SQL-like syntax with ksqlDB.
In this step, you’ll deploy ksqlDB and its CLI shell using Docker Compose. You’ll use it to aggregate a stream of events representing temperature measurements from multiple sensors that report to a Kafka topic.
In comparison to Kafka Streams, a Java client library providing an API for accepting and processing events from Kafka in your applications, ksqlDB is a database that lets you easily process events from Kafka topics using an SQL-like syntax. ksqlDB uses Kafka Streams internally and abstracts away much of the programming required to achieve the same functionality manually.
You’ll store the Docker Compose configuration in a file named ksqldb-compose.yaml. Create and open it for editing by running:
nano ~/ksqldb-compose.yaml
Add the following lines:
services:
ksqldb:
image: confluentinc/ksqldb-server:latest
hostname: ksqldb
ports:
- "8088:8088"
healthcheck:
test: curl -f http://ksqldb:8088/ || exit 1
environment:
KSQL_LISTENERS: http://0.0.0.0:8088
KSQL_BOOTSTRAP_SERVERS: kafka1.your_domain:9092
KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true"
KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true"
ksqldb-cli:
image: confluentinc/ksqldb-cli:latest
container_name: ksqldb-cli
depends_on:
- ksqldb
entrypoint: /bin/sh
tty: true
You define two services, ksqldb and ksqldb-cli, which represent the database and its CLI interface, respectively. You configure a health check for the database that polls its exposed port (8088), and you point it at your Kafka cluster through the KSQL_BOOTSTRAP_SERVERS parameter. You also configure it to automatically create streams and topics if they do not exist, which you’ll learn about in this step.
Remember to replace kafka1.your_domain with the actual domain name, then save and close the file.
Create the Compose deployment by running:
docker compose -f ~/ksqldb-compose.yaml up -d
The end of the output will be similar to this:
Output
...
[+] Running 3/3
✔ Network root_default Created 0.1s
✔ Container root-ksqldb-1 Started 0.4s
✔ Container ksqldb-cli Started 0.5s
With the ksqlDB server now running, enter the CLI by running:
docker exec -it ksqldb-cli ksql http://ksqldb:8088
You’ll enter the CLI shell, from which you can query and configure the database:
Output
...
===========================================
= _ _ ____ ____ =
= | | _____ __ _| | _ \| __ ) =
= | |/ / __|/ _` | | | | | _ \ =
= | <\__ \ (_| | | |_| | |_) | =
= |_|\_\___/\__, |_|____/|____/ =
= |_| =
= The Database purpose-built =
= for stream processing apps =
===========================================
Copyright 2017-2022 Confluent Inc.
CLI v0.28.2, Server v0.28.2 located at http://ksqldb:8088
Server Status: RUNNING
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
ksql>
By default, ksqlDB won’t fetch topic messages from the beginning when accessing them for the first time. To remedy this, set the auto.offset.reset configuration parameter to earliest by running:
SET 'auto.offset.reset'='earliest';
You’ll get the following output:
Output
Successfully changed local property 'auto.offset.reset' to 'earliest'. Use the UNSET command to revert your change.
As the database is directly connected to your Kafka cluster, you can list the topics by running:
list topics;
The output will detail their names, partitions, and replicas:
Output
 Kafka Topic | Partitions | Partition Replicas
------------------------------------------------------------------------------
__CruiseControlMetrics | 1 | 1
__KafkaCruiseControlModelTrainingSamples | 32 | 2
__KafkaCruiseControlPartitionMetricSamples | 32 | 2
default_ksql_processing_log | 1 | 1
new-topic | 6 | 1
------------------------------------------------------------------------------
Now that you’ve tested the connection to your Kafka cluster, you’ll learn how to use ksqlDB and its streaming functionality.
Classic relational databases are based on the concept of a table, which contains rows partitioned into columns that hold data in a structured way, and the table itself presents the current state. ksqlDB bridges event streaming and the traditional table model with streams, which represent a series of events signifying data creation, modification, or removal.
Streams are based on Kafka topics, and their difference is in the semantic interpretation of the data they hold. While a table shows the latest state of data, the accompanying stream allows you to replay the events that led up to it.
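The stream/table duality can be illustrated with a toy sketch in plain Python (hypothetical data, no Kafka involved): the stream view retains every event so it can be replayed, while the table view keeps only the latest state per key:

```python
# The stream view: every event is retained and can be replayed.
events = [
    {"sensor": "first", "temp": 100.0},
    {"sensor": "first", "temp": 120.0},
    {"sensor": "second", "temp": 200.0},
]

# The table view: replaying the events yields one row per key,
# holding only the latest value.
table = {}
for e in events:
    table[e["sensor"]] = e["temp"]

print(table)
```

The stream holds three events, but the table collapses them to one row per sensor.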
You’ll create and interact with a stream holding temperature measurements from sensors. To create it, run the following command in ksqlDB CLI:
CREATE STREAM MEASUREMENTS (measurement_id VARCHAR, sensor_name VARCHAR, temperature DOUBLE)
WITH (VALUE_FORMAT='JSON', PARTITIONS=2, KAFKA_TOPIC='MEASUREMENTS');
Here, you CREATE a STREAM called MEASUREMENTS. Then, similarly to SQL, you specify the fields, namely measurement_id as the event ID, as well as sensor_name and the temperature reading itself. You also specify that the underlying data should be represented as JSON in a topic called MEASUREMENTS. Thanks to the Docker Compose configuration from earlier, the topic will be created in Kafka automatically.
The output will look like this:
Output
 Message
----------------
Stream created
----------------
You can now list all streams to confirm that it’s been created:
list streams;
The output will be similar to this:
Output
 Stream Name | Kafka Topic | Key Format | Value Format | Windowed
------------------------------------------------------------------------------------------
KSQL_PROCESSING_LOG | default_ksql_processing_log | KAFKA | JSON | false
MEASUREMENTS | MEASUREMENTS | KAFKA | JSON | false
------------------------------------------------------------------------------------------
As with a standard database, you can insert data into the stream:
INSERT INTO MEASUREMENTS (measurement_id, sensor_name, temperature) VALUES ('1', 'first', 100.0);
There will be no output upon a successful INSERT. Then, retrieve all events from the stream by running:
SELECT * FROM MEASUREMENTS EMIT CHANGES;
As streams are based on Kafka topics, by specifying EMIT CHANGES, you instruct the database to stream events from the underlying topic in real time. You’ll see it retrieve the entry you’ve just made:
Output
+-------------------------------+-------------------------------+-------------------------------+
|MEASUREMENT_ID |SENSOR_NAME |TEMPERATURE |
+-------------------------------+-------------------------------+-------------------------------+
|1 |first |100.0 |
In a separate terminal, retrieve all messages from the MEASUREMENTS topic to see what the underlying data looks like:
./bin/kafka-console-consumer.sh --bootstrap-server kafka1.your_domain:9092 --topic MEASUREMENTS --from-beginning
You’ll receive the following output, showing the entry as JSON:
Output{"MEASUREMENT_ID":"1","SENSOR_NAME":"first","TEMPERATURE":100.0}
To verify that ksqlDB is actually streaming from the topic, first run the kafka-console-producer.sh script:
./bin/kafka-console-producer.sh --bootstrap-server kafka1.your_domain:9092 --topic MEASUREMENTS
Input the following message and press ENTER to produce it:
{"MEASUREMENT_ID":"2","SENSOR_NAME":"first","TEMPERATURE":120.0}
Then, return to the ksqlDB CLI. You’ll see that the new measurement is shown immediately:
Output
+-------------------------------+-------------------------------+-------------------------------+
|MEASUREMENT_ID |SENSOR_NAME |TEMPERATURE |
+-------------------------------+-------------------------------+-------------------------------+
|1 |first |100.0 |
|2 |first |120.0 |
Press CTRL+C to exit the live query mode.
You have now populated the MEASUREMENTS stream with two events. You’ll now create a table based on it.
While streams represent the unbounded data that comes in through Kafka, tables allow you to summarize and retrieve aggregated data, akin to SQL views. You’ll now create a table called MEASUREMENT_MAX_TEMP that will hold each sensor’s maximum recorded temperature. Run the following code to create it:
CREATE TABLE MEASUREMENT_MAX_TEMP AS SELECT sensor_name, MAX(TEMPERATURE) AS max_sensor_temp
FROM MEASUREMENTS GROUP BY sensor_name;
You define which fields and what transformations should be represented (sensor_name and the maximum temperature, respectively). The output will be similar to this:
Output
 Message
---------------------------------------------------
Created query with ID CTAS_MEASUREMENT_MAX_TEMP_5
---------------------------------------------------
ksqlDB has created an underlying query to retrieve the requested data.
You can now SELECT from the table by running:
select * from MEASUREMENT_MAX_TEMP EMIT CHANGES;
The output will be:
Output
+--------------------------------------------+--------------------------------------------+
|SENSOR_NAME |MAX_SENSOR_TEMP |
+--------------------------------------------+--------------------------------------------+
|first |120.0 |
In the secondary terminal, input the following message into the console producer script, representing a temperature measurement from a different sensor:
{"MEASUREMENT_ID":"3","SENSOR_NAME":"second","TEMPERATURE":200.0}
Return to the database CLI, and you’ll see the new measurement streamed into the table:
Output
+--------------------------------------------+--------------------------------------------+
|SENSOR_NAME |MAX_SENSOR_TEMP |
+--------------------------------------------+--------------------------------------------+
|first |120.0 |
|second |200.0 |
Then, produce another event for the second sensor, but with a higher temperature:
{"MEASUREMENT_ID":"4","SENSOR_NAME":"second","TEMPERATURE":300.0}
You’ll see that the change is represented by a new event:
Output
+--------------------------------------------+--------------------------------------------+
|SENSOR_NAME |MAX_SENSOR_TEMP |
+--------------------------------------------+--------------------------------------------+
|first |120.0 |
|second |200.0 |
|second |300.0 |
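Conceptually, the table maintains a running maximum per sensor and emits a changelog row whenever a maximum changes, which is why you see two rows for the second sensor. A minimal Python sketch of that behavior (illustrative only, not how ksqlDB is implemented internally):

```python
def max_temp_changelog(events):
    # Track the running max per sensor; emit a changelog row
    # every time a sensor's max changes.
    maxes, changelog = {}, []
    for e in events:
        sensor, temp = e["SENSOR_NAME"], e["TEMPERATURE"]
        if sensor not in maxes or temp > maxes[sensor]:
            maxes[sensor] = temp
            changelog.append((sensor, temp))
    return changelog

# The four measurements produced in this step.
events = [
    {"MEASUREMENT_ID": "1", "SENSOR_NAME": "first", "TEMPERATURE": 100.0},
    {"MEASUREMENT_ID": "2", "SENSOR_NAME": "first", "TEMPERATURE": 120.0},
    {"MEASUREMENT_ID": "3", "SENSOR_NAME": "second", "TEMPERATURE": 200.0},
    {"MEASUREMENT_ID": "4", "SENSOR_NAME": "second", "TEMPERATURE": 300.0},
]
print(max_temp_changelog(events))
```

A measurement that does not exceed its sensor's current maximum would produce no changelog row at all.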
In this step, you’ve learned how to deploy and configure ksqlDB using Docker Compose. You’ve also seen how to create streams based on Kafka topics and run aggregations on them with tables.
In this tutorial, you’ve seen how to rearrange existing Kafka topics across a cluster that’s just been expanded with a new node. You’ve seen how to accomplish that manually using the provided script, as well as automatically with Cruise Control. You’ve also learned how to deploy ksqlDB and seen how to aggregate and structure events stored in topics.
The author selected Free and Open Source Fund to receive a donation as part of the Write for DOnations program.