Wednesday, November 24, 2021

Rate-limit Kafka event generation with kcat and bash

Traffic for event streams
Recently, I worked with IBM Cloud Event Streams, a message bus built on Apache Kafka. I was looking for a simple command-line tool to test my Event Streams instance and to stream access logs into it. That's when I ran into kcat (formerly known as kafkacat). It is a generic command-line Kafka producer and consumer, and it is easy to install: just use a Docker image. Everything worked well; I could even read a file of historic Apache access logs and send it over line by line. What was still missing was a way to throttle the transmission, that is, to control how many lines are sent and how fast. I solved that with a bash script.

Bash scripting to the rescue

Using kcat, you can send a file line by line just by running this command:

kcat -P -t mytopic -l myfile

"-P" tells kcat to work as a producer, "-t mytopic" specifies the Kafka topic, and "-l" treats each line of the file as an individual message. That works nicely, but I wanted to send only a few messages per second and have more control over the pacing. After some research on Stack Overflow and elsewhere, I settled on this bash script as a workaround:

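#!/bin/bash
# Usage: rate_limit.sh filename lines wait
# Prints the file line by line and pauses WAIT seconds after every LINES lines.
if [ "$#" -ne 3 ]; then
   # Write to stderr so the message is not piped into kcat as a Kafka message
   echo "usage: $0 filename lines wait" >&2
   exit 1
fi
INPUT_FILE=$1
NUM_LINES=$2
WAIT_SECONDS=$3
COUNTER=0
# Exit with status 130 (128 + SIGINT) when Ctrl+C is pressed
trap 'exit 130' INT
# Read from file descriptor 3; IFS= and -r keep each line exactly as written,
# and the -n test also handles a last line without a trailing newline
while IFS= read -r -u3 line || [ -n "$line" ]; do
   printf '%s\n' "$line"
   ((COUNTER++))
   if (( COUNTER == NUM_LINES )); then
       sleep "$WAIT_SECONDS"
       COUNTER=0
   fi
done 3< "$INPUT_FILE"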
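Reading a line as `read input_text rest` and printing it with an unquoted `echo` does not reproduce the line exactly: runs of spaces collapse into one and, without `-r`, backslashes are dropped. The small comparison below is a sketch using a made-up access-log line (the variable names are hypothetical); it contrasts that pattern with `IFS= read -r` plus quoted output, which keeps the line unchanged.

```shell
#!/bin/bash
# Made-up access-log line with a double space and a backslash (hypothetical sample)
sample='10.0.0.1 - -  [24/Nov/2021:10:00:00 +0000] "GET /a\b HTTP/1.1" 200 512'

# Split into two fields without -r, then echo unquoted:
# the backslash is consumed by read and the double space is collapsed by word splitting
read input_text rest <<< "$sample"
split_version=$(echo $input_text $rest)

# Keep the whole line: empty IFS, no backslash processing, quoted output
IFS= read -r line <<< "$sample"
raw_version=$(printf '%s\n' "$line")

printf 'original: %s\n' "$sample"
printf 'split:    %s\n' "$split_version"
printf 'raw:      %s\n' "$raw_version"
```

For access logs that are forwarded to Kafka, the quoted variant is the safer default, because consumers then see exactly what the web server wrote.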


The script ("rate_limit.sh") expects three arguments: the filename, how many lines to send at once, and the number of seconds to wait in between. Piping its output into kcat gives this command:

bash rate_limit.sh myfile 10 2 | kcat -P -t mytopic

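It sends 10 lines, then waits 2 seconds before continuing, so on average at most 5 messages per second go out, in bursts of 10. The tricky part was the line "trap 'exit 130' INT", which I needed so that pressing Ctrl+C stops the transmission. The exit status 130 follows the shell convention of 128 plus the signal number (SIGINT is signal 2).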
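To check the pacing before pointing the pipeline at a real Kafka cluster, you can replace kcat with a small reader that timestamps each line it receives. The sketch below reimplements the same batching logic as a hypothetical `rate_limit` function so that it is self-contained; the sample input and the output file `received.txt` are also made up for illustration.

```shell
#!/bin/bash
# rate_limit: hypothetical function version of rate_limit.sh for a dry run.
# Arguments: FILE LINES WAIT_SECONDS
rate_limit() {
   local counter=0 line
   while IFS= read -r line || [ -n "$line" ]; do
      printf '%s\n' "$line"
      counter=$((counter + 1))
      if (( counter == $2 )); then
         sleep "$3"
         counter=0
      fi
   done < "$1"
}

# Hypothetical sample input: six numbered lines in a temporary file
sample=$(mktemp)
seq 1 6 > "$sample"

# Stand-in for kcat: prefix every received line with the wall-clock time
start=$SECONDS
rate_limit "$sample" 2 1 | while IFS= read -r msg; do
   printf '%s %s\n' "$(date +%T)" "$msg"
done > received.txt
elapsed=$((SECONDS - start))
rm -f "$sample"

cat received.txt
echo "elapsed: ${elapsed}s"
```

The timestamps should show lines arriving two at a time, one second apart. Note that the loop also sleeps after the final batch when the number of lines is an exact multiple of the batch size, so this run takes about three seconds rather than two.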

Conclusions

There are many great open source tools available. Sometimes it takes a little extra shell scripting to work around a missing feature.

If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik) or LinkedIn.