Felpfe Inc.
Search
Close this search box.
call 24/7

+484 237-1364‬

Search
Close this search box.

Performing aggregations and grouping operations

Introduction to Aggregations and Grouping Operations

In this section, we will explore aggregations and grouping operations in Kafka Streams. Aggregations allow for computing summary statistics and aggregating data based on specific keys or criteria, while grouping operations enable grouping data based on certain attributes or key values.

Topics covered in this section:

  1. Introduction to aggregations and grouping operations in Kafka Streams.
  2. Aggregating data using different functions: count, sum, average, etc.
  3. Grouping data based on keys or attributes.
  4. Windowed aggregations and time-based grouping operations.
  5. Handling late events and out-of-order data in aggregations.

Code Sample: Performing Aggregations and Grouping in Kafka Streams

Java
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;
import java.util.Properties;

public class KafkaStreamsAggregationsExample {

    public static void main(String[] args) {
        // Configure Kafka Streams application
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Create Kafka Streams builder
        StreamsBuilder builder = new StreamsBuilder();

        // Define the processing logic
        KStream<String, Integer> inputStream = builder.stream("input_topic");
        KTable<String, Long> countTable = inputStream
                .groupBy((key, value) -> key)
                .count();

        KGroupedTable<String, Integer> groupedTable = inputStream
                .groupBy((key, value) -> key);
        KTable<String, Double> averageTable = groupedTable
                .aggregate(
                        () -> new SumCount(0, 0),
                        (key, value, aggregate) -> new SumCount(aggregate.getSum() + value, aggregate.getCount() + 1),
                        Materialized.as("average_state_store"))
                .mapValues(sumCount -> (double) sumCount.getSum() / sumCount.getCount());

        // Write the aggregated count data to an output topic
        countTable.toStream().to("count_output_topic");

        // Write the average data to another output topic
        averageTable.toStream().to("average_output_topic");

        // Build and start the Kafka Streams application
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Gracefully shutdown the application on termination
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

class SumCount {
    private int sum;
    private int count;

    public SumCount(int sum, int count) {
        this.sum = sum;
        this.count = count;
    }

    public int getSum() {
        return sum;
    }

    public int getCount() {
        return count;
    }
}

Reference Link:

  • Kafka Streams documentation on aggregations and grouping: link

Helpful Video:

  • “Kafka Streams Aggregations” by Confluent: link

:Windowed Aggregations and Time-Based Grouping Operations

In this section, we will delve into windowed aggregations and time-based grouping operations in Kafka Streams. Windowed aggregations enable computing summary statistics over fixed-size time windows, while time-based grouping operations allow for grouping data based on time-based criteria.

Topics covered in this section:

  1. Introduction to window

ed aggregations in stream processing.

  1. Configuring window parameters for windowed aggregations.
  2. Time-based grouping operations and their use cases.
  3. Handling late events and out-of-order data in windowed aggregations.
  4. Performance considerations for windowed operations and time-based grouping.

Code Sample: Windowed Aggregations and Time-Based Grouping in Kafka Streams

Java
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;
import java.util.Properties;
import java.time.Duration;

public class KafkaStreamsWindowedAggregationsExample {

    public static void main(String[] args) {
        // Configure Kafka Streams application
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Create Kafka Streams builder
        StreamsBuilder builder = new StreamsBuilder();

        // Define the processing logic
        KStream<String, Integer> inputStream = builder.stream("input_topic");
        TimeWindowedKStream<String, Integer> windowedStream = inputStream
                .groupBy((key, value) -> key)
                .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
                .reduce((value1, value2) -> value1 + value2);

        // Write the windowed aggregations to an output topic
        windowedStream.toStream()
                .foreach((key, value) -> System.out.println("Window: " + key + ", Sum: " + value));

        // Build and start the Kafka Streams application
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Gracefully shutdown the application on termination
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

Reference Link:

  • Kafka Streams documentation on windowed aggregations: link

Helpful Video:

  • “Kafka Streams Windowed Aggregations” by DataCumulus: link

Conclusion:
In this module, we explored performing aggregations and grouping operations in Kafka Streams. Aggregations allow for computing summary statistics and grouping operations enable organizing data based on specific keys or attributes. By leveraging aggregations and grouping, you can derive valuable insights from real-time data.

The provided code samples and reference links empower you to perform aggregations and grouping operations in your Kafka Streams applications. You can compute counts, sums, averages, and more, and group data based on keys or attributes. Additionally, windowed aggregations enable computations over fixed-size time windows, providing temporal insights.

By mastering aggregations and grouping in Kafka Streams, you can build powerful stream processing pipelines that efficiently summarize and organize real-time data. Kafka Streams offers a robust framework for performing aggregations and grouping operations, making it a suitable choice for scalable and fault-tolerant stream processing applications.

About Author
Ozzie Feliciano CTO @ Felpfe Inc.

Ozzie Feliciano is a highly experienced technologist with a remarkable twenty-three years of expertise in the technology industry.

kafka-logo-tall-apache-kafka-fel
Stream Dream: Diving into Kafka Streams
In “Stream Dream: Diving into Kafka Streams,”...
ksql
Talking in Streams: KSQL for the SQL Lovers
“Talking in Streams: KSQL for the SQL Lovers”...
spring_cloud
Stream Symphony: Real-time Wizardry with Spring Cloud Stream Orchestration
Description: The blog post, “Stream Symphony:...
1_GVb-mYlEyq_L35dg7TEN2w
Kafka Chronicles: Saga of Resilient Microservices Communication with Spring Cloud Stream
“Kafka Chronicles: Saga of Resilient Microservices...
kafka-logo-tall-apache-kafka-fel
Tackling Security in Kafka: A Comprehensive Guide on Authentication and Authorization
As the usage of Apache Kafka continues to grow in organizations...
1 2 3 58
90's, 2000's and Today's Hits
Decades of Hits, One Station

Listen to the greatest hits of the 90s, 2000s and Today. Now on TuneIn. Listen while you code.