Felpfe Inc.
Search
Close this search box.
call 24/7

+484 237-1364‬

Search
Close this search box.

Joins and windowed joins in Kafka Streams

Introduction to Joins in Kafka Streams

In this section, we will explore joins and windowed joins in Kafka Streams. Joining allows for combining data from multiple streams based on common keys or criteria, enabling powerful data correlation and analysis. Windowed joins extend this capability by considering time-based windows for more sophisticated analysis.

Topics covered in this section:

  1. Introduction to joins and windowed joins in Kafka Streams.
  2. Inner joins, left joins, and outer joins.
  3. Joining streams with KTables and GlobalKTables.
  4. Handling late events and out-of-order data in joins.
  5. Configuring join parameters for optimal results.

Code Sample: Inner Join and Windowed Join in Kafka Streams

Java
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;
import org.apache.kafka.common.serialization.*;
import java.util.Properties;
import java.time.Duration;

public class KafkaStreamsJoinExample {

    public static void main(String[] args) {
        // Configure Kafka Streams application
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Create Kafka Streams builder
        StreamsBuilder builder = new StreamsBuilder();

        // Define the processing logic
        KStream<String, String> stream1 = builder.stream("input_topic_1");
        KStream<String, String> stream2 = builder.stream("input_topic_2");

        // Inner Join
        KStream<String, String> innerJoinStream = stream1
                .join(
                        stream2,
                        (value1, value2) -> value1 + ", " + value2,
                        JoinWindows.of(Duration.ofMinutes(5)),
                        Joined.with(Serdes.String(), Serdes.String(), Serdes.String())
                );

        // Windowed Join
        KStream<Windowed<String>, String> windowedJoinStream = stream1
                .join(
                        stream2,
                        (value1, value2) -> value1 + ", " + value2,
                        JoinWindows.of(Duration.ofMinutes(10)),
                        StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String())
                );

        // Write the joined data to an output topic
        innerJoinStream.to("inner_join_output_topic");
        windowedJoinStream.to("windowed_join_output_topic");

        // Build and start the Kafka Streams application
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Gracefully shutdown the application on termination
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

Reference Link:

  • Kafka Streams documentation on joins and windowed joins: link

Helpful Video:

  • “Kafka Streams Joins” by Confluent: link

Benefits and Considerations of Joins and Windowed Joins

In this section, we will discuss the benefits and considerations of joins and windowed joins in Kafka Streams. Understanding the advantages and trade-offs of different join strategies and windowed joins allows for making informed decisions when designing stream processing applications.

Topics covered in this section:

  1. Advantages and use cases of joins in stream processing.
  2. Choosing the appropriate join strategy: inner join, left join, or outer join.
  3. Windowed joins and their applications in time-based analysis.
  4. Handling late events and out-of-order data in joins.
  5. Performance considerations and trade-offs of join operations.

Code Sample: Left Join and Windowed Join with Late Data Handling

Java
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;
import org.apache.kafka.common.serialization.*;
import java.util.Properties;
import java.time.Duration;

public class KafkaStreamsJoinExample {

    public static void main(String[] args) {
        // Configure Kafka Streams application
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Create Kafka Streams builder
        StreamsBuilder builder = new StreamsBuilder();

        // Define the processing logic
        KStream<String, String> stream1 = builder.stream("input_topic_1");
        KStream<String, String> stream2 = builder.stream("input_topic_2");

        // Left Join
        KStream<String, String> leftJoinStream = stream1
                .leftJoin(
                        stream2,
                        (value1, value2) -> value1 + ", " + value2,
                        JoinWindows.of(Duration.ofMinutes(5)),
                        StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String())
                );

        // Windowed Join with Late Data Handling
        KStream<Windowed<String>, String> windowedJoinStream = stream1
                .leftJoin(
                        stream2,
                        (value1, value2) -> value1 + ", " + value2,
                        JoinWindows.of(Duration.ofMinutes(10)).grace(Duration.ofMinutes(2)),
                        StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String())
                );

        // Write the joined data to an output topic
        leftJoinStream.to("left_join_output_topic");
        windowedJoinStream.to("windowed_join_output_topic");

        // Build and start the Kafka Streams application
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Gracefully shutdown the application on termination
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

Reference Link:

  • Kafka Streams documentation on joins and windowed joins: link

Helpful Video:

  • “Kafka Streams – Join Operations” by DataCumulus: link

Conclusion:
In this module, we explored joins and windowed joins in Kafka Streams. Joins allow for combining data from multiple streams based on common keys, enabling powerful data correlation and analysis. Windowed joins extend this capability by considering time-based windows for more sophisticated analysis.

By leveraging the code samples and reference links provided, you can implement inner joins, left joins, and outer joins in your Kafka Streams applications. Additionally, windowed joins enable temporal analysis by performing joins over fixed-size time windows. Considerations such as late events and out-of-order data handling are crucial when designing join operations.

Joins and windowed joins in Kafka Streams empower you to build real-time stream processing pipelines that integrate and analyze data from multiple sources. With the ability to correlate data based on keys or temporal

windows, you can derive valuable insights and make data-driven decisions. Kafka Streams provides a scalable and fault-tolerant framework for efficient and accurate join operations.

About Author
Ozzie Feliciano CTO @ Felpfe Inc.

Ozzie Feliciano is a highly experienced technologist with a remarkable twenty-three years of expertise in the technology industry.

kafka-logo-tall-apache-kafka-fel
Stream Dream: Diving into Kafka Streams
In “Stream Dream: Diving into Kafka Streams,”...
ksql
Talking in Streams: KSQL for the SQL Lovers
“Talking in Streams: KSQL for the SQL Lovers”...
spring_cloud
Stream Symphony: Real-time Wizardry with Spring Cloud Stream Orchestration
Description: The blog post, “Stream Symphony:...
1_GVb-mYlEyq_L35dg7TEN2w
Kafka Chronicles: Saga of Resilient Microservices Communication with Spring Cloud Stream
“Kafka Chronicles: Saga of Resilient Microservices...
kafka-logo-tall-apache-kafka-fel
Tackling Security in Kafka: A Comprehensive Guide on Authentication and Authorization
As the usage of Apache Kafka continues to grow in organizations...
1 2 3 58
90's, 2000's and Today's Hits
Decades of Hits, One Station

Listen to the greatest hits of the 90s, 2000s and Today. Now on TuneIn. Listen while you code.