spring-kafka-docs/src/main/antora/modules/ROOT/pages/streams.adoc
43 additions & 2 deletions
@@ -363,8 +363,6 @@ public <T> T retrieveQueryableStore(String storeName, QueryableStoreType<T> stor
363
363
364
364
When calling this method, the user can specifically ask for the proper state store type, as we have done in the above example.
365
365
366
-
NOTE: `KafkaStreamsInteractiveQueryService` API in Spring for Apache Kafka only supports providing access to local key-value stores at the moment.
367
-
368
366
=== Retrying State Store Retrieval
369
367
370
368
When trying to retrieve the state store using the `KafkaStreamsInteractiveQueryService`, there is a chance that the state store might not be found for various reasons.
@@ -388,6 +386,49 @@ public KafkaStreamsInteractiveQueryService kafkaStreamsInteractiveQueryService(S
388
386
}
389
387
----
390
388
389
+
=== Querying Remote State Stores
390
+
391
+
The API shown above for retrieving the state store, `retrieveQueryableStore`, is intended for locally available key-value state stores.
392
+
In production settings, Kafka Streams applications are most likely distributed based on the number of partitions.
393
+
If a topic has four partitions and there are four instances of the same Kafka Streams processor running, then each instance may be responsible for processing a single partition from the topic.
394
+
In this scenario, calling `retrieveQueryableStore` may not give the correct result that an instance is looking for, although it might return a valid store.
395
+
Let's assume that the topic with four partitions has data about various keys and a single partition is always responsible for a specific key.
396
+
If the instance that is calling `retrieveQueryableStore` is looking for information about a key that this instance does not host, then it will not receive any data.
397
+
This is because the current Kafka Streams instance does not know anything about this key.
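The situation described above can be illustrated with a small, self-contained sketch.
It uses no Kafka APIs: `KeyOwnershipSketch`, `Instance`, and `partitionFor` are hypothetical names, and the modulo-based partitioning is a simplification (Kafka's default partitioner hashes the serialized key instead).
The sketch only shows why a local lookup on an instance that does not host the key returns nothing.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model, not Kafka code: four instances, each owning one partition.
final class KeyOwnershipSketch {

    static final int NUM_PARTITIONS = 4;

    // Assumption: a plain modulo stands in for Kafka's real partitioner.
    static int partitionFor(int key) {
        return Math.floorMod(Integer.hashCode(key), NUM_PARTITIONS);
    }

    // One running processor instance with its local key-value state.
    static final class Instance {
        private final int assignedPartition;
        private final Map<Integer, String> localStore = new HashMap<>();

        Instance(int assignedPartition) {
            this.assignedPartition = assignedPartition;
        }

        // Keeps state only for records that belong to the assigned partition.
        void process(int key, String value) {
            if (partitionFor(key) == assignedPartition) {
                localStore.put(key, value);
            }
        }

        // Local lookup: returns null when another instance hosts the key.
        String queryLocal(int key) {
            return localStore.get(key);
        }
    }

    public static void main(String[] args) {
        Instance[] instances = new Instance[NUM_PARTITIONS];
        for (int p = 0; p < NUM_PARTITIONS; p++) {
            instances[p] = new Instance(p);
            instances[p].process(12345, "value-for-12345");
        }
        // Only the instance that owns the key's partition prints a non-null value.
        for (int p = 0; p < NUM_PARTITIONS; p++) {
            System.out.println("instance " + p + " -> " + instances[p].queryLocal(12345));
        }
    }
}
```

In this model, every instance exposes a perfectly valid local store, but only one of them can answer a query for a given key.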
398
+
To fix this, the calling instance first needs to make sure that it has the host information for the Kafka Streams processor instance where the particular key is hosted.
399
+
This information can be retrieved from any Kafka Streams instance under the same `application.id`, as shown below.
[source, java]
----
HostInfo kafkaStreamsApplicationHostInfo = this.interactiveQueryService.getKafkaStreamsApplicationHostInfo("app-store", 12345, new IntegerSerializer());
407
+
----
408
+
409
+
In the example code above, the calling instance is querying for a particular key `12345` from the state-store named `app-store`.
410
+
The API also needs a corresponding key serializer, which in this case is the `IntegerSerializer`.
411
+
Kafka Streams looks through all its instances under the same `application.id` and tries to find which instance hosts this particular key.
412
+
Once found, it returns that host information as a `HostInfo` object.
413
+
414
+
This is what the API looks like:
415
+
416
+
[source, java]
417
+
----
418
+
public <K> HostInfo getKafkaStreamsApplicationHostInfo(String store, K key, Serializer<K> serializer)
419
+
----
420
+
421
+
When running multiple instances of a Kafka Streams processor under the same `application.id` in a distributed way like this, the application is expected to provide an RPC layer through which the state stores can be queried, for example a REST endpoint.
422
+
See this https://kafka.apache.org/36/documentation/streams/developer-guide/interactive-queries.html#querying-remote-state-stores-for-the-entire-app[article] for more details.
423
+
When using Spring for Apache Kafka, it is straightforward to add a Spring-based REST endpoint by using the spring-web technologies.
424
+
Once there is a REST endpoint, it can be used to query the state stores from any Kafka Streams instance, provided that the instance knows the `HostInfo` of the host where the key resides.
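As a rough sketch of the client side of such an RPC layer, the host and port carried by a `HostInfo` can be turned into an HTTP request with the JDK's `java.net.http` API.
The `/state/{store}/{key}` path, the `RemoteQuerySketch` class, and the `host-b:8080` address are illustrative assumptions; the actual endpoint is whatever the application chooses to expose.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds (but does not send) a GET request to a remote instance.
final class RemoteQuerySketch {

    // Hypothetical endpoint layout: http://{host}:{port}/state/{store}/{key}
    static URI remoteStoreUri(String host, int port, String store, Object key) {
        return URI.create("http://" + host + ":" + port + "/state/" + store + "/" + key);
    }

    // In a real application, host and port would come from HostInfo#host() and HostInfo#port().
    static HttpRequest remoteStoreRequest(String host, int port, String store, Object key) {
        return HttpRequest.newBuilder(remoteStoreUri(host, port, store, key)).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest request = remoteStoreRequest("host-b", 8080, "app-store", 12345);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, or with any other HTTP client the application already uses.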
425
+
426
+
If the instance hosting the key is the current instance, then the application does not need to call the RPC mechanism and can make an in-JVM call instead.
427
+
However, the trouble is that an application may not know whether the instance making the call is the one hosting the key, because a particular server may lose a partition due to a consumer rebalance.
428
+
To fix this issue, `KafkaStreamsInteractiveQueryService` provides a convenient API method, `getCurrentKafkaStreamsApplicationHostInfo()`, that returns the `HostInfo` of the current instance.
429
+
The idea is that the application can first acquire information about where the key is held, and then compare the `HostInfo` with the one about the current instance.
430
+
If the two `HostInfo` values match, it can proceed with a simple in-JVM call via `retrieveQueryableStore`; otherwise, it uses the RPC option.
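The decision described above can be sketched as follows.
This is a minimal illustration under stated assumptions, not the library's implementation: `Host` is a stand-in for Kafka Streams' `HostInfo`, and the two lookup functions are placeholders for a `retrieveQueryableStore`-based local read and an application-specific RPC call, respectively.
In a real application, the two host values would come from `getKafkaStreamsApplicationHostInfo` and `getCurrentKafkaStreamsApplicationHostInfo()`.

```java
import java.util.Objects;
import java.util.function.BiFunction;
import java.util.function.Function;

// Sketch of the local-versus-remote decision; no Kafka or Spring types are used.
final class LocalOrRemoteQuerySketch {

    // Stand-in for org.apache.kafka.streams.state.HostInfo (host name plus port).
    static final class Host {
        final String host;
        final int port;

        Host(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Host)) {
                return false;
            }
            Host other = (Host) o;
            return port == other.port && host.equals(other.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port);
        }
    }

    // Uses the in-JVM lookup when the key lives on this instance, the RPC lookup otherwise.
    static <K, V> V query(K key, Host keyHost, Host currentHost,
            Function<K, V> localLookup, BiFunction<Host, K, V> remoteLookup) {
        if (keyHost.equals(currentHost)) {
            return localLookup.apply(key);
        }
        return remoteLookup.apply(keyHost, key);
    }

    public static void main(String[] args) {
        Host current = new Host("host-a", 8080);
        Host keyHost = new Host("host-b", 8080);
        String result = LocalOrRemoteQuerySketch.<Integer, String>query(12345, keyHost, current,
                k -> "local lookup for " + k,
                (h, k) -> "remote lookup for " + k + " on " + h.host + ":" + h.port);
        System.out.println(result);
    }
}
```

Because a rebalance can move a partition at any time, the comparison is best done for each query rather than cached.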
spring-kafka/src/test/java/org/springframework/kafka/streams/KafkaStreamsInteractiveQueryServiceTests.java