In a streaming pipeline, I need to send messages to a multi-partition Kafka topic. In the low-latency consuming pipeline downstream (say, an alerting system), I need all messages sharing some secondary key to be routed to the same Kafka partition. E.g.: messages are tweets, and the secondary key would be the tweet owner. Or, in a network security monitoring system, messages are access log events, and the secondary key is the IP address of the target computer, etc.
Indeed, the Kafka producer API allows us to do this, either by providing a previously computed partition key, or by providing a custom partitioner class that knows how to handle the message's unique key (in the latter case, part of the unique key would obviously contain the secondary key).
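As an illustration of the custom-partitioner route (a hypothetical sketch, not the plugin's actual code): suppose the message's unique key embeds the secondary key in a composite form such as "owner:tweetId". A partitioner would then hash only the secondary part. The class name, key format, and helper methods below are my own; the real interface to implement would be org.apache.kafka.clients.producer.Partitioner, registered via the producer's partitioner.class setting.

```java
// Hypothetical stand-alone sketch of the routing logic a custom Kafka
// partitioner would apply. Assumes a composite message key "owner:tweetId",
// where "owner" is the secondary key that must drive partition choice.
public class SecondaryKeyPartitioner {

    // Extract the secondary key: everything before the first ':' separator,
    // or the whole key when no separator is present.
    static String secondaryKey(String messageKey) {
        int sep = messageKey.indexOf(':');
        return sep < 0 ? messageKey : messageKey.substring(0, sep);
    }

    // Map the secondary key to a partition index in [0, numPartitions).
    // Math.floorMod avoids negative results when hashCode() is negative.
    static int partitionFor(String messageKey, int numPartitions) {
        return Math.floorMod(secondaryKey(messageKey).hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // Two tweets from the same owner land on the same partition.
        System.out.println(partitionFor("alice:123", 6));
        System.out.println(partitionFor("alice:987", 6));
    }
}
```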
For now, the Kafka Sink plugin doesn't provide a way to use these capabilities of the Kafka API.
I propose to provide it by letting the user specify an optional partition key field. The plugin would then compute and use a partition key from it, taking into account the number of partitions in the given topic (the Kafka API does provide a way to discover it).
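The plugin-side computation could look roughly like the sketch below (my own illustration, not the patch itself): given the value of the user-supplied partition key field and the partition count discovered from the broker (in the real API, KafkaProducer#partitionsFor(topic).size()), derive a partition index. CRC32 is used here only because it is a stable, platform-independent hash; the patch may well use a different one.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: map the partition key field's value to a partition
// index, given the topic's partition count as reported by the broker.
class PartitionKeyMapper {

    static int mapToPartition(String partitionKeyValue, int numPartitions) {
        CRC32 crc = new CRC32();
        crc.update(partitionKeyValue.getBytes(StandardCharsets.UTF_8));
        // CRC32#getValue() returns a non-negative long, so a plain modulo
        // is enough to keep the result in [0, numPartitions).
        return (int) (crc.getValue() % numPartitions);
    }

    public static void main(String[] args) {
        // All access log events for the same target IP address map to the
        // same partition, regardless of the rest of the event.
        System.out.println(mapToPartition("192.168.0.17", 12));
        System.out.println(mapToPartition("192.168.0.17", 12));
    }
}
```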
Please find attached a proposed patch (based on the 5.1.2 codebase) of the Kafka 0.10 plugin that implements these ideas.