  1. CDAP
  2. CDAP-15904

Add Partition Key to Kafka Sink Plugin

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 5.1.2
    • Fix Version/s: None
    • Component/s: App Fabric
    • Labels:
      None

      Description

      In a streaming pipeline, I need to send messages to a multi-partition Kafka topic. In the low-latency consuming pipeline downstream (say, an alerting system), I need all messages related to a given secondary key to be routed to the same Kafka partition. For example, if messages are tweets, the secondary key would be the tweet owner; in a network security monitoring system, messages are access log events and the secondary key is the IP address of the target computer.

      The Kafka producer API already allows this, either by supplying a precomputed partition key or by configuring a custom partitioner class that knows how to handle the message's unique key (in the latter case, part of the unique key would obviously contain the secondary key).

      Currently, the Kafka Sink plugin doesn't provide a way to use these capabilities of the Kafka API.

      I propose adding it by letting the user specify an optional partition key field. The plugin would then compute and use a partition key from that field, taking into account the number of partitions in the given topic (the Kafka API provides a way to discover it).
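
      To illustrate the idea, here is a minimal sketch of the partition computation, not the code from the attached patch. The class name `PartitionKeySketch` and the method `partitionFor` are hypothetical, and the hash (`Arrays.hashCode` over the UTF-8 bytes) is an assumption chosen for simplicity; Kafka's own default partitioner uses a different hash. In a real plugin the partition count would presumably come from `KafkaProducer.partitionsFor(topic).size()`, and the result would be passed as the partition argument of `ProducerRecord`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Objects;

public class PartitionKeySketch {

    /**
     * Maps the value of the user-chosen partition key field to a partition
     * index in [0, numPartitions). Equal field values always map to the
     * same partition for a fixed partition count.
     *
     * Assumption: numPartitions is supplied by the caller; in the plugin it
     * would be discovered from the topic metadata.
     */
    static int partitionFor(String keyFieldValue, int numPartitions) {
        Objects.requireNonNull(keyFieldValue, "partition key field value must not be null");
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        byte[] bytes = keyFieldValue.getBytes(StandardCharsets.UTF_8);
        // Illustrative hash only; not the murmur2 hash used by Kafka's default partitioner.
        int hash = Arrays.hashCode(bytes);
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 6; // hypothetical partition count for the topic
        // Tweets keyed by owner: both "alice" records land on the same partition.
        for (String owner : new String[] {"alice", "bob", "alice"}) {
            System.out.println(owner + " -> partition " + partitionFor(owner, numPartitions));
        }
    }
}
```

      Note that the mapping changes if the number of partitions in the topic changes, so the per-key ordering guarantee only holds while the partition count is stable.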

       

      Please find attached a proposed patch (based on the 5.1.2 codebase) for the Kafka 0.10 plugin that implements these ideas.

            People

            • Assignee:
              trishka Trishka
              Reporter:
              gholler Guillaume HOLLER
            • Votes:
              0
              Watchers:
              1