CDAP-10483

Multiple issues with LogParser Transform




      1. The identifiers for S3 and CLF are case-sensitive, so s3 and clf will not work.
      2. For non-S3 and non-CLF logs, the code performs no validation. It throws ArrayIndexOutOfBoundsException and NullPointerException in many places.
      3. The code in general could use some cleanup and better error handling.
      4. It also seems that the transform only ever outputs uri, ip, browser, device, httpStatus and ts. We should allow users to configure more or fewer fields than that.
      5. For CLF, the regex does not match the CLF record shown at https://en.wikipedia.org/wiki/Common_Log_Format or https://httpd.apache.org/docs/1.3/logs.html. It seems that the record on Wikipedia is missing the last two fields (referrer and user-agent). If it is indeed a valid CLF record, then the regex is wrong. On further investigation, it seems that CLF for us means Combined Log Format, whereas a quick web search shows CLF as Common Log Format, which has two fewer fields. We should perhaps use a different name for this format and also support Common Log Format?
      6. If parsing fails for either S3 or CLF, the log message does not indicate what went wrong.
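
To make points 1, 5 and 6 concrete, here is a rough sketch of what case-insensitive format lookup, separate Common and Combined Log Format patterns, and descriptive parse errors could look like. It is written in Python for brevity and is not the transform's actual code: the format names `common` and `combined`, the function `parse_log_line`, and the regexes themselves are assumptions made for illustration. S3 parsing is left out.

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
# (hypothetical pattern, not the regex used by the LogParser transform)
COMMON = re.compile(
    r'^(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<httpStatus>\d{3}) (?P<bytes>\d+|-)$'
)

# Combined Log Format: the Common fields plus quoted referrer and user-agent.
# The trailing "$" of the Common pattern is dropped before appending.
COMBINED = re.compile(
    COMMON.pattern[:-1] + r' "(?P<referrer>[^"]*)" "(?P<userAgent>[^"]*)"$'
)

# Hypothetical format identifiers, stored in lower case.
PATTERNS = {"common": COMMON, "combined": COMBINED}


def parse_log_line(log_format, line):
    """Parse one log line and return its fields as a dict of strings.

    Raises ValueError with a descriptive message instead of failing
    with an index or null error further down.
    """
    # Normalize the identifier so "Common", "COMMON" and "common" all work.
    key = log_format.strip().lower()
    pattern = PATTERNS.get(key)
    if pattern is None:
        raise ValueError(
            f"Unsupported log format {log_format!r}; "
            f"expected one of {sorted(PATTERNS)}"
        )
    match = pattern.match(line)
    if match is None:
        # Say which format was tried and which line failed.
        raise ValueError(f"Line does not match {key} log format: {line!r}")
    return match.groupdict()
```

With this split, the sample record from the Wikipedia page parses under `common`, while the same record passed with `combined` fails with a message that names the format and echoes the offending line.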




            • Assignee:
              russellsavage Russ Savage
              bhooshan Bhooshan Mogal


              • Created: