(PP-1841) Kafka memory leak

Description

Kafka brokers suffer from a memory leak (https://github.com/apache/kafka/pull/4307) related to the metric reporting feature. Unfortunately, this feature cannot be disabled. The leak affects the 0.10.x and 0.11.x releases and is fixed only in the 1.x releases.

Impact

This leak slowly fills the heap of the Kafka broker processes. It is easily identified using a JMX tool such as jvisualvm, or a heap dump taken and analyzed with jmap/jhat: the number of JmxReporter instances keeps increasing. Note that by design, JmxReporter instances are cleaned hourly. It is thus normal to observe an increase within each hourly cycle.
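As an illustration of the check described above, here is a minimal Python sketch that counts JmxReporter instances in the class histogram printed by `jmap -histo:live <pid>`. The function names `count_instances` and `sample_broker` are hypothetical helpers introduced for this example. The parser assumes the usual histogram layout (rank, instance count, byte count, class name). Note that `-histo:live` forces a full garbage collection on the target JVM, so it should be used with care on a loaded broker.

```python
import subprocess


def count_instances(histo_text, class_suffix="JmxReporter"):
    """Sum the #instances column for classes whose name ends with class_suffix."""
    total = 0
    for line in histo_text.splitlines():
        parts = line.split()
        # Data rows start with a rank such as "12:" followed by
        # #instances, #bytes and the class name; skip everything else.
        if len(parts) < 4 or not parts[0].endswith(":"):
            continue
        if parts[3].endswith(class_suffix):
            total += int(parts[1])
    return total


def sample_broker(pid):
    """Run `jmap -histo:live <pid>` and return the JmxReporter instance count."""
    result = subprocess.run(
        ["jmap", "-histo:live", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return count_instances(result.stdout)
```

Calling `sample_broker(pid)` at the same point of each hourly cycle and comparing the values over several hours gives a rough indication: a count that keeps climbing from one cycle to the next is consistent with the leak.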

Once no more heap memory is available, brokers start suffering from OOM (Out Of Memory) errors, which can lead to service unavailability until a broker restart.

Affected Versions

  • Brad-4.0.0
  • Brad-4.0.1

Note that the craig release relies on Kafka 1.1 and is therefore not affected.

Workaround

A periodic restart of the Kafka brokers works around the issue.

Punchplatform Fix

Because it is not easy for production setups to migrate to a Kafka 1.x release, a brad punchplatform 4.0.2 release has been delivered that includes a patched version of Kafka. That version has been released by the punchplatform team with the fix, under version number 0.10.0.0.1. We strongly suggest that all brad customers upgrade to brad-4.0.2, or schedule a periodic restart of all Kafka brokers.
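The version rules stated in this note can be summarized in a short Python sketch. The function name `is_affected` is hypothetical, and the check assumes the patched build reports its version exactly as the string "0.10.0.0.1". It only encodes the ranges named here (0.10.x and 0.11.x) and says nothing about other Kafka versions.

```python
def is_affected(kafka_version):
    """Return True if the version is in the leaking range named in this note."""
    # Assumption: the punchplatform patched build is identified exactly
    # by the version string "0.10.0.0.1".
    if kafka_version == "0.10.0.0.1":
        return False
    major_minor = kafka_version.split(".")[:2]
    # Only 0.10.x and 0.11.x are listed as affected; anything else
    # (including 1.x) is reported as not in the documented range.
    return major_minor in (["0", "10"], ["0", "11"])
```

For example, `is_affected("0.10.2.1")` and `is_affected("0.11.0.0")` return True, while `is_affected("1.1.0")` and `is_affected("0.10.0.0.1")` return False.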