OPENSHIFT

OpenShift Application Core Dumps

#containers, #applications, #developers, #core, #dumps

When logs fail to provide the information we need to diagnose an application problem, it can be useful to take core dumps of memory, showing us the processes as they were running in the system. This is not something we want to do on a regular basis in production. Ideally such problems are discovered during application performance and load testing in lower environments; in reality, we frequently find something unique about the real-world application load that our test scenarios could never uncover. It is also a great practice to do regular analysis of core dumps of your application during its testing phases.

One of the awesome features of Kubernetes and OpenShift in general is the ability to control deployment scenarios. Unfortunately we don’t yet have a feature that captures core dumps only on specific application pods. However, we can control core dumps through the nodes on which the applications are deployed. And since we are running Kubernetes, we have very simple but powerful control over application placement: we can schedule pods onto nodes that are ready to capture core dumps, while preventing pods that do not need core dumps from running on those same nodes.

Configuring a Node to Capture Core Dumps

These notes are for OpenShift 3.x where you have control over changing the node directly. I’ll share OpenShift 4.x configurations later.

  • Change the core dump pattern so cores are written straight to a temp file (%e = executable name, %p = PID, %h = hostname, %t = timestamp)

    echo "/tmp/cores/core.%e.%p.%h.%t" > /proc/sys/kernel/core_pattern
  • Change the limits on the node to permit all users to capture an unlimited-size core dump. If you know your application well, you may be able to set a more sensible numeric restriction. CAUTION! You should be aware of the risk of capturing large core dumps. It would behoove you to have a separate partition for storing them. Note that both the soft and hard limits need raising; the hard limit alone only sets the ceiling.

     ulimit -c unlimited
     vi /etc/security/limits.conf

     * soft core unlimited
     * hard core unlimited
  • Set these values to allow full daemon process dumping (add fs.suid_dumpable=1 to /etc/sysctl.conf so that sysctl -p persists it across reboots)

    sysctl -w fs.suid_dumpable=1
    sysctl -p
  • Make a temp folder that matches the one you specified in your core_pattern

    mkdir -p /tmp/cores
    chmod a+rwx /tmp/cores
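
Taken together, a minimal sketch of the node preparation as a single script (run as root; the /tmp/cores location is the one used above):

    #!/bin/bash
    # Sketch: prepare an OpenShift 3.x node to capture application core dumps.
    set -euo pipefail

    # Write cores straight to a temp file (executable, PID, hostname, timestamp)
    echo "/tmp/cores/core.%e.%p.%h.%t" > /proc/sys/kernel/core_pattern

    # Raise the core size limit; the limits.conf entries persist for new sessions
    ulimit -c unlimited
    printf '* soft core unlimited\n* hard core unlimited\n' >> /etc/security/limits.conf

    # Allow setuid/daemon processes to dump core
    sysctl -w fs.suid_dumpable=1

    # Create the dump location referenced in core_pattern
    mkdir -p /tmp/cores
    chmod a+rwx /tmp/cores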

Alternatively, you can use systemd-coredump to manage this and configure the MaxUse or KeepFree parameters in /etc/systemd/coredump.conf, as explained in the documentation[1] (a sample configuration follows the list below):

  • Change the core dump pattern to be processed by systemd

    echo '|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %e' > /proc/sys/kernel/core_pattern
  • If processing with systemd, your files will show up here:

    /var/lib/systemd/coredump/
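
A minimal sketch of the relevant /etc/systemd/coredump.conf settings; the size values here are illustrative, not recommendations:

    # /etc/systemd/coredump.conf
    [Coredump]
    # Store dumps on disk under /var/lib/systemd/coredump/
    Storage=external
    # Cap the total disk space used by stored core dumps
    MaxUse=2G
    # Always leave at least this much space free on the filesystem
    KeepFree=1G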

You could also write a custom filter script and specify it in the core_pattern, for example:

# cat /proc/sys/kernel/core_pattern
|/usr/libexec/my-custom-core-filter %P %u %g %s %t %c %e

You could then filter the coredumps you want to keep based on patterns or app names.[2]
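
A minimal sketch of such a filter, assuming the script name shown above; the executable patterns are purely illustrative:

    #!/bin/bash
    # /usr/libexec/my-custom-core-filter %P %u %g %s %t %c %e
    # The kernel pipes the core image to stdin and passes the pattern
    # specifiers as positional arguments.
    PID="$1"; TIME="$5"; EXE="$7"

    case "$EXE" in
        ruby*|httpd*)
            # Keep cores from the applications we care about
            cat > "/tmp/cores/core.${EXE}.${PID}.${TIME}"
            ;;
        *)
            # Discard everything else
            cat > /dev/null
            ;;
    esac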

Acquired from Robert Bost.

Here is the RFE for core dumps at the pod or container level. Note that all RFEs have since moved to JIRA, so it may live elsewhere now.

Testing Core Dumps

  • Build and deploy your application. Here we will use a test application from OpenShift.

    oc new-project test-cores
    oc new-app ruby~https://github.com/openshift/ruby-ex.git
  • Schedule a test pod to your prepared node (see the node-selector sketch after this list)

  • RSH to your pod, find the main application PID

    $ oc rsh POD_NAME
    -$ ps aux
  • Kill your application to verify that a core dump is generated

    -$ kill -11 <PID>

    or

    -$ kill -s SIGSEGV $$
  • On the node, check the location where the core dump was written, or, if using systemd, run the following:

    coredumpctl list
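
For the scheduling step above, a minimal sketch using a node selector; the core-dumps=enabled label and the node name are hypothetical:

    # Label the node that was configured to capture core dumps
    oc label node node1.example.com core-dumps=enabled

    # Pin the test application to labeled nodes via a nodeSelector
    oc patch dc/ruby-ex -p '{"spec":{"template":{"spec":{"nodeSelector":{"core-dumps":"enabled"}}}}}'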

Testing Core Dumps on Apache HTTPD

Apache Web Server handles SIGSEGV itself in a special way, so you must also configure how Apache handles it by setting the CoreDumpDirectory directive and pointing it to a persistent storage location within the pod itself. You could configure a HostPath to go to the same location as the other core dumps, but that is an unnecessary security concern; it is easier and safer to add and remove a PersistentVolume specifically for capturing the core dump.

See the following repo for an example configuration in an s2i build.

$ cat httpd-cfg/coredumps.conf
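
As an illustration, such a file could contain little more than the directive itself, assuming the PersistentVolume is mounted at the hypothetical path /opt/app-root/cores:

    # coredumps.conf: directory httpd switches to before dumping core
    CoreDumpDirectory /opt/app-root/cores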

You can also achieve this by using a post-hook execution.

Testing Core Dumps on Java

Add this flag to the Java startup command; it runs the given command when the JVM encounters a fatal error:

-XX:OnError="<command>"
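
A minimal sketch, using gcore from gdb to capture the process image; the output path and jar name are assumptions:

    # %p is replaced with the JVM's PID when the command runs
    java -XX:OnError="gcore -o /tmp/cores/core %p" -jar app.jar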

Additional Documentation

Add the following to /etc/docker/daemon.json to set the default core ulimit for all containers:

    "default-ulimits": {"core": {"Name": "core", "Hard": -1, "Soft": -1}}
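
In context, a minimal sketch of the complete file, assuming no other daemon settings are present (restart the Docker daemon after editing):

    {
      "default-ulimits": {
        "core": {
          "Name": "core",
          "Hard": -1,
          "Soft": -1
        }
      }
    }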