OpenShift Application Core Dumps
#containers, #applications, #developers, #core, #dumps
When logs fail to provide the information we need to diagnose an application problem, it can be useful to take core dumps of memory, which show us the processes as they are running in the system. This is not something we want to do on a regular basis in production. Ideally such problems are discovered during application performance and load testing in lower environments, but in reality we frequently find something unique about real-world application load that our test scenarios could never uncover. Even so, it is good practice to regularly analyze core dumps of your application during its testing phases.
One of the great features of Kubernetes, and OpenShift in general, is the ability to control deployment scenarios. Unfortunately we do not yet have a feature that captures core dumps only for specific application pods, but we can control core dumps on the nodes where the applications are deployed. And since we are running Kubernetes, we have simple but powerful control over application placement: we can schedule our pods onto nodes that are ready to capture core dumps, while preventing pods that do not need core dumps from running on those same nodes.
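For example, a minimal sketch of that placement control, assuming a hypothetical core-dumps=enabled label, an example node name, and a deploymentconfig named my-app:
oc label node node1.example.com core-dumps=enabled
oc patch dc/my-app -p '{"spec":{"template":{"spec":{"nodeSelector":{"core-dumps":"enabled"}}}}}'
The nodeSelector pulls the application onto the prepared node; to keep other workloads off that node you would additionally taint it (oc adm taint nodes) and give the application a matching toleration.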
Configuring a Node to Capture Core Dumps
These notes are for OpenShift 3.x where you have control over changing the node directly. I’ll share OpenShift 4.x configurations later.
- Change the core dump pattern so dumps are written straight to a temp file:
echo "/tmp/cores/core.%e.%p.%h.%t" > /proc/sys/kernel/core_pattern
- Change the limits on the node, which should permit all users to capture an unlimited-size core dump. If you know your application well, you may be able to set a more sensible numeric limit. CAUTION! Be aware of the risk of capturing large core dumps; it would behoove you to have a separate partition for storing them.
ulimit -c unlimited
vi /etc/security/limits.conf
* hard core unlimited
- Ensure these values are set, to allow full daemon process dumping:
sysctl -w fs.suid_dumpable=1
sysctl -p
- Make a temp folder that matches the folder you specified in your core_pattern file:
mkdir -p /tmp/cores
chmod a+rwx /tmp/cores
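The core_pattern and fs.suid_dumpable settings above are runtime-only; to persist them across node reboots you would also add them to /etc/sysctl.conf (which is what sysctl -p reloads), roughly:
kernel.core_pattern = /tmp/cores/core.%e.%p.%h.%t
fs.suid_dumpable = 1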
You can, however, use systemd-coredump to manage this, and configure the MaxUse or KeepFree parameters as explained in the documentation[1]:
- Change the core dump pattern to be processed by systemd:
echo '|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %e' > /proc/sys/kernel/core_pattern
- If processing with systemd, your files will show up here:
/var/lib/systemd/coredump/
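To control disk usage, the MaxUse and KeepFree parameters mentioned above go in /etc/systemd/coredump.conf; a minimal sketch with illustrative values:
[Coredump]
Storage=external
MaxUse=2G
KeepFree=5G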
You could also write a custom filter script and specify it in the core_pattern, for example:
# cat /proc/sys/kernel/core_pattern
|/usr/libexec/my-custom-core-filter %P %u %g %s %t %c %e
You could then filter the coredumps you want to keep based on patterns or app names.[2]
Acquired from Robert Bost.
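A minimal sketch of such a filter, using the hypothetical /usr/libexec/my-custom-core-filter path from the example above; the kernel pipes the core to the script's stdin, and the positional arguments match the %P %u %g %s %t %c %e specifiers:
#!/bin/bash
# %P=pid %u=uid %g=gid %s=signal %t=time %c=core limit %e=executable name
core_pid=$1; core_time=$5; core_exe=$7
case "$core_exe" in
  ruby|httpd)
    # keep cores only for the applications we care about (adjust the list)
    cat > "/tmp/cores/core.${core_exe}.${core_pid}.${core_time}"
    ;;
  *)
    # silently discard everything else
    cat > /dev/null
    ;;
esac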
Here is the RFE for core dumps at the pod or container level. Note that all RFEs are now in JIRA, so this may be elsewhere now.
Testing Core Dumps
- Build and deploy your application. Here we will use a test application from OpenShift:
oc new-project test-cores
oc new-app ruby~https://github.com/openshift/ruby-ex.git
- Schedule a test pod to your node (for example, with the node label and nodeSelector approach shown earlier).
- RSH into your pod and find the main application PID:
$ oc rsh POD_NAME
$ ps aux
- Kill your application to test that the core dump is generated:
$ kill -11 <PID>
or
$ kill -s SIGSEGV $$
- On the node, check the location where the core dump was written, or, if using systemd, run the following:
coredumpctl list
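If systemd-coredump captured the dump, you can inspect it and extract the core file for offline analysis, for example (the PID is whatever coredumpctl list reported):
coredumpctl info <PID>
coredumpctl dump <PID> -o /tmp/core.dump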
Testing Core Dumps on Apache HTTPD
Apache Web Server handles SIGSEGV itself in a special way. You must also configure how Apache handles this by setting CoreDumpDirectory and pointing that directive to a persistent storage location within the pod itself. You could configure a HostPath that points to the same location as other core dumps, but this is an unnecessary security concern; it is easier and safer to add and remove a PersistentVolume specifically for capturing the core dump.
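A minimal sketch of the directive, assuming the PersistentVolume is mounted at /opt/app-root/coredumps inside the pod (adjust the path to your mount point, and make sure the httpd user can write to it):
CoreDumpDirectory /opt/app-root/coredumps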
See the following repo for an example configuration in an s2i build.
$ cat httpd-cfg/coredumps.conf
You can also achieve this by using a post-hook execution.
Testing Core Dumps on MySQL
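A minimal sketch, based on the MySQL and Percona references below: start mysqld with the core-file option so it writes a core on crash (the node-level ulimit and core_pattern settings above must also be in place):
[mysqld]
core-file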
Testing Core Dumps on Java
Add this flag to the Java startup options.
-XX:OnError
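For example, -XX:OnError runs an arbitrary command when the JVM hits a fatal error; a sketch that captures a core with gcore, where the jar name is illustrative and gcore (from gdb) must be available in the image:
java -XX:OnError="gcore %p" -jar myapp.jar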
Additional Documentation
- How to enable core file dumps when an application crashes or segmentation faults
- How to enable coredumps for daemon process (services) in RHEL?
- How to collect core dump file of a crashing program that is shipped in Red Hat Enterprise Linux 6/7?
- Bug 1596284 - [RFE][GSS] Stop creating core dumps inside gluster pod root directory
- Documentation: Make clear instructions for getting a core file, when container crashes #11740
  Add "default-ulimits": {"core": {"Name": "core", "Hard": -1, "Soft": -1}} to /etc/docker/daemon.json
- https://github.com/sclorg/mysql-container/blob/master/5.7/Dockerfile.rhel7
- https://www.freedesktop.org/software/systemd/man/systemd-coredump.html
- https://www.freedesktop.org/software/systemd/man/coredump.conf.html
- https://www.freedesktop.org/software/systemd/man/coredumpctl.html
- https://dev.mysql.com/doc/refman/5.7/en/using-system-variables.html
- https://www.percona.com/blog/2011/08/26/getting-mysql-core-file-on-linux/
- https://access.redhat.com/documentation/en-us/red_hat_developer_toolset/7/html-single/user_guide/
- https://github.com/sclorg/devtoolset-container/tree/master/7-toolchain