Tag Archives: CVE-2015-0235

Emergency Security Band-Aids with Systemtap

Software security vulnerabilities are a fact of life. So is the subsequent publicity, package updates, and suffering service restarts. Administrators are used to it, and users bear it, and it’s a default and traditional method.

On the other hand, in some circumstances the update & restart methods are unacceptable, leading to the development of online fix facilities like kpatch, where code may be surgically replaced in a running system. There is plenty of potential in these systems, but they are still at an early stage of deployment.

In this article, we present another option: a limited sort of live patching using systemtap. This tool, now a decade old, is conventionally thought of as a tracing widget, but it can do more. It can not only monitor the detailed internals of the Linux kernel and user-space programs, it can also change them – a little. It turns out to be just enough to defeat some classes of security vulnerabilities. We refer to these as security “band-aids” rather than fixes, because we expect them to be used temporarily.

Systemtap

Systemtap is a system-wide programmable probing tool introduced in 2005, and supported since RHEL 4 on all Red Hat and many other Linux distributions. (Its capabilities vary with the kernel’s. For example, user-space probing is not available in RHEL 4.) It is system-wide in the sense that it allows looking into much of the software stack: from device drivers, kernel core, system libraries, through user-applications. It is programmable because it operates based on programs in the systemtap scripting language in order to specify what operations to perform. It probes in the sense that, like a surgical instrument, it safely opens up running software so that we can peek and poke at its internals.

Systemtap’s script language is inspired by dtrace, awk, C, intended to be easy to understand and compact while expressive. Here is hello-world:

probe oneshot { printf("hello worldn") }

Here is some counting of vfs I/O:

 global per_pid_io # an array
 probe kernel.function("vfs_read") 
     { per_pid_io["read",pid()] += $count }
 probe kernel.function("vfs_write")
     { per_pid_io["write",pid()] += $count }
 probe timer.s(5) { exit() } # per_pid_io will be printed

Here is a system-wide strace for non-root processes:

probe syscall.*
{
    if (uid() != 0)
        printf("%s %d %s %sn", execname(), tid(), name, argstr)
}

Additional serious and silly samples are available on the Internet and also distributed with systemtap packages.

The meaning of a systemtap script is simple: whenever a given probed event occurs, pause that context, evaluate the statements in the probe handler atomically (safely and quickly), then resume the context. Those statements can inspect source-level state in context, trace it, or store it away for later analysis/summary.

The systemtap scripting language is well-featured. It has normal control flow statements, functions with recursion. It deals with integral and string data types, and look-up tables are available. An unusual feature for a small language, full type checking is paired with type inference, so type declarations are not necessary. There exist dozens of types of probes, like timers, calls/interiors/returns from arbitrary functions, designated tracing instrumentation in the kernel or user-space, hardware performance counter values, Java methods, and more.

Systemtap scripts are run by algorithmic translation to C, compilation to machine code, and execution within the kernel as a loadable module. The intuitive hazards of this technique are ameliorated by prolific checking throughout the process, both during translation, and within the generated C code and its runtime. Both time and space usage are limited, so experimentation is safe. There are even modes available for non-root users to inspect only their own processes.  This capability has been used for a wide range of purposes:

  • performance tuning via profiling
  • program-understanding via pinpoint tracing, dynamic call-graphs, through pretty-printing local variables line by line.

Many scripts can run independently at the same time and any can be completely stopped and removed. People have even written interactive games!

Methodology

While systemtap is normally used in a passive (read-only) capacity, it may be configured to permit active manipulations of state. When invoked in “guru mode”, a probe handler may send signals, insert delays, change variables in the probed programs, and even run arbitrary C code. Since systemtap cannot change the program code, changing data is the approach of choice for construction of security band-aids. Here are some of the steps involved, once a vulnerability has been identified in some piece of kernel or user code.

Plan

To begin, we need to look at the vulnerable code bug and, if available, the corrective patch to understand:

  • Is the bug mainly (a) data-processing-related or (b) algorithmic?
  • Is the bug (a) localized or (b) widespread?
  • Is the control flow to trigger the bug (a) simple or (b) complicated?
  • Are control flow paths to bypass the bug (a) available nearby (included in callers) or (b) difficult to reach?
  • Is the bug dependent mainly on (a) local data (such as function parameters) or (b) global state?
  • Is the vulnerability-triggering data accessible over (a) a broad range of the function (a function parameter or global) or only (b) a narrow window (a local variable inside a nested block)?
  • Are the bug triggering conditions (a) specific and selective or (b) complex and imprecise?
  • Are the bug triggering conditions (a) deliberate or (b) incidental in normal operation?

More (a)s than (b)s means it’s more likely that systemtap band-aids would work in the particular situation while more (b)s means it’s likely that patching the traditional way would be best.

Then we need to decide how to change the system state at the vulnerable point. One possibility is to change to a safer error state; the other is to change to a correct state.

In a type-1 band-aid, we will redirect flow of control away from the vulnerable areas. In this approach, we want to “corrupt” incoming data further in the smallest possible way necessary to short-circuit the function to bypass the vulnerable regions. This is especially appropriate if:

  • correcting the data is difficult, perhaps because it is in multiple locations, or because it needs to be temporary or no obvious corrected value can be computed
  • error handling code already exists and is accessible
  • we don’t have a clear identification of vulnerable data states and want to err on the side of error-handling
  • if the vulnerability is deliberate so we don’t want to spend the effort of performing a corrected operation

A type-2 band-aid is correcting data so that the vulnerable code runs correctly. This is especially appropriate in the complementary cases from the above and if:

  • the vulnerable code and data can occur from real workloads so we would like them to succeed
  • corrected data can be practically computed from nearby state
  • natural points occur in the vulnerable code where the corrected data may be inserted
  • natural points occur in the vulnerable code where clean up code (restoring of previous state) may be inserted, if necessary

Implement

With the vulnerability-band-aid approach chosen, we need to express our intent in the systemtap scripting language. The model is simple: for each place where the state change is to be done we place a probe. In each probe handler, we detect whether the context indicates an exploit is in progress and, if so, make changes to the context. We might also need additional probes to detect and capture state from before the vulnerable section of code, for diagnostic purposes.

A minimal script form for changing state can be easily written. It demonstrates one kernel and one user-space function-entry probe, where each happens to take a parameter named p that needs to be range-limited. (The dollar sign identifies the symbol as a variable in the context of the probed program, not as a script-level temporary variable.)

probe kernel.function("foo"),
      process("/lib*/libc.so").function("bar")
{
    if ($p > 100)
        $p = 4
}

Another possible action in the probe handler is to deliver a signal to the current user-space process using the raise function. In this script a global variable in the target program is checked at every statement in the given source code file and line-number-range and deliver a killing blow if necessary:

probe process("/bin/foo").statement("*@src/foo.c:100-200")
{
    if (@var("a_global") > 1000)
        raise(9) # SIGKILL
}

Another possible action is logging the attempt at the systemtap process console:

    # ...
    printf("check process %s pid=%d uid=%d",
           execname(), pid(), uid())
    # ...

Or sending a note to the sysadmin:

    # ...
    system(sprintf("/bin/logger check process %s pid=%d uid=%d",
                   execname(), pid(), uid()))
    # ...

These and other actions may be done in any combination.

During development of a band-aid one should start with just tracing (no band-aid countermeasures) to fine-tune the detection of the vulnerable state. If an exploit is available run it without systemtap, with systemtap (tracing only), and with the operational band-aid. If normal workload can trigger the bug run it with and without the same spectrum of systemtap supervision to confirm that we’re not harming that traffic.

Deploy

To run a systemtap script, we will need systemtap on a developer workstation. Systemtap has been included in RHEL 4 and later since 2006. RHEL 6 and RHEL 7 still receive rebases from new upstream releases, though capabilities vary. (Systemtap upstream is tested against a gamut of RHEL 4 through fresh kernel.org kernels, and against other distributions.)

In addition to systemtap itself, security band-aid type scripts usually require source-level debugging information for the buggy programs. That is because we need the same symbolic information about types, functions, and variables in the program as an interactive debugger like gdb does. On Red Hat/Fedora distributions this means the “-debuginfo” RPMs available on RHN and YUM. Some other distributions make them available as “-dbgsym” packages. Systemtap scripts that probe the kernel will probably need the larger “kernel-debuginfo”; those that probe user-space will probably need a corresponding “foobar-debuginfo” package. (Systemtap will tell you if it’s missing.)

Running systemtap security band-aid scripts will generally require root privileges and a “guru-mode” flag to designate permission to modify state such as:

# stap -g band_aid.stp
[...]
^C

The script will inject instrumentation into existing and future processes and continue running until it is manually interrupted, it stops itself, or error conditions arise. In case of most errors, the script will stop cleanly, print a diagnostic message, and point to a manual page with further advice. For transient or acceptable conditions command line options are available to suppress some safety checks altogether.

Systemtap scripts may be distributed to a network of homogeneous workstations in “pre-compiled” (kernel-object) form, so that a full systemtap + compiler + debuginfo installation is not necessary on the other computers. Systemtap includes automation for remote compilation and execution of the scripts. A related facility is available to create MOK-signed modules for machines running under SecureBoot.  Scripts may also be installed for automatic execution at startup via initscripts.

Some examples

Of all the security bugs for which systemtap band-aids have been published, we analyse a few below.

CVE-2013-2094, perf_swevent_enabled array out-of-bound access

This was an older bug in the kernel’s perf-event subsystem which takes a complex command struct from a syscall. The bug involved missing a range check inside the struct pointed to by the event parameter. We opted for a type-2 data-correction fix, even though the a type-1 failure-induction could have worked as well.

This script demonstrates an unusual technique: embedded-C code called from the script, to adjust the erroneous value. There is a documented argument-passing API between embedded-C and script, but systemtap cannot analyze or guarantee anything about the safety of the C code. In this case, the same calculation could have been expressed within the safe scripting language, but it serves to demonstrate how a more intricate correction could be fitted.

# declaration for embedded-C code
%{
#include <linux/perf_event.h>
%}
# embedded-C function - note %{ %} bracketing
function sanitize_config:long (event:long) %{
    struct perf_event *event;
    event = (struct perf_event *) (unsigned long) STAP_ARG_event;
    event->attr.config &= INT_MAX;
%}

probe kernel.function("perf_swevent_init").call {
    sanitize_config($event)  # called with pointer
}

CVE-2015-3456, “venom”

Here is a simple example from the recent VENOM bug, CVE-2015-3456, in QEMU’s floppy-drive emulation code. In this case, a buffer-overflow bug allows some user-supplied data to overwrite unrelated memory. The official upstream patch adds explicit range limiting for an invented index variable pos, in several functions. For example:

@@ -1852,10 +1852,13 @@
 static void fdctrl_handle_drive_specification_command(FDCtrl *fdctrl, int direction)
 {
     FDrive *cur_drv = get_cur_drv(fdctrl);
+    uint32_t pos;
-    if (fdctrl->fifo[fdctrl->data_pos - 1] & 0x80) {
+    pos = fdctrl->data_pos - 1;
+    pos %= FD_SECTOR_LEN;
+    if (fdctrl->fifo[pos] & 0x80) {
         /* Command parameters done */
-        if (fdctrl->fifo[fdctrl->data_pos - 1] & 0x40) {
+        if (fdctrl->fifo[pos] & 0x40) {
             fdctrl->fifo[0] = fdctrl->fifo[1];
             fdctrl->fifo[2] = 0;
             fdctrl->fifo[3] = 0;

Inspecting the original code, we see that the vulnerable index was inside a heap object at fdctrl->data_pos. A type-2 systemtap band-aid would have to adjust that value before the code runs the fifo[] dereference, and subsequently restore the previous value. This might be expressed like this:

global saved_data_pos
probe process("/usr/bin/qemu-system-*").function("fdctrl_*spec*_command").call
{
    saved_data_pos[tid()] = $fdctrl->data_pos;
    $fdctrl->data_pos = $fdctrl->data_pos % 512 # FD_SECTOR_LEN
}
probe process("/usr/bin/qemu-system-*").function("fdctrl_*spec*_command").return
{
    $fdctrl->data_pos = saved_data_pos[tid()]
    delete saved_data_pos[tid()]
}

The same work would have to be done at each of the three analogous vulnerable sites unless further detailed analysis suggests that a single common higher-level function could do the job.

However, this is probably too much work. The CVE advisory suggests that any call to this area of code is likely a deliberate exploit attempt (since modern operating systems don’t use the floppy driver). Therefore, we could opt for a type-1 band-aid, where we bypass the vulnerable  computations entirely. We find that all the vulnerable functions ultimately have a common caller, fdctrl_write.

static void fdctrl_write (void *opaque, uint32_t reg, uint32_t value)
{
    FDCtrl *fdctrl = opaque;
    [...]
    reg &= 7;
    switch (reg) {
    case FD_REG_DOR:
        fdctrl_write_dor(fdctrl, value);
        break;
        [...]
    case FD_REG_CCR:
        fdctrl_write_ccr(fdctrl, value);
        break;
    default:
        break;
    }
}

We can disarm the entire simulated floppy driver by pretending that the simulated CPU is addressing a reserved FDC register, thus falling through to the default: case. This requires just one probe. Here we’re being more conservative than necessary, overwriting only the low few bits:

probe process("/usr/bin/qemu-system-*").function("fdctrl_write")
{
    $reg = (($reg & ~7) | 6) # replace register address with 0x__6
}

CVE-2015-0235, “ghost”

This recent bug in glibc involved a buffer overflow related to dynamic allocation with a miscalculated size. It affected a function that is commonly used in normal software, and the data required to determine whether the vulnerability would be triggered or not is not available in situ. Therefore, a type-1 error-inducing band-aid would not be appropriate.

However, it is a good candidate for type-2 data-correction. The script below works by incrementing the size_needed variable set around line 86 of glibc nss/digits_dots.c, so as to account for the missing sizeof (*h_alias_ptr). This makes the subsequent comparisons work and return error codes for buffer-overflow situations.

85
86 size_needed = (sizeof (*host_addr)
87             + sizeof (*h_addr_ptrs) + strlen (name) + 1);
88
89 if (buffer_size == NULL)
90   {
91     if (buflen < size_needed)
92       {
93         if (h_errnop != NULL)
94           *h_errnop = TRY_AGAIN;
95         __set_errno (ERANGE);
96         goto done;
97       }
98   }
99 else if (buffer_size != NULL && *buffer_size < size_needed)
100  {
101    char *new_buf;
102    *buffer_size = size_needed;
103    new_buf = (char *) realloc (*buffer, *buffer_size);

The script demonstrates an unusual technique. The variable in need of correction (size_needed) is deep within a particular function so we need “statement” probes to place it before the bad value is used. Because of compiler optimizations, the exact line number where the probe may be placed can’t be known a prior so we ask systemtap to try a whole range. The probe handler than protects itself against being invoked more than once (per function call) using an auxiliary flag array.

global added%
global trap = 1 # stap -G trap=0 to only trace, not fix
probe process("/lib*/libc.so.6").statement("__nss_hostname_digits_dots@*:87-102")
{
    if (! added[tid()])
    {
        added[tid()] = 1; # we only want to add once
        printf("%s[%d] BOO! size_needed=%d ", execname(), tid(),
 $size_needed)
        if (trap)
        {
            # The &@cast() business is a fancy sizeof(uintptr_t),
            # which makes this script work for both 32- and 64-bit glibc's.
            $size_needed = $size_needed + &@cast(0, "uintptr_t")[1]
            printf("ghostbusted to %d", $size_needed)
        }
    printf("n")
    }
}

probe process("/lib*/libc.so.6").function("__nss_hostname_digits_dots").return
{
    delete added[tid()] # reset for next call
}

This type-2 band-aid allows applications to operate as through glibc was patched.

Conclusions

We hope you enjoyed this foray into systemtap and its unexpected application as a potential band-aid for security bugs. If you would like to learn more, read our documentation, contact our team, or just go forth and experiment. If this technology seems like a fit for your installation and situation, consult your vendor for a possible systemtap band-aid.

Don’t judge the risk by the logo

It’s been almost a year since the OpenSSL Heartbleed vulnerability, a flaw which started a trend of the branded vulnerability, changing the way security vulnerabilities affecting open-source software are being reported and perceived. Vulnerabilities are found and fixed all the time, and just because a vulnerability gets a name and a fancy logo doesn’t mean it is of real risk to users.

ven1

So let’s take a tour through the last year of vulnerabilities, chronologically, to see what issues got branded and which issues actually mattered for Red Hat customers.

“Heartbleed” (April 2014)CVE-2014-0160

Heartbleed was an issue that affected newer versions of OpenSSL. It was a very easy to exploit flaw, with public exploits released soon after the issue was public. The exploits could be run against vulnerable public web servers resulting in a loss of information from those servers. The type of information that could be recovered varied based on a number of factors, but in some cases could include sensitive information. This flaw was widely exploited against unpatched servers.

For Red Hat Enterprise Linux, only customers running version 6.5 were affected as prior versions shipped earlier versions of OpenSSL that did not contain the flaw.

Apache Struts 1 Class Loader RCE (April 2014) CVE-2014-0114

This flaw allowed attackers to manipulate exposed ClassLoader properties on a vulnerable server, leading to remote code execution. Exploits have been published but they rely on properties that are exposed on Tomcat 8, which is not included in any supported Red Hat products. However, some Red Hat products that ship Struts 1 did expose ClassLoader properties that could potentially be exploited.

Various Red Hat products were affected and updates were made available.

OpenSSL CCS Injection (June 2014) CVE-2014-0224

After Heartbleed, a number of other OpenSSL issues got attention. CCS Injection was a flaw that could allow an attacker to decrypt secure connections. This issue is hard to exploit as it requires a man in the middle attacker who can intercept and alter network traffic in real time, and as such we’re not aware of any active exploitation of this issue.

Most Red Hat Enterprise Linux versions were affected and updates were available.

glibc heap overflow (July 2014) CVE-2014-5119

A flaw was found inside the glibc library where an attacker who is able to make an application call a specific function with a carefully crafted argument could lead to arbitrary code execution. An exploit for 32-bit systems was published (although this exploit would not work as published against Red Hat Enterprise Linux).

Some Red Hat Enterprise Linux versions were affected, in various ways, and updates were available.

JBoss Remoting RCE (July 2014) CVE-2014-3518

A flaw was found in JBoss Remoting where a remote attacker could execute arbitrary code on a vulnerable server. A public exploit is available for this flaw.

Red Hat JBoss products were only affected by this issue if JMX remoting is enabled, which is not the default. Updates were made available.

“Poodle” (October 2014) CVE-2014-3566

Continuing with the interest in OpenSSL vulnerabilities, Poodle was a vulnerability affecting the SSLv3 protocol. Like CCS Injection, this issue is hard to exploit as it requires a man in the middle attack. We’re not aware of active exploitation of this issue.

Most Red Hat Enterprise Linux versions were affected and updates were available.

“ShellShock” (September 2014) CVE-2014-6271

The GNU Bourne Again shell (Bash) is a shell and command language interpreter used as the default shell in Red Hat Enterprise Linux. Flaws were found in Bash that could allow remote code execution in certain situations. The initial patch to correct the issue was not sufficient to block all variants of the flaw, causing distributions to produce more than one update over the course of a few days.

Exploits were written to target particular services. Later, malware circulated to exploit unpatched systems.

Most Red Hat Enterprise Linux versions were affected and updates were available.

RPM flaws (December 2014) CVE-2013-6435, CVE-2014-8118

Two flaws were found in the package manager RPM. Either could allow an attacker to modify signed RPM files in such a way that they would execute code chosen by the attacker during package installation. We know CVE-2013-6435 is exploitable, but we’re not aware of any public exploits for either issue.

Various Red Hat Enterprise Linux releases were affected and updates were available.

“Turla” malware (December 2014)

Reports surfaced of a trojan package targeting Linux, suspected as being part of an “advance persistent threat” campaign. Our analysis showed that the trojan was not sophisticated, was easy to detect, and unlikely part of such a campaign.

The trojan does not use any vulnerability to infect a system, it’s introduction onto a system would be via some other mechanism. Therefore it does not have a CVE name and no updates are applicable for this issue.

“Grinch” (December 2014)

An issue was reported which gained media attention, but was actually not a security vulnerability. No updates were applicable for this issue.

“Ghost” (January 2015) CVE-2015-0235

A bug was found affecting certain function calls in the glibc library. A remote attacker that is able to make an application call to an affected function could execute arbitrary code. While a proof of concept exploit is available, not many applications were found to be vulnerable in a way that would allow remote exploitation.

Red Hat Enterprise Linux versions were affected and updates were available.

“Freak” (March 2015) CVE-2015-0204

It was found that OpenSSL clients accepted EXPORT-grade (insecure) keys even when the client had not initially asked for them. This could be exploited using a man-in-the-middle attack, which could downgrade to a weak key, factor it, then decrypt communication between the client and the server. Like Poodle and CCS Injection, this issue is hard to exploit as it requires a man in the middle attack. We’re not aware of active exploitation of this issue.

Red Hat Enterprise Linux versions were affected and updates were available.

Other issues of customer interest

We can also get a rough guide of which issues are getting the most attention by looking at the number of page views on the Red Hat CVE pages. While the top views were for the  issues above, also of increased interest was:

  • A kernel flaw (May 2014) CVE-2014-0196, allowing local privilege escalation. A public exploit exists for this issue but does not work as published against Red Hat Enterprise Linux.
  • “BadIRET”, a kernel flaw (December 2014) CVE-2014-9322, allowing local privilege escalation. Details on how to exploit this issue have been discussed, but we’re not aware of any public exploits for this issue.
  • A flaw in BIND (December 2014), CVE-2014-8500. A remote attacker could cause a denial of service against a BIND server being used as a recursive resolver.  Details that could be used to craft an exploit are available but we’re not aware of any public exploits for this issue.
  • Flaws in NTP (December 2014), including CVE-2014-9295. Details that could be used to craft an exploit are available.  These serious issues had a reduced impact on Red Hat Enterprise Linux.
  • A flaw in Samba (February 2015) CVE-2015-0240, where a remote attacker could potentially execute arbitrary code as root. Samba servers are likely to be internal and not exposed to the internet, limiting the attack surface. No exploits that lead to code execution are known to exist, and some analyses have shown that creation of such a working exploit is unlikely.

Conclusion

ven2

We’ve shown in this post that for the last year of vulnerabilities affecting Red Hat products the issues that matter and the issues that got branded do have an overlap, but they certainly don’t closely match. Just because an issue gets given a name, logo, and press attention does not mean it’s of increased risk. We’ve also shown there were some vulnerabilities of increased risk that did not get branded.

At Red Hat, our dedicated Product Security team analyse threats and vulnerabilities against all our products every day, and provide relevant advice and updates through the customer portal. Customers can call on this expertise to ensure that they respond quickly to address the issues that matter, while avoiding being caught up in a media whirlwind for those that don’t.