Patch node.js to run on Android <= 4.0.4

TL;DR: node.js crashes when run on Android API level 15 and below because libuv uses pthread_sigmask, which is broken on older versions of Android. Once libuv is patched with the fix for that function, everything works fine.

As part of the journey to run node.js everywhere, I recently came across an interesting issue with running node.js on Android devices with API level 15 and below (that is, Android versions 4.0.4 and below, which apparently account for more than 10% of Android's market share).

The ability to build and run node.js on the Android platform has been around for quite some time now, and given the node.js source code, a Linux machine and a copy of the NDK, it should be pretty straightforward.

However, when run on older Android devices, node.js crashes immediately with the following cryptic error message:

I/DEBUG﹕ signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr deadbaad  
I/DEBUG﹕ r0 deadbaad  r1 00000001  r2 40000000  r3 00000000  
I/DEBUG﹕ r4 00000000  r5 00000027  r6 0000000a  r7 4aae8bf8  
I/DEBUG﹕ r8 00000004  r9 00000003  10 0000004d  fp 4b51c964  
I/DEBUG﹕ ip ffffffff  sp 4b51c930  lr 4001f121  pc 4001b880  cpsr 60000030  
I/DEBUG﹕ d0  0000000000000000  d1  0000000000000000  
I/DEBUG﹕ d2  0000000000000000  d3  4370000043708000  
I/DEBUG﹕ d4  0000000041c00000  d5  3f80000000000000  
I/DEBUG﹕ d6  0000000000000000  d7  0000000000000000  
I/DEBUG﹕ d8  0000000000000000  d9  0000000000000000  
I/DEBUG﹕ d10 0000000000000000  d11 0000000000000000  
I/DEBUG﹕ d12 0000000000000000  d13 0000000000000000  
I/DEBUG﹕ d14 0000000000000000  d15 0000000000000000  
I/DEBUG﹕ scr 60000012  
I/DEBUG﹕ #00  pc 00017880  /system/lib/libc.so  
I/DEBUG﹕ #01  lr 4001f121  /system/lib/libc.so  
I/DEBUG﹕ code around pc:  
I/DEBUG﹕ 4001b860 4623b15c 2c006824 e026d1fb b12368db  
I/DEBUG﹕ 4001b870 21014a17 6011447a 48124798 24002527  
I/DEBUG﹕ 4001b880 f7f47005 2106ee60 eeeef7f5 460aa901  
I/DEBUG﹕ 4001b890 f04f2006 94015380 94029303 eab8f7f5  
I/DEBUG﹕ 4001b8a0 4622a905 f7f52002 f7f4eac2 2106ee4c  
I/DEBUG﹕ code around lr:  
I/DEBUG﹕ 4001f100 41f0e92d 46804c0c 447c2600 68a56824  
I/DEBUG﹕ 4001f110 e0076867 300cf9b5 dd022b00 47c04628  
I/DEBUG﹕ 4001f120 35544306 37fff117 6824d5f4 d1ee2c00  
I/DEBUG﹕ 4001f130 e8bd4630 bf0081f0 000283da 41f0e92d  
I/DEBUG﹕ 4001f140 fb01b086 9004f602 461f4815 4615460c  
I/DEBUG﹕ stack:  
I/DEBUG﹕ 4b51c8f0  002d8448  
I/DEBUG﹕ 4b51c8f4  4004c568  
I/DEBUG﹕ 4b51c8f8  000000d0  
I/DEBUG﹕ 4b51c8fc  4004c5a8  
I/DEBUG﹕ 4b51c900  4004770c  
I/DEBUG﹕ 4b51c904  4004c85c  
I/DEBUG﹕ 4b51c908  00000000  
I/DEBUG﹕ 4b51c90c  4001f121  /system/lib/libc.so  

Unfortunately, the log doesn't give any information about the source of the error, just a reference to the standard C library (libc), and there's not a lot we can do with that.
In such cases, there are basically two things I try:

  1. Try to debug the thing.
  2. Add logs everywhere.

Since node.js's source code is pretty big, the first option seemed more promising.
It took some twisting and turning, but after a day or two I was able to make ndk-gdb work with node.js on Android, which meant I could set breakpoints and inspect local variable values, among other things.

[There is plenty of documentation out there on how to get ndk-gdb working, so we're not going to spend any time on that part. The main advice I can give about running ndk-gdb is to pay close attention to its error messages, and don't be afraid to change the script to make it work for your specific app.]

After spending some time on setting up some breakpoints in various code paths in node, I was able to narrow down the source of the SIGSEGV signal to line 103 in libuv's signal.c:

.....
static void uv__signal_block_and_lock(sigset_t* saved_sigmask) {  
  sigset_t new_mask;

  if (sigfillset(&new_mask))
    abort();

  if (pthread_sigmask(SIG_SETMASK, &new_mask, saved_sigmask))
    abort();  // line 103

  if (uv__signal_lock())
    abort();
}
....

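After inspecting the return value of the call to pthread_sigmask, it turns out that it always fails with a return value of 22, or EINVAL. That makes the second if clause call abort, which results in the SIGSEGV we saw earlier: older versions of Android's libc implement abort by deliberately writing to the invalid address 0xdeadbaad, which matches the fault address in the log.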

After some more digging, it turns out that pthread_sigmask not working on Android API <= 15 is a known issue!

Looking at the change set that fixed this issue for API level 16, it seems like it's a rather small change that we can try and incorporate into libuv's signal.c.

We start by adding the fix from the Android source base above, along with a new pthread_sigmask_patched function. It first calls the system's pthread_sigmask, and if that fails with EINVAL, it calls the fixed version instead.

/* signal.c code here... */

// --- Start of Android platform fix --

/* Despite the fact that our kernel headers define sigset_t explicitly
 * as a 32-bit integer, the kernel system call really expects a 64-bit
 * bitmap for the signal set, or more exactly an array of two-32-bit
 * values (see $KERNEL/arch/$ARCH/include/asm/signal.h for details).
 *
 * Unfortunately, we cannot fix the sigset_t definition without breaking
 * the C library ABI, so perform a little runtime translation here.
 */
typedef union {  
    sigset_t   bionic;
    uint32_t   kernel[2];
} kernel_sigset_t;

/* this is a private syscall stub */
extern int __rt_sigprocmask(int, const kernel_sigset_t *, kernel_sigset_t *, size_t);

int pthread_sigmask_android16(int how, const sigset_t *set, sigset_t *oset)  
{
    int ret, old_errno = errno;

    /* We must convert *set into a kernel_sigset_t */
    kernel_sigset_t  in_set, *in_set_ptr;
    kernel_sigset_t  out_set;

    in_set.kernel[0]  = in_set.kernel[1]  =  0;
    out_set.kernel[0] = out_set.kernel[1] = 0;

    /* 'in_set_ptr' is the second parameter to __rt_sigprocmask. It must be NULL
     * if 'set' is NULL to ensure correct semantics (which in this case would
     * be to ignore 'how' and return the current signal set into 'oset').
     */

    if (set == NULL) {
        in_set_ptr = NULL;
    } else {
        in_set.bionic = *set;
        in_set_ptr = &in_set;
    }

    ret = __rt_sigprocmask(how, in_set_ptr, &out_set, sizeof(kernel_sigset_t));
    if (ret < 0)
        ret = errno;

    if (oset)
        *oset = out_set.bionic;

    errno = old_errno;
    return ret;
}

// --- End of Android platform fix --

// First try the system's pthread_sigmask; if it fails with EINVAL, retry with the API 16 fix.
int pthread_sigmask_patched(int how, const sigset_t *set, sigset_t *oset) {
  int ret = pthread_sigmask(how, set, oset);
  if (ret == EINVAL) {
    return pthread_sigmask_android16(how, set, oset);
  }
  // Success or any other error: pass the system's result through unchanged.
  return ret;
}

/* more signal.c code here... */
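
The heart of the fix is the union: it lets a 32-bit signal set be handed to a system call that expects 64 bits, with the upper half zeroed. Here is a standalone sketch of just that translation. legacy_sigset_t, demo_kernel_sigset_t and widen_sigset are made-up names for illustration, and uint32_t stands in for the 32-bit sigset_t described in the comment above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 32-bit sigset_t of old Android libc. */
typedef uint32_t legacy_sigset_t;

/* Same shape as kernel_sigset_t in the patch: the 32-bit user-space view
 * shares its storage with the first word of the 64-bit kernel view. */
typedef union {
  legacy_sigset_t bionic;
  uint32_t kernel[2];
} demo_kernel_sigset_t;

/* Widen a 32-bit mask into the two-word layout the kernel expects.
 * Both words are zeroed first, so the upper 32 bits (signals 33 to 64)
 * are never left uninitialized. */
demo_kernel_sigset_t widen_sigset(legacy_sigset_t mask) {
  demo_kernel_sigset_t out;

  out.kernel[0] = out.kernel[1] = 0;
  out.bionic = mask;
  return out;
}
```

For example, widen_sigset(0x4002) yields kernel[0] == 0x4002 and kernel[1] == 0, and the whole union is 8 bytes, which is the size the patch passes to __rt_sigprocmask.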

Additionally, we change the two functions in signal.c that use pthread_sigmask so that they call the patched version instead:

static void uv__signal_block_and_lock(sigset_t* saved_sigmask) {  
  sigset_t new_mask;

  if (sigfillset(&new_mask))
    abort();
  // Code was changed here in order to fix android API <= 15 broken pthread_sigmask issue
  // original code called directly pthread_sigmask
  if (pthread_sigmask_patched(SIG_SETMASK, &new_mask, saved_sigmask))
    abort();

  if (uv__signal_lock())
    abort();
}


static void uv__signal_unlock_and_unblock(sigset_t* saved_sigmask) {  
  if (uv__signal_unlock())
    abort();

  // Code was changed here in order to fix android API <= 15 broken pthread_sigmask issue
  // original code called directly pthread_sigmask
  if (pthread_sigmask_patched(SIG_SETMASK, saved_sigmask, NULL))
    abort();
}

Compiling and trying again to run node.js...and guess what? node starts as expected, no crashes, and everything seems to work fine!

Pretty miraculously, this was everything needed in order to make node.js run on older Android versions!