Saturday, 6 November 2010

Mono 2.8: a step closer to a reliable foundation

We previously complained about the use of Boehm's conservative garbage collector in earlier versions of Mono because it is fundamentally flawed: it is prone to unpredictable memory leaks, so applications can die with out-of-memory errors while plenty of garbage remains to be reclaimed. Specifically, we gave a simple 9-line example program that fills and forgets ten hash tables, and that program ran out of memory when run on Mono 2.4. What happens when this program is run on Mono 2.8 using the new SGen garbage collector?
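For readers without the earlier post to hand, the shape of that test can be sketched roughly as follows. This is not the original 9-line program, which ran on Mono. It is a Python illustration of the same fill-and-forget pattern, and the function name, the element count and the key/value choices are all assumptions made for the sketch. CPython also reclaims memory mainly by reference counting, so this shows the workload only and would not be expected to reproduce the Mono behaviour.

```python
import time


def fill_and_forget(iterations=10, n=1 << 20):
    """Fill a fresh hash table `iterations` times, dropping each one afterwards.

    `n` (entries per table) is a placeholder size, not the original's.
    Returns the value looked up at key 42.0 in each iteration.
    """
    looked_up = []
    for _ in range(iterations):
        start = time.perf_counter()
        m = {}  # a brand-new table; the previous one is now garbage
        for j in range(n, 0, -1):
            m[float(j)] = float(j)  # hypothetical contents: key maps to itself
        looked_up.append(m[42.0])
        print(f"m[42] = {m[42.0]}")
        print(f"Took {time.perf_counter() - start:.5f}s\n")
    return looked_up


if __name__ == "__main__":
    fill_and_forget()
```

Each iteration keeps only one table reachable, so a collector that reclaims garbage reliably should never need much more memory than a single table occupies.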

Running the test on Mono 2.8 with the default Boehm GC often reproduces the same leak that we saw before, as expected. Repeating the test with the new SGen garbage collector, we find that the program no longer dies with an out-of-memory error after four iterations. Instead, it completes eight of the intended ten iterations before dying with a segmentation fault:

$ mono-sgen TailCall.exe
m[42] = 42
Took 3.40511s

m[42] = 42
Took 3.41273s

m[42] = 42
Took 3.20464s

m[42] = 42
Took 3.96534s

m[42] = 42
Took 3.14944s

m[42] = 42
Took 3.10114s

m[42] = 42
Took 3.14187s

m[42] = 42
Took 3.27123s

Stacktrace:

  at (wrapper managed-to-native) object.__icall_wrapper_mono_gc_alloc_vector (intptr,intptr,intptr) <0x00003>
  at (wrapper managed-to-native) object.__icall_wrapper_mono_gc_alloc_vector (intptr,intptr,intptr) <0x00003>
  at (wrapper alloc) object.AllocVector (intptr,intptr) <0x000ac>
  at System.Collections.Generic.Dictionary`2<double, double>.Resize () <0x001bc>
  at System.Collections.Generic.Dictionary`2<double, double>.set_Item (double,double) <0x0014f>
  at <StartupCode$TailCall>.$Program.main@ () <0x0007c>
  at (wrapper runtime-invoke) object.runtime_invoke_void (object,intptr,intptr,intptr) <0x0007d>

Native stacktrace:

        mono-sgen [0x80dec34]
        mono-sgen [0x812b2cb]
        [0xb76f3410]
        mono-sgen [0x8174e17]
        mono-sgen [0x8175428]
        [0xb72ecb0b]
        [0xb72e97d5]
        [0xb72ec695]
        [0xb72ec2a8]
        [0xb72e8d9d]
        [0xb72e8fd6]
        mono-sgen [0x8065318]
        mono-sgen(mono_runtime_invoke+0x40) [0x81a9aa0]
        mono-sgen(mono_runtime_exec_main+0xd6) [0x81ad1f6]
        mono-sgen(mono_main+0x1a41) [0x80bb501]
        mono-sgen [0x805b388]
        /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xb7451b56]
        mono-sgen [0x805b131]

Debug info from gdb:

[Thread debugging using libthread_db enabled]
[New Thread 0xb7103b70 (LWP 8401)]
0xb76f3430 in __kernel_vsyscall ()
  2 Thread 0xb7103b70 (LWP 8401)  0xb76f3430 in __kernel_vsyscall ()
* 1 Thread 0xb7439720 (LWP 8400)  0xb76f3430 in __kernel_vsyscall ()

Thread 2 (Thread 0xb7103b70 (LWP 8401)):
#0  0xb76f3430 in __kernel_vsyscall ()
#1  0xb75a9f75 in sem_wait@@GLIBC_2.1 ()
    at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/sem_wait.S:80
#2  0x0822c778 in mono_sem_wait (sem=0x89ce64c, alertable=0)
    at mono-semaphore.c:102
#3  0x081560c7 in finalizer_thread (unused=0x0) at gc.c:1048
#4  0x08183065 in start_wrapper (data=0xa37c760) at threads.c:747
#5  0x0821a7df in thread_start_routine (args=0xa36762c) at wthreads.c:285
#6  0x0816da8b in gc_start_thread (arg=0xa37c808) at sgen-gc.c:5350
#7  0xb75a380e in start_thread (arg=0xb7103b70) at pthread_create.c:300
#8  0xb75078de in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

Thread 1 (Thread 0xb7439720 (LWP 8400)):
#0  0xb76f3430 in __kernel_vsyscall ()
#1  0xb75aac8b in read () from /lib/tls/i686/cmov/libpthread.so.0
#2  0x080dedfc in read (signal=11, ctx=0xb72fcd0c)
    at /usr/include/bits/unistd.h:45
#3  mono_handle_native_sigsegv (signal=11, ctx=0xb72fcd0c)
    at mini-exceptions.c:1935
#4  0x0812b2cb in mono_arch_handle_altstack_exception (sigctx=0xb72fcd0c,
    fault_addr=0x8, stack_ovf=0) at exceptions-x86.c:1163
#5  <signal handler called>
#6  alloc_large_inner (vtable=<value optimised out>,
    size=<value optimised out>) at sgen-los.c:368
#7  0x08174e17 in mono_gc_alloc_obj_nolock (vtable=0xa3af948, size=0)
    at sgen-gc.c:3219
#8  0x08175428 in mono_gc_alloc_vector (vtable=0xa3af948, size=147681864,
    max_length=18460231) at sgen-gc.c:3437
#9  0xb72ecb0b in ?? ()
#10 0xb72e97d5 in ?? ()
#11 0xb72ec695 in ?? ()
#12 0xb72ec2a8 in ?? ()
#13 0xb72e8d9d in ?? ()
#14 0xb72e8fd6 in ?? ()
#15 0x08065318 in mono_jit_runtime_invoke (method=0xa330bdc, obj=0x0,
    params=0xbfd1aafc, exc=0x0) at mini.c:5392
#16 0x081a9aa0 in mono_runtime_invoke (method=0xa330bdc, obj=0x0,
    params=0xbfd1aafc, exc=0x0) at object.c:2709
#17 0x081ad1f6 in mono_runtime_exec_main (method=0xa330bdc, args=0xb6c00638,
    exc=0x0) at object.c:3838
#18 0x080bb501 in main_thread_handler (argc=2, argv=0xbfd1ace4) at driver.c:999
#19 mono_main (argc=2, argv=0xbfd1ace4) at driver.c:1836
#20 0x0805b388 in mono_main_with_options (argc=2, argv=0xbfd1ace4) at main.c:66
#21 main (argc=2, argv=0xbfd1ace4) at main.c:97

=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================

Aborted

Seven years after the Mono team described their use of the Boehm garbage collector as "an interim measure", the SGen collector is still experimental. Hopefully these issues will be resolved and the Mono platform will benefit from a reliable garbage collector in the not-too-distant future. However, we cannot help but wonder why the Mono team have not chosen to release a simple but reliable garbage collector that people could use while they wait for SGen to be stabilized. After all, multicore-friendly garbage collection can be easy.