How to reproduce the segmentation faults of the Ryzen bug
It’s been reported that sporadic segmentation faults occur on Linux/Ryzen under heavy CPU load, especially while parallel-compiling large open-source code bases such as the Linux kernel, Mesa, LLVM, Chromium, and WebKit. However, the faults are hard to reproduce because the probability is low and the conditions for reproduction are unclear. This post summarizes the current best practices.
This kind of problem is known as the Sig11 problem.
This is a different problem from the system reboot problem on DragonFly BSD and on FreeBSD.
Finally, AMD has confirmed this issue.
Preparations
It’s been reported that some settings significantly reduce the probability; on the other hand, others have reported that the same settings changed nothing. Nothing is conclusive at the moment.
Check the following settings in your UEFI BIOS:
- Enable uOP cache
- Enable SMT (Simultaneous Multithreading)
The uOP cache setting can be found under Advanced > AMD CBS > Zen Common Options > Opcache Control in my UEFI.
These crashes happen not only on Linux but also on Windows Subsystem for Linux, NetBSD, and FreeBSD. Linux has the most reports, so this article targets it.
Check that ASLR (address space layout randomization) is enabled. It’s been reported that disabling ASLR significantly reduces the probability, so keep it on.
$ sysctl kernel.randomize_va_space
kernel.randomize_va_space = 2
It’s already enabled if the value is 2; otherwise, enable it:
$ sudo sysctl kernel.randomize_va_space=2
kernel.randomize_va_space = 2
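If you want the setting to persist across reboots, one option is a sysctl drop-in (the file name 99-aslr.conf is my choice, and this assumes your distribution honors /etc/sysctl.d):
$ echo 'kernel.randomize_va_space = 2' | sudo tee /etc/sysctl.d/99-aslr.conf
$ sudo sysctl --system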
Check that your gcc and bash are not PIE (position-independent executables). In Ubuntu 17.04, gcc and bash aren’t PIE, but clang and dash are. You can check whether an executable is PIE by looking at the Type field in the output of readelf -h:
$ readelf -h $(which bash)
It’s PIE if the type is DYN, and non-PIE if it is EXEC.
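For example, here is a small shell sketch to check several executables at once (the list of names is just my choice):
$ for f in gcc bash clang dash; do printf '%-7s' "$f:"; readelf -h "$(which $f)" | grep Type; done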
In Ubuntu 17.04, gcc generates PIE by default. It generates non-PIE executables when the -no-pie and -fno-pie switches are specified:
$ echo 'int main() { return 0; }' > a.c
$ gcc a.c
$ readelf -h a.out | grep Type
Type: DYN (Shared object file)
$ gcc -no-pie -fno-pie a.c
$ readelf -h a.out | grep Type
Type: EXEC (Executable file)
gcc version 4 seems better than version 5 or later, because the core component of compilation is the single executable cc1 in gcc-4, but the shared library libcc1.so in gcc-5, which is of course PIC (and therefore position-independent, like PIE).
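To see which cc1 your gcc driver actually runs and check its type, something like this should work (assuming the driver can locate cc1):
$ readelf -h "$(gcc -print-prog-name=cc1)" | grep Type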
Make the coredump size unlimited to get full coredumps.
$ ulimit -c unlimited
Also, install the debug info packages.
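It’s also worth checking where your kernel writes core dumps; if kernel.core_pattern starts with a pipe (|), a helper such as apport or systemd-coredump collects the dumps instead of writing a core file into the working directory:
$ cat /proc/sys/kernel/core_pattern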
Compile repeatedly
Let’s compile repeatedly until it fails.
$ while make -j$(nproc); do make clean; done
With luck it takes about ten minutes; typically it takes about two hours; if you are unlucky, it never happens.
When a crash happens, collect the coredump file core if one is generated.
Note that GCC usually doesn’t generate a coredump, because it catches segmentation faults itself and prints its own error report.
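With the default core_pattern, the core file lands in the working directory of the crashed process. A quick way to look for candidates after a failed build (the name pattern is an assumption):
$ find . -name 'core' -o -name 'core.*'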
Libtool is implemented as a shell script, so I recommend compiling code that uses Libtool: it makes it easy to get a coredump of Bash.
The Linux kernel may also output some messages; save the output of the dmesg command.
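For example, here is a variant of the loop above that saves the kernel log as soon as the build fails (the log file name is my choice):
$ while make -j$(nproc); do make clean; done; dmesg > dmesg.log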
Satoru Takeuchi created a useful script to compile repeatedly.
Postmortem Examination
There is a particular pattern in some of Ryzen’s crashes. According to Hideki EIRAKU’s investigation, Ryzen sometimes seems to execute the instruction 64 bytes before the one the RIP register points to (equivalently, the reported RIP has slipped 64 bytes ahead).
Here is my coredump of bash. Let’s check it.
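To load the coredump into gdb (the bash path and the core file name are from my setup; adjust them to yours) and inspect the registers:
$ gdb $(which bash) core
(gdb) info registers rip rsp rbx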
The current program counter rip was 0x4370d0:
rip 0x4370d0 0x4370d0 <execute_builtin+720>
SIGSEGV was raised while copying [rsp+0x8] to eax, immediately after returning from run_unwind_frame:
0x00000000004370cb <+715>: call 0x465ef0 <run_unwind_frame>
=> 0x00000000004370d0 <+720>: mov eax,DWORD PTR [rsp+0x8]
The stack pointer rsp was 0x7ffe79c8f4d0:
rsp 0x7ffe79c8f4d0 0x7ffe79c8f4d0
Here is the stack dump:
(gdb) x/32g 0x7ffe79c8f450
0x7ffe79c8f450: 0x0000000000000001 0xe375df75c9fcd400
0x7ffe79c8f460: 0x0000000000000000 0x00000000018da5c8
0x7ffe79c8f470: 0x00000000004659c0 0xe375df75c9fcd400
0x7ffe79c8f480: 0x0000000000000000 0x0000000000000000
0x7ffe79c8f490: 0x0000000000000001 0x0000000000000000
0x7ffe79c8f4a0: 0x0000000000000000 0x0000000000000000
0x7ffe79c8f4b0: 0x0000000000484d10 0x0000000000465f10
0x7ffe79c8f4c0: 0x0000000000000001 0x00000000004370d0
0x7ffe79c8f4d0: 0x00000000018dd548 0x0000000000000000
0x7ffe79c8f4e0: 0x0000000000000001 0x000000000180e5c8
0x7ffe79c8f4f0: 0x0000000000000000 0x000000000180e5c8
0x7ffe79c8f500: 0x0000000000484d10 0x0000000000000000
0x7ffe79c8f510: 0x0000000000000000 0x0000000000000000
0x7ffe79c8f520: 0x0000000000000000 0x0000000000439426
0x7ffe79c8f530: 0x0000000000000001 0x00000000ffffffff
0x7ffe79c8f540: 0x00000000018ddea8 0x00000001008dc8c8
[rsp-8] (0x7ffe79c8f4c8) was 0x00000000004370d0. This is the return address that the preceding call pushed.
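You can read this stack slot directly from the core; the output below is consistent with the stack dump above:
(gdb) x/gx $rsp-8
0x7ffe79c8f4c8: 0x00000000004370d0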
Here is the code of run_unwind_frame:
(gdb) disassemble run_unwind_frame
Dump of assembler code for function run_unwind_frame:
0x0000000000465ef0 <+0>: cmp QWORD PTR [rip+0x2a8a78],0x0 # 0x70e970 <unwind_protect_list>
0x0000000000465ef8 <+8>: je 0x465f17 <run_unwind_frame+39>
0x0000000000465efa <+10>: push rbx
0x0000000000465efb <+11>: mov ebx,DWORD PTR [rip+0x2a8a83] # 0x70e984 <interrupt_immediately>
0x0000000000465f01 <+17>: mov DWORD PTR [rip+0x2a8a79],0x0 # 0x70e984 <interrupt_immediately>
0x0000000000465f0b <+27>: call 0x465aa0 <unwind_frame_run_internal>
0x0000000000465f10 <+32>: mov DWORD PTR [rip+0x2a8a6e],ebx # 0x70e984 <interrupt_immediately>
0x0000000000465f16 <+38>: pop rbx
0x0000000000465f17 <+39>: repz ret
pop rbx was executed just before returning from this function, and the value it restored is still recorded on the stack: [rsp-16] was 1, and rbx was 1.
rbx 0x1 0x1
The stack and the registers look consistent. But SIGSEGV was raised. It’s a mystery!
Let’s apply the hypothesis here. The return address was 0x4370d0 and the current rip was 0x4370d0. But what if Ryzen had actually executed the instruction 64 bytes before rip? That address would be 0x4370d0 - 0x40 = 0x437090.
Here is the disassembled instruction.
(gdb) x/i 0x437090
0x437090 <execute_builtin+656>: add BYTE PTR [rax],al
rax was 0. If this instruction had really been executed, the address where this SIGSEGV occurred would have been 0.
(gdb) p $_siginfo._sifields._sigfault.si_addr
$3 = (void *) 0x6dfb44
Unfortunately, it was 0x6dfb44. Surprisingly, no register stores this value, and I don’t know where it came from. Anyway, the hypothesis doesn’t seem to match in this case.
After I wrote the text above, Hideki EIRAKU posted his investigation of my coredump. He guesses that rip slipped to 0x465f10, which is inside run_unwind_frame:
(gdb) disassemble run_unwind_frame
Dump of assembler code for function run_unwind_frame:
0x0000000000465ef0 <+0>: cmp QWORD PTR [rip+0x2a8a78],0x0 # 0x70e970 <unwind_protect_list>
0x0000000000465ef8 <+8>: je 0x465f17 <run_unwind_frame+39>
0x0000000000465efa <+10>: push rbx
0x0000000000465efb <+11>: mov ebx,DWORD PTR [rip+0x2a8a83] # 0x70e984 <interrupt_immediately>
0x0000000000465f01 <+17>: mov DWORD PTR [rip+0x2a8a79],0x0 # 0x70e984 <interrupt_immediately>
0x0000000000465f0b <+27>: call 0x465aa0 <unwind_frame_run_internal>
0x0000000000465f10 <+32>: mov DWORD PTR [rip+0x2a8a6e],ebx # 0x70e984 <interrupt_immediately>
0x0000000000465f16 <+38>: pop rbx
0x0000000000465f17 <+39>: repz ret
This mov instruction copies ebx to [rip+0x2a8a6e]. The saved rip was 0x4370d0, so rip+0x2a8a6e is 0x6dfb3e; and since rip-relative addressing is relative to the next instruction and this mov is 6 bytes long, the effective address is 0x6dfb3e + 6 = 0x6dfb44.
Bingo!
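You can verify this arithmetic inside gdb itself (the value history numbers will differ):
(gdb) p/x 0x4370d0 + 0x2a8a6e + 6
$1 = 0x6dfb44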
The six least significant bits of rip (0x4370d0) and of the slipped destination address (0x465f10) are equal: both addresses share the same offset within a 64-byte block.
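gdb can confirm this as well, since masking with 0x3f keeps only the low six bits:
(gdb) p/x 0x4370d0 & 0x3f
$2 = 0x10
(gdb) p/x 0x465f10 & 0x3f
$3 = 0x10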