Bisecting Kernel Using PXE Boot
Posted: | More posts about the-hard-way bisect bugs git kernel linux pxe raring ubuntu ubuntu planet wifi
Recently I found that an Atheros AR5BXB64 wireless card in my Acer Aspire One AOA150 netbook stopped working properly in 13.04 with 3.8 kernel. The problem was that any file download over the wireless interface had a high chance of being corrupted. I was not able to download updates - every time something big (like a kernel upgrade) got downloaded, I got the file, but the checksum did not match. The card was able to inject copies of previous buffers into the stream in a way that still passed IP and TCP checksum verification.
This was exactly the same symptom we had when Active State Power Management (ASPM) L0s was enabled for that PCI-E device, when the hardware itself was not actually able to handle this state properly. It was fixed in 2009 but suddenly got broken in 3.8 again.
I booted 12.10 from a Flash drive and found that the card was working properly there. Then I checked 3.7 kernels and found that the issue did not exist there and appeared only in 3.8.0-rc1, which suggested there was indeed a regression in the kernel. And the difference was that 3.7 and earlier kernels had LinkCtl: ASPM Disabled while 3.8+ had LinkCtl: ASPM L0s L1 Enabled in the lspci -vv output.
Since I was not able to find any relevant commits that looked like they were deliberately doing something to ASPM handling, the only way to proceed was to run a binary search. Since it was possible to check whether the bug was there or not just by checking lspci, it could and should have been automated.
Bisect works by splitting the commits between known good and known bad revision in half with every iteration and testing the commit in between. Rinse and repeat until you hit a commit that made the switch from good to bad and that's where the things broke.
I had to test 5840 commits, with roughly 13 steps (213=8192, 212=4096). 13 manual reboots are 12 reboots too many so I went on to set up an environment that would do everything for me. Additionally I did not want to build the kernels on that netbook itself as it is pretty slow. The build host and the netbook should have been separated if I wanted to finish the task in a reasonable amount of time.
So, here's the plan:
- Set up a PXE environment.
- Find the minimal kernel config that exhibits the issue.
- Create a custom initrd containing all the tools needed to detect whether an issue is there.
- Create a script that can be reliably used by git bisect to make it all run automatically.
PXE is a Pre-boot Execution Environment which allows network boot without any local disk installation. It works by starting a DHCP client early in the boot process, obtaining the address of the TFTP server that hosts the boot images, downloading the image and passing the control over. TFTP is a Trivial File Transfer Protocol, a really simple connection-less and authentication-less protocol that's running over UDP.
In my case I already had a DHCP server which was running on my OpenWRT-based router. I set up a virtual machine and had my DHCP server direct the queries for the boot images to that virtual machine. This was added to my /etc/dnsmasq.conf on the router:
# PXE boot dhcp-boot=pxelinux.0,raring-server-i386.lappyfamily.net,192.168.100.102
Then I switched to the "raring-server-i386" and installed atftpd.
Ubuntu 12.04 Note
The version in 12.04 suffers from a bug that prevents rlinetd from being set up properly. If you happen to hit the bug, you can create the configuration manually by executing the following:
. /etc/default/atftpd sudo update-inetd --group BOOT --add \ "tftp dgram udp wait nobody /usr/sbin/tcpd /usr/sbin/in.tftpd $OPTIONS" unset OPTIONS
In order to run Linux with PXE you will need to use PXELINUX. The required image, pxelinux.0 can be obtained after installing syslinux package from /usr/lib/syslinux/pxelinux.0. By default atftpd uses /srv/tftp as a root for the shared filesystem, so pxelinux.0 should go to /srv/tftp and be readable by nobody user.
Then I set up the netbook to boot from PXE. This will vary between different BIOS setup interfaces, but it boils down to enabling integrated LAN adapter, enabling boot ROM on that adapter and setting the network device as the first one to boot from. A wired connection to an IPv4 network is needed.
With pxelinux.0 in place I booted the netbook:
Intel UNDI, PXE-2.1 (build 082) Copyright (C) 1997-2000 Intel Corporation This Product is covered by one or more of the following patents: US5,307,459, US5,434,872, US5,732,094, US6,570,804, US6,115,776 and US6,327,625 Realtek PCI Express Fast Ethernet Controller Series v1.15 (090224) CLIENT MAC ADDR: 00 1E 68 E2 10 AE GUID: 40709CDA-A8E0-D411-8C6C-001E68E210AE CLIENT IP: 192.168.1.14 MASK: 255.255.255.0 DHCP IP: 192.168.1.1 GATEWAY IP: 192.168.1.1 !PXE entry point found (we hope) at 9D17:0109 via plan A UNDI code segment at 9D17 len 19CA UNDI data segment at 930A len A0D0 Getting cached packet 01 02 03 My IP address seems to be C0A8010E 192.168.1.14 ip=192.168.1.14:192.168.100.102:192.168.1.1:255.255.255.0 BOOTIF=01-00-1e-68-e2-10-ae SYSUUID=40709cda-a8e0-d411-8c6c-001e68e210ae TFTP prefix: Unable to locate configuration file Boot failed: press a key to retry, or wait for reset...
The line starting with "My IP address" is our pxelinux.0 working. Now we need to put configuration and kernel in place.
You can just grab the kernel and initrd from existing installation and put them in /srv/tftp. In my case I put kernel to /srv/tftp/vmlinuz and initrd to /srv/tftp/initrd.img. If you check /var/log/syslog on the TFTP server, you will see that pxelinux wanted to get the configuration from the following locations:
pxelinux.cfg/40709cda-a8e0-d411-8c6c-001e68e210ae pxelinux.cfg/01-00-1e-68-e2-10-ae pxelinux.cfg/C0A8010E pxelinux.cfg/C0A8010 pxelinux.cfg/C0A801 pxelinux.cfg/C0A80 pxelinux.cfg/C0A8 pxelinux.cfg/C0A pxelinux.cfg/C0 pxelinux.cfg/C pxelinux.cfg/default
If you are wondering, C0A8010E is 192.168.1.14 in hexadecimal notation.
So we need to put configuration at some location it is querying. I decided to use the MAC address, 01:00:1e:68:e2:10:ae and put the following configuration to /srv/tftp/pxelinux.cfg/01-00-1e-68-e2-10-ae (yes, the folder is named pxelinux.cfg):
DEFAULT linux LABEL linux KERNEL vmlinuz APPEND initrd=initrd.img panic=10
You can reboot your device now and you will see that the kernel is loaded, initrd.img is loaded too but the boot breaks since the root device is undefined. While you can put the root information from your grub configuration, we don't actually need to do that. As we are going to build our own kernel with minimal number of features, we will run everything from the intitial RAM drive, initrd. I put the panic=10 option so that the kernel reboots after 10 seconds in case it fails so that I would not need to shut the netbook down manually.
Well, that was easy. I grabbed the git repository of Linus, git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git, checked out v3.8-rc1 and ran make menuconfig
Since I was interested in ath5k module only I disabled loadable module support and made ath5k a built-in module. I left quite a few features I did not know/care about enabled which I believe has somehow increased the build time, but in the end it took 3 minutes to build a kernel on my VM with 4 concurrent builds (make -j 4, the VM had 4 CPUs allocated)
When kernel boots with initrd, it mounts that image as a root device and runs init as PID 1 trying /sbin/init, /etc/init, /bin/init and /bin/sh. In case /bin/sh is handled by busybox and it is running as PID 1, it attempts to run /init and /sbin/init. So I needed to create a new initramfs that had everything inside the /init script itself:
- Setting up /dev, /proc, /sys and /tmp in case I need that for debugging.
- Configure the network interface.
- Execute the required test.
- Wait for a reboot command.
Here's what I came up with, the first part is stolen from /init that ships with Ubuntu:
#!/bin/sh [ -d /dev ] || mkdir -m 0755 /dev [ -d /root ] || mkdir -m 0700 /root [ -d /sys ] || mkdir /sys [ -d /proc ] || mkdir /proc [ -d /tmp ] || mkdir /tmp mkdir -p /var/lock mount -t sysfs -o nodev,noexec,nosuid sysfs /sys mount -t proc -o nodev,noexec,nosuid proc /proc # Some things don't work properly without /etc/mtab. ln -sf /proc/mounts /etc/mtab ifconfig eth0 192.168.1.14 route add default gw 192.168.1.1 sleep 5 lspci -vv -d 168c:001c | nc 192.168.100.102 5555 nc -l -p 5555 && reboot -f
So this init script configures the wired interface, sends the output of lspci to the host running the bisect script and listens to an incoming TCP connection on port 5555 so that it would reboot the host when connection is terminated. There is no need for anything fancier.
The initramfs contained the minimal set of applications and libraries, but you will want to grab busybox from a working system as it has a reboot applet which is missing from the version shipped with initrd by default on Ubuntu systems. I needed to do a bit of tweaking before I got everything working as I wanted so I had /bin/sh (symlink to /bin/busybox) originally in place of all these lspci calls. Also I used nc from netcat-traditional since it had less dependencies.
ubuntu@raring-server-i386:~/minroot$ find ./init ./bin ./bin/sh ./bin/lspci ./bin/nc ./bin/bash ./bin/busybox ./lib ./lib/ld-linux.so.2 ./lib/libresolv.so.2 ./lib/libc.so.6 ./lib/libz.so.1 ./lib/libpci.so.3 ./dev ./dev/ram0 ./dev/console ./dev/null ubuntu@raring-server-i386:~/minroot$ find | cpio --format=newc -o | gzip > /srv/tftp/initrd.img 7806 blocks
git-bisect run expects the script to return 0 when this build is considered a success or 1 when the build is considered a failure, so in our case a successful build will have "ASPM Disabled" in the output. grep returns 0 when a string is found, 1 when not found and 2 when error, so we can just use that.
I called the script 'bisect-helper' and put it in my home directory on the VM.
#!/bin/bash cp ~/kernel.config ~/linux/.config cd ~/linux make olddefconfig make clean make -j 4 cp arch/i386/boot/bzImage /srv/tftp/vmlinuz # reboot netbook echo "reboot" | nc 192.168.1.14 5555 # when it goes up, it will send us a message nc -l 5555 | grep 'LnkCtl:\s*ASPM Disabled' exit $?
The script needs to be tested manually first to check whether it is actually producing the results expected:
ubuntu@raring-server-i386:~/linux$ git checkout v3.7 Checking out files: 100% (11706/11706), done. Previous HEAD position was 19f949f... Linux 3.8 HEAD is now at 2959440... Linux 3.7 ubuntu@raring-server-i386:~/linux$ ~/bisect-helper; echo $? ... Kernel: arch/x86/boot/bzImage is ready (#40) LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+ 0 ubuntu@raring-server-i386:~/linux$ git checkout v3.8 Checking out files: 100% (11706/11706), done. Previous HEAD position was 2959440... Linux 3.7 HEAD is now at 19f949f... Linux 3.8 ubuntu@raring-server-i386:~/linux$ ~/bisect-helper; echo $? ... Kernel: arch/x86/boot/bzImage is ready (#41) 1
Yep, looks like it is working. Now let the madness begin!
ubuntu@raring-server-i386:~/linux$ git bisect start v3.8-rc1 v3.7 Bisecting: 5840 revisions left to test after this (roughly 13 steps) [6be35c700f742e911ecedd07fcc43d4439922334] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ubuntu@raring-server-i386:~/linux$ time git bisect run ../bisect-helper running ../bisect-helper ... (an hour later) ... 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6 is the first bad commit commit 8c33f51df406e1a1f7fa4e9b244845b7ebd61fa6 Author: Taku Izumi <firstname.lastname@example.org> Date: Tue Oct 30 15:27:13 2012 +0900 PCI/ACPI: Request _OSC control before scanning PCI root bus This patch moves up the code block to request _OSC control in order to separate ACPI work and PCI work in acpi_pci_root_add(). Signed-off-by: Taku Izumi <email@example.com> Signed-off-by: Bjorn Helgaas <firstname.lastname@example.org> :040000 040000 ac58db59f5eaf541d5cb36197aaaf5dfe319ea37 d25e645d59da508e3a1bcb01509cbfb5c7a8239b M drivers bisect run success real 63m39.674s user 155m30.732s sys 27m33.772s
It took ~64 minutes to run 13 kernel builds and the offending commit was discovered. I filed a bug, it got picked up by Yinghai Lu, who found that this commit triggered an issue in another function that differentiated between ASPM handling for non-hotplug and hotplug cases. This was affecting iwlwifi as well as other modules which were using pci_disable_link_state call, so it was a real issue.
Yinghai Lu has posted the patch to the kernel-pci mailing list and it should be merged at some point.
I've always wanted to do git bisect on something, but I had no idea what to start with. Bisecting is not specific to git, there's a Bazaar plugin as well and it is fairly easy to use this technique with any other VCS.
Minimizing the kernel build time is important. Originally I spent nearly 2 hours fiddling with the kernel configuration until I removed most of the things that were completely irrelevant to the issue being debugged. Having a powerful machine to build the software is a must in case the actual test needs to be performed on a slower machine.
Booting linux with PXE involves quite a few moving parts (DHCP, TFTP, boot image, kernel and initrd) that may take a while to configure, but all the time invested in this configuration will eventually pay off.