There's nothing like a challenge.
Along with the rest of Canonical and a whole bunch of excellent community guys, Thom May and I have been at the Ubuntu conference at Mataro since last Sunday. While imbibing large amounts of Coke, Thom decided that boot was probably too slow, with an eye on the promised forty-second desktop. He put bootchart on his laptop, and there the madness started.
Originally, we believed our boot was around 20 seconds without readahead, and felt pretty good about ourselves. But once we moved bootchart into the right place (starting at the top of rc2 is not a useful metric), we had a more realistic view -- around one minute for a boot to the gdm login screen -- and set to work. One thing we found was that gdm slept. A lot. After I took to the gdm source with a very large axe, we no longer had a huge gulf in our boot process where time elapsed with no disk or CPU usage.
We had already replaced large swathes of the massive shell horror that is known as hotplug with grepmap, so it was time to look at other things. cupsys in particular was a huge disk hog, so some fine-tuning from Scott James Remnant soon set that in order. But we had kicked the more obnoxious sleeps out of gdm, and we were still looking at well over fifteen seconds from X starting through to gdm actually prompting you for your login, which sucked. I stopped laughing at Thom when he told me the X server startup really was too slow. So I straced it.
Output of strace -e file (i.e. show all file accesses) when starting X:
5702 xorg-trace-file
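A trace of that size is easier to read once it is tallied per path. The following is a minimal Python sketch, not a tool from this work; it assumes the usual strace line shape of `syscall("path", ...) = result`, optionally prefixed by a PID as with `strace -f`, and the names `LINE_RE` and `tally_paths` are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape: an optional PID, the syscall name, then the path as the
# first quoted argument, e.g.  stat64("/some/path", {...}) = 0
LINE_RE = re.compile(r'^(?:\d+\s+)?(\w+)\("([^"]*)"')


def tally_paths(lines):
    """Count how often each (syscall, path) pair appears in strace output."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            # Lines without a quoted first argument (signals, close(), ...) are skipped.
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Feeding it the trace, for example `tally_paths(open("xorg-trace-file")).most_common(20)`, puts the paths that are touched most often at the top of the list.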
In July, I got very frustrated with the old, crufty, and generally horrid MetroLink loader in XFree86/X.Org, which was also used in Debrix (being a fork of X.Org), and threw it away.
daniels@catsby:~/x/xorg/xc/programs/Xserver/hw/xfree86/loader% wc -l *.[ch] | tail -1
13540 total
daniels@catsby:~/x/debrix/debrix--devel/hw/xorg/loader% wc -l *.[ch] | tail -1
897 total
It seemed like some major surgery was needed on the loader, or at least to beat the more obnoxious parts out of it. For instance, stat()ing the Radeon driver forty-one times. I don't even have a Radeon in this machine; it's i855-based. Turns out that the loader was running a regular expression over every single file in /usr/X11R6/lib/modules, and then stat()ing them, for every module load. It is now no longer doing so:
daniels@catsby:~/public_html% wc -l xorg.trace-1040
1904 xorg.trace-1040
With a lazy 3798 file accesses gone, there was no longer a massive disk I/O hit as X started, but it was still not enormously quick. Further beating of gdm ensued, and we discovered that scaling a 1600x1200 pixmap down and then overlaying a transparent PNG really, really hurt. Enabling autologin let us fly through the process, but this was far too much of a security risk to consider, so we went back to measuring gdm proper.
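The loader problem described earlier (rescanning the module directory and stat()ing every file on each module load) is a classic case for scanning once and answering later lookups from memory. The Python sketch below only illustrates that general idea; it is not the loader change itself, and `ModuleIndex`, its `scans` counter, and the lookup by file name without extension are all assumptions made for the example.

```python
import os


class ModuleIndex:
    """Hypothetical module lookup that walks the directory tree only once."""

    def __init__(self, root):
        self.root = root
        self._index = None  # built lazily on the first lookup
        self.scans = 0      # how many times the tree has been walked

    def _build(self):
        index = {}
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                # Key on the file name without its extension.
                stem = os.path.splitext(name)[0]
                # Keep the first match if the same stem shows up twice.
                index.setdefault(stem, os.path.join(dirpath, name))
        self.scans += 1
        return index

    def find(self, module):
        """Return the path for `module`, or None if no such file exists."""
        if self._index is None:
            self._index = self._build()
        return self._index.get(module)
```

However many modules are requested, the directory walk happens once, so each further load costs a dictionary lookup rather than a fresh pass over the disk.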
While we were at it, in an inspired move, Scott James Remnant set to work on cupsd, which really was a horrific disk hog, and soon set that in order.
Hotplug was still doing a lot of work (but no disk I/O) for a very long time, so we decided to parallelise hotplug and readahead, so we could have pure CPU grunt work interleaved with smashing the disk. This seems to have worked very effectively, and has shaved quite some time off our accesses. Starting some parts of rc2 in parallel seems to have worked very well, also; gdm starts at 14 (before most services that you don't need for a desktop).
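The idea of overlapping CPU-bound work with disk reads can be sketched in a few lines of Python. This is only an illustration under assumptions: the real work happens in the init scripts, `readahead` here is a hypothetical stand-in that simply reads files so they land in the page cache, and `run_interleaved` and its two job arguments are invented names.

```python
from concurrent.futures import ThreadPoolExecutor


def readahead(paths, chunk_size=1 << 20):
    """Read every file once so the kernel keeps its contents in the page cache.

    Hypothetical stand-in for a readahead tool; returns the number of bytes read.
    """
    total = 0
    for path in paths:
        try:
            with open(path, "rb") as f:
                # Read in chunks and discard the data; only the caching matters.
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
        except OSError:
            # A missing or unreadable file should not stall the rest of the list.
            continue
    return total


def run_interleaved(io_job, cpu_job):
    """Run a disk-bound job and a CPU-bound job at the same time."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        io_future = pool.submit(io_job)
        cpu_future = pool.submit(cpu_job)
        # Wait for both; the slower of the two decides the total time.
        return io_future.result(), cpu_future.result()
```

Run one after the other, the two jobs would cost the sum of their times; overlapped, the total approaches whichever of them is longer, which is the effect described above.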
At present, the biggest bottleneck we have is hotplug; Scott is working on replacing the hotplug init script with moving through udevd, and replacing many of our init scripts with hotplug triggers, so we can more effectively parallelise most of our boot process. Thom is, at present, working on moving all the readahead files somewhere where they can be streamed quickly; at the moment, we are getting flayed by seek()s. Beyond this, gdm looks like it needs some serious work, but we believe we are at a very strong position, especially after beating more file accesses out of X (this is our current bootchart, at time of writing).
These tests were done (by both myself and Thom) on IBM ThinkPad X40s, with Pentium M 1.2GHz CPUs and low-speed hard drives. One of the larger blockers is i855, which takes forever to initialise through VBE: 'profiling' the X server by throwing in time information with all the logs showed us nothing was usefully slow (in terms of low-hanging fruit), but video initialisation still takes a good four or so seconds. It'll be interesting to measure the results on standard desktop disks (not 15kRPM SCSI or such) and a chipset we have the full video BIOS information for. However, everything we have done thus far is totally applicable to other systems -- hotplug still runs in full, so you can dump the exact same stuff we're using on a totally different system (even PowerPC, if you like), and it will still work perfectly fine, with no modification.
Current statistics: booting through a full, typical startup to GDM login screen, including hotplug -- 42 seconds (warty: 90 or more); file accesses when starting the X server -- 1093 (warty: at least 5700); files read in by readahead: 1000 (total size: 58M).
(Update: X is now down to 538 opens.)