Monday, March 07, 2011

Don't Go Anywhere!

I'm debugging X-Plane's autogen engine. In debug mode, with no inlining, no optimizations, and a pile of safety checks, the autogen engine is not very fast. Fortunately, my main development machine has 8 cores, and the autogen engine is completely thread-crazy. The work gets spooled out to a worker pool and goes...well, about 8 times as fast.

All is good and I'm sipping my coffee when I hit a break-point. Hrm...looks like we have a NaN. Well, we divided by a sum of some elements of a vector. What's in the vector?
print ag_block.spellings_s[0].widths[1]
Ah...8 tiles. At this point I am already dead. If you've debugged threaded apps you already know what went wrong:
  • The array access operator in vector is really a function call (particularly in debug mode - we jam bounds checks in there).
  • GDB has to let the application 'run' to execute the array operator, and while it runs, the other threads run too, so the sim can switch threads.
  • The new thread will run until it hits some kind of break-point.
  • If you have 8 threads running the same operation, you will hit the break-point you expect...but from the wrong thread.
To say this makes debugging a bit confusing is an understatement.
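
One way to confirm that this is what happened is to ask gdb which thread is current before and after the print. This is only a rough sketch of such a session: the thread number 3 is made up for illustration, and the expression is the one from above.

```
# list all threads; gdb marks the current one with an asterisk
info threads
# evaluating this calls into the program, so other threads get to run
print ag_block.spellings_s[0].widths[1]
# check again: the asterisk may now sit on a different thread
info threads
# switch back to the thread you were inspecting (3 is a placeholder)
thread 3
```

If the asterisk moved between the two listings, the second break-point was hit by one of the other workers, not by the thread you started on.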

A brute-force solution is to turn off threading - in X-Plane you can simply tell the sim that your machine has one core using the command line. But that means slow load times.

Fortunately gdb has these clever commands:
set scheduler-locking on
set scheduler-locking off
When you set scheduler locking on, the thread scheduler can't jump threads: only the current thread runs when gdb resumes the program. This is handy before an extended inspection session with STL classes. You can apparently put the scheduler into 'step' mode, which allows thread switches when you continue but not when you single-step, but I haven't needed that yet.
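
For what it's worth, here is roughly how an inspection session might go with locking in place. Treat it as a sketch, not a recipe: the print expression is the one from earlier, and the rest are standard gdb commands.

```
# see which mode gdb is currently in
show scheduler-locking
# freeze every thread except the current one
set scheduler-locking on
# function calls made by print now run only on the current thread
print ag_block.spellings_s[0].widths[1]
# let the other threads run again before resuming the program
set scheduler-locking off
continue
# alternative mode: lock only while single-stepping
# set scheduler-locking step
```

Turning locking back off before continuing matters: with it left on, only the current thread is resumed, and it will stall if it ends up waiting on work owned by one of the frozen threads.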
