<unlurks>

I have to jump in on this thread.

Traffic light controllers are a fun category of technical artifacts. The weatherproof boxes that the relays used to live in have stayed the same size for decades, but now the controllers just take a teeny tiny circuit board rattling around in this comparatively huge box. And it's full of software, dontcha know? So why not have lots of newfangled features? Curiously, the people who make the insides of the box have a WHOLE DIFFERENT way of thinking about "what a traffic light controller should do?" - the "insider" people are in the 21st century, while the "outsider" people are in the early 20th century. Lemme splain.

A particular traffic light controller that I tested in 2007 had an FTP server inside it. I have no idea why. So I tried fuzzing it. 5 minutes into the test, the test aborted because the DuT wouldn't restart anymore. Upon investigation, we discovered that a particular FTP sequence had triggered a bug that had a rather unfortunate (side-)effect: The flash file system of the traffic light controller was formatted or erased. As a bonus, the device also had crashed and it was awaiting a ZMODEM file download since it didn't have a boot image any more. We couldn't test anything else because we didn't have the special serial cable to (re-)install the OS.

Fail-safe? Not hardly: Not when it has no software! It's a lump of highly refined sand, in a plastic case.

There are many lessons here, not least of which is: Ship the device with the smallest possible attack surface! Why the heck was FTP enabled? Clearly this device had never been subjected to any negative testing. And these devices are meant to be networked, so that FTP bug will be tickled someday, I just don't know when. Yes, it was reported to the vendor, and no, I have no idea if they ever fixed it.

Also, in this thread I have seen several references to "fail-safe" or "redundancy" features.
In my experience, those are often some of the weakest aspects of some systems. In one case, my testing rendered a multi-million-dollar highly redundant VoIP soft switch useless by constantly causing the primary to fail - and while the secondary was being activated, there was a quiet period of 2-3 seconds during which no calls went through. Shortly after the secondary had become the primary, it failed again, continuing the cycle. Literally, traffic amounting to one packet (about 100 bytes, IIRC) per second of carefully crafted SIP INVITEs could make this switch completely useless. The bug I found involved SIP INVITE messages that could not be filtered…unless you didn't want to accept VoIP phone calls at all, which calls into question your purchase of the multi-million-dollar highly redundant soft switch. That bug was fixed.

Software is tricky stuff. The number of ways it can fail is practically infinite, but there is generally only a small number of ways for it to work correctly.

Networked software is particularly challenging to write because the software engineers don't get to control their inputs. The intervening network can (does) fold, spindle, mutilate, truncate, drop, reorder or duplicate packets, and your code on the receiving end has to try to understand what was intended by the sender. Oh, and the sender might be following an older version of the standard (if one even exists) or simply have included some bugs of their own.

Because the coders are so focused on making their code do what the MRD/PRD required - on a tight schedule! - they have little time to imagine all the possible ways their code might fail. Their error-handling routines are simply never imaginative enough to handle real-world brokenness. It *is* possible to test this stuff, but time pressures in release schedules don't leave a lot of breathing room for developers to take on whole new classes of tasks that are outside their expertise (security testing).
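
To make the "fold, spindle, mutilate" point concrete, here is a minimal sketch of the kind of damage a negative test can inflict on a packet stream before handing it to a receiver. Everything in it is an illustrative assumption: the `mangle` helper, the probabilities, and the toy length-prefixed format parsed by `parse_length_prefixed` are made up for this example and are not taken from any of the devices or tools mentioned above.

```python
import random


def mangle(packets, rng):
    """Return a damaged copy of a list of byte-string packets.

    Hypothetical helper: each packet may be dropped, truncated,
    bit-flipped or duplicated, and the survivors may be reordered.
    """
    out = []
    for pkt in packets:
        roll = rng.random()
        if roll < 0.1:
            continue                                  # drop
        if roll < 0.2 and len(pkt) > 1:
            pkt = pkt[:rng.randrange(1, len(pkt))]    # truncate
        elif roll < 0.3 and pkt:
            buf = bytearray(pkt)
            i = rng.randrange(len(buf))
            buf[i] ^= 1 << rng.randrange(8)           # mutilate: flip one bit
            pkt = bytes(buf)
        out.append(pkt)
        if roll > 0.9:
            out.append(pkt)                           # duplicate
    if rng.random() < 0.5:
        rng.shuffle(out)                              # reorder
    return out


def parse_length_prefixed(pkt):
    """Toy receiver (assumed format): first byte is the payload length.

    Rejects anything malformed with ValueError instead of trusting it.
    """
    if len(pkt) < 1:
        raise ValueError("empty packet")
    if len(pkt) - 1 != pkt[0]:
        raise ValueError("length field does not match payload")
    return pkt[1:]
```

The idea is to run the receiver against many mangled streams and check that it only ever accepts the input or rejects it cleanly; any other outcome (crash, hang, erased flash) is a finding.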
So you end up with a traffic light controller that erases its own flash file system when it receives a slightly strange but completely legal FTP command, or a highly redundant VoIP soft switch that is only good at ping-ponging from primary to secondary CPUs. Don't even get me started on problems I have found in carrier-class routers.

I don't need to name names: All software has bugs (except possibly the code in the main computers on the Space Shuttle). Every engineer I have ever known has tried to write their code well, but automated negative testing has only recently caught up to where the engineers and QA staff can focus on what they do best (write and test code that implements features that someone can buy), and let purpose-built tools do the negative testing for them, so their error-handling routines can be robust, too.

Fixing bugs is generally straightforward. Finding them has always been the challenge.

~tom

</unlurks>

On 23 Nov 2011, at 17:59, Brett Frankenberger wrote:
On Wed, Nov 23, 2011 at 05:45:08PM -0500, Jay Ashworth wrote:
Yeah. But at least that's stuff you have a hope of managing. "Firmware underwent bit rot" is simply not visible -- unless there's, say, signature tracing through the main controller.
I can't speak to traffic light controllers directly, but at least some vital logical controllers do check signatures of their firmware and programming and will fail into a safe configuration if the signatures don't validate.
-- Brett
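
A rough sketch of the kind of check Brett describes, under loud assumptions: it compares a plain SHA-256 digest of the firmware image against a stored reference value, which is a stand-in for a real cryptographic signature check, and the `select_mode` function and the `SAFE_MODE` / `NORMAL_MODE` labels are hypothetical names invented for illustration. It does not describe how any particular vital logic controller works.

```python
import hashlib
import hmac

SAFE_MODE = "safe"      # hypothetical label for the fail-safe configuration
NORMAL_MODE = "normal"  # hypothetical label for normal operation


def select_mode(firmware_image: bytes, expected_sha256_hex: str) -> str:
    """Pick the operating mode from a firmware integrity check.

    Assumption: the reference digest is stored somewhere the firmware
    itself cannot overwrite. Any mismatch falls into the safe mode.
    """
    actual = hashlib.sha256(firmware_image).hexdigest()
    # compare_digest avoids leaking where the two strings first differ
    if hmac.compare_digest(actual, expected_sha256_hex.lower()):
        return NORMAL_MODE
    return SAFE_MODE
```

A single flipped bit in the image changes the digest, so "firmware underwent bit rot" shows up as a mismatch and the controller drops to the safe configuration instead of running damaged code.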