On Jul 24, 2007, at 5:34 PM, Iljitsch van Beijnum wrote:
> On 24-jul-2007, at 15:27, Prof. Robert Mathews (OSIA) wrote:
>
>> Looking at this issue with an 'interoperability lens,' I remain puzzled by a personal observation that at least in the publicized case of Duke University's Wi-Fi net being affected, the "ARP storms" did not negatively impact network operations UNTIL the presence of iPhones on campus. The nagging point in my mind therefore, is: why have other Wi-Fi devices (laptops, HPCs/PDAs, Smartphones etc.) NOT caused the 'type' of ARP flooding, which was made visible in Duke's Wi-Fi environment?
>
> Reading the Cisco document the conclusion seems obvious: the iPhone implements RFC 4436 unicast ARP packets which cause the problem.
>
> I don't have an iPhone on hand to test this and make sure, though.
>
> The difference between an iPhone and other devices (running Mac OS X?) that do the same thing would be that an iPhone is online while the user moves around, while laptops are generally put to sleep prior to moving around.
There is also the weird property of many types of "flood vulnerable" systems that they seem to remain stable until some sort of threshold is reached, and then suddenly spiral out of control. I am not sure of the exact mechanism behind this, but I have seen multiple instances of it happening.

The standard scenario is basically: you have a couple of switches with STP turned off -- someone plugs in some random cable, forming a bridge loop... and everything continues running fine, until some time in the future when it all goes to hell in a hand-basket. Now, I could understand the system remaining stable until the first broadcast / unknown MAC caused flooding to happen, but I have seen such a system remain stable for anywhere from a few days to a few weeks before suddenly exploding.

I have seen the same thing happen in systems other than switches, for example RIP networks with split-horizon turned off, weird frame-relay networks, etc. Unfortunately I have never managed to recreate the event in a controlled environment (in the few cases where I have cared enough to try, I form a loop and everything goes BOOM immediately!), and in the wild I have always just fixed it and run away (it's usually someone else's network and I'm just helping out or visiting or something). I HATE switched networks.....

A few observations:

In *almost* all of the cases, things *do* go boom immediately!

In the instances where they don't, there doesn't seem to be a correlation between load and when it does suddenly spiral out of control [0].

There is not a gradual increase in the sorts of packets that you would expect to see cause this (in a switched environment, you do not see flooded packets slowly increase, or even an exponential increase over a long time; there is basically no traffic and then boom! 100%).

Anyway, I have wondered what triggers it, but never enough to actually look into it much....
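
The "stable only until the first broadcast / unknown MAC gets flooded" expectation can be sketched with a toy model. This is a minimal illustration, not a description of any real switch: the function name `flood_counts`, the port numbering, and the two-switch topologies are all made up for the example, and it assumes ideal loss-free links, no STP, and switches that flood a broadcast out of every port except the one it arrived on. It counts how many copies of a single broadcast frame are on inter-switch links at each step.

```python
def flood_counts(links, num_ports, inject, steps):
    """Return the number of frame copies on inter-switch links per step.

    links     -- list of ((switch, port), (switch, port)) cables (hypothetical topology)
    num_ports -- ports per switch, numbered 0..num_ports-1; unlinked ports are host ports
    inject    -- (switch, port) where the single broadcast frame first arrives
    steps     -- number of forwarding rounds to simulate
    """
    # Build a symmetric lookup: which port is at the other end of each cable.
    peer = {}
    for a, b in links:
        peer[a] = b
        peer[b] = a

    arrivals = [inject]  # each entry is (switch, ingress_port)
    counts = []
    for _ in range(steps):
        next_arrivals = []
        for switch, ingress in arrivals:
            # Flood out of every port except the ingress port.
            for port in range(num_ports):
                if port == ingress:
                    continue
                out = (switch, port)
                if out in peer:
                    # Copy crosses the cable and arrives at the far end.
                    next_arrivals.append(peer[out])
                # Copies sent to host ports leave the simulation.
        counts.append(len(next_arrivals))
        arrivals = next_arrivals
    return counts


# Assumed example topologies: two switches "A" and "B" with 4 ports each.
no_loop = [(("A", 1), ("B", 1))]
one_loop = no_loop + [(("A", 2), ("B", 2))]
two_loops = one_loop + [(("A", 3), ("B", 3))]

for name, topo in [("no loop", no_loop), ("one loop", one_loop), ("two loops", two_loops)]:
    print(name, flood_counts(topo, 4, ("A", 0), 4))
```

With no loop the frame dies out after one hop; with one redundant cable two copies circulate indefinitely; with two redundant cables the number of copies doubles every step. In this idealized model the storm starts with the very first flooded frame, which matches the "goes BOOM immediately" case and does nothing to explain the delayed one.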

W

[0] Except for one case that I remember especially fondly -- it was a switched network with something like 30 switches scattered around -- someone had plugged one of those "silver satin" phone-type cables (untwisted copper) between two ports on a switch -- the cable was bad enough that most of the frames were dropped / corrupted, but under high broadcast traffic loads enough packets would make it through to cause a flood, and then after some time (5-10 minutes) it would die back down...

--
Never criticize a man till you've walked a mile in his shoes. Then if he didn't like what you've said, he's a mile away and barefoot.