re: AGIS Route Flaps Interrupting its Peering?
Here's some background:
AGIS's router is not colocated at the MAE parking garage, but is in fact colocated at WorldCom in downtown Washington DC. Our bits get from there to the MAE via a DS3, and that DS3 is terminated at each end with a device called a NetEdge, which does the FDDI to DS3 ATM conversion.
These NetEdges seem to have three different possible operating states: completely working (which doesn't happen often enough); broken (often, right out of the box); and kind of working (which happens all too often). This third operating state results in some very interesting, possibly misleading, and sometimes damaging behavior. It looks quite similar to the kind of behavior you get when you change the MAC layer device but keep the same IP address at either of the MAEs: ARP caches get inconsistent, and BGP sessions with other routers flop around, leading to routes getting flap dampened by those running the appropriate code.
Here's what happened:
AGIS's connection to MAE-East experienced one of these kind-of-working problems, which resulted in the erratic behavior above. Digex customers wishing to reach AGIS customers called the Digex NOC, and the posting which started this all was made to the Digex internal news group. Similarly, AGIS customers had problems, and we worked with MFS to get the problem resolved (they must have a warehouse full of swapped-out NetEdges at this point).
In the interval, a short-on-facts bozo spit into the wind and got us and Digex wet. I'm in private correspondence with Ed Kern to postmortem the situation.
Peter
At 10:25 AM 7/5/96 -0400, Ed Kern wrote:
One key point is that we have not received any complaints or reports of any sort concerning any perceived issues at mae-east from any mae-east peers. Digex made no attempt to contact us. We were already working with Advantis on the unreachable issue above, but the first we heard of the "AGIS attacks mae-east" report was when a Digex customer sent us a report similar to that forwarded to all of you by Cook.
I went into this in the last message... Digex will try to be more proactive about pointing out AGIS's flapping prefixes in the future.
An appropriate audience would have been the AGIS NOC and the Digex NOC. I think the Cook approach was inappropriate because the issue was purely between Digex and AGIS until Cook distributed it to the three widespread mailing lists.
I agree.
How is the report flawed?
I see that Ed Kern has already replied indicating that the report was indeed flawed. I don't think that there is anything to be gained by going into further detail.
What I was referring to was the internal circulation here, which I was under the impression got to external customers... now I'm not so sure.
The internal report was flawed because it relied too much on source routes and came to some bad conclusions about the internal state of AGIS.
My key point is that nothing of interest happened. This was a non-issue until the misinformation was blasted around the Internet technical universe.
I would argue against calling the external message that got sent around misinformation... It was correct information from what people could see at the time it was released (lots of dampened prefixes and a down peer).
Ed
_____________________________________________________________________
Peter Kline Senior Network Engineer| 313-730-5151
AGIS - Internet Backbone Services | _Lucem Diffundo_
Post-Traumatic Success Disorder+ |
/////////////////////////////////////////////////////////////////////
You can pretend to care, but you can't pretend to be there.
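For readers unfamiliar with the flap dampening mentioned in Peter's message, here is a minimal Python sketch of the general idea: every flap adds a penalty to a route, the penalty decays exponentially over time, and the route is suppressed while the penalty stays high. The class name, method names, and all numeric parameters are illustrative assumptions. Nothing in this thread says which implementation or settings the MAE-East peers were actually running.

```python
class DampenedRoute:
    """Hypothetical penalty/decay model of route flap dampening.

    All default values are assumptions chosen for illustration only.
    """

    def __init__(self, penalty_per_flap=1000.0, suppress_limit=2000.0,
                 reuse_limit=750.0, half_life_s=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress_limit = suppress_limit  # suppress once penalty exceeds this
        self.reuse_limit = reuse_limit        # un-suppress once penalty drops below this
        self.half_life_s = half_life_s        # penalty halves every half_life_s seconds
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        """Apply exponential decay for the time elapsed since the last update."""
        elapsed = now - self.last_update
        if elapsed > 0:
            self.penalty *= 0.5 ** (elapsed / self.half_life_s)
            self.last_update = now

    def flap(self, now):
        """Record one flap (for example, a withdrawal followed by a re-announcement)."""
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty > self.suppress_limit:
            self.suppressed = True

    def is_suppressed(self, now):
        """Return True while the route is still being dampened at time `now`."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return self.suppressed
```

With these assumed values, three flaps one minute apart push the penalty over the suppress limit, and the route stays dampened for roughly half an hour after the last flap even if the underlying link has been stable the whole time. That lag is why a peer looking only at dampened prefixes could draw the wrong conclusion about the current state of a neighbor.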
Peter et al:
We too have had nothing but trouble with the NetEdge boxes (to MAE-East and MAE-West). They are particularly insidious when they are "kind of working". A couple of years ago, when traffic loads were lower, they seemed to perform well. Does anyone know if MFS has plans to address this problem?
-- Becca
-----------------------------------------------------------------------
Rebecca L. Nitzan Lawrence Berkeley National Lab
Network Engineering Services Group 1 Cyclotron Rd, 50A/3101 MS 50C
ESnet - Energy Sciences Network Berkeley, CA. 94720
phone: 510-486-6468 fax: 510-486-4300 nitzan@es.net
-----------------------------------------------------------------------
Here's what happened:
AGIS's connection to MAE-East experienced one of these kind-of-working problems which resulted in the erratic behavior above. Digex customers wishing to reach AGIS customers called the Digex NOC, and the posting which started this all was made to the Digex internal news group. Similarly, AGIS customers had problems, and we worked with MFS to get the problem resolved (they must have a warehouse full of swapped-out NetEdges at this point).
On the other hand, one could assume with equal probability that there really is only *one* spare and it keeps being put back into service. :-)
Erik
participants (3)
- Erik Sherk
- Peter Kline, Sr. Network Engineer
- Rebecca L. Nitzan