Re: Problems with AT&T

20 Mar 2003

      On Thu, Mar 20, 2003 at 03:26:35PM -0500, bdragon@gweep.net wrote:
...
...
If someone can identify what you are actually seeing, I'll check into
it.
If you are experiencing drops or slow traces only through the core,
there is an issue with excessive de-prioritization of ICMP control
messages with a particular router type (vendor) in the core. End-to-end
data flow has not seemed to be affected, but trace and ping core
latencies are looking very weird. I've been asking customers to use
trace only for path detail and to use end-to-end ping for any
performance data.
Yes, the core is MPLS enabled. Diffserv is acted on only at the edges,
though.
Michelle
It could certainly be customers who have broken themselves. I've heard
lots of stories about people who do PMTUD but simultaneously filter
ICMP Can't Frag messages.
As soon as the Path MTU drops below whatever their local box uses (usually
1500) they "break", though that is due to their own screwed-up config.
Since MPLS adds additional overhead, dropping the MTU, I'd seriously
consider this as a possible reason.
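
To make that failure mode concrete, here is a minimal Python sketch of a
sender doing PMTUD across a path where one hop has a smaller MTU. It is a
toy model, not any real stack's behaviour: the function name
send_with_pmtud, the hop list, and the 1492 figure (1500 minus two 4-byte
MPLS labels) are assumptions made up for illustration.

```python
from typing import List, Optional

MPLS_LABEL_BYTES = 4  # one MPLS label stack entry is 4 bytes


def send_with_pmtud(packet_size: int, hop_mtus: List[int],
                    icmp_filtered: bool, max_tries: int = 10) -> Optional[int]:
    """Return the packet size that gets through, or None if the flow black-holes.

    Hypothetical model: the sender sets DF, so an oversized packet is dropped
    and the dropping hop reports its MTU in an ICMP Can't Frag message.
    """
    size = packet_size
    for _ in range(max_tries):
        # First hop on the path whose MTU is smaller than the packet.
        bottleneck = next((m for m in hop_mtus if m < size), None)
        if bottleneck is None:
            return size  # the packet fits every hop
        if icmp_filtered:
            # The Can't Frag message never reaches the sender, so it keeps
            # retransmitting at the same size and the flow stalls.
            return None
        size = bottleneck  # sender lowers its path MTU estimate and retries
    return None


# Assumed path: Ethernet edges with a two-label MPLS hop in the middle.
path = [1500, 1500 - 2 * MPLS_LABEL_BYTES, 1500]
working = send_with_pmtud(1500, path, icmp_filtered=False)
broken = send_with_pmtud(1500, path, icmp_filtered=True)
```

In this model the unfiltered sender settles on 1492, while the sender that
filters ICMP never gets a full-size packet through, even though small
packets (and therefore pings) still work.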
Speaking very generally and not about any one specific network, this
is likely to not be the issue.  MPLS leads to problems on Ethernet,
but I've seen no problems in anything other than Eth/FE.  GigE and POS
haven't had the same issue; for one, default POS MTU is ~4k, which is
more than enough to hold packets from hosts that assume 576 or 1500,
and PMTU over an MPLS network takes the MPLS label stack size into
account when doing discovery.  

Also, some implementations have framers that can accept a packet
that's actually MTU+(N*4), where N is typically no more than 4, and
more likely 2.
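
The arithmetic behind the last two paragraphs, as a rough Python sketch.
The function names are hypothetical, and 4470 is only an assumed stand-in
for the "~4k" POS default; the 4 bytes per label is the size of one MPLS
label stack entry.

```python
MPLS_LABEL_BYTES = 4  # one MPLS label stack entry is 4 bytes


def ip_mtu_under_labels(link_mtu: int, labels: int) -> int:
    """Largest IP packet that fits on a link once `labels` labels are pushed."""
    return link_mtu - labels * MPLS_LABEL_BYTES


def framer_accepts(ip_packet_size: int, mtu: int, labels: int,
                   extra_labels_tolerated: int = 2) -> bool:
    """True if a framer that tolerates MTU + (N * 4) bytes takes the packet.

    `extra_labels_tolerated` is the N from the text (assumed 2 by default).
    """
    on_wire = ip_packet_size + labels * MPLS_LABEL_BYTES
    return on_wire <= mtu + extra_labels_tolerated * MPLS_LABEL_BYTES


# A full 1500-byte packet with two labels on a strict 1500-byte link
# would need the IP MTU lowered to this value:
strict_ethernet = ip_mtu_under_labels(1500, 2)

# The same packet on an assumed 4470-byte POS link has plenty of room:
pos_headroom = ip_mtu_under_labels(4470, 2)
```

With a framer that tolerates N = 2, a 1500-byte packet carrying two labels
is still accepted on a 1500-byte link; a third label would push it over.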

And I think I can say without breaking any confidentiality agreements
that AT&T's backbone Probably Isn't (nudge nudge wink wink) made up of
scads and scads of 10/100Mb links everywhere. :)

The biggest problem you can have with MPLS is if you have customers
who are connected at 4k or 9k or what have you, and who don't do
PMTUD; I've not seen this come up as a real operational issue.  

.02

eric
...
The major problems are:
1) identifying broken customers
2) convincing customers that they are broken when they "haven't changed
anything"
3) getting them to actually change
Some folks just put off the problem until later by moving to MTUs > 1500.
The only benefit to this is that hopefully when the customer next breaks
it is as a direct result of them having "changed something" which gets
you over the hurdle of convincing some person that their filtering of all
ICMP isn't just stupid, but is also broken.

Eric Osborne