[NANOG] Microsoft.com PMTUD black hole?
Hello,

Has anyone else here seen problems with microsoft/msn/hotmail/live.com sites not performing PMTUD correctly? We have, for a while now, had people on our network complain of poor microsoft.com reachability, and discovered we can work around the issue by changing MSS on all TCP SYN as they go out of our network.

I recently watched the whole conversation between msn.com and a host on our network (with the MSS rewrite disabled), and if I'm reading it right, we are following PMTUD protocol correctly by sending back ICMP type 3 code 4, but all Microsoft hosts seem to ignore this and continue to send packets back to our host with an MSS that is too large.

I hope I'm wrong and that it is we who are doing something stupid, but after cruising Google for a while, I found a multitude of other complaints from people connected to other ISPs specifically about not being able to reach Microsoft web sites. It seems crazy that MS could have PMTUD broken for so long with nobody ever raising a complaint to them directly, though, which makes me wonder if there is another answer here that I'm missing.

I sent the following message to a couple of addresses that I gleaned from ARIN WHOIS for the IP block in question and threw hostmaster in there just in case it went somewhere, but noc@microsoft.com appears to be defunct. I have yet to receive acknowledgment of receipt from the other address.

Are there any microsoft.com admins that hang out here that can comment on this or get in touch with me, or is there perhaps someone on here with connections to the Microsoft NOC?

(BTW, I stripped the referenced libpcap attachment off of this message to the list just so that I wouldn't accidentally incur the wrath of NANOG...if y'all want to see it, I'm happy to post it.)

Thanks,

--
Nathan Anderson
First Step Internet, LLC
nathana@fsr.com

-------- Original Message --------
Subject: Microsoft/MSN/Live!/Hotmail behind blackhole router?
Date: Thu, 01 May 2008 19:00:46 -0700
From: Nathan Anderson/FSR <nathana@fsr.com>
To: hostmoaster@microsoft.com, noc@microsoft.com, iprrms@microsoft.com

To microsoft.com NOC admins:

I work for a regional ISP in the inland Pacific Northwest. Many of our customers' connections have MTUs of less than 1500, and we get routine complaints from them that they have trouble reaching web sites that are under your administration. Usually we can fix the problem by "mangling" the MSS in the TCP SYNs originating from our customers and headed to the world to reflect a lower value; however, we would rather not have to do that.

The fact that we are REQUIRED to do this in order for your sites to be reachable by our customers strongly suggests that either the servers that respond to HTTP requests sent to www.microsoft/msn/hotmail/live.com are behind routers that are blocking ALL ICMP traffic sent their way -- even ICMP type 3 code 4 (packet too large, DF set), which is necessary in order for Path MTU Discovery to work -- or the servers themselves are not listening to the ICMP messages that we are sending their way when our routers are forced to drop a packet sent by you which is too large to be forwarded to a customer of ours.

I set up a test connection "on the bench" so to speak, and had our router capture a copy of the conversation between our test client and www.msnbc.msn.com and forward that conversation encapsulated in TZSP to the same test client over a different interface. The capture clearly shows our test client establishing the TCP connection with MSNBC (SYN/SYN+ACK/ACK), and then goes on to show MSNBC send Ethernet MTU-sized packets our way that an intermediate router of ours drops and responds to with "packet too big, DF set." Despite this, MSNBC continues to retransmit the original packet with the same payload and the same size back to us. We continue to respond "packet too big, DF set," but the MSNBC server never seems to get the message (literally).
We see the same behavior with all sites across the board contained within the 207.46.0.0/16 space, regardless of actual hostname/FQDN.

We also find this ironic considering that Microsoft published a Technet article a few years back on black hole routers and the problems they pose, found at http://technet.microsoft.com/en-us/library/bb878081.aspx (which we can't read/access unless we are mangling the MSS).

We would appreciate it if Microsoft NOC admins would please look into the matter and take the appropriate corrective action: allowing ICMP type 3 code 4 messages through your routers/firewalls, and making sure that your servers respond to them appropriately as defined in RFC 1191.

I have attached the capture we made of the conversation to this e-mail message in libpcap format for your analysis. The test client itself had a 1500 MTU to a desktop router, which in turn had an MTU of 1492 on its uplink to us. I am available to answer any additional clarifying questions you may have.

Thank you for your time and attention to this matter.

Regards,

--
Nathan Anderson
First Step Internet, LLC
nathana@fsr.com
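
For readers following the thread, the arithmetic behind the MSS workaround and the ICMP message discussed above can be sketched roughly as follows. This is an illustrative Python sketch, not the poster's router configuration: the function names are hypothetical, IPv4 and TCP headers are assumed to carry no options (20 bytes each), and the ICMP checksum is deliberately left at zero. The 8-byte header layout with the next-hop MTU in the last two bytes follows RFC 1191.

```python
import struct

IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def clamp_mss(advertised_mss: int, path_mtu: int) -> int:
    """MSS a router would write into a passing SYN (hypothetical helper)."""
    return min(advertised_mss, mss_for_mtu(path_mtu))


def build_frag_needed(next_hop_mtu: int, original: bytes = b"") -> bytes:
    """Build an ICMP type 3 code 4 message; checksum is left at zero here."""
    header = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    return header + original


def parse_frag_needed(icmp: bytes):
    """Return the next-hop MTU, or None if this is not type 3 code 4."""
    if len(icmp) < 8:
        return None
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if (icmp_type, code) != (3, 4):
        return None
    return mtu


if __name__ == "__main__":
    # The test setup in the message: 1500 on the LAN, 1492 on the uplink.
    print(mss_for_mtu(1500), mss_for_mtu(1492))
    print(clamp_mss(mss_for_mtu(1500), 1492))
    print(parse_frag_needed(build_frag_needed(1492)))
```

With the 1492-byte uplink from the message, a client advertising an MSS of 1460 would be clamped to 1452. A sender that honours the ICMP message should reach the same segment size on its own; the clamp only hides the case where the message never arrives or is ignored.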
I thought I'd post a few constructive comments on this thread. (Full disclosure: I am an ex-Microsoft employee. I do not speak for the company, I'm just trying to help out the network community.)

1) Yes, Microsoft blocks ICMP for the most part, which will break Path MTU Discovery. This is a known issue. If you run into it, it's most likely because the servers you are trying to talk to in MS-land don't have black hole router detection turned on.

2) Instead of trying to get all the various ACLs and firewalls in Microsoft fixed to allow PMTUD, you are more likely to experience joy if you can contact the server owners. Ask if they have black hole router detection turned on, and if not, if they can do so.

3) So how do you get in contact with the server owners or MSN's networking people? msnalert@microsoft.com is your best bet. That's the email address monitored by the basic Tier 1 "Service Operations Center". They cut tickets, follow scripts, and do very basic front line work. They probably won't be able to fix the problem for you, but they CAN get you in touch with the right people.

4) FINDING the right people can be a challenge, even internally. Microsoft is a very big company, and it's far from centralized. Be specific in what URLs and IPs you are having trouble with, and be prepared to bounce around a bit. The people who run microsoft.com's servers aren't the same group that does hotmail, etc. Have patience, and try to get ticket numbers for tracking as much as possible.

5) Try to give a realistic estimate of how many users are being impacted by the problem. Your problem will be triaged as it moves through various groups, and yes, the response time may not be what you want. Your problem is one fire among many, and there aren't enough firefighters.

6) Be nice. Seriously. People love to hate Microsoft, and sometimes take it out on the poor overworked geeks who are trying to actually make things better. Every vulnerability, BSOD, or Vista delay is not the fault of the network or systems engineer you get in touch with. ;-)
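
To make the "black hole router detection" idea in points 1 and 2 concrete, here is a simplified Python model of a sender, offered as a sketch rather than a description of Microsoft's actual implementation. The function name, the retry threshold of 2, and the attempt cap are assumptions made for illustration; the 536-byte fallback is the conventional default TCP MSS (a 576-byte datagram minus 40 bytes of headers).

```python
FALLBACK_MSS = 536  # conventional default MSS: 576-byte datagram - 40 header bytes


def simulate_sender(initial_mss, path_mss, icmp_delivered, detect_blackhole,
                    retry_limit=2, max_attempts=10):
    """Return (mss_that_got_through, attempts), or (None, max_attempts).

    path_mss         -- largest segment the path can carry
    icmp_delivered   -- do "fragmentation needed" messages reach the sender?
    detect_blackhole -- fall back to a small MSS after repeated timeouts?
    retry_limit and max_attempts are illustrative assumptions.
    """
    mss = initial_mss
    timeouts = 0
    for attempt in range(1, max_attempts + 1):
        if mss <= path_mss:
            return mss, attempt          # segment fits, delivery succeeds
        if icmp_delivered:
            mss = path_mss               # normal PMTUD: adopt the reported size
        else:
            timeouts += 1                # silence: the sender just times out
            if detect_blackhole and timeouts >= retry_limit:
                mss = min(mss, FALLBACK_MSS)
    return None, max_attempts


if __name__ == "__main__":
    print(simulate_sender(1460, 1452, icmp_delivered=True, detect_blackhole=False))
    print(simulate_sender(1460, 1452, icmp_delivered=False, detect_blackhole=False))
    print(simulate_sender(1460, 1452, icmp_delivered=False, detect_blackhole=True))
```

In this model the second case, ICMP filtered and no detection, never gets a segment through, which matches the retransmissions seen in the capture described earlier in the thread. The third case recovers, but only after timeouts and at a much smaller segment size than the path could actually carry.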
* netgeek@bgp4.net (Janet Sullivan) [Thu 08 May 2008, 23:35 CEST]:
1) Yes, Microsoft blocks ICMP for the most part, which will break Path MTU Discovery. This is a known issue. If you run into it, it's most likely because the servers you are trying to talk to in MS-land don't have black hole router detection turned on.
I find it hilarious that one part of the company had to come up with a hack to work around the inability of another part of the company to understand how TCP/IP works

-- Niels.
participants (3)
- Janet Sullivan
- Nathan Anderson/FSR
- Niels Bakker