Re: Who does source address validation? (was Re: what's that smell?)
On Tue, 8 Oct 2002, Greg A. Woods wrote:
[ On Tuesday, October 8, 2002 at 22:34:51 (+0100), Stephen J. Wilcox wrote: ]
Subject: Re: Who does source address validation? (was Re: what's that smell?)
So I guess you may argue for blocking RFC1918 TCP inbound, but with ICMP and UDP you start to break things; perhaps that is why large providers don't do this on backbone links.
Such things REALLY _NEED_ to be broken, and the sooner the better, as then perhaps the offenders will fix such things sooner too, because they are by definition already broken and in violation of RFC 1918 and good common sense.
OK, but real world calling. I have tried this, and when customers find something doesn't work on your network but it does on your competitor's, you make it work, even if that means breaking rules. You've snipped the other comments from my email, which go on to say: take any RFC for a protocol, e.g. POP, SMTP, etc., and look at what's actually being done with it - most commonly, look at how Microsoft have implemented it or what the big ISPs are doing on their servers - and you either toe the line or your service suffers. Steve
On Wednesday, Oct 9, 2002, at 11:36 Canada/Eastern, Stephen J. Wilcox wrote:
On Tue, 8 Oct 2002, Greg A. Woods wrote:
Such things REALLY _NEED_ to be broken, and the sooner the better, as then perhaps the offenders will fix such things sooner too, because they are by definition already broken and in violation of RFC 1918 and good common sense.
OK, but real world calling. I have tried this, and when customers find something doesn't work on your network but it does on your competitor's, you make it work, even if that means breaking rules.
What services require transport of packets with RFC1918 source addresses across the public network? I can think of esoteric examples of things it would be possible to do, but nothing that a real-world user might need (or have occasion to complain about). Do you have experience of such breakage from your own customers? It would be interesting to hear details. Joe
OK, but real world calling. I have tried this, and when customers find something doesn't work on your network but it does on your competitor's, you make it work, even if that means breaking rules.
What services require transport of packets with RFC1918 source addresses across the public network?
I can think of esoteric examples of things it would be possible to do, but nothing that a real-world user might need (or have occasion to complain about).
Do you have experience of such breakage from your own customers? It would be interesting to hear details.
Loss of ICMP packets generated by links with endpoints numbered in RFC1918 space. Holes in traceroutes, broken PMTU detection. DS
On Wed, 9 Oct 2002, Joe Abley wrote:
What services require transport of packets with RFC1918 source addresses across the public network?
I can think of esoteric examples of things it would be possible to do, but nothing that a real-world user might need (or have occasion to complain about).
Do you have experience of such breakage from your own customers? It would be interesting to hear details.
Check the archives; it's been covered every time this issue has come up...
a. Intra-provider links using RFC1918 addresses, and MTU changes/PMTU discovery
b. Traceroute TTL-exceeded packets across RFC1918 intra-provider links
People used to have lots of problems with @Home customers trying to access their websites if they filtered RFC1918 addresses and their servers were connected via large-MTU (i.e. non-Ethernet) links. OK, so @Home is out of business, but I'm sure there are other similar cases which would break.
On Wed, 9 Oct 2002, Joe Abley wrote:
On Wednesday, Oct 9, 2002, at 11:36 Canada/Eastern, Stephen J. Wilcox wrote:
On Tue, 8 Oct 2002, Greg A. Woods wrote:
Such things REALLY _NEED_ to be broken, and the sooner the better, as then perhaps the offenders will fix such things sooner too, because they are by definition already broken and in violation of RFC 1918 and good common sense.
OK, but real world calling. I have tried this, and when customers find something doesn't work on your network but it does on your competitor's, you make it work, even if that means breaking rules.
What services require transport of packets with RFC1918 source addresses across the public network?
None AFAIK, which is why they should be blocked - on ingress from customer links. Don't get me wrong: I'm just sharing experience, not ethics, and I agree we should all adhere to the RFC, but if you apply filters that assume others are also doing so, you may be surprised. Without repeating myself or the list archives: it's all very well strictly following all the RFC guidelines and telling the planet it's Microsoft's or @Home's fault it's not working, but the customers really don't buy it and they will go elsewhere. It mightn't be about corporate $$$s, but those same $$$s pay your wages, and then it starts to hurt!
I can think of esoteric examples of things it would be possible to do, but nothing that a real-world user might need (or have occasion to complain about).
On a related issue (pMTU), I recently discovered that using a link with MTU < 1500 breaks a massive chunk of the net - specifically mail and web servers that block all inbound ICMP. The servers assume 1500, send out packets with DF set, the packets hit the link and generate an ICMP 'fragmentation needed', the ICMP is filtered, and data stops. Culprits included several major ISPs/telcos... I'd love to tell the customer the link is fine and it's the rest of the Internet at fault, but in the end I just forced the DF bit clear as a temporary workaround before finally swapping out to MTU 1500!
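For reference, the DF-clearing workaround Stephen mentions is typically done with policy routing on a Cisco edge box. The sketch below is only an illustration under assumptions: an IOS release that supports the 'set ip df' route-map clause, and hypothetical interface and ACL numbers. It is applied inbound on the 1500-byte-MTU side, so packets forwarded out the smaller-MTU link can be fragmented by the router instead of relying on PMTUD.

    ! Hypothetical sketch: clear the DF bit on TCP traffic so routers can
    ! fragment it across the small-MTU link instead of relying on PMTUD.
    access-list 101 permit tcp any any
    !
    route-map CLEAR-DF permit 10
     match ip address 101
     set ip df 0
    !
    interface FastEthernet0/0
     ! interface where the full-sized (1500-byte) traffic arrives
     ip policy route-map CLEAR-DF

Clearing DF trades PMTUD for in-network fragmentation, so it is only a stopgap, much as Stephen describes.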
Do you have experience of such breakage from your own customers? It would be interesting to hear details.
I did attempt strict ingress filtering at borders after a DoS some time ago; I figured I'd disallow any non-public addresses. I took it off within a day after a number of customers found a whole bunch of things had stopped working... Unfortunately I can't give you an example, as this was a while back and I don't have the details to hand. But if anyone with an appreciably sized customer base wants to try implementing such filters, feel free to forward the customer issues to the list as references! Steve
On Wed, 9 Oct 2002, Stephen J. Wilcox wrote:
On a related issue (pMTU), I recently discovered that using a link with MTU < 1500 breaks a massive chunk of the net - specifically mail and web servers that block all inbound ICMP. The servers assume 1500, send out packets with DF set, the packets hit the link and generate an ICMP 'fragmentation needed', the ICMP is filtered, and data stops. Culprits included several major ISPs/telcos... I'd love to tell the customer the link is fine and it's the rest of the Internet at fault, but in the end I just forced the DF bit clear as a temporary workaround before finally swapping out to MTU 1500!
I'm not going to say what I think of these people in order to avoid another semi-flame fest, but will limit my comments to this: you can also get around this by making the first hop the one with the lowest MTU. This is no fun for Ethernet-connected stuff, but for dial-up this is easy. Then this box will announce a smaller TCP MSS when the connection is established and there aren't any problems.
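A related router-side variant (not exactly what Iljitsch describes, which relies on the host itself seeing a small first-hop MTU) is to rewrite the MSS of TCP connections as they cross the edge router. A minimal sketch, assuming a Cisco box whose IOS supports the 'ip tcp adjust-mss' interface command and a hypothetical 1460-byte link:

    ! Hypothetical sketch: clamp the MSS advertised across this interface so
    ! full-sized segments fit a 1460-byte path (1460 - 40 bytes of IP+TCP
    ! headers = 1420-byte MSS).
    interface Serial0/1
     ip mtu 1460
     ip tcp adjust-mss 1420

Like the first-hop-MTU trick, this only helps TCP; other protocols still depend on working fragmentation or PMTUD.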
On Thu, 10 Oct 2002 00:55:24 +0200, Iljitsch van Beijnum said:
You can also get around this by making the first hop the one with the lowest MTU. This is no fun for ethernet-connected stuff, but for dial-up this is easy. Then this box will announce a smaller TCP MSS when the connection is established and there aren't any problems.
Or equivalently, just nail the MSS for off-site connections down to 512, and accept that you have to send three times as many packets as you probably should. As far as I can tell, when pMTU discovery *does* work - because all parties concerned actually use reasonable addresses and don't filter 'ICMP frag needed' - you end up with one of three results most of the time:
1) You get a clear 1500 end-to-end.
2) You get an MTU of 1460 because of tunneling.
3) You end up ratcheted down to 576 because of some ancient IP stack someplace (older versions of end-user SLIP/PPP are famous for this).
--
Valdis Kletnieks
Computer Systems Senior Engineer
Virginia Tech
On Thu, 10 Oct 2002 Valdis.Kletnieks@vt.edu wrote:
On Thu, 10 Oct 2002 00:55:24 +0200, Iljitsch van Beijnum said:
You can also get around this by making the first hop the one with the lowest MTU. This is no fun for ethernet-connected stuff, but for dial-up this is easy. Then this box will announce a smaller TCP MSS when the connection is established and there aren't any problems.
Or equivalently, just nail the MSS for off-site connections down to 512, and accept that you have to send three times as many packets as you probably should. As far as I can tell, when pMTU discovery *does* work - because all parties concerned actually use reasonable addresses and don't filter 'ICMP frag needed' - you end up with one of three results most of the time:
1) You get a clear 1500 end-to-end.
2) You get an MTU of 1460 because of tunneling.
3) You end up ratcheted down to 576 because of some ancient IP stack someplace (older versions of end-user SLIP/PPP are famous for this).
Ah, but what if the traffic is coming into you, i.e. originating elsewhere? It seems in that case the originator blocks the necessary ICMPs and then fails to send data into you. My example where I saw this recently was inbound SMTP traffic. Steve
On Thursday, 2002-10-10 at 00:55 ZE2, Iljitsch van Beijnum <iljitsch@muada.com> wrote:
You can also get around this by making the first hop the one with the lowest MTU. This is no fun for ethernet-connected stuff, but for dial-up this is easy. Then this box will announce a smaller TCP MSS when the connection is established and there aren't any problems.
Traffic consists of more than TCP; setting your MTU low might get your TCP traffic delivered, but won't help inbound traffic using other protocols. MTU discrepancies must be dealt with in at least one of the following ways if you don't want them to lead to fatally dropped packets:
1. Fragmentation must work. This applies to systems that don't use PMTUD or that use blackhole detection. (Some folks think it a good "security" practice to drop fragments! Some NAT boxes don't know what to do with fragments when they arrive out of order - especially a non-initial fragment before the first.)
2. PMTUD must work.
3. PMTUD blackhole detection must be used together with operable fragmentation. (If you have to fall back to this you're likely to suffer significant performance hits.)
Tony Rall
On Wed, 09 Oct 2002 23:05:59 BST, "Stephen J. Wilcox" said:
On a related issue (pMTU), I recently discovered that using a link with MTU < 1500 breaks a massive chunk of the net - specifically mail and web servers that block all inbound ICMP. The servers assume 1500, send out packets with DF
My personal pet peeve is the opposite - we'll try to use pMTU, some provider along the way sees fit to run it through a tunnel, so the MTU there is 1460 instead of 1500 - and the chuckleheads number the tunnel endpoints out of 1918 space - so the 'ICMP Frag Needed' gets tossed at our border routers, because we do both ingress and egress filtering. It's bad enough when all the interfaces on the offending unit are 1918-space, but it's really annoying when the critter has perfectly good non-1918 addresses it could use as the source... Argh...
--
Valdis Kletnieks
Computer Systems Senior Engineer
Virginia Tech
Valdis.Kletnieks@vt.edu wrote:
My personal pet peeve is the opposite - we'll try to use pMTU, some provider along the way sees fit to run it through a tunnel, so the MTU there is 1460 instead of 1500 - and the chuckleheads number the tunnel endpoints out of 1918 space - so the 'ICMP Frag Needed' gets tossed at our border routers, because we do both ingress and egress filtering.
That's not terribly hard to overcome - allow ICMP unreachables (from any source) in your ACL, then deny all traffic from RFC 1918 addresses, then the rest of the ACL. Combined with CAR (or CatOS QoS rate limiting) on ICMPs, you end up with all the functionality, and almost none of the bogus traffic.
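A minimal sketch of the ACL ordering Steve describes, in Cisco IOS extended-ACL syntax (the ACL number and interface name are hypothetical, and the final permit simply stands in for "the rest of the ACL"):

    ! Allow ICMP unreachables (which include 'fragmentation needed') from any
    ! source, then drop RFC 1918 sources, then fall through to the rest of
    ! the policy.
    access-list 110 permit icmp any any unreachable
    access-list 110 deny   ip 10.0.0.0 0.255.255.255 any
    access-list 110 deny   ip 172.16.0.0 0.15.255.255 any
    access-list 110 deny   ip 192.168.0.0 0.0.255.255 any
    access-list 110 permit ip any any
    !
    interface Serial0/0
     ip access-group 110 in

The ordering matters: the unreachable permit has to come before the RFC 1918 denies, or the 1918-sourced 'frag needed' messages are lost again.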
On Wed, 09 Oct 2002 22:43:50 PDT, Steve Francis said:
That's not terribly hard to overcome - allow icmp unreachables (from any source) in your acl, then deny all traffic from RFC 1918 addresses, then the rest of the ACL.
Combined with CAR (or CatOS QoS rate limiting) on icmp's, you end up with all the functionality, and almost none of the bogus traffic.
Amazingly enough, although there are a number of offenders in the 1918-numbered tunnel category, we decided it was easier to just not worry about talking to those providers' victi^H^H^H^H^Hcustomers(*). We got tired of watching all the DDoS-backscatter ICMP that *also* shows up with 1918 addresses on it. When those show up, it means that some provider didn't filter whoever was forging our address *AND* some provider wasn't filtering the 1918-sourced ICMP. The fact that it's probably two different providers is enough to make you give up trying to do something nice for the net and just go have too many beers instead. ;)
/Valdis
(*) The problem usually tends to be self-correcting - the host that got bit the most was our Listserv machine - and if outbound mail got hosed up for TOO long, it would bounce, the victim would get unsubscribed, and no more problems - at least till they managed to resubscribe. Life got much nicer once I made sure the "You must now confirm your subscription" message was long enough to always trigger a 'frag needed'. ;)
At 10:43 PM 09-10-02 -0700, Steve Francis wrote:
Valdis.Kletnieks@vt.edu wrote:
My personal pet peeve is the opposite - we'll try to use pMTU, some provider along the way sees fit to run it through a tunnel, so the MTU there is 1460 instead of 1500 - and the chuckleheads number the tunnel endpoints out of 1918 space - so the 'ICMP Frag Needed' gets tossed at our border routers, because we do both ingress and egress filtering.
That's not terribly hard to overcome - allow icmp unreachables (from any source) in your acl, then deny all traffic from RFC 1918 addresses, then the rest of the ACL.
Combined with CAR (or CatOS QoS rate limiting) on icmp's, you end up with all the functionality, and almost none of the bogus traffic.
CAR should not be used for rate-limiting; instead, use the MQC police command, which basically does the same thing. CAR is not going to be around much longer and is not being developed anymore. Have a look at:
http://www.cisco.com/warp/public/105/cbpcar.html
http://www.cisco.com/univercd/cc/td/doc/product/software/ios122/122cgcr/fqos...
for more information.
-Hank
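For illustration, a minimal MQC equivalent of the kind of ICMP rate limit discussed above might look something like the sketch below (the class name, ACL number, interface, and 256 kbit/s rate are all hypothetical, and the exact police syntax varies between IOS releases):

    ! Hypothetical sketch: classify ICMP with an ACL, then police it with MQC
    ! rather than CAR.
    access-list 120 permit icmp any any
    !
    class-map match-all ICMP-TRAFFIC
     match access-group 120
    !
    policy-map LIMIT-ICMP
     class ICMP-TRAFFIC
      police 256000 8000 8000 conform-action transmit exceed-action drop
    !
    interface Serial0/0
     service-policy input LIMIT-ICMP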
On Thu, Oct 10, 2002 at 01:06:15AM -0400, Valdis.Kletnieks@vt.edu wrote:
On Wed, 09 Oct 2002 23:05:59 BST, "Stephen J. Wilcox" said:
On a related issue (pMTU), I recently discovered that using a link with MTU < 1500 breaks a massive chunk of the net - specifically mail and web servers that block all inbound ICMP. The servers assume 1500, send out packets with DF
My personal pet peeve is the opposite - we'll try to use pMTU, some provider along the way sees fit to run it through a tunnel, so the MTU there is 1460 instead of 1500 - and the chuckleheads number the tunnel endpoints out of 1918 space - so the 'ICMP Frag Needed' gets tossed at our border routers, because we do both ingress and egress filtering. It's bad enough when all the interfaces on the offending unit are 1918-space, but it's really annoying when the critter has perfectly good non-1918 addresses it could use as the source... Argh...
Ok, I know how this manages to rile people up, but might I suggest that you brought it upon yourself? There is a time and a place for messages sourced from addresses to which you cannot reply, and a time and place where those messages should not exist. Obviously, a DNS *QUERY* is not the place for a message which cannot be returned. But what about an ICMP *RESPONSE*? Nothing depends upon the source address of the IP header for operation; the original headers which caused the problem are encoded in the ICMP message.
And yet people are so busy concerning themselves with this mythical "thing which might break from receiving ICMP overlapping existing internal 1918 space", the extra 0.4% of bandwidth which might be wasted, and the righteous feeling that they have done something useful, that they don't stop to realize *THEY* are the ones breaking PMTU-D. I'm sure we can all agree on at least the concept that sourcing packets from an address which cannot receive a reply is at least potentially useful, for example to avoid DoS against a critical piece of infrastructure. Would it make people feel better if there were a specific separate non-routed address space reserved for "router-generated messages which don't want replies"? Why? Even Windows 2000+ includes blackhole detection which will eventually remove the DF bit if packets aren't getting through and ICMP messages aren't coming back, something many unixes lack.
But the heart of the problem is that people still push packets like every one must include the maximum data the MTU can support. Do we have any idea how much "network suffering" is being caused by that damn 1500 number right now? Aside from the fact that it is one of the worst numbers possible for the data, it throws a major monkey wrench in the use of tunnels, pppoe, etc. Eventually we will realize the way to go is something like "4096 data octets, plus some room for headers", on a 4470 MTU link. But if the best reason we can come up with is ISIS, the IEEE will just keep laughing. </rant>
--
Richard A Steenbergen <ras@e-gerbil.net>  http://www.e-gerbil.net/ras
PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
On Thu, 10 Oct 2002, Richard A Steenbergen wrote:
Even Windows 2000+ includes blackhole detection which will eventually remove the DF bit if packets aren't getting through and ICMP messages aren't coming back, something many unixes lack.
Wow, now I'm impressed. And what about the 1999 other versions of Windows? This is hardly a new problem. Still, it's good that some people at least make progress, even if very slowly.
But the heart of the problem is that people still push packets like every one must include the maximum data the MTU can support.
And why not?
Do we have any idea how much "network suffering" is being caused by that damn 1500 number right now? Aside from the fact that it is one of the worst numbers possible for the data, it throws a major monkey wrench in the use of tunnels, pppoe, etc.
So don't use those.
Eventually we will realize the way to go is something like "4096 data octets, plus some room for headers", on a 4470 MTU link.
So what then if someone runs a secure tunnel over wireless over PPPoE over ADSL using mobile IPv6 that runs over a tunnel or two, ad nauseam, until the headers get bigger than 374 bytes? Then you'll have your problem right back. Might as well really solve it on the first try.
One of the problems is that there is no generally agreed upon and widely available set of rules for this stuff. Setting the DF bit on all packets isn't good, but it works. Using RFC1918 space to number your tunnel routers isn't good, but it works. Filtering to validate source addresses on ingress is good, but hey, it doesn't work!
Making a good list of best practices (and then having people widely implement them) might also go a long way towards showing concerned parties such as the US administration that the network community consists of responsible people who can work together for the common good.
But if the best reason we can come up with is ISIS, the IEEE will just keep laughing.
Why is the IEEE laughing?
On Thu, Oct 10, 2002 at 06:36:33PM +0200, Iljitsch van Beijnum wrote:
So what then if someone runs a secure tunnel over wireless over PPPoE over ADSL using mobile IPv6 that runs over a tunnel or two, ad nauseam, until the headers get bigger than 374 bytes? Then you'll have your problem right back. Might as well really solve it on the first try.
This is a problem that would be solved by everyone being responsible and doing pmtud properly.
One of the problems is that there is no generally agreed upon and widely available set of rules for this stuff. Setting the DF bit on all packets isn't good, but it works. Using RFC1918 space to number your tunnel routers isn't good, but it works. Filtering to validate source addresses on ingress is good, but hey, it doesn't work!
I think we're starting to get at the heart of the problem, but let me stick my neck out and say it: registries (APNIC, ARIN, RIPE, etc.) charge for IP addresses. Be it via a lease or registration fee, it's a per-IP charge that ISPs must recover by some means from their subscribers (unless people don't care about money, that is). Back in the "days", one could obtain IP addresses from InterNIC by saying "I will not connect to the Internet", "I intend to connect at some later date in a year or two (or similar)", or "I intend to connect now".
People number out of 1918 space primarily for a few reasons, be they good or not:
1) Internal use
2) Cost involved.. nobody else needs to telnet to my p2p links but me, and I don't want to pay {regional_rir} for my internal use, to reduce costs
3) "security" of not being a "publicly" accessible network.
This can break many things: PMTU discovery, multicast, and various streaming (multi)media applications. With the past scare of "we'll be out of IP addresses by 199x" still fresh in some people's memories, they in good conscience decided to also conserve IPs via this method.
The problem is that not everyone today who considers themselves a network operator understands all the ramifications of their current practices, be they good or bad.
Going into fantasy-land mode: if IPv6 addresses were instantly used by everyone, people could once again obtain IPs that could be used for internal private use yet remain globally unique, therefore allowing tracking back of who is leaking their own internal sources.
Making a good list of best practices (and then having people widely implement them) might also go a long way towards showing concerned parties such as the US administration that the network community consists of responsible people who can work together for the common good.
I agree here. I personally think that numbering your internal links out of 1918 space is not an acceptable solution unless it's behind your "NATted" network/firewall and does not leak out.
Perhaps some of the better/brighter folks out there want to start writing up a list of "networking best practices". Then test those "book smart" CCIE/CNE types on the information to ensure they understand the ramifications. A few good whitepapers about these might be good to include or quiz folks on. I suspect there's only a handful of people who actually understand the complete end-to-end problem and all the ramifications involved, as it is quite complicated.
But if the best reason we can come up with is ISIS, the IEEE will just keep laughing.
Why is the IEEE laughing?
The implication is that the IEEE will not change the 802.x specs to allow a larger [default] link-local MTU due to legacy interop issues. Imagine your circa-1989 NE2000 card attempting to process a 4400-byte frame on your local LAN. A lot of the "cheap" Ethernet cards don't include enough buffering to handle such a large frame, let alone the legacy issues involved.. and remember that enterprise networks have a far larger number of Ethernet interfaces deployed than the entire Internet combined * 100, at least. Any change to the spec would obviously affect them also.
- jared
--
Jared Mauch | pgp key available via finger from jared@puck.nether.net
clue++;     | http://puck.nether.net/~jared/  My statements are only mine.
On Thu, 10 Oct 2002, Jared Mauch wrote: [People using RFC 1918 addresses for routers that terminate tunnels, which breaks path MTU discovery when RFC 1918 source addresses are filtered elsewhere.]
People number out of 1918 space primarily for a few reasons, be they good or not:
1) Internal use
2) Cost involved.. nobody else needs to telnet to my p2p links but me, and I don't want to pay {regional_rir} for my internal use, to reduce costs
So use IP unnumbered.
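For anyone unfamiliar with it: "IP unnumbered" lets a point-to-point link borrow the address of another interface, typically a loopback, so the link itself consumes no address space and any ICMP it generates carries a globally routable source. A minimal IOS-style sketch, with a hypothetical loopback address taken from documentation space:

    ! Hypothetical sketch: the serial link borrows Loopback0's address.
    interface Loopback0
     ip address 192.0.2.1 255.255.255.255
    !
    interface Serial0/0
     ip unnumbered Loopback0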
3) "security" of not being a "publicly" accessible network.
Well then they get more security than they bargained for if their network becomes inaccessible...
With the past scare of "we'll be out of IP addresses by 199x" still fresh in some people's memories, they in good conscience decided to also conserve IPs via this method.
From where I'm sitting, getting IP addresses is largely a matter of spending some time and energy, but after a while you get them. It seems this is different for other people. For instance, an ISP here in NL gave their premium ADSL customers a few addresses when they first started offering the service, but later offered those customers a free ADSL router if they returned the addresses. So obviously there must have been a pretty big incentive for getting the address space back.
Another problem with numbering router links is that you need to break up your address blocks. This is extremely annoying and wasteful.
The problem is that not everyone today who considers themselves a network operator understands all the ramifications of their current practices, be they good or bad.
Very true.
Going into fantasy-land mode: if IPv6 addresses were instantly used by everyone, people could once again obtain IPs that could be used for internal private use yet remain globally unique, therefore allowing tracking back of who is leaking their own internal sources.
OK, quick question: how do I number my point-to-point links in IPv6?
1. /64
2. /126
3. /127
4. IP unnumbered
5. Just link-local addresses
I hate to say it, but I don't think IPv6 is ready for prime time yet.
Making a good list of best practices (and then having people widely implement them) might also go a long way towards showing concerned parties such as the US administration that the network community consists of responsible people who can work together for the common good.
I agree here, I personally think that numbering your internal links out of 1918 space is not an acceptable solution unless it's behind your "natted" network/firewall and does not leak out.
Agree.
Perhaps some of those that are the better/brighter out there want to start to write up a list of "networking best practices".
I've started on a list of BGP best practices recently. When I think it's ready I'll post a link. If anyone has anything to contribute before then (even just constructive criticism), mail me off-list.
But if the best reason we can come up with is ISIS, the IEEE will just keep laughing.
Why is the IEEE laughing?
The implication is that the IEEE will not change the 802.x specs to allow a larger [default] link-local MTU due to legacy interop issues.
So? We don't stick to IEEE 802.3 anyway...
Imagine your circa-1989 NE2000 card attempting to process a 4400-byte frame on your local LAN. A lot of the "cheap" Ethernet cards don't include enough buffering to handle such a large frame, let alone the legacy issues involved..
4400 bytes on a 1989 card? You are being _very_ optimistic to even take the trouble of saying that doesn't work. Many of today's 100 Mbit cards (and that's not just the $10 ones) can't even handle the 1504 bytes needed for 802.1q VLAN tags.
I have to side with the IEEE here: simply changing the spec isn't an option, since none of the 10 Mbps stuff will handle it, very little of the 100 Mbps stuff, and not even all of the 1000 Mbps stuff. (I once complained to a vendor about this. They sent us new GE interfaces. Those did 64k frames...)
Having a larger-than-1500-byte MTU in backbones would be very good, because then you have some room to work with when adding extra headers. A good solution for this would be a neighbor MTU discovery protocol. Maybe ARPv2? Then boxes with different MTUs could live together on the same wire, and doing more than 1500 bytes over an Ethernet-based public exchange point wouldn't be a problem.
participants (11)
- David Schwartz
- Hank Nussbacher
- Iljitsch van Beijnum
- Jared Mauch
- Joe Abley
- Richard A Steenbergen
- Sean Donelan
- Stephen J. Wilcox
- Steve Francis
- Tony Rall
- Valdis.Kletnieks@vt.edu