BCP38 on public-facing Ubuntu servers
Not every uplink service implements BCP38. When putting up servers connected more-or-less directly to the Internet through these uplinks, it would be nice if the servers themselves were able to implement ingress and egress filtering according to BCP38. (Sorry about the typo in the subject line of my previous message -- not everyone can get a BGP feed.)

(Or, when using Ubuntu server edition to implement edge routers.)

My earlier query asked whether anyone has encoded the blackhole routes in YAML for insertion into netplan(5). My prior message contains the routes to be blackholed. That takes care of egress routing.

(I think I can write a Python program to take my list and convert it to the YAML that netplan(5) wants to see. That way, the routes are inserted when the public interface is up, and removed when the public interface is down.)

Ingress routing appears to be a one-line addition. iptables can be told to weed out packets with unroutable source addresses. My experiments will add something like this line to the firewall:

# iptables -A INPUT -m addrtype -i enp1s0 --src-type BLACKHOLE -j DROP

THIS HAS NOT BEEN VERIFIED. I'm building a web server that will integrate this idea, and try it out.
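The Python conversion described above might look roughly like the sketch below. It is only an illustration under several assumptions: the function name `blackhole_routes_yaml` and the three example prefixes are made up here (the real list is in the earlier message, which is not reproduced in this thread), the interface name is taken from the iptables line, and whether a given netplan version accepts `type: blackhole` routes without a `via` key should be checked with `netplan generate` before relying on it.

```python
import ipaddress


def blackhole_routes_yaml(prefixes, interface="enp1s0"):
    """Render a netplan fragment that blackholes each prefix.

    The routes are attached to one interface so that they follow that
    interface up and down. The YAML is built by hand to avoid needing
    a third-party YAML library.
    """
    lines = [
        "network:",
        "  version: 2",
        "  ethernets:",
        f"    {interface}:",
        "      routes:",
    ]
    for prefix in prefixes:
        # strict=True raises ValueError for entries with host bits set,
        # such as 10.0.0.1/8, so typos in the list are caught early.
        net = ipaddress.ip_network(prefix, strict=True)
        lines.append(f'        - to: "{net}"')
        lines.append("          type: blackhole")
    return "\n".join(lines) + "\n"


# Placeholder input only; substitute the real list of bogon prefixes.
EXAMPLE_PREFIXES = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

if __name__ == "__main__":
    print(blackhole_routes_yaml(EXAMPLE_PREFIXES), end="")
```

The output would normally be written to its own file under /etc/netplan/ so that the generated routes stay separate from the hand-maintained interface configuration.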
Maybe you can explore the in-kernel feature called RP filter, or reverse path filter. In router gear it's called uRPF.

cat /proc/sys/net/ipv4/conf/default/rp_filter

There are 2 modes: loose or strict.
If your server is BGP multi-homed, then you must use loose. Loose is still very powerful and useful.

Basically, RP is doing what a router does, but the opposite way. When a packet arrives on your server, the kernel checks the routing table for the destination next hop, and RP also checks whether the packet arrived on the right interface for its source address.

If your routing is asymmetric or the source is spoofed, then RP drops it. It's a nice feature, but it does a second route lookup, so it is certainly slightly slower. I'm not sure we can say that it's twice as slow, though.

I assume your network is not asymmetric, so RP would help you for ingress traffic. For egress, add blackhole routes (Linux's counterpart to a null interface) or use the bogon scripts in Python. I wouldn't use iptables for that as it's purely routing, but there are many ways to achieve the same goal.

I recommend exploring rp_filter, as it might do what you're looking for.

As a side note, iptables is very slow when under attack and/or under heavy load. There are a lot of limitations; for example, the kernel can only forward ~1.4 Mpps per CPU/socket with iptables. That is too slow in my opinion. This was still true recently, but I can't confirm it for the latest 5.x kernels; it could have been fixed or improved.

Finally, can you share with us which provider doesn't filter BCP38 in their uplink? #JustCurious. 😊

Jean
On 6/2/21 4:35 AM, Jean St-Laurent via NANOG wrote:
> Maybe you can explore the in-kernel feature called RP filter, or reverse path filter. In router gear it's called uRPF.
>
> cat /proc/sys/net/ipv4/conf/default/rp_filter

+100 to rp_filter

> There are 2 modes: loose or strict.
> If your server is BGP multi-homed, then you must use loose. Loose is still very powerful and useful.

I think loose with any default route will fail to do what you want. If you are running your router without a default, then loose would probably be okay.

> Basically, RP is doing what a router does, but the opposite way. When a packet arrives on your server, the kernel checks the routing table for the destination next hop, and RP also checks whether the packet arrived on the right interface for its source address.

For strict mode, the router allows the incoming packet if the incoming interface would be the outgoing interface when sending a packet to the incoming packet's source IP.

> If your routing is asymmetric or the source is spoofed, then RP drops it. It's a nice feature, but it does a second route lookup, so it is certainly slightly slower. I'm not sure we can say that it's twice as slow, though.

I'm confident that it is at least somewhat slower. However ... I have a lowly AMD E-350 APU (lscpu says it's at 918 MHz) processing multiple hundred Mbps on GPON against a full DFZ feed with no noticeable delay. (I've never felt the need nor desire to instrument it.) As such, I'm confident that any system that would be used in a greenfield deployment will be able to *easily* handle the traffic that most servers will see.

> I assume your network is not asymmetric, so RP would help you for ingress traffic. For egress, add blackhole routes (Linux's counterpart to a null interface) or use the bogon scripts in Python. I wouldn't use iptables for that as it's purely routing, but there are many ways to achieve the same goal.

"unreachable" routes (in Linux parlance) or "null" routes (in Cisco parlance) combined with Reverse Path Filtering (RPF) is a HUGE win in my book. I've expanded this methodology to federate Fail2Ban between multiple systems: EBGP via bird to trade Fail2Ban-specific tables between machines, and ip rule to make sure the Fail2Ban table is processed. Works great in my opinion.

> I recommend exploring rp_filter, as it might do what you're looking for.

+100

> As a side note, iptables is very slow when under attack and/or under heavy load. There are a lot of limitations; for example, the kernel can only forward ~1.4 Mpps per CPU/socket with iptables. That is too slow in my opinion. This was still true recently, but I can't confirm it for the latest 5.x kernels; it could have been fixed or improved.

That may be the case. However, that's apples (iptables) to walnuts (RPF). They are both food (processing packets), but they are significantly different.

--
Grant. . . .
unix || die
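The strict-mode rule described above, and the caution about loose mode when a default route is present, can be illustrated with a toy model. This is a sketch only: the FIB is a hand-written list, the interface names and documentation prefixes are invented for the example, and the real kernel logic has more cases than the three modes modelled here.

```python
import ipaddress

BLACKHOLE = "blackhole"  # marker for a blackhole/unreachable route in this toy FIB


def lookup(fib, address):
    """Longest-prefix match; returns the route target, or None if no route."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, target in fib:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, target)
    return None if best is None else best[1]


def rpf_accepts(fib, source, arrived_on, mode):
    """mode uses the rp_filter numbering: 0 = off, 1 = strict, 2 = loose."""
    if mode == 0:
        return True
    target = lookup(fib, source)
    if target is None or target == BLACKHOLE:
        return False               # no usable route back to the source
    if mode == 2:
        return True                # loose: reachable via any interface
    return target == arrived_on    # strict: reverse path must be the ingress


# Hypothetical routing table: default via eth0, one attached network on
# eth1, and one blackholed bogon prefix.
FIB = [
    ("0.0.0.0/0", "eth0"),
    ("192.0.2.0/24", "eth1"),
    ("10.0.0.0/8", BLACKHOLE),
]

for source, interface in [
    ("192.0.2.9", "eth1"),     # legitimate source on the attached network
    ("198.51.100.7", "eth1"),  # spoofed source arriving from the inside
    ("10.1.2.3", "eth0"),      # bogon source arriving from the uplink
]:
    strict = rpf_accepts(FIB, source, interface, mode=1)
    loose = rpf_accepts(FIB, source, interface, mode=2)
    print(f"{source} on {interface}: strict={strict} loose={loose}")
```

In this model the spoofed 198.51.100.7 packet is rejected in strict mode but accepted in loose mode, because the default route makes every source "reachable". The 10.1.2.3 packet is rejected in both modes because its source matches a blackhole route, which is the combination of null routes and RPF praised above.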
On Wed, Jun 2, 2021 at 2:04 PM Grant Taylor via NANOG <nanog@nanog.org> wrote:
> On 6/2/21 4:35 AM, Jean St-Laurent via NANOG wrote:
>> Maybe you can explore the in-kernel feature called RP filter, or reverse path filter. In router gear it's called uRPF.
>>
>> cat /proc/sys/net/ipv4/conf/default/rp_filter
>
> +100 to rp_filter

rp_filter is great until your network is slightly less than a perfect hierarchy. Then your Linux "router" starts mysteriously dropping packets and, as with allow_local, Linux doesn't have any way to generate logs about it, so you end up with these mysteriously unexplained packet discards matching no conceivable rule in iptables... This failure has too often been the bane of my existence when using Linux for advanced networking.

Regards,
Bill Herrin

--
William Herrin
bill@herrin.us
https://bill.herrin.us/
On 6/3/21 8:44 AM, William Herrin wrote:
> rp_filter is great until your network is slightly less than a perfect hierarchy. Then your Linux "router" starts mysteriously dropping packets and, as with allow_local, Linux doesn't have any way to generate logs about it, so you end up with these mysteriously unexplained packet discards matching no conceivable rule in iptables... This failure has too often been the bane of my existence when using Linux for advanced networking.

I don't remember the particulars, but I thought that was the domain of log_martians (net.ipv4.conf.*.log_martians).

Without log_martians or explicitly looking for such, no, you won't get any indication of such drops.

--
Grant. . . .
unix || die
Grant Taylor via NANOG <nanog@nanog.org> wrote:
> On 6/3/21 8:44 AM, William Herrin wrote:
>> rp_filter is great until your network is slightly less than a perfect hierarchy. Then your Linux "router" starts mysteriously dropping packets and, as with allow_local, Linux doesn't have any way to generate logs about it, so you end up with these mysteriously unexplained packet discards matching no conceivable rule in iptables... This failure has too often been the bane of my existence when using Linux for advanced networking.
>
> I don't remember the particulars, but I thought that was the domain of log_martians (net.ipv4.conf.*.log_martians).
>
> Without log_martians or explicitly looking for such, no, you won't get any indication of such drops.

Yes, enabling the log_martians sysctl will generate a kernel log message for each rp_filter failure (subject to rate limiting). There are also stat counters in /proc/net/stat/rt_cache (one line per CPU) for in_martian_dst and in_martian_src, which increment regardless of the log_martians setting.

The rp_filter sysctl defaults to strict mode (== 1) on Ubuntu, but can be set to loose mode (== 2); the difference is, essentially, that in strict mode the reverse path must be the same interface as the ingress interface, whereas in loose mode the reverse path can be any interface (as long as the source address is reachable).

https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst

-J

---
-Jay Vosburgh, jay.vosburgh@canonical.com
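The per-CPU counters in /proc/net/stat/rt_cache mentioned above can be totalled with a few lines of Python. This is a minimal sketch, assuming the file consists of one header line of column names followed by one row per CPU with the counters printed in hexadecimal; the function names are invented for the example.

```python
def martian_totals(rt_cache_text):
    """Sum in_martian_src and in_martian_dst over all per-CPU rows."""
    rows = rt_cache_text.strip().splitlines()
    header = rows[0].split()
    totals = {"in_martian_src": 0, "in_martian_dst": 0}
    for row in rows[1:]:
        # Pair each column name with the value in the same position.
        values = dict(zip(header, row.split()))
        for name in totals:
            # Assumption: counters are hexadecimal, without a 0x prefix.
            totals[name] += int(values[name], 16)
    return totals


def read_martian_totals(path="/proc/net/stat/rt_cache"):
    """Convenience wrapper that reads the live counters from procfs."""
    with open(path) as handle:
        return martian_totals(handle.read())
```

Calling `read_martian_totals()` before and after a test burst of traffic, and comparing the two results, gives a rough indication of whether rp_filter is discarding packets even when log_martians is off.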
Hey,

to my knowledge there is no IPv6 equivalent for net.ipv4.conf.all.rp_filter.

Therefore I use netfilter to do the RP filtering for both address families.

ip(6)tables -t raw -I PREROUTING -m rpfilter --invert -j DROP

Using the raw table uses fewer resources, but you could also choose other tables. Details about rpfilter can be found here [1].

This can also be achieved using nftables [2].

Best
Fran

[1] https://ipset.netfilter.org/iptables-extensions.man.html#lbBX
[2] https://wiki.nftables.org/wiki-nftables/index.php/Matching_routing_informati...

On 04.06.21 20:43, Jay Vosburgh wrote:
> The rp_filter sysctl defaults to strict mode (== 1) on Ubuntu, but can be set to loose mode (== 2); the difference is, essentially, that in strict mode the reverse path must be the same interface as the ingress interface, whereas in loose mode the reverse path can be any interface (as long as the source address is reachable).
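The "ip(6)tables" shorthand above stands for two separate commands, one per address family. A small sketch that expands it is shown below; the names `RULE` and `rpfilter_commands` are invented for the example, and the script only prints the commands rather than running them.

```python
import shlex

# The rule arguments exactly as given in the message above.
RULE = "-t raw -I PREROUTING -m rpfilter --invert -j DROP"


def rpfilter_commands(rule=RULE):
    """Return one argv list per address family (IPv4, then IPv6)."""
    return [[binary, *shlex.split(rule)] for binary in ("iptables", "ip6tables")]


# Dry run: show what would be executed.
for argv in rpfilter_commands():
    print(shlex.join(argv))
```

To apply the rules instead of printing them, each argv list could be passed to `subprocess.run(argv, check=True)` as root, ideally after testing on a host with out-of-band access in case the filter drops more than intended.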
On 6/8/21 2:38 PM, Fran via NANOG wrote:
> Hey,
>
> to my knowledge there is no IPv6 equivalent for net.ipv4.conf.all.rp_filter.
>
> Therefore I use netfilter to do the RP filtering for both address families.
>
> ip(6)tables -t raw -I PREROUTING -m rpfilter --invert -j DROP
>
> Using the raw table uses fewer resources, but you could also choose other tables. Details about rpfilter can be found here [1].
>
> This can also be achieved using nftables [2].

I've been in discussions on how to filter packets with bad source addresses on several mailing lists, including this one. For the last few weeks, I've been searching for all the information I can find on how Linux implements rp_filter...which appears to have some holes.

Looking at /proc/sys/net/ipv6, there is no knob for rp_filter, so if your system is IPv6 enabled you have to use the built-in firewall.

For IPv4, I found kernel documentation, but it doesn't tell the whole story. For that, I had to comb the kernel sources to find out all the details of rp_filter. I've prepared an RFC letter of what I think I found, to be sent to the kernel developers. Here is the text of what I'll be sending, subject to any constructive criticism I get from here:

Letter begins:

After looking at the source that appears to implement rp_filter

    linux/net/ipv4/fib_frontend.c

I believe that I now understand the tests rp_filter performs to validate the source address when net.ipv4.conf.*.rp_filter is set to one or two for a given interface.

Does the new paragraph I have written accurately reflect what happens? If so, then I will find out how to submit a patch to add the clarification to the kernel document.

Description of rp_filter from
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
--------------------------------------------------------------------
rp_filter - INTEGER
    0 - No source validation.
    1 - Strict mode as defined in RFC3704 Strict Reverse Path
        Each incoming packet is tested against the FIB and if the interface
        is not the best reverse path the packet check will fail.
        By default failed packets are discarded.
    2 - Loose mode as defined in RFC3704 Loose Reverse Path
        Each incoming packet's source address is also tested against the FIB
        and if the source address is not reachable via any interface
        the packet check will fail.

    [*new text here]

    Current recommended practice in RFC3704 is to enable strict mode
    to prevent IP spoofing from DDos attacks. If using asymmetric routing
    or other complicated routing, then loose mode is recommended.

    The max value from conf/{all,interface}/rp_filter is used
    when doing source validation on the {interface}.

    Default value is 0. Note that some distributions enable it
    in startup scripts.
--------------------------------------------------------------------

Recommended addition where marked with "[*new text here]":

rp_filter will examine the source address of an incoming IP packet by performing an FIB lookup. In loose mode (value 2), the packet is rejected if the source address is neither UNICAST nor LOCAL nor IPSEC. For strict mode (value 1) the interface indicated by the FIB entry must also match the interface on which the packet arrived.
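The kernel text quoted above says that the larger of conf/all/rp_filter and conf/{interface}/rp_filter is what applies to a given interface. A minimal sketch of that rule follows; the function name `effective_rp_filter` and the `conf_dir` parameter are invented for the example (the parameter exists only so the function can be pointed at a test directory), and the interface name in the usage note is the one from the start of the thread.

```python
from pathlib import Path

MODE_NAMES = {0: "off", 1: "strict", 2: "loose"}


def effective_rp_filter(interface, conf_dir="/proc/sys/net/ipv4/conf"):
    """Return max(conf/all/rp_filter, conf/<interface>/rp_filter)."""
    base = Path(conf_dir)

    def read(name):
        # Each sysctl file holds a single integer followed by a newline.
        return int((base / name / "rp_filter").read_text().strip())

    return max(read("all"), read(interface))
```

For example, `MODE_NAMES[effective_rp_filter("enp1s0")]` would report which mode is actually in force on that interface. One consequence of the max rule is that setting an interface to 1 (strict) has no effect while conf/all is 2 (loose).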
Bingo!

With -t raw, you can bypass the 1.2 Mpps per CPU/socket limitation in iptables, because it does a very early drop without traversing the full set of iptables kernel modules.

You can reach close to wire speed with -t raw, compared to using the same iptables rule without -t raw.

Jean

-----Original Message-----
From: NANOG <nanog-bounces+jean=ddostest.me@nanog.org> On Behalf Of Fran via NANOG
Sent: June 8, 2021 5:39 PM
To: nanog@nanog.org
Subject: Re: BCP38 on public-facing Ubuntu servers

ip(6)tables -t raw -I PREROUTING -m rpfilter --invert -j DROP

Using the raw table uses fewer resources, but you could also choose other tables.
    And by that he means: "only a few" =D.

-----
Alain Hebert                                ahebert@pubnix.net
PubNIX Inc.
50 boul. St-Charles
P.O. Box 26770     Beaconsfield, Quebec     H9W 6G7
Tel: 514-990-5911  http://www.pubnix.net    Fax: 514-990-9443

On 6/2/21 12:40 AM, Stephen Satchell wrote:
Not every uplink service implements BCP38. When putting up servers connected more-or-less directly to the Internet through these uplinks, it would be nice if the servers themselves were able to implement ingress and egress filtering according to BCP38. (Sorry about the typo in the subject lines of my previous message -- not everyone can get a BGP feed.)
participants (7)
- Alain Hebert
- Fran
- Grant Taylor
- Jay Vosburgh
- Jean St-Laurent
- Stephen Satchell
- William Herrin