BGP route hijack by AS10990
We saw a bunch of our IP blocks hijacked by AS10990 from 19:15 MDT until 20:23 MDT. Anybody else have problems with that?
ASpath: 1299 7219 10990
50.92.0.0/17 AS10990
198.166.0.0/17 AS10990
198.166.128.0/17 AS10990
162.157.128.0/17 AS10990
162.157.0.0/17 AS10990
50.92.128.0/17 AS10990
--
Clinton Work
Airdrie, AB
We appeared to be impacted for some address space within 206.47.0.0/16, which AS577 normally advertises, but that was between 15:50 and 16:30 Eastern.
Jeff
On Wed, Jul 29, 2020, 10:48 PM Clinton Work <clinton@scripty.com> wrote:
We saw a bunch of our IP blocks hijacked by AS10990 from 19:15 MDT until 20:23 MDT. Anybody else have problems with that.
ASpath: 1299 7219 10990
50.92.0.0/17 AS10990 198.166.0.0/17 AS10990 198.166.128.0/17 AS10990 162.157.128.0/17 AS10990 162.157.0.0/17 AS10990 50.92.128.0/17 AS10990
-- Clinton Work Airdrie, AB
Looks like the list is too long. None of them have any valid ROAs either.
= 104.230.0.0/18 206313 6724 1299 7219 10990
= 104.230.64.0/18 206313 6724 1299 7219 10990
= 107.184.0.0/16 206313 6724 1299 7219 10990
= 107.185.0.0/16 206313 6724 1299 7219 10990
= 107.189.192.0/19 206313 6724 1299 7219 10990
= 107.189.224.0/19 206313 6724 1299 7219 10990
= 108.49.0.0/17 206313 6724 1299 7219 10990
= 108.49.128.0/17 206313 6724 1299 7219 10990
= 135.19.192.0/19 206313 6724 1299 7219 10990
= 135.19.224.0/19 206313 6724 1299 7219 10990
= 137.119.140.0/23 206313 6724 1299 7219 10990
= 137.119.142.0/23 206313 6724 1299 7219 10990
= 142.113.0.0/17 206313 6724 1299 7219 10990
= 142.113.128.0/17 206313 6724 1299 7219 10990
= 147.194.0.0/20 206313 6724 1299 7219 10990
= 147.194.16.0/20 206313 6724 1299 7219 10990
= 162.157.0.0/17 206313 6724 1299 7219 10990
= 162.157.128.0/17 206313 6724 1299 7219 10990
= 166.48.0.0/18 206313 6724 1299 7219 10990
= 166.48.64.0/18 206313 6724 1299 7219 10990
= 167.100.80.0/22 206313 6724 1299 7219 10990
= 167.100.84.0/22 206313 6724 1299 7219 10990
= 172.103.112.0/20 206313 6724 1299 7219 10990
= 172.103.96.0/20 206313 6724 1299 7219 10990
= 172.112.0.0/14 206313 6724 1299 7219 10990
= 172.116.0.0/14 206313 6724 1299 7219 10990
= 173.160.0.0/14 206313 6724 1299 7219 10990
= 173.164.0.0/14 206313 6724 1299 7219 10990
= 173.28.224.0/21 206313 6724 1299 7219 10990
= 173.28.232.0/21 206313 6724 1299 7219 10990
= 173.48.0.0/17 206313 6724 1299 7219 10990
= 173.48.128.0/17 206313 6724 1299 7219 10990
= 173.90.0.0/16 206313 6724 1299 7219 10990
= 173.91.0.0/16 206313 6724 1299 7219 10990
= 174.1.56.0/23 206313 6724 1299 7219 10990
= 174.1.58.0/23 206313 6724 1299 7219 10990
= 174.108.0.0/15 206313 6724 1299 7219 10990
= 174.110.0.0/15 206313 6724 1299 7219 10990
= 174.223.0.0/18 206313 6724 1299 7219 10990
= 174.223.64.0/18 206313 6724 1299 7219 10990
= 174.228.0.0/18 206313 6724 1299 7219 10990
= 174.228.64.0/18 206313 6724 1299 7219 10990
= 174.231.128.0/18 206313 6724 1299 7219 10990
= 174.231.192.0/18 206313 6724 1299 7219 10990
= 177.132.112.0/20 206313 6724 1299 7219 10990
= 177.132.96.0/20 206313 6724 1299 7219 10990
= 198.166.0.0/17 206313 6724 1299 7219 10990
= 198.166.128.0/17 206313 6724 1299 7219 10990
= 198.52.176.0/23 206313 6724 1299 7219 10990
= 198.52.178.0/23 206313 6724 1299 7219 10990
= 204.195.0.0/18 206313 6724 1299 7219 10990
= 208.79.152.0/22 206313 6724 6939 10990
= 208.79.153.0/24 206313 6724 6939 7219 10990
= 216.10.190.0/24 206313 6724 1299 7219 10990
= 216.10.191.0/24 206313 6724 1299 7219 10990
= 24.102.64.0/19 206313 6724 1299 7219 10990
= 24.102.96.0/19 206313 6724 1299 7219 10990
= 24.197.208.0/21 206313 6724 1299 7219 10990
= 24.197.216.0/21 206313 6724 1299 7219 10990
= 24.201.64.0/19 206313 6724 1299 7219 10990
= 24.201.96.0/19 206313 6724 1299 7219 10990
= 24.205.160.0/20 206313 6724 1299 7219 10990
= 24.205.176.0/20 206313 6724 1299 7219 10990
= 24.48.0.0/19 206313 6724 1299 7219 10990
= 24.48.32.0/19 206313 6724 1299 7219 10990
= 24.57.0.0/17 206313 6724 1299 7219 10990
= 24.57.128.0/17 206313 6724 1299 7219 10990
= 24.89.16.0/20 206313 6724 1299 7219 10990
= 24.90.64.0/19 206313 6724 1299 7219 10990
= 24.90.96.0/19 206313 6724 1299 7219 10990
= 35.211.0.0/17 206313 6724 1299 7219 10990
= 35.211.128.0/17 206313 6724 1299 7219 10990
= 45.48.0.0/15 206313 6724 1299 7219 10990
= 45.50.0.0/15 206313 6724 1299 7219 10990
= 47.218.0.0/23 206313 6724 1299 7219 10990
= 47.218.2.0/23 206313 6724 1299 7219 10990
= 47.32.64.0/19 206313 6724 1299 7219 10990
= 47.32.96.0/19 206313 6724 1299 7219 10990
= 47.36.0.0/19 206313 6724 1299 7219 10990
= 47.36.32.0/19 206313 6724 1299 7219 10990
= 47.39.64.0/19 206313 6724 1299 7219 10990
= 47.39.96.0/19 206313 6724 1299 7219 10990
= 50.88.0.0/16 206313 6724 1299 7219 10990
= 50.89.0.0/16 206313 6724 1299 7219 10990
= 50.92.0.0/17 206313 6724 1299 7219 10990
= 50.92.128.0/17 206313 6724 1299 7219 10990
= 66.65.0.0/18 206313 6724 1299 7219 10990
= 66.65.64.0/18 206313 6724 1299 7219 10990
= 66.68.0.0/16 206313 6724 1299 7219 10990
= 66.69.0.0/16 206313 6724 1299 7219 10990
= 67.149.198.0/24 206313 6724 1299 7219 10990
= 67.149.199.0/24 206313 6724 1299 7219 10990
= 67.247.112.0/20 206313 6724 1299 7219 10990
= 67.247.96.0/20 206313 6724 1299 7219 10990
= 70.83.128.0/19 206313 6724 1299 7219 10990
= 70.83.160.0/19 206313 6724 1299 7219 10990
= 72.137.0.0/17 206313 6724 1299 7219 10990
= 72.137.128.0/17 206313 6724 1299 7219 10990
= 72.140.0.0/16 206313 6724 1299 7219 10990
= 72.141.0.0/16 206313 6724 1299 7219 10990
= 72.53.64.0/20 206313 6724 1299 7219 10990
= 72.53.80.0/20 206313 6724 1299 7219 10990
= 74.56.192.0/19 206313 6724 1299 7219 10990
= 74.56.224.0/19 206313 6724 1299 7219 10990
= 74.59.128.0/19 206313 6724 1299 7219 10990
= 74.59.160.0/19 206313 6724 1299 7219 10990
= 74.76.0.0/15 206313 6724 1299 7219 10990
= 74.78.0.0/15 206313 6724 1299 7219 10990
= 76.168.0.0/14 206313 6724 1299 7219 10990
= 76.172.0.0/14 206313 6724 1299 7219 10990
= 76.86.0.0/16 206313 6724 1299 7219 10990
= 76.87.0.0/16 206313 6724 1299 7219 10990
= 96.3.0.0/17 206313 6724 1299 7219 10990
= 96.3.128.0/17 206313 6724 1299 7219 10990
= 96.32.64.0/20 206313 6724 1299 7219 10990
= 96.32.80.0/20 206313 6724 1299 7219 10990
= 98.148.0.0/16 206313 6724 1299 7219 10990
= 98.149.0.0/16 206313 6724 1299 7219 10990
= 98.32.0.0/13 206313 6724 1299 7219 10990
= 98.40.0.0/13 206313 6724 1299 7219 10990
= 99.225.0.0/19 206313 6724 1299 7219 10990
= 99.225.192.0/19 206313 6724 1299 7219 10990
= 99.225.224.0/19 206313 6724 1299 7219 10990
= 99.225.32.0/19 206313 6724 1299 7219 10990
= 99.240.128.0/18 206313 6724 1299 7219 10990
= 99.240.192.0/18 206313 6724 1299 7219 10990
= 99.254.80.0/21 206313 6724 1299 7219 10990
= 99.254.88.0/21 206313 6724 1299 7219 10990
= 99.255.0.0/19 206313 6724 1299 7219 10990
= 99.255.32.0/19 206313 6724 1299 7219 10990
Regards,
Aftab A. Siddiqui
On Thu, 30 Jul 2020 at 12:49, Clinton Work <clinton@scripty.com> wrote:
We saw a bunch of our IP blocks hijacked by AS10990 from 19:15 MDT until 20:23 MDT. Anybody else have problems with that.
ASpath: 1299 7219 10990
50.92.0.0/17 AS10990 198.166.0.0/17 AS10990 198.166.128.0/17 AS10990 162.157.128.0/17 AS10990 162.157.0.0/17 AS10990 50.92.128.0/17 AS10990
-- Clinton Work Airdrie, AB
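For anyone who wants to reproduce the ROA check Aftab mentions above, here is a minimal Python sketch of RFC 6811 origin validation against a local set of validated ROA payloads (VRPs). The vrps.csv file name and its asn,prefix,max_length columns are stand-ins for whatever export your RPKI validator produces; this is not the tool he used.

# Minimal sketch of RFC 6811 origin validation against a local VRP export.
# Assumes a CSV with columns asn,prefix,max_length (file name is illustrative).
import csv
import ipaddress

def load_vrps(path):
    vrps = []
    with open(path) as fh:
        for row in csv.DictReader(fh):
            vrps.append((int(row["asn"].lstrip("AS")),
                         ipaddress.ip_network(row["prefix"]),
                         int(row["max_length"])))
    return vrps

def validate(prefix, origin_as, vrps):
    prefix = ipaddress.ip_network(prefix)
    covering = [v for v in vrps
                if v[1].version == prefix.version and prefix.subnet_of(v[1])]
    if not covering:
        return "not-found"   # no ROA covers this prefix at all
    for asn, _roa, max_len in covering:
        if asn == origin_as and prefix.prefixlen <= max_len:
            return "valid"
    return "invalid"         # covered by a ROA, but origin or length mismatch

if __name__ == "__main__":
    vrps = load_vrps("vrps.csv")
    for pfx in ["50.92.0.0/17", "198.166.0.0/17"]:
        print(pfx, "origin AS10990:", validate(pfx, 10990, vrps))

Note that "no valid ROA" can mean either not-found or invalid; only the latter would be dropped by an origin-validation policy that rejects invalids.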
Looks like the real question here is why doesn’t 7219 do a better job of filtering what they accept. Has anyone reached out to them?
Owen
On Jul 29, 2020, at 23:31 , Aftab Siddiqui <aftab.siddiqui@gmail.com> wrote:
Looks like the list is too long. None of them have any valid ROAs either.
Regards,
Aftab A. Siddiqui
On Thu, 30 Jul 2020 at 12:49, Clinton Work <clinton@scripty.com> wrote:
We saw a bunch of our IP blocks hijacked by AS10990 from 19:15 MDT until 20:23 MDT. Anybody else have problems with that.
ASpath: 1299 7219 10990
50.92.0.0/17 AS10990 198.166.0.0/17 AS10990 198.166.128.0/17 AS10990 162.157.128.0/17 AS10990 162.157.0.0/17 AS10990 50.92.128.0/17 AS10990
-- Clinton Work Airdrie, AB
On Jul 30, 2020, at 09:45 , Yang Yu <yang.yu.list@gmail.com> wrote:
On Thu, Jul 30, 2020 at 9:37 AM Owen DeLong <owen@delong.com> wrote:
Looks like the real question here is why doesn’t 7219 do a better job of filtering what they accept.
Has anyone reached out to them?
You mean 1299? 7219 and 10990 are the same entity.
In that case, sure, up to 1299. Owen
On Thu, Jul 30, 2020 at 11:21:04AM +0300, Hank Nussbacher <hank@interall.co.il> wrote a message of 48 lines which said:
See:
Peace,
On Thu, Jul 30, 2020, 5:48 AM Clinton Work <clinton@scripty.com> wrote:
We saw a bunch of our IP blocks hijacked by AS10990 from 19:15 MDT until 20:23 MDT. Anybody else have problems with that.
Here's what we discovered about the incident. Hope that brings some clarity.
https://radar.qrator.net/blog/as10990-routing-optimization-tale
-- Töma
so, bgp optimizers... again?
-- Patrick
On 30.07.2020 at 18:58, Töma Gavrichenkov wrote:
Peace,
On Thu, Jul 30, 2020, 5:48 AM Clinton Work <clinton@scripty.com> wrote:
We saw a bunch of our IP blocks hijacked by AS10990 from 19:15 MDT until 20:23 MDT. Anybody else have problems with that.
Here's what we discovered about the incident. Hope that brings some clarity.
https://radar.qrator.net/blog/as10990-routing-optimization-tale
-- Töma
On Thu, 30 Jul 2020, at 13:09, Patrick Schultz wrote:
so, bgp optimizers... again?
-- Patrick
More like shame on Telia for not filtering properly.
If Tulix used a so-called BGP "optimizer" and didn't have a proper export filter in place, it is their mistake, but as a major transit provider, Telia bears the brunt of the responsibility for making sure that Tulix's mistake doesn't affect the rest of us.
--
Sadiq Saif
https://sadiqsaif.com/
It's not like there are scorecards, but there's a lot of fault to go around.
However, again, BGP "Optimizers" are bad. The conditions that allowed the inadvertent leak to occur need to be fixed, no question. But in scenarios like this, as-path length generally limits impact to "Oh crap, I'll fix that, sorry!" Once you start squirting out more specifics, you get to own some of the egg on the face.
On Thu, Jul 30, 2020 at 1:35 PM Sadiq Saif <lists@sadiqsaif.com> wrote:
On Thu, 30 Jul 2020, at 13:09, Patrick Schultz wrote:
so, bgp optimizers... again?
-- Patrick
More like shame on Telia for not filtering properly.
If Tulix used a so called BGP "optimizer" and didn't have a proper export filter in place it is their mistake but as a major transit provider, Telia bears the brunt of the responsibility of making sure that Tulix's mistake doesn't affect the rest of us.
-- Sadiq Saif https://sadiqsaif.com/
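Tom's point above about more specifics is the crux: forwarding and best-path selection happen per prefix, and a more-specific prefix wins before AS-path length is even compared. A toy illustration with documentation prefixes and a made-up path (not the actual leaked routes):

# Toy longest-prefix-match demo: a leaked more-specific beats the legitimate
# aggregate no matter how long its AS path is. Prefixes and paths are examples.
import ipaddress

rib = [
    (ipaddress.ip_network("198.51.100.0/22"), [1299, 64500]),              # legitimate aggregate, short path
    (ipaddress.ip_network("198.51.100.0/23"), [6724, 1299, 7219, 64500]),  # leaked more-specific, longer path
]

def lookup(dst):
    dst = ipaddress.ip_address(dst)
    matches = [(net, path) for net, path in rib if dst in net]
    # Forwarding always uses the most specific matching prefix; AS-path
    # length only breaks ties between routes for the *same* prefix.
    return max(matches, key=lambda m: m[0].prefixlen)

print(lookup("198.51.100.10"))  # -> the /23 via the longer, leaked path

That is why a long AS path does nothing to limit the damage once more specifics are being originated.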
On 30/Jul/20 19:44, Tom Beecher wrote:
It's not like there are scorecards, but there's a lot of fault to go around.
However, again, BGP "Optimizers" are bad. The conditions by which the inadvertent leak occur need to be fixed , no question. But in scenarios like this, as-path length generally limits impact to "Oh crap, I'll fix that, sorry!." Once you start squirting out more specifics, you get to own some of the egg on the face.
For about a year or so, I've been saying that the next generation of network engineers is being trained for a GUI-based point & click world, as opposed to understanding what protocols and CLI do.
There is no shortage of annual workshops that teach BGP Multi-Homing. Despite the horror BGP optimizers have displayed in recent years, they seem to be flying off the shelves, still.
Is this a clear example of the next generation of network engineers that we are breeding?
Mark.
They solve a need that isn't reasonably solved any other way that doesn't have similar drawbacks.
Some optimizers need to be redesigned to be safer by default.
Some networks need to be safer by default as well.
-----
Mike Hammett
Intelligent Computing Solutions
http://www.ics-il.com
Midwest-IX
http://www.midwest-ix.com
----- Original Message -----
From: "Mark Tinka" <mark.tinka@seacom.com>
To: nanog@nanog.org
Sent: Friday, July 31, 2020 8:59:51 AM
Subject: Re: BGP route hijack by AS10990
On 30/Jul/20 19:44, Tom Beecher wrote:
It's not like there are scorecards, but there's a lot of fault to go around.
However, again, BGP "Optimizers" are bad. The conditions by which the inadvertent leak occur need to be fixed , no question. But in scenarios like this, as-path length generally limits impact to "Oh crap, I'll fix that, sorry!." Once you start squirting out more specifics, you get to own some of the egg on the face.
For about a year or so, I've been saying that the next generation of network engineers are being trained for a GUI-based point & click world, as opposed to understanding what protocols and CLI do. There is no shortage of annual workshops that teach BGP Multi-Homing. Despite the horror BGP optimizers have displayed in recent years, they seem to be flying off the shelves, still. Is this a clear example of the next generation of network engineers that we are breeding? Mark.
On 31/Jul/20 16:29, Mike Hammett wrote:
They solve a need that isn't reasonably solved any other way that doesn't have similar drawbacks.
Some optimizers need to be redesigned to be safer by default.
Some networks need to be safer by default as well.
Almost every product ever made does solve a need. You will find at least one customer who is happy with what they paid their money for.
But BGP-4 is vulnerable enough as it is, and the Internet has moved on in leaps and bounds since 1994 (RFC 1654). Until we see BGP-5, we need to look after our community. And if that means holding the BGP optimizers to a higher standard, so be it. As they say, "You can't blame a monkey for botching brain surgery".
Plenty of industries strongly "guide" (I'll avoid "regulate") their actors to ensure standards and results (medicine, aviation, energy, construction, etc.). If the acceptance bar for a BGP actor is an optional CCNA or JNCIA certification, we shall learn the hard way, as we did with this and similar incidents.
Mark.
Telia's statement:
https://blog.teliacarrier.com/2020/07/31/bgp-hijack-of-july-30-2020/
(tl;dr: it was as-path filtering only, as opposed to prefix filtering, the former has been removed as an option)
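To make the distinction in the tl;dr concrete: an AS-path filter only asks whether the path looks plausible, which a leak through that same customer trivially satisfies, while a per-customer prefix filter checks every announcement against the routes the customer is actually authorized to originate (in practice built from IRR/RPKI data, for example with a tool like bgpq4). A rough sketch of the prefix-filter check; the registered list here is illustrative, not Tulix's real data:

# Sketch of a per-customer prefix filter: accept an announcement only if it
# falls inside one of the customer's registered routes, up to a length cap.
# The registered list is illustrative; in practice it is generated from the
# customer's IRR objects / ROAs.
import ipaddress

registered = [
    # (registered route, most specific length accepted under it)
    (ipaddress.ip_network("203.0.113.0/24"), 24),
    (ipaddress.ip_network("198.51.100.0/22"), 24),
]

def accept(announced):
    announced = ipaddress.ip_network(announced)
    return any(announced.version == route.version
               and announced.subnet_of(route)
               and announced.prefixlen <= max_len
               for route, max_len in registered)

print(accept("198.51.100.0/23"))  # True: inside a registered route
print(accept("50.92.0.0/17"))     # False: not registered to this customer, so dropped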
----- On Jul 31, 2020, at 2:33 PM, Lukas Tribus lists@ltri.eu wrote: Hi,
Telia's statement:
https://blog.teliacarrier.com/2020/07/31/bgp-hijack-of-july-30-2020/
(tl;dr: it was as-path filtering only, as opposed to prefix filtering, the former has been removed as an option)
Kudos to Telia for admitting their mistakes, and fixing their processes.
Thanks,
Sabri
On 31/Jul/20 23:38, Sabri Berisha wrote:
Kudos to Telia for admitting their mistakes, and fixing their processes.
Considering Telia's scope and "experience", that is one thing. But for the general good of the Internet, given the number of intentional or unintentional route hijacks in recent years, and all the noise that rises on this and other lists each time we have such incidents (this won't be the last), Telia should not have waited to be called out in order to get this fixed.
Do we know if they are fixing this for just this customer of theirs, or for all their customers? I know this has been their filtering policy with us (SEACOM) since 2014, as I pointed out earlier today. There has been no shortage of similar incidents between now and then, where the community has consistently called for more deliberate and effective route filtering across inter-AS arrangements.
There is massive responsibility on the community to act correctly for the Internet to succeed. Especially so during these Coronavirus times, where the world depends on us to keep whatever shred of an economy is left up and running. Doubly so if you are a major concern (like Telia) for the core of the Internet.
It's great that they are fixing this - but this was TOTALLY avoidable. That we won't see this again - even from the same actors - isn't something I have high confidence in guaranteeing, based on current experience.
We can all do better. We should all do better.
Mark.
----- On Jul 31, 2020, at 2:50 PM, Mark Tinka mark.tinka@seacom.com wrote: Hi Mark,
On 31/Jul/20 23:38, Sabri Berisha wrote:
Kudos to Telia for admitting their mistakes, and fixing their processes.
It's great that they are fixing this - but this was TOTALLY avoidable.
I'm not sure if you read their entire Mea Culpa, but they did indicate that the root cause of this issue was the provisioning of a legacy filter that they are no longer using. So effectively, that makes it a human error.
We're getting to a point where a single error is no longer causing outages, something very similar to my favorite analogy: aviation. Pretty much every major air disaster was caused by a combination of factors. Pretty much every major outage these days is caused by a combination of factors.
The manual provisioning of an inadequate filter, combined with an automation error on the side of a customer (which by itself was probably caused by a combination of factors), caused this issue.
We learn from every outage. And instead of radio silence, they fessed up and fixed the issue. Have a look at the ASRS program :)
Thanks,
Sabri
To your point with regards to multiple failures combined causing an outage, here's some basic reading on the Swiss cheese model: https://en.wikipedia.org/wiki/Swiss_cheese_model
From over here it looks like the legacy filter was a latent failure, and the BGP automation from the downstream peer of Telia was an active failure (combined, they caused the outage). Now from the downstream peer's point of view, perhaps the cause of their BGP automation failure was latent also, but we wouldn't know without more details.
Pretty interesting topic.
On 1/Aug/20 02:44, Rafael Possamai wrote:
To your point with regards to multiple failures combined causing an outage, here's some basic reading on the Swiss cheese model: https://en.wikipedia.org/wiki/Swiss_cheese_model
You just reminded me of the defense's strategy in the court case against HealthSouth's CEO Richard Scrushy, when they used a picture of a rat carrying Swiss cheese (full of holes) in their closing arguments to the jurors, to discredit the prosecution :-). Mark.
On 1/Aug/20 02:17, Sabri Berisha wrote:
I'm not sure if you read their entire Mea Culpa, but they did indicate that the root cause of this issue was the provisioning of a legacy filter that they are no longer using. So effectively, that makes it a human error.
We're going to a point where a single error is no longer causing outages, something very similar to my favorite analogy: avation. Pretty much every major air disaster was caused by a combination of factors. Pretty much every major outage these days is caused by a combination of factors.
The manual provisioning of an inadequate filter, combined with an automation error on the side of a customer (which by itself was probably caused by a combination of factors), caused this issue.
We learn from every outage. And instead of radio silence, they fessed up and fixed the issue. Have a look at the ASRS program :)
What I meant by "TOTALLY avoidable" is that "this particular plane crash" has happened in the exact same way, for the exact same reasons, over and over again.
Aviation learns from mistakes that don't generally recur in the exact same way for the exact same reasons.
Telia and others have known about these issues from them happening to other operators. When we see these issues, we go back and look at our own networks to implement the fixes that solved the problem the last time it happened. That's the idea.
The difference between us and aviation is that fundamental flaws or mistakes that impact safety are required to be fixed and checked if you want to keep operating in the industry. We don't have that, so...
Mark.
On Sat, Aug 1, 2020 at 4:21 AM Mark Tinka <mark.tinka@seacom.com> wrote:
What I meant by "TOTALLY avoidable" is that "this particular plane crash" has happened in the exact same way, for the exact same reasons, over and over again.
Aviation learns from mistakes that don't generally recur in the exact same way for the exact same reasons.
Aviation is regulated.
I am not normally supporting a heavy hand in regulation, but I think it is fair to say Noction and similar BGP optimizers are unsafe at any speed, and the FTC or similar should ban them in the USA. They harm consumers and are a risk to national security / critical infrastructure.
Noction and similar could have set basic defaults (no-export, only create /25 bogus routes to limit scope), but they have been clear that their greed to suck up traffic does not benefit from these defaults, and they won't do it.
Tar and feather them. FTC, do your job.
FTC has done good work before:
https://www.ftc.gov/news-events/press-releases/2017/01/ftc-charges-d-link-pu...
Noction — delete your account
On 1/Aug/20 15:50, Ca By wrote:
Aviation is regulated.
Which is my point.
While, like you, I am not in support of heavy-handed regulation like most life & death industries require, we also can't be leaving our industry open for any actor to do as they please.
I am not normally supporting a heavy hand in regulation, but i think it is fair to say Noction and similar BGP optimizers are unsafe at any speed and the FTC or similar should ban them in the USA. They harm consumers and are a risk to national security / critical infrastructure
Noction and similar could have set basic defaults (no-export, only create /25 bogus routes to limit scope), but they have been clear that their greed to suck up traffic does not benefit from these defaults and they wont do it.
Tar and feather them. FTC, do your job.
FTC has done good work before https://www.ftc.gov/news-events/press-releases/2017/01/ftc-charges-d-link-pu...
Noction — delete your account
+1. Mark.
On Sat, Aug 01, 2020 at 06:50:55AM -0700, Ca By wrote:
I am not normally supporting a heavy hand in regulation, but i think it is fair to say Noction and similar BGP optimizers are unsafe at any speed and the FTC or similar should ban them in the USA. They harm consumers and are a risk to national security / critical infrastructure
Noction and similar could have set basic defaults (no-export, only create /25 bogus routes to limit scope), but they have been clear that their greed to suck up traffic does not benefit from these defaults and they wont do it.
Following a large scale BGP incident in March 2015, Noction made it possible to optionally set the well-known NO_EXPORT community on route advertisements originated by IRP instances.
"In order to further reduce the likelihood of these problems occurring in the future, we will be adding a feature within Noction IRP to give an option to tag all the more specific prefixes that it generates with the BGP NO_EXPORT community. This will not be enabled by default [snip]"
https://www.noction.com/blog/route-optimizers Mar 27, 2015
Due to NO_EXPORT not being set in the default configuration, there are probably if not certainly many unsuspecting network engineers who end up deploying this software without ever even considering changing that one setting in the configuration.
Fast forward a few years and a few incidents. On the topic of default settings, following the Cloudflare/DQE/Verizon incident:
"We do have no export community support and have done for many years. The use of more specifics is also optional. Neither replaces the need for filters."
https://twitter.com/noction/status/1143177562191011840 Jun 24, 2019
Community members responded:
"Noction have been facilitating Internet outages for years and years and the best thing they can say in response is that it is technically possible to use their product responsibly, they just don't ship it that way."
https://twitter.com/PowerDNS_Bert/status/1143252745257979905 June 24, 2019
Last year Noction stated:
"Nobody found this leak pleasant."
https://www.noction.com/news/incident-response June 26, 2019
Sentiment we all can agree with; change is needed!
As far as I know, Noction IRP is the ONLY commercially available off-the-shelf BGP route manipulation software which - by default - does NOT set the BGP well-known NO_EXPORT community on the product's route advertisements. This is a product design decision which causes collateral damage.
I would like to urge Noction to reconsider their position. Seek to migrate the existing users to use NO_EXPORT, and release a new version of the IRP software which sets NO_EXPORT BY DEFAULT on all generated routes.
Kind regards,
Job
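For reference, NO_EXPORT is the well-known community 0xFFFFFF01 (65535:65281, RFC 1997): a neighbor that accepts a route carrying it may use it internally but must not re-advertise it to its own external peers. A minimal sketch of what "tag generated more-specifics with NO_EXPORT by default" would look like on the announcing side; the data structures here are illustrative, not Noction's:

# Illustrative only: tag optimizer-generated more-specifics with the
# well-known NO_EXPORT community (RFC 1997, 0xFFFFFF01) so that a neighbor
# which accepts them will not propagate them any further.
NO_EXPORT = (65535, 65281)  # 0xFFFFFF01 rendered as high:low 16-bit halves

def generate_more_specifics(aggregate, injected, no_export_default=True):
    """Build announcement dicts for more-specifics injected under an aggregate."""
    routes = []
    for prefix, next_hop in injected:
        routes.append({
            "prefix": prefix,
            "next_hop": next_hop,
            "communities": [NO_EXPORT] if no_export_default else [],
            "covered_by": aggregate,
        })
    return routes

# Example: two injected /25s under a /24, tagged NO_EXPORT unless overridden.
for r in generate_more_specifics("203.0.113.0/24",
                                 [("203.0.113.0/25", "192.0.2.1"),
                                  ("203.0.113.128/25", "192.0.2.2")]):
    print(r["prefix"], r["communities"])

As Noction's own statement above notes, this is damage limitation, not a substitute for the filters discussed upthread.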
Job,
I disagree. It is not fair to the BGP implementation ecosystem to force a single piece of software to activate the no-export community by default because of ignorance on the part of the engineer(s) implementing the solution. It should be common sense that certain routes should not be advertised beyond the local AS, just like RFC 1918 routes, and more. Also, wasn't it you that said Cisco routers had a bug in ignoring NO_EXPORT? Would you go on a rant at Cisco, even if Noction added that enabled checkbox by default?
Why are you not on your soap box about BIRD, FRRouting, OpenBGPD, Cisco, Juniper, etc., and how they can possibly allow everyday screw-ups to happen when the same options, like the NO_EXPORT community, are available for the engineer to use? One solution would be to implement "BGP Group/Session Profiles" (ISP/RTBH/DDoS Filtering/Route Optimizers/etc.) or a "BGP Session Wizard" (ask the operator questions about their intentions), then automatically generate import and export policies based on known accepted practices.
Another solution could be having the BGP daemon disclose the make, model family, and exact model of hardware it is running on to BGP peers, and add more knobs into policy creation to match said values and take action appropriately. That would be useful in getting around vendor-specific issues, as well as belt & suspenders protection.
Ryan
On Aug 1 2020, at 9:58 am, Job Snijders <job@instituut.net> wrote:
On Sat, Aug 01, 2020 at 06:50:55AM -0700, Ca By wrote:
I am not normally supporting a heavy hand in regulation, but i think it is fair to say Noction and similar BGP optimizers are unsafe at any speed and the FTC or similar should ban them in the USA. They harm consumers and are a risk to national security / critical infrastructure
Noction and similar could have set basic defaults (no-export, only create /25 bogus routes to limit scope), but they have been clear that their greed to suck up traffic does not benefit from these defaults and they wont do it.
Following a large scale BGP incident in March 2015, noction made it possible to optionally set the well-known NO_EXPORT community on route advertisements originated by IRP instances.
"In order to further reduce the likelihood of these problems occurring in the future, we will be adding a feature within Noction IRP to give an option to tag all the more specific prefixes that it generates with the BGP NO_EXPORT community. This will not be enabled by default [snip]" https://www.noction.com/blog/route-optimizers Mar 27, 2015
Due to NO_EXPORT not being set in the default configuration, there are probably if not certainly many unsuspecting network engineers who end up deploying this software - without ever even considering - to change that one setting in the configuration.
Fast forward a few years and a few incidents, on the topic of default settings, following the Cloudflare/DQE/Verizon incident:
"We do have no export community support and have done for many years. The use of more specifics is also optional. Neither replaces the need for filters." https://twitter.com/noction/status/1143177562191011840 Jun 24, 2019
Community members responded: "Noction have been facilitating Internet outages for years and years and the best thing they can say in response is that it is technically possible to use their product responsibly, they just don't ship it that way." https://twitter.com/PowerDNS_Bert/status/1143252745257979905 June 24, 2019
Last year Noction stated: "Nobody found this leak pleasant." https://www.noction.com/news/incident-response June 26, 2019
Sentiment we all can agree with, change is needed! As far as I know, Noction IRP is the ONLY commercially available off-the-shelf BGP route manipulation software which - as default - does NOT set the BGP well-known NO_EXPORT community on the product's route advertisements. This is a product design decision which causes collateral damage.
I would like to urge Noction to reconsider their position. Seek to migrate the existing users to use NO_EXPORT, and release a new version of the IRP software which sets NO_EXPORT BY DEFAULT on all generated routes.
Kind regards, Job
Ryan,
The reason Noction is being singled out here, as opposed to other BGP speakers, is that it inherently breaks several BGP protection mechanisms as a means to achieve its purpose. BGP was never intended to be "optimized"; it was intended to be stable and scalable. While I'm sure there are hundreds of operators that use these optimizers without incident, they are a significant pain point for the rest of the internet.
They have created a platform that has the ease of use of a residential CPE, but with the consequences of misuse of any DFZ platform. This allows users who have little experience speaking BGP with the world to make these mistakes because they don't know any better, whereas the other platforms you mention require some knowledge to configure. It's not a perfect filter, but it does create a barrier for the inept.
Since Noction has made it easy enough to configure their software so that anyone can do it, with or without experience on the DFZ, they have SOME responsibility to keep their software from accidentally breaking the internet.
-Matt
On Sat, Aug 1, 2020 at 2:30 PM Ryan Hamel <ryan@rkhtech.org> wrote:
Job,
I disagree on the fact that it is not fair to the BGP implementation ecosystem, to enforce a single piece of software to activate the no-export community by default, due to ignorance from the engineer(s) implementing the solution. It should be common sense that certain routes that should be advertised beyond the local AS, just like RFC1918 routes, and more. Also, wasn't it you that said Cisco routers had a bug in ignoring NO_EXPORT? Would you go on a rant with Cisco, even if Noction add that enabled checkbox by default?
Why are you not on your soap box about BIRD, FRrouting, OpenBGPd, Cisco, Juniper, etc... about how they can possibly allow every day screw ups to happen, but the same options like the NO_EXPORT community are available for the engineer to use? One solution would be to implement "BGP Group/Session Profiles" (ISP/RTBH/DDOS Filtering/Route Optimizers/etc) or a "BGP Session Wizard" (ask the operator questions about their intentions), then automatically generate import and export policies based on known accepted practices.
Another solution could be having the BGP daemon disclose the make, model family, and exact model of hardware it is running on, to BGP peers, and add more knobs into policy creation to match said values, and take action appropriately. That would be useful in getting around vendor specific issues, as well as belt & suspenders protection.
Ryan
-- Matt Erculiani ERCUL-ARIN
Matt,
Why are you blaming the ease of use on the vendor for the operators' lack of knowledge regarding BGP? That is like blaming a vehicle manufacturer for a person pressing the gas pedal in a car and not giving a toss about the rules of the road. The basic rules of the road mostly apply the same for driving a car, truck, bus, and semi/lorry truck. There is no excuse for ignorance just because the user interface is different (web browser vs. SSH client).
Adding a take on this: there are kids born after 9/11, with IP allocations and ASNs, experimenting in the DFZ right now. If they can make it work and not cause harm to other members of this community, then incidents like this come down to a lack of knowledge, or honest human error (which will never go away).
Anything that can be used can be misused. With that said, why shouldn't ALL BGP software implementations encourage best practice? They decided RPKI validation was a good thing.
Ryan
On Aug 1 2020, at 4:12 pm, Matt Erculiani <merculiani@gmail.com> wrote:
Ryan,
The reason Noction is being singled out here as opposed to other BGP speakers is that it inherently breaks several BGP protection mechanisms as a means to achieve its purpose. BGP was never intended to be "optimized", it was intended to be stable and scalable. While i'm sure there are hundreds of operators that use these optimizers without incident, they are a significant paint point for the rest of the internet.
They have created a platform that has the ease of use of a residential CPE, but with the consequences of misuse of any DFZ platform. This allows users who have little experience speaking BGP with the world to make these mistakes because they don't know any better, whereas the other platforms you mention require some knowledge to configure. It's not a perfect filter, but it does create a barrier for the inept.
Since Noction has made it easy enough to configure their software so that anyone can do it, with or without experience on the DFZ, they have SOME responsibility to keep their software from accidentally breaking the internet.
-Matt
-- Matt Erculiani ERCUL-ARIN
On Sat, Aug 1, 2020 at 4:47 PM Ryan Hamel <ryan@rkhtech.org> wrote:
Matt,
Why are you blaming the ease of use on the vendor, for the operators lack of knowledge regarding BGP? That is like blaming a vehicle manufacturer for a person pressing the gas pedal in a car and not giving a toss about the rules of the road. The base foundation regarding the rules of the road mostly apply the same for driving a car, truck, bus, and semi/lorry truck. There is no excuse for ignorance just because the user interface is different (web browser vs. SSH client).
Vendors are responsible. The FTC slammed D-Link for being insecure, and they can slam Noction too:
https://www.ftc.gov/news-events/press-releases/2019/07/d-link-agrees-make-se...
Asking people in Pintos not to get into accidents is not an option.
https://www.tortmuseum.org/ford-pinto/
Adding a take on this, there are kids born after 9/11, with IP allocations and ASNs experimenting in the DFZ right now. If they can make it work, and not cause harm to other members in this community, it clearly demonstrates a lack of knowledge, or honest human error (which will never go away).
Anything that can be used, can be misused. With that said, why shouldn't ALL BGP software implementations encourage best practice? They decided RPKI validation was a good thing.
Ryan On Aug 1 2020, at 4:12 pm, Matt Erculiani <merculiani@gmail.com> wrote:
Ryan,
The reason Noction is being singled out here as opposed to other BGP speakers is that it inherently breaks several BGP protection mechanisms as a means to achieve its purpose. BGP was never intended to be "optimized", it was intended to be stable and scalable. While i'm sure there are hundreds of operators that use these optimizers without incident, they are a significant paint point for the rest of the internet.
They have created a platform that has the ease of use of a residential CPE, but with the consequences of misuse of any DFZ platform. This allows users who have little experience speaking BGP with the world to make these mistakes because they don't know any better, whereas the other platforms you mention require some knowledge to configure. It's not a perfect filter, but it does create a barrier for the inept.
Since Noction has made it easy enough to configure their software so that anyone can do it, with or without experience on the DFZ, they have SOME responsibility to keep their software from accidentally breaking the internet.
-Matt
Ryan,
To continue with your analogy, this would be more similar to someone who has never driven before walking into a dealership and buying a new car to drive off the lot. Ultimately the responsibility is on the driver, but the dealership should never have sold them the car in the first place. Thus, it's reasonable to place some of the responsibility on the vendors who introduce this equipment into the wild without truly understanding (or worse, not caring) how easy they've made it for their users to cause havoc when most of the defaults are left untouched.
Again, it's not like anyone consciously sets these optimizers to "evil"; they're just spontaneously dangerous when left misconfigured (which is apparently easy to do). These devices throw caution to the wind in the name of being easy to use.
Think of all the companies that set up AD servers with "corp.com" and how big of a security risk that is. Microsoft ended up taking action because of the potentially catastrophic implications, but at least it only affected the companies that made poor choices. Now imagine those same individuals who threw security to the wind are sold one of these devices because it's a turnkey way to make their network faster; that's exactly what we see here, and it's terrifying to think how many of those little route hijack timebombs are out there, just waiting to ruin your day, night, or vacation in the Bahamas.
-Matt
On Sat, Aug 1, 2020 at 6:12 PM Ca By <cb.list6@gmail.com> wrote:
On Sat, Aug 1, 2020 at 4:47 PM Ryan Hamel <ryan@rkhtech.org> wrote:
Matt,
Why are you blaming the ease of use on the vendor, for the operators lack of knowledge regarding BGP? That is like blaming a vehicle manufacturer for a person pressing the gas pedal in a car and not giving a toss about the rules of the road. The base foundation regarding the rules of the road mostly apply the same for driving a car, truck, bus, and semi/lorry truck. There is no excuse for ignorance just because the user interface is different (web browser vs. SSH client).
Vendors are responsible. The FTC slammed D-Link for being insecure and they can slam Noction too
https://www.ftc.gov/news-events/press-releases/2019/07/d-link-agrees-make-se...
Asking people in Pintos to not get in accidents is not an option.
https://www.tortmuseum.org/ford-pinto/
Adding a take on this, there are kids born after 9/11, with IP allocations and ASNs experimenting in the DFZ right now. If they can make it work, and not cause harm to other members in this community, it clearly demonstrates a lack of knowledge, or honest human error (which will never go away).
Anything that can be used, can be misused. With that said, why shouldn't ALL BGP software implementations encourage best practice? They decided RPKI validation was a good thing.
Ryan On Aug 1 2020, at 4:12 pm, Matt Erculiani <merculiani@gmail.com> wrote:
Ryan,
The reason Noction is being singled out here, as opposed to other BGP speakers, is that it inherently breaks several BGP protection mechanisms as a means to achieve its purpose. BGP was never intended to be "optimized"; it was intended to be stable and scalable. While I'm sure there are hundreds of operators that use these optimizers without incident, they are a significant pain point for the rest of the internet.
They have created a platform that has the ease of use of a residential CPE, but with the consequences of misuse of any DFZ platform. This allows users who have little experience speaking BGP with the world to make these mistakes because they don't know any better, whereas the other platforms you mention require some knowledge to configure. It's not a perfect filter, but it does create a barrier for the inept.
Since Noction has made it easy enough to configure their software so that anyone can do it, with or without experience on the DFZ, they have SOME responsibility to keep their software from accidentally breaking the internet.
-Matt
On Sat, Aug 1, 2020 at 2:30 PM Ryan Hamel <ryan@rkhtech.org> wrote:
Job,
I disagree on the fact that it is not fair to the BGP implementation ecosystem, to enforce a single piece of software to activate the no-export community by default, due to ignorance from the engineer(s) implementing the solution. It should be common sense that certain routes should not be advertised beyond the local AS, just like RFC1918 routes, and more. Also, wasn't it you that said Cisco routers had a bug in ignoring NO_EXPORT? Would you go on a rant with Cisco, even if Noction added that checkbox, enabled by default?
Why are you not on your soap box about BIRD, FRrouting, OpenBGPd, Cisco, Juniper, etc... about how they can possibly allow every day screw ups to happen, but the same options like the NO_EXPORT community are available for the engineer to use? One solution would be to implement "BGP Group/Session Profiles" (ISP/RTBH/DDOS Filtering/Route Optimizers/etc) or a "BGP Session Wizard" (ask the operator questions about their intentions), then automatically generate import and export policies based on known accepted practices.
Another solution could be having the BGP daemon disclose the make, model family, and exact model of hardware it is running on, to BGP peers, and add more knobs into policy creation to match said values, and take action appropriately. That would be useful in getting around vendor specific issues, as well as belt & suspenders protection.
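To make the "session profile" idea above concrete, here is a minimal Python sketch of a hypothetical provisioning tool that picks safe default policies from the operator's stated intent for a session. The profile names, the Profile fields, and the policy dictionary layout are all illustrative, not any vendor's actual API; the point is that a route-optimizer profile would stamp the well-known NO_EXPORT community on every export by default.

from dataclasses import dataclass

NO_EXPORT = (65535, 65281)  # well-known NO_EXPORT community (RFC 1997)

@dataclass
class Profile:
    name: str
    export_default: str          # "deny" or "permit"
    add_communities: tuple = ()  # communities stamped on every exported route

PROFILES = {
    "transit-customer": Profile("transit-customer", export_default="deny"),
    "rtbh":             Profile("rtbh", export_default="deny"),
    "route-optimizer":  Profile("route-optimizer", export_default="permit",
                                add_communities=(NO_EXPORT,)),
}

def render_export_policy(profile_name: str) -> dict:
    """Return an abstract export policy for a session of the given role."""
    p = PROFILES[profile_name]
    return {"default-action": p.export_default,
            "set-communities": list(p.add_communities)}

if __name__ == "__main__":
    # An optimizer session would ship with NO_EXPORT stamped by default.
    print(render_export_policy("route-optimizer"))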
Ryan On Aug 1 2020, at 9:58 am, Job Snijders <job@instituut.net> wrote:
On Sat, Aug 01, 2020 at 06:50:55AM -0700, Ca By wrote:
I am not normally supporting a heavy hand in regulation, but i think it is fair to say Noction and similar BGP optimizers are unsafe at any speed and the FTC or similar should ban them in the USA. They harm consumers and are a risk to national security / critical infrastructure
Noction and similar could have set basic defaults (no-export, only create /25 bogus routes to limit scope), but they have been clear that their greed to suck up traffic does not benefit from these defaults and they wont do it.
Following a large scale BGP incident in March 2015, noction made it possible to optionally set the well-known NO_EXPORT community on route advertisements originated by IRP instances.
"In order to further reduce the likelihood of these problems occurring in the future, we will be adding a feature within Noction IRP to give an option to tag all the more specific prefixes that it generates with the BGP NO_EXPORT community. This will not be enabled by default [snip]" https://www.noction.com/blog/route-optimizers Mar 27, 2015
Due to NO_EXPORT not being set in the default configuration, there are probably if not certainly many unsuspecting network engineers who end up deploying this software - without ever even considering - to change that one setting in the configuration.
Fast forward a few years and a few incidents, on the topic of default settings, following the Cloudflare/DQE/Verizon incident:
"We do have no export community support and have done for many years. The use of more specifics is also optional. Neither replaces the need for filters." https://twitter.com/noction/status/1143177562191011840 Jun 24, 2019
Community members responded:
"Noction have been facilitating Internet outages for years and years and the best thing they can say in response is that it is technically possible to use their product responsibly, they just don't ship it that way." https://twitter.com/PowerDNS_Bert/status/1143252745257979905 June 24, 2019
Last year Noction stated:
"Nobody found this leak pleasant." https://www.noction.com/news/incident-response June 26, 2019
Sentiment we all can agree with, change is needed!
As far as I know, Noction IRP is the ONLY commercially available off-the-shelf BGP route manipulation software which - as default - does NOT set the BGP well-known NO_EXPORT community on the product's route advertisements. This is a product design decision which causes collateral damage.
I would like to urge Noction to reconsider their position. Seek to migrate the existing users to use NO_EXPORT, and release a new version of the IRP software which sets NO_EXPORT BY DEFAULT on all generated routes.
Kind regards,
Job
-- Matt Erculiani ERCUL-ARIN
-- Matt Erculiani ERCUL-ARIN
All,

Watching this thread with interest, I got an idea - let me run it by this list before taking it any further (i.e. to IETF). How about we learn from this and try to make BGP just a little bit safer?

*Idea:* In all stub (non-transit) ASNs, we modify the BGP spec and disable automatic iBGP to eBGP advertisement.

*Implementation:* Vendors allow the operator to define, as part of the global bgp configuration, whether the given ASN is transit or not. The default is to be discussed - no bias.

*Benefit:* Without any issues, anyone playing with any tools in his network will be able to just issue one CLI command and be protected from accidentally hurting others. Yet naturally he will still be able to advertise his networks just as today, except by explicit policy in any shape and form we would find proper (example: "redistribute iBGP to eBGP policy-X").

We could even discuss whether this should perhaps be part of BGP OPEN or BGP capabilities too, such that the two sides of an eBGP session must agree with each other before bringing eBGP up.

Comments, questions, flames - all welcome :)

Cheers,
Robert.

PS. Such a definition sure can and likely will be misused (especially if we would just settle on only a single side setting it), but that will not cause any more harm than not having it at all. Moreover, I can already see a few other good options with which a BGP implementation or the BGP spec could be augmented once it knows it is a stub, or for that matter once it knows it is transit ....
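A rough Python sketch of the proposed stub-AS default, under the assumption that routes carry a simple "source" attribute (local, ibgp, ebgp); the function name and data model are illustrative only, not a real implementation or any vendor's configuration model.

def export_to_ebgp(route, local_as_is_transit, explicit_policies):
    """Decide whether a route may be announced to an eBGP neighbor."""
    if route["source"] == "local":          # locally originated: always eligible
        return True
    if route["source"] == "ebgp":           # learned from another eBGP peer
        return local_as_is_transit          # only a transit AS passes these on
    if route["source"] == "ibgp":
        # Proposed default for a stub AS: silence, unless the operator
        # explicitly opted in with a named redistribution policy.
        return any(policy(route) for policy in explicit_policies)
    return False

# Example: a stub AS with no explicit policies leaks nothing it learned
# internally, even if an optimizer injected bogus more-specifics into iBGP.
injected = {"prefix": "198.51.100.0/25", "source": "ibgp"}
print(export_to_ebgp(injected, local_as_is_transit=False, explicit_policies=[]))  # False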
On Sun, Aug 2, 2020 at 4:34 AM Robert Raszuk <robert@raszuk.net> wrote:
All,
Watching this thread with interest got an idea - let me run it by this list before taking it any further (ie. to IETF).
How about we learn from this and try to make BGP just a little bit safer ?
*Idea: *
In all stub (non transit) ASNs we modify BGP spec and disable automatic iBGP to eBGP advertisement ?
Why do you believe a stub AS was involved or that would have changed this situation? The whole point of Noction is for a bad isp to fake more specific routes to downstream customers. Noction is sold to ISPs, aka transit AS, afaik
*Implementation: *
Vendors to allow to define as part of global bgp configuration if given ASN is transit or not. The default is to be discussed - no bias.
Oh. A configuration knob. Noction had knobs; the world runs on 5 year old software with default configs.
*Benefit: *
Without any issues anyone playing any tools in his network will be able to just issue one cli
Thanks for not pretending we configure our networks with yang model APIs
and be protected from accidentally hurting others. Yet naturally he will
still be able to advertise his neworks just as today except by explicit policy in any shape and form we would find proper (example: "redistribute iBGP to eBGP policy-X").
XR rolls this way today, thanks Cisco. But the “any” keyword exists, so yolo.
We could even discuss if this should be perhaps part of BGP OPEN or BGP capabilities too such that two sides of eBGP session must agree with each other before bringing eBGP up.
Comments, questions, flames - all welcome :)
Cheers, Robert.
PS. Such a definition sure can and likely will be misused (especially if we would just settle on only a single side setting it - but that will not cause any more harm as not having it at all.
Moreover I can already see few other good options which BGP implementation or BGP spec can be augmented with once we know we are stub or for that matter once it knows it is transit ....
Hi Ca,
Noction is sold to ISPs, aka transit AS, afaik
Interesting. My impression, from talking to Noction some time back, was always that what they mainly do is a flavor of performance routing. But this is not about Noction IMHO.

If I am a non-transit ASN with N upstream ISPs, I don't want to exit in a hot-potato style ... if I care about my services I want to exit the best performing way to reach back to customers. That's btw what Cisco PfR does, or Google's Espresso, or Facebook's Edge Fabric, etc ... And you have a few vendors offering this, as well as a bunch of home-grown tools attempting to do the same. Go and mandate that all of them will set NO-EXPORT if they insert any routes ... And we will see more and more of those types of tools coming.

Sure, we have implementations with obligatory policy on eBGP - cool. And yes, we have match "ANY" too. So if your feedback is that, to limit iBGP routes from going out over eBGP, this is all sufficient and we do not need a bit more protection there, then case solved.

Cheers, R.

On Sun, Aug 2, 2020 at 4:42 PM Ca By <cb.list6@gmail.com> wrote:
On Sun, Aug 2, 2020 at 4:34 AM Robert Raszuk <robert@raszuk.net> wrote:
All,
Watching this thread with interest got an idea - let me run it by this list before taking it any further (ie. to IETF).
How about we learn from this and try to make BGP just a little bit safer ?
*Idea: *
In all stub (non transit) ASNs we modify BGP spec and disable automatic iBGP to eBGP advertisement ?
Why do you believe a stub AS was involved or that would have changed this situation?
The whole point of Noction is for a bad isp to fake more specific routes to downstream customers. Noction is sold to ISPs, aka transit AS, afaik
*Implementation: *
Vendors to allow to define as part of global bgp configuration if given ASN is transit or not. The default is to be discussed - no bias.
Oh. A configuration knob. Noction had knobs, the world runs of 5 year old software with default configs.
*Benefit: *
Without any issues anyone playing any tools in his network will be able to just issue one cli
Thanks for no pretending we configure our networks with yang model apis
and be protected from accidentally hurting others. Yet naturally he will
still be able to advertise his neworks just as today except by explicit policy in any shape and form we would find proper (example: "redistribute iBGP to eBGP policy-X").
XR rolls this way today, thanks Cisco. But the “any” keyword exists, so yolo.
We could even discuss if this should be perhaps part of BGP OPEN or BGP capabilities too such that two sides of eBGP session must agree with each other before bringing eBGP up.
Comments, questions, flames - all welcome :)
Cheers, Robert.
PS. Such a definition sure can and likely will be misused (especially if we would just settle on only a single side setting it - but that will not cause any more harm as not having it at all.
Moreover I can already see few other good options which BGP implementation or BGP spec can be augmented with once we know we are stub or for that matter once it knows it is transit ....
And BGP "optimizers" won't do that. At best, they will let you get the least worst.

On 8/2/20 6:36 PM, Robert Raszuk wrote:
if I care about my services I want to exit the best performing way to reach back customers.
On Sun, Aug 2, 2020 at 9:36 AM Robert Raszuk <robert@raszuk.net> wrote:
Hi Ca,
Noction is sold to ISPs, aka transit AS, afaik
Interesting.
My impression always was by talking to Noction some time back that mainly what they do is a flavor of performance routing. But this is not about Noction IMHO.
If I am a non transit ASN with N upstream ISPs I want to exit not in a hot potato style ... if I care about my services I want to exit the best performing way to reach back customers. That's btw what Cisco PFR does or Google's Espresso or Facebook Edge Fabric etc ...
And you have few vendors offering this as well as bunch of home grown tools attempting to do the same. Go and mandate that all of them will do NO-EXPORT if they insert any routes ... And we will see more and more of those type of tools coming.
Sure we have implementations with obligatory policy on eBGP - cool. And yes we have match "ANY" too.
So if your feedback is that to limit the iBGP routes to go out over eBGP this is all sufficient and we do not need a bit more protection there then case solved.
Cheers, R.
My feedback is that local_pref is complete for this behavior of setting an outbound preference, including being non-transitive. FB uses local-pref for this, afaik: https://research.fb.com/blog/2017/08/steering-oceans-of-content-to-the-world...
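A small Python sketch of this point, simplifying the BGP best-path rules down to LOCAL_PREF and AS_PATH length; the route dictionaries and ASNs are illustrative only. Because LOCAL_PREF is not carried across eBGP, steering egress this way cannot inject anything into neighboring ASes.

def best_path(candidates):
    """Pick the preferred route: highest LOCAL_PREF, then shortest AS_PATH."""
    return max(candidates, key=lambda r: (r["local_pref"], -len(r["as_path"])))

routes = [
    {"nexthop": "isp-a", "local_pref": 200, "as_path": [64500, 64510]},
    {"nexthop": "isp-b", "local_pref": 100, "as_path": [64501]},
]
print(best_path(routes)["nexthop"])  # "isp-a" wins on LOCAL_PREF alone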
On Sun, Aug 2, 2020 at 4:42 PM Ca By <cb.list6@gmail.com> wrote:
On Sun, Aug 2, 2020 at 4:34 AM Robert Raszuk <robert@raszuk.net> wrote:
All,
Watching this thread with interest got an idea - let me run it by this list before taking it any further (ie. to IETF).
How about we learn from this and try to make BGP just a little bit safer ?
*Idea: *
In all stub (non transit) ASNs we modify BGP spec and disable automatic iBGP to eBGP advertisement ?
Why do you believe a stub AS was involved or that would have changed this situation?
The whole point of Noction is for a bad isp to fake more specific routes to downstream customers. Noction is sold to ISPs, aka transit AS, afaik
*Implementation: *
Vendors to allow to define as part of global bgp configuration if given ASN is transit or not. The default is to be discussed - no bias.
Oh. A configuration knob. Noction had knobs, the world runs of 5 year old software with default configs.
*Benefit: *
Without any issues anyone playing any tools in his network will be able to just issue one cli
Thanks for no pretending we configure our networks with yang model apis
and be protected from accidentally hurting others. Yet naturally he will
still be able to advertise his neworks just as today except by explicit policy in any shape and form we would find proper (example: "redistribute iBGP to eBGP policy-X").
XR rolls this way today, thanks Cisco. But the “any” keyword exists, so yolo.
We could even discuss if this should be perhaps part of BGP OPEN or BGP capabilities too such that two sides of eBGP session must agree with each other before bringing eBGP up.
Comments, questions, flames - all welcome :)
Cheers, Robert.
PS. Such a definition sure can and likely will be misused (especially if we would just settle on only a single side setting it - but that will not cause any more harm as not having it at all.
Moreover I can already see few other good options which BGP implementation or BGP spec can be augmented with once we know we are stub or for that matter once it knows it is transit ....
I don't think there's any requirement for it to be for downstream customers (from a BGP perspective) or any relation to transit ASes. Web hosting companies, their AS, no client ASes, huge optimization going on. I'd think mostly because the major eyeball ISPs have garbage peering policies and like to run their ports hot to force you to buy their transit\DIA.

----- Mike Hammett Intelligent Computing Solutions http://www.ics-il.com Midwest-IX http://www.midwest-ix.com

----- Original Message ----- From: "Ca By" <cb.list6@gmail.com> To: "Robert Raszuk" <robert@raszuk.net> Cc: nanog@nanog.org Sent: Sunday, August 2, 2020 9:42:12 AM Subject: Re: Issue with Noction IRP default setting (Was: BGP route hijack by AS10990)

On Sun, Aug 2, 2020 at 4:34 AM Robert Raszuk <robert@raszuk.net> wrote:

All, Watching this thread with interest got an idea - let me run it by this list before taking it any further (ie. to IETF). How about we learn from this and try to make BGP just a little bit safer ?

Idea: In all stub (non transit) ASNs we modify BGP spec and disable automatic iBGP to eBGP advertisement ?

Why do you believe a stub AS was involved or that would have changed this situation? The whole point of Noction is for a bad isp to fake more specific routes to downstream customers. Noction is sold to ISPs, aka transit AS, afaik

Implementation: Vendors to allow to define as part of global bgp configuration if given ASN is transit or not. The default is to be discussed - no bias.

Oh. A configuration knob. Noction had knobs, the world runs of 5 year old software with default configs.

Benefit: Without any issues anyone playing any tools in his network will be able to just issue one cli

Thanks for no pretending we configure our networks with yang model apis

and be protected from accidentally hurting others. Yet naturally he will still be able to advertise his neworks just as today except by explicit policy in any shape and form we would find proper (example: "redistribute iBGP to eBGP policy-X").

XR rolls this way today, thanks Cisco. But the “any” keyword exists, so yolo.

We could even discuss if this should be perhaps part of BGP OPEN or BGP capabilities too such that two sides of eBGP session must agree with each other before bringing eBGP up. Comments, questions, flames - all welcome :) Cheers, Robert.

PS. Such a definition sure can and likely will be misused (especially if we would just settle on only a single side setting it - but that will not cause any more harm as not having it at all. Moreover I can already see few other good options which BGP implementation or BGP spec can be augmented with once we know we are stub or for that matter once it knows it is transit ....
On 2/Aug/20 01:44, Ryan Hamel wrote:
Matt,
Why are you blaming the ease of use on the vendor, for the operators lack of knowledge regarding BGP? That is like blaming a vehicle manufacturer for a person pressing the gas pedal in a car and not giving a toss about the rules of the road. The base foundation regarding the rules of the road mostly apply the same for driving a car, truck, bus, and semi/lorry truck. There is no excuse for ignorance just because the user interface is different (web browser vs. SSH client).
Actually, there is. One has to actually acquire knowledge about not only driving a car, but driving it in public. That knowledge is then validated by a gubbermint-sanctioned driver's license test. If you fail, you aren't allowed to drive. If you are caught driving without a driver's license, you pay the penalty. There is no requirement for a license in order to run power into a router and hook it up to the Internet. This is the problem I have with the current state of how we support BGP actors.
Adding a take on this, there are kids born after 9/11, with IP allocations and ASNs experimenting in the DFZ right now. If they can make it work, and not cause harm to other members in this community, it clearly demonstrates a lack of knowledge, or honest human error (which will never go away).
We should not be celebrating this.
Anything that can be used, can be misused. With that said, why shouldn't ALL BGP software implementations encourage best practice? They decided RPKI validation was a good thing.
The larger question is that we should find a way to make our industry genuinely qualification-based, and not a "free for all" for anyone that decides they want to try it out. I don't yet know how to do that, but we certainly need to start thinking more seriously about it. Kids born after 9/11 successfully experimenting on a global network is not where the bar ought to be.

Mark.
Mark, I think trying to implement some kind of license requirement for DFZ participants is a step in the wrong direction and a waste of time and money. How would you even enforce it? If the goal is just to provide a bigger barrier to "kids born after 9/11", why not just increase RIR fees, or add an age requirement for individuals? And anyway, why do we need to increase that barrier? What problem does that actually solve? Are "kids born after 9/11" the ones propagating route leaks? I don't think they are. But the reason for that is not that they're necessarily more skilled operators than "adults born before 9/11" or anyone else - it's that they are being filtered appropriately by the likes of Vultr, etc. Verizon (and other large incumbents) could learn something from them. Let's try to stay away from exclusivity for exclusivity's sake and actually focus on solving the real problems we have. On Sun, Aug 2, 2020 at 12:45 PM Mark Tinka <mark.tinka@seacom.com> wrote:
On 2/Aug/20 01:44, Ryan Hamel wrote:
Matt,
Why are you blaming the ease of use on the vendor, for the operators lack of knowledge regarding BGP? That is like blaming a vehicle manufacturer for a person pressing the gas pedal in a car and not giving a toss about the rules of the road. The base foundation regarding the rules of the road mostly apply the same for driving a car, truck, bus, and semi/lorry truck. There is no excuse for ignorance just because the user interface is different (web browser vs. SSH client).
Actually, there is.
One has to actually acquire knowledge about not only driving a car, but driving it in public. That knowledge is then validated by a gubbermint-sanctioned driver's license test. If you fail, you aren't allowed to drive. If you are caught driving without a driver's license, you pay the penalty.
There is no requirement for a license in order to run power into a router and hook it up to the Internet. This is the problem I have with the current state of how we support BGP actors.
Adding a take on this, there are kids born after 9/11, with IP allocations and ASNs experimenting in the DFZ right now. If they can make it work, and not cause harm to other members in this community, it clearly demonstrates a lack of knowledge, or honest human error (which will never go away).
We should not be celebrating this.
Anything that can be used, can be misused. With that said, why shouldn't ALL BGP software implementations encourage best practice? They decided RPKI validation was a good thing.
The larger question is we should find a way to make our industry genuinely qualification-based, and not "free for all that decides they want to try it out".
I don't yet know how to do that, but we certainly need to start thinking more seriously about it. Kids born after 9/11 successfully experimenting on a global network is not where the bar ought to be.
Mark.
On 2/Aug/20 21:37, Ross Tajvar wrote:
Mark,
I think trying to implement some kind of license requirement for DFZ participants is a step in the wrong direction and a waste of time and money. How would you even enforce it? If the goal is just to provide a bigger barrier to "kids born after 9/11", why not just increase RIR fees, or add an age requirement for individuals? And anyway, why do we need to increase that barrier? What problem does that actually solve? Are "kids born after 9/11" the ones propagating route leaks? I don't think they are. But the reason for that is not that they're necessarily more skilled operators than "adults born before 9/11" or anyone else - it's that they are being filtered appropriately by the likes of Vultr, etc. Verizon (and other large incumbents) could learn something from them.
Let's try to stay away from exclusivity for exclusivity's sake and actually focus on solving the real problems we have.
Like I said before, "guidance" rather than "regulation".

The way the Internet has worked for 4+ decades is what has made it so successful. However, it's starting to catch up with us, so we need to figure it out, and not bury our heads in the sand until it hurts me or you more directly enough for either of us to care.

Like I also said, I don't quite know how to solve this problem yet. What I do know is that if we keep having this dance every few months each year, it will be 2050 and we'll still be in the same place, only worse.

Before we can find a solution, we have to realize that there is a problem. There is enough smarts in the community to find a solution. Hopefully before some silly gubbermint (TikTok ban, anyone?) decides for us.

Mark.
I guess I missed your mention of "guidance rather than regulation", and am still missing it, unless you're referring to another thread. If you want to acknowledge a problem with internet governance and bring it to this mailing list for discussion, that sounds like a good idea. But the only "problem" I've seen you bring up in this thread is the participation of young people, and I've yet to hear a reason why that's a bad thing. This just sounds like gatekeeping to me. If we want to improve routing security, then rather than making vague claims about things "catching up with us" with no clear problem statement, we should be focusing our efforts on basic safeguards like filtering and RPKI OV. I don't consider that "burying my head in the sand". On Sun, Aug 2, 2020, 5:24 PM Mark Tinka <mark.tinka@seacom.com> wrote:
On 2/Aug/20 21:37, Ross Tajvar wrote:
Mark,
I think trying to implement some kind of license requirement for DFZ participants is a step in the wrong direction and a waste of time and money. How would you even enforce it? If the goal is just to provide a bigger barrier to "kids born after 9/11", why not just increase RIR fees, or add an age requirement for individuals? And anyway, why do we need to increase that barrier? What problem does that actually solve? Are "kids born after 9/11" the ones propagating route leaks? I don't think they are. But the reason for that is not that they're necessarily more skilled operators than "adults born before 9/11" or anyone else - it's that they are being filtered appropriately by the likes of Vultr, etc. Verizon (and other large incumbents) could learn something from them.
Let's try to stay away from exclusivity for exclusivity's sake and actually focus on solving the real problems we have.
Like I said before, "guidance" rather than "regulation".
The way the Internet has worked for 4+ decades has been what has made it so successful. However, it's starting to catch up with us, so we need to figure it out, and not bury our heads in the sand until it hurts me or you more directly for either us to care.
Like I also said, I don't quite know how to solve this problem yet. What I do know is if we keep having this dance every few months each year, it will be 2050 and we'll still be in the same place, only worse.
Before we can find a solution, we have to realize that there is a problem. There is enough smarts in the community to find a solution. Hopefully before some silly gubbermint (TikTok ban, anyone?) decides for us.
Mark.
On 3/Aug/20 00:03, Ross Tajvar wrote:
I guess I missed your mention of "guidance rather than regulation", and am still missing it, unless you're referring to another thread.
If you want to acknowledge a problem with internet governance and bring it to this mailing list for discussion, that sounds like a good idea. But the only "problem" I've seen you bring up in this thread is the participation of young people, and I've yet to hear a reason why that's a bad thing. This just sounds like gatekeeping to me.
If we want to improve routing security, then rather than making vague claims about things "catching up with us" with no clear problem statement, we should be focusing our efforts on basic safeguards like filtering and RPKI OV. I don't consider that "burying my head in the sand".
You may have missed most of these fundamentals much earlier in the thread. I'm not looking to repeat myself, so you're welcome to start from the top and come back if you have any more questions. Mark.
On 1/Aug/20 22:29, Ryan Hamel wrote:
Job,
I disagree on the fact that it is not fair to the BGP implementation ecosystem, to enforce a single piece of software to activate the no-export community by default, due to ignorance from the engineer(s) implementing the solution. It should be common sense that certain routes should not be advertised beyond the local AS, just like RFC1918 routes, and more. Also, wasn't it you that said Cisco routers had a bug in ignoring NO_EXPORT? Would you go on a rant with Cisco, even if Noction added that checkbox, enabled by default?
Why are you not on your soap box about BIRD, FRrouting, OpenBGPd, Cisco, Juniper, etc... about how they can possibly allow every day screw ups to happen, but the same options like the NO_EXPORT community are available for the engineer to use? One solution would be to implement "BGP Group/Session Profiles" (ISP/RTBH/DDOS Filtering/Route Optimizers/etc) or a "BGP Session Wizard" (ask the operator questions about their intentions), then automatically generate import and export policies based on known accepted practices.
Another solution could be having the BGP daemon disclose the make, model family, and exact model of hardware it is running on, to BGP peers, and add more knobs into policy creation to match said values, and take action appropriately. That would be useful in getting around vendor specific issues, as well as belt & suspenders protection.
Most (if not all) people buying BGP optimizers aren't using them as regular BGP-speaking routers toward the rest of the Internet or in their core network. BGP optimizers serve a unique use-case, and the way they work creates a predictable risk, as we saw in this and past incidents.

On that basis, I think Job's request to make NO_EXPORT a mandatory default (I'd go further and say the new default could be user-disabled, but with an unmistakable warning in the UI) is not unreasonable.

Mark.
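A minimal sketch of what such a default could look like in a hypothetical optimizer's configuration loader, written in Python; the config key "tag_no_export" and the warning text are made up for illustration, not Noction's actual settings.

import warnings

def load_optimizer_config(user_config: dict) -> dict:
    """Apply a safe default: injected routes are tagged NO_EXPORT unless the
    operator explicitly, and loudly, opts out."""
    cfg = {"tag_no_export": True, **user_config}   # safe unless overridden
    if not cfg["tag_no_export"]:
        warnings.warn(
            "NO_EXPORT disabled: injected more-specifics can propagate "
            "beyond your AS and hijack traffic globally if filters fail.",
            UserWarning,
        )
    return cfg

print(load_optimizer_config({}))                        # default: NO_EXPORT on
print(load_optimizer_config({"tag_no_export": False}))  # explicit, warned opt-out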
Why are you not on your soap box about BIRD, FRrouting, OpenBGPd, Cisco, Juniper, etc... about how they can possibly allow every day screw ups to happen, but the same options like the NO_EXPORT community are available for the engineer to use? One solution would be to implement "BGP Group/Session Profiles" (ISP/RTBH/DDOS Filtering/Route Optimizers/etc) or a "BGP Session Wizard" (ask the operator questions about their intentions), then automatically generate import and export policies based on known accepted practices.
You seem to be implying that nobody has ever given feedback to a vendor about their BGP implementation. That's incredibly far from the truth. Default parameters on many NOSes have been changed over the years to be safer for 2AM compliance, or for operators with less experience.

It's correct that someone with any of those BGP implementations can make configuration errors. The difference is that those configuration errors are LESS LIKELY to cause widespread disruption in the DFZ.

I think back to many years ago at the start of my career, and the first time I configured BGP on a router with 2 upstreams. In an amazing rookie move, I created config which I did not apply, reannouncing everything from 3356 to 1239 via myself, and vice versa. While embarrassing, only a very small amount of traffic ( <10Mbps ) and prefixes were impacted, since BGP worked as designed, and the longer AS PATH I created was less desirable for almost everyone.

If you are going to create more specific announcements, be it with a BGP "optimizer" or with other BGP implementations, the SAFEST method to prevent unintended consequences would be to add guardrails, like NO_EXPORT. It's just a best practice. When you can make those good best practices a default behavior? Even better! There is no downside to Noction making NO_EXPORT the default behavior, only upside to the stability of the internet at large.

On Sat, Aug 1, 2020 at 4:31 PM Ryan Hamel <ryan@rkhtech.org> wrote:
Job,
I disagree on the fact that it is not fair to the BGP implementation ecosystem, to enforce a single piece of software to activate the no-export community by default, due to ignorance from the engineer(s) implementing the solution. It should be common sense that certain routes should not be advertised beyond the local AS, just like RFC1918 routes, and more. Also, wasn't it you that said Cisco routers had a bug in ignoring NO_EXPORT? Would you go on a rant with Cisco, even if Noction added that checkbox, enabled by default?
Why are you not on your soap box about BIRD, FRrouting, OpenBGPd, Cisco, Juniper, etc... about how they can possibly allow every day screw ups to happen, but the same options like the NO_EXPORT community are available for the engineer to use? One solution would be to implement "BGP Group/Session Profiles" (ISP/RTBH/DDOS Filtering/Route Optimizers/etc) or a "BGP Session Wizard" (ask the operator questions about their intentions), then automatically generate import and export policies based on known accepted practices.
Another solution could be having the BGP daemon disclose the make, model family, and exact model of hardware it is running on, to BGP peers, and add more knobs into policy creation to match said values, and take action appropriately. That would be useful in getting around vendor specific issues, as well as belt & suspenders protection.
Ryan On Aug 1 2020, at 9:58 am, Job Snijders <job@instituut.net> wrote:
On Sat, Aug 01, 2020 at 06:50:55AM -0700, Ca By wrote:
I am not normally supporting a heavy hand in regulation, but i think it is fair to say Noction and similar BGP optimizers are unsafe at any speed and the FTC or similar should ban them in the USA. They harm consumers and are a risk to national security / critical infrastructure
Noction and similar could have set basic defaults (no-export, only create /25 bogus routes to limit scope), but they have been clear that their greed to suck up traffic does not benefit from these defaults and they wont do it.
Following a large scale BGP incident in March 2015, noction made it possible to optionally set the well-known NO_EXPORT community on route advertisements originated by IRP instances.
"In order to further reduce the likelihood of these problems occurring in the future, we will be adding a feature within Noction IRP to give an option to tag all the more specific prefixes that it generates with the BGP NO_EXPORT community. This will not be enabled by default [snip]" https://www.noction.com/blog/route-optimizers Mar 27, 2015
Due to NO_EXPORT not being set in the default configuration, there are probably if not certainly many unsuspecting network engineers who end up deploying this software - without ever even considering - to change that one setting in the configuration.
Fast forward a few years and a few incidents, on the topic of default settings, following the Cloudflare/DQE/Verizon incident:
"We do have no export community support and have done for many years. The use of more specifics is also optional. Neither replaces the need for filters." https://twitter.com/noction/status/1143177562191011840 Jun 24, 2019
Community members responded:
"Noction have been facilitating Internet outages for years and years and the best thing they can say in response is that it is technically possible to use their product responsibly, they just don't ship it that way." https://twitter.com/PowerDNS_Bert/status/1143252745257979905 June 24, 2019
Last year Noction stated:
"Nobody found this leak pleasant." https://www.noction.com/news/incident-response June 26, 2019
Sentiment we all can agree with, change is needed!
As far as I know, Noction IRP is the ONLY commercially available off-the-shelf BGP route manipulation software which - as default - does NOT set the BGP well-known NO_EXPORT community on the product's route advertisements. This is a product design decision which causes collateral damage.
I would like to urge Noction to reconsider their position. Seek to migrate the existing users to use NO_EXPORT, and release a new version of the IRP software which sets NO_EXPORT BY DEFAULT on all generated routes.
Kind regards,
Job
Dear Ryan, I have come to believe this is a Noction IRP specific issue. On Sat, Aug 01, 2020 at 01:29:59PM -0700, Ryan Hamel wrote:
I disagree on the fact that it is not fair to the BGP implementation ecosystem, to enforce a single piece of software to activate the no-export community by default
I am not exaggerating when I say that *ONLY* the name of this software is mentioned when incidents like this happen. Other route manipulation tools either use different (safer) technologies and/or mark routes with NO_EXPORT.

Every few weeks I am in phone calls with new people who happened to originate hijacks which existed for traffic engineering purposes, and without fail it is always the same software from the same company that originated the rogue routes. It seems more efficient for the software to ship with improved default settings than for me to explain the problem ad nauseam to every new engineer after they unsuspectingly step into this trap.

Not being extremely dangerous by default: is it really too much to ask?
Also, wasn't it you that said Cisco routers had a bug in ignoring NO_EXPORT? Would you go on a rant with Cisco, even if Noction add that enabled checkbox by default?
Cisco and Noction are separate companies; regardless of what Noction does, the Cisco implementations are expected to conform to their own documentation and the BGP-4 specifications.

1/ Without NO_EXPORT set as a default, route manipulation software is very dangerous out of the box.

2/ Even if NO_EXPORT is set, software defects happen from time to time, and the existence of fake more-specific routes in a given routing domain can have dire consequences (as has been demonstrated time after time).

Not setting NO_EXPORT as a default is setting your customers up for failure. If your car's seatbelt accidentally breaks, it wouldn't logically follow to also remove the airbags.
Why are you not on your soap box about BIRD, FRrouting, OpenBGPd, Cisco, Juniper, etc... about how they can possibly allow every day screw ups to happen
It is interesting you mention these names, as all of them in recent years went through a process to revisit some unsafe default behavior and address it. These companies have far larger userbases, so if they can do it, anyone can do it!

For the longest time many BGP implementations - BY DEFAULT - would propagate any and all routes from EBGP peers to all other IBGP and EBGP peers. The community identified this to be a root cause of many incidents, and eventually came up with a change to the BGP-4 specification which codifies that the default should be safe instead of dangerous: https://tools.ietf.org/html/rfc8212

- BIRD introduced support for RFC 8212 in BIRD 2 and higher
- FRRouting changed the defaults in 7.4 and higher
- Cisco IOS XR had RFC 8212 right from the start
- OpenBGPD changed its default behavior in version 6.4
- Juniper is still working on this; in the meantime a SLAX script can be used to emulate RFC 8212 behavior: https://github.com/packetsource/rfc8212-junos

It is well understood how strongly default settings shape the success or failure of deployments. This is no different.

Kind regards,

Job
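A toy Python model of the RFC 8212 behavior described above, assuming routes are plain dictionaries and policies are simple callables; it is a conceptual sketch of the default-deny rule, not any implementation's API.

def ebgp_accept(route, import_policy=None):
    """RFC 8212 behavior: with no import policy configured, reject everything."""
    return import_policy(route) if import_policy else False

def ebgp_announce(route, export_policy=None):
    """RFC 8212 behavior: with no export policy configured, announce nothing."""
    return export_policy(route) if export_policy else False

# A leaked more-specific, using a documentation prefix and the AS path seen
# in this incident, goes nowhere unless the operator wrote explicit policy.
leak = {"prefix": "192.0.2.0/24", "as_path": [1299, 7219, 10990]}
print(ebgp_accept(leak))                     # False: dropped by default
print(ebgp_announce(leak))                   # False: not propagated by default
print(ebgp_announce(leak, lambda r: True))   # True only with an explicit policy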
Was Tulix using Noction, or was it something else that caused their particular issue? ----- Mike Hammett Intelligent Computing Solutions http://www.ics-il.com Midwest-IX http://www.midwest-ix.com ----- Original Message ----- From: "Job Snijders" <job@instituut.net> To: nanog@nanog.org Sent: Saturday, August 1, 2020 11:58:12 AM Subject: Issue with Noction IRP default setting (Was: BGP route hijack by AS10990) On Sat, Aug 01, 2020 at 06:50:55AM -0700, Ca By wrote:
I am not normally supporting a heavy hand in regulation, but i think it is fair to say Noction and similar BGP optimizers are unsafe at any speed and the FTC or similar should ban them in the USA. They harm consumers and are a risk to national security / critical infrastructure
Noction and similar could have set basic defaults (no-export, only create /25 bogus routes to limit scope), but they have been clear that their greed to suck up traffic does not benefit from these defaults and they wont do it.
Following a large scale BGP incident in March 2015, noction made it possible to optionally set the well-known NO_EXPORT community on route advertisements originated by IRP instances. "In order to further reduce the likelihood of these problems occurring in the future, we will be adding a feature within Noction IRP to give an option to tag all the more specific prefixes that it generates with the BGP NO_EXPORT community. This will not be enabled by default [snip]" https://www.noction.com/blog/route-optimizers Mar 27, 2015 Due to NO_EXPORT not being set in the default configuration, there are probably if not certainly many unsuspecting network engineers who end up deploying this software - without ever even considering - to change that one setting in the configuration. Fast forward a few years and a few incidents, on the topic of default settings, following the Cloudflare/DQE/Verizon incident: "We do have no export community support and have done for many years. The use of more specifics is also optional. Neither replaces the need for filters." https://twitter.com/noction/status/1143177562191011840 Jun 24, 2019 Community members responded: "Noction have been facilitating Internet outages for years and years and the best thing they can say in response is that it is technically possible to use their product responsibly, they just don't ship it that way." https://twitter.com/PowerDNS_Bert/status/1143252745257979905 June 24, 2019 Last year Noction stated: "Nobody found this leak pleasant." https://www.noction.com/news/incident-response June 26, 2019 Sentiment we all can agree with, change is needed! As far as I know, Noction IRP is the ONLY commercially available off-the-shelf BGP route manipulation software which - as default - does NOT set the BGP well-known NO_EXPORT community on the product's route advertisements. This is a product design decision which causes collateral damage. I would like to urge Noction to reconsider their position. Seek to migrate the existing users to use NO_EXPORT, and release a new version of the IRP software which sets NO_EXPORT BY DEFAULT on all generated routes. Kind regards, Job
On 1/Aug/20 18:58, Job Snijders wrote:
Following a large scale BGP incident in March 2015, noction made it possible to optionally set the well-known NO_EXPORT community on route advertisements originated by IRP instances.
"In order to further reduce the likelihood of these problems occurring in the future, we will be adding a feature within Noction IRP to give an option to tag all the more specific prefixes that it generates with the BGP NO_EXPORT community. This will not be enabled by default [snip]" https://www.noction.com/blog/route-optimizers Mar 27, 2015
Due to NO_EXPORT not being set in the default configuration, there are probably if not certainly many unsuspecting network engineers who end up deploying this software - without ever even considering - to change that one setting in the configuration.
Fast forward a few years and a few incidents, on the topic of default settings, following the Cloudflare/DQE/Verizon incident:
"We do have no export community support and have done for many years. The use of more specifics is also optional. Neither replaces the need for filters." https://twitter.com/noction/status/1143177562191011840 Jun 24, 2019
Community members responded:
"Noction have been facilitating Internet outages for years and years and the best thing they can say in response is that it is technically possible to use their product responsibly, they just don't ship it that way." https://twitter.com/PowerDNS_Bert/status/1143252745257979905 June 24, 2019
Last year Noction stated:
"Nobody found this leak pleasant." https://www.noction.com/news/incident-response June 26, 2019
Sentiment we all can agree with, change is needed!
As far as I know, Noction IRP is the ONLY commercially available off-the-shelf BGP route manipulation software which - as default - does NOT set the BGP well-known NO_EXPORT community on the product's route advertisements. This is a product design decision which causes collateral damage.
I would like to urge Noction to reconsider their position. Seek to migrate the existing users to use NO_EXPORT, and release a new version of the IRP software which sets NO_EXPORT BY DEFAULT on all generated routes.
A great first step! Mark.
Mark Tinka wrote on 01/08/2020 12:20:
The difference between us and aviation is that fundamental flaws or mistakes that impact safety are required to be fixed and checked if you want to keep operating in the industry. We don't have that, so...
... so once again, route optimisers were at the heart of another serious route leaking incident. BGP is designed to prevent loops from happening, and has tools like no-export to help prevent inadvertent leaks. When people build "BGP optimisers" which reinject a prefix into a routing mesh with the entire as-path stripped and then they refuse to apply the basic minimum of common sense by refusing point blank to tag prefixes with no-export, it's a matter of certainty that leaks are going to happen, and that when they do, they'll cause damage. It's about as responsible as shipping a shotgun with the safety disabled and then handing it to a newbie. After all, the safety makes it more difficult to operate and if the newbie shoots themselves, it was their fault. And if they shot someone else, they shouldn't have got in the way, right? Nick
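A short Python sketch of how honoring NO_EXPORT contains such a leak, assuming a simplified route representation; the well-known community value (65535:65281) is from RFC 1997, everything else is illustrative. A border router that receives a NO_EXPORT-tagged route may use it internally but must not pass it on to its own eBGP neighbours, so an injected more-specific stays inside one routing domain.

NO_EXPORT = (65535, 65281)  # RFC 1997 well-known community

def may_advertise_to_ebgp(route) -> bool:
    """True if the route is allowed to leave the local AS."""
    return NO_EXPORT not in route.get("communities", [])

injected = {"prefix": "50.92.0.0/18", "communities": [NO_EXPORT]}
leaked   = {"prefix": "50.92.0.0/18", "communities": []}

print(may_advertise_to_ebgp(injected))  # False: stays within the neighbouring AS
print(may_advertise_to_ebgp(leaked))    # True: free to propagate across the DFZ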
On 1/Aug/20 16:44, Nick Hilliard wrote:
... so once again, route optimisers were at the heart of another serious route leaking incident.
BGP is designed to prevent loops from happening, and has tools like no-export to help prevent inadvertent leaks.
When people build "BGP optimisers" which reinject a prefix into a routing mesh with the entire as-path stripped and then they refuse to apply the basic minimum of common sense by refusing point blank to tag prefixes with no-export, it's a matter of certainty that leaks are going to happen, and that when they do, they'll cause damage.
It's about as responsible as shipping a shotgun with the safety disabled and then handing it to a newbie. After all, the safety makes it more difficult to operate and if the newbie shoots themselves, it was their fault. And if they shot someone else, they shouldn't have got in the way, right?
All in all, agreed. While gun ownership and use is highly regulated (and penalized if violated) in almost all countries, it suffers the same problem as folk that have access to and drive cars without a valid license. In our case, we don't really have anything beyond person-to-person trust in doing their part to not only adhere to global BCOP's for BGP operation, but to also understand what they are doing with the equipment they have, as well as the BGP protocol itself. Without some plan in place to make sure BGP actors do so with sufficient knowledge and care, these problems are only going to worsen as the next crop of network engineers prefer a BGP optimizer with a point & click GUI to actually understanding BGP Multi-Homing principles and techniques. I'm not opposed to Cameron's suggestion on how to deal with BGP optimizers :-). The issue of correctly filtering at eBGP hand-off points has been beaten to death probably longer than I have been a member of this mailing list. So... Mark.
On Aug 1, 2020, at 04:20 , Mark Tinka <mark.tinka@seacom.com> wrote:
On 1/Aug/20 02:17, Sabri Berisha wrote:
I'm not sure if you read their entire Mea Culpa, but they did indicate that the root cause of this issue was the provisioning of a legacy filter that they are no longer using. So effectively, that makes it a human error.
We're going to a point where a single error is no longer causing outages, something very similar to my favorite analogy: aviation. Pretty much every major air disaster was caused by a combination of factors. Pretty much every major outage these days is caused by a combination of factors.
The manual provisioning of an inadequate filter, combined with an automation error on the side of a customer (which by itself was probably caused by a combination of factors), caused this issue.
We learn from every outage. And instead of radio silence, they fessed up and fixed the issue. Have a look at the ASRS program :)
What I meant by "TOTALLY avoidable" is that "this particular plane crash" has happened in the exact same way, for the exact same reasons, over and over again.
That’s also true of Asiana 214. (Root cause: 5 pilots failed to pay attention to the approach) https://www.ntsb.gov/investigations/AccidentReports/Reports/AAR1401.pdf (The full report probably only interests pilots, but the executive summary on pages xi - xv is a good read). Worth noting, contrary to the public perception of airline accidents, despite the near total destruction of the airframe in this incident, 288 of 291 passengers and all of the crew survived. Of the 307 people on board, only 49 suffered serious injuries. (serious is defined as an injury requiring >48 hours of hospitalization within 7 days of the accident in which the injury was sustained). (49CFR§830.2) For those that find 5 pages of type TL;DR, the key findings are in the first paragraph after the last bullet point on page xv.
Aviation learns from mistakes that don't generally recur in the exact same way for the exact same reasons.
Aviation makes a strong effort in this area, perhaps stronger than any other human endeavor, especially when you’re talking about the fraction of Aviation known in the US as “Part 121 Scheduled Air Carrier Services”. However, as noted above, there are exceptions. In fact, there are striking parallels between Asiana 214 and this incident. The tools to avoid the accident in question automatically were available to the pilots, but they failed to turn them on (autothrottle). The tools to avoid this incident were available to Telia, but they failed to turn them on. Owen
On 1/Aug/20 17:49, Owen DeLong wrote:
Aviation makes a strong effort in this area, perhaps stronger than any other human endeavor, especially when you’re talking about the fraction of Aviation known in the US as “Part 121 Scheduled Air Carrier Services”.
However, as noted above, there are exceptions.
In fact, there are striking parallels between Asiana 214 and this incident.
The tools to avoid the accident in question automatically were available to the pilots, but they failed to turn them on (autothrottle).
The tools to avoid this incident were available to Telia, but they failed to turn them on.
Agreed, the leading cause of aircraft incidents is human error. When human errors in aeroplane accidents are repeated, it's usually because of poor crew resource management, poor training, low experience, poor situational awareness, crew fatigue, crew disorientation, not following checklists... that sort of thing. We've made a whole hymn out of "do proper filtering at eBGP hand-off points" over the years. Network operators are not always working under pressure like airline pilots do. On a quiet, calm afternoon, an engineer can comb the network to make sure all potential mistakes that have been shouted about for years within our community are plugged, especially when working at an "experienced" operation such as Telia and similar. It's almost a "do once and forget, and watch it repeat" type-thing, vs. airline pilots who need to be on it 110%, every second of every flight, even if they've got 25,000hrs under their epaulettes. It shouldn't be this hard... Mark.
On Aug 1, 2020, at 09:09 , Mark Tinka <mark.tinka@seacom.com> wrote:
On 1/Aug/20 17:49, Owen DeLong wrote:
Aviation makes a strong effort in this area, perhaps stronger than any other human endeavor, especially when you’re talking about the fraction of Aviation known in the US as “Part 121 Scheduled Air Carrier Services”.
However, as noted above, there are exceptions.
In fact, there are striking parallels between Asiana 214 and this incident.
The tools to avoid the accident in question automatically were available to the pilots, but they failed to turn them on (autothrottle).
The tools to avoid this incident were available to Telia, but they failed to turn them on.
Agreed, the leading cause of aircraft incidents is human error. When human errors in aeroplane accidents are repeated, it's usually because of poor crew resource management, poor training, low experience, poor situational awareness, crew fatigue, crew disorientation, not following checklists... that sort of thing.
Let’s be clear… This was not an incident, it was an accident. In the US at least, under 49CFR§830.2 the two are specifically defined as follows: Aircraft accident means an occurrence associated with the operation of an aircraft which takes place between the time any person boards the aircraft with the intention of flight and all such persons have disembarked, and in which any person suffers death or serious injury, or in which the aircraft receives substantial damage. For purposes of this part, the definition of “aircraft accident” includes “unmanned aircraft accident,” as defined herein. Serious injury is defined in my previous message (reference to the same code section) Substantial damage is defined as “damage or failure which adversely affects the structural strength, performance, or flight characteristics of the aircraft and which would normally require major repair or replacement of the affected component. Engine failure or damage limited to an engine if only one engine fails or is damaged, bent fairings or cowling, dented skin, small punctured holses in the skin or fabric, ground damage to rotor or propeller blade, and damage to landing gear, wheels, tires, flaps, engine accessories, brakes, or wingtips are not considered “substantial damage” for the purpose of this part. An “Incident” is an occurrence other than an accident, associated with the operation of an aircraft, which affects or could affect the safety of operations.
We've made a whole hymn out of "do proper filtering at eBGP hand-off points" over the years. Network operators are not always working under pressure like airline pilots do. On a quiet, calm afternoon, an engineer can comb the network to make sure all potential mistakes that have been shouted about for years within our community are plugged, especially when working at an "experienced" operation such as Telia and similar.
Airline pilots are not always under pressure, either. In fact, airline flying is 90% boredom, 9+% routine operations (procedures for preparation and departure, departure and climb-out, preparations for approach and landing, descent, approach, and landing) and <1% actual pressure (IROPS, in-flight emergencies, etc.). I say this not only as someone who’s spent a lot of time as a passenger, but also as a commercial instrument-rated pilot.
It's almost a "do once and forget, and watch it repeat" type-thing, vs. airline pilots who need to be on it 110%, every second of every flight, even if they've got 25,000hrs under their epaulettes.
ROFLMAO, if you truly believe this, you have no concept of life in the cockpit. Yes, airline pilots need to be paying attention even in the most routine phases of flight, but in reality, 90+% of every flight is routine monitoring of systems, essentially checking the “Ts”… Time: Is the flight progressing as expected Are we where we expected to be at this time? Is the fuel consumption in line with our expectations? Turn: Are we on course? How far to the next heading change? Throttle: Is our performance correct for this point in the flight? Are we at the desired altitude, attitude, power, and airspeed? If applicable, are the auto throttles in the correct mode? Twist: Are any adjustments or preparations on Radios/Navigation/FMS needed? This is where you check to make sure that not only are you on the correct frequency now (com, navigation, etc.), but that you also have things set for the next change. It’s also where you make those changes (e.g. flip to the next VOR) if that’s due. Track: How does our track compare to our intended course. This is where the heading is adjusted, if necessary, to achieve the desired course. With modern automation in the cockpit, this is mostly a glance at the indicators to see that the autopilot is still engaged in the correct mode and holding the desired course. Talk: Interaction with ATC Any compulsory reports due? Are we in compliance with our clearance, etc. Now, in a classic single-engine aircraft, these 6 Ts are a constant effort for the pilot. In a modern airliner, once it’s at cruise, it’s: Time: 99% automated, check the gas gauges and ground vs. airspeed to make sure they match expectations. Turn: 99+% automated, you programmed your route into the FMS and George has it from there. (All autopilots are named George[1]). Throttle: 99+% automated. Are the auto throttles active in the correct mode? Twist: 99+% automated. Other than the occasional frequency handoff, the radios are 99% managed by the FMS… Thanks, George! Track: 99+% automated. Is the automation doing something untoward? Talk: Workload here, but only if ATC calls you for the most part. Generally about 5 seconds every 30-90 minutes in cruise flight. This is not to take away from the skill, training, or capabilities of those who have put in the effort and have what it takes to attain not only an ATP (Airline Transport Pilot) certificate, but also get on and keep a pilot job at an airline. It’s definitely no minor feat to accomplish all of that and as a general rule, they are hard-working highly skilled highly trained professionals. However, 99% of piloting is best summed up as “Pilots use their superior training and planning abilities and their superior judgment to avoid situations in which their superior skills are required.” I have tremendous respect for pilots. I am a pilot. But hyperbole such as what you present above is merely common misconception. Owen [1] Why is the autopilot called “George?” — There’s no definite answer, but the two most common theories are: + The first practical autopilot was invented by George DeBeeson (This is fact, but whether or not the colloquial name for autopilots is because of this isn’t certain) + RAF pilots named their aircraft in general “George” after King George, the owner of all RAF aircraft. (started in WWII under King George VI) https://airplaneacademy.com/why-is-the-autopilot-called-george-two-prevailin...
On 1/Aug/20 18:46, Owen DeLong wrote:
ROFLMAO, if you truly believe this, you have no concept of life in the cockpit.
I was born into aviation, with both my mom and dad licensed ATPL pilots for several decades. So I know my way around a number of different cockpits. The goal wasn't to turn this thread into an aviation one, but to focus on what we can do better in Internet operations for more accountability. Let's stay on-topic, please. Thanks. Mark.
Hi, ----- On Aug 1, 2020, at 8:49 AM, Owen DeLong owen@delong.com wrote:
In fact, there are striking parallels between Asiana 214 and this incident.
Yes. Children of the magenta line. Depending on automation, and no clue what to do when the Instrument Landing System goes down. But, the most important parallel is (hopefully) yet to come. One major outcome of the Asiana investigation was the call for more training, as the crew did not properly understand how the aircraft worked. The same can be said here. Noction and/or its operators appear to not understand how BGP works, and/or what safety measures must be deployed to ensure that the larger internet will not be hurt by misconfiguration. I also agree with Job, that Noction has some responsibility here. And as I understand more and more about it, I must now agree with Mark T that this was an avoidable incident (although not because of Telia, but because of Noction's decision not to enable NO_EXPORT by default). Thanks, Sabri
On Aug 1, 2020, at 12:03 , Sabri Berisha <sabri@cluecentral.net> wrote:
Hi,
----- On Aug 1, 2020, at 8:49 AM, Owen DeLong owen@delong.com wrote:
In fact, there are striking parallels between Asiana 214 and this incident.
Yes. Children of the magenta line. Depending on automation, and no clue what to do when the Instrument Landing System goes down.
This wasn’t a case of the ILS going down. This was a case where the automation was put in the wrong mode (accidentally) without any of the pilots in the cockpit noticing it until it was too late. The problem was discovered and power applied 8 seconds before impact. It takes 19 seconds for the engines on a 777 to spool up to adequate power for a go-around at the airspeed and in the configuration that existed at the time.
But, the most important parallel is (hopefully) yet to come. One major outcome of the Asiana investigation was the call for more training, as the crew did not properly understand how the aircraft worked.
That’s true in virtually every human factors accident, but in reality, failure to understand the automation was a tiny contributing factor in this accident. Every pilot is taught early in their ab initio training that they must monitor the approach carefully and make sure not to bleed off too much energy (airspeed) in the process.

There’s a very common and easily identifiable pattern to an under-powered approach on autopilot that all of the pilots in the cockpit should have readily recognized if they were even paying the slightest attention to the approach…

1. Airplane begins to dip below glide slope.
2. Autopilot raises nose to reduce descent rate and recapture glide slope.
3. Increased pitch = greater induced drag = lower airspeed.
4. Lower airspeed = less lift = goto 1.

Until power is applied, this process will repeat until one of the following events occurs:

1. Landing short of the runway (as in the case of Asiana 214)
2. Power is applied and the approach is stabilized
3. The pitch attitude exceeds the critical angle of attack and the wings stall, causing an abrupt pitch down.

This cycle is well understood by every student pilot before they can be endorsed for their first solo flight. No amount of training will make up for the utter and complete failure to pay attention to the approach.

This is one of the reasons US carriers have a “sterile cockpit” rule. In most cases, the sterile cockpit rule is approximately this: “Below 10,000 feet or in other critical phases of flight (emergency situations, unusual climbs or descents, mechanical difficulties, etc.), cockpit communications are limited to those related to the safe operation of the aircraft.”
The same can be said here. Noction and/or its operators appear to not understand how BGP works, and/or what safety measures must be deployed to ensure that the larger internet will not be hurt by misconfiguration.
On one level, there’s validity to your claim here. On the other hand, there’s a certain extent to which you’re telling hammer manufacturers that they have to make it impossible for a carpenter to injure his thumb by missing the nail.
I also agree with Job, that Noction has some responsibility here. And as I understand more and more about it, I must now agree with Mark T that this was an avoidable incident (although not because of Telia, but because of Noction's decision not to enable NO_EXPORT by default).
I disagree. I think Noction and Telia are both culpable here. Most of the top 200 providers manage to do prefix filtering at the customer edge, so I don’t see any reason to give Telia a free pass here. Owen
On 1/Aug/20 21:31, Owen DeLong wrote:
I disagree. I think Noction and Telia are both culpable here. Most of the top 200 providers manage to do prefix filtering at the customer edge, so I don’t see any reason to give Telia a free pass here.
Both Noction and Telia are culpable, because they both (should) know about past incidents, and how to do their part in protecting against them. I mean, this is what we talk about on and at *NOG every day of the year. It's not like they've been living under a rock. Mark.
Sabri Berisha wrote on 01/08/2020 20:03:
but because Noction's decision to not enable NO_EXPORT by default
the primary problem is not this but that Noction reinjects prefixes into the local ibgp mesh with the as-path stripped and then prioritises these prefixes so that they're learned as the best path. The as-path is the primary loop detection mechanism in eBGP. Removing this is like hot-wiring your electrical distribution board because you found out you could get more power if you bypass those stupid RCDs. Once you strip off the as-path in the local view, it's like the AS7007 incident desperately begging to happen all over again. As long as route optimiser vendors ship their products with such deeply harmful defaults, we're going to continue to see these problems ad nauseam. Nick
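To make the loop-detection point concrete, here is a minimal Python sketch (hypothetical ASNs, not any vendor's code) of the basic eBGP rule: a router rejects a route whose AS_PATH already contains its own ASN, and that is exactly the safeguard that disappears once the path is stripped and the prefix is re-originated.

def passes_loop_detection(my_asn, as_path):
    # RFC 4271 behaviour: reject any route whose AS_PATH already
    # contains the local ASN.
    return my_asn not in as_path

# Hypothetical victim network AS64500: a copy of its own prefix coming
# back via the leak path still carries 64500, so loop detection drops it.
normal_path = [1299, 7219, 10990, 64500]
print(passes_loop_detection(64500, normal_path))    # False -> route rejected

# Once the optimiser strips the AS_PATH and re-originates the prefix,
# the victim's ASN is gone and every loop check along the way passes.
stripped_path = [1299, 7219, 10990]
print(passes_loop_detection(64500, stripped_path))  # True -> leak propagates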
----- On Aug 1, 2020, at 12:50 PM, Nick Hilliard nick@foobar.org wrote: Hi,
Sabri Berisha wrote on 01/08/2020 20:03:
but because Noction's decision to not enable NO_EXPORT by default
the primary problem is not this but that Noction reinjects prefixes into the local ibgp mesh with the as-path stripped and then prioritises these prefixes so that they're learned as the best path.
Yeah, but that's not a problem as far as I'm concerned. Their network, their rules. I've done weirder stuff than that, in tightly controlled environments.
The as-path is the primary loop detection mechanism in eBGP. Removing this is like hot-wiring your electrical distribution board because you found out you could get more power if you bypass those stupid RCDs.
Well, let's be honest. Sometimes we need to get rid of that pesky mechanism. For example, when using BGP-as-IGP, the "allowas-in" disregards the as-path, in a controlled manner (and yes, I know, different use case). My point is that there can be operational reasons to do so, and whatever they wish to do on their network is perfectly fine. As long as they don't bother the rest of the world with it. Thanks, Sabri
Sabri Berisha wrote on 01/08/2020 20:59:
My point is that there can be operational reasons to do so, and whatever they wish to do on their network is perfectly fine. As long as they don't bother the rest of the world with it.
I get what you're saying, and am a big fan of personal responsibility, but when a vendor ships a product like a BGP optimiser, it requires that you run your network with the safety controls removed. It's no different in principle to shipping guns with the safety welded to off, or hot-wiring 20kW cables to bypass your RCDs. It can produce some great results, no doubt about it, but sooner or later you're guaranteed that there's going to be a nasty accident. In any individual case, it's understandable to assign blame to an operator for messing up their configs. In the general case, shipping products with dangerous-by-default configurations is going to lead to more accidents happening. At this point, a large proportion of the major routing leaks on the internet can be associated with BGP optimisers, and Noction's name appears with disturbing regularity. This is an appalling record, not least because it's almost entirely preventable. Nick
On Aug 1, 2020, at 12:59 PM, Sabri Berisha <sabri@cluecentral.net> wrote:
----- On Aug 1, 2020, at 12:50 PM, Nick Hilliard nick@foobar.org wrote:
Hi,
Sabri Berisha wrote on 01/08/2020 20:03:
but because Noction's decision to not enable NO_EXPORT by default
the primary problem is not this but that Noction reinjects prefixes into the local ibgp mesh with the as-path stripped and then prioritises these prefixes so that they're learned as the best path.
Yeah, but that's not a problem as far as I'm concerned. Their network, their rules. I've done weirder stuff than that, in tightly controlled environments.
Your network, your rules is fine as far as your border. When you start announcing crap to the rest of the world, then the rest of the world has a right to object. When your product makes it easy for your customers to accidentally announce crap to the rest of the world, then it’s the moral equivalent of building a car without a seatbelt. Sure, before the technology was widely known and its life saving capabilities well understood, it was legitimate to dismiss it as an unnecessary added cost. Today, there’s no excuse for such an action. The hazards of BGP optimizers are pretty well known and it’s not unreasonable to expect vendors to implement appropriate safeguards into their products and/or recommend appropriate safeguards to their customers for their other routing devices. Certainly no-export by default is an example of something that there’s really no reason not to do in any BGP optimizer (a sketch of what that buys you follows after this message).
The as-path is the primary loop detection mechanism in eBGP. Removing this is like hot-wiring your electrical distribution board because you found out you could get more power if you bypass those stupid RCDs.
Well, let's be honest. Sometimes we need to get rid of that pesky mechanism. For example, when using BGP-as-IGP, the "allowas-in" disregards the as-path, in a controlled manner (and yes, I know, different use case).
Also a much more constrained case… allowas-in (which I still argue is a poor substitute for getting different ASNs for your different non-backboned sites) only allows you to loop your own AS and only at your own sites. It doesn’t support allowing you to feed crap to the internet.
My point is that there can be operational reasons to do so, and whatever they wish to do on their network is perfectly fine. As long as they don't bother the rest of the world with it.
But the whole reason we’re having this conversation is that they _DID_ bother the rest of the world with it. Kind of takes the wind out of that particular argument, wouldn’t you say? Owen
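As a rough illustration of the NO_EXPORT-by-default argument, here is a small Python sketch of the RFC 1997 well-known community at work. It assumes, purely hypothetically, that an optimiser tags every route it injects with NO_EXPORT; it is a sketch of what such a default would buy you, not a description of Noction's actual behaviour.

# RFC 1997 well-known community NO_EXPORT (0xFFFFFF01): a route carrying
# it may be used inside the local AS, but must not be advertised to any
# eBGP neighbour.
NO_EXPORT = 0xFFFFFF01

def may_export_to_ebgp(communities):
    return NO_EXPORT not in communities

# An optimiser-injected more-specific tagged NO_EXPORT by default stays
# inside the AS even when the customer edge prefix filters are missing:
print(may_export_to_ebgp({NO_EXPORT}))   # False -> never leaves the AS

# The same injected route without the tag is one missing filter away
# from becoming a global leak:
print(may_export_to_ebgp(set()))         # True -> propagates if filters fail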
On 1/Aug/20 21:03, Sabri Berisha wrote:
The same can be said here. Noction and/or its operators appear to not understand how BGP works, and/or what safety measures must be deployed to ensure that the larger internet will not be hurt by misconfiguration.
I think the latter would be more appropriate. Their implementation of BGP is likely correct, but they aren't putting any emphasis on what the deployment of their use-case can do to global BGP security and performance. This is where I'd say they can add more focus.
I also agree with Job, that Noction has some responsibility here. And as I understand more and more about it, I must now agree with Mark T that this was an avoidable incident (although not because of Telia, but because of Noction's decision not to enable NO_EXPORT by default).
I see it differently. The chain is only as strong as its weakest actor. It is not unreasonable to expect that global actors of significant scale have enough clue to make sure any mistakes committed downstream are not propagated by them to the rest of the Internet. So while I do not absolve Noction (and their customer) of any responsibility here, I'd apportion the blame as:
- Telia 51%
- Noction 30%
- Noction's customer 19%
When the weaker links of the chain fail, we should be able to count on the strongest link in that chain to be the last line of defence... Telia, in this case. Simply for no other reason than they "know best", and have such global scope which comes with significant responsibility. But that isn't to say that Noction and their customer cannot do better either. After all, BGP security and performance only works well when we all do our part, and not just some of us. Mark.
On Aug 1, 2020, at 11:14 , Hank Nussbacher <hank@interall.co.il> wrote:
On 01/08/2020 00:50, Mark Tinka wrote:
On 31/Jul/20 23:38, Sabri Berisha wrote:
Kudos to Telia for admitting their mistakes, and fixing their processes. Considering Telia's scope and "experience", that is one thing. But for the general good of the Internet, the number of intended or unintentional route hijacks in recent years, and all the noise that rises on this and other lists each time we have such incidents (this won't be the last), Telia should not have waited to be called out in order to get this fixed.
Do we know if they are fixing this on just this customer of theirs, or all their customers? I know this has been their filtering policy with us (SEACOM) since 2014, as I pointed out earlier today. There has not been a shortage of similar incidents between now and then, where the community has consistently called for more deliberate and effective route filtering across inter-AS arrangements.
AS level filtering is easy. IP prefix level filtering is hard. Especially when you are in the top 200: https://asrank.caida.org/ IP Prefix level filtering at backbone<->backbone connections is hard (and mostly pointless).
IP Prefix level filtering at the customer edge is not that hard, no matter how large of a transit provider you are. Customer edge filtration by Telia in this case would have prevented this problem from spreading beyond the misconfigured ASN.
That being said, and due to these BGP "polluters" constantly doing the same thing, wouldn't an easy fix be to use the max-prefix/prefix-limit option:
https://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/25160-bgp-maximum-prefix.html
https://www.juniper.net/documentation/en_US/junos/topics/reference/configuration-statement/prefix-limit-edit-protocols-bgp.html
That’s a decent pair of suspenders to go with the belt of prefix filtration at the edge, but it’s no substitute.
For every BGP peer, the ISP determines what the current prefix count is. Then add in 2% and set the max-prefix. An errant BGP polluter would then only have limited damage to the Internet routing table. Not the greatest solution, but easy to implement via a one line change on every BGP peer.
To the best of my knowledge, that’s already fairly common practice. It’s usually more like 10% (2% would require way too much active change and create churn and risk). Owen
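The headroom arithmetic being discussed here is simple enough to sketch in Python. The session names and prefix counts below are made up, and the 10% headroom follows Owen's figure rather than any standard.

import math  # not strictly needed below; integer math avoids float rounding surprises

# Hypothetical per-session counts of currently received prefixes.
current_counts = {
    "customer-a": 12,
    "customer-b": 340,
    "peer-c": 51000,
}

def max_prefix_limit(current, headroom_pct=10, floor=10):
    # Current count plus headroom_pct percent, rounded up, never below a
    # small floor so tiny sessions don't trip on a single new announcement.
    limit = current + (current * headroom_pct + 99) // 100
    return max(floor, limit)

for session, count in current_counts.items():
    print(session, max_prefix_limit(count))
# customer-a 14
# customer-b 374
# peer-c 56100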
On 1/Aug/20 21:20, Owen DeLong wrote:
IP Prefix level filtering at the customer edge is not that hard, no matter how large of a transit provider you are. Customer edge filtration by Telia in this case would have prevented this problem from spreading beyond the misconfigured ASN.
+1. There's simply no excuse - even if 100% of your eBGP sessions may be customers :-). Mark.
On 1/Aug/20 20:14, Hank Nussbacher wrote:
AS level filtering is easy. IP prefix level filtering is hard. Especially when you are in the top 200:
Doesn't immediately make sense to me why prefix filtering is hard.
That being said, and due to these BGP "polluters" constantly doing the same thing, wouldn't an easy fix be to use the max-prefix/prefix-limit option:
https://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/25...
https://www.juniper.net/documentation/en_US/junos/topics/reference/configura...
For every BGP peer, the ISP determines what the current prefix count is. Then add in 2% and set the max-prefix.
An errant BGP polluter would then only have limited damage to the Internet routing table.
Not the greatest solution, but easy to implement via a one line change on every BGP peer.
It's about combining multiple solutions to ensure several catch-points. AS_PATH filtering, prefix filtering and max-prefix.
Smaller ISPs can easily do it on their 10 BGP peers so as to limit damage as to what they will hear from their neighbors.
All ISP's should do this. All ISP's can. Mark.
We can all do better. We should all do better.
Agreed. However, every time we go on this Righteous Indignation of Should Do crusade, it would serve us well to stop and remember that in every one of our jobs, at many points in our careers, we have been faced with a situation where something we SHOULD do ends up being deferred for something we MUST do. It is a universal truth that there will never be enough time and resources to complete both, especially not in our current business environment where the only thing that matters is the numbers for the next quarter. Sometimes as engineers we have to make choices, sometimes choices are imposed on us by pointy hairs. Telia made a mistake. They owned it and will endeavor to do better. What more can be asked? On Fri, Jul 31, 2020 at 5:51 PM Mark Tinka <mark.tinka@seacom.com> wrote:
On 31/Jul/20 23:38, Sabri Berisha wrote:
Kudos to Telia for admitting their mistakes, and fixing their processes.
Considering Telia's scope and "experience", that is one thing. But for the general good of the Internet, the number of intended or unintentional route hijacks in recent years, and all the noise that rises on this and other lists each time we have such incidents (this won't be the last), Telia should not have waited to be called out in order to get this fixed.
Do we know if they are fixing this on just this customer of theirs, or all their customers? I know this has been their filtering policy with us (SEACOM) since 2014, as I pointed out earlier today. There has not been a shortage of similar incidents between now and then, where the community has consistently called for more deliberate and effective route filtering across inter-AS arrangements.
There is massive responsibility for the community to act correctly for the Internet to succeed. Especially so during these Coronavirus times where the world depends on us to keep whatever shred of an economy is left up and running. Doubly so if you are a major concern (like Telia) for the core of the Internet.
It's great that they are fixing this - but this was TOTALLY avoidable. That we won't see this again - even from the same actors - isn't something I have high confidence in guaranteeing, based on current experience.
We can all do better. We should all do better.
Mark.
On 3/Aug/20 14:57, Tom Beecher wrote:
Agreed.
However, every time we go on this Righteous Indignation of Should Do crusade, it would serve us well to stop and remember that in every one of our jobs, at many points in our careers, we have been faced with a situation where something we SHOULD do ends up being deferred for something we MUST do. It is a universal truth that there will never be enough time and resources to complete both, especially not in our current business environment where the only thing that matters is the numbers for the next quarter. Sometimes as engineers we have to make choices, sometimes choices are imposed on us by pointy hairs.
Telia made a mistake. They owned it and will endeavor to do better. What more can be asked?
I think we've now gone past Telia's mistake and are considering what we can all do as BGP actors to prevent this particular issue from making a reprise. Agreed, we all have bits we need to prioritize our time on. But BGP requires the concerted effort of all actors on the Internet. How an operator in Omsk works with BGP has a potentially direct impact on another operator in Ketchikan. So whether I choose to spend more time on attending conferences vs. upgrading my core network, neither of those has an impact on BGP. But if I'm not going to take BGP filtering as seriously as I should, the engineer, their employer and customer, sitting all the way in Yangon, could feel that. The devices we use, nowadays, are only as useful as their connectedness. No connectivity, and they're just bricks. Particularly in these Coronavirus times, the Internet is what is keeping economies alive, and folk employed. So rather than go back to the old days of, "We are busy, it is what it is", let's figure out how to make it better. We don't have to fix all of the Internet's governance issues this century - let's just start with making this "BGP optimizer danger" fix + "all operators should filter more deliberately" a reality. Mark.
On Mon, Aug 03, 2020 at 08:57:53AM -0400, Tom Beecher wrote:
Telia made a mistake. They owned it and will endeavor to do better. What more can be asked?
Figure out how that mistake happened -- what factors led to it? Then make changes so that it can't happen again, at least not in that particular way. (And if those changes are applicable to more than this isolated case: excellent. In that case, share them with all of us so that maybe they'll keep us from repeating the error.) "Stopping myself from making the same mistake twice" has probably been the most effective thing I've ever done. ---rsk
Hank Nussbacher wrote on 31/07/2020 08:21:
But wait - MANRS indicates that Telia does everything right:
Not only that, Telia indicates that Telia does everything right:
https://www.teliacarrier.com/our-network/bgp-routing/routing-security-.html
"We reject RPKI Invalids on all BGP Sessions; for both Peers and Customers."
How can that be?
Misconfig or oversight? Nick
On 31.07.2020 10.47, Nick Hilliard wrote:
Hank Nussbacher wrote on 31/07/2020 08:21:
But wait - MANRS indicates that Telia does everything right:
Not only that, Telia indicates that Telia does everything right:
https://www.teliacarrier.com/our-network/bgp-routing/routing-security-.html
"We reject RPKI Invalids on all BGP Sessions; for both Peers and Customers."
If it's true that none of the affected prefixes were signed, this is a good case to get some people to sign their prefixes. Everyone affected will have to accept shared blame, because they could have prevented the issue by following best practice and doing their RPKI signing. Regards, Baldur
On 31/Jul/20 10:47, Nick Hilliard wrote:
Misconfig or oversight?
We started using Telia as an upstream back in 2014. When we had new prefixes to announce to the Internet, we always sent them (as we do to all our upstreams) a request to update their filters to support the same. The standard response we got back from them, in those days, was a list of ASN's permitted in an inbound filter applied to our eBGP session with them, that showed all the ASN's that belonged to us and transited through us. I am not entirely sure whether this was backed up by a prefix filter, but my feeling is that it wasn't. To them, as long as the AS we wanted to get through them was included in the list, we basically took 10 minutes away from their day with the request. If I check an e-mail from the Telia NOC as recently as 2018, I see this (verbatim; our customer AS masked out with XXXX): ***** Dear Customer, Please be advised that the BGP filter that is applied to you is AS-based and the AS XXXX is included in the BGP filter. Therefore, the reported prefixes should be accepted. Can you please check and inform us accordingly? ***** Is it at all possible that this is still their current filtering policy? Mark.
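The gap Mark describes between an AS-based filter and a prefix-based one can be sketched in a few lines of Python. The ASNs, prefixes and the /24 length limit below are hypothetical stand-ins for whatever a provider would actually build from its customers' IRR data.

import ipaddress

# Hypothetical customer record built from their IRR registrations.
allowed_origin_asns = {64500, 64501}
allowed_prefixes = [ipaddress.ip_network("192.0.2.0/24"),
                    ipaddress.ip_network("198.51.100.0/24")]

def as_based_filter(origin_asn, prefix):
    # "AS-based" filter: accept anything, as long as the origin AS is on
    # the customer's list. The prefix itself is never checked.
    return origin_asn in allowed_origin_asns

def prefix_based_filter(origin_asn, prefix, max_len=24):
    # Customer-edge prefix filter: the origin must be on the list AND the
    # prefix must fall inside something the customer registered, within a
    # sane length limit.
    net = ipaddress.ip_network(prefix)
    covered = any(net.subnet_of(allowed) for allowed in allowed_prefixes)
    return (origin_asn in allowed_origin_asns
            and covered
            and net.prefixlen <= max_len)

# A more-specific of somebody else's space, re-originated behind the
# customer's ASN, passes the AS-based check but fails the prefix check:
print(as_based_filter(64500, "203.0.113.0/24"))      # True  -> leak accepted
print(prefix_based_filter(64500, "203.0.113.0/24"))  # False -> leak dropped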
Telia implements RPKI filtering so the question is did it work? Were any affected prefixes RPKI signed? Would any prefixes have avoided being hijacked if RPKI signing had been in place? Regards Baldur - who had to turn off RPKI filtering at the request of JTAC to stop our mx204s from crashing :-( On Thu, Jul 30, 2020, 18:59 Töma Gavrichenkov <ximaera@gmail.com> wrote:
Peace,
On Thu, Jul 30, 2020, 5:48 AM Clinton Work <clinton@scripty.com> wrote:
We saw a bunch of our IP blocks hijacked by AS10990 from 19:15 MDT until 20:23 MDT. Anybody else have problems with that.
Here's what we discovered about the incident. Hope that brings some clarity.
https://radar.qrator.net/blog/as10990-routing-optimization-tale
-- Töma
Not a single prefix was signed, from what I saw. Maybe a good reason for Rogers, Charter, TWC etc to do that now. It would have stopped the propagation at Telia. On Fri, 31 Jul 2020 at 8:40 am, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Telia implements RPKI filtering so the question is did it work? Were any affected prefixes RPKI signed? Would any prefixes have avoided being hijacked if RPKI signing had been in place?
Regards
Baldur - who had to turn off RPKI filtering at the request of JTAC to stop our mx204s from crashing :-(
On Thu, Jul 30, 2020, 18:59 Töma Gavrichenkov <ximaera@gmail.com> wrote:
Peace,
On Thu, Jul 30, 2020, 5:48 AM Clinton Work <clinton@scripty.com> wrote:
We saw a bunch of our IP blocks hijacked by AS10990 from 19:15 MDT until 20:23 MDT. Anybody else have problems with that.
Here's what we discovered about the incident. Hope that brings some clarity.
https://radar.qrator.net/blog/as10990-routing-optimization-tale
-- Töma
-- Regards,
Aftab A. Siddiqui
On 31/Jul/20 03:57, Aftab Siddiqui wrote:
Not a single prefix was signed, from what I saw. Maybe a good reason for Rogers, Charter, TWC etc to do that now. It would have stopped the propagation at Telia.
While I am a huge proponent for ROA's and ROV, it is a massive expectation to require filtering to work on the basis of all BGP participants creating their ROA's. It's what I would like, but there is always going to be a lag on this one. If none of the prefixes had a ROA, no amount of Telia's shiny new "we drop invalids" machine would have helped, as we saw with this incident. ROV really only comes into its own when the majority of the Internet has correct ROA's setup. In the absence of that, it's a powerful but toothless feature. So while I will continue pushing for the rest of the world to create ROA's, turn on RPKI and enable ROV, I'll also advocate that operators continue to have both AS- and prefix-based filters. Not either/or, but both. Also, max-prefix as a matter of course. Mark.
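For anyone who wants the ROV outcomes spelled out, here is a simplified Python sketch of RFC 6811 route origin validation with a made-up ROA set. The point is that unsigned space comes back "not-found" (the unprotected case discussed in this thread), which a drop-invalids policy deliberately leaves alone.

import ipaddress

# Hypothetical ROAs: (prefix, maxLength, authorised origin AS).
roas = [
    (ipaddress.ip_network("192.0.2.0/24"), 24, 64500),
]

def rov_state(prefix, origin_asn):
    # Simplified RFC 6811 route origin validation.
    net = ipaddress.ip_network(prefix)
    covering = [roa for roa in roas if net.subnet_of(roa[0])]
    if not covering:
        return "not-found"   # unsigned space: ROV cannot say anything
    for _, max_len, asn in covering:
        if net.prefixlen <= max_len and origin_asn == asn:
            return "valid"
    return "invalid"         # covered by a ROA, but origin or length is wrong

print(rov_state("192.0.2.0/24", 64500))     # valid
print(rov_state("192.0.2.0/24", 64666))     # invalid -> dropped by a drop-invalids policy
print(rov_state("198.51.100.0/24", 64666))  # not-found -> sails straight past ROV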
How do you know that none of the prefixes had a ROA? The ones that had one got stopped by Telia's filter, so we would never know. This is exactly the situation where RPKI already works. My prefixes and yours, provided you, like me, have ROAs, will not be leaked through Telia and a number of other large transits. Even if they did not have proper filters in place. Driving without RPKI / ROA is like driving without a seatbelt. You are fine until the day someone makes a mistake, and then you wish you had done your job of signing those prefixes sooner. Regards, Baldur On Fri, Jul 31, 2020 at 3:35 PM Mark Tinka <mark.tinka@seacom.com> wrote:
On 31/Jul/20 03:57, Aftab Siddiqui wrote:
Not a single prefix was signed, from what I saw. Maybe a good reason for Rogers, Charter, TWC etc to do that now. It would have stopped the propagation at Telia.
While I am a huge proponent for ROA's and ROV, it is a massive expectation to require filtering to work on the basis of all BGP participants creating their ROA's. It's what I would like, but there is always going to be a lag on this one.
If none of the prefixes had a ROA, no amount of Telia's shiny new "we drop invalids" machine would have helped, as we saw with this incident. ROV really only comes into its own when the majority of the Internet has correct ROA's setup. In the absence of that, it's a powerful but toothless feature.
So while I will continue pushing for the rest of the world to create ROA's, turn on RPKI and enable ROV, I'll also advocate that operators continue to have both AS- and prefix-based filters. Not either/or, but both. Also, max-prefix as a matter of course.
Mark.
On 31/Jul/20 16:01, Baldur Norddahl wrote:
How do you know that none of the prefixes had a ROA? The ones that had one got stopped by Telia's filter, so we would never know.
Like I said, "if". If they did, then they were protected. If they didn't, well...
This is exactly the situation where RPKI already works. My and yours prefixes, provided you like me have ROAs, will not be leaked through Telia and a number of other large transits. Even if they did not have proper filters in place.
I don't have to like you, but I will always honour your ROA :-). That is my point, though - this works if ROA's are present. We know this to not be the case - so having proper filters in place is not optional. At least not until we have 100% diffusion of ROA's + ROV. And even then, we probably still want some kind of safety net.
Driving without RPKI / ROA is like driving without a seatbelt. You are fine until the day someone makes a mistake and then you wish you did your job at signing those prefixes sooner.
Don't disagree with you there. Mark.
On Fri, Jul 31, 2020 at 03:34:47PM +0200, Mark Tinka wrote:
On 31/Jul/20 03:57, Aftab Siddiqui wrote:
Not a single prefix was signed, from what I saw. Maybe a good reason for Rogers, Charter, TWC etc to do that now. It would have stopped the propagation at Telia.
If none of the prefixes had a ROA, no amount of Telia's shiny new "we drop invalids" machine would have helped, as we saw with this incident.
Could it be ... we didn't see any RPKI Invalids through Telia *because* they are rejecting RPKI invalids? As far as I know the BGP Polluter software does not have a configuration setting to only ruin the day of operators without ROAs. :-) I think the system worked as designed: without RPKI ROV @ Telia the damage might have been worse. Kind regards, Job
On 31/Jul/20 16:07, Job Snijders wrote:
Could it be ... we didn't see any RPKI Invalids through Telia *because* they are rejecting RPKI invalids?
As far as I know the BGP Polluter software does not have a configuration setting to only ruin the day of operators without ROAs. :-)
I think the system worked as designed: without RPKI ROV @ Telia the damage might have been worse.
Indeed. What I was saying is we don't know how many of the leaked routes were dropped by Telia's ROV, if any. We really shouldn't be having to discuss how bad this could have gotten, because it means we are excusing Telia's inability to do proper filtering across its eBGP sessions with its customers. Mark.
So while I will continue pushing for the rest of the world to create ROA's, turn on RPKI and enable ROV, I'll also advocate that operators continue to have both AS- and prefix-based filters. Not either/or, but both. Also, max-prefix as a matter of course.
This is the correct approach. We are a very long way from being able to flip the switch to say "everyone drop any RPKI UNKNOWN" , so in the meantime best practices for non-ROA covered prefixes still have to be done. On Fri, Jul 31, 2020 at 9:35 AM Mark Tinka <mark.tinka@seacom.com> wrote:
On 31/Jul/20 03:57, Aftab Siddiqui wrote:
Not a single prefix was signed, from what I saw. Maybe a good reason for Rogers, Charter, TWC etc to do that now. It would have stopped the propagation at Telia.
While I am a huge proponent for ROA's and ROV, it is a massive expectation to require filtering to work on the basis of all BGP participants creating their ROA's. It's what I would like, but there is always going to be a lag on this one.
If none of the prefixes had a ROA, no amount of Telia's shiny new "we drop invalids" machine would have helped, as we saw with this incident. ROV really only comes into its own when the majority of the Internet has correct ROA's setup. In the absence of that, it's a powerful but toothless feature.
So while I will continue pushing for the rest of the world to create ROA's, turn on RPKI and enable ROV, I'll also advocate that operators continue to have both AS- and prefix-based filters. Not either/or, but both. Also, max-prefix as a matter of course.
Mark.
On Jul 30, 2020, at 5:37 PM, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Telia implements RPKI filtering so the question is did it work? Were any affected prefixes RPKI signed? Would any prefixes have avoided being hijacked if RPKI signing had been in place?
Regards
Baldur - who had to turn off RPKI filtering at the request of JTAC to stop our mx204s from crashing :-(
Oh uh, I’m getting close to getting RPKI going on my mx204s, or was until you posted that. What’s the story there, and perhaps which junos version?
On 2/Aug/20 19:22, Darrell Budic wrote:
Oh uh, I’m getting close to getting RPKI going on my mx204s, or was until you posted that. What’s the story there, and perhaps which junos version?
None that I know of. We have it working well (RPKI + ROV) on MX204's running Junos 19.2. Curious to hear about Baldur's bug. Mark.
Darrell Budic Sent: Sunday, August 2, 2020 6:23 PM
On Jul 30, 2020, at 5:37 PM, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Telia implements RPKI filtering so the question is did it work? Were any
affected prefixes RPKI signed? Would any prefixes have avoided being hijacked if RPKI signing had been in place?
Regards
Baldur - who had to turn off RPKI filtering at the request of JTAC to stop our
mx204s from crashing :-(
Oh uh, I’m getting close to getting RPKI going on my mx204s, or was until you posted that. What’s the story there, and perhaps which junos version?
Same here, would be interested in affected Junos versions or any details you can share please, adam
On 3 Aug 2020, at 11:04, adamv0025@netconsultings.com wrote:
Darrell Budic Sent: Sunday, August 2, 2020 6:23 PM
On Jul 30, 2020, at 5:37 PM, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Telia implements RPKI filtering so the question is did it work? Were any
affected prefixes RPKI signed? Would any prefixes have avoided being hijacked if RPKI signing had been in place?
Regards
Baldur - who had to turn off RPKI filtering at the request of JTAC to stop our
mx204s from crashing :-(
Oh uh, I’m getting close to getting RPKI going on my mx204s, or was until you posted that. What’s the story there, and perhaps which junos version?
Same here, would be interested in affected Junos versions or any details you can share please,
According to the information I received from the community[1], you should read PR1461602 and PR1309944 before deploying. -Alex [1] https://rpki.readthedocs.io/en/latest/rpki/router-support.html
On Mon, Aug 03, 2020 at 02:36:25PM +0200, Alex Band wrote:
According to the information I received from the community[1], you should read PR1461602 and PR1309944 before deploying.
[1] https://rpki.readthedocs.io/en/latest/rpki/router-support.html
My take on PR1461602 is that it can be ignored, as it appears to only manifest itself in a mostly cosmetic way: initial RTR session establishment takes multiple minutes, but once RTR sessions are up things work smoothly. Under no circumstances should you enable RPKI ROV functionality on boxes that suffer from PR1309944. That one is a real showstopper. Kind regards, Job
On Mon, Aug 3, 2020 at 3:54 PM Job Snijders <job@ntt.net> wrote:
On Mon, Aug 03, 2020 at 02:36:25PM +0200, Alex Band wrote:
According to the information I received from the community[1], you should read PR1461602 and PR1309944 before deploying.
[1] https://rpki.readthedocs.io/en/latest/rpki/router-support.html
My take on PR1461602 is that it can be ignored, as it appears to only manifest itself in a mostly cosmetic way: initial RTR session establishment takes multiple minutes, but once RTR sessions are up things work smoothly.
Under no circumstances should you enable RPKI ROV functionality on boxes that suffer from PR1309944. That one is a real showstopper.
We suffered a series of crashes that led to JTAC recommending disabling RPKI. We had a core dump which matches PR1332626 which is confidential, so I have no idea what it is about. Apparently what happened was the server running the RPKI validation service rebooted and the service was not configured to automatically restart. Also we did not have it redundant nor did we monitor the service. So we had no working RPKI validation server and that apparently caused the MX204 to become unstable in various ways. It might run for a day but it would do all sorts of things like packet loss, delays and generally be "strange". The first crash caused BGP, ssh and subscriber management to be down, but LDP, OSPF, SNMP to be up. It became a black hole we could not login to. The worst possible kind of crash for a router. We had to go onsite and pull the power. The router appears to run fine after disabling RPKI. I suppose starting the validation service may also fix the issue. But I am not going to go there until I know what is in that PR and also I feel the RPKI function needs to be failsafe before we can use it. I know we are at fault for not deploying the validation service in a redundant setup and for failing to monitor the service. But we did so because we thought it not to be too important, because a failed validation service should simply lead to no validation, not a crashed router. This is on JUNOS 20.1R1.11. Regards, Baldur
On 3/Aug/20 17:09, Baldur Norddahl wrote:
We suffered a series of crashes that led to JTAC recommending disabling RPKI. We had a core dump which matches PR1332626 which is confidential, so I have no idea what it is about. Apparently what happened was the server running the RPKI validation service rebooted and the service was not configured to automatically restart. Also we did not have it redundant nor did we monitor the service. So we had no working RPKI validation server and that apparently caused the MX204 to become unstable in various ways. It might run for a day but it would do all sorts of things like packet loss, delays and generally be "strange". The first crash caused BGP, ssh and subscriber management to be down, but LDP, OSPF, SNMP to be up. It became a black hole we could not login to. The worst possible kind of crash for a router. We had to go onsite and pull the power.
The router appears to run fine after disabling RPKI. I suppose starting the validation service may also fix the issue. But I am not going to go there until I know what is in that PR and also I feel the RPKI function needs to be failsafe before we can use it. I know we are at fault for not deploying the validation service in a redundant setup and for failing to monitor the service. But we did so because we thought it not to be too important, because a failed validation service should simply lead to no validation, not a crashed router.
This is on JUNOS 20.1R1.11.
That's a really nasty bug. Loss of an RTR session shouldn't kill the box, even if you are running only one validator. If you can share details about why this happens when you get them, that would be most helpful. I'd be curious to know whether this is dependent on a specific validator, or all of them. Are there bits in Junos 20 that you can't get in fixed versions of 19? Mark.
participants (30)
- adamv0025@netconsultings.com
- Aftab Siddiqui
- Alex Band
- Baldur Norddahl
- Ca By
- Clinton Work
- Darrell Budic
- Hank Nussbacher
- Jeff Bilyk
- Job Snijders
- Job Snijders
- Lukas Tribus
- Mark Tinka
- Matt Erculiani
- Mike Hammett
- nanog@jack.fr.eu.org
- Nick Hilliard
- Owen DeLong
- Patrick Schultz
- Rafael Possamai
- Rich Kulawiec
- Robert Raszuk
- Ross Tajvar
- Ryan Hamel
- Sabri Berisha
- Sadiq Saif
- Stephane Bortzmeyer
- Tom Beecher
- Töma Gavrichenkov
- Yang Yu