Re: FW: Reliability of looking glass sites / rviews
You didn't mention which ASN or prefixes you were checking. Are you referring to ASN 14607, which only advertises two prefixes, 129.77.0.0/16 and 2620:0:2810::/48? Based on what we see over the weekend (using routeviews data):

Event Start Time: 2017-09-09 11:29:23 UTC (2017-09-09 07:29:23 EDT)
Event End Time:   2017-09-09 13:31:30 UTC (2017-09-09 09:31:30 EDT)

Are the above times correct? We see the routes withdraw and then come back. For example:

http://demo-rv.snas.io:3000/dashboard/db/prefix-history?orgId=2&var-prefix=129.77.0.0&var-prefix_len=16&var-asn_num=All&var-router_name=All&var-peer_name=All&from=1504908000000&to=1505203200000

When you checked routeviews, which router and peer were you looking at? When you did a "show ip bgp ..." did you include the prefix length? If not, it would have shown you 0/0 or 128/5, depending on which router you were on.

--Tim

On 9/13/17, 8:43 AM, "NANOG on behalf of Matthew Huff" <nanog-bounces@nanog.org on behalf of mhuff@ox.com> wrote:

> Both should have been similar. In the first case, we lost power to all
> of our BGP border routers that peer with the upstream providers. In the
> second case, I did an explicit "shut" on the interface connected to the
> upstream provider that appeared "stuck" an hour after the outage.
>
> From: <christopher.morrow@gmail.com> on behalf of Christopher Morrow <morrowc.lists@gmail.com>
> Date: Wednesday, September 13, 2017 at 10:58 AM
> To: Matthew Huff <mhuff@ox.com>
> Cc: nanog2 <nanog@nanog.org>
> Subject: Re: Reliability of looking glass sites / rviews
>
> On Wed, Sep 13, 2017 at 5:30 AM, Matthew Huff <mhuff@ox.com> wrote:
>
>> This weekend our uninterruptible power supply became interruptible and
>> we lost all circuits. While doing initial debugging of the problem as I
>> waited on site power verification, I noticed that paths were still
>> being shown in rviews for the circuit that was down. This was over an
>> hour after we went hard down, and it took hours before we were back up.
>
> explicit vs. implicit withdrawals causing different handling of the
> problem routes?
>
>> I worked with our providers last night to verify there weren't any
>> hanging static routes, etc. We shut the upstream circuit down, watched
>> the convergence, and saw that eventually all the paths disappeared.
>> Given what we saw on Saturday, what would cause route-views to cache
>> the paths that long?
>>
>> Some looking glass sites only show what they are peered with, or at
>> most what their peers are peered with; that's why I've always used
>> route-views. What looking glass sites other than route-views would
>> people recommend?
>
> ripe ris.
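Tim's point about the prefix length can be made concrete with a longest-prefix-match lookup: when the /16 is withdrawn, a bare "show ip bgp 129.77.0.0" does a longest-match lookup and shows whatever covering route exists instead, such as a 0.0.0.0/0 default or a 128.0.0.0/5 aggregate. A minimal Python sketch, with an assumed toy table for illustration:

```python
# Sketch of why "show ip bgp 129.77.0.0" without a length can mislead:
# with no exact /16 in the table, the lookup falls back to the longest
# covering prefix. The table contents here are assumptions.
import ipaddress

def longest_match(addr, table):
    """Return the most-specific prefix in `table` covering `addr`."""
    addr = ipaddress.ip_address(addr)
    covering = [p for p in table if addr in p]
    return max(covering, key=lambda p: p.prefixlen, default=None)

table = [ipaddress.ip_network(p) for p in ("0.0.0.0/0", "128.0.0.0/5")]

# /16 withdrawn: lookup falls back to the covering /5
print(longest_match("129.77.0.1", table))  # 128.0.0.0/5

# /16 present: the exact route wins
table.append(ipaddress.ip_network("129.77.0.0/16"))
print(longest_match("129.77.0.1", table))  # 129.77.0.0/16
```

So a lookup that "finds a route" during the outage may only be finding the covering aggregate, not the prefix in question.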
ASN 14607 and 129.77.0.0/16. Slightly over an hour after our power event, when 100% of our equipment was down, this is what I saw at routeviews:

BGP routing table entry for 129.77.0.0/16, version 24978989
Paths: (7 available, best #7, table default)
  Not advertised to any peer
  Refresh Epoch 1
  134708 3491 6939 46887 14607
    103.197.104.1 from 103.197.104.1 (123.108.254.70)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  3333 1273 6939 46887 14607
    193.0.0.56 from 193.0.0.56 (193.0.0.56)
      Origin IGP, localpref 100, valid, external
      Community: 1273:23000
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  8283 57866 6762 6939 46887 14607
    94.142.247.3 from 94.142.247.3 (94.142.247.3)
      Origin IGP, metric 0, localpref 100, valid, external
      Community: 6762:33 6762:16500 8283:15 57866:105
      unknown transitive attribute: flag 0xE0 type 0x20 length 0xC
        value 0000 205B 0000 0006 0000 000F
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  24441 3491 3491 6939 46887 14607
    202.93.8.242 from 202.93.8.242 (202.93.8.242)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  20912 1267 1273 6939 46887 14607
    212.66.96.126 from 212.66.96.126 (212.66.96.126)
      Origin IGP, localpref 100, valid, external
      Community: 1273:23000 9035:50 9035:100 20912:65001
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  1221 4637 6939 46887 14607
    203.62.252.83 from 203.62.252.83 (203.62.252.83)
      Origin IGP, localpref 100, valid, external
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  2497 6939 46887 14607
    202.232.0.2 from 202.232.0.2 (202.232.0.2)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0

From: Tim Evens [mailto:tim@snas.io]
Sent: Friday, September 15, 2017 10:45 AM
To: Matthew Huff <mhuff@ox.com>
Cc: morrowc.lists@gmail.com; nanog@nanog.org
Subject: Re: FW: Reliability of looking glass sites / rviews

> You didn't mention details about which ASN or prefixes you were checking.
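One detail worth noticing in the output above: every one of the seven stale paths reaches AS 14607 through the same 6939 46887 tail, so all of them depend on the same last two hops toward the origin. A quick throwaway check (Python, AS paths copied from the output above):

```python
# Sanity check over the AS paths shown in the routeviews output:
# confirm there are seven paths and that each one ends in the same
# tail, i.e. reaches AS14607 via 46887 and 6939.
paths_raw = """
134708 3491 6939 46887 14607
3333 1273 6939 46887 14607
8283 57866 6762 6939 46887 14607
24441 3491 3491 6939 46887 14607
20912 1267 1273 6939 46887 14607
1221 4637 6939 46887 14607
2497 6939 46887 14607
""".strip().splitlines()

paths = [[int(asn) for asn in line.split()] for line in paths_raw]
assert len(paths) == 7
assert all(p[-3:] == [6939, 46887, 14607] for p in paths)
print({p[-1] for p in paths})  # {14607}
```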
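The explicit-vs-implicit withdrawal question can be made concrete with a toy timing model. The numbers below are assumptions (180 s hold time, 30 s per-hop advertisement pacing), not measurements, and real BGP implementations generally do not rate-limit withdrawals this way; the sketch only illustrates why a silent power loss is noticed later than an explicit "shut":

```python
# Toy model (not a BGP implementation): contrast how quickly a
# withdrawal can propagate after an explicit shutdown versus a silent
# power loss, where the first upstream must wait out its hold timer
# before it notices the session is dead. Timer values are illustrative.

HOLD_TIME = 180   # common BGP default hold time, seconds
MRAI = 30         # classic eBGP MinRouteAdvertisementInterval, seconds

def time_until_withdrawn(hops, explicit_shutdown):
    """Worst-case seconds until the last of `hops` ASes drops the route.

    explicit_shutdown: the edge router tears the session down
    immediately (e.g. interface "shut"), so detection is instant;
    otherwise the directly connected upstream only notices when its
    hold timer expires.
    """
    detection = 0 if explicit_shutdown else HOLD_TIME
    # After detection, assume the update still paces at each hop.
    return detection + hops * MRAI

# e.g. five AS hops between the edge and a route-views peer
print(time_until_withdrawn(5, explicit_shutdown=True))   # 150
print(time_until_withdrawn(5, explicit_shutdown=False))  # 330
```

Either way, the model tops out in minutes, not the hour-plus observed in the thread, which is why a collector-side or provider-side explanation was being looked for.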
participants (2)
-
Matthew Huff
-
Tim Evens