6453 routing leaks (January and Today)
From: Jared Mauch <jared@puck.nether.net> Date: Thu, 24 Feb 2011 16:59:52 -0500
It appears there have been a large number of routing leaks from 6453 today based on my detection scripts that have been running.
(shameless plug for http://puck.nether.net/bgp/leakinfo.cgi)
A quick report of the data shows (for today so far) a few thousand more leaks than is normal for a day like today. I included a snapshot of yesterday below as well.
I've included a more detailed report of the prefixes observed involved here:
http://puck.nether.net/~jared/tata-leak-20110224.txt
This seems to be a somewhat common event for 6453. Looking through the history of data available, another event happened on 2011-01-28 as well.
I'm interested in what best operational practices people have employed to help avoid the leaks seen here so I can document them for others to learn to prevent this from happening again.
- Jared
bgp=# select count(blame_asn),blame_asn,asn_responsible from leakinfo where aprox_time::date = '2011-02-24' group by blame_asn,asn_responsible order by 1 desc;
 count | blame_asn | asn_responsible
-------+-----------+-----------------
  2208 |      6453 |            6453
   360 |      7473 |            3257
   230 |           |
   170 |     17379 |            5511
   130 |      8068 |            3356
    39 |      3225 |            6453
    34 |     45419 |            3356
    26 |      3356 |            3356
    25 |     12180 |            2828
    18 |     22351 |             701
    16 |      7991 |            2914
    16 |     14051 |            1239
    10 |     29571 |            5511
     4 |     32327 |            2828
     4 |      8966 |            2914
     4 |     19080 |            1239
     4 |     30209 |            7018
     4 |     18734 |             701
     4 |      4657 |            3320
     3 |     33748 |            1239
     2 |      5056 |            1239
     2 |     10026 |            2828
     2 |     12252 |            2914
     1 |     11696 |            2828
(24 rows)
bgp=# select count(blame_asn),blame_asn,asn_responsible from leakinfo where aprox_time::date = '2011-02-23' group by blame_asn,asn_responsible order by 1 desc;
 count | blame_asn | asn_responsible
-------+-----------+-----------------
   384 |      7473 |            3257
   120 |     17379 |            5511
    48 |           |
    27 |     45419 |            3356
    24 |     12180 |            2828
    11 |     23456 |            2914
(6 rows)
bgp=# select count(blame_asn),blame_asn,asn_responsible from leakinfo where aprox_time::date = '2011-01-28' group by blame_asn,asn_responsible order by 1 desc;
 count | blame_asn | asn_responsible
-------+-----------+-----------------
  9119 |      6453 |            6453
  2265 |           |
   355 |      2914 |            2914
   313 |      7473 |            3257
   250 |     17379 |            5511
   213 |     32592 |             701
   106 |      3790 |            1239
    72 |     19108 |            6461
    62 |     14051 |            1239
    51 |     34977 |            6453
    48 |     31133 |            3356
    47 |      8657 |             174
    32 |      7713 |            2914
    31 |      1257 |            1239
    31 |      8966 |            2914
    30 |     30209 |            7018
    30 |     31133 |            1299
    29 |      8342 |            1239
    24 |     38925 |            3320
    24 |     12180 |            2828
    22 |      8657 |            3549
    21 |     15641 |            3549
    18 |     31133 |            2914
    16 |     15412 |            2914
    15 |      7473 |            3549
    10 |      6762 |            1299
    10 |      6762 |            7018
    10 |     20299 |            1239
    10 |      6762 |            3561
    10 |      6762 |             174
     9 |      4323 |            2914
     7 |     26163 |            6461
     7 |      9505 |             174
     7 |     15149 |            6461
     7 |      9070 |            3549
     7 |      7819 |            6461
     6 |      7473 |             174
     6 |      3216 |            3549
     6 |      1273 |             174
     5 |      8657 |            3356
     5 |     26769 |            3549
     5 |      6762 |            2914
     5 |      6762 |            3356
     4 |      8047 |             701
     4 |      8877 |             174
     4 |       174 |             174
     2 |     20299 |             174
     2 |      7843 |             174
     2 |      7473 |            6453
     2 |      8928 |            3320
     2 |      7991 |            2914
     1 |      1273 |            3549
     1 |     20485 |            2914
     1 |      3216 |            1239
(54 rows)
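The day-over-day comparison behind the queries above can be sketched in a few lines of Python. This is only an illustration and not how leakinfo works internally: the counts are a handful of rows copied from the 2011-02-23 and 2011-02-24 output, keyed by blame_asn, while the function name `unusual_asns` and the threshold of 1000 are assumptions picked for the example.

```python
# Leak counts per blame_asn, copied from a few rows of the query output above.
BASELINE = {7473: 384, 17379: 120, 45419: 27, 12180: 24}  # 2011-02-23
TODAY = {6453: 2208, 7473: 360, 17379: 170, 8068: 130,
         45419: 34, 12180: 25}                            # 2011-02-24


def unusual_asns(today, baseline, min_increase=1000):
    """Return (asn, increase) pairs whose count rose by at least min_increase.

    ASNs missing from the baseline are treated as having a count of zero.
    The default threshold is an arbitrary assumption, not a tuned value.
    """
    increases = {asn: count - baseline.get(asn, 0) for asn, count in today.items()}
    flagged = [(asn, inc) for asn, inc in increases.items() if inc >= min_increase]
    # Largest increase first, mirroring "order by 1 desc" in the queries.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


print(unusual_asns(TODAY, BASELINE))
```

With these rows only 6453 stands out; 7473 and 17379 appear on both days at similar levels, so they look like background noise rather than a new event.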
Can't say if it was a leak or de aggregation, but TATA announcements to us jumped from about 70,000 to almost 190,000 for a while today, then dropped back down.
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
On Feb 24, 2011, at 7:50 PM, Kevin Oberman wrote:
Can't say if it was a leak or de aggregation, but TATA announcements to us jumped from about 70,000 to almost 190,000 for a while today, then dropped back down.
It very much appears to be a leak based on the route-views MRT format updates. There's not a good reason for this observed prefix/path combination:
41.194.32.0/24 | 3549 6453 3356 22351 36939
I don't believe 3549 or 3356 is buying transit from 6453 to reach the other.
One of the interesting measurements I track (people accuse me of pcaping all bgp updates, which is sorta true with this MRT archive) is the average file size of the route-views archive:
http://archive.routeviews.org/bgpdata/2011.02/UPDATES/
This is a good measure of how stable/unstable the network is. You can typically see when a network has performed some grooming or an event like this just by getting a feel for the file sizes. When they go from ~300KiB on average to something in the multiple megs, you know something happened.
- Jared
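The reasoning about the path above (two large networks should not be reaching each other through a third) can be written down as a simple check. The sketch below is a guess at the general idea, not the detection script behind leakinfo.cgi. The `LARGE_NETWORKS` set is a hypothetical list built from ASNs mentioned in this thread, on the assumption that they peer with one another rather than buy transit, and `suspected_leakers` is an invented name.

```python
# Hypothetical set of large networks assumed to peer with each other.
# Membership is an assumption for illustration only.
LARGE_NETWORKS = {174, 701, 1239, 1299, 2914, 3257, 3356, 3549, 6453, 7018}


def suspected_leakers(as_path, large=LARGE_NETWORKS):
    """Return ASNs found directly between two other large networks.

    as_path is a space-separated AS path such as "3549 6453 3356".
    Three large networks in a row means the middle one passed routes
    learned from one peer on to another peer.
    """
    hops = []
    for asn in as_path.split():
        # Collapse AS-path prepending so repeated ASNs count once.
        if not hops or hops[-1] != int(asn):
            hops.append(int(asn))
    suspects = []
    for left, middle, right in zip(hops, hops[1:], hops[2:]):
        if left in large and middle in large and right in large:
            suspects.append(middle)
    return suspects


print(suspected_leakers("3549 6453 3356 22351 36939"))  # prints [6453]
```

A check like this is only as good as the list of networks it assumes are peers; a real transit relationship between two of them would show up as a false positive.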
Update:
I have had a source ask me to post the following:
-- snip --
The problem with route leaking was caused by specific routing platform resulting in some peer routes not being properly tagged. We are deploying additional measures to prevent this from happening in the future
-- snip --
- Jared
On Fri, Feb 25, 2011 at 07:22:36AM -0500, Jared Mauch wrote:
Update:
I have had a source ask me to post the following:
-- snip -- The problem with route leaking was caused by specific routing platform resulting in some peer routes not being properly tagged. We are deploying additional measures to prevent this from happening in the future -- snip --
Hopefully someone learned a lesson about BGP community design, and how it should fail safe by NOT leaking if you accidentally fail to tag a route. Always require a positive match on a route to advertise to peers, not the absence of a negative match.
--
Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
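The difference between a positive match and the absence of a negative match is easy to see in a toy model. The Python sketch below is not any vendor's policy language and not the configuration of any network in this thread. The community values use the documentation ASN 64500 and are made up, as are the two function names.

```python
# Hypothetical community values; every network defines its own scheme.
CUSTOMER = "64500:100"  # applied to routes learned from customers
PEER = "64500:200"      # applied to routes learned from peers


def export_unless_peer(communities):
    """Fail-open policy: advertise to peers unless the route carries the peer tag."""
    return PEER not in communities


def export_if_customer(communities):
    """Fail-safe policy: advertise to peers only if the route carries the customer tag."""
    return CUSTOMER in communities


# A peer route that was never tagged, as in the reported failure.
untagged_peer_route = set()

for policy in (export_unless_peer, export_if_customer):
    print(policy.__name__, policy(untagged_peer_route))
```

The first policy exports the untagged route and the second does not. Under the fail-safe policy a missed tag costs reachability for one route until someone notices, instead of leaking peer routes to other peers.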
Yes, very scary actually....
Human error is unavoidable - it's going to happen at times - BUT....
In our communities design, there have been times where we have missed a tag on an inbound customer for example. It scares the crap out of me to think that a simple mistake like that could cause route leakage. Thankfully, anytime it has happened it was caught pretty quickly and fixed - in the meantime the routes simply didn't leave our network (the way it should be).
Obviously the scales are different between someone like ourselves and that of TATA - but the principles and common sense remain.
Paul
-----Original Message-----
From: Richard A Steenbergen [mailto:ras@e-gerbil.net]
Sent: Friday, February 25, 2011 12:52 PM
To: Jared Mauch
Cc: NANOG list
Subject: Re: 6453 routing leaks (January and Today)
Hopefully someone learned a lesson about BGP community design, and how it should fail safe by NOT leaking if you accidentally fail to tag a route. Always require a positive match on a route to advertise to peers, not the absence of a negative match.
Would love a pm on the platform in question
Sent from my iPhone
On 2011-02-25, at 12:23 PM, "Paul Stewart" <paul@paulstewart.org> wrote:
Yes, very scary actually....
Human error is unavoidable - it's going to happen at times - BUT....
In our communities design, there have been times where we have missed a tag on an inbound customer for example. It scares the crap out of me to think that a simple mistake like that could cause route leakage. Thankfully, anytime it has happened it was caught pretty quickly and fixed - in the meantime the routes simply didn't leave our network (the way it should be).
Obviously the scales are different between someone like ourselves and that of TATA - but the principles and common sense remain.
Paul
participants (5)
- Jared Mauch
- Kevin Oberman
- Mark Gauvin
- Paul Stewart
- Richard A Steenbergen