Re: Impossible Circuit
I'm not sure about the loop, but wouldn't a static (or default static) route that specifies an outbound interface rather than a next-hop router IP address cause this on a multi-access network? On Ethernet, frame relay, or ATM with connections to five or more routers, one router could output a stream of packets onto the subnet, and all five or more receiving routers could each forward them separately, producing the duplicate packets.

Just a thought,

Matt Rice
Seattle

----- Original Message -----
From: <nanog-request@nanog.org>
To: <nanog@nanog.org>
Sent: Sunday, August 10, 2008 8:21 PM
Subject: NANOG Digest, Vol 7, Issue 26
Send NANOG mailing list submissions to
        nanog@nanog.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://mailman.nanog.org/mailman/listinfo/nanog
or, via email, send a message with subject or body 'help' to
        nanog-request@nanog.org

You can reach the person managing the list at
        nanog-owner@nanog.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of NANOG digest..."
Today's Topics:
   1. Re: maybe a dumb idea on how to fix the dns problems i don't
      know.... (Chris Paul)
   2. Re: maybe a dumb idea on how to fix the dns problems i don't
      know.... (Chris Paul)
   3. Re: maybe a dumb idea on how to fix the dns problems i don't
      know.... (Cat Okita)
   4. Re: maybe a dumb idea on how to fix the dns problems i don't
      know.... (Joe Greco)
   5. impossible circuit (Jon Lewis)
   6. Re: maybe a dumb idea on how to fix the dns problems i don't
      know.... (William Herrin)
----------------------------------------------------------------------
Message: 1
Date: Sun, 10 Aug 2008 18:27:29 -0700
From: Chris Paul <chris.paul@rexconsulting.net>
Subject: Re: maybe a dumb idea on how to fix the dns problems i don't know....
To: nanog@nanog.org
Message-ID: <489F9581.4010801@rexconsulting.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Joe Greco wrote:
Pretending for a moment that it was even possible to make such large scale changes and get them pushed into a large enough number of clients to matter, you're talking about meltdown at the recurser level, because it isn't just one connection per _computer_, but one connection per _resolver stub_ per _computer_ (which, on a UNIX machine, would tend to gravitate towards one connection per process), and this just turns into an insane number of sockets you have to manage.
Couldn't the resolver libraries be changed to not use multiple connections?
I think that the text I wrote clearly assumes that there IS only one connection per resolver instance. The problem is that hostname to IP lookup is pervasive in a modern UNIX system, and is probably pretty common on other platforms, too, so you have potentially hundreds or thousands of processes, each eating up additional system file descriptors for this purpose.
Well, the way I read what you first wrote, it implied that the resolvers are now going to DoS servers with millions of connections due to each resolver stub making a TCP connection... I say this is something that, if true, can and should be changed.
Now you say that file descriptors on the client are going to run out. Isn't that changing the topic? And is that even really a problem?
So each process that needs to do a lookup opens a file descriptor for a TCP connection, right? Whereas with UDP we don't have to do this. Is this what I'm hearing you say? That I understand. (Hmm, don't UDP connections take sockets too? Not sarcastic here... just asking...)
And it is a good point, but is this client file descriptor an insurmountable problem? Also, what about the millions of connections to the server? Is it really necessary for a DNS resolver on one system to open more than one TCP connection to its caching DNS server?
I'm not saying that caching DNS servers should keep open TCP connections to authoritative name servers! OK? But how much latency do you add to an uncached recursive lookup by changing to TCP?
CP
--
Chris Paul
Rex Consulting, Inc.
email: chris.paul@rexconsulting.net
phone, direct: +1 831.706.4211
phone, toll-free: +1 888.403.8996
------------------------------
Message: 2
Date: Sun, 10 Aug 2008 18:29:00 -0700
From: Chris Paul <chris.paul@rexconsulting.net>
Subject: Re: maybe a dumb idea on how to fix the dns problems i don't know....
To: nanog@merit.edu
Message-ID: <489F95DC.4010403@rexconsulting.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Victor Jerlin wrote:
Couldn't the resolver libraries be changed to not use multiple connections?
And we'll change to IPv6 tomorrow! Total apples and oranges. We all have to patch anyhow. This is just code and firewall rules. IPv6 is way more complicated, friend.
--
Chris Paul
Rex Consulting, Inc.
email: chris.paul@rexconsulting.net
phone, direct: +1 831.706.4211
phone, toll-free: +1 888.403.8996
------------------------------
Message: 3
Date: Sun, 10 Aug 2008 21:34:26 -0400 (EDT)
From: Cat Okita <cat@reptiles.org>
Subject: Re: maybe a dumb idea on how to fix the dns problems i don't know....
To: Chris Paul <chris.paul@rexconsulting.net>
Cc: nanog@merit.edu
Message-ID: <20080810213344.D15677@gecko.reptiles.org>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
On Sun, 10 Aug 2008, Chris Paul wrote:
And we'll change to IPv6 tomorrow! Total apples and oranges. We all have to patch anyhow. This is just code and firewall rules. IPv6 is way more complicated, friend.
No - IPv6 is just code (and not even firewall rules).
cheers!
==========================================================================
"A cat spends her life conflicted between a deep, passionate and profound
desire for fish and an equally deep, passionate and profound desire to
avoid getting wet.  This is the defining metaphor of my life right now."
------------------------------
Message: 4
Date: Sun, 10 Aug 2008 21:35:58 -0500 (CDT)
From: Joe Greco <jgreco@ns.sol.net>
Subject: Re: maybe a dumb idea on how to fix the dns problems i don't know....
To: chris.paul@rexconsulting.net (Chris Paul)
Cc: nanog@nanog.org
Message-ID: <200808110235.m7B2Zwxn028353@aurora.sol.net>
Content-Type: text/plain; charset=us-ascii
I think that the text I wrote clearly assumes that there IS only one connection per resolver instance. The problem is that hostname to IP lookup is pervasive in a modern UNIX system, and is probably pretty common on other platforms, too, so you have potentially hundreds or thousands of processes, each eating up additional system file descriptors for this purpose.
Well, the way I read what you first wrote, it implied that the resolvers are now going to DoS servers with millions of connections due to each resolver stub making a TCP connection... I say this is something that, if true, can and should be changed.
Sure. We can introduce a new feature, called a "local recurser," which will do unified name resolution for all lookups asked for by any process on the box.
Now, of course, the box enjoys certain benefits, such as being able to remember who "MX nanog.org" is the second time without having to bother an external recurser. And a hypothetical ability to forward all requests via TCP to the external recurser. Except, why bother? Now that you have the capability right on the box, why be dependent on anything else? Might as well just let it resolve everything by itself.
Of course, the box also enjoys certain other liabilities, such as: the next time all the name servers in the world need to be upgraded, you now have that many more recursers running on unattended autopilot (because heaven knows, most PCs run without a professional admin to keep things up to snuff, and this last problem wasn't exactly the sort of thing you can just "auto-update," because someone actually has to verify that there aren't externalities such as NAT devices, etc.!)
Sounds like a real fun time.
Now you say that file descriptors on the client are going to run out. Isn't that changing the topic? And is that even really a problem?
Actually, it's quite a problem, for the server. Try, sometime, having a few thousand file descriptors all open, and then running select() on that fdset. But it's not even really that pleasant on many clients. It's a kernel consumable. You try to avoid introducing additional requirements without a good reason.
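To make that concrete, here's a rough sketch of the pattern in Python (illustrative only -- the port and buffer sizes are made up): a server holding one TCP socket per client stub has to hand the entire fd set to select() on every pass, so every idle connection makes every call a bit more expensive.

import select, socket

# Sketch: a caching server holding one TCP socket per client stub.
# Each pass hands the ENTIRE fd set to select(), which is scanned
# linearly -- cost grows with connection count even when nearly all
# sockets are idle.  (Classic C select() also caps out at FD_SETSIZE,
# typically 1024 descriptors.)
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("", 5353))            # stand-in port, not 53
listener.listen(128)
clients = []

while True:
    readable, _, _ = select.select([listener] + clients, [], [])
    for s in readable:
        if s is listener:
            conn, _ = s.accept()
            clients.append(conn)     # one more fd to scan on every pass
        else:
            data = s.recv(4096)
            if not data:             # client went away; reclaim the fd
                clients.remove(s)
                s.close()
            # else: parse the length-prefixed DNS query and answer it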
So each process that needs to do a lookup opens a file descriptor for a TCP connection, right? Whereas with UDP we don't have to do this. Is this what I'm hearing you say? That I understand. (Hmm, don't UDP connections take sockets too? Not sarcastic here... just asking...)
You open and then close it for UDP. You can do that for TCP, too, at a substantial penalty.
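For concreteness, here's the difference sketched in Python (the recurser address is an RFC 5737 placeholder, and a real client would loop on recv()):

import socket, struct

SERVER = ("192.0.2.1", 53)          # placeholder recurser address

def build_query(name, txid=0x1234, qtype=1):
    # Minimal RFC 1035 question: header + QNAME + QTYPE (A) + QCLASS (IN).
    hdr = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)   # RD bit set
    qname = b"".join(bytes([len(p)]) + p.encode() for p in name.split("."))
    return hdr + qname + b"\x00" + struct.pack(">HH", qtype, 1)

wire = build_query("nanog.org")

# UDP: one datagram out, one back, socket gone.  No handshake.
u = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
u.sendto(wire, SERVER)
reply, _ = u.recvfrom(4096)
u.close()

# TCP, opened and closed per query: SYN/SYN-ACK/ACK before the question,
# FIN teardown after it, plus a two-byte length prefix on each message
# (RFC 1035 section 4.2.2).  Same single query, several extra round trips.
t = socket.create_connection(SERVER)
t.sendall(struct.pack(">H", len(wire)) + wire)
rlen = struct.unpack(">H", t.recv(2))[0]
reply = t.recv(rlen)                # a real client would loop here
t.close()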
And it is a good point, but is this client file descriptor an insurmountable problem? Also, what about the millions of connections to the server? Is it really necessary for a DNS resolver on one system to open more than one TCP connection to its caching DNS server?
There is no "DNS resolver on one system" unless you put one there. At which point, you can safely ask the question of why would you then connect to a caching server (there are good reasons, in some cases).
The way libresolv works is that it takes those "nameserver" things listed in resolv.conf and sends requests to them. Since any program that uses the network is likely to be linked to libresolv, you can have lots of different programs on a system, each of which may want to resolve different name resources. There is no monolithic "thing" on a box to do name resolution unless you put one there.
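Per process, it boils down to something like this toy rendition of the resolv.conf scan (Python standing in for libresolv's C, and emphatically not its actual logic):

def nameservers(path="/etc/resolv.conf"):
    # Collect the "nameserver" entries that every libresolv-linked
    # process consults; each process does this independently -- there
    # is no shared resolver unless you install one.
    servers = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "nameserver":
                servers.append(fields[1])
    return servers

print(nameservers())    # e.g. ['192.0.2.1', '192.0.2.2']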
I'm not saying that caching DNS servers should keep open TCP connections to authoritative name servers! OK? But how much latency do you add to an uncached recursive lookup by changing to TCP?
Since latency would not be extremely high on my list of concerns with this plan, I'll pass and say "I don't really care to speculate." There are many other ways you'll have lit your hair on fire before latency is a big concern.
... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then
I won't contact you again." - Direct Marketing Ass'n position on e-mail spam (CNN)
With 24 million small businesses in the US alone, that's way too many apples.
------------------------------
Message: 5
Date: Sun, 10 Aug 2008 23:15:47 -0400 (EDT)
From: Jon Lewis <jlewis@lewis.org>
Subject: impossible circuit
To: nanog@nanog.org
Message-ID: <Pine.LNX.4.61.0808102146400.5503@soloth.lewis.org>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
After all the messages recently about how to fix DNS, I was seriously tempted to title this message "And now, for something completely different", but impossible circuit is more descriptive.
Before you read further, I need everyone to put on their thinking WAY outside the box hats. I've heard from enough people already that I'm nuts and what I'm seeing can't happen, so it must not be happening...even though we see the results of it happening.
I've got this private line DS3. It connects cisco 7206 routers in Orlando (at our data center) and in Ocala (a colo rack in the Embarq CO).
According to the DLR, it's a real circuit: various portions of it ride varying-sized OC circuits, and then it's handed off to us at each end the usual way (copper/coax) and plugged into PA-2T3 cards.
Last Tuesday, at about 2:30 PM, "something bad happened." We saw a serious jump in traffic to Ocala, and in particular we noticed that one customer's connection (a group of load-sharing T1s) was just totally full. We quickly assumed it was a DDoS aimed at that customer, but looking at the traffic, we couldn't pinpoint anything that wasn't an expected flow.
Then we noticed the really weird stuff. Pings to anything in Ocala responded with multiple dupes and TTL exceeded messages from a Level3 IP. Traceroutes to certain IPs in Ocala would get as far as our Ocala router, then inexplicably hop onto Sprintlink's network, come back to us over our Level3 transit connection, get to Ocala, then hop over to Sprintlink again, repeating that loop as many times as max TTL would permit. Pings from router to router crossing just the DS3 would work, but we'd see 10 duplicate packets for every 1 expected packet. BTW, the cisco CLI hides dupes unless you turn on ip icmp debugging.
I've seen somewhat similar things (though contained within an AS) with MPLS and routing misconfigurations, but traffic jumping off our network (to a network to which we're not directly connected) was seemingly impossible. We did all sorts of things to troubleshoot it (studied our router configs in rancid, temporarily shut every interface on the Ocala side other than the DS3, changed IOS versions, changed out the hardware, opened a ticket with cisco TAC), but then it occurred to me that if traffic was actually jumping off our network and coming back in via Level3, I could see/block at least some of it using an ACL on our interface to Level3. How do you explain it, when you ping the remote end of a DS3 interface with a single echo request packet and see 5 copies of that echo request arrive at one of your transit provider interfaces?
Here's a typical traceroute with the first few hops (from my home internet connection) removed. BTW, hop 9 is a customer router conveniently configured with no ip unreachables.
 7  andc-br-3-f2-0.atlantic.net (209.208.9.138)  47.951 ms  56.096 ms  56.154 ms
 8  ocalflxa-br-1-s1-0.atlantic.net (209.208.112.98)  56.199 ms  56.320 ms  56.196 ms
 9  * * *
10  sl-bb20-dc-6-0-0.sprintlink.net (144.232.8.174)  80.774 ms  81.030 ms  81.821 ms
11  sl-st20-ash-10-0.sprintlink.net (144.232.20.152)  75.731 ms  75.902 ms  77.128 ms
12  te-10-1-0.edge2.Washington4.level3.net (4.68.63.209)  46.548 ms  53.200 ms  45.736 ms
13  vlan69.csw1.Washington1.Level3.net (4.68.17.62)  42.918 ms vlan79.csw2.Washington1.Level3.net (4.68.17.126)  55.438 ms vlan69.csw1.Washington1.Level3.net (4.68.17.62)  42.693 ms
14  ae-81-81.ebr1.Washington1.Level3.net (4.69.134.137)  48.935 ms ae-61-61.ebr1.Washington1.Level3.net (4.69.134.129)  49.317 ms ae-91-91.ebr1.Washington1.Level3.net (4.69.134.141)  48.865 ms
15  ae-2.ebr3.Atlanta2.Level3.net (4.69.132.85)  59.642 ms  56.278 ms  56.671 ms
16  ae-61-60.ebr1.Atlanta2.Level3.net (4.69.138.2)  47.401 ms  62.980 ms  62.640 ms
17  ae-1-8.bar1.Orlando1.Level3.net (4.69.137.149)  40.300 ms  40.101 ms  42.690 ms
18  ae-6-6.car1.Orlando1.Level3.net (4.69.133.77)  40.959 ms  40.963 ms  41.016 ms
19  unknown.Level3.net (63.209.98.66)  246.744 ms  240.826 ms  239.758 ms
20  andc-br-3-f2-0.atlantic.net (209.208.9.138)  39.725 ms  37.751 ms  42.262 ms
21  ocalflxa-br-1-s1-0.atlantic.net (209.208.112.98)  43.524 ms  45.844 ms  43.392 ms
22  * * *
23  sl-bb20-dc-6-0-0.sprintlink.net (144.232.8.174)  63.752 ms  61.648 ms  60.839 ms
24  sl-st20-ash-10-0.sprintlink.net (144.232.20.152)  66.923 ms  65.258 ms  70.609 ms
25  te-10-1-0.edge2.Washington4.level3.net (4.68.63.209)  67.106 ms  93.415 ms  73.932 ms
26  vlan99.csw4.Washington1.Level3.net (4.68.17.254)  88.919 ms  75.306 ms vlan79.csw2.Washington1.Level3.net (4.68.17.126)  75.048 ms
27  ae-61-61.ebr1.Washington1.Level3.net (4.69.134.129)  69.508 ms  68.401 ms ae-71-71.ebr1.Washington1.Level3.net (4.69.134.133)  79.128 ms
28  ae-2.ebr3.Atlanta2.Level3.net (4.69.132.85)  64.048 ms  67.764 ms  67.704 ms
29  ae-71-70.ebr1.Atlanta2.Level3.net (4.69.138.18)  68.372 ms  67.025 ms  68.162 ms
30  ae-1-8.bar1.Orlando1.Level3.net (4.69.137.149)  65.112 ms  65.584 ms  65.525 ms
Our circuit provider's support people have basically just maintained that this behavior isn't possible and so there's nothing they can do about it, i.e. that the problem has to be something other than the circuit.
I got tired of talking to their brick wall, so I contacted Sprint and was able to confirm with them that the traffic in question really was inexplicably appearing on their network...and not terribly close geographically to the Orlando/Ocala areas.
So, I have a circuit that's bleeding duplicate packets onto an unrelated IP network, a circuit provider who's got their head in the sand and keeps telling me "this can't happen, we can't help you", and customers who were getting tired of receiving all their packets in triplicate (or more), saturating their connections and confusing their applications. After a while, I had to give up on finding the problem and focus on just making it stop. After trying a couple of things, the solution I found was to change the encapsulation we use at each end of the DS3. I haven't gotten confirmation of this from Sprint, but I assume they're now seeing massive input errors on the one or more circuits where our packets were/are appearing. The important thing (for me) is that this makes the packets invalid to Sprint's routers and so keeps them from forwarding the packets to us. Cisco TAC finally got back to us the day after I "fixed" the circuit...but since it was obviously not a problem with our cisco gear, I haven't pursued it with them.
The only things I can think of that might be the cause are misconfiguration in a DACS/mux somewhere along the circuit path or perhaps a mishandled lawful intercept. I don't have enough experience with either or enough access to the systems that provide the circuit to do any more than speculate. Has anyone else ever seen anything like this?
If someone from Level3 transport can wrap their head around this, I'd love to know what's really going on...but at least it's no longer an urgent problem for me.
----------------------------------------------------------------------
 Jon Lewis                   |  I route
 Senior Network Engineer     |  therefore you are
 Atlantic Net                |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
------------------------------
Message: 6
Date: Sun, 10 Aug 2008 23:21:01 -0400
From: "William Herrin" <herrin-nanog@dirtside.com>
Subject: Re: maybe a dumb idea on how to fix the dns problems i don't know....
To: nanog@merit.edu
Message-ID: <3c3e3fca0808102021q5f64733em4cde0b8e00ffb697@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
On Sat, Aug 9, 2008 at 5:18 PM, Chris Paul <chris.paul@rexconsulting.net> wrote:
Sorry if this is real stupid for some reason, because I don't think about DNS all day (I'm the LDAP dude), but since we have faster networks and faster CPUs today, what would be the harm in switching to TCP for DNS clients? The latency on the web isn't DNS anymore, it seems to me...
Latency on in-addr lookups, where you typically traverse multiple forward trees to find the NS servers, would seriously suck. At best, a TCP-based lookup performs at about a third of the speed of a UDP lookup. Worse, unless your implementation is carefully optimized and you make sure the OS isn't adding options to the front of the handshake. You have at least the whole syn/synack/ack handshake before you can even ask the question.
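Back-of-envelope, with made-up numbers (40 ms RTT to each server, four servers walked to resolve an in-addr chain):

rtt = 0.040                 # assumed round trip to each server
servers_walked = 4          # e.g. root -> TLD -> zone -> in-addr target

udp = 1 * rtt               # query/response is a single round trip
tcp = 2 * rtt               # handshake first, then query/response --
                            # and more if the stack negotiates options
                            # or you wait out a polite teardown

print(udp * servers_walked)   # 0.16 s for the walk over UDP
print(tcp * servers_walked)   # 0.32 s minimum over TCP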
Then there's the server cost associated with keeping that much state...
On Sat, Aug 9, 2008 at 6:10 PM, Matt F <matt@credibleinstitution.org> wrote:
Why not just require TCP for a lookup if a response with an incorrect TXID is received? You could require TCP for just the one lookup or for some configured interval, say 1 hour. That should slow attackers down substantially.
Because the attacker is using a sequence of lookups in order to hit one that lets him poison the cache. That is, he looks up a.google.com, then he looks up b.google.com, then c.google.com, etc. until he gets one where the server accepts his fake DNS server record for google.com.
To be an effective defense, you'd have to do TCP lookups for the whole scope ({anything}.google.com) for some period of time following the bad ID. That in turn would open up a potential DOS where an attacker could force the DNS server to fall back on TCP for essentially everything, overwhelming it.
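The arithmetic behind "until he gets one" (the race size below is invented) shows why no single lookup has to succeed, and why a per-name TCP fallback doesn't help:

spoofed_per_race = 100                    # forged replies landed per lookup
p_per_race = spoofed_per_race / 65536.0   # must match the 16-bit TXID
expected_races = 1 / p_per_race           # a.google.com, b.google.com, ...

print(expected_races)   # ~655 throwaway names and the cache is poisoned;
                        # forcing TCP on names already raced does nothing
                        # about name number 656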
On Sat, Aug 9, 2008 at 8:25 PM, brett watson <brett@the-watsons.org> wrote:
On Aug 9, 2008, at 3:48 PM, Chris Paul wrote:
Hey authority DNS server operators. Can you make a change to your servers to always allow TCP client connections? Would this be difficult? What would be the harm?
SYN flooding?
SYN flooding is a solved problem.
On Sat, Aug 9, 2008 at 6:04 PM, Joe Abley <jabley@ca.afilias.info> wrote:
TCP works pretty well with anycast too, if you're careful. It's helpful if your transactions are short-lived.
Define "careful." It's always possible for someone to find themselves with an equal-cost path to two different servers in the anycast set. Add per-packet load balancing at the fork (which is outside the control of the server operator) and what happens is the request times out and the resolver fails over to the other NS record that isn't anycasted.
Though the protocol is simple enough that it might be possible to fake it. Build yourself a DNS-only stateless TCP stack for the anycasted address. Have the server send the syn-ack without creating any state. The request will almost certainly be entirely contained in one packet, so when it arrives reply to it without creating any state. Ship off as many packets as you need to reply followed by a Fin. Blindly ack any packet that looks like it needs it. If any packets are lost, there won't be any retransmit (you haven't really established a TCP connection) so the query will time out and retry.
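Here's a toy of that, using scapy as the packet tool (an assumption, as is make_dns_reply(), a stand-in for a real DNS answer builder; you'd also have to stop the kernel from RST-ing connections it knows nothing about, say with a firewall rule):

import hashlib, struct
from scapy.all import IP, TCP, Raw, send, sniff

ANYCAST = "192.0.2.53"    # placeholder for the anycasted address
SECRET = b"rotate-me"     # keys the SYN-cookie-style sequence numbers

def isn(src_ip, src_port):
    # Derive the "ISN" from the 4-tuple so any later packet can be
    # answered by recomputing it -- zero per-connection state.
    h = hashlib.sha256(SECRET + src_ip.encode() +
                       src_port.to_bytes(2, "big")).digest()
    return int.from_bytes(h[:4], "big")

def handle(pkt):
    t = pkt[TCP]
    out = IP(src=ANYCAST, dst=pkt[IP].src) / TCP(sport=53, dport=t.sport)
    if t.flags.S and not t.flags.A:
        # SYN in, SYN-ACK out, remember nothing.
        out[TCP].flags = "SA"
        out[TCP].seq = isn(pkt[IP].src, t.sport)
        out[TCP].ack = t.seq + 1
    elif len(t.payload) > 2:
        # Assume the whole query arrived in one segment (true for a
        # normal DNS question).  Reply with data + FIN, still stateless.
        query = bytes(t.payload)[2:]             # strip the length prefix
        answer = make_dns_reply(query)           # hypothetical helper
        out = out / Raw(struct.pack(">H", len(answer)) + answer)
        out[TCP].flags = "FPA"
        out[TCP].seq = isn(pkt[IP].src, t.sport) + 1
        out[TCP].ack = t.seq + len(t.payload)
    elif t.flags.F:
        # Blindly ack the FIN; nothing else needs an answer.  There are
        # no retransmits here -- a lost packet means the client times
        # out and retries, exactly as described above.
        out[TCP].flags = "A"
        out[TCP].seq = t.ack
        out[TCP].ack = t.seq + 1
    else:
        return
    send(out, verbose=False)

sniff(filter="tcp and dst host " + ANYCAST + " and dst port 53", prn=handle)

The SYN-cookie-style sequence number is what buys the statelessness: the box can validate and answer any mid-"connection" packet by recomputing the hash instead of looking anything up.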
Regards, Bill Herrin
--
William D. Herrin ................ herrin@dirtside.com  bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
------------------------------
_______________________________________________
NANOG mailing list
NANOG@nanog.org
http://mailman.nanog.org/mailman/listinfo/nanog
End of NANOG Digest, Vol 7, Issue 26
************************************