Anyone seen this kind of problem? SIP traffic not getting to destination but traceroute does
We ran into a strange situation yesterday that I am still trying to figure out. We have many VoIP customers, but yesterday a select few of them suddenly couldn't reach the SIP provider's network from our network.
I could traceroute to the SIP provider's server from the affected clients' IPs just fine. I confirmed that the SIP traffic was leaving our network out the interface to the upstream provider, and the SIP provider says they couldn't see the SIP traffic come into their border router.
SIP traffic coming from the SIP provider to the affected customers came through fine. Only the Us -> SIP server direction was a problem.
I thought there might be some strange BGP issue going on, but we had other customers within the same /24 as the affected customers and they were connecting fine.
The traffic at the time traversed
Our network -> Qwest/CenturyLink -> Level 3 -> SIP provider
I changed the routing around so it would go through our other upstream, AT&T, and it started working. With AT&T, the route was
Our network -> AT&T -> Level 3 -> SIP provider
So my question is: is it possible there is some kind of filter at Qwest or Level 3 that is dropping only udp/5060 traffic for a select few IPs? That's the only explanation I can come up with, other than the whole Juniper BGP issue 2 days ago having left something in between in a strange state. I read the post about XO doing filtering on transit traffic, but I haven't seen anyone say Level 3 or Qwest is doing the same.
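One way to test the "udp/5060 is dropped only for some source IPs" theory is to send a SIP OPTIONS request from an affected address and from an unaffected one, and compare whether any reply comes back. The following is a minimal sketch, not something posted in the thread: the function names, the `probe` user part, and the host name in the usage note are assumptions for illustration. It uses an ephemeral source port, so it exercises only a filter that matches on destination port 5060.

```python
import socket
import uuid


def build_options(target_host, local_ip, local_port, target_port=5060):
    """Build a minimal SIP OPTIONS request (RFC 3261 layout) as bytes."""
    # Branch values must start with the RFC 3261 magic cookie "z9hG4bK".
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    lines = [
        f"OPTIONS sip:{target_host}:{target_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target_host}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:probe@{local_ip}:{local_port}>",
        "Content-Length: 0",
        "",
        "",  # the two empty entries produce the terminating blank line
    ]
    return "\r\n".join(lines).encode("ascii")


def sip_probe(target_host, target_port=5060, source_ip="", timeout=3.0):
    """Send one OPTIONS over UDP; return the reply's status line, or None.

    None means no SIP reply was received (timeout or an ICMP error).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((source_ip, 0))            # pick the source address to test
        sock.connect((target_host, target_port))
        sock.settimeout(timeout)
        local_ip, local_port = sock.getsockname()
        sock.send(build_options(target_host, local_ip, local_port, target_port))
        try:
            reply = sock.recv(4096)
        except OSError:                      # timeout or connection refused
            return None
    return reply.split(b"\r\n", 1)[0].decode("ascii", "replace")
```

Calling `sip_probe("sip.example.net", source_ip="192.0.2.10")` from a host that owns an affected address, then again with an unaffected address, would show whether the loss follows the source IP. Both values here are placeholders.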
I can't say I have a specific answer to your question, but yesterday I was seeing major packet loss on outbound audio from all my VoIP customers using Qwest and going in to servers on L3. It's entirely possible that SIP was also being lost, just the audio was the more notable and pressing issue. It seems to be resolved at this point, but we have not yet heard from Qwest what the actual problem was.
This was with sites in Northeast Ohio and the Chicago area connecting to servers in New York and LA for what it's worth.
----------
Sean Harlow
sean@seanharlow.info
What was the timeframe for your issues? Just curious since we saw some strangeness last night.
Preston
It started sometime Tuesday morning. I have yet to set the route back to Qwest. I am going to do that tonight and test it.
I saw the problems starting around 09:30 Eastern and continuing past 17:00. Looking through ticket notes I had missed when writing my previous reply, it seems that a fix was confirmed around 22:30, which involved a faulty piece of equipment being replaced. I do not have specifics on what went wrong or when it was actually fixed, though.
----------
Sean Harlow
sean@seanharlow.info
Yes! Yesterday, from 9AM-10AM PST, I had a Qwest client transiting Level3 where traceroutes were working, but SIP registrations were not. They were leaving fine, but not being received on the destination side. Then from 10AM-2PM PST, for the same client, registrations and invites were working, but "180 RINGING" was being eaten. Things worked fully at 2PM. We only contacted Level3, and they didn't see any issues at around 1:45PM PST.
Regards,
Owen
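A symptom like the one above, where registrations and invites pass but "180 Ringing" is eaten, is easiest to pin down by comparing what one side sent with what the other side received. Below is a minimal sketch under the assumption that the SIP payloads have already been extracted from packet captures taken at both ends; the function names are hypothetical and it only looks at the first line of each message.

```python
from collections import Counter


def sip_kind(payload):
    """Classify a SIP message by its start line.

    Returns the status code for responses (for example "180"),
    the method for requests (for example "REGISTER"), else "unknown".
    """
    first = payload.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = first.split(" ", 2)
    if len(parts) >= 2 and parts[0] == "SIP/2.0" and parts[1].isdigit():
        return parts[1]                      # response: SIP/2.0 <code> <reason>
    if len(parts) == 3 and parts[2] == "SIP/2.0":
        return parts[0]                      # request: <METHOD> <uri> SIP/2.0
    return "unknown"


def missing_in_transit(sent_payloads, received_payloads):
    """Count message kinds seen at the sender but not at the receiver."""
    sent = Counter(sip_kind(p) for p in sent_payloads)
    received = Counter(sip_kind(p) for p in received_payloads)
    # Counter subtraction keeps only kinds with a positive remainder.
    return dict(sent - received)
```

If the result contains only one kind of message, such as `"180"`, that points at something selective on the path rather than general packet loss.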
On Wed, Nov 9, 2011 at 1:47 PM, Jay Nakamura <zeusdadog@gmail.com> wrote:
So my question is: is it possible there is some kind of filter at Qwest or Level 3 that is dropping only udp/5060 traffic for a select few IPs? That's the only explanation I can come up with, other than
I ran into exactly this problem last week with Rogers. All traffic from the client except udp/5060 could be received by us, and udp/5060 was blocked. We tested other IP addresses on our (provider) side and did not find any blocking there, so we assigned a new IP to the SIP gateway.
I hardly think this can be an ordinary malfunction, but good luck getting a phone company to troubleshoot a problem with their subscribers using mobile data to connect to a third-party voice gateway...
--
Jeff S Wheeler <jsw@inconcepts.biz>
Sr Network Operator / Innovative Network Concepts
Well, just a couple of days ago, we discussed that XO does this kind of rifle-bullet filtering in certain circumstances; is any party getting their connectivity from them?
Cheers,
-- jra
--
Jay R. Ashworth           Baylink                        jra@baylink.com
Designer                  The Things I Think             RFC 2100
Ashworth & Associates     http://baylink.pitas.com       2000 Land Rover DII
St Petersburg FL USA      http://photo.imageinc.us       +1 727 647 1274
I've seen UDP/5060 be intercepted or blocked by various providers. This is common in international markets.
If you are doing VoIP over the public internet, it may be worthwhile to invest in software or hardware that can VPN either 'back' or 'out' to the internet.
I have a PPTP VPN solution I use to escape various hotel networks. You can even do an install on a Linux box with the poptop/pptpd solution. (Having an ssh server on tcp/80 and tcp/443 also can help, and is part of 'being prepared'.)
- Jared
I've found tools like tcptraceroute (the name is deceiving, UDP is the default) and hping to be invaluable in tracking down issues like these that are obviously above the routing and into the transport layer.
I'm not sure how an IP transit provider (who should be providing routing/switching) screws up transport layer connections - looks like they are arbitrarily "managing" client data. Just my $0.02.
--Blake
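The idea behind these tools is to probe with the same protocol and destination port as the failing traffic, so that a port-specific filter shows up as the hop where responses stop. The following is a rough sketch of a hop-limited UDP probe aimed at port 5060, not a replacement for the tools named above. It assumes IPv4, a Unix-like host, and root privileges for the raw ICMP socket; the function names are hypothetical.

```python
import socket
import struct
import time


def parse_icmp_error(packet):
    """Return (icmp_type, icmp_code, quoted_udp_dst_port) or None.

    `packet` is a raw IPv4 datagram as delivered by a raw ICMP socket.
    """
    if len(packet) < 20:
        return None
    ihl = (packet[0] & 0x0F) * 4             # outer IPv4 header length
    icmp = packet[ihl:]
    if len(icmp) < 8 + 20 + 8:
        return None
    icmp_type, icmp_code = icmp[0], icmp[1]
    if icmp_type not in (3, 11):             # 3 = unreachable, 11 = time exceeded
        return None
    inner = icmp[8:]                         # quoted IPv4 header + 8 payload bytes
    inner_ihl = (inner[0] & 0x0F) * 4
    if inner[9] != socket.IPPROTO_UDP or len(inner) < inner_ihl + 4:
        return None
    _, dst_port = struct.unpack("!HH", inner[inner_ihl:inner_ihl + 4])
    return icmp_type, icmp_code, dst_port


def udp_trace(dest, port=5060, max_hops=30, timeout=2.0):
    """Yield (ttl, responder_ip or None) for UDP probes sent to `port`."""
    dest_ip = socket.gethostbyname(dest)
    with socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP) as rx, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        for ttl in range(1, max_hops + 1):
            tx.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
            tx.sendto(b"probe", (dest_ip, port))
            hop, deadline = None, time.monotonic() + timeout
            while hop is None:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                rx.settimeout(remaining)
                try:
                    packet, (addr, _) = rx.recvfrom(1024)
                except socket.timeout:
                    break
                info = parse_icmp_error(packet)
                if info and info[2] == port:  # ignore unrelated ICMP traffic
                    hop = addr
            yield ttl, hop
            if hop == dest_ip:                # destination answered with an ICMP error
                break
```

A SIP server listening on 5060 will usually not answer these probes, so the trace is expected to end in silence near the destination. What matters is comparing the last responding hop between an affected and an unaffected source address.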
On 11/9/2011 4:45 PM, Blake Hudson wrote:
I'm not sure how an IP transit provider (who should be providing routing/switching) screws up transport layer connections - looks like they are arbitrarily "managing" client data. Just my $0.02.
With today's routers, all sorts of weird things can go wrong, especially if it's a hardware failure. I had an IO/FE go out on a 7200 (which is as software as you get), which contributed to a lot of weirdness. It started when the IGP updated state information on the IO card's FE, which shut down MPLS switching on the router, but the LSP itself was still considered up. It then showed by freaking out the neighbor 7206 when we rebooted the failing one (we could no longer ping the loopback of the neighbor router with or without using the LSP, but all IGP was up and you could ping/telnet/ssh to any other IP). Finally the reboot itself showed the true issue (it required multiple power cycles and a reset of the ATA card to even load IOS in an unstable state).
I don't even want to think what happens when a high end router's linecard starts to fail.
Jack
It may also be related to QoS policy inside the carriers. Some time ago I saw exactly the same symptoms with Verizon when SIP signaling was sent marked as EF. Re-marking it down to CS1 or CS3 (I don't remember exactly which) solved the problem.
Michael
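To check whether a carrier treats signaling differently depending on its DSCP marking, a test client can send otherwise identical packets with different markings and see which ones arrive. Here is a minimal sketch of setting the marking on a UDP socket. It assumes an IPv4 socket on a platform that honors `IP_TOS` (Linux does); the helper names are hypothetical. The code points themselves are the standard values: EF is 46, CS3 is 24, CS1 is 8.

```python
import socket

# Standard DSCP code points mentioned in the thread.
DSCP = {"EF": 46, "CS3": 24, "CS1": 8}


def tos_byte(dscp):
    """Convert a 6-bit DSCP value to the 8-bit IPv4 TOS byte.

    DSCP occupies the upper six bits; the low two bits are ECN.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def marked_udp_socket(dscp_name):
    """Return a UDP socket whose outgoing packets carry the named DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(DSCP[dscp_name]))
    return sock
```

Sending the same SIP request through `marked_udp_socket("EF")` and `marked_udp_socket("CS3")` and comparing the results would show whether the marking is what triggers the drop. Routers along the path may also rewrite the marking, so a capture at the far end is the reliable check.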
I just removed the route to our other provider and traffic is going out Qwest again. The problem seems to be gone now. As others had similar problems during the same period using Qwest, it must have been some strange issue with Qwest.
participants (10)
- Blake Hudson
- Jack Bates
- Jared Mauch
- Jay Ashworth
- Jay Nakamura
- Jeff Wheeler
- Michael Ulitskiy
- Owen Roth
- Preston Parcell
- Sean Harlow