State of QoS peering in NANOG
Folks,

The Canadian telecommunications regulator, the CRTC, has just launched a public notice with possible worldwide implications IMHO, Telecom Notice of Consultation CRTC 2011-206: http://www.crtc.gc.ca/eng/archive/2011/2011-206.htm I believe this is the very first regulatory inquiry into IP-to-IP interconnection for PSTN local interconnection.

One of the postulates that I intend to defend is that in the PSTN today, in addition to interconnecting for the purpose of exchanging voice calls, it is possible to LOCALLY (at the Local Interconnection Region, roughly a US LATA) interconnect with guaranteed QoS for ISDN video conferencing. In other words, there is more to PSTN interconnection than support for the G.711 codec; other codecs and protocol suites, such as H.320, are supported as well.

This brings me to a point: why should we lose this important feature of the PSTN, support for multiple codecs, by carelessly levelling IP-to-IP interconnection down to G.711 only? Video conferencing on the Internet, particularly at high resolution, is not a reality today, to say the least, let alone guessing what the future will hold. Why not consider HD audio?

Therefore:

A) I want to capture all instances where this issue has been addressed worldwide.

B) I also want to understand what is going on insofar as enabling guaranteed QoS peering across BGP-4 interconnections in the NANOG community.

C) I also want to understand whether there are inter-service-provider RSVP or other per-session QoS establishment protocols.

I call upon the NANOG community to consider this proceeding as very important and to contribute to this thread. I will try to provide a forum for discussing this outside of NANOG when required.

Regards,

-=Francois=-
In a message written on Sat, Apr 02, 2011 at 04:00:30PM -0400, Francois Menard wrote:
One of the postulates that I intend to defend is that in the PSTN today, in addition to interconnecting for the purpose of exchanging voice calls, it is possible to LOCALLY (at the Local Interconnection Region, roughly a US LATA) interconnect with guaranteed QoS for ISDN video conferencing.
The PSTN "features" fixed, known bandwidth. QoS isn't really the right term. When I nail up a BRI, I know I have 128 kb/s of bandwidth, never more, never less. There is no function on that channel similar to IP QoS.

When talking about IP QoS, people like to talk about guaranteed or reserved bandwidth for particular applications. The reality, though, is that's not how IP QoS works. IP QoS is really about identifying which traffic can be thrown away first in the face of congestion. "Guaranteeing" 128 kb/s for a video call really means making sure all other traffic is thrown away first, in the face of congestion.
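A toy sketch of the point above, that IP QoS does not reserve bandwidth but decides which packets are discarded first when a link is congested. All names and numbers here are illustrative, not drawn from any real router implementation:

```python
def drain_link(packets, capacity):
    """Transmit up to `capacity` packets, dropping low-priority ones first.

    `packets` is a list of (flow, priority) tuples; higher priority wins.
    """
    # Serve the highest-priority packets first when there is contention.
    ordered = sorted(packets, key=lambda p: p[1], reverse=True)
    sent, dropped = ordered[:capacity], ordered[capacity:]
    return sent, dropped

# Six packets arrive in an interval when the link can forward only four.
arrivals = [("web", 0), ("video", 2), ("web", 0), ("voip", 2),
            ("mail", 0), ("video", 2)]
sent, dropped = drain_link(arrivals, capacity=4)

# The "guaranteed" video/voip traffic survives only because best-effort
# traffic was thrown away first -- exactly the behavior described above.
print([flow for flow, _ in dropped])
```

Under no congestion (capacity at least the arrival count), `dropped` is empty and the policy never fires, which is the other half of the argument.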
In other words, there is more to PSTN interconnection than the support of the G.711 CODEC. Other CODECs are supported, such as H.320.
This brings me to a point. Why should we lose this important feature of the PSTN, support for multiple CODECs, as we carelessly level IP-IP interconnection down to G.711 only?
IP networks can't tell the difference between G.711, H.320, and the SMTP packets used to deliver this e-mail. IP networks know nothing about CODECs, and operate entirely on IP address and port information.
B) I also want to understand what is going on, insofar as enabling guaranteed QoS peering across BGP-4 interconnections in the NANOG community.
You're looking at the wrong point in the network. In my experience, full peering circuits are very much the exception, not the rule. While almost all the exceptions hit NANOG and are the subject of fun and lively discussion, the reality is they are rare. When there is no congestion, there is no reason to drop a packet; a QoS policy would go unused, or, if you want to look at it from the other direction, everything has 100% of the bandwidth across that link.

In an IP network, the bandwidth constraints are almost always across an administrative boundary. This means in the majority of cases across transit circuits, not peering. 80-90% of the packet loss in the network happens at the end-user access port, inbound or outbound. Another 5-10% occurs where regional or non-transit-free providers buy transit. Lastly, 3-5% occurs where there are geographic or geopolitical issues (oceans to cross, country borders with restrictive governments to cross).

Basically, you could mandate QoS on every peering link in the Internet and I suspect 99% of the end users would never notice any change. If you want to advocate for useful changes that give end users a better network experience, you need to focus your efforts in three areas:

1) Fight bufferbloat.
   http://en.wikipedia.org/wiki/Bufferbloat
   http://arstechnica.com/tech-policy/news/2011/01/understanding-bufferbloat-an...
   http://www.bufferbloat.net/

2) Get access ISPs to offer QoS on customer access ports, ideally in some user-configurable way.

3) Get ISPs who purchase transit further up the line to implement QoS with their transit provider for their customers' traffic, if they are going to run those links full.

-- 
Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
On Sat, Apr 2, 2011 at 5:56 PM, Leo Bicknell <bicknell@ufp.org> wrote:
The PSTN "features" fixed, known bandwidth. QoS isn't really the right term. When I nail up a BRI, I know I have 128 kb/s of bandwidth, never more, never less. There is no function on that channel similar to IP QoS.
The PSTN also has exactly one unidirectional flow per access port. This is not true of IP networks, where an end-user access port may have dozens of flows going at once for common web browsing, and perhaps hundreds when using P2P file-sharing applications. The lifetime of these flows may be several hours (a streaming movie) or under a second (a web browser).

Where the PSTN has channels between two access ports (which might be packetized within the backbone) and a relatively complex control plane for establishing flows, the IP network has little or no knowledge of flows. Where it does have such knowledge, it is not because a control plane exists to establish them; it is because punting from the data plane to the control plane allows flow state to be established for things like NAT.
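A toy sketch of that last point: an IP device learns about a "flow" only when a packet misses existing state and is "punted" to install it, NAT-style. The structure and addresses are purely illustrative, not a real data plane:

```python
flow_table = {}          # (src, dst, sport, dport) -> translated source port
next_port = [40000]      # next free external port for new flows

def forward(pkt):
    """Return the translated packet, creating flow state on first sight."""
    key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"])
    if key not in flow_table:
        # "Punt": no state yet, so the slow path installs some.
        flow_table[key] = next_port[0]
        next_port[0] += 1
    return {**pkt, "src": "203.0.113.1", "sport": flow_table[key]}

# Two packets of the same flow reuse the state; a new flow gets new state.
a1 = forward({"src": "10.0.0.5", "dst": "198.51.100.7", "sport": 5555, "dport": 80})
a2 = forward({"src": "10.0.0.5", "dst": "198.51.100.7", "sport": 5555, "dport": 80})
b  = forward({"src": "10.0.0.6", "dst": "198.51.100.7", "sport": 6666, "dport": 80})
assert a1["sport"] == a2["sport"] != b["sport"]   # state per flow, not per packet
```

No signalling protocol created these entries; the data traffic itself did, which is exactly the contrast with the PSTN's call-setup control plane.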
Basically, you could mandate QoS on every peering link in the Internet and I suspect 99% of the end users would never notice any change.
I don't agree with this. IMO, all DDoS traffic would suddenly be marked into the highest-priority forwarding class that doesn't have an absurdly low policer for the DDoS source's access port. As a result, DDoS would more easily cripple the network, either by hitting policers on the higher-priority traffic and killing streaming movies/VoIP/etc., or, in the absence of policers, by more easily causing significant packet loss to best-effort traffic. I think end users would notice, because their ISP would suddenly grind to a halt any time a clever DDoS was directed their way.

We will no sooner see a practical solution to this than we will one for large-scale multicast in backbone and subscriber access networks. The limitations are similar: to be effective, you need a lot more state for multicast. For a truly good QoS implementation, you need a lot more hardware counters and policers (more state). If you don't have this, all your QoS setup will do, deployed across a large Internet subscriber access network, is work a little better under ideal conditions, and probably a lot worse when subjected to malicious traffic.
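Some toy arithmetic for the scenario above: once attack traffic can mark itself into the priority class, it competes with legitimate priority traffic inside a much smaller reserved share of the link. All figures here are invented for illustration:

```python
link_mbps      = 1000
priority_share = 0.25                      # fraction reserved for the priority class
priority_mbps  = link_mbps * priority_share

legit_priority = 150                       # legitimate voip/video load, Mb/s
attack         = 400                       # spoofed "priority" attack, Mb/s

offered = legit_priority + attack
# With no per-flow policers, the class drains proportionally, so legitimate
# priority traffic suffers the same loss rate as the attack traffic.
served_fraction = min(1.0, priority_mbps / offered)
legit_delivered = legit_priority * served_fraction

# A 400 Mb/s attack cripples priority traffic on a 1 Gb/s link -- far less
# than the multiple-of-link-speed flood needed to crush the whole port.
print(round(served_fraction, 2), round(legit_delivered, 1))
```

The point of the sketch is the asymmetry: the attacker only has to overwhelm the reserved share, not the whole link.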
2) Get access ISPs to offer QoS on customer access ports, ideally in some user configurable way.
I do agree that QoS should be available to end users across access links, but I don't agree with pushing it further towards the core unless per-subscriber policers are available beyond those on access routers. Otherwise, all someone has to do to be mean to Netflix is send a short-term, high-volume DoS attack that looks like Netflix traffic towards an end-user IP. That would interrupt movie-viewing for at least as many end users as the same DoS would in the absence of any QoS, and potentially for far more. The case of per-subscriber policers pushed further towards the ISP core fares better.

-- 
Jeff S Wheeler <jsw@inconcepts.biz> Sr Network Operator / Innovative Network Concepts
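A minimal sketch of the per-subscriber policer argued for above: each subscriber's priority-marked traffic is rate-limited individually by a token bucket, so a spoofed burst aimed at one user cannot flood the shared priority class. The class name and parameters are illustrative, not a vendor API:

```python
class TokenBucket:
    """Per-subscriber token-bucket policer (rates in bits/s, sizes in bits)."""

    def __init__(self, rate_bps, burst_bits):
        self.rate, self.capacity = rate_bps, burst_bits
        self.tokens, self.last = burst_bits, 0.0

    def allow(self, now, packet_bits):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bits:
            self.tokens -= packet_bits
            return True          # conforming: keep the priority marking
        return False             # exceeding: drop, or remark to best effort

# 5 Mb/s of priority allowance per subscriber, with a 1 Mbit burst.
bucket = TokenBucket(rate_bps=5_000_000, burst_bits=1_000_000)

# A flood of 12 kbit packets arriving at the same instant exhausts the
# burst; everything beyond it is policed instead of crowding out others.
passed = sum(bucket.allow(0.0, 12_000) for _ in range(200))
print(passed)
```

The excess traffic is contained at the subscriber's own policer, which is why pushing these policers towards the core fares better than classification alone.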
In a message written on Sat, Apr 02, 2011 at 07:00:52PM -0400, Jeff Wheeler wrote:
I don't agree with this. IMO all DDoS traffic would suddenly be marked into the highest priority forwarding class that doesn't have an absurdly low policer for the DDoS source's access port, and as a result, DDoS would more easily cripple the network, either from hitting policers on the higher-priority traffic and killing streaming movies/voip/etc, or in the absence of policers, it would more easily cause significant packet loss to best-effort traffic.
Agree in part, and disagree in part. No doubt DDoS programs will try to masquerade as "high priority" traffic. This will create a new set of problems and require some new solutions. Let's separate the problem into two parts.

The first is "best effort" traffic. Provided the QoS policy only prioritizes a fraction of the bandwidth (20 to maybe 40%), the impact of a DDoS that came in prioritized would only be a few percentage points worse than a standard DDoS. Today it takes about 10x link speed to make a link "completely unusable" (although YMMV, and it depends a lot on your traffic mix and definition of unusable). With a 25% priority queue, and the DDoS hitting it, that may drop to 8x. I think it is statistically interesting, but also relatively minor.

The second problem is what happens to priority traffic. You are correct that if DDoS traffic can come in prioritized, then you only need to fill the priority queue 2x-4x to generate issues (as streaming traffic is more sensitive), assuming traffic over the limit is not dropped but rather allowed as best effort. This is likely a lower threshold than filling the entire link 5x-10x, and thus easier for the attacker. But it also only affects priority-queue traffic. I realize I'm making a value judgment, but many customers under DDoS would find things vastly improved if their video conferencing went down but everything else continued to work (if slowly), compared to today, when everything goes down.

In closing, I want to push folks back to the bufferbloat issue, though. More than once I've been asked to configure QoS on the network to support VoIP, video conferencing, or the like. These things were deployed and failed to work properly. I went into the network and _reduced_ the buffer sizes and _increased_ packet drops. Magically these applications worked fine, with no QoS. Video conferencing can tolerate a 1% packet drop, but can't tolerate a 4-second buffer delay. Many people today who want QoS are actually suffering from bufferbloat. :(

This is very hard to explain; while people on NANOG might get it, 99% of the network engineers in the world think minimizing packet loss is the goal. It is very much an uphill battle to make them understand that higher packet loss often _increases_ end-user performance on full links.

-- 
Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
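Back-of-the-envelope arithmetic for the bufferbloat point above: a large drop-tail FIFO on a slow link adds seconds of queueing delay, which hurts interactive traffic far more than a small loss rate would. The buffer and link figures are illustrative:

```python
def queue_delay_s(buffer_packets, avg_packet_bytes, link_bps):
    """Worst-case standing-queue delay when a drop-tail buffer is full."""
    return buffer_packets * avg_packet_bytes * 8 / link_bps

# A 1000-packet buffer of 1500-byte packets draining at 3 Mb/s:
delay = queue_delay_s(1000, 1500, 3_000_000)
print(round(delay, 1))   # seconds of added latency once the buffer fills
```

That configuration yields exactly the kind of multi-second buffer delay described above, while the same link with a much smaller buffer would instead show a modest, tolerable loss rate.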
-----Original Message----- From: Leo Bicknell [mailto:bicknell@ufp.org] Sent: Saturday, April 02, 2011 10:24 PM
But it also only affects priority queue traffic. I realize I'm making a value judgment, but many customers under DDoS would find things vastly improved if their video conferencing went down, but everything else continued to work (if slowly), compared to today when everything goes down.
I'd like to observe that discussion when the Netflix guys come calling on the support line - "Hey Netflix, yeah you're under attack and your subscribers can't watch videos at the moment, but the good news is that all other apps running on our network are currently unaffected". ;>
In closing, I want to push folks back to the buffer bloat issue though. More than once I've been asked to configure QoS on the network to support VoIP, Video Conferencing or the like. These things were deployed and failed to work properly. I went into the network and _reduced_ the buffer sizes, and _increased_ packet drops. Magically these applications worked fine, with no QoS.
Video conferencing can tolerate a 1% packet drop, but can't tolerate a 4 second buffer delay. Many people today who want QoS are actually suffering from buffer bloat. :(
Concur 100%. In my experience, I've gotten much better performance with VoIP/video conferencing and other delay-intolerant applications when setting buffer sizes to a temporal value rather than basing them on a _fixed_ number of packets.

Stefan Fouant
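The temporal-buffer idea in arithmetic form: size the queue for a target delay rather than a fixed packet count, so the worst-case queueing delay stays constant as link speed changes. All figures are illustrative:

```python
def buffer_packets_for_delay(link_bps, target_ms, avg_packet_bytes=1500):
    """Packet-count limit that caps standing-queue delay at `target_ms`."""
    return int(link_bps * target_ms / 1000 / (8 * avg_packet_bytes))

# A 50 ms temporal buffer is ~4 packets at 1 Mb/s but ~416 at 100 Mb/s.
slow = buffer_packets_for_delay(1_000_000, 50)
fast = buffer_packets_for_delay(100_000_000, 50)
print(slow, fast)
```

Run the comparison the other way and the problem with fixed sizing is obvious: a 416-packet buffer carried down to a 1 Mb/s link would hold roughly five seconds of traffic, exactly the delay-intolerant-application killer described above.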
On 04/03/2011 12:50 PM, Stefan Fouant wrote:
-----Original Message----- From: Leo Bicknell [mailto:bicknell@ufp.org] Sent: Saturday, April 02, 2011 10:24 PM
But it also only affects priority queue traffic. I realize I'm making a value judgment, but many customers under DDoS would find things vastly improved if their video conferencing went down, but everything else continued to work (if slowly), compared to today when everything goes down. I'd like to observe that discussion when the Netflix guys come calling on the support line - "Hey Netflix, yeah you're under attack and your subscribers can't watch videos at the moment, but the good news is that all other apps running on our network are currently unaffected". ;>
In closing, I want to push folks back to the buffer bloat issue though. More than once I've been asked to configure QoS on the network to support VoIP, Video Conferencing or the like. These things were deployed and failed to work properly. I went into the network and _reduced_ the buffer sizes, and _increased_ packet drops. Magically these applications worked fine, with no QoS.
Video conferencing can tolerate a 1% packet drop, but can't tolerate a 4 second buffer delay. Many people today who want QoS are actually suffering from buffer bloat. :( Concur 100%. In my experience, I've gotten much better performance w/ VoIP/Video Conferencing and other delay-intolerant applications when setting buffer sizes to a temporal value rather than based on a _fixed_ number of packets.
There is no magic here at all. There are dark buffers all over the Internet: some network operators run routers and broadband gear without RED enabled, much broadband gear suffers from excessive buffering, and so do our home routers and hosts.

What is happening, as I outlined at the transport area meeting at the IETF in Prague, is that by putting excessive buffers everywhere in the name of avoiding packet loss, we've destroyed TCP congestion avoidance and badly damaged slow start, while adding terrible latency and jitter. Tail drop with long buffers delays notification of congestion to TCP, and defeats the algorithms. Even without this additional problem (which causes further havoc), TCP will always fill the buffers on either side of the bottleneck link in your path. So your large buffers add latency, and when a link is saturated, the buffers on either side of the saturated link fill, and stay full (most commonly in the broadband gear, but often also in the hosts/home routers over 802.11 links). By running with AQM (or small buffers), you reduce the need for QoS (which doesn't yet exist seriously at the network edge).

See my talk at http://mirrors.bufferbloat.net/Talks/PragueIETF/ (slightly updated since the Prague IETF); you can listen to it at http://ietf80streaming.dnsalias.net/ietf80/ietf80-ch4-wed-am.mp3 A longer version of that talk is at: http://mirrors.bufferbloat.net/Talks/BellLabs01192011/

Note that there is a lot you can do immediately to reduce your personal suffering, by using bandwidth shaping to reduce/eliminate the buffer problem in your home broadband gear, and by ensuring that your 802.11 wireless bandwidth is always greater than your home broadband bandwidth (since the bloat in current home routers can be even worse than in the broadband gear). See http://gettys.wordpress.com for more detail.

Please come help fix this mess at bufferbloat.net. The bloat mailing list is bloat@lists.bufferbloat.net. We're all in this bloat together.

- Jim
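For readers unfamiliar with the RED mechanism referred to above, here is a minimal sketch of its drop-probability curve: as the average queue grows between two thresholds, packets are dropped with increasing probability, signalling TCP *before* the buffer fills. The constants are illustrative, not a tuned deployment:

```python
MIN_TH, MAX_TH = 5, 15   # average-queue thresholds, in packets
MAX_P = 0.1              # drop probability as the queue nears MAX_TH

def red_drop_probability(avg_queue):
    """RED-style early-drop probability for a given average queue depth."""
    if avg_queue < MIN_TH:
        return 0.0                              # no congestion: never drop
    if avg_queue >= MAX_TH:
        return 1.0                              # sustained overload: tail drop
    # Linear ramp between the thresholds -- the "early" in RED.
    return MAX_P * (avg_queue - MIN_TH) / (MAX_TH - MIN_TH)

# Short queues sail through; a standing queue sees early, probabilistic
# drops that tell TCP to back off long before seconds of delay build up.
print(red_drop_probability(3), red_drop_probability(10), red_drop_probability(20))
```

This is the contrast with the tail-drop-plus-long-buffers behavior criticized above: drops arrive while the queue is still small, so TCP's congestion avoidance gets timely feedback instead of a multi-second stale signal.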
-----Original Message----- From: Leo Bicknell [mailto:bicknell@ufp.org] Sent: Saturday, April 02, 2011 5:56 PM
In an IP network, the bandwidth constraints are almost always across an administrative boundary. This means in the majority of cases across transit circuits, not peering. 80-90% of the packet loss in the network happens at the end-user access port, inbound or outbound. Another 5-10% occurs where regional or non-transit-free providers buy transit. Lastly, 3-5% occurs where there are geographic or geopolitical issues (oceans to cross, country borders with restrictive governments to cross).
Hi Leo, I think you bring up some interesting points here, and my experience and observations largely lend credence to what you are saying. I'd like to know, however, just for my own personal knowledge: are the numbers you are using above based on some broad analysis or study of multiple providers, or are you deriving them from your own personal observations?

Thanks, Stefan Fouant
participants (5)

- Francois Menard
- Jeff Wheeler
- Jim Gettys
- Leo Bicknell
- Stefan Fouant