Google's peering, GGC, and congestion management
Hi,

In its peering documentation [https://peering.google.com/about/traffic_management.html], Google claims that it can drive peering links at 100% utilisation:
Congestion management
Peering ports with Google can be run at 100% capacity in the short term, with low (<1-2%) packet loss. Please note that an ICMP ping may display packet loss due to ICMP rate limiting on our platforms. Please contact us to arrange a peering upgrade.
How do they achieve this? More generally, is there any published work on how Google serves content from its CDN, the Google Global Cache? I'm especially interested in two aspects:

- for a given eyeball network, on which basis are the CDN nodes selected?
- is Google able to spread traffic over distinct peering links for the same eyeball network, in case some of the peering links become congested? If so, how do they measure congestion?

Thanks for your input,
Baptiste
On Oct 14, 2015, at 1:07 PM, Baptiste Jonglez <baptiste@bitsofnetworks.org> wrote:
In its peering documentation [https://peering.google.com/about/traffic_management.html], Google claims that it can drive peering links at 100% utilisation:
Congestion management
Peering ports with Google can be run at 100% capacity in the short term, with low (<1-2%) packet loss. Please note that an ICMP ping may display packet loss due to ICMP rate limiting on our platforms. Please contact us to arrange a peering upgrade.
How do they achieve this?
The 100% number is silly. My guess? They're at 98%.

That is easily do-able because all the traffic is coming from them. Coordinate the HTTPd on each of the servers to serve traffic at X bytes per second, ensure you have enough buffer in the switches for micro-bursts, check the NICs for silliness such as jitter, and so on. It is non-trivial, but definitely solvable.

Google is not the only company that can do this. Akamai has done it far longer. And Akamai has a much more difficult traffic mix, with -paying customers- to deal with.
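[A rough sketch of the pacing idea in Python - the class, numbers, and mechanism below are illustrative assumptions, not Google's actual implementation. Each server runs a token bucket sized so the fleet's total stays just under the port capacity:]

    import time

    class TokenBucketPacer:
        """Caps one server's send rate; burst headroom absorbs micro-bursts."""
        def __init__(self, rate_bps, burst_bytes):
            self.rate = rate_bps / 8.0   # bytes per second
            self.burst = burst_bytes     # assumes each write <= burst
            self.tokens = burst_bytes
            self.last = time.monotonic()

        def send(self, nbytes):
            """Block until nbytes may be sent without exceeding the cap."""
            while True:
                now = time.monotonic()
                self.tokens = min(self.burst,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= nbytes:
                    self.tokens -= nbytes
                    return
                time.sleep((nbytes - self.tokens) / self.rate)

    # 48 servers sharing a 40 Gbps port: cap each at 98% of its fair share.
    pacer = TokenBucketPacer(rate_bps=0.98 * 40e9 / 48, burst_bytes=1_500_000)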
More generally, is there any published work on how Google serves content from its CDN, the Google Global Cache? I'm especially interested in two aspects:
- for a given eyeball network, on which basis are the CDN nodes selected?
As for picking which GGC for each eyeball, that is called “mapping”. It varies among the different CDNs. Netflix drives it mostly from the client. That has some -major- advantages over other CDNs. Google has in the past (haven’t checked in over a year) done it by giving each user a different URL, although I think they use DNS now. Akamai uses mostly DNS, although they have at least experimented with other ways. Etc., etc.
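[A toy sketch of DNS-based mapping - the prefixes, addresses, and table below are invented for illustration, not any CDN's real data. The authoritative server answers with a different cache node depending on which eyeball network the query comes from:]

    import ipaddress

    # Hypothetical eyeball-prefix -> cache-node table.
    NODE_BY_PREFIX = {
        ipaddress.ip_network("198.51.100.0/24"): "203.0.113.10",  # node inside ISP A
        ipaddress.ip_network("192.0.2.0/24"):    "203.0.113.20",  # node inside ISP B
    }
    DEFAULT_NODE = "203.0.113.30"  # fallback cluster

    def resolve_cache_address(client_ip):
        """Return the A record the mapping system would hand this client."""
        addr = ipaddress.ip_address(client_ip)
        for prefix, node in NODE_BY_PREFIX.items():
            if addr in prefix:
                return node
        return DEFAULT_NODE

    print(resolve_cache_address("198.51.100.42"))  # -> 203.0.113.10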
- is Google able to spread traffic over distinct peering links for the same eyeball network, in case some of the peering links become congested? If so, how do they measure congestion?
Yes. Easily.

User 1 asks for Stream 1, Google sends them to Node 1. Google notices Link 1 is near full. User 2 asks for Stream 2, Google sends them to Node 2, which uses Link 2. This is possible for any set of Users, Streams, Nodes, and Links. It is even possible to send User 2 to Node 2 when User 2 wants Stream 1. Or to send User 1 to Node 2 for their second request, despite the fact they just got a stream from Node 1. There are few, if any, restrictions on the combinations. Remember, they control the servers.

All CDNs (that matter) can do this. They can re-direct users with different URLs, different DNS responses, 302s, etc., etc. It is not BGP.

Everything is much easier when you are one of the end points. (Or both, like with Netflix.) When you are just an ISP shuffling packets you neither send nor receive, things are both simpler and harder.

--
TTFN,
patrick
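[A sketch of that steering decision, assuming the control plane sees per-link telemetry - all numbers below are invented. Each new request is answered with a node whose egress link still has headroom:]

    # node -> (link capacity in bps, current load in bps); assumed telemetry
    LINKS = {
        "node1": (10e9, 9.6e9),   # Link 1 near full
        "node2": (10e9, 4.1e9),   # Link 2 has headroom
    }

    def pick_node(threshold=0.95):
        """Any node can serve any stream, so just pick the least-loaded link."""
        candidates = [(used / cap, node)
                      for node, (cap, used) in LINKS.items()
                      if used / cap < threshold]
        return min(candidates)[1] if candidates else None

    print(pick_node())  # -> "node2"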
On 15 October 2015 at 16:35, Patrick W. Gilmore <patrick@ianai.net> wrote:
The 100% number is silly. My guess? They’re at 98%.
That is easily do-able because all the traffic is coming from them. Coordinate the HTTPd on each of the servers to serve traffic at X bytes per second, ensure you have enough buffer in the switches for micro-bursts, check the NICs for silliness such as jitter, and so on. It is non-trivial, but definitely solvable.
You would not need to control the servers to do this. All you need is the usual hash function of src+dst IP+port to map sessions into buckets, and then dynamically compute how big a fraction of the buckets to route through a different path. A bit surprising that this is not a standard feature on routers.

Regards,
Baldur
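[A minimal sketch of the bucket scheme Baldur describes - the bucket count and split are assumed. Hash each flow's 4-tuple into N buckets, route the first k buckets via the alternate path, and adjust k as load changes; a flow only moves if k crosses its bucket:]

    import zlib

    N_BUCKETS = 256
    alt_buckets = 32  # fraction currently diverted: 32/256 = 12.5%

    def bucket(src_ip, dst_ip, src_port, dst_port):
        """Map a flow's 4-tuple deterministically into one of N buckets."""
        key = f"{src_ip},{dst_ip},{src_port},{dst_port}".encode()
        return zlib.crc32(key) % N_BUCKETS

    def next_hop(src_ip, dst_ip, src_port, dst_port):
        b = bucket(src_ip, dst_ip, src_port, dst_port)
        return "alternate" if b < alt_buckets else "primary"

    print(next_hop("198.51.100.42", "192.0.2.7", 51234, 443))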
On Oct 15, 2015, at 3:50 PM, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
On 15 October 2015 at 16:35, Patrick W. Gilmore <patrick@ianai.net> wrote:
The 100% number is silly. My guess? They’re at 98%.
That is easily do-able because all the traffic is coming from them. Coordinate the HTTPd on each of the servers to serve traffic at X bytes per second, ensure you have enough buffer in the switches for micro-bursts, check the NICs for silliness such as jitter, and so on. It is non-trivial, but definitely solvable.
You would not need to control the servers to do this. All you need is the usual hash function of src+dst ip+port to map sessions into buckets and then dynamically compute how big a fraction of the buckets to route through a different path.
A bit surprising that this is not a standard feature on routers.
The reason routers do not do that is that what you suggest would not work.

First, you make the incorrect assumption that inbound will never exceed outbound. Almost all CDN nodes have far more capacity between the servers and the router than the router has to the rest of the world. And CDN nodes are probably the least complicated example in large networks. The only way to ensure A < B is to control A or B - and usually A.

Second, the router has no idea how much traffic is coming in at any particular moment. Unless you are willing to move streams mid-flow, you can't guarantee this will work even if sum(in) < sum(out). Your idea would put Flow N on Port X when the SYN (or SYN/ACK) hits. How do you know how many Mbps that flow will be? You do not, therefore you cannot do it right. And do not say you'll wait for the first few packets and move them. Flows are not static.

Third…. Actually, since 1 & 2 are each sufficient to show why it doesn't work, not sure I need to go through the next N reasons. But there are plenty more.

--
TTFN,
patrick
On 15 October 2015 at 22:00, Patrick W. Gilmore <patrick@ianai.net> wrote:
The reason routers do not do that is that what you suggest would not work.
Of course it will work, and it is in fact exactly the same as your own suggestion, just implemented in the network. Besides, it _is already_ a standard feature: it is called equal-cost multipath routing. The only difference is dynamically changing the weights between the paths.
First, you make the incorrect assumption that inbound will never exceed outbound. Almost all CDN nodes have far more capacity between the servers and the router than the router has to the rest of the world. And CDN nodes are probably the least complicated example in large networks. The only way to ensure A < B is to control A or B - and usually A.
I make absolutely no assumptions about ingress (towards the ASN), as we have no control of that. There is no requirement that routing be symmetric, and it is the responsibility of whoever controls the ingress to do something if the port is overloaded in that direction. In the case of a CDN, however, the ingress will be very little: Netflix does not take much data in from its customers; it is all egress traffic towards the customers, and the CDN is in control of that. The same goes for Google.

Two non-CDN peers could use the system, but if the traffic level is symmetric, then they had better both do it.
Second, the router has no idea how much traffic is coming in at any particular moment. Unless you are willing to move streams mid-flow, you can't guarantee this will work even if sum(in) < sum(out). Your idea would put Flow N on Port X when the SYN (or SYN/ACK) hits. How do you know how many Mbps that flow will be? You do not, therefore you cannot do it right. And do not say you'll wait for the first few packets and move them. Flows are not static.
Flows can move at any time in a BGP network. As we are talking about CDNs, we can assume that we have many, many small flows (compared to port size). We can be fairly sure that traffic will not make huge jumps from one second to the next - you will have a nice curve here. You know exactly how much traffic you had in the last time period, both out through the congested port and through the alternative paths. Recalculating the weights is just a matter of assuming that the next time period will be the same, or that the delta will be the same. It is a classic control-loop problem. TCP is trying to do much the same, btw.

You can adjust how close to 100% you want the algorithm to hit. If it performs badly, give it a little bit more space.

If the time period is one second, flows can move once a second at maximum, and very few flows would be likely to move. You could get a few out-of-order packets on your flow, which is not such a big issue in a rare event.
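[A sketch of that control loop - the target and gain are assumed parameters. Once per period, compare the congested port's utilisation to a target and grow or shrink the diverted fraction proportionally:]

    def update_diverted_fraction(diverted, port_bps, capacity_bps,
                                 target=0.95, gain=0.5):
        """Proportional controller: over target -> divert more buckets."""
        error = port_bps / capacity_bps - target
        return min(1.0, max(0.0, diverted + gain * error))

    # A 10G port running at 99% while 12.5% of buckets are diverted:
    print(update_diverted_fraction(0.125, 9.9e9, 10e9))  # -> ~0.145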
Third…. Actually, since 1 & 2 are each sufficient to show why it doesn’t work, not sure I need to go through the next N reasons. But there are plenty more.
There are more reasons why this problem is hard to do on the servers :-).

Regards,
Baldur
On Oct 15, 2015, at 5:13 PM, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
On 15 October 2015 at 22:00, Patrick W. Gilmore <patrick@ianai.net> wrote:
The reason routers do not do that is that what you suggest would not work.
Of course it will work, and it is in fact exactly the same as your own suggestion, just implemented in the network. Besides, it _is already_ a standard feature: it is called equal-cost multipath routing. The only difference is dynamically changing the weights between the paths.
You are confused. But I think I see the source of your confusion. Perhaps you are only considering a single port on a multi-port router with many paths to the same destination. Sure, if you want to say that when Port X gets full (FSVO "full") you move some flows to the second-best path, yes, that is physically possible. However, that is a tiny fraction of CDN Mapping.

Plus you have a vast number of assumptions - not the least of which is that there _is_ another port to move traffic to. How many CDN nodes have you seen? You think most of them have a ton of ports to a slew of different networks? Or do they plonk a bunch of servers behind a single router (or switch!) connected to a single network (since most of them are _inside_ that network)?

My original point is that the CDN can control how much traffic is sent to each destination. Routers cannot do this.

BTW: What you suggest breaks a lot of other things - which may or may not be a good trade-off for avoiding congesting individual ports. But the idea of making identical IP path decisions inside a single router non-deterministic is... let's call it questionable.
First, you make the incorrect assumption that inbound will never exceed outbound. Almost all CDN nodes have far more capacity between the servers and the router than the router has to the rest of the world. And CDN nodes are probably the least complicated example in large networks. The only way to ensure A < B is to control A or B - and usually A.
I make absolutely no assumptions about ingress (towards the ASN), as we have no control of that. There is no requirement that routing be symmetric, and it is the responsibility of whoever controls the ingress to do something if the port is overloaded in that direction. In the case of a CDN, however, the ingress will be very little: Netflix does not take much data in from its customers; it is all egress traffic towards the customers, and the CDN is in control of that. The same goes for Google.
Two non-CDN peers could use the system, but if the traffic level is symmetric, then they had better both do it.
You are still confused.

I have 48 servers connected @ GigE to a router with 4 x 10G outbound. When all 48 get nailed, where in the hell does the extra 8 Gbps go? Whereas if I own the CDN, I can easily ensure those 48 servers never push more than 40 Gbps. Or even 20 Gbps to any single destination. Or even 10 Mbps to any single destination.

The CDN can ensure the router is -never- congested. The router itself cannot do that.
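[The arithmetic behind that example, spelled out; the per-server cap at the end is one way a CDN could enforce the limit, not something quoted from the thread:]

    servers, nic_gbps = 48, 1
    uplinks, uplink_gbps = 4, 10

    offered = servers * nic_gbps        # 48 Gbps the servers can push
    carried = uplinks * uplink_gbps     # 40 Gbps the router can forward
    excess = offered - carried          # 8 Gbps with nowhere to go
    per_server_cap_gbps = carried / servers  # cap that keeps the router clean

    print(excess, round(per_server_cap_gbps, 2))  # -> 8 0.83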
Second, the router has no idea how much traffic is coming in at any particular moment. Unless you are willing to move streams mid-flow, you can't guarantee this will work even if sum(in) < sum(out). Your idea would put Flow N on Port X when the SYN (or SYN/ACK) hits. How do you know how many Mbps that flow will be? You do not, therefore you cannot do it right. And do not say you'll wait for the first few packets and move them. Flows are not static.
Flows can move at any time in a BGP network. As we are talking about CDNs, we can assume that we have many, many small flows (compared to port size). We can be fairly sure that traffic will not make huge jumps from one second to the next - you will have a nice curve here. You know exactly how much traffic you had in the last time period, both out through the congested port and through the alternative paths. Recalculating the weights is just a matter of assuming that the next time period will be the same, or that the delta will be the same. It is a classic control-loop problem. TCP is trying to do much the same, btw.
You can adjust how close to 100% you want the algorithm to hit. If it performs badly, give it a little bit more space.
If the time period is one second, flows can move once a second at maximum and very few flows would be likely to move. You could get a few out of order packets on your flow, which is not such a big issue in a rare event.
This makes me lean towards my original idea that you are considering a total of one port on one router. Perhaps that is what the OP meant. If so, sure, have at it. But if they are interested in how CDN Mapping works, this is not even close.
Third…. Actually, since 1 & 2 are each sufficient to show why it doesn’t work, not sure I need to go through the next N reasons. But there are plenty more.
There are more reasons why this problem is hard to do on the servers :-).
The problem is VERY hard on the servers. Or, more precisely, on the control plane (which is frequently not on the servers themselves). But the difference between "it's hard" and "it's un-possible" is kinda important.

--
TTFN,
patrick
On 15/Oct/15 16:35, Patrick W. Gilmore wrote:
Remember, they control the servers. All CDNs (that matter) can do this. They can re-direct users with different URLs, different DNS responses, 302s, etc., etc. It is not BGP.
Of course, some other CDNs don't use DNS, and instead use BGP by anycasting target IP addresses locally. Of course, the challenge with this is that those CDNs need to have their own IP addresses in the markets they serve, while the DNS-based CDNs can use IP addresses of the local network with whom they host.

I find the latter easier for ISPs, but I'm sure many of the CDNs find the former easier for them, particularly with the lack of IPv4 space in all but one region.

Mark.
participants (4)

- Baldur Norddahl
- Baptiste Jonglez
- Mark Tinka
- Patrick W. Gilmore