On 15 October 2015 at 22:00, Patrick W. Gilmore <patrick@ianai.net> wrote:
> The reason routers do not do that is what you suggest would not work.
Of course it will work, and it is in fact exactly the same as your own suggestion, just implemented in the network. Besides, it _is already_ a standard feature: it is called equal-cost multipath (ECMP) routing. The only difference is dynamically changing the weights between the multipaths.
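To illustrate what I mean by weights between the multipaths, here is a minimal sketch of weighted flow hashing. It is my own illustration, not how any particular router does it: the function name `pick_path`, the use of SHA-256 and the path names are all assumptions. Real hardware uses its own hash functions and bucket tables.

```python
import hashlib


def pick_path(flow, weights):
    """Map a flow (e.g. a 5-tuple) to a path name in proportion to weights.

    weights: dict of path name -> non-negative weight, in a fixed order,
    with at least one weight greater than zero (assumption of this sketch).
    """
    total = sum(weights.values())
    # Hash the flow to a stable point in [0, total). The same flow always
    # lands on the same point, so it stays on one path between updates.
    digest = hashlib.sha256(repr(flow).encode()).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64 * total
    running = 0.0
    for path, weight in weights.items():
        running += weight
        if point < running:
            return path
    return path  # guard against floating-point rounding at the upper edge


flow = ("192.0.2.10", 443, "198.51.100.7", 51234, "tcp")
print(pick_path(flow, {"peer": 0.9, "transit": 0.1}))
```

When the weight of the contested port is lowered a little, only the flows whose hash point falls in the slice that changed hands move to the alternative path. Everything else stays where it was.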
> First, you make the incorrect assumption that inbound will never exceed outbound. Almost all CDN nodes have far more capacity between the servers and the router than the router has to the rest of the world. And CDN nodes are probably the least complicated example in large networks. The only way to ensure A < B is to control A or B - and usually A.
I make absolutely no assumptions about ingress (towards the ASN), as we have no control of that. There is no requirement that routing is symmetric, and it is the responsibility of whoever controls the ingress to do something if the port is overloaded in that direction. In the case of a CDN, however, the ingress will be very small. Netflix does not take much data in from their customers; it is all egress traffic towards the customers, and the CDN is in control of that. The same goes for Google. Two non-CDN peers could use the system, but if the traffic level is symmetric then they had better both do it.
> Second, the router has no idea how much traffic is coming in at any particular moment. Unless you are willing to move streams mid-flow, you can’t guarantee this will work even if sum(in) < sum(out). Your idea would put Flow N on Port X when the SYN (or SYN/ACK) hits. How do you know how many Mbps that flow will be? You do not, therefore you cannot do it right. And do not say you’ll wait for the first few packets and move then. Flows are not static.
Flows can move at any time in a BGP network. As we are talking about CDNs, we can assume that we have many, many small flows (compared to port size). We can be fairly sure that traffic will not make huge jumps from one second to the next - you will have a nice curve here. You know exactly how much traffic you had in the last time period, both out through the contested port and through the alternative paths. Recalculating the weights is just a matter of assuming that the next time period will be the same or that the delta will be the same. It is a classic control loop problem. TCP is trying to do much the same, by the way. You can adjust how close to 100% you want the algorithm to hit. If it performs badly, give it a little bit more headroom. If the time period is one second, flows can move at most once a second, and very few flows would be likely to move. You could get a few out-of-order packets on your flow, which is not such a big issue in a rare event.
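As a rough sketch of the recalculation, under my own assumptions: the names `predict_next` and `recalc_share` and the `target_util` headroom parameter are made up for illustration, and the prediction is the simple "same delta as last period" rule. With many small flows, the share of flows on the port is taken to be roughly the share of traffic.

```python
def predict_next(prev_mbps, last_mbps):
    """Assume the next period changes by the same delta as the last one."""
    return max(0.0, last_mbps + (last_mbps - prev_mbps))


def recalc_share(prev_total_mbps, last_total_mbps, port_capacity_mbps,
                 target_util=0.9):
    """Return the fraction (0..1) of traffic to keep on the contested port.

    The totals are the egress towards the peer over the last two periods,
    summed over the contested port and the alternative paths.
    target_util says how close to 100% of the port we dare to aim.
    """
    predicted = predict_next(prev_total_mbps, last_total_mbps)
    budget = port_capacity_mbps * target_util
    if predicted <= budget:
        return 1.0  # everything fits, no need to spill over
    return budget / predicted  # spill the excess to the alternative paths


# 10 Gbps port, traffic grew from 9 to 12 Gbps over the last two periods.
share = recalc_share(9000, 12000, 10000, target_util=0.9)
print(f"keep {share:.0%} on the port, send the rest elsewhere")
```

If the loop overshoots in practice, lowering `target_util` is the "give it a little bit more space" knob.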
> Third…. Actually, since 1 & 2 are each sufficient to show why it doesn’t work, not sure I need to go through the next N reasons. But there are plenty more.
There are more reasons why this problem is hard to do on the servers :-).

Regards,

Baldur