When the server sends TCP traffic for that same connection back to host A, it needs to pick one of the N routers; in other words, it needs to pick an outbound interface from its N interfaces. ... The problem is that some routers are "better" than others in the sense that they are closer to the final destination address A. (For example, each router could be connected to a different ISP.)
One way for the server to pick the "optimal" downstream router is to run "stub BGP" between the server and each of the routers. ... While this approach would certainly allow the server to pick the optimal downstream router in all cases, I would prefer not to run routing protocols on this server for a number of reasons:
It's probably good to keep in mind that this would be "optimal", not optimal. As far as I know the best you would get is to minimize the number of AS hops, which is probably correlated with, but definitely not the same as, metrics you actually care about like latency. All in all, running BGP does seem like an awful lot of work just to let you optimize for the wrong metric. Here's another thought, though. You don't need to run BGP to get the data that BGP will give you. There exist approximate maps of the Internet at the router or AS level with IP prefixes attached. It would be possible to periodically obtain one of these graphs, e.g. from CAIDA, and then run a shortest-paths algorithm on that graph to decide based on the destination IP address which router is best. Not only does this let you avoid running BGP, it also saves memory since you need only one copy of the graph, rather than one copy for each of the N BGP sessions. Of course, it's not real-time data, but if all you need is a good guess as to which of the outbound interfaces is best, it might be sufficient. Does anyone actually do something like this in practice? (I'm guessing no)
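To make the graph-lookup idea concrete, here is a minimal Python sketch. Everything in it is an invented stand-in: the AS adjacency graph and prefix-to-origin table would really come from a CAIDA snapshot and a routing table dump, and the interface-to-AS mapping from your own routers' configs. It just does a longest-prefix match to find the destination's origin AS, then a BFS to count AS hops from each router's upstream AS:

```python
from collections import deque
from ipaddress import ip_address, ip_network

# Toy AS-level adjacency graph (undirected). In practice this would be
# built from a periodically refreshed CAIDA-style topology snapshot.
# All AS numbers, prefixes, and links below are made up for illustration.
AS_GRAPH = {
    100: [200, 300],
    200: [100, 400],
    300: [100, 400, 500],
    400: [200, 300, 500],
    500: [300, 400],
}

# Origin AS for each announced prefix (longest-prefix match below).
PREFIX_TO_AS = {
    ip_network("192.0.2.0/24"): 500,
    ip_network("198.51.100.0/24"): 200,
}

# The upstream AS reachable through each of the server's N interfaces.
ROUTER_TO_AS = {"eth0": 100, "eth1": 400}

def as_hops(src_as, dst_as):
    """BFS shortest-path length, in AS hops, between two ASes."""
    seen, queue = {src_as}, deque([(src_as, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst_as:
            return dist
        for nbr in AS_GRAPH.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return float("inf")

def dest_as(ip):
    """Longest-prefix match of the destination address to an origin AS."""
    addr = ip_address(ip)
    matches = [(net, asn) for net, asn in PREFIX_TO_AS.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def best_router(ip):
    """Pick the interface whose upstream AS minimizes AS hops to the dest."""
    target = dest_as(ip)
    if target is None:
        return None  # no covering prefix: fall back to some default policy
    return min(ROUTER_TO_AS, key=lambda r: as_hops(ROUTER_TO_AS[r], target))
```

Of course this inherits exactly the caveat above: AS-hop count is a proxy metric, and the snapshot is stale by construction, so the answer is a good guess rather than ground truth.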
Someone suggested an idea to me which seems almost too simple to work, but I cannot find any good reason why it would not work.
The idea is "the server simply sends all outbound traffic for the TCP connection out over the same interface over which the most recent TCP traffic for that connection was received".
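For what it's worth, the bookkeeping this heuristic needs is tiny: a per-flow table mapping each TCP 4-tuple to the interface that last received traffic for it. A rough Python sketch follows; the flow key, interface names, and idle timeout are all illustrative choices, not tied to any real network stack:

```python
import time

class LastInterfaceTable:
    """Track, per TCP 4-tuple, the interface that most recently received
    traffic for that connection, and reply out the same interface.
    A sketch of the 'send where you last heard from' heuristic."""

    def __init__(self, ttl=300):
        self.ttl = ttl    # forget idle flows after this many seconds
        self.table = {}   # flow 4-tuple -> (interface, last_seen_time)

    def on_receive(self, src, sport, dst, dport, iface, now=None):
        """Record the interface an inbound segment arrived on."""
        now = time.time() if now is None else now
        self.table[(src, sport, dst, dport)] = (iface, now)

    def pick_interface(self, src, sport, dst, dport, default, now=None):
        """Choose the outbound interface for a flow; fall back to
        `default` if we have no recent inbound traffic for it."""
        now = time.time() if now is None else now
        entry = self.table.get((src, sport, dst, dport))
        if entry is None or now - entry[1] > self.ttl:
            return default
        return entry[0]
```

The fallback path matters: for the very first outbound segment of a server-initiated flow there is no inbound history yet, so some other policy (random, or the graph lookup above) has to break the tie.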
So the underlying idea here is that the source (or its ISP) has effectively done the work of picking a good path, and by replying on the same interface, you use the reverse of that path, which is also likely to be pretty good. Some of the assumptions in that reasoning seem imperfect:

- There's a good chance the forward path (i.e. the one the source picked) isn't the best.
- Asymmetry, as you noted: even if the forward path was the best, the reverse of it is not necessarily the best.
- A different asymmetry: even if the forward path was the best and its reverse is the best, the path followed by sending on the same interface is not necessarily the reverse of the forward path.

So I understand that this heuristic could perform pretty well in practice, and certainly better than sending on a random interface (in terms of latency, not traffic engineering). But I can't see how it's the optimal strategy.

I think there are commercial products that solve this problem the "right" way, by automatically and dynamically monitoring path quality and availability, and selecting paths for you. I think the Avaya Converged Network Analyzer is one. I recall speaking with an operator from a major content provider who said that they use an intelligent route selection product similar to this for their outbound traffic. I'd be personally interested to hear what other operators typically use.

~Brighten Godfrey