Re: Traffic Engineering [was Chanukah [was Re: Hezbollah]]

17 Sep 1997

      Kent,

     As a former network design guy who's done traffic engineering and 
     design (and redesign) on many networks (Internet and otherwise), I 
     disagree that traffic engineering doesn't work for the Internet.

     I've seen many people go with the "throw bandwidth at the problem" 
     approach as a cure-all.  While it tends to work, it also tends to be 
     the most expensive method of solving the problem.

     Doing traffic engineering right is hard.  The telcos have it down pat 
     for their voice networks, and telco-based ISPs often have applied 
     this design expertise to their ISP network.  Having a person do 
     traffic engineering can save the ISP big bucks.

     The traffic engineering techniques I'm talking about can't handle 
     wildly dynamic situations.  For example, a news event like Princess 
     Di's death greatly increases traffic to/from England, which plays 
     temporary havoc with traffic projections.  However, outside of these 
     anomalies, traffic projections work pretty well.

     I've outlined a basic technique below which works for many types of 
     networks, and has some ISP specific steps.  The key to this analysis is 
     that it takes into account the underlying traffic flows and then 
     determines the appropriate physical backbone topology, or the changes to 
     be made to an existing topology.  This is directly in contrast to the 
     "throw bandwidth at the problem" case that patches a backbone topology 
     which might be sub-optimal in the first place.

     Here's the overall outline:

     1.  Divide your network into a small number of geographic areas
         (between five and ten).  Each geographic area you choose probably  
         has a large city that serves as a major traffic source for that 
         area.  These cities are usually the natural cities for backbone 
         connectivity.  

         Create an NxN matrix, where N is the total number of areas
         in your network.  Each cell in the matrix will represent the total 
         traffic demand from one source area to one destination area.

         There are several factors which affect this matrix, each of which
         will be discussed below.

        1.  The locality of traffic.
        2.  The typical utilization of customers. 
        3.  The entry/exit points of traffic from your network.
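
         As a rough illustration of the matrix in step 1, here is a minimal
         Python sketch.  The area names, the Mbps unit, and the sample
         figures are assumptions made up for the example; they are not
         measurements from any real network.

```python
# Hypothetical area names; substitute your own five to ten areas.
AREAS = ["West", "Central", "South", "Northeast", "Southeast"]


def empty_traffic_matrix(areas):
    """Return an NxN dict of dicts; cell [src][dst] holds demand in Mbps."""
    return {src: {dst: 0.0 for dst in areas} for src in areas}


def add_demand(matrix, src, dst, mbps):
    """Accumulate demand flowing from area src to area dst."""
    matrix[src][dst] += mbps


matrix = empty_traffic_matrix(AREAS)
add_demand(matrix, "West", "Northeast", 12.5)  # made-up figure
add_demand(matrix, "Northeast", "West", 3.0)   # made-up figure
```

         The matrix is deliberately not symmetric: demand from West to
         Northeast is stored separately from demand in the other direction,
         since send and receive volumes usually differ.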

     2.  Identify what percentage of traffic, if any, has regional locality.
         For pure Internet traffic, the probability that the source and 
         destination of traffic are within the same metropolitan area 
         tends to be low (10% or lower for metros within the US).  

         However, there are exceptions.  Telecommuting applications 
         tend to have very high locality.  People close to work dial 
         into work through an ISP, so both the source and destination 
         of traffic tend to stay local (70% or higher).

         Places like the Bay Area in California also tend to have higher 
         traffic locality.  This is because the Bay Area has lots of 
         Internet users (which tend to be traffic sinks), and lots of 
         web sites (which tend to be traffic sources).

         ISPs outside of the US tend to have a higher percentage of 
         traffic staying within the country, especially in non-English 
         speaking countries.

     3.  Measure/estimate the typical utilization of customers.

         Utilization needs to be measured/estimated in both send 
         and receive directions.  Dial-up users typically receive
         almost seven times as much as they send.  Corporate customers 
         not doing telecommuting applications tend to receive about 
         four times as much as they send (less because corporations 
         have web sites that others access).  Web server farms have 
         the opposite characteristics of dial-up users.

         Percentage utilization tends to increase with bandwidth.  
         In the U.S., a T1 customer connection typically has a 
         peak receive utilization of 20% or less.  However, a DS3 
         customer can easily have a receive utilization of 50% or 
         more.  The simple explanation is that someone paying big bucks 
         for a DS3 wants to make sure it is justified.

         So, take the total number of users in each area, the 
         connection speeds and customer types, multiply by the 
         appropriate factors, and you get the total demand you are 
         trying to serve out of each area.

         Take this traffic demand, and multiply it by the non-local 
         fraction of traffic.  This represents the total traffic that 
         you need to get either in/out of the network, or in/out of this 
         particular part of your network.
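
         The arithmetic in step 3 can be sketched as follows.  The customer
         counts are invented for illustration; the utilization figures and
         the four-to-one receive/send ratio are the rough rules of thumb
         quoted above, and the 10% local fraction is the pure-Internet
         figure from step 2.  Summing per-customer peaks assumes the peaks
         coincide, which is a simplification.

```python
T1_MBPS = 1.544    # T1 line rate
DS3_MBPS = 44.736  # DS3 line rate

# Hypothetical customer mix for one area:
# (count, line rate in Mbps, peak receive utilization, receive-to-send ratio)
CUSTOMERS = [
    (40, T1_MBPS, 0.20, 4.0),  # corporate T1s, about 20% peak receive
    (2, DS3_MBPS, 0.50, 4.0),  # corporate DS3s, about 50% peak receive
]


def area_demand(customers, local_fraction):
    """Return (receive_mbps, send_mbps) that must cross the area boundary."""
    receive = sum(n * rate * util for n, rate, util, _ in customers)
    send = sum(n * rate * util / ratio for n, rate, util, ratio in customers)
    non_local = 1.0 - local_fraction
    return receive * non_local, send * non_local


receive_mbps, send_mbps = area_demand(CUSTOMERS, local_fraction=0.10)
```

         With this made-up mix, the area needs a little over 51 Mbps inbound
         and roughly 13 Mbps outbound to the rest of the network.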

     4.  Determine the entry/exit points for traffic on your network,
         and their effect upon the traffic matrix.

         How do you set up your routing policies?  Many ISPs use nearest 
         exit.  If the nearest exit is in the same geographic area, the 
         traffic sent by your customers does not affect any other part 
         of the overall traffic matrix.  If the nearest exit is not 
         within the same geographic area, determine the area where this 
         traffic will be sent.  Enter this value in the appropriate 
         source/destination box of the traffic matrix.

         It gets harder when peering with many other ISPs, some of whom 
         you connect to in the same area, and others in remote areas.  
         In this case, determine which percentage of the traffic goes 
         into each particular region, and 

         The main traffic sources into your network (excluding your 
         customers) are your peering points (both public and private). 
         The amount of traffic from each peering point is measurable.  
         You can generally estimate that this traffic is to be 
         distributed proportional to the overall traffic demand in each 
         geographic area.

         This is a significant amount of matrix math, but the overall 
         concept is simple.  Determine the overall flow from each 
         part of your network to every other part.
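
         One piece of step 4, spreading the measured inbound traffic at a
         peering point across areas in proportion to each area's demand,
         might look like this in Python.  The function name, area names,
         and numbers are hypothetical.

```python
def distribute_inbound(peer_area, inbound_mbps, demand_by_area):
    """Split inbound peering traffic across areas in proportion to demand.

    Returns {(source_area, destination_area): mbps} entries that can be
    added to the traffic matrix.
    """
    total = sum(demand_by_area.values())
    if total == 0:
        return {}
    return {
        (peer_area, area): inbound_mbps * demand / total
        for area, demand in demand_by_area.items()
    }


# Made-up receive demand per area, in Mbps.
demand = {"West": 50.0, "Central": 30.0, "Northeast": 20.0}

# 200 Mbps measured inbound at a peering point located in the West area.
flows = distribute_inbound("West", 200.0, demand)
```

         The ("West", "West") entry is traffic that enters and is delivered
         in the same area, so it never touches the backbone; only the other
         entries add load between areas.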

     5.  With me so far?  Good, now it's time to design your backbone to
         handle your demands.  You can use dedicated lines or layer two 
         services such as Frame Relay or ATM.

         The simplicity of using Frame Relay or ATM is that the circuits 
         you need between each pair of geographic areas have been defined by 
         your traffic matrix.  This is part of the appeal of using public L2 
         services for a backbone.

         Designing your own backbone is a bit harder.  The actual topology 
         tends to be straightforward--you need to connect up the major cities 
         in each of the geographic areas.  For five areas, a simple ring 
         suffices.  For up to 10 areas, this tends to be rings bisected once or 
         twice.

         The real work in designing your own backbone is in satisfying the 
         traffic demands going across your network.  Remember that geographic 
         areas in the center of your network have to carry the traffic demands 
         going across your network.  This imposes a heavier burden in the 
         center of the network than the traffic matrix would indicate.  You 
         also have to worry about resiliency, having sufficient bandwidth
         when the backhoes go fiber hunting, etc.

     6.  Design the network within each geographic area.

         The steps for designing the network within each geographic area tend 
         to be similar to those for designing the overall network.  Breaking the 
         overall design process into a regional network and backbone network 
         makes the problem more tractable.

     7.  Measure data from a real network.

         This is really important.  You've made lots of assumptions.  Regularly 
         check the overall traffic to see if it matches the assumptions.  
         Refine the traffic matrix to see if it still represents reality.  
         Create trendlines which show the overall traffic changes to/from each 
         area, and project these trendlines into the future.  You will tend to 
         have pretty good certainty about 4 months into the future, with the 
         value of the information decaying after that.

         Use this data to determine where to add additional peering points. 
         Estimate what impact this would have on the traffic matrix.
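
         A trendline of the kind described in step 7 can be sketched with an
         ordinary least-squares line through monthly measurements.  A
         straight-line fit is an assumption made here for simplicity; real
         traffic may grow faster than linearly, and the sample figures are
         invented.

```python
def linear_trend(samples):
    """Least-squares fit of y = intercept + slope * month_index."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project(samples, months_ahead):
    """Extend the fitted line months_ahead past the last sample."""
    slope, intercept = linear_trend(samples)
    return intercept + slope * (len(samples) - 1 + months_ahead)


# Made-up monthly peak traffic out of one area, in Mbps.
monthly_peak_mbps = [10.0, 12.0, 14.0, 16.0]
forecast = project(monthly_peak_mbps, months_ahead=4)
```

         Keeping months_ahead at about four matches the horizon over which
         the projections tend to stay trustworthy.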

     8.  Factor the measured and projected data into the next network backbone 
         design.

         This next backbone design gives you the optimum backbone given the 
         underlying flows in your network.  See what changes you need to 
         make to your backbone to get to this new optimum backbone, and 
         order the circuits.

     Phew!  Like I said earlier, it is hard to do right, and I've left out 
     quite a few details in the above outline.  But having been there, done 
     that, (quite a few times) I can say it really works.  And it saves 
     ISPs money!

     Question for NANOG members.  How important is traffic engineering 
     given that it is fairly hard to do properly and you folks have enough 
     other things to think about?  

     Prabhu Kavi
     IP Business Marketing Manager
     Ascend Communications
     prabhu.kavi@ascend.com

______________________________ Reply Separator _________________________________
Subject: Chanukah [was Re: Hezbollah]
Author:  "Kent W. England" <kwe@geo.net> at smtplink 
Date:    9/16/97 2:09 PM

At 05:03 PM 14-09-97 -0400, Dorian R. Kim wrote:
...
... One of the
things that needs to be engineered into building and maintaining 
national/international backbones is traffic accounting to an arbitrary 
granularity that paves the way for better traffic engineering and 
bandwidth projections. There are already ample tools to do per-prefix 
matrix of traffic right now. Tying this in with good sales projections 
will alleviate much of the last minute fire fighting.
This will most likely never be 100% accurate and precise, but there is
no reason why we can't get a better handle on bandwidth forecasts. (say to 
95% percentile)
Dorian;

I don't want to throw cold water on the value of planning and foresight, 
but in terms of predicting traffic patterns it has never worked on the 
Internet. It sounds good and that was the argument that all the mainframe 
networkers made to us early Internet networkers -- Why can't you tell me 
upfront what your bandwidth requirements are going to be? Don't you know 
exactly how many terminals you have and where they are and what application 
keystrokes are going to be pressed at any given time? How else can you 
guarantee response time in your network? This Internet stuff is stupid. 
It'll never work.

Somehow with the way that HTTP/HTML caught fire and Internet-CB (aka 
VocalTec and CUSeeMe) took off, I would be loath to think I could project 
my backbone needs with any reliability based on *historical* projections.
...
Furthermore, with the deployment of WDM and Internet core devices moving 
closer to the transmission gear, if you have access to fiber, getting more 
bandwidth may become as straightforward as using an additional wavelength 
on the ADM that your router's plugged into.
-dorian
This I like a lot better as a design technique. Throwing more bandwidth at 
the problem almost always works (unless the transport protocol is broken). 
Like Peter Kline said, Turn up the speed dial upon onset of congestion. 
Simple. Effective.

Then again, creating a data architecture for the web (a problem that has 
been recognized, but not addressed in the last five years) would eliminate 
much of the backbone bandwidth demand. What would happen if -- presto -- a 
data architecture for the web showed up one day? A lot of backbone 
bandwidth would become surplus and a lot more edge bandwidth would be 
needed ASAP. What does that do to historical projections?

--Kent

pkavi＠pcmail.casc.com