Replacement for Avaya CNA/RouteScience
Howdy, for reasons it might be inappropriate to discuss on this list, we've decided to replace our Avaya/RouteScience box, and we're looking for recommendations on different solutions for 'BGP management appliances'.

We're aware of the Internap FCP product, but is there anything else out there besides 'oy, hire a BGP admin ya tool!' that anyone can offer?

As always, comments are appreciated.

-Drew
Have you considered any of the options from Vyatta? Aside from the "roll your own" community offerings they also have a precompiled virtual appliance as well as a physical appliance you can use.

- Michienne Dixon Network Administrator liNKCity 312 Armour Rd. North Kansas City, MO 64116 www.linkcity.org (816) 412-7990
what does vyatta have to do with route intelligence/optimization? vyatta is just a router..

-c
On Thu, Jul 3, 2008 at 7:50 AM, Drew Weaver <drew.weaver@thenap.com> wrote:
> Howdy for reasons it might be inappropriate to discuss on this list we've decided that we're going to replace our Avaya/RouteScience box and we're looking for recommendations on different solutions for 'BGP management appliances'.
>
> We're aware of the Internap FCP product, but is there anything else out there besides 'oy, hire a BGP admin ya tool!' that anyone can offer?
Going off this and previous posts, you'd be well-served to follow the advice you sarcastically dispense, and hire an engineer.

Opex and capex (spread over a ~2 year product lifetime) costs for the above solutions in a small (several gigabits, several transit providers) environment are right up there with the salary of a junior to mid-level networking professional in most markets. By hiring a live human, you get not only somebody who can tweak localpref, but also a critical thinker who can aid in troubleshooting outages and help you plan for growth.

Paul
I'd like to hire that engineer, please. Can you send me his resume? Here's the job description:

- Required to work 24x7x365.
- Must monitor all network egress points to examine latency, retransmissions, packet loss, link utilization, and link cost.
- Required to "tweak localpref" on an average of 5000 prefixes per day, based upon a combination of the above criteria.
- Required to write up a daily, weekly, and monthly report to be sent to all managers on said schedule.
- Must not require health or dental care.

These devices are not a replacement for an actual engineer. They are a supplement to the network to assist the engineer in doing what he should be doing - engineering and planning as opposed to resolving some other network's packet loss/blackhole/peering dispute/latency problem.

-evt
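For readers who want something concrete: the criteria in the job description above (latency, packet loss, link utilization, link cost) can be folded into a per-path score that drives a localpref choice. What follows is only a minimal sketch under stated assumptions, not how any appliance mentioned in this thread works. The PathSample fields, the weights, and the localpref base/step values are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class PathSample:
    """One measurement of a destination prefix via one egress (hypothetical shape)."""
    provider: str
    latency_ms: float
    loss_pct: float        # packet loss, 0-100
    utilization: float     # fraction of link capacity in use, 0.0-1.0
    cost_per_mbps: float   # marginal cost, arbitrary currency units


def score(s, w_latency=1.0, w_loss=50.0, w_util=100.0, w_cost=2.0):
    """Weighted sum of the four criteria; lower is better. Weights are made up."""
    return (w_latency * s.latency_ms + w_loss * s.loss_pct
            + w_util * s.utilization + w_cost * s.cost_per_mbps)


def localpref_plan(samples, base=100, step=10):
    """Map each provider to a localpref; the best-scoring path gets the highest."""
    ranked = sorted(samples, key=score)
    n = len(ranked)
    return {s.provider: base + step * (n - 1 - i) for i, s in enumerate(ranked)}


# Illustrative measurements for a single destination prefix.
samples = [
    PathSample("transit-a", latency_ms=40, loss_pct=0.0, utilization=0.5, cost_per_mbps=5),
    PathSample("transit-b", latency_ms=30, loss_pct=2.0, utilization=0.3, cost_per_mbps=4),
    PathSample("transit-c", latency_ms=60, loss_pct=0.0, utilization=0.2, cost_per_mbps=3),
]
plan = localpref_plan(samples)
```

With these made-up numbers, transit-b has the lowest latency but is penalized heavily for its 2% loss, so it ends up with the lowest localpref. Doing this for thousands of prefixes per day is the part the thread argues should not be done by hand.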
You can certainly get close to the requirements stated above by offering a decent salary and hiring a reasonably clued engineer with an SP background. You may have to settle for IRC, WoW, or SecondLife as daily recreational activity that doesn't buy you much (expressed in your requirements list as "tweaking localpref").

My general experience with such boxes is that they're awfully good at impressing the PHBs, but not something you can really defend from a cost/benefit perspective. I really do need to go into the "custom painted boxes with LCD screens on the front" business. I could make "melons", like Tom Vu.

---Rob
agreed. i see the most benefit from these boxes geared towards networks with critical apps that are latency intensive and more than a handful of transit providers than i do for a smaller provider..

depending on how many upstreams you're juggling, it's not that hard to create some traffic engineering policies that can easily be modified (whether by hand or with a script with a front end that can push the changes for you) in order to re-route traffic in the event of issues with an SP network in your end-to-end path..

personally i think manual traffic engineering and re-routing is one of the more fun parts of engineering..

-christian
From: Christian Koch [mailto:christian@broknrobot.com] Sent: Thursday, July 03, 2008 2:39 PM To: Robert E. Seastrom Cc: Eric Van Tol; nanog@nanog.org Subject: Re: Replacement for Avaya CNA/RouteScience
> agreed. i see the most benefit from these boxes geared towards networks
> with critical apps that are latency intensive and more than a handful of
> transit providers than i do for a smaller provider..
Two questions:

First, what would you characterize as a "smaller provider"? One that has only one or two transits? If that's the case, then yes, I would definitely agree with you. However, once you go beyond just a few transits and peers, choosing which one to use for an unhealthy destination becomes tedious if you're trying to do it all manually. That said, I believe there is a stopping point at which the size of the network outgrows the need for such a device.

Second, can you provide an example of a network where users don't care about latency? I can't say that I've worked on tons of networks, but if "the internet is slow", our customers notice, even though they may not be using the latest in realtime streaming media protocols and apps.
> depending on how many upstreams you're juggling, its not that hard to
> create some traffic engineering policies that can easily be modified,
> (whether by hand or you use a script with a front end that can push the
> changes for you) in order to re-route traffic in the event of issues with
> an SP network in your end to end path..
It *is* relatively simple to make routing changes manually, but wouldn't you agree that human error is the cause of most outages? Even the most skilled engineers/techs have days where their fingers are larger than normal. These devices, or at least the one we use, make no changes to router configurations.
> personally i think manual traffic engineering and re-routing is one of
> the more fun parts of engineering..
>
> -christian
Yes, as long as the problem is interesting. Manually changing localpref on a route because of packet loss in someone else's network, several times per week, is not interesting to me or my staff. Nor is checking every transit link several times a day to make sure that we're not going over a commit when other transits have plenty of bandwidth to spare.

In my opinion, most of the value of these types of appliances is to help identify problem areas outside of your network, before end users notice them. I know firsthand that our route optimization appliance frees up my staff to work on other issues such as capacity planning, new service deployments, or discussing the latest MGS4 strategies. Well, hopefully not that last one.

-evt
imo, no more than 3-4 transit providers and maybe a presence at 1 or 2 ixps with x amount of peers would be small.

i'm not saying customers won't/don't care about latency, it's just not difficult to route around the problematic nodes (unless SP A/B/C gets to it first and band-aids the issue until resolution). maybe i just don't see enough issues to even recognize the problem?

agreed, human error is a big cause of a lot of issues.

well, there are plenty of ways to manipulate traffic other than local_pref. that is why i find it interesting: you have options.

i don't understand what the difficulty is in monitoring your bandwidth and understanding your traffic patterns. if this is done properly, you can plan capacity and execute your routing policies for optimal performance, and not have to re-route/re-engineer traffic so often. does your traffic fluctuate so much that you can't get a good grasp on what you're pushing, from whom, and when?

i definitely see value in appliances like the fcp and route science box, i just think for a smaller provider it may not be necessary - or maybe i have it backwards, and it is a better solution for a smaller provider so they don't have to waste money on highly skilled engineers?

maybe i am just thinking "inside" the box at the moment, from an engineer's view.. if so, my apologies for steering off course

-christian
On Thu, Jul 03, 2008 at 10:36:27PM -0400, Christian Koch wrote:
> i definitely see value in appliances like the fcp and route science box, i just think for a smaller provider it may not be necessary - or maybe i have it backwards,and it is a better solution for a smaller provider so they don't have to waste money on highly skilled engineers? maybe i am just thinking "inside" the box at the moment, from an engineers view..if so my apologies for steering off course
The FCP stinks at managing blackholing. There's supposedly new code on the way to help with some of the blackhole avoidance, but I'll believe it when I see it. It can only really control the outbound path, so if someone else chooses a path to me that is blackholed between us, there's not a lot it can do.

On the other hand, the best value of the FCP is commit management. It does a fantastic job of making sure we pay the least amount of money to our transit providers. No more manual balancing of traffic frees up a lot of time, and having an automatic process for it means that we never exceed commit on links that we don't have to.

The FCP produces lovely graphs and charts that describe this, which is probably why people accuse it of being too PHB-friendly. But Internap wasn't stupid - one of those pretty charts is cost savings the FCP has accumulated this month vs. the natural BGP decision.

For a network with a heavy outbound bias, that quickly adds up to a decent chunk of change.

Ross

--
Ross Vandegrift
ross@kallisti.us
"The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37
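The commit management described above can be illustrated with a small allocation sketch. This is not the FCP's algorithm, which is not described in this thread. It is a deliberately simplified greedy model that assumes a single instantaneous outbound demand figure, whereas transit billing is commonly based on a 95th-percentile measurement over the month. The Transit fields, provider names, and prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transit:
    name: str
    commit_mbps: float       # bandwidth already paid for
    capacity_mbps: float     # physical limit of the link (assumed >= commit)
    overage_per_mbps: float  # price per Mbps above commit


def allocate(demand_mbps, transits):
    """Greedy split of outbound demand: fill commits first, then cheapest overage."""
    alloc = {t.name: 0.0 for t in transits}
    remaining = float(demand_mbps)
    # Pass 1: traffic inside a commit costs nothing extra, so use it up first.
    for t in transits:
        take = min(remaining, t.commit_mbps)
        alloc[t.name] += take
        remaining -= take
    # Pass 2: whatever is left goes to the provider with the cheapest overage.
    for t in sorted(transits, key=lambda t: t.overage_per_mbps):
        take = min(remaining, t.capacity_mbps - alloc[t.name])
        alloc[t.name] += take
        remaining -= take
    return alloc, remaining  # remaining > 0 means demand exceeds total capacity


def overage_cost(alloc, transits):
    """Cost of traffic above commit, summed over all providers."""
    return sum(max(0.0, alloc[t.name] - t.commit_mbps) * t.overage_per_mbps
               for t in transits)


transits = [
    Transit("transit-a", commit_mbps=500, capacity_mbps=1000, overage_per_mbps=8.0),
    Transit("transit-b", commit_mbps=300, capacity_mbps=1000, overage_per_mbps=5.0),
]
alloc, unplaced = allocate(1000, transits)
```

In this example both commits are filled first and the remaining 200 Mbps lands on transit-b, the provider with the cheaper overage rate. Comparing overage_cost for this allocation against the cost of whatever split BGP would have chosen on its own is roughly the "cost savings" chart idea mentioned above.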
Ross Vandegrift wrote:
> On Thu, Jul 03, 2008 at 10:36:27PM -0400, Christian Koch wrote:
>> i definitely see value in appliances like the fcp and route science box, i just think for a smaller provider it may not be necessary - or maybe i have it backwards,and it is a better solution for a smaller provider so they don't have to waste money on highly skilled engineers? maybe i am just thinking "inside" the box at the moment, from an engineers view..if so my apologies for steering off course
We've used the FCP for quite a few years now, with "good" success. The point at which we started seeing it being worthwhile was about 4 providers. The main challenges weren't a lack of qualified engineers or knowing the nature of our traffic; it was more a matter of being able to be dynamic, being aware of the impact on the prefixes/ASNs that you are making changes to, managing cost, etc. In a content-heavy network, where your traffic patterns vary greatly (based on clients/visitors all over the world), just knowing your traffic isn't enough. The argument could probably be made that you could script some of this, but it still doesn't get you the same solution, so partly it depends on whether you need a complete solution.

We reached a point where, in order to monitor traffic, commits, costs, performance, etc., we were spending a significant amount of time doing this with an engineer (or 3). It's an ongoing thing, not a once-a-day change, and with all the factors involved in why you would make a change, it becomes far less accurate doing it with an engineer (using scripts and traffic data) than with an appliance designed to do it.

Some of the biggest challenges we hit using an engineer were being able to "accurately" determine the amount of data you will be shifting when a change is made, based on a prefix or ASN; knowing what the performance impact looks like for that prefix or ASN when a change is made to send it via another provider; and monitoring your current active paths so that you are aware of performance problems you want to make a pro-active change for.
> The FCP stinks at managing blackholing. There's supposedly new code on the way to help with some of the blackhole avoidance, but I'll believe it when I see it. It can only really control the outbound path, so if someone else chooses a path to me that blackholed between us, there's not a lot it can do.
Agreed, none of the appliances I've seen are 100%, nor are they infinitely scalable. We've had numerous issues with blackhole problems and the FCP, and I too won't hold my breath for this to get resolved, especially since in the last 5 yrs we've used this product, we've seen very little evolution in features and functionality.

We are actually at the point that we're outgrowing the abilities of the FCP, and I'm interested in the input on this thread to try and figure out what's next. The preferred method of data collection with the FCP is SPAN/Monitor, however for our network/topology that doesn't scale well (not to mention their costs don't scale well either). They also support Netflow, but have a VERY limited ability to process it in any volume.
Tom Sands
Chief Network Engineer
Rackspace
(210)312-4391
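One of the challenges raised above is estimating how much traffic will move when a prefix is shifted to another provider. As a rough illustration only, the sketch below sums per-destination byte counts (for example, from exported flow records) into the longest matching prefix and converts them to Mbps. The input format, the sample interval, and the function name are assumptions, and it says nothing about how the FCP or any other appliance processes SPAN or Netflow data.

```python
import ipaddress
from collections import defaultdict


def mbps_by_prefix(flows, prefixes, interval_s):
    """Sum (dst_ip, byte_count) records into the longest matching prefix, in Mbps.

    flows: iterable of (destination address string, bytes seen in the interval)
    prefixes: candidate prefixes as strings, e.g. the routes you might re-home
    interval_s: length of the measurement window in seconds
    """
    # Most specific prefixes first, so the first hit is the longest match.
    nets = sorted((ipaddress.ip_network(p) for p in prefixes),
                  key=lambda n: n.prefixlen, reverse=True)
    totals = defaultdict(float)
    for dst, nbytes in flows:
        addr = ipaddress.ip_address(dst)
        for net in nets:
            if addr in net:
                totals[str(net)] += nbytes * 8 / interval_s / 1e6
                break  # records matching no prefix are simply ignored
    return dict(totals)


# Made-up byte counts over a 60-second window, using documentation address space.
flows = [
    ("192.0.2.10", 75_000_000),
    ("192.0.2.200", 75_000_000),
    ("198.51.100.5", 37_500_000),
]
prefixes = ["192.0.2.0/24", "192.0.2.128/25", "198.51.100.0/24"]
shift = mbps_by_prefix(flows, prefixes, interval_s=60)
```

Here 192.0.2.200 is counted against the more specific /25 rather than the covering /24, which matters if only one of the two is being moved. A linear scan over prefixes is fine for a sketch; a full routing table would call for a trie or similar structure.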
If you already own Cisco gear, Cisco OER (which now has another marketing name) might do the trick without buying any appliances, as it runs on top of IOS.

Rubens
participants (10)
- Christian Koch
- Drew Weaver
- Eric Van Tol
- Koch, Christian
- Michienne Dixon
- Paul Wall
- Robert E. Seastrom
- Ross Vandegrift
- Rubens Kuhl Jr.
- Tom Sands