Anycast applicable to RADIUS server farm?
Hi,
we have a RADIUS server farm with an L4 switch installed in front of all the servers. Incoming AAA packets are switched by the L4 switch to different servers. In recent days we have had a couple of problems with the L4 switch which degraded our service a lot. Would it be possible to implement an IPv4 anycast architecture for the RADIUS server farm? Would that cause any problems with the AAA procedure?
Any advice will be highly appreciated.
Joe
__________________________________ Do you Yahoo!? Yahoo! Movies - Search movie info and celeb profiles and photos. http://sg.movies.yahoo.com/
JS> Date: Mon, 8 May 2006 12:07:13 +0800 (CST)
JS> From: Joe Shen
JS> Could it be possible to implement IPv4 Anycast architecture for
JS> radius server farm?
Yes.
JS> Could it be any problem with AAA procedure?
UDP is anycast-friendly. Your biggest problems are likely to be authentication database replication/synchronization and merging accounting records... i.e., nothing really different from standard RADIUS deployments.
Try ECMP if you want load balancing without the L4-ish gear. This implies routers between the NASes and RADIUS boxen, but you _did_ specify anycast. ;-)
Load balancing is trickier when RADIUS servers and NASes live on the same network segment. You'll need something a la Windows Advanced Server or distributed 802.3ad. I know of no turn-key implementation of the latter; I played around with it a few years back, but the project was shelved before completion. Several modern *ix flavors include rudimentary 802.3ad support, so implementation should be easier these days.
(Note that MAC-based technology strays away from "anycast" in the sense that it operates at L2 instead of L3.)
HTH,
Eddy
--
Everquick Internet - http://www.everquick.net/
A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 785 865 5885 Lawrence and [inter]national
Phone: +1 316 794 8922 Wichita
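To illustrate the ECMP suggestion in the reply above, here is a minimal sketch of how a router might pick one of several equal-cost next hops for an anycast RADIUS address by hashing the source/destination pair. The hash function, the addresses and the name `pick_next_hop` are assumptions made for illustration only; real routers use their own vendor-specific hashing.

```python
import hashlib
import ipaddress


def pick_next_hop(src_ip: str, dst_ip: str, next_hops: list[str]) -> str:
    """Choose an equal-cost next hop by hashing the source/destination pair."""
    if not next_hops:
        raise ValueError("no next hops available")
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical addresses: three servers announcing one anycast address.
servers = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]
anycast = "192.0.2.1"
for nas in ("198.51.100.1", "198.51.100.2", "198.51.100.3", "198.51.100.4"):
    print(nas, "->", pick_next_hop(nas, anycast, servers))
```

With this kind of per-pair hashing a given NAS keeps reaching the same server for as long as the set of next hops is unchanged. When a server withdraws its route, the modulo step in this sketch may also move NASes that were not using the failed server.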
JS> Could it be any problem with AAA procedure?
UDP is anycast-friendly. Your biggest problems are likely to be authentication database replication/synchronization and merging accounting records... i.e., nothing really different from standard RADIUS deployments.
What I have trouble understanding is:
1) Is it required to route traffic from a specific BRAS to exactly one server if the DB behind the RADIUS servers is synchronized periodically?
2) There are two farms, each with several servers. As the number of equal-cost paths supported by Cisco/Juniper routers is limited (<= 8 or 16), we could not merge those servers into one farm. Is there any way to balance load between two or more farms automatically?
Load balancing is trickier when RADIUS servers and NASes live on the same network segment. You'll need something a la Windows Advanced Server or distributed 802.3ad. I know of no turn-key implementation of the latter;
Do you mean aggregating the interfaces of several servers into one 802.3ad trunk? I think that even if the NASes and RADIUS servers live on the same Ethernet, OSPF/IS-IS could establish equal-cost paths.
thanks
Joe
Hello Joe -
Can you indicate in more detail what the problems were with the L4 switch?
If the loadbalancing is done by source/destination IP address pairs, then you can have problems when a target goes down: all of the source/destination IP address pairs that pointed to it get switched to another target, which then gets into difficulty itself, and you end up with a cascading failure. It is generally preferable to have the loadbalancing done on a weighted per-packet basis, ideally distributed according to round-trip times.
Also note that you can only do per-packet loadbalancing with simple RADIUS. Things like EAP that require multiple exchanges of RADIUS requests typically require state to be maintained in the single RADIUS server that is processing the entire EAP sequence.
regards
Hugh
NB:
Have you read the reference manual ("doc/ref.html")?
Have you searched the mailing list archive (www.open.com.au/archives/radiator)?
Have you had a quick look on Google (www.google.com)?
Have you included a copy of your configuration file (no secrets), together with a trace 4 debug showing what is happening?
--
Radiator: the most portable, flexible and configurable RADIUS server anywhere. Available on *NIX, *BSD, Windows, MacOS X.
- Nets: internetwork inventory and management - graphical, extensible, flexible with hardware, software, platform and database independence.
- CATool: Private Certificate Authority for Unix and Unix-like systems.
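To illustrate the point about EAP in the message above, here is a rough sketch of a dispatcher that spreads simple requests per packet but pins a multi-round EAP conversation to one server. The `Request` fields, the `Dispatcher` class and the choice of (NAS address, user name) as the conversation key are assumptions for illustration, not how any particular load balancer works.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Request:
    nas_ip: str
    user_name: str
    is_eap: bool  # True if the request carries an EAP-Message attribute


class Dispatcher:
    def __init__(self, servers):
        self._round_robin = cycle(servers)
        self._pinned = {}  # (nas_ip, user_name) -> server handling that EAP conversation

    def route(self, req: Request) -> str:
        # Simple RADIUS: any server will do, so balance per packet.
        if not req.is_eap:
            return next(self._round_robin)
        # EAP: every round of the conversation must reach the same server.
        key = (req.nas_ip, req.user_name)
        if key not in self._pinned:
            self._pinned[key] = next(self._round_robin)
        return self._pinned[key]

    def finish(self, req: Request) -> None:
        """Forget the pin once the EAP conversation has ended."""
        self._pinned.pop((req.nas_ip, req.user_name), None)


d = Dispatcher(["radius-a", "radius-b"])
eap = Request("198.51.100.1", "alice", True)
first = d.route(eap)
d.route(Request("198.51.100.2", "bob", False))
print(first, d.route(eap))  # both EAP rounds go to the same server
```

A plain anycast/ECMP setup has no such table, so there the stickiness has to come from the routers hashing each NAS consistently onto one server.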
Can you indicate in more detail what the problems were with the L4 switch?
We separate our RADIUS servers into two farms; each farm has an L4 switch in front. To our understanding, the RADIUS authentication and accounting information for a PPPoE session should be processed by the same RADIUS server. So, although the L4 switch provides a single IP for BRAS configuration, each BRAS is mapped to a specific real server IP in the L4 switch. This causes the following problems:
1) Load is not balanced automatically but by human estimation; some servers carry twice the load of others.
2) The L4 switch becomes a bottleneck for service availability. In past years, the L4 switch has caused several service failures. Just last Friday, the L4 switch stopped responding to any network packets while its Ethernet interface seemed OK.
3) As the L4 switch is the only entrance to a server farm, a DoS attack or a software bug on it will surely degrade the security level, whereas a farm using ECMP relies on the whole server group to resist a DoS attack.
4) Maintenance is somewhat costly. Any maintenance, whether on a RADIUS server or on the L4 switch, needs a scheduled time window.
5) Service protection is hard (the 'cascade' problem you mentioned). As there are two server farms, if one farm fails it takes ten or more minutes to migrate its RADIUS traffic to the other farm. This is unacceptable.
So, we are looking for a more scalable, reliable, secure and automatic multi-farm RADIUS solution.
Joe
Joe Shen wrote:
Can you indicate in more detail what the problems were with the L4 switch?
We separate our RADIUS servers into two farms; each farm has an L4 switch in front. To our understanding, the RADIUS authentication and accounting information for a PPPoE session should be processed by the same RADIUS server.
I don't think that's true. If the auth RADIUS server fails to respond, authentication and accounting will then go to the next configured server.
So, although the L4 switch provides a single IP for BRAS configuration, each BRAS is mapped to a specific real server IP in the L4 switch. This causes the following problems:
1) Load is not balanced automatically but by human estimation; some servers carry twice the load of others.
See if you can extract load from the RADIUS servers using SNMP or something similar and make your L4 switch utilize that.
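As a sketch of that idea, the function below turns per-server load figures into integer weights that a balancer could use. How the load is collected (SNMP or otherwise) and how weights are pushed to the L4 switch are left out; the 0.0 to 1.0 load scale, the server names and `weights_from_load` itself are assumptions for illustration.

```python
def weights_from_load(load_by_server: dict[str, float], scale: int = 100) -> dict[str, int]:
    """Map per-server load (0.0 = idle, 1.0 = saturated) to integer weights."""
    weights = {}
    for server, load in load_by_server.items():
        spare = max(0.0, 1.0 - load)
        # Keep a minimum weight of 1 so a busy server is not removed entirely.
        weights[server] = max(1, round(spare * scale))
    return weights


# Hypothetical load readings for three servers.
print(weights_from_load({"radius-a": 0.8, "radius-b": 0.4, "radius-c": 0.1}))
```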
2) The L4 switch becomes a bottleneck for service availability. In past years, the L4 switch has caused several service failures. Just last Friday, the L4 switch stopped responding to any network packets while its Ethernet interface seemed OK.
Add a couple of the actual servers' IPs to the list of AAA servers the NASes use.
3) As the L4 switch is the only entrance to a server farm, a DoS attack or a software bug on it will surely degrade the security level, whereas a farm using ECMP relies on the whole server group to resist a DoS attack.
Your firewalls should be protecting your RADIUS servers from DoS, unless you really expect the world to communicate with them. Spoofed sources, however, could be hard to protect against.
4) Maintenance is somewhat costly. Any maintenance, whether on a RADIUS server or on the L4 switch, needs a scheduled time window.
5) Service protection is hard (the 'cascade' problem you mentioned). As there are two server farms, if one farm fails it takes ten or more minutes to migrate its RADIUS traffic to the other farm. This is unacceptable.
Let the NAS do it. They fail over much faster than that. Whatever you choose, try to combine the ability of the NAS to fail over between RADIUS servers into your redundancy plan.
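The NAS-side failover described above can be sketched roughly as follows: try each configured server in order and move on after a few unanswered attempts. The `send` callable stands in for the real UDP exchange and returns None on timeout; it, the server names and `send_with_failover` are hypothetical and only illustrate the ordering logic.

```python
from typing import Callable, Optional, Sequence


def send_with_failover(
    request: bytes,
    servers: Sequence[str],
    send: Callable[[str, bytes], Optional[bytes]],
    retries: int = 2,
) -> tuple[str, bytes]:
    """Try each server in order; move on after `retries` + 1 unanswered attempts."""
    for server in servers:
        for _ in range(retries + 1):
            reply = send(server, request)  # None means the attempt timed out
            if reply is not None:
                return server, reply
    raise TimeoutError("no RADIUS server answered")


# Stand-in transport for demonstration: the first server is down.
def fake_send(server: str, request: bytes) -> Optional[bytes]:
    return None if server == "radius-a" else b"Access-Accept"


print(send_with_failover(b"Access-Request", ["radius-a", "radius-b"], fake_send))
```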
Hello Joe - On 9 May 2006, at 01:23, Joe Shen wrote:
Can you indicate in more detail what the problems were with the L4 switch?
We separate our RADIUS servers into two farms; each farm has an L4 switch in front. To our understanding, the RADIUS authentication and accounting information for a PPPoE session should be processed by the same RADIUS server. So, although the L4 switch provides a single IP for BRAS configuration, each BRAS is mapped to a specific real server IP in the L4 switch. This causes the following problems:
Normal RADIUS does not require authentication and accounting for a single session to go to the same RADIUS server.
1) Load is not balanced automatically but by human estimation; some servers carry twice the load of others.
You should use a loadbalancer that can distribute RADIUS requests on a per-request basis according to round-trip times, which will be a reasonable indication of server load, i.e. the fastest round-trip time will be from the least-loaded server.
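A minimal sketch of round-trip-time based distribution, assuming the balancer keeps a smoothed RTT per server and picks servers with probability inversely proportional to it. The smoothing factor, the initial RTT and the `RttBalancer` class are assumptions for illustration; a real product may weight things quite differently.

```python
import random


class RttBalancer:
    def __init__(self, servers, alpha=0.2, initial_rtt=0.05):
        self.alpha = alpha  # weight given to the newest measurement
        self.rtt = {s: initial_rtt for s in servers}

    def record(self, server, measured_rtt):
        """Update the exponentially weighted moving average for one server."""
        old = self.rtt[server]
        self.rtt[server] = (1 - self.alpha) * old + self.alpha * measured_rtt

    def choose(self, rng=random):
        """Pick a server with probability inversely proportional to its smoothed RTT."""
        servers = list(self.rtt)
        weights = [1.0 / max(self.rtt[s], 1e-6) for s in servers]
        return rng.choices(servers, weights=weights, k=1)[0]


balancer = RttBalancer(["radius-a", "radius-b"])
for _ in range(50):
    balancer.record("radius-a", 0.01)  # fast, lightly loaded
    balancer.record("radius-b", 0.20)  # slow, heavily loaded
picks = [balancer.choose() for _ in range(1000)]
print(picks.count("radius-a"), picks.count("radius-b"))
```

Because the choice is made per request, a slow server is never cut off completely; it simply receives a smaller share until its round-trip time recovers.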
2) The L4 switch becomes a bottleneck for service availability. In past years, the L4 switch has caused several service failures. Just last Friday, the L4 switch stopped responding to any network packets while its Ethernet interface seemed OK.
I suggest you find a better loadbalancer. Contact me off list if you need suggestions.
3) As the L4 switch is the only entrance to a server farm, a DoS attack or a software bug on it will surely degrade the security level, whereas a farm using ECMP relies on the whole server group to resist a DoS attack.
You should design your system with two loadbalancers, and configure your NAS equipment to use one as primary and the other as secondary. You should configure half of your NAS equipment to use loadbalancer A as primary, and the other half of your NAS equipment to use loadbalancer B as primary (and the converse for secondary).
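The split described above can be sketched as a simple alternating assignment. The balancer names, NAS names and `assign_balancers` are hypothetical; the output is only a plan, and applying it to the NAS equipment is left out.

```python
def assign_balancers(nas_list, balancer_a="lb-a", balancer_b="lb-b"):
    """Alternate primaries so each loadbalancer is primary for about half the NASes."""
    plan = {}
    for i, nas in enumerate(sorted(nas_list)):
        if i % 2 == 0:
            primary, secondary = balancer_a, balancer_b
        else:
            primary, secondary = balancer_b, balancer_a
        plan[nas] = {"primary": primary, "secondary": secondary}
    return plan


for nas, roles in assign_balancers(["nas1", "nas2", "nas3", "nas4"]).items():
    print(nas, roles)
```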
4) Maintenance is somewhat costly. Any maintenance, whether on a RADIUS server or on the L4 switch, needs a scheduled time window.
A design as above will have no single point of failure.
5) Service protection is hard (the 'cascade' problem you mentioned). As there are two server farms, if one farm fails it takes ten or more minutes to migrate its RADIUS traffic to the other farm. This is unacceptable.
If you set your RADIUS timeouts and retries on the NAS equipment sensibly, depending on what end-user devices are being used (PC modems, DSL modems, GPRS WAP phones, mail servers, web servers ...) any outage should have almost imperceptible impact.
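As a back-of-the-envelope check on the timeout and retry settings mentioned above, the sketch below estimates how long a NAS waits on unresponsive servers before it reaches a working one. It assumes a fixed timeout per attempt with no backoff, which may not match a given NAS; the function name and the example values are illustrative only.

```python
def worst_case_failover_seconds(timeout_s: float, retries: int, dead_servers: int = 1) -> float:
    """Time spent on unresponsive servers before the next one is tried."""
    attempts_per_server = retries + 1  # the first attempt plus the retries
    return timeout_s * attempts_per_server * dead_servers


# Example values only: 3 s timeout, 2 retries, one dead server.
print(worst_case_failover_seconds(timeout_s=3, retries=2))
```

Whether the resulting delay is acceptable depends on how long the end-user device waits before giving up on its own.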
So, we are looking for a more scalable, reliable, secure and automatic multi-farm RADIUS solution.
hope that helps regards Hugh
participants (4)
- Edward B. DREGER
- Hugh Irvine
- Joe Maimon
- Joe Shen