RE: The state of TACACS+

29 Dec 2014

I've long since deleted the OP's message, but figured I would share our
experiences from using TACACS+ with our Cisco hardware for a couple of
years.
We originally deployed it to control multiple users across several
devices, and to safely control 3rd-party read or reverse-telnet access
to the very few nodes that may need it, without needing to mess around
with parser views on every device.
To that end it's worked just fine without complaint.
Note: We're using shrubbery.net's tac_plus.

The per-command authorization does slow some nodes down slightly, but
nothing as severe as a few seconds each. It does work out to about
(1000ms / node-to-AAA RTT) commands per second, as you'd expect. E.g.,
the worst I've seen is on a ~200ms link, where a copy/pasted lump of
config works out to about the expected 4-5 commands/second. Devices
running v15 seem to speed this up somehow; I'm not sure if they
multiplex commands under the hood, or if I'm just crazy. I've never
looked into it that closely for lack of interest and time.
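
As a rough illustration of that arithmetic, here is a minimal Python
sketch. It assumes each command waits for exactly one authorization
round trip and that nothing is pipelined; the function names are made
up for this example.

```python
def commands_per_second(rtt_ms: float) -> float:
    """Upper bound on commands/second when each command waits for one
    authorization round trip of rtt_ms milliseconds (assumed model)."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return 1000.0 / rtt_ms


def paste_time_seconds(num_commands: int, rtt_ms: float) -> float:
    """Time to push a pasted block of num_commands through, ignoring
    everything except the authorization round trips."""
    return num_commands * rtt_ms / 1000.0


if __name__ == "__main__":
    for rtt in (20, 50, 200):
        rate = commands_per_second(rtt)
        total = paste_time_seconds(100, rtt)
        print(f"RTT {rtt} ms: {rate:.1f} cmd/s, 100 lines in {total:.1f} s")
```

On the ~200ms link this gives a ceiling of 5 commands/second, which
lines up with the 4-5 observed.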

There is a stupid gotcha when dealing with the command authorization in 
the TACACS configuration. If you permit 'johndoe' a 'show ip bgp .*', 
and he is also a member of a group with subsequent show commands, the 
show commands in the 'group' config block are completely ignored.  This 
makes some scenarios tricky.
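
To make the gotcha concrete, here is a toy Python model of the
behaviour described above. It is not tac_plus code, only a sketch of
the observed effect: once the user has any rule block for a command
name, the group's block for that same command name is never consulted.
The data layout, the function name and the use of re.match are all
assumptions for illustration.

```python
import re


def is_authorized(command, user_cmds, group_cmds):
    """Toy lookup: a user-level block for a command name shadows the
    group-level block for that name entirely (assumed model)."""
    parts = command.split(None, 1)
    if not parts:
        return False
    name = parts[0]
    args = parts[1] if len(parts) > 1 else ""
    if name in user_cmds:
        patterns = user_cmds[name]          # group block is ignored
    else:
        patterns = group_cmds.get(name, [])
    # Anything not matched by a permit pattern is denied.
    return any(re.match(p, args) for p in patterns)


if __name__ == "__main__":
    johndoe = {"show": [r"ip bgp .*"]}
    group = {"show": [r"version", r"interfaces.*"], "ping": [r".*"]}
    for cmd in ("show ip bgp summary", "show version", "ping 192.0.2.1"):
        print(cmd, "->", is_authorized(cmd, johndoe, group))
```

In this model 'show version' is denied for johndoe even though his
group permits it, while 'ping' still falls through to the group.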

We utilize a local root, unprivileged user with unique credentials 
across each device. It's possible to configure Cisco's AAA to prevent 
the local user login while AAA is up / reachable.
Generally, we are of the opinion that if our nodes cannot reach the AAA 
server, we have bigger problems that would necessitate a senior 
administrator with access to the local root user credentials anyway.
Otherwise, a TACACS server can be set up in literally minutes, and the
configuration required is minuscule and easy to back up safely.
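
The "local login only while AAA is unreachable" behaviour can be
sketched as a small Python model. This is not device code; the Reply
values, the login function and the tacacs callable are hypothetical
names, and the model assumes the device falls back to the local
account only when the server gives no answer, never when it rejects.

```python
from enum import Enum


class Reply(Enum):
    PASS = "pass"      # server accepted the credentials
    FAIL = "fail"      # server answered and rejected them
    ERROR = "error"    # server unreachable or timed out


def login(username, password, tacacs, local_users):
    """Try the remote server first; use the local account only when
    the server could not be reached at all (assumed model)."""
    reply = tacacs(username, password)
    if reply is Reply.PASS:
        return True
    if reply is Reply.FAIL:
        return False   # an explicit reject does not fall through
    return username in local_users and local_users[username] == password


if __name__ == "__main__":
    local = {"root": "localpw"}
    print(login("root", "localpw", lambda u, p: Reply.FAIL, local))
    print(login("root", "localpw", lambda u, p: Reply.ERROR, local))
```

The point of the model is that knowing the local password is not
enough on its own; the server also has to be out of reach.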

A note on the local root user. Far and away the worst possible
scenario is not AAA going down / becoming entirely unreachable, but
network instability. Having lived through this for a few very
frustrating hours, the experience is along the lines of:
- Enter a pile of commands. Some fail (while AAA is briefly up), some
succeed (while AAA is down).
- Swear at your console, and repeat until the problem(s) are resolved.
Our workaround was:
Add your backup / root user with full privileges to your TACACS 
backend, with _no_ password.  This denies them access when AAA is 
running as there is no password to authenticate against, but prevents 
"Authorization failed!" when the AAA server is briefly available in the 
middle of your diags / trying to resolve the connectivity problem.

For the Unix admins: the TACACS binary itself is awful. It has no
status exit codes. The process cannot be monitored or controlled safely
by way of something like DJB's daemontools, even with the fg_helper
hack; at least I've not managed to succeed to date and have given up.
To that end, we have a hacked-together script that parses stdout and
stderr to decide what to do, which helps with safely reloading configs
and such. E.g., trying to gracefully restart TACACS with a broken
config will cause the daemon to exit. Not awesome.
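
For anyone wanting to build something similar, here is a minimal
Python sketch of the "parse the output, don't trust the exit code"
idea. It is not our script. The command line you pass in and the
ERROR_MARKERS strings are assumptions; check what your tac_plus build
actually prints on a bad config before relying on them.

```python
import subprocess
import sys

# Hypothetical substrings that indicate a problem; adjust to whatever
# your daemon really prints.
ERROR_MARKERS = ("error", "cannot", "failed")


def config_looks_valid(argv, markers=ERROR_MARKERS, timeout=10):
    """Run a config-check command and judge it by its output only.

    Returns (ok, details). The exit status is deliberately ignored.
    """
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into stdout
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    hits = [
        line
        for line in proc.stdout.splitlines()
        if any(marker in line.lower() for marker in markers)
    ]
    return (not hits), "\n".join(hits)


if __name__ == "__main__":
    # Pass the check command as arguments, e.g. a parse-only
    # invocation if your build has one (an assumption, verify first).
    ok, details = config_looks_valid(sys.argv[1:])
    print("config OK" if ok else "config problem:\n" + details)
```

Only signal the running daemon to reload once the check comes back
clean; that way a broken config never reaches it.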

All that said, I have heard a lot of praise from an enterprise in my
neck of the woods that shelled out for Cisco's TACACS+ VM Appliance. If
you have the money, it's supposedly worth it, especially for the AD
hooks.

I hope this provides some insight to those that may need it.

________________________________________
From: NANOG [nanog-bounces@nanog.org] On Behalf Of Colton Conor [colton.conor@gmail.com]
Sent: Monday, December 29, 2014 9:28 AM
To: Michael Douglas
Cc: NANOG
Subject: Re: The state of TACACS+

Glad to know you can make local access only work if TACACS+ isn't
available.
However, that still doesn't prevent the employee who knows the local
username and password from unplugging the device from the network and
then using the local password to get in. Still better than our current
setup of having one default username and password that everyone knows.

On Mon, Dec 29, 2014 at 9:38 AM, Michael Douglas
<Michael.Douglas@ieee.org> wrote:
...
In the Cisco world the AAA config is typically set up to try tacacs
first, and local accounts second. The local account is only usable if
tacacs is unavailable. Knowledge of the local username/password does
not equate to full time access with that credential. Also, you would
usually filter the incoming SSH sessions to only permit a particular
management IP range; the local credential, or tacacs credential,
shouldn't be usable from any arbitrary network.
On Mon, Dec 29, 2014 at 10:32 AM, Colton Conor
<colton.conor@gmail.com> wrote:
...
Scott,
Thanks for the response. How do you make sure the failsafe and/or root
password that is stored in the device in case remote auth fails can't
be accessed without having several employees engaged? Are there any
mechanisms for doing so?
My fear would be we would hire an outsourced tech. After a certain
amount of time we would have to let this part timer go, and would
disable his or her username and password in TACACS. However, if that
tech still knows the root password they could still remotely login to
our network and cause havoc. The thought of having to change the root
password on hundreds of devices every time an employee is let go
doesn't sound appealing either. To make matters worse we are using an
outsourced firm for some network management, so the case of hiring and
firing is fairly consistent.
On Mon, Dec 29, 2014 at 9:22 AM, Scott Helms <khelms@zcorum.com> wrote:
...
Colton,
Yes, that's the 'normal' way of setting it up. Basically you still have
to configure a root user, but that user name and password are kept
locked up and only accessed in case of catastrophic failure of the
remote authentication system. An important note is to make sure that
the fail safe password can't be accessed without having several people
engaged so it can't be used without many people knowing.
Scott Helms
Vice President of Technology
ZCorum
(678) 507-5000
--------------------------------
http://twitter.com/kscotthelms
--------------------------------
On Mon, Dec 29, 2014 at 10:15 AM, Colton Conor
<colton.conor@gmail.com> wrote:
...
We are able to implement TACACS+. It is my understanding this is a
fairly old protocol, so are you saying there are numerous bugs that
still need to be fixed?
A question I have is TACACS+ is usually hosted on a server, and
networking devices are configured to reach out to the server for
authentication. My question is what happens if the device can't reach
the server if the device's network connection is offline? Our goal with
TACACS+ is to not have any default/saved passwords. Every employee will
have their own username and password. That way if an employee gets
hired/fired, we can enable or disable their account. We are trying to
avoid having any organization wide or network wide default username or
password. Is this possible? Do the devices keep a log of the last
successful username/password combinations that worked in case the
device goes offline?
On Sun, Dec 28, 2014 at 5:02 PM, Robert Drake <rdrake@direcpath.com>
wrote:
...
Picking back up where this left off last year, because I apparently
only work on TACACS during the holidays :)
On 12/30/2013 7:28 PM, Jimmy Hess wrote:
...
Even 5 seconds extra for each command may hinder operators, to the
extent it would be intolerable; shell commands should run almost
instantaneously.... this is not a GUI, with an hourglass. Real-time
responsiveness in a shell is crucial --- which remote auth should not
change. Sometimes operators paste a buffer with a fair number of
commands, not expecting a second delay between each command --- a
repeated delay, may also break a pasted sequence.
It is very possible for two of three auth servers to be unreachable, in
case of a network break, but that isn't necessary. The "response
timeout" might be 5 seconds, but in reality, there are cases where you
would wait longer, and that is tragic, since there are some obvious
alternative approaches that would have had results that would be more
'friendly' to the interactive user.
(Like remembering which server is working for a while, or remembering
that all servers are down -- for a while, and having a 50ms timeout,
with all servers queried in parallel, instead of a 5 seconds timeout)
I think this needs to be part of the specification.
I'm sure the reason they didn't do parallel queries was because of both
network and CPU load back when the protocol was drafted. But it might
be good to have local caching of authentication so that can happen even
when servers are down or slow. Authorization could be updated to send
the permissions to the router for local handling. Then if the server
dies while a session is open only accounting would be affected.
That does increase the vendors/implementors work but it might be doable
in phases and with partial support with the clients and servers
negotiating what is possible. The biggest drawback to making things
like this better is you don't gain much except during outages and if
you increase complexity too much you make it wide open for bugs.
Maybe there is a simpler solution that keeps you happy about redundancy
but doesn't increase complexity that much (possibly anycast tacacs, but
the session basis of the protocol has always made that not feasible).
It's possible that one of the L4 protocols Saku Ytti mentioned, QUIC or
MinimaLT would address these problems too. It's possible that if we did
the transport with BEEP it would also provide this, but I'm reading the
docs and I don't think it goes that far in terms of connection
assurance.
...
--
-JH
So, here is my TACACS RFC christmas list:
1.  underlying crypto
2.  ssh host key authentication - having the router ask tacacs for an
authorized_keys list for rdrake. I'm willing to let this go because
many vendors are finding ways to do key distribution, but I'd still
like to have a standard (https://code.google.com/p/openssh-lpk/ for how
to do this over LDAP in UNIX)
3.  authentication and authorization caching and/or something else

emille＠abccomm.com
