Leigh Porter wrote:
Could you have two instances of RADIUS, one for the middle-man and ignore the accounting from that server?
Well... First, I'd like to thank all of those who responded off-list. To avoid wasting everyone's time, I'd like to point out that this message is really only relevant to PPPoE DSL ops.

For completeness' sake, I'll describe the problem in more detail and provide further info, as I think we've got it solved. I'd appreciate feedback if anyone notices a flaw in my thinking because, as I've said, we auth users on DSL... we do not operate the DSL infrastructure.

We have (from my unconfirmed understanding):

Bell BAS/LAC---DSL LNS---ISP LNS----Me
                  |         |
                   My Radius

We were receiving auth requests from the ISP LNS. We were receiving acct requests from both the DSL LNS and the ISP LNS. The packets from both ISP and DSL are over trinary Internet paths, and don't rely on each other for us to receive them (or respond to them).

I don't know whether it was the NASs themselves that were sending the RADIUS packets, or whether they were sent from a RADIUS server. I'm not familiar with those inner workings. My RADIUS logs would show the requests coming from a DNS name that included "lns" in both cases.

Two problems were apparent. The first was cosmetic; the second affected operations.

- the duplicate acct packets (one from ISP and a second from DSL) were doubling up our accounting data for each user authentication
- users who were "kicked" from the ISP (according to RADIUS logs) would not attempt to re-auth, causing a major helpdesk issue (sync, no conn)

A colleague and I went to work on the issue, essentially trying to reverse engineer the problem, as we have no access to the intermediary gear and, as such, no way to access logs and/or details.

What we have found so far is that a user is authenticated once via our RADIUS server (as expected). We would then receive standard RADIUS acct packets from BOTH LNSs, which our RADIUS server merrily ack'd.

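
As a rough illustration of the first (cosmetic) problem, and of Leigh's suggestion to ignore the accounting from one server, here is a minimal Python sketch. It assumes the accounting records have already been parsed into dicts, that each one carries a "source" field naming the LNS it came from, and that the hostnames shown are placeholders (the real names are not given here). It only changes what gets counted; it says nothing about whether the packets should still be ack'd.

```python
from collections import Counter

# Hypothetical source names; the real LNS hostnames are not given in this thread.
ISP_LNS = "lns.isp.example"
DSL_LNS = "lns.dsl.example"


def records_from(records, trusted_sources):
    """Keep only accounting records that arrived from a trusted source."""
    return [r for r in records if r["source"] in trusted_sources]


def count_starts(records):
    """Count Acct-Status-Type=Start records per User-Name."""
    return Counter(
        r["User-Name"] for r in records if r["Acct-Status-Type"] == "Start"
    )


# One login, reported twice: once by the ISP LNS and once by the DSL LNS.
records = [
    {"source": ISP_LNS, "User-Name": "alice@realm.example",
     "Acct-Status-Type": "Start"},
    {"source": DSL_LNS, "User-Name": "alice@realm.example",
     "Acct-Status-Type": "Start"},
]

raw = count_starts(records)                                  # doubled up
deduped = count_starts(records_from(records, {ISP_LNS}))     # counted once
```

With the sample data, the raw count for the user is 2 and the filtered count is 1, which is the doubling described above.
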
When the connection between DSL and ISP broke, the ISP would see our connection as down, and terminate the session with a STOP packet. However, it appears as though the DSL provider would continue to send interim update acct packets to our RADIUS server, and would never learn about the STOP. The CPE continues to think the session is still active (as a matter of fact, in the case of gw-capable CPE, the IP info would still be retained).

So, in conclusion, I'm thinking this:

- the auth was accepted once, which allowed the session
- the accounting packets have/had operational relevance to both the ISP and the DSL providers
- since the sync-no-conn went away once I had the DSL provider turn off acct to my RADIUS servers, the START/STOP packets appear to be important to DSL connectivity

Some further thoughts:

- we have multiple realms, and have tested on almost all of them. Each time a realm was removed from the DSL provider's config, and only allowed via the ISP, things went back to normal
- with this type of setup, a network op may have unwittingly reset numerous (hundreds of) users on the ISP LNS, not realizing that the users would never reconnect (even though traditional experience says that the user wouldn't notice a thing)
- this type of setup should be scrutinized a bit, because if this RADIUS acct packet issue could really be the cause of all of our recent issues, I'm glad I have 1k DSL users, not 1M.

Does this RADIUS accounting packet 'keepalive' sound reasonable?

...*off to print some RADIUS RFCs for review*.

Steve

ps. A few people mentioned filtering out packets to RADIUS from the unwanted sources. I was thinking about this a few days ago, but didn't understand the operational impact. At the switch date, we had numerous realms, and from what I have seen today, blocking RADIUS accounting packets from the "DSL" provider may have disconnected ALL of our users. This migration to having the intermediary ISP came **very** quickly.

Feedback/operational experience requested...
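
For anyone who wants to check their own logs for the pattern described above (a STOP from the ISP side, followed by continued interim updates from the DSL side), here is a minimal Python sketch. It assumes the accounting events are already parsed into dicts in arrival order, that each has a "source" field, and that sessions can be matched on User-Name alone; the function name and source names are made up for illustration. It only flags candidates for "sync, no conn"; it does not confirm the cause.

```python
def find_stale_sessions(events, isp_source, dsl_source):
    """Return users whose ISP-side session stopped while the DSL side
    kept sending Interim-Update records afterwards."""
    isp_stopped = set()   # users whose latest ISP-side event is a Stop
    stale = set()         # users with DSL updates seen after that Stop
    for ev in events:     # events are assumed to be in arrival order
        user = ev["User-Name"]
        status = ev["Acct-Status-Type"]
        if ev["source"] == isp_source:
            if status == "Stop":
                isp_stopped.add(user)
            elif status == "Start":
                # A fresh ISP-side Start means the user did reconnect.
                isp_stopped.discard(user)
                stale.discard(user)
        elif ev["source"] == dsl_source:
            if status == "Interim-Update" and user in isp_stopped:
                stale.add(user)
            elif status == "Stop":
                stale.discard(user)
    return sorted(stale)


def ev(source, user, status):
    """Build one hypothetical, already-parsed accounting event."""
    return {"source": source, "User-Name": user, "Acct-Status-Type": status}


ISP, DSL = "lns.isp.example", "lns.dsl.example"  # placeholder names
events = [
    ev(ISP, "alice", "Start"), ev(DSL, "alice", "Start"),
    ev(ISP, "bob", "Start"), ev(DSL, "bob", "Start"),
    ev(ISP, "alice", "Stop"), ev(DSL, "alice", "Interim-Update"),
    ev(ISP, "bob", "Stop"), ev(DSL, "bob", "Stop"),
]
stale_users = find_stale_sessions(events, ISP, DSL)
```

In the sample data, "alice" is flagged because the DSL side keeps reporting her session after the ISP-side STOP, while "bob" is not, because both sides reported a STOP.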