I am hoping someone can shed some light on an interesting problem we are having -

When we set up a customer for MLPPP, things tend to go well for a period of time. Then - all of a sudden - we will begin to have problems with our multilink bundles (generally only one at a time) and the only fix is to reload our 7513. This problem happens on both of our 7513 routers from time to time. Once we reload, the problem will stay gone for as long as several months, or in the last case only about 12 hours.

Once we see the problem, it is apparent only in one direction. For example, the customer can push the full capacity of their circuits to us, but they cannot pull anything above about 300k back to them on a two-T1 bundle. This is the same every single time we have the problem.

We have changed multilink bundles, tried different types of switching and route caching, and turned fragmentation on and off - the only thing that solves the problem is reloading the entire router. We can pull the T1s from the multilink bundle and each individual T1 works great. No line errors, no CRC errors - nothing. No errors are apparent while in MLPPP mode either. No throttles or anything similar.

We have had this problem in the past and it was recommended that we upgrade the code on our 7513s. We are currently running version 12.2(13)T5 on both our 7513s as well as the customer's router. Upgrading the code did not solve the problem. I have been unable to locate a Cisco bug describing this type of problem for any version of their code.

This particular customer's T1s are both terminated on the same VIP (we are running DMLP) but are terminated on separate PAs and hence separate CT3s. We have noticed the problem even with T1 bundles on the exact same PA and CT3. We are not doing multi-chassis DMLP. dCEF is enabled on both routers; however, the problem remains the same even after disabling dCEF.
Here are configs and router info:

interface Multilink6
 description Eastgate Mall (s2/1/0/8:0 and s2/0/0/12:0) [20291]
 ip address 207.158.1.133 255.255.255.252
 no cdp enable
 ppp multilink
 ppp multilink interleave
 multilink-group 6

interface Serial2/0/0/12:0
 description Eastgate Mall #2
 no ip address
 encapsulation ppp
 no fair-queue
 ppp multilink
 multilink-group 6

interface Serial2/1/0/8:0
 description Eastgate Mall #1
 no ip address
 encapsulation ppp
 no fair-queue
 ppp multilink
 multilink-group 6

AR04#sh diag 2
Slot 2:
  Physical slot 2, ~physical slot 0xD, logical slot 2, CBus 0
  Microcode Status 0x4
  Master Enable, LED, WCS Loaded
  Board is analyzed
  Pending I/O Status: None
  EEPROM format version 1
  VIP2 R5K controller, HW rev 2.02, board revision D0
  Serial number: 17953368 Part number: 73-2167-05
  Test history: 0x00 RMA number: 00-00-00
  Flags: cisco 7000 board; 7500 compatible

  EEPROM contents (hex):
    0x20: 01 1E 02 02 01 11 F2 58 49 08 77 05 00 00 00 00
    0x30: 68 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00

  Slot database information:
  Flags: 0x4 Insertion time: 0x41C8 (1d08h ago)

  Controller Memory Size: 32 MBytes DRAM, 4096 KBytes SRAM

  PA Bay 0 Information:
    CT3 single wide PA, 1 port
    EEPROM format version 1
    HW rev 1.00, Board revision A0
    Serial number: 17814822 Part number: 73-3037-01

  PA Bay 1 Information:
    CT3 single wide PA, 1 port
    EEPROM format version 1
    HW rev 1.00, Board revision A0
    Serial number: 09725065 Part number: 73-3037-01

  --Boot log begin--

  Cisco Internetwork Operating System Software
  IOS (tm) VIP Software (SVIP-DW-M), Version 12.2(13)T5, RELEASE SOFTWARE (fc1)
  TAC Support: http://www.cisco.com/tac
  Copyright (c) 1986-2003 by cisco Systems, Inc.
  Compiled Wed 28-May-03 21:57 by nmasa
  Image text-base: 0x60010930, data-base: 0x604C0000

AR04#sh ver
Cisco Internetwork Operating System Software
IOS (tm) RSP Software (RSP-JSV-M), Version 12.2(13)T5, RELEASE SOFTWARE (fc1)
TAC Support: http://www.cisco.com/tac
Copyright (c) 1986-2003 by cisco Systems, Inc.
Compiled Wed 28-May-03 22:00 by nmasa
Image text-base: 0x60010948, data-base: 0x61F0A000

ROM: System Bootstrap, Version 11.1(8)CA1, EARLY DEPLOYMENT RELEASE SOFTWARE (fc1)

AR04 uptime is 1 day, 8 hours, 24 minutes
System returned to ROM by reload at 09:45:48 UTC Tue Dec 23 2003
System image file is "slot0:rsp-jsv-mz.122-13.T5.bin"

cisco RSP4 (R5000) processor with 262144K/2072K bytes of memory.
R5000 CPU at 200Mhz, Implementation 35, Rev 2.1, 512KB L2 Cache
Last reset from power-on
G.703/E1 software, Version 1.0.
G.703/JT2 software, Version 1.0.
X.25 software, Version 3.0.0.
SuperLAT software (copyright 1990 by Meridian Technology Corp).
Bridging software.
TN3270 Emulation software.
Primary Rate ISDN software, Version 1.1.
Chassis Interface.
3 VIP2 R5K controllers (2 FastEthernet)(6 Channelized T3).
2 FastEthernet/IEEE 802.3 interface(s)
168 Serial network interface(s)
6 Channelized T3 port(s)
123K bytes of non-volatile configuration memory.

20480K bytes of Flash PCMCIA card at slot 0 (Sector size 128K).
8192K bytes of Flash internal SIMM (Sector size 256K).
Slave in slot 7 is running Cisco Internetwork Operating System Software
IOS (tm) RSP Software (RSP-DW-M), Version 12.2(13)T5, RELEASE SOFTWARE (fc1)
TAC Support: http://www.cisco.com/tac
Copyright (c) 1986-2003 by cisco Systems, Inc.
Compiled Wed 28-May-03 22:33 by nmasa
Slave: Loaded from system
Slave: cisco RSP4 (R5000) processor with 262144K bytes of memory.

Configuration register is 0x2102

Any help would be greatly appreciated.

******************************************
Richard J. Sears
Vice President
American Digital Network
----------------------------------------------------
rsears@adnc.com
http://www.adnc.com
----------------------------------------------------
858.576.4272 - Phone
858.427.2401 - Fax
----------------------------------------------------

I fly because it releases my mind from the tyranny of petty things . .
"Work like you don't need the money, love like you've never been hurt and dance like you do when nobody's watching."
Richard,

One bug I know of that could possibly be a match is:

CSCec00268
Externally found severe defect: Resolved (R)
Input drops and * throttles on PPP multilink interface

fixed in 12.2(15)T9.

The way to verify is to check 'sh int multilink <x>' and see if the interface is under throttle.

Router#sh int mu 2
Multilink2 is up, line protocol is up
<snip>
Received 0 broadcasts, 0 runts, 0 giants, 0* throttles
                                           ^-- Here.

If that's not it email me offline and I'll help you get it resolved.

Thanks,
Rodney

On Tue, Jan 06, 2004 at 05:30:51PM -0800, Richard J. Sears wrote:
[full quote of the original message snipped]
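The throttle check described above can be run directly against the bundle from this thread. This is only a sketch: it assumes the affected bundle is Multilink6 (taken from the posted config) and that the output filter (`| include`) is available in the running image. The exact counters displayed vary by release.

```
! Show only the input-error line that carries the throttle counter
AR04#show interfaces Multilink6 | include throttles
! Bundle state, member links, and fragment counters for comparison
AR04#show ppp multilink
```

An asterisk next to the throttle count, as in the sample output above, is the indicator to look for. A clean counter would suggest the cause lies elsewhere.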
Richard J. Sears said:

> We have changed multilink bundles, tried different types of switching and route caching, turning on and off fragmentation - the only thing

[snip]

> dCEF is enable on both routers, however the problem remains the same even after disable dCEF.
Your last line, I think, is rather interesting. I have a 7513 with essentially the same hardware as you (ct3, fe's, etc) and was recently doing testing with red/wred. I had read the notes about dCEF and how it changes/limits some of what one can do with red/wred, since with dCEF enabled the queueing happens on the VIP instead of the RSP.

During testing, I set up a wred config on several interfaces and everything worked fine -- while using standard CEF. Later on, I enabled dCEF and began to test (d)wred. I found the limitations on parameters were a killjoy, so I tried backing out of dCEF/dwred. Interestingly, I couldn't back out. Even if I negated all the wred commands and disabled dCEF, the 'sh int' would still report the VIP was doing all the work. After more attempts and various ideas, I just reloaded the damn thing -- it came back up with, as I had hoped, the VIP no longer doing wred, but rather the RSP.

So, I wonder whether the mlppp instability is due to some obscure or as-yet-undiscovered dCEF bug, and also whether, when you disable dCEF, it's not really getting disabled. Maybe disable dCEF -- then reload? It may also help to get more familiar with the forwarding path data takes in this scenario; I'm not familiar enough with Cisco's mlppp to know if all the encapsulation and multilink work happens RSP side or VIP side.
> Cisco Internetwork Operating System Software IOS (tm) VIP Software (SVIP-DW-M), Version 12.2(13)T5, RELEASE SOFTWARE
Also, instead of following 12.2, maybe try the 12.0(S) train, if you don't need specific features in 12.2. I've surveyed several other folks recently about which version they run; the 12.0 S-line seems to be the least-hated and most stable train.

It sounds like (at this point) it'd be worth trying just about anything to get the mlppp links stable. <G>

--Tk
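The "disable dCEF, then reload" idea above could be tried roughly as follows. This is a sketch under assumptions, not a tested procedure: the hostname and bundle name come from the posted output, dCEF is assumed to have been enabled with `ip cef distributed`, and the verification commands are ones generally present on 7500 images, though their output differs by release.

```
AR04#configure terminal
AR04(config)#no ip cef distributed
AR04(config)#end
! Save first so the change survives the reload
AR04#copy running-config startup-config
AR04#reload
! After the router is back up, check how the bundle is being switched
AR04#show cef interface Multilink6
AR04#show ppp multilink
```

If the bundle behaves normally after a reload with dCEF off, that would support the suspicion that the earlier disable was not fully taking effect without a reload.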
participants (3): Anton L. Kapela, Richard J. Sears, Rodney Dunn