Hi all.
I thought I'd share our recent experiences, per subject, just in
case others run into the same problems.
So... we finally decided to try 17.3(4a)MD for the CSR1000v, after
years of happy operation. Good Lord, what a drama!
At first, we couldn't figure out why iBGP sessions to all Cisco
boxes would not come up. Then we realized it was because IS-IS to
them would not come up. Then we realized it was because BFD
sessions would not come up.
But even after removing BFD, IS-IS remained down.
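
The order of suspicion here can be sketched as a small check. This
is only an illustration of the reasoning: the layer list and the
first_suspect() helper are hypothetical names, and the states are
typed in by hand rather than read from a router.

```python
# Layers listed lowest first: each one needs the one before it to be up.
LAYERS = ["BFD", "IS-IS", "iBGP"]


def first_suspect(status):
    """Return the lowest layer that is down, skipping removed layers.

    `status` maps a layer name to "up", "down" or "removed".
    Returns None when nothing is down.
    """
    for layer in LAYERS:
        state = status[layer]
        if state == "removed":
            continue  # taken out of the config, so it cannot be the blocker
        if state == "down":
            return layer
    return None


before = {"BFD": "down", "IS-IS": "down", "iBGP": "down"}
after_removing_bfd = {"BFD": "removed", "IS-IS": "down", "iBGP": "down"}

print(first_suspect(before))              # BFD
print(first_suspect(after_removing_bfd))  # IS-IS
```

With BFD out of the picture and IS-IS still the lowest layer that
is down, the fault has to be in IS-IS itself or in something
underneath it that is not on the list.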
After 3 days of searching, we finally landed on CSCuz58508. In
case you don't have CCO access, it is the same issue as described
here:
https://community.cisco.com/t5/cisco-cloud-service-router-csr/b00ocg4q4e-csr-1000v-16-3-1a-can-t-set-mtu-on-gig-interface/td-p/3054853
This was even more confusing for us, because our interface driver
on VMware ESXi is vmxnet3.
The bug ID suggests the problem is fixed in 16.3(2) and 16.4(1).
So to be safe, we tested 16.12(5)MD, which allowed us to enable
jumbo frames, but that appeared to be purely cosmetic. In
the background, the box was simply dropping packets, silently. We
found this out when we tried to copy files to the node, and
the copy would just hang without any feedback. Removing the jumbo
frame support allowed the files to come through.
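
One way to catch this kind of silent drop without waiting for a
file copy to hang is to send DF-bit pings of increasing size and
see where the replies stop. A minimal sketch, assuming a Linux host
with iputils ping (-M do, -s, -c, -W) and that ICMP echo is
permitted on the path; the function names and the 576 to 9000 byte
search range are illustrative assumptions, not anything taken from
the CSR1000v itself.

```python
import subprocess

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def payload_for_mtu(mtu):
    """ICMP echo payload size that yields an IPv4 packet of `mtu` bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_df(host, mtu):
    """Send 3 DF-bit pings sized to `mtu`; True if any reply came back.

    Assumes Linux iputils ping, which exits 0 when at least one
    reply is received.
    """
    cmd = ["ping", "-M", "do", "-c", "3", "-W", "2",
           "-s", str(payload_for_mtu(mtu)), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def largest_working_mtu(probe, low=576, high=9000):
    """Binary-search the largest MTU in [low, high] where probe(mtu) is True.

    Assumes that if a given size gets through, every smaller size
    does too. Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


# Offline demonstration with a fake path that passes nothing above 1500.
print(largest_working_mtu(lambda mtu: mtu <= 1500))  # 1500
```

Against a real node, the probe would be something like
largest_working_mtu(lambda mtu: ping_df("192.0.2.1", mtu)), where
192.0.2.1 is a placeholder address. If the interface claims a
9000-byte MTU but the search settles at or near 1500, the jumbo
setting is cosmetic in the sense described above.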
We noticed that nodes still running 3.17(0)S did not have any
issues with IS-IS, BFD or MTU. However, this code was only ever
released as an ED train (and to be fair, we've been having dodgy
issues with it in recent years), so we decided to "downgrade" to
3.16(9)S. That is actually an upgrade from 3.17(0)S, since the
3.16 train is an MD release whose latest release was in March
2019, vs. July 2017 for 3.17(4)SED.
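
The choice comes down to ranking by release designation and date
rather than by version number. A rough sketch of that comparison,
using only the designations and dates quoted above; the Release
class and the pick() helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    train: str
    designation: str  # "MD" or "ED", as quoted in this post
    latest: tuple     # (year, month) of the train's latest release


TRAINS = [
    Release("3.16", "MD", (2019, 3)),
    Release("3.17", "ED", (2017, 7)),
]


def pick(releases):
    """Prefer MD over ED, then the more recently updated train."""
    return max(releases, key=lambda r: (r.designation == "MD", r.latest))


print(pick(TRAINS).train)  # 3.16
```

Sorting on the version string alone would have picked 3.17, which
is the opposite of what the designation and dates suggest.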
With that, no more MTU issues, BFD and IS-IS are happy, iBGP is
happy.
We definitely won't be wasting any more time trying to make
Denali, Gibraltar, Fuji, Everest or Amsterdam work on our CSR1000v
complement.
Needless to say, moving the ASR1000 platform to 17.3 has also come
with its own avenue of pleasure, what with the mess that the
ROMMON, CPLD and FPGA upgrades are. What the documentation says and
what happens in real life are two very different things. It has
taken us a week to come up with our own working procedure to
upgrade just one box, and it is worse if it's a dual-RP system.
Mark.