On Fri, 17 Jun 2022 at 15:39, Tom Beecher <beecher@beecher.cc> wrote:
Thank you for calling out the HMC point. I think that alone is worth retiring the platforms that were built around it. The number of issues related to the HMC memory drivers was out of hand early on, and lingered long past the growing-pains phase. I’m sure in the big picture, supply chain / manufacturing constraints accelerated this, but part of me is happy to see HMC-based stuff go.
I can't pinpoint HMC as a bad solution. Yes, we've had our share of HMC issues, but on JNPR and some other vendors we've also previously replaced all linecards due to memory issues, before stacked DRAMs were a thing. Memories are notoriously fragile. I can pinpoint HMC as a huge risk due to having no manufacturer :). The memory issues are exacerbated by needing to reload the whole linecard when one memory has issues. JNPR has now delivered fixes in newer images that let you reduce both the collateral damage and the downtime by reloading individual PFEs (and their connected memories).

I do think HMC was a solid engineering choice, and I am a bit annoyed that it lost to HBM instead of co-existing with slightly different optimization points. But that doesn't excuse the situation.

-- 
  ++ytti