On Tue, Jun 16, 2020 at 11:51 AM Randy Bush <randy@psg.com> wrote:
> router implementations; i.e. every step in the chain. the only reason the mess is not blatantly visible is the fail soft design, aka notFound. the problem with fail soft is that you think you are protected when you are not.
I don't see how we would have reasonably found these problems without large-scale, actually operating deployments. To me this seems like:
  ipv6 rollouts
  dnssec rollouts
  any other large system change

we expected things to work like X, in reality they work a little differently AND we have software / systems problems which SEEM like non-problems (or even features!) but which under stress/scale prove to be complications to be filed down.
> my inner naggumite is starting to wonder if fail soft was a mistake.
it would be hard to argue: "Sure! you should deploy, worst case when things go wrong in your deployment (which happens, always) you fall off the net!" fail soft, at least for a while, is ok... and helps get systems/people/scale.