On Fri, Nov 6, 2020 at 5:47 AM Randy Bush <randy@psg.com> wrote:
> > Admittedly someone (randy) injected a pretty pathological failure mode into the system
>
> really? could you be exact, please? turning an optional protocol off is not a 'failure mode'.

I suppose it depends on how you think you are serving the data. If you thought you were serving it on both protocols, but 'suddenly' the RRDP location was empty, that would be a failure. Same if your RRDP location's TLS certificate dies...

One of my points was that the software appeared to call 'bad TLS cert' (among other things, I'm sure) a failure, but not 'empty directory' (or no diff file). It's possible that 'no diff' is ALSO considered a failure, but that swapping to the alternate transport after a few failures was not implemented. (I don't know, I have not looked at that part of the code, and I don't think alex/tim said either way.)

I don't think alex is wrong in stating that 'ideally the operator monitors/alerts on health of their service'; I think it's shockingly often that this isn't actually done, though (and it isn't germane in the case of the test / research in question).

My suggestion is that checking the alternate transport is helpful.

-chris
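
To make the suggestion concrete, here is a rough sketch of what "treat an empty RRDP location as a failure, and swap to the alternate transport after a few failures" could look like. This is not how any particular relying-party software behaves (I have not read that code); every name here (`FetchError`, `RepoState`, `fetch_with_fallback`, `max_failures`) is made up for illustration, and the two fetch callables stand in for whatever real RRDP and rsync fetchers a validator has.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


class FetchError(Exception):
    """Hypothetical error a transport raises on hard failure (e.g. bad TLS cert)."""


@dataclass
class RepoState:
    # Hypothetical per-repository bookkeeping kept between validation runs.
    consecutive_rrdp_failures: int = 0


def fetch_with_fallback(
    state: RepoState,
    fetch_rrdp: Callable[[], List[bytes]],
    fetch_rsync: Callable[[], List[bytes]],
    max_failures: int = 3,
) -> Tuple[str, List[bytes]]:
    """Try RRDP first; fall back to rsync after repeated RRDP failures.

    Returns (transport_used, objects). "retry" means RRDP failed but the
    failure threshold has not been reached yet, so nothing was fetched.
    """
    try:
        objects = fetch_rrdp()
        # Assumption for this sketch: an empty result counts as a failure,
        # the same as a TLS or connection error.
        rrdp_ok = len(objects) > 0
    except FetchError:
        objects = []
        rrdp_ok = False

    if rrdp_ok:
        state.consecutive_rrdp_failures = 0
        return ("rrdp", objects)

    state.consecutive_rrdp_failures += 1
    if state.consecutive_rrdp_failures >= max_failures:
        # Enough failures in a row: check the alternate transport.
        return ("rsync", fetch_rsync())
    return ("retry", [])
```

Because RRDP is attempted first on every call, a repository whose RRDP service recovers goes back to RRDP on the next run and the failure counter resets. Whether an empty location should really count as a failure is exactly the policy question in this thread; the sketch only shows that the check itself is cheap.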