02-01-2021, 15:37   #48
Skie
a giant headend
 
Join Date: Jan 2011
Location: Liverpool
Posts: 1,166
Skie has reached the bronze age
Re: Latency / Packet Loss to Cloudflare

It's a common failure scenario. One link goes down, the other links take the strain, everything looks okay, and you can sort out the dead link in due course. But the links and the kit on them were specced years ago and are now each carrying 25% more traffic, so they aren't going to handle 2x the present load.
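To put rough numbers on that, here's a minimal sketch of the arithmetic. The function name, the link capacities and the loads are all made up for illustration, and it assumes the failed link's traffic is shared evenly across the survivors, which real routing won't necessarily do.

```python
def utilisation_after_failure(link_loads, capacity, failed_index):
    """Utilisation of each surviving link after one link fails.

    Assumption (hypothetical): the failed link's load is split evenly
    across the remaining links, and every link has the same capacity.
    """
    survivors = [load for i, load in enumerate(link_loads) if i != failed_index]
    if not survivors:
        raise ValueError("no surviving links to carry the traffic")
    extra_per_link = link_loads[failed_index] / len(survivors)
    return [(load + extra_per_link) / capacity for load in survivors]


# Hypothetical example: two 10 Gbit/s links, each carrying 6 Gbit/s.
# If one fails, the survivor is asked to carry 12 Gbit/s.
result = utilisation_after_failure([6.0, 6.0], capacity=10.0, failed_index=0)
print(result)
```

Anything above 1.0 in the result means the surviving link is being asked for more than it can carry. With three such links instead of two, the same failure would leave each survivor at 0.9, which is why the headroom you had when the kit was specced matters so much.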

The signs of an overloaded route are also incredibly hard to alert on, as different bits of kit will do different things when faced with too much to do. It's way too easy to create false alarms, and those are the bane of monitoring systems. So unless you know exactly what the failure looks like and your alerting accounts for it, you usually won't know it's happening until someone experiences it directly.
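As an illustration of the false alarm trade-off, here's a rough sketch of one common way to cut down on noise: only alert when a metric stays over a threshold for several samples in a row. The function name, threshold and sample values are hypothetical, and it assumes link utilisation is the signal you care about, which is exactly the kind of assumption that goes wrong when the kit misbehaves in some other way.

```python
def sustained_breach(samples, threshold, min_consecutive):
    """Return True if `samples` exceeds `threshold` for at least
    `min_consecutive` samples in a row.

    Hypothetical helper: a single spike is ignored, a sustained
    overload is flagged.
    """
    run = 0
    for value in samples:
        # Extend the current run on a breach, otherwise reset it.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Made-up utilisation samples (fraction of link capacity).
spiky = [0.50, 0.95, 0.60, 0.96, 0.50]
overloaded = [0.50, 0.95, 0.60, 0.96, 0.97, 0.98]

print(sustained_breach(spiky, threshold=0.9, min_consecutive=3))
print(sustained_breach(overloaded, threshold=0.9, min_consecutive=3))
```

Raising `min_consecutive` gives fewer false alarms but a slower alert, and none of it helps if the overload shows up as dropped packets or latency rather than as high utilisation.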

And don't get me started on asymmetric routing...