19-01-2012, 11:58   #204
qasdfdsaq
cf.mega poster
 
Join Date: Aug 2004
Posts: 11,207
Re: MAJOR NETWORK ISSUE (17 Jan 2012)

Quote:
Originally Posted by Sephiroth View Post
As many techies will know, you can algorithmically re-route when there is a node failure; this puts pressure on other nodes but is in any case standard everywhere these days.

Plus you can put in physical resilience so that another device takes over with the same IP address in case of failure of a key device. I suspect that didn't happen. (If there is a site failure then, of course, this measure is not effective.)

The trick is to do proper reliability analysis, identify the potential critical items and design the risk out accordingly. Just so you know, this is one of the day job things I do.
Yes, most sensible departments, including mine, do a combination of all three of the above. That said, I've seen our whole network taken down more often by human error and critical software bugs than by actual hardware failure.

I see no excuse for this level of failure in any major ISP, particularly not one of the largest in the UK. And especially not lingering effects several days on as some people seem to be reporting...