Cable Forum

BT admits DNS outage at weekend (https://www.cableforum.uk/board/showthread.php?t=33698112)

Qtx 01-07-2014 00:44

BT admits DNS outage at weekend
 
Quote:

BT was hit by a huge DNS outage on Saturday morning but the telecoms giant was very slow to respond to customer complaints, it has been claimed.

A DNS flaw downed BT's network across Blighty, according to anecdotal reports on Twitter.

But the one-time state monopoly was very sluggish to respond to gripes from subscribers who were unable to connect to the internet via BT's broadband service for several hours over the weekend.
Source TheRegister

A major outage that affected millions of their customers, and they just went silent, with support telling customers who called that it was their own equipment. ISP support can be so poor sometimes.

A graph from bgpmon shows the drop in bandwidth during the outage

https://www.cableforum.co.uk/images/local/2014/06/1.png
From here

Ignitionnet 01-07-2014 00:50

Re: BT admits DNS outage at weekend
 
That's not a bandwidth graph and it contradicts that the issue was just DNS. Have another look at the label on the y-axis.

Qtx 01-07-2014 02:15

Re: BT admits DNS outage at weekend
 
Quote:

Originally Posted by Ignitionnet (Post 35711049)
That's not a bandwidth graph and it contradicts that the issue was just DNS. Have another look at the label on the y-axis.

If DNS goes down, bandwidth goes down too, as people don't load up their web pages. Still, the graph is BGP router availability rather than bandwidth, but this is how bgpmon tweeted it, which is probably why I read the graph a different way. The link to the tweet was in my previous post but I'll post it below anyway:

Quote:


BGPmon.net
‏@bgpmon
Large outage & traffic drop for BT this weekend. AS2856 (BTnet UK Regional network) lost 67% of its prefixes for ~2hr pic.twitter.com/HKz6Ioq2hV

rhyds 01-07-2014 08:19

Re: BT admits DNS outage at weekend
 
It wasn't *just* a DNS issue. I'd stuck my own router on my BT line on Friday night (configured to OpenDNS) and that failed Saturday morning. Some sites would work while others didn't. It strikes me as being a pretty serious routing issue.

AbyssUnderground 01-07-2014 13:03

Re: BT admits DNS outage at weekend
 
Same for me and I use OpenDNS. I had to route through a proxy on my dedicated server to access sites like Facebook.

Qtx 01-07-2014 13:16

Re: BT admits DNS outage at weekend
 
The lack of info from BT has been pretty bad, just a general statement from a PR guy.

Seen a lot of people saying it was mostly major sites, such as amazon, twitter, facebook, netflix etc which seemed to all have problems while others were ok. Wondering if these sites were just noticed more as they are popular or if it was some kind of routing they do for particular sites.

Ignitionnet 01-07-2014 13:51

Re: BT admits DNS outage at weekend
 
Quote:

Originally Posted by Qtx (Post 35711057)
If DNS goes down, bandwidth goes down too, as people don't load up their web pages. Still, the graph is BGP router availability rather than bandwidth, but this is how bgpmon tweeted it, which is probably why I read the graph a different way. The link to the tweet was in my previous post but I'll post it below anyway:

It's not router availability, it's prefixes being advertised from the AS. It doesn't mean a router went down, but that it stopped advertising some prefixes; hence my BT line lost its return path, as it was no longer being advertised outside AS2856.

Actually it seemed to stop being advertised internally too. I suspect fat fingers when a configuration change was made to a route reflector; a single router going unavailable shouldn't cause this, there are almost certainly primary and shadow reflectors on the network.

DNS servers per se didn't go down; this was actually a routing issue on the BT network, a side effect of which was that DNS became unavailable.

I do not use BT's DNS and had to admin down my Infinity line by hand, as both the default gateway and the site I use for IP SLA were still available :(

You misread the graph, it happens, no need to worry :)

---------- Post added at 12:51 ---------- Previous post was at 12:43 ----------

Quote:

Originally Posted by Qtx (Post 35711096)
The lack of info from BT has been pretty bad, just a general statement from a PR guy.

Seen a lot of people saying it was mostly major sites, such as amazon, twitter, facebook, netflix etc which seemed to all have problems while others were ok. Wondering if these sites were just noticed more as they are popular or if it was some kind of routing they do for particular sites.

Just noticed more. Everything from those to DeviantArt to CNN was hosed; only the first 5 hops are common throughout.

Tracerting

www.twitter.com[199.16.156.38] ,Maximum hops:30
1 5ms 5ms 5ms 217.32.143.234
2 5ms 5ms 13ms 217.32.144.30
3 13ms 8ms 7ms 213.120.181.214
4 7ms 7ms 7ms 213.120.180.167
5 8ms 9ms 8ms 217.41.169.107
6 6ms 7ms 7ms 109.159.251.121
...
< Completed >


Tracerting

www.amazon.co.uk[178.236.7.220] ,Maximum hops:30
1 5ms 4ms 4ms 217.32.143.234
2 5ms 5ms 18ms 217.32.144.30
3 7ms 6ms 6ms 109.159.244.250
4 11ms 6ms 6ms 213.120.180.167
5 7ms 17ms 8ms 217.41.169.107
6 7ms 7ms 7ms 109.159.251.109
...
< Completed >

www.netflix.com[54.245.104.31] ,Maximum hops:30
1 5ms 5ms 5ms 217.32.143.234
2 5ms 5ms 5ms 217.32.144.30
3 8ms 9ms 8ms 213.120.181.26
4 8ms 14ms 8ms 213.120.180.167
5 9ms 10ms 11ms 217.41.169.107
6 7ms 8ms 7ms 109.159.251.85
...
< Completed >

www.facebook.com[31.13.80.65] ,Maximum hops:30
1 5ms 5ms 5ms 217.32.143.234
2 5ms 5ms 5ms 217.32.144.30
3 9ms 7ms 7ms 212.140.235.130
4 7ms 7ms 7ms 213.120.180.167
5 8ms 8ms 8ms 217.41.169.107
6 6ms 6ms 6ms 109.159.255.159
...
< Completed >

Tracerting

www.rt.com[62.213.85.4] ,Maximum hops:30
1 5ms 5ms 5ms 217.32.143.234
2 5ms 5ms 5ms 217.32.144.46
3 7ms 7ms 6ms 212.140.235.226
4 7ms 7ms 7ms 213.120.180.167
5 8ms 8ms 9ms 217.41.169.107
6 6ms 6ms 6ms 109.159.255.163
...

You get the idea. The point where the paths diverge is still in BT's 21CN core, indeed still in Yorkshire at hop 6, just entering BT's core network.
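
The traces above can also be compared hop by hop. A minimal sketch in Python, using only the six hop addresses per site as posted; the function name `distinct_per_hop` is made up for illustration and is not part of any traceroute tool:

```python
# Hop addresses copied from the five traces posted above (hops 1-6 only).
TRACES = {
    "www.twitter.com": ["217.32.143.234", "217.32.144.30", "213.120.181.214",
                        "213.120.180.167", "217.41.169.107", "109.159.251.121"],
    "www.amazon.co.uk": ["217.32.143.234", "217.32.144.30", "109.159.244.250",
                         "213.120.180.167", "217.41.169.107", "109.159.251.109"],
    "www.netflix.com": ["217.32.143.234", "217.32.144.30", "213.120.181.26",
                        "213.120.180.167", "217.41.169.107", "109.159.251.85"],
    "www.facebook.com": ["217.32.143.234", "217.32.144.30", "212.140.235.130",
                         "213.120.180.167", "217.41.169.107", "109.159.255.159"],
    "www.rt.com": ["217.32.143.234", "217.32.144.46", "212.140.235.226",
                   "213.120.180.167", "217.41.169.107", "109.159.255.163"],
}


def distinct_per_hop(traces):
    """Return, for each hop position, the set of addresses seen across all traces."""
    return [set(hop) for hop in zip(*traces.values())]


if __name__ == "__main__":
    for n, addrs in enumerate(distinct_per_hop(TRACES), start=1):
        label = "same in every trace" if len(addrs) == 1 else f"{len(addrs)} different addresses"
        print(f"hop {n}: {label}")
```

On the posted data, hops 1, 4 and 5 are identical in every trace, while hop 6 lands on a different 109.159.x.x address for each site.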

Qtx 01-07-2014 13:55

Re: BT admits DNS outage at weekend
 
Quote:

Originally Posted by Ignitionnet (Post 35711117)

DNS servers per se didn't go down; this was actually a routing issue on the BT network, a side effect of which was that DNS became unavailable.

Yeah, makes perfect sense. I read two articles initially which said it was a DNS problem without mentioning any other issues, so didn't think any further than that.

Given that BT wouldn't talk to the media or give them any information (besides saying it wasn't related to them playing with website blocking filters), you may be right about it being human error.

Uncle Peter 01-07-2014 13:56

Re: BT admits DNS outage at weekend
 
Blimey if this was the result of a b0rked change someone's going to get a size 11 in the nads for not following post implementation.

Although if someone was *cough* trying to slip in a sneaky one outside of change control make that a size 13

Qtx 01-07-2014 14:07

Re: BT admits DNS outage at weekend
 
Quote:

Originally Posted by Ignitionnet (Post 35711117)
You get the idea. The point where the paths diverge is still in BT's 21CN core, indeed still in Yorkshire at hop 6, just entering BT's core network.

Some of the routers on 109.159.25*.* are 10GB and some say XE. Any idea what the XE is?

Ignitionnet 01-07-2014 14:08

Re: BT admits DNS outage at weekend
 
Quote:

Originally Posted by Qtx (Post 35711123)
Yeah, makes perfect sense. I read two articles initially which said it was a DNS problem without mentioning any other issues, so didn't think any further than that.

Given that BT wouldn't talk to the media or give them any information (besides saying it wasn't related to them playing with website blocking filters), you may be right about it being human error.

There aren't many other things that can cause a properly designed network to suddenly drop prefixes like this. Even if one router did fail the other one would be receiving the prefixes from the access edge, advertising them into the core and in turn the Internet facing routers advertising them onwards.

If primary and shadow both borked, or routes were getting nulled in the network someone screwed up. Friday is a good night to do changes, less interactive business traffic.

https://www.linkedin.com/in/neilmcrae can, I'm sure, advise :)

EDIT: A swift chat indicates that the issue was due to routing problems between 2 autonomous systems on the BT network. Various services, including name servers, Cleanfeed, etc., are hosted in AS5400; customer subnets, including the one I'm using, are hosted in AS2856. They stopped talking to each other; at one point there was a single route with those two systems in its path. Ooops.

route: 109.144.0.0/12
descr: BT Public Internet Service
origin: AS2856
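
To see which of the hop addresses from the traces earlier in the thread fall inside that route object, Python's standard `ipaddress` module is enough. A rough sketch; the helper name `in_bt_public` is invented here for illustration:

```python
import ipaddress

# The route object quoted above: 109.144.0.0/12, origin AS2856.
BT_PUBLIC = ipaddress.ip_network("109.144.0.0/12")


def in_bt_public(addr: str) -> bool:
    """True if addr lies inside the 109.144.0.0/12 block."""
    return ipaddress.ip_address(addr) in BT_PUBLIC


if __name__ == "__main__":
    # Sample addresses taken from the traceroutes posted earlier in the thread.
    for addr in ("109.159.251.121", "109.159.255.163", "217.32.143.234"):
        print(addr, in_bt_public(addr))
```

The /12 spans 109.144.0.0 to 109.159.255.255, so the hop 6 addresses in 109.159.x.x sit inside it, while the 217.32.x.x first hops do not.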

Uncle Peter 01-07-2014 14:24

Re: BT admits DNS outage at weekend
 
Quote:

Originally Posted by Qtx (Post 35711129)
Some of the routers on 109.159.25*.* are 10GB and some say XE. Any idea what the XE is?

xe is usually used to denote 10GbE on Juniper devices

te for Ciscos

qasdfdsaq 01-07-2014 14:56

Re: BT admits DNS outage at weekend
 
BT Outage, VM DNS Outage, 4 million outages from Microsoft stealing some domain roots and hosing them too.... Been a great week for the internet...

Qtx 01-07-2014 15:00

Re: BT admits DNS outage at weekend
 
Quote:

Originally Posted by Ignitionnet (Post 35711130)
EDIT: A swift chat indicates that the issue was due to routing problems between 2 autonomous systems on the BT network. Various services, including name servers, Cleanfeed, etc., are hosted in AS5400; customer subnets, including the one I'm using, are hosted in AS2856. They stopped talking to each other; at one point there was a single route with those two systems in its path. Ooops.

route: 109.144.0.0/12
descr: BT Public Internet Service
origin: AS2856

Mystery solved, great edit! Explains the prefix gap. Guess they would now have added another route with a lower weight or local preference to minimise future failures between the two...
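
As a rough illustration of the backup-route idea: on Cisco-style BGP the route with the highest weight wins, then the highest local preference, then the shortest AS path, so a backup with a lower local preference only gets used once the primary is withdrawn. A simplified sketch in Python; the `Route` class, the next-hop labels and the preference values are all hypothetical, nothing here reflects BT's actual configuration, and real best-path selection has more tie-breakers than these three:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str                  # hypothetical label, not a real BT router
    weight: int = 0                # Cisco-specific, local to the router; higher wins
    local_pref: int = 100          # shared within the AS; higher wins
    as_path: Tuple[int, ...] = ()  # shorter wins


def best_route(routes: List[Route]) -> Optional[Route]:
    """Pick the preferred route using only weight, local-pref and AS-path length."""
    if not routes:
        return None
    return max(routes, key=lambda r: (r.weight, r.local_pref, -len(r.as_path)))


if __name__ == "__main__":
    primary = Route("109.144.0.0/12", "link-A", local_pref=200, as_path=(2856,))
    backup = Route("109.144.0.0/12", "link-B", local_pref=50, as_path=(2856,))
    print(best_route([primary, backup]).next_hop)  # primary preferred while present
    print(best_route([backup]).next_hop)           # backup takes over after withdrawal
```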

---------- Post added at 14:00 ---------- Previous post was at 14:00 ----------

Quote:

Originally Posted by Uncle Peter (Post 35711133)
xe is usually used to denote 10GbE on Juniper devices

te for Ciscos

Cheers for the info!

qasdfdsaq 01-07-2014 15:06

Re: BT admits DNS outage at weekend
 
I thought it meant Xtreme Elephant.


All times are GMT +1.
All Posts and Content are © Cable Forum