March 2011 - Watchfront Limited

Service Status

31/03/11

18:54

We have just suffered a brief routing flap in our Maidenhead ‘A’ datacentre. The soon-to-be-retired router ‘derek’ went awry and has been reset. Apologies for any strange routes packets may have taken whilst this occurred.

Posted by Scott Wilkins

31/03/11

02:54

The latest from BT regarding the Reading issue is that a faulty router in Slough has affected all 20CN lines on Reading BRASes and all 21CN lines on Slough BRASes. No estimated time to repair has yet been quoted.

Posted by Scott Wilkins

30/03/11

17:46

All BT infrastructure ADSL lines connected via BT’s Reading BRASes are currently down. Apologies for the inconvenience – we will post further details when we have them.

Posted by Scott Wilkins

24/03/11

10:53

We have resolved the problems we had with spam filtering since Sunday 20th March, and our mail servers are now operating as normal. Apologies for the higher-than-normal level of spam over the last few days.

Posted by Paul Malkowski

23/03/11

13:30

At around 13:00 today all our BT-based ADSL lines went down. Most lines are now back; if you are still having problems, please try rebooting your ADSL router. Apologies for this outage.

Posted by Paul Malkowski

23/03/11

13:21

We are investigating an issue with some ADSL connections.

Posted by Paul Malkowski

21/03/11

11:51

Here is the final statement from Bluesquare regarding the outage last Thursday:-

“This is a Reason for Outage Report with details regarding the power supply in BS2/3 with BlueSquare Data Services Ltd.

At 10:06 on Thursday 17th March one of the six UPS modules located in BlueSquare 2/3 suffered a critical component failure which resulted in a dead short on the output side (critical load side) of the UPS. This failure also caused an amount of smoke to be released by the failed UPS system which resulted in the fire alarm activating and the fire service attending. Once the fire service was happy with the situation we were able to restore power to the site via the generators with the UPS system bypassed whilst we investigated the fault further.

Because the short circuit occurred on the output side of the UPS, the other UPS modules immediately went into an overload condition, which then switched all modules into bypass mode, as per the design of the system. This overload then transferred to the raw mains and tripped the main incomer to the site. This caused the overload condition to cease and power was lost to the site. The UPS manufacturers then worked to check all the remaining UPS modules to ensure the same component was within specification, and to fully test each UPS system, replacing some components where necessary. No further faults were found on the remaining UPS modules, and load was then switched back to full UPS protection at approx 02:15 and building load was transferred back from the generators to utility mains at approx 02:25.

Due to the size of the failure we have commissioned an independent organisation to forensically examine the failed UPS module. This work is scheduled to be completed next week and we will provide further details once we receive their report. This was an extremely unusual type of failure and the manufacturers have not experienced such a problem before, despite over 3,000 similar UPS units being deployed. This suggests there isn’t an inherent design problem in the units but we will not reach any conclusions until the forensic examination is complete.

The failed UPS module will be replaced within the next 4 weeks and until that time we will remain on ‘N’ redundancy level at BlueSquare 2 & 3. Further updates will be provided before this replacement work takes place.

A number of customers have asked why this failure could occur when we operate an N+1 UPS architecture. The reason is that all six UPS modules in BlueSquare 2/3 are paralleled together as one large UPS system. BlueSquare 2/3 only requires 5 modules to hold the critical load to the site; the additional unit provides the redundancy in the event of a UPS module failure. However, as this failure was on the common critical load side of the UPS (the same output that feeds the distribution boards which then in turn feed the racks) and all the UPS systems are paralleled together, this had the effect of causing all UPS modules to go down.”
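As an illustration only (this is our own simplified sketch, not anything supplied by Bluesquare), the N+1 reasoning in the statement above can be modelled in a few lines of Python. The module counts of six installed and five required come from the report; the class name `ParallelUps`, its methods and the `output_bus_shorted` flag are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class ParallelUps:
    """Toy model of UPS modules paralleled onto one common output bus."""

    modules_installed: int = 6  # figure taken from the report
    modules_required: int = 5   # figure taken from the report

    def redundancy(self, failed_modules: int = 0) -> str:
        """Describe the redundancy level left after some modules fail."""
        spare = self.modules_installed - failed_modules - self.modules_required
        if spare < 0:
            return "overloaded"
        return "N" if spare == 0 else f"N+{spare}"

    def load_held(self, failed_modules: int = 0,
                  output_bus_shorted: bool = False) -> bool:
        """Return True if the critical load stays powered.

        A short on the shared output bus defeats every module at once,
        however many spare modules exist (our simplifying assumption).
        """
        if output_bus_shorted:
            return False
        return self.modules_installed - failed_modules >= self.modules_required


ups = ParallelUps()
print(ups.redundancy())                       # healthy system
print(ups.redundancy(failed_modules=1))       # one module removed
print(ups.load_held(failed_modules=1))        # an isolated module failure
print(ups.load_held(output_bus_shorted=True)) # fault on the common output
```

The point of the sketch is that a spare module only protects against a failure confined to one module; a fault on the output that all modules share is outside what N+1 covers.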

Posted by Scott Wilkins

21/03/11

10:26

With regard to our mail server problems, last night we tried increasing spam filtering again, and this morning both our primary and secondary mail servers were struggling. We have reduced the level of filtering once more, and mail is starting to flow.

Just to clarify, whilst emails may be delayed by these problems, no emails should be lost.

Posted by Paul Malkowski

20/03/11

19:11

We have temporarily reduced the severity of spam filtering we do on emails, and our primary mail server is now behaving itself. We will investigate this further during the week, and hopefully return spam filtering to normal. In the meantime you may receive a bit more spam.

Posted by Paul Malkowski

20/03/11

11:18

Our primary mail server is still having problems, which we are investigating. In the meantime we have switched outbound SMTP over to another machine, so you should be able to send emails now. You may need to flush your local DNS cache and restart your email client in order for it to use the new SMTP server [81.187.105.1].

Posted by Paul Malkowski
