
[22 May 2013] Downtime and lost data: Some detail, info and apologies


Matt


Dear Hubbers,

 

My utmost apologies for the severe outage over the past days. In today's online world such a prolonged outage should never happen, and a series of mistakes by a number of the parties involved has resulted in this.

 

On Monday night our hosting company experienced a hardware failure, which corrupted some of our key data. To make matters worse, the daily snapshot backups our hosts maintained had also been corrupted without anyone knowing. Despite extended recovery efforts (which were about 70% successful), we were unable to recover all of the data, and we made the decision to roll back a portion of it to the latest clean backup. Unfortunately, this backup dated back to January 2013.

 

What this means is that, while we were able to recover a good amount of data up to the point of the failure, we've lost some data for the 4-month period since 15 January. The data lost consists of forum topics, forum posts and classified ads created after 15 January 2013.

 

I consider this a devastating loss and the worst outcome imaginable. I can only apologize to all those affected and commit to taking all measures to ensure we never suffer such severe loss in future.

 

If you posted classified ads in this period, your ads will need to be re-posted. If you upgraded your advert in the last 7 days, we do have a record of the upgrade and will either refund the full upgrade amount or reinstate the upgrade on your new advert. Please contact me via email once your advert is re-posted and I will reinstate the upgrade or arrange a refund.

 

I can't point fingers at any one person or company in this scenario. On my side, there should have been a more regular backup procedure in place alongside the backups our hosting company made. I had assumed that in the worst-case scenario we could roll back to the previous day's backup held by our hosts. And you know what they say about assumptions…

On the other hand, a hardware failure should not have resulted in such severe losses in a cloud hosting environment, where uptime and redundancy are crucial.

 

We have learnt a very hard lesson and will be completely reviewing our hosting architecture, the providers and our own backup policies. This extent of downtime and data loss cannot EVER happen again in future.

 

Once again, my utmost apologies for everything. It's an unimaginable outcome and a very hard lesson for me.

 

If anyone has any questions, issues or even suggestions please get in touch with me on matt@thehubsa.co.za. I'll attend to any issues as soon as possible.

 

Sincerely,

 

Matt Eagar

Founder - thehubsa.co.za


That's not a true tech apology until you have blamed the underpants gnomes ;)

 

http://southparkstudios.mtvnimages.com/shared/characters/non-human/underpants-gnomes.jpg


+1

 

Matt, check inbox.

 

G

 

:clap:

 

Round of applause for the admission. That kind of honesty is not always forthcoming in our modern world.

 

Big up!! And good luck with the process upgrades.


Does this mean we've lost all "ladies and bicycles" since January 15?

 

Should that not have been backed up on a separate backup, given the importance of that thread?

 

Now we wait and see if The Hub can handle all Hubbers trying to catch up on the lost hours of this morning. Wonder if the servers can handle that sort of traffic.

Edited by Jigghead

ye,

 

all gone,

 

repost and re-drool

 

G


S__t happens. I recently asked for a restore of a database. The DBA got it wrong and we lost about 2 weeks' worth of work. For a DEV team this was quite catastrophic. A month later I still feel like I have just visited the principal's office.

Thanks. Who's your host, by the way?


RSA Web as per logo at the bottom of the page.

 

This is now the 2nd time there have been problems at the DB level; DBAs should be lined up for execution.

 

G
