RTD is down #2981

agjohnson · 2017-06-29T13:16:04Z

Here's an update for anyone who experienced problems with Read the Docs:

Early this morning, at 3:07am PT, I was alerted that RTD had gone down. As it turns out, Rackspace had a global outage affecting their load balancers as well as their dashboard. Our load balancers were completely offline, and our DNS couldn't be updated because it was hosted with Rackspace as well. Fortunately, I eventually found that the API was still up -- it was just the dashboard that was inoperable. I repointed DNS from the load balancers to single web servers and tried to increase the throughput of the web services to handle the load. At this point, Read the Docs was up, but it started flapping and buckling under the load.

Shortly after this, the Rackspace load balancers seemed to return to normal operation. I have since reverted the changes I made, and we are pushing traffic through the load balancers once again.

I'm still working to resolve some issues; it doesn't appear that we're back to 100% yet. I'll continue updating here. Thanks for the patience!

agjohnson · 2017-06-29T13:33:12Z

Things have now quieted down. I've squashed the remaining issues I was being alerted on. I've noticed residual problems from browsers still using cached DNS entries that point at the single web server. I expect traffic to this box to level off shortly. If you are noticing a slow connection, you might still have the old DNS entry cached.
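For anyone who wants to check whether their machine is still resolving to the temporary address, here is a minimal sketch using Python's standard `socket.getaddrinfo`. The hostname and the "old" address in the usage note are placeholders, not the real RTD addresses, and the function names are made up for illustration. Note that this asks the operating system's resolver, so a browser's own DNS cache may still differ from what it reports.

```python
import socket


def resolved_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses the resolver gives for hostname."""
    infos = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def still_cached(hostname, old_addresses, resolver=socket.getaddrinfo):
    """True if any resolved address is one of the known-old (temporary) addresses."""
    return any(addr in old_addresses for addr in resolved_addresses(hostname, resolver))
```

Usage would look like `still_cached("docs.example.org", {"192.0.2.10"})`, where both values are hypothetical. The `resolver` parameter exists only so the lookup can be swapped out when testing without network access.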

jvanasco · 2017-06-29T15:43:22Z

Sorry that happened. I just wanted to suggest something -- which you might be doing already...

If RTD is registered via Namecheap (which it might be, since eNom is on your whois...), you can create a secondary Namecheap account that only has API access to update DNS hosts on given domains, then run a simple Python script to swap DNS. I use that setup a lot to deal with outages, load balancing issues, and DNS challenges for Let's Encrypt certificates. It's much faster than logging in through two-factor auth and using their web interface.
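A rough sketch of what such a swap script could look like, kept provider-agnostic on purpose. `Record`, `swap_targets`, `failover`, and the `push` callable are all hypothetical names for illustration; `push` stands in for whatever call a registrar's API actually exposes, which is not shown here. The addresses are from the documentation ranges, not real hosts.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    """One DNS host record (hypothetical shape, not any provider's schema)."""
    name: str
    type: str
    address: str
    ttl: int = 300


def swap_targets(records, old_address, new_address):
    """Return a new record list with A records at old_address repointed to new_address."""
    return [
        replace(r, address=new_address)
        if r.type == "A" and r.address == old_address
        else r
        for r in records
    ]


def failover(records, old_address, new_address, push):
    """Swap targets and hand the full updated set to push() only if something changed."""
    updated = swap_targets(records, old_address, new_address)
    if updated != list(records):
        push(updated)  # push is an assumed wrapper around the DNS provider's API
    return updated
```

Passing the full record set to `push` is a deliberate choice, since some DNS APIs replace all host records in one call rather than patching a single entry. Whether that applies to a given provider is something to confirm in its documentation.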

willingc · 2017-06-29T15:48:57Z

@agjohnson Update: Seems back to normal for me. Thanks for the heads up re: sluggish connection.

monicacecilia · 2017-06-30T21:41:02Z

@agjohnson Thanks for the update. Ours is one of those resources 'not back 100% yet', but I'll be patiently waiting until all things go back to normal. Sincere thanks for keeping us in the loop. RTD rocks!

agjohnson · 2017-07-07T20:56:37Z

@jvanasco DNS propagation time is still an issue there, but yes, having a secondary place for DNS, or at least decoupling from Rackspace, makes sense. Where we stop with this process is another question, though.

@monicacecilia This downtime would not have affected build processes, so your issue is unrelated.

I'm closing this as there wasn't any fallout from the downtime. Thanks for the reports everyone!

monicacecilia mentioned this issue Jul 5, 2017

Builds for the past three days seem to fail #2984

Closed

agjohnson closed this as completed Jul 7, 2017
