Hey folks!
Unfortunately, roughly 2 hours ago, lemm.ee went offline. This was caused by our load balancer deciding that all of our servers had become unhealthy, even though all health checks responded successfully when I requested them directly. I am still not sure exactly what caused the issue, but I will try to investigate more over the weekend.
For now, we have partially recovered, and I am continuing to work on remaining issues. Hopefully we will be back to 100% very soon. Sorry for the inconvenience!
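For anyone curious what "requesting health checks directly" can look like, here is a minimal Python sketch. It is not the actual lemm.ee tooling: the function names are made up for illustration, and how you obtain the load balancer's own view of each backend (the `lb_healthy` dict) is left as an assumption, since that depends on the provider.

```python
from urllib.request import urlopen


def probe(url, timeout=5.0):
    """Return True if `url` answers with an HTTP 2xx status within `timeout` seconds."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError (4xx/5xx) and timeouts;
        # ValueError covers malformed URLs.
        return False


def find_disagreements(backends, lb_healthy, probe_fn=probe):
    """List backends that pass a direct probe but that the load balancer marks unhealthy.

    `backends` is a list of health check URLs (hypothetical here).
    `lb_healthy` maps each URL to the load balancer's reported status;
    a missing entry is treated as unhealthy.
    """
    return [
        url
        for url in backends
        if probe_fn(url) and not lb_healthy.get(url, False)
    ]
```

Any URL this returns is a server that looks fine from the outside while the load balancer disagrees, which would point the investigation at the load balancer or the network path between it and the servers rather than at the servers themselves.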
Thank you for keeping us abreast of what’s happening. I appreciate you, and how you manage this instance.
Nginx? I had an nginx LB shit itself yesterday. Luckily it auto-recovered and I had HA, but it's just weird that it happened.
Actually, we’re using Hetzner’s cloud load balancer for lemm.ee. But if this issue repeats in the near future, then I will definitely consider setting up something else.
haproxy is where it’s at!
It’s probably a managed haproxy in Hetzner’s case.
Is there another instance where you could report issues?
If we logged into another account, we’d be able to see those before it comes back up.
There’s a Discord server in the sidebar that updates are posted in: https://discord.gg/XM9nZwUn9K
I know. I just don’t want to join a discord.
There are two useful sections on https://status.lemm.ee for this. Firstly, there is an automated check for federation with all other instances at the bottom of the page, and everything there being red is a definite sign that something is wrong with lemm.ee itself. Secondly, near the top of that page, I will always write a status message manually when I discover & start work on any issues. This second part can have a bit of a delay, as it requires manual input from me, but I have updated it every time we have had any issues so far.
That’s good info. Thanks.
We appreciate what you do, hero!
Thank goodness! Hopefully discovering these vulnerabilities and protecting against them will help keep Lemmy alive when the big dogs come in to sweep us away! (Worst fears)
I’d like to speak to a manager /s
Thanks for the quick fix! What did you have to do to get the load balancer working again?
For now, I just redeployed all of our servers completely, but as I don’t know the actual root cause of the issue yet, I’m still investigating to figure out if anything more is needed.
I survived the July 18th lemm.ee downtime, and all I got was this lousy comment.
Typically when this happens, the issue is on the LB itself. Maybe its own network had issues?
Would it be in bad taste to blame Russia?
All is forgiven, thank you for running this lovely instance
Sometimes, downtimes are awesome. Get off your machine and spend time with your family, folks!
love you guys!
Seriously, your professionalism in handling the situation and in reporting it is fantastic.
It’s totally above and beyond anything we should expect for a service powered by donations!
Thank you!
Thanks for your great work and transparency!