
Roy

GS10 Outage + Information


Hia,

 

Our GS10 machine had an outage for around 30 minutes today. This was due to the machine experiencing a Linux kernel panic.

 

At first, a container for one of our servers wouldn't stop. I tried the Docker stop, kill, and rm -f commands, and they all just hung. I also tried gathering the container's logs, but that hung as well. I did get an error when trying to SSH into that specific container, and that's what I will be using to submit a bug report to Docker. To fully stop the process, we had to restart the Docker daemon, which would take all of the running game servers on the machine offline.
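For anyone curious, the stop, kill, rm -f escalation described above can be sketched as a small script that puts a timeout on each command, so a hung Docker CLI call doesn't block the shell forever. This is only a rough illustration, not what I actually ran: the function name `escalate_stop`, the 30-second timeout, and the container name `demo` are all made up for the example.

```python
import subprocess

# Escalation order taken from the post: stop, then kill, then rm -f.
ESCALATION = (["docker", "stop"], ["docker", "kill"], ["docker", "rm", "-f"])


def escalate_stop(container, timeout_s=30, run=subprocess.run):
    """Try each command in turn, giving up on any that hangs.

    Returns (attempts, succeeded), where attempts is a list of
    (command string, outcome) pairs. `run` is injectable so the
    logic can be exercised without Docker (an assumption made for
    this sketch).
    """
    attempts = []
    for base in ESCALATION:
        cmd = [*base, container]
        try:
            result = run(cmd, capture_output=True, text=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # The CLI call hung; record it and move on to the next command.
            attempts.append((" ".join(cmd), "timed out"))
            continue
        if result.returncode == 0:
            attempts.append((" ".join(cmd), "ok"))
            return attempts, True
        attempts.append((" ".join(cmd), f"exit {result.returncode}"))
    return attempts, False


if __name__ == "__main__":
    # "demo" is a hypothetical container name.
    for command, outcome in escalate_stop("demo")[0]:
        print(f"{command}: {outcome}")
```

Note that the timeout only kills the Docker CLI process on our side. If the daemon itself is stuck, as it was here, every step will time out and a daemon restart is still needed.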

 

I posted an announcement on the official Discord server stating I would be going through with this. After restarting the daemon, I started bringing all of the game servers back up. However, the machine immediately went offline and I couldn't SSH into it.

 

I then asked our hosting provider whether an IPMI could be activated for the machine. They came back very quickly (thank you, Nexril!) and gave me the IPMI's details. I had to install some software to access the virtual console, but once I was in, I discovered the machine was in a Linux kernel panic. There was no point in trying to investigate while the panic was ongoing, so I rebooted the machine and started the game servers (which succeeded this time).

 

Finally, all of our machines now have an IPMI attached and activated. In the past, we had to ask our hosting provider for access to the IPMI. Since we now have full access to each machine's IPMI, we can use it at any time. This means issues like this should be resolved more quickly in the future, assuming I'm available.

 

I just wanted to give some information on this.

 

I apologize for the inconvenience and thank you for understanding.



3 minutes ago, Leks said:

so uhm when can we have the new machines? 😠

A few days ago, I shared an update from our future hosting provider in the Staff Discord:

 

Quote

Hi @everyone - I know a lot of you are excited for the impending release of CHI2, our new West Chicago facility. As many of you know, the new deployment has been delayed a fair bit from our earlier expectations, and I'm going to do my best to explain why that is in a public forum.

 

The short version is that we came across a number of issues with our initial deployment that would have caused us many, many headaches later on. We're relentlessly dedicated to continuous improvement, and decided to make the extra time to improve on these problem areas.

 

We're building CHI2 to scale more than our existing Steadfast deployment could ever dream of. This is not an easy task, nor one that should be taken lightly by any company, provider, or organization. This is not something we're willing to rush simply for the sake of money or profit.

 

We're going to do it well and we're going to test it properly.

 

While we're still skeptical to announce an official release date, current expectations are that we'll be online during the first week or second week of next month (September). As always, we're going to do the best we can to keep you updated - and we sincerely appreciate your patience, understanding, and support as we enter a new phase for our company.
 

 
