Roy

GS10 Downtime + Temporary Move

Hey everyone,

 

The other day I made a thread stating we had an outage on our GS10 machine. This was caused by a Linux kernel panic, but unfortunately the panic wasn't logged. In cases like this, it's generally hard to narrow down the cause.

 

We experienced two more crashes today (one of which occurred 20-30 minutes ago). There were no useful logs indicating whether this was a hardware failure or a software crash. The kernel panic was simply not logged.

 

There haven't been any updates to the server itself in over a month, and until this past week it had run stable. With that said, our other machines have never faced this issue before. In the last crash (which happened around 20-30 minutes ago), the IPMI console itself was frozen in both the HTML5 and Java consoles.

 

There might be some tools we can try using (e.g. installing software that creates a memory dump), but I was told they are pretty much useless.

 

I had a talk with the CEO of Nexril (who has been very helpful, by the way), and we've decided it would be best to swap the drives into another machine. Unfortunately, it won't have as powerful a CPU as the current server, but this is the best thing I can think of at the moment as a temporary solution. The new machine will have the Intel Xeon E3-1240v5 @ 3.5 GHz. Since we're just swapping the drives, there shouldn't be much reconfiguration needed besides updating any network MAC addresses that need to change, so things should come up fairly quickly after the initial shutdown. Afterwards, the CEO will run some stress tests against the CPU and RAM on the machine having issues to see if he can come back with anything. I will also be making backups of the servers just in case this turns out to be a hard drive failure (which is less likely, but who knows).

 

I will try to keep the servers up as much as I can until we switch to the new machine tomorrow. I will have to go to bed at some point though, and if the OS crashes again, well, good game, lol.

 

Anyways, I do apologize for this inconvenience. I will update you all tomorrow.

 

Thank you for understanding.


Discord update posted:

 

Quote

Roy, Today at 8:03 PM
@here Our GS10 machine has been down for a while now (an hour and a half or so) due to another system crash. We are going to be swapping the drives to a new server shortly. With that said, I've installed a central logging server that GS10 will use. This will send log messages to a remote server and should allow us to see more information when the server is having issues. It's possible the server's hard drive is failing, and from what I've read, that is the case most of the time in situations similar to ours (the remote logging should capture this information for us).

 

Once I have an update, I will let you all know.

 

Thank you for your patience and understanding.

 

We will be booting the game servers back online after the drives are swapped to the new machine.

 

Thank you.
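
For anyone curious what remote logging looks like in general, here is a minimal Python sketch. It is not the actual GS10 setup (the post doesn't say which software is used); it only illustrates the idea of shipping log messages over the network so they survive a crash of the machine that produced them. The logger name `gs10`, the function names, and the sample message are all made up for illustration, and a loopback UDP socket stands in for the central logging server.

```python
import logging
import logging.handlers
import socket


def make_remote_logger(host: str, port: int, name: str = "gs10"):
    """Build a logger that forwards records to a syslog collector over UDP.

    The name "gs10" is a hypothetical label, not taken from any real config.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler sends each record as a UDP datagram by default.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler


def demo() -> str:
    """Send one message to a stand-in collector and return what it received."""
    # Stand-in for the central logging server: a UDP socket on loopback.
    collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    collector.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    collector.settimeout(2.0)
    host, port = collector.getsockname()

    logger, handler = make_remote_logger(host, port)
    try:
        # Hypothetical message; a real system would forward kernel/daemon logs.
        logger.warning("disk I/O error on sda")
        data, _ = collector.recvfrom(4096)
    finally:
        logger.removeHandler(handler)
        handler.close()
        collector.close()
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the message leaves the machine as soon as it is emitted, so a later freeze or panic on the sender can't take the log entry with it. UDP is fire-and-forget, so a message can still be lost if the network or the collector is down.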
