#94 Data Loss/Rollbacks

Closed
opened 2 months ago by jcsnider · 16 comments

Build Info

  • Intersect Version : 0.6.0.144
  • Operating System: Linux

Bug has existed since 0.5.00 (Beta 5, with the new Entity Framework database system)

??? May have gotten worse after 0.6.0.140? ??? - I hadn’t heard about it recently then all of a sudden lots of reports.

Issue Description

Users will properly restart their servers, and upon reboot their player databases will revert to a state from hours or days before meaning that player stats, levels, progression, etc is all lost.

Once the server stops properly saving it doesn’t give any indication of failure. No logs or crashes. No reproduction steps either, so any possible solutions could take days or weeks to verify.

I haven’t seen any reports of this happening with the game database.

Next Steps

Kassie said it might be related to the automatic backups that run on my server each night… that is just a basic tar archive but it’s possible that opening the sqlite db for reading would mess with EF. JC will look into and see if he can verify.

JC would like a custom Intersect build created or a config flag that logs out when saves are requested, when a save is waiting to lock the db, when we enter the lock, when the save concludes, and when a save leaves the lock (finalizes). If each ‘save’ was given an ID we could use the logs to track and see if any get stuck and then cause other saves to never occur.

On top of both of those debugging steps Panda is planning on overhauling some of our EF code anyways to make it more ‘proper’ with db instances.. that would likely solve the issue entirely anyways, so we have a multi pronged approach at figuring this out and solving it for good.

Other Stuff That’d be Good to Know

Does this occur with MySql player databases?

Does this occur on Windows?

Reproduction Steps

Unfortunately unknown at this time. If anyone can consistently reproduce this please let us know, we’d love to use your environment or steps to solve for good!

<!-- NOTE: If your issue is a security concern, please send an email to admin@ascensiongamedev.com instead of opening a public issue --> <!-- 1. Please speak English, this is the language all maintainers can speak and write. 2. Please ask questions or configuration/deploy problems on our forum.(https://ascensiongamedev.com) 3. Please take a moment to check that your issue doesn't already exist. 4. Please give all relevant information below for bug reports, because incomplete details will be handled as an invalid report. --> <!-- Check here() for info on how to determine which version you're on! --> ## Build Info - Intersect Version <!-- (Use X.X.X.XXX Format!) --> : 0.6.0.144 - Operating System: Linux Bug has existed since 0.5.00 (Beta 5, with the new Entity Framework database system) ??? May have gotten worse after 0.6.0.140? ??? - I hadn't heard about it recently then all of a sudden lots of reports. ## Issue Description Users will properly restart their servers, and upon reboot their player databases will revert to a state from hours or days before meaning that player stats, levels, progression, etc is all lost. Once the server stops properly saving it doesn't give any indication of failure. No logs or crashes. No reproduction steps either, so any possible solutions could take days or weeks to verify. I haven't seen any reports of this happening with the game database. ## Next Steps Kassie said it might be related to the automatic backups that run on my server each night... that is just a basic tar archive but it's possible that opening the sqlite db for reading would mess with EF. JC will look into and see if he can verify. JC would like a custom Intersect build created or a config flag that logs out when saves are requested, when a save is waiting to lock the db, when we enter the lock, when the save concludes, and when a save leaves the lock (finalizes). If each 'save' was given an ID we could use the logs to track and see if any get stuck and then cause other saves to never occur. On top of both of those debugging steps Panda is planning on overhauling some of our EF code anyways to make it more 'proper' with db instances.. that would likely solve the issue entirely anyways, so we have a multi pronged approach at figuring this out and solving it for good. ## Other Stuff That'd be Good to Know Does this occur with MySql player databases? Does this occur on Windows? ## Reproduction Steps Unfortunately unknown at this time. If anyone can consistently reproduce this please let us know, we'd love to use your environment or steps to solve for good! <!-- **If this issue involves anything visual please include a screenshot or video!** --> <!-- **Attach log files found in your Server/Logs, Client/Logs, and Editor/Logs folders!** -->
panda commented 1 month ago
Collaborator

Preliminary DB diagnostic build with thread safety re-implemented.

Note: This generates a lot of logs lines.

Do not run this with more than maybe 5 players or for an extended period of time.

https://www.ascensiongamedev.com/resources/filehost/4b604d2f4b0e5955eecac51d60fcae15.zip

Preliminary DB diagnostic build with thread safety re-implemented. Note: This generates a _lot_ of logs lines. Do not run this with more than maybe 5 players or for an extended period of time. https://www.ascensiongamedev.com/resources/filehost/4b604d2f4b0e5955eecac51d60fcae15.zip
Mapyo commented 1 month ago

Are there any known updates to this? @Panda I’d like to use your version however I host an average of 5-20 concurrent players and I am kind of spooked about my data being corrupted.

Update: This is happening to me on Windows Server 2016.

Are there any known updates to this? @Panda I'd like to use your version however I host an average of 5-20 concurrent players and I am kind of spooked about my data being corrupted. Update: This is happening to me on Windows Server 2016.
panda commented 1 month ago
Collaborator

@Mapyo From the only DB file JC received from you, it is not corrupted.

Even if you don’t use the build I linked (we need it used if we hope to diagnose this though) we still need to know what build you are currently using (and it should be the latest).

@Mapyo From the only DB file JC received from you, it is not corrupted. Even if you don't use the build I linked (we need it used if we hope to diagnose this though) we still need to know what build you *are* currently using (and it should be the latest).
Mapyo commented 1 month ago

@Panda, it may not have been corrupted however the server was not saving data to it. So it may either be an issue in that file or an issue somewhere else in my server. I haven’t had the files on my main server become corrupted and the one I gave JC was on a test server, but if it does happen on my main server I’ll let you know and send you that one too. Hopefully it will provide some some more info.

@Panda, it may not have been corrupted however the server was not saving data to it. So it may either be an issue in that file or an issue somewhere else in my server. I haven't had the files on my main server become corrupted and the one I gave JC was on a test server, but if it does happen on my main server I'll let you know and send you that one too. Hopefully it will provide some some more info.
panda commented 1 month ago
Collaborator

The reason for the other build is to diagnose what the server is attempting to do when it stops saving.

We suspect a specific type of issue to be causing this but without the logs from that build we do not know what the trigger is.

If the server stops saving data, shut it down and restart it. Do not delete the playerdata file just because it stops saving, and make a copy each time this occurs before starting the server again (and send us the copy privately).

At worst, you are getting rollbacks. If you just delete the entire file each time you encounter the issue you are rolling back to zero which is not ideal for you or our investigation.

The reason for the other build is to diagnose _what_ the server is attempting to do when it stops saving. We suspect a specific type of issue to be causing this but without the logs from that build we do not know what the trigger is. If the server stops saving data, shut it down and restart it. Do not delete the playerdata file just because it stops saving, and make a copy each time this occurs before starting the server again (and send us the copy privately). At worst, you are getting rollbacks. If you just delete the entire file each time you encounter the issue you are rolling back to zero which is not ideal for you or our investigation.

Just an update this bug is still going strong, had around 21 players online got a “31 burst packet flood” error on the server console and every account made in the last 2 days is erased and the testers from before that are days or close to a week rolled back.

Another note is that this now happens on EVERY server restart not just randomly now.

Just an update this bug is still going strong, had around 21 players online got a "31 burst packet flood" error on the server console and every account made in the last 2 days is erased and the testers from before that are days or close to a week rolled back. Another note is that this now happens on EVERY server restart not just randomly now.
Mapyo commented 3 weeks ago

Mine is not as severe as that, but I have been doing daily backups to ensure I don’t lose too much data.

Mine is not as severe as that, but I have been doing daily backups to ensure I don't lose too much data.
panda commented 3 weeks ago
Collaborator

Build 179 is introducing new logging that should help me understand the server’s behavior when these saves are failing.

Locally I observe some strange behavior that isn’t quite congruent with what you all are seeing (i.e. I don’t have rollbacks), but I am encountering the same issue that is causing your rollbacks.

Tomorrow I will work on merging the special diagnostic build and the latest build and release that.

Build 179 is introducing new logging that should help me understand the server's behavior when these saves are failing. Locally I observe some strange behavior that isn't quite congruent with what you all are seeing (i.e. I don't have rollbacks), but I am encountering the same issue that is causing your rollbacks. Tomorrow I will work on merging the special diagnostic build and the latest build and release that.
Mapyo commented 3 weeks ago

Sounds great, thank-you for dedicating alot of time to this.

Sounds great, thank-you for dedicating alot of time to this.
panda commented 3 weeks ago
Collaborator

Updated special diagnostic build (with less logging to test stability): https://www.ascensiongamedev.com/resources/filehost/7afefb5aadd0ae291a641b9b77bd7ba1.zip

Updated special diagnostic build (with less logging to test stability): https://www.ascensiongamedev.com/resources/filehost/7afefb5aadd0ae291a641b9b77bd7ba1.zip

Reproduced. Server rolled back just now while live, JC should be able to pull the logs from my server

Reproduced. Server rolled back just now while live, JC should be able to pull the logs from my server
Mapyo commented 3 weeks ago

@Panda just got back home. Is it reccommended that I upgrade to that version?

@Panda just got back home. Is it reccommended that I upgrade to that version?
jcsnider added the
verification
label 2 weeks ago
jcsnider commented 2 weeks ago
Owner

Everyone should update to 0.6.0.180+ now which will fix major player database rollbacks and provide better db logging going forward. I’ve marked this as verification but rollbacks should no longer be an issues from 0.6.0.180 on.

Everyone should update to 0.6.0.180+ now which will fix major player database rollbacks and provide better db logging going forward. I've marked this as verification but rollbacks should no longer be an issues from 0.6.0.180 on.
wishy commented 2 weeks ago

i deleted a character, made a new one with the same name, got the server shutdown rollback prevention, restarted the server, character was not deleted, so i delete it again and make a new one with the same name and all goes well

2020-02-05 15:58:34.018 [Error] Error Saving Player Database! Server will shutdown in order to prevent potential rollback scenarios!

--------------------------------------------------------------------------------
2020-02-05 15:58:36.820 [Error] Received unhandled exception from System.Threading.Tasks.Task.

--------------------------------------------------------------------------------
2020-02-05 15:58:36.823 [Error] 
  Message: Kan geen toegang krijgen tot het bestand C:\Floor100\Server\logs\errors-Intersect Server.log omdat het wordt gebruikt door een ander proces.
Stack:    bij System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   bij System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
   bij System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
   bij System.IO.StreamWriter.CreateFile(String path, Boolean append, Boolean checkHost)
   bij System.IO.StreamWriter..ctor(String path, Boolean append, Encoding encoding, Int32 bufferSize, Boolean checkHost)
   bij System.IO.StreamWriter..ctor(String path, Boolean append, Encoding encoding)
   bij Intersect.Logging.Output.FileOutput.get_Writer() in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect (Core)\Logging\Output\FileOutput.cs:regel 72
   bij Intersect.Logging.Output.FileOutput.InternalWrite(LogConfiguration configuration, LogLevel logLevel, Exception exception, String format, Object[] args) in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect (Core)\Logging\Output\FileOutput.cs:regel 91
   bij Intersect.Logging.Output.FileOutput.Write(LogConfiguration configuration, LogLevel logLevel, String message) in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect (Core)\Logging\Output\FileOutput.cs:regel 97
   bij Intersect.Logging.Logger.Write(LogLevel logLevel, String message) in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect (Core)\Logging\Logger.cs:regel 44
   bij Intersect.Logging.Logger.Error(String message) in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect (Core)\Logging\Logger.cs:regel 139
   bij Intersect.Logging.Log.Error(String message) in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect (Core)\Logging\Log.cs:regel 112
   bij Intersect.Server.Core.Bootstrapper.ProcessUnhandledException(Object sender, Exception exception) in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect.Server\Core\Bootstrapper.cs:regel 410
   bij Intersect.Server.Core.Bootstrapper.OnUnhandledException(Object sender, UnhandledExceptionEventArgs unhandledExceptionEvent) in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect.Server\Core\Bootstrapper.cs:regel 428
   bij Intersect.Server.DbInterface.<>c__DisplayClass68_0.<SavePlayerDatabase>b__0() in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect.Server\Classes\Database\DbInterface.cs:regel 1264
   bij System.Threading.Tasks.Task.Execute()

--------------------------------------------------------------------------------

translation:

Message: Kan geen toegang krijgen tot het bestand C:\Floor100\Server\logs\errors-Intersect Server.log omdat het wordt gebruikt door een ander proces.

means

Message: Can’t access file C:\Floor100\Server\logs\errors-Intersect Server.log because it’s being used by another process.

Weird because it’s actually being logged in the logfile…

i deleted a character, made a new one with the same name, got the server shutdown rollback prevention, restarted the server, character was not deleted, so i delete it again and make a new one with the same name and all goes well 2020-02-05 15:58:34.018 [Error] Error Saving Player Database! Server will shutdown in order to prevent potential rollback scenarios! -------------------------------------------------------------------------------- 2020-02-05 15:58:36.820 [Error] Received unhandled exception from System.Threading.Tasks.Task. -------------------------------------------------------------------------------- 2020-02-05 15:58:36.823 [Error] Message: Kan geen toegang krijgen tot het bestand C:\Floor100\Server\logs\errors-Intersect Server.log omdat het wordt gebruikt door een ander proces. Stack: bij System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) bij System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost) bij System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost) bij System.IO.StreamWriter.CreateFile(String path, Boolean append, Boolean checkHost) bij System.IO.StreamWriter..ctor(String path, Boolean append, Encoding encoding, Int32 bufferSize, Boolean checkHost) bij System.IO.StreamWriter..ctor(String path, Boolean append, Encoding encoding) bij Intersect.Logging.Output.FileOutput.get_Writer() in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect (Core)\Logging\Output\FileOutput.cs:regel 72 bij Intersect.Logging.Output.FileOutput.InternalWrite(LogConfiguration configuration, LogLevel logLevel, Exception exception, String format, Object[] args) in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect (Core)\Logging\Output\FileOutput.cs:regel 91 bij Intersect.Logging.Output.FileOutput.Write(LogConfiguration configuration, LogLevel logLevel, String message) in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect (Core)\Logging\Output\FileOutput.cs:regel 97 bij Intersect.Logging.Logger.Write(LogLevel logLevel, String message) in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect (Core)\Logging\Logger.cs:regel 44 bij Intersect.Logging.Logger.Error(String message) in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect (Core)\Logging\Logger.cs:regel 139 bij Intersect.Logging.Log.Error(String message) in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect (Core)\Logging\Log.cs:regel 112 bij Intersect.Server.Core.Bootstrapper.ProcessUnhandledException(Object sender, Exception exception) in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect.Server\Core\Bootstrapper.cs:regel 410 bij Intersect.Server.Core.Bootstrapper.OnUnhandledException(Object sender, UnhandledExceptionEventArgs unhandledExceptionEvent) in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect.Server\Core\Bootstrapper.cs:regel 428 bij Intersect.Server.DbInterface.<>c__DisplayClass68_0.<SavePlayerDatabase>b__0() in C:\TeamCity\buildAgent\work\6c64f60308be6068\Intersect.Server\Classes\Database\DbInterface.cs:regel 1264 bij System.Threading.Tasks.Task.Execute() -------------------------------------------------------------------------------- translation: Message: Kan geen toegang krijgen tot het bestand C:\Floor100\Server\logs\errors-Intersect Server.log omdat het wordt gebruikt door een ander proces. means Message: Can't access file C:\Floor100\Server\logs\errors-Intersect Server.log because it's being used by another process. Weird because it's actually being logged in the logfile...
jcsnider commented 2 weeks ago
Owner

Fixed @wishy’s error in 0.6.0.182.

That one was caused by EF trying to save player variables before adding new players to the database.

Apparently that’s been happening the whole time, but since we never caught errors we never knew about it, and the following .SaveChanges() would add the player and then the variable and be fine.

So that crash was only an issue because of the new logging, but I’ve made logic changes which should make it always add the Players before their children components.

Fixed @wishy's error in 0.6.0.182. That one was caused by EF trying to save player variables before adding new players to the database. Apparently that's been happening the whole time, but since we never caught errors we never knew about it, and the following .SaveChanges() would add the player and then the variable and be fine. So that crash was only an issue because of the new logging, but I've made logic changes which should make it always add the Players before their children components.
jcsnider commented 2 days ago
Owner

Closing as there are no more reports. Any more incidents of this we will handle individually.

Closing as there are no more reports. Any more incidents of this we will handle individually.
Sign in to join this conversation.
Loading…
Cancel
Save
There is no content yet.