Desync/Crash Precise Cause Given Bliz Release?

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,285
Not likely since the desync is usually detected via a checksum. Although such checksums can tell you roughly where the issue is by its creation, it cannot tell you specifically what went wrong (what piece of code, what entities) as they are not reversible. Say Warcraft III had a dedicated checksum for unit state, then if that caused the desync then anything that manipulated a unit (triggers, normal gameplay, player orders, e.t.c.) might be a possible cause. The more generic the checksum is, the more possible causes and the less helpful it is for identifying what exactly caused a desync.

For a better diagnosis, modified executables would be required that keep track of all the checksum intermediate values and an associated tag representing the data being processed between checks so that such list can be compared to see what data and code is breaking. Such lists can be very large (thousands of lines) and tracking the data for them might have a significant performance impact.
 
Last edited:
Level 21
Joined
Mar 16, 2008
Messages
955
How would I practically implement modified executables? What does that even mean to someone who isn't very knowledgeable with code? Maybe at least as a debug version of the standard map. My version of Kings and Knights has become a huge mess of 1000+ triggers and 100s of custom assets.
 
The text call TriggerSleepAction appears in your map script 259 times. Sleeps are probably bad, and it's probably better if you do not use them.

The text GetLocalPlayer appears 4 times in your map script.
JASS:
    ///Stores previously sent messages for tamper detection purposes
        function s__MMD___QueueNode_create takes integer id,string msg returns integer
            local integer this= s__MMD___QueueNode__allocate()
            set s__MMD___QueueNode_timeout[this]=(TimerGetElapsed(MMD___clock)) + 7.0 + GetRandomReal(0, 2 + 0.1 * GetPlayerId(GetLocalPlayer())) // INLINED!!
            set s__MMD___QueueNode_msg[this]=msg
            set s__MMD___QueueNode_checksum[this]=MMD___poor_hash(s__MMD___QueueNode_msg[this] , id)
            set s__MMD___QueueNode_key[this]=I2S(id)
            return this
        endfunction


JASS:
    ///Returns true for a fixed size uniform random subset of players in the game
    function MMD___isEmitter takes nothing returns boolean
        local integer i= 0
        local integer n= 0
        local integer r
        local integer array picks
        local boolean array pick_flags
        loop
            exitwhen i >= 24
            if GetPlayerController(Player(i)) == MAP_CONTROL_USER and GetPlayerSlotState(Player(i)) == PLAYER_SLOT_STATE_PLAYING then
                if n < MMD___num_senders then //initializing picks
                    set picks[n]=i
                    set pick_flags[i]=true
                else //maintain the invariant 'P(being picked) = c/n'
                    set r=GetRandomInt(0, n)
                    if r < MMD___num_senders then
                        set pick_flags[picks[r]]=false
                        set picks[r]=i
                        set pick_flags[i]=true
                    endif
                endif
                set n=n + 1
            endif
            set i=i + 1
        endloop
        return pick_flags[GetPlayerId(GetLocalPlayer())]
    endfunction


It also shows up in CinematicModeWithoutChangingFog by Purgeandfire but this is probably a slightly modified BJ function and probably safe.

It appears that the whole MMD tamper-protection nonsense, which I am not familiar with, contains sequences of code that may in some instances deallocate MMD Queue Nodes and then fire the OnDestroy trigger in a totally local-player dependent way.

JASS:
//Generated destructor of MMD___QueueNode
function sc__MMD___QueueNode_deallocate takes integer this returns nothing
    if this==null then
        return
    elseif (si__MMD___QueueNode_V[this]!=-1) then
        return
    endif
    set f__arg_this=this
    call TriggerEvaluate(st__MMD___QueueNode_onDestroy)
    set si__MMD___QueueNode_V[this]=si__MMD___QueueNode_F
    set si__MMD___QueueNode_F=this
endfunction

I guess maybe deallocation of a vJASS struct in per-client local way -- happening on some machines and not others -- isn't something that I expressly know to be unsafe. So, maybe it is safe. But it seems a bit ballsy if you're having problems. As Reforged ages and becomes increasingly unstable, it seems to me that you might have a better success rate with the custom games by eliminating systems that are not strictly necessary, and keeping the custom game logic as simple as possible.

Whatever "tamper protection" scripts are, they don't fall into that category of "as simple as possible," especially not if they are spinning up and later deallocating vJASS structs on some machines and not other machines. If you're going to continue to use a system like that, you're going to want to make sure that the person who wrote it is still actively available on Hive, and is maintaining updates against new Reforged patches. For example, what if when this was written it was fine and safe to run TriggerEvaluate calls in local code, but then in the new patch something becomes more unstable and TriggerEvaluate becomes no longer local safe?

And I'm totally spitballing a bit. I guess whoever wrote that system for you, if I'm correct in thinking that they ought to be an online an active user to maintain something like that, or else you shouldn't use that, then that probably means you can contact them after you read this post. And they might tell you that I don't know what I'm talking about, and that their code is all perfectly local safe. It's possible, admittedly I didn't actually try playing your map and I'm not sure.

But desyncs and crashes are typically going to fall into 2 categories:
A) Blizzard/Activision/Microsoft screwed something up and there's nothing you can do about it except for avoiding a particular feature
B) You screwed something up and you can change your map to fix it, by doing it properly

In my opinion, the Type A problems have increased a lot in recent years. And that seems to be your assumption in this case, that you have a Type A problem and you want someone to go and find the cause of it. But, I've been looking at those debug symbols and I've been wondering if they're useful, and I have to say, for a "needle in a haystack" problem like you have here, I'm not really sure that the debug symbols are going to be useful without a big lot of work put in. So if you're going to ask someone to volunteer and dive in and spend an exceedingly long time trying to determine if you have a "Type A" problem, then you probably want to really, really exhaustively confirm that you don't have a "Type B" problem.

Edit 2:

Your trigger named FC00 uses the custom script call RemoveLocation(udg_TempPoint) without first setting a value to TempPoint. Is this some attempt to break the computer? Maybe you get lucky and Warcraft 3 ignores it. But when I did that same thing on 1.26 as a kid, it destroyed the computer and turned the units into funny shapes and made colors go everywhere. Generally don't call "RemoveUnit" or "RemoveLocation" on an unspecified value.

However, yet again, it's possible that might be a red herring. Looking at the debug leaks/ source as you mentioned, I am seeing the following implementation for RemoteLocation:

C++:
void RemoveLocation(unsigned int locHandle)
{
  CMapLocation *pLoc;

  pLoc = ConvertHandle<CMapLocation>(locHandle);
  if ( pLoc )
    pLoc->Destroy(pLoc);
}

So, this does appear to perform an if check on the location passed in, and only destroy it if it was non-null. Most likely, when the trigger FC00 runs in your map, the location here IS null. So what you're doing doesn't make sense, but the game's C++ is probably covering for you.

Of course, if one of the other triggers that uses udg_TempPoint runs first before FC00 (unlikely since FC00 runs on map initialization) then you would get into trouble here, since you might be removing a location twice, and that is more likely to cause the funny colors from my childhood than removing a "null" if memory serves.
 
Last edited:

Dr Super Good

Spell Reviewer
Level 64
Joined
Jan 18, 2005
Messages
27,285
How would I practically implement modified executables? What does that even mean to someone who isn't very knowledgeable with code?
This is not something trivial to do and best left to people with a lot of experience with programming.

To summarise the work flow.
  1. Convert debug symbols into code.
  2. Modify code, making any desired changes.
  3. Compile the code into a format that can be executed.
I think Retera made a few videos demonstrating this in response to the debug symbol link. I am not sure if he was using code made from the debug symbols, or the reverse engineered source code from WC3 Community Edition.
 
I am not sure if he was using code made from the debug symbols, or the reverse engineered source code from WC3 Community Edition.
Actually if we go to warsmash.net and scroll to the bottom of the page, there is a part of the page where I note that the US government has been hiding crashed alien space ships from the public for 100 years, and then I left a note that if any space aliens read my website and get inspired like humans feeding Doritos to an ant hill because it's funny, the space aliens should reverse War3 source and send it to me, because it's funny. And then, for some reason, half a year or so after I left that note on my website, someone on Discord sent me the code that I was using in my YouTube videos, which they got from somewhere, and I don't really know where they got that, but it wasn't included in that form in the debug symbols. So, it's even possible that what we see in my videos came from the space aliens, I don't have any proof one way or the other of that. And I am not a distributor of that code, I don't have Blizzard permission to give that to you. But it's nice, I wish we all had it, and I think that hopefully whoever runs RTS at Blizzard will get motivated to open source the game, and think about how that would be a societal good.

Edit:

Back on topic, I am trying to run this map from @Gnuoy in my independent reference implementation of Warcraft 3 to look for trigger issues that might help with possible bugs that might cause crash or desync. I figure that running the map in the reference implementation might be a bit like an advanced map script checker.

I'm seeing an issue with a timer event being registered on a timer that is set to some null or aberrant value. During the initialization of global variables, we have the following code:

JASS:
  5756     loop
  5757         exitwhen ( i > 1 )
  5758         set udg_Request_Hide_Timer[i]=CreateTimer()
  5759         set i=i + 1
  5760     endloop

Here we see that in the Variables UI, the udg_Request_Hide_Timer is set to have only 1 default value / size that is constructed in its array data.

Despite this, the code makes references to the Request Hide Timer array at non-zero indices:

JASS:
112076             call PauseTimerBJ(true, udg_Request_Hide_Timer[GetConvertedPlayerId(GetTriggerPlayer())])
112077             call StartTimerBJ(udg_Request_Hide_Timer[GetConvertedPlayerId(GetTriggerPlayer())], false, 10.00)


and later

JASS:
112625 //===========================================================================
112626 function InitTrig_Request_Hide_Menu_Yellow takes nothing returns nothing
112627     set gg_trg_Request_Hide_Menu_Yellow=CreateTrigger()
112628     call TriggerRegisterTimerExpireEventBJ(gg_trg_Request_Hide_Menu_Yellow, udg_Request_Hide_Timer[5])
112629     call TriggerAddAction(gg_trg_Request_Hide_Menu_Yellow, function Trig_Request_Hide_Menu_Yellow_Actions)
112630 endfunction

Maybe my reference implementation of the game is more valuable than Debug symbols, I wonder... Isn't this weird and not too good to be using timers from a timer array that were never set with CreateTimer() ? Even if, for some bizarre reason it works... It doesn't seem like good syntax. Am I crazy? Does Reforged mysteriously initialize timers that are not initialized by the user? Really?
 
Last edited:
Level 21
Joined
Mar 16, 2008
Messages
955
Hey thank you so much for taking a look at it and offering suggestions. I don't have time to fully go through it right now but I will go through and implement all the changes you suggest even if it's only a possibility that it causes a desync. I am, and the Kings and Knights players are, so grateful for your attention.

:thumbs_up:

Just a few points of my first impression after skimming your posts,
1) I think Reforged does initialize arrays without a size declaration. But in general I try to declare the array size in GUI just as good practice, even if it's not needed. I declared the timer array size in the GUI variable? I suppose this should show up in the script too. Maybe I didn't. I'll go back and make sure its size is declared though.

2) The MMD system must be stable because that's from wc3stats.com and a lot of maps use it for leaderboard support. But might be safe to disable it for debugging just to be 100% sure.

3) There are many desyncs lurking but the main one that is pretty consistent that players are complaining about recently is around the king defeat/victory triggers / castle death / king quits triggers. (kings are players red, blue, teal, purple)
 
Level 21
Joined
Mar 16, 2008
Messages
955
The text call TriggerSleepAction appears in your map script 259 times. Sleeps are probably bad, and it's probably better if you do not use them.

The text GetLocalPlayer appears 4 times in your map script.
JASS:
    ///Stores previously sent messages for tamper detection purposes
        function s__MMD___QueueNode_create takes integer id,string msg returns integer
            local integer this= s__MMD___QueueNode__allocate()
            set s__MMD___QueueNode_timeout[this]=(TimerGetElapsed(MMD___clock)) + 7.0 + GetRandomReal(0, 2 + 0.1 * GetPlayerId(GetLocalPlayer())) // INLINED!!
            set s__MMD___QueueNode_msg[this]=msg
            set s__MMD___QueueNode_checksum[this]=MMD___poor_hash(s__MMD___QueueNode_msg[this] , id)
            set s__MMD___QueueNode_key[this]=I2S(id)
            return this
        endfunction


JASS:
    ///Returns true for a fixed size uniform random subset of players in the game
    function MMD___isEmitter takes nothing returns boolean
        local integer i= 0
        local integer n= 0
        local integer r
        local integer array picks
        local boolean array pick_flags
        loop
            exitwhen i >= 24
            if GetPlayerController(Player(i)) == MAP_CONTROL_USER and GetPlayerSlotState(Player(i)) == PLAYER_SLOT_STATE_PLAYING then
                if n < MMD___num_senders then //initializing picks
                    set picks[n]=i
                    set pick_flags[i]=true
                else //maintain the invariant 'P(being picked) = c/n'
                    set r=GetRandomInt(0, n)
                    if r < MMD___num_senders then
                        set pick_flags[picks[r]]=false
                        set picks[r]=i
                        set pick_flags[i]=true
                    endif
                endif
                set n=n + 1
            endif
            set i=i + 1
        endloop
        return pick_flags[GetPlayerId(GetLocalPlayer())]
    endfunction


It also shows up in CinematicModeWithoutChangingFog by Purgeandfire but this is probably a slightly modified BJ function and probably safe.

It appears that the whole MMD tamper-protection nonsense, which I am not familiar with, contains sequences of code that may in some instances deallocate MMD Queue Nodes and then fire the OnDestroy trigger in a totally local-player dependent way.

JASS:
//Generated destructor of MMD___QueueNode
function sc__MMD___QueueNode_deallocate takes integer this returns nothing
    if this==null then
        return
    elseif (si__MMD___QueueNode_V[this]!=-1) then
        return
    endif
    set f__arg_this=this
    call TriggerEvaluate(st__MMD___QueueNode_onDestroy)
    set si__MMD___QueueNode_V[this]=si__MMD___QueueNode_F
    set si__MMD___QueueNode_F=this
endfunction

I guess maybe deallocation of a vJASS struct in per-client local way -- happening on some machines and not others -- isn't something that I expressly know to be unsafe. So, maybe it is safe. But it seems a bit ballsy if you're having problems. As Reforged ages and becomes increasingly unstable, it seems to me that you might have a better success rate with the custom games by eliminating systems that are not strictly necessary, and keeping the custom game logic as simple as possible.

Whatever "tamper protection" scripts are, they don't fall into that category of "as simple as possible," especially not if they are spinning up and later deallocating vJASS structs on some machines and not other machines. If you're going to continue to use a system like that, you're going to want to make sure that the person who wrote it is still actively available on Hive, and is maintaining updates against new Reforged patches. For example, what if when this was written it was fine and safe to run TriggerEvaluate calls in local code, but then in the new patch something becomes more unstable and TriggerEvaluate becomes no longer local safe?

And I'm totally spitballing a bit. I guess whoever wrote that system for you, if I'm correct in thinking that they ought to be an online an active user to maintain something like that, or else you shouldn't use that, then that probably means you can contact them after you read this post. And they might tell you that I don't know what I'm talking about, and that their code is all perfectly local safe. It's possible, admittedly I didn't actually try playing your map and I'm not sure.

But desyncs and crashes are typically going to fall into 2 categories:
A) Blizzard/Activision/Microsoft screwed something up and there's nothing you can do about it except for avoiding a particular feature
B) You screwed something up and you can change your map to fix it, by doing it properly

In my opinion, the Type A problems have increased a lot in recent years. And that seems to be your assumption in this case, that you have a Type A problem and you want someone to go and find the cause of it. But, I've been looking at those debug symbols and I've been wondering if they're useful, and I have to say, for a "needle in a haystack" problem like you have here, I'm not really sure that the debug symbols are going to be useful without a big lot of work put in. So if you're going to ask someone to volunteer and dive in and spend an exceedingly long time trying to determine if you have a "Type A" problem, then you probably want to really, really exhaustively confirm that you don't have a "Type B" problem.

Edit 2:

Your trigger named FC00 uses the custom script call RemoveLocation(udg_TempPoint) without first setting a value to TempPoint. Is this some attempt to break the computer? Maybe you get lucky and Warcraft 3 ignores it. But when I did that same thing on 1.26 as a kid, it destroyed the computer and turned the units into funny shapes and made colors go everywhere. Generally don't call "RemoveUnit" or "RemoveLocation" on an unspecified value.

However, yet again, it's possible that might be a red herring. Looking at the debug leaks/ source as you mentioned, I am seeing the following implementation for RemoteLocation:

C++:
void RemoveLocation(unsigned int locHandle)
{
  CMapLocation *pLoc;

  pLoc = ConvertHandle<CMapLocation>(locHandle);
  if ( pLoc )
    pLoc->Destroy(pLoc);
}

So, this does appear to perform an if check on the location passed in, and only destroy it if it was non-null. Most likely, when the trigger FC00 runs in your map, the location here IS null. So what you're doing doesn't make sense, but the game's C++ is probably covering for you.

Of course, if one of the other triggers that uses udg_TempPoint runs first before FC00 (unlikely since FC00 runs on map initialization) then you would get into trouble here, since you might be removing a location twice, and that is more likely to cause the funny colors from my childhood than removing a "null" if memory serves.
Highly suspecting it was the MMD triggers. I was running them during the game but the system was recommended only to run them after victory/defeat.
 
Top