[Reforged] it's time to allow uploading of the pkb files

Ardenaso · Jun 14, 2021

With @Retera's PKBlaster, there has been a breakthrough in modding Reforged models: we can now freely recolor popcorn emitters instead of being limited to the effects that ship with the game.

Here are my videos with recolored popcorns

But pkb files cannot be uploaded, and I could only resort to putting compressed files in the comment section for them. It's time to allow them to be uploaded.

Ralle · Jun 14, 2021

Do pkb have the ability to require textures or other assets that the user will need to upload if missing from the asset library in Warcraft? I know nothing about this format and for MDX we parse it to unveil dependencies.

@GhostWolf, @Retera, halp!

Ardenaso · Jun 14, 2021

Ralle said:
Do pkb have the ability to require textures or other assets that the user will need to upload if missing from the asset library in Warcraft? I know nothing about this format and for MDX we parse it to unveil dependencies.

@GhostWolf, @Retera, halp!

Nope, it's not a texture. It gets recognized in a different way: you need to edit the model's .mdl file to change the particle emissives, unlike textures, which can simply be assigned in "edit textures".

Retera · Jun 15, 2021

@Ralle Yes, the PKB files can contain references to texture assets stored as strings, and many of them do. So it is possible to have an MDX linked to a PKB linked to a DDS, independent of the list of DDS textures referenced by the MDX. As with Reforged HD models, the texture references inside the PKB binary incorrectly use the .tif extension, while the files in the actual game storage are .dds.

In the same way, when the MDX refers to the PKB, it erroneously uses the .pkfx extension, even though the assets in the game storage are .pkb. Presumably the .pkfx and .tif assets are the format used during version control for the game and prior to its release/deploy cycle. So the game probably also allows those formats as valid for textures, although I have not tried.

I also found that the world editor allows a unit to use a PKB file instead of an MDX file as its "Art - Model File". Seems like maybe anything in the game that can refer to an MDX is also allowed to refer to a PKB directly. It's buggy when we do that, though, and I got annoyed at it pretty quickly and went back to wrapping all my PKB files behind an MDX that just loaded them via the reference ('CORN' chunk in the MDX1000 format).

Edit:
If you're going to try to reverse and parse the PKB format to find a list of all used textures, that is probably possible, but it might take some time. Even if you found, say, the texture reference node type, we wouldn't know whether some second texture reference node type exists without a bunch of extra research.

It is evident to me that the PKB format has got a long list of binary nodes followed by a strings table, and the nodes just refer to the strings by index whenever a string is needed. I was thinking I saw some diffuse texture reference nodes in the node soup last time I was looking at it, but it might be easier initially to just check every string in the pkb in case it was a filepath or something until more is known about them. Might be on the order of ~100-200 checks per file, is that too many?
Probably not if you specifically search for .tif extension, I guess. I saw references to some other file(s) with other extension(s) but as far as I could tell they were not included with the game assets and might just be some kind of build metadata left over.

Ralle · Jun 15, 2021

Thanks for the explanation. Do you have a spec for this format?

Ralle · Jul 10, 2021

Guys I need some sort of spec! CC: @ScrewTheTrees @Retera.

Retera · Jul 10, 2021

Here's what I do in the PKBlaster program:

Magic
I expect all PKB files to start with the following integers (little endian):
0xc9000b11
0x01040202
0x0000e14a

I don't know what this means, but I found this sequence at the start of every PKB file that I investigated.

Then I read:

Code:

FileHeader {
    int32 firstMagicIdentifier;
    int32 stringDataOffset;
    int64 secondMagicIdentifier;
}

I don't know what the first and second magic IDs are, but I preserve them when modifying files, for good measure. As I recall, it looks like the second magic ID was often something roughly equal to 2x the length of the file in bytes, plus a little bit more. I'm not sure, so I just hold these values intact when loading and saving. I never change the length of a file, so I luck out and don't have a problem.
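A minimal sketch of reading the header as described above, assuming the three magic integers are followed directly by the FileHeader fields. The class name PkbHeaderSketch and its method are made up for illustration and are not taken from PKBlaster.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of PKBlaster: reads the magic integers and FileHeader.
final class PkbHeaderSketch {
    // The three little-endian integers reported at the start of every investigated PKB file.
    static final int[] EXPECTED_MAGIC = {0xc9000b11, 0x01040202, 0x0000e14a};

    final int firstMagicIdentifier;
    final int stringDataOffset;
    final long secondMagicIdentifier;

    PkbHeaderSketch(int firstMagicIdentifier, int stringDataOffset, long secondMagicIdentifier) {
        this.firstMagicIdentifier = firstMagicIdentifier;
        this.stringDataOffset = stringDataOffset;
        this.secondMagicIdentifier = secondMagicIdentifier;
    }

    // Reads from the buffer's current position and leaves it at the first node.
    static PkbHeaderSketch read(ByteBuffer buffer) {
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        for (int expected : EXPECTED_MAGIC) {
            int actual = buffer.getInt();
            if (actual != expected) {
                throw new IllegalStateException(
                        String.format("Unexpected magic 0x%08x, wanted 0x%08x", actual, expected));
            }
        }
        int first = buffer.getInt();   // meaning unknown, preserved as-is
        int offset = buffer.getInt();  // byte index where the strings table starts
        long second = buffer.getLong(); // meaning unknown, preserved as-is
        return new PkbHeaderSketch(first, offset, second);
    }
}
```

Since the unknown values are only stored and never interpreted, writing them back unchanged keeps a modified file as close to the original as possible.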

Nodes
Until you reach the byte index specified by "stringDataOffset", we first want to read the nodes.
The nodes have a very similar structure between them, but in order to parse their contents you would need to know all of the node types by name, which I do not know.

Code:

Node {
    int32 byteLength; // exclusive with regards to itself, this is the sum byte length of all other data in Node
    int8 magic32ValueByte; // should always contain 0x20, or "32" in decimal
    int32 messageTypeStringKey; // match against string table to know the name of the type of the node
    int16 fieldCount; // see "Interpretation" below
    {data bytes}
}

For me, the parsing of these nodes takes place in a simple loop that reads until we reach stringDataOffset. In Java, that looks like this:

Java:

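        // `buffer` is expected to use ByteOrder.LITTLE_ENDIAN and to be positioned at the first node.
        while (buffer.position() < stringDataOffset) {
            // byteLength: does not count these 4 bytes, only the rest of the node
            final int length = buffer.getInt();
            // The remaining header fields are read with absolute gets, so position() stays put.
            final byte magic32ValueByte = buffer.get(buffer.position());
            if (magic32ValueByte != 0x20) {
                throw new IllegalStateException("Not 32 bit (0x20): " + magic32ValueByte);
            }
            final int messageType = buffer.getInt(buffer.position() + 1);
            // fieldCount is read here for reference but not used by this loop
            final short fieldCount = buffer.getShort(buffer.position() + 5);
            // Skip the 7 header bytes (1 + 4 + 2) and copy the rest as the node's data bytes.
            final ByteBuffer chunkContentsBuffer = ByteBuffer.allocate(length - 7);
            chunkContentsBuffer.clear();
            for (int i = 7; i < length; i++) {
                chunkContentsBuffer.put(buffer.get(buffer.position() + i));
            }
            nodes.add(new Node(messageType, chunkContentsBuffer));
            // Jump to the start of the next node.
            buffer.position(buffer.position() + length);
        }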

Strings
The strings table, as its name implies, is just a giant list of all strings used in the file.

This can be read by reading:

Code:

StringsTable {
    int32 stringCount;
    {data}
}

To read the data, use a for loop that runs stringCount number of times. At each step, read 1 uint8 to represent the length of the string, then read N chars where N equals the length value uint8 that you just read in. So, strings are not padded and are not consistent in how long they are, and they can only be up to 255 bytes in length.
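The loop described above could be sketched like this. The class and method names are hypothetical, and decoding the bytes as ISO-8859-1 is an assumption made so that every byte maps to exactly one char. The strings seen so far look like plain ASCII.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads the strings table that starts at stringDataOffset.
final class PkbStringsSketch {
    static List<String> readStrings(ByteBuffer buffer, int stringDataOffset) {
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        buffer.position(stringDataOffset);
        int stringCount = buffer.getInt();
        List<String> strings = new ArrayList<>();
        for (int i = 0; i < stringCount; i++) {
            // The length is a uint8, so mask off the sign that Java's byte type carries.
            int length = buffer.get() & 0xFF;
            byte[] raw = new byte[length];
            buffer.get(raw);
            // Assumption: one byte per character, no padding and no terminator.
            strings.add(new String(raw, StandardCharsets.ISO_8859_1));
        }
        return strings;
    }
}
```

Node fields such as messageTypeStringKey can then be resolved with a plain strings.get(key).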

Interpretation
All of the above specification was something I invented by just hunting and tinkering with a hex editor and the contents of my Reforged installation. This is not exactly reasonable, but after I read on the PopcornFx website that you cannot change the color of an emitter in your Reforged Map without emailing Blizzard to ask for the source asset -- because modifying the compiled assets is too complicated and they don't support it -- I didn't feel like dealing with that company or even investigating their software that was used to make these files since they are anti-modding.

Obviously that would be great if you could parse out the "data bytes" of what I call the "Nodes". Actually I don't even know if they should be called "Nodes". You can see in the PKBlaster program source code on GitHub that in the source I called them Chunks even though by the time I created a UI for it, I was calling them "Nodes". Just now after ranting about PopcornFx in the last paragraph, I visited their website and found in their documentation wiki a screenshot of their "Editor UI" in a section called "Nodegraph" so maybe calling them Nodes is reasonable. Here's what that image looked like:

So, we're dealing with this spaghetti of UI wires between the nodes, probably. What I found is that what I refer to as {data bytes} in my definition of the nodes has a consistent format but it only makes sense if you have the full internal spec of the PKB that I do not have. It appears to be exactly fieldCount number of repeating groups that are kind of like key/value pairs. I assume they exist to represent the lighter colored boxes inside the big boxes above, or something like that. But the problem is that they are all of the form:

Code:

Field {
    int16 fieldType;
    {data}
}

The format does not include the length of the data, unfortunately, as far as I can tell. If you can imagine, in order to parse this, you have to know the length of all field types. So you need application/business logic in order to do the parsing. There seem to be quite a lot of the field types and these files can get quite large, and sometimes the field will contain more data with dynamic size in itself depending on what it means to the application.

At this point in my program to attempt to do recolor shenanigans on this format, what I decided to do is only try to make a list of the Field types necessary for Nodes who have stringTable[node.messageTypeStringKey] == "CParticleNodeSamplerData_Curve". These are Nodes whose type, as far as I understand based on that name, is a Node Sampler Data. In some circumstances, these sampler data nodes will include contents that can define color, and this was done often enough in Reforged that we can use it for recolors to achieve reasonable results.
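As a rough illustration of that type check, the sketch below picks out nodes by type name. It assumes the messageTypeStringKey of every node and the strings table have already been read into lists, and the class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: selects nodes by resolving their type key against the strings table.
final class PkbNodeFilterSketch {
    static final String CURVE_SAMPLER = "CParticleNodeSamplerData_Curve";

    // Returns the indices of all nodes whose type name equals typeName.
    static List<Integer> nodesOfType(List<Integer> messageTypeStringKeys,
                                     List<String> stringTable,
                                     String typeName) {
        List<Integer> matches = new ArrayList<>();
        for (int node = 0; node < messageTypeStringKeys.size(); node++) {
            int key = messageTypeStringKeys.get(node);
            // Ignore keys that fall outside the table instead of failing on a malformed file.
            if (key >= 0 && key < stringTable.size() && typeName.equals(stringTable.get(key))) {
                matches.add(node);
            }
        }
        return matches;
    }
}
```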

Here is the list of CParticleNodeSamplerData_Curve fields that I allow for recoloring in my hacked together PKBlaster solution:

Code:

Type 0 {
    int32 unknown1;
    int32 unknown2;
}

Type 7 {
    int32 unknown;
    int32 propertyIndex; // maybe indexes into the list of nodes, or of strings, I forget
}

Type 9 {
    int32 unknown;
}

Type 10 {
    int32 unknown;
}

Type 14 {
    int32 unknown;
}

Type 16 {
    int32 numberOfFloats;
    float32 floats[numberOfFloats];
}

Type 17 {
    int32 numberOfFloats;
    float32 floats[numberOfFloats];
}


Type 18 {
    int32 numberOfFloats;
    float32 floats[numberOfFloats];
}

So, again that looks like a lot of "unknown" data but this is actually incredibly useful because I am using it for the length of the fields, so that I know how much to skip and can go on to process other fields.
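To make the "length of the fields" point concrete, here is a sketch that walks the fields of one node using only the sizes listed above. It assumes the data buffer starts right after fieldCount, and that any other field type is a reason to stop, since its length is unknown. The names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: steps over fields using the known per-type payload sizes.
final class PkbFieldSketch {
    // Size in bytes of the payload that follows the int16 fieldType.
    static int payloadLength(ByteBuffer data, int fieldType, int payloadStart) {
        switch (fieldType) {
            case 0:
            case 7:
                return 8; // two int32 values
            case 9:
            case 10:
            case 14:
                return 4; // one int32 value
            case 16:
            case 17:
            case 18:
                // int32 numberOfFloats followed by that many float32 values
                return 4 + 4 * data.getInt(payloadStart);
            default:
                throw new IllegalStateException("Unknown field type: " + fieldType);
        }
    }

    // Returns the payload offsets of all type 17 and type 18 fields in this node's data.
    static List<Integer> findFloatArrayPayloads(ByteBuffer data, int fieldCount) {
        data.order(ByteOrder.LITTLE_ENDIAN);
        List<Integer> offsets = new ArrayList<>();
        int pos = 0;
        for (int i = 0; i < fieldCount; i++) {
            int fieldType = data.getShort(pos);
            pos += 2;
            if (fieldType == 17 || fieldType == 18) {
                offsets.add(pos);
            }
            pos += payloadLength(data, fieldType, pos);
        }
        return offsets;
    }
}
```

Throwing on an unknown type is deliberate in this sketch: once one field length is wrong, every later offset in the node is wrong too.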

To perform a recolor of the file, given a destination color, I will only change the above data in the case of Field Type 17 when we have numberOfFloats==12, or in the case of Field Type 18 when we have numberOfFloats==24. In both cases, I consider the floats to be repeating groups of RGBA, RGBA, RGBA...

It's not foolproof and I got reports of times when this just randomly messed up the color of some PKBs, but in many cases these floats do end up being RGBA or possibly a rate of change of RGBA, probably. Not from experience but just by guessing looking at the data I assumed there might be some of these "rate of change of RGBA" values so in all of my recolor solutions I attempt to keep the sign and total magnitude of these floating point values intact while just shifting their color distribution for the most part.
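One possible way to read "keep the sign and total magnitude intact" is sketched below. This is an illustrative guess and not the actual PKBlaster algorithm: for each RGBA group it sums the absolute R, G and B values, redistributes that sum in the proportions of the target color, restores each channel's original sign, and leaves alpha alone. All names are hypothetical.

```java
// Hypothetical helper: one interpretation of a sign- and magnitude-preserving recolor.
final class PkbRecolorSketch {
    // Only these two shapes are treated as RGBA data, as described above.
    static boolean isRecolorCandidate(int fieldType, int numberOfFloats) {
        return (fieldType == 17 && numberOfFloats == 12)
                || (fieldType == 18 && numberOfFloats == 24);
    }

    // Recolors floats in place; the target channels are expected to be non-negative.
    static void recolor(float[] floats, float targetR, float targetG, float targetB) {
        float targetSum = targetR + targetG + targetB;
        if (targetSum <= 0f) {
            return; // no color distribution to shift toward
        }
        for (int i = 0; i + 3 < floats.length; i += 4) {
            float magnitude = Math.abs(floats[i]) + Math.abs(floats[i + 1]) + Math.abs(floats[i + 2]);
            // copySign keeps the original sign, in case the value is a rate of change.
            floats[i] = Math.copySign(magnitude * targetR / targetSum, floats[i]);
            floats[i + 1] = Math.copySign(magnitude * targetG / targetSum, floats[i + 1]);
            floats[i + 2] = Math.copySign(magnitude * targetB / targetSum, floats[i + 2]);
            // floats[i + 3] is alpha and stays untouched
        }
    }
}
```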

ScrewTheTrees · Jul 10, 2021

I'll just insert the specs I have, I suppose.

C#:

Header {
    uint32: unknownHeaderVersion;   // Tracks which version of PopcornFX the file was compiled with; higher numbers mean newer versions.
    byte[4]: editorVersion;  // These 4 bytes give the version of the editor it was compiled with; for wc3 it was 2.2.4.1.
    int32: editorBuildVersion; // Like the above, this is the build/patch number of the editor; for wc3 this is 57674.
    int32: totalNumberOfChunks; //Total number of chunks.
    int32: stringTableOffset;  //The offset in the file pointing to the string table.
    int32: unknownMagicId2; //
    int32: unknownMagicId3; //
    Chunk[totalNumberOfChunks]: chunks; //
    StringTable: strings; //
}
Chunk { // I think Retera has more valid information on this; I didn't get far into the actual chunk architecture.
    int32: importedLength; //Size of this chunk in bytes.
    int32: magic32bitValue; //The TypeID of what kind of chunk this is.
    byte[importedLength - 4]: content; //Every chunk type has different content.
}
StringTable {
    int32: stringCount;
    String[stringCount]: data;
}
String {
    byte: stringLength;
    asciiChar[stringLength]: data;
}

Looks something like this, I suppose:

C#:

//class PKBFile
      public PKBFile ReadFromStream(Stream stream) {
            var reader = new BinaryReader(stream);

            this.unknownHeaderVersion = reader.ReadUInt32();
            this.editorVersion = reader.ReadBytes(4);
            this.editorBuildVersion = reader.ReadUInt32();

            var totalNumberOfChunks = reader.ReadInt32(); //Possible that this is
            if (unknownHeaderVersion > 3372223249) reader.ReadInt32(); //Newer than reforged, has string table length here too.
            stringDataOffset = reader.ReadInt32();
            unknownMagicId2 = reader.ReadInt32();
            unknownMagicId3 = reader.ReadInt32(); //0

            Log.Information("Stream Length: {@length}, stringDataOffset: {@offset}",
                reader.BaseStream.Length, stringDataOffset);

            Log.Information("Number of chunks: {@one}, Magic2: {@two}, Magic3: {@three}",
                totalNumberOfChunks, unknownMagicId2, unknownMagicId3);

            Log.Information("h1: {@h1}, editorVersion: {@h2}, editorBuildVersion: {@h3}",
                unknownHeaderVersion, string.Join(".", editorVersion), editorBuildVersion);


            while (reader.BaseStream.Position < stringDataOffset) {
                var chunk = new UnknownChunk().ReadFromStream(reader);
                ChunkTable.Add(chunk);
            }


            reader.BaseStream.Position = stringDataOffset;

            var stringCount = reader.ReadInt32();
            for (var i = 0; i < stringCount; i++) {
                var length = reader.ReadByte();
                var entry = reader.ReadAsciiString(length);
                StringTable.Add(entry);
            }

            Log.Information("Total chunktable Size: {@amount}", this.ChunkTable.Count);


            return this;
        }



//class UnknownChunk
        public UnknownChunk ReadFromStream(BinaryReader reader) {
            this.importedLength = reader.ReadInt32();
            this.magicValue = reader.ReadUInt32(); //32
            reader.BaseStream.Position -= 4;


            this.data.Write(reader.ReadBytes(importedLength));


            return this;
        }
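The two descriptions of the file start line up: the three "magic" integers from Retera's post decode to exactly the version fields in this spec. A small sketch of that arithmetic, with hypothetical class and method names:

```java
// Hypothetical helper: decodes the three leading integers into the version fields above.
final class PkbVersionSketch {
    // The first integer, read as unsigned, is the header version.
    static long headerVersion(int raw) {
        return Integer.toUnsignedLong(raw);
    }

    // The second integer holds four version bytes; in a little-endian file the
    // lowest byte comes first, so it is printed first.
    static String editorVersion(int packed) {
        return (packed & 0xFF) + "."
                + ((packed >>> 8) & 0xFF) + "."
                + ((packed >>> 16) & 0xFF) + "."
                + (packed >>> 24);
    }
}
```

With these, headerVersion(0xc9000b11) gives 3372223249 (the constant compared against in ReadFromStream), editorVersion(0x01040202) gives "2.2.4.1", and 0x0000e14a is 57674 in decimal.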

Retera · Jul 10, 2021

I like your spec better. Seems more clearly defined. If I was re-implementing a solution for this, I would probably use your spec and combine it with my post above that goes into some of the sample chunk types, in case I needed those sampler chunk types.

So then the next question is why we need a spec, because it seems like having the Hive site do much analysis with these files is likely to be a waste of time in comparison to just allowing people to upload them. The files are able to include textures, but knowing the full extent of how and where in the file they reference the texture paths seems difficult to me. You would need to know an exhaustive list of all node/chunk types that were capable of referencing a TIFF/DDS filepath. I am not sure if anybody knows that list.

ScrewTheTrees · Jul 10, 2021

Retera said:
I like your spec better. Seems more clearly defined. If I was re-implementing a solution for this, I would probably use your spec and combine it with my post above that goes into some of the sample chunk types, in case I needed those sampler chunk types.

So then the next question is why we need a spec, because it seems like having the Hive site do much analysis with these files is likely to be a waste of time in comparison to just allowing people to upload them. The files are able to include textures, but knowing the full extent of how and where in the file they reference the texture paths seems difficult to me. You would need to know an exhaustive list of all node/chunk types that were capable of referencing a TIFF/DDS filepath. I am not sure if anybody knows that list.

Your Chunk data is a lot better and makes more sense than whatever I had x)

But yeah, indeed, it is possible to use the editor to go through the chunks one by one and map all the data, but that would just be an enormous undertaking and probably just not worth it in the end... I did some experimentation with that, but a lot of the data written to compiled versions of the file is not visible in the source files/editors. Which complicates things :/

Ardenaso · Dec 14, 2021

Bump. I really hope pkb files will soon be allowed to be uploaded in the post proper, and not have to be posted in a zip file.

Ralle · Dec 15, 2021

I think I got lost in a question related to this. Can a pkb file reference a texture similarly to how an mdx can?

Retera · Dec 15, 2021

Yes, however the only currently known feasible way to find what textures the file uses is to iterate the entire strings table and pick the ones with .tif extension or some other heuristic, I guess.

Basically, actually looking at the nodes and picking out texture reference nodes is not feasible given the current limits of our knowledge (without a substantial time investment that seems unguaranteed to even give perfectly accurate results).

Ralle · Dec 16, 2021

Damn that seems... Inelegant. Any better ideas?

BogdanW3 · Dec 16, 2021

PKBs only exist in the HD context and, from my enormously underscaled research, always reference the textures via the _HD.w3mod prefix as well. My suggestion, as an alternative to .tif searching (which is probably more resilient), would be looking for "_HD.w3mod/" and finding textures like that for now.

Another, simpler but almost objectively worse, option is to require no textures with a pkb file, and trust the uploader to include them all correctly (not to mess them and their paths up.)

Our decades of reversing everything we come across might have to end sooner rather than later, especially if formats continue getting more complex and any company gets into a disagreement about us interacting with their proprietary formats.

One more edit just so I'm a bit less confusing: this kind of shortcut would arbitrarily enforce that everyone modifying pkbs only utilises custom textures in the _HD space, which is not ideal by way of leaving less choice to a user.
As Retera mentioned, looking for all .tif (and potentially .dds and .tga) paths would probably be the only surefire way to get all textures a pkb uses.
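A sketch of the string-scanning heuristic discussed in this thread, assuming the strings table has already been read into a list. The class and method names are hypothetical. Treating backslashes and forward slashes as the same separator is an assumption, and the extension list is just the three named above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: finds probable texture paths among the strings of a PKB file.
final class PkbTextureScanSketch {
    static final String[] TEXTURE_EXTENSIONS = {".tif", ".dds", ".tga"};

    // Returns every string that ends in one of the texture extensions, ignoring case.
    static List<String> findTexturePaths(List<String> strings) {
        List<String> result = new ArrayList<>();
        for (String s : strings) {
            String lower = s.toLowerCase(Locale.ROOT);
            for (String ext : TEXTURE_EXTENSIONS) {
                if (lower.endsWith(ext)) {
                    result.add(s);
                    break;
                }
            }
        }
        return result;
    }

    // References inside the PKB say .tif while the stored asset is .dds, so swap the extension.
    static String toStoragePath(String reference) {
        if (reference.toLowerCase(Locale.ROOT).endsWith(".tif")) {
            return reference.substring(0, reference.length() - 4) + ".dds";
        }
        return reference;
    }

    // The narrower check: does the reference live under the _HD.w3mod prefix?
    static boolean isInHdMod(String reference) {
        return reference.replace('\\', '/').toLowerCase(Locale.ROOT).startsWith("_hd.w3mod/");
    }
}
```

Running the extension scan first and using the prefix check only as extra information would avoid forcing uploaders into the _HD space.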

Retera · Dec 17, 2021

BogdanW3 said:
One more edit just so I'm a bit less confusing: this kind of shortcut would arbitrarily enforce that everyone modifying pkbs only utilises custom textures in the _HD space, which is not ideal by way of leaving less choice to a user.
As Retera mentioned, looking for all .tif (and potentially .dds and .tga) paths would probably be the only surefire way to get all textures a pkb uses.

If there's one thing I've seen in wc3 mapping it's that people are probably not likely to follow some arbitrary convention we want them to follow unless it's made to be very obvious. I'm guessing people would use custom textures that did not include the special HD prefix, especially since such textures still load when the game is in the HD mode.
But maybe that's just my opinion.
