
[Misc] Extracting Assets using CascLib

Overview

Since the 1.30 PTR patch, Warcraft III has used the CASC file system to manage its assets instead of the old MPQ system, so reading and extracting assets works a little differently now.

For this tutorial, we'll be using Zezula's CascLib.

Note: my C++ is shit so don't use this code verbatim. Also, Warcraft III maps will still be MPQ archives.


Building

Building varies from operating system to operating system, but generally you can follow this path:
  • Clone the repository:
    Code:
    $ git clone https://github.com/ladislav-zezula/CascLib.git

  • Install cmake.

  • Navigate to that directory in your console/terminal.

  • Type the following to generate a Makefile:
    Code:
    $ cmake CMakeLists.txt

  • Type this to install the library to a local directory:
    Code:
    $ make install

  • (Optionally) To edit or build using an IDE, you can either open up the existing project (e.g. *.vcxproj, or *.kdev4) or generate your own using cmake (for example):
    Code:
    $ cmake -G Xcode
    Here are all the generators that cmake supports: <cmake-generators>

    You may need to manually configure your project to have the correct headers to include in the build products.
Linking will depend on your IDE, but in most cases you'll just drag-and-drop the library in and include the headers CascLib.h and CascPort.h.


Note: String Encoding

I'll be using ANSI-encoded strings throughout this tutorial. However, if you're on Windows, you'll likely have to prefix your strings with L if you're building with the UNICODE flag enabled. For example:
C++:
L"C:\\Program Files (x86)\\StarCraft II"

Read more about it in this post.
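
If you want the same source to build with and without the UNICODE flag, one option is to wrap path literals in a macro that adds the L prefix only when it is needed. The sketch below is only an illustration: PathChar, PATH_TEXT and ExampleStoragePath are made-up names standing in for TCHAR and the usual _T()/TEXT() macros, so that it compiles without any Windows or CascLib headers.

```cpp
#include <string>

// Hypothetical stand-ins for TCHAR and the _T()/TEXT() macros, so the sketch
// compiles without any Windows or CascLib headers.
#if defined(_WIN32) && defined(UNICODE)
typedef wchar_t PathChar;
#define PATH_TEXT(s) L##s
#else
typedef char PathChar;
#define PATH_TEXT(s) s
#endif

typedef std::basic_string<PathChar> PathString;

// Returns the storage path in whichever character type the build expects.
// The path itself is only an example.
inline PathString ExampleStoragePath() {
    return PATH_TEXT("C:\\Program Files (x86)\\Warcraft III");
}
```

In real code you would pass the resulting c_str() wherever a path is expected, and most likely use the macros your platform headers already provide rather than these stand-ins.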


Opening Storage

Similar to how we'd open an MPQ to read from it, we'll "open" the storage to obtain a handle for reading assets. CascLib makes this very easy:
C++:
#include <cstdlib>
#include <iostream>
#include "CascLib.h"

int main(int argc, const char * argv[]) {
    /// Access wc3's data files
    HANDLE storage;
 
    /// Open and check for errors
    if (!CascOpenStorage("/Applications/Warcraft III Public Test/", 0, &storage)) {
        std::cerr << "Error opening storage: " << GetLastError() << std::endl;
        return EXIT_FAILURE;
    }
 
    /// Cleanup
    CascCloseStorage(storage);
 
    return EXIT_SUCCESS;
}

Let's look at the CascOpenStorage function:
C++:
bool  WINAPI CascOpenStorage(const TCHAR * szDataPath, DWORD dwLocaleMask, HANDLE * phStorage);

Field
Description
szDataPath
A path to your CASC file system. It should be a directory that contains the game's .build.info file (you can verify this with $ ls -a), or a subdirectory of the .build.info directory (CascLib will search for it for you). For Warcraft III, it is stored in the game's root directory.
dwLocaleMask
The set of locales you'd like to access. If you input 0, it will use the default locale based on the .build.info file. If you want to use this parameter, use the CASC_LOCALE_ constants defined in CascLib.h and combine them using a bitwise OR: CASC_LOCALE_ENUS | CASC_LOCALE_KOKR.
phStorage
A pointer (to a pointer) to represent the CASC storage. CascLib will write to this address, and you'll use that handle for all your other CASC operations. Make sure you keep a strong reference to this for as long as you need it!
return
The function returns true if it succeeded, and false if not. If an operation fails (returns false), then you can check the error using GetLastError(). It'll return an error code based on Windows' system error codes here.

CascCloseStorage will free all memory associated with the open operation. Be sure to call this before losing reference to your storage handle, else you'll run into memory leaks!
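
To make the bitwise OR for dwLocaleMask a bit more concrete, here is a small sketch of how such a mask is built and tested. The EXAMPLE_LOCALE_* values and both helper functions are stand-ins invented for this illustration; in real code, use the CASC_LOCALE_ constants from CascLib.h.

```cpp
#include <cstdint>

// Stand-in flag values for illustration only; real code should use the
// CASC_LOCALE_* constants from CascLib.h instead of these.
const std::uint32_t EXAMPLE_LOCALE_ENUS = 0x00000002;
const std::uint32_t EXAMPLE_LOCALE_KOKR = 0x00000004;
const std::uint32_t EXAMPLE_LOCALE_FRFR = 0x00000010;

// Combine two locale flags into one mask with a bitwise OR.
inline std::uint32_t CombineLocales(std::uint32_t a, std::uint32_t b) {
    return a | b;
}

// Check whether every bit of `locale` is set in `mask`.
inline bool MaskHasLocale(std::uint32_t mask, std::uint32_t locale) {
    return (mask & locale) == locale;
}
```

Each locale occupies its own bit, so OR-ing flags together never loses information and a single DWORD can describe any combination.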


Searching for Files

Similar to StormLib, CascLib allows you to iterate through files based on a wildcard mask (not a full regular expression). "*" or NULL will return all results, "*.blp" will return only blp files, etc.

For this snippet, we're going to write a function that prints out all the files in a CASC storage matching a given mask:
C++:
static bool PrintFileNamesInStorage(HANDLE storage, const char *mask) {
    /// Struct to hold data retrieved from files
    CASC_FIND_DATA data;
 
    /// Retrieve a handle for iterating through files
    HANDLE findHandle = CascFindFirstFile(storage, mask, &data, NULL);
 
    if (!findHandle || findHandle == INVALID_HANDLE_VALUE) {
        return GetLastError() == ERROR_SUCCESS;
    }
 
    /// Iterate through each file we can find and log their name
    bool hasMoreResults = true;
    do {
        std::cout << data.szFileName << std::endl;
        hasMoreResults = CascFindNextFile(findHandle, &data);
    } while (hasMoreResults);

    /// Clean up and handle errors (read the error code once, before any other call can overwrite it)
    auto lastError = GetLastError();
    bool findFailed = (lastError != ERROR_NO_MORE_FILES && lastError != ERROR_SUCCESS);
    bool closeFailed = !CascFindClose(findHandle);

    if (findFailed || closeFailed) {
        return false;
    }

    return true;
}

Chunk
Description
CASC_FIND_DATA data
When iterating through the files, this struct will hold the data for the current file we're iterating on. For our purposes, we'll only be interested in data.szFileName, which is the path for the file. But there are other fields to query, such as the encoded key, content key, locale flags, etc. Check out _CASC_FIND_DATA in CascLib.h for more info.
CascFindFirstFile
Use this function to initiate the search. It takes in our CASC storage handle, a wildcard mask to filter on, a pointer to our data struct (CascLib will write to this), and an optional listfile parameter (i.e. if the file system doesn't have one already). It returns a handle that we'll use as an iterator later on.
Error Handling
If the first "find" fails or cannot find any files matching the mask, findHandle will be either NULL or INVALID_HANDLE_VALUE. We can capture that and return true/false based on whether the last error was ERROR_SUCCESS or not (for example, if you try to match a mask "xx" and it can't find any files, that is still considered a successful operation).
Iterating through files
We'll loop while hasMoreResults is true. CascFindNextFile will take our "find" handle and try to find the next file that matches the given mask. Again, you'll pass in the pointer to your data struct, and do whatever you'd like with it in the loop. It'll return false if there are no more results or if an error occurred.
Clean up
Once we're done with our search, we can close the find handle to free up memory, and check for errors. We ignore ERROR_NO_MORE_FILES because that just signals that the loop has finished.

Sample usage:
C++:
PrintFileNamesInStorage(storage, "*.blp");
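
To get a feel for how a mask like "*.blp" filters names, here is a rough stand-alone matcher. It is a sketch written for this tutorial, not CascLib's actual matching code: MatchesMask is a made-up name, it is case-sensitive, and it only understands "*" (any run of characters) and "?" (exactly one character).

```cpp
#include <cstddef>
#include <string>

// Returns true if `name` matches `mask`, where '*' matches any run of
// characters (including none) and '?' matches exactly one character.
// Illustration only; this is not the matcher CascLib uses internally.
inline bool MatchesMask(const std::string &name, const std::string &mask) {
    std::size_t n = 0, m = 0;
    std::size_t starPos = std::string::npos;  // position of the last '*' seen
    std::size_t resumeAt = 0;                 // where in `name` that '*' started

    while (n < name.size()) {
        if (m < mask.size() && mask[m] == '*') {
            // Remember the star and first try to match it against nothing
            starPos = m++;
            resumeAt = n;
        } else if (m < mask.size() && (mask[m] == '?' || mask[m] == name[n])) {
            ++n;
            ++m;
        } else if (starPos != std::string::npos) {
            // Mismatch: let the last '*' swallow one more character and retry
            m = starPos + 1;
            n = ++resumeAt;
        } else {
            return false;
        }
    }

    // Any trailing stars in the mask can match the empty string
    while (m < mask.size() && mask[m] == '*') {
        ++m;
    }
    return m == mask.size();
}
```

With this, MatchesMask("units/human/footman.blp", "*.blp") is true, while a name ending in ".blp.bak" is rejected because the mask has to cover the whole name.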

Reading & Extracting Files

Extracting consists of three main steps:
  • Attempt to open the desired file
  • Read all the data from it
  • Write it somewhere else to disk
Here is a rough outline of what that looks like:
C++:
#include <exception>
#include <fstream>
#include "CascLib.h"

static bool ExtractFile(HANDLE storage, const char *pathToFile, const char *destination) {
    /// File handle for read operations
    HANDLE file;
 
    /// Attempt to open the file
    if (!CascOpenFile(storage, pathToFile, 0, CASC_OPEN_BY_NAME, &file)) {
        return false;
    }
 
    /// Prepare the output file
    std::ofstream output(destination, std::ios::out | std::ios::binary);
 
    /// Read the file in chunks
    char buffer[0x10000];
    DWORD bytesRead = 1;
 
    while (bytesRead > 0) {
        if (!CascReadFile(file, buffer, sizeof(buffer), &bytesRead)) {
            break;
        } else if (bytesRead > 0) {
            output.write(buffer, bytesRead);
        }
    }
 
    /// Close the file and clean up
    output.close();
    CascCloseFile(file);
 
    if (!output) {
        throw std::exception();
    }
 
    int lastError = GetLastError();
    return lastError == ERROR_HANDLE_EOF || lastError == ERROR_SUCCESS;
}

Chunk
Description
CascOpenFile
This will open the file in the provided storage. The pathToFile portion is the internal path to the asset, e.g. "war3.mpq:replaceabletextures/commandbuttons/btnmurloc.blp". Refer to this for a list of all valid input. The next parameter is for locale flags, which we don't need here, so we'll just pass 0. CASC_OPEN_BY_NAME tells CascLib that you're providing the plain name of the file to open, rather than an encoded key (EKey) or a content key (CKey). Finally, you'll provide a pointer to the handle that we'll use for read operations. The handle also maintains a file pointer that advances automatically as you read the file.
Reading the file
CascLib allows you to specify how many bytes to read, so we'll read it in chunks. CascReadFile takes the file handle as the first parameter to specify which file to read from and where in the file it should begin reading. By default, it is the start of the file, but you can change this using CascSetFilePointer. The buffer parameter is the pointer to where the bytes should be stored. The next parameter determines how many bytes we want to read. The last parameter is a pointer to "bytesRead", which CascLib will fill with how many bytes were actually read. The function returns false if it fails, else the loop will continue so long as the file has bytes to read. The file pointer gets moved automatically after each read operation.
Cleanup
Similar to all our other operations, we need to manually tell CascLib that we're done with the file handle to free up extra memory. At the end, we'll return "true" if the last error is EOF (which isn't really an error we care about) or if it is successful.
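
The chunked read loop is the same pattern you would use for any stream, so it can be tried out without a CASC storage at all. In the sketch below, std::istream stands in for the CASC file handle and in.read()/gcount() stand in for CascReadFile and bytesRead; CopyInChunks and RoundTrip are made-up names for this illustration.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies everything from `in` to `out` in fixed-size chunks and returns the
// number of bytes copied. std::istream stands in for the CASC file handle.
inline std::size_t CopyInChunks(std::istream &in, std::ostream &out) {
    char buffer[0x1000];
    std::size_t total = 0;

    while (in) {
        // Ask for a full buffer; the last read may deliver fewer bytes
        in.read(buffer, sizeof(buffer));
        std::streamsize got = in.gcount();
        if (got <= 0) {
            break;
        }
        out.write(buffer, got);
        total += static_cast<std::size_t>(got);
    }
    return total;
}

// Small helper to try the loop on in-memory data
inline std::string RoundTrip(const std::string &payload) {
    std::istringstream in(payload);
    std::ostringstream out;
    CopyInChunks(in, out);
    return out.str();
}
```

The important detail is the same as in ExtractFile: always write the number of bytes that were actually read, not the size of the buffer, because the final chunk is usually shorter.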

Sample usage:
C++:
ExtractFile(storage, "war3.mpq:sound/music/mp3music/comradeship.mp3", "/Applications/Warcraft III Public Test/comradeship.mp3");
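
If you extract more than one file, you'll probably want to derive the destination from the internal path instead of typing it by hand. Here is a minimal sketch, assuming you only want to keep the file name and drop the "war3.mpq:" prefix and the folders; FileNameFromAssetPath and DestinationFor are hypothetical helpers, not part of CascLib.

```cpp
#include <cstddef>
#include <string>

// Returns the part of an internal asset path after the last '/', '\' or ':'.
// Hypothetical helper for illustration; not part of CascLib.
inline std::string FileNameFromAssetPath(const std::string &assetPath) {
    std::size_t cut = assetPath.find_last_of("/\\:");
    return cut == std::string::npos ? assetPath : assetPath.substr(cut + 1);
}

// Joins an output directory and the asset's file name into a destination path.
inline std::string DestinationFor(const std::string &outputDir,
                                  const std::string &assetPath) {
    std::string dest = outputDir;
    if (!dest.empty() && dest.back() != '/' && dest.back() != '\\') {
        dest += '/';
    }
    return dest + FileNameFromAssetPath(assetPath);
}
```

The result's c_str() can then be passed as the destination argument of ExtractFile. Note that flattening the folders like this means two assets with the same file name would overwrite each other.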


Final Words

  • For technical specs, check out the wowdev wiki

  • For an overview on what the directories and individual files mean at a high level, check out this page.

  • For contributing and reporting bugs, go to the CascLib main repo
 
Which is very poorly written. To make an actual working implementation you will need to look at Zezula's CascLib.

It is important to note that only the main data files will be migrating to CASC. All Warcraft III maps will remain as MPQ.
YASH! Thank you for that heartwarming information of maps use MPQ :D
 

Dr Super Good

Spell Reviewer
Level 63
Joined
Jan 18, 2005
Messages
27,178
Important to point out that CascOpenStorage takes the path in the form of TCHAR. If you are building your application targeting Microsoft Windows this should be UTF16 encoded WCHAR null terminated string with Unicode compile flag enabled. Under these conditions the demo code above will not compile as that is an ANSI encoded char string.

One must prefix the string constant with L for it to be a WCHAR string such as...
Code:
L"C:\\Program Files (x86)\\StarCraft II"

When targeting Windows, the library must be built with the Unicode flag enabled to use WCHAR. Windows does not natively support UTF-8, instead defaulting to legacy multi-byte code pages that are locale dependent. Windows does support UTF-16, which is why it uses the 16-bit WCHAR type for strings and all OS API calls when the Unicode flag is enabled. If the library is not built Unicode aware, it might not be able to access CASC archives located on a path that is not representable in the standard locale-specific character set, and will report a file-not-found error in such conditions.

Why the developer of this library designed the API like this I do not understand... In this day and age it should accept UTF-8 path strings. On Windows it should internally convert the input UTF-8 path string into UTF-16 for use by the Windows-specific APIs. On other operating systems like Linux or macOS it should keep it as UTF-8 and use it directly. This is because UTF-8 is pretty much the global standard for encoding Unicode, with Windows only using UTF-16 for legacy reasons.

In C++ Windows does not use explicit UTF16 strings directly. This is because C++ only added support for them long after Windows adopted UTF16 and used WCHAR. Technically one should be able to type cast from UTF16 strings to WCHAR on Windows as they should be encoded the same. WCHAR constants might impose a limit of only characters representable using 1 16bit surrogate, however the Windows API itself will still process UTF16 surrogate pairs correctly, at least most of the time.
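
As a rough illustration of the UTF-8 to UTF-16 conversion described above, here is a small portable decoder. It is only a sketch: Utf8ToUtf16 is a made-up name, it rejects truncated input and bad continuation bytes but does not check for overlong encodings, and on Windows you would more likely use the conversion functions the OS already provides.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes a UTF-8 string into UTF-16 code units. Code points above U+FFFF
// are written as a surrogate pair. Sketch only: overlong encodings are not
// rejected.
inline std::u16string Utf8ToUtf16(const std::string &utf8) {
    std::u16string out;
    std::size_t i = 0;

    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > utf8.size() - i - 1) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char byte = static_cast<unsigned char>(utf8[i + k]);
            if ((byte & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (byte & 0x3F);
        }
        i += extra + 1;

        if (cp >= 0x10000) {
            // Split into a high and a low surrogate
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

For plain ASCII paths the output has one code unit per input byte; a character outside the Basic Multilingual Plane comes out as two code units.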
 
(Optionally) [...] generate you're own [...]

Found a typo there.

Still, this is another level of modding (reading casc files) that I cannot yet comprehend fully (and strive to understand). I will presume that not a lot of the community has produced tools capable of reading said CASC files, and extract contents therefrom.
 

Dr Super Good
Well, so far I have a piece of self-written Java that can extract files from local CASC archives. The only problem is that I cannot tell which file is which: it perfectly extracts nameless, anonymous files.

I thought MPQ specification was badly written. Then I met CASC...
 
Level 5
Joined
Dec 20, 2008
Messages
67
I mean, I can understand that he opted for a TCHAR interface. It is the default for native applications on Windows. If he had used UTF-8, the user would have to convert their native path to UTF-8 before calling the library.
Proper handling of Unicode, especially regarding filenames, is a festering wound of C++ on Windows. You can have either correct code or nice code. Maybe this gets better with std::filesystem. I have not really worked with that yet. And I highly doubt that Microsoft will change anything about their C API.
 

Dr Super Good
No new application should support anything other than Unicode... TCHAR is only needed if you plan to support both; otherwise you explicitly use either UTF-8 or UTF-16 to leave no ambiguity.

CASC seems to have 3 core components.
  • CDN Archive: Where all data gets sourced from, hosted by Blizzard.
  • Local Archive: Where required data is cached, in the application install directory.
  • File System Structures: Describe the layout of files and references to actual files in either CDN or local archives. This is different for every game.
So far with Java I have local archive reading working perfectly for Warcraft III. The problem is that I do not have any file system structure support so one cannot lookup a specific file by name even though all files can be decoded.
 