Tomba 2 Extraction Script
Tomba 2 Asset Extractor
Overview
This script extracts and organizes assets from the PlayStation game Tomba! 2: The Evil Swine Return. It processes the game's .IDX, .DAT, and .IMG files to extract and categorize data such as sprites, 3D models, level geometry, text, and metadata. This is particularly useful for modding, asset recovery, or archival purposes.
The script relies on Python and uses the struct module to interpret binary data formats. Users can customize the script to specify input (CDpath) and output (outfolder) directories for asset extraction.
Prerequisites
- Python 3.6 or later installed on your system (the script uses f-strings).
- A copy of the game's .IDX, .DAT, and .IMG files.
- Basic understanding of binary file structures and file system operations.
Usage
Configure Input and Output Paths
Edit the following lines in the script to match the locations of your input files and desired output directory:
CDpath = "/path/to/TOMBA2/CD" # Location of the .IDX, .DAT, and .IMG files
outfolder = "/path/to/outputfolder" # Destination for extracted files
Ensure the paths do not include a trailing forward slash (/).
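The trailing-slash requirement can also be enforced defensively before the paths are used. The following is a minimal sketch; `strip_trailing_slash` is a hypothetical helper for illustration and is not part of the original script.

```python
def strip_trailing_slash(path):
    """Return path without trailing slashes, leaving a bare root ("/") intact."""
    stripped = path.rstrip("/")
    return stripped if stripped else path

# Same placeholder paths as above; the first one has a stray trailing slash.
CDpath = strip_trailing_slash("/path/to/TOMBA2/CD/")
outfolder = strip_trailing_slash("/path/to/outputfolder")
```

With this in place, a path pasted with a trailing slash behaves the same as one without.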
Execute the Script
Run the script using Python:
python tomba2_extractor.py
The script will read the .IDX file to determine asset locations and extract data from the .DAT and .IMG files into organized subdirectories within the outfolder.
How It Works
Creating Directories
The script creates a structured output directory with the following organization:
outputfolder/
├── chunk_00/
│   ├── 00_sdats/
│   │   ├── 0000-1234.sdat
│   │   └── 00_pointers.txt
│   ├── 00_vrams/
│   │   ├── 0000-1234.cvram
│   │   ├── 00_shards/
│   │   │   ├── 00-0.shard
│   │   │   └── ...
│   │   └── 00.vram
│   └── 00_trail/
│       ├── 1234-5678.bin
│       └── ...
├── chunk_01/
└── ...
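A layout like this could be created with `os.makedirs`. The sketch below is illustrative only: `make_chunk_dirs` is a hypothetical helper, and it assumes folder names use the two-digit uppercase hex chunk index, as the tree and the script's `{chunk_index:02X}` formatting suggest.

```python
import os
import tempfile

def make_chunk_dirs(outfolder, chunk_index):
    """Create the per-chunk output folders and return their paths."""
    tag = f"{chunk_index:02X}"  # assumed naming: two-digit hex chunk index
    chunk_dir = os.path.join(outfolder, f"chunk_{tag}")
    paths = {
        "sdats": os.path.join(chunk_dir, f"{tag}_sdats"),
        "vrams": os.path.join(chunk_dir, f"{tag}_vrams"),
        "shards": os.path.join(chunk_dir, f"{tag}_vrams", f"{tag}_shards"),
        "trail": os.path.join(chunk_dir, f"{tag}_trail"),
    }
    for path in paths.values():
        # exist_ok lets the extraction be re-run into the same folder.
        os.makedirs(path, exist_ok=True)
    return paths

# Demonstrate against a throwaway temporary directory.
demo_root = tempfile.mkdtemp()
demo_paths = make_chunk_dirs(demo_root, 0)
```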
Data Decoding with Tuplify
The tuplify function splits a 32-bit integer into two components:
- dat_id: The higher 8 bits.
- dat_ptr: The lower 24 bits.
def tuplify(item):
    dat_id = item >> 24
    dat_ptr = item & 0x00FFFFFF
    return (dat_id, dat_ptr)
This aids in interpreting asset pointers.
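As a worked example, the sketch below unpacks four raw bytes into a 32-bit integer and splits it with tuplify. The little-endian byte order is an assumption (it matches the PlayStation's CPU), and the sample bytes are made up for illustration.

```python
import struct

def tuplify(item):
    dat_id = item >> 24            # upper 8 bits
    dat_ptr = item & 0x00FFFFFF    # lower 24 bits
    return (dat_id, dat_ptr)

# Four made-up bytes as they might appear in a file (little-endian assumed).
raw = bytes([0x56, 0x34, 0x12, 0x03])
(value,) = struct.unpack("<I", raw)   # value == 0x03123456
dat_id, dat_ptr = tuplify(value)      # (0x03, 0x123456)
```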
Reading Chunks
The .IDX file is divided into fixed-size chunks. For each chunk, the script extracts metadata, pointers, and actual data:
for chunk_index in range(int(os.path.getsize(idxpath) / chunk_size)):
    print(f"Reading Chunk index {chunk_index:02X}...")
    ...
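The fixed-size chunk walk can be sketched on an in-memory stream. Here `iter_chunks` is a hypothetical helper and `CHUNK_SIZE` is a placeholder value chosen for the demo, not the real chunk size used by the game's .IDX file.

```python
import io

CHUNK_SIZE = 0x40  # placeholder value for illustration only

def iter_chunks(stream, total_size, chunk_size=CHUNK_SIZE):
    """Yield (index, bytes) for each complete fixed-size chunk."""
    # Integer division drops any incomplete chunk at the end of the file.
    for chunk_index in range(total_size // chunk_size):
        stream.seek(chunk_index * chunk_size)
        yield chunk_index, stream.read(chunk_size)

# Fake index data: three full chunks plus five leftover bytes.
fake_idx = bytes(range(256))[:CHUNK_SIZE * 3 + 5]
chunks = list(iter_chunks(io.BytesIO(fake_idx), len(fake_idx)))
```

Seeking to `chunk_index * chunk_size` on every iteration keeps the loop correct even if code inside it moves the file position.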
Chunk Metadata
Chunk metadata includes:
- img_start and img_end: Texture data range in .IMG.
- dat_start and dat_end: Asset data range in .DAT.
- pointer_amount: Number of pointers for this chunk.
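Fields like these are typically decoded with `struct`. The sketch below is purely illustrative: the field order, the 32-bit width, the little-endian byte order, and the position at the start of the chunk are all assumptions, since the actual layout is not documented here. `parse_chunk_meta` is a hypothetical helper.

```python
import struct

# Assumed layout: five little-endian uint32 fields at the start of a chunk.
META_FORMAT = "<5I"

def parse_chunk_meta(chunk):
    """Decode the assumed metadata header into a dict keyed by field name."""
    names = ("img_start", "img_end", "dat_start", "dat_end", "pointer_amount")
    values = struct.unpack_from(META_FORMAT, chunk, 0)
    return dict(zip(names, values))

# Build a made-up header so the parser can be exercised without game files.
sample = struct.pack(META_FORMAT, 0x0000, 0x8000, 0x1000, 0x1800, 12) + b"\0" * 8
meta = parse_chunk_meta(sample)
```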
Reading Asset Data
Data is read from .IMG and .DAT using the ranges defined in the chunk metadata.
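Reading a start/end range from a binary file comes down to a seek followed by a sized read. A minimal sketch, assuming the range is given in bytes with an exclusive end offset (the original script may use different units, such as sectors); `read_range` is a hypothetical helper.

```python
import io

def read_range(stream, start, end):
    """Return the bytes in [start, end) from a seekable binary stream."""
    if end < start:
        raise ValueError(f"invalid range: {start:#x}-{end:#x}")
    stream.seek(start)
    return stream.read(end - start)

# An in-memory stand-in for an opened .DAT or .IMG file.
fake_dat = io.BytesIO(bytes(range(32)))
block = read_range(fake_dat, 4, 10)
```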
Handling Special Data Types
SDAT (Data) Pointers
SDAT data includes pointer structures interpreted as tuples using the tuplify function. The extracted pointers are saved in a text file.
out_sdat_info.write(f"ID: {sdat_pointers[i][0]:02X} | Pointer: {sdat_pointers[i][1]:04X}\n")
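To show what the resulting listing looks like, the sketch below writes two made-up pointer tuples to an in-memory text buffer using the same format string. `write_pointer_listing` is a hypothetical wrapper, not a function from the original script.

```python
import io

def write_pointer_listing(out, sdat_pointers):
    """Write one 'ID | Pointer' line per (dat_id, dat_ptr) tuple."""
    for dat_id, dat_ptr in sdat_pointers:
        out.write(f"ID: {dat_id:02X} | Pointer: {dat_ptr:04X}\n")

buffer = io.StringIO()  # stands in for the opened pointers text file
write_pointer_listing(buffer, [(0x01, 0x0010), (0x02, 0x1A2B)])
listing = buffer.getvalue()
```

Note that `:04X` sets a minimum width of four hex digits, so pointers larger than 0xFFFF are still printed in full.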
VRAM and Texture Shards
The .IMG file contains VRAM data structured into texture "shards." Each shard corresponds to a texture fragment with its own metadata:
- x, y: Coordinates within VRAM.
- w, h: Dimensions of the texture.
Shards are recombined to form complete VRAM pages:
with open(imgdest + f"/{chunk_index:02X}.vram", "w+b") as vram:
    vram.seek(0x100000 - 1)
    vram.write(b"\0")
    ...
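The recombination step can be pictured as copying each shard's rows into a 1 MiB buffer. The sketch below is not the script's actual code. It assumes the 0x100000-byte image is laid out as 1024 x 512 pixels at 16 bits each (the size of PlayStation VRAM), that x, y, w, and h are measured in 16-bit pixels, and that shard pixels are stored row by row. `blit_shard` is a hypothetical helper.

```python
VRAM_WIDTH = 1024    # pixels per row (assumed)
VRAM_HEIGHT = 512    # rows (assumed)
BYTES_PER_PIXEL = 2  # 16-bit pixels (assumed)
ROW_BYTES = VRAM_WIDTH * BYTES_PER_PIXEL  # 2048; 2048 * 512 == 0x100000

def blit_shard(vram, x, y, w, h, pixels):
    """Copy a w-by-h block of 16-bit pixels into vram at (x, y)."""
    if len(pixels) != w * h * BYTES_PER_PIXEL:
        raise ValueError("shard payload does not match its dimensions")
    if x + w > VRAM_WIDTH or y + h > VRAM_HEIGHT:
        raise ValueError("shard does not fit inside VRAM")
    row_len = w * BYTES_PER_PIXEL
    for row in range(h):
        src = row * row_len
        dst = (y + row) * ROW_BYTES + x * BYTES_PER_PIXEL
        vram[dst:dst + row_len] = pixels[src:src + row_len]

vram = bytearray(ROW_BYTES * VRAM_HEIGHT)
# A made-up 2x2 shard whose eight payload bytes are 1..8.
blit_shard(vram, x=2, y=1, w=2, h=2, pixels=bytes(range(1, 9)))
```

Working on a preallocated `bytearray` mirrors the effect of the seek-and-write trick above, which pads the output file to its full size before shards are placed.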
Trail Data
Trail data is the "trailer" section at the end of each chunk, containing additional asset ranges. The script identifies these ranges and extracts the corresponding data.
for t in range(0, len(traildata), 2):
    dat_trail_start, dat_trail_end = traildata[t], traildata[t+1]
    ...
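The pairing of consecutive offsets into ranges can be sketched as follows. `trail_ranges` is a hypothetical helper, the offsets are made up, and the hex start-end file naming is an assumption based on the example names in the directory tree.

```python
def trail_ranges(traildata):
    """Pair up a flat list of offsets into (start, end) tuples."""
    if len(traildata) % 2:
        raise ValueError("trail data must contain an even number of offsets")
    return list(zip(traildata[0::2], traildata[1::2]))

dat = bytes(range(64))  # stand-in for the .DAT contents
ranges = trail_ranges([0x04, 0x08, 0x10, 0x18])
# Map an assumed output file name to the bytes each range covers.
pieces = {f"{start:04X}-{end:04X}.bin": dat[start:end] for start, end in ranges}
```

Checking for an odd number of offsets up front avoids the IndexError that `traildata[t+1]` would raise on a truncated list.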