Tomba 2 Extraction Script

From Tomba! Wiki

Tomba 2 Asset Extractor

LINK TO SCRIPT

Overview

This script is a specialized tool that extracts and organizes assets from the PlayStation game Tomba! 2: The Evil Swine Return. It processes the game's .IDX, .DAT, and .IMG files and categorizes the extracted data: sprites, 3D asset models, level geometry, text, animations (types 1, 2, and 3), level collision, drawmaps, and backgrounds. This is particularly useful for modding, asset recovery, or archival purposes.

The script is written in Python and uses the standard struct module to interpret binary data formats. Users set the input (CDpath) and output (outfolder) directories by editing two variables in the script.


Prerequisites

  • Python 3.6 or later installed on your system (the script uses f-strings).
  • A copy of the game's .IDX, .DAT, and .IMG files.
  • Basic understanding of binary file structures and file system operations.

Usage

Configure Input and Output Paths

Edit the following lines in the script to match the locations of your input files and desired output directory:

CDpath = "/path/to/TOMBA2/CD"  # Location of the .IDX, .DAT, and .IMG files 
outfolder = "/path/to/outputfolder"  # Destination for extracted files

Ensure the paths do not include a trailing forward-slash (/).
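
One way to guard against a stray trailing slash is to normalize both paths before they are used. The sketch below is not part of the original script; normalize_dir is a hypothetical helper introduced only for illustration.

```python
def normalize_dir(path):
    """Strip trailing forward slashes, keeping a bare root "/" intact.

    Hypothetical helper, not part of the original script.
    """
    stripped = path.rstrip("/")
    # rstrip turns "/" into "", so fall back to the original in that case
    return stripped if stripped else path


CDpath = normalize_dir("/path/to/TOMBA2/CD/")
outfolder = normalize_dir("/path/to/outputfolder")

print(CDpath)
print(outfolder)
```

With this in place, a path typed with or without the final slash ends up in the same form.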

Execute the Script

Run the script using Python:

python tomba2_extractor.py

The script will read the .IDX file to determine asset locations and extract data from the .DAT and .IMG files into organized subdirectories within the outfolder.


How It Works

Creating Directories

The script creates a structured output directory with the following organization:

outputfolder/
├── chunk_00/
│   ├── 00_sdats/
│   │   ├── 0000-1234.sdat
│   │   └── 00_pointers.txt
│   ├── 00_vrams/
│   │   ├── 0000-1234.cvram
│   │   ├── 00_shards/
│   │   │   ├── 00-0.shard
│   │   │   └── ...
│   │   └── 00.vram
│   └── 00_trail/
│       ├── 1234-5678.bin
│       └── ...
├── chunk_01/
└── ...
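
The layout above could be created along the following lines. This is a minimal sketch, not the script's actual code: make_chunk_dirs is a hypothetical name, and the two-digit hexadecimal chunk prefix is an assumption based on the {chunk_index:02X} formatting used elsewhere in the script.

```python
import os
import tempfile


def make_chunk_dirs(outfolder, chunk_index):
    """Create the per-chunk folders from the tree above and return their paths.

    Hypothetical helper; the hex prefix is an assumption.
    """
    tag = f"{chunk_index:02X}"
    chunk_dir = os.path.join(outfolder, f"chunk_{tag}")
    paths = {
        "sdats": os.path.join(chunk_dir, f"{tag}_sdats"),
        "vrams": os.path.join(chunk_dir, f"{tag}_vrams"),
        "shards": os.path.join(chunk_dir, f"{tag}_vrams", f"{tag}_shards"),
        "trail": os.path.join(chunk_dir, f"{tag}_trail"),
    }
    for path in paths.values():
        # exist_ok=True makes reruns safe when the folders already exist
        os.makedirs(path, exist_ok=True)
    return paths


# Demonstration in a throwaway directory
demo_root = tempfile.mkdtemp()
demo_paths = make_chunk_dirs(demo_root, 0)
print(demo_paths["shards"])
```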

Data Decoding with Tuplify

The tuplify function splits a 32-bit integer into two components:

  • dat_id: The upper 8 bits.
  • dat_ptr: The lower 24 bits.

def tuplify(item):
    dat_id = item >> 24
    dat_ptr = item & 0x00FFFFFF
    return (dat_id, dat_ptr)

This aids in interpreting asset pointers.
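
As a worked example, the snippet below decodes one 32-bit value read from raw bytes. The little-endian byte order ("<I") is an assumption based on the PlayStation's little-endian CPU, and the sample bytes are made up for illustration.

```python
import struct


def tuplify(item):
    dat_id = item >> 24            # upper 8 bits
    dat_ptr = item & 0x00FFFFFF    # lower 24 bits
    return (dat_id, dat_ptr)


# 0x12003456 stored little-endian (assumed byte order, made-up sample)
raw = bytes([0x56, 0x34, 0x00, 0x12])
(value,) = struct.unpack("<I", raw)

dat_id, dat_ptr = tuplify(value)
print(f"ID: {dat_id:02X} | Pointer: {dat_ptr:04X}")
```

Here the top byte 0x12 becomes dat_id and the remaining 0x003456 becomes dat_ptr.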

Reading Chunks

The .IDX file is divided into fixed-size chunks. For each chunk, the script extracts metadata, pointers, and actual data:

for chunk_index in range(os.path.getsize(idxpath) // chunk_size):
    print(f"Reading Chunk index {chunk_index:02X}...")
    ...
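
The fixed-size split can be illustrated on an in-memory buffer. This is only a sketch: iter_chunks is a hypothetical helper, and the chunk size of 4 is a toy value, not the script's real chunk_size.

```python
def iter_chunks(idx_bytes, chunk_size):
    """Yield (chunk_index, chunk_bytes) for each complete fixed-size chunk.

    Hypothetical helper; any trailing partial chunk is ignored, matching
    the integer division used to count chunks.
    """
    count = len(idx_bytes) // chunk_size
    for chunk_index in range(count):
        start = chunk_index * chunk_size
        yield chunk_index, idx_bytes[start:start + chunk_size]


demo_idx = bytes(range(10))          # 10 bytes of toy data
chunks = list(iter_chunks(demo_idx, 4))

for chunk_index, chunk in chunks:
    print(f"Reading Chunk index {chunk_index:02X}... {chunk.hex()}")
```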

Chunk Metadata

Chunk metadata includes:

  • img_start and img_end: Texture data range in .IMG.
  • dat_start and dat_end: Asset data range in .DAT.
  • pointer_amount: Number of pointers for this chunk.
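
To show how such fields might be unpacked with the struct module, here is a sketch under a loudly stated assumption: the header layout used below (five consecutive little-endian 32-bit integers at the start of the chunk) is hypothetical and chosen only for illustration. The real field order, sizes, and offsets are defined by the script itself.

```python
import struct

# ASSUMED layout, for illustration only: five little-endian uint32 fields
HEADER_FORMAT = "<5I"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def parse_chunk_header(chunk):
    """Unpack the metadata fields from the start of a chunk (hypothetical layout)."""
    img_start, img_end, dat_start, dat_end, pointer_amount = struct.unpack_from(
        HEADER_FORMAT, chunk, 0
    )
    return {
        "img_start": img_start,
        "img_end": img_end,
        "dat_start": dat_start,
        "dat_end": dat_end,
        "pointer_amount": pointer_amount,
    }


# Made-up sample values packed with the same assumed layout
demo_chunk = struct.pack(HEADER_FORMAT, 0x0000, 0x8000, 0x1000, 0x2400, 3)
header = parse_chunk_header(demo_chunk)
print(header)
```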

Reading Asset Data

Data is read from .IMG and .DAT using the ranges defined in the chunk metadata.
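
Reading a range boils down to a seek followed by a read. The sketch below assumes the start and end values are byte offsets with an exclusive end; the script may instead scale them (for example by a sector size), so treat read_range as a hypothetical helper rather than the script's own function.

```python
import os
import tempfile


def read_range(path, start, end):
    """Return the bytes in [start, end) from a file.

    Hypothetical helper; assumes byte offsets and an exclusive end.
    """
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


# Demonstration on a temporary 16-byte file standing in for .DAT or .IMG
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(16)))
demo_path = tmp.name

piece = read_range(demo_path, 4, 8)
os.remove(demo_path)
print(piece.hex())
```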


Handling Special Data Types

SDAT (Data) Pointers

SDAT data includes pointer structures, which the script decodes into (dat_id, dat_ptr) tuples using the tuplify function. The decoded pointers are saved to a text file in the chunk's sdats folder (00_pointers.txt in the tree above).

out_sdat_info.write(f"ID: {sdat_pointers[i][0]:02X} | Pointer: {sdat_pointers[i][1]:04X}\n")
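
The same line format can be tried out on an in-memory buffer. In this sketch, write_pointer_listing is a hypothetical wrapper and the pointer values are made up; only the format string comes from the script.

```python
import io


def write_pointer_listing(out, sdat_pointers):
    """Write one 'ID | Pointer' line per (dat_id, dat_ptr) tuple (hypothetical wrapper)."""
    for dat_id, dat_ptr in sdat_pointers:
        out.write(f"ID: {dat_id:02X} | Pointer: {dat_ptr:04X}\n")


buffer = io.StringIO()
write_pointer_listing(buffer, [(0x01, 0x0010), (0x0A, 0x123456)])
listing = buffer.getvalue()
print(listing, end="")
```

Note that :04X is a minimum width, so a 24-bit pointer larger than 0xFFFF is printed with five or six hex digits rather than being truncated.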

VRAM and Texture Shards

The .IMG file contains VRAM data structured into texture "shards." Each shard corresponds to a texture fragment with its own metadata:

  • x, y: Coordinates within VRAM.
  • w, h: Dimensions of the texture.

Shards are recombined to form complete VRAM pages:

with open(imgdest + f"/{chunk_index:02X}.vram", "w+b") as vram:
    # Pre-size the file to 0x100000 bytes (1 MiB) by writing one zero byte at the last offset
    vram.seek(0x100000 - 1)
    vram.write(b"\0")
    ...
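
The 0x100000-byte size matches the PlayStation's VRAM: 1024 × 512 pixels at 16 bits each. Placing a shard then amounts to copying its rows into that buffer. The sketch below assumes x, y, w, and h are expressed in 16-bit pixel units and that shard data is stored row by row; blit_shard is a hypothetical helper, not the script's own routine.

```python
VRAM_WIDTH = 1024       # pixels per row
VRAM_HEIGHT = 512       # rows
BYTES_PER_PIXEL = 2     # 16-bit pixels
VRAM_SIZE = VRAM_WIDTH * VRAM_HEIGHT * BYTES_PER_PIXEL  # 0x100000


def blit_shard(vram, x, y, w, h, pixels):
    """Copy a w*h shard into the VRAM buffer at (x, y).

    Hypothetical helper; assumes coordinates and sizes in 16-bit pixel units.
    """
    row_bytes = w * BYTES_PER_PIXEL
    if len(pixels) != row_bytes * h:
        raise ValueError("shard data does not match its dimensions")
    if x + w > VRAM_WIDTH or y + h > VRAM_HEIGHT:
        raise ValueError("shard does not fit inside VRAM")
    for row in range(h):
        dst = ((y + row) * VRAM_WIDTH + x) * BYTES_PER_PIXEL
        src = row * row_bytes
        vram[dst:dst + row_bytes] = pixels[src:src + row_bytes]


vram = bytearray(VRAM_SIZE)
blit_shard(vram, x=2, y=1, w=2, h=2, pixels=bytes([1, 2, 3, 4, 5, 6, 7, 8]))
print(hex(len(vram)))
```

Working on a bytearray and writing it out once at the end avoids repeated seeks in the output file, though seeking within the pre-sized file would work just as well.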

Trail Data

Trail data is the "trailer" section at the end of each chunk, containing additional asset ranges. The script identifies these ranges and extracts the corresponding data.

for t in range(0, len(traildata), 2):
    dat_trail_start, dat_trail_end = traildata[t], traildata[t+1]
    ...
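
The pairing step can be written as a small standalone function. In this sketch, trail_ranges is a hypothetical helper, the sample values are made up, and the hexadecimal start-end file naming is an assumption modeled on names such as 1234-5678.bin in the directory tree.

```python
def trail_ranges(traildata):
    """Group a flat list of offsets into (start, end) pairs (hypothetical helper)."""
    if len(traildata) % 2:
        raise ValueError("trail data must contain an even number of entries")
    return [(traildata[t], traildata[t + 1]) for t in range(0, len(traildata), 2)]


demo_trail = [0x1234, 0x5678, 0x5678, 0x6000]   # made-up sample offsets
ranges = trail_ranges(demo_trail)

# Assumed naming scheme: hex start and end joined by a dash
names = [f"{start:04X}-{end:04X}.bin" for start, end in ranges]
print(names)
```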