Tomba 2 Extraction Script
Latest revision as of 15:39, 10 January 2025
Tomba 2 Asset Extractor
Link to Tomba2Ex.py: https://drive.google.com/file/d/1-Rr2GQL00wug4x4ABgSIZWPGqTUwhyys/view?usp=drive_link
Overview
This script extracts and organizes assets from the PlayStation game Tomba! 2: The Evil Swine Return. It processes the game's .IDX, .DAT, and .IMG files to extract and categorize data such as sprites, 3D models for assets, level geometry, text, animations (types 1, 2, and 3), level collision, drawmaps, and backgrounds. This is particularly useful for modding, asset recovery, or archival purposes.
The script is written in Python and uses the struct module to interpret binary data formats. Users set the input (CDpath) and output (outfolder) directories in the script before running it.
Prerequisites
- Python installed on your system.
- A copy of the game's .IDX, .DAT, and .IMG files.
- Basic understanding of binary file structures and file system operations.
Usage
Configure Input and Output Paths
Edit the following lines in the script to match the locations of your input files and desired output directory:
CDpath = "/path/to/TOMBA2/CD" # Location of the .IDX, .DAT, and .IMG files
outfolder = "/path/to/outputfolder" # Destination for extracted files
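Before running the extractor, it can help to confirm that the three input files are actually present under CDpath. The following is a minimal sketch and not part of Tomba2Ex.py: the helper name missing_inputs is hypothetical, and the file names TOMBA2.IDX, TOMBA2.DAT, and TOMBA2.IMG are assumed from the wiki's page names for these files.

```python
import os

# Assumed file names; adjust them if your copy of the disc uses different ones.
EXPECTED_FILES = ("TOMBA2.IDX", "TOMBA2.DAT", "TOMBA2.IMG")


def missing_inputs(cd_path):
    """Return the expected input files that are not present in cd_path."""
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(cd_path, name))
    ]
```

An empty list means all three files were found; otherwise the returned names show what still needs to be copied into CDpath.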
Execute the Script
Run the script using Python:
python Tomba2Ex.py
The script reads the .IDX file to determine asset locations, then extracts data from the .DAT and .IMG files into organized subdirectories within the outfolder.
How It Works
Creating Directories
The script creates a structured output directory with the following organization:
    outputfolder/
    ├── chunk_00/
    │   ├── 00_sdats/
    │   │   ├── 0000-1234.sdat
    │   │   └── 00_pointers.txt
    │   ├── 00_vrams/
    │   │   ├── 0000-1234.cvram
    │   │   ├── 00_shards/
    │   │   │   ├── 00-0.shard
    │   │   │   └── ...
    │   │   └── 00.vram
    │   └── 00_trail/
    │       ├── 1234-5678.bin
    │       └── ...
    ├── chunk_01/
    └── ...
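A layout like the one above could be created per chunk roughly as follows. This is a sketch rather than the script's actual code: the function name make_chunk_dirs is hypothetical, and it assumes the two-digit folder prefix is the chunk index in uppercase hexadecimal, matching the {chunk_index:02X} formatting used elsewhere in the script.

```python
import os


def make_chunk_dirs(outfolder, chunk_index):
    """Create the per-chunk output folders and return their paths."""
    tag = f"{chunk_index:02X}"  # assumed: hex index, as in the script's prints
    chunk_dir = os.path.join(outfolder, f"chunk_{tag}")
    paths = {
        "sdats": os.path.join(chunk_dir, f"{tag}_sdats"),
        "vrams": os.path.join(chunk_dir, f"{tag}_vrams"),
        "shards": os.path.join(chunk_dir, f"{tag}_vrams", f"{tag}_shards"),
        "trail": os.path.join(chunk_dir, f"{tag}_trail"),
    }
    for path in paths.values():
        os.makedirs(path, exist_ok=True)  # safe to call on repeated runs
    return paths
```

Using exist_ok=True means a second extraction into the same outfolder does not fail on folders that already exist.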
Data Decoding with Tuplify
The tuplify function splits a 32-bit integer into two components:
- dat_id: The upper 8 bits.
- dat_ptr: The lower 24 bits.
    def tuplify(item):
        dat_id = item >> 24
        dat_ptr = item & 0x00FFFFFF
        return (dat_id, dat_ptr)
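Each asset pointer packs an 8-bit ID and a 24-bit pointer into one 32-bit word; tuplify separates the two so they can be handled individually.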
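As a worked example, the snippet below applies tuplify to one word. The value 0x0A001F40 is arbitrary and chosen only for illustration; it is not taken from the game's data.

```python
def tuplify(item):
    dat_id = item >> 24            # upper 8 bits
    dat_ptr = item & 0x00FFFFFF    # lower 24 bits
    return (dat_id, dat_ptr)


dat_id, dat_ptr = tuplify(0x0A001F40)
print(f"ID: {dat_id:02X} | Pointer: {dat_ptr:04X}")  # ID: 0A | Pointer: 1F40

# The split is lossless: shifting the ID back and OR-ing restores the word.
assert (dat_id << 24) | dat_ptr == 0x0A001F40
```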
Reading Chunks
The .IDX file is divided into fixed-size chunks. For each chunk, the script extracts metadata, pointers, and actual data:
    for chunk_index in range(int(os.path.getsize(idxpath) / chunk_size)):
        print(f"Reading Chunk index {chunk_index:02X}...")
        ...
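The loop above can be wrapped in a small generator that hands back each chunk's raw bytes. This is a sketch under assumptions: iter_chunks is a hypothetical helper, and chunk_size is left as a parameter because its value is not stated here. Integer division (//) is used in place of int(a / b), which gives the same count without going through floating point.

```python
import os


def iter_chunks(idxpath, chunk_size):
    """Yield (chunk_index, chunk_bytes) for each whole chunk in the index file."""
    chunk_count = os.path.getsize(idxpath) // chunk_size  # ignores a partial tail
    with open(idxpath, "rb") as idx:
        for chunk_index in range(chunk_count):
            yield chunk_index, idx.read(chunk_size)
```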
Chunk Metadata
Chunk metadata includes:
- img_start and img_end: Texture data range in .IMG.
- dat_start and dat_end: Asset data range in .DAT.
- pointer_amount: Number of pointers for this chunk.
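The fields listed above could be decoded with the struct module along these lines. The layout used here is purely hypothetical: it assumes five little-endian unsigned 32-bit values, in the listed order, at the very start of the chunk. The real offsets, widths, and ordering used by Tomba2Ex.py are not given on this page and should be checked against the script.

```python
import struct
from collections import namedtuple

ChunkHeader = namedtuple(
    "ChunkHeader", "img_start img_end dat_start dat_end pointer_amount"
)

# Hypothetical layout: five little-endian uint32 fields, in this order.
HEADER_FORMAT = "<5I"


def parse_chunk_header(chunk):
    """Decode the assumed header fields from the first bytes of a chunk."""
    return ChunkHeader(*struct.unpack_from(HEADER_FORMAT, chunk, 0))
```

A named tuple keeps the fields readable at the call site, for example header.dat_start instead of a bare index.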
Reading Asset Data
Data is read from .IMG and .DAT using the ranges defined in the chunk metadata.
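Reading one such range amounts to a seek followed by a bounded read. The sketch below uses a hypothetical helper, read_range, and assumes that the start and end values are byte offsets with an exclusive end. If the index stores them in another unit (sectors, for instance), they would need to be scaled first.

```python
def read_range(path, start, end):
    """Return the bytes in [start, end) of the file at path."""
    if end < start:
        raise ValueError(f"invalid range: {start:#x}-{end:#x}")
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)
```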
Handling Special Data Types
SDAT (Data) Pointers
SDAT data includes pointer structures interpreted as tuples using the tuplify function. The extracted pointers are saved in a text file.
    out_sdat_info.write(f"ID: {sdat_pointers[i][0]:02X} | Pointer: {sdat_pointers[i][1]:04X}\n")
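To show how a pointer table might get from raw bytes to those text lines, here is a sketch. Both helper names are hypothetical, and the table is assumed to be a run of little-endian 32-bit words (the PlayStation's CPU is little-endian, but the exact table layout is not described here).

```python
import struct


def decode_pointers(raw, pointer_amount):
    """Unpack pointer_amount little-endian 32-bit words into (id, pointer) tuples."""
    words = struct.unpack_from(f"<{pointer_amount}I", raw, 0)
    return [(word >> 24, word & 0x00FFFFFF) for word in words]


def pointer_lines(sdat_pointers):
    """Format each tuple in the same style as the pointers text file."""
    return [
        f"ID: {dat_id:02X} | Pointer: {dat_ptr:04X}\n"
        for dat_id, dat_ptr in sdat_pointers
    ]
```

The resulting list can be passed straight to a file object's writelines method.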
VRAM and Texture Shards
The .IMG file contains VRAM data structured into texture "shards." Each shard corresponds to a texture fragment with its own metadata:
- x, y: Coordinates within VRAM.
- w, h: Dimensions of the texture.
Shards are recombined to form complete VRAM pages. The script first pre-sizes the output file to 0x100000 bytes (1 MiB) by writing a single zero byte at the last offset:
    with open(imgdest + f"/{chunk_index:02X}.vram", "w+b") as vram:
        vram.seek(0x100000 - 1)
        vram.write(b"\0")
        ...
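Placing a shard into a VRAM-sized buffer can be sketched as a row-by-row copy. This is an illustration, not the script's code: blit_shard is a hypothetical helper, and it assumes that x, y, w, and h are given in 16-bit pixel units, that shard data is uncompressed, and that rows are stored top to bottom. The 1024 x 512 layout at 2 bytes per pixel matches the 0x100000-byte file size used by the script.

```python
VRAM_WIDTH = 1024      # pixels per row (PlayStation VRAM is 1024 x 512)
VRAM_HEIGHT = 512
BYTES_PER_PIXEL = 2    # 16-bit pixels, so the buffer is 0x100000 bytes


def blit_shard(vram, x, y, w, h, pixels):
    """Copy a w x h block of 16-bit pixels into vram at (x, y)."""
    row_bytes = w * BYTES_PER_PIXEL
    if len(pixels) != row_bytes * h:
        raise ValueError("shard data does not match its dimensions")
    if x + w > VRAM_WIDTH or y + h > VRAM_HEIGHT:
        raise ValueError("shard falls outside VRAM")
    for row in range(h):
        dst = ((y + row) * VRAM_WIDTH + x) * BYTES_PER_PIXEL
        src = row * row_bytes
        vram[dst:dst + row_bytes] = pixels[src:src + row_bytes]


vram = bytearray(VRAM_WIDTH * VRAM_HEIGHT * BYTES_PER_PIXEL)
blit_shard(vram, x=2, y=1, w=2, h=2, pixels=bytes(range(1, 9)))
```

Working on a bytearray in memory and writing it out once avoids a seek per row on the output file.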
Trail Data
Trail data is the "trailer" section at the end of each chunk, containing additional asset ranges. The script identifies these ranges and extracts the corresponding data.
    for t in range(0, len(traildata), 2):
        dat_trail_start, dat_trail_end = traildata[t], traildata[t+1]
        ...
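The pairing step can also be written with slicing and zip. In the sketch below both helper names are hypothetical, and the file naming assumes that the 1234-5678.bin pattern in the output tree is the start and end of the range in uppercase hexadecimal.

```python
def trail_ranges(traildata):
    """Pair consecutive values into (start, end) ranges."""
    if len(traildata) % 2:
        raise ValueError("trail data must hold start/end pairs")
    return list(zip(traildata[0::2], traildata[1::2]))


def trail_filename(start, end):
    """Name a trail file after its range, as in 1234-5678.bin (assumed hex)."""
    return f"{start:04X}-{end:04X}.bin"
```

Checking for an odd number of values up front turns a truncated trailer into a clear error instead of an IndexError partway through extraction.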