TXTD: Difference between revisions

From Tomba! Wiki
Revision as of 16:45, 10 January 2025

In the context of Tomba! 2: The Evil Swine Return, a TXTD file is a data structure embedded within the game's resource files (such as the DAT file). It contains text data used for in-game dialogue, descriptions, and other textual elements. The text is stored in a custom byte encoding and is accompanied by pointer tables that determine how the strings are grouped and located.

Structure of a TXTD File

The TXTD file is divided into two hierarchical layers:

  1. Master Table:
    • The top-level structure, pointing to multiple entries of text data.
    • Each pointer in this table corresponds to a "block" or "group" of related text strings.
    • Contains metadata about the number and location of these text blocks.
  2. Entry Table:
    • Each entry contains pointers to individual text strings within the text block.
    • Provides per-string metadata: the offset of the text and an extra field. Control codes for text behavior (e.g., color, pauses) are stored inline in the strings themselves.

Detailed Structure

1. Header Section
  • The file begins with a header containing the following fields:
    • Master Root Pointer: Offset to the start of the master table.
    • Master Entry Count: Number of entries in the master table.
    • Additional padding or unused bytes.
2. Master Table
  • A list of pointers to entry tables.
  • Each pointer is represented by a relative address (offset from the start of the file or current context).
3. Entry Table
  • For each master entry, there is a corresponding entry table.
  • Contains metadata for individual text strings:
    • Pointer to Text Data: Offset of the text string relative to the start of the table.
    • Extra Field: May encode additional information about the text, such as speaker or type.
4. Text Strings
  • Binary-encoded text data begins at the offsets specified in the entry table.
  • Uses a custom encoding scheme to represent characters (as seen in the letters dictionary of the script).
  • Ends with a terminator byte (0xFF).
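
The layout above can be walked with a few lines of Python. This is only a sketch under stated assumptions: the page does not give field widths, so every field is taken to be a 16-bit little-endian value (suggested by the 0xFFFF invalid-pointer marker), master pointers are taken as offsets from the start of the TXTD data, and each entry is taken to be a pointer followed by the extra field. The names read_master_table and read_entry_table are hypothetical, and the entry count is passed in by the caller because the page does not say where it is stored.

```python
import struct

INVALID_POINTER = 0xFFFF  # value the script treats as "no entry"


def read_master_table(buf: bytes) -> list[int]:
    """Return the master-table pointers of a TXTD blob.

    Assumed layout (not confirmed by the page): the header is two
    little-endian 16-bit fields, root pointer then entry count, and the
    master table is an array of 16-bit offsets from the start of the blob.
    """
    root, count = struct.unpack_from("<HH", buf, 0)
    return list(struct.unpack_from(f"<{count}H", buf, root))


def read_entry_table(buf: bytes, table_start: int, count: int) -> list[tuple[int, int]]:
    """Return (absolute text offset, extra field) pairs for one entry table.

    Assumes each entry is a 16-bit pointer followed by a 16-bit extra field.
    """
    entries = []
    for i in range(count):
        ptr, extra = struct.unpack_from("<HH", buf, table_start + 4 * i)
        if ptr == INVALID_POINTER:
            continue  # skipped, mirroring the script's invalid-pointer check
        entries.append((table_start + ptr, extra))  # pointer is table-relative
    return entries


# Synthetic blob built for this example; it is not real game data.
blob = bytes([
    0x04, 0x00, 0x01, 0x00,  # header: root pointer = 4, master entry count = 1
    0x06, 0x00,              # master table: one pointer to the entry table at 6
    0x08, 0x00, 0x00, 0x00,  # entry 0: text at table start + 8, extra = 0
    0xFF, 0xFF, 0x00, 0x00,  # entry 1: invalid pointer
    0x41, 0x42, 0xFF,        # text bytes at offset 14, ending with the terminator
])

master = read_master_table(blob)
entries = read_entry_table(blob, master[0], 2)
```

With real data the widths and base addresses should be checked against the extraction script before relying on this.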

TXTD Encoding Scheme

  1. Character Representation:
    • Characters are stored as byte values, with each byte mapping to a specific character or control code.
    • Example:
      • 0x41 → A
      • 0x42 → B
      • 0xFA → Line break (\n)
      • 0xFC → Pause ({$PAUSE})
  2. Control Codes:
    • Non-alphanumeric bytes are often used for special formatting or commands.
    • Examples:
      • {$COLOR_F1}: Changes text color.
      • {$END}: Marks the end of a text block.
  3. Termination:
    • Each text string ends with the byte 0xFF, signaling the end of the string.
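
A minimal decoding sketch for the scheme above. The table holds only the four mappings listed in this section; the script's real letters dictionary is much larger. The function name decode_string and the {$XX} placeholder for unmapped bytes are choices made for this example, not behavior documented on this page.

```python
# Partial table built only from the mappings listed above.
LETTERS = {
    0x41: "A",
    0x42: "B",
    0xFA: "\n",
    0xFC: "{$PAUSE}",
}
TERMINATOR = 0xFF


def decode_string(data: bytes, start: int = 0) -> str:
    """Decode one string starting at `start`, stopping at the 0xFF terminator."""
    out = []
    for byte in data[start:]:
        if byte == TERMINATOR:
            break
        # Unmapped bytes are kept visible as {$XX} so nothing is silently lost
        # (a choice made for this sketch).
        out.append(LETTERS.get(byte, f"{{${byte:02X}}}"))
    return "".join(out)


# The trailing 0x41 sits after the terminator and is therefore ignored.
sample = bytes([0x41, 0x42, 0xFA, 0x42, 0x41, 0xFC, 0x7E, 0xFF, 0x41])
decoded = decode_string(sample)
print(repr(decoded))
```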

Tomba! 2 TXTD File Extraction Script

This script extracts and interprets TXTD data from the DAT file in Tomba! 2: The Evil Swine Return. It decodes the text (in-game dialogue, descriptions, and other textual assets) using a custom character set and formats it for readability or modification.

Script Details

Key Functions

  1. preview(DAT, offset):
    • Main function for extracting text from the specified DAT file.
    • Takes two arguments:
      • DAT: Path to the DAT file.
      • offset: Offset where the TXTD data begins.
    • Processes the data in two hierarchical layers:
      • Master Entries: Top-level pointers directing to specific text blocks.
      • Entry Headers: Sub-pointers within each master entry that direct to individual text strings.
    • Calls prepareText and getText to decode and format the text.
  2. prepareText(ptr, who, real, par1, par2, num):
    • Formats and retrieves text from a given pointer.
    • Skips entries if the pointer is invalid (0xFFFF).
  3. getText(real):
    • Converts a sequence of binary data into readable text using the letters dictionary.
    • Iterates until it encounters the terminator byte (0xFF), which signals the end of a text string.
  4. getB(number=1):
    • Helper function that reads the specified number of bytes from the file and converts them into a single integer (little-endian byte order).
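
As an illustration of the helper and the invalid-pointer check, here is a sketch of how a little-endian reader might look. The script's actual getB is not reproduced on this page; the stream argument, the name get_b, and the assumption that pointers are two bytes wide (inferred from the 0xFFFF marker) are all hypothetical.

```python
import io

INVALID_POINTER = 0xFFFF  # pointer value that marks an entry to skip


def get_b(stream, number=1):
    """Read `number` bytes from `stream` and return them as a little-endian integer."""
    raw = stream.read(number)
    if len(raw) != number:
        raise EOFError(f"wanted {number} bytes, got {len(raw)}")
    return int.from_bytes(raw, "little")


# In-memory stand-in for an open DAT file: two 16-bit pointers and one byte.
stream = io.BytesIO(bytes([0x34, 0x12, 0xFF, 0xFF, 0x07]))
pointers = [get_b(stream, 2), get_b(stream, 2)]
valid = [p for p in pointers if p != INVALID_POINTER]
last = get_b(stream)
```

Raising on a short read is a design choice for this sketch; it surfaces a wrong offset early instead of decoding garbage.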

Dictionary: letters

The letters dictionary maps byte values (written in hexadecimal) to their corresponding characters or control sequences. Key highlights include:

  • Alphabet and Symbols: Maps standard alphanumeric characters (A-Z, a-z, 0-9) and punctuation.
  • Special Characters: Supports extended characters such as Ä, ¥, and ….
  • Control Codes:
    • {$END}: Signals the end of a text block.
    • {$PAUSE}: Inserts a pause in the text.
    • {$COLOR_F1}: Changes text color (with {$END_COLOR_F0} to revert).

Workflow of the Script

  1. Initialize:
    • Define the path to the DAT file and the offset of the TXTD data.
    • Load the DAT file in binary mode.
  2. Read Master Entries:
    • Extract master root and the number of master entries.
    • Use pointers in the master headers to locate the start of each text block.
  3. Process Entry Headers:
    • For each master entry, extract sub-pointers (entry headers).
    • Use these sub-pointers to locate individual text strings.
  4. Decode Text:
    • Convert binary data into readable text using the letters dictionary.
    • Handle special formatting codes and ensure proper string termination.
  5. Output:
    • Structure and output the extracted text for further use or modification.
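
The first and last steps of the workflow can be sketched as follows. The function names load_txtd and dump_groups are hypothetical, and JSON is only one possible output format chosen for this example; the page does not say how the script writes its results. A throwaway temporary file stands in for the real DAT file.

```python
import json
import os
import tempfile


def load_txtd(dat_path, offset, size=None):
    """Open the DAT file in binary mode and return the bytes starting at `offset`."""
    with open(dat_path, "rb") as dat:
        dat.seek(offset)
        return dat.read() if size is None else dat.read(size)


def dump_groups(groups):
    """Serialize decoded text as JSON, one list of strings per master entry."""
    return json.dumps(groups, ensure_ascii=False, indent=2)


# Demonstration with a temporary file: 16 filler bytes, then a tiny string.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 16 + bytes([0x41, 0x42, 0xFF]))
try:
    txtd = load_txtd(tmp.name, 16)
finally:
    os.remove(tmp.name)

report = dump_groups([["AB", "{$PAUSE}"]])
```

Keeping one list per master entry preserves the two-layer grouping, which makes it easier to map edited text back to its original position.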