PHP TrueType font file reader and writer

TrueType font files are organized as a collection of tables. The file begins with a table directory, which consists of two parts:

  1. The first part has a fixed size (12 bytes) and contains the version number (4 bytes), the number of tables in the TrueType font file (2 bytes) and three variables of 2 bytes each ("searchRange", "entrySelector" and "rangeShift") that are used to speed up binary searches in the second part.
  2. The second part has a variable size. It contains one entry for each table in the TrueType font file. Each entry holds the table name (4 bytes), the table checksum (4 bytes), the offset of the table from the beginning of the TrueType font file (4 bytes) and the length of the table (4 bytes). The second part is thus "16 * numberOfTables" bytes long.
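
The three search variables follow from the number of tables alone. Here is a minimal sketch of that arithmetic, based on the TrueType definition of these fields; the function name "directorySearchFields" is hypothetical and is not part of the class, and the sketch assumes the font has at least one table:

```php
<?php
// Hypothetical helper (not part of the class): derive the binary-search
// fields of the table directory from the number of tables.
function directorySearchFields($numTables) {
    // entrySelector is log2 of the largest power of two <= $numTables.
    $entrySelector = 0;
    while ((1 << ($entrySelector + 1)) <= $numTables) {
        $entrySelector++;
    }
    // Each directory entry is 16 bytes long.
    $searchRange = (1 << $entrySelector) * 16;
    $rangeShift = $numTables * 16 - $searchRange;
    return array(
        'searchRange' => $searchRange,
        'entrySelector' => $entrySelector,
        'rangeShift' => $rangeShift,
        // 12-byte fixed part plus one 16-byte entry per table.
        'directoryLength' => 12 + 16 * $numTables,
    );
}

$fields = directorySearchFields(10);
```

For a font with 10 tables, the largest power of two not exceeding 10 is 8, so "searchRange" is 128, "entrySelector" is 3, "rangeShift" is 32 and the whole directory is 172 bytes long.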

The class constructor parses the table directory and constructs a PHP array of tables. Here is the constructor:

function __construct($b) {
    $this->b = $b;

    $off = 0;
    $version = self::getFixed($b, $off); // sfnt version
    $numTables = self::getUshort($b, $off); // number of tables
    $searchRange = self::getUshort($b, $off);
    $entrySelector = self::getUshort($b, $off);
    $rangeShift = self::getUshort($b, $off);
    $this->tables = array();
    for ($i = 0; $i < $numTables; $i++) {
        $name = self::getRaw($b, $off, 4);
        $checksum = self::getUlong($b, $off);
        $offset = self::getUlong($b, $off);
        $length = self::getUlong($b, $off);
        $this->tables[$name] = array('offset' => $offset, 'length' => $length);
    }
}

The argument "$b" is a string that holds the entire contents of the font file (for example, the value returned by "file_get_contents('file.ttf')").

Data types defined in TrueType font files include "Fixed", "Ushort", "Ulong", etc. The private static functions "getFixed", "getUshort", "getUlong", etc. each read a field of the corresponding type, located in "$b" at offset "$off", and then advance "$off" past the field (which is why "$off" is passed by reference). Here is an example:

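private static function getUshort($b, &$off) {
    // TrueType stores multi-byte values big-endian (high byte first)
    $num = ord($b[$off++]);
    $num = 256 * $num + ord($b[$off++]);
    return $num; // $off now points just past the 2-byte field
}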
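
The other readers can follow the same pattern. The sketch below shows how a "Ulong" and a signed "Short" reader might look; it is modeled on "getUshort" and is not copied from the class. The functions are written as plain functions (rather than private static methods) so that the snippet runs on its own, and a 64-bit PHP build is assumed so that a "Ulong" fits in an integer:

```php
<?php
// Sketch only: modeled on getUshort, not taken from the class itself.
function getUlong($b, &$off) {
    $num = 0;
    // Accumulate four bytes, high byte first.
    for ($i = 0; $i < 4; $i++) {
        $num = 256 * $num + ord($b[$off++]);
    }
    return $num;
}

function getShort($b, &$off) {
    $num = ord($b[$off++]);
    $num = 256 * $num + ord($b[$off++]);
    // Values with the top bit set are negative in two's complement.
    return $num >= 0x8000 ? $num - 0x10000 : $num;
}

$b = "\x00\x01\x00\x00\xFF\xFE";
$off = 0;
$ulong = getUlong($b, $off); // bytes 00 01 00 00
$short = getShort($b, $off); // bytes FF FE
```

Here "$ulong" is 65536, "$short" is -2 and "$off" ends up at 6, just past both fields.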

Reader

Function "getTableRaw($name)" returns a string that holds the raw bytes of a table (or null if the table is not found). The argument "$name" is the name of the table. This is useful for copying a table unchanged from one TrueType font file to another. Here is the function:

function getTableRaw($name) {
    if (isset($this->tables[$name])) {
        $entry = $this->tables[$name];
        return substr($this->b, $entry['offset'], $entry['length']);
    }
    return null;
}
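
To see the directory and the raw table lookup working together without a real font file, here is a self-contained sketch. It builds a tiny in-memory file with a single made-up table named "demo", parses the directory with PHP's "unpack()" instead of the class's "get..." helpers, and extracts the table bytes the same way "getTableRaw" does. The function names "readDirectory" and "tableRaw" are hypothetical, and the checksum in the synthetic entry is left at 0 as a placeholder, so the result is not a valid font:

```php
<?php
// Hypothetical stand-in for the constructor: parse the table directory.
function readDirectory($b) {
    $header = unpack('Nversion/nnumTables', $b);
    $tables = array();
    for ($i = 0; $i < $header['numTables']; $i++) {
        // Entries start after the 12-byte fixed part, 16 bytes each.
        $entry = unpack('a4name/Nchecksum/Noffset/Nlength', $b, 12 + 16 * $i);
        $tables[$entry['name']] = array(
            'offset' => $entry['offset'],
            'length' => $entry['length'],
        );
    }
    return $tables;
}

// Hypothetical stand-in for getTableRaw().
function tableRaw($b, $tables, $name) {
    if (!isset($tables[$name])) {
        return null;
    }
    return substr($b, $tables[$name]['offset'], $tables[$name]['length']);
}

// Synthetic file: 12-byte header, one 16-byte entry, then the table bytes.
$payload = "ABCDEFGH";
$font = pack('Nnnnn', 0x00010000, 1, 16, 0, 0)
      . 'demo' . pack('NNN', 0, 28, strlen($payload))
      . $payload;

$tables = readDirectory($font);
$raw = tableRaw($font, $tables, 'demo');
$missing = tableRaw($font, $tables, 'none');
```

The table starts at offset 28 because the directory occupies 12 + 16 bytes. "$raw" holds the eight payload bytes and "$missing" is null.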

For most (structured) tables, there exists an "unmarshal<tablename>" function that parses the table and returns a PHP array. For example, "unmarshalHead()" parses the "head" table and returns a PHP array which contains (among others) the entries "xMin", "yMin", "xMax" and "yMax", which define the font bounding box. Here is the function:

static function unmarshalHead($b) {
    $head = array();
    $off = 0;
    $head['version'] = self::getRaw($b, $off, 4); // This is actually fixed
    $head['revision'] = self::getRaw($b, $off, 4); // This is actually fixed
    $off += 4; // Skip checksum adjustment
    $off += 4; // Skip magic number
    $head['flags'] = self::getUshort($b, $off);
    $head['unitsPerEm'] = self::getUshort($b, $off);
    $head['created'] = self::getRaw($b, $off, 8); // This is actually longdatetime
    $head['modified'] = self::getRaw($b, $off, 8); // This is actually longdatetime
    $head['xMin'] = self::getFword($b, $off);
    $head['yMin'] = self::getFword($b, $off);
    $head['xMax'] = self::getFword($b, $off);
    $head['yMax'] = self::getFword($b, $off);
    $head['macStyle'] = self::getUshort($b, $off);
    $head['lowestRecPPEM'] = self::getUshort($b, $off);
    $head['fontDirectionHint'] = self::getShort($b, $off);
    $head['indexToLocFormat'] = self::getShort($b, $off);
    $head['glyphDataFormat'] = self::getShort($b, $off);
    return $head;
}

There are also helper functions that go further and perform specific tasks. For example, after unmarshaling the "cmap" table, there is usually a need to find the encoding table for a specific "platformID" - "platformSpecificID" pair. Function "getEncodingTable($cmap, $platformID, $platformSpecificID)" handles this. The argument "$cmap" is a PHP array returned by the "unmarshalCmap()" function, and the arguments "$platformID" and "$platformSpecificID" are two numbers.

There is also usually a need to find the glyph index for a given character code using a specific encoding table. Function "characterToIndex($encodingTable, $charCode)" handles this. The argument "$encodingTable" is a PHP array returned by "getEncodingTable(...)", and the argument "$charCode" is the character code whose glyph index we want to find.
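
The two lookups can be pictured with the sketch below. The array layout (a "tables" list whose entries carry a flat "map" from character code to glyph index), the function names "findEncodingTable" and "lookupGlyphIndex", and the glyph numbers are all assumptions made for illustration; the arrays actually produced by "unmarshalCmap()" may be organized differently:

```php
<?php
// Hypothetical layout and made-up glyph numbers, for illustration only.
$cmap = array(
    'tables' => array(
        array('platformID' => 1, 'platformSpecificID' => 0,
              'map' => array(65 => 36)),
        array('platformID' => 3, 'platformSpecificID' => 1,
              'map' => array(65 => 36, 960 => 211)),
    ),
);

// Return the first encoding table that matches the ID pair, or null.
function findEncodingTable($cmap, $platformID, $platformSpecificID) {
    foreach ($cmap['tables'] as $table) {
        if ($table['platformID'] === $platformID
            && $table['platformSpecificID'] === $platformSpecificID) {
            return $table;
        }
    }
    return null;
}

// Glyph index 0 is the "missing glyph" in TrueType fonts.
function lookupGlyphIndex($encodingTable, $charCode) {
    return isset($encodingTable['map'][$charCode])
        ? $encodingTable['map'][$charCode]
        : 0;
}

$unicode = findEncodingTable($cmap, 3, 1); // Windows platform, Unicode BMP
$glyph = lookupGlyphIndex($unicode, 960);  // U+03C0
$unmapped = lookupGlyphIndex($unicode, 0x4E00);
```

With this sample data "$glyph" is 211 and "$unmapped" is 0. Real encoding tables are usually stored in compact segment formats rather than as a flat map, so an actual implementation has more work to do.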

Writer

For most (structured) tables there exists a "marshal<tablename>" function that accepts a PHP array and returns a string. For example, "marshalHead($head)" accepts a PHP array and returns a string that holds the raw bytes of the "head" table. Here is the function:

static function marshalHead($head) {
    $b = str_repeat(chr(0), 54); // Size of 'head' is 54 bytes
    $off = 0;
    self::setRaw($b, $off, $head['version'], 4); // This is actually fixed
    self::setRaw($b, $off, $head['revision'], 4); // This is actually fixed
    self::setUlong($b, $off, 0); // Checksum Adjustment - to be calculated later
    self::setUlong($b, $off, 0x5F0F3CF5); // Magic Number
    self::setUshort($b, $off, $head['flags']);
    self::setUshort($b, $off, $head['unitsPerEm']);
    self::setRaw($b, $off, $head['created'], 8); // This is actually longdatetime
    self::setRaw($b, $off, $head['modified'], 8); // This is actually longdatetime
    self::setFword($b, $off, $head['xMin']);
    self::setFword($b, $off, $head['yMin']);
    self::setFword($b, $off, $head['xMax']);
    self::setFword($b, $off, $head['yMax']);
    self::setUshort($b, $off, $head['macStyle']);
    self::setUshort($b, $off, $head['lowestRecPPEM']);
    self::setShort($b, $off, $head['fontDirectionHint']);
    self::setShort($b, $off, $head['indexToLocFormat']);
    self::setShort($b, $off, $head['glyphDataFormat']);
    return $b;
}

The private static functions "setFixed", "setUshort", "setUlong", etc. each write a field of the corresponding type into "$b" at offset "$off" and then advance "$off" past the field. That is, they are the inverse of the functions "getFixed", "getUshort", "getUlong", etc.
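
As an illustration, here is how a 2-byte writer might look. This is a sketch that mirrors "getUshort", not the class's actual code; the argument order is taken from the calls in "marshalHead", and both "$b" and "$off" are assumed to be passed by reference. It is written as a plain function so that the snippet runs on its own:

```php
<?php
// Sketch only: mirrors getUshort, not taken from the class itself.
function setUshort(&$b, &$off, $val) {
    // Write the high byte first (big-endian), then the low byte.
    $b[$off++] = chr(($val >> 8) & 0xFF);
    $b[$off++] = chr($val & 0xFF);
}

$b = str_repeat(chr(0), 6); // Pre-allocated buffer, as in marshalHead
$off = 0;
setUshort($b, $off, 0x0100);
setUshort($b, $off, 65535);
setUshort($b, $off, -2); // Masking also yields the two's complement form
$hex = bin2hex($b);
```

After the three calls "$hex" is "0100fffffffe" and "$off" is 6. Because of the masking, the same byte logic would also serve for a signed "Short" writer.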

There exists a function "marshalAll" which accepts a collection of strings (one per table), concatenates them and returns a new string. Along the way, it calculates the checksums of the tables (to be stored in the table directory) and the checksum adjustment for the "head" table. The tables in the returned string are 4-byte aligned.
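
The checksum arithmetic itself is short. The sketch below follows the TrueType definition (a table checksum is the sum of its big-endian 32-bit words modulo 2^32, and the checksum adjustment is 0xB1B0AFBA minus the checksum of the whole file); it is not the code of "marshalAll". The function names are hypothetical, and a 64-bit PHP build is assumed:

```php
<?php
// Pad a table with zero bytes up to a multiple of 4 bytes.
function padTable($raw) {
    $rem = strlen($raw) % 4;
    return $rem === 0 ? $raw : $raw . str_repeat(chr(0), 4 - $rem);
}

// Sum the (padded) table as big-endian 32-bit words, modulo 2^32.
function tableChecksum($raw) {
    $sum = 0;
    foreach (unpack('N*', padTable($raw)) as $word) {
        $sum = ($sum + $word) & 0xFFFFFFFF;
    }
    return $sum;
}

// $font is the complete file, with the checksum adjustment field of the
// "head" table still set to 0.
function checksumAdjustment($font) {
    return (0xB1B0AFBA - tableChecksum($font)) & 0xFFFFFFFF;
}

$simple = tableChecksum("\x00\x00\x00\x01\x00\x00\x00\x02");
$wrapped = tableChecksum("\xFF\xFF\xFF\xFF\x00\x00\x00\x02");
$padded = tableChecksum("\x01");
```

Here "$simple" is 3, "$wrapped" is 1 because the sum wraps around at 2^32, and "$padded" is 0x01000000 because the single byte is padded to a full word before summing.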

You can find the complete PHP class here: TTF.php

You can find a demo class here: TTFdump.php. This demo class demonstrates two things: first, how to dump the Unicode-mapped characters to a PNG image; second, how to dump the glyphs to a PDF document (basic knowledge of PDF structure is assumed here).

For comments, inquiries, etc., contact us at: info at 4real dot gr