Why does my PDF show tofu boxes for Japanese?

Empty rectangles instead of Japanese characters mean your PDF couldn't find glyphs for those codepoints. Here are the four causes and the fixes.

by gpdf team

The question, in other words

I wrote Japanese text with gpdf and the output PDF shows empty rectangles where the characters should be. What is that, and how do I get real Japanese glyphs into the file?

The quick answer

That is tofu — the PDF viewer draws a placeholder rectangle because the font embedded in the PDF has no glyph for the Unicode codepoint you asked for. Four things cause it, and one is far more common than the rest.

In order of frequency:

  1. No CJK font registered. gpdf.NewDocument has no WithFont call, so the document falls back to the PDF Base-14 fonts (Helvetica, Times, Courier). None of those cover U+3040–U+9FFF.
  2. CJK font registered, but wrong family on c.Text. WithFont("NotoSansJP", ...) is set, but template.FontFamily("Arial") on the text forces gpdf to look up Japanese in a Latin font.
  3. Font file doesn't actually contain CJK glyphs. The TTF on disk is a Latin subset (NotoSans-Regular.ttf rather than NotoSansJP-Regular.ttf). Name looks right, coverage is empty.
  4. Bytes corrupted before gpdf saw them. The UTF-8 string was decoded as Shift-JIS or Latin-1 upstream, so the runes that reach gpdf are no longer the text you wrote. If you see 縺ゅ→縺 instead of rectangles, this is the one.
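If you want to catch cause 1 before a PDF is ever written, a rough guard is to check whether the text contains anything in the U+3040–U+9FFF range mentioned above. The sketch below uses only the standard library. needsCJKFont is a hypothetical helper name, not part of gpdf, and the range is deliberately the same coarse one used in this post, so it ignores CJK punctuation below U+3040.

```go
package main

import "fmt"

// needsCJKFont reports whether s contains at least one rune in
// U+3040–U+9FFF, the range this post says the Base-14 fonts do not cover.
// Hypothetical helper for illustration; it is not a gpdf function.
func needsCJKFont(s string) bool {
	for _, r := range s {
		if r >= 0x3040 && r <= 0x9FFF {
			return true
		}
	}
	return false
}

func main() {
	for _, s := range []string{"Hello, world.", "こんにちは、世界。"} {
		fmt.Printf("%q needs a CJK font: %v\n", s, needsCJKFont(s))
	}
}
```

Running a check like this over your labels in a unit test is one way to notice a missing WithFont call without opening the PDF.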

The canonical fix for cause #1

Nine times out of ten, it is this:

package main

import (
    "log"
    "os"

    "github.com/gpdf-dev/gpdf"
    "github.com/gpdf-dev/gpdf/document"
    "github.com/gpdf-dev/gpdf/template"
)

func main() {
    font, err := os.ReadFile("NotoSansJP-Regular.ttf")
    if err != nil {
        log.Fatal(err)
    }

    doc := gpdf.NewDocument(
        gpdf.WithPageSize(gpdf.A4),
        gpdf.WithMargins(document.UniformEdges(document.Mm(20))),
        gpdf.WithFont("NotoSansJP", font),
        gpdf.WithDefaultFont("NotoSansJP", 12),
    )

    page := doc.AddPage()
    page.AutoRow(func(r *template.RowBuilder) {
        r.Col(12, func(c *template.ColBuilder) {
            c.Text("こんにちは、世界。")
        })
    })

    data, err := doc.Generate()
    if err != nil {
        log.Fatal(err)
    }
    if err := os.WriteFile("hello.pdf", data, 0o644); err != nil {
        log.Fatal(err)
    }
}

Two lines register and default the font. No CGO. No AddUTF8Font bookkeeping. If you were seeing □□□□□、□□。 before and run this program with a real NotoSansJP-Regular.ttf next to it, you get real glyphs.

Grab NotoSansJP-Regular.ttf from Google Fonts.

How to tell which cause you are hitting

Most of this is staring at three places: where you build the document, where you write the text, and the TTF file itself.

If your output is □□□ (identical rectangles), it is cause 1, 2, or 3. The font the PDF uses for that text has no glyphs for those codepoints. Open the PDF in Acrobat, go to File → Properties → Fonts, and look at which fonts the document actually uses. If the list is only Helvetica / Times / Courier, cause 1. If NotoSansJP is listed and rectangles are still there, cause 2 or 3.

If your output is 縺ゅ→縺 or ã"ã‚"ã«ã¡ã¯ (unrelated kanji or accented Latin characters rather than rectangles), it is cause 4. Your Japanese string was re-encoded somewhere before gpdf got it. Most common culprit: a CSV saved as Shift-JIS by Excel and read with os.ReadFile as if it were UTF-8, or an HTTP endpoint that didn't declare charset=utf-8. Fix the decoder, not the PDF.

Mixed output — some characters render, some are boxes — means the font has partial coverage. A font labelled "Japanese" might include hiragana and katakana but skip uncommon kanji like 鬱 or 龠. Switch to Noto Sans JP (covers JIS X 0213) or Source Han Sans JP if you hit this.
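When only some characters turn into boxes, it helps to know exactly which codepoints your text uses so you can look them up in the font. Here is a small standard-library sketch that lists them. codePoints is a hypothetical helper, not a gpdf function.

```go
package main

import (
	"fmt"
	"sort"
)

// codePoints returns the distinct non-ASCII runes of s, sorted by value
// and formatted as "U+XXXX". Hypothetical helper for illustration.
func codePoints(s string) []string {
	seen := map[rune]bool{}
	for _, r := range s {
		if r > 0x7F {
			seen[r] = true
		}
	}
	runes := make([]rune, 0, len(seen))
	for r := range seen {
		runes = append(runes, r)
	}
	sort.Slice(runes, func(i, j int) bool { return runes[i] < runes[j] })

	out := make([]string, len(runes))
	for i, r := range runes {
		out[i] = fmt.Sprintf("U+%04X", r)
	}
	return out
}

func main() {
	// Prints one entry per distinct non-ASCII rune in the label.
	fmt.Println(codePoints("憂鬱な日"))
}
```

Compare the list against the font's Unicode coverage. Any codepoint missing from the font is a box in the PDF.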

Cause 2 in detail: right font, wrong family name

This one is sneaky because the font is embedded — it just isn't used. A minimal repro:

doc := gpdf.NewDocument(
    gpdf.WithFont("NotoSansJP", font),
    // No WithDefaultFont.
)

page.AutoRow(func(r *template.RowBuilder) {
    r.Col(12, func(c *template.ColBuilder) {
        c.Text("こんにちは") // Uses the default font: Helvetica.
    })
})

Fix: add gpdf.WithDefaultFont("NotoSansJP", 12) to NewDocument, or pass template.FontFamily("NotoSansJP") on every c.Text that needs Japanese. The family name in WithFont and the one on c.Text must match exactly, including case. NotoSansJP and notosansjp are two different fonts to gpdf.
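Because the match is exact, a typo in the family name fails quietly. One way to catch it early is to keep the registered names in a slice and check every family you use against it. This is a sketch under that assumption: checkFamily is a hypothetical helper of your own, and nothing here calls into gpdf.

```go
package main

import (
	"fmt"
	"strings"
)

// checkFamily compares a family name used on a text element against the
// names passed at registration. The comparison is exact and case-sensitive,
// mirroring the rule described above. It returns "" when the name is
// registered and a hint otherwise. Hypothetical helper, not a gpdf API.
func checkFamily(registered []string, used string) string {
	for _, name := range registered {
		if name == used {
			return ""
		}
	}
	// No exact match: look for a name that differs only by case.
	for _, name := range registered {
		if strings.EqualFold(name, used) {
			return fmt.Sprintf("%q is not registered; did you mean %q?", used, name)
		}
	}
	return fmt.Sprintf("%q is not registered", used)
}

func main() {
	registered := []string{"NotoSansJP"}
	for _, used := range []string{"NotoSansJP", "notosansjp", "Arial"} {
		if hint := checkFamily(registered, used); hint != "" {
			fmt.Println(hint)
		}
	}
}
```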

Cause 3 in detail: the wrong TTF file

NotoSans-Regular.ttf and NotoSansJP-Regular.ttf are two different files. The first is a Latin font with zero CJK coverage. The second is the Japanese cut, around 17,000 glyphs. They look almost identical in a directory listing. Autocomplete will happily hand you the wrong one.

gpdf does not validate glyph coverage at registration. If you hand it bytes, it trusts you. The failure shows up as tofu at render time.

Quickest way to check:

  • macOS: Font Book → double-click the file → preview shows a glyph grid
  • Linux: otfinfo -u NotoSans-Regular.ttf dumps the Unicode coverage
  • Cross-platform: fontTools: ttx -t cmap NotoSans-Regular.ttf dumps the cmap table as XML

If U+3042 (あ) is not in the list, you have the Latin subset.
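If you would rather script that check than read the XML by eye, the sketch below scans a ttx cmap dump for a given codepoint using only the standard library. It assumes the dump contains map elements with a hexadecimal code attribute, such as code="0x3042". The embedded sample is a hand-written fragment in that shape, not real ttx output, and cmapCovers and sampleCovers are hypothetical names.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// cmapCovers scans an XML cmap dump and reports whether any <map> element
// carries the wanted codepoint in its code attribute.
func cmapCovers(r io.Reader, want rune) (bool, error) {
	dec := xml.NewDecoder(r)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return false, nil // reached the end without finding it
		}
		if err != nil {
			return false, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "map" {
			continue
		}
		for _, attr := range start.Attr {
			if attr.Name.Local != "code" {
				continue
			}
			// Base 0 lets ParseUint accept the "0x" prefix.
			code, err := strconv.ParseUint(attr.Value, 0, 32)
			if err == nil && rune(code) == want {
				return true, nil
			}
		}
	}
}

// sample imitates the structure of a cmap dump; it is illustrative only.
const sample = `<ttFont><cmap>
<cmap_format_4 platformID="3" platEncID="1" language="0">
<map code="0x41" name="A"/>
<map code="0x3042" name="uni3042"/>
</cmap_format_4>
</cmap></ttFont>`

// sampleCovers runs cmapCovers against the embedded sample.
func sampleCovers(want rune) bool {
	ok, err := cmapCovers(strings.NewReader(sample), want)
	return err == nil && ok
}

func main() {
	fmt.Println("U+3042 covered:", sampleCovers('あ'))
	fmt.Println("U+9B31 covered:", sampleCovers('鬱'))
}
```

For a real font, open the XML file that ttx wrote and pass the file handle to cmapCovers in place of the sample reader.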

Cause 4 in detail: encoding corruption

This one does not actually involve gpdf. The string handed to c.Text already had the wrong bytes. Print it before rendering:

text := loadLabelFromSomewhere()
fmt.Printf("%q\n", text) // Shows actual runes
c.Text(text)

If fmt.Printf("%q\n", text) prints something like "縺ゅ→縺" instead of the Japanese you expected, the corruption happened upstream. gpdf cannot fix it. Find the place where the UTF-8 bytes were mis-decoded.
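Two standard-library calls make that inspection less ambiguous: utf8.ValidString tells you whether the bytes are well-formed UTF-8 at all, and strconv.QuoteToASCII escapes every non-ASCII rune so your terminal's own encoding cannot mislead you. A minimal sketch, where inspect is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf8"
)

// inspect reports whether s is valid UTF-8 and returns an ASCII-only,
// Go-escaped rendering of its contents.
func inspect(s string) (valid bool, escaped string) {
	return utf8.ValidString(s), strconv.QuoteToASCII(s)
}

func main() {
	good := "あ"                        // UTF-8 bytes E3 81 82
	raw := string([]byte{0x82, 0xA0}) // "あ" as Shift-JIS bytes, never decoded

	for _, s := range []string{good, raw} {
		valid, escaped := inspect(s)
		fmt.Printf("valid UTF-8: %v, contents: %s\n", valid, escaped)
	}
}
```

An invalid result means raw bytes in another encoding were cast to string. A valid result with unexpected codepoints means the text was decoded with the wrong charset somewhere earlier.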

Common upstream culprits:

  • Reading a CSV exported from Excel (Windows Shift-JIS) with os.ReadFile and casting straight to string
  • A database column declared latin1 or utf8mb3 (not utf8mb4) already storing mojibake
  • An HTTP response missing Content-Type: application/json; charset=utf-8, and a client that guessed Latin-1
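The Latin-1 case is mechanical enough to demonstrate, and sometimes to undo, with the standard library alone. The sketch below simulates a client that read UTF-8 bytes as ISO-8859-1 and then reverses the step. Both function names are hypothetical. It handles strict ISO-8859-1 only; Windows-1252 remaps the 0x80–0x9F range and would need a real charset decoder.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// misdecodeLatin1 simulates the bug: every UTF-8 byte of s is treated as
// one ISO-8859-1 character.
func misdecodeLatin1(s string) string {
	runes := make([]rune, 0, len(s))
	for i := 0; i < len(s); i++ {
		runes = append(runes, rune(s[i]))
	}
	return string(runes)
}

// repairLatin1 reverses that step. It returns false when s contains a rune
// above U+00FF or when the recovered bytes are not valid UTF-8, which means
// s was not produced by this particular mistake.
func repairLatin1(s string) (string, bool) {
	buf := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			return "", false
		}
		buf = append(buf, byte(r))
	}
	if !utf8.Valid(buf) {
		return "", false
	}
	return string(buf), true
}

func main() {
	broken := misdecodeLatin1("こんにちは")
	fixed, ok := repairLatin1(broken)
	fmt.Printf("broken: %+q\n", broken)
	fmt.Printf("repaired: %v %s\n", ok, fixed)
}
```

Treat the repair as a diagnostic rather than a fix. If it succeeds, you know which decoder to correct upstream.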

One edge case worth naming

gpdf silently subsets. If you register NotoSansJP at document construction, render こんにちは, and then later in the same document render 鬱陶しい, the second render will work — gpdf adds the new glyphs to the subset on the fly. But if you generate a PDF, hand it to a viewer, and then try to edit the PDF in Acrobat and type a new kanji that wasn't in the original text, you will get tofu there. The subset is frozen at Generate() time. Re-run your program; don't patch the PDF.

Try gpdf

gpdf is a Go library for generating PDFs. MIT licensed, zero external dependencies, native CJK support.

go get github.com/gpdf-dev/gpdf
