Introduction

If you've been keeping up with rec.games.roguelike.angband, you might have seen a thread about unicode support for the *bands. I am the person who started that thread, LeonTorres. This page will be devoted to the idea of implementing Unicode support in ToME as a means to enhance the text-mode experience. I will outline the rationale for this and how it's being implemented. We can discuss details here, including side issues like font coverage and other possibilities that would be born as a result of using Unicode.

As it happens, I began this project because playing text mode ToME was a little bit painful; certain characters like # and ^ were re-used too often. Furthermore, my system was having problems displaying solid blocks in a GTK screen. One thing led to the next and now I have ToME working with unicode support.

After assigning some choice character glyphs, I began to actually play the game for the first time. My bug-finding character, a DarkElf Sorceress, just died in Moria, which brought my attention back to this. Let me tell you, Unicode support is vital to text mode; it enhances the game so much that I wonder why people bother with graphical tiles! I can't go back to the default mode; I will always be playing with these enhancements.

Teaser Screenshots

Rationale: Text Mode is Canon

Roguelike games were born in a text mode environment and have given birth to the humorous and surprisingly useful concept of representing objects with letters and characters. Many of us have fond memories of the vital ~ we picked up, or nightmares about the red D we ran into at dungeon level 10. And regardless of how many times people try to replace text-mode with graphical tiles, the roguelike games are always fundamentally text-mode in character, from the code itself to the player experience. If someone wants to play with pretty graphics, why, there's this fine game called Diablo II they could try out, which is essentially a roguelike with awesome visuals.

However, text mode has limitations. There is an effective limit to the number of characters available, namely the lower 128 characters in the ASCII map, of which only 96 or so are consistent across platforms. This hurts any attempt to make roguelike games richer, because characters have to be re-used for new terrain features, objects and monsters. ToME, because it is an amazingly rich and detailed roguelike game, suffers from this issue.

Currently, the only available solution for end players is to use graphical tiles, but this is overkill and defeats the textual character of the game. Another technique uses special font symbols to represent objects; however, these fonts are nothing but graphical tile emulations for text-mode. Instead of a = for a ring, you might see what looks like a pixmap of a ring. Again, this defeats the textual nature of the game.

There is a far superior way to fix the problem, and that is simply to adopt unicode. Unicode is a standard, which means that the drawing blocks we need plus any other special symbols we would like to use will be consistent across fonts and platforms. To achieve the graphics in the screenshots, all that was needed was to specify the unicode ID of the appropriate symbol in a pref file (I created a new one called font-uni.prf for this purpose). The card suits have a certain set of IDs, as do the drawing blocks. Math symbols like infinity are in one block, and even the Tolkien languages have their own special place.
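As a rough illustration of what those IDs look like, here is a minimal C sketch that maps a code point to the Unicode block it belongs to. The block ranges are the ones published in the Unicode standard. The names uni_block, uni_blocks and uni_block_name() are made up for this example and are not part of the ToME source, and u16b is declared locally as a stand-in for Angband's own typedef.

```c
#include <stddef.h>

typedef unsigned short u16b;   /* assumption: local stand-in for Angband's u16b */

/* One named range of code points (hypothetical helper type). */
typedef struct
{
    u16b first;
    u16b last;
    const char *name;
} uni_block;

/* A few blocks that are useful for text-mode display. */
static const uni_block uni_blocks[] =
{
    { 0x0000, 0x007F, "Basic Latin" },            /* plain ASCII */
    { 0x2200, 0x22FF, "Mathematical Operators" }, /* U+221E is infinity */
    { 0x2500, 0x257F, "Box Drawing" },            /* line-drawing pieces */
    { 0x2580, 0x259F, "Block Elements" },         /* U+2588 is the solid block */
    { 0x2600, 0x26FF, "Miscellaneous Symbols" }   /* U+2660..U+2667 are card suits */
};

/* Return the block name for a code point, or "(other)" if it is not listed. */
const char *uni_block_name(u16b cp)
{
    size_t i;

    for (i = 0; i < sizeof(uni_blocks) / sizeof(uni_blocks[0]); i++)
    {
        if (cp >= uni_blocks[i].first && cp <= uni_blocks[i].last)
            return uni_blocks[i].name;
    }

    return "(other)";
}
```

For instance, uni_block_name(0x221E) returns "Mathematical Operators" (the infinity sign) and uni_block_name(0x2663) returns "Miscellaneous Symbols" (the club suit).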

Implementation

This section is for those familiar with the Angband code; everyone else, please feel free to skip it.

My current hack is very simple and elegant. The code represents characters to render with x_char, which is of type char. To make this compatible with unicode, we just need to bump up the size of x_char to 16 bit unsigned. This resembles the standard type wchar_t, which is used to represent wide characters on most systems (although its width varies from platform to platform). Because Angband likes 4 character type names, we create a type named wchr:

typedef u16b wchr;

Once this is done, all that remains is to re-type all references to x_char as a wchr instead of a char. We can leave d_char alone. There is also a bit of tweaking for loading and saving files, and the reading of pref files.

The bulk of the changes are expected to be in z-term.c. As it happens, the graphical tile support has already separated the model from the view for the concern of rendering x_char, which means no extra work has to be done to make a special view for unicode. In non-jargon speak, this means no changes to the structure of functions in z-term.c. We just need to change most instances of char to wchr.

Finally, we need to update each main-xxx.c that wants to implement unicode. If a system doesn't want to, it merely needs to accept wchr in place of char and pretend it's a char. How each system renders the unicode is up to the maintainers of that system. I have created a GTK2 implementation that works well. Since I am not a Windows programmer, I have no idea how to do this for the most popular ToME setup. However, after looking at main-win.c and the docs at MSDN, it appears one just needs to tweak one particular text rendering function that just happens to like u16b formatted unicode. For systems that need multibyte strings (UTF-8), I have added a wchr_ary_to_utf8() function in util.c:

/*
 * Converts an array of wchr values to a UTF-8 encoded string. Arguments
 * are the wchr array, the length of the array, and a buffer to write to.
 * The buffer must be at least size (3*n)+1. Returns the number of bytes
 * written, which is effectively the length of the string. The string is
 * guaranteed to be null-terminated.
 *
 */
int wchr_ary_to_utf8(const wchr *a, int n, char *s)
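The patch itself is not reproduced on this page, so the following is only a sketch of how a function with that signature could be written, not the actual code from util.c. It assumes each wchr holds a single 16-bit code point (surrogate pairs are not handled), which is why three bytes per element plus the terminator is always enough. The u16b typedef is a local stand-in for the one in Angband's headers.

```c
typedef unsigned short u16b;   /* assumption: local stand-in for Angband's u16b */
typedef u16b wchr;

/*
 * Sketch: encode n 16-bit code points from a[] as UTF-8 into s[].
 * s must have room for (3*n)+1 bytes. Returns the number of bytes
 * written, not counting the terminating null.
 */
int wchr_ary_to_utf8(const wchr *a, int n, char *s)
{
    int i;
    int len = 0;

    for (i = 0; i < n; i++)
    {
        wchr c = a[i];

        if (c < 0x80)
        {
            /* U+0000..U+007F: one byte, identical to ASCII */
            s[len++] = (char)c;
        }
        else if (c < 0x800)
        {
            /* U+0080..U+07FF: two bytes, 110xxxxx 10xxxxxx */
            s[len++] = (char)(0xC0 | (c >> 6));
            s[len++] = (char)(0x80 | (c & 0x3F));
        }
        else
        {
            /* U+0800..U+FFFF: three bytes, 1110xxxx 10xxxxxx 10xxxxxx */
            s[len++] = (char)(0xE0 | (c >> 12));
            s[len++] = (char)(0x80 | ((c >> 6) & 0x3F));
            s[len++] = (char)(0x80 | (c & 0x3F));
        }
    }

    s[len] = '\0';

    return len;
}
```

Converting the three values 0x0041, 0x00B5 and 0x2588 ('A', the micro sign and the solid block) with this sketch writes six bytes plus the terminator.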

User Interface Breakages

Some things will inevitably break if unicode is adopted. Surprisingly, recall still works because it was designed to work with graphical tiles. In fact, recall should stay the same. We shouldn't change monsters to be strange symbols, like Greek or Cyrillic letters. However, we could accent the uniques.

The biggest break will be in the "Interact with visuals" screen. Currently, it cycles within 128 symbols of the current one. That means if you want to change the solid block, it will cycle around the symbols but it can't reach beyond that block. For now, the easiest change will be to let users pick only from the default ASCII map. To allow unicode, we must either create a new system or leave that up to third party editing tools.

Font Issues

The most important issue to resolve if unicode is to be adopted is the poor coverage of useful symbols in monospace unicode fonts. Many monospace fonts that support unicode happen to lack drawing symbols and the like, which raises the question of why they're called unicode fonts in the first place. The best way to fix this problem is to find out what the most popular monospace fonts are for each system and create an angband-specific font from them. This also allows us to add Cirth and Quenya characters for extra eye candy. However, this is a lot of work. The assistance of an artist-type who knows how to draw with Photoshop, Adobe Illustrator, or any scalable vector graphics application would be invaluable here.

Patches

Here are the experimental patches for people to try out. Note that only gtk+ 2.0 is supported. I put a note in main-win.c if you want to try to get it working on Windows, which I think should be easy. (Don't take my word for it though!)

Notes

  1. Make sure the font can be selected in a gtk font-selection dialog, such as in gnome-terminal, gimp or maybe Firefox (if it uses gtk).
  2. You could also install GTK for Windows; I haven't tried that.
  3. Grab a fresh copy of the ToME 231 source.

$ cp unitome-0.1-prf.patch unitome-0.1-src.patch tome-231-src
$ cd tome-231-src
$ patch -p1 < unitome-0.1-prf.patch
$ patch -p1 < unitome-0.1-src.patch
$ cd src
$ make -f makefile.gtk
$ cp tome ..
$ cd ..
$ ./tome

If there are no hiccups on the way, you should see the intro screen. When you begin in Bree, you should see the new tiles.

Discussion

Please discuss this topic here. Also, feel free to elaborate on the possibilities of unicode, such as using Cirth font to label unknown scrolls as in the screenshot.

LeonTorres: Where should I upload testing patches, once they're ready? There's also the issue of how to 'enable' unicode. With a switch? By default? Also, regarding accenting the uniques, I have done this for Boldor and some of the Orcs and it's pretty cute. I'll grab a screenshot next time I run into them.

NerdanelVampire: I like this! I don't think all the difficult things are necessarily necessary to gain a big benefit to clarity. I mainly just want to be able to distinguish runes from tomes and trapped lava and water from untrapped. Other clarifying symbol changes come free with that, and Cirth and Tengwar are basically just eye-candy.

About the uniques, since you can't put accents on arbitrary letters, that doesn't work across the spectrum, mainly for the vowels only.

LeonTorres: Yeah, I wasn't intending this to change everything, just to enhance clarity. That's the big problem with other visual efforts: They go too far and create something that can't rightfully be called text-mode. But it's surprising what a little enhancement does to the game experience, so I've been changing things and I can't seem to stop! Another benefit is that ToME can be multilingual. That's going to be a very hard thing to implement and probably needs to be done in vanilla angband first.

NeilStevens: In general, I support the idea of expanding beyond some dumb decades-old, 7-bit character set in ToME. We can't even properly display the names of major locations (like Khazad-dûm) in ToME right now.

Specifically, I don't really get excited about trying to make a text interface more graphical. You want graphics? Use graphics!

LeonTorres: I guess I'll add a Patch section once the patch is ready. Perhaps sometime this weekend. Unfortunately, it will only be for GTK2 (I also fixed main-gtk2.c). I will also make available the font I used, which I created myself. It has some rather ugly self-created Cirth and Quenya in the agreed upon blocks, but those won't be used in the first patchset. What's the default Windows system font used by the Windows version?

TheFalcon: I think the default is 8x13, if that's what you mean. I really do like this idea, and the screen shots look really great. I just wish I understood all the technical stuff. :sigh:

LeonTorres: Important Update: Well, after sleeping on this a bit and looking carefully at the code, it looks like unicode support has to be done thoroughly or not at all. It turns out mixing unicode with old fashioned C-strings results in various display bugs. For instance, if runes are represented as µ then it won't be rendered properly in the inventory or in the message screen on top because of how those strings are pieced together. Every line of string handling code will need to be updated. This is bad news for this project, because we can't just drop the status quo and abandon the legacy just for unicode.

At the moment, I'll continue with this project and overhaul all of ToME for full unicode compliance and keep a patch set for interested people. If there's interest in making ToME 3.0 unicode compliant, I'll be willing to help. In the meanwhile, if anyone really wants to use those drawing symbols in their game right away, we can add those symbols to the fonts that come with ToME. I won't be doing this because I'm happy with what I have so far, bugs and all, but I encourage anyone else to seize the initiative and update the fonts. That's probably the easiest way to enhance text mode at this point.

NeilStevens: OK, don't do ToME 2. There's never, ever going to be a ToME 2.4 with a new feature such as this. At this point, I suggest you join the mailing list and coordinate ToME 3 work there. I really hope you do do this. I'm sick of the ASCII obsession some people have, and would love ToME to get far, far away from it, heh.

LeonTorres: Great, I'll join the list and focus on ToME 3. Sacred cows are going to have to die hard. :) I want to play ToME a little more though to get familiar with the game. I'm already spoiled from having to debug and peek into code and _info files. :(

DarkGod: Hey look who is back! Me ! ;> Now about the matter that matters, I'd love unicode support, if only to properly display names :) The example that really won me over is the one with the scroll title in a weird incomprehensible alphabet, that's neat! As Neil said, don't develop for 22x, it ain't getting any more major releases, use 300 instead. I'll be happy to help you integrate them.

LeonTorres: Hi folks, I've added another screenshot showing accented characters in names. That's to prove the groundwork is done, the rest is piddling display bug hunting and alignment issues. Later on I'll post a patch for those wanting to try this with ToME 2.3.1, perhaps this weekend. Gotta get to work. _;

NeilStevens: Ohhh... that screenshot... it's enough to make me wonder if we can ditch OpenBSD, and anything else that gets in our way, heh.

DarkGod: I don't think we should ditch unicode support... Come on, gnome, kde and all do unicode and they work on openbsd ;)

NeilStevens: Who are you, and where were you when the same argument could have supported the all-SDL, all-unicode ToME? heh

Seriously, though, wchar_t doesn't even exist on OpenBSD, so it won't compile in order to run with GNOME. Plus, anyone who wants to run it without X on FreeBSD will be locked out, too, as far as I can tell.

LeonTorres: I don't know how many times I have to say this, but wchar_t isn't necessary. It was brought up as an alternative to utf-8 encoding when dealing with the problem of how to declare wide strings in the code. We do not need to use wchar_t, and if adopted, we are not going to use it. Anyway, I'm now familiar enough with the issues to know what the solution has to be, and it's going to be utf-8 for the most part which is then converted into u16b (not wchar_t!) UCS-2 unicode so that the terminal structure can handle it (look carefully at z-term.h). Then the Term_text functions can decide what to do with it. At this point I'll have to dismiss any concerns that prove not to be familiar with the term code as useless.
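For readers following the thread, the UTF-8 to u16b step described above could look roughly like the sketch below. No such function is shown anywhere on this page: the name utf8_to_wchr_ary() and its signature are hypothetical, and u16b is again a local stand-in for Angband's typedef. The sketch only covers code points up to U+FFFF, since that is all a 16-bit cell can hold; every byte it cannot interpret (including each byte of a four-byte sequence) is replaced with '?', and overlong encodings are not rejected.

```c
typedef unsigned short u16b;   /* assumption: local stand-in for Angband's u16b */
typedef u16b wchr;

/*
 * Sketch (hypothetical helper): decode the null-terminated UTF-8 string s
 * into at most max 16-bit code points in a[]. Returns the number of
 * elements written.
 */
int utf8_to_wchr_ary(const char *s, wchr *a, int max)
{
    const unsigned char *p = (const unsigned char *)s;
    int n = 0;

    while (*p && n < max)
    {
        if (p[0] < 0x80)
        {
            /* One byte: plain ASCII */
            a[n++] = p[0];
            p += 1;
        }
        else if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80)
        {
            /* Two bytes: 110xxxxx 10xxxxxx */
            a[n++] = (wchr)(((p[0] & 0x1F) << 6) | (p[1] & 0x3F));
            p += 2;
        }
        else if ((p[0] & 0xF0) == 0xE0 &&
                 (p[1] & 0xC0) == 0x80 && (p[2] & 0xC0) == 0x80)
        {
            /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
            a[n++] = (wchr)(((p[0] & 0x0F) << 12) |
                            ((p[1] & 0x3F) << 6) | (p[2] & 0x3F));
            p += 3;
        }
        else
        {
            /* Anything else does not fit in 16 bits or is malformed */
            a[n++] = '?';
            p += 1;
        }
    }

    return n;
}
```

Because the continuation-byte checks run left to right and stop at the first failure, a string that ends in the middle of a sequence is never read past its terminating null.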

I also ran into an issue with how text is drawn when backporting SDL to Angband. It's too slow. Notice how the screen flickers when you scroll a page line by line? Watching a manathrust make its way across an Enlightened level under GTK, ncurses and SDL makes my 800 MHz Duron cry. You guys are way beyond being able to support ancient hardware with ToME. Backward compatibility? Hah! You're better off supporting popular systems at this rate; it's the Ubuntu vs. Debian debate all over again. For us who are never upgrading, I'm thinking about making the TERM_XTRA_FRESH function useful again, so we can buffer changes before drawing them like regular terminals do. Angband z-term seems to have deviated from terminal-like buffering techniques, favoring convenience over speed. I tested this by removing the call to draw the term in Term_text_sdl and moving it to Term_xtra. It's slow. I'm still working on the project, but my focus is now on NPP Angband and this terminal issue.

NeilStevens: OK fine, if we use utf8, we still abandon virtually all the current non-X uses of ToME. I'm fine with that, personally, because if we do that, my SDL-only proposal would probably make us more portable. And if we go all-SDL, the overall health of ToME is improved for the reasons I listed back then.

LeonTorres: The linux console and all the random terminals I have show UTF8 fine under ncurses (try unicode_start under linux console). What you're describing is a font issue, which is not a real feature-killing problem. For terminals that don't want higher characters, I've already described a trivial utf-8 to ASCII converter. Non-X uses don't even matter anymore, speed is getting to be an issue even under ncurses. Try it yourself, reveal a whole Barrow level and shoot a manathrust vertically with the delay factor set to something reasonable. At this point, I'll let the patches speak for themselves, I'm starting to repeat myself.

NeilStevens: My text console is fast. Not everyone is using a Linux graphical console.

And "just a font issue" is still an issue if the fonts just aren't there, and they're not. We can't expect people to change their system console font just for ToME.

KiyoshiAman: So? That doesn't mean we can't provide a console font for those who want special characters. It's not like we can't also provide a toggle in the options dialog for the user [default: yes] to use UTF-8 or ASCII.

NeilStevens: Making this an option is pointless. If we can't depend on using the full unicode character set to distinguish or name things, we can't really use it at all. For example: are we really supposed to keep two versions of every terrain feature, name, and flavor? If Tengwar were in unicode yet, that's an example of something you can't easily "convert" into ASCII.

KiyoshiAman: If we use XML, we can provide an 'alternate' name for anything that uses non-ASCII characters. Of course, you could do this in lua, too, but XML is perfectly suited for this sort of thing anyhow [it's already UTF-8, and we only have to output from .xml to .raw once, anyhow; much nicer than forcing module writers to learn lua if they're not changing the way the T-Engine works, in my opinion.]

LeonTorres: Hey guys, I'm no longer doing unicode support for any angband. Instead, I've started my own variant. The goal of my variant is to show proof of concept in an independent branch, so others can test my ideas out and try them for flavor.

Among my ideas is a generation system that uses fractal brownian motion instead of the limited midpoint displacement method used in generate.c. I have a working fractal generator using Perlin noise that generates awesome ToME terrain maps. You can see a screenshot here

I also have a different system of defining "locations" such as dungeons and wildernesses (yes, plural). Since I'm taking ideas from you guys, you're welcome to take them from my project. It will be called quenta and I'll make an announcement when the dust settles. :-)

NeilStevens: My advice is not to start with an Angband, unless your primary goal is to get your code absorbed by 'variants' which don't actually vary much from the original. You're better off if you avoid the peculiarities of Angband's license and crufty, odd, weirdly-optimized, old C code.

LeonTorres: Yes, that's a good point. Things are rapidly evolving on my end; my _new_ intent is to form a superset of Angband which has traditional Angband as a flavor that can be defined in text files and modules. I will need to begin with Angband as a model and guide. To prevent license issues, I am developing orthogonally to the legacy by adding new features in separate files (e.g., init-xml to replace init1 and 2). New code will be GPL'd. Think of this in terms of AD&D -> D&D 3ed. Other bands are welcome to incorporate features, but I fear it's going to stop being an angband very shortly (except for library routines like fractals). Of course, this is all very preliminary.

NeilStevens: It should be interesting if your game becomes a competitor to T-Engine. Have fun.

IdeaArchive/UnicodeSupport (last edited 2005-04-21 04:15:46 by NeilStevens)