rlvm 0.07 released!

15 Nov 2009
Tags: rlvm

rlvm 0.7 is a major update. rlvm now supports AIR, along with several other non-KEY games (see STATUS.TXT). rlvm is now at the point where random older RealLive games may work. It can now read KOE voice archives used by older games, as well as voice patches that are loose Ogg Vorbis files (such as the one for KANON). In addition, there’s been a large number of small, minor graphical fixes, including faces in text boxes, text shadow in text boxes, visual appearance of #SELBTNs (such as in Fuuko’s route), and various shading effects.

Because of problems in previous versions of the boost::serialization library, preferences such as text box color and text speed will not be read with rlvm 0.7. Saved games were not subject to the corruption bugs and should continue to work.

Comment

The Hardest Bug

07 Nov 2009
Tags: rlvm

This article is long. It covers a bug that took weeks to understand what the problem was and is easily the hardest bug I’ve ever had to track down.

Background: The stage

rlvm is my open source reimplementation of RealLive, a commercial Japanese game engine and scripting system which is used for some fairly niche games. RealLive is a fast Turing-complete bytecode interpreter with a set of standard functions for text, graphics, sound, music and animation. So a programmer would write the game in some unknown scripting language and the game would be compiled into a SEEN.TXT, a binary archive format that contains compiled bytecode.

Because several popular Japanese anime series have been written based off of said games, there’s been interest in the West about playing them. This has lead to a man named named Haeleth documenting the inner workings of RealLive and he released an assembler/disassembler for the bytecode format called rldev, which rlvm would have been impossible without. He’s also made a translation tool named rlBabel which I’ll get to in a bit.

CLANNAD is one such game. rlvm could play the Japanese version of CLANNAD, but now it was being translated in a wiki. Scripts were disassembled with rldev, which separated the code into a disassembled bytecode file and a string resource file. The string resource files are uploaded to the wiki and translated. The translated string resource files are fed back to rldev to compile a version translated into English.

These games are heavily text based (the genre is called visual novels for a reason) and the bytecode format reflects this. Any quoted freeform text is allowed and is simply printed to screen. There’s an opcode to print out a piece of memory (strout() and intout()) for the uncommon case where a piece of text was built up programmatically in a buffer, but this is rarely used in real games. The bytecode therefore needs to be tokenized into text, commands, debugging symbols, and the raw ASTs which are how math expressions are stored.

The problem

I announced being able to play translated CLANNAD files in version 0.06 of rlvm and everything seemed to work. The translation project then issued a new file with debugging symbols stripped, which shaved several megabytes off the SEEN.TXT file. The problem: It didn’t work in rlvm. The game appeared to work fine as usual, except that it never displayed any text. From the same source code, a SEEN.TXT file compiled with debugging symbols worked, and the stripped SEEN.TXT file didn’t. The game appeared to work, but no text was displayed on screen.

No text in the Text box

rlBabel: How these games are translated

Haeleth is responsible for pretty much all the tools to translate a game from Japanese into some other language. Japanese text is laid out monospace, contains no spaces between words and, with some exceptions, can be line-broken anywhere, including in the middle of a word. To get around this, Haeleth wrote an additional library: rlBabel, meant to add Western linebreaking rules to a Japanese game engine and added support for rlBabel to his compiler suite.

When you compile a game with rldev and rlBabel turned on, instead of inserting:

"Text to output"

which goes through the raw, static text interface, it does the equivalent of the following:

strout("")   {- Manually call the text system to display a window -}
strcpy(buffer, "Text to output")
Call a DLL with buffer which appends it to an internal buffer
gosub displayText

...

displayText:   {- One copy per translation unit -}      
  loop forever:
    retVal = Call a DLL asking about what the bytecode should do.
    if retVal is New Line
      <cause a new line to be printed>
    else if retVal is Print Character
      <prints the current character>
      <manually moves the text insertion point to where the DLL wants it>
    else if retVal is End of String
      gosub_return

So half of rlBabel is some headers that get compiled into your SEEN.TXT, and the other half is a big DLL which does font metrics, text tokenization and feeds the data back to the bytecode side. rlvm has the DLL code compiled into it, slightly modified. A previously translated game can be played start to finish in rlvm.

If you think this is ridiculously over-engineered, attempts to do it all in bytecode are worse, not to mention less accurate. Also, it’s almost unusably slow.

Debugging symbols obscure the error

After weeks of working on debugging this, with quite a large number of false starts and dead ends, I figured several things out:

  • My in memory representation of the bytecode was equivalent between the two input files, sans line markers, which only exist to say “Line XXX in the source starts here.” Except for line markers, the files were being parsed identically.
  • My implementation of rlBabel wasn’t to blame. The DLL interface was being called properly, but was mysteriously always passed the empty string as its input buffer. The bytecode would ask the DLL what to do, and my code would sensibly return getcEndOfString.
  • I then printed out each opcode as it was executed. strcpy() was never being called in the SEEN.TXT file that had debugging symbols stripped.

The behaviour in game made sense. A text window would be displayed by the manual printing of an empty string, and nothing else was printed because rlBabel’s input buffer was empty. The rest of the game appeared to work because the only thing broken was something to do with getting data into rlBabel’s input buffer.

In my design, the implementation of each bytecode was responsible for incrementing the instruction pointer. The bug: The instruction pointer was being advanced by the function that caused text to be printed to the screen; it was not incremented after a textout element was finished. This appeared to work because the textout implementation just called that function.

But strout() also called this same low level function. Then strout() incremented the instruction pointer again because it was a function element. The instruction pointer was advanced twice and skipped the call to strcpy().

Wait, but why did things work with debug symbols? Let’s take a look at the sequence of bytecode elements that one text line and one pause command compile to:

# Source With Debug Without Debug
1 "Line of Text" Line marker
2 Kidoku marker Kidoku marker
3 strout strout
4 Line marker
3 strcpy strcpy
4 Line marker
5 Goto Goto
6 Goto Goto
7 Line marker
8 Goto Goto
9 Goto Goto
10 pause Line marker
11 pause pause

Ignore the gotos; they are an implementation detail of rlBabel. In the debug version, when strout() finishes, the instruction pointer is incremented by two and is placed on strcpy(). The skipped bytecode is a line marker which doesn’t matter. In the version stripped of line markers, when strout() finishes, it jumps over the strcpy() that loads text into rlBabel’s buffer and jumps straight into rlBabel’s implementation.

The fix to this was a two line patch.

Fixed!

Life Counter and the Android Market

29 Aug 2009
Tags: Android and LifeCounter

Back in March, I tried writing something on the Google Android platform, and released improvements over the next month. LifeCounter, a small app to keep track of the history in a Magic the Gathering game, has now been released for almost six months. As I don’t work on LifeCounter anymore, this is a postmortem.

Android Market’s user comments are equivalent to those on Youtube.

There was begging for a YuGiOh version. There were two star comments saying that this was great work. There were a lot of comments that were just incoherent. Here are two of my favorite worst comments on the free version of Life Counter:

This is AWESOME! Too bad none of us will ever get a girlfriend :( (Five stars)

would have been useful 10 years ago(and the phone). No more stealing back packs for me. (Five stars)

What!?

People will not pay for polish.

If there’s a free app on the Android Market that does anything remotely similar to what you’re planning to write, I urge you to simply not bother. At the time of initial release, there were two apps that tracked life in Magic. One was simply a background image with large, aliased purple text on it and no obvious way to change the score (you had to tap above and below the number). The other was two numbers (which were different sizes), again with no obvious way to use it (you had to slide up and down on the number).

I arrogantly believed that this would be easy. I just had to build something that was user friendly and then polish it until it shined. Add a log of all changes to the score. Support 2-4 players. Look pretty with different colors, rounded corners, et cetera. I spent significant time on small details and subtle behaviours.

Immediately after I released it as a paid product, I got an email accusing me of swindling people off because it didn’t do anything substantially different. I pointed out all the additional features and UI polish, to no avail. I got a bunch of email in the same vein over LifeCounter’s lifetime.

Over six months, I’ve only sold 104 copies of LifeCounter. It has been a commercial failure. People simply don’t pay for quality, at least in this market.

One bad review can sink your app.

Over the lifetime of the paid app, I had 15 five star rating and 1 four star rating. And then I got a 1 star review on August 1st. Here’s what happend to sales:

Graph of Sales

The review retreads the demonstrably false claim that LifeCounter has no features above its free competition. But here’s the thing: It doesn’t matter if it’s false. It’s now the first user comment people see, and it’s obviously affected sales per above. It’s also likely to stay at the top for a long time. Fewer new sales means a smaller pool of people to actually write reviews that would push this lie off the top.

What should you do?

As much as it pains me to say this, as I love the Android platform, I wouldn’t recommend people invest the time to write commercial software for it. I may write more utilities for Android (or port rlvm once the NDK is more complete), but I will probably never do anything expecting to profit from my software on this platform.