Search This Blog

Monday, December 28, 2009

& Modern Things Part 3

So yeah, the Xbox 360 is pretty uninteresting, apart from the memory limitation. Fortunately, the third and final thing you can program with XNA is rather interesting: the Zune. The Zune is Microsoft's answer to the iPod: a portable music player. However, the Zune is also capable of running third-party programs through XNA. This makes it rather interesting, as it gives you the best taste of what it was like to program a console in the past (back before console CPUs were in the GHz and memory was in the hundreds of megs), as it has some of the same programming considerations (or at least moreso than the 360 or PC).

There are a few different models of Zune with different capabilities, which can be separated into what I'll call the 'Zune' (the first several models, which are pretty much identical for our purposes), and the recent Zune HD.



The Zune series is shown above. Members of the series vary in exact design, but all have a few common features. All models have a low-power ARM CPU, 64 megs of RAM, and has the ability to do basic bitmap graphics. They contain a 240x320 LCD screen in 3:4 aspect ratio (the screen size varies from 1.8 to 3.2 inches, diagonal), running at 30 FPS (some high-end models also support 60 FPS). For input, each has a circular pad as well as one button to each side of the pad.

These features give the Zune a unique set of programming considerations. Obviously, the CPU is drastically less powerful than on the PC or Xbox 360. Games have a quota of 16 megs of RAM usage, in which both code and data must fit (although as this is only 1/4 of the total RAM of the Zune, garbage collection overhead is much less of a problem, and you should be able to use all 16 megs without seeing too much garbage collection slowdown). Graphics are limited to 2D sprite-based operations, and screen space is very limited (the Zune actually has a quite impressive display resolution for its screen size, but it's still tiny). Of course, the fact that this is a portable system means that battery life is also an issue. But perhaps the most tricky issue is the input system.

The Zune is designed to be used standing up, as shown in the above picture. In this orientation, the screen is taller than it is wide. Control can use both thumbs, with one thumb on one button, and the other thumb shared between the circular pad and the second button. However, as games typically are designed around a screen that is wider than it is tall, this configuration comes off as somewhat unnatural, though some games are more appropriate for this than others (e.g. a top-down shooter would have no problems, here).

Alternately, the Zune may be turned on its side, yielding the standard 4:3 aspect ratio used on everything prior to high-definition televisions and wide-screen monitors. The chief problem with this configuration, then, becomes control. Due to the button configuration of the Zune, this configuration allows only one thumb for input, shared between the circular pad and the two buttons. This has the effect of drastically reducing the potential complexity of gameplay input, as it's far more cumbersome to switch between buttons than to simply have a second thumb take care of one of them.



The Zune HD is shown above. You could really call this the Zune 2 (or, if you're Nintendo, the Super Zune), as it's almost entirely different from the previously mentioned Zune series. It's powered by an nVidia Tegra (2600), a system on a chip which both acts as an (ARM) CPU and a GPU capable of 3D graphics (although thus far XNA does not support 3D on the Zune HD). As far as I know, the amount of memory on the HD has not been published, though it's safe to say it has at least 64 megs RAM. While the screen is only 3.3 inches (about the same as the higher-end Zunes), the screen is now 480x272 (when laid sideways) in the HDTV 16:9 aspect ratio.

Perhaps most significant, however, and the reason the Zune is much more interesting than the PC and Xbox 360, is the change in input system. As can be seen in the image, the Zune HD has no buttons or other controls. Instead, the HD features two new input methods: a multi-touch display and an accelerometer.

For those not familiar with it, multi-touch displays are a type of touch-screen, which take as input a point on the screen and the pressure exerted on the screen at that point. Multi-touch takes this to the next level by allowing multiple points of contact, including tracking of movement of each point. This allows for very flexible and powerful input, permitting such interfaces as "point and click" via pressing the screen at a point, dragging and dropping, and things such as gestures. This even allows interfaces such as the one seen in Minority Report (and other sci-fi-ish depictions), where multiple points of contact with the touch screen can be used to grab and move, rotate, or resize items on screen (this type of interface exists in the Microsoft Surface and other multi-touch devices; also, be sure to check out the Dungeons and Dragons on Surface demonstration).

You might not be able to guess what an accelerometer is, based only on the name: obviously it measures acceleration, but unless you're into physics, you probably wouldn't make the connection with gravity. Technically, holding an object in air (as opposed to letting it free-fall) requires an upward force to be applied to the object, and a force produces acceleration. An accelerometer measures this acceleration against the force of gravity; in other words, an accelerometer measures which direction is up, based on the orientation of the device. Of course, it also detects other types of acceleration, such as movement in space, as well. Between the possibilities, this allows a number of interesting (and impossible, with traditional input methods) input systems, such a basing the in-game camera on position and/or orientation of the Zune, actions triggered by bumping or shaking, etc.

Microsoft makes the multi-touch screen and accelerometer available in XNA through the XNA Zune Extensions addon. Once you've downloaded that, search the help for Zune HD Input Overview for general information about support. After that, zunezune.org has posted a helpful simple "game" that demonstrates multi-touch and accelerometer: the code is available here, and a video of the program in action is below. Finally, Platformer: Adding Touch Support (also in Extensions help) is a tutorial that shows you how to add multi-touch and accelerometer support to the platformer starter kit.

Wednesday, December 23, 2009

& Modern Things Part 2

So, after looking a bit at several (very) old game systems, how about we look at a couple of the new ones. Though in reality, the XBox 360 is actually pretty boring; which is to say that it's more or less a modern computer.

The 360's Xenon CPU is a pretty typical incarnation of the PowerPC line used in older Macs, and is related to the Playstation 3's Cell CPU (although the Cell has a very unusual architecture resembling a cluster on a chip, and differs quite a bit from other PowerPC chips - or most CPUs, for that matter). The PowerPC line, the low-end portion of the larger Power line, are RISC processors with simple instructions limited primarily to operations on registers, in contrast to the CISC x86 and the CPUs of the 2600, NES, and SNES, which tend to use many instructions that operate on memory data.

The Xenon is composed of three symmetric 64-bit cores with a shared L2 cache. Each core executes two threads simultaneously, and contains (among the expected things) a SIMD vector unit for significant math performance (although if you're using XNA you won't have access to the vector unit). The only remotely noteworthy part of the CPU is the fact that unlike some other PowerPC varieties (and all Intel CPUs for quite a while, now), execution is in-order, meaning that it must pause execution of a thread when a slow I/O operation is required; the assumption then is that the number of threads executing at a time (2 per core) will reduce the impact of individual stalls.

The Xenos GPU is also fairly uninteresting. It's a custom ATI 3D GPU designed specifically for the XBox 360 and optimized for console games, though a lot of it resembles common PC GPUs of the same vintage. It supports the DirectX 9 Shader Model 3, although it contains some custom extensions that provide some of the features new in DirectX 10 Shader Model 4 (though the details may differ), such as the unified shader architecture. It also has dedicated hardware to provide 4x anti-aliasing for "free" (as opposed to the performance penalty that normally occurs with anti-aliasing) and optimized z-only rendering. Finally, after all the fancy rendering is done, the 360 supports several (television) output resolutions from 640x480 (standard TV) to 1920x1080 (highest HD widescreen).

But the most noteworthy parts are the ones we haven't seen before (at least in this series of posts).

Some versions of the 360 come with a hard drive of varying sizes. In addition to saved games (which can also be stored on memory cards or internal flash memory, on models without the hard drive), this drive is used for game caching (hard drives are faster than DVDs) and optional downloadable content. It's also used to store the XBox compatibility software that allows the 360 to emulate games for the original XBox.

Finally, all XBox 360s come with the ability to connect to the internet (wired ethernet ports are standard, with a wireless addon), especially for the purpose of connecting to the XBox Live service. Live is a social platform that covers a wide range of services (although some require a paid subscription to Live), including friends lists and communication; multiplayer matchmaking and play; game achievements that allow your friends to see your gaming accomplishments; voice and video chat with friends; downloadable bonus game content; the Live Marketplace, where you can purchase and download addons and entire games (including XNA games) and other content (e.g. movies); and several major third-party web services such as Netflix streaming movies and Last.fm steaming music.

So, that's the hardware and the platform. But what's it like to program? Well, thanks to the surreal veil of secrecy surrounding consoles in general, that much isn't really common knowledge, and I'm not entirely sure. Development is in C or C++, probably with the Intel C++ compiler. The 360 uses a custom operating system (so they say) that supports at least some approximation of the Windows API and DirectX; while the OS does not use the same driver system Windows normally uses, the CPU is probably the only piece of hardware in the system programmers are supposed to directly access, with other hardware abstracted through the Windows or DirectX APIs or some such. Given this, if Microsoft is smart, they made it as similar to programming on Windows as possible, so that developers can transition from the PC to the 360 with minimal education. Though one thing that will definitely have to differ is the compiler intrinsics, for things such as vector math (perhaps the same intrinsics that were used on the PowerPC Mac) and multithreading-related things (e.g. memory barriers; remember those?).

The situation is different if you're using XNA. In this case programming the 360 is almost identical to programming the PC via XNA. Programming is done in C#, and run on the .NET compact framework. The runtime libraries consist of a subset of the .NET class library and the additional features supplied by the XNA class library. No hardware is directly accessible; the CPU and memory are hidden behind the .NET framework, and graphics and sound hardware can only be accessed through the XNA class library (as far as I know you can't directly access DirectX through XNA, at least on the 360).

But regardless of how you program it, perhaps the most noteworthy difference between programming a PC and the 360 is the memory limitation. While on PCs it's always been cheapest to just make your users buy more memory, on consoles it's frequently the case that you have to actually spend development manpower shaving off memory usage to make your game fit in the console's memory (at least for large, complex games). The 360 has 512 megs of memory. While this may not sound so bad at first, you have to realize that this is common memory, shared by both the CPU and the graphics system (though at least the OS probably takes up drastically less memory on the 360 than on the PC). Compared to PC games that typically take north of a gig of main memory and 512 megs video memory, 512 megs starts to look pretty small (for the curious, the Playstation 3 is comparable: 256 megs main memory and 256 megs graphics memory).

This is especially true in the case of XNA. As stated previously, thanks to garbage collection, you can only use 30-40% of the total system memory before you start seeing a substantial decrease in available processing power due to garbage collection; on the 360 this comes out to something like 64-128 megs, depending on how much memory is used for graphics. Fortunately, there's a way to deal with this penalty: avoid garbage collection entirely. Garbage collection is triggered when a memory allocation fails due to there not being enough unallocated memory to perform the allocation; the framework then performs garbage collection to look for memory that was allocated but is no longer being used, and can be freed to make room for the new allocation.

In other words, if you can avoid allocating memory during gameplay, you can prevent garbage collection (you could also manually cause the framework to do garbage collection at times which are convenient, such as during loading or pausing); this is optimization 101: the fastest code is the code that isn't executed. Use structs, which are allocated on the stack or within the containing memory structure, rather than classes, which are allocated out of the heap; use allocation-minimizing algorithms and data structures, such as an open-addressing hash table (e.g. Dictionary), where the hash table is an array of entries, rather than an array of linked lists or a binary tree (e.g. SortedDictionary), which must allocate memory for each entry; use reasonable reserve sizes for structures so the structure isn't likely to need to be reallocated during gameplay; use free lists as much as possible; use specific enumerators rather than IEnumerator; etc. - anything that can significantly reduce the need to allocate memory.

Friday, December 18, 2009

& Modern Things Part 1

One thing that has always kind of mystified me is the degree of inaccessibility maintained about video game console development. For almost every single video game console in history, getting access to the manufacturer's developer documentation and hardware requires that you be licensed (and it sucks to be you if the manufacturer doesn't like you) and have a large sum of money for the development kits; for most systems, the accessibility of the system to lone hobbyists has been 0.

Of course, eventually the system, if there's enough interest, is reverse-engineered, and unofficial documentation is put up on the internet (and perhaps followed by law suits). But such documentation is usually incomplete, inaccurate, or just plain badly written, and there's almost always anti-piracy lockout mechanisms that prevent you from actually running any code you manage to write on the system itself.

This is a strikingly odd decision, and seemingly self-destructive. Many software companies turn a blind eye to piracy among college students because they realize that students pirating their programs in college, when they couldn't afford to buy the programs, anyway, makes them more likely to use them professionally, when they do have money to purchase the programs. Some companies even give away their programs to students for educational purposes, for the very same reason: more student users that get the program for free means more buyers after graduation. Economically, it's a simple business investment, trading theoretical income now, which they couldn't collect anyway, for greater actual income later.

This situation on consoles is no different. If you promote hobbyist development on your game console, that increases the probability that those hobbyists will develop on your console when they go pro. This is especially applicable considering that the modern game console business model is to sell the consoles at a loss, then make up that loss in licensing of games; thus, every additional game developed for your system equals more money for you. Ultimately, I suspect the reason for this odd behavior lies in the lockout mechanism: in order to allow hobbyists to develop for the console, the console cannot have a lockout mechanism, which would make it much easier to pirate games.

Anyway, there are only a couple exceptions to this rule, that I know of. Sony published a mini developer kit for the Playstation, called the Net Yaroze, for about a grand, which included a black development version of the system (lacking the lockout hardware), various documentation (I don't know if this was the same or inferior to the professional documentation), some run-time libraries (probably inferior to the professional kits) and software to get custom code onto the Yaroze. Of course, anything you write could only be run by people with their own Yaroze, due to the lockout system.

Very surprisingly, only in the last couple years has anyone actually targeted non-professionals as serious developers for their system. Of course I'm talking about XNA (official site). XNA is a free game programming framework built on top of the .NET platform. Games are written in C# in Visual C#, and make use of the .NET and XNA class libraries. The .NET libraries (a subset, to be precise) provide general support code such as data structures and multi-threading, and the XNA libraries cover hardware access, such as graphics and sound, and various support routines too game-specific for the .NET libraries, such as quaternions and interpolated curves. Use of the .NET framework also provides things such as garbage collection, that make programming easier and faster; the use of non-native code and APIs also means that, if code is written carefully, a single game can be compiled and run on all three platforms XNA supports: Windows, Xbox 360, and the Zune (the fine print: development on the 360 requires a Creators Club Premium subscription; membership is available to anyone, but costs $100/year).

Unlike the completely unrelated, professional Xbox 360 Software Developer Kit, XNA is targeted specifically at hobbyists and independent developers (e.g. a person who writes a small game and wants to sell it for cheap). Microsoft runs an active community site containing Microsoft-written code samples, tutorials, and starter kits (e.g. a 2D RPG and a 3D racing game), and forums where both hobbyists and XNA developers post. There are also third-party XNA tutorial/code sites, though Ziggyware, the largest and best, imploded a ways back (humorously, according to Google, Ziggyware and a few others had posted the presentation I wrote on kd-trees as part of my graphics term project, prior to the site going under).

Once a game is finished, in addition to distributing the source and/or binaries manually, finished games can be submitted to several places, depending on platform. Free Windows games may be submitted to the XNA community website for download. Finished games can also be submitted to the Xbox Live Marketplace, where they can be downloaded, either for free or for sale (whatever you decide on), by anyone who has an Xbox 360 (a Creators Club subscription is not required to play games published on the Marketplace). To say that again: you can sell your amateur Xbox games for cash through Live Marketplace, right next to professional downloadable games.

Of course, this isn't without its caveats. XNA doesn't allow you to directly access the hardware, or use the same low-level API professional developers use from a natively-executed (after compiling) language like C++. This means that XNA games cannot compete in sheer speed and power with professionally-developed games. It also means that you may not be able to access 100% of the features of the hardware, where XNA only exposes a portion of the feature set; one very notable omission is the ability to use the Xbox 360's vector math unit, a fact that drastically reduces the performance potential for some types of calculations.

Second is the fact that it's a .NET platform. While this choice was good from an ease of programming perspective (garbage collection is easier to program and less prone to various coding errors, especially given that most of the people coding on XNA will be amateurs and thus may not be very good), it's not so good from a performance perspective. Perhaps the most stereotypical problem for garbage-collected systems like Java or the .NET platform is that garbage collection is far from free, both in terms of CPU and memory. If you plot the proportion of time the CPU spends collecting garbage (instead of, you know, actually running the program) against memory usage, you'll find that after a certain threshold cost grows exponentially. One study found that, when memory utilization is at 20%, garbage collection is no more expensive (and maybe even faster) than explicit allocation/deletion, but costs rise quickly from there: garbage collection is 17% slower at 33% memory usage, 70% slower at 50% memory usage, and as you approach 100% memory utilization garbage collection approaches 100% of total CPU time.

A third major point of concern is that the XNA framework on the Xbox 360 does not have the full .NET framework/compiler, but only the Compact Framework. This version is designed for hardware that doesn't have a lot of memory or processing power (especially things like PDAs), and offers reduced memory and CPU overhead at the price of sub-optimal execution speed. Of the various optimizations performed by the full-fledged .NET framework, only a subset are performed by the compact framework; for example, inlining is restricted, virtual function calls are implemented in a more compact but slower manner, and both the framework and garbage collection algorithm are just dumber in general, to name a few issues (see here for a more lengthy list).

However, making decisions is easy when there's only one option, and if you're a hobbyist wanting to code for a modern video game console, that option is XNA. Even if you're not interested in consoles, XNA still provides a convenient and free game development platform, designed specifically with hobbyists in mind. It also opens the possibility of making a bit of money off your amateur games prior to going pro.

Tuesday, December 15, 2009

& Moral Panics

One topic that came up rather suddenly in IRC is the topic of the irrationality of humans when a person feels wronged. The particular topic in chat was that, I'm told, you should never, ever touch leaked materials, such as the Windows source or COFEE, because this tends to send companies (especially the one that produced said thing) into moral panics and refuse to ever hire you.

Think about this for a moment; a little bit of rational thought concludes that this is highly irrational behavior, reminiscent of the Pointy-Haired Boss (is there a Dilbert strip on this topic, I wonder?). If you were Microsoft, for instance, and you were looking to hire a programmer for the Windows team (although this could also apply to other parts as well), the #1 most desirable candidate for you is the one who has played extensively with the leaked Windows source, all other things being equal. Not only would categorically refusing to hire such a person result in no benefit, but it would materially harm you as a company, by refusing the candidate most beneficial to you. This is a case where moral outrage contradicts reason, and acting on that outrage results in self-destructive behavior that does more harm than good; or, as the saying goes, cutting off your nose to spite your face.

An alternate form of this is observed extensively in the copyright industries, who have a long history of various licensing and technology blunders with a detrimental effect to their own sales in the name of fighting piracy (and the goal of fighting piracy is, you know, to increase sales). In this case the moral outrage is provoked by a fixation on the amount of piracy; this is a fundamentally flawed measurement. The entire purpose of business is to maximize profit, and that is concerned (usually) solely with sales: reducing piracy (if you can even manage that) is of no benefit if doing so does not produce a net increase in sales at the same time; whatever the exact number of pirated copies may be is entirely irrelevant. And if you haven't managed to boost sales in the process, you're all the worse off because you're already out the money you spent trying to fight piracy.

(for those wondering, yes, the term "moral panics" is from Patry)

Sunday, December 13, 2009

& Even Older Things

On Friday I went to a free, open presentation at University of California Irvine. Normally I wouldn't even mention such a thing on the blog, but it just so happened that a significant part of of presentation was about the hardware of the Atari 2600, a game console launched in 1977, 6 years before the Nintendo (NES). As this fits right in with a series of posts on this blog, I figured I might as well write about it.

Interestingly, the 2600 used almost the same CPU (the 6507) as the NES (6502) and Commodore 64 computer (6510). According to Wikipedia, the 6507 was a smaller version of the NES's 6502, while the 6510 was an expanded version of the 6502 . The 6502 supported 64 KB of address space (16-bit addressing), while the 6507 supported 8 KB (13-bit addressing); the 6502 also supported (external) hardware interrupts (the vertical blank interrupt, in the case of the NES), while the 6507 did not.

In contrast to the NES's 2 KB and SNES's 128 KB, the 2600 had an amazing 128 bytes of RAM (although more could be added on cartridges, for a manufacturing price). Early 2600 games came in 2 KB ROMs, although the system supported up to 32 KB with bank switching (4 KB at a time); in comparison, NES games ranged from (I think) 32 KB to 768 KB (also with bank switching: 32 KB at a time), although in theory it could support more.

But by far the most "interesting" thing about the system was the graphics system. Unlike the NES, SNES, and pretty much all consoles and computers made in the last three decades, the 2600 had no video RAM to speak of. Instead of storing on-screen image data in video memory which is then composed by the graphics chip and output to the display, on the 2600 the video was drawn actively, one line at a time; by "actively" I mean that the game had to compose each line as it was drawn by the television. Each line, 160 pixels wide, was composed of 24 two-color background blocks, 2 eight-bit single-color bitmap sprites, 2 single-color line "missiles" whose colors mirror those of the sprites, and a line "ball" that was much like the missiles, but was the color of the background.

However, while the hardware was very (very) limited, the ability (or necessity, in this case) to change the screen contents while drawing was underway provided some flexibility for the clever programmer (just like with the NES and SNES). Clever use of the missiles and ball, changing each scan line, allowed for more complex graphics than you'd imagine given the hardware capabilities. By changing the sprite configuration you could have more than two sprites, though having more than two sprites on the same scan line required alternating between them each frame, resulting in flicker. Alternating palettes allowed the system to display 4 different colors on each line (out of a total of 128), as well as multicolored sprites and backgrounds. Clever use of the missiles and ball allowed for additional sprites per line, or multi-color and non-block backgrounds. Developers even found that they could expand the background to a full 48 blocks (the background is 48 blocks wide, but the background is only 24 bits, describing the left half of the screen; the right side is formed by repeating or mirroring the left side) by modifying the background registers halfway through the line.

Finally, the 2600 had sound analogous but inferior to the NES's. It had two channels of sound, one generating a square wave of varying pitch, the other white noise. In comparison, the NES had five channels: 2 square wave (used to approximate most melodic instruments in music), one triangle wave (often for bass or low-frequency strings), a noise channel (used for drums and other percussion), and a waveform channel (I'm not familiar with any instances of this being used by games).

Pac Man, showing off more than two sprites and 48-block-wide backgrounds:


Pitfall, showing off multi-colored sprites and backgrounds, as well as (possibly) non-block backgrounds:

Friday, December 04, 2009

Random Fact of the Day

Exact Audio Copy, like some similar programs (I know older versions of Nero are like this) can be made to work on a Windows limited user account, but it requires a couple of things.

First, you have to make the DAT files in the EAC folder writable by all users (or at least those you want to be able to use it).

Second, you have to enable low-level access to the disk drive for limited users. The easiest and least dangerous (in terms of downloading software from who knows where) method of doing this is to simply use Nero BurnRights (search down on the page), and it will allow you to set this option; note that you don't need to actually have Nero to use this program. Of course you'll have to install it and run it as admin, but once you set the option you'll be able to use EAC (and other similar programs) from any user.

Finally, make sure EAC is set to use the Native Win32 Interface (EAC Options->Interface); this should be the default, but who knows.

Thursday, December 03, 2009

& Almost as Old Things

Recently I've had the whim to play through Final Fantasy 6 (3 in the US) again, and have been doing so on an (SNES) emulator. I'm currently about to start the last level, but that's beside the point of this post. While playing, I thought I'd take a look at the SNES hardware and write a blog post about it, given that I'd already looked at the NES hardware - though not anything half so extensive as what I did with the NES, just a look at the basic hardware. What I found actually surprised me. As it turns out, the SNES really is the Super NES; that is, it has very similar hardware, only better.

First of all, the CPU is a generational upgrade to the NES' CPU, bigger and better. The CPU is now 16-bit (as opposed to the NES' 8-bit CPU), but has essentially the same instruction set (with some augmentations). The CPU is still CISC, sporting the same three general registers (an accumulator and two index registers) and operating on an accumulator register using the contents of memory. However, the CPU now has much greater flexibility in memory access, with a 24-bit pointer register and the ability to access memory relative to the stack pointer*. It also now has multiply and divide instructions, as well as a few other things. But deep down, it's the very same instruction set architecture, and in fact has an 8-bit compatibility mode that lets it directly execute code from the original NES CPU.

*Both of these features were conspicuous absent in the NES, as I believe I noted. As the NES only had 8-bit registers, it had no way to hold pointers (which were 16 bits) in registers; to make use of pointers the NES had an indirect addressing mode where the CPU would write a pointer to memory 8 bits at a time, and then had an instruction to load/store a value through that pointer (think "mov reg, [memory]"). While the NES did have a stack, it had only push and pop instructions, and lacked the ability to access data relative to the stack pointer, preventing use of the stack to pass parameters or store local variables; consequently, parameters and local variables were assigned fixed memory addresses, and the stack was rarely used.

The situation is somewhat similar in the graphics system. While the graphics chip is drastically more powerful than the NES', it's based on the same concept of background layers and sprites, all drawn from a (larger) bank of 8x8 tiles. The SNES supports twice as many sprites as the NES (and a lot more per line) and sprites can be much larger (up to 64x64, compared to 8x16 on the NES), with 16 colors each (compared to 4, including transparent, on the NES); but perhaps the most interesting improvement is that there are now 4 background layers, and they can be combined via various raster operations in many interesting ways.

To illustrate this, take a look at this picture: a typical shot from a battle. The SNES supports 8 different graphics modes. For the most part the difference between the 8 is the number of layers and how many colors each layer supports (presumably this is due to it being too expensive to put enough video memory in to allow full color from all layers at once). In this scene I'm guessing that it's using mode 3 ("3 layers, two using 16-color palettes and one using 4-color palettes"), based on the number of layers I can see plus the number of colors.



To show off the graphic abilities of the SNES, next is a screen shot of the special effects from casting of a spell, which we're going to dissect.



This is layer 1. You can see the bubble from the spell here, sporting transparency that tints other layers. Interestingly, you can see part of a screen (the magic selection screen) that isn't open at all; in the composite, this section is covered up by the bottom part of layer 2. In other words, in the top part of the screen layer 1 is drawn on top of all other layers, while the bottom part is drawn at the bottom of the layers; this goes to illustrate what I said about very complicated and flexible interaction between the layers (it's possible this involves changing the layer parameters in between lines, a technique I mentioned being possible on the NES).



Layer 2 is the background for the whole screen, the top section the battlefield, the bottom section the menu system. Note that a sine wave offset pattern has been applied to the battlefield background; while I haven't investigated it to be certain, I suspect this is accomplished by simply modifying the screen scroll position in between drawing each line.



Layer 3, again serving an array of purposes, is used for special effects and the text on the menus. You can't see it clearly at all from just this layer, but this layer is used to produce those discolored blotches on the spell bubble. This may be another case of transparency for the special effect, but it's hard to tell from screen shots alone.



To illustrate this fact, layers 1 and 3 together:



And finally, the least interesting part: the sprite layer. Although here again we see something rather unexpected: it looks like there's some type of sprite garbage that is normally (again) covered by the menu.



So, that's basically what 7 of the 8 graphics modes are all about. The 8th one, known as Mode 7, however, is a bit different. It has only a single large 256-color background layer, but it has the ability to apply a transformation matrix to the background, allowing scaling and rotation. This is used very commonly in SNES games, especially with the swap-the-registers-between-lines trick, allowing it to do primitive 3D perspective projection. Believe it or not, that minimap is actually a sprite (as is some of the glow in the background).



One particularly interesting (in the sense of peculiar) feature of the SNES design is the sound system. As opposed to most sound systems, the SNES system does not consist of the CPU simply writing channel parameters such as sample #, frequency, etc. to the sound chip which then performs the requested operation. Rather, the SNES has a separate CPU (the SPU) which acts as a sound coprocessor: sound programs are written, assembled, and then executed on this coprocessor; once the program has been uploaded, the sound system can play (e.g. music) without any further involvement of the main CPU at all (I proved this a decade ago by showing that shorting between two pins on the SNES would crash the main CPU while the SPU continued to play the music without missing a beat), though obviously the main CPU must instruct the SPU when it's time to play dynamic sound effects.

We can only guess why the SNES was designed this way. The most obvious possibility is that this frees the main CPU from having to deal with music and sounds effects, leaving it more cycles to spend on something else (especially in the case where one or more channels of the music must be temporarily dropped to allow a sound effect to be played).

An alternate possibility that I haven't been able to confirm is that this is done to increase the resolution of the audio system. In Blaster Master, the game would perform all the calculations for the frame, then spin waiting for the vertical blank interrupt to begin work on the next frame. Thus code executed 60 times a second, apparently including audio code. If this is true in the general case, that limits the resolution of audio operations to 1/60 second as well. In contrast, the SPU runs at 1 MHz independent of the main CPU, allowing it to issue commands to the sound generator at any time, in theory allowing for higher music tempo and more complex audial effects.

Saturday, September 19, 2009

The Taxonomy of Linking

There are generally at least two steps to building a program, regardless of language. Compiling takes a source code file (operating on one file at a time), parses it, and produces machine code (or some manner of other thing such as Java bytecode) in what's called an object file. Here, there's one object file produced from each source file.

Linking then occurs on the set of object files for a project, producing a single binary. 'Linking' refers to connecting the code between object files, so that functions in one file may access functions and data in other object files, which the compiler couldn't do because it works on one file at a time. The linker also adds the operating system-specific wrapper material needed to allow the OS to load and execute the binary.

That's the general idea, anyway. As with most things, reality is more complicated.

What was just described is known as static linking (or build-time linking). Static linking is when called code is included directly in the binary. It's loaded and unloaded with all the other code in that binary, and as such is always available. The code may either come from object files, as above, or from what are called static libraries - essentially archives containing multiple object files combined into one.

However, there are disadvantages to static linking. Because the code is tightly integrated into the binary, it's effectively impossible to update any functions without recompiling the entire binary. Furthermore, because the code is included in each binary, functions used by many binaries must be duplicated, wasting space.

The alternative to static linking is dynamic linking. In dynamic linking, common functions are built into stand-alone binaries called dynamic link libraries (DLLs). Functions in these DLLs, unlike in static libraries, are not integrated into other binaries by the linker, but instead are called directly from the other binaries. This requires the functions be exported - a process in which the linker writes information into the DLL about what functions the DLL contains and where those functions exist in the DLL; this data is then used by the operating system to allow other binaries to locate and call these exported functions (importing). As DLLs exist independently of binaries that call exported functions, only a single copy of exported functions need exist on disk or in memory, and DLLs may be updated independently of other binaries.

Dynamic linking may further be classified as load-time linking or run-time linking. In load-time linking the linker, when creating a binary that calls functions in DLLs, creates an import table, which lists all DLLs and functions therein that the binary requires. The operating system then, when loading the binary, automatically loads all of these DLLs and finds the addresses of all imported functions, writing the addresses to fixed places in the binary; the binary may then trivially call the functions through these pointers that are at known locations. This is entirely automatic; the DLLs are automatically loaded when the calling binary is, and unloaded when the calling binary is unloaded (assuming no other binaries still need them), ensuring they're always available when needed.

To accomplish this, the linker needs what are known as import libraries. These are generated by the linker when it builds the DLL, and contains a list of all functions exported by the DLL. The reason this is necessary is that DLLs themselves contain only a function name (or ordinal number) - only enough information to allow the operating system to locate the desired function; they may not include information such as what parameters a function takes or the calling convention to use to call the function, which are necessary to seamlessly link to the function. A consequence of this is that both the DLL name and all functions needed must be known when the calling binary is linked.

The other type of dynamic linking is run-time linking. Unlike load-time linking, in which the entire importing process is automatic, the run-time linking process is entirely manual. You must manually load the DLL when you need it, call OS functions to get pointers to the desired functions (which you must manually specify by name/ordinal), manually call the functions through the obtained pointers, making sure you use the right parameters and calling convention, and finally manually unloading the DLL when you no longer need it.

As this is much, much less convenient, you generally use load-time linking whenever possible, reserving run-time linking only for those times when you can't use load-time linking for one reason or another. Generally this is because you don't know the name of the DLL (or the function to import) at link-time. There may be multiple DLLs with different versions of a function (e.g. in Diablo II there are DLLs for the graphics system, with separate DLLs for DirectDraw, Direct3D, and Glide); the DLL may be loaded based on user action (e.g. loading plugins); the DLL may not exist until run-time (e.g. the calling binary must download the DLL first); etc.

(for more general information on DLLs, see The Old New Thing)

Sometimes, however, you get in the unenviable position of not being able to use either. This is the situation I found myself in when linking ThunderGraft to the FMOD Ex DLL. Because the DLL will be packed into a Self-Executing MPQ and extract as a temporary file with a random name, I couldn't use load-time linking (which requires a fixed DLL file name). However, the fact that the DLL was exporting full classes meant that I couldn't use run-time linking either, as the names were too complex to manually specify (with such memorable names as "?createSound@System@FMOD@@QAG?AW4FMOD_RESULT @@PBDIPAUFMOD_CREATESOUNDEXINFO@@PAPAVSound@2@@Z" [spaces inserted so the blog formatting doesn't get all screwed up by the huge string]). Static linking was also out, as the authors do not make a static library of FMOD available (which isn't too surprising as static libraries tend to be compiler- and version-specific).

Fortunately, there's one more option: delay-loading. Delay-loading is essentially a neat linker trick. It uses an import library for a DLL, but instead of generating import tables it generates code that loads the DLL and get the imported function addresses through the run-time linking functions; the DLL is loaded and the function addresses obtained when an imported function is first called (or you tell the linker code to do so at a time of your choosing). This allows the convenience of load-time linking with the flexibility of run-time linking, and allows you to get out of sticky situations like this one.

Of course, in this case it was all due to poor design in FMOD to begin with. It's a well-known principle that you should never export full classes; a much better way is to export pure virtual classes. This avoids the problem because functions in virtual classes are not exported individually, but rather accessed through the v-table; no exports to link to, no names to mangle, etc. It's to avoid this very problem that COM chose to use pure virtual classes as the basis of the COM object system, after all, and there really isn't any better alternative.

Tuesday, September 15, 2009

& Hobbit Birthday Parties

There probably aren't many that care that wouldn't see my post on Campaign Creations, but here it is just in case:
Well, it's that time of year again, and I've got some presents to give out. First, I released build 2009.09.13 of MPQDraft a couple days ago. While it only fixes bugs, it fixes some major ones - so major I'm amazed nobody told me about them in the last year and I had to find them myself.

The bugs fixed, in approximate order of severity:
-SEMPQs not activating for games specified with relative names
-plugin page not setting the plugin pointer after selecting with Browse, resulting in a crash
-plugin page stops listing plugins after one fails to load
-not saving custom executable names in the SEMPQ wizard
-plugins weren't forced to use proper plugin IDs for their modules, allowing some plugins to get away with bad behavior that didn't work with FireGraft (which forced plugins to use the correct ID)

As with the last year and a half, MPQDraft is open-source on SourceForge. You can get both the binaries and the source from the files page.

Second, ThunderGraft has been resurrected from the dead. For those who haven't heard about ThunderGraft before, it allows the use of MP3s, Ogg Vorbis, and other sound compression formats in Diablo, Diablo II, StarCraft, and WarCraft II: Battle.net Edition, which give higher audio quality and smaller file sizes than the WAV compression used in those games normally; ThunderGraft also integrates with StarEdit to allow maps to use non-WAV sounds in triggers. I'm calling it a beta version as ShadowFlare just yesterday mentioned a crash that only occurred on her computer, and I probably won't have time to investigate it until at least tomorrow.

And finally, as sort of "present #2.5", ThunderGraft has been open-sourced as well, and it's also on SourceForge. The binaries are available on the files page, but I haven't gotten around to putting a ZIP of the source up yet. Specifically, I really, really need to clean up the directory structure of MPQDraft and ThunderGraft to make it much easier for people other than me (without my directory organization) to compile it. You can still view the source through the source browser, or download it with Subversion or CVS, but you probably won't get it compiled without a bit more info from me; I'll get to that when I have time.

As always, there are two flavors: release and debug. I'd advise using debug for mod development, as it generates log files that are very helpful in fixing bugs, should you find any (but would probably annoy people who download your mod).

Enjoy!

Monday, August 31, 2009

& Very Old Things - Part 3

So. Thus far everything I did involved nothing more than the memory editor and cheat search. In other words, something just about anybody could do. To really say that I reverse-engineered Blaster Master, I needed something a lot bigger, and I knew just what to do: I was gonna find the cause of a 21-year-old bug. There's a well known glitch in Blaster Master that on some bosses, if you or the boss is being hit when you pause the game, you (or the boss) will continue to take damage while the game is paused. This only works on some bosses, however, leaving a big question mark as to whether it's a bug or a feature (it was especially strange that it affected both player and boss); I wanted to find out.



Unsurprisingly, it's quite a bit of work to reverse-engineer something you have absolutely no info on (e.g. in computer programs you at least know OS API calls the program makes, and can use that as a starting point), especially when you're learning both the hardware and the assembly language as you go. Ultimately, it took me an embarrassingly long time to get the job done (certainly more than it was worth), and I ended up disassembling a lot more of the game than was necessary (as is often the case when you don't have any idea what you are looking for). I'll just talk about some of the more important lines of investigation, findings, and complications.

As I didn't know anything about the game, naturally the first thing was to hit "step into" and see where I ended up. I always ended up in the same place: a busy loop that checks a single value. Given the nature of this and some idea about the hardware, I reasoned that this was a loop to wait for the next vertical sync, to start on the next frame; this was confirmed by observing that address being written to from the non-maskable interrupt (vsync) handler.

From there I stepped out of the function and took a look around, writing down the various functions called after vsync. I then went about refining the list and detailing more levels of the call tree, ultimately (after all of my disassembly) resulting in the call tree and other related notes below (note the comments have a focus on things relevant to finding this pause glitch). One thing of particular importance is that there's an object-oriented handler for each object type, and the appropriate handler is called for each non-empty object table entry.

$E936 waits for vertical blank to occur
$C971: for each object calls $C8DF, decreases hit timer if nonzero, calls $C9A4, and calls $C928
-$C8DF copies object data from $400+$56 to $46 ($C925 writes life)
-$C9A4 sets hit timer back to full and decreases life in the object buffer when hit with hit timer at 0 after decrement
--$EA3A
--$EB51 saves the object-specific handler to $7A (LE)
--jumps to the object-specific handler at $C9D3
-$C928 copies object data from $46 to $400+$56
$CA4B->$EA3A->$E61B->$E63C/$EB98 unmaps $8DF6 page [the page the object handlers are], which gets mapped by the NMI handler

$EB7E NMI handler (does not branch)
$EB97 IRQ/BRK handler (stub)
$F7: controller 1 state (bits in reverse order)
$F8: controller 2 state (bits in reverse order)


For a while I did some looking around with breakpoints on the location in the object buffer where the life is stored, looking for things that modify it. Some of the information gathered this way:

Copy of object struct for current object: 46
Buffer used to hold 16-bit pointers [LE] from various tables: 7A

$53 [life of the current object] written to directly by $8DF6 in $C9A4 when damage is taken
$8C6B-8C9F, $8DDF-8DFE (inside $8D76) only executed when hit timer is 0
During pause taking hit, $C1E6->$D7A0 returns A0/N+ to $8DD5 on boss 2, but 7F/N- to boss 5

$D7A0 seems to determine whether you get hit during pause. $7E indicates whether you're being hit or not: high bit for hit, lower bits for damage taken.
$7E is written to by $D71A in ($C141->$D711) when hit by boss, $D7B2 to clear hit per frame. $D71A is not reached when paused with boss 5 because $D6CD does not return
$D6CD returns if the current object is hitting the player, throws an "exception" returning to caller of caller if not hitting (returns from $D710)


Eventually I decided to take a more systematic look at the object handler for the catacombs (overhead view) player object:

$8C38->8D98->$8DBF catacombs player object handler paused
-$C15F
-$C0FF->$EF2B sets $3E and $3F to the X and Y coordinates (in pixels on screen) of the player for $D7A0
-$C1E6->$D7A0 returns A0/N+ to $8DD5 on boss 2, but 7F/N- to boss 5, when paused and being hit. A indicates the damage player should take. highest bit of A and the negative flag indicates whether player is hit or not.
-on N+, $C216 then handles hit (including decrease life)
-$93BD
-$EA3A
-$F029
-$E63C


From this we can see that $C1E6 is the switch we're looking for, determining whether the player takes damage; though a look through the function shows a clear lack of anything resembling hit detection - it simply reads from a struct in memory at $7C, generated who knows where. So, if that isn't determined in the player object handler, maybe it was handled in the objects you can be hit by. For that reason, and to satisfy my curiosity a bit about what makes bosses tick, I decided to take a look at a boss handler function; and as I hate the crab boss so much, naturally I went with that, first.



It was about this time that I discovered that NOPing function calls was a very effective way to quickly get an idea what a function is doing. This method works poorly on computers, because of the greater number of registers and drastically greater stack usage, making such NOPs result in crashes more often than not. On the NES (or at least in Blaster Master), however, the method works very well, with crashes very uncommon.

Crab (level 5) boss unpaused object handler ($A6D0):
-$A07B: (no discernible effect)
-$A6DE: handles movement of crab and state transitioning (but not animation)
-$A7A2: spawns bubbles when necessary
--$C1F5: spawns a bubble
-$C0FF: updates crab screen position based on crab 65 position (NOPout fixes crab position where it shouldn't be, despite crab 65 object movement). affects everything onward.
-$C153->$EB51
-for each body segment (right pincer, left pincer, back):
--$C141->$D711: detection of whether player has been hit. detection of player being hit by green sprite is elsewhere.
---$D6CD: checks for collision
--$C216->$DECC (only if hit)
-$C12C: (no discernible effect)
-$C189: draws (animates?) the green sprite
-$C144->$D697: sets player damage if player hits green sprite. also handles things damaging the boss.
--$D6CD: returns on collision (either player hit by something or boss hit by something). eats the stack frame if no collision (returns to caller of caller)
-$9EB3: animates the background layer


Okay, so $D6CD is another function that looks like a good candidate. So, let's take a look and see if it's what we're looking for or another tangentially related function (don't blame me for the formatting, Blogger doesn't like indentations).

$44 = A
$00 = $3E - ($40 / 2)
$01 = $3F - ($41 / 2)
X = 0xF
do
{
A = $7E[X]
if (A > 0)
{
$45 = A
A = $7C[X] - $00
if (A <= $40)
{
A = $7D[X] - $01
if (A <= $41)
return
}
}
X -= 3
} while (X >= 0)

throw


Now that looks like a collision detection function, more or less. We can see the X coordinate, Y coordinate, width, and height of whatever it is we're comparing against at $3E-$41, respectively (this is a case of a memory region being used like a stack frame). We can also see an array of 5 structs containing the X coordinate, Y coordinate, and a thingy, respectively, starting at $7C. Clearly only the size of the "current" object is considered, and the array of thingies are just points.

Well, let's think about it for a minute. We know that this function is called 4 times for the crab boss, for the 4 different parts of the body (and that one of those is not like the others). What might collide with the boss that would be of interest? Well, the player and the player's projectiles. A bit of playing around proved this to be true: the first entry in the array is the player, the rest are the player's projectiles (limited to 4, as you can see). That third byte in the struct array indicates the amount of damage the projectile does (or 7F for the player).

So, now we have everything we need to figure out how this works. The boss object checks for collisions with the player and the player's projectiles. If it hits the former, it marks the player for damage, which is then applied by the player object handler (presumably next frame, since the player handler always gets executed before enemy handlers); if it's the latter, the boss takes damage, though only if the part hit is the green sprite, which is the hit box.

Now that we know how the collision detection and damage system works, and why bosses that are susceptible to being hit while paused also themselves can hit you while paused (the collision detection for both is one in the same); that just leaves the question of why it only operates on some bosses. Well, it was about here that I discovered that that wasn't actually true. While only some bosses can be hit (and hit you) while paused, this is not the case for projectiles. Specifically, the projectiles fired by boss 3 and boss 8-1 can hit you while paused, even though the boss cannot be hit while paused.

Well, it turns out there are two sets of object-specific handlers, one for when the game is running, one for when it's paused. Now, bullets and other things are generally not prone to the problem of dealing damage while paused, as they disappear as soon as they hit something (e.g. the bubbles spewed by the crab boss). The things that can hit you while paused appear to be exactly those objects that have pause handlers and that don't disappear after hitting you; this is the case with all susceptible bosses, plus the projectiles of bosses 3 and 8-1.
Boss object handlers:
1 (63): base $A58A, paused stub
2 (5F): $9B64
3 (61): $A196, paused stub
4 (5D): $970C
5 (65): $A6CD, paused stub
6 (5F): $9B64
7 (5D): $970C
8-1 (67): $AA34, paused stub
8-2 (69): $AC84, paused stub


While I didn't go searching for every single object handler to compare the paused and non-paused versions, I did compile a list of the ones for the bosses. Note that there is only actually a single table of handlers in memory; the address in the table points to the paused handler, while the unpaused handler is always 3 bytes afterward (3 bytes is the size of a JMP instruction). For the bosses not prone to this glitch, you can see that I've indicated the pause handler is a stub that returns immediately. For the rest, the pause handler jumps midway into the unpaused handler (after things like movement).



To illustrate this, take a look at the handler for boss 2 ($9B67):
-$A07B: (no discernible effect)
-$C01E: handles movement of boss
-if time to spawn next projectile:
--$C1F5->$D851: fork projectile from boss, giving it a new proto-object type
--$C216
-paused handler jumps directly to here
-$C0FF
-$9D77: draw and do hit detection for arms
-$C090->$D770: do hit detection for back (not hit box)
--$D711: hit detection
-$C093: do hit detection for hit box
-$9EB3: moves and animates the background


So, that's how it is. Now we know why it happens and why it only affects some things. That just leaves one more question: is it a bug or a feature? Unfortunately, the evidence is much too ambiguous to answer that clearly. While it's within the bounds of imagination that damaging the boss during pause could have been intentionally put in as a cheat of sorts, it's very hard to imagine that the same thing against the player would be intended.

Yet it's also hard to imagine that it could be a bug; while the player being hit during pause could be eliminated by a single change to the player handler (a single mistake, in other words), doing the same for bosses would require that every boss's handler be changed. Furthermore, there's the fact that only 2 of the 5 unique boss object handlers (2 of them are used for 2 bosses each, bringing the total to 9 bosses) have pause handlers, in contrast to the player and most projectiles, and without pause handlers the bosses don't even get drawn completely (and what point is there in having a pause that lets you see the boss battle if you can't see the boss?). It seems as though a lot of stuff was left out or in arbitrarily; I have to wonder if there is a deadline lurking in here, somewhere.



Various resources used during reverse-engineering:
Nintendo Entertainment System Documentation
6502 Instruction Summary
How NES Graphics Work
Comprehensive NES Mapper Document
For more info, see this big list of documents

Saturday, August 29, 2009

& Very Old Things - Part 2

As for the hacking itself, this started out with simple stuff. Search memory, find special values (life, lives, gun power, boss items, etc.), and overwrite them. This was quite easy with an emulator that has a cheat search feature; you simply have it save the memory contents at the initial state, the repeatedly perform searches based on specified criteria until you've narrowed it down to 1 or a small number of possibilities. For example, finding player health was a trivial matter of searching for something that decreases each time you take damage and stays the same in between damage.



With the exception of the boss items, all of the others in this list were similarly easy to find. Boss items, however, were a bit more complicated. Because you only get the items after beating a boss, naturally this is a situation where there are likely to be a lot of changes all over RAM, as it's a major transition in the gameplay; this means that you can't narrow it down as easily. I had to try several locations, starting with the ones whose values made them the most probable to be correct (specifically ones that had a bit flipped on when the item was picked up).

However, there was one other complication figuring that one out: boss items are represented by two separate bit masks that are partially redundant. I first discovered the mask at 3FC, because that's the one that changes the bullet your tank fires after you get the first boss item; as it was a visual change, it was very easy to see that it was working. Once I found that, I checked to see if it was a bit flag, and confirmed it was, finding the flags for the second and third boss items by experimentation.

That's where things got weird. While this mask contained flags for boss items, the flags for the other boss items didn't seem to actually do anything. As well, while having the hover bit flag set caused hover power to be displayed (it's hidden prior to getting the third boss item), you couldn't actual hover. The bits here also did not make the upgrades show up on the pause screen. So, I went on, with my infinite life and gun power, and killed a few more bosses (somehow I still remember most of the maps of this game).

I found that there was a second bit mask with a completely different set of flags, one for each of the 7 items in the game. For the first 2 items, these flags did nothing more than make the item show up in the pause screen. For the third item (hover) this flag allowed you to hover, but did not show the hover power bar or allow you to accumulate hover power (so you still couldn't actually fly with this flag alone). I still can't imagine why it was done this way, but at least I figured out all the flags.

One thing that I couldn't figure out with cheat searches, however, was boss life. That is, I found it - repeatedly: it was in a different place every time (including even for the same boss). I correctly reasoned that this implied that the boss's life was not a special value, but that bosses were simply common entries in the list of objects in the game. As it turns out, so is the player; you just don't notice that because the player is always at index 0 in the object array and thus always at address 0x400, while the boss was stuck in whatever slot was available at the time. This led directly to the discovery of the object array itself. From there, I started looking at the struct that represented each object, beginning with offset 0xD: life.

As I've already pointed out that you tend to have to get clever when coding on a system with so little RAM, it shouldn't be surprising that maps are stored in a similarly clever way. As far as I can tell, there is only a single map for each area in the game (8 areas), which is 128x256 or less. However, the map is diced up and jam-packed in there such that they need to do some camera tricks when transitioning between areas (teleporting through some doors) to make sure you never see other sections that are contiguous in memory.

While I was working on it, I figured I should try to answer some long-standing questions about Blaster Master. To begin with, why does the power of your gun (what it actually shoots) not always match the gun power that's displayed? For example, if you have full gun (8 bars) and get hit, you take 1 bar of damage, but the effect of your gun falls beneath what 7 bars should give you. Many years ago I hypothesized that there were two variables for gun power; one was the one displayed, and one that was the actual effect: getting hit did a bit more damage to effect than the display.

It turned out to be simpler than that. It's a consequence of the fact that while most hits (I can't think of any counterexamples) do an integral number of bars of damage, life, gun, and hover are stored from 0-FF, in bytes. See where this is going? One bar = 20, but 8 bars = FF, not 100. In other words, the number of bars displayed is (x + 1) / 20. When you get full gun and then take a hit, your gun drops to DF; while this still displays as 7 bars, it doesn't give you the maximum gun effect (which requires E0). It's not clear whether this is a bug or a feature, though I'm kind of leaning toward a bug (specifically, an edge case the coders didn't think of, which could be easily solved).

All numbers are in hex.

Continues: 37E
Lives: DD
Gun: C3
Hover: 92
Bosses killed: 3FB. bit x = boss x+1
Boss items 1: 3FC. same values as 3FB.
Boss items 2: 99. bit 0: hover, 1: dive, 2: wall 1, 3: wall 2, 4: crusher, 5: key, 6: hyper
Homing Missiles: 6F0
Lightning: 6F1
Thunderhead Missiles: 6F2
Pause: 15

Gun Levels:
20: 2 long shots
40: 3 long shots
80: circling shots
C0: white wave shots
E0: full wave shots


Beginning of Object Table: 400. 12 slots. Slot 0 is reserved for the player. Slots 1-4 are reserved for projectiles. Enemies fill slots 5-11 as needed.

Object Struct (14 bytes)
byte @0: Object type. A few of the known object types:
3: Normal
4: Dying
5: On left wall
6: On right wall
7: On ceiling
8: Diving
9: Climbing H->V convex
A: Climbing V->H convex
B: Climbing horizontal->vertical concave
C: Climbing V->H concave
D/E/F 10/11/12: Entering/transitioning/leaving door right/left
1B: Outside tank
1D: Entering tank
84/85/86: Entering/transitioning/leaving door catacombs

byte @1: Orientation. The exact meaning of this seems to vary by object type.
Tank mode: Between 0 for left and F for right (the intermediates indicate that the tank is changing face), with the high bit set if the tank is pointing to the left
Catacombs mode: 0 for up, 1 for right, 2 for down, etc.
uint16 [LE] @2: Horizontal position. The high byte indicates the attribute-size tile, while the bottom byte is the position within that tile. The highest bit indicates that you're currently moving through a teleporting door.
uint16 [LE] @4: Vertical position
byte @6: X velocity
byte @7: Y velocity
byte @8: Attribute-size tile index on screen? This was always observed to be (Y * 0x11) + X where X and Y are relative to the tiles visible on screen.
byte @9: Hit timer: the number of ticks until the object can be hit again. Controls invulnerability, damage flashing, and also decreases gun by 1 for each tick above 1 (this causes gun power to drop smoothly rather than suddenly, though I'm not sure why they did it this way; life does not do this when damage is taken).
byte @A: Current AI state
byte @B: Frames till AI state change
byte @C: ?? (no observed effect)
byte @D: Life. For trivia value, normal tank shot does 4, hyper shot does 6, crusher shot does 8, normal catacomb shot does 1, catacomb shot with wave gun does 2, and grenades do 2.


Note that the AI state bytes in the struct are for simple enemies. Bosses have other state stored elsewhere. For example, in the above crab boss, the AI state and timer control movement but not spraying bubbles, which occurs entirely independently. The density of bubble spray, however, was directly controlled by the life of the boss; e.g. below 0x20 life (it starts with 50) it does the full sawtooth volley with dozens of bubbles.

Wednesday, August 26, 2009

The Story of ThunderGraft

For those who aren't familiar with it, ThunderGraft was possibly the coolest of the three modding tools I released (the other two being MoPaQ 2000 and MPQDraft). Diablo, Diablo II, Starcraft, and Warcraft II: BNE all use PCM WAV audio (22khz 16-bit if I recall correctly) for their sound effects and music. Diablo used raw PCM WAV audio, while Starcraft introduced (and the other two used) a special "WAV compression", which compressed the audio in lossy ADPCM form, resulting in a compression ratio between 3:1 and 4:1 (though at the cost of audio quality); WAV compression, however, was implemented transparently in the MPQ API - the games saw what they read and wrote to the MPQs as simply PCM, and that's all the audio streaming API was capable of playing (in other words, the audio streaming functions were identical in all four games).

ThunderGraft was a utility that added the ability for all four of these games to play MP3s and Ogg Vorbis, in addition to standard PCM WAVs; even better, it did this in a version-independent way* (the very same ThunderGraft binary worked for all versions of all of those games). Naturally this was huge, especially in the days of dial-up (which was when the modding community first became big). You could use modern audio compression formats that yielded significantly smaller file sizes and higher audio quality compared to WAV compression. Yet ThunderGraft is all but dead now. Why is that?

MPQDraft and ThunderGraft have a few things in common. They're both very simple, clean, version/program independent, and highly effective. The reason is also the same: they both rely on very clean exploits of design to do their thing, without getting into messy exploits of implementation that depend on the precise binary patched. MPQDraft exploited the priority system in the MPQ API; ThunderGraft exploited the fact that the audio streaming API was encapsulated in functions supplied in Storm.dll.

The fact that the functions were entirely contained in Storm meant that I could cleanly capture calls to them and redirect them. For cleanliness, I opted to simply replace the streaming API in its entirety. Anything less would have been version-dependent, as it would have required all sorts of messy code modifications inside Storm's internal functions.

This meant that I needed my own decoding and streaming code to replace Storm's. At the time I just coincidentally happened to have such a thing handy. Specifically, a friend of mine by the moniker Dark_Brood was writing a game engine called Aegis, and happened to have audio decoding/streaming code handy for me to plug into ThunderGraft. He sent me a static library of the code, and in it went. Easy.

Where things took a turn for the worst was when he wrote the next iteration of his game engine. This involved rewriting a whole bunch of stuff, and integrated the various systems in the engine much more tightly (e.g. added garbage collection and other global things). This meant that it was no longer possible to simply extract the audio portion of the engine.

While in theory I could have just continued to use the old version, there was a big problem: I never had the code for the original version, nor did he save a copy after rewriting the code. This is a big problem because static libraries are compiler- and version-dependent. Now the only way I could even compile ThunderGraft was on the very same version the original was compiled on: Visual C++ 6, which is some 11 years old, now, and I haven't even had it installed for many years.

Thus, the only way to resurrect ThunderGraft would be to replace the decoding and streaming system entirely, and thus far I simply haven't managed to muster the effort. After open-sourcing MPQDraft I wanted to do the same with ThunderGraft, but was unable to readily do so for the same reason.

*There is one thing that's version-dependent in ThunderGraft, as there's simply no theoretical way to do it in a version-independent way: importing of non-WAV audio files into maps with StarEdit (the Starcraft map editor). Special support for this was required for several reasons: 1. StarEdit verifies that things imported are WAVs and refuses anything else, 2. it needed to know how long the audio file was in order for triggers to work right. The audio decoding and streaming in ThunderGraft, in contrast, was truly game- and version-independent.

Sunday, August 23, 2009

& Very Old Things - Part 1


Bring back any painful memories?

So, a couple days I had one of those evil, perverted urges I'm so notorious for. Specifically, I felt like doing a bit of hacking of Blaster Master. For those not familiar with it, it was an NES game published 21 years ago. A good one, in fact. But before I talk a bit about anything I did resembling reverse-engineering, let me explain the NES hardware to the best of my knowledge (which consists of what I saw in my hacking, and a bit of stuff I looked up online).

The main CPU of the NES is a modified 6502, the CPU used in various Commodore and Apple computers (not that I ever used one of those). The 6502 is a primitive CISC CPU, and in fact somewhat resembles the x86 (though more primitive). It has 3 8-bit registers: a numeric accumulator (A) and 2 index registers (X and Y). Operations generally operate on the value of A and a memory location or literal, and write the result back to A.

It has a stack, though from what I saw in Blaster Master it seems to be relatively rarely used, used mainly to save registers (though even that was pretty rare). Specifically, I didn't see much in the way of stack frames holding local variables. Functions that require local variables tended to have (dedicated) memory ranges allocated to them to be used as private buffers. For example, one address I initially thought was the boss life location actually turned out to be in a buffer region used by the function while updating objects in the object list. This kind of surprised me, as the NES has only 2 KB of RAM; I imagine this must be forced by the architecture, e.g. a lack of instructions to access the stack at a specific offset.

As for the graphics system. If you ever messed around with text mode and text graphics in DOS, the NES graphics system is similar. At the lowest level, graphics are stored as blocks called patterns. Each pattern is 8x8 pixels, 2 bits each (total of 4 colors). The NES has enough video memory for 512 patterns at a time, stored in 2 pattern tables. What exactly it does with these patterns depends. The NES has 3 graphics layers: 1 for background and 2 for sprites (1 in front of and 1 behind the background layer).

deconstructulator provides an excellent illustration for most of the things discussed here. It shows the pattern tables (note that the top half is the sprite pattern table and the bottom half is the background pattern table) and sprites along with a playable version of Super Mario Bros.; although it does not show the name/attribute tables. Refer to that from here onward.

The background layer (illustration below) is very much like the screen buffer in DOS text mode; it's a buffer of screens composed of 32x30 patterns each (indices into the pattern table), called name tables (lost in translation, perhaps?); all patterns in the background must come from the same pattern table. The NES supports up to 4 screen buffers in this way, though it only contains enough memory for 2 of them (if the cartridge doesn't supply additional memory, the game can simply set the NES to mirror the 2 screens).

However, so far each pixel only has 2 bits of color (the bits in the pattern table). An additional table, called the attribute table, supplies an additional 2 bits to the name tables, increasing the total to 4 bits, or 16 colors (though each tile on screen still has only 4 colors within it). However, the attribute table does not contain a value for every tile. Rather, it only contains 1 two-bit entry for each block of 2x2 entries in the name table (so 16x16 blocks composed of 4 patterns). This is the reason 16x16 blocks appear everywhere in NES games - that's the smallest block that can be fully independently controlled.

The background position is specified independently of the sprites. A horizontal and vertical scroll position is specified by pixel for the background, which indicates where the graphics chip should start drawing the background on screen. This is how smooth scrolling works, although implementing this can be quite a trick, as you have to essentially use the name tables as a ring buffer (even more tricky if you support scrolling in both dimensions).




A shot of the side-scrolling portion of Blaster Master (top) and the corresponding, merged name/attribute tables (bottom). The white lines indicate where the top left of the screen currently is (where the two lines cross). The visual discontinuities are caused by the way the game streams the background into the name/attribute tables on demand, filling in only what's on screen in the frame; they're stale data. The color glitches result when the current color in the attribute table for an entry does not match some of the stale patterns still in that name table for that entry (e.g. on the far left the second column of patterns is discolored because they are stale and the attribute entry now corresponds to the first column of patterns, which is on screen).

The NES supports 64 sprites, each either 1 or 2 patterns in size (8x8 and 8x16, respectively), depending on whether a global flag is set. Analogous to the attribute table, each sprite contains the 2 high bits of color (though still that allows only 4 colors per sprite). Other info each sprite possesses includes position (obviously), mirroring parameters, and whether the sprite is in the foreground or background. However, even if you can have 64 sprites, the NES only supports 8 sprites per scan line; if there is more than that, only the 8 with the highest priority are drawn (smart games can cycle through their sprites each frame, but that results in flicker).

There are, however, some tricks that can be done to circumvent some of the innate limitations (e.g. 16 colors, 16x16 blocks, 64 sprites, etc.). These often involve clever but perverse tricks related to swapping out the various things such as scroll offset or sprite data while drawing is underway. One common use of this is to draw part of the screen (the gameplay) with scrolling, while a portion of the screen on top or bottom is fixed (the score area). This can also be used to draw more than 64 sprites per frame, though there's still a limitation of only 8 sprites per scan line.



Blaster Master and others frequently use the background as a grotesque way to draw very large sprites. In the above screen shot, the boss crab is actually the background, while the bubbles, player, and energy bars are sprites (the green areas of the boss are actually sprites, as well). The crab has a couple frames of animation; each of these is on a separate screen in the name table, and every frame the scrolling position is modified. This results in the crab moving around the screen and animating, as if it were a huge sprite.

Oh, and in case you were wondering, yes, this boss frequently exceeds the 8 sprites per scan line limit (that would be 4 bubbles), resulting in the game slowing down to draw them all.