HACKING ON THE GNUBOY SOURCE TREE


  BASIC INFO

In preparation for the first release, I'm putting together a simple
document to aid anyone interested in playing around with or improving
the gnuboy source. First of all, before working on anything, you
should know my policies as maintainer. I'm happy to accept contributed
code, but there are a few guidelines:

* Obviously, all code must be able to be distributed under the GNU
GPL. This means that your terms of use for the code must be equivalent
to or weaker than those of the GPL. Public domain and MIT-style
licenses are perfectly fine for new code that doesn't incorporate
existing parts of gnuboy, e.g. libraries, but anything derived from or
built upon the GPL'd code can only be distributed under GPL. When in
doubt, read COPYING.

* Please stick to a coding and naming convention similar to the
existing code. I can reformat contributions if I need to when
integrating them, but it makes it much easier if that's already done
by the coder. In particular, indentations are a single tab (char 9),
and all symbols are all lowercase, except for macros which are all
uppercase.

* All code must be completely deterministic and consistent across all
platforms. This results in the two following rules...

* No floating point code whatsoever. Use fixed point or better yet
exact analytical integer methods as opposed to any approximation.

* No threads. Emulation with threads is a poor approximation if done
sloppily, and it's slow anyway even if done right since things must be
kept synchronous. Also, threads are not portable. Just say no to
threads.

* All non-portable code belongs in the sys/ or asm/ trees. #ifdef
should be avoided except for general conditionally-compiled code, as
opposed to little special cases for one particular cpu or operating
system. (i.e. #ifdef USE_ASM is ok, #ifdef __i386__ is NOT!)

* That goes for *nix code too.
gnuboy is written in ANSI C, and I'm
not going to go adding K&R function declarations or #ifdef's to make
sure the standard library is functional. If your system is THAT
broken, fix the system, don't "fix" the emulator.

* Please no feature-creep. If something can be done through an
external utility or front-end, or through clever use of the rc
subsystem, don't add extra code to the main program.

* On that note, the modules in the sys/ tree serve the singular
purpose of implementing calls necessary to get input and display
graphics (and eventually sound). Unlike in poorly-designed emulators,
they are not there to give every different target platform its own gui
and different set of key bindings.

* Furthermore, the main loop is not in the platform-specific code, and
it will never be. Windows people, put your code that would normally go
in a message loop in ev_refresh and/or sys_sleep!

* Commented code is welcome but not required.

* I prefer asm in AT&T syntax (the style used by *nix assemblers and
likewise DJGPP) as opposed to Intel/NASM/etc style. If you really must
use a different style, I can convert it, but I don't want to add extra
dependencies on nonstandard assemblers to the build process. Also,
portable C versions of all code should be available.

* Have fun with it. If my demands stifle your creativity, feel free to
fork your own projects. I can always adapt and merge code later if
your rogue ideas are good enough. :)

OK, enough of that. Now for the fun part...
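
Before moving on, here is a minimal sketch of what the "no floating
point, use exact integer methods" rule above can look like in
practice. It is not taken from the gnuboy source: the struct, the
function name, and the choice of example (deciding how many audio
samples correspond to a number of emulated cycles) are all
hypothetical. Only the 2**21 Hz cycle rate comes from this document.
The point is that carrying the remainder between calls keeps the
result exact, with no drift and no platform-dependent rounding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is gnuboy code. */

#define CYCLE_HZ 2097152u	/* 2**21 cycles per second */

struct resampler {
	uint32_t rate;	/* host sample rate in Hz */
	uint32_t acc;	/* remainder carried between calls, < CYCLE_HZ */
};

/*
 * Return how many samples correspond to `cycles` emulated cycles.
 * The invariant is: samples_emitted * CYCLE_HZ + acc == cycles_seen * rate,
 * so no rounding error ever accumulates.
 */
static uint32_t samples_for_cycles(struct resampler *r, uint32_t cycles)
{
	uint64_t total;

	assert(r->acc < CYCLE_HZ);
	total = (uint64_t)cycles * r->rate + r->acc;
	r->acc = (uint32_t)(total % CYCLE_HZ);
	return (uint32_t)(total / CYCLE_HZ);
}
```

Feeding this one emulated second of cycles in chunks of any size
yields exactly `rate` samples, which a floating point ratio would not
guarantee on every host.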


  THE SOURCE TREE STRUCTURE

[documentation]
README - general information related to using gnuboy
INSTALL - compiling and installation instructions
HACKING - this file, obviously
COPYING - the gnu gpl, grants freedom under condition of preserving it

[build files]
Version - doubles as a C and makefile include, identifies version number
Rules - generic build rules to be included by makefiles
Makefile.* - system-specific makefiles
configure* - script for generating *nix makefiles

[non-portable code]
sys/*/* - hardware and software platform-specific code
asm/*/* - optimized asm versions of some code, not used yet
asm/*/asm.h - header specifying which functions are replaced by asm
asm/i386/asmnames.h - #defines to fix _ prefix brain damage on DOS/Windows

[main emulator stuff]
main.c - entry point, event handler...basically a mess
loader.c - handles file io for rom and ram
emu.c - another mess, basically the frame loop that calls state.c
debug.c - currently just cpu trace, eventually interactive debugging
hw.c - interrupt generation, gamepad state, dma, etc.
mem.c - memory mapper, read and write operations
fastmem.h - short static functions that will inline for fast memory io
regs.h - macros for accessing hardware registers
save.c - savestate handling

[cpu subsystem]
cpu.c - main cpu emulation
cpuregs.h - macros for cpu registers and flags
cpucore.h - data tables for cpu emulation
asm/i386/cpu.s - entire cpu core, rewritten in asm

[graphics subsystem]
fb.h - abstract framebuffer definition, extern from platform-specifics
lcd.c - main control of refresh procedure
lcd.h - vram, palette, and internal structures for refresh
asm/i386/lcd.s - asm versions of a few critical functions
lcdc.c - lcdc phase transitioning

[input subsystem]
input.h - internal keycode definitions, etc.
keytables.c - translations between key names and internal keycodes
events.c - event queue

[resource/config subsystem]
rc.h - structure defs
rccmds.c - command parser/processor
rcvars.c - variable exports and command to set rcvars
rckeys.c - keybindings

[misc code]
path.c - path searching
split.c - general purpose code to split strings into argv-style arrays


  OVERVIEW OF PROGRAM FLOW

The initial entry point is main() in main.c, which will process the
command line, call the system/video initialization routines, load the
rom/sram, and pass control to the main loop in emu.c. Note that the
system-specific main() hook has been removed since it is not needed.

There have been significant changes to gnuboy's main loop since the
original 0.8.0 release. The former state.c is no more, and the new
code that takes its place, in lcdc.c, is now called from the cpu loop,
which although slightly unfortunate for performance reasons, is
necessary to handle some strange special cases.

Still, unlike some emulators, gnuboy's main loop is not the cpu
emulation loop. Instead, a main loop in emu.c which handles video
refresh, polling events, sleeping between frames, etc. calls
cpu_emulate passing it an ideal number of cycles to run. The actual
number of cycles for which the cpu runs will vary slightly depending
on the length of the final instruction processed, but it should never
be more than 8 or 9 beyond the ideal cycle count passed, and the
actual number will be returned to the calling function in case it
needs this information. The cpu code now takes care of all timer and
lcdc events in its main loop, so the caller no longer needs to be
aware of such things.

Note that all cycle counts are measured in CGB double speed MACHINE
cycles (2**21 Hz), NOT hardware clock cycles (2**23 Hz).
This is
necessary because the cpu speed can be switched between single and
double speed during a single call to cpu_emulate. When running in
single speed or DMG mode, all instruction lengths are doubled.

As for the LCDC state, things are much simpler now. No more huge
glorious state table, no more P/Q/R, just a couple simple functions.
Aside from the number of cycles left before the next state change, all
the state information fits nicely in the locations the Game Boy itself
provides for it -- the LCDC, STAT, and LY registers.

If the special cases for the last line of VBLANK look strange to you,
good. There's some weird stuff going on here. According to documents
I've found, LY changes from 153 to 0 early in the last line, then
remains at 0 until the end of the first visible scanline. I don't
recall finding any roms that rely on this behavior, but I implemented
it anyway.

That covers the basics. As for flow of execution, here's a simplified
call tree that covers most of the significant function calls taking
place in normal operation:

  main                                sys/
   \_ real_main                       main.c
       |_ sys_init                    sys/
       |_ vid_init                    sys/
       |_ loader_init                 loader.c
       |_ emu_reset                   emu.c
       \_ emu_run                     emu.c
           |_ cpu_emulate             cpu.c
           |   |_ div_advance         cpu.c *
           |   |_ timer_advance       cpu.c *
           |   |_ lcdc_advance        cpu.c *
           |   |   \_ lcdc_trans      lcdc.c
           |   |       |_ lcd_refreshline   lcd.c
           |   |       |_ stat_change       lcdc.c
           |   |       |   \_ lcd_begin     lcd.c
           |   |       \_ stat_trigger      lcdc.c
           |   \_ sound_advance       cpu.c *
           |_ vid_end                 sys/
           |_ sys_elapsed             sys/
           |_ sys_sleep               sys/
           |_ vid_begin               sys/
           \_ doevents                main.c

  (* included in cpu.c so they can inline; also in cpu.s)


  MEMORY READ/WRITE MAP

Whenever possible, gnuboy avoids emulating memory reads and writes
with a function call. To this end, two pointer tables are kept -- one
for reading, the other for writing.
They are indexed by bits 12-15 of
the address in Game Boy memory space, and yield a base pointer from
which the whole address can be used as an offset to access Game Boy
memory with no function calls whatsoever. For regions that cannot be
accessed without function calls, the pointer in the table is NULL.

For example, reading from address addr can be accomplished by testing
to make sure mbc.rmap[addr>>12] is not NULL, then simply reading
mbc.rmap[addr>>12][addr].

And for the disbelievers in this optimization, here are some numbers
to compare. First, FFL2 with memory tables disabled:

    %   cumulative   self              self     total
   time   seconds   seconds    calls  us/call  us/call  name
   28.69      0.57     0.57                             refresh_2
   13.17      0.84     0.26  4307863     0.06     0.06  mem_read
   11.63      1.07     0.23                             cpu_emulate

Now, with memory tables enabled:

   38.86      0.66     0.66                             refresh_2
    8.42      0.80     0.14   156380     0.91     0.91  spr_enum
    6.76      0.91     0.11   483134     0.24     1.31  lcdc_trans
    6.16      1.02     0.10                             cpu_emulate
    .
    .
    .
    0.59      1.61     0.01   216497     0.05     0.05  mem_read

As you can see, not only does mem_read take up (proportionally) 1/20
as much time, since it is rarely called, but the main cpu loop in
cpu_emulate also runs considerably faster with all the function call
overhead and cache misses avoided.

These tests were performed on a K6-2/450 with the assembly cores
enabled; your mileage may vary. Regardless, however, I think it's clear
that using the address mapping tables is quite a worthwhile
optimization.


  LCD RENDERING CORE DESIGN

The LCD core presently used in gnuboy is very much a high-level one,
performing the task of rasterizing scanlines as many independent steps
rather than one big loop, as is often seen in other emulators and the
original gnuboy LCD core.
In some ways, this is a bit of a tradeoff --
there's a good deal of overhead in rebuilding the tile pattern cache
for roms that change their tile patterns frequently, such as full
motion video demos. Even still, I consider the method we're presently
using far superior to generating the output display directly from the
gameboy tiledata -- in the vast majority of roms, tiles are changed so
infrequently that the overhead is irrelevant. Even if the tiles are
changed rapidly, the only chance for overhead beyond what would be
present in a monolithic rendering loop lies in (host cpu) cache misses
and the possibility that we might (tile pattern) cache a tile that has
changed but that will never actually be used, or that will only be
used in one orientation (horizontally and vertically flipped versions
of all tiles are cached as well). Such tile caching issues could be
addressed in the long term if they cause a problem, but I don't see it
hurting performance too significantly at the present. As for host cpu
cache miss issues, I find that putting multiple data decoding and
rendering steps together in a single loop harms performance much more
significantly than building a 256k (pattern) cache table, on account
of interfering with branch prediction, register allocation, and so on.

Well, with those justifications given, let's proceed to the steps
involved in rendering a scanline:

updatepatpix() - updates tile pattern cache.

tilebuf() - reads gb tile memory according to its complicated tile
addressing system which can be changed via the LCDC register, and
outputs nice linear arrays of the actual tile indices used in the
background and window on the present line.

Before continuing, let me explain the output format used by the
following functions. There is a byte array scan.buf, accessible by
macro as BUF, which is the output buffer for the line.
The structure
of this array is simple: it is composed of 6 bpp gameboy color
numbers, where the bits 0-1 are the color number from the tile, bits
2-4 are the (cgb or dmg) palette index, and bit 5 is 0 for background
or window, 1 for sprite.

What is the justification for using a strange format like this, rather
than raw host color numbers for output? Well, believe it or not, it
improves performance. It's already necessary to have the gameboy color
numbers available for use in sprite priority. And, when running in
mono gb mode, building this output data is VERY fast -- it's just a
matter of doing 64 bit copies from the tile pattern cache to the
output buffer.

Furthermore, using a unified output format like this eliminates the
need to have separate rendering functions for each host color depth or
mode. We just call a one-line function to apply a palette to the
output buffer as we copy it to the video display, and we're done. And,
if you're not convinced about performance, just do some profiling.
You'll see that the vast majority of the graphics time is spent in the
one-line copy function (render_[124] depending on bytes per pixel),
even when using the fast asm versions of those routines. That is to
say, any overhead in the following functions is for all intents and
purposes irrelevant to performance. With that said, here they are:

bg_scan() - expands the background layer to the output buffer.

wnd_scan() - expands the window layer.

spr_scan() - expands the sprites. Note that this requires spr_enum()
to have been called already to build a list of which sprites are
visible on the current scanline and sort them by priority.

It should be noted that the background and window functions also have
color counterparts, which are considerably slower due to merging of
palette data.
At this point, they're staying down around 8% time
according to the profiler, so I don't see a major need to rewrite them
anytime soon. It should be considered, however, that a different
intermediate format could be used for gbc, or that asm versions of
these two routines could be written, in the long term.

Finally, some notes on palettes. You may be wondering why the 6 bpp
intermediate output can't be used directly on 256-color display
targets. After all, that would give a huge performance boost. The
problem, however, is that the gameboy palette can change midscreen,
whereas none of the presently targeted host systems can handle such a
thing, much less do it portably. For color roms, using our own
internal color mappings in addition to the host system palette is
essential. For details on how this is accomplished, read palette.c.

Now, in the long term, it MAY be possible to use the 6 bpp color
"almost" directly for mono roms. Note that I say almost. The idea is
this. Using the color number as an index into a table is slow. It
takes an extra read and causes various pipeline stalls depending on
the host cpu architecture. But, since there are relatively few
possible mono palettes, it may actually be possible to set up the host
palette in a clever way so as to cover all the possibilities, then use
some fancy arithmetic or bit-twiddling to convert without a lookup
table -- and this could presumably be done 4 pixels at a time with
32-bit operations. This area remains to be explored, but if it works,
it might end up being the last hurdle to getting realtime emulation
working on very low-end systems like i486.


  SOUND

Rather than processing sound after every few instructions (and thus
killing the cache coherency), we update sound in big chunks.
Yet this
in no way affects precise sound timing, because sound_mix is always
called before reading or writing a sound register, and at the end of
each frame.

The main sound module interfaces with the system-specific code through
one structure, pcm, and a few functions: rockboy_pcm_init,
rockboy_pcm_close, and rockboy_pcm_submit. While the first two should
be obvious, rockboy_pcm_submit needs some explaining. Whenever realtime
sound output is operational, rockboy_pcm_submit is responsible for
timing, and should not return until it has successfully processed all
the data in its input buffer (pcm.buf). On *nix sound devices, this
typically means just waiting for the write syscall to return, but on
systems such as DOS where low level IO must be handled in the program,
rockboy_pcm_submit needs to delay until the current position in the
DMA buffer has advanced sufficiently to make space for the new
samples, then copy them.

For special sound output implementations like write-to-file or the
dummy sound device, rockboy_pcm_submit should write the data
immediately and return 0, indicating to the caller that other methods
must be used for timing. On real sound devices that are presently
functional, rockboy_pcm_submit should return 1, regardless of whether
it buffered or actually wrote the sound data.

And yes, for unices without OSS, we hope to add piped audio output
soon. Perhaps Sun audio device and a few others as well.


  OPTIMIZED ASSEMBLY CODE

A lot can be said on this matter. Nothing has been said yet.


  INTERACTIVE DEBUGGER

Apologies, there is no interactive debugger in gnuboy at present. I'm
still working out the design for it. In the long run, it should be
integrated with the rc subsystem, kinda like a cross between gdb and
Quake's ever-famous console.
Whether it will require a terminal device
or support the graphical display remains to be determined.

In the meantime, you can use the debug trace code already
implemented. Just "set trace 1" from your gnuboy.rc or the command
line. Read debug.c for info on how to interpret the output, which is
condensed as much as possible and not quite self-explanatory.


  PORTING

On all systems on which it is available, the gnu compiler should
probably be used. Writing code specific to non-free compilers makes it
impossible for free software users to actively contribute. On the
other hand, compiler-specific code should always be kept to a minimum,
to make porting to or from non-gnu compilers easier.

Porting to new cpu architectures should not be necessary. Just make
sure you unset IS_LITTLE_ENDIAN in the makefiles to enable the big
endian default if the target system is big endian. If you do have
problems building on certain cpus, however, let us know. Eventually,
we will also want asm cpu and graphics code for popular host cpus, but
this can wait, since the C code should be sufficiently fast on most
platforms.

The bulk of porting efforts will probably be spent on adding support
for new operating systems, and on systems with multiple video (or
sound, once that's implemented) architectures, new interfaces for
those. In general, the operating system interface code goes in a
directory under sys/ named for the os (e.g. sys/nix/ for *nix
systems), and display interfaces likewise go in their respective
directories under sys/ (e.g. sys/x11/ for the x window system
interface).

For guidelines in writing new system and display interface modules, I
recommend reading the files in the sys/dos/, sys/svga/, and sys/nix/
directories. These are some of the simpler versions (aside from the
tricky dos keyboard handling), as opposed to all the mess needed for
x11 support.
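
On the IS_LITTLE_ENDIAN note earlier in this section, here is a
minimal sketch of why a build-time endianness switch matters for code
that overlays byte and word views of the same storage. Only the macro
name IS_LITTLE_ENDIAN comes from this document; the union, the LO/HI
index macros, and both functions are hypothetical and are not copied
from the gnuboy headers.

```c
#include <stdint.h>

/* Hypothetical layout for illustration only. */
#ifdef IS_LITTLE_ENDIAN
#define LO 0	/* low byte comes first in memory */
#define HI 1
#else
#define LO 1	/* big endian default: high byte comes first */
#define HI 0
#endif

union regpair {
	uint8_t b[2];	/* byte view, e.g. two 8-bit registers */
	uint16_t w;	/* word view, e.g. the 16-bit register pair */
};

/* Build a 16-bit pair from its halves through the byte view. */
static uint16_t make_pair(uint8_t hi, uint8_t lo)
{
	union regpair r;

	r.b[HI] = hi;
	r.b[LO] = lo;
	return r.w;
}

/* Runtime probe, handy for sanity-checking the build setting. */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}
```

If the macro setting matches the host, make_pair(0x12, 0x34) gives
0x1234; if it does not, the halves come out swapped. That kind of
silent breakage is what a wrong makefile setting would cause in code
written this way.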

Also, please be aware that the existing system and display interface
modules are somewhat primitive; they are designed to be as quick and
sloppy as possible while still functioning properly. Eventually they
will be greatly improved.

Finally, remember your obligations under the GNU GPL. If you produce
any binaries that are compiled strictly from the source you received,
and you intend to release those, you *must* also release the exact
sources you used to produce those binaries. This is not pseudo-free
software like Snes9x where binaries usually appear before the latest
source, and where the source only compiles on one or two platforms;
this is true free software, and the source to all binaries always
needs to be available at the same time or sooner than the
corresponding binaries, if binaries are to be released at all. This of
course applies to all releases, not just new ports, but from
experience I find that ports people usually need the most reminding.


  EPILOGUE

That's it for now. More info will eventually follow. Happy hacking!