C=Hacking
Issue #19
May 29, 2000
(version html-cb-1.01)
Ask, and it shall be given.
C'mon, it's only, what, 9 months late?
Many of you, I am sure, have been wondering, "Is C=Hacking still alive? Has he lost interest?" The respective answers are yes, and no.
Although I have not lost interest in the 64, I have lost a lot of free time I once had, and I am now able to pursue a lot of other interests! So the total time allocated to the 64, and hence to C=Hacking, has decreased considerably. Work on this issue actually began last summer, around August or September. But work on jpx began about the same time, followed by work on Sirius, and I devoted my C64 time to them instead of C=Hacking. Then work intensified at work, and work began on a garage, and a plane, and... well, you get the idea. Poor issue #19 just got worked on in little dribbles every few weeks.
The main reason I share this sad tale is that, the way I see it, C=Hacking could use a little help, if it is to come out more frequently. If nobody volunteers it will still come out, but in exactly the way it does right now -- a little less frequently than it ought to. Some of the more time-consuming tasks are: finding articles, reviewing (actually refereeing) articles, and collecting the latest news and tips. Finding articles means finding people who are doing some nifty Commodore project, or talking someone into doing some nifty Commodore project. Refereeing an article means reading the article carefully, making sure everything is technically correct, making suggestions for improvement, and so on. And collecting news means being plugged into the system.
I have a few people I rely on for some of these things, but I could use more, and if you'd like to help out (especially finding new articles, or keeping up to date on the latest C64 news) please drop me an email.
With that out of the way, brother Judd would like to preach on a malaise that afflicts the C64 world and which has been getting worse: Not Finishing The Job. I just think about all the promising projects I've heard about over the last few years -- off the top of my head I remember a SCPU game, a SCPU monitor, several demos, multiple utilities, a VDC code library, several OSes... -- which were Almost Done. And where are they now? Presumably, still Almost Done. So if you have a project which is Almost Done, but has been sitting around for the last few months/years... please, please finish up that last 10% and release it.
We, the technical community, are a community. We draw strength from each other, we get ideas and motivation from each other, and we push each other to do great things. It's a big feedback loop, where activity stimulates more activity, and decreased activity begets yet less activity. I suppose C=Hacking serves as a prime example of this.
I'm not saying we're on the verge of a big programming renaissance, but I am concerned that we are drying up. Maybe if people finish up those programs lying around it will reverse the trend. (I mean, hey, doesn't this finally finished-up issue make you want to go out and do cool stuff?)
In other news, The Wave seems to be testing out wonderfully and is totally cool. In case you've been under a rock these past few months, The Wave is an integrated TCP/IP suite for Wheels -- telnet, graphical web browser, PPP, the works. Lots of people have been beta-testing it for several months now and it is solid. Outstanding.
I was asked lo these many months ago to put in a plug for http://www.6502.org, which is run by Mike Naberezny (mnaberez@nyx.net). He is looking for comments, suggestions, and maybe even contributions, so drop him a line and tell him what you think.
The ever-resourceful Pasi Ojala has several new thingies on his web site. This is probably ancient history by now but it's in my "latest news" file, sooo...
Myke Carter (mykec@delphi.com) has developed a filter program that allows C=Hacking to be converted to geoWrite format. Thus, if you'd like a geoWrite version of C=Hacking, send him some email!
Finally, this is Memorial Day here in the States, and I'd just like to suggest folks take a little time to think about the purpose of this holiday and why we have it.
Okay then, enough with the jabber, and on to hacking excellence.
Editor, The Big Kahuna, The Car'a'carn..... Stephen L. Judd
C=Hacking logo by.......................... Mark Lawrence
Special thanks to the folks who have helped out with reviewing and such, and to the article authors for being patient!
Legal disclaimer:
About the authors:
Jolse Maginnis is a 20 year old programmer and web page designer, currently taking a break from CS studies. He first came into contact with the C64 at just five or six years of age, when his parents brought home their "work" computer. He started out playing games, then moved on to BASIC, and then on to ML. He always wanted to be a demo coder, and in 1994 met up with a coder at a user's group meeting, and has since worked on a variety of projects from NTSC fixing to writing demo pages and intros and even a music collection. JOS is taking up all his C64 time and he is otherwise playing/watching sports, out with his girlfriend, or at a movie or concert somewhere. He'd just like to say that "everyone MUST buy a SuperCPU, it's the way of the future" and that if he can afford one, anyone can!
Richard Cini is a 31 year old vice president of Congress Financial Corporation, and first became involved with Commodore 8-bits in 1981, when his parents bought him a VIC-20 as a birthday present. Mostly he used it for general BASIC programming, with some ML later on, for projects such as controlling the lawn sprinkler system, and for a text-to-speech synthesizer. All his CBM stuff is packed up right now, along with his other "classic" computers, including a PDP11/34 and a KIM-1. In addition to collecting old computers Richard enjoys gardening, golf, and recently has gotten interested in robotics. As to the C= community, he feels that it is unique in being fiercely loyal without being evangelical, unlike some other communities, while being extremely creative in making the best use out of the 64.
Adrian Gonzalez is a 26 year old system/network administrator for an ISP serving Laredo, TX and Nuevo Laredo, Mexico. He and his brother convinced their parents to buy them a C64 in 1984, and whereas his brother moved on to PCs he stuck with the 64 and later bought an Amiga. He learned BASIC programming in sixth grade and wrote a few BASIC programs for the family business; since then Adrian has put several demos and utilities under his belt. In addition to fancy graphics and music, Adrian has an interest in copy protection schemes (and playing the occasional game, of course). When he's not coding, he's either playing basketball, playing piano, editing videos, or going out to movies/parties. You can visit his web page at http://starbase.globalpc.net/c64/main.html for more info.
For information on the mailing list, ftp and web sites, send some email to chacking-info@jbrain.com.
While http://www.ffd2.com/fridge/chacking is the main C=Hacking homepage, C=Hacking is available many other places including:
$FFC6
I actually have a little Jiffy that I 'discovered' recently. It's one of those things that is so obvious and simple that it took me several tries before I stumbled onto it. It also highlights a rather powerful feature of the lowly C64 kernal.
Not long ago, I was asked to write a slideshow program for jpz. Ideally, a slideshow program should be a "plug-in" for the regular viewer, which can load pictures from some list in a file. But I didn't see a decent way to do this, especially for jpz which has maybe 200 bytes free total. Then the thunderclap finally occurred.
Everyone has used CMD4 to redirect a file to the printer. But just as the kernal can redirect output to different devices, it can redirect the input to be from different devices, using CHKIN. So all the slideshow program has to do is open a list of filenames, redirect input to that file, and execute the normal jpz. jpz just uses JSR CHRIN to get data -- normally that data comes from the keyboard, but with CHKIN it comes from the file instead, akin to "a.out < input" in unix. Since jpz doesn't close the file, calling jpz repetitively will keep reading from the input file.
The result is a simple and effective slideshow program, and a trick which ought to be useful in other situations. Here is the entire slideshow code, located at $02ae to be autobooting. The main loop is seven lines long:
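*
* Simple slideshow -- slj 4/2000
*
         org $02ae
name     txt 'ssw.files'     ;list of picture filenames
start
         lda #start-name     ;filename length
         ldx #<name
         ldy #>name
         jsr $ffbd           ;SETNAM
         lda #3
         tay
         ldx $ba             ;current device number
         jsr $ffba           ;SETLFS: file 3, secondary address 3
         jsr $ffc0           ;OPEN
         ldx #<main          ;Modify JPZ to jump to main instead
         ldy #>main          ;of exiting
         lda $10fb           ;Check if jpy or jpz is in memory
         cmp #$4c
         bne :jpy
         stx $10fc
         sty $10fd
         beq main            ;always taken (Z still set from cmp)
:jpy     stx $10ed
         sty $10ee
main
         ldx #3
         jsr $ffc6           ;CHKIN: take input from file 3
         jsr $ffe4           ;GETIN
         lda $90             ;loop until EOF reached
         and #$40
         bne :done
         jmp $1000           ;call jpz
:done
         lda #3
         jsr $ffc3           ;CLOSE
         jsr $ffcc           ;CLRCHN
         jmp $a474           ;back to BASIC
         da start            ;these two words land on the vectors at
         da start            ;$0300 and $0302, which autostarts the code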
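For readers who think more easily in a high-level language, here is a rough Python sketch of the same redirection idea. It is only a toy model: the Kernal class, viewer() and slideshow() are made-up names for illustration, and the viewer simply reads one filename per line instead of decoding a picture. End of list is detected by an empty read here, where the assembly checks the status byte at $90.

```python
import io

class Kernal:
    """Toy model of KERNAL input channels (all names are illustrative)."""
    def __init__(self, keyboard):
        self.files = {}            # logical file number -> stream
        self.keyboard = keyboard   # default input device
        self.current = keyboard

    def open(self, lfn, stream):
        self.files[lfn] = stream

    def chkin(self, lfn):
        # Like CHKIN ($FFC6): make an open file the current input channel.
        self.current = self.files[lfn]

    def clrchn(self):
        # Like CLRCHN ($FFCC): go back to the default input.
        self.current = self.keyboard

    def chrin(self):
        # Like CHRIN: read one character from whatever is current.
        return self.current.read(1)

def viewer(kernal):
    """Stand-in for jpz: reads a filename via chrin(), unaware of the source."""
    name = ""
    while True:
        ch = kernal.chrin()
        if ch in ("", "\n"):
            return name
        name += ch

def slideshow(kernal, lfn):
    """Redirect input to the list file, then call the viewer until the list runs out."""
    shown = []
    kernal.chkin(lfn)
    while True:
        name = viewer(kernal)
        if not name:
            break
        shown.append(name)
    kernal.clrchn()
    return shown

kernal = Kernal(keyboard=io.StringIO(""))
kernal.open(3, io.StringIO("pic1.jpg\npic2.jpg\npic3.jpg\n"))
print(slideshow(kernal, 3))   # ['pic1.jpg', 'pic2.jpg', 'pic3.jpg']
```

The point is that viewer() never changes: only the channel behind chrin() does, which is why the trick costs so few bytes on the 64.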
Commodore disk drives 1570/71 and 1581 implemented a new fast serial protocol to be used with the C128 computer. This synchronous serial protocol speeds up data transfer between the computer and the drive ten-fold. The amazing thing is that this kind of serial protocol was supposed to be used in the VIC-20 and the 1540 drive until it was discovered that a hardware bug in the 6522 VIA (versatile interface adapter) chip prevented the use of the chip's synchronous serial interface.
The synchronous serial port would've allowed whole bytes to be sent in both directions without processor intervention with the maximum speed of one bit per two clock cycles. Without a bug-free synchronous serial port the transfer had to be slowed down considerably so that the receiver has a chance to detect all changes in the serial bus lines. This became the dead slow software-driven Commodore serial protocol.
The complex interface adapter (6526 CIA) chips used in Commodore 64 and later in Commodore 128 have bug-free synchronous serial interfaces: serial data and serial clock inputs/outputs. In input mode, each time a rising edge is detected in the serial clock pin (CNT), the state of the serial data (SP) is shifted into a register. When 8 bits are received the accumulated bits are moved into the serial data register and a bit is set in the interrupt status register to reflect this. If the corresponding interrupt is enabled, an interrupt is generated.
In output mode the serial clock line is controlled by Timer A. The serial clock is derived from the timer underflow pulses. When a byte is written to the serial data register, the value is clocked out through the serial data pin (SP) and the corresponding clock signal appears on the serial clock pin (CNT). After all 8 bits are sent, the serial interrupt bit is set in the interrupt status register.
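The input-mode behaviour described above can be sketched as a small Python model. This is an illustration under simplifying assumptions, not a cycle-accurate CIA: the class and method names are invented, bits are taken most significant bit first, and the serial port flag is modelled as bit 3 ($08) of the interrupt status register, the same mask the loader below polls.

```python
class ShiftRegisterIn:
    """Toy model of a CIA serial port in input mode (not cycle-accurate)."""
    SP_INTERRUPT = 0x08  # serial port bit in the interrupt status register

    def __init__(self):
        self.shift = 0      # bits accumulated so far
        self.count = 0      # how many bits have been shifted in
        self.sdr = 0        # serial data register
        self.icr = 0        # interrupt status flags
        self.last_cnt = 0   # previous CNT level, for edge detection

    def clock(self, cnt, sp):
        """Present new CNT and SP levels; SP is sampled on a rising CNT edge."""
        if cnt and not self.last_cnt:
            self.shift = ((self.shift << 1) | (sp & 1)) & 0xFF
            self.count += 1
            if self.count == 8:
                # A whole byte has arrived: latch it and raise the flag.
                self.sdr = self.shift
                self.icr |= self.SP_INTERRUPT
                self.count = 0
        self.last_cnt = cnt

def send_byte(port, value):
    """Clock one byte into the port, most significant bit first."""
    for bit in range(7, -1, -1):
        sp = (value >> bit) & 1
        port.clock(0, sp)   # CNT low while the data bit is set up
        port.clock(1, sp)   # rising edge: the bit is shifted in
    port.clock(0, 0)

port = ShiftRegisterIn()
send_byte(port, 0xA5)
print(hex(port.sdr), bool(port.icr & ShiftRegisterIn.SP_INTERRUPT))   # 0xa5 True
```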
The synchronous serial bus is used in the C128/157x/1581 fast serial protocol. An obsolete signal in the peripheral serial bus (SRQ) was taken into service as the new fast (synchronous) serial clock line. The old serial data line doubles as slow and fast serial data line. And the old serial clock line doubles as slow serial clock line and fast serial (byte) acknowledge line.
The fast serial protocol is basically very simple. The side sending data configures its synchronous serial port into output mode, the other side uses input mode. The old peripheral serial bus clock line is controlled by the receiving side and is used as an acknowledge: when the receiver is ready for data, it toggles the state of the clock line. The actual data is transferred using the synchronous serial ports. The sender writes the data to be sent into the serial data register and waits for the transfer to complete. The receiver waits for a byte to arrive into its serial data register. The actual transfer is automatically handled by the hardware.
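As a rough illustration of that byte-level flow, the following Python sketch has a receiver toggle an acknowledge line and a sender answer each toggle with one byte. It is a simplification with invented names (Sender, Receiver, transfer): the bit-level shifting is left to the "hardware", and the transfer simply stops when the sender runs out of data, whereas the real protocol signals the end with status bytes.

```python
class Sender:
    """Drive side: sends the next byte each time the acknowledge line changes."""
    def __init__(self, data):
        self.data = list(data)
        self.seen_clk = 0

    def poll(self, clk):
        # Return a byte if the receiver toggled the line and data remains.
        if clk != self.seen_clk and self.data:
            self.seen_clk = clk
            return self.data.pop(0)
        return None

class Receiver:
    """Computer side: toggles the old serial clock line to ask for a byte."""
    def __init__(self):
        self.clk = 0
        self.received = []

    def request(self):
        self.clk ^= 1       # toggle = "ready for the next byte"
        return self.clk

    def accept(self, byte):
        self.received.append(byte)

def transfer(payload):
    """Move payload from sender to receiver, one acknowledge toggle per byte."""
    sender, receiver = Sender(payload), Receiver()
    while True:
        byte = sender.poll(receiver.request())
        if byte is None:
            break
        receiver.accept(byte)
    return bytes(receiver.received)

print(transfer(b"HELLO"))   # b'HELLO'
```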
Both the drive and the computer must detect whether the other side can handle fast serial transfers. This is accomplished by sending a byte using the synchronous serial port while doing handshaking. The drive sends a fast serial byte when the computer sends a secondary address (SECOND, which is called by e.g. CHKOUT); the computer can in practice send the fast serial byte anytime after the drive is reset and before the drive would send fast serial bytes.
To use the burst fastloader with the C64 we need to connect the CIA synchronous serial port to the synchronous serial lines of the Commodore peripheral serial bus. Two wires are needed: one to connect the serial bus data line to the synchronous serial port data line and one to connect the serial bus SRQ (the obsolete line for service request, now fast serial clock) to the synchronous serial port clock line. Select the right connections depending on whether you want to use CIA1 or CIA2.
        1570/1, 1581                     C64
Pin 1   SRQ    Fast serial bus clk       CNT1/2   User port 4/6
Pin 5   DATA   Data - slow & fast bus    SP1/2    User port 5/7
[Diagram: top view of an old C64, CIA1. Wires run from CNT1 and SP1 on the user port, past the cassette port, to the serial connector.]
[Diagram: top view of an old C64, CIA2. Wires run from CNT2 and SP2 on the user port, past the cassette port, to the serial connector.]
Solder the wires either to the resistor pack or directly to the user port connector, but remember to leave the outer half of the connector free so that you can still plug in your user port devices.
Then solder the other ends to the serial connector. Those left- and rightmost pins are 1 and 5, respectively, so it is fairly easy to do the soldering. You can also build a cable which connects those lines externally.
Of course the C64 only uses the standard slow serial routines and we need a separate fastloader routine to take advantage of the fast serial connection we just soldered into our machine. The following load routine is located in the unused area $2a7-$2ff and in the cassette buffer $334-$3ff. Just load and run the "burster" program. It installs the loader and replaces the default load routine by our routine. The old load routine is used if the operation is a verify instead of a load, if the directory ('$') is being loaded, or if the filename starts with a colon (':').
So, it is possible to use the old load routine by prepending a colon (':') to the filename. This is needed if you need to use both fast and slow serial devices at the same time. Unfortunately detecting fast-serial-capable devices is not feasible, because a lot of ROM code would have to be duplicated and then the loader would become too large. Because of this it becomes the responsibility of the user to prepend the colon (':') if a slow serial device is accessed.
The fastloader is available in both CIA1 and CIA2 versions; uuencoded executables of both are attached to this article. Only the CIA1 version is discussed here.
; DASM V2.12.04 source
;
; Burst loader routine, minimal version to allow loading of programs upto 63k
; in length ($400-$ffff). Directory is loaded with the normal load routine.
;
; (c)1987-98 Pasi Ojala, Use where you want, but please give me some credit
;
; This program needs SRQ to be connected to CNT1 and DATA to SP1 (CIA1).
; Cassette drive won't work with those wires connected if the disk drive
; is turned on. (SRQ is connected to cassette read line.)
;
; SRQ = Bidirectional fast clock line for fast serial bus
; DATA= Slow/Fast serial data (software clocked in slow mode)
;
; In C128D (64-mode) you should use CIA2, because it has special hardware
; which inhibits the use of CIA1 (or so I'm told).
;
; A short description of the burst protocol and commands can be found
; from the "1581 Disk Drive User's Guide".
processor 6502
ORG $0801
DC.B $b,8,$ef,0 ; '239 SYS2061'
DC.B $9e,$32,$30,$36,$31
DC.B 0,0,0
install:
; copy first block to $2a7..$2ff
ldx #block1_end-block1-1 ; Max $58
0$ lda block1,x
sta _block1,x
dex
bpl 0$
; copy second block to $334..$3ff
ldx #block2_end-block2 ; Max $cc
1$ lda block2-1,x
sta _block2-1,x
dex
bne 1$
lda $0330 ; load vector
ldx $0331
cmp #<MyLoad
bne 2$
cpx #>MyLoad
beq 3$ ; already installed
2$ sta OldVrfy+1 ; chain the old load vector
stx OldVrfy+2
lda #<MyLoad
sta $0330
lda #>MyLoad
sta $0331
3$ rts
block1
#rorg $02a7
_block1
OldLoad lda #0
OldVrfy jmp $f4a5 ; The 'normal' load.
MyLoad: ;sta $93
cmp #0 ; Is it a prg-load-operation ?
bne OldVrfy ; If not, use the normal routine
stx $ae ; Store the load address
sty $af
tay ; ldy #0
lda ($bb),y ; Get the first char from filename
ldy $af
cmp #$24 ; Do we want a directory ($) ?
beq OldLoad ; Use the old routine if directory
cmp #58 ; ':'
beq OldLoad
; Activate Burst, the drive then knows we can handle it
sei ; We are polling the serial reg. intr. bit
ldy #1 ; Set the clock rate to the fastest possible
sty $dc04
dey ; = ldy #0
sty $dc05
lda #$c1
sta $dc0e ; Start TimerA, Serial Out, TOD 50Hz
bit $dc0d ; Clear interrupt register
lda #8 ; Data to be sent, and interrupt mask
sta $dc0c ; (actually we just wake up the other end,
0$ bit $dc0d ; so that it believes that we can do
; burst transfers, data can be anything)
beq 0$ ; Then we poll the serial (data sent)
; Clears the interrupt status
; This program assumes you don't try to use it on a 1541
; If you try anyway, your machine will probably lock up..
lda #$25 ; Set the normal (PAL) frequence to TimerA
sta $dc04 ; Change if you want to preserve NTSC-rate
lda #$40
sta $dc05
lda #$81
jmp LoadFile
GetByte lda #8 ; Interrupt mask for Serial Port
0$ bit $dc0d ; Wait for a byte
beq 0$ ; (Serial port int. bit changes, hopefully)
;ldy $dc0c ; Get the byte from Serial Port Register
ToggleClk:
lda $dd00 ; Toggle the old serial clock (=send Ack)
eor #$10 ; so that the disk drive will start
sta $dd00 ; sending the next byte immediately
;tya ; return the value in Accumulator, update flags
lda $dc0c ; Get the byte from Serial Port Register
rts
#rend
block1_end
block2
#rorg $0334
_block2
LoadFile:
sta $dc0e ; Start TimerA, Serial IN, TOD 50Hz (PAL)
;cli
jsr $f5af ; searching for ..
lda $b7 ; Preserve the filename length
pha
lda $b9 ; Do the same with secondary address
sta $a5 ; We store it to cassette sync countdown..
; No cassette routines are used anyway, as
lda #0 ; this prg is in cassette buffer..
sta $b7 ; No filename for command channel
lda #15
sta $b9 ; Secondary address 15 == command channel
lda #239
sta $b8 ; Logical file number (15 might be in use?)
jsr $ffc0 ; OPEN
sta ErrNo+1
pla
sta $b7 ; Restore filename length
bcs ErrNo ; "device not present",
; "too many open files" or "file already open"
; Send Burst command for Fastload
ldx #239
jsr $ffc9 ; CHKOUT Set command channel as output
sta ErrNo+1
bcs NoDev ; "device not present" or other errors
; Bummer, the interrupt status register bit indicating fast serial
; will be cleared when we get here..
ldy #3
3$ lda BCMD-1,y ; Burst Fastload command
jsr $ffd2
dey
bne 3$
; ldy #0
1$ lda ($bb),y
jsr $ffd2 ; Send the filename byte by byte
iny
cpy $b7 ; Length of filename
bne 1$
jsr $ffcc ; Clear channels
sei
jsr $ee85 ; Set serial clock on == clk line low
bit $dc0d ; Clear intr. register
jsr ToggleClk ; Toggle clk
jsr HandleStat ; Get Initial status
pha ; Store the Status
;jsr $f5d2 ; loading/verifying
; (uses CHROUT, which does CLI, so we can't use it)
; We could add a check here..
; if we don't have at least two bytes, we cannot read load address..
; It seems that for files shorter than 252 bytes the 1581 does not count
; the loading address into the block size.
jsr GetByte ; Get the load address (low) - We assume
; that every file is at least 2 bytes long
tax
jsr GetByte ; Get the load address (high)
tay ; already in Y
lda $a5 ; The secondary address - do we use load
; address in the file or the one given to
bne Our ; us by the caller ?
stx $ae ; We use file's load addr. -> store it.
sty $af
Our ldx #252 ; We have 252 bytes left in this block
pla ; Restore the Status
bne Last ; If not OK, it has to be bytes left
Loop jsr GetAndStore ; Get X bytes and save them
jsr HandleStat ; Handle status byte
beq Loop ; If all was OK, loop..
Last tax ; Otherwise it is bytes left. Do the last..
jsr GetAndStore ; Get X number of bytes and save them
jsr $ee85 ; Serial clock on (the normal value)
lda #239
jsr $ffc3 ; Close the command channel
clc ; carry clear -> no error indicator
bcc End
FileNotFound:
pla ; Pop the return address
pla
jsr $ee85 ; Serial clock on (the normal value)
lda #4 ; File not found
sta ErrNo+1
NoDev lda #239
jsr $ffc3 ; Close the command channel
ErrNo lda #5 ; Device not present
sec ; carry set -> error indicator
End ldx $ae ; Loader returns the end address,
ldy $af ; so get it into regs..
cli
rts ; Return from the loader
HandleStat:
jsr GetByte ; Get a byte (and toggle clk to start the
; transfer for next byte)
cmp #$1f ; EOI ?
bne 0$
jmp GetByte ; Get the number of bytes to follow and RTS
0$ cmp #2 ; File Not Found ?
bcs FileNotFound ; file not found or read error
; code 0 or 1 -> OK
ldx #254 ; So, the whole block is coming
lda #0 ; No error -> Z set
rts
GetAndStore:
jsr GetByte ; Get a byte & toggle clk
;sta $d020
ldy #$34
sty 1 ; ROMs/IO off (hopefully no NMI:s occur..)
ldy #0
sta ($ae),y ; Store the byte
ldy #$37
sty 1 ; Restore ROMs/IO (Should preserve the
; state, but here it doesn't..)
inc $ae ; Increase the address
bne 0$
inc $af
0$ dex ; X= number of bytes to receive
bne GetAndStore
rts
BCMD: dc.b $1f, $30, $55 ; 'U0',$1F == Burst Fastload command
; If $9F, Doesn't have to be a prg-file
#rend
block2_end
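The block framing that the main loop and HandleStat walk through can be sketched in Python as follows. This mirrors only what the listing above does, under its own assumptions: a status of 0 or 1 announces a full 254-byte block, $1f announces a final block whose length follows, anything else is an error, and the count of a short first block does not include the two load address bytes. parse_burst_stream is an invented name, and the drive's actual behaviour is taken on trust from the listing.

```python
def parse_burst_stream(stream):
    """Decode a burst fastload byte stream the way the loader above walks it.

    Returns (load_address, data). Raises IOError on a status byte >= 2.
    """
    it = iter(stream)

    def handle_status():
        status = next(it)
        if status == 0x1F:          # EOI: next byte is the size of the last block
            return next(it), True
        if status >= 2:
            raise IOError("file not found or read error (status %d)" % status)
        return 254, False           # status 0 or 1: a full block follows

    count, last = handle_status()
    low = next(it)                  # load address, low byte first
    high = next(it)
    load_address = low | (high << 8)
    if not last:
        count = 252                 # a full first block also carried the address
    data = bytearray()
    while True:
        for _ in range(count):
            data.append(next(it))
        if last:
            break
        count, last = handle_status()
    return load_address, bytes(data)

# A three-byte program loading to $0801, sent as one short block.
addr, data = parse_burst_stream([0x1F, 3, 0x01, 0x08, 0xAA, 0xBB, 0xCC])
print(hex(addr), data.hex())   # 0x801 aabbcc
```

Unlike the assembly, this sketch does not honour the secondary address: it always reports the load address found in the file.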
Now that was it. Now I just hold back and wait until someone implements this for VIC-20's buggy 6522 chips so that I don't have to.. :-)
begin 644 burster-cia1
M`0@+".\`GC(P-C$```"B5[U"")VG`LH0]Z+'O9D(G3,#RM#WK3`#KC$#R:S0[
M!.`"\!"-J@*.JP*IK(TP`ZD"C3$#8*D`3*7TR0#0^8:NA*^HL;NDK\DD\.K)Y
M.O#F>*`!C`3<B(P%W*G!C0[<+`W<J0B-#-PL#=SP^ZDEC03<J4"-!=RI@4PTB
M`ZD(+`W<\/NM`-U)$(T`W:T,W&"-#MP@K_6EMTBEN86EJ0"%MZD/A;FI[X6X-
M(,#_C<0#:(6WL&NB[R#)_XW$`[!<H`.Y]P,@TO^(T/>QNR#2_\C$M]#V(,S_R
M>""%[BP-W"#S`B#,`T@@[`*J(.P"J*6ET`2&KH2OHOQHT`@@WP,@S`/P^*H@@
MWP,@A>ZI[R##_QB0$FAH((7NJ02-Q`.I[R##_ZD%.*:NI*]88"#L`LD?T`-,A
G[`+)`K#:HOZI`&`@[`*@-(0!H`"1KJ`WA`'FKM`"YJ_*T.A@'S!5/
``
end
size 354

begin 644 burster-cia2
M`0@+".\`GC(P-C$```"B2[U"")VG`LH0]Z+)O8T(G3,#RM#WK3`#KC$#R:S0E
M!.`"\!"-J@*.JP*IK(TP`ZD"C3$#8*D`3*7TR0#0^8:NA*^HL;NDK\DD\.K)Y
M.O#F>*`!C`3=B(P%W:G!C0[=+`W=J0B-#-TL#=WP^TPT`ZD(+`W=\/NM`-U)T
M$(T`W:T,W6"I@(T.W2"O]:6W2*6YA:6I`(6WJ0^%N:GOA;@@P/^-Q@-HA;>PZ
M:Z+O(,G_C<8#L%R@`[GY`R#2_XC0][&[(-+_R,2WT/8@S/]X((7N+`W=(.<"9
M(,X#2"#@`JH@X`*HI:70!(:NA*^B_&C0""#A`R#.`_#XJB#A`R"%[JGO(,/_*
M&)`2:&@@A>ZI!(W&`ZGO(,/_J04XIJZDKUA@(.`"R1_0`TS@`LD"L-JB_JD`H
=8"#@`J`TA`&@`)&NH#>$`>:NT`+FK\K0Z&`?,%4"Y
``
end
size 344
by Ken Ross
petlibrary@bigfoot.com
http://members.tripod.com/~petlibrary
A recent query had me digging out an old item dealing with the user port on the CBM/PETs. The main use I've put it to in the past has been to drive a parallel printer with just the addition of a home brew cable (a Panasonic Daisy Wheel printer salvaged before bin men got it!). The user port is the edge connection between the IEEE edge and the cassette #1. The top side is mostly diagnostic, the underside is the easy to use area. It's an I/O (Input/Output) system that you can control with a few PEEKs and POKEs. Reading from left to right (as you look at the back of the beastie):
A - ground
B - input to 6522 VIA, CA1
C D E F H J K L - I/O lines (8 of them), PA0-7 [data lines]
M - CB2 line from VIA, can be I/O
N - ground
A text file to be printed out can be read a character at a time with MID$(etc) for this PRG to deal with and quite high speeds can be reached even without having to compile it.
(This is actually a section of listing just printed out from my 8096 - hence untidy numbers.)
3010 POKE 59459,255 :REM make PA0-7 into outputs
3020 POKE 59467,PEEK(59467) AND 227 :REM disable shift register
3022 RETURN :REM finished with this sub
[this enables the user port for this purpose]
3023 REM this sub puts the data (held in D) into output
3024 IF D<32 THEN GOTO 3080 :REM line does biz for LF & CR
3026 IF D=>65 AND D<=90 THEN D=D+32 :GOTO 3029
[petscii lower case is chr$(65-90) but ascii uses 97-122]
3027 IF D=>193 AND D<=218 THEN D=D-128 :GOTO 3029
[petscii upper case is chr$(193-218) which has to be shifted to ascii 65-90]
[ascii uses up to 127 but petscii uses up to 255 for chars]
3029 REM line below sets strobe low to inform printer new data character on way
3031 POKE 59468,PEEK(59468) AND 31 OR 192
3035 REM below sets strobe high as data arrives
3045 POKE 59468,PEEK(59468) AND 31 OR 224
3050 POKE 59471,D :REM at last data is POKE'd !!!
[the data numbers from above]
3060 POKE 59468,PEEK(59468) AND 31 OR 224 :REM strobe high still
3065 REM handshake sub
3066 POKE 59467,PEEK(59467) OR 1
3067 WAIT 59469,2
3068 K=PEEK(59457)
3069 REM end of handshake sub
[well it works for me!!]
3070 RETURN :REM back to main area for next data
3080 REM bit for LF & CR sub & return
[this depends on the printer and the same procedure for paper eject if needed]
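The case conversion done in lines 3026 and 3027 may be easier to check away from the PET. Here is a small Python sketch of the same mapping; petscii_to_ascii is an invented name, and codes outside the two letter ranges are simply passed through, whereas the listing sends anything below 32 to its own LF/CR handling.

```python
def petscii_to_ascii(code):
    """Map one PETSCII code to ASCII the way lines 3026-3027 do."""
    if 65 <= code <= 90:        # PETSCII lower case -> ASCII 97-122
        return code + 32
    if 193 <= code <= 218:      # PETSCII upper case -> ASCII 65-90
        return code - 128
    return code                 # everything else is left alone in this sketch

print(petscii_to_ascii(65), petscii_to_ascii(193), petscii_to_ascii(48))   # 97 65 48
```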
The cable connections are:
CBM     - CENTRONICS
CB2     - DATA STROBE #1
PA0~7   - DATA1-8 #2-9
CA1     - ACKNOWLEDGE #10 (or BUSY #11 depending on printer!)
GND     - grounds #14, 16, 24, 33, chassis gnd 17
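Since CB2 is the printer's strobe line, the POKEs to 59468 in the listing are what actually pulse it. The bit arithmetic behind "AND 31 OR 192" and "AND 31 OR 224" can be sketched in Python like this; set_cb2 and the constant names are made up for illustration, and the starting register value is arbitrary.

```python
PCR = 59468            # VIA peripheral control register, as POKEd in the listing
CB2_LOW = 192          # %110xxxxx: CB2 output held low
CB2_HIGH = 224         # %111xxxxx: CB2 output held high

def set_cb2(pcr_value, mode):
    """Same as PEEK(59468) AND 31 OR mode: keep bits 0-4, replace bits 5-7."""
    return (pcr_value & 31) | mode

old = 0b01001110       # arbitrary example contents of the register
print(format(set_cb2(old, CB2_LOW), "08b"), format(set_cb2(old, CB2_HIGH), "08b"))
# 11001110 11101110
```

Masking with 31 first is what keeps the CA1/CA2/CB1 settings in the lower five bits untouched while the strobe is toggled.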
More modern printers will also need additional commands to enable things. The commands needed for Epson printers (with the exception list of Epsons that don't use them !) are on my website at:
http://members.tripod.com/~petlibrary/printesc.htm
If any more info turns up it'll be there in time.
By Jolse Maginnis
Some readers may have read my article in GO64 issue 8/1999, which was a bit of an introduction to JOS and some Operating System concepts, but it wasn't very technical, and didn't really get into the nitty gritty. Getting down and dirty with the bits and bytes is what C=Hacking is all about, so that's what this series of articles will try to do wherever possible.
I'll try to go into detail about modern OS designs, paying particular attention to what is relevant to the C64/SuperCPU and what we can do without. I'll also try and make comparisons to the kind of coding most of us are used to, e.g. just using the kernel to access hardware, or just skipping the kernel altogether. Most of the article will be in reference to the SuperCPU, specifically its 65816 CPU, and the OS I'm making for it, called JOS. If you haven't got a SuperCPU yet, hopefully you'll want one by the end! (Remember it won't stop you running stock programs!)
When I first heard about the SuperCPU, I got pretty excited. "20MHz! That's 20 times faster! 16MB! That's 256 times more RAM! I can only imagine what it's capable of!" Well, I didn't actually say those things, but I at least thought them! At the time I had already started making an OS for the C64, and I didn't know much at all about making an OS; all I knew about was multitasking, and how to do it on the C64. After that day, I decided I'd wait until I managed to get myself a SuperCPU and make an OS on that, and to my surprise, at that time, there didn't seem to be anyone else developing an OS for the SuperCPU.
Only when the SCPU arrived and I had started coding for it, did I realise how powerful it was. Yeah it's 20 times faster in clock speed, but it's also a 16 bit processor, which might not seem like a great step up, but once you start coding in 16 bits, it's hard to see how you did without it!
The 65816 has some great advantages over the 6502:
The top three things in particular, together with the 16 bit wide registers means it's very suited to programming in a high level language like C, particularly when compared to code that has to be produced for 6502. Higher level languages can actually use the real CPU stack rather than having to simulate it, as with 6502. Also by moving the Direct Page register, local variables can be accessed like zero page variables, so performance isn't hurt too much.
All this would be good even at a lower speed like 1 or 2MHz, but it's at 20! The SuperCPU adds some real power to your old C64, but it's all hidden away because we're running a ~20 year old "OS". It's just crying out for a new one!
The C64 has many limitations, most of which are imposed by the kernel and the CBM serial bus. Here's a list of the main limits:
Single Tasking - Running two separate programs at the same time is impossible.
Some devices aren't catered for - Some devices don't have a chance at running with old programs that were designed before their time.
Old sequential filesystem - It's not designed for random access files, although random access is possible, it's just slower. All C64 programs have to be written so that files are read from the beginning to the end, which is a little bit limiting. Also it's the drives that dictate the filesystem, so we aren't just stuck with the kernel's limits, we're stuck with the drives' as well. Having several files open on many drives, while reading and writing to all of them, just isn't a possibility. Why would you want to do that? If you were multitasking several programs, that just might be what happens!
It became pretty clear that the C64's kernel was of no use to JOS, since it had too many limitations. So everything had to be re-written from scratch, with the limits removed.
Along with re-doing the filesystem and adding multitasking, I had some other plans for JOS:
Networking - Everything is internet, internet, internet these days, and why not, the internet is great! So TCP/IP and SLIP/PPP were high on the list of TODO's.
GUI - The SuperCPU is ideal for a nice, flexible, easy to program GUI.
Console - I wanted the console to be as close as possible to one of the standard terminals (vt100,ansi etc..) thus making it easy to get by without needing a terminal emulation program.
Shared libraries & shared code, relocatable binary format - Sharing as much code as possible really saves memory and loading time. The binary format means that you don't have to worry about where in memory your program will be.
Modular and scalable - It's nice to be able to choose exactly what your OS needs, rather than getting lumped with it all. E.g. Do you really need tcp/ip loaded if you're not going to use the internet? If I'm running a webserver, do I really need the console driver loaded?
Device independence - Applications should not have to worry at all about what devices they are using, which means that they'll be compatible with any device, including new ones. This is particularly useful when it comes to disk drives and filesystems.
Porting and writing C programs - Wouldn't it be great if our C64's could take advantage of the Open Source movement that's sweeping the world, and compile some of these open source programs?
OK, so why am I bothering? At first I just wanted to see what I could do with it, but now that it's come so far, it's not only of interest to me, as it's become a very powerful OS.
Unless you've been living on a remote desert island for the last 5 years, you'll know about the terrible trend in personal computing these days; buy a new PC now and in 6 months or less it's outdated. As CBM users, we successfully avoid all this. Sure, CMD have tonnes of upgrades available, but they're all "once in a lifetime" upgrades, I'm pretty sure I wont be upgrading my SuperCPU!
Have you ever thought about why PC's become outdated so quickly? It's very popular to blame Microsoft (and I will!), since they are the main proponent of bloat with their ever expanding OSes and applications, but it's just generally accepted now that it's ok to leave things unoptimized, and just add more and more "layers". I run Linux on my 486 PC, with 10mb of RAM, and it's unbelievable how much time is spent "chunking" or "thrashing", due to programs and their components taking up so much RAM. For me, it's all about layers. It's what separates C64's from the bloated world of the PC. Here's my comparisons...
CPU Type
--------
PC           - 32 bit processors
C64/SuperCPU - 8/16 bit Processor
This is quite arguable, but when most of your code doesn't deal with numbers over 32768, 32 bits can be a bit wasteful. Of course, if you need to do 32 bit arithmetic on an 8 or 16 bit processor, that too is wasteful. For me a 16 bit processor is the ideal size, particularly after doing lots of 8 bit coding.
Language used
-------------
PC           - Mainly C, C++
C64/SuperCPU - Just about everything in Assembler
C can be a thin layer or a thick layer, depending on the processor. On 6502 it's quite a thick layer, which is why most things for C64 were written in ASM. On 65816, that layer isn't so thick, so it's a much more viable alternative. Although, when you write in a higher level language, you tend to forget about the actual code it produces, and don't bother optimizing it. C++ adds another layer onto C, not only because of the code it produces, but the style of program. Good object oriented programming practice adds extra bloat, because there is more emphasis on doing function calls, to do things that ordinarily are done by directly accessing the data. The real bloat of Object Orientation isn't actually the code that you write yourself, you can still write optimized code in an OO language, but the bloat is in the libraries of objects that you use when writing your application, take a look at JAVA's huge object libraries for example.
OS type
-------
PC           - Multitasking OS
C64/SuperCPU - Kernel, or no OS at all.
A multitasking OS adds some layers by default, since it has to switch between processes. The OS isn't just the task switcher however, it's everything that's needed to run applications, such as device drivers and shared libraries. In my opinion, absolutely none or as little as possible of the OS should be written in a high level language, since it's going to be used by every application, and you want frequently used things to be as optimized as possible. Easily the most useful service an OS can provide is doing all the disk I/O. Unfortunately for us, the C64's kernel and CBM's serial bus are nowhere near fast enough, so coders made their own DOS routines.
User Interface
--------------
PC  - Windows, X Windows
C64 - BASIC, GEOS
Windows and X are the most popular GUI's going around. X doesn't impose any standards on applications, they are free to use whatever widget toolkits they want, and usually do! When you have a few different applications running, each with its own GUI toolkit, you soon run out of memory, particularly if they're big bloated C++ toolkits. Windows isn't quite the same, you at least have a consistent look and feel, which also adds up to less memory wastage because most apps use the same code. GEOS is nice looking but isn't very flexible at all, though this does mean that it's a very thin layer. My hope is to achieve a balance between the two.
So why'd I bother with all that? Well I just want to highlight that JOS will be taking all those things into account, and I want to minimize the number and size of layers being added to our beloved C64's.
There are two main styles of OSes doing the rounds at the moment, both with their own good and bad points.
Monolithic kernels, as the name suggests, are one large monolith of code, which usually contains driver code for all devices. You would definitely consider the C64's kernel a monolithic kernel. Multitasking kernels sometimes allow modularization, which is basically very similar to what a microkernel does, by allowing parts of the OS to be dynamically loaded. Linux is a very popular example of this. It's a monolithic kernel which allows kernel modules to be loaded dynamically. Last time I checked Lunix Next Generation worked along these lines.
e.g.
lda #'a'
jsr $ffd2
Prints 'a' character to the current file/device.
Microkernels truly are micro in size, if they're done correctly. Rather than lump all the device driver and API code in together, Microkernels only provide very simple services for setting up processes and allowing them to communicate with each other. All the device drivers and file-systems are then supplied by optional programs that are loaded dynamically at run time. This allows maximum scalability, as you simply don't have to load parts of the OS that you don't need. The best example other than JOS would be QNX (http://www.qnx.com), a UNIX based Microkernel OS, which is extremely scalable and very small in code size. On 6502/C64, OS/A65 is another Microkernel OS.
Microkernel OSes rely heavily on fast Inter Process Communication (IPC). Luckily this is quite easy to achieve on 65816, and is basically a matter of passing pointers between processes.
Doing this in JOS involves setting up a message somewhere in memory and then calling the S_send system call to send it to the server process. Usually the message will be put on the stack and then popped off when the call returns, much like a C function call.
e.g. to open the file "hello.txt" for reading
pea O_READ ; flags
pea ^hellostr ; high byte
pea !hellostr ; low word
pea IO_OPEN ; Message code
tsc
inc
tax ; Low word of Message = Stack+1
ldy #0 ; Stack is in Bank 0
lda #Channel ; Channel where "hello.txt" is.
jsr @S_send
tsc
clc
adc #8
tcs
hellostr .asc "hello.txt",0
note: These are 65816 instructions, so if you don't know what they do you better look them up! The '@' symbol is used to force long addressing, '^' is used for the high 8 bits of a 24bit address, and '!' is used as the bottom 16 bits. Note that pea is a 16-bit instruction, so pea ^hellostr will add an extra 00 byte.
The first 4 pea's prepare an 8 byte filesystem message, containing:
Message code for an Open:   IO_OPEN
24 bit Pointer to Filename: hellostr
Open flags for reading:     O_READ
This message is passed to the filesystem using one of JOS's Inter Process Communication (IPC) system calls, S_send. This call takes the 24 bit address of the message in X/Y, and the IPC channel to send the message to in the A register. Every system call in JOS assumes 16 bit A/X/Y registers, as there really isn't anything to be gained by switching to 8 bits for things that only need 8 bits. Adding 8 to the stack pointer at the end "pops" the message back off the stack.
This all looks a bit complicated doesn't it? Which is where shared libraries help out. The standard C library for JOS allows you to do I/O and such without actually worrying about the system calls. Yes it is a "layer", but it's a very thin one, since the library is written in ASM.
pea O_READ ; same as the c code: open("hello.txt",O_READ);
pea ^hellostr
pea !hellostr
jsr @_open
pla
pla
pla
Much simpler right?
Compare that with the C64 kernel equivalent of:
lda #namelen
ldx #<hellostr
ldy #>hellostr
jsr $ffbd ; SETNAM
lda #1
ldx #8
ldy #1
jsr $ffba ; SETLFS
jsr $ffc0 ; OPEN
Notice that the JOS version doesn't worry about device numbers or anything.. I'll get to that later...
Before I get into juicy OS details, I should explain about C and the standard C library, as I'll be mentioning it quite a bit.
C is a very powerful language that was created by the same people who created UNIX, so the two really go hand in hand. The majority of applications written for UNIX type OSes are written in C; in fact, rather than give you executable files, they are normally distributed as C source code that you have to compile yourself. Why is it used so much? Well if the only high level language you've seen is BASIC, then you'd wonder how any high level language could be used for good quality programs. C is different because it's just about as close as you can get to programming in assembly without actually doing it, particularly on newer processors. It isn't quite so pretty on 6502, but it's quite good on the 65816.
In BASIC you're used to having "built in" commands that will print to the screen, and commands for opening files and reading input, and any other I/O you can think of. But C on the other hand, has nothing "built in", it doesn't even have much of a notion of strings! Strings are just pointers to null terminated arrays of characters in C. So how do you actually get C to do anything useful? I.e. do some I/O?
This is where the C standard library comes in. This library contains functions that deal with the underlying OS, and in particular opening/closing & reading/writing files. It also has code for dealing with strings, allocating memory, reading directories and various other useful functions. The standard library also contains more UNIX orientated functions, for dealing with OS features such as IPC and process control (more on processes later).
JOS implements a large section of the standard C library, in particular the section that most command line applications will use. It does implement some of the UNIX specific functions, but not in a compatible way, and programs that use these functions are likely to be system applications that aren't useful for any other system anyway.
Although it's called the standard 'C' library, that doesn't mean it can't be used in assembly language, in fact it's quite a bit easier to call the C functions than to deal directly with the OS, and there is no speed penalty in using the C library because it's been hand coded in assembly language anyway.
Would you like to see what it's like to code using the standard C library? I've been talking about functions, and if you're familiar with C64 BASIC's functions, it's quite similar to that, except that you can pass more than one value to the function. It's basically the same as writing subroutines in assembly, where we usually pass values using the A, X & Y registers or a ZP value etc. The only difference is that ALL values are passed using the CPU stack, which is easily accessible with the 65816. Ok let's take a look at the previous open file example:
C code:
file = open("hello.txt",O_READ);
65816 assembly (16 bit regs):
pea O_READ
pea ^hellostr
pea !hellostr
jsr @_open ; C functions get "_" prepended to their names
pla ; so you don't get them mixed with assembly ones
pla
pla
stx file ; store the result in file
sty file+2
Notice that the values are placed onto the stack in reverse order, so they come out in the correct order when the function accesses them. The calls are also long jsr's, because the functions aren't likely to be in the same bank as the calling program.
You might think that having to pop the values back off the stack is cumbersome, and you're right. Why can't _open pop them off? Well it could, it'd need to do some messing around with the stack at the end but it'd make things look nicer. The reason it can't is because C functions don't always know how much data will be on the stack, so they might pop the wrong amount off. It may look ugly, but you get used to it.
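To make the "functions don't always know how much data will be on the stack" point concrete, here is a minimal sketch in standard C of a variadic function. The name sum_ints is a hypothetical helper for illustration only; it is not part of JOS or its C library.

```c
#include <stdarg.h>

/* Hypothetical example: adds up `count` ints passed after the count.
 * The function cannot check how many values the caller really pushed;
 * it simply trusts `count`. Only the caller knows for sure, which is
 * why the caller is the one that pops the arguments afterwards. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next int argument */
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) pushes four values while sum_ints(0) pushes one, yet both reach the same code; fprintf in the cat example works the same way.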
Now I'll give you a bigger example of what C code looks like after it's been compiled to prove that the 65816 is capable of producing half decent code. This will probably only make sense if you've done C programming before, so if you're not interested in this kind of thing skip this section.
Here's a minimal version of the standard unix util 'cat', which concatenates files together and sends them to the screen or whatever the stdout file is, as it can be redirected in UNIX.
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    FILE *fp;
    int ch=0;
    int upto=1;

    if (argc<2) {
        fprintf(stderr,"Usage: cat FILE ...\n");
        exit(1);
    }
    argc--;
    while(argc--) {
        fp = fopen(argv[upto++],"r");
        if (!fp) {
            perror("cat");
            exit(1);
        }
        while((ch = fgetc(fp)) != EOF)
            if (putchar(ch) == EOF) {
                perror("cat");
                exit(1);
            }
        fclose(fp);
    }
    return 0;
}
and here's the (unoptimized) compiled version:
#define _AS sep #$20:.as
#define _AL rep #$20:.al
#define _XS sep #$10:.xs
#define _XL rep #$10:.xl
#define _AXL rep #$30:.al:.xl
#define _AXS sep #$30:.as:.xs
.xl ; make sure it's 16 bit code
.al
.(
mreg = 1
mreg2 = 5
.text
+_main
-_main:
.(
RZ = 8 ; RZ = register size: Two pseudo 32 bit registers
LZ = 26 ; LZ = Local size: size of the local variables for this
; function
phd
tsc /* make space for local variables */
sec
sbc #LZ
tcs
tcd /* set up the DP register as the frame pointer */
stz RZ+1 /* ch = 0; */
lda #1 /* upto = 1; */
sta RZ+7
lda LZ+6 /* if (argc < 2) NOTE: could be just */
.( /* cmp #2 : bpl L2 */
cmp #2 /* but the compiler doesn't know how far */
bmi skip /* away L2 is. */
brl L2
skip .)
pea ^L4 /* fprintf(stderr,"Usage: cat FILE ...\n"); */
pea !L4
pea ^___stderr
pea !___stderr
jsr @_fprintf
tsc
clc
adc #8
tcs
pea 1 /* exit(1) */
jsr @_exit
pla
L2:
lda LZ+6 /* argc-- NOTE: dec LZ+6 would be better! */
dec
sta LZ+6
brl L6
L5:
pea ^L8 /* This rather large bit of code is all for */
pea !L8 /* fopen(argv[upto++],"r"); */
lda RZ+7 /* arrays don't translate so well! */
sta RZ+9
lda RZ+9
inc
sta RZ+7
ldx RZ+9
lda #0
.(
stx mreg2
ldy #2
beq skip
blah asl mreg2
rol
dey
bne blah
skip ldx mreg2
.)
clc
tay
txa
adc LZ+8
tax
tya
adc LZ+8+2
sta mreg2+2
stx mreg2
lda [mreg2]
tax
ldy #2
lda [mreg2],y
pha
phx
jsr @_fopen
tsc
clc
adc #8
tcs
stx RZ+11
sty RZ+11+2
ldx RZ+11 /* assign it to fp */
lda RZ+11+2
sta RZ+3+2
stx RZ+3
.( /* if (!fp) */
lda RZ+3
cmp #!0
bne made
lda RZ+3+2
cmp #^0
beq skip
made brl L13
skip .)
pea ^L11 /* perror("cat"); */
pea !L11
jsr @_perror
pla
pla
pea 1 /* exit(1) */
jsr @_exit
pla
brl L13
L12:
pei (RZ+1) /* putchar(ch); */
jsr @_putchar
pla
stx RZ+15
lda RZ+15 /* if (putchar(ch) == EOF) */
.(
cmp #-1
beq skip
brl L15
skip .)
pea ^L11 /* perror("cat"); */
pea !L11
jsr @_perror
pla
pla
pea 1 /* exit(1) */
jsr @_exit
pla
L15:
L13:
pei (RZ+3+2) /* fgetc(fp); */
pei (RZ+3)
jsr @_fgetc
pla
pla
stx RZ+17 /* ch = fgetc(fp); */
lda RZ+17
sta RZ+1
lda RZ+17 /* while ((ch = fgetc(fp)) != EOF) */
.(
cmp #-1
beq skip
brl L12
skip .)
pei (RZ+3+2) /* fclose(fp); */
pei (RZ+3)
jsr @_fclose
pla
pla
L6:
lda LZ+6 /* while(argc--) */
sta RZ+9
lda RZ+9
dec
sta LZ+6
lda RZ+9
.(
cmp #0
beq skip
brl L5
skip .)
ldx #0 /* return from main() */
L1:
tsc
clc
adc #LZ
tcs
pld
rtl
.)
.text
-L11 .asc "cat",0
-L8 .asc "r",0
-L4 .asc "Usage: cat FILE ...",10,0
.)
As you can see, there's still quite a bit to be optimized as far as the compiler is concerned, but the code is still quite good.
Having a C compiler and a standard C library that contains the most used standard functions goes a long way towards being able to port applications from UNIX and similar environments. So what I've done is create a 65816 backend for a free ANSI C compiler called LCC.
I'm no longer talking theory here either, since a little while ago I decided to give my standard C library and the compiler a test on portability, with some great results. I've managed to do extremely simple porting jobs on: Pasi's C versions of his gunzip and puzip, Andre Fachat's XA 6502/65816 cross assembler, and Marco Baye's ACME cross assembler. All of these, besides ACME, so far seem to be working exactly how they should. There'd be thousands of open source programs that could easily be ported to JOS, many of which wouldn't be of much use to anyone, but still!
We've all had experience with multitasking so I won't bore you too much. For our purposes, it means being able to do several things at once.
But what actually is a "thing"? They're usually called "processes" or "tasks". I usually call them processes, so that's what I'll refer to them as.
There are two main types of multitasking, pre-emptive and co-operative. The latter is as you would expect, processes need to co-operate together in order to work, processes can't "do their own thing". Pre-emptive multitasking is the more flexible approach, because processes don't need to explicitly hand over the processor to another process, they just have it taken away from them if they use it for too long. So it was a pretty easy choice for which kind of multitasking JOS would have, pre-emptive of course!
You might think that the C64 already does multitasking because programs normally set up interrupt routines to go off during the processing of the program, so it can do more than one thing, but that's a very special case of what I'm referring to here. I'm referring to the ability to run separate unrelated programs at the same time, like reading your email, and typing in a text editor. We'd all like to be able to do that wouldn't we? Particularly if we've got the processing power and RAM to do it, and the SuperCPU certainly does.
Each process "owns" resources. The resources I'm talking about are simply parts of the computer and OS like RAM, interrupts, kernel IPC objects, and some other things.
Along with the resources it owns, each process has a number of attributes. First of all it needs a unique identifier, so anything that wants to talk to it knows how to address it. In Unix-like systems, this is called a Process IDentification (PID). In JOS a PID is just a positive integer, simple.
Along with other processes being able to address it, the PID is used so that the OS can keep track of which resources the process actually owns, and when it exits (or is explicitly killed) the OS can free up those things and let other processes use them.
Processes can start other processes, so everything except the first process keeps track of who its parent was in its Parent PID (PPID). You may wonder what use it is to keep track of the parent? It's always been used in UNIX to set up IPC, but it really isn't needed in JOS, apart from cosmetic purposes, since JOS has better IPC mechanisms. That's the first example of "Just because it's in UNIX doesn't mean it's needed", and there are plenty of others.
In JOS, a process can own multiple "threads" of execution. Threads are most people's idea of what a process is: some code running.
Consider starting a C64 game, which has several different interrupt routines running concurrently. We certainly wouldn't consider each interrupt routine to be a separate program, and that's generally the idea behind threads, except threads are at the mercy of the pre-emptive scheduler. Almost the same result can be achieved by creating multiple processes, but why go to the hassle of loading and executing two tightly related processes with 1 thread each, when you can do the same thing with 1 process that has 2 threads? A good example of this is JOS's very own web server, which creates a new thread whenever a new connection has been established by a client.
Some new technologies are particularly keen on the use of threads, namely JAVA and the BeOS. A good example of using multiple threads is given by BeOS, which starts a separate thread for every window displayed on the screen, so it can update its on-screen appearance and remain responsive to the user, while also doing other processing.
Unix programs have generally just started other processes if they wanted to do two of their own things at once. Threads are much cleaner and nicer. Threads themselves have their own attributes, such as priority (the higher the priority the more processor time it's likely to get), state (whether they are running or waiting for something), stack and zero page space, and some other things.
I know I've mentioned that JOS uses pre-emptive multitasking, but that doesn't mean that doing:
    jmp *
is a good idea! Programs should still try and co-operate.
A typical menu program on C64 using the kernel has a structure something like this:
If you were to run this program on a multitasking system, it would chew up a lot of processing time and slow everything else down. Polling for input on a multitasking system is generally a bad thing, but blocking and waiting for input is a good thing. So instead it would be best to do:
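As a rough sketch of the difference, written in C rather than taken from JOS, the following compares a polling loop with a blocking wait against a simulated keyboard. FakeKeyboard, poll_for_key and block_for_key are all hypothetical names, and the "ticks" are just array entries standing in for scheduler time.

```c
#include <stddef.h>

/* Simulated keyboard: one entry per scheduler tick, 0 = no key pressed. */
typedef struct {
    const char *ticks;
    size_t len;
    size_t pos;
} FakeKeyboard;

/* Polling, like calling GETIN in a tight loop: every tick without a key
 * still costs CPU time. Returns the key (or 0 if the input runs out) and
 * adds the number of ticks spent running to *cpu_ticks. */
char poll_for_key(FakeKeyboard *kb, unsigned *cpu_ticks)
{
    while (kb->pos < kb->len) {
        char key = kb->ticks[kb->pos++];
        (*cpu_ticks)++;              /* we were running during this tick */
        if (key != 0)
            return key;
    }
    return 0;
}

/* Blocking: the thread is off the ready queue while nothing happens, so
 * only the tick in which the key actually arrives is charged to it. */
char block_for_key(FakeKeyboard *kb, unsigned *cpu_ticks)
{
    while (kb->pos < kb->len) {
        char key = kb->ticks[kb->pos++];
        if (key != 0) {
            (*cpu_ticks)++;          /* woken up to handle the key */
            return key;
        }
        /* no key: the thread stays blocked and others use this tick */
    }
    return 0;
}
```

With a key arriving on the fourth tick, the polling version is charged four ticks and the blocking version one; the other three are free for the rest of the system.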
Now this is the correct way to do it, as it only uses up cpu time when it's actually received some input. But what happens if every process is waiting? What runs then? Well there is a special process that runs when no other processes are, it's called the Idle process, and does what its name suggests, just sits there and idles. Here is the thread code that runs in my idle process:
nully jmp nully
For some reason I started calling it the Null process, and it's called that all throughout JOS...
I have introduced you to a couple of the main ideas behind multitasking, but wouldn't you like to know how it's done? Well here's how JOS does it..
For starters, since it's pre-emptive multitasking, JOS needs some way of interrupting the currently running process after it's consumed its allotted time. The C64 has 4 CIA timers capable of producing IRQ's and NMI's, and in JOS's case I've decided to let it use CIA 1 Timer A, which produces an IRQ. This of course means that a process could stop itself from being interrupted by doing an SEI, but if processes behave well that won't happen!
Rather than set this timer to the amount of time before a process should be pre-empted (called a "timeslice"), I double up the use of TIMER A as the system counter, which is used for timing another kind of process resource: timers. Timers can either count upwards, or downwards and give off an alarm. They really need a higher precision than a timeslice, so they set the timer to 20 milliseconds (about 1 PAL screen). The timeslice is then calculated as 3 counts of this timer i.e. 60 milliseconds. Why don't I use TIMER B for the system timer? Well, because I want to leave as many resources open for application and device driver processes.
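The arithmetic can be sketched in C as below. This is only a model of the idea, assuming one call per timer IRQ; TickState, tick_init, timer_tick and uptime_ms are hypothetical names, and only the 20 ms and 3-count figures come from the text.

```c
#define TICK_MS          20   /* timer period, from the text */
#define TICKS_PER_SLICE   3   /* 3 x 20 ms = 60 ms timeslice, from the text */

typedef struct {
    unsigned long system_ticks;  /* system counter: one count per 20 ms */
    int slice_left;              /* ticks left for the running thread */
} TickState;

void tick_init(TickState *t)
{
    t->system_ticks = 0;
    t->slice_left = TICKS_PER_SLICE;
}

/* Called once per timer IRQ. Returns 1 when the running thread has used
 * its whole timeslice and should be pre-empted, otherwise 0. */
int timer_tick(TickState *t)
{
    t->system_ticks++;               /* the counter always advances */
    if (--t->slice_left > 0)
        return 0;
    t->slice_left = TICKS_PER_SLICE; /* start a fresh slice for the next thread */
    return 1;
}

/* Time since tick_init, at the 20 ms resolution the timers get. */
unsigned long uptime_ms(const TickState *t)
{
    return t->system_ticks * TICK_MS;
}
```

One interrupt source serves both purposes here: every tick advances the 20 ms counter, and every third tick also signals a pre-emption.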
I mentioned that processes and threads each have their own attributes, these attributes are stored in Process Control Blocks (PCB's) and Thread Control Blocks (TCB's).
Every process has a PCB, and every process has at least one thread, which has its own TCB. There is one process which is always loaded, and that's the Null process. Each process's PCB and TCB's are kept in everyone's favourite data structure, the circular (or doubly) linked list. The Null PCB is always at the head of the PCB list, and PCB's will only ever be on this one list, since they are either alive (in the list), or dead (no PCB exists!).
Threads on the other hand can be in various states, but in particular they can be ready for the CPU, or waiting for something (blocked). When a thread is ready, it's just waiting for its turn at the CPU, and it goes on the Ready list, which is a queue. The Null thread is ALWAYS at the back of this queue, so it only gets to run if nothing else can. The ordering of this queue is up to a part of the kernel called the Scheduler.
          Front                                   Back
       ------------       ------------       ---------------
  ---- | Thread A | ----- | Thread B | ----- | Null Thread | ----
  |    ------------       ------------       ---------------    |
  ---------------------------------------------------------------
Some OSes have complex schedulers which take in many parameters, like priority and various CPU time measurements. On multi-user OSes like UNIX, this is important because it wants to be "fair" to all processes. But for our purposes and many other OSes, it's usually a whole lot simpler than that: whichever process/thread has the highest priority can run. If two threads have the same priority, it normally comes down to "round robin" scheduling, where they just take it in turns. JOS doesn't even implement priorities properly yet, because they actually don't make much difference to the normal processing. At the moment it's just a simple round robin scheduler that doesn't care about priorities.
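A minimal C sketch of such a ready queue follows, assuming a circular doubly linked list in which the Null thread doubles as the list anchor so that it always stays at the back. The Thread and ReadyQueue types and the ready_* functions are hypothetical and are not JOS's actual TCB layout.

```c
typedef struct Thread {
    const char *name;
    struct Thread *next, *prev;
} Thread;

typedef struct {
    Thread null_thread;   /* always the last entry in the queue */
} ReadyQueue;

void ready_init(ReadyQueue *q)
{
    q->null_thread.name = "null";
    q->null_thread.next = &q->null_thread;   /* a list of one, linked to itself */
    q->null_thread.prev = &q->null_thread;
}

/* Put t at the back of the queue, but still in front of the Null thread. */
void ready_add(ReadyQueue *q, Thread *t)
{
    Thread *idle = &q->null_thread;
    t->prev = idle->prev;
    t->next = idle;
    idle->prev->next = t;
    idle->prev = t;
}

/* Take the thread at the front. If nothing else is ready, the Null thread
 * is returned and stays where it is. */
Thread *ready_pick(ReadyQueue *q)
{
    Thread *idle = &q->null_thread;
    Thread *t = idle->next;      /* front of the queue (the list is circular) */
    if (t == idle)
        return idle;
    t->prev->next = t->next;     /* unlink t */
    t->next->prev = t->prev;
    t->next = t->prev = t;
    return t;
}
```

Round robin then amounts to calling ready_pick, running the thread for one timeslice, and calling ready_add on it again if it is still able to run.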
What if a thread is blocked? It'll go onto a wait queue, and will return to the ready queue only when it's ready to run. At this stage of JOS, the only thing a thread will need to block for is IPC.
You may be wondering about the issue of relocatable code, since as we all know neither the 6502 nor the 65816 is designed for running relocatable code. Sure, branches are relative to the PC, but nothing else is. So everything needs to be physically relocated before executing, and to do this properly without needing to code in a specific way, a relocatable binary format is needed. Fortunately for me, Andre Fachat had already designed such a format for OS/A65, and it fits JOS nicely because it includes 65816 extensions. Of course you need a special assembler to output this file format, which is where XA comes in. XA now even compiles for JOS, so self hosted development is now possible. The binary format will be talked about in greater detail in a future article.
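To give a feel for what "physically relocated" means, here is a deliberately simplified C sketch. It assumes an image assembled as if loaded at address 0, plus a plain list of offsets at which 16-bit absolute addresses are stored; this is NOT the actual layout of Andre Fachat's format, and relocate is a hypothetical function.

```c
#include <stddef.h>
#include <stdint.h>

/* Patch every 16-bit little-endian address listed in `fixups` by adding
 * the address the image was actually loaded at. */
void relocate(uint8_t *image, const size_t *fixups, size_t nfixups,
              uint16_t load_addr)
{
    size_t i;
    for (i = 0; i < nfixups; i++) {
        size_t off = fixups[i];
        uint16_t addr = (uint16_t)(image[off] | (image[off + 1] << 8));
        addr = (uint16_t)(addr + load_addr);
        image[off]     = (uint8_t)(addr & 0xff);  /* low byte first */
        image[off + 1] = (uint8_t)(addr >> 8);
    }
}
```

For example, the four bytes $4c $03 $00 $60 (jmp $0003 followed by rts) with a single fixup at offset 1 become $4c $03 $20 $60 when loaded at $2000. Branches need no entry in the list because they are already relative.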
Well, it's all very fine having a bunch of processes running, but that's no operating system. Who's looking after the devices? Who's managing the memory? And how do we ask the drivers to do something for us? It's all IPC...
Before I get into the specifics of IPC, I should give an idea of what typically happens when JOS boots. Because JOS has a very scalable microkernel design, it can load as many different device drivers and applications at boot time as it wants, and in fact they can be loaded and removed at any time. So there is no one bootup procedure in JOS. There are certain things that happen every time, however.
For starters, JOS has 2 system processes, which are always started at bootup. They aren't actually loaded off disk because they are part of the microkernel code. One is the memory manager and the other is the process manager.
The memory manager, as you would expect, manages all the memory, but it doesn't manage the Process space memory (Bank 0); that's the job of the Microkernel. Process space memory (or kernel memory) is where all the PCB's, TCB's, Stack space and Direct Page space are located. The Memory manager manages all the other RAM, e.g. RAM in Bank 1 and above. However, if there is no SuperRAM, it allocates 00e000-010000 as system RAM instead of using it as kernel space RAM, since it's more likely that you will run out of system RAM than kernel RAM.
I won't go into the specifics of the Memory Manager just yet, I'll just tell you that it performs the following requests:
All these things are requested via IPC, but there are Shared library routines (such as malloc, free, realloc etc) for preparing the right IPC messages to send.
The process manager's main functions are loading new processes + shared libraries, and looking up device drivers & file-systems. Whenever you open a file, you first must send a message to the process manager asking it where to send the open message.
The very first process to start however is called the "init" process (it's actually built into the microkernel, "init" isn't a filename), which starts the 2 system processes, then it starts a simple Ramdisk process and loads another process from the ramdisk called "initp".
The "initp" process should then load a proper filesystem and disk device driver, also from the ramdisk, "mount" this filesystem, and execute another file, this time called "init".
Note that "mounting" is preparing a filesystem for use, and all filesystems should actually be "unmounted" before switching off, because all changes may not be actually written to disk yet, even though the applications think they are. I'm guessing this is why Macintoshes refuse to let you take a disk out without the OS's permission!
The "init" file will usually be a shell script, and is responsible for starting up most of the drivers. A shell script, if you've never heard of it, is a file that has lists of commands to be run by the system, or more specifically the shell program. If you've ever seen MS-DOS .bat files, you'll know what I mean.
A typical init script has to load a user interface, unless of course you're using your machine as some kind of server, in which case you wouldn't need one and could save yourself a bit of memory!
The text based interface would require the console driver (con.drv), and the shell (sh). The console driver is capable of 4 virtual consoles, which you can switch between by pressing CBM and 1-4. This lets you exploit multitasking, as you could be running a different text app on each of the 4 screens. The shell is a pretty basic shell at the moment (like DOS's command.com), but it's enough to let you load and run any program. It also has support for pipes, but now I'm off topic..
The init script could instead load the GUI, which I'm sure most people would prefer to a text based interface!
The script should also load other drivers like: tcp/ip, ppp, digi sound driver, other filesystems, modem drivers, etc. Everything is of course optional, which is where Microkernels really excel over their monolithic counterparts.
Well that's what happens at boot time, but how do the drivers and the applications communicate? I've been mentioning "messages", and that's all that JOS's IPC is: message passing. Message passing is a fast and effective way to do IPC, and for a microkernel this is essential. I chose message passing because it's the most flexible method, and you can actually implement other types of IPC by using message passing.
You can think of message passing as an extended subroutine call, but rather than being a call to a subroutine, it's a call to another process. A process, or in particular a thread, can "send" a message to another thread, the other thread "receives" it, and then after it has processed it, "replies" to it.
You can't just send a message and expect it to be received straight away; the receiver has to be ready to receive it, which may not be straight away. If the receiver isn't ready, the thread that sent the message will block and wait until it's ready. Once the receiver has received it, it processes the message, and will issue a reply, which then unblocks the sender, which can then continue processing. This type of message passing is called "synchronous" message passing, as it requires synchronization between the two threads. It may help to think of "sending" as doing a JSR, "receiving" as the Program Counter being transferred to the routine, and "replying" as executing an RTS. It's a little more complicated than that, but essentially that's what it's like.
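The send/receive/reply handshake can be modelled as a small state machine. The C sketch below tracks only the states of one sender and one receiver; it does no real scheduling, and the Rendezvous type, the state names and the rv_* functions are assumptions made for illustration (the state names are borrowed from common microkernel usage, not from JOS).

```c
#include <stddef.h>

typedef enum { READY, SEND_BLOCKED, RECV_BLOCKED, REPLY_BLOCKED } ThreadState;

typedef struct {
    ThreadState sender;
    ThreadState receiver;
    const void *pending;   /* message sent but not yet received */
    int reply;             /* value handed back by the receiver */
} Rendezvous;

void rv_init(Rendezvous *rv)
{
    rv->sender = READY;
    rv->receiver = READY;
    rv->pending = NULL;
    rv->reply = 0;
}

/* "send": the sender blocks until a reply arrives (the JSR). */
void rv_send(Rendezvous *rv, const void *msg)
{
    rv->pending = msg;
    rv->sender = SEND_BLOCKED;
    if (rv->receiver == RECV_BLOCKED)
        rv->receiver = READY;     /* wake it so it can call rv_recv again */
}

/* "receive": returns the pending message, or NULL after marking the
 * receiver as blocked because nothing has been sent yet. */
const void *rv_recv(Rendezvous *rv)
{
    const void *msg = rv->pending;
    if (msg == NULL) {
        rv->receiver = RECV_BLOCKED;
        return NULL;
    }
    rv->pending = NULL;
    rv->receiver = READY;
    rv->sender = REPLY_BLOCKED;   /* delivered; sender now waits for the reply */
    return msg;
}

/* "reply": unblocks the sender (the RTS). */
void rv_reply(Rendezvous *rv, int value)
{
    rv->reply = value;
    rv->sender = READY;
}
```

Whichever side arrives first waits for the other, and the sender only becomes READY again once rv_reply has run.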
There is a great description of this kind of IPC at http://www.qnx.com/ in their technical section, with diagrams and all -- highly recommended!
Normally, OSes have to copy messages between processes, because each process gets its own address space, and can't view the memory of other processes. But as we know, the 65816 doesn't have an MMU so all memory is shared, which means that messages don't need to be copied, which gives it a significant speed increase over message passing in OSes with MMU's. Of course it does mean that processes can accidentally screw up another process's memory, but who cares! :)
All messages in JOS are directed at Channels. Channels are a resource that allows threads to receive messages from other threads. Generally device drivers register a channel and use it to receive requests from applications. Channels are referred to by number; the only channels that have fixed numbers are the memory manager (0) and the process manager (1). All other channels are looked up by sending a message to the process manager's channel, i.e. Channel 1.
What exactly is a message? All the JOS system calls for IPC just deal with 24 bit pointers to messages, and the actual message data itself can be anything! However the first byte of the message should be the message code, and always is in JOS system messages. You could of course make your own protocol for your own IPC, but it's probably not a good idea.
Each different kind of driver has its own set of message codes..
#define PROCMSG $80
#define MEMMSG $40

#define MMSG_Alloc 0+MEMMSG
#define MMSG_AllocBA 1+MEMMSG
#define MMSG_Free 2+MEMMSG
#define MMSG_Left 3+MEMMSG
#define MMSG_Large 4+MEMMSG
#define MMSG_LeftK 5+MEMMSG
#define MMSG_LargeK 6+MEMMSG
#define MMSG_KillMem 7+MEMMSG
#define MMSG_Realloc 8+MEMMSG

#define PMSG_Spawn PROCMSG+0
#define PMSG_AddName PROCMSG+1
#define PMSG_ParseFind PROCMSG+2
#define PMSG_FindName PROCMSG+3
#define PMSG_QueryName PROCMSG+4
#define PMSG_Alarm PROCMSG+5
#define PMSG_KillChan PROCMSG+6
#define PMSG_WaitPID PROCMSG+7
Those are the messages defined for the Process manager and Memory manager. Each message code defines its own structure, for example the MMSG_Alloc message has the structure:
.word MMSG_Alloc
.word !Size
.byte ^Size,0
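As a sketch of how that structure lays out in memory, the C function below fills a 6-byte buffer byte by byte. The code value and field order come from the definitions above; build_alloc_msg itself is a hypothetical helper, not a JOS library routine.

```c
#include <stdint.h>

#define MEMMSG     0x40
#define MMSG_Alloc (0 + MEMMSG)

/* Lay out:  .word MMSG_Alloc / .word !Size / .byte ^Size,0
 * The 65816 is little-endian, so low bytes come first. */
void build_alloc_msg(uint8_t msg[6], uint32_t size)
{
    msg[0] = MMSG_Alloc & 0xff;         /* message code, low byte */
    msg[1] = (MMSG_Alloc >> 8) & 0xff;  /* message code, high byte (0) */
    msg[2] = size & 0xff;               /* !Size: bottom 16 bits */
    msg[3] = (size >> 8) & 0xff;
    msg[4] = (size >> 16) & 0xff;       /* ^Size: top 8 bits of 24 */
    msg[5] = 0;                         /* padding byte */
}
```

A request for $012345 bytes therefore appears in memory as 40 00 45 23 01 00, and the message code can be read back from the first byte, as the receiving loop expects.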
The message codes $e0-$ff are left for processes that want their threads to communicate with each other.
Anything that wants to receive messages needs to have some code like this:
jsr @S_makeChan ; make a channel System call
sta Chan ; save it
loop lda Chan ;
jsr @S_recv ; receive a message from channel
stx MsgP ; Save X/Y in MsgP
sty MsgP+2 ; MsgP is a zero page variable
sta RcvID ; Save RcvID - for replying
lda [MsgP]
and #$ff ; 8 bit message code
cmp #MSGCODE ; check which type
beq processMes ; and process it
cmp #MSGCODE2
beq processMes2
...
ldx #-1 ; replying with $ffff in X and Y
txy ; means "message not understood"
lda RcvID
jsr @S_reply ; reply and loop back for more messages
bra loop
All device drivers have a message loop like that, which forces them to be modular, and thus easier to code.
Ok now let's see what sending a message would look like:
lda #PROC_CHAN
ldx #!Message
ldy #^Message
jsr @S_sendChan ; Send the message
...
Message .word PMSG_WaitPID,2 ; Wait for PID 2 to finish.
*note: it's generally a good idea to put messages on the stack, rather than use global variables, since using the stack is thread safe. No other thread will accidentally wipe over the message because they each have their own stack.
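To make the send/receive/reply cycle concrete, here is a toy Python model of it. This is only a sketch of the idea, not how JOS implements it: the `Channel` class, the `MSG_DOUBLE` code (picked from the private $e0-$ff range) and the use of Python threads and queues are all assumptions made for the illustration. The point is that the sender blocks until the server replies, and that the message object is handed over by reference rather than copied.

```python
import queue
import threading

NOT_UNDERSTOOD = 0xFFFF  # stands in for the "$ffff in X and Y" reply
MSG_DOUBLE = 0xE0        # hypothetical private code from the $e0-$ff range


class Channel:
    """Toy stand-in for a JOS channel: senders block until replied to."""

    def __init__(self):
        self._inbox = queue.Queue()

    def send(self, message):
        reply_box = queue.Queue(maxsize=1)
        self._inbox.put((message, reply_box))  # pass a reference, no copy
        return reply_box.get()                 # synchronous: wait for reply

    def recv(self):
        return self._inbox.get()  # -> (message, handle used for replying)


def server_loop(channel):
    """Receive, dispatch on the first byte, reply, and loop for more."""
    while True:
        message, reply_box = channel.recv()
        code = message[0]  # first byte is the message code
        if code == MSG_DOUBLE:
            reply_box.put(message[1] * 2)
        else:
            reply_box.put(NOT_UNDERSTOOD)


channel = Channel()
threading.Thread(target=server_loop, args=(channel,), daemon=True).start()

print(channel.send(bytes([MSG_DOUBLE, 21])))  # 42
print(hex(channel.send(bytes([0x01]))))       # 0xffff
```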
Just about everything that you consider an OS to be is done in JOS via IPC. This includes file operations, such as opening and closing, reading and writing files. How does the filesystem driver know which file you want to access after you've opened it? It could include a connection number in the IO_READ and IO_WRITE messages (you guessed it, the message codes for reading and writing!). That's a little cumbersome, though. There is a better solution: connections.
What is a connection? It's a kernel object which keeps track of the destination channel of the messages directed at it. It also has an ID associated with it, so server processes can tell which file, for example, it refers to. Each process has a so-called "file descriptor list" associated with it, which people who know UNIX programming will recognize. In JOS, this list is really just a connection table: an array of connection numbers which the process can access. Each element in the array can point to any connection number, which means that two file descriptors can actually point to the same file, and in the case of the first three they usually do. The first three are STDIN, STDOUT & STDERR, and they usually point to the screen, but not always!
An example File Descriptor list: (0 = no connection)
  0   1   2   3   4   5   6   7   8   9  ....  32
------------------------------------------------------------------
| 1 | 1 | 1 | 2 | 3 | 0 | 0 | 0 | 0 | 0  ....                    |
------------------------------------------------------------------
E.G.
Connections are global objects, and whenever a process is loaded, it inherits its file descriptor table from the parent, which is how it receives its STDIN, STDOUT and STDERR. File descriptors can also be explicitly redirected to other connections, or just not inherited at all. This is how JOS performs shell redirection.
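The inheritance and redirection just described can be sketched in a few lines of Python. This is an illustrative model only: the table size of 32, the `spawn` helper and its `redirect` argument are assumptions for the example, not the JOS interface.

```python
FD_COUNT = 32          # assumed table size for this sketch
NO_CONNECTION = 0      # 0 = no connection, as in the example list
STDIN, STDOUT, STDERR = 0, 1, 2


def new_fd_table():
    """An empty file descriptor list: every slot unconnected."""
    return [NO_CONNECTION] * FD_COUNT


def spawn(parent_fds, redirect=None):
    """Return the child's table: a copy of the parent's, plus redirections."""
    child_fds = list(parent_fds)
    for fd, connection in (redirect or {}).items():
        child_fds[fd] = connection
    return child_fds


shell_fds = new_fd_table()
shell_fds[STDIN] = shell_fds[STDOUT] = shell_fds[STDERR] = 1  # the console
shell_fds[3] = 2
shell_fds[4] = 3

# Something like "prog > file": STDOUT points at connection 3 instead.
child_fds = spawn(shell_fds, redirect={STDOUT: 3})
print(child_fds[:5])  # [1, 3, 1, 2, 3]
print(shell_fds[:5])  # [1, 1, 1, 2, 3]
```

Because the table only holds connection numbers, redirection never touches the program being run; it just sees a different connection behind descriptor 1.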
I've discussed JOS's synchronous message passing, but what happens if you don't want to block and wait for a reply? You might just want to notify a server that an event has occurred, and don't need to know if it received it, nor what it thinks about it.
In this case you can send a pulse. A pulse is a tiny message (just 4 bytes) which doesn't require a reply. Probably the best property of pulses is that they can be sent during an interrupt. A good example is the console driver, which implements virtual consoles. It starts an interrupt routine which scans the keyboard, checks for the CBM key plus 1-4, and then sends a pulse to the driver's channel telling it to switch consoles.
By now you might be thinking "Microkernels must be real slow with all that process switching", but the switching code is pretty fast, particularly at 20 MHz. There isn't as much switching as you would expect either, considering that IO_READ and IO_WRITE messages deal with buffers as large as 64k, so it's not as if every single character requires a switch.
--------------------------------------------------
| Device Independence - Everything's a file! |
--------------------------------------------------
One of the major things that people who are learning UNIX have to learn is that
practically everything is a file. Devices such as the keyboard and screen (the
console) are accessed as files. Why, you may ask? Well, there isn't one
compelling reason, but it is handy to be able to access the console as a
file, especially for debugging. Take, for example, the ability to redirect screen
output to a file: a program doesn't have to be explicitly designed for that
if everything is a file, including the console. It's just a simple matter of
changing the output file.
Not only are devices files, but filesystems can be "mounted" on any directory,
which gets rid of the need for device numbers. Navigating through different
filesystems is just a simple matter of changing directories. It also means that
applications don't concern themselves with what the actual filesystem and device
is, just that it's there. So applications will work with any devices that have
drivers.
OK, so now you know some of the reasons behind "everything's a file". So how
is it done in JOS? I mentioned that the process manager is in charge of "looking
up" channels, but how does it perform this lookup?
The process manager contains a table with entries for file-systems, devices and
special processes. File-systems are names that end in a '/', device files
usually start with '/dev/' and special processes start with '*'. So the table
may look something like this:
Name            Channel Unit
/               2       1       ; file-system mounted at /
/usr/           2       2       ; file-system mounted at /usr/
*digi           3       0       ; digi driver
*tcpip          4       0       ; tcp/ip
/net/           4       1       ; tcp connections
*cbmfsys        5       0       ; the cbm file-system
*packet         6       0       ; the packet driver (ppp/slip)
/dev/null       1       0       ; the process manager handles
                                ; this
The name and channel fields are self-explanatory, but the Unit field allows a
channel to determine which of its names was used.
Whenever the process manager receives a request to look something up, it will,
depending on the type of request (special process requests are exempt), prepend
the process's Current Working Directory to the filename (unless the name starts
with a '/'), and then parse the name for '.' and '..' directories, which alter
the string.
So, for example, if you ask for the file "./hello/./../afile.txt" and your CWD is
"/usr/files/", it would be parsed as:
"/usr/files/afile.txt"
This string is then compared against the table to find the longest full match.
In this case it would find "/usr/" and return channel 2, unit 2, plus the string
"files/afile.txt", which is what is left over after subtracting the match.
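The lookup can be modeled in Python as follows. This is a sketch of the behavior described above, not the JOS code: the function names are invented, and the rule that names ending in '/' match as prefixes while all other names must match exactly is an assumption made to keep the example well defined.

```python
# (channel, unit) per name, copied from the example table above.
NAME_TABLE = {
    "/": (2, 1),
    "/usr/": (2, 2),
    "*digi": (3, 0),
    "*tcpip": (4, 0),
    "/net/": (4, 1),
    "*cbmfsys": (5, 0),
    "*packet": (6, 0),
    "/dev/null": (1, 0),
}


def normalize(name, cwd):
    """Prepend the CWD to relative names, then resolve '.' and '..'."""
    if not name.startswith("/"):
        name = cwd + name
    parts = []
    for part in name.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            if parts:
                parts.pop()
        else:
            parts.append(part)
    result = "/" + "/".join(parts)
    if name.endswith("/") and result != "/":
        result += "/"  # keep the trailing '/' of directory names
    return result


def lookup(name, cwd="/"):
    """Return (channel, unit, leftover string) for the longest match."""
    # Special process names ('*...') skip the CWD step.
    path = name if name.startswith("*") else normalize(name, cwd)
    best = None
    for entry in NAME_TABLE:
        # Assumption: names ending in '/' match as prefixes, others exactly.
        matches = path.startswith(entry) if entry.endswith("/") else path == entry
        if matches and (best is None or len(entry) > len(best)):
            best = entry
    if best is None:
        return None
    channel, unit = NAME_TABLE[best]
    return channel, unit, path[len(best):]


print(normalize("./hello/./../afile.txt", "/usr/files/"))  # /usr/files/afile.txt
print(lookup("./hello/./../afile.txt", "/usr/files/"))     # (2, 2, 'files/afile.txt')
print(lookup("/dev/null"))                                  # (1, 0, '')
```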
The great thing about this whole "pathname space" approach is that processes
don't necessarily need to know what they're dealing with, and pieces of the OS
can be loaded and unloaded at will for the ultimate in scalability and
modularity.
You might think that setting up the request and dealing with the responses,
every time you want to open a file is a bit tiresome, but it's all handled for
you with the "open" library call.
pea O_READ
pea ^devcon1
pea !devcon1
jsr @_open ; returns file number in x or -1 on failure
pla
pla
pla
...
devcon1 .asc "/dev/con/1",0
That's all for now. In the next article, I'll be writing about process
loading + shared libraries, networking, terminal IO (console + modems) + some
other things...
Hopefully you will have learned something from this article, and can see the
power that a real multitasking OS, such as JOS, can bring to the SuperCPU.
Any feedback goes to jmaginni@postoffice.utas.edu.au. I'm particularly on the
lookout for people who can help with hardware; docs, code etc...
Also, check the JOS homepage at http://www.jolz64.cjb.net/ and join the JOS
mailing list if you're interested in updates.
.......
....
..
. C=H 19
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
VIC KERNAL Disassembly Project - Part III
Richard Cini
September 1, 1999
Introduction
============
In the last installment of this series, we examined the two remaining
hard-coded processor interrupt vectors, the IRQ and NMI vectors. Although we
took a complete look at the routines, we did not examine some of the
subroutines that IRQ and NMI call. We'll examine these routines first.
Having completed the main processor vectors, we'll continue this
series by examining other Kernal routines.
Remaining Subroutines
=====================
The NMI and IRQ routines together call 11 subroutines. Five of these
we examined in Part I of this series, and two are calls through the NMI vectors
in the BASIC ROM and A0 Option ROM. So, let's examine the four remaining
subroutines.
UDTIM/IUDTIM
------------
The IRQ vector calls the update time function UDTIM through the
jump table at the end of the Kernal ROM, while the NMI function skips the
intermediate call through the jump table and directly calls the time function.
UDTIM:
FFEA 4C 34 F7 JMP IUDTIM ;$F734
F734 ;==========================================================
F734 ; IUDTIM - Update Jiffy Clock (internal)
F734 ; Called by IRQ; no params; no return
F734 ;
F734 IUDTIM
F734 A2 00 LDX #$00
F736 E6 A2 INC CTIMR2 ;bump timer tick
F738 D0 06 BNE UDTIM1 ;not 0, move on (no roll)
F73A E6 A1 INC CTIMR1 ;rolled-over, INC next reg
F73C D0 02 BNE UDTIM1 ;not 0, move on (no roll)
F73E E6 A0 INC CTIMR0 ;rolled-over, INC next reg
F740
F740 UDTIM1 ;done updating registers,
F740 ; check for 24hr roll
F740 ; A0-A2 hold max of 4F1A00
F740 38 SEC ;set carry
F741 A5 A2 LDA CTIMR2 ; get LSB
F743 E9 01 SBC #$01 ; minus 1
F745 A5 A1 LDA CTIMR1 ;
F747 E9 1A SBC #$1A ; minus 1Ah
F749 A5 A0 LDA CTIMR0 ;
F74B E9 4F SBC #$4F ; minus 4Fh
F74D 90 06 BCC UDTIM2 ; ok
F74F
F74F 86 A0 STX CTIMR0 ;24-hr roll-over, so reset
F751 86 A1 STX CTIMR1 ; registers to zero
F753 86 A2 STX CTIMR2
F755
F755 UDTIM2 ;no 24-hr rollover-continue
F755 AD 2F 91 LDA D2ORAH ;check for STOP key
F758 CD 2F 91 CMP D2ORAH
F75B D0 F8 BNE UDTIM2 ;not same, check again
F75D
F75D 85 91 STA STKEY ;same, save status and exit
F75F 60 RTS
UDTIM is called every 1/60th of a second by the IRQ routine, and
begins execution by incrementing each of the time-keeping registers in
the Zero Page locations $A0 to $A2. As each is incremented, it is checked
for roll-over (i.e., for the count exceeding the maximum allowed for the
register). Taken together, the three consecutive memory locations make up
the "jiffy clock" (as the VIC's RTC is sometimes called; a "jiffy" being
1/60 of a second).
At the label UDTIM1, the code checks for a 24hr roll-over. The three
byte-sized registers (no pun intended) can store the 24-hour jiffy count
of 5,184,000 decimal, or 4F1A00 hex. If the count exceeds this value, the
registers are reset to zero.
The BASIC TI function accesses the jiffy clock, representing the
count as a decimal number. Similarly, the TI$ function represents the jiffy
clock as a 24-hour HH:MM:SS clock instead of a jiffy count.
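The roll-over arithmetic is easy to check with a short Python model. This is a sketch of the logic in the listing, not of the ROM itself; the function names are invented, and the HH:MM:SS formatting simply follows the description of TI$ given above.

```python
JIFFIES_PER_SECOND = 60
JIFFIES_PER_DAY = 24 * 60 * 60 * JIFFIES_PER_SECOND  # 5,184,000 = $4F1A00


def tick(count):
    """Advance the 24-bit jiffy count by one, as IUDTIM does per IRQ."""
    count = (count + 1) & 0xFFFFFF
    # The SBC chain subtracts $4F1A01, so the carry is set (and the clock
    # is reset to zero) once the count goes past $4F1A00.
    if count >= JIFFIES_PER_DAY + 1:
        count = 0
    return count


def as_clock(count):
    """Format a jiffy count as HH:MM:SS, in the spirit of TI$."""
    seconds = count // JIFFIES_PER_SECOND
    return "%02d:%02d:%02d" % (seconds // 3600, seconds // 60 % 60, seconds % 60)


print(hex(JIFFIES_PER_DAY))                        # 0x4f1a00
print(tick(0x4F19FF) == 0x4F1A00, tick(0x4F1A00))  # True 0
print(as_clock(216000))                            # 01:00:00
```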
UDTIM is also responsible for processing the STOP key on behalf of
the IRQ and NMI routines, so if a user program handles either of these
interrupts, the programmer must remember to call UDTIM in order to maintain
the time clock and STOP key functionality.
CCOLRAM
-------
This short routine is responsible for determining the location of
the color ram. In the VIC, the screen and color memory locations change based
on the amount of RAM installed, as follows:
Function        Unexpanded          Expanded
--------        ----------          --------
User BASIC      $1000  00010000     $1200  00010010
Screen Memory   $1E00  00011110     $1000  00010000
Color RAM       $9600  10010110     $9400  10010100
The two least significant bits of the most-significant byte of the
screen memory pointer determine the color RAM location. If the bit
pattern of the screen memory is "10", the code sets the color RAM base
to page $96. If the bit pattern is "00", the code sets the color RAM
base to page $94.
The two other possible bit patterns result from screen memory
beginning at $1100 or $1F00, and produce color RAM locations of $9500
and $9700, respectively. The $1100 starting location will actually work,
but result in 256 bytes of wasted user RAM. The $1F00 starting location
will not work since the color RAM locations overlap the I/O Block 2
addresses, which have no RAM associated with them.
EAB2 ;==========================================================
EAB2 ; CCOLRAM - Calculate pointer to color RAM
EAB2 ;
EAB2 CCOLRAM
EAB2 A5 D1 LDA LINPTR ;get ptr to screen RAM LSB
EAB4 85 F3 STA COLRPT ;save it as color LSB
EAB6 A5 D2 LDA LINPTR+1 ;get screen RAM MSB
EAB8 29 03 AND #%00000011 ;mask bits 0-1
EABA 09 94 ORA #%10010100 ;OR with $94 to get color
EABA ; RAM pointer
EABC 85 F4 STA COLRPT+1 ;save as color ptr MSB
EABE 60 RTS ;exit
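The mask-and-OR step in CCOLRAM can be checked against all four bit patterns with a small Python model. The function name is invented for the illustration; the arithmetic is the AND #%00000011 / ORA #%10010100 pair from the listing.

```python
def color_ram_pointer(screen_pointer):
    """Mirror CCOLRAM: same LSB, MSB = (screen MSB AND %11) OR $94."""
    lsb = screen_pointer & 0xFF
    msb = ((screen_pointer >> 8) & 0x03) | 0x94
    return (msb << 8) | lsb


for screen in (0x1E00, 0x1000, 0x1100, 0x1F00):
    print("$%04X -> $%04X" % (screen, color_ram_pointer(screen)))
# $1E00 -> $9600
# $1000 -> $9400
# $1100 -> $9500
# $1F00 -> $9700
```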
ISCNKY
======
This is the low-level keyboard scan function which is called
60 times per second by the IRQ routine. ISCNKY scans the keyboard matrix
to retrieve a keypress, maps the key number to its ASCII equivalent, and
places the ASCII value at the end of the keyboard buffer. If IRQs are
disabled, the keyboard scanning is suspended. ISCNKY is accessible to user
programs through the Kernal jump table, although calling it with interrupts
enabled is not recommended.
To retrieve a character from the keyboard, a user program would typically
call GETIN ($FFE4), the buffered keyboard input routine. GETIN returns
the ASCII value of the character at the head of the keyboard buffer, or
zero if no character is available.
VIA2 is directly connected to the keyboard. Port B is used as the column
strobe and Port A is used as the row input. To read the keyboard matrix,
the code brings all column strobe lines to 0 and reads the row inputs, in
order, until a key is found (or not found). The code also begins decoding
the ASCII using the "unshifted" decoding table. Three other decoding tables
are for shifted, C= (Commodore) keys, and shift+C= keys.
EB1E ;===========================================================
EB1E ; ISCNKY - Scan keyboard
EB1E ; Scans keyboard for character. Called by IRQ routine.
EB1E ; ASCII value placed in keyboard buffer.
EB1E ISCNKY
EB1E A9 00 LDA #$00 ; set shft/ctrl flag to 0
EB20 8D 8D 02 STA SHFTFL
EB23 A0 40 LDY #$40 ; assume no keys pressed
EB25 84 CB STY KEYDN ; ($40=no keys)
EB27 8D 20 91 STA D2ORB ; bring all column bits low
EB2A AE 21 91 LDX D2ORA ; read row inputs
EB2D E0 FF CPX #$FF ; any character keys pressed?
EB2F F0 5E BEQ PROCK1A ; no, exit
EB31 A9 FE LDA #%11111110 ; begin testing at COL 0
EB33 8D 20 91 STA D2ORB ; output bit pattern
EB36 A0 00 LDY #$00 ; zero character count reg
; set default translation
; table to Table 1
EB38 A9 5E LDA #$5E ; low byte of KDECD1 ($EC5E)
EB3A 85 F5 STA KEYTAB
EB3C A9 EC LDA #$EC ; high byte of KDECD1
EB3E 85 F6 STA KEYTAB+1
EB40
EB40 ISCKLP1 ; begin testing loop
EB40 A2 08 LDX #$08 ; 8 rows to test in column
EB42 AD 21 91 LDA D2ORA ; get column
EB45 CD 21 91 CMP D2ORA ; test again - debounce
EB48 D0 F6 BNE ISCKLP1 ; not equal, retry
EB4A
EB4A ISCKLP2 ; got bit pattern
EB4A 4A LSR A ; shift through carry flag
EB4B B0 16 BCS ISCNK1+3 ; CY=1 for key not pressed
EB4D
EB4D 48 PHA ; save column bit pattern
EB4E B1 F5 LDA (KEYTAB),Y ; .Y is index into ASCII
EB4E ; translation table
EB50 C9 05 CMP #$05 ; ASCII >= 5, move on
EB52 B0 0C BCS ISCNK1 ; (<5=shft, C=, STOP, CTRL)
EB54
EB54 C9 03 CMP #$03 ; ASCII=3 STOP key
EB56 F0 08 BEQ ISCNK1 ; got STOP so skip flag updt
EB58
EB58 0D 8D 02 ORA SHFTFL ; save SHFT, CTRL, C= flag
EB5B 8D 8D 02 STA SHFTFL
EB5E 10 02 BPL ISCNK1+2 ; move on to next row in col
EB60
EB60 ISCNK1
EB60 84 CB STY KEYDN ; save key#
EB62 68 PLA ; restore col bit pattern
EB63 C8 INY ; increment key count
EB64 C0 41 CPY #$41 ; 64 keys scanned?
EB66 B0 09 BCS ISCNEXIT ; yes, return ASCII value
EB68
EB68 CA DEX ; go on to next row in col
EB69 D0 DF BNE ISCKLP2 ; {loop}
EB6B
EB6B 38 SEC ; done with first column, so
EB6C 2E 20 91 ROL D2ORB ; move on to next column
EB6F D0 CF BNE ISCKLP1 ; {loop}
EB71
EB71 ISCNEXIT ; function evaluation vector
EB71 6C 8F 02 JMP (FCEVAL) ; CINT1A points this to SHEVAL
EB71 ; the shift evaluation code
EB74 ;
EB74 ; Process key image
EB74 ;
EB74 PROCKY
EB74 A4 CB LDY KEYDN ; get key number (as index)
EB76 B1 F5 LDA (KEYTAB),Y ; convert key# to ASCII code
EB78 AA TAX ; copy ASCII code to .X
EB79 C4 C5 CPY CURKEY ; is it the same as the
; current character?
EB7B F0 07 BEQ PROCK1 ; yes, do repeat eval
EB7D
EB7D A0 10 LDY #$10 ; set repeat delay
EB7F 8C 8C 02 STY KRPTDL
EB82 D0 36 BNE PROCK4 ; not same key, so exit
EB84
EB84 PROCK1
EB84 29 7F AND #%01111111 ; test for {REVERSE}
EB86 2C 8A 02 BIT KEYRPT ; do test
EB89 30 16 BMI PROCK2 ; BIT7 set? reverse only
EB8B 70 49 BVS PROCK5 ; BIT6 set? alpha or reverse
EB8D
EB8D C9 7F CMP #$7F ; last non-revs'd character
EB8F
EB8F PROCK1A
EB8F F0 29 BEQ PROCK4
EB91
EB91 C9 14 CMP #$14 ; {DEL}?
EB93 F0 0C BEQ PROCK2 ; process {DELETE}/INS
EB95
EB95 C9 20 CMP #$20 ; {SPACE}?
EB97 F0 08 BEQ PROCK2 ; process {SPACE}
EB99
EB99 C9 1D CMP #$1D ; {<-}?
EB9B F0 04 BEQ PROCK2 ; process cursor right/L
EB9D
EB9D C9 11 CMP #$11 ; {CRS DN}?
EB9F D0 35 BNE PROCK5 ; process cursor down/U
EBA1
EBA1 PROCK2
EBA1 AC 8C 02 LDY KRPTDL ; get repeat delay
EBA4 F0 05 BEQ PROCK3 ; if 0, check repeat speed
EBA6
EBA6 CE 8C 02 DEC KRPTDL ; not done delaying, so exit
EBA9 D0 2B BNE PROCK5 ; {exit}
EBAB
EBAB PROCK3
EBAB CE 8B 02 DEC KRPTSP ; decrement repeat speed cnt
EBAE D0 26 BNE PROCK5 ; not done delaying, so exit
EBB0
EBB0 A0 04 LDY #$04 ; delay speed cnt reached 0,
; so reset speed count
EBB2 8C 8B 02 STY KRPTSP ; save it
EBB5 A4 C6 LDY KEYCNT ; get count of keys in kbd
; buffer
EBB7 88 DEY ; at least one, so exit
EBB8 10 1C BPL PROCK5 ; {exit}
EBBA
EBBA PROCK4
EBBA A4 CB LDY KEYDN ; get current key number
EBBC 84 C5 STY CURKEY ; re-save as current
EBBE AC 8D 02 LDY SHFTFL ; get current shift pattern
EBC1 8C 8E 02 STY LSSHFT ; save as last shft pattern
EBC4 E0 FF CPX #$FF ; re-check for any keys down
EBC6 F0 0E BEQ PROCK5 ; none, so exit
EBC8
EBC8 8A TXA ; restore ASCII code to .A
EBC9 A6 C6 LDX KEYCNT ; get count of keys in buffer
EBCB EC 89 02 CPX KBMAXL ; more than maximum allowed?
EBCE B0 06 BCS PROCK5 ; yes, drop current key press
EBD0
EBD0 9D 77 02 STA KBUFFR,X ; save ASCII code in buffer
EBD3 E8 INX ; increment buffer count and
EBD4 86 C6 STX KEYCNT ; save it
EBD6
EBD6 PROCK5
EBD6 A9 F7 LDA #$F7 ; clear bit for COL3 (STOP key
EBD8 8D 20 91 STA D2ORB ; is in COL3); save it to VIA
EBDB 60 RTS ; exit routine
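The core of the matrix scan (strobe one column low, shift the row bits out one at a time, count keys as you go) can be sketched in Python. This is a simplified model under stated assumptions: it ignores the debounce re-read, the modifier-key handling and the repeat logic, and `make_matrix` is a made-up stand-in for the VIA ports.

```python
NO_KEY = 0x40  # value ISCNKY stores in KEYDN when nothing is pressed


def make_matrix(pressed):
    """Build a fake row-input reader from a set of (column, row) pairs."""
    def read_rows(strobe):
        rows = 0xFF
        for column, row in pressed:
            if not (strobe >> column) & 1:  # this column line is driven low
                rows &= ~(1 << row) & 0xFF  # a pressed key pulls its row low
        return rows
    return read_rows


def scan(read_rows):
    """Return the key number of the last pressed key found, or NO_KEY."""
    if read_rows(0x00) == 0xFF:  # all columns low: is anything down at all?
        return NO_KEY
    key_down = NO_KEY
    key_number = 0
    for column in range(8):
        rows = read_rows(0xFF & ~(1 << column))  # $FE, $FD, $FB, ...
        for _ in range(8):
            if not rows & 1:  # LSR puts bit 0 in carry; 0 = key pressed
                key_down = key_number
            rows >>= 1
            key_number += 1
    return key_down


print(scan(make_matrix(set())))     # 64
print(scan(make_matrix({(2, 3)})))  # 19
```

The key number that falls out is simply column * 8 + row, which is the index ISCNKY later uses into the decoding table.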
Part of the keyboard scanning includes evaluating whether any
modifier keys are pressed. Modifier keys include the SHIFT, Commodore,
and CTRL keys. The ASCII decoding table is changed based on which of
these keys is pressed. It also looks like the following code went
through several revisions, considering the multiple patch areas (filled with
NOPs). Alternatively, these areas could support alternate decoding schemes
for different languages.
EBDC ;
EBDC ; Evaluate for shift/CTRL/Commodore keys
EBDC ;
EBDC SHEVAL
EBDC AD 8D 02 LDA SHFTFL ; 1=SHFT; 2=C> 4=CTRL
EBDF C9 03 CMP #$03 ; C> + shft?
EBE1 D0 2C BNE PROCK6A ; no, select proper decode
EBE3 ; table
EBE3 CD 8E 02 CMP LSSHFT ; is the pattern the same as
EBE6 F0 EE BEQ PROCK5 ; last one? Yes, exit.
EBE8
EBE8 AD 91 02 LDA SHMODE ; different pattern
EBEB 30 56 BMI PROCKEX ; {exit}
EBED
EBED EAEAEAEAEAEA .db $ea, $ea, $ea, $ea, $ea, $ea, $ea, $ea
EBF3 EAEA
EBF5 EAEAEAEAEAEA .db $ea, $ea, $ea, $ea, $ea, $ea, $ea, $ea
EBFB EAEA
EBFD EA EA EA .db $ea, $ea, $ea
EC00
EC00 AD 05 90 LDA VRSTRT ; get char ROM address
EC03 49 02 EOR #%00000010 ; flip between L/C and U/C
EC05 8D 05 90 STA VRSTRT ; ROMs
EC08
EC08 EA EA EA EA .db $ea, $ea, $ea, $ea
EC0C
EC0C PROCK6 ; proper ROM is set, so go
EC0C 4C 43 EC JMP PROCKEX ; on with key image process
EC0F
EC0F PROCK6A ; define correct decode table
EC0F 0A ASL A ; multiply index by 2
EC10 C9 08 CMP #$08 ; >= 8 (5 entries)?
EC12 90 04 BCC $+6 ; no, continue
EC14
EC14 A9 06 LDA #$06 ; yes, assume CTRL table
EC16
EC16 EAEAEAEAEAEA .db $ea, $ea, $ea, $ea, $ea, $ea, $ea, $ea
EC1C EAEA
EC1E EAEAEAEAEAEA .db $ea, $ea, $ea, $ea, $ea, $ea, $ea, $ea
EC24 EAEA
EC26 EAEAEAEAEAEA .db $ea, $ea, $ea, $ea, $ea, $ea, $ea, $ea
EC2C EAEA
EC2E EAEAEAEAEAEA .db $ea, $ea, $ea, $ea, $ea, $ea, $ea, $ea
EC34 EAEA
EC36 EA EA .db $ea, $ea
EC38
EC38 AA TAX ; reset pointer to point
EC39 BD 46 EC LDA KDECOD,X ; at right decoding table
EC3C 85 F5 STA KEYTAB ; .A is table index
EC3E BD 47 EC LDA KDECOD+1,X
EC41 85 F6 STA KEYTAB+1
EC43
EC43 PROCKEX
EC43 4C 74 EB JMP PROCKY ; continue processing image
EC46
EC46 ;========================================================
EC46 ; KDECOD - Pointers to keyboard decode tables
EC46 ;
EC46 KDECOD
EC46 5E EC .dw KDECD1 ;$EC5E Unshifted
EC48 9F EC .dw KDECD2 ;$EC9F Shifted
EC4A E0 EC .dw KDECD3 ;$ECE0 Commodore
EC4C A3 ED .dw KDECD5 ;$EDA3 Control
EC4E 5E EC .dw KDECD1 ;$EC5E Unshifted
EC50 9F EC .dw KDECD2 ;$EC9F Shifted
EC52 69 ED .dw KDECD4 ;$ED69 Decode
EC54 A3 ED .dw KDECD5 ;$EDA3 Control
EC56 21 ED .dw GRTXTF ;$ED21 Graphics/text control
EC58 69 ED .dw KDECD4 ;$ED69 Decode
EC5A 69 ED .dw KDECD4 ;$ED69 Decode
EC5C A3 ED .dw KDECD5 ;$EDA3 Control
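The table selection at PROCK6A boils down to "double the flag value and cap it", which this Python sketch reproduces. It models only the PROCK6A path (the SHIFT + Commodore combination, flag value 3, is handled earlier in SHEVAL and never reaches it), and the names below are invented for the illustration.

```python
# KDECOD entries at offsets 0, 2, 4 and 6, as listed above.
KDECOD = [0xEC5E, 0xEC9F, 0xECE0, 0xEDA3]
NAMES = ["Unshifted", "Shifted", "Commodore", "Control"]

SHIFT, COMMODORE, CTRL = 1, 2, 4  # SHFTFL bit values


def decode_table_offset(shift_flags):
    """Mirror PROCK6A: offset = flags * 2, forced to 6 when >= 8."""
    offset = (shift_flags << 1) & 0xFF  # ASL A
    if offset >= 0x08:                  # CMP #$08 / BCC
        offset = 0x06                   # LDA #$06 (assume CTRL table)
    return offset


for flags in (0, SHIFT, COMMODORE, CTRL, CTRL | SHIFT):
    offset = decode_table_offset(flags)
    print(flags, NAMES[offset // 2], "$%04X" % KDECOD[offset // 2])
# 0 Unshifted $EC5E
# 1 Shifted $EC9F
# 2 Commodore $ECE0
# 4 Control $EDA3
# 5 Control $EDA3
```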
Now, let's look at a few very simple routines just so that we can
check them off of the list:
IIOBASE
=======
IIOBASE is the internal label behind the Kernal IOBASE function.
Calling IOBASE results in code execution being transferred to IIOBASE:
IOBASE:
FFF3 4C 00 E5 JMP IIOBASE ;$E500 IOBASE
IOBASE returns the address of the beginning of the I/O region of
the VIC memory map in the .X and .Y registers. Locations $9110 to $912F are
the addresses reserved for the VIC's two 6522 VIAs. This is the first routine
in the Kernal ROM.
The value of this function in the VIC is questionable since there
is no way to change the address at which the VIAs appear, and interestingly,
the Kernal code does not call IOBASE at all. The Kernal instead relies on
hard-coded addresses.
However, one could conclude that the actual location of the VIAs
in the VIC's address space changed during the Kernal development process,
so IOBASE was somehow used to normalize the address. This also enabled code
portability between the VIC and the C64.
The BASIC ROM appears to call IOBASE in the RND function. The
existence of other calls is unknown at this time since the BASIC ROM has
yet to be disassembled.
E500 ;==========================================================
E500 ; IIOBASE - Return I/O base address
E500 ; Returns the IO Base address in .X(LSB) and .Y(MSB)
E500 IIOBASE
E500 A2 10 LDX #$10 ;return $9110 as IO Base
E502 A0 91 LDY #$91
E504 60 RTS
ISCREN
======
ISCREN is the internal label behind the Kernal SCREEN function.
Calling SCREEN results in code execution being transferred to ISCREN:
SCREEN:
FFED 4C 05 E5 JMP ISCREN ;$E505 SCREEN
E505 ;==========================================================
E505 ; ISCREN - Return screen organization
E505 ; Returns the screen organization .X(columns) and .Y(rows)
E505 ;
E505 ISCREN
E505 A2 16 LDX #$16 ;return 22 cols x 23 rows
E507 A0 17 LDY #$17
E509 60 RTS
This code returns the column and row organization of the screen in
the .X and .Y registers, respectively. It doesn't appear that the Kernal calls this
function to determine the screen size, instead relying on hard-coded
values under the assumption that the screen is 22x23. So, this function's
utility appears to be purely for the benefit of user code.
IPLOT
=====
IPLOT is the internal label behind the Kernal PLOT function.
Calling PLOT results in code execution being transferred to IPLOT:
PLOT:
FFF0 4C 0A E5 JMP IPLOT ;$E50A
E50A ;===============================================================
E50A ; IPLOT - Read/set cursor position
E50A ; On entry: SEC to read cursor position to .X(row) and .Y(col)
E50A ; CLC to save cursor position from .X(row) and .Y(col)
E50A ;
E50A IPLOT
E50A B0 07 BCS READPL ;carry set? yes, read position
E50C 86 D6 STX CURROW ;save row...
E50E 84 D3 STY CSRIDX ;...and column
E510 20 87 E5 JSR SCNPTR ;update position
E513
E513 READPL
E513 A6 D6 LDX CURROW ;return row...
E515 A4 D3 LDY CSRIDX ;...and column
E517 60 RTS
The Kernal again does not call this function, instead managing cursor
movement by changing the values of the current row and current cursor index
(i.e., the cursor's position in the row). Upon storing the new cursor
location, the code commits the changes by jumping to an internal routine
in CINT1 which is responsible for moving the cursor block in screen memory.
Conclusion
==========
In this installment, we examined several routines, two of which
are integral to the operation of the VIC. The Jiffy clock routine also
scans the STOP key, which is important to overall usability and the ability
to halt a program. The second routine, SCNKEY, is responsible for scanning
the keyboard matrix. That's pretty important, too.
Next time, we'll examine more routines in the VIC's KERNAL, including
I/O routines.
.......
....
..
. C=H 19
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
JPEG: Decoding and Rendering on a C64
------------------------------------- Stephen Judd
Adrian Gonzalez
In the C64 world there are a disturbing number of cases where
people have said, "It can't be done on a C64." This goes on for a while
until someone actually takes a look at the task and its requirements,
and says "Not only can it be done, but it can be done easily." JPEG is
one such case.
This article is divided into two parts. In part 1, I discuss
JPEGs and the decoding process. The primary focus is on several important
issues not covered well, if at all, in existing documentation, especially
the IDCT; the article also covers the principles of decoding JPEGs and
JFIF files.
In part 2, Adrian discusses Floyd-Steinberg dithering, and how it
can be applied to various C64 graphics modes (and how it can be used to
display jpegs!). In both articles the actual C64 code and algorithms will
of course be discussed, and the source code is available at
http://www.ffd2.com/fridge/jpeg
for both the decoder and the renderer.
The decoder is about 4k of code, the renderer is around 2k, and
there are about 9k of tables. With the grayscale versions, there is
ample memory left over. With the color IFLI versions, memory is extremely
tight -- there are 32k of graphics, six 24-bit image buffers. The Huffman
trees are stored in the screen RAM area. The renderer crams all the data
into the graphics area, which is why you see garbage while the image is
rendering. There are a few tens of bytes free in page 0, probably 100-200
bytes free in page 1, and a few tens of bytes free in page 2, and that's it!
Everything else just kind-of barely/exactly fits, and then only for
'typical' jpegs.
Finally, Errol Smith deserves a special mention as the guy who first
tracked down some decent JPEG documentation. Errol pointed me in the right
direction and within a few weeks we had JPEGs on a 64.
------
Part I: Decoding jpegs
------
Decoding jpegs is a fairly straightforward process, and in
recent years some free documentation has become available. This
article is meant to complement that documentation, by filling in
some of the gaps and detailing some of the broader issues, not to
mention some specific implementation issues. The first part of this
article covers general jpeg issues: encoding/decoding, Huffman tree
storage, Fourier transforms, JFIF files, and so on. The second part
covers implementation issues more specific to the C64.
There are several sources of JPEG documentation online and in
the library. Out of all of them, I found three that were particularly
useful:
Cryx's jpeg writeup at http://www.wotsit.org
ftp://ftp.uu.net/graphics/jpeg/wallace.ps.gz, an updated
article from one which appeared in the April 1991
"Communications of the ACM" (v34 no.4).
"JPEG Still Image Data Compression Standard" by William B. Pennebaker
and Joan L. Mitchell, published by Van Nostrand Reinhold, 1993,
ISBN 0-442-01272-1.
The first, Cryx's writeup, is a programmer's description of JPEG files, so
it has good, detailed descriptions of the encoding/decoding process and
the file structure/organization, including a list of all the JFIF segments
and markers. The second reference is also excellent, and explains most of
the basic principles of JPEGs, the how's and why's of the standard, and has
some helpful examples. The third reference (the book) is very comprehensive,
but is written in a way which I feel tends to obscure the important points.
Nevertheless, it has an entire chapter on the discrete cosine transform and
several fast DCT algorithms, which is invaluable. As an additional source
of information, some people might find the IJG's cjpeg/djpeg source code
helpful.
JPEG Encoding/Decoding
----------------------
It's really simple, folks.
Start with a grayscale image and divide it up into 8x8 pixel blocks
(just like a C64 bitmap). The first block is the upper-left corner of the
image; the second block is to the right of the first block, and so on until
the end of the row is reached, at which point the next row begins.
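As a small illustration of that ordering, the following Python sketch lists the pixel origin of each block in the order described. The function name is made up, and rounding partial blocks up at the right and bottom edges is an assumption of the sketch rather than something stated here.

```python
def block_origins(width, height, block=8):
    """Yield the (x, y) pixel origin of each block in encoding order."""
    blocks_across = (width + block - 1) // block  # assumed: round up
    blocks_down = (height + block - 1) // block
    for row in range(blocks_down):
        for column in range(blocks_across):
            yield column * block, row * block


print(list(block_origins(24, 16)))
# [(0, 0), (8, 0), (16, 0), (0, 8), (8, 8), (16, 8)]
```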
The next step is to take the two dimensional discrete cosine
transform of each 8x8 component, and filter out the small-amplitude
frequencies. This will be explained in detail later, but the net result
is that you are left with a lot of zeros in the 64-byte data block, and
a few nonzero elements from which you can reconstruct the main features
of the image. This filtering process is called the "quantization" step.
The next step is to RLE-encode the resulting 8x8 block (since most
of the components are zero), and finally to Huffman-encode the RLE-encoded
data. And that's it. Done. Finished. Repeat Until Done.
Color pictures are similar, but now each pixel has an 8-bit R, G,
and B value, so there will be three 8x8 blocks, for a total of 24 bits
per pixel (not quite like a C64 bitmap...). The RGB values are converted to
luminance/chrominance values (RGB -> YCrCb), but what's important is that
for each 8x8 section of a color image there are three 64-byte blocks of
data, and each block is encoded as above.
So to summarize: transform the data, filter ("quantize") the
transformed data, and RLE-encode and Huffman-encode the result. Do this
for each component, and then move on to the next 8x8 block. Therefore,
to decode the image data:
read in the bits,
find the Huffman code,
unpack the RLE,
de-quantize the data,
and perform the inverse transform,
for each 8x8 block of image data to be plotted to the screen. Repeat
until done.
It turns out that there are other methods of JPEG compression
in the standard, such as arithmetic compression, but this is rarely
supported due to legal reasons (lame software patent owned by IBM, AT&T,
and Mitsubishi), and it doesn't seem to offer substantial compression
gains. There are also different types of jpegs, most importantly
"baseline" or sequential jpegs, and "progressive" jpegs. In
a progressive jpeg the image is stored in a series of "scans" which go
from lower to higher resolution. I'll be focusing on baseline jpegs
(which are more common).
Finally, it turns out that an 8x8 block of image data doesn't
have to correspond to an 8x8 block of pixels. For example, each byte
of data might represent an average of a 2x2 block of pixels, so an 8x8
block of data might expand to a 16x16 block of pixels. In a JPEG
the "sampling factor" determines how to expand an 8x8 block of data.
You can see that this can offer substantial compression gains, but will
coarsen the data; on the other hand, if the data is already coarse, it's
a way of getting a whole lot for nothing. Most color jpegs use one-to-one
pixel mapping for the luminance, and one-to-four (each data byte = 2x2 pixel
block) mapping for the two chrominance components. From an implementation
standpoint, this means that a decoder typically decodes 16 scanlines at a
time (16x16 pixel chunks). For more details, see Cryx's document.
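As a rough illustration of the one-to-four mapping, here is a minimal
Python sketch that expands an 8x8 block of chrominance data to 16x16
pixels by simple replication. The function name upsample_2x2 is made up
for this example, and replication is an assumption; a decoder is free to
interpolate instead.

```python
def upsample_2x2(block):
    """Expand an 8x8 block (8 rows of 8 samples) to 16x16 pixels by
    repeating each sample over a 2x2 pixel square."""
    out = []
    for row in block:
        wide = []
        for sample in row:
            wide.extend([sample, sample])   # double horizontally
        out.append(wide)
        out.append(list(wide))              # double vertically
    return out

# A test pattern where each sample holds its own index 0..63.
block = [[8 * y + x for x in range(8)] for y in range(8)]
pixels = upsample_2x2(block)
print(len(pixels), len(pixels[0]))          # 16 16
```

Each data byte ends up covering four pixels, which is where the
compression gain (and the coarsening) comes from.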
Before a JPEG can be decoded, though, the decoder needs a fair
amount of information, such as the Huffman trees used, the quantization
tables used, information about the image such as its size, whether it's
a color or a grayscale image, and so on. In a JPEG file, all information
is stored in "segments".
Segments
--------
A JPEG segment looks like the following:
[header] Two bytes, starting with $FF
[length] Two bytes, in hi/lo order (not usual 6502 lo/hi)
[data] Segment data
A list of JPEG (and JFIF) headers can be found in Cryx's document.
Let's have a look at a hex dump of a jpeg file (from unix, use
"od -tx1 file.jpg | more"):
0000000 ff d8 ff e0 00 10 4a 46 49 46 00 01 01 01 00 48
0000020 00 48 00 00 ff fe 00 17 43 72 65 61 74 65 64 20
0000040 77 69 74 68 20 54 68 65 20 47 49 4d 50 ff db 00
The first two bytes are $ff $d8 -- these two bytes identify the
file as a jpeg. All jpegs start with ff d8.
Next we encounter the header ff e0. ff e0 is a special header
which identifies this file as a JFIF file. It turns out that in the
original JPEG standard a specific file format is not given; this
in turn led to different companies using their own formats, to try and
establish the "standard". The JFIF format was put forward to remedy
this problem, and is the de-facto standard -- but more on this later.
In a JFIF file, the JFIF segment always follows the JPEG ID bytes.
You can see that it is length 16, and that that length includes the two
length bytes. Immediately following the length bytes are the four letters
J F I F and the number 0; following that are some bytes for revision numbers,
the x/y densities, and some thumbnail info.
The next segment starts with the header ff fe. This is the
"comment" header; the length is $17 bytes. Following the length bytes
are the ascii codes for "Created with The GIMP", a popular image
processing program. The next header is ff db, which is the "Define
Quantization Table" header. And on it goes, until the actual image
data -- a stream of Huffman-encoded bits -- is reached.
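The segment layout can be walked with a few lines of code. The following
Python sketch is only an illustration (walk_segments is a name invented
here), and it assumes it is looking at the header segments that come
before the image data. It is fed the 48 bytes from the hex dump above.

```python
def walk_segments(data):
    """Yield (header byte, segment data) for each segment after ff d8.
    Stops at the end of the data or at the first incomplete segment."""
    if data[0:2] != b"\xff\xd8":
        raise ValueError("not a jpeg: missing ff d8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected $FF at offset %d" % pos)
        marker = data[pos + 1]
        length = (data[pos + 2] << 8) | data[pos + 3]   # hi/lo order
        end = pos + 2 + length      # the length includes its own two bytes
        if end > len(data):
            break                   # segment runs past the data we have
        yield marker, data[pos + 4:end]
        pos = end

dump = bytes.fromhex(
    "ff d8 ff e0 00 10 4a 46 49 46 00 01 01 01 00 48 "
    "00 48 00 00 ff fe 00 17 43 72 65 61 74 65 64 20 "
    "77 69 74 68 20 54 68 65 20 47 49 4d 50 ff db 00 "
)
for marker, payload in walk_segments(dump):
    print("ff %02x" % marker, len(payload), payload[:5])
```

The loop reports the ff e0 segment (14 data bytes, starting with JFIF
and a zero) and the ff fe comment, then stops because the ff db segment
is cut off in the dump.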
Huffman Decoding
----------------
If you don't know anything about Huffman decoding, then I suggest
you read Pasi's nice article in C=Hacking #16, which has a nice example.
Briefly, a Huffman tree is a binary tree whose left and right branches
correspond to bits 0 and 1 respectively; starting from the top of the
tree, you read bits and move left or right accordingly until a leaf
is reached, containing the Huffman code value. Then you start over again
at the top of the tree and decode the next Huffman code.
In a JPEG, Huffman trees are stored in "Define Huffman Table" (DHT)
segments (header = ff c4):
0000300 ff c4 00 1c 00 00
0000320 01 05 01 01 01 00 00 00 00 00 00 00 00 00 00 03
0000340 01 02 04 05 06 00 07 08
The first byte after the length (00) is an ID byte -- JPEGs can have up to
eight Huffman trees. This is then followed by 16 bytes, where each byte
represents the number of Huffman codes of lengths 1, 2, 3, ..., up to
length 16, followed by the Huffman code values. In the above example, there
are 0 codes of length 1, 1 code of length 2, 5 codes of length 3, and so
on. Following these 16 bytes are the Huffman values: 3, 1, 2, 4, ..., 8.
But what are the Huffman codes corresponding to those values?
It turns out that these trees are so-called "canonical Huffman trees",
and work as follows: to get the next code, add 1 to the current code.
When the length increases, add 1 and shift everything left. The exception
is that you don't increment until the first code is defined, so the first
code is always zeroes.
For example, to decode the above DHT segment, start with Huffman
code = 0. There are no codes of length 1, so we shift it left to get
code = 00 (and don't add 1 because the first code hasn't been defined yet).
There is one code of length 2, so we read the first Huffman value and
assign it to the current code
Code Value
00 3
That's the only code of length two, so now we move to length 3 by incrementing
and shifting: code = 010. There are five codes of length 3, and the next
five Huffman values are 1, 2, 4, 5, 6, so the Huffman tree is now
Code Value
00 3
010 1
011 2
100 4
101 5
110 6
and the rest of the Huffman tree is given by
1110 0
11110 7
111110 8
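The code assignment can be checked with a short Python sketch. This is
just one way to write it (canonical_codes is a name made up here): it
increments after every assigned code and shifts left at every change of
length, which gives the same codes as the rule described above.

```python
def canonical_codes(counts, values):
    """counts[i] = number of codes of length i+1 (16 entries);
    values = Huffman values in the order stored in the segment.
    Returns a dict mapping bit strings to values."""
    table = {}
    code = 0
    index = 0
    for length in range(1, 17):
        for _ in range(counts[length - 1]):
            table[format(code, "0%db" % length)] = values[index]
            index += 1
            code += 1      # next code of the same length
        code <<= 1         # moving to the next length: shift left
    return table

# The 16 count bytes and 9 value bytes from the DHT dump above.
counts = [0, 1, 5, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
values = [3, 1, 2, 4, 5, 6, 0, 7, 8]
for bits, value in canonical_codes(counts, values).items():
    print(bits, value)
```

The printed table should agree line for line with the one worked out
by hand.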
What's the best way to implement a Huffman tree?
The most obvious way is to use five bytes per "node", i.e.
left pointer (2 bytes)
right pointer (2 bytes)
value (1 byte)
where the left and right pointers are just offsets to be added to the
current pointer, and if left = right = $FFxx then this is a leaf. If you
fetch a bit that says "go left", and the left pointer = $FFxx (but right
pointer is valid) then you've hit an invalid Huffman code -- i.e. decoding
error. This five-byte method is used in jpx (grayscale decoder).
But there is another rather cool method, first described to me by
Errol Smith, which uses only two bytes per node. Now, the five-byte method
works fine in jpx, but in the full-color IFLI jpz code -- well, suddenly
memory becomes extremely tight, and without this routine jpz probably
wouldn't have happened on a stock machine. The routine is also very
efficient, especially if implemented using 16-bit 65816 code.
The trick is simply to organize the tree such that if the current
node is at location NODE, then the left node is at NODE+2 and the right
node is at NODE+(NODE). Leaf nodes can be indicated by e.g. setting the
high bit. So the decoding process is:
get next bit
if 0 then pointer = pointer + 2
if 1 then pointer = pointer + (node value at pointer)
if high byte of the new node's value < $80 then loop
For example, the first part of the earlier Huffman tree
00 3
010 1
011 2
100 4
would be encoded as
0c 00 04 00 03 80 04 00 01 80 02 80 00 00 00 00 04 80
-----|-----|-----|-----|-----|-----|-----|-----|-----|
Try decoding the Huffman values, using the above algorithm.
Astute readers may ask the question: won't you decode incorrectly if
there is no left node? Even more astute readers can answer it: in a
canonical Huffman tree, the only nodes without left-node pointers are
leaves.
To see this, consider a counterexample: a tree that looks like
        o
       /
      o
       \
        o
This corresponds to Huffman code 01 -- one move left, one move right.
In a canonical Huffman tree, the only way to generate the code 01 is to
increment the code 00; since code 00 has already occurred, there must be
a left-node. In a canonical Huffman tree, you always create a left-node
before creating a right-node. So error checking this kind of tree amounts
to checking the right-pointer; the only nodes without left-pointers are leaves.
Moreover, since left-nodes are always created first, you can add nodes in
the order they are created -- you never have to insert nodes between
existing nodes.
Pretty nifty, eh?
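For readers who would rather experiment than decode by hand, here is a
Python sketch of the two-byte scheme. It is an illustration under a few
assumptions, not the jpz code: the names build_tree and decode are made
up, a leaf is marked with the high bit and keeps its value in the low
byte, and a right offset of zero is taken to mean "no right node yet".
Nodes are appended in the order they are created, as described above.

```python
LEAF = 0x8000      # high bit set marks a leaf (assumed convention)

def word(tree, at):
    """Read the 2-byte lo/hi node value at byte offset 'at'."""
    return tree[at] | (tree[at + 1] << 8)

def set_word(tree, at, w):
    tree[at] = w & 0xFF
    tree[at + 1] = w >> 8

def build_tree(codes):
    """codes: list of (bit string, value) in canonical order."""
    tree = bytearray(2)                     # root; right offset not set
    for bits, value in codes:
        node = 0
        for bit in bits:
            if bit == "0":
                nxt = node + 2              # left node always follows
                if nxt == len(tree):
                    tree += bytes(2)        # create it at the end
            else:
                if word(tree, node) == 0:   # right node not created yet
                    set_word(tree, node, len(tree) - node)
                    tree += bytes(2)
                nxt = node + word(tree, node)
            node = nxt
        set_word(tree, node, LEAF | value)
    return tree

def decode(tree, bits):
    """Decode a string of '0'/'1' characters into Huffman values."""
    out = []
    node = 0
    for bit in bits:
        if bit == "0":
            node += 2
        else:
            offset = word(tree, node)
            if offset == 0:                 # no right pointer: bad code
                raise ValueError("invalid Huffman code")
            node += offset
        w = word(tree, node)
        if w & LEAF:
            out.append(w & 0xFF)
            node = 0                        # back to the top of the tree
    return out

tree = build_tree([("00", 3), ("010", 1), ("011", 2), ("100", 4)])
print(decode(tree, "00" "010" "011" "100"))     # [3, 1, 2, 4]
```

Note that the only error check is on the right pointer, for the reason
given above.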
Restart Markers
---------------
The image data in a jpeg is a stream of Huffman-encoded bits.
The jpeg standard allows for "restart markers" to be periodically inserted
into the stream. Thus a decoder needs to keep count of how far it is
in the data stream, and periodically re-synchronize the bitstream. So
far so good -- this is explained in detail in Cryx's document.
What _isn't_ explained is that restart markers do more than
re-synchronize the data stream: when a restart marker is hit, the DC
coefficients must also be reset to zero. That is, it really does "restart"
the decoder.
What's a DC coefficient, you may ask? It's the very first element
in the 8x8 array, and instead of encoding the actual value a jpeg encodes
the _offset_ from the previous value. That is, the decoded DC element is
added to the current DC value to get the new value. That value needs
to be reset to zero when a restart marker is hit.
Most jpegs do not use restart markers, but unless you reset the
coefficient you're going to spend a few months wondering why Photoshop images
don't decode correctly.
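The effect of the reset is easy to see in a small Python sketch. It is
simplified on purpose: decode_dc is a made-up name, the image is assumed
to have a single component (grayscale), and the restart interval is
assumed to be counted in 8x8 blocks.

```python
def decode_dc(diffs, restart_interval=0):
    """diffs: decoded DC offsets, one per block, in stream order.
    restart_interval: blocks between restart markers (0 = none)."""
    dc = 0
    values = []
    for n, diff in enumerate(diffs):
        if restart_interval and n > 0 and n % restart_interval == 0:
            dc = 0                  # restart marker: reset the DC value
        dc += diff                  # offset from the previous DC value
        values.append(dc)
    return values

diffs = [100, 5, -3, 90, 2]
print(decode_dc(diffs))             # [100, 105, 102, 192, 194]
print(decode_dc(diffs, 3))          # [100, 105, 102, 90, 92]
```

Forget the reset and every block after the first restart marker comes
out with the wrong brightness.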
Why is it called the DC coefficient? You'll have to read the section
on Fourier transforms for the answer.
Note also that when the byte $FF is encountered in the data stream
it must be skipped; the exception is if it is immediately followed by a 00,
in which case $FF00 represents the value $FF. Why do I bring this up?
Because Cryx's document could be interpreted by naive people like myself
as saying this is true throughout a jpeg file, when it is only true within
the image data; in other segments, $FF is a perfectly valid byte.
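A minimal Python sketch of the $FF handling inside the image data
follows. The name unstuff is invented for this example, and it assumes
that a $FF followed by a non-zero byte is a two-byte marker (such as a
restart marker) to be taken out of the bit stream, and that a $FF
followed by another $FF is a fill byte.

```python
def unstuff(stream):
    """Return (data bytes, marker bytes found) for a run of image data."""
    data = bytearray()
    markers = []
    i = 0
    while i < len(stream):
        b = stream[i]
        if b != 0xFF:
            data.append(b)
            i += 1
        elif i + 1 < len(stream) and stream[i + 1] == 0x00:
            data.append(0xFF)       # ff 00 stands for a literal ff
            i += 2
        elif i + 1 < len(stream) and stream[i + 1] == 0xFF:
            i += 1                  # fill byte; look at the next ff
        else:
            if i + 1 < len(stream):
                markers.append(stream[i + 1])
            i += 2                  # skip the marker
    return bytes(data), markers

data, markers = unstuff(bytes.fromhex("12 ff 00 34 ff d0 56"))
print(data.hex(), [hex(m) for m in markers])    # 12ff3456 ['0xd0']
```

This only applies to the Huffman-encoded image data; the other segments
are read as they are.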
Unpacking the RLE
-----------------
Once a Huffman code is retrieved and decoded, the resulting byte
represents RLE-compressed data to be uncompressed. This procedure is
described quite well in Cryx's document, so I'll just refer you to it.
This is repeated until you are left with a 64-byte chunk of data which
needs to be re-ordered and dequantized. This process is again described
in Cryx's document; briefly, during the encoding process, the original
8x8 data is re-ordered into a 64-byte vector as follows:
         0  1  5  6 ...
         2  4  7 13 ...
         3  8 12 17 ...
         9 11 18 24 ...
        10 19 23 ...
        20 22 ...
        ...
That is, the first element in the vector is the (0,0) component of the
8x8 array, the next element is the (1,0) component (column 1, row 0), the
next element is the (0,1) component, and so on. The reason for this
"zig-zag" ordering
is to enhance the RLE-compression, since it concentrates the lower
frequencies at the beginning of the vector and the higher frequencies --
most of which are typically zero-amplitude -- at the end of the vector
(more on this later). The decoder thus needs to "un-zigzag" the vector
back into an 8x8 array. All de-quantization amounts to is multiplying
each element by a corresponding element in a quantization table:
data[i,j] = data[i,j