Newsgroups: comp.sys.cbm
From: csbruce@ccnga.uwaterloo.ca (Craig Bruce)
Subject: Re: Commodore 128 Multitasking OS (LONG)
Date: Tue, 19 Sep 1995 23:50:52 GMT
Organization: University of Waterloo, Canada (eh!)

hermit@cats.UCSC.EDU (William R. Ward) writes:

>                            Multitasking
>
>The multitasking kernel (BOS) that Craig Bruce wrote is very
>interesting.  See http://ccnga.uwaterloo.ca/~csbruce/bos2.txt for
>info.  I am somewhat interested in hearing more from Craig about
>future plans for I/O and networking, and how this might relate to or
>be compatible with ACE (if at all).

My kernel is a distributed microkernel, and hence the networking will
be built right into the kernel, but most of the I/O will be implemented
as user-level process servers.  Accessing the various system services
will be very different from ACE, since messages will be required to
reach the (potentially remote) servers.  But the Application Program
Interface will be a set of library calls compiled into each program,
presenting an interface that is identical to the ACE API, with maybe
some minor exceptions.  All of the ACE applications will be ported to
BOS with minor or no modifications.

>Craig Bruce's kernel switches tasks once a jiffy, approximately.  The
>VIC raster interrupt is used to implement this.  I think that's a good
>way of doing it, but would like to consider the options.  There are
>three ways to interrupt on the 6502 family; IRQ (the VIC raster
>interrupt), NMI (the RESTORE key), and the BRK instruction.  The
>set/disable interrupt bit in the status register only affects IRQ and
>BRK, not NMI (hence non-maskable!).

The interrupt-disable bit in the status register does not mask out
BRKs; BRK is more of a "trap" than an "interrupt".

>I seem to recall that the CIA chip can generate an NMI at regular
>intervals; this may be a good way to invoke the swapper, no?  Care
>would have to be taken that an NMI doesn't occur while one is already
>in progress, but otherwise it would allow the use of IRQ for
>interrupts that can only happen when not in kernel mode (assuming the
>kernel disables interrupts while not running user processes).  Is this
>ability worth the hassle of using the less convenient NMI rather than
>IRQ?

I think that using periodic IRQs is the only sane way to do this.
Using NMIs gives no real advantage and means that all of the critical
sections in your code can be interrupted.  Good luck getting that to
fly.

>                     Memory and Stack Management
>
>I would like to learn more about how the 128's MMU relocates the stack
>and zeropage.  What restrictions are there?  How does it confuse the
>rest of the pages of memory?  For instance, if you relocate the zero
>page to $1100 and the stack to $1200, do the "real" zero page and
>stack appear at those addresses, or a "copy" of the relocated pages?
>Craig, would you mind explaining how the MMU does this, since you use
>this in your BOS?

Relocating the stack and zero page is very simple and is explained
fully in the Commodore-128 Programmer's Reference Guide.  You just poke
the page number into the appropriate MMU register, and poof, all
references to $0000-$00ff and $0100-$01ff are headed off at the pass
and redirected to the selected pages.
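For example, putting the zero page at $1100 and the stack at $1200 (the
pages from the question above) comes down to a few pokes.  This is just
an illustrative sketch, not BOS code; if I remember the PRG correctly,
the high pointer register should be written before the low one, since
the write to the low register is what makes the new value take effect:

p0l     = $d507       ; MMU zero-page pointer, low  (page number)
p0h     = $d508       ; MMU zero-page pointer, high (RAM block)
p1l     = $d509       ; MMU stack-page pointer, low  (page number)
p1h     = $d50a       ; MMU stack-page pointer, high (RAM block)

        lda #$00      ; keep the relocated pages in RAM0
        sta p0h
        lda #$11
        sta p0l       ; zero page now lives at $1100
        lda #$00
        sta p1h
        lda #$12
        sta p1l       ; stack now lives at $1200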
You can also use the absolute addresses of the relocated pages to
access their contents, though there is no reason to do this.  The
*real* zero page (the one at absolute address $0000 of RAM0) becomes
de-selected when you use a different page for zero page, and you can no
longer access the absolute one until you select it again.  It's very
simple.

But there is one restriction, I think.  If you have selected any
"common" memory between the internal banks (also described in the C128
PRG), then your relocated pages can only reside in RAM0 memory.  This
isn't really a big restriction unless you want to have way more
processes than you probably have internal memory for.

>I think because of the 128's MMU, more memory, and 2MHz mode it would
>be much more reasonable to do a multitasking OS on the 128 than the
>64.  There aren't as many 128 users out there, but there are still
>quite a few.  Thus I have renamed the subject line to be 128-specific.

IMHO, the C128's MMU and extra memory make multitasking reasonable.
Anything less, and you're going to have too much context-switching
overhead and too little space for application programs (when you have
your kernel and device drivers all loaded).

>Jonas suggested a means of doing stack management by using only the 1
>page for the stacks, where each process requests n bytes of stack
>space, and the kernel allocates chunks of stack accordingly.  When
>stack space is needed for a new task and there is not enough
>unallocated stack space available, the stack space of a process will
>be "swapped" out.  This is innovative but if the MMU can give a
>separate stack page for each process that would be preferable.  A
>multitasking OS for the C64 might use this approach however.  Its main
>disadvantage is more overhead for swapping and predicting stack usage
>for each process.  Also it is a lot easier for processes to overrun
>other processes' stack areas.

You don't need this on the C128, since the MMU can efficiently provide
an independent 256-byte stack to each process, for the context-switching
cost of changing the contents of one MMU register.  For the C64, it may
be a reasonable way of doing things.  Of course, there is still the
problem of two processes that want to use the same address range for
two different swapped-in versions of the stack.

>Tommy suggested methods for allocating zeropage addresses.  The
>program loader would allocate specific zeropage addresses for each
>process, and modify the code much like the relocatable code algorithms
>mentioned later in this article.  One problem with this is it prevents
>programs from being reentrant (using the same binary image for
>multiple processes works fine as long as (a) it is not self modifying
>and (b) the data space is separate for each process).  However since
>the 128's MMU allows each process to have its own zeropage we
>shouldn't need to deal with it, much like Jonas's stack algorithm.
>Again, good for a 64 multitasking OS, but the 128 can do better.

The MMU obviates this problem on the C128.

>Tommy suggested using FORTH-style blocks for virtual memory.  I will
>need to read up on FORTH before I can comment on that, unless someone
>here would care to give some thoughts on that.

Might be an interesting thing to try, if you're really cramped for real
memory.  Of course, performance can very easily go down the toilet, and
you need random access to disk storage.  The ACE (BOS) far-memory
mechanism is capable of implementing virtual memory in a similar way.
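To make the per-process-stack point above concrete, here is a minimal
sketch of the core of such a switch.  This is not actual BOS code; the
variable names, addresses, and table layout are made up for
illustration, interrupts are assumed to be disabled already, and all
stack pages are assumed to live in RAM0 (so the P1H register can be
left alone):

mmu_p1l       = $d509     ; MMU page-one pointer (stack page number)

cur_proc      = $0af0     ; (made up) index of the running process
proc_sp       = $0b00     ; (made up) saved stack pointer, per process
proc_stackpg  = $0b40     ; (made up) private stack page, per process

switch_to                 ; call with Y = index of the process to run
        tsx               ; save the outgoing process's stack pointer
        txa
        ldx cur_proc
        sta proc_sp,x

        sty cur_proc      ; the incoming process becomes current
        lda proc_stackpg,y
        sta mmu_p1l       ; page one now maps to its private stack
        lda proc_sp,y     ; restore its saved stack pointer
        tax
        txs

        rts               ; "returns" into the incoming process, via
                          ; the return address saved on its own stack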
>                         Relocatable Binaries
>
>When I originally started this thread I knew nothing about relocatable
>binaries, but others have posted much detail lately in connection to
>both this and ACE.  What's basically needed is for the assembler to
>produce a "header" that the program loader can use to shuffle all the
>internal byte references.

A long time ago, I posted an article describing what I intended to do
about relocatable binaries for BOS.  Here it is again:

  Article: 34516 of comp.sys.cbm
  From: csbruce@ccnga.uwaterloo.ca (Craig Bruce)
  Subject: Re: Thoughts on BOS
  Organization: University of Waterloo, Canada (eh!)
  Date: Tue, 7 Mar 1995 02:41:22 GMT

1. EXECUTABLE-FILE FORMAT

The way that I am planning to do it is to have a special format for
executable programs in BOS.  The executable format would have three
sections: the header, the raw program image, and a
relocatable-references section.  The format would look something like
the following:

OFFSET  SIZE  DESCRIPTION
------  ----  -----------
   0      3   magic number for executables
   3      1   flag: executable or object module
   4      2   size of raw code image
   6      2   raw code start address
   8      2   raw address of entry point
  10      2   size of "bss" (uninitialized storage) section
  12      4   size of relocatable-references section
  16      4   size of exported-symbol section
  20      4   size of unresolved-symbol section
  24      4   size of unresolved-expression section
  28      8   reserved for future expansion
  36      x   raw code image, followed by the variable-length
              sections listed above

This format will also double as the object-module format, which will be
a superset of the executable format (since there is more information to
keep track of).  The "magic number" field would tell the type of the
file (and prevent you from accidentally executing something that isn't
a program) and the "flag" field would distinguish between an executable
file and an object module.  An object module would not be executable
because it wouldn't be a "complete" program.  An object module would
require a number of other sections to be appended to the file format,
which are mentioned in the header but not included in an executable
file.

The "raw code image size" field tells the number of bytes in the "raw
code image" section of the file, which contains simply the raw image of
the program, assembled to execute at some address given by the "raw
code start address" field (it's not really necessary to assemble the
program from address 0; in fact, this can cause some confusion in the
assembler in deciding between absolute and zero-page addressing modes).
The "raw address of entry point" gives the point to start execution in
the program image, in terms of the raw addresses used in the code.

The "size of bss" field tells how many bytes of uninitialized variable
space the program requires.  The bss is logically part of the raw code
image, but it is not included there because it would be wasteful to
keep its bytes in the executable file since they are supposed to be
"just random" (or all zeroes if you're talking about a Unix system).

The "size of relocatable references" field tells how many bytes are in
the "relocatable-references section" in the file.  The format of this
section can be:

OFFSET  SIZE  DESCRIPTION
------  ----  -----------
   0      2   number of full-address relocations
   2      2   number of low-byte relocations
   4      2   number of high-byte relocations
   6      x   addresses of the full-address relocations
   x      x   addresses of the low-byte relocations
   x      x   addresses of the high-byte relocations

There are three things that need to be relocated: full addresses (like
JSR SUB), low bytes (like LDA #<TABLE_ADDR), and high bytes (like
LDA #>TABLE_ADDR).
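For instance, in a made-up fragment (generic assembler syntax; the
labels and the $1000 start address are invented purely for
illustration), the entries that would end up in each list are:

        org $1000       ; the "raw code start address" in the header

start   jsr sub         ; full address: the word at $1001-$1002 gets
                        ;   a full-address relocation entry
        lda #<table     ; low byte: the byte at $1004 gets a
                        ;   low-byte relocation entry
        ldx #>table     ; high byte: the byte at $1006 gets a
                        ;   high-byte relocation entry
        rts

sub     rts             ; some routine inside the code range
table   db  1,2,3       ; some data inside the code range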
The first three fields give the number of each type of relocation that
has to be done, and the next three fields give the addresses in the raw
code range where these relocations need to be applied.  Each address in
this section takes two bytes.  For a really big and complicated
program, this section could exceed 64K in size, so a 32-bit value for
this section's size is given in the file header.

For the three example instructions given here, the relocation address
would always be the second byte of the instruction.  For assembler
directives like:

tableA: db  <code1, <code2, <code3
tableB: db  >code1, >code2, >code3
tableW: dw  code1, code2, code3

where the "codeN"s are all addresses in the program range, the
relocation addresses would all be where the specified bytes and words
get put into memory.

Automatically coming up with the relocation addresses requires inside
information that is available only at the symbolic level of an
assembler, so generating this relocation information has to be built
into an assembler.  The assembler has to keep track of which values are
addresses that are inside of the code range and which operands are not.
The assembler also has to carefully evaluate expressions like:

        lda  #>code3-code2
        ldx  #<code3-code2+code1
        ldy  #>code3-code2+code1

(Note: the "<" and ">" initial operators here are low-precedence
extract-low-byte and extract-high-byte operators, respectively.)  The
operand for the LDA instruction is not an address high-byte; it's a
constant!  Two addresses within the same code section will have a
constant difference.  The LDX example IS a relocatable reference, and
so is the LDY (the distance between code3 and code2 is constant, and a
constant plus a relocatable reference gives a relocatable reference).

Fortunately, such an assembler exists.  Well, almost.  It's called "the
ACE assembler".  It doesn't fully keep track of addresses now, nor
create relocatable executable files, but it will.  It has the basic
ingredients for doing this already implemented.  And, even more
important, I am on the inside track with respect to how it works and
how to change it, since I created it.

There are, of course, expressions that can't sensibly be resolved,
like:

        code1 * code2
        code1 + ...

>                           Reentrant Code
>
>If you want to run two copies of a particular process at the same
>time, it would be best if you didn't have to load two identical copies
>of it into memory, no?  Well if the programmer is careful and the OS
>is designed correctly this is possible.

I would have a flag in the executable header that says whether the code
is re-entrant or not.

Keep on Hackin'!

-Craig Bruce     csbruce@ccnga.uwaterloo.ca

"Never question the relevance of truth,
 but always question the truth of relevance."