LoThere: A 'Hello World!' Boot Block



version 0.01

by Joseph Osako

11 June 2002

last modified 11 June 2002


Copyright (C) Joseph Osako 2002.




Contents


1. Bootstrap Loaders: Basic Concepts
     1.1. The Boot Process
               1.1.1. The POST
               1.1.2. The Boot Sector
2. Design of the "Hello, World!" Boot Block Program
3. The System Startup
     3.1. What the Assembler needs to Know First
     3.2. The Entry Point
     3.3. Setting the Segment Registers
               3.3.1. The Code Segment
               3.3.2. The Data Segment
4. Printing to the Monitor
     4.1. Data Declarations and Definitions
     4.2. The Print String Routine
               4.2.1. Using a BIOS Interrupt
               4.2.2. The Print Loop
5. Finishing Touches



1. Bootstrap Loaders: Basic Concepts


1.1. The Boot Process

TBD


1.1.1. The POST

TBD


1.1.2. The Boot Sector

TBD


2. Design of the "Hello, World!" Boot Block Program

Now that we have established the basic concepts of the boot process, we need to lay down an outline of the boot loader design we will be implementing. There are actually two programs at work: the boot loader itself, and the second stage which it loads and transfers control to. Let's begin with the structure of the boot loader:

1. File: LoThere.asm={
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; LoThere.asm - simple "Hello, World!" boot block program
; Joseph Osako  11 June 2002
;

Constant Declarations

Set the Instruction Bit Width
Set the Origin of the Bootstrap Code

Begin the Program Code
Initialize the Segment Registers

Print Hello World to the Screen

Variable Definitions

Boot Block Footer
}
This macro is attached to an output file.


3. The System Startup


3.1. What the Assembler needs to Know First

Before it can begin generating any code, NASM has to know certain things about the program you are assembling. In this case, all this means is that you have to tell it to generate 16-bit instructions instead of 32-bit instructions. This can be done using the BITS directive.

2. Set the Instruction Bit Width={
[BITS 16]
}
This macro is invoked in definition 1.


3.2. The Entry Point

3. Begin the Program Code={
;***************************************************
; start
; This is the beginning of the boot block code.

start:
}
This macro is invoked in definition 1.


3.3. Setting the Segment Registers


3.3.1. The Code Segment

The code segment (CS) register points to the start of the segment containing the actual program instructions. Most BIOSes initialize the code segment to 0000, so that the complete address of the entry point is 0000:7C00. That is to say, when the boot program starts, the Instruction Pointer reads 0x7C00, and the CS register reads 0000. (A few BIOSes jump to 07C0:0000 instead, which is the same physical address expressed differently.) However, the assembler does not know what kind of program we are writing, or where it will start. In order to keep track of where the labels are in memory, we have to inform it of the starting point. To do this we use the ORG directive (short for ORiGin):

4. Set the Origin of the Bootstrap Code={
[ORG 0x7C00]
}
This macro is invoked in definition 1.

This tells the assembler to count the memory locations starting from 0x7C00 instead of from 0x0000.
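
To see what ORG changes, consider the following minimal sketch. The labels 'demo' and 'msg' are invented for illustration and are not part of LoThere.asm.

```nasm
; Sketch only: 'demo' and 'msg' are hypothetical labels.
[BITS 16]
[ORG 0x7C00]

demo:
   mov si, msg      ; 3 bytes: one opcode byte plus the 16-bit address of msg
   jmp short $      ; 2 bytes: a short jump back to itself
msg db "Hi", 0      ; first byte after the 5 bytes of code above
```

With the ORG line in place, 'msg' lies five bytes past 'demo', so the assembler should encode its address as 0x7C05. Without it, the same label would be encoded as 0x0005, which would point at the wrong place whenever DS is 0000.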


3.3.2. The Data Segment

In addition to the code, the boot block contains data the boot program needs in order to run. Since the code and data are all in the same segment, the easiest way to handle the DS register is to set it equal to the CS register.

However, there remains one problem: you can't move values directly from one segment register to another! This is one of the peculiarities of the x86 design that many programmers run into trouble with. To get around it, you have to move the value of CS into one of the general-purpose registers first, and then copy the value from there to DS:

5. Initialize the Segment Registers={
   mov ax, cs
   mov ds, ax    ; set DS == CS
}
This macro is invoked in definition 1.
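
Copying CS into DS relies on the BIOS having started the boot block with CS equal to 0000. A more defensive variant, sketched below as an assumption and not as part of the original LoThere design, loads DS with zero explicitly, so that addresses computed from ORG 0x7C00 are right whatever CS held on entry.

```nasm
; Alternative sketch, not the original macro: force DS to 0000
; instead of trusting the value the BIOS left in CS.
   xor ax, ax      ; AX = 0
   mov ds, ax      ; DS = 0000, matching [ORG 0x7C00]
```

Because the rest of the program uses only relative jumps, it does not otherwise depend on the value of CS.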


4. Printing to the Monitor

Now that we've made sure we can read our data, we need to write the part that prints the data, and to define the data itself.


4.1. Data Declarations and Definitions

The data we want to use is simple: the string "Hello, World!". To define it, we use a directive called DB, which stands for Define Byte. The definition for our string is:

6. Variable Definitions={
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; data
hello db "Hello, World!"
      db EOL
}
This macro is invoked in definition 1.

You'll notice that after the "Hello, World!" string, there is another piece of data added as well without a label. This byte, called a delimiter, is used to let the print function know when it has reached the end of the string, and to stop printing. It is a common practice that should be familiar to most C programmers.

But what is EOL? In this case, it is simply 0x00 (zero). This is the same convention that C uses, and it is convenient for our purposes, for reasons that will be explained shortly. However, NASM does not have a constant named EOL built into it; to let the assembler know what it means, we need to use yet another directive, called EQU (short for EQUate or EQUals). The definition for EOL is:

7. Constant Declarations+={
EOL   equ 0x00    ;end of line marker
}
This macro is defined in definitions 7 and 8.
This macro is invoked in definition 1.

These two directives differ in one crucial way: EQU declares a constant name, which is replaced by its value at assemble time, while DB sets aside a section of memory to hold the value it is initialized with (its relative RESB, for REServe Byte, sets aside space without initializing it). This is very important to remember, as it means that the name of a DBed location is in fact a type of label, the same as those used for jumps and loops. It also means that the data takes up actual memory space at the point in the instruction/data stream where it is defined. This makes it potentially very dangerous to mix code and data declarations carelessly, as it could result in the CPU trying to interpret your data as instructions, usually with unpredictable results.
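
The difference can be seen in a small sketch. The names 'msg' and 'msglen' are hypothetical and do not appear in LoThere.asm, and the fragment assumes DS has already been set up as described earlier.

```nasm
; Sketch only: 'msg' and 'msglen' are hypothetical names.
EOL     equ 0x00          ; a name for the value 0; emits no bytes

   mov al, EOL            ; immediate operand: AL = 0
   mov al, [msg]          ; memory operand: AL = 'H', the byte stored at msg
   mov si, msg            ; SI = the address of msg, not its contents
   jmp short $            ; stop here so the CPU never runs into the data

msg     db  "Hi", EOL     ; emits three bytes: 'H', 'i', 0
msglen  equ $ - msg       ; assemble-time constant 3; emits no bytes
```

Note that the data sits after the final jump, so execution can never fall through into it.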


4.2. The Print String Routine

Now we have come to the part you've been waiting for: printing the "Hello, World!" string. Fortunately, the hard work of writing a function to put characters on the screen has already been done for you, in the ROM BIOS. We can use a BIOS routine which writes to the screen one character at a time and moves the cursor forward, as if it were an old-fashioned teletype machine.


4.2.1. Using a BIOS Interrupt

To call the BIOS function, we need to use what is called an interrupt. This is a special instruction that says to the CPU, 'drop everything and take care of this right now'. These calls are referred to as 'soft' interrupts, as opposed to 'hard' interrupts, which are messages to the CPU from peripherals such as the hard disk or the keyboard.

Every interrupt has a one-byte code number. When the interrupt occurs, the CPU looks up that number in a table located in the first 1024 bytes of memory (256 entries of four bytes each), and jumps to the function (called an interrupt handler) that the table entry points to. Before the BIOS loads the boot sector, it initializes some interrupts to point to useful functions in the BIOS, so that programmers can use them and not have to reinvent the wheel every time they need to use the floppy drive or the text screen.
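
As an illustration of the table lookup, the sketch below reads the handler address for interrupt 0x10 (the BIOS video interrupt used later in this section) out of the table by hand. It assumes real mode, where each entry is four bytes: a 16-bit offset followed by a 16-bit segment. Nothing in LoThere.asm needs this code; the INT instruction performs the same lookup in hardware.

```nasm
; Sketch only: fetch the INT 0x10 vector manually (real mode).
   xor ax, ax
   mov es, ax                  ; ES = 0000, the segment holding the table
   mov bx, [es:0x10 * 4]       ; BX = offset of the video handler
   mov cx, [es:0x10 * 4 + 2]   ; CX = segment of the video handler
```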

Most of the video functions in the BIOS are reached through interrupt 0x10. Since there are only 256 interrupts, and there are many more functions that the BIOS needs to provide, it is also necessary to pass the interrupt handler an additional function code, which is put in the AH register (the upper byte of the AX register).

To make the code easier to read, let's add these two codes to the constant definitions along with the one for EOL:

8. Constant Declarations+={
VBIOS equ 0x10   ; BIOS interrupt vector for video services
ttype equ 0x0E   ; insert character in AL as if screen were teletype
}
This macro is defined in definitions 7 and 8.
This macro is invoked in definition 1.

The value of ttype still needs to be put into AH, however. The BIOS routine takes the character itself in the AL register, so our own code also has to keep track of the location of the next character to print. For this we use the SI register, a register designed to serve as a Source Index into strings.

9. BIOS Arguments={
   mov ah, ttype   ; set function to 'teletype mode'
   mov si, hello   ; set SI to point to the first character in 'hello'
}
This macro is invoked in definition 10.
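
The teletype function also reads the BH register as the display page number and, in graphics modes, BL as the foreground color. The original code leaves both at whatever values the BIOS happened to set before starting the boot block. A more cautious variant of the argument setup, offered here as an optional sketch, sets them explicitly.

```nasm
; Optional sketch: also set the page and color arguments.
   mov ah, ttype   ; function 0x0E, teletype output
   mov bx, 0x0007  ; BH = page 0, BL = color 7 (used in graphics modes only)
   mov si, hello   ; our own pointer to the next character
```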


4.2.2. The Print Loop

We are now ready to begin the printing loop itself (whew!). Like most loops, this one has three parts: a conditional, an action, and an exit. Together with the argument settings, they make up the print routine:

10. Print Hello World to the Screen={
BIOS Arguments
printloop:
   
Conditional
   
Action
   jmp short printloop

endstring:
Exit
}
This macro is invoked in definition 1.

The conditional section tests to see if the character pointed to by SI is equal to EOL. If it is, it jumps to the exit; otherwise, it continues with the loop.

11. Conditional={
   mov al, [si]   ; update byte to print
   cmp al, EOL    ; test that it isn't EOL
   jz endstring
}
This macro is invoked in definition 10.

The action part simply calls the BIOS interrupt and increments SI by one,

12. Action={
   int VBIOS     ; put character in AL at next cursor position
   inc si
}
This macro is invoked in definition 10.

after which the loop continues.
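
The same loop can be written more compactly with the LODSB instruction, which loads the byte at DS:SI into AL and then advances SI in a single step (forwards, provided the direction flag is clear). The sketch below is an alternative to the macros above, not something the original program uses; the labels 'next_char' and 'done' are invented here.

```nasm
; Alternative sketch of the print loop using LODSB.
   cld                 ; make string instructions count upwards
   mov ah, ttype
   mov si, hello
next_char:
   lodsb               ; AL = [DS:SI], then SI = SI + 1
   test al, al         ; sets the zero flag when AL is 0, i.e. EOL
   jz done
   int VBIOS
   jmp short next_char
done:
   jmp short $
```

Testing AL against itself works only because EOL is zero, which is one reason a zero delimiter is convenient.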

The exit code is simply a loop that jumps back on itself; the dollar sign ('$') is a special NASM symbol meaning 'the address of the current line', so the instruction jumps to itself forever.

13. Exit={
   jmp $
}
This macro is invoked in definition 10.
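
A tight 'jmp $' keeps the CPU spinning at full speed. A common alternative, sketched here as an option and not something the original program does, is to halt the processor instead; the label 'hang' is invented for this example.

```nasm
; Optional sketch: halt instead of spinning.
hang:
   cli             ; mask maskable interrupts so they cannot wake the CPU
   hlt             ; stop executing until an interrupt arrives
   jmp short hang  ; a non-maskable interrupt can still resume execution
```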


5. Finishing Touches

We have now finished all of the actual code of the boot block program, and are almost ready to assemble our code. One detail still remains, however. Most, but not all, PCs check the last two bytes of a boot sector for a special code, the bootable disk signature, before trying to boot from it. Without it, the PC assumes that the floppy is a data disk, and returns a 'no operating system' message or something similar. So we need to add this special code to our disk:

14. Bootable Disk Signature={
bootsig  dw 0xAA55
}
This macro is invoked in definition 16.

(The DW directive means 'Define Word', and allows you to define variables two bytes at a time. There is also DD, or Define Doubleword, which defines chunks of 4 bytes at a time.)

There is only one problem: how do we make sure that it gets put into the last two bytes of the sector? The easiest way is to simply fill the space between the end of the data and the last two bytes with zeroes:

15. Space Filler={
space    times (0x0200 - 2) - ($-$$) db 0
}
This macro is invoked in definition 16.

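What this does is repeat 'db 0' once for every byte between the current position and the last two bytes of the sector. The expression '$-$$' is the number of bytes assembled so far, since '$' is the current position and '$$' is the beginning of the code; subtracting that from 0x200 (512 decimal) minus 2 gives the number of zero bytes needed. The label 'space' marks the first of them. This, followed by the boot signature, gives us the footer or end of the boot block.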

16. Boot Block Footer={
Space Filler
Bootable Disk Signature
}
This macro is invoked in definition 1.
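
For comparison, the footer is often seen in the shorter, unlabeled form sketched below. It should produce the same bytes as the two macros above; the comments spell out where each byte lands.

```nasm
; Sketch: an equivalent footer without labels.
   times 510 - ($ - $$) db 0   ; pad with zeroes up to offset 510
   dw 0xAA55                   ; stored low byte first: 0x55 at offset 510,
                               ; 0xAA at offset 511
```

If the code and data ever grow past 510 bytes, the repeat count becomes negative and NASM should refuse to assemble the file, which serves as a useful size check.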

Now the code is ready to test.
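
Expanding the macros by hand in the order given in definition 1 should yield roughly the following file. This is a sketch of the expected output, not the output of the literate programming tool itself, so spacing and comments may differ.

```nasm
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; LoThere.asm - simple "Hello, World!" boot block program
; Joseph Osako  11 June 2002
;

EOL   equ 0x00    ;end of line marker
VBIOS equ 0x10   ; BIOS interrupt vector for video services
ttype equ 0x0E   ; insert character in AL as if screen were teletype

[BITS 16]
[ORG 0x7C00]

start:
   mov ax, cs
   mov ds, ax    ; set DS == CS

   mov ah, ttype   ; set function to 'teletype mode'
   mov si, hello   ; set SI to point to the first character in 'hello'
printloop:
   mov al, [si]   ; update byte to print
   cmp al, EOL    ; test that it isn't EOL
   jz endstring
   int VBIOS     ; put character in AL at next cursor position
   inc si
   jmp short printloop

endstring:
   jmp $

hello db "Hello, World!"
      db EOL

space    times (0x0200 - 2) - ($-$$) db 0
bootsig  dw 0xAA55
```

Assuming NASM is installed, a command such as 'nasm -f bin LoThere.asm -o LoThere.bin' (the output file name is an arbitrary choice) should produce a flat binary exactly 512 bytes long, ready to be written to the first sector of a floppy or disk image.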


End Of File