
Post-assembly Program Relocation Dictionary (RLD) and External Symbol Dictionary (ESD)

1. Terminology: Relocatable vs. not Relocatable

"relocatable" refers to something that can be moved without adjustment; i.e., the program semantics are not changed by moving the item. To say something is not relocatable means that if you move it, you will have to adjust it. For simple SIC most instructions use absolute referencing, and so most operands are not relocatable. In contrast, most SIC/XE operands are relocatable, the exception being 4-byte instructions, which usually have operands that are not relocatable.

Symbols such as statement labels and self-defining literals have a value that is relative to the START statement. These are called relative references. When a relative reference is used in an absolute context, it is not relocatable; e.g., for +LDA #STUFF the operand STUFF (which is a relative reference) is not relocatable.

A reference whose value does not depend on the START statement is called an absolute reference. Actual numbers are always absolute references, so in contrast, +LDA #1234 has a relocatable operand. Note that LDA 1234 is relocatable for either simple SIC or SIC/XE.

2. Load Process

The initial load process for a relocating loader is the same as that for an absolute loader (such as the one used for the SIC Simulator); i.e., the loader simply loads the object module into memory starting from a specified memory location. For an absolute loader, this location is specified as part of the load module and, since the assembler has determined all addresses in the module using this location, no addressing adjustments are required. For a relocating loader, the memory location is not specified until load time (presumably provided by the operating system or some other control program); i.e., the binding of the load module to a specific load location in memory is not determined until the time of load. Since this location was not known at the time the object module was assembled, any absolute references in the module must be adjusted to reflect the load location in order for the loaded code to work as planned. This process is called program relocation.

3. Relocation Dictionary (RLD)

If programs are to be dynamically relocated in memory, the assembler must generate a table of those program locations that are not relocatable, called the Relocation Dictionary or RLD. For discussion purposes, we assume that the program START is 0. It is a simple adjustment otherwise. The RLD is used by the loader, which (for a START of 0) simply adds the program load point to the value pointed to by each (relative) location specified in the RLD. In contrast to SIC/XE, where most operands are relocatable, the RLD for a simple SIC program would need to specify almost every instruction. The RLD is normally generated "on the fly" during pass 2 of the assembler, with each entry derived from the current value of the location counter. The only information needed in an RLD entry is the (relative) location of the 3-byte word within the object module that needs to be adjusted for the object code to work. In particular, for the most common case in SIC/XE, a 4-byte instruction with a non-relocatable operand, the assembler simply adds locctr + 1 to the RLD as part of the process of generating code for the instruction. Note that it is during code generation that the assembler determines if generated code is relocatable or non-relocatable.
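The loader's use of the RLD can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name relocate, the tiny two-statement image, and the load point 0x2000 are all hypothetical, START is taken to be 0, and masking the sum to 3 bytes is a choice made for this sketch rather than something the notes specify.

```python
def relocate(image, rld, load_point):
    """Add load_point to the 3-byte word at each RLD location (START assumed 0)."""
    out = bytearray(image)
    for loc in rld:
        word = int.from_bytes(out[loc:loc + 3], "big")
        word = (word + load_point) & 0xFFFFFF   # assumption: keep the sum within 3 bytes
        out[loc:loc + 3] = word.to_bytes(3, "big")
    return bytes(out)


# Hypothetical image: a +JSUB to relative address 4 at location 0,
# followed by a WORD holding the relative address 4 at location 4.
image = bytes.fromhex("4B100004" "000004")
rld = [1, 4]                      # operand of the +JSUB, and the WORD itself
loaded = relocate(image, rld, 0x2000)
print(loaded.hex().upper())       # 4B102004002004
```

Only the locations named in the RLD are touched; every other byte of the image is copied unchanged, which is why a short RLD (as in SIC/XE) makes relocation cheap.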

Example: Assume that the WORD storage directive allows symbolic references in the operand field and suppose that a program has the following lines:

    loc    statement               object code
    2B1    LOOP   LDB    ADDR,X    6BA010
    . . .
    2C4    ADDR   WORD   STUFF     0052A1
    . . .
    410C          +JSUB  LOOP      4B1002B1
    4110          +LDB   #12345    69103039
    . . .
    52A1   STUFF  RESW   12000

Since LOOP is a label in a program, then the statement +JSUB LOOP has a relative reference (LOOP) used in an absolute context and so it is not "relocatable". This is determined during code generation in pass 2 and the location of the operand 410C+1 = 00410D is added to the RLD. In contrast, for the statement +LDB #12345 the operand is absolute, so no RLD entry is generated.

For the statement ADDR WORD STUFF the relative reference STUFF also occurs in an absolute context, which means that the 3-byte word generated by the assembler is not relocatable. Hence, the location for this particular WORD directive needs to be added to the RLD. Just as with "+" instructions, this is normally determined "on the fly" during pass 2 of the assembler, in this case when the WORD statement is resolved. The location counter points directly to the location of the operand; hence, the value 0002C4 is added to the RLD (which for this example also happens to be the value of ADDR).
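The pass 2 decision for the three statements in this example can be sketched as follows. The function rld_entry and its arguments are hypothetical names introduced for illustration; the sketch assumes a START of 0 and handles only the two cases discussed here (a 4-byte "+" instruction and a WORD directive).

```python
def rld_entry(locctr, kind, operand_is_relative):
    """Return the RLD location for one statement, or None if no entry is needed.

    kind is "format4" for a 4-byte (+) instruction or "word" for a WORD
    directive. operand_is_relative is True when the operand value depends on
    the START address (e.g., a statement label used as an address).
    """
    if not operand_is_relative:
        return None            # absolute operand: nothing to adjust at load time
    if kind == "format4":
        return locctr + 1      # the 3-byte operand field follows the opcode byte
    if kind == "word":
        return locctr          # the location counter points at the word itself
    return None                # anything else is outside the scope of this sketch


rld = []
for locctr, kind, relative in [
    (0x2C4, "word", True),        # ADDR WORD STUFF
    (0x410C, "format4", True),    # +JSUB LOOP
    (0x4110, "format4", False),   # +LDB #12345
]:
    loc = rld_entry(locctr, kind, relative)
    if loc is not None:
        rld.append(loc)

print([f"{loc:06X}" for loc in rld])   # ['0002C4', '00410D']
```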

4. Externally Defined Symbols, Control Sections (CSECT)

Suppose that a program consists of a main routine and 2 subroutines and that these are being written independently. In order to assemble these routines, the source files must essentially be amalgamated, because each may use symbolic references defined in one of the other routines. As program size increases, the need to be able to work with subroutines and assemble them independently increases, so it is advantageous to automate this process.

The mechanism employed is that of a control section (CSECT). A control section is a block of code that can be assembled independently. The first line of the block is a CSECT statement (optionally labeled).

There are two types of symbolic references used in a control section:
1. symbols defined in the control section that are referenced only within the section (local symbols)
2. symbols used in the control section that are not defined within the section (externally defined symbols)

A local symbol name may appear in more than one control section since its reference within the section is unambiguous.

Example: P 87

5. EXTREF and EXTDEF statements

An externally defined symbol used within a control section must be identified by using EXTREF statements within the control section; e.g.,

    EXTREF SUB1, SUB2

The EXTREF only needs to be issued before first use of the symbol in an operand, although good form is to place all EXTREF statements at the beginning of the CSECT. It is an error for a control section to have an EXTREF for a symbol that is also defined within the section.

A control section identifies symbols that are to be made available to other control sections by using EXTDEF statements; e.g.,

    EXTDEF TABSIZE, ADDR1

If there is a label on the CSECT statement, it is automatically included as an EXTDEF (i.e., specification as EXTDEF is inferred).

6. External Symbol Dictionary (ESD)

The RLD provides the information needed for the loader to relocate a control section. The means for dealing with externally defined symbols is called the external symbol dictionary or ESD. In contrast to the RLD, the ESD for each control section must contain both symbolic and location information. There are 2 basic parts to the ESD:
1. EXTDEF Part: finalized in pass 1 as pairs consisting of (symbol, value) [value = value of the symbol in the symbol table]
2. EXTREF Part: finalized in pass 2 as triples consisting of (symbol, location, operation) [location = (relative) location of operand referencing the symbol] [operation = +, -, *, /]

Example: EXTDEF part
Given a control section labeled SUB1, EXTDEF TABSIZE, ADDR1 might generate as the EXTDEF part of the ESD the table:

    SUB1     000000
    TABSIZE  0000F3
    ADDR1    0000DD

where each EXTDEF address is taken straight from the symbol table at the end of pass 1.

Each entry in the EXTDEF part of an ESD provides the value of a symbolic reference. It is an error if a symbol appears in the EXTDEF part of more than one ESD. Note that the references in the EXTDEF part are not relocatable; i.e., their values must be adjusted at load time by adding on the load point for their associated module.

The EXTDEF part is set up by EXTDEF statements and can be finished at the end of pass 1. In contrast, the EXTREF part can only be constructed incrementally as external operands are encountered during pass 2 code generation. At module load time, each location in the EXTREF part must be adjusted by adding on the load point for the associated module (as is also the case for the EXTDEF part). If operand arithmetic is not supported, the operation entry is redundant (defaults to "+"), because the value of the external reference, once known, is just added to the 3-byte value at the operand location.

Example: EXTREF part
Suppose that TABSIZE is an EXTREF for some module, and the assembler (in pass 2) encounters the statement +LDA TABSIZE with location counter at 12B. The pass 2 object code line generated is then

    03100000

and the entry for the EXTREF part of the ESD is

    TABSIZE  00012C  +     (the operand is at locctr+1)

In essence, in generating the object code, the external reference to TABSIZE in this example is treated as an absolute reference (0), to be resolved at load time once the (relocated) value of TABSIZE becomes known.
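A minimal Python sketch of this pass 2 step is given below. The function assemble_format4 and its parameter names are hypothetical; it assumes simple addressing (n = i = 1) with no indexing, a START of 0, and no operand arithmetic, so the operation recorded is always "+".

```python
def assemble_format4(opcode, operand, locctr, symtab, extrefs, rld, esd_extref):
    """Generate 4-byte object code for '+OP operand' and record the ESD/RLD entry."""
    first_byte = opcode | 0x03     # n = 1, i = 1 (simple addressing)
    flags = 0x1                    # x = b = p = 0, e = 1 (extended format)
    if operand in extrefs:
        address = 0                # placeholder, resolved at load time
        esd_extref.append((operand, locctr + 1, "+"))
    else:
        address = symtab[operand]  # local label: relative value used absolutely
        rld.append(locctr + 1)
    return (first_byte << 24) | (flags << 20) | address


rld, esd_extref = [], []
code = assemble_format4(0x00, "TABSIZE", 0x12B, {}, {"TABSIZE"}, rld, esd_extref)
print(f"{code:08X}")                                   # 03100000
for symbol, location, op in esd_extref:
    print(f"{symbol} {location:06X} {op}")             # TABSIZE 00012C +
```

The same function handles a local label by consulting the symbol table and adding locctr + 1 to the RLD instead, which mirrors the +JSUB LOOP case from the RLD example.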

7. Loader utilization of ESD information

During program load, the loader gathers together all (relocated) EXTDEF parts and all (relocated) EXTREF parts to form a global ESD. At the end of load, each global EXTREF entry is processed against the global EXTDEFs to finalize the loaded module. By relocating all module RLDs and combining these with the values from the EXTREF part of the global ESD, a global RLD can be formed. The global RLD provides all information necessary for relocating the module elsewhere (i.e., once the program has been loaded, it can be treated as a monolithic module and can henceforward be relocated just by adjusting the locations given by the global RLD). As each EXTREF is resolved it is moved from the global ESD to the global RLD. If the module is internally complete, all EXTREFs will be resolved and the resulting global ESD will have an empty EXTREF part. In this case, any subsequent relocation of the module (e.g., in conjunction with a page swap) will require only RLD processing. The global ESD provides the means for accessing any symbolic reference specified at assembly time as an EXTDEF. If the module was designed to be a subroutine for use by other programs, then the ESD is the means whereby a calling routine can successfully link into the module using only symbolic references.

A loader that does not retain the global dictionaries is said to link, load and go. The process of generating the global dictionaries and amalgamating the modules is called linkage-edit. If the global dictionaries are retained with the module, then the module is in the same format as those produced by the assembler, so additional modules can be linked in later.

Example: follow-up to previous examples
Suppose that the module which defines SUB1 is loaded at 2F8. Then the loader updates the module's EXTDEF entries and in particular we get

    TABSIZE  0003EB      [F3+2F8]     (EXTDEF provides the value to use for TABSIZE)

Suppose that the module that references TABSIZE is loaded at 4B7. Then this module's EXTREF entry for TABSIZE becomes

    TABSIZE  0005E3  +   [12C+4B7]    (EXTREF locates where TABSIZE was used in the module)

At the end of program load, the loader adjusts the content of 5E3 by adding the (relocated) value of TABSIZE obtained from the EXTDEF part of the global ESD yielding

    031003EB             [100000+3EB] (operand at addr 5E3)

The entry for TABSIZE in the global ESD is deleted and the address it references is placed in the global RLD (5E3).
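The arithmetic in this example can be checked with a short Python sketch of the loader's ESD handling. The function names load_module and resolve_extrefs, the dictionary/list layout of the ESD parts, and the 3-byte mask are assumptions made for illustration; only the "+" operation is supported here.

```python
def load_module(extdef, extref, load_point, global_extdef, global_extref):
    """Relocate one module's ESD parts and merge them into the global ESD."""
    for symbol, value in extdef.items():
        if symbol in global_extdef:
            raise ValueError(f"{symbol} is defined in more than one module")
        global_extdef[symbol] = value + load_point
    for symbol, location, op in extref:
        global_extref.append((symbol, location + load_point, op))


def resolve_extrefs(memory, global_extdef, global_extref, global_rld):
    """Patch each resolvable EXTREF location; return the entries left unresolved."""
    unresolved = []
    for symbol, location, op in global_extref:
        if symbol not in global_extdef:
            unresolved.append((symbol, location, op))
            continue
        if op != "+":
            raise ValueError("this sketch supports only the '+' operation")
        word = int.from_bytes(memory[location:location + 3], "big")
        word = (word + global_extdef[symbol]) & 0xFFFFFF
        memory[location:location + 3] = word.to_bytes(3, "big")
        global_rld.append(location)    # resolved entry moves to the global RLD
    return unresolved


memory = bytearray(0x600)
memory[0x5E2:0x5E6] = bytes.fromhex("03100000")   # +LDA TABSIZE, loaded at 4B7+12B

global_extdef, global_extref, global_rld = {}, [], []
load_module({"SUB1": 0x000, "TABSIZE": 0x0F3, "ADDR1": 0x0DD}, [], 0x2F8,
            global_extdef, global_extref)
load_module({}, [("TABSIZE", 0x12C, "+")], 0x4B7, global_extdef, global_extref)
global_extref = resolve_extrefs(memory, global_extdef, global_extref, global_rld)

print(f"{global_extdef['TABSIZE']:06X}")        # 0003EB
print(memory[0x5E2:0x5E6].hex().upper())        # 031003EB
print([f"{loc:X}" for loc in global_rld])       # ['5E3']
```

After the call, the EXTREF part of the global ESD is empty and location 5E3 appears in the global RLD, which is the state described above for an internally complete module.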

8. Compiler generation of ESD entries

Compilers are expected to produce object modules compatible with the system loader. If the system loader is a relocating loader, then the compiler must produce both an RLD and an ESD in accord with the specifications for the loader. It is straightforward to identify the high-level language constructs that correspond to EXTDEFs and EXTREFs, since high-level languages such as C are designed to work with other system software and system loaders in particular. In C, both global variable names and function names produce EXTDEF entries. extern declarations allow the programmer to specify EXTREFs. Any externally defined variables must be specified as extern; however, an undefined function reference is handled as an EXTREF even if there is no extern declaration. Note that in particular, standard C functions such as printf are undefined (unless the programmer has defined a local version) and have no extern designation; i.e., they receive an EXTREF entry in the ESD. Many extern references are contained in the include files such as stdio.h.

The cc compiler driver (gcc for Linux) is designed to invoke the C compiler for any uncompiled modules (which may lead to cc aborting), and then invoke a linkage-editor (ld) to produce a link-edited object module. cc processes input files through one or more of four stages: preprocessing, compilation, assembly, and linking. Unix systems do not provide for load and go except by shell command script. If all compiles are successful, and if the -c option has not been specified, cc attempts to produce an (absolute) module, returning an error if there is no EXTDEF named main or if there are any unresolved EXTREFs. The resulting object module is fully link-edited and ready to run. It is named a.out by default, and can be named something else by using the -o option. It is also possible to produce relocatable modules using the ld command directly (-r option), which in Unix jargon are said to be partially linked.

In the link-edit process, an external reference is resolved by looking at the ESDs of provided object modules only until the reference is located; i.e., for a reference defined in more than one module only the first one encountered is used. For cc, the ESDs for modules on the command line are examined first, then the ESD for the C library, then those for any other specified libraries. This allows a programmer to write an alternate version of a library function and have that used in place of the existing function.
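The "first definition wins" rule can be illustrated with a small Python sketch. The function resolve_symbol, the module names, and the symbol values are all hypothetical; the list order stands in for the search order described above (command-line modules first, then the C library, then other libraries).

```python
def resolve_symbol(symbol, modules):
    """Return (module name, value) from the first module whose EXTDEF part defines symbol."""
    for name, extdef in modules:
        if symbol in extdef:
            return name, extdef[symbol]
    return None                      # unresolved EXTREF


# Search order: modules on the command line, then the C library.
modules = [
    ("myprog.o", {"main": 0x0000, "printf": 0x0120}),   # programmer's own printf
    ("libc",     {"printf": 0x4000, "malloc": 0x5000}),
]

print(resolve_symbol("printf", modules))   # ('myprog.o', 288)
print(resolve_symbol("malloc", modules))   # ('libc', 20480)
print(resolve_symbol("sqrt", modules))     # None
```

Because the search stops at the first match, the library's printf is never consulted here, which is how an alternate version of a library function takes precedence.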

The -c option signals cc that after compiling it is to produce only a partially linked module rather than a module ready to execute. In this case, in addition to all global variable names and all function names, the ESD retains all unresolved extern references as EXTREFs. The module can then be used in a subsequent cc command to be linked in with other modules.

So long as all modules observe the specifications for the system loader, ld can link modules coming from multiple sources (if C is providing the module to be initially called, note that the final link-edit needs an EXTDEF named main).