Chapter 22. Storage Management

The reason this chapter comes so late in this document is that, except for special cases, MDL programs have their storage needs handled automatically. There is usually no need even to consider storage management, except as it affects efficiency (chapter 24). This chapter gives some explanation of why this is so, and covers those special means by which a program can assume control of storage management.

The MDL address space is divided into five parts, which are usually called

movable garbage-collected space,
immovable space (both garbage-collected and not),
user pure/page space,
pure-RSUBR mapping space, and
internal storage.

Internal storage occupies both the highest and lowest addresses in the address space, and its size never changes as MDL executes. The other spaces can vary in size according to the needs of the executing program. Generally the interpreter allocates a contiguous set of addresses for each space, and each space gradually fills up as new objects are created and as disk files are mapped in. The action taken when space becomes full varies, as discussed below.

22.1. Movable Garbage-collected Storage

Most storage used explicitly by MDL programs is obtained from a pool of free storage managed by a "garbage collector". Storage is obtained from this pool by the SUBRs which construct objects. When a SUBR finds that the pool of available storage is exhausted, it automatically calls the garbage collector.

The garbage collector has two algorithms available to it: the "copying" algorithm, which is used by default, and the "mark-sweep" algorithm. Actually, one often speaks of two separate garbage collectors, the "copying" one and the "mark-sweep" one, because each is an independent module that is mapped in to the interpreter's internal storage from disk only during garbage collection. For simplicity, this document speaks of "the" garbage collector, which has two algorithms.

The garbage collector examines the storage pool and marks all the objects there, separating them into two classes: those which cannot possibly be referenced by a program, and those which can. The "copying" algorithm then copies the latter into one compact section of the pool, and the remainder of the pool is made available for newly constructed objects. The "mark-sweep" algorithm, instead, puts all objects in the former class (garbage) into "free lists", where the object-construction SUBRs can find them and re-use their storage.

If the request for more storage still cannot be satisfied from reclaimed storage, the garbage collector will attempt to obtain more total storage from the operating system under which MDL runs. (Also, if there is a gross superfluity of storage space, the garbage collector will politely return some storage to the operating system.) Only when the total system resources are exhausted will you finally lose.

Thus, if you just "forget about" an object, that is, lose all possible means of referencing it, its storage is automatically reclaimed. "Object" in this context includes that stack-structured storage space used in PROCESSes for functional application.

22.1.1. Stacks and Other Internal Vectors

Control stacks are used in MDL to control the changes in environment caused by calling and binding. Each active PROCESS has its own control stack. On this stack are stored LVALs for ATOMs; PRIMTYPE TUPLEs, which are otherwise like VECTORs; PRIMTYPE FRAMEs, which are generated by calling Subroutines; and ACTIVATIONs, which are generated by calling FUNCTIONs with named ACTIVATIONs, PROG, and REPEAT. TAG and LLOC can make TAGs and LOCDs (respectively) that refer to a specific place on a specific control stack. (LEGAL? returns T if and only if the portion of the control stack in which its argument is found or to which its argument refers is still active, or if its argument doesn't care about the control stack. The garbage collector may change a non-LEGAL? object to TYPE ILLEGAL before reclaiming it.) As the word "stack" implies, things can be put on it and removed from it at only one end, called the top. It has a maximum size (or depth), and attempting to put too many things on it will cause overflow. A stack is stored like a VECTOR, and it must be GROWn if and when it overflows.

A control stack is actually two stacks in one. One section is used for "top-level" LVALs -- those SET while the ATOM is not bound by any active Function's argument LIST or Subroutine's SPECIAL binding -- and the other section is used for everything else. Either section can overflow, of course. The top-level-LVAL section is below the other one, so that a top-level LVAL will be found only if the ATOM is not currently bound elsewhere, namely in the other section.

MDL also has an internal stack, used for calling and temporary storage within the interpreter and compiled programs. It too is stored like a VECTOR and can overflow. There are other internal vectors that can overflow: the "global vector" holds pairs ("slots") of ATOMs and corresponding GVALs ("globally bound" or GBOUND? means that the ATOM in question is in this vector, whether or not it currently has a global value), and the "TYPE vector" holds TYPE names (predefined and NEWTYPEs) and how they are to be treated.

22.2. Immovable Storage

22.2.1. Garbage-collected: FREEZE

In very special circumstances, such as debugging RSUBRs, you may need to prevent an object from being moved by the garbage collector. FREEZE takes one argument, of PRIMTYPE VECTOR, UVECTOR, STRING, BYTES or TUPLE. It copies its argument into non-moving garbage-collected space. FREEZE returns the copy CHTYPEd to its PRIMTYPE, except in the case of a TUPLE, which is changed to a VECTOR.

22.2.2. Non-garbage-collected: STORAGE (the PRIMTYPE)

An object of PRIMTYPE STORAGE is really a frozen UVECTOR whose UTYPE is of PRIMTYPE WORD, but it is always pointed to by something internal to MDL and thus is never garbage-collectible. The use of FREEZE is always preferable, except when for historical reasons a STORAGE is necessary.

22.3. Other Storage

User pure/page space serves two purposes. First, when a user program PURIFYs (see below) MDL objects, they are copied into this space. Second, so-called hand-crafted RSUBRs (assembled but not compiled) can call on the interpreter to map pages of disk files into this space for arbitrary purposes.

Pure-RSUBR mapping space is used by the interpreter to dynamically map pages of pure compiled programs into and out of the MDL address space. Pure code can refer to impure storage through the "transfer vector", another internal vector. This space is the most vulnerable to being compressed in size by the long-term growth of other spaces.

Internal storage has both pure and impure parts. The interpreter program itself is pure and sharable, while impure storage is used for internal pointers, counters, and flags, for example, pointers to the boundaries of other spaces. In the pure part of this space are most of the ATOMs in an initial MDL, along with their OBLIST buckets (LISTs) and GVAL slots (a pure extension of the global vector), where possible. A SET or SETG of a pure ATOM automatically impurifies the ATOM and as much of its OBLIST bucket as needs to be impure.

22.4. Garbage Collection: Details

When either of the garbage-collected spaces (movable or immovable) becomes full, MDL goes through the following procedure:

A "DIVERT-AGC" interrupt occurs if the garbage collection can be deferred temporarily by shifting boundaries between storage spaces slightly. The interrupt handler may postpone a garbage collection by moving boundaries itself with a call to BLOAT (below).
The garbage collector begins execution. The "copying" algorithm creates an inferior operating-system process (named AGC in the ITS version) whose address space is used to hold the new copies of non-garbage objects. MDL accesses the inferior's address space through two pages ("frontier" and "window") in its internal space that are shared with the inferior. If the garbage collection occurred because movable garbage-collected space was exhausted, then the "mark-sweep" algorithm might be used instead (see below) and no inferior process is created.
The garbage collector marks and moves all objects that can possibly be referenced hereafter. It begins with the <MAIN> PROCESS and the currently running PROCESS <ME>, considered as vectors containing the control stacks, object pointers in live registers, etc. Every object in these "PROCESS vectors" is marked "accessible", and every element of these objects (bindings, etc.), and so on recursively. The "copying" algorithm moves objects into the inferior process's address space as it marks them.
If the garbage collection is "exhaustive" -- which is possible only in the "copying" algorithm -- then both the chain of associations and top-level local/global bindings are examined thoroughly, which takes more time but is more likely to uncover garbage therein. In a normal garbage collection these constructs are not treated specially.
Finally, the "mark-sweep" algorithm sweeps through the storage space, adding unmarked objects to the internal free lists for later re-use. The "copying" algorithm maps the inferior process's address space into MDL's own, replacing old garbagey with the new compact storage, and the inferior process is destroyed.

22.5. GC

<GC min:fix exh?:false-or-any ms-freq:fix>

causes the garbage collector to run and returns the total number of words of storage reclaimed. All of its arguments are optional: if they are not supplied, a call to GC simply causes a "copying" garbage collection.

If min is explicitly supplied as an argument, a garbage-collection parameter is changed permanently before the garbage collector runs. min is the smallest number of words of "free" (unclaimed, available for use) movable garbage-collected storage the garbage collector will be satisfied with having after it is done. Initially it is 8192 words. If the total amount of reclaimed storage is less than min, the garbage collector will ask the operating system for enough storage (in 1024 word blocks) to make it up. N.B.: the system may be incivil enough not to grant the request; in that case, the garbage collector will be content with what it has, unless that is not enough to satisfy a pending request for storage. Then it will inform you that it is losing. A large min will result in fewer total garbage collections, but they will take longer since the total quantity of storage to be dealt with will generally be larger. Smaller mins result in shorter, more frequent garbage collections.

22.6. BLOAT

BLOAT is used to cause a temporary expansion of the available storage space with or without changing the garbage-collection parameters. BLOAT is particularly useful for avoiding unnecessary garbage collections when loading a large file. It will cause (at most) one garbage collection, at the end of which the available storage will be at least the amount specified in the call to BLOAT. (Unless, of course, the operating system is cranky and will not provide the storage. Then you will get an error. <ERRET 1> from this error will cause the BLOAT to return 1, which usually just causes you to lose at a later time -- unless the operating system feels nicer when the storage is absolutely necessary.)

A call to BLOAT looks like this:

<BLOAT fre stk lcl glb typ sto pstk
       min plcl pglb ptyp imp pur dpstk dstk>

where all arguments on the first line above are FIX, optional (0 by default), and indicate the following:

fre: number of words of free movable storage desired (for LISTs, VECTORs, ATOMs, etc.)
stk: number of words of free control-stack space desired (for functional applications and binding of ATOMs)
lcl: number of new top-level LVALs for which to leave space (SETs of ATOMs which are not currently bound)
glb: number of new GVALs for which to leave space (in the global vector)
typ: number of new TYPE definitions for which to leave space (in the TYPE vector)
sto: number of words of immovable garbage-collected storage desired
pstk: number of words of free internal-stack space desired (for READing large STRINGs, and calling routines within the interpreter and compiled programs)

Arguments on the second line are also FIX and optional, but they set garbage-collection parameters permanently, as follows:

min: as for GC
plcl: number of slots for LVALs added when the space for top-level LVALs is expanded (initially 64)
pglb: number of slots for GVALs added when the global vector is grown (initially 64)
ptyp: number of slots for TYPEs added when the TYPE vector is grown (initially 32)
imp: number of words of immovable garbage-collected storage added when it is expanded (initially 1024)
pur: number of words reserved for pure compiled programs, if possible (initially 0)
dpstk: most desirable size for the internal stack, to prevent repeated shrinking and GROWing (initially 512)
dstk: most desirable size for the control stack (initially 4096)

BLOAT returns the actual number of words of free movable garbage-collected storage available when it is done.

22.7. BLOAT-STAT

BLOAT-STAT can be used with BLOAT to "tune" the garbage collector to particular program requirements.

<BLOAT-STAT length-27:uvector>

fills the uvector with information about the state of storage of MDL. The argument should be a UVECTOR of length 27 and UTYPE FIX. If BLOAT-STAT does not get an argument, it will provide its own UVECTOR. The information returned is as follows: the first 8 elements indicate the number of garbage collections that are attributable to certain causes, and the other 19 give information about certain areas of storage. In detail:

number of garbage collections caused by exhaustion of movable garbage-collected storage
ditto by overflow of control stack(s)
ditto by overflow of top-level-LVAL section of control stack(s)
ditto by overflow of global vector
ditto by overflow of TYPE vector
ditto by exhaustion of immovable garbage-collected storage
ditto by overflow of internal stack
ditto by overflow of both stacks at the same time (rare)
number of words of movable storage
number of words of movable storage used since last BLOAT-STAT
maximum number of words of movable storage ever existing
number of words of movable storage used since MDL began running
maximum size of control stack
number of words on control stack in use
maximum size of control stack(s) ever reached
number of slots for top-level LVALs
number of top-level LVALs existing
number of slots for GVALs in global vector
number of GVALs existing
number of slots for TYPEs in TYPE vector
number of TYPEs existing
number of words of immovable garbage-collected storage
number of words of immovable storage unused
size of largest unused contiguous immovable-storage block
number of words on internal stack
number of words on internal stack in use
maximum size of internal stack ever reached

22.8. GC-MON

<GC-MOND pred>

("garbage-collector monitor") determines whether or not the interpreter will hereafter print information on the terminal when a garbage collection starts and finishes, according to whether or not its argument is true. It returns the previous state. Calling it with no argument returns the current state. The initial state is false.

When typing is enabled, the "copying" garbage collector prints, when it starts:

GIN reason subr-that-caused:atom

and, when it finishes:

GOUT seconds-needed

The "mark-sweep" garbage collector prints MSGIN and MSGOUT instead of GIN and GOUT.

Two SUBRs, described next, use only part of the garbage-collector algorithm, in order to find all pointers to an object. GC-DUMP and GC-READ, as their names imply, also use part in order to translate between MDL objects and binary representation thereof.

22.9.1. SUBSTITUTE

<SUBSTITUTE new:any old:any>

returns old, after causing a miniature garbage collection to occur, during which all references to old are changed so as to refer to new. Neither argument can be of PRIMTYPE STRING or BYTES or LOCD or live on the control stack, unless both are of the same PRIMTYPE. One TYPE name cannot be substituted for another. One of the few legitimate uses for it is to substitute the "right" ATOM for the "wrong" one, after OBLISTs have been in the wrong state. This is more or less the way ATOMs are impurified. It is also useful for unlinking RSUBRs. SUBSTITUTE returns old as a favor: unless you hang onto old at that point, it will be garbage-collected.

22.9.2. PURIFY

<PURIFY any-1 ... any-N>

returns its last argument, after causing a miniature garbage collection that results in all the arguments becoming pure and sharable, and ignored afterward by the garbage collector. No argument can live on the control stack or be of PRIMTYPE PROCESS or LOCD or ASOC. Sharing between operating-system processes actually occurs after a SAVE, if and when the SAVE file is RESTOREd.