Chapter 22. Storage Management
The reason this chapter comes so late in this document is that, except for special cases, MDL programs have their storage needs handled automatically. There is usually no need even to consider storage management, except as it affects efficiency (chapter 24). This chapter gives some explanation of why this is so, and covers those special means by which a program can assume control of storage management.
The MDL address space is divided into five parts, which are usually called
- movable garbage-collected space,
- immovable space (both garbage-collected and not),
- user pure/page space,
- pure-
RSUBR
mapping space, and - internal storage.
Internal storage occupies both the highest and lowest addresses in the address space, and its size never changes as MDL executes. The other spaces can vary in size according to the needs of the executing program. Generally the interpreter allocates a contiguous set of addresses for each space, and each space gradually fills up as new objects are created and as disk files are mapped in. The action taken when space becomes full varies, as discussed below.
22.1. Movable Garbage-collected Storage
Most storage used explicitly by MDL programs is obtained from a pool
of free storage managed by a "garbage collector". Storage is obtained
from this pool by the SUBR
s which construct objects. When a SUBR
finds that the pool of available storage is exhausted, it
automatically calls the garbage collector.
The garbage collector has two algorithms available to it: the "copying" algorithm, which is used by default, and the "mark-sweep" algorithm. Actually, one often speaks of two separate garbage collectors, the "copying" one and the "mark-sweep" one, because each is an independent module that is mapped in to the interpreter's internal storage from disk only during garbage collection. For simplicity, this document speaks of "the" garbage collector, which has two algorithms.
The garbage collector examines the storage pool and marks all the
objects there, separating them into two classes: those which cannot
possibly be referenced by a program, and those which can. The
"copying" algorithm then copies the latter into one compact section
of the pool, and the remainder of the pool is made available for
newly constructed objects. The "mark-sweep" algorithm, instead, puts
all objects in the former class (garbage) into "free lists", where
the object-construction SUBR
s can find them and re-use their
storage.
If the request for more storage still cannot be satisfied from reclaimed storage, the garbage collector will attempt to obtain more total storage from the operating system under which MDL runs. (Also, if there is a gross superfluity of storage space, the garbage collector will politely return some storage to the operating system.) Only when the total system resources are exhausted will you finally lose.
Thus, if you just "forget about" an object, that is, lose all
possible means of referencing it, its storage is automatically
reclaimed. "Object" in this context includes that stack-structured
storage space used in PROCESS
es for functional application.
22.1.1. Stacks and Other Internal Vectors
Control stacks are used in MDL to control the changes in environment
caused by calling and binding. Each active PROCESS
has its own
control stack. On this stack are stored LVAL
s for ATOM
s;
PRIMTYPE
TUPLE
s, which are otherwise like VECTOR
s; PRIMTYPE
FRAME
s, which are generated by calling Subroutines; and
ACTIVATION
s, which are generated by calling FUNCTION
s with named
ACTIVATION
s, PROG
, and REPEAT
. TAG
and LLOC
can make TAG
s
and LOCD
s (respectively) that refer to a specific place on a
specific control stack. (LEGAL?
returns T
if and only if the
portion of the control stack in which its argument is found or to
which its argument refers is still active, or if its argument doesn't
care about the control stack. The garbage collector may change a
non-LEGAL?
object to TYPE
ILLEGAL
before reclaiming it.) As the
word "stack" implies, things can be put on it and removed from it at
only one end, called the top. It has a maximum size (or depth), and
attempting to put too many things on it will cause overflow. A stack
is stored like a VECTOR
, and it must be GROW
n if and when it
overflows.
A control stack is actually two stacks in one. One section is used
for "top-level" LVAL
s -- those SET
while the ATOM
is not bound
by any active Function's argument LIST
or Subroutine's SPECIAL
binding -- and the other section is used for everything else. Either
section can overflow, of course. The top-level-LVAL
section is
below the other one, so that a top-level LVAL
will be found only if
the ATOM
is not currently bound elsewhere, namely in the other
section.
MDL also has an internal stack, used for calling and temporary
storage within the interpreter and compiled programs. It too is
stored like a VECTOR
and can overflow. There are other internal
vectors that can overflow: the "global vector" holds pairs ("slots")
of ATOM
s and corresponding GVAL
s ("globally bound" or GBOUND?
means that the ATOM
in question is in this vector, whether or not
it currently has a global value), and the "TYPE
vector" holds
TYPE
names (predefined and NEWTYPE
s) and how they are to be
treated.
22.2. Immovable Storage
22.2.1. Garbage-collected: FREEZE
In very special circumstances, such as debugging RSUBR
s, you may
need to prevent an object from being moved by the garbage collector.
FREEZE
takes one argument, of PRIMTYPE
VECTOR
, UVECTOR
,
STRING
, BYTES
or TUPLE
. It copies its argument into non-moving
garbage-collected space. FREEZE
returns the copy CHTYPE
d to its
PRIMTYPE
, except in the case of a TUPLE
, which is changed to a
VECTOR
.
22.2.2. Non-garbage-collected: STORAGE (the PRIMTYPE)
An object of PRIMTYPE
STORAGE
is really a frozen UVECTOR
whose
UTYPE
is of PRIMTYPE
WORD
, but it is always pointed to by
something internal to MDL and thus is never garbage-collectible. The
use of FREEZE
is always preferable, except when for historical
reasons a STORAGE
is necessary.
22.3. Other Storage
User pure/page space serves two purposes. First, when a user program
PURIFY
s (see below) MDL objects, they are copied into this space.
Second, so-called hand-crafted RSUBR
s (assembled but not compiled)
can call on the interpreter to map pages of disk files into this
space for arbitrary purposes.
Pure-RSUBR
mapping space is used by the interpreter to dynamically
map pages of pure compiled programs into and out of the MDL address
space. Pure code can refer to impure storage through the "transfer
vector", another internal vector. This space is the most vulnerable
to being compressed in size by the long-term growth of other spaces.
Internal storage has both pure and impure parts. The interpreter
program itself is pure and sharable, while impure storage is used for
internal pointers, counters, and flags, for example, pointers to the
boundaries of other spaces. In the pure part of this space are most
of the ATOM
s in an initial MDL, along with their OBLIST
buckets
(LIST
s) and GVAL
slots (a pure extension of the global vector),
where possible. A SET
or SETG
of a pure ATOM
automatically
impurifies the ATOM
and as much of its OBLIST
bucket as needs to
be impure.
22.4. Garbage Collection: Details
When either of the garbage-collected spaces (movable or immovable) becomes full, MDL goes through the following procedure:
- A
"DIVERT-AGC"
interrupt occurs if the garbage collection can be deferred temporarily by shifting boundaries between storage spaces slightly. The interrupt handler may postpone a garbage collection by moving boundaries itself with a call toBLOAT
(below). - The garbage collector begins execution. The "copying" algorithm
creates an inferior operating-system process (named
AGC
in the ITS version) whose address space is used to hold the new copies of non-garbage objects. MDL accesses the inferior's address space through two pages ("frontier" and "window") in its internal space that are shared with the inferior. If the garbage collection occurred because movable garbage-collected space was exhausted, then the "mark-sweep" algorithm might be used instead (see below) and no inferior process is created. - The garbage collector marks and moves all objects that can
possibly be referenced hereafter. It begins with the
<MAIN>
PROCESS
and the currently runningPROCESS
<ME>
, considered as vectors containing the control stacks, object pointers in live registers, etc. Every object in these "PROCESS
vectors" is marked "accessible", and every element of these objects (bindings, etc.), and so on recursively. The "copying" algorithm moves objects into the inferior process's address space as it marks them. - If the garbage collection is "exhaustive" -- which is possible only in the "copying" algorithm -- then both the chain of associations and top-level local/global bindings are examined thoroughly, which takes more time but is more likely to uncover garbage therein. In a normal garbage collection these constructs are not treated specially.
- Finally, the "mark-sweep" algorithm sweeps through the storage space, adding unmarked objects to the internal free lists for later re-use. The "copying" algorithm maps the inferior process's address space into MDL's own, replacing old garbagey with the new compact storage, and the inferior process is destroyed.
22.5. GC
<GC min:fix exh?:false-or-any ms-freq:fix>
causes the garbage collector to run and returns the total number of words of storage reclaimed. All of its arguments are optional: if they are not supplied, a call to GC simply causes a "copying" garbage collection.
If min is explicitly supplied as an argument, a garbage-collection parameter is changed permanently before the garbage collector runs. min is the smallest number of words of "free" (unclaimed, available for use) movable garbage-collected storage the garbage collector will be satisfied with having after it is done. Initially it is 8192 words. If the total amount of reclaimed storage is less than min, the garbage collector will ask the operating system for enough storage (in 1024 word blocks) to make it up. N.B.: the system may be incivil enough not to grant the request; in that case, the garbage collector will be content with what it has, unless that is not enough to satisfy a pending request for storage. Then it will inform you that it is losing. A large min will result in fewer total garbage collections, but they will take longer since the total quantity of storage to be dealt with will generally be larger. Smaller mins result in shorter, more frequent garbage collections.
22.6. BLOAT
BLOAT
is used to cause a temporary expansion of the available
storage space with or without changing the garbage-collection
parameters. BLOAT
is particularly useful for avoiding unnecessary
garbage collections when loading a large file. It will cause (at
most) one garbage collection, at the end of which the available
storage will be at least the amount specified in the call to BLOAT
.
(Unless, of course, the operating system is cranky and will not
provide the storage. Then you will get an error. <ERRET 1>
from
this error will cause the BLOAT
to return 1
, which usually just
causes you to lose at a later time -- unless the operating system
feels nicer when the storage is absolutely necessary.)
A call to BLOAT looks like this:
<BLOAT fre stk lcl glb typ sto pstk
min plcl pglb ptyp imp pur dpstk dstk>
where all arguments on the first line above are FIX
, optional (0
by default), and indicate the following:
- fre: number of words of free movable storage desired (for
LIST
s,VECTOR
s,ATOM
s, etc.) - stk: number of words of free control-stack space desired (for
functional applications and binding of
ATOM
s) - lcl: number of new top-level
LVAL
s for which to leave space (SET
s ofATOM
s which are not currently bound) - glb: number of new
GVAL
s for which to leave space (in the global vector) - typ: number of new
TYPE
definitions for which to leave space (in theTYPE
vector) - sto: number of words of immovable garbage-collected storage desired
- pstk: number of words of free internal-stack space desired (for
READ
ing largeSTRING
s, and calling routines within the interpreter and compiled programs)
Arguments on the second line are also FIX
and optional, but they
set garbage-collection parameters permanently, as follows:
- min: as for
GC
- plcl: number of slots for
LVAL
s added when the space for top-levelLVAL
s is expanded (initially 64) - pglb: number of slots for
GVAL
s added when the global vector is grown (initially 64) - ptyp: number of slots for
TYPE
s added when theTYPE
vector is grown (initially 32) - imp: number of words of immovable garbage-collected storage added when it is expanded (initially 1024)
- pur: number of words reserved for pure compiled programs, if possible (initially 0)
- dpstk: most desirable size for the internal stack, to prevent
repeated shrinking and
GROW
ing (initially 512) - dstk: most desirable size for the control stack (initially 4096)
BLOAT
returns the actual number of words of free movable
garbage-collected storage available when it is done.
22.7. BLOAT-STAT
BLOAT-STAT
can be used with BLOAT
to "tune" the garbage collector
to particular program requirements.
<BLOAT-STAT length-27:uvector>
fills the uvector with information about the state of storage of
MDL. The argument should be a UVECTOR
of length 27 and UTYPE
FIX
. If BLOAT-STAT
does not get an argument, it will provide its
own UVECTOR
. The information returned is as follows: the first 8
elements indicate the number of garbage collections that are
attributable to certain causes, and the other 19 give information
about certain areas of storage. In detail:
- number of garbage collections caused by exhaustion of movable garbage-collected storage
- ditto by overflow of control stack(s)
- ditto by overflow of top-level-
LVAL
section of control stack(s) - ditto by overflow of global vector
- ditto by overflow of
TYPE
vector - ditto by exhaustion of immovable garbage-collected storage
- ditto by overflow of internal stack
-
ditto by overflow of both stacks at the same time (rare)
-
number of words of movable storage
- number of words of movable storage used since last
BLOAT-STAT
- maximum number of words of movable storage ever existing
- number of words of movable storage used since MDL began running
- maximum size of control stack
- number of words on control stack in use
- maximum size of control stack(s) ever reached
- number of slots for top-level
LVAL
s - number of top-level
LVAL
s existing - number of slots for
GVAL
s in global vector - number of
GVAL
s existing - number of slots for
TYPE
s inTYPE
vector - number of
TYPE
s existing - number of words of immovable garbage-collected storage
- number of words of immovable storage unused
- size of largest unused contiguous immovable-storage block
- number of words on internal stack
- number of words on internal stack in use
- maximum size of internal stack ever reached
22.8. GC-MON
<GC-MOND pred>
("garbage-collector monitor") determines whether or not the interpreter will hereafter print information on the terminal when a garbage collection starts and finishes, according to whether or not its argument is true. It returns the previous state. Calling it with no argument returns the current state. The initial state is false.
When typing is enabled, the "copying" garbage collector prints, when it starts:
GIN reason subr-that-caused:atom
and, when it finishes:
GOUT seconds-needed
The "mark-sweep" garbage collector prints MSGIN
and MSGOUT
instead of GIN
and GOUT
.
22.9. Related Subroutines
Two SUBR
s, described next, use only part of the garbage-collector
algorithm, in order to find all pointers to an object. GC-DUMP
and
GC-READ
, as their names imply, also use part in order to translate
between MDL objects and binary representation thereof.
22.9.1. SUBSTITUTE
<SUBSTITUTE new:any old:any>
returns old, after causing a miniature garbage collection to occur,
during which all references to old are changed so as to refer
to new. Neither argument can be of PRIMTYPE
STRING
or BYTES
or LOCD
or live on the control stack, unless both are of the same
PRIMTYPE
. One TYPE
name cannot be substituted for another. One of
the few legitimate uses for it is to substitute the "right" ATOM
for the "wrong" one, after OBLIST
s have been in the wrong state.
This is more or less the way ATOM
s are impurified. It is also
useful for unlinking RSUBR
s. SUBSTITUTE
returns old as a favor:
unless you hang onto old at that point, it will be
garbage-collected.
22.9.2. PURIFY
<PURIFY any-1 ... any-N>
returns its last argument, after causing a miniature garbage
collection that results in all the arguments becoming pure and
sharable, and ignored afterward by the garbage collector. No argument
can live on the control stack or be of PRIMTYPE
PROCESS
or LOCD
or ASOC
. Sharing between operating-system processes actually occurs
after a SAVE
, if and when the SAVE
file is RESTORE
d.