Chapter 5. Simple Functions

5.1. General [1]

The MDL equivalent of a "program" (uncompiled) is an object of TYPE FUNCTION. Actually, full-blown "programs" are usually composed of sets of FUNCTIONs, with most FUNCTIONs in the set acting as "subprograms".

A FUNCTION may be considered to be a SUBR or FSUBR which you yourself define. It is "run" by using a FORM to apply it to arguments (for example, <function arg-1 arg-2 ...>), and it always "returns" a single object, which is used as the value of the FORM that applied it. The single object may be ignored by whatever "ran" the FUNCTION -- equivalent to "returning no value" -- or it may be a structured object containing many objects -- equivalent to "returning many values". MDL is an "applicative" language, in contrast to "imperative" languages like Fortran. In MDL it is impossible to return values through arguments in the normal case; they can be returned only as the value of the FORM itself, or as side effects to structured objects or global values.

In this chapter a simple subset of the FUNCTIONs you can write is presented, namely FUNCTIONs which "act like" SUBRs with a fixed number of arguments. While this class corresponds to about 90% of the FUNCTIONs ever written, you won't be able to do very much with them until you read further and learn more about MDL's control and manipulatory machinery. However, all that machinery is just a bunch of SUBRs and FSUBRs, and you already know how to "use" them; you just need to be told what they do. Once you have FUNCTIONs under your belt, you can immediately make use of everything presented from this point on in the document. In fact, we recommend that you do so.

5.2. Representation [1]

A FUNCTION is just another data object in MDL, of TYPE FUNCTION. It can be manipulated like any other data object. PRINT represents a FUNCTION like this:

#FUNCTION (elements)

that is, a number sign, the ATOM FUNCTION, a left parenthesis, each of the elements of the FUNCTION, and a right parenthesis. Since PRINT represents FUNCTIONs like this, you can type them in to READ this way. (But there are a few TYPEs for which that implication is false.)

The elements of a FUNCTION can be "any number of anythings"; however, when you use a FUNCTION (apply it with a FORM), EVAL will complain if the FUNCTION does not look like

#FUNCTION (act:atom arguments:list decl body)

where act and decl are optional (section 9.8 and chapter 14); body is at least one MDL object -- any old MDL object; and, in this simple case, arguments is

(any number of ATOMs)

that is, something READ and PRINTed as: left parenthesis, any number -- including zero -- of ATOMs, right parenthesis. (This is actually a normal MDL object of TYPE LIST, containing only ATOMs.)

Thus, these FUNCTIONs will cause errors -- but only when used:

Input Explanation
#FUNCTION () -- no argument LIST or body
#FUNCTION ((1) 2 7.3) -- non-ATOM in argument LIST
#FUNCTION ((A B C D)) -- no body
#FUNCTION (<+ 1 2> A C) -- no argument LIST

These FUNCTIONs will never cause errors because of format:

#FUNCTION (() 1 2 3 4 5)
#FUNCTION (()()()()()()()())
#FUNCTION ((Q) <SETG C <* .Q ,C>> <+ <MOD ,C 3> .Q>)

and the last two actually do something which might be useful. (The first three are rather pathological, but legal.)

5.3. Application of FUNCTIONs: Binding [1]

FUNCTIONs, like SUBRs and FSUBRs, are applied using FORMs. So,

<#FUNCTION ((X) <* .X .X>) 5>$

applied the indicated FUNCTION to 5 and returned 25.

What EVAL does when applying a FUNCTION is the following:

  1. Create a "world" in which the ATOMs of the argument LIST have been SET to the values applied to the FUNCTION, and all other ATOMs have their original values. This is called "binding".
  2. In the above, this is a "world" in which X is SET to 5.
  3. In that new "world", evaluate all the objects in the body of the FUNCTION, one after the other, from first to last.
  4. In the above, this means evaluate <* .X .X> in a "world" where X is SET to 5.
  5. Throw away the "world" created, and restore the LVALs of all ATOMs bound in this application of the FUNCTION to their originals (if any). This is called "unbinding".
  6. In the above, this simply gives X back the local value, if any, that it had before binding.
  7. Return as a value the last value obtained when the FUNCTION's body was evaluated in step (2).
  8. In the above, this means return 25 as the value.

The "world" mentioned above is actually an object of TYPE ENVIRONMENT. The fact that such "worlds" are separate from the FUNCTIONs which cause their generation means that all MDL FUNCTIONs can be used recursively.

The only thing that is at all troublesome in this sequence is the effect of creating these new "worlds", in particular, the fact that the previous world is completely restored. This means that if, inside a FUNCTION, you SET one of its argument ATOMs to something, that new LVAL will not be remembered when EVAL leaves the FUNCTION. However, if you SET an ATOM which is not in the argument LIST (or SETG any ATOM) the new local (or global) value will be remembered. Examples:

<SET X 0>$
<#FUNCTION ((X) <SET X <* .X .X>>) 5>$

On the other hand,

<SET Y 0>$
<#FUNCTION ((X) <SET Y <* .X .X>>) 5>$

By using PRINT as a SUBR, we can "see" that an argument's LVAL really is changed while EVALuating the body of a FUNCTION:

<SET X 5>$
<#FUNCTION ((X) <PRINT .X> <+ .X 10>) 3>$
3 13

The first number after the application FORM was typed out by the PRINT; the second is the value of the application.

Remembering that LVALs of ATOMs not in argument LISTs are not changed, we can reference them within FUNCTIONs, as in

<SET Z 100>$
<#FUNCTION ((Y) </ .Z .Y>) 5>$

ATOMs used like Z or Y in the above examples are referred to as "free variables". The use of free variables, while often quite convenient, is rather dangerous unless you know exactly how a FUNCTION will always be used: if a FUNCTION containing free variables is used within a FUNCTION within a FUNCTION within ..., one of those FUNCTIONs might just happen to use your free variable in its argument LIST, binding it to some unknown value and possibly causing your use of it to be erroneous. Please note that "dangerous", as used above, really means that it may be effectively impossible (1) for other people to use your FUNCTIONs, and (2) for you to use your FUNCTIONs a month (two weeks?) later.

5.4. Defining FUNCTIONs (FUNCTION and DEFINE) [1]

Obviously, typing #FUNCTION (...) all the time is neither reasonable nor adequate for many purposes. Normally, you just want a FUNCTION to be the GVAL of some ATOM -- the way SUBRs and FSUBRs are -- so you can use it repeatedly (and recursively). Note that you generally do not want a FUNCTION to be the LVAL of an ATOM; this has the same problems as free variables. (Of course, there are always cases where you are being clever and want the ATOM to be re-bound....)

One way to "name" a FUNCTION is

#FUNCTION ((X) <* .X .X>)

So that

<SQUARE 100>$

Another way, which is somewhat cleaner in its typing:

#FUNCTION ((X) <* .X .X>)

FUNCTION is an FSUBR which simply makes a FUNCTION out of its arguments and returns the created FUNCTION.

This, however, is generally the best way:

<DEFINE SQUARE (X) <* .X .X>>$
#FUNCTION ((X) <* .X .X>

The last two lines immediately above are just to prove that DEFINE did the "right thing".

DEFINE is an FSUBR which SETGs EVAL of its first argument to the FUNCTION it makes from the rest of its arguments, and then returns EVAL of its first argument. DEFINE obviously requires the least typing of the above methods, and is "best" from that standpoint. However, the real reason for using DEFINE is the following: If EVAL of DEFINE's first argument already has a GVAL, DEFINE produces an error. This helps to keep you from accidentally redefining things -- like MDL SUBRs and FSUBRs. The SETG constructions should be used only when you really do want to redefine something. DEFINE will be used in the rest of this document.

[Actually, if it is absolutely necessary to use DEFINE to "redefine" things, there is a "switch" which can be used: if the LVAL of the ATOM REDEFINE is T (or anything not of TYPE FALSE), DEFINE will produce no errors. The normal state can be restored by evaluating <SET REDEFINE <>>. See chapter 8.]

5.5. Examples (Comments) [1]

Using SQUARE as defined above:

        ;"This is a comment. This FUNCTION finds the
          length of the hypotenuse of a right triangle
          of sides SIDE-1 and SIDE-2."
    <SQRT <+ <SQUARE .SIDE-1> <SQUARE .SIDE-2>>>>$
<HYPOT 3 4>$

Note that carriage-returns, line-feeds, tabs, etc. are just separators, like spaces. A comment is any single MDL object which follows a ; (semicolon). A comment can appear between any two MDL objects. A comment is totally ignored by EVAL but remembered and associated by READ with the place in the FUNCTION (or any other structured object) where it appeared. (This will become clearer after chapter 13.) The "s (double-quotes) serve to make everything between them a single MDL object, whose TYPE is STRING (chapter 7). (SQRT is the SUBR which returns the square root of its argument. It always returns a FLOAT.)

A whimsical FUNCTION:

<DEFINE ONE (THETA) ;"This FUNCTION always returns 1."
        <+ <SQUARE <SIN .THETA>>
           <SQUARE <COS .THETA>>>>$
<ONE 5>$
<ONE 0.23>$

ONE always returns (approximately) one, since the sum of the squares of sin(x) and cos(x) is unity for any x. (SIN and COS always return FLOATs, and each takes its argument in radians. ATAN (arctangent) returns its value in radians. Any other trigonometric function can be compounded from these three.)

MDL doesn't have a general "to the power" SUBR, so let's define one using LOG and EXP (log base e, and e to a power, respectively; again, they return FLOATs).

<DEFINE ** (NUM PWR) <EXP <* .PWR <LOG .NUM>>>>$
<** 2 2>$
<** 5 3>$
<** 25 0.5>$

Two FUNCTIONs which use a single global variable (Since the GVAL is used, it cannot be rebound.):

<DEFINE STEP () <SETG GV <+ ,GV 1>>>$

START and STEP take no arguments, so their argument LISTs are empty.

An interesting, but pathological, FUNCTION:

<DEFINE INC (ATM) <SET .ATM <+ ..ATM 1>>>$
<SET A 0>$
<INC A>$
<INC A>$

INC takes an ATOM as an argument, and SETs that ATOM to its current LVAL plus 1. Note that inside INC, the ATOM ATM is SET to the ATOM which is its argument; thus ..ATM returns the LVAL of the argument. However, there is a problem:

<SET ATM 0>$

[ATM 1]

The error occurred because .ATM was ATM, the argument to INC, and thus ..ATM was ATM also. We really want the outermost . in ..ATM to be done in the "world" (ENVIRONMENT) which existed just before INC was entered -- and this definition of INC does both applications of LVAL in its own "world". Techniques for doing INC "correctly" will be covered below. Read on.