D-Expressions Lisp Power Dylan Style.pdf

D-Expressions: Lisp Power, Dylan Style

Jonathan Bachrach

Articial Intelligence Laboratory

Massachussetts Institute of Technology

Cambridge, MA 02139

jrb@ai.mit.edu

Keith Playford

Functional Objects, Inc.

86 Chandler Street

Somerville, MA 02144

keith@functionalobjects.com

1 Abstract

in languages with more conventional syntaxes like those of

C or Pascal, but in comparison with what Lisp provides, the

solutions have been restrictive, dicult to explain and use,

or both. None have been standardized. Further, the utility

of syntax manipulation tools independent of the macroex-

pansion process is typically forgotten.

This paper aims to demonstrate that it is possible for a lan-

guage with a rich, conventional syntax to provide Lisp-style

macro power and simplicity. We describe a macro system

and syntax manipulation toolkit designed for the Dylan pro-

gramming language that meets, and in some areas exceeds,

this standard. The debt to Lisp is great, however, since

although Dylan has a conventional algebraic syntax, the ap-

proach taken to describe and represent that syntax is dis-

tinctly Lisp-like in philosophy.

2.2 Lisp Power, Dylan Style

This paper aims nally to demonstrate that it is possible for

a language with a richer, more conventional syntax to pro-

vide Lisp-style macro power and simplicity. We describe a

macro system and syntax manipulation toolkit designed for

the Dylan programming language (cf [13], [7]) that meets our

goals as well as, if not better than, those commonly found in

Lisps. The debt to Lisp is great, however, since although Dy-

lan has a conventional algebraic syntax, the approach taken

to describe and represent its syntax is distinctly Lisp-like in

philosophy.

Taking Dylan's already capable rule-based pattern match-

ing and template substitution macro system as a starting

point, we explore its underpinnings. Within a broader com-

parison with other styles, we relate Dylan's approach to the

Lisp model and show how their common basis can (and has,

in other contexts) be applied to a family of languages whose

syntax is based on what we term a skeleton syntax tree . We

go on to describe a library which provides a code representa-

tion ( d-expressions ), source-level pattern matching and code

construction utilities, and source code IO. Some models of

compile-time evaluation are proposed that are suitable for

Dylan, a language in which compiler and run-time are sep-

arated. Finally, putting the d-expressions library together

with a compile-time evaluation model leads us to a natu-

ral procedural macro extension of Dylan's rule-based macro

system, one comparable to those available in Lisp.

2 Introduction

The ability to extend a programming language with new

constructs is a valuable tool. With it, system designers can

grow a language towards their problem domain and enhance

productivity and ease of maintenance. A macro system pro-

vides this capability in a portable, high-level fashion by al-

lowing new constructs to be implemented in terms of exist-

ing ones via programmer-dened source-to-source transfor-

mations.

Beyond the above, the ability to read, write, and eas-

ily manipulate the syntax of a language from within that

language can be especially powerful. It can allow the full

language to be brought to bear when implementing macros.

It can provide a convenient means of saving and restoring

structured data in text form. It can form the basis of code

analysis tools and be the starting point for experiments with

new language processors or into modied language seman-

tics.

2.1 Successes and Failures

Lisp is the only language family (cf [9], [10]) that has suc-

ceeded in providing integrated macro systems along with

simple and powerful syntax manipulation tools like these.

They are considered one of Lisp's unique strengths, perhaps

even the most important and distinctive feature of the lan-

guage. But the key to their viability in Lisp is the simplicity

and regularity of its syntax. Recognizing their utility, at-

tempts have been made to provide powerful macro facilities

2.3 Requirements and Goals

A macro facility provides a mechanism to extend a base

syntax. Such a facility can be measured along a set of axes,

covering such aspects as power, generality, reliability, and

ease of use.

Macro facilities dier greatly in their ability to perform

syntactic transformations. Their power can be judged by

the complexity of their underlying syntactic representations

and the extent of their programmability. Macro systems

have used input representations such as characters, tokens,

syntax, and semantic information, with the latter leading to

more powerful transformations than the impoverished for-

mer. First, the underlying syntax representation aects the

ability of the macro system to maintain source locations.

During debugging, a user should not have to step through

macro expanded cruft, no more than they should have to

step through code obfuscated by optimizations. Second, the

syntax representation determines the ease with which syntax

can be manipulated. Ultimately, in order to perform pow-

erful transformations, the underlying representation needs

to be at least rich enough to represent recursive structures

such as expressions.

A macro facility's programmability is a function of the

strength of its transformational engine, starting from a pure-

ly substitution system (e.g., the C Preprocessor [11]) all the

way up to a full programming language. In between, some

macro systems augment substitution with support for rep-

etition and conditionals. The more powerful the transfor-

mational engine, the more expressive the macro facility will

be. A good test of a macro facility is the extent to which it

can represent the base language on which it was built. This

helps determine whether a language can be seamlessly ex-

tended or whether macro dened constructs will be conned

to a limited syntactic space.

A powerful macro facility should provide an ability to

read and write syntactic elements. Input facilities are obvi-

ously crucial for writing a le compiler, while output facili-

ties are useful for warnings and debugging. Ideally, a macro

facility could be provided as a separate library upon which

to base compilers and code analysis tools.

Another important axis along which to measure a macro

facility is its reliability. Will it produce syntactically correct

output and produce correct results in all cases? Can a macro

introduce a syntax error by way of an admissible usage? In

other words, users should only see syntax errors from the

actual code they write. Can syntactic constituents interfere

with each other? In some systems (e.g., C Preprocessor [11])

extra parentheses are required to prevent unwanted operator

precedence bugs.

From an ease of use point of view as well as a reliabil-

ity perspective, it is important that a macro facility ensures

that variable references copied from a macro call and from

a macro denition mean the same thing in an expansion. In

particular, the macro system should (1) avoid accidental col-

lisions of macro variables with program variables of the same

name regardless of the context in which a given macro is

used and (2) maintain consistency when macro variables re-

fer to identically named variables introduced within a given

macro denition. (1) is generally referred to as hygiene [12]

and (2) is generally referred to as referential transparency .

A violation of either is called a variable capture error . Vari-

able capture errors create subtle bugs that are dicult to

debug. Much has been written about how to avoid intro-

ducing these sorts of errors [8], but we maintain that there

is no reason to introduce them in the rst place.

Other aspects that improve usability are intuitiveness of

a macro's denition and ease with which it's debugged. We

assert that a macro facility should enable a programmer to

write down what they want the input and output to look like

augmented only by substitution parameters. The alterna-

tive of constructing and manipulating representations man-

ually or specifying productions in terms of Abstract Syntax

Tree classes is too tedious and error prone requiring inti-

mate knowledge of an underlying syntactic representation.

Debugging shift/reduce errors is a daunting task for even

the best programmers. Furthermore, some form of a trace

of the transformation engine and selected expansions is a

great help to macro writers.

3 An Overview of Syntax Representations

We split the space of syntax representations into four broad

categories: text based, token based, abstract syntax trees,

and skeleton syntax trees. Each one in considered in turn

in the following sections and evaluated with respect to the

requirements set out above.

3.1 Text-Based Representations

The most basic representation of a piece of code is its origi-

nal text. Such unstructured text is dicult to analyse and

manipulate, and the simple string-based pattern matching

tools often used are neither syntax aware nor context aware.

Even simple things like matching only code, as opposed to

text contained within strings or comments, can be problem-

atical.

Repeated application of a full parser, if available, is ex-

pensive and may not be possible on subexpressions taken

out of context. Also, unless auxiliary data is maintained

somehow, source location information is lost by transforma-

tions.

Because of these problem and others, a text-based rep-

resentation clearly does not meet our requirements.

3.2 Token-Based Representations

In a token-based representation, code is manipulated as a

sequence of tokens as generated by a lexer. This has a num-

ber of benets over straight text. For example, the lexer

typically strips out comments, leaving only code. Also, the

problem of accidental token merging when constructing new

phrases can be eliminated. The C preprocessor [11] and

Pop-11 [1] are examples of languages oering a token-based

macro system.

If each token is represented by a structure rather than a

simple string, it is possible to record its classication (e.g.

literal string, identier, punctuation) and source location

along with it. This information can be maintained through

source-to-source transformations.

Processing an unstructured token stream is not very con-

venient, however. If a macro takes as its arguments a xed

number of tokens following the macro name, it works cor-

rectly only when its arguments are single-token elements. If

a macro needs to work in terms of expressions, the number

of tokens it accepts is variable. Consider the following:

define token-macro increment!

(loc1, comma, loc2) => (replacement-tokens)

assert(comma = ",");

concatenate(loc2, ":=", loc1, "+", delta)

end token-macro;

// OK

increment! n, delta;

// Assertion fails, comma gets bound to "["

increment! a[i], delta;

C macros and one kind of Pop-11 macro actually con-

strain the shape of macro input to being of function call

form. This allows the parser to easily determine the extent

of a macro's input and split it into comma-separated chunks

before nally passing these on to the macro.

At the other extreme, macros can be given full control

over the token stream. While this is the ultimate in versa-

tility, it presents some practical problems. For example, the

extent of forms cannot be determined without performing

macro expansion. This can be a problem for development

environments and code analysis tools, but also for program-

mers. The approach lacks modularity. Token macros be-

come, in essence, part of an ad-hoc recursive descent parser

and programmers must be aware of the implications of that.

They must always take care to respect any macro syntax

contained within the subexpressions manipulated because

those subexpressios are not isolated within nested structure.

One further problem familiar to most C programmers

is inadvertent expression merging. When bringing code to-

gether within a macro or when returning replacement code

to a macro call point, care must be taken to isolate the

inserted code from its surroundings. Wrapping with paren-

theses where appropriate is normally sucient to avoid, typ-

ically, operator precedence related bugs. Failing to do so is

still a common source of bugs, however.

Token-based solutions can clearly be very powerful. How-

ever, we feel that the pitfalls and complexities that come

with that power make them an unlikely basis for a macro

system intended to be as simple and accessible as Lisp's.

Referring back to a previous section, another way to look

at an SST is as a lightly-structured, token-based representa-

tion. SST's have most of the power of general token-based

approaches, but without many of the pitfalls discussed. The

challenge is to dene a simple SST upon which it is possible

to build a language with a conventional algebraic syntax.

It is worth noting that although Lisp is SST-based, the

SST used in most Lisps would not meet our requirements.

Because standard run-time objects (numbers, symbols, lists)

are reused to represent the Lisp SST, there is no reliable

way to record auxiliary information like source location or

hygiene context. Scheme macro systems [5] that implement

automatic hygiene have to face this deciency and work in

terms of special-purpose syntax objects rather than lists and

symbols.

4 Dylan's Rewrite-Rule Only Macro System

Dylan macros provide a way to dene new syntactic forms.

Their denitions are called macro denitions and their us-

ages are called macro calls. At compile-time, macro ex-

pansion replaces macro calls with other constructs per their

denition. This macro expansion process is repeated until

the resulting expansion contains no more macro calls.

3.3 Abstract Syntax Tree Representations

Abstract syntax trees (AST's) are highly structured repre-

sentations of source code. In an object-oriented implementa-

tion, there would typically be one class per denition, state-

ment, or other language construct. Down at the expression

level, there might be classes corresponding to function calls,

array accesses, literal references, and so on.

Typically with an AST, adding a new syntactic construct

requires extending the parser to understand it and adding

a new class to represent that construct. How dicult this

is is related to the complexity of the parser because there is

no intermediate form between tokens and full AST. Worst

case, there is no special support at all for modular macro

extensions, and new language constructs must be dened as

a cooperating element of the full language grammar [3]. It

could require extensive knowledge of the parser's intricacies

to avoid or resolve problems like shift/reduce conicts in an

LALR parser, for example. We don't consider requiring such

knowledge of programmers to be reasonable.

4.1 Dylan's Rewrite Rules

The macro system dened in the Dylan Reference Manual

(DRM) [13] is a rewrite-rule only pattern matching template

substitution system. Macros are named with module bind-

ings using the usual Dylan namespace mechanism. Each

macro is comprised of a sequence of rewrite rules where the

left-hand side of a rule is a pattern that matches a fragment

of code and the right-hand side of a rule is a template which

is substituted for the matched fragment. Pattern variables

in the left-hand side serve as macro arguments. Pattern vari-

ables appearing on the right-hand side substitute arguments

into the expansion. The rules are attempted in order until

a given rule's pattern is found to match, at which time, the

matched fragment is replaced with the corresponding tem-

plate. If no patterns are found to match, then an error is

raised.

3.4 Skeleton Syntax Tree Representations

The distinguishing feature of a skeleton syntax tree (SST) is

that it has few classes and corresponds to a very simple core

grammar. That grammar describes \shapes" rather than

particular language constructs. Many dierent constructs

may t within the same basic shape.

The most extreme example of a language based on an

SST is, of course, Lisp. Lisp's underlying grammar says

nothing about cond or defun , it simply species the single

basic shape of all Lisp constructs. Although not a program-

ming language, XML has much the same property.

SST's dene the axioms to which all syntax must con-

form. In Lisp, it is, essentially, that brackets match. In

XML, that tags must match. The simplicity of the axioms

makes the overall structure of the language easy for people

to understand and to remember and also for programs to

work with.

Language-level constructs are dened modularly, above

the core grammar and without requiring any changes to it.

User-dened constructs have the same status as standard

syntactic forms.

4.2 Dylan's Skeleton Syntax Tree

Before we delve into the intricacies of the rewrite-rule only

macro system, it is important to explore Dylan's SST. Dy-

lan's SST is a richer version of Lisp's s-expression based

SST. Lisp's SST can roughly be described as follows:

form: symbol

form: literal

form: "(" elements ")"

elements: form ...

Dylan adds to this a few more basic shapes:

form: identifier

form: literal

form: "(" elements ")"

form: "define" modifiers LIST-DEFINITION elements ";"

form: "define" modifiers BODY-DEFINITION elements "end" ";"

form: STATEMENT elements "end" ";"

elements: [form | punctuation ] ...

where modifiers are a sequence of adjective names. Consult

the DRM [13] for the full grammar.

Although, the Dylan rule-based macro system could be

made to work with more involved and stricter grammars, a

skeleton syntax tree oers a number of advantages. First,

an SST modularizes parsing, that is, descending structures

without doing an expansion is now possible, changes in a

given expansion algorithm don't require redoing a top-level

parse, and syntax errors can be kept localized. Second, an

SST gives the user a lot of latitude to choose an appropriate

grammar for a given macro, that is, the Dylan grammar

does not dictate the precise form of templates, but instead

very loosely constrains their basic shape.

with ?test bound to close? and ?body bound to close(stream) .

Templates do not produce a parsed output but instead

produce a well formed sequence of tokens whose parsing

is delayed, where well-formed means that all occurrences

of Dylan brackets match. Delaying their parsing permits

fewer inter-macro dependencies thereby allowing macros to

be compiled independent of each other and permitting re-

cursive macros without forward declarations.

4.5 Example Macros

We present a number of Dylan macros in order to give a

sense of the range of their uses. Graham's book [8] is an

excellent source for macro motivation and examples. An

example of a function macro is increment! :

4.3 Source-level Pattern Matching

Now that we have described Dylan's SST we are in a position

to describe the constituents of rules. Patterns appearing on

the left hand side of rules provide a \what you see is what

you get" (WYSIWYG) mechanism for specifying the desired

input of a macro in terms of concrete syntax. A pattern ba-

sically looks like the construct that it is to match augmented

by pattern variables which match and bind to appropriate

parts of the construct. Pattern variables are denoted with a

? prexing their names. Pattern variables have constraints

that restrict the syntactic type of fragments (based on the

Dylan grammar) that they match when appearing in a pat-

tern. Some of the various pattern constraints are token,

name, variable, expression, and body. Constraints are de-

noted with a : separated pattern variable name's sux.

An example pattern (forming the basis of a complement

to Dylan's unless macro) is the following:

define macro increment!

f increment!(?place:expression) g

=> f ?place := ?place + 1 g

f increment!(?place:expression, ?amount:expression) g

=> f ?place := ?place + ?amount g

end macro;

increment!(x);

increment!(height(x), 10);

which is the Dylan equivalent of C's ++ .

An example statement macro is with-open-file :

define macro with-open-file

f with-open-file (?stream:name, ?options:*) ?:body end g

=> f let ?stream = #f;

block ()

?stream := make(<file-stream>, ?options);

?body

cleanup

?stream & close(?stream)

end block g

f when (?test:expression) ?body:body end g

where the following would be a well-formed when macro call:

end macro;

when (close?) close(stream) end

with-open-file (stream, locator: "phonenumbers")

process-phone-numbers(stream);

end with-open-file;

Patterns match their input based on a simple matching

procedure. In order to avoid easily constructed ambiguous

patterns, patterns are matched using left to right processing

combined with a match shortest rst priority for matching

wildcard pattern variables. A pattern matches if and only

if all of its sub-patterns match. For more details, please

consult the DRM [13] .

where the * constraint on the pattern variable ?options

means that options can match any parsed fragment. This

macro is typical of a whole range of macros, called with-

macros, where resources, datastructures, or special variables

must be managed (and cleanups performed) or more gener-

ally where context needs to be built.

The following dene macro:

4.4 Source-level Code Generation

Templates appearing on the right hand side of rules provide

a WYSIWYG mechanism for specifying the desired output

of a macro in terms of concrete syntax. A template basically

looks like the construct that it produces augmented by pat-

tern variables that stand in for parts of the construct bound

to the same pattern variables in the corresponding left hand

side pattern. For example, the full macro containing the

example pattern from above would be:

define macro functional-variable-definer

f define functional-variable ?var:name = ?init:expression g

=> f define variable ?var ## "-value" = ?init;

define method ?var ()

?var ## "-value"

end method;

define method ?var ## "-setter" (new-value)

?var ## "-value" := new-value;

end method g

end macro;

define macro when

f when (?test:expression) ?body:body end g

=> f if (?test) ?body end if g

end macro;

? define functional-variable time = 0;

? time();

> 0

? time() := 25;

> 25

where

when (close?) close(stream) end

would expand into

provides a functional interface to variables, dening acces-

sors for variables, and thus hiding implementation details.

The above example demonstrates how to create new identi-

ers through the ## concatenation operator.

if (close?) close(stream) end if

4.6 Hygiene

Dylan's macro system is hygienic and maintains referen-

tial transparency such that variable references copied from

a macro call and from a macro denition mean the same

thing in an expansion. For an example of hygiene, consider

the following macro:

define macro defaulted-variable-definer

f define defaulted-variable ?:name ?default g

=> f define variable ?name = ?default g

default:

f g => f #f g

f = ?:expression g => f ?expression g

end macro;

Instead of adding more main rules, only extra rules for the

varying fragments need to be written.

It turns out though that local rewrite rule sets are partic-

ularly useful for macros involving iterating over a sequence

of constituent fragments:

define macro or

f or(?x:expression, ?y:expression)

=> f let tmp = ?x;

if (tmp) tmp else ?y end g

end define;

where a temporary binding, called tmp , is introduced in or-

der to avoid multiple evaluations of the rst argument, x .

The macro system ensures that tmp introduced by the or

macro does not collide with visible bindings named tmp in

the enclosing program. For example, the following

define macro properties-definer

f define properties ?kind:name ?properties end g

=> f define variable ?kind = list(?properties) g

properties:

f g => f g

f ?prop:name; ... g => f ?#"prop", ... g

end macro;

begin

let tmp = 5;

or(1, 2);

tmp

end

where, the substitution ?#"property" coerces the identier

into a similarly named symbol. The ellipses ... are a short-

hand for aux-rule recursion, that is,

properties:

f g => f g

f ?prop:name; ... g => f ?#"prop", ... g

returns 5 .

For an example of referential transparency, consider the

following increment! macro from above. The + in the re-

sulting template refers to + in the dening macro's names-

pace regardless of where the macro is used. In other words,

even in the face of + being renamed or even not imported

at all into the namespace of a macro call, the macro system

ensures that increment! 's + is the same and is available.

Sometimes it is necessary to circumvent hygiene in order

to permit macros to introduce variables accessible by the

lexical context from which the macro was called. For exam-

ple, imagine writing a loop macro that provides a standard

named function, called break , for calling when terminating

iteration:

is a shorthand for

properties:

f g => f g

f ?prop:name; ?properties g => f ?#"prop", ?properties g

5 The D-Expressions Library

In order to write macros that require more complicated

macro expansion processing, we introduce d-expressions and

later a full procedural macro facility. The d-expressions li-

brary provides a collection of classes suitable for representing

fragments of source code in skeleton syntax tree form. If de-

sired, skeleton syntax trees may be constructed and pulled

apart manually using the exported interface of these classes.

In addition to the raw, skeleton syntax class hierarchy,

source level tools are provided for easier parsing and con-

struction of source code fragments held in that form. These

tools are the moral equivalents of the destructuring-bind and

backquote macros familiar to Lisp programmers.

The nal facility provided is source code IO. An input

method is dened that is parameterizable with syntactic

context (i.e. a map from identiers to their syntactic cate-

gories) and also, optionally, with a home context (e.g. the

code's top level module context) and source location context.

An output method is also dened and writes a re-readable

text form of any given skeleton tree. These methods are the

moral equivalents of Lisp's read and write functions.

define macro repeat

f repeat ?:body end g

=> f block (?=break)

local method repeat () ?body end;

repeat();

end block g

end macro;

A user can then write the following:

repeat

if ((input := read(*standard-input*, eof: #f)) == #f)

break()

else

write(input, *standard-output*)

end if

end repeat;

and have it terminate upon eof by calling the repeat macro's

provided exit function, break . This mechanism can also be

used for writing anaphoric macros [8].

5.1 Skeleton Syntax Tree Classes

The basic d-expression AST class hierarchy is agreeably sim-

ple:

4.7 Local Rewrite Rules

Dylan provides a mechanism for dening local named rewrite

rule sets which act like local subroutines. These help in

breaking a macro up into rewrite rule subproblems and per-

mit a form of modularization. They are invoked after an

initial pattern matching using the main rule set and before

template substitution. Instead of backtracking, a macro is

deemed invalid if none of a given local rewrite rules match.

Consider the following example:

<d-expression> [Abstract]

<d-leaf> [Abstract]

<d-identifier>

<d-literal>

<d-punctuation>

<d-compound> [Abstract]

<d-nested>

<d-macro-call>

<d-sequence>

Plik z chomika:

Inne pliki z tego folderu:

Inne foldery tego chomika: