D-Expressions Lisp Power Dylan Style.pdf

(160 KB) Pobierz
356367214 UNPDF
D-Expressions: Lisp Power, Dylan Style
Jonathan Bachrach
Articial Intelligence Laboratory
Massachussetts Institute of Technology
Cambridge, MA 02139
jrb@ai.mit.edu
Keith Playford
Functional Objects, Inc.
86 Chandler Street
Somerville, MA 02144
keith@functionalobjects.com
1 Abstract
in languages with more conventional syntaxes like those of
C or Pascal, but in comparison with what Lisp provides, the
solutions have been restrictive, dicult to explain and use,
or both. None have been standardized. Further, the utility
of syntax manipulation tools independent of the macroex-
pansion process is typically forgotten.
This paper aims to demonstrate that it is possible for a lan-
guage with a rich, conventional syntax to provide Lisp-style
macro power and simplicity. We describe a macro system
and syntax manipulation toolkit designed for the Dylan pro-
gramming language that meets, and in some areas exceeds,
this standard. The debt to Lisp is great, however, since
although Dylan has a conventional algebraic syntax, the ap-
proach taken to describe and represent that syntax is dis-
tinctly Lisp-like in philosophy.
2.2 Lisp Power, Dylan Style
This paper aims nally to demonstrate that it is possible for
a language with a richer, more conventional syntax to pro-
vide Lisp-style macro power and simplicity. We describe a
macro system and syntax manipulation toolkit designed for
the Dylan programming language (cf [13], [7]) that meets our
goals as well as, if not better than, those commonly found in
Lisps. The debt to Lisp is great, however, since although Dy-
lan has a conventional algebraic syntax, the approach taken
to describe and represent its syntax is distinctly Lisp-like in
philosophy.
Taking Dylan's already capable rule-based pattern match-
ing and template substitution macro system as a starting
point, we explore its underpinnings. Within a broader com-
parison with other styles, we relate Dylan's approach to the
Lisp model and show how their common basis can (and has,
in other contexts) be applied to a family of languages whose
syntax is based on what we term a skeleton syntax tree . We
go on to describe a library which provides a code representa-
tion ( d-expressions ), source-level pattern matching and code
construction utilities, and source code IO. Some models of
compile-time evaluation are proposed that are suitable for
Dylan, a language in which compiler and run-time are sep-
arated. Finally, putting the d-expressions library together
with a compile-time evaluation model leads us to a natu-
ral procedural macro extension of Dylan's rule-based macro
system, one comparable to those available in Lisp.
2 Introduction
The ability to extend a programming language with new
constructs is a valuable tool. With it, system designers can
grow a language towards their problem domain and enhance
productivity and ease of maintenance. A macro system pro-
vides this capability in a portable, high-level fashion by al-
lowing new constructs to be implemented in terms of exist-
ing ones via programmer-dened source-to-source transfor-
mations.
Beyond the above, the ability to read, write, and eas-
ily manipulate the syntax of a language from within that
language can be especially powerful. It can allow the full
language to be brought to bear when implementing macros.
It can provide a convenient means of saving and restoring
structured data in text form. It can form the basis of code
analysis tools and be the starting point for experiments with
new language processors or into modied language seman-
tics.
2.1 Successes and Failures
Lisp is the only language family (cf [9], [10]) that has suc-
ceeded in providing integrated macro systems along with
simple and powerful syntax manipulation tools like these.
They are considered one of Lisp's unique strengths, perhaps
even the most important and distinctive feature of the lan-
guage. But the key to their viability in Lisp is the simplicity
and regularity of its syntax. Recognizing their utility, at-
tempts have been made to provide powerful macro facilities
2.3 Requirements and Goals
A macro facility provides a mechanism to extend a base
syntax. Such a facility can be measured along a set of axes,
covering such aspects as power, generality, reliability, and
ease of use.
Macro facilities dier greatly in their ability to perform
syntactic transformations. Their power can be judged by
the complexity of their underlying syntactic representations
and the extent of their programmability. Macro systems
have used input representations such as characters, tokens,
syntax, and semantic information, with the latter leading to
1
more powerful transformations than the impoverished for-
mer. First, the underlying syntax representation aects the
ability of the macro system to maintain source locations.
During debugging, a user should not have to step through
macro expanded cruft, no more than they should have to
step through code obfuscated by optimizations. Second, the
syntax representation determines the ease with which syntax
can be manipulated. Ultimately, in order to perform pow-
erful transformations, the underlying representation needs
to be at least rich enough to represent recursive structures
such as expressions.
A macro facility's programmability is a function of the
strength of its transformational engine, starting from a pure-
ly substitution system (e.g., the C Preprocessor [11]) all the
way up to a full programming language. In between, some
macro systems augment substitution with support for rep-
etition and conditionals. The more powerful the transfor-
mational engine, the more expressive the macro facility will
be. A good test of a macro facility is the extent to which it
can represent the base language on which it was built. This
helps determine whether a language can be seamlessly ex-
tended or whether macro dened constructs will be conned
to a limited syntactic space.
A powerful macro facility should provide an ability to
read and write syntactic elements. Input facilities are obvi-
ously crucial for writing a le compiler, while output facili-
ties are useful for warnings and debugging. Ideally, a macro
facility could be provided as a separate library upon which
to base compilers and code analysis tools.
Another important axis along which to measure a macro
facility is its reliability. Will it produce syntactically correct
output and produce correct results in all cases? Can a macro
introduce a syntax error by way of an admissible usage? In
other words, users should only see syntax errors from the
actual code they write. Can syntactic constituents interfere
with each other? In some systems (e.g., C Preprocessor [11])
extra parentheses are required to prevent unwanted operator
precedence bugs.
From an ease of use point of view as well as a reliabil-
ity perspective, it is important that a macro facility ensures
that variable references copied from a macro call and from
a macro denition mean the same thing in an expansion. In
particular, the macro system should (1) avoid accidental col-
lisions of macro variables with program variables of the same
name regardless of the context in which a given macro is
used and (2) maintain consistency when macro variables re-
fer to identically named variables introduced within a given
macro denition. (1) is generally referred to as hygiene [12]
and (2) is generally referred to as referential transparency .
A violation of either is called a variable capture error . Vari-
able capture errors create subtle bugs that are dicult to
debug. Much has been written about how to avoid intro-
ducing these sorts of errors [8], but we maintain that there
is no reason to introduce them in the rst place.
Other aspects that improve usability are intuitiveness of
a macro's denition and ease with which it's debugged. We
assert that a macro facility should enable a programmer to
write down what they want the input and output to look like
augmented only by substitution parameters. The alterna-
tive of constructing and manipulating representations man-
ually or specifying productions in terms of Abstract Syntax
Tree classes is too tedious and error prone requiring inti-
mate knowledge of an underlying syntactic representation.
Debugging shift/reduce errors is a daunting task for even
the best programmers. Furthermore, some form of a trace
of the transformation engine and selected expansions is a
great help to macro writers.
3 An Overview of Syntax Representations
We split the space of syntax representations into four broad
categories: text based, token based, abstract syntax trees,
and skeleton syntax trees. Each one in considered in turn
in the following sections and evaluated with respect to the
requirements set out above.
3.1 Text-Based Representations
The most basic representation of a piece of code is its origi-
nal text. Such unstructured text is dicult to analyse and
manipulate, and the simple string-based pattern matching
tools often used are neither syntax aware nor context aware.
Even simple things like matching only code, as opposed to
text contained within strings or comments, can be problem-
atical.
Repeated application of a full parser, if available, is ex-
pensive and may not be possible on subexpressions taken
out of context. Also, unless auxiliary data is maintained
somehow, source location information is lost by transforma-
tions.
Because of these problem and others, a text-based rep-
resentation clearly does not meet our requirements.
3.2 Token-Based Representations
In a token-based representation, code is manipulated as a
sequence of tokens as generated by a lexer. This has a num-
ber of benets over straight text. For example, the lexer
typically strips out comments, leaving only code. Also, the
problem of accidental token merging when constructing new
phrases can be eliminated. The C preprocessor [11] and
Pop-11 [1] are examples of languages oering a token-based
macro system.
If each token is represented by a structure rather than a
simple string, it is possible to record its classication (e.g.
literal string, identier, punctuation) and source location
along with it. This information can be maintained through
source-to-source transformations.
Processing an unstructured token stream is not very con-
venient, however. If a macro takes as its arguments a xed
number of tokens following the macro name, it works cor-
rectly only when its arguments are single-token elements. If
a macro needs to work in terms of expressions, the number
of tokens it accepts is variable. Consider the following:
define token-macro increment!
(loc1, comma, loc2) => (replacement-tokens)
assert(comma = ",");
concatenate(loc2, ":=", loc1, "+", delta)
end token-macro;
// OK
increment! n, delta;
// Assertion fails, comma gets bound to "["
increment! a[i], delta;
C macros and one kind of Pop-11 macro actually con-
strain the shape of macro input to being of function call
form. This allows the parser to easily determine the extent
of a macro's input and split it into comma-separated chunks
before nally passing these on to the macro.
At the other extreme, macros can be given full control
over the token stream. While this is the ultimate in versa-
tility, it presents some practical problems. For example, the
2
extent of forms cannot be determined without performing
macro expansion. This can be a problem for development
environments and code analysis tools, but also for program-
mers. The approach lacks modularity. Token macros be-
come, in essence, part of an ad-hoc recursive descent parser
and programmers must be aware of the implications of that.
They must always take care to respect any macro syntax
contained within the subexpressions manipulated because
those subexpressios are not isolated within nested structure.
One further problem familiar to most C programmers
is inadvertent expression merging. When bringing code to-
gether within a macro or when returning replacement code
to a macro call point, care must be taken to isolate the
inserted code from its surroundings. Wrapping with paren-
theses where appropriate is normally sucient to avoid, typ-
ically, operator precedence related bugs. Failing to do so is
still a common source of bugs, however.
Token-based solutions can clearly be very powerful. How-
ever, we feel that the pitfalls and complexities that come
with that power make them an unlikely basis for a macro
system intended to be as simple and accessible as Lisp's.
Referring back to a previous section, another way to look
at an SST is as a lightly-structured, token-based representa-
tion. SST's have most of the power of general token-based
approaches, but without many of the pitfalls discussed. The
challenge is to dene a simple SST upon which it is possible
to build a language with a conventional algebraic syntax.
It is worth noting that although Lisp is SST-based, the
SST used in most Lisps would not meet our requirements.
Because standard run-time objects (numbers, symbols, lists)
are reused to represent the Lisp SST, there is no reliable
way to record auxiliary information like source location or
hygiene context. Scheme macro systems [5] that implement
automatic hygiene have to face this deciency and work in
terms of special-purpose syntax objects rather than lists and
symbols.
4 Dylan's Rewrite-Rule Only Macro System
Dylan macros provide a way to dene new syntactic forms.
Their denitions are called macro denitions and their us-
ages are called macro calls. At compile-time, macro ex-
pansion replaces macro calls with other constructs per their
denition. This macro expansion process is repeated until
the resulting expansion contains no more macro calls.
3.3 Abstract Syntax Tree Representations
Abstract syntax trees (AST's) are highly structured repre-
sentations of source code. In an object-oriented implementa-
tion, there would typically be one class per denition, state-
ment, or other language construct. Down at the expression
level, there might be classes corresponding to function calls,
array accesses, literal references, and so on.
Typically with an AST, adding a new syntactic construct
requires extending the parser to understand it and adding
a new class to represent that construct. How dicult this
is is related to the complexity of the parser because there is
no intermediate form between tokens and full AST. Worst
case, there is no special support at all for modular macro
extensions, and new language constructs must be dened as
a cooperating element of the full language grammar [3]. It
could require extensive knowledge of the parser's intricacies
to avoid or resolve problems like shift/reduce conicts in an
LALR parser, for example. We don't consider requiring such
knowledge of programmers to be reasonable.
4.1 Dylan's Rewrite Rules
The macro system dened in the Dylan Reference Manual
(DRM) [13] is a rewrite-rule only pattern matching template
substitution system. Macros are named with module bind-
ings using the usual Dylan namespace mechanism. Each
macro is comprised of a sequence of rewrite rules where the
left-hand side of a rule is a pattern that matches a fragment
of code and the right-hand side of a rule is a template which
is substituted for the matched fragment. Pattern variables
in the left-hand side serve as macro arguments. Pattern vari-
ables appearing on the right-hand side substitute arguments
into the expansion. The rules are attempted in order until
a given rule's pattern is found to match, at which time, the
matched fragment is replaced with the corresponding tem-
plate. If no patterns are found to match, then an error is
raised.
3.4 Skeleton Syntax Tree Representations
The distinguishing feature of a skeleton syntax tree (SST) is
that it has few classes and corresponds to a very simple core
grammar. That grammar describes \shapes" rather than
particular language constructs. Many dierent constructs
may t within the same basic shape.
The most extreme example of a language based on an
SST is, of course, Lisp. Lisp's underlying grammar says
nothing about cond or defun , it simply species the single
basic shape of all Lisp constructs. Although not a program-
ming language, XML has much the same property.
SST's dene the axioms to which all syntax must con-
form. In Lisp, it is, essentially, that brackets match. In
XML, that tags must match. The simplicity of the axioms
makes the overall structure of the language easy for people
to understand and to remember and also for programs to
work with.
Language-level constructs are dened modularly, above
the core grammar and without requiring any changes to it.
User-dened constructs have the same status as standard
syntactic forms.
4.2 Dylan's Skeleton Syntax Tree
Before we delve into the intricacies of the rewrite-rule only
macro system, it is important to explore Dylan's SST. Dy-
lan's SST is a richer version of Lisp's s-expression based
SST. Lisp's SST can roughly be described as follows:
form: symbol
form: literal
form: "(" elements ")"
elements: form ...
Dylan adds to this a few more basic shapes:
form: identifier
form: literal
form: "(" elements ")"
form: "define" modifiers LIST-DEFINITION elements ";"
form: "define" modifiers BODY-DEFINITION elements "end" ";"
form: STATEMENT elements "end" ";"
elements: [form | punctuation ] ...
3
where modifiers are a sequence of adjective names. Consult
the DRM [13] for the full grammar.
Although, the Dylan rule-based macro system could be
made to work with more involved and stricter grammars, a
skeleton syntax tree oers a number of advantages. First,
an SST modularizes parsing, that is, descending structures
without doing an expansion is now possible, changes in a
given expansion algorithm don't require redoing a top-level
parse, and syntax errors can be kept localized. Second, an
SST gives the user a lot of latitude to choose an appropriate
grammar for a given macro, that is, the Dylan grammar
does not dictate the precise form of templates, but instead
very loosely constrains their basic shape.
with ?test bound to close? and ?body bound to close(stream) .
Templates do not produce a parsed output but instead
produce a well formed sequence of tokens whose parsing
is delayed, where well-formed means that all occurrences
of Dylan brackets match. Delaying their parsing permits
fewer inter-macro dependencies thereby allowing macros to
be compiled independent of each other and permitting re-
cursive macros without forward declarations.
4.5 Example Macros
We present a number of Dylan macros in order to give a
sense of the range of their uses. Graham's book [8] is an
excellent source for macro motivation and examples. An
example of a function macro is increment! :
4.3 Source-level Pattern Matching
Now that we have described Dylan's SST we are in a position
to describe the constituents of rules. Patterns appearing on
the left hand side of rules provide a \what you see is what
you get" (WYSIWYG) mechanism for specifying the desired
input of a macro in terms of concrete syntax. A pattern ba-
sically looks like the construct that it is to match augmented
by pattern variables which match and bind to appropriate
parts of the construct. Pattern variables are denoted with a
? prexing their names. Pattern variables have constraints
that restrict the syntactic type of fragments (based on the
Dylan grammar) that they match when appearing in a pat-
tern. Some of the various pattern constraints are token,
name, variable, expression, and body. Constraints are de-
noted with a : separated pattern variable name's sux.
An example pattern (forming the basis of a complement
to Dylan's unless macro) is the following:
define macro increment!
f increment!(?place:expression) g
=> f ?place := ?place + 1 g
f increment!(?place:expression, ?amount:expression) g
=> f ?place := ?place + ?amount g
end macro;
increment!(x);
increment!(height(x), 10);
which is the Dylan equivalent of C's ++ .
An example statement macro is with-open-file :
define macro with-open-file
f with-open-file (?stream:name, ?options:*) ?:body end g
=> f let ?stream = #f;
block ()
?stream := make(<file-stream>, ?options);
?body
cleanup
?stream & close(?stream)
end block g
f when (?test:expression) ?body:body end g
where the following would be a well-formed when macro call:
end macro;
when (close?) close(stream) end
with-open-file (stream, locator: "phonenumbers")
process-phone-numbers(stream);
end with-open-file;
Patterns match their input based on a simple matching
procedure. In order to avoid easily constructed ambiguous
patterns, patterns are matched using left to right processing
combined with a match shortest rst priority for matching
wildcard pattern variables. A pattern matches if and only
if all of its sub-patterns match. For more details, please
consult the DRM [13] .
where the * constraint on the pattern variable ?options
means that options can match any parsed fragment. This
macro is typical of a whole range of macros, called with-
macros, where resources, datastructures, or special variables
must be managed (and cleanups performed) or more gener-
ally where context needs to be built.
The following dene macro:
4.4 Source-level Code Generation
Templates appearing on the right hand side of rules provide
a WYSIWYG mechanism for specifying the desired output
of a macro in terms of concrete syntax. A template basically
looks like the construct that it produces augmented by pat-
tern variables that stand in for parts of the construct bound
to the same pattern variables in the corresponding left hand
side pattern. For example, the full macro containing the
example pattern from above would be:
define macro functional-variable-definer
f define functional-variable ?var:name = ?init:expression g
=> f define variable ?var ## "-value" = ?init;
define method ?var ()
?var ## "-value"
end method;
define method ?var ## "-setter" (new-value)
?var ## "-value" := new-value;
end method g
end macro;
define macro when
f when (?test:expression) ?body:body end g
=> f if (?test) ?body end if g
end macro;
? define functional-variable time = 0;
? time();
> 0
? time() := 25;
> 25
where
when (close?) close(stream) end
would expand into
provides a functional interface to variables, dening acces-
sors for variables, and thus hiding implementation details.
The above example demonstrates how to create new identi-
ers through the ## concatenation operator.
if (close?) close(stream) end if
4
4.6 Hygiene
Dylan's macro system is hygienic and maintains referen-
tial transparency such that variable references copied from
a macro call and from a macro denition mean the same
thing in an expansion. For an example of hygiene, consider
the following macro:
define macro defaulted-variable-definer
f define defaulted-variable ?:name ?default g
=> f define variable ?name = ?default g
default:
f g => f #f g
f = ?:expression g => f ?expression g
end macro;
Instead of adding more main rules, only extra rules for the
varying fragments need to be written.
It turns out though that local rewrite rule sets are partic-
ularly useful for macros involving iterating over a sequence
of constituent fragments:
define macro or
f or(?x:expression, ?y:expression)
=> f let tmp = ?x;
if (tmp) tmp else ?y end g
end define;
where a temporary binding, called tmp , is introduced in or-
der to avoid multiple evaluations of the rst argument, x .
The macro system ensures that tmp introduced by the or
macro does not collide with visible bindings named tmp in
the enclosing program. For example, the following
define macro properties-definer
f define properties ?kind:name ?properties end g
=> f define variable ?kind = list(?properties) g
properties:
f g => f g
f ?prop:name; ... g => f ?#"prop", ... g
end macro;
begin
let tmp = 5;
or(1, 2);
tmp
end
where, the substitution ?#"property" coerces the identier
into a similarly named symbol. The ellipses ... are a short-
hand for aux-rule recursion, that is,
properties:
f g => f g
f ?prop:name; ... g => f ?#"prop", ... g
returns 5 .
For an example of referential transparency, consider the
following increment! macro from above. The + in the re-
sulting template refers to + in the dening macro's names-
pace regardless of where the macro is used. In other words,
even in the face of + being renamed or even not imported
at all into the namespace of a macro call, the macro system
ensures that increment! 's + is the same and is available.
Sometimes it is necessary to circumvent hygiene in order
to permit macros to introduce variables accessible by the
lexical context from which the macro was called. For exam-
ple, imagine writing a loop macro that provides a standard
named function, called break , for calling when terminating
iteration:
is a shorthand for
properties:
f g => f g
f ?prop:name; ?properties g => f ?#"prop", ?properties g
5 The D-Expressions Library
In order to write macros that require more complicated
macro expansion processing, we introduce d-expressions and
later a full procedural macro facility. The d-expressions li-
brary provides a collection of classes suitable for representing
fragments of source code in skeleton syntax tree form. If de-
sired, skeleton syntax trees may be constructed and pulled
apart manually using the exported interface of these classes.
In addition to the raw, skeleton syntax class hierarchy,
source level tools are provided for easier parsing and con-
struction of source code fragments held in that form. These
tools are the moral equivalents of the destructuring-bind and
backquote macros familiar to Lisp programmers.
The nal facility provided is source code IO. An input
method is dened that is parameterizable with syntactic
context (i.e. a map from identiers to their syntactic cate-
gories) and also, optionally, with a home context (e.g. the
code's top level module context) and source location context.
An output method is also dened and writes a re-readable
text form of any given skeleton tree. These methods are the
moral equivalents of Lisp's read and write functions.
define macro repeat
f repeat ?:body end g
=> f block (?=break)
local method repeat () ?body end;
repeat();
end block g
end macro;
A user can then write the following:
repeat
if ((input := read(*standard-input*, eof: #f)) == #f)
break()
else
write(input, *standard-output*)
end if
end repeat;
and have it terminate upon eof by calling the repeat macro's
provided exit function, break . This mechanism can also be
used for writing anaphoric macros [8].
5.1 Skeleton Syntax Tree Classes
The basic d-expression AST class hierarchy is agreeably sim-
ple:
4.7 Local Rewrite Rules
Dylan provides a mechanism for dening local named rewrite
rule sets which act like local subroutines. These help in
breaking a macro up into rewrite rule subproblems and per-
mit a form of modularization. They are invoked after an
initial pattern matching using the main rule set and before
template substitution. Instead of backtracking, a macro is
deemed invalid if none of a given local rewrite rules match.
Consider the following example:
<d-expression> [Abstract]
<d-leaf> [Abstract]
<d-identifier>
<d-literal>
<d-punctuation>
<d-compound> [Abstract]
<d-nested>
<d-macro-call>
<d-sequence>
5
Zgłoś jeśli naruszono regulamin