Friday, 27 July 2007, 22:34
Gary King recently blogged a question about whether a use of
EVAL he had recently made in some Common Lisp code was legit.
This caught my eye because I suspect this question came up while Gary
was working on a project that I once worked on. Anyway, since he
asked, the short answer is, “No. The old rule that if you’re using
EVAL you’re doing something wrong is still true.” A slightly
longer answer follows.
The problem Gary set out to solve is that he’s got a a bunch of chunks of text coming out of some sort of database and he wants to query them with logical expressions like this one:
(or
(and "space" "mission")
(and (or "moon" "lunar") "landing"))
These expressions are obviously not Lisp but rather a query language
where the logical operators AND and OR (and presumably
other ones like NOT) have their usual logical meaning and
literal strings should evaluate to true if the text we’re matching
against testing contains the string and false if not. Thus the
expression above would match text containing the words “space” and
“mission” or either of “moon” or “lunar” along with “landing”. Gary
considered, and rejected, writing “a simple recursive evaluator to
deal with ands and ors”. Instead he wrote some code to munge around
expressions like the one above into a form that he could EVAL.
In other words, instead of writing his own interpreter he just munged
his code into a form that Lisp could interpret for him. Which is okay.
Except for the rule that if you’re using EVAL you’re doing
something wrong.
What he should have done is written a compiler. Luckily Common Lisp comes with a Lisp compiler built in so all we have to do to write our own compiler is translate our source language into Lisp. Here’s how I’d do it.
First define a function that takes a query expression and returns two
values, an equivalent expression with all string literals replaced
with GENSYM’d variable names, and an alist of the strings and
the corresponding GENSYM’d names. In this phase we also detect
if the same literal string appears more than once so we only generate
a single binding for each unique string.
(defun translate (tree &optional bindings)
(typecase tree
(cons
(multiple-value-bind (newcar newbindings)
(translate (car tree) bindings)
(multiple-value-bind (newcdr newbindings2)
(translate (cdr tree) newbindings)
(values (cons newcar newcdr) newbindings2))))
(symbol (values tree bindings))
(string
(let ((binding (assoc tree bindings :test #'equal)))
(cond
(binding (values (cdr binding) bindings))
(t
(let ((sym (gensym)))
(values sym (acons tree sym bindings)))))))))
With this function we can translate an expression like:
(or (and "a" "b") (and (or "c" "d") (or "a" "b")))
Into this expression:
(OR (AND #:G1 #:G2) (AND (OR #:G3 #:G4) (OR #:G1 #:G2)))
and this list of bindings:
(("d" . #:G4) ("c" . #:G3) ("b" . #:G2) ("a" . #:G1))
Now we just need to use those two values to build up a bit of Lisp.
Gary’s solution was to build up an expression that he could
EVAL. But it’s better to generate a LAMBDA expression
because then we can compile it. Here’s the function:1
(defun make-lambda-expression (expr)
(multiple-value-bind (tree bindings) (translate expr)
(let ((string (gensym)))
`(lambda (,string)
(let (,@(loop for (word . sym) in bindings collect
`(,sym (find-word-in-string ,word ,string))))
,tree)))))
If we pass the same expression to this function we get the following lambda expression back.
(LAMBDA (#:G9)
(LET ((#:G8 (FIND-WORD-IN-STRING "d" #:G9))
(#:G7 (FIND-WORD-IN-STRING "c" #:G9))
(#:G6 (FIND-WORD-IN-STRING "b" #:G9))
(#:G5 (FIND-WORD-IN-STRING "a" #:G9)))
(OR (AND #:G5 #:G6) (AND (OR #:G7 #:G8) (OR #:G5 #:G6)))))
We could use FUNCALL to evaluate this expression which would at
least get us out of using EVAL.2 But the
real advantage of this approach is that we can compile this
expression. Since Gary said he wanted to find all the strings in his
database that match a given expression, he’s probably going to be
evaluating his query once per string in his database. In that case
it’s probably worth it to take a small up front hit in order to speed
up the execution of the query since we’re going to be executing it
many times. Luckily compiling a lambda expression is about as trivial
as EVALing any other expression:
(defun compile-expression (expr)
(compile nil (make-lambda-expression expr)))
This function, fed a query expression, returns a compiled function
that takes a single string argument and returns true if the query
expression matches and false if not. On Lisps with native compilers
the returned function will be compiled down to machine code just the
same as if we had written it by hand in our source code. We can
FUNCALL this function such as in this code that collects all
the strings returned by a cursor function that match the query
expression:
(defun query-strings (query database)
(loop with predicate = (compile-expression query)
with cursor = (string-cursor database)
for string = (next-string cursor)
while string
when (funcall predicate string) collect string))
We can also use the query function with all of Lisp’s
higher-order-functions, such as REMOVE-IF-NOT:
(remove-if-not (compile-expression query) *all-strings*)
Now, one could argue that there’s a whole heck of a lot of difference
between using EVAL and wrapping something in a lambda
expression and compiling it — in both cases you can generate, and then
evaluate, arbitrary Lisp code. But there is an important difference,
namely that EVAL just evaluates — it takes some data and
interprets it as Lisp and gives you an answer straight away whereas
compiling a lambda expression gives you a function, something that can
interact with the rest of your code, as an argument to higher-order
functions, and so on. Or, if you don’t buy that, at least you avoid
breaking the no EVAL rule.
Update: It hit me as I was brushing my teeth that while nicely
avoiding EVAL, my first solution had a big problem — because
all the FIND-WORD-IN-STRING calls are done in the LET
before the boolean expression is evaluated we lose all the advantage
of AND and OR’s short-circuiting behavior. That is, the
generated code searches for all the strings and then combines the
results of all those searches. Much better (and simpler) would be to
implement TRANSLATE and MAKE-LAMBDA-EXPRESSION this way:
(defun translate (tree &optional (stringvar (gensym)))
(typecase tree
(cons
(cons (translate (car tree) stringvar) (translate (cdr tree) stringvar)))
(symbol tree)
(string `(find-word-in-string ,tree ,stringvar))))
(defun make-lambda-expression (expr)
(let ((string (gensym)))
`(lambda (,string) ,(translate expr string))))
This has the slight disadvantage that if the same literal string appears in the pattern more than once, we will potentially search for it more than once. But that’s probably much less of an issue than the problem this code fixes. Obviously if we cared to, we could generate code that caches searches once they are done to get the best of both worlds. But I’ll leave that as an exercise for the reader.
1. This
function generates calls to a helper function
FIND-WORD-IN-STRING. Gary’s version was somewhat more complex
but for the purposes of illustration this definition should do:
(defun find-word-in-string (word string)
(search word string :test #'char-equal))
2. Gary’s solution was haired
up a bit because he didn’t simply generate a LET form to
EVAL but instead generated just the boolean expression and then
wrapped the call to EVAL in a PROGV to establish dynamic
bindings at run-time. Which, while sort of clever, was another sign
from the gods that he had gone down a wrong path somewhere.
Last updated 2007-07-28T00:07:59+7:00.