Typed and modular macros for OCaml
It is mainly an effort to make a “compile-time MetaOCaml” 2 (though you don't need prior knowledge of MetaOCaml to read what follows). Its compile-time nature requires additional subtleties to try and make the programming experience as pleasant as in MetaOCaml. If you want to try it out, the repository is available.
What does it look like?
The idea is to write explicitly staged code. There are two phases: compilation (phase 1) and execution (phase 0). The numbering may seem counter-intuitive; it is mostly because, theoretically, there may be several macro expansion phases before compilation (macros inside macros), the execution being the last phase. In that case, it is more natural to refer to phases by 0, 1, 2, 3, etc. than to use negative numbers, or to introduce an abritrary offset.
Compile-time values and functions can be declared using
static instead of
let in toplevel bindings:
# static x = 42;; static val x : int = 42
x is a compile-time value, although of course the difference is not visible in the toplevel.
It is impossible to mix compile-time and runtime values as it wouldn't make sense:
# static x = 42;; static val x : int = 42 # let y = x + 1;; Error: Attempt to use value x of phase 1 in an environment of phase 0
If we try now to create a static
# static stat_succ x = x + 1;; Error: Attempt to use value + of phase 0 in an environment of phase 1
This happens because
(+) is part of the
Pervasives module, which is considered phase-0. To use it in phase 1, one must add a tilda (
~) before its name, like so:
# static stat_succ x = ~Pervasives.(+) x 1;; static val stat_succ : int -> int = <fun>
# static stat_succ x = let open ~Pervasives in x + 1;; static val stat_succ : int -> int = <fun>
This operation is called module lifting. For phase separation to be enforced, only external modules (i.e. modules in separate files, separately compiled) can be lifted.
Now, if we want to write macros, we will need a way to manipulate code fragments. This is done via quotations, à la MetaOCaml:
# macro x = << int_of_string "42" >>;; macro x : int expr = << Pervasives(*global*).int_of_string "42" >>
Note the replacement of the keyword
macro. Quoting is only allowed in
macro declarations, to avoid capture of local identifiers (see path closures below).
The result of a quotation is of type
'a expr, where
'a is the type of the quoted expression. Also note that we were able to use directly
Pervasives.int_of_string inside the quotation, without module lifting. That's because the code inside a quote is of phase 0: it is destined to be spliced inside phase-0 code during macro expansion.
The splicing operator is the dollar and is used like so:
# $x;; - : int = 42
What actually happened here is that the quoted expression in
x was spliced into the code and evaluated (at runtime) to
42, just as if we had directly typed:
# int_of_string "42";; - : int = 42
Splicing can be done in arbitrary expressions (as long as the types match):
# let () = Printf.printf "%d\n" ($x + 1);; 43
Expressions can also be spliced inside quotations:
# macro square = << fun x -> x * x >>;; macro square : (int -> int) expr = << fun x_1 -> Pervasives(*global*).( * ) x_1 x_1 >> # macro n = << $square (42+1) >>;; macro n : int expr = << (fun x_3 -> Pervasives(*global*).( * ) x_3 x_3) (Pervasives(*global*).(+) 42 1) >> # $n;; - : int = 1849
You can visualize what code was actually spliced in with the
# $n;; $n;; splice #0: (fun x_3 -> Pervasives(*global*).( * ) x_3 x_3) (Pervasives(*global*).(+) 42 1) - : int = 1849
Here is a classic example: a compile-time power function.
let square x = x * x open ~Pervasives macro rec power' n x = if n = 0 then << 1 >> else if n mod 2 = 0 then << square $(power' (n/2) x) >> else << Pervasives.( * ) $x $(power' (pred n) x) >> macro power n = << fun x -> $(power' n <<x>>) >>
power macro, of type
int -> (int -> int) expr, can then be used like this:
# let x = $(power 9) 2 ;; val x : int = 512
This is very similar to what would have been done in MetaOCaml. You may have noticed that there is only one splicing construct (
$), while MetaOCaml has two: escape (
.~) and run. We chose to use the same operator because the semantics are the same, but under the hood, toplevel splices (that is, splices that are not inside a quotation) are implemented differently than splices inside quotations (see below).
For a more "real-world" example, you may consider the
printf example from Yallop and White's paper. A runtime
printf function may be defined like this:
type (_, _) fmt = Int : (int → 'a, 'a) fmt | Lit : string → ('a, 'a) fmt | Cat : ('a, 'b) fmt * ('b, 'c) fmt → ('a, 'c) fmt let (%) x y = Cat (x, y) (* analogous to "(%d, %d)" *) let p = Lit "(" % Int % Lit "," % Int % Lit ")" let rec printk : type 'a 'b. (string → 'b) → ('a, 'b) fmt → 'a = fun k → function Int → fun s → k (string_of_int s) | Lit s → k s | Cat (l, r) → printk (fun x → printk (fun y → k (x ˆ y)) r) l let sprintf fmt = printk (fun x → x) fmt
We may then write:
# sprintf p 3 4 ;; - : string = "(3,4)"
sprintf abstraction is more pleasant than writing by hand
"(" ^ string_of_int 3 ^ "," ^ string_of_int 4 ^ ")", but it comes at an interpretative cost.
To avoid this cost, the
printk function can be staged into a macro:
type (_, _) fmt = Int : (int -> 'a, 'a) fmt | Lit : string -> ('a, 'a) fmt | Cat : ('a, 'b) fmt * ('b, 'c) fmt -> ('a, 'c) fmt static (%) x y = Cat (x, y) macro rec printk : type a b. (string expr -> b expr) -> (a, b) fmt -> a expr = fun k -> function | Int -> << fun s -> $(k <<string_of_int s>>) >> | Lit s -> k (Expr.of_string s) | Cat (l, r) -> printk (fun x -> printk (fun y -> k << $x ^ $y >>) r) l macro sprintf fmt = printk (fun x -> x) fmt
We may then write a function that prints a pair:
# static p = Lit "(" % Int % Lit "," % Int % Lit ")";; static val p : (int -> int -> '_a, '_a) fmt = Cat (Cat (Cat (Cat (Lit "(", Int), Lit ","), Int), Lit ")") # let print_pair = $(sprintf p);; splice #0: fun s_1 -> fun s_2 -> Pervasives(*global*).(^) (Pervasives(*global*).(^) (Pervasives(*global*).(^) (Pervasives(*global*).(^) "(" (Pervasives(*global*).string_of_int s_1)) ",") (Pervasives(*global*).string_of_int s_2)) ")" val print_pair : int -> int -> string = <fun>
We get the abstraction we wanted at zero interpretative cost.
Integration within modules
We want macros to be seamlessly integrated into OCaml's module system.
For example, modules can export static values as well as regular values. The above printing functions could be gathered in a
sprintf.ml file. A
sprintf.mli interface file may then be written:
type (_, _) fmt = Int : (int -> 'a, 'a) fmt | Lit : string -> ('a, 'a) fmt | Cat : ('a, 'b) fmt * ('b, 'c) fmt -> ('a, 'c) fmt static val ( % ) : ('a, 'b) fmt -> ('b, 'c) fmt -> ('a, 'c) fmt macro sprintf : ('a, string) fmt -> 'a expr
We have hidden
printk and exported only
The static components of the
Sprintf module will be saved to a
sprintf.cmm file (the format is the same as for
.cmo files). The
sprintf macro can now be used in other files.
Consider again the
power' macro from above:
macro rec power' n x = if n = 0 then <<1>> else if n mod 2 = 0 then <<square $(power' (n/2) x)>> else <<Pervasives.( * ) $x $(power' (pred n) x)>>
As you can see, applying
power' will result in some code, possibly containing the run-time
square function. But isn't that dangerous? For example, what if I put
power in a module that only exports
power? With a naive implementation of macros, the result of splicing that macro would be an error like
Unbound value square.
To avoid that, a macro "carries" all the free identifiers that are used in its body in what we call a path closure. Additionnally, every macro takes an implicit path argument when it's called.
To see the result of macro expansion more easily, you can use
;;; (triple semicolon) that lets you evaluate static values in the toplevel:
# let x = 42;; val x : int = 42 # macro y () = <<x>>;; macro y : unit -> int expr = <fun> # y ();;; - : int expr = << y.x(*0*) >>
The meaning of
y.x(*0*) is: field of index 0 in the path closure of macro
y. For clarity, along with the field index is printed the name of the field, i.e.
To go back to our slightly more complicated
# Power.power 5;;; - : (int -> int) expr = << fun x_2 -> Pervasives(*global*).( * ) x_2 (Power.power.power'(*0*).square(*0*) (Power.power.power'(*0*).square(*0*) (Pervasives(*global*).( * ) x_2 1))) >>
This quote tells us that the runtime function
square can be found in the closure of macro
power', field 0. Additionnally,
power' itself is in the closure of
power (necessary because
power' might be out of scope, just like
Static modules and functors
If a module or a functor is intended to contain only
static definitions, you can make it static:
static module F(X : sig val succ : int -> int end) = struct let x = X.succ 42 end static module M = F(~Pervasives) (* M.x = 43 *)
Note that static functors can only take static modules as their arguments. Also,
let is used instead of
static in a static module.
Passing values between phases
Imagine you have a compile-time integer (not an
int expr, a real
int) and you want to use in your runtime code. This is done via
Expr.of_int of type
int -> int expr:
static x = 42 let () = Printf.printf "%d\n" $(Expr.of_int x)
Expr is a new module of the standard library, containing functions for all the standard types:
of_list… However, if you need to pass a value of a custom type, you have to write the translation function yourself:
type t = A | B of int static x = B 42 macro expr_of_t = function | A -> << A >> | B x -> << B $(Expr.of_int x) >> let dyn_x = $(expr_of_t x)
Execution of static code
In order to run code at compile-time, all static bindings and splices are translated to bytecode, then executed using the
Meta.reify_bytecode function available in the compiler (this function is the one used in the toplevel to execute phrases).
One problem is that the optimised versions of the compiler, namely
ocamlopt.opt, are native programs; but
Meta.reify_bytecode can only be called from bytecode programs. This limitation seems to exist because it would be slightly non-trivial for native and bytecode programs to share a heap.
One solution would be to compile macros and splices to native code, and then link them dynamically to the compiler. But there is no portable way to do dynamic linking in OCaml. The existing solution, developed by Alain Frisch, resorts to the GNU linking toolchain.
The retained solution for optimised versions of the compiler is therefore the following: in
ocamlopt.opt, static code is translated to bytecode and written to a temporary
.cmo file; this file is then linked with its dependencies into an executable bytecode file. The compiler then calls the external program
ocamlrun on this file.
This way of doing things is portable, but not very aesthetic. It could be made more natural by integrating a way to execute native code on-the-fly inside a native program, as proposed by Meurer and Fischbach 3.
The quoting part, developed by Leo White, is very similar to what can be found in MetaOCaml, although we did not reuse MetaOCaml code directly. It consists of a set of combinators transforming typed syntax trees into AST-generating lambda code. This lambda code is destined to be executed as part of the phase-1 code.
typed AST (Typedtree) | | v lambda code (Lambda) | | v untyped AST (Parsetree)
The generated AST is untyped, mainly to avoid the hassle of propagating type information to the generated tree. Instead, the typechecker is called on the quoted tree during splicing.
Static code is constructed in such a way that it returns, when evaluated, an array of untyped ASTs (
Parsetree.expression array). These ASTs are then typed and inserted in place of the splicings into the big syntax tree of runtime code before it is translated to lambda code.
In the case of
ocamlopt.opt, the splice array needs to be marshalled to a temporary file (since the compiler cannot share memory with an external program).
Note that the above only applies to toplevel splices, that is, splices that are not in a quotation. Splices inside quotations do not imply any communication between phases, and are only here for the typechecker to keep track of phase. Once the typechecker has done its job, they are mostly ignored.
Testing and contributing
All the examples presented above are working examples. The corresponding repository is available. Of course, as a prototype it is evolving rapidly and we cannot guarantee that future changes won't break your code.
You are very welcome to try out macros. It will be very useful if you report any bugs or inconvenience by opening an issue on this repo.
If you want to set up an Opam switch with our version of the compiler, you must be aware that there is currently no working version of
camlp4 that compiles with the macro compiler, but hopefully this will be fixed soon.
Let me know if you write something nice involving macros!