Implementing i18n-format, a Rust procedural macro

This post relates my short learning journey implementing a procedural macro in Rust: i18n-format.

The problem

Building Rust applications means having to deal with localization, i.e. allowing the UI to be translated into other languages. This is a big deal for users, and thanks to existing tooling, if you use gtk-rs and the rust-gtk-template to bootstrap your project, you should be all set. The GNOME community has a history of caring about this, and most of the infrastructure is here.

However there is one sore point in this: gettext, the standard package used to handle string localization at runtime, doesn't support Rust. For some reason the patch to fix it is stuck in review.

While it mostly works, the lack of support causes one problem: the gettext! macro, which allows the use of localized strings as format strings, is ignored by the xgettext command used to extract strings and generate the PO template. It is the !, a part of the macro invocation syntax in Rust, that is the blocker. Another problem is that in Rust, for safety reasons, you can only pass a string literal to format!, which is why gettext! exists in the first place.

Until xgettext is fixed, either these strings are missed in the localization process, or you end up reimplementing it with less flexibility. That's what some applications do.
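To make "reimplementing it with less flexibility" concrete, here is a minimal sketch of what such a stand-in might look like. The name runtime_format is hypothetical and not taken from any of those applications. It only substitutes bare {} placeholders in order, and supports none of the width, precision, named or positional arguments that format! handles.

```rust
/// Hypothetical stand-in for `format!` that accepts a format string only
/// known at run time, e.g. one returned by a gettext lookup.
/// It only understands bare `{}` placeholders, filled in order.
fn runtime_format(template: &str, args: &[&str]) -> String {
    let mut out = String::with_capacity(template.len());
    let mut args = args.iter();
    let mut rest = template;
    while let Some(pos) = rest.find("{}") {
        // Copy the text before the placeholder.
        out.push_str(&rest[..pos]);
        // A placeholder without a matching argument is left as-is.
        out.push_str(args.next().copied().unwrap_or("{}"));
        rest = &rest[pos + 2..];
    }
    out.push_str(rest);
    out
}

#[test]
fn runtime_format_works() {
    let s = runtime_format("My string is {}", &["here"]);
    assert_eq!(&s, "My string is here");
}
```

Nothing here is checked at compile time: a mismatch between placeholders and arguments only shows up in the output, which is part of the flexibility and safety you give up.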

What if I implement a macro that would allow using a function syntax and still use the macro underneath? Oh boy, that means I need to implement a procedural macro[1], something I have been dreading because it looks like it has a steep learning curve. It's a topic so complicated that it is glossed over by most books about Rust. Turns out it wasn't that difficult, thanks to great tooling.

The idea is as follows: convince xgettext to recognize the string. It should look like a plain function call, while we need to call a macro.

Macros

Rust has two types of macros. Declarative macros replace a pattern with another. Not dissimilar to C macros, but with hygiene and parameter checking, declarative macros are very powerful. Unlike C macros, Rust macros are processed at the same time as the rest of the code, not by a preprocessor. You can't use them to rewrite keywords.[2] A declarative macro call is marked with a ! and that's the source of our problem.
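As a small illustration of "replacing a pattern with another", here is a sketch of a declarative macro. The macro max_of is made up for this example and has nothing to do with i18n-format. Each arm pairs an input pattern with the code that replaces it.

```rust
// A declarative macro: each arm maps an input pattern to replacement code.
macro_rules! max_of {
    // One expression: it is its own maximum.
    ($x:expr) => { $x };
    // Two or more: compare the first with the maximum of the rest.
    ($x:expr, $($rest:expr),+) => {{
        // Thanks to hygiene, `a` and `b` cannot clash with variables
        // of the same name at the call site.
        let a = $x;
        let b = max_of!($($rest),+);
        if a > b { a } else { b }
    }};
}

#[test]
fn max_of_works() {
    let a = 10;
    // The call is marked with `!`, like every macro invocation.
    assert_eq!(max_of!(a, 2), 10);
    assert_eq!(max_of!(1, 5, 3), 5);
}
```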

The second type is procedural macros. They are the type we are interested in. Procedural macros run code written in Rust, at compile time and allow (re)writing code. Procedural macros come in three flavours.

  • Derive macros allow using the #[derive()] attribute to generate code related to a specific type. One notable example of a derive macro is the crate serde, which uses it to automatically generate serialization and deserialization for types.

  • Attribute macros allow tagging an item with an attribute #[attribute] (it can take parameters) to apply the macro to that specific item.

  • Function-like macros, which are called like declarative macros, allow generating code procedurally.

Let's put aside derive macros, as we are not doing this.

Reading

Some articles I read prior to doing this:

Cargo L Helper

Let's start with some preparation for working with procedural macros.

First, a procedural macro is implemented in its own crate, so you'll need to create one, even within a bigger project. You can group several procedural macros together, but you can't have anything else. The Cargo.toml needs to have:

[lib]
proc-macro = true

Second, debugging macro code is not always simple. Just install cargo-expand. This tool allows expanding the macro so you can review what is being generated. It's not dissimilar to running the C preprocessor.

You can run cargo expand --test to expand your tests. Testing a macro is done in separate tests in the tests directory, as you can't run the tests from inside the crate, and without a bigger effort it is mostly limited to expanding the macro.

Note: for cargo expand you need to have the Rust nightly compiler toolchain installed.

First try

Let's see how we could solve the problem with an attribute macro. Let's name our macro i18n_fmt. As an attribute macro it could work this way:

#[i18n_fmt]
let s = i18n_fmt("My string is {}", value);

The macro called by the attribute would parse the statement and replace i18n_fmt with gettext!.

But this doesn't work. Attributes applied to statements are not supported (yet). You can apply them to a type, a function, a module, or a field of a type (struct or enum). Too bad, this looked like it could work. Trying to implement it is left as an exercise to the reader: all you need is to stub the macro implementation as a no-op, and as you try to use it in your testing code, the compiler will complain.

Second try

Let's try with a function-like macro. The macro would, inside its block, replace the function call with a macro call. It could work like this:

let s = i18n_fmt! {
    i18n_fmt("My string is {}", value)
}

The procedural macro is i18n_fmt!, and inside its block it will replace i18n_fmt with gettext!. This should work. xgettext with the keyword i18n_fmt would find the string and add it to the PO template. Note that here i18n_fmt isn't a function; it isn't even defined. Without the macro doing its work, compilation would fail. It could be literally any other name, but let's be reasonable in naming things.

At least this works: after implementing a no-op macro, I get an error that i18n_fmt isn't a function, confirming this approach could possibly work.

Disclaimer: I still don't know what I'm doing, it's my first procedural macro.

Macro programming

A procedural macro is implemented with a function that gets a TokenStream, a list of tokens, and returns a TokenStream. The identity function would return what it gets. You can inspect, modify, or do whatever you want with the token stream. A panic will cause a compilation error in the client code (the code using the macro): this is how we'll report errors.

The no-op macro:

#[proc_macro]
pub fn i18n_fmt(body: TokenStream) -> TokenStream {
    body
}

This is the signature for the procedural macro entry point. This is the function called by the compiler when it encounters the macro in the source. The name of the function is the name of the macro.

The test has to live in a separate test file, as you can't invoke a procedural macro from its own crate. tests/macro.rs contains:

use i18n_experiment::*;

#[test]
fn i18n_fmt_works() {
    let s = i18n_fmt! {
        i18n_fmt("This string {}", "formatted")
    };
    assert_eq!(&s, "This string formatted");
}

Running cargo test gives the first error:

error[E0423]: expected function, found macro `i18n_fmt`
  --> tests/macro.rs:11:13
   |
11 |         s = i18n_fmt("This string {}", "formatted");
   |             ^^^^^^^^ not a function
   |
help: use `!` to invoke the macro
   |
11 |         s = i18n_fmt!("This string {}", "formatted");
   |                     +

It doesn't complain about the syntax, but the macro does nothing, and i18n_fmt is not a function. Expected.

We need to replace i18n_fmt with a macro. It's just i18n_fmt! but we are working with tokens, not with text. i18n_fmt is an identifier, which is important because we don't want to replace something in a string.
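To illustrate why working on tokens rather than text matters, here is a toy model. The Token enum and rename_ident function are made up for this sketch and are drastically simpler than the real proc_macro::TokenTree. The point is only that an identifier and a string literal with the same text are different tokens, so renaming one leaves the other alone.

```rust
// Toy model only: a drastically simplified token type, NOT the real
// `proc_macro::TokenTree`.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Literal(String),
    Punct(char),
}

/// Rename identifiers called `from` to `to`, leaving every other token,
/// including string literals with the same text, untouched.
fn rename_ident(tokens: Vec<Token>, from: &str, to: &str) -> Vec<Token> {
    tokens
        .into_iter()
        .map(|tok| match tok {
            Token::Ident(ref name) if name == from => Token::Ident(to.to_string()),
            other => other,
        })
        .collect()
}

#[test]
fn only_identifiers_are_renamed() {
    // Models the tokens of: i18n_fmt("i18n_fmt")
    let input = vec![
        Token::Ident("i18n_fmt".to_string()),
        Token::Punct('('),
        Token::Literal("i18n_fmt".to_string()),
        Token::Punct(')'),
    ];
    let output = rename_ident(input, "i18n_fmt", "gettext");
    assert_eq!(output[0], Token::Ident("gettext".to_string()));
    assert_eq!(output[2], Token::Literal("i18n_fmt".to_string()));
}
```

A plain text search and replace on the source would have rewritten the string literal too.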

Let's replace it.

use proc_macro::{Ident, Span, TokenStream, TokenTree};

#[proc_macro]
pub fn i18n_fmt(body: TokenStream) -> TokenStream {
    body.into_iter()
        .map(|tt| {
            if let TokenTree::Ident(ref ident) = tt {
                if &ident.to_string() == "i18n_fmt" {
                    return TokenTree::Ident(Ident::new("gettext!", Span::call_site()));
                }
            }
            tt
        })
        .collect()
}

We call map() on the tokens and if we encounter the proper identifier, we replace it with the macro identifier.

This should work.

error: proc macro panicked
  --> tests/macro.rs:8:13
   |
8  |       let s = i18n_fmt! {
   |  _____________^
9  | |         i18n_fmt("This string {}", "formatted")
10 | |     };
   | |_____^
   |
   = help: message: `"gettext!"` is not a valid identifier

Nope. Apparently gettext! isn't a proper identifier. This is checked at runtime (the runtime of the proc macro) so the compilation failed. Looks like we can't insert a macro with Ident::new.

Quote me

Digging a little bit into some documentation, I learned about the quote!() macro. That's right, a macro to write a macro. Whatever you pass to the macro is turned into a TokenStream. So quote!(gettext!) should be enough.

quote! is provided by the crate of the same name, maintained by David Tolnay. The only caveat is that it returns a proc_macro2::TokenStream instead of a proc_macro::TokenStream. But just call .into() as needed.

proc_macro2 is just proc_macro but as an external crate that can be used in other contexts. It's not really essential to understand here, we are just putting some pieces together.

Add quote to your Cargo.toml. I'll spare you the iterations: you need to emit a TokenTree::Group with the result of quote!.

use proc_macro::{Delimiter, Group, TokenStream, TokenTree};
use quote::quote;

#[proc_macro]
pub fn i18n_fmt(body: TokenStream) -> TokenStream {
    body.into_iter()
        .map(|tt| {
            if let TokenTree::Ident(ref ident) = tt {
                if &ident.to_string() == "i18n_fmt" {
                    return TokenTree::Group(Group::new(Delimiter::None, quote!(gettext!).into()));
                }
            }
            tt
        })
        .collect()
}

Compile this and get:

error: cannot find macro `gettext` in this scope
  --> tests/macro.rs:8:13
   |
8  |       let s = i18n_fmt! {
   |  _____________^
9  | |         i18n_fmt("This string {}", "formatted")
10 | |     };
   | |_____^
   |
   = note: this error originates in the macro `i18n_fmt` (in Nightly builds, run with -Z macro-backtrace for more info)

The macro didn't panic, but the compilation failed in the macro expansion. Looks like we are missing gettext. Add gettextrs to Cargo.toml, and in the test file add use gettextrs::*;. It should build. But ideally we don't want to have to use gettextrs in the client code to use the macro. This allowed us to test that it's working the way we want: gettext! takes a localized string as the format, and I can tell you xgettext will find it. xgettext --keyword=i18n_fmt tests/macro.rs -o - will show you that.

How can we ensure that gettext is used when calling the macro?

Into good use

We should emit the use in the macro expansion. I have no idea how to produce the tokens by hand, but we already use quote!, so why make our life hard? quote!(use gettextrs::*;) should do it. The TokenStream passed to the procedural macro is the content of the block. We need to prepend the use statement to it, so we start a new TokenStream with that statement and call .extend() to append what we did before. And because we need a block, we put that TokenStream into a TokenTree::Group delimited with Brace.

use proc_macro::{Delimiter, Group, TokenStream, TokenTree};
use quote::quote;

#[proc_macro]
pub fn i18n_fmt(body: TokenStream) -> TokenStream {
    let mut macro_block: TokenStream = quote!(
        use gettextrs::*;
    )
    .into();
    macro_block.extend(
        body.into_iter()
            .map(|tt| {
                if let TokenTree::Ident(ref ident) = tt {
                    if &ident.to_string() == "i18n_fmt" {
                        return TokenTree::Group(Group::new(Delimiter::None, quote!(gettext!).into()));
                    }
                }
                tt
            })
    );

    [TokenTree::Group(Group::new(Delimiter::Brace, macro_block))]
        .into_iter()
        .collect()
}

So if you call it like this:

    let s = i18n_fmt! {
        i18n_fmt("This string {}", "formatted")
    };

The macro expansion will end up being something like:

    let s = {
        use gettextrs::*;
        gettext!("This string {}", "formatted")
    };

Technically the gettext! macro is also expanded, which cargo expand will show, but the snippet above should compile.
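The reason the expansion needs braces is that a block is an expression that may start with items such as use. Here is a dependency-free sketch of the same shape, with std::fmt::Write standing in for gettextrs; the function name formatted is made up for the example.

```rust
/// Stand-in with the same shape as the expansion: a block that starts
/// with a `use` and ends with the expression giving the block its value.
fn formatted(value: &str) -> String {
    let s = {
        // The import is scoped to this block and does not leak outside.
        use std::fmt::Write;
        let mut buf = String::new();
        write!(buf, "This string {}", value).expect("writing to a String cannot fail");
        buf
    };
    s
}

#[test]
fn block_with_use_is_an_expression() {
    assert_eq!(&formatted("formatted"), "This string formatted");
}
```

Without the braces, a use followed by an expression could not sit on the right-hand side of let s = ...;.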

That's basically it. The resulting crate is a bit longer because I also handle ngettext! that is used for plural forms.

Bottom line

While writing a procedural macro looked like a daunting task, in the end it was not as complicated as I thought, and it offers a lot of value, well worth the initial effort.

Postscriptum

And now I wonder if I should have dismissed declarative macros for this. Maybe it would just have worked.
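For the curious, here is one possible shape of that declarative alternative, as a sketch only. To keep it free of dependencies, format! stands in for gettext!, which is an assumption of this example. A real version would presumably expand to the gettextrs macro, and nothing here shows that it would cover ngettext! or the other cases the procedural macro handles.

```rust
// Sketch: `format!` stands in for `gettext!` so the example has no
// dependencies; this is not the code of the i18n-format crate.
macro_rules! i18n_fmt {
    // Match the literal tokens `i18n_fmt( ... )` and forward the arguments.
    (i18n_fmt($($args:tt)*)) => {
        format!($($args)*)
    };
}

#[test]
fn i18n_fmt_declarative() {
    let s = i18n_fmt! {
        i18n_fmt("This string {}", "formatted")
    };
    assert_eq!(&s, "This string formatted");
}
```

The call site is the same as with the procedural macro: the inner i18n_fmt(...) still looks like a plain function call.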

[1]

See postscriptum.

[2]

True story. In my youth I preferred Pascal, so I wrote a set of C macros so I could write almost-Pascal and compile it as C, like:

#define PROCEDURE void
#define BEGIN {
#define END }
#define IF if(
#define THEN )
#define RETURN return

PROCEDURE something()
BEGIN
    IF done THEN
        RETURN;
END

I stopped quickly, it was idiotic.