On writing bindings in Odin

And what users care about when using third-party libraries

Oct 26, 2023

There are many ways to write bindings. You could just get a header file and 1:1 map it to Odin code, replacing enums and booleans with integers, leaving everything basically as-is. The end user then ends up writing lots of code for every function call and making lots of typecasts for no reason other than to satisfy sloppily-written interfaces, which *can* be a source of some potentially hideous errors (or security exploits).

I want to go into my experience of using bindings in different programming languages and explore the question of “How do I make these bindings actually usable for a wide range of use cases?”. Most of this post is going to be Odin-specific and feature its syntax, but I think there are certain things to take out of it which can be applied to any “non-C” programming language.

Spoiler alert: 90% of this post is just me commenting on the things I like to see in bindings, not a claim that this is an objective analysis.

The naming problem

Okay, so you want to bind a hypothetical libcool to Odin and publish it on your GitHub so that other people can copy the package and call libcool’s functions. Quickly opening up the reference to libcool, you see that all the functions are PascalCase while Odin uses snake_case pretty much everywhere.

There are two main questions when it comes to naming bound variables:

Do I need to preserve the identifier casing of the library, or do I change it to match the conventions of the language and its standard library?
Is it OK to rename stuff if it’s named badly? Can I, e.g., rename the `stat` function, which has a name that makes close to zero sense, to something more meaningful like `get_file_info`?

For the first question, the answer is surprisingly simple: either is fine. I mean, seriously. It’s not like changing the casing, or inserting underscores at the PascalCase word breaks, would suddenly make the documentation unsearchable. An experienced user, opening up the documentation or the bindings source, will be able to pick up that you’ve changed the function names. For a user inexperienced with said library, the situation is slightly different, because now the names differ slightly from the documentation. That brings me to another point: document your bindings and how they differ from the original. This will help the situation somewhat.

My preference is that identifiers should be changed to the language’s conventions if the package has wrappers that allocate memory, or basically any high-level wrappers. At that point it’s not a binding, but a wrapper. You can even combine the two approaches like this:

libcoolbindings/
|- ffi/
|   |- types.odin
|   |- functions.odin   <- has LibCoolInit(id: cstring)
|- types.odin           <- mostly re-exports from ffi/types.odin
|- functions.odin       <- has lib_cool_init(id: string)
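Here is a rough sketch of what the two layers in the layout above might contain, squeezed into one file so it runs on its own. Since libcool is hypothetical, `LibCoolInit` is a plain Odin procedure standing in for what would really be a `foreign` declaration, and its return convention (0 on success) is an assumption made up for this example.

```odin
package main

import "core:fmt"
import "core:strings"

// Stand-in for the raw binding that would live in ffi/functions.odin.
// In a real package this would be a body-less `foreign` declaration;
// the "0 means success" convention is assumed for illustration.
LibCoolInit :: proc "c" (id: cstring) -> i32 {
	if id == nil {
		return -1
	}
	return 0
}

// High-level wrapper that would live in the top-level functions.odin.
// It accepts an Odin string and performs the cstring conversion itself.
lib_cool_init :: proc(id: string) -> (ok: bool) {
	c_id := strings.clone_to_cstring(id, context.temp_allocator)
	return LibCoolInit(c_id) == 0
}

main :: proc() {
	defer free_all(context.temp_allocator)
	fmt.println(lib_cool_init("my-app"))
}
```

The wrapper is the one that allocates, which is why it gets the Odin-style name, while the raw procedure keeps the name found in the library's documentation.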

This makes things _extremely_ clear. You’ll have to spend a lot of time binding it, but binding has never been an easy process. But personally I wouldn’t choose this way of structuring code. A lot of functions in the ffi package are going to be basically unused unless they are no-memory-allocation alternatives for the wrapper.

Instead you can make wrappers part of your package, but then you’ll have to apply the language’s conventions to all functions, not just the wrapper ones.

libcoolbindings/
|- types.odin           <- type definitions
|- functions.odin       <- has lib_cool_init(id: cstring)
|- wrappers.odin        <- has lib_cool_init_str(id: string)

You can probably also move the code from wrappers.odin into functions.odin.

For the second question, the answer is definitely, absolutely, in no world, in no universe: NO. You DO NOT rename things, because this makes the documentation unsearchable.

Though sometimes it is acceptable when the pattern can be picked up. C has lots of prefixes because it has almost no namespacing. Removing those prefixes removes a lot of unnecessary information from the user’s code, meaning that the usage code is easier to read (example later).
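One way to drop a prefix without hand-renaming every procedure is the `link_prefix` attribute on a `foreign` block. The sketch below uses libc's `toupper`/`tolower` purely as a stand-in for a prefixed library such as the hypothetical libcool, and it assumes a Linux-like target where the C library links as "system:c".

```odin
package main

import "core:c"
import "core:fmt"

// Assumption: Linux-like target; other platforms name the C library differently.
foreign import libc "system:c"

// link_prefix is prepended to each declared name when the symbol is resolved,
// so `upper` binds to toupper and `lower` binds to tolower.
@(default_calling_convention = "c", link_prefix = "to")
foreign libc {
	upper :: proc(ch: c.int) -> c.int ---
	lower :: proc(ch: c.int) -> c.int ---
}

main :: proc() {
	fmt.println(rune(upper('a')), rune(lower('Q')))
}
```

Because the rule is mechanical (strip one known prefix), a user can still map every name back to the original documentation.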

Leveraging the language’s strengths

The second point of good usability of bindings: leverage the language’s strengths to create bindings that make sense.

I’m a big fan of many of Odin’s features: tuple returns, filling up multiple arguments of a function with a tuple, bitsets, sized enums, slices. I’m sure other people enjoy using them too. If your bindings just take ints and rawptrs as parameters, they require lots of casts in what would otherwise be plain, average Odin code. If a function takes in flags, maybe use a bit_set. If a function takes a pointer and a length, maybe use a polymorphic slice, or just a slice, depending on the semantics.
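As a minimal sketch of that idea, here is a wrapper that takes a polymorphic slice and a bit_set and forwards them to a raw pointer/length/integer-flags procedure. Everything in it (`raw_send`, `send`, `Send_Flag`) is a hypothetical name invented for the example, and `raw_send` is an ordinary procedure standing in for a `foreign` declaration.

```odin
package main

import "core:fmt"

// Hypothetical flags. Enum values are bit positions, not masks.
Send_Flag :: enum u32 {
	Urgent   = 0, // C-side mask 0x1
	No_Delay = 1, // C-side mask 0x2
}
// Backing the bit_set with u32 fixes its size to match the C integer.
Send_Flags :: bit_set[Send_Flag; u32]

// Stand-in for the raw binding (would be a `foreign` declaration).
// It returns the byte count it was given so the call can be observed.
raw_send :: proc "c" (data: rawptr, size: uint, flags: u32) -> i32 {
	return i32(size)
}

// Wrapper: any slice in, bit_set flags in, no casts at the call site.
send :: proc(data: []$T, flags: Send_Flags) -> i32 {
	size := uint(len(data) * size_of(T))
	return raw_send(raw_data(data), size, transmute(u32)flags)
}

main :: proc() {
	samples := []u32{1, 2, 3}
	fmt.println(send(samples, {.Urgent, .No_Delay}))
}
```

All the casting still happens, but it happens once, inside the wrapper, instead of at every call site.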

Compare the two bindings of the open() syscall, where we open a file for reading and writing, and if the file gets created, it gets the permission mask 0600. One call doesn’t use Odin’s cool-ass types and the other does:

fd, errno := linux.open("myfile.txt", linux.O_RDWR|linux.O_CREAT, linux.S_IWUSR|linux.S_IRUSR)
fd, errno := linux.open("myfile.txt", {.RDWR,.CREAT}, {.IWUSR,.IRUSR})

I don’t know how it is rendered in your browser, but the first one doesn’t fit in one line for me. Using enums and bitsets is really great!

Sometimes functions do weird things with their arguments. I ran into this when binding the futex syscall on Linux. It has a futex_op argument that specifies what futex operation you want to perform, which can be things like FUTEX_WAIT or FUTEX_WAKE. The most logical thing to do is to use an enum for the operation, but then there’s a funny story: you can OR any of those operations together with FUTEX_PRIVATE_FLAG and get a different operation. Oops, this is not an enum nor a bitset anymore, but something in-between.

There are multiple approaches to take here:

Forget about the OR part, just provide compound values like FUTEX_WAIT_PRIVATE, and use enums
Forget about enums, give up and use integers
Bitset hacks by setting individual bits… actually let me shut up here, no it’s bad.
Use a pair of arguments instead, one for the operation, the other for flags — the wrapper will unify them.

This last approach is what I typically use, since it tends to be the most compact in the usage code. Forgetting about the OR part can also be useful, but may be harmful when there are more than one or two flags.

The main reason for splitting the argument into two is that the user doesn’t care about storing these two together. They are unrelated, but packed for whatever reason. Anyway this shouldn’t cause a problem.
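A minimal sketch of the pair-of-arguments approach, assuming the Linux values FUTEX_WAIT = 0, FUTEX_WAKE = 1, FUTEX_PRIVATE_FLAG = 128 and FUTEX_CLOCK_REALTIME = 256. The type and procedure names here are made up for the example and are not taken from any existing package.

```odin
package main

import "core:fmt"

// The operation is a plain enum.
Futex_Op :: enum u32 {
	Wait = 0, // FUTEX_WAIT
	Wake = 1, // FUTEX_WAKE
}

// The modifiers are a bit_set; enum values are bit positions.
Futex_Flag :: enum u32 {
	Private  = 7, // bit 7 is FUTEX_PRIVATE_FLAG (128)
	Realtime = 8, // bit 8 is FUTEX_CLOCK_REALTIME (256)
}
Futex_Flags :: bit_set[Futex_Flag; u32]

// The wrapper unifies both arguments into the single integer the syscall expects.
futex_op_pack :: proc(op: Futex_Op, flags: Futex_Flags) -> u32 {
	return u32(op) | transmute(u32)flags
}

main :: proc() {
	fmt.println(futex_op_pack(.Wake, {.Private})) // 1 | 128 = 129
	fmt.println(futex_op_pack(.Wait, {}))         // 0
}
```

The caller writes something like `.Wake, {.Private}` and never sees the packed integer.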

Wrap up

In any case, the reason I’m writing this post is that I want to make people who write bindings a little bit more conscious about the usage of said bindings. Beyond just saying “hey, you can call this procedure from Odin now”, one should also ask: okay, but how is it going to be searched in docs, how well does it integrate into existing codebases, and how useful is it?

I’m kinda tired of using bindings where every parameter is an int.

bumbread
