Last three months in OCaml (July 2025)
Thanks to sponsorship from Tarides I've been able to spend some time over the last few months hacking on various OCaml-related projects. Some of these I've already published as blog posts, while others are still works in progress; this post gives brief summaries and ties some of them together into a bigger picture.
AI Coding Agents
Jon Ludlam, Anil Madhavapeddy and I have spent a fair amount of time thinking broadly about AI coding agents. In addition to this post, it's worth checking out this short draft paper we've written, which summarises our work so far and our thoughts on what could be needed going forward for OCaml to work well with AI tooling.
Benchmarking
Anil and Jon lecture the first-year Foundations of Computer Science course here at Cambridge, so we tested how well self-hostable models performed on first-year coding exercises. We found the Qwen3 models were very effective, rivalling much bigger and older models. We plan to do some more benchmarking, especially as we've heard good things from community members about newer models like Mistral's Devstral, which has just been updated. Before we can do that, though, we want to build a more extensive set of benchmarks. A step in that direction is opam-archive-dataset, a continuously updated Parquet dataset of all source code in opam. We've also been looking at SWE-Synth and how we might apply a similar synthesis method across community OCaml projects.
Agentic tooling
There are coding models that we can't modify ourselves1 and which users access remotely. Nearly all agents using these coding models support tool use through the Model Context Protocol. We've started working on a tool, odoc-llm, which enables natural-language search over the functionality of all packages and libraries in opam, in a way that can be hosted centrally as a remote MCP server. It's still a work in progress and Jon is planning to write something more extensive once we've fixed the last few issues. We've mentioned a few other potential tools in the draft piece, and if anyone in the community is working on any of them or would like to collaborate then please do get in touch!
I've also been exploring simple collaboration between coding models and created a very simple MCP server that lets you use gemini-cli as a tool in another agent. Mostly I use this to have Claude Code call Gemini to check its work; this can be helpful for tasks that involve analysing larger code bases, where Gemini's larger context window is useful.
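To give a flavour of the pattern (this is not the actual server, just a minimal OCaml sketch under my own assumptions), the core of such a tool is little more than shelling out to gemini-cli with a one-shot prompt and returning its output. The -p prompt flag and the example prompt here are assumptions for the sake of illustration.

```ocaml
(* Hypothetical sketch: ask gemini-cli something and capture its reply.
   Requires the unix library; assumes gemini-cli accepts a one-shot prompt
   via -p. A real MCP server would expose this behind a tool definition. *)
let ask_gemini prompt =
  let cmd = Printf.sprintf "gemini -p %s" (Filename.quote prompt) in
  let ic = Unix.open_process_in cmd in
  let buf = Buffer.create 1024 in
  (try
     while true do
       Buffer.add_channel buf ic 1
     done
   with End_of_file -> ());
  ignore (Unix.close_process_in ic);
  Buffer.contents buf

let () =
  print_string
    (ask_gemini "Check this repository's latest diff for obvious mistakes")
```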
The OCaml runtime
In addition to the forward-looking work around AI models, I've also been doing a bit of runtime maintenance. The change to the shared heap's free list representation has gone through two stages of review and should land fairly soon. We're just waiting on some final performance testing before merging.
I had a fun debugging session with Jan Midtgaard trying to figure out what could have caused ocaml issue 13739. It turned out that terminating domains could orphan their shared pools in parallel with a running stop-the-world section, which was unexpected and bad, as it could lead to a segfault or memory corruption if you were very unfortunate. Gabriel Scherer fixed this in ocaml pr 14025.
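For a sense of the shape of program involved (an illustrative sketch, not a reproducer of the issue), think of short-lived domains that allocate into the shared heap and then terminate while other domains keep the GC busy, so a domain's teardown can overlap a stop-the-world section:

```ocaml
(* Illustrative only: domains allocate shared (major) heap values, then
   terminate; repeated spawning and joining means domain teardown regularly
   overlaps GC activity elsewhere. *)
let churn () =
  (* major-heap allocations that outlive the minor heap *)
  ignore (Sys.opaque_identity (Array.init 50_000 string_of_int))

let () =
  for _ = 1 to 500 do
    List.init 4 (fun _ -> Domain.spawn churn) |> List.iter Domain.join
  done
```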
Lastly, on the GC side, there is the long-running work to unify the trunk ocaml compactor and the OxCaml compactor2. The OxCaml compactor is a bit more expensive than trunk's but plays much better with virtual memory, which helps reduce memory usage and improve performance in large, long-running applications. I suspect proposing the OxCaml compactor for trunk will be something worth doing next quarter.
Finally, there have been a number of small fixes to runtime events. It's nice to see these, and new feature requests, appearing.
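For anyone who hasn't tried them, runtime events can be consumed from within the same program via the Runtime_events module that ships with the compiler; here is a minimal sketch (build against the runtime_events library):

```ocaml
(* Minimal in-process consumer of runtime events: print the name of each
   runtime phase as it begins. *)
let () =
  Runtime_events.start ();
  let cursor = Runtime_events.create_cursor None in
  let runtime_begin _domain_id _ts phase =
    print_endline (Runtime_events.runtime_phase_name phase)
  in
  let callbacks = Runtime_events.Callbacks.create ~runtime_begin () in
  (* generate some GC activity, then poll the ring buffer *)
  ignore (Sys.opaque_identity (List.init 1_000_000 string_of_int));
  Gc.full_major ();
  ignore (Runtime_events.read_poll cursor callbacks None)
```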
Projects
I had hoped to kick off a couple of undergraduate or Masters projects along with KC Sivaramakrishnan, either at Cambridge or at IIT Madras. These are intended as ways of mentoring students who think they might like to become contributors to the OCaml runtime. Unfortunately, February and March are a bit late for picking projects on academic courses.
The first is implementing 'Early Release' for the OCaml minor collector. At present, all domains pause for a minor collection and wait until every domain has finished promoting its minor heap before resuming. Consider a single domain: its minor heap consists of values that are reachable from other domains, but also values that only it can reach. If all domains have promoted the global roots, their local roots, and their remembered sets, then any domain that has finished promoting everything in its minor heap should be free to leave the minor collection and resume the mutator. This should mean that domains that do few minor-heap allocations spend very little time in minor collections. We had an early prototype of this implemented a few years ago on the ocaml-multicore repo, but we weren't convinced it was correct and ran out of time to fix it before upstreaming multicore. I think now is a good time to try again.
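A hypothetical workload where this would pay off: one domain allocates heavily on its minor heap while another barely allocates at all. Today both pay the full minor-collection pause; with early release the quiet domain could resume almost as soon as the shared roots are promoted. A sketch of that kind of workload:

```ocaml
(* Hypothetical workload for 'Early Release': the allocator domain fills its
   minor heap constantly, the quiet domain hardly allocates. At present both
   stop for the whole minor collection; early release would let the quiet
   domain leave the pause once its (near-empty) minor heap is promoted. *)
let allocator () =
  for _ = 1 to 10_000_000 do
    ignore (Sys.opaque_identity (ref 0)) (* constant minor-heap allocation *)
  done

let quiet () =
  let acc = ref 0 in
  for i = 1 to 10_000_000 do
    acc := !acc + i (* essentially allocation-free *)
  done;
  ignore (Sys.opaque_identity !acc)

let () =
  [ Domain.spawn allocator; Domain.spawn quiet ] |> List.iter Domain.join
```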
The second project is 'Domain runtime work-sharing'. At present the number of domains should be no more than the number of available physical cores. Going over that number can significantly reduce performance as domains wait to enter or complete a stop-the-world section in the runtime. Could we address this by restricting the number of domains running in a stop-the-world section to the number of physical cores, and having that set of domains take on the work of all the stopped domains? This is probably most advantageous for minor collections but could be done for major stop-the-world sections too.
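The problem this targets is easy to provoke; the sketch below deliberately oversubscribes by spawning several times more domains than the runtime recommends, which is exactly the situation where work-sharing should help.

```ocaml
(* Sketch of the oversubscription problem: spawn several times more domains
   than recommended. Each extra domain currently makes every stop-the-world
   section slower to enter and complete; work-sharing would cap the set of
   domains doing the GC work. *)
let () =
  let n = 4 * Domain.recommended_domain_count () in
  let work () =
    for _ = 1 to 1_000_000 do
      ignore (Sys.opaque_identity (ref 0))
    done
  in
  List.init n (fun _ -> Domain.spawn work) |> List.iter Domain.join
```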
A third wild card project is exploring how we could use information from the GC and Linux's sched_ext to better schedule highly parallel OCaml programs. This is a lot more speculative but might make for an interesting Masters project.
If you are interested in any of these projects for the next academic year, please do get in touch with me or KC.
1. Mainly models from proprietary providers like OpenAI, Anthropic and Google, or models that are huge and uneconomical to run in anything less than large multi-user deployments, like DeepSeek V3/R1 or Kimi K2. ↩
2. OxCaml actually has two compactors, the trunk compactor and a new one. I only mean the new one here. ↩