Briefly, an RLM wraps an existing language model (LM) together with an environment that can dynamically manipulate the prompt that will be fed into the LM.
The authors use as an environment a Python REPL that itself can call other instances of the LM. The prompt is programmatically manipulated as a Python variable on the REPL.
The motivation is for the LM to use Python commands, including commands that call other LM instances, to figure out how best to modify the context at inference time.
The results from early testing look impressive at first glance: an RLM wrapping GPT-5-mini outperforms GPT-5 by a wide margin on long-context tasks, at significantly lower cost.
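To make the mechanism concrete, here is a minimal sketch of the idea as I understand it, with a stubbed-out LM standing in for real model calls. All names (`fake_lm`, `rlm`, the `SUMMARIZE:` convention, the chunk size) are my own illustration, not from the paper:

```python
# Sketch of the RLM idea: the context lives as a Python variable in a
# REPL-like environment, and the root model shrinks it by delegating
# chunks to sub-LM calls (recursive depth 1) before answering.
# fake_lm is a stub; a real system would call an actual model here.

def fake_lm(prompt: str) -> str:
    """Stand-in for a real LM call."""
    if prompt.startswith("SUMMARIZE:"):
        # Pretend to summarize by truncating the chunk.
        return prompt[len("SUMMARIZE:"):][:20]
    return "FINAL: " + prompt[:40]

def rlm(context: str, query: str) -> str:
    # Split the long context into chunks (a Python operation on the
    # context variable, not an LM call).
    chunk_size = 50
    chunks = [context[i:i + chunk_size]
              for i in range(0, len(context), chunk_size)]
    # Sub-LM calls: each chunk is summarized by a separate LM instance.
    summaries = [fake_lm("SUMMARIZE:" + c) for c in chunks]
    reduced = " ".join(summaries)
    # Final root-LM call over the now-much-shorter context.
    return fake_lm(reduced + "\nQ: " + query)

print(rlm("x" * 200, "what is this about?"))
```

In the actual paper the root LM decides for itself, via Python commands, how to slice and delegate the context; the fixed chunking above is just the simplest stand-in for that behavior.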
I've added this to my reading list.
> Lastly, in our experiments we only consider a recursive depth of 1 — i.e. the root LM can only call LMs, not other RLMs. It is a relatively easy change to allow the REPL environment to call RLMs instead of LMs, but we felt that for most modern “long context” benchmarks, a recursive depth of 1 was sufficient to handle most problems. However, for future work and investigation into RLMs, enabling larger recursive depth will naturally lead to stronger and more interesting systems.
It feels a little disingenuous to call it a Recursive Language Model when the recursive depth of the study was only 1.
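For what it's worth, the change the authors describe (letting sub-calls be RLMs rather than plain LMs) amounts to threading a depth parameter through. A hypothetical sketch, with `lm` as a stub and the halving strategy purely illustrative:

```python
# Hypothetical sketch of recursive depth > 1: sub-calls invoke rlm()
# again instead of lm() until the depth budget is exhausted.
# lm is a stub; names and the split strategy are illustrative only.

def lm(prompt: str) -> str:
    """Stand-in for a plain LM call."""
    return "ans(" + prompt[:10] + ")"

def rlm(prompt: str, depth: int) -> str:
    if depth == 0:
        return lm(prompt)  # leaf: a plain LM call
    # With depth=1 this matches the paper's setup: the root delegates
    # to plain LMs. With depth>1, each sub-call is itself an RLM.
    half = len(prompt) // 2
    left = rlm(prompt[:half], depth - 1)
    right = rlm(prompt[half:], depth - 1)
    return lm(left + right)

# depth=1 is the paper's configuration; depth=2 lets sub-calls
# spawn their own sub-calls.
print(rlm("a" * 8, 2))
```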
This isn't just context optimization; it's not much different from an agent-to-agent workflow, IMO.
This is old news! Agent loops are not a model architecture.
Loops aren’t recursion?
Everything old is new again when you are in academia
This feels primarily like an issue with machine learning, at least among mathematical subdisciplines. As new people continue to be drawn into the field, they rarely bother to read what has come even a few years prior (nevermind a few decades prior).
It broke new ground!
https://arxiv.org/abs/2510.04871 is another recursion-based model
It's a completely different kind of recursion for a completely different (non-language) task.
Recursion is so popular in computing that the term “recursive language model” is heavily overloaded.
It was overloaded even before the rise of LLMs.
The authors may want to consider a more specific name