By Francois Chaubard
For decades, computer scientists and mathematicians have tried to classify the complexity of algorithms and problems, using big-O notation and complexity classes like P, NP, and PSPACE, under the assumption that the model itself is the core entity we need to analyze. But when it comes to intelligence, focusing on the model in isolation overlooks critical factors like the training process (the solver) and the data distribution. For example, curriculum learning can greatly improve performance over training without a curriculum, and more training steps outperform fewer, all with the same model class. The model alone is not a useful measure: a 2-layer NN can, in principle, fit almost ANY function. So why in practice can't it? Because it's not possible to find the magical theta that gets you there. So the solver and its training dynamics must be taken into consideration for us to have a useful bound on intelligence and to start defining intelligence classes. This post proposes a new perspective: an agent's intelligence depends on the synergy of three components: model (architecture + capacity), training data (or experience, more broadly), and the solver (all the training dynamics: SGD, MeZO, Adam, the learning-rate schedule, weight decay, the number of iterations, etc.).
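To see why the model class alone underdetermines the outcome, here is a toy sketch (illustrative numbers only, nothing here comes from a real benchmark): the same linear model, fit on the same data, reaches very different losses depending solely on the solver budget.

```python
# Toy illustration: same model class, same data, different solver budgets.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=200)

def train(num_steps: int, lr: float = 0.01) -> float:
    """Plain gradient descent on mean-squared error; returns final training MSE."""
    w = np.zeros(5)
    for _ in range(num_steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the MSE loss
        w -= lr * grad
    return float(np.mean((X @ w - y) ** 2))

print(train(num_steps=5))     # under-trained: loss far above the noise floor
print(train(num_steps=2000))  # same model + data, more steps: near the 0.01 noise floor
```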
We might call this integrated measure:
O_Intelligence(model + solver + data)
where each part plays a role analogous to how, in Einstein’s theory of relativity, matter and space-time cannot be treated as independent. In the same way, the solver, data, and model architecture are tightly interwoven and shape each other.
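As a minimal sketch of what such an integrated measure could look like in code (all names here, like `solver` and `tasks`, are hypothetical placeholders, not anything defined in this post):

```python
# Hedged sketch: intelligence as a property of the (model, solver, data)
# triple, not of the model alone.
from typing import Any, Callable, Sequence

def o_intelligence(model: Any,
                   solver: Callable[[Any, Any], Any],  # training dynamics: optimizer, schedule, #iters, ...
                   data: Any,
                   tasks: Sequence[Callable[[Any], float]]) -> float:
    """Train with the given solver on the given data, then average task performance."""
    trained_agent = solver(model, data)  # the solver maps (model, data) -> trained agent
    scores = [task(trained_agent) for task in tasks]
    return sum(scores) / len(scores)
```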
In the machine learning world, we often talk about model complexity—like the number of parameters in a neural network, or the depth/width of layers—as if that alone determines learning capacity. The classic big-O notation helps us discuss computational or memory costs of running or training the model, but it rarely captures the model’s actual ability to generalize.
Key takeaway: Looking at any one of these three (model, data, or solver) without the others is akin to analyzing a planet’s motion in purely Newtonian terms, ignoring the fact that mass and spacetime curvature are intertwined in Einstein’s relativity.
One way to define intelligence is in terms of average performance on a broad set of tasks. This echoes ideas from the Legg-Hutter measure of universal intelligence, which looks at how an agent performs across all computable environments.
If we let 𝕌 represent a set of tasks and Perf(agent, t) represent the agent's performance on a task t ∈ 𝕌, then a rough measure of intelligence might be:

Intelligence(agent) ≈ 𝔼_{t∈𝕌}[Perf(agent, t)]
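Since 𝕌 can't be enumerated, this expectation would in practice be estimated by sampling tasks. A minimal Monte Carlo sketch, where `sample_task` and `perf` are assumed placeholders for a task sampler and a scoring function:

```python
# Monte Carlo estimate of Intelligence(agent) ≈ 𝔼_{t∈𝕌}[Perf(agent, t)].
from typing import Any, Callable

def estimate_intelligence(agent: Any,
                          sample_task: Callable[[], Any],
                          perf: Callable[[Any, Any], float],
                          num_tasks: int = 100) -> float:
    """Average performance over tasks sampled from the task universe 𝕌."""
    return sum(perf(agent, sample_task()) for _ in range(num_tasks)) / num_tasks
```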
In Newtonian mechanics, we often see space and time as fixed backgrounds and treat objects (matter) as separate. Einstein’s theory of general relativity flips that picture: matter-energy warps spacetime, and spacetime curvature affects how matter moves. They are intertwined, not independent.
Classical big-O typically gives you something like:

O(N × model cost) for N training samples.
O(d × model cost) for d input features.
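A back-of-the-envelope sketch of what that accounting captures (the numbers are assumptions, chosen only to show the shape of the calculation):

```python
# Big-O accounting yields a compute budget, not a generalization guarantee:
# two runs with identical cost can generalize very differently depending on
# the solver and the data distribution.
def training_compute(num_samples: int, cost_per_example: float, epochs: int = 1) -> float:
    """O(N × model cost): total cost of passing N samples through the model."""
    return epochs * num_samples * cost_per_example

print(training_compute(num_samples=1_000_000, cost_per_example=1e9))  # e.g., FLOPs
```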
1. Isn't This Just Legg-Hutter or Universal Intelligence?
Similar spirit, different emphasis. Legg-Hutter scores a finished agent as a black box across all computable environments; the measure proposed here decomposes the agent into model, solver, and data, so the training process itself is part of what gets measured.
2. Measuring “All Possible Tasks” is Impossible
Indeed, we can’t measure literally everything.