Shots on Goal: What are the hypothesized paths to AGI, and which ones are the most promising?

Recently, I have been overwhelmed by the floodgates of research, opinions, blog posts, and discussions on AGI. It's non-stop. It's important to take in some of it, to reassess, update your priors, and change direction, but taking in even 1% will have you switching directions every week or so. Instead, what I ventured to do with this post is take a BIG step back and assemble all of the "credible" (at least to me), actively pursued theories for getting from current levels of AI to full AGI/ASI. For each approach you'll find a short description, why advocates think it will work, where I think it may fall short, a rough training recipe, an architecture sketch, and primary links from the authors or original teams plus recent papers that show progress. Enjoy!

1) "NTP is Enough" (Originally Ilya Sutskever): keep pushing stacked transformers, next-token prediction and backprop to the limit, then add test-time reasoning

What it is

We have EMPIRICALLY measured that capability improves as a power law as we scale parameters, tokens, and compute (i.e., the scaling laws). Recently, we have also discovered that adding heavy test-time thinking improves reasoning and competence as well.

  1. Primary sources: Scaling Laws (Kaplan et al., 2020), Chinchilla (Hoffmann et al., 2022), OpenAI o1 blog, Learning to Reason with LLMs

Why it could work

  1. Scaling curves have been broadly predictive across tasks.
  2. Compute-optimal regimes reduce undertraining by increasing tokens.
  3. Test-time search, tool use, and judge models can convert prediction into reasoning.

Why it might fail

  1. Diminishing returns for systematic generalization and causal reasoning.
  2. Data quality, coverage, and cost become bottlenecks.
  3. Reliability and alignment issues are not solved by scale alone.
  4. Even the largest models still perform very poorly on ARC-AGI-2 today.
  5. With the highly anticipated but underwhelming launch of GPT-5, despite billions in investment, I think it's safe to say that this path is dead.

How to train

  1. Pretrain on large, deduplicated, filtered web, books, code, math, and scientific corpora at Chinchilla-optimal token counts.
  2. Reasoning fine-tunes on math sets such as GSM8K and MATH, code sets such as HumanEval and MBPP, and synthetic multi-step curricula with tool traces.
  3. At inference, use multi-sample self-consistency, retrieval, external tools, and verifier-guided reranking.
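
As a concrete illustration of step 3, here is a minimal sketch of best-of-N inference with self-consistency voting plus verifier reranking. The `generate_candidates` and `verifier_score` callables are hypothetical stand-ins for a sampled LLM call and a learned judge model, not any specific vendor API.

```python
from collections import Counter
from typing import Callable, List

def best_of_n(prompt: str,
              generate_candidates: Callable[[str, int], List[str]],
              verifier_score: Callable[[str, str], float],
              n: int = 16) -> str:
    """Sample n candidate answers, then rerank by agreement (self-consistency)
    combined with a verifier score. Both callables are placeholders for real models."""
    candidates = generate_candidates(prompt, n)
    votes = Counter(candidates)  # self-consistency: how often each final answer appears

    def score(answer: str) -> float:
        return votes[answer] / n + verifier_score(prompt, answer)

    return max(votes, key=score)

# Toy usage with mock models standing in for an LLM sampler and a judge.
mock_generate = lambda prompt, n: ["42"] * 9 + ["41"] * 7
mock_verify = lambda prompt, answer: 1.0 if answer == "42" else 0.2
print(best_of_n("What is 6 * 7?", mock_generate, mock_verify))
```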

Architecture

  1. Large decoder-only transformer with retrieval plug-ins and tool APIs.
  2. Separate or shared verifier/judge model to score candidate solutions.
  3. Optional controller that allocates test-time compute for hard problems.

2) LeCun-ism: World models and self-supervised predictive learning

What it is

LeCun’s JEPA-based roadmap posits configurable predictive world models with intrinsic motivation and hierarchical planning. Recent work includes masked and multi-context JEPAs.

  1. Primary sources: A Path Towards Autonomous Machine Intelligence (LeCun, 2022), I-JEPA (Assran et al., 2023), MC-JEPA (Bardes et al., 2023)

Why it could work

  1. Predictive latent state supports planning, sample efficiency, and common sense.
  2. Self-supervision avoids sparse rewards and expensive labels.

Why it might fail

  1. Scaling stability and head-to-head benchmarks versus frontier LLMs are still developing.
  2. Bridging to long-horizon tool use and action remains challenging.

How to train

  1. Pretrain world models on egocentric video, proprioception, and audio (for example Ego4D), plus synthetic physics video.
  2. Finetune for robotics and embodied QA using model-based RL over latent rollouts.

Architecture

  1. Encoders produce object-centric or scene latents.
  2. Predictor forecasts future latents under actions; planner searches in latent space with intrinsic objectives.
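
A minimal PyTorch sketch of the training signal implied by this architecture: an online encoder and predictor are trained to match the latent produced by a slowly updated target encoder, rather than reconstructing pixels. The network sizes, the EMA rate, and the use of plain MLPs (instead of vision transformers with masking) are illustrative assumptions, not the published JEPA recipes.

```python
import copy
import torch
import torch.nn as nn

class JEPAToy(nn.Module):
    def __init__(self, dim_in: int = 64, dim_lat: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(), nn.Linear(128, dim_lat))
        self.target_encoder = copy.deepcopy(self.encoder)   # EMA copy, never backpropped through
        for p in self.target_encoder.parameters():
            p.requires_grad = False
        self.predictor = nn.Sequential(nn.Linear(dim_lat, 128), nn.ReLU(), nn.Linear(128, dim_lat))

    def loss(self, context: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        z_ctx = self.encoder(context)                  # latent of the visible/context view
        z_tgt = self.target_encoder(target).detach()   # latent of the masked/future view
        return nn.functional.mse_loss(self.predictor(z_ctx), z_tgt)

    @torch.no_grad()
    def update_target(self, tau: float = 0.99):
        # Exponential moving average keeps the target encoder slow and stable.
        for p, tp in zip(self.encoder.parameters(), self.target_encoder.parameters()):
            tp.mul_(tau).add_((1 - tau) * p)

model = JEPAToy()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
context, target = torch.randn(8, 64), torch.randn(8, 64)   # stand-ins for two views of a scene
loss = model.loss(context, target)
opt.zero_grad(); loss.backward(); opt.step(); model.update_target()
```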

3) Schmidhuber-ism: the Gödel Machine (Jürgen Schmidhuber), a provably optimal recursively self-improving agent

What it is

A self-referential, self-improving agent that searches for formal proofs that rewriting its own code will increase expected utility, and only then rewrites. It’s about optimal self-modification under provable improvement.

  1. Primary sources: Schmidhuber, 2003 (Gödel Machine), Overview page

Why it could work

  1. Provides a theoretical framework for safe and provably beneficial self-modification.
  2. Ensures that rewrites cannot make the agent worse according to its own utility function.
  3. Aligns with the long-term need for recursive self-improvement in ASI.

Why it might fail

  1. Proof search is extremely expensive; full generality is impractical.
  2. Requires specifying utility functions and axioms up front in a formal system, which is a major unsolved challenge.
  3. No large-scale practical demonstrations to date.

How to train

  1. Start with constrained domains (toy languages, theorem provers) where proof search is tractable.
  2. Use meta-learning or automated theorem proving to guide proof discovery for code modifications.
  3. Incorporate techniques from proof-carrying code and verifiable RL for practical approximations.

Architecture

  1. Base agent with policy and utility function formalized in axioms.
  2. Meta-level proof searcher that looks for candidate self-modifications.
  3. Self-modification executor that commits rewrites once a proof of utility gain is found.
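
The full Gödel Machine requires a formal proof searcher, which nothing today supplies at scale. Below is a heavily simplified sketch of the control loop only, where an exhaustive utility check over a small finite set of test environments stands in for a proof of improvement; the toy agent, mutation operator, and check are all illustrative assumptions, not Schmidhuber's construction.

```python
import random
from typing import Dict, List

Agent = Dict[str, float]   # toy "agent code": a single threshold parameter

def utility(agent: Agent, env: float) -> float:
    # Toy utility: reward for acting (value above threshold) exactly in the good states.
    acts = env > agent["threshold"]
    return 1.0 if acts == (env > 0.5) else 0.0

def propose_rewrite(agent: Agent) -> Agent:
    # Candidate self-modification: perturb the agent's own "code".
    return {"threshold": agent["threshold"] + random.uniform(-0.2, 0.2)}

def proved_improvement(old: Agent, new: Agent, envs: List[float]) -> bool:
    # Stand-in for the proof searcher: exhaustively verify the rewrite is no worse on
    # every environment in a finite class, and strictly better in total.
    no_regress = all(utility(new, e) >= utility(old, e) for e in envs)
    gains = sum(utility(new, e) for e in envs) > sum(utility(old, e) for e in envs)
    return no_regress and gains

envs = [i / 10 for i in range(11)]
agent: Agent = {"threshold": 0.9}
for _ in range(200):
    candidate = propose_rewrite(agent)
    if proved_improvement(agent, candidate, envs):
        agent = candidate   # commit the rewrite only after the "proof" succeeds
print(agent)
```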

4) Sutton-ism: the OaK Architecture and "Reward Is Enough"

What it is

In sufficiently rich environments, maximizing reward yields the competencies associated with intelligence. AlphaGo, AlphaZero, and successors demonstrate emergence via self-play and planning.

  1. Primary sources: Reward Is Enough (Silver, Singh, Precup, Sutton, 2021), AlphaGo (Nature, 2016)

Why it could work

  1. Self-play and search can bootstrap powerful planning and abstraction.
  2. Single objective and domain-agnostic learning principle.

Why it might fail

  1. Reward misspecification, sparse credit assignment, and sample complexity.
  2. Sim-to-real transfer and safety oversight are hard at scale.

How to train

  1. Open-ended multi-task worlds such as NetHack, MineRL, and XLand with curricula and self-play.
  2. Use model-based RL with latent dynamics and verifier-guided planning for complex reasoning goals.

Architecture

  1. Actor-critic or MuZero-style policy and value networks.
  2. Search over actions with learned or explicit dynamics models.
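
A minimal PyTorch sketch of the actor-critic half of this picture: a shared body with policy and value heads, updated by a policy gradient weighted by the advantage plus a value regression loss. The tiny network, the use of Monte Carlo returns, and the omission of any search component are simplifying assumptions.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim: int = 4, n_actions: int = 2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.policy_head = nn.Linear(64, n_actions)   # action logits
        self.value_head = nn.Linear(64, 1)            # state-value estimate

    def forward(self, obs: torch.Tensor):
        h = self.body(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)

def update(model, optimizer, obs, actions, returns):
    """One actor-critic step: policy gradient on the advantage, plus value regression."""
    logits, values = model(obs)
    log_probs = torch.distributions.Categorical(logits=logits).log_prob(actions)
    advantage = returns - values
    policy_loss = -(log_probs * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()
    loss = policy_loss + 0.5 * value_loss
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

model = ActorCritic()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
obs = torch.randn(32, 4)               # stand-in for a batch of observations
actions = torch.randint(0, 2, (32,))   # actions taken by the behavior policy
returns = torch.randn(32)              # Monte Carlo returns from rollouts
update(model, opt, obs, actions, returns)
```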

5) Noam-ism: the Generator–Verifier gap (a tweak of #1, but worth the mention)

What it is

Exploit domains where verification is easier than generation by pushing inference-time search and strong verifiers or judges. Noam Brown has articulated this path in recent talks.

  1. Primary sources: Noam Brown public talk
  2. Supporting research: Self-Consistency (Wang et al., 2022), AI Safety via Debate (Irving et al., 2018), ReAct (Yao et al., 2022), Toolformer (Schick et al., 2023), LLM-as-a-Judge (Zheng et al., 2023)

Why it could work

  1. Mirrors AlphaZero’s proposal-and-evaluation loop with scalable verifiers.
  2. Unit tests, compilers, and proof checkers provide crisp signals.

Why it might fail

  1. Many real-world tasks lack crisp verifiers; inference compute can be large.

How to train

  1. Collect triplets of problem, candidate, and verdict across math, code, theorem proving, and structured QA.
  2. Train verifiers on labeled refutations and formal tests; at inference, sample many candidates and rerank with verifiers and judges.
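
A minimal sketch of the "verification is easier than generation" loop for code: candidate solutions (which would come from an LLM; here they are hard-coded strings) are executed against assert-based unit tests, and only passing candidates survive. A real system would sandbox execution rather than calling exec directly.

```python
from typing import List, Optional

def passes_tests(candidate_src: str, test_src: str) -> bool:
    """Run a candidate implementation against assert-based tests; True only if every assert passes."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)   # define the candidate function
        exec(test_src, namespace)        # run the asserts against it
        return True
    except Exception:
        return False

def select_verified(candidates: List[str], test_src: str) -> Optional[str]:
    # The generator can be wrong often; the cheap, crisp verifier filters the samples.
    return next((c for c in candidates if passes_tests(c, test_src)), None)

# Candidates standing in for LLM samples (the first one is buggy).
candidates = [
    "def add(a, b):\n    return a - b",   # wrong
    "def add(a, b):\n    return a + b",   # right
]
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(select_verified(candidates, tests))
```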

Architecture

  1. Generator LLM plus formal verifiers such as unit tests and proof assistants.
  2. Learned judge models to arbitrate where formal checks are absent.

6) Chollet-ism: Program synthesis and neurosymbolic methods

What it is

Target compositional abstraction via program search guided by neural priors. Chollet’s ARC measure focuses on generalization and skill acquisition efficiency. DreamCoder demonstrates wake-sleep library learning and neural proposal for program synthesis.

  1. Primary sources: On the Measure of Intelligence (Chollet, 2019), ARC Prize page, DreamCoder (Ellis et al., 2021)

Why it could work

  1. Programs naturally support compositionality, systematic generalization, and verifiability.
  2. Neural hints and learned libraries make search more tractable.

Why it might fail

  1. Search combinatorics and perception-to-symbol pipelines remain difficult.

How to train

  1. Mix ARC-style puzzles, DSL corpora, code tasks with unit tests, and mini-proof sets.
  2. Alternate solving new tasks with expanding reusable library primitives and retraining the neural proposer.

Architecture

  1. Neural proposer such as a transformer, a symbolic DSL and library, and a verifier comprising unit tests or proof tools.
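
A minimal sketch of the proposer-DSL-verifier loop on ARC-style grids: a tiny DSL of grid transforms is searched by brute-force composition (a neural proposer would instead rank which compositions to try first), and the verifier is exact agreement with the input/output examples. The DSL primitives and the task are illustrative, not the actual ARC format.

```python
from itertools import product
from typing import Callable, List, Tuple

Grid = Tuple[Tuple[int, ...], ...]

# Tiny DSL of grid-to-grid primitives.
def flip_h(g: Grid) -> Grid: return tuple(row[::-1] for row in g)
def flip_v(g: Grid) -> Grid: return g[::-1]
def transpose(g: Grid) -> Grid: return tuple(zip(*g))
PRIMITIVES: List[Callable[[Grid], Grid]] = [flip_h, flip_v, transpose]

def synthesize(examples: List[Tuple[Grid, Grid]], max_depth: int = 3):
    """Enumerate compositions of DSL primitives; return the first program
    consistent with every example (the verifier is exact match)."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            def run(g: Grid, program=program) -> Grid:
                for op in program:
                    g = op(g)
                return g
            if all(run(inp) == out for inp, out in examples):
                return [op.__name__ for op in program]
    return None

# Example task: the hidden rule is a 180-degree rotation (= flip_h then flip_v).
inp: Grid = ((1, 2), (3, 4))
out: Grid = ((4, 3), (2, 1))
print(synthesize([(inp, out)]))
```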

7) Bengio-ism: System-2 and causality-aware deep learning

What it is

Add explicit variables, structured search over objects, and causal representation learning. GFlowNets learn to sample diverse high-reward objects such as molecules. Causal representation learning seeks to discover and manipulate underlying factors. Pearl and Schölkopf provide the causal foundations.

  1. Primary sources: GFlowNet Foundations (Bengio et al.), GFlowNets JMLR, Towards Causal Representation Learning (Schölkopf et al., 2021), Pearl book excerpt

Why it could work

  1. Causal variables enable robust out-of-distribution generalization and planning.
  2. GFlowNets handle multi-modal solution distributions and structured discovery.

Why it might fail

  1. Causal discovery from high-dimensional data is difficult and unstable at scale.

How to train

  1. Use synthetic structural causal models and OOD splits for evaluation, plus molecule and program domains where rewards are clear.
  2. Train GFlowNets to sample objects proportionally to reward; evaluate by intervention and counterfactual tasks.
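
A minimal PyTorch sketch of "sample objects proportionally to reward" using the trajectory-balance objective on a toy domain: fixed-length sequences are built one symbol at a time, the backward policy is trivial because each state has a unique parent, and the reward is a made-up function of the final sequence.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LENGTH, VOCAB = 4, 3                        # build sequences of 4 symbols from {0, 1, 2}

def reward(seq: list) -> float:
    return 1.0 + 3.0 * seq.count(2)         # toy reward: prefer sequences with many 2s

policy = nn.Sequential(nn.Linear(LENGTH * (VOCAB + 1), 64), nn.ReLU(), nn.Linear(64, VOCAB))
log_z = nn.Parameter(torch.zeros(()))       # learned log partition function
opt = torch.optim.Adam(list(policy.parameters()) + [log_z], lr=1e-2)

def encode(prefix: list) -> torch.Tensor:
    # One-hot per position; index 0 means "empty slot".
    idx = torch.tensor(prefix + [-1] * (LENGTH - len(prefix))) + 1
    return F.one_hot(idx, VOCAB + 1).float().flatten()

for step in range(500):
    prefix, log_pf = [], torch.zeros(())
    for _ in range(LENGTH):                 # roll out the forward policy
        logits = policy(encode(prefix))
        action = torch.distributions.Categorical(logits=logits).sample()
        log_pf = log_pf + F.log_softmax(logits, dim=-1)[action]
        prefix.append(int(action))
    # Trajectory balance: (log Z + log P_F(tau) - log R(x) - log P_B(tau))^2, with log P_B = 0 here.
    loss = (log_z + log_pf - torch.log(torch.tensor(reward(prefix)))) ** 2
    opt.zero_grad(); loss.backward(); opt.step()

print(reward(prefix), log_z.item())         # sampled sequences drift toward high reward
```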

Architecture

  1. Perception front-end that proposes candidate variables or graphs.
  2. Structured sampler or planner such as a GFlowNet over discrete objects, with verifiers for consequences.

8) Hinton-ism: Brain-inspired learning rules and part-whole structure

What it is

Forward-Forward proposes a local, backprop-free learning rule using positive and negative phases. GLOM proposes a representational scheme for part-whole hierarchies via iterative consensus.

  1. Primary sources: Forward-Forward (Hinton, 2022), GLOM (Hinton, 2021)

Why it could work

  1. Closer to biological plausibility and potentially better for continual learning.
  2. Richer hierarchical parsing of structure.

Why it might fail

  1. Early stage; unclear parity with large-scale backprop results.

How to train

  1. Begin with vision and audio positive-negative sampling, then scale to sequences.
  2. Compare energy or likelihood surrogates to backprop baselines on standard benchmarks.

Architecture

  1. Layer-local objectives for Forward-Forward or recurrent consensus among columns for GLOM, possibly atop transformer primitives.
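
A minimal PyTorch sketch of the Forward-Forward layer-local objective: each layer's "goodness" is the sum of squared activations, pushed above a threshold for positive data and below it for negative data, with no gradient crossing layer boundaries. The layer sizes, threshold, and negative-data source here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFLayer(nn.Module):
    """One layer trained with a local objective; its normalized output feeds the next layer."""
    def __init__(self, dim_in: int, dim_out: int, threshold: float = 2.0):
        super().__init__()
        self.linear = nn.Linear(dim_in, dim_out)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=1e-3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize the input so only its direction (not the previous goodness) is passed on.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return F.relu(self.linear(x))

    def local_step(self, x_pos: torch.Tensor, x_neg: torch.Tensor):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)   # goodness of positive samples
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)   # goodness of negative samples
        loss = F.softplus(self.threshold - g_pos).mean() + F.softplus(g_neg - self.threshold).mean()
        self.opt.zero_grad(); loss.backward(); self.opt.step()
        # Detach so no gradient ever flows between layers (no backprop through depth).
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()

layers = [FFLayer(784, 256), FFLayer(256, 256)]
x_pos = torch.rand(32, 784)   # stand-in for real data (with correct labels embedded)
x_neg = torch.rand(32, 784)   # stand-in for corrupted/negative data
for layer in layers:          # train greedily, layer by layer, one step each
    x_pos, x_neg = layer.local_step(x_pos, x_neg)
```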

9) Hawkins-ism: Thousand Brains theory of intelligence

What it is

Hawkins proposes that the neocortex comprises many parallel map-like models that vote to reach consensus, with sensorimotor prediction at the core.

  1. Primary source: A Thousand Brains resources

Why it could work

  1. Strong grounding in continuous sensorimotor prediction and object persistence.

Why it might fail

  1. Engineering translation to frontier ML systems is still evolving.

How to train

  1. Egocentric video plus proprioception with continual learning regimes.
  2. Tasks emphasizing object permanence, active perception, and manipulation.

Architecture

  1. Many parallel cortical-column analogs maintaining object-centric latent maps with a motor loop for hypothesis testing and voting.
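
A minimal numpy sketch of the voting step only: several column-like models each report a likelihood over candidate objects from their own local sensing, and consensus comes from combining log-likelihoods. The likelihood tables are made-up numbers for illustration.

```python
import numpy as np

OBJECTS = ["mug", "bowl", "stapler"]

# Each "column" reports a likelihood over objects from its own local sensor patch.
column_likelihoods = np.array([
    [0.6, 0.3, 0.1],   # column 1: fairly sure it feels a mug handle
    [0.4, 0.4, 0.2],   # column 2: ambiguous between mug and bowl
    [0.7, 0.2, 0.1],   # column 3: also favors the mug
])

def vote(likelihoods: np.ndarray) -> str:
    """Combine independent column opinions by summing log-likelihoods (a product of evidence)."""
    log_posterior = np.log(likelihoods).sum(axis=0)
    consensus = np.exp(log_posterior - log_posterior.max())
    consensus /= consensus.sum()
    print(dict(zip(OBJECTS, consensus.round(3))))
    return OBJECTS[int(np.argmax(consensus))]

print(vote(column_likelihoods))   # the columns settle on "mug"
```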

10) Parr-ism: Active Inference and the Free-Energy Principle

What it is

Perception, learning, and action are unified as minimizing expected free energy under a generative model. This provides a principled account of epistemic exploration.

  1. Primary sources: Friston review (2010), MIT Press book, Friston et al., 2017

Why it could work

  1. Unified objective with built-in epistemic drive for information seeking.

Why it might fail

  1. Scaling practical implementations to internet and robot scale remains an open challenge.

How to train

  1. Start with partially observed control tasks and multi-agent settings, using amortized inference and explicit generative models.

Architecture

  1. Probabilistic generative world model for states, transitions, and observations, paired with a policy that minimizes expected free energy.
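
A minimal discrete sketch of policy selection by expected free energy, using the standard risk-plus-ambiguity decomposition: G(pi) = KL[predicted outcomes || preferred outcomes] + expected observation entropy. The two-state world, the observation models, and the preference distribution are all made-up numbers, not a full active inference implementation.

```python
import numpy as np

def expected_free_energy(q_s: np.ndarray, likelihood: np.ndarray, preferred_o: np.ndarray) -> float:
    """G(pi) = risk + ambiguity for one policy.
    q_s: predicted hidden-state distribution under the policy, shape (n_states,)
    likelihood: p(o|s) under the policy, shape (n_obs, n_states)
    preferred_o: the agent's preference distribution over outcomes, shape (n_obs,)"""
    q_o = likelihood @ q_s                                        # predicted outcome distribution
    risk = np.sum(q_o * (np.log(q_o + 1e-12) - np.log(preferred_o + 1e-12)))
    ambiguity = np.sum(q_s * (-np.sum(likelihood * np.log(likelihood + 1e-12), axis=0)))
    return risk + ambiguity

preferred = np.array([0.9, 0.1])   # the agent prefers observation 0

# Policy A: likely reaches the preferred state but observes it through a noisy sensor.
g_a = expected_free_energy(q_s=np.array([0.8, 0.2]),
                           likelihood=np.array([[0.6, 0.4], [0.4, 0.6]]),
                           preferred_o=preferred)
# Policy B: less likely to reach it, but the observations are unambiguous.
g_b = expected_free_energy(q_s=np.array([0.5, 0.5]),
                           likelihood=np.array([[0.99, 0.01], [0.01, 0.99]]),
                           preferred_o=preferred)
print({"policy_A": round(g_a, 3), "policy_B": round(g_b, 3)})   # the lower G is selected
```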

11) Hutter-ism: AIXI, the idealized Bayesian RL agent

What it is

An idealized reinforcement learning agent that does Bayesian prediction over all computable environments (Solomonoff induction) and plans by expectimax. It’s about optimal prediction and control under a universal prior.

  1. Primary sources: Universal Artificial Intelligence (Hutter, 2005), AIXI: A Survey (Leike & Hutter, 2015)

Why it could work

  1. Defines an optimal agent in a very general sense: decision-making across any computable environment.
  2. Provides a rigorous mathematical north star for approximations.
  3. Has inspired bounded variants such as AIXItl and MC-AIXI-CTW.

Why it might fail

  1. Incomputable in full generality; only approximations are feasible.
  2. Universal prior assumptions are uncomputable and unrealistic in practice.
  3. Still requires reward specification, which remains a bottleneck.

How to train

  1. Approximate Solomonoff induction using compression-based priors or powerful learned world models.
  2. Benchmark with environments requiring generalization and planning, e.g., gridworlds, Atari, or NetHack, then scale up.
  3. Combine Bayesian model averaging with tractable Monte Carlo search approximations.

Architecture

  1. World model: approximated universal prior (compression models, learned dynamics).
  2. Planner: bounded expectimax search over possible futures.
  3. Policy: action chosen to maximize expected reward under model distribution.
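
A minimal sketch of that planner: expectimax over a tiny, explicitly enumerated model class with a simplicity-weighted prior, branching on the observations each model predicts (the Bayesian belief update happens in the branching). The two toy environments and the uniform prior are crude stand-ins for Solomonoff's universal mixture, which is incomputable.

```python
from typing import Callable, List, Tuple

# Each toy "environment model" maps (state, action) -> (next_state, observation, reward).
Env = Callable[[int, int], Tuple[int, int, float]]

def env_a(state, action):
    return state + 1, action, (1.0 if action == 1 else 0.0)        # a world where action 1 pays

def env_b(state, action):
    return state + 1, 1 - action, (1.0 if action == 0 else 0.0)    # a world where action 0 pays

MODELS: List[Env] = [env_a, env_b]
ACTIONS = [0, 1]

def expectimax(states: List[int], weights: List[float], depth: int) -> float:
    """Optimal expected return under the weighted model mixture, branching on predicted observations."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in ACTIONS:
        outcomes = [m(s, a) for m, s in zip(MODELS, states)]
        value = 0.0
        for obs in {o for _, o, _ in outcomes}:                     # group models by predicted observation
            idx = [i for i, (_, o, _) in enumerate(outcomes) if o == obs]
            p_obs = sum(weights[i] for i in idx)
            if p_obs == 0.0:
                continue
            posterior = [weights[i] / p_obs if i in idx else 0.0 for i in range(len(MODELS))]
            exp_reward = sum(posterior[i] * outcomes[i][2] for i in idx)
            next_states = [outcomes[i][0] for i in range(len(MODELS))]
            value += p_obs * (exp_reward + expectimax(next_states, posterior, depth - 1))
        best = max(best, value)
    return best

prior = [0.5, 0.5]   # crude stand-in for a 2^-K(model) simplicity prior over two equally short programs
print(expectimax([0, 0], prior, depth=3))
```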

12) Goertzel-ism: Cognitive architectures and knowledge graphs

What it is

OpenCog Hyperon aims to unify diverse learning and reasoning modules in a shared memory and metalanguage, enabling neural-symbolic fusion and explicit long-term memory.

  1. Primary sources: OpenCog Hyperon site, Atomspace documentation

Why it could work

  1. Heterogeneous reasoning over explicit memory may help with long-horizon tasks and knowledge-intensive reasoning.

Why it might fail

  1. Integration complexity and limited benchmarking versus frontier LLM systems.

How to train

  1. Iteratively build tasks that require long-term memory, multi-step symbolic reasoning, and tool orchestration, with continual knowledge graph growth.

Architecture

  1. Symbolic Atomspace with neural encoders and retrievers, probabilistic logic, planners, and LLMs as generators inside a verified system loop.
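
A minimal toy sketch of the kind of explicit, queryable memory this architecture centers on: a tiny in-memory hypergraph of nodes and typed links with wildcard pattern matching. This is an illustrative stand-in only, not OpenCog Hyperon's actual Atomspace or MeTTa API.

```python
from typing import List, Optional, Tuple

Atom = Tuple[str, ...]   # e.g. ("Inheritance", "cat", "mammal")

class ToyAtomspace:
    """A toy explicit memory: typed links over named nodes, with wildcard queries."""
    def __init__(self):
        self.atoms: List[Atom] = []

    def add(self, *atom: str) -> None:
        if tuple(atom) not in self.atoms:
            self.atoms.append(tuple(atom))

    def query(self, *pattern: Optional[str]) -> List[Atom]:
        # None acts as a wildcard variable in the pattern.
        def matches(atom: Atom) -> bool:
            return len(atom) == len(pattern) and all(p is None or p == a for p, a in zip(pattern, atom))
        return [a for a in self.atoms if matches(a)]

kb = ToyAtomspace()
kb.add("Inheritance", "cat", "mammal")
kb.add("Inheritance", "mammal", "animal")
kb.add("Eats", "cat", "fish")

# A planner, probabilistic reasoner, or LLM tool-call could read and write this shared memory.
print(kb.query("Inheritance", None, "mammal"))   # -> [("Inheritance", "cat", "mammal")]
print(kb.query("Eats", "cat", None))             # -> [("Eats", "cat", "fish")]
```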
