Francois Chaubard

Background: LinkedIn

I am a PhD student at Stanford co-advised by Chris Ré and Mykel Kochenderfer. Before this, I founded and led Focal Systems. Right now, I think the highest and best use of my FLOPS is to help push the AGI ball up the hill in any way I can. This is the first time in the history of our universe (that we are aware of) that a species has a chance at developing a superintelligence, and working on anything else seems remarkably meek in the face of that fact. What a time to have been born, to be alive, and to have the skill set that I have. What else could I possibly be working on? So I'm done with the startup game for now, and I am going back to Stanford to get my PhD to do exactly that. I believe strongly in the "Science in the Open" ethos, and as part of that, I want to open a window into my research even in its nascent stages. If that means it may get poached, that's OK, so long as it accelerates mankind's research goals. So I will try to dedicate a few hours a week to documenting what I am doing and thinking about here, in a raw, open format.

Here is what I am currently working on / interested in:

I think that SGD + stacked transformers is perhaps a sufficient path to AGI, but at what cost? We need huge terawatt-scale datacenters, while our brains are (as of now) better at learning and run on about 100 W. Something is missing. If the Universal Approximation Theorem holds, and if neurons don't backprop, then there must exist another learning algorithm (solver) + architecture that gets us there as well, perhaps much faster and cheaper. I think there are many "sufficient" paths. We have found (perhaps) one element in the set of AGI, but it is certainly not a necessary one. I agree with Yann LeCun that we did not need flapping wings to fly, but we did (initially) need two wings. There are many elements in the set of flight, and they have different trade-offs: helicopters are great for their use case, rockets for theirs, planes for theirs. Similarly, there will be many elements in the set of AGI, each with different trade-offs, and we will want some for some use cases and others for others.

So right now I want to find another one. The biggest reason LSTMs, DNCs, NTMs, and other RNNs with big external/internal memories did not scale was BPTT. Well, let's get rid of it. Can we find a forward-forward algorithm that outperforms, one that achieves infinite context length without needing to keep around all the temporal activations? Let's try! (A minimal sketch follows below.) If you are interested in this, please reach out.
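To make "no BPTT" concrete, here is a minimal, hedged sketch of one such direction: training a tiny RNN with a zeroth-order, SPSA-style update that estimates the gradient from two perturbed forward passes. Everything here (the running-mean toy task, the layer sizes, the hyperparameters) is an illustrative assumption, not the exact method from my RNN scaling paper; the point is only that no per-timestep activations are ever stored, so memory stays constant no matter how long the sequence is.

```python
# Hedged sketch: SPSA-style zeroth-order training of a tiny RNN, no BPTT.
# Two forward passes per update; only the current hidden state is kept,
# so memory is O(1) in sequence length.
import numpy as np

rng = np.random.default_rng(0)
H, T, LR, EPS = 16, 200, 1e-2, 1e-3   # hidden size, seq len, step size, probe scale

def init_params():
    return {
        "Wx": rng.normal(0, 0.1, (H, 1)),   # input -> hidden
        "Wh": rng.normal(0, 0.1, (H, H)),   # hidden -> hidden
        "Wo": rng.normal(0, 0.1, (1, H)),   # hidden -> output
    }

def loss(params, xs, ys):
    """One forward pass; note we never store the history of hidden states."""
    h, total = np.zeros((H, 1)), 0.0
    for x, y in zip(xs, ys):
        h = np.tanh(params["Wx"] * x + params["Wh"] @ h)
        total += ((params["Wo"] @ h - y) ** 2).item()
    return total / len(xs)

def spsa_step(params, xs, ys):
    """SPSA: probe the loss at theta +/- EPS*delta, update along delta."""
    deltas = {k: rng.choice([-1.0, 1.0], size=v.shape) for k, v in params.items()}
    plus  = {k: v + EPS * deltas[k] for k, v in params.items()}
    minus = {k: v - EPS * deltas[k] for k, v in params.items()}
    g = (loss(plus, xs, ys) - loss(minus, xs, ys)) / (2 * EPS)
    for k in params:                       # delta entries are +/-1, so
        params[k] -= LR * g * deltas[k]    # multiplying == dividing by them

params = init_params()
for step in range(501):
    xs = rng.normal(size=T)
    ys = np.cumsum(xs) / np.arange(1, T + 1)   # toy target: running mean
    spsa_step(params, xs, ys)
    if step % 100 == 0:
        print(f"step {step:4d}  loss {loss(params, xs, ys):.4f}")
```

Because each update is just two stateless forward passes, you can stream arbitrarily long sequences through it; the open question, and the thing to test, is whether estimators like this can match first-order methods in wall-clock and sample efficiency at scale.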

Publications, Inventions, Awards, and Talks:

  1. Scaling RNNs to Billions of Parameters with Zero Order Optimization (5/22/2025)
  2. Gradient Agreement Filtering (GAF) (11/1/2024)
  3. Stanford Engineering Talk on Automating Retail (3/23/2023)
  4. AI for Retail (9/1/2022)
  5. Automated checkout system through mobile shopping units, US20180218351A1 (1/13/2018)
  6. Automatic labeling of products via expedited checkout system, US20180330196A1 (4/2/2017)
  7. Determining in-store location based on images, US20180025412A1 (3/1/2017)
  8. Expedited checkout system through portable checkout units, US10319198B2 (1/12/2017)
  9. Out-of-stock detection based on images, US20180260772A1 (6/1/2016)
  10. CNBC "The Exchange" Interview (6/1/2022)
  11. Stanford AI Safety Panel (Francois Chaubard - CEO, Focal Systems; Azalia Mirhoseini - Google Brain/Anthropic; Lisa Einstein - Executive Director, CISA) (6/1/2022)
  12. Nvidia AI Podcast (9/18/2019)
  13. In-Q-Tel 1st Place - Most Innovative Startup Competition (6/21/2018)
  14. Nvidia Startup Competition - 1st Place (4/21/2017)
  15. University of Delaware Startup Competition - 1st place (10/22/2009)

Blog links below:

  1. Analysis of the Amazon Go Platform and why it won't scale till >2040 (7/28/2019)
  2. Universal Address Spaces for GPUs (6/1/2024)
  3. Strange Loop Networks (6/6/2024)
  4. Modality Curriculum (6/9/2024)
  5. Performing a Lobotomy on Llama3 8B (6/14/2024)
  6. Proving Vanilla SGD just memorizes, inspiration for gradient agreement filtering (GAF) paper. (10/1/2024)
  7. Perhaps the Golden Rule is a Nash Equilibrium point for sufficient intelligence, and we can sleep easy while developing AGI (10/4/2024)
  8. Is diffusion all you need? Is diffusion the free lunch we have all been waiting for? I think so! (11/28/2024)
  9. Intelligence complexity is not model complexity, must include training dynamics and data as well. (1/10/2025)
  10. My master plan to solve AGI (1/12/2025)
  11. Some VERY interesting plots for my upcoming paper "Generalization Efficiency Scaling Laws for RNNs without BPTT" (1/28/2024)
  12. My Journey Going from First Order to Zeroth Order to Reduce the Cost of Intelligence 1 Million-Fold (2/1/2024)
  13. Are "you" your dual agent training your primary agent? Warning: May break your brain. (2/12/2025)
  14. Novel Token Spigots: Put yourself right next to the source of novel tokens, not second-hand smoke, but first-hand. (2/20/2025)
  15. Future of Work and the Rate of Automation (2/21/2025)
  16. All you need is a World Model + Policy + Value Model; That's it. (4/15/2025)
  17. What's Greek to some is native tongue (6/8/2025)
  18. Brains don't backprop; Neurons are one-way! (6/27/2025)
  19. The Copernican Epiphany: Realizing You’re Not the Center of the Universe (7/3/2025)
  20. An After Action Review (AAR) for scientific progress over the last 1000 years: Humanity’s Regret Curve (7/10/2025)
  21. (Train Time) Recurrence as a necessary condition for General Intelligence (8/18/2025)
  22. Le Châtelier's Principle applied to startups, software, and AI (8/19/2025)
  23. Demerzel vs. Democracy: What will be the system of government in 100 years? (8/28/2025)
  24. Shots on Goal: What are the hypothesized paths to AGI, and which ones are the most promising? (9/17/2025)
  25. AI Academic Brain vs. AI Founder Brain (9/22/2025)