Francois Chaubard

Background: LinkedIn

I am a PhD student at Stanford co-advised by Chris Ré and Mykel Kochenderfer. Before this, I founded and led Focal Systems. Right now, I think the highest and best use of my FLOPS is to help push the AGI ball up the hill in any way I can. This is the first time in the history of our universe (that we are aware of) that a species has a chance at developing a superintelligence, and working on anything else seems remarkably meek in the face of that fact. What a time to have been born, to be alive, and to have the skill set that I have. What else could I possibly be working on? So I'm done with the startup game for now, and I am going back to Stanford to get my PhD to do exactly that. I believe strongly in the "Science in the Open" ethos, and as part of that, I want to open a window into my research even in its nascent stages. If that means it may get poached, that's OK, so long as it accelerates mankind's research goals. So I will try to dedicate a few hours a week to documenting what I am doing and thinking about here, in a raw, open format.

Here is what I am currently working on / interested in:

I think that SGD + stacked transformers is perhaps a sufficient path to AGI, but at what cost? We need huge terawatt-scale datacenters, while our brains are (as of now) better at learning and run on about 100 W. Something is missing. If the Universal Approximation Theorem holds, and if neurons don't backprop, then there must exist another learning algorithm (solver) + architecture that gets us there as well, perhaps much faster and cheaper. I think there are many "sufficient" paths. We have found (perhaps) one element in the set of AGI, but it is certainly not a necessary one. I agree with Yann LeCun that we did not need flapping wings to fly, but we did (initially) need two wings. There are many elements in the set of flight, and they have different trade-offs: helicopters are great for their use case, rockets for theirs, planes for theirs. Similarly, there will be many elements in the set of AGI, each with different trade-offs, and we will want some for some use cases and others for others.

So right now I want to find another one. The biggest reason LSTMs, DNCs, NTMs, and other RNNs with big external/internal memories did not scale was BPTT. Well, let's get rid of it. Can we find a forward-forward algorithm that outperforms, one that achieves infinite context length without needing to keep around all the temporal activations? Let's try! (A minimal sketch follows below.) If you are interested in this, please reach out.
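To make "no BPTT" concrete, here is a minimal, hedged sketch of one such direction: training a tiny RNN with a zeroth-order, SPSA-style update that estimates the gradient from two perturbed forward passes. Everything here (the running-mean toy task, the layer sizes, the hyperparameters) is an illustrative assumption, not the exact method from my RNN scaling paper; the point is only that no per-timestep activations are ever stored, so memory stays constant no matter how long the sequence is.

```python
# Hedged sketch: SPSA-style zeroth-order training of a tiny RNN, no BPTT.
# Two forward passes per update; only the current hidden state is kept,
# so memory is O(1) in sequence length.
import numpy as np

rng = np.random.default_rng(0)
H, T, LR, EPS = 16, 200, 1e-2, 1e-3   # hidden size, seq len, step size, probe scale

def init_params():
    return {
        "Wx": rng.normal(0, 0.1, (H, 1)),   # input -> hidden
        "Wh": rng.normal(0, 0.1, (H, H)),   # hidden -> hidden
        "Wo": rng.normal(0, 0.1, (1, H)),   # hidden -> output
    }

def loss(params, xs, ys):
    """One forward pass; note we never store the history of hidden states."""
    h, total = np.zeros((H, 1)), 0.0
    for x, y in zip(xs, ys):
        h = np.tanh(params["Wx"] * x + params["Wh"] @ h)
        total += ((params["Wo"] @ h - y) ** 2).item()
    return total / len(xs)

def spsa_step(params, xs, ys):
    """SPSA: probe the loss at theta +/- EPS*delta, update along delta."""
    deltas = {k: rng.choice([-1.0, 1.0], size=v.shape) for k, v in params.items()}
    plus  = {k: v + EPS * deltas[k] for k, v in params.items()}
    minus = {k: v - EPS * deltas[k] for k, v in params.items()}
    g = (loss(plus, xs, ys) - loss(minus, xs, ys)) / (2 * EPS)
    for k in params:                       # delta entries are +/-1, so
        params[k] -= LR * g * deltas[k]    # multiplying == dividing by them

params = init_params()
for step in range(501):
    xs = rng.normal(size=T)
    ys = np.cumsum(xs) / np.arange(1, T + 1)   # toy target: running mean
    spsa_step(params, xs, ys)
    if step % 100 == 0:
        print(f"step {step:4d}  loss {loss(params, xs, ys):.4f}")
```

Because each update is just two stateless forward passes, you can stream arbitrarily long sequences through it; the open question, and the thing to test, is whether estimators like this can match first-order methods in wall-clock and sample efficiency at scale.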

Publications, Inventions, Awards, and Talks:

  1. Scaling RNNs to Billions of Parameters with Zero Order Optimization (5/22/2025)
  2. Gradient Agreement Filtering (GAF) (11/1/2024)
  3. Stanford Engineering Talk on Automating Retail (3/23/2023)
  4. AI for Retail (9/1/2022)
  5. Automated checkout system through mobile shopping units, US20180218351A1 (1/13/2018)
  6. Automatic labeling of products via expedited checkout system, US20180330196A1 (4/2/2017)
  7. Determining in-store location based on images, US20180025412A1 (3/1/2017)
  8. Expedited checkout system through portable checkout units, US10319198B2 (1/12/2017)
  9. Out-of-stock detection based on images, US20180260772A1 (6/1/2016)
  10. CNBC "The Exchange" Interview (6/1/2022)
  11. Stanford AI Safety Panel (Francois Chaubard - CEO, Focal Systems; Azalia Mirhoseini - Google Brain/Anthropic; Lisa Einstein - Executive Director, CISA) (6/1/2022)
  12. Nvidia AI Podcast (9/18/2019)
  13. In-Q-Tel 1st Place - Most Innovative Startup Competition (6/21/2018)
  14. Nvidia Startup Competition - 1st Place (4/21/2017)
  15. University of Delaware Startup Competition - 1st place (10/22/2009)

Blog links below:

  1. Analysis of the Amazon Go Platform and why it won't scale till >2040 (7/28/2019)
  2. Universal Address Spaces for GPUs (6/1/2024)
  3. Strange Loop Networks (6/6/2024)
  4. Modality Curriculum (6/9/2024)
  5. Performing a Lobotomy on Llama3 8B (6/14/2024)
  6. Proving Vanilla SGD just memorizes, inspiration for gradient agreement filtering (GAF) paper. (10/1/2024)
  7. Perhaps the Golden Rule is a Nash Equilibrium point for sufficient intelligence, and we can sleep easy while developing AGI (10/4/2024)
  8. Is diffusion all you need? Is diffusion the free lunch we have all been waiting for? I think so! (11/28/2024)
  9. Intelligence complexity is not model complexity, must include training dynamics and data as well. (1/10/2025)
  10. My master plan to solve AGI (1/12/2025)
  11. Some VERY interesting plots for my upcoming paper "Generalization Efficiency Scaling Laws for RNNs without BPTT" (1/28/2024)
  12. My Journey Going from First Order to Zeroth Order to Reduce the Cost of Intelligence 1 Million-Fold (2/1/2024)
  13. Are "you" your dual agent training your primary agent? Warning: May break your brain. (2/12/2025)
  14. Novel Token Spigots: Put yourself right next to the source of novel tokens, not second-hand smoke, but first-hand. (2/20/2025)
  15. Future of Work and the Rate of Automation (2/21/2025)
  16. All you need is a World Model + Policy + Value Model; That's it. (4/15/2025)
  17. What's Greek to some is native tongue (6/8/2025)
  18. Brains don't backprop; Neurons are one-way! (6/27/2025)
  19. The Copernican Epiphany: Realizing You’re Not the Center of the Universe (7/3/2025)
  20. An After Action Review (AAR) for scientific progress over the last 1000 years: Humanity’s Regret Curve (7/10/2025)
  21. (Train Time) Recurrence as a necessary condition for General Intelligence (8/18/2025)
  22. Le Châtelier's Principle applied to startups, software, and AI (8/19/2025)
  23. Demerzel vs. Democracy: What will be the system of government in 100 years? (8/28/2025)
  24. Shots on Goal: What are the hypothesized paths to AGI, and which ones are the most promising? (9/17/2025)
  25. AI Academic Brain vs. AI Founder Brain (9/22/2025)