My path to OpenAI

I started programming seriously during a gap year after high school. I’d read Turing’s Computing Machinery and Intelligence, and was inspired by the notion of writing code that could understand something that I, as the code’s author, did not. I started writing a chatbot — how hard could it possibly be?

I managed to build something that could talk about the weather very convincingly. But no matter where I looked, it seemed that no one had any techniques that could make my bot really work.

I soon shelved my chatbot pursuits. I decided to focus on creating systems that can have a real impact, and have been doing that ever since.


In college, I found a field that captured what drew me to AI: programming languages. I was thrilled that a compiler or static analyzer could “understand” a program in a way that I couldn’t, and then apply that understanding to do something very useful (such as generate fast code or prove correctness).

I kept trying to find time for programming language research. But I also kept getting distracted by new startup ideas (generally pretty bad) and new people to work on them with (generally pretty good). I’d started out at Harvard and transferred to MIT, constantly trying to surround myself with people I could learn from and build something useful with.

Junior year, I decided that it didn’t make sense to try to do a startup while still in school. Instead, I was going to meet with people doing startups and, over time, pattern-match what works and what doesn’t. In the meantime, I finally started my programming language research, securing research funds from a professor and recruiting some of my friends for a static buffer overrun detection project.

A few weeks later, I was contacted by an unlaunched startup in Palo Alto. Normally I would have discarded the email, but I’d decided to start meeting startups. The team and I instantly clicked, and I knew that these were exactly the kind of people I’d been looking for all along. So I left school, never to actually get our buffer overrun detector working.


That company is now Stripe. I helped scale it from 4 to 250 people, and in the year since I left, it’s continued scaling to over 450 without any of my help.

When I considered leaving, it was primarily because I felt like the company was in a great place, and it would continue to do great things with or without me. I cared most about working with great people to make something amazing happen — but developer infrastructure wasn’t the problem that I wanted to work on for the rest of my life.

However, there was one problem that I could imagine happily working on for the rest of my life: moving humanity to safe human-level AI. It’s hard to imagine anything more amazing and positively impactful than successfully creating AI, so long as it’s done in a good way.

Leaving Stripe

Before I finalized my decision to leave, Patrick asked me to go talk to Sam Altman. He said Sam had a good outsider’s perspective, had seen lots of people in similar circumstances, and would probably have a good recommendation on what I should do.

Within five minutes of talking to Sam, he told me I was definitely ready to leave. He said to let him know if he could be helpful in figuring out my next thing.

I replied that AI was top of my list (and it was definitely my life goal). However, I wasn’t yet sure whether it was the right time, or what the best way for me to contribute would be.

He said, “We’ve been thinking about spinning up an AI lab through YC. We should keep in touch.”

Initial exploration

I left Stripe about a week or two later, and started digging into AI to try to better understand what was happening in the field. Even just from watching posts that made it onto Hacker News (such as char-rnn), it was clear that there was mounting excitement and activity around AI in general and deep learning in particular. But I approached the field with healthy skepticism: I wanted to be sure things were really working before diving in.

My first goal was to figure out what deep learning actually was. This turned out to be surprisingly hard. For example, one canonical source just says “Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence” — which sounds exciting but isn’t very descriptive [1].

Fortunately, I had some friends working in AI, Dario Amodei and Chris Olah. I asked them for some pointers, and they gave me some good starter resources. The most useful of these was Michael Nielsen’s book, and after reading it I practiced my newfound skills on Kaggle. (I was even number 1 for a while on my first contest!)


Along the way, I kept meeting super smart people in AI, and reconnected with some of my smartest friends from college, such as Paul Christiano and Jacob Steinhardt, who were now working in the field. This was a strong signal.

The more I dug, the more I became convinced that AI was poised for impact. Deep learning capabilities are incredibly impressive: for example, we can now classify objects in images with extreme accuracy (despite this 2014 XKCD), speech recognition has gotten very good, and we can generate surprisingly realistic images. That being said, these technologies are new enough that they haven’t yet changed how anyone lives: their impact today is limited to powering certain product features.

I remember saying this to one of my friends who had built Facebook News Feed back in the day. His reply was skeptical: “Simple algorithms, lots of data.” Everyone tries to peddle cool new AI algorithms, he argued, but in reality, if you just scale up a logistic regression it works really well. I then pulled my phone out of my pocket, opened the Google Translate app, put it in airplane mode, and demonstrated how it translates the text under the camera directly on the image. He was suitably impressed, and admitted simple algorithms wouldn’t help there. (It’s mostly but not 100% deep learning, but that’s not the point — the point is it works.)
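For the curious, the “simple algorithms, lots of data” approach my friend was describing can be sketched in a few lines. Here is a minimal logistic regression trained with gradient descent; the synthetic dataset and every hyperparameter are my own illustrative choices, not anything from our conversation:

```python
import numpy as np

# "Simple algorithms, lots of data": logistic regression from scratch.
# Two Gaussian blobs stand in for a real dataset.
rng = np.random.default_rng(0)
n = 500
X = np.vstack([rng.normal(-1.0, 1.0, size=(n, 2)),   # class 0
               rng.normal(+1.0, 1.0, size=(n, 2))])  # class 1
y = np.concatenate([np.zeros(n), np.ones(n)])

w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(class = 1)
    grad_w = X.T @ (p - y) / len(y)          # gradient of the log loss
    grad_b = (p - y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = (((X @ w + b) > 0) == (y == 1)).mean()
print(f"training accuracy: {accuracy:.2f}")
```

With more data and more features, the same few lines keep working — which was exactly his point.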

Initial spark

In June, Sam pinged me asking if I’d figured out yet what to do next. I told him my current plan was to start an AI company within the next year. We jumped on a call, where he mentioned that they were moving forward with the YC AI project. I asked Sam what the purpose of the lab was.

“To build safe human-level AI,” he said.

At that moment I knew he was the right partner to build my next company with. Very few people today would have the audacity to explicitly try building human-level AI. I realized that sometimes an effort needs only someone bold enough to pronounce a goal, and then the right people will join them.

The dinner

About a month later, Sam set up a dinner in Menlo Park. On the list were Dario, Chris, Paul, Ilya Sutskever, Elon Musk, Sam, and a few others.

We talked about the state of the field, how far off human-level AI seemed to be, what you might need to get there, and the like. The conversation centered around what kind of organization could best work to ensure that AI was beneficial.

It was clear that such an organization needed to be a non-profit, without any competing incentives to dilute its mission. It also needed to be at the cutting edge of research (per the Alan Kay quote, “the best way to predict the future is to invent it”). And to do that, it would need the best AI researchers in the world.

So the question became: would it be possible to create from scratch a lab with the best AI researchers? Our conclusion: not obviously impossible.

This was my first time meeting Elon and Ilya, and I strongly remember my impressions of both. I was struck by how inquisitive Elon was, and how much he sought others’ opinions and really listened to them. Ilya, on the other hand, was a source of grounding: he was a clear technical expert with a breadth of knowledge and vision, and could always dive into the specifics of the limitations and capabilities of current systems.

After the dinner concluded, Sam gave me a ride back to the city. We both agreed that it seemed worth starting something here. I knew it would only happen if someone was willing to go full-time on figuring out exactly what that would be and who would be a part of it. I volunteered myself as tribute.

And so the next day, I had something impactful to build once again.


[1] I asked Ilya to suggest a good definition:

The goal of supervised deep learning is to solve almost any problem of the form “map X to Y”. X can include images, speech, or text, and Y can include categories or even sentences. Mapping images to categories, speech to text, text to categories, Go boards to good moves, and the like, is extremely useful, and cannot be done as well with other methods.

An attractive feature of deep learning is that it is largely domain independent: many of the insights learned in one domain apply in other domains.

Under the hood, the model builds up layers of abstraction. These abstractions get the job done, but it’s really hard to understand how exactly they do it. The model learns by gradually changing the synaptic strengths of the neural network using the incredibly simple yet mysteriously effective backpropagation algorithm. As a result, we can build massively sophisticated systems using very few lines of code (since we only code the model and the learning algorithm, but not the end result).
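To make “very few lines of code” concrete, here is a sketch of the whole recipe from the footnote: a tiny two-layer network written from scratch with NumPy that learns XOR via backpropagation. The layer sizes, learning rate, and step count are illustrative choices of mine, not anything Ilya specified:

```python
import numpy as np

# We code only the model (two sigmoid layers) and the learning
# algorithm (backpropagation + gradient descent) — never the mapping
# itself. The network discovers XOR on its own.
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
Y = np.array([[0], [1], [1], [0]], dtype=float)              # targets

W1 = rng.normal(size=(2, 8))   # input -> hidden "synaptic strengths"
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1))   # hidden -> output
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(10000):
    # Forward pass: build up the layers of abstraction.
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error gradient layer by layer.
    grad_out = (y_hat - Y) * y_hat * (1 - y_hat)
    grad_h = (grad_out @ W2.T) * h * (1 - h)

    # Gradient descent: nudge every weight downhill on the loss.
    W2 -= lr * h.T @ grad_out
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * X.T @ grad_h
    b1 -= lr * grad_h.sum(axis=0)

print((y_hat > 0.5).astype(int).ravel())  # the four XOR predictions
```

The learned hidden-layer abstractions do the job, but (as the footnote says) inspecting `W1` and `W2` tells you very little about *how* — the behavior is in the weights, not the code.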

