
You Don't Need a Masters/PhD – How These 9 Engineers Broke Into ML


Intro

There is a common misconception that you must have a masters or PhD in AI/ML to work in a machine learning engineer or research scientist role. This is plain wrong. In fact, in today's job market, there are plenty of AI/ML masters and PhD graduates who are struggling to find a job. In this post, I'll share the careers of multiple engineers who had zero experience in AI/ML but eventually transitioned to an AI role at Google, Meta, Amazon, or OpenAI. My hope is that by reading their stories, you'll adopt a healthier and more realistic mindset that it is more than possible to break into AI without a graduate degree. 1

Let's be clear, though: having a graduate AI/ML degree doesn't hurt. A masters program condenses a lot of knowledge acquisition into 1-2 years, and obtaining a PhD will strengthen your ability to conduct research and possibly join academia. But if your goal is to work in industry, what will become apparent from reading the stories below is that the top factors for convincing recruiters and hiring managers that you deserve a role in their AI labs or programs are your passion, work ethic, and self-marketing, which you can demonstrate in many ways. We'll recap these strategies below the stories. But now, without further ado, here are the profiles of engineers who broke into AI/ML!

George Sung – Machine Learning Engineer at Amazon

George Sung

Current role: George is currently a Machine Learning Engineer at Amazon, designing and deploying deep learning models for Amazon Music's personalization and recommender systems, which serve 50+ million customers. (LinkedIn, Twitter)

Background: George obtained his bachelors and masters in electrical engineering from the University of Michigan in 2009, several years before the field of AI experienced a renaissance in 2012.

Breaking into AI/ML: George worked as a chip design engineer at AMD for six years before quitting his job and studying ML intensely to break into the field. Here's the story of his journey into AI that he wrote back in 2017:

A year ago I left my full-time job in computer chip design, to go “back to school” for a career change into machine learning. However, I didn’t go back to school in a traditional sense — I decided to pursue online education via Udacity. After a year of online study plus two months of job searching, I landed a job at BMW’s Technology Office in Silicon Valley, working on machine learning applied to their self-driving car efforts.

Prior to my career change, my professional background was in computer chip design...around late-2014 I started feeling the semiconductor industry was stagnating, and I kept hearing news of consolidation within the industry. I wanted to be in an industry with high growth potential...I felt ML/AI were the most interesting...so I decided to focus on that.

My goal in 2015 was to gain expertise in machine learning. After work and on weekends, I would study Andrew Ng’s machine learning class on Coursera, read posts on the /r/machinelearning sub-Reddit (mostly being totally confused), and read ML tutorials online (such as Andrej Karpathy’s blog posts; this was before CS231n was available). In November 2015, Udacity announced their Machine Learning Nanodegree (MLND), and I gladly signed up. Unfortunately, I felt my pace of progress was too slow while holding a full-time job...

Knowing how fast technology moves, I really wanted to be part of the AI revolution ASAP. The opportunity cost of staying put was too high. Financially, I had 2–3 years worth of living expenses in liquid savings, so it was feasible to quit my job and study full-time. I had briefly considered applying to a traditional university to get an MS in Computer Science. Having graduated during the Great Recession, I saw first-hand that a degree from a reputable university does not guarantee a job, so the cost vs. expected-value proposition was not attractive to me.

Further in the post, George writes about his enrollment in the Udacity Self-Driving Car Nanodegree program. Upon completion, he then spent four weeks implementing a traffic sign detection project from scratch in TensorFlow and uploaded his talk about the experience of taking the online self-driving car course to increase his visibility among potential employers.

In the end (mid-March) I had 9 interviews out of my ~90 job applications, i.e. around 10% of applications lead to interviews...Ultimately, I decided to accept the full-time offer from BMW.

George ended up working at BMW for three years on reinforcement learning and deep learning for self-driving cars before leaving for Amazon in 2020, where he has worked as an MLE for the past four and a half years.

Susan Zhang – AI Researcher at Google DeepMind

Susan Zhang

Current role: Susan is a Google DeepMind AI researcher who works on large language models. (LinkedIn, Twitter)

Background: Susan graduated from Princeton in 2012 with a bachelors degree in mathematics. For the first six years of her career, she bounced from job to job, working as an analyst at Morgan Stanley and AppNexus before pivoting to data engineering at several startups. In 2017, she landed a data engineering role at Unity Technologies.

Breaking into AI/ML: In 2018, Susan left Unity to work at OpenAI, the non-profit AI lab backed by Elon Musk and LinkedIn founder Reid Hoffman. Back then, OpenAI hadn't yet created ChatGPT and was busy creating OpenAI Five, an AI that could beat human players in a 5-vs-5 game of Dota 2. This is where Susan made a name for herself by capitalizing on her previous data engineering and analyst experience, parlaying it into large scale reinforcement learning. After the OpenAI Five project wrapped up in 2019, Susan acted as an AI/ML consultant for almost two years before joining Meta as an AI research engineer. At Meta, Susan led the development and open-sourcing effort of a GPT-3-sized large language model. Last summer, in 2023, Susan left Meta to work on Google's Gemini project.

Susan Zhang's pre-OpenAI career

As we can see from Susan's pre-OpenAI career, her journey to becoming a prominent AI researcher was not straightforward. Six years into her career, there were no obvious signs from her resume that she'd become an influential figure in the field of AI. However, if you analyze it closely, you can see signs of her building up knowledge in ML-adjacent fields such as data analysis and data pipeline engineering, which play a crucial role in large language model development.

Alexei Baevski – AI Researcher at Meta

Alexei Baevski

Current role: Alexei currently works at Meta on the Llama large language models and generative AI research. (LinkedIn, Twitter)

Background: Alexei graduated from the University of Toronto in 2007 and worked as a software engineer for more than nine years, from November 2007 to January 2017, at financial companies and startups such as Hebbian Inc., R2 Financial Technologies, S&P Global Market Intelligence, and Bloomberg. In other words, his professional career up to that point included no AI/ML engineering or research.

Breaking into AI/ML: Alexei spoke about how he broke into the field on the Towards Data Science podcast:

I spent a bunch of time working as a software engineer for different startups, financial companies, and so on. But throughout this whole time I've been very interested in AI and I had a bunch of homegrown kind of self-driven projects to explore the space, but nothing that I could really apply commercially or anything like this.

Once I joined Facebook, which was over five years ago, one of my motivations of joining Facebook was that it's one of the leaders in developing new AI algorithms. So once I joined Facebook, I spent some time working on production teams improving spell corrections, query suggestions, and things like this that allowed me to see what industrial scale AI looks like.

After that, I moved to FAIR which is now Meta AI Research I suppose, where I wanted to work on developing brand new algorithms and trying to improve state of the art of existing techniques. When I joined, FAIR was quite unique in the sense that you have a lot of freedom to choose which area you work on.

The way I looked at it is I wanted to work on something that creates the most impact. So if I improve some particular algorithm, it improves not just for one specific use case, but across a wide different sets of use cases. And this is how I got into self-supervised learning.

Let's take a look at Alexei's career right before he joined Facebook.

Alexei Baevski's pre-Facebook career

As noted above, for almost the first decade of his career, Alexei's resume shows that he worked in the financial field.

Alexei Baevski's AI career

Once Alexei got his foot in the door at Facebook, he first worked on projects adjacent to ML such as spell correction and query suggestions to get an overview of how large scale ML worked. Then he transitioned to Facebook's AI lab, FAIR. To summarize, of the first 15 years of his career, Alexei spent 10 as a financial software developer and almost 1.5 years getting familiar with professional AI before fully immersing himself in AI research. Fast forward to today, and Alexei has over 15,000 citations across all his research papers.


Rai Pokorny – Member of Technical Staff at OpenAI

Rai Pokorny

Current role: Rai works at OpenAI on ChatGPT and scalable alignment. (LinkedIn, Blog)

Background: Technically, Rai's journey included a masters, but as I mentioned in the beginning of the post, a masters/PhD is not enough to work in industry in the current job market. In Rai's case, it didn't help him automatically land an AI role. Rai describes the self-study he undertook to enter into the prestigious OpenAI residency program. We begin his story while he's at Google, trying to break into AI research:

At Google...my team was responsible for...software engineering - building pipelines, services, etc. - than research engineering.

My original hope was that I’d try to position myself into a team that would be AI or AI-adjacent, do researchy stuff, eventually position myself to work on AI safety. But it ended up very hard trying to switch myself from “software engineer working in team X doing software engineer things” into “research engineer in ML”. I didn’t have any proven experience saying “I can do ML” - all I had was “hey I did all this stuff in uni”. But whenever I tried to apply for an internal transfer, somehow it didn’t end up working out. I think there must have been very hard competition. And getting rejections was always an ow.

My self-confidence suffered and I started feeling like I’m “not smart enough” or “not good enough to do this”. I envied people I knew who were working with ML.

I thought about it a bunch, and decided that I just felt dissatisfied with the work at Google. It wasn’t evolving me in the direction of working in AI. I was good at my work, but I wasn’t getting the skills I wanted to get.

I’ve decided to try to put some more effort re-learning all the stuff I learned about AI in uni, except this time I wanted to actually grok it, where you could wake me up at 2 AM 5 years from now and I’d still be able to explain to you how it works. I went over old materials and re-learned them, making Anki cards, and started refilling the holes where my knowledge was stuck before 2017-era progress in AI - like deep RL or Transformer language models.

Eventually I found the paper Proximal Policy Optimization Algorithms...One day I wanna write a blog post explaining it...The first author on that paper is John Schulman. I sent him an email. John responded to the email, and invited me to apply to the OpenAI residency...OpenAI had 2 residency tracks: software engineer and research. Originally I was scared of applying for research, because of all the self-doubt and impostor syndrome...But John encouraged me to try the research track, maybe believing in myself more than I did - and I made it through...I ended up getting accepted...for the OpenAI residency

Like George, Rai increased his own visibility among prospective employers, reaching out to a prominent AI researcher via email. This eventually led to him working at OpenAI.

Priya Goyal – Founding Member of DatologyAI, ex-DeepMind, ex-FAIR@Meta

Priya Goyal

Current role: Priya was an AI researcher at both Meta FAIR and Google DeepMind before becoming a founding member of DatologyAI, a GenAI startup, in January 2024. (LinkedIn, Twitter)

Background: Priya earned a bachelors and masters in mathematics and scientific computing from IIT, Kanpur in 2015. She joined Facebook as a software engineer in 2016.

Breaking into AI/ML: In the spring of 2016, Priya applied to and got accepted into the Facebook AI Residency program, a program designed to train software engineers to become AI researchers. Once she joined the program, she capitalized on the opportunity and went on to lead research on representation learning, train large scale computer vision models, and build socially responsible AI models.

Sholto Douglas – Software Engineer at Google DeepMind

Sholto Douglas

Current role: Sholto is a Google DeepMind engineer who works on scaling Google's large language models. (LinkedIn, Twitter)

Background: Sholto graduated from the University of Sydney with a bachelors in engineering and in business just four years ago in 2020. He does not have a masters or PhD.

Breaking into AI/ML: Let's hear from Sholto himself how he did it (link to interview):

I didn't get into the grad programs that I wanted to get into...In the meantime, on nights and weekends, basically every night from 10pm to 2am, I would do my own research. And every weekend, for at least 6-8 hours each, I would do my own research and coding projects and this kind of stuff...I was trying to work out how to scale [a large multimodal model] effectively. James Bradbury, who at the time was at Google and is now at Anthropic, saw some of my questions online where I was trying to work out how to do this properly, and he was like, "I thought I knew all the people in the world who were asking these questions. Who on earth are you?" He looked at that and he looked at some of the robotic stuff that I'd been putting up on my blog. He reached out and said, "hey, do you want to have a chat, and do you want to explore working with us here?" I was hired, as I understand it later, as an experiment in trying to take someone with extremely high enthusiasm and agency and pairing them with some of the best engineers that he knew.

As we can hear from Sholto himself, he managed to nab a role in the field of LLM research at Google DeepMind despite being rejected from the graduate school programs he wanted. Furthermore, Sholto worked as a consultant before completely changing his career to become an AI researcher. He did this by posting online about his work until he was discovered by a Google researcher.

Alec Radford – ML Researcher at OpenAI

Alec Radford

Current role: Alec currently works as a researcher at OpenAI, where he pioneered large language models through his papers on GPT-1, GPT-2, and GPT-3. He also contributed to GPT-4, GPT-4o, open source speech recognition with Whisper, and the core research behind the voice and image capabilities of ChatGPT. (LinkedIn, Twitter, Reddit)

Background: Alec enrolled at Olin College of Engineering as an engineering undergraduate student in 2012, right before deep learning swept through the industry. At the time, only a handful of universities such as NYU, University of Toronto, and University of Montreal were conducting deep learning research. Within a year, companies such as Google and Facebook created industrial deep learning AI labs. Alec and his close friend, Slater Victoroff, were convinced that deep learning would be part of the future, and they didn't want to wait for it to arrive at Olin. They wanted developers to have access to this technology right away, so they cofounded Indico in 2012.

Alec and Slater pitching Indico

Indico's aim was to make the then nascent deep learning technology accessible to software developers, and they eventually entered into the 2014 winter class of Techstars Boston, an accelerator program (above is a photo of them pitching at Techstars Boston). Through the accelerator, they raised a $3 million seed round, which gave them a boost of confidence in their abilities despite their youth.

Breaking into AI/ML: Upon founding the company, Alec, still an undergraduate student, coauthored a paper with two other researchers, one of whom was Soumith Chintala, a researcher at Facebook. This paper introduced deep convolutional generative adversarial networks (DCGANs), work that was featured by Nvidia CEO Jensen Huang at the annual GTC 2016 conference. To the dismay of Alec and the rest of Indico, Jensen attributed the work solely to Soumith Chintala and Facebook AI. Shortly after, Alec decided to go west to OpenAI. The research he was conducting required billions of dollars for training, which only West Coast tech giants could provide.

In the spring of 2016, Alec wrapped up his time at Olin and joined OpenAI, where he has worked for the past eight years, producing one groundbreaking piece of research after another.

Jeff Johnson – SysML Researcher at Meta

Jeff Johnson

Current role: Jeff is a principal research engineer at Meta and company-wide GPU consultant. He was the first person to program GPUs at Meta, authored the original PyTorch GPU backend, and created Faiss, which kickstarted the vector database industry used for LLMs and recommender systems. (LinkedIn)

Background: From 2001 to 2012, Jeff primarily worked in the video game industry, spending nine of those years at a Boston based company called Turbine, best known for its MMORPGs such as Dungeons & Dragons Online. Then in 2013, Jeff joined Facebook, becoming the tech lead for Apollo, a NoSQL database software. His willingness to dive into low level details and implement difficult algorithms earned the attention of Facebook execs.

Breaking into ML: In December 2013, Facebook announced the creation of a new research lab, Facebook AI Research (FAIR). Meanwhile, elsewhere in the company, execs decided to disband the Apollo team. Since he didn't have a team anymore, the execs pitched Jeff on joining FAIR, which they considered the future of the industry. Jeff agreed and joined the AI lab in April 2014. Given his background in low level programming in the video game industry, he decided to work on GPUs. At the time, there was renewed interest in Convolutional Neural Networks (ConvNets). ConvNets require a large number of arithmetic operations on a small amount of data loaded, so Jeff investigated how to make this more efficient. This work resulted in an ICLR paper. Around the same time, Jeff started building what would eventually become part of the GPU backend of PyTorch, the most widely used ML framework today.

Greg Brockman – Cofounder at OpenAI

Greg Brockman

Current role: Greg Brockman is a cofounder and the president of OpenAI. (LinkedIn, Wikipedia)

Background: Greg joined Stripe as an early employee and went on to become its CTO. After five years, he left Stripe and eventually cofounded OpenAI along with Elon Musk, Sam Altman, Reid Hoffman, and several others. It must be noted that Greg did not have any AI/ML background. So how did he end up cofounding one of the most important AI companies in the world today?

Breaking into AI/ML: Greg's managerial, company-building, and non-AI technical experience from Stripe proved to be very valuable in cofounding OpenAI. Yet, Greg still wanted to become a machine learning expert. Let's hear from Greg himself about his journey to becoming an ML practitioner:

For the first three years of OpenAI, I dreamed of becoming a machine learning expert but made little progress towards that goal. Over the past nine months, I’ve finally made the transition to being a machine learning practitioner. It was hard but not impossible, and I think most people who are good programmers and know (or are willing to learn) the math can do it too. There are many online courses to self-study the technical side, and what turned out to be my biggest blocker was a mental barrier — getting ok with being a beginner again.

In July 2017...I began work on a machine learning project. My goal was to use behavioral cloning to teach a neural network from human training data...I kept being frustrated by small workflow details which made me uncertain if I was making progress, such as not being certain which code a given experiment had used or realizing I needed to compare against a result from last week that I hadn’t properly archived. To make things worse, I kept discovering small bugs that had been corrupting my results the whole time.

I learn best when I have something specific in mind to build. I decided to try building a chatbot. I started self-studying the curriculum we developed for our Fellows program, selecting only the NLP-relevant modules. For example, I wrote and trained an LSTM language model and then a Transformer-based one. I also read up on topics like information theory and read many papers, poring over each line until I fully absorbed it. It was slow going, but this time I expected it...I kept thinking of how many years it had taken to achieve a feeling of mastery. I honestly wasn’t confident that I would ever become good at machine learning. But I kept pushing...

From our Fellows and Scholars programs, I’d known that software engineers with solid fundamentals in linear algebra and probability can become machine learning engineers with just a few months of self study...You’re probably not an exception either. If you’d like to become a deep learning practitioner, you can. You need to give yourself the space and time to fail. If you learn from enough failures, you’ll succeed — and it’ll probably take much less time than you expect.

Greg's post is a gem for those who aspire to enter the field of AI: his humility and honesty about his struggles show that even the cofounder of OpenAI took a nonlinear path toward becoming an ML practitioner.

Recap

Passion and self-motivation are essential – A common thread across all these stories is the engineers' deep passion for AI/ML that drove them to relentlessly pursue the field, even without formal credentials. They were intrinsically motivated to learn, often spending nights and weekends immersing themselves in online courses, research papers, and side projects. As Greg Brockman candidly shared, pushing through initial failures and doubts requires genuine enthusiasm.

Leverage your existing skills and experience – Many of these engineers smartly built upon their prior experience in adjacent domains like software engineering, data analysis, chip design, etc. to eventually bridge into AI/ML roles. For example, Susan Zhang and Alexei Baevski first worked on ML-related infrastructure and pipelines before transitioning to core AI research. Jeff Johnson applied his low-level programming background from the video game industry to GPU work for deep learning models. Recognizing how your current expertise can be valuable to AI/ML is key.

Use content creation to create warm intros to prospective employers – Another striking pattern is how blogging, open-source projects, online discussions, etc. played a pivotal role in these engineers getting noticed by AI/ML hiring managers and researchers. George Sung uploaded a project video that increased his visibility. Sholto Douglas' blog posts caught a Google researcher's attention. Alec Radford published an influential paper as an undergrad. Publicly sharing your AI/ML learning journey and creations is a powerful strategy.

Seize unconventional entry points into AI/ML organizations – When opportunities like the OpenAI or Facebook residencies came along, these engineers boldly seized them, even if they didn't feel fully ready. Many also proactively reached out to researchers and hiring managers, like how Rai contacted John Schulman.

Be willing to make career leaps – Transitioning into AI/ML often required these engineers to make courageous leaps, like how George quit his full-time job to study ML, how Alec left the East Coast for OpenAI, and how Greg had to embrace being a beginner again. Pursuing your AI/ML dreams may require bold, unconventional career shifts.

Footnotes

  1. Incidentally, while researching these engineers' backgrounds for this article, I got inspired because I saw how circuitous and non-obvious their career paths toward AI were. One can look at their accomplishments and say this isn't a fair list of engineers – they were "destined" for AI. But is that really true? Take a close look at their career paths right before they broke into the field of AI. You'll find that they were not following the conventional path toward AI at all.