My AI Research Program

A project-driven AI research program to deeply understand architectures, new age applications, research directions and learn concepts from first principles.

Harshit Tyagi
9 min read · Aug 2, 2024

“If you can’t explain it simply, you don’t understand it well enough.” — Albert Einstein

Amidst all the AI updates and news, do you feel the urge to dive deep into model architectures and understand what it takes to train a model like Llama 3.1 or DALL-E? Well, I have been feeling this for quite some time now, and I finally gave in and decided to create my own research program to dig deep into the world of AI research and engineering.

Here’s the thing: I’m not cut out for a Ph.D. My brain is wired to build, to implement, to bring ideas to life as soon as they click. But I also knew that to truly grasp AI, I needed more than just surface-level knowledge. I needed a deep dive.

While I couldn’t get a Ph.D., I decided I could create a research program for myself, just like Tim Ferriss created a personal MBA program.

This post lays out every single detail on how I will go about executing this study plan.

https://youtu.be/gs_Sz4zzFks

Designing a research program can go in a thousand directions. For this program, I am trying to reverse engineer the pathway to a Research Engineer (RE) role at great AI labs like Meta AI, OpenAI, and Anthropic.

I don’t wish to work at these orgs, but doing this will give direction to my program and keep me focused on what matters.

When I asked Eugene Yan to review the curriculum, he said it’s great and advised me to keep the journey project-driven.

There are going to be many topics that I’ll have to re-learn and many that won’t provide value. My mindset is to enjoy every bit of it without feeling the pressure to reach somewhere.

So, here’s how it flows:

  • understand what REs do from JDs,
  • lay out the most important skills,
  • find or craft projects around them,
  • build those projects deeply and iteratively (as Andrej Karpathy puts it),
  • write / teach those topics.

What Research Engineers do

I dove into job descriptions from Anthropic, OpenAI and Meta AI to understand what top AI Research Engineers do.

Let me break it down for you.

Research Engineers are the Swiss Army knives of the AI world.

Building Large-Scale Systems

  • Research Engineers build massive ML systems from the ground up.
  • They’re experts in distributed computing and system architecture.
  • A typical project? Scaling a training job to thousands of GPUs — a real test of patience and debugging skills. (A minimal single-node sketch follows this list.)
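
To make that concrete, here is a minimal sketch of single-node data parallelism with PyTorch’s DistributedDataParallel. The toy model, batch shapes, and script name are hypothetical stand-ins; a real thousands-of-GPUs job layers model/tensor parallelism, checkpointing, and fault tolerance on top of this.

```python
# Minimal DDP sketch (hypothetical toy model and data); launch with:
#   torchrun --nproc_per_node=4 train_ddp.py
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for every process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model; a real job would construct a transformer here.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

    for step in range(100):
        x = torch.randn(32, 1024, device=local_rank)  # stand-in batch
        loss = model(x).pow(2).mean()                 # stand-in objective
        opt.zero_grad()
        loss.backward()  # DDP all-reduces gradients across ranks here
        opt.step()
        if dist.get_rank() == 0 and step % 10 == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```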

Optimization and Performance Tuning

  • These folks are constantly optimizing and tinkering with systems.
  • They might spend weeks fine-tuning a new attention mechanism for peak performance. (A small benchmark sketch follows this list.)
  • This requires deep understanding of ML algorithms and a talent for performance optimization.
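
For a taste of that kind of work, here is a hedged sketch that benchmarks a naive attention implementation against PyTorch’s fused scaled_dot_product_attention (available since PyTorch 2.0). The shapes are arbitrary stand-ins, and it assumes a CUDA GPU.

```python
# Naive attention vs. PyTorch's fused kernel (requires PyTorch >= 2.0 and CUDA).
import torch
import torch.nn.functional as F
from torch.utils import benchmark

B, H, T, D = 8, 12, 1024, 64  # batch, heads, sequence length, head dim (arbitrary)
q, k, v = (torch.randn(B, H, T, D, device="cuda") for _ in range(3))


def naive_attention(q, k, v):
    # Materializes the full (T, T) score matrix, unlike the fused kernel.
    scores = q @ k.transpose(-2, -1) / (D ** 0.5)
    return torch.softmax(scores, dim=-1) @ v


for label, fn in [("naive", lambda: naive_attention(q, k, v)),
                  ("fused", lambda: F.scaled_dot_product_attention(q, k, v))]:
    timer = benchmark.Timer(stmt="fn()", globals={"fn": fn})
    print(label, timer.timeit(50))
```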

Data Wrangling and Preparation

  • Data preparation is a crucial part of the job.
  • Imagine transforming Wikipedia into a format ML models can easily consume. (A tiny tokenize-and-pack sketch follows this list.)
  • It’s like being a data chef — you need to know your ingredients (data structures) and how to prepare them (ETL processes).
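
Here is a minimal sketch of that kind of preparation: tokenize raw text and pack it into fixed-length next-token-prediction blocks. The file name and block size are hypothetical, and tiktoken is just one tokenizer choice.

```python
# Raw text -> fixed-length training blocks (tiktoken is one common tokenizer).
import tiktoken
import torch

enc = tiktoken.get_encoding("gpt2")
block_size = 256  # hypothetical context length

with open("wiki_dump.txt", encoding="utf-8") as f:  # hypothetical dump file
    ids = enc.encode(f.read())

# Drop the ragged tail and reshape into (num_blocks, block_size).
n_blocks = len(ids) // block_size
data = torch.tensor(ids[: n_blocks * block_size]).view(n_blocks, block_size)

# Next-token prediction pairs: targets are inputs shifted left by one.
x, y = data[:, :-1], data[:, 1:]
print(data.shape, x.shape, y.shape)
```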

Experimentation and Research

  • Research Engineers design and run scientific experiments.
  • They compare different model architectures or training techniques. (A toy controlled comparison is sketched below.)
  • It’s like being a mad scientist, but with more computers and less scary laughter.
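
To ground that, here is a hedged sketch of a controlled comparison: identical seeded model and data, with only the optimizer varied. Everything here is a toy stand-in for a real ablation study.

```python
# Controlled A/B experiment: identical seeded setup, only the optimizer varies.
import torch
import torch.nn.functional as F


def run(optimizer_name: str, steps: int = 200) -> float:
    torch.manual_seed(0)  # same init and data for every run
    model = torch.nn.Sequential(
        torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
    )
    optimizers = {"adamw": torch.optim.AdamW, "sgd": torch.optim.SGD}
    opt = optimizers[optimizer_name](model.parameters(), lr=1e-3)
    x, y = torch.randn(512, 64), torch.randn(512, 1)  # stand-in dataset
    for _ in range(steps):
        loss = F.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()


for name in ["adamw", "sgd"]:
    print(f"{name}: final loss {run(name):.4f}")
```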

Ethical AI Development

  • These engineers are at the forefront of developing safe and trustworthy AI systems.
  • They don’t just ask “Can we build it?” but also “Should we build it?”
  • This requires a solid understanding of AI ethics and societal impacts.

Collaboration and Communication

  • Research Engineers often engage in pair programming or work with distributed teams.
  • They need to be able to communicate complex ideas clearly, often through publications and open-source contributions.

Mathematical Foundation

While not explicitly stated, the complex nature of the work implies a strong foundation in mathematics and statistics, which I’ve included in my Foundational Knowledge pillar.

This analysis helped me define the pillars of my AI Research Program:

Pillars of the AI Research Program

⚠️ This entails a long list of topics; many overlap, many require cross-topic understanding, and it is not meant to be consumed in a linear fashion. Pick and choose based on your area of interest, your project, and your current level of understanding.

Learning is going to be fractal here but your project will keep you focused.

My AI Research Program is built on several key pillars, each representing a crucial area of knowledge and skill:

  1. Foundational Knowledge — Core mathematical and theoretical concepts underlying AI and machine learning.
  2. Programming and Tools — Essential programming skills and tools for AI development and deployment.
  3. Deep Learning Fundamentals — Key concepts and architectures in deep learning and neural networks.
  4. Reinforcement Learning — Principles and applications of reinforcement learning in AI.
  5. NLP / LLM Research — Advanced topics in natural language processing and large language models.
  6. Research Paper Analysis and Replication — Skills for critically analyzing and reproducing cutting-edge AI research.
  7. High-Performance AI Systems and Applications — Techniques for optimizing and scaling AI systems for real-world applications.
  8. Large-scale ETL and Data Engineering — Methods for handling and processing large-scale data for AI applications.
  9. Ethical AI and Responsible Development — Ethical considerations and responsible practices in AI development.
  10. Community Engagement and Networking — Building connections and contributing to the AI research community.
  11. Research and Publication — Conducting original AI research and sharing findings through academic publications.

Go through the GitHub repo to see which topics, subtopics, learning resources, and projects you should develop expertise in.

This will feel overwhelming and humbling at first, but remind yourself that you don’t have to master everything here.

While I’ve designed this program around my personal interest in Gen AI, it can be adapted to your area of interest; you could make it more computer-vision focused, for example.

How to go through this program

Here’s how you can approach it:

  1. Assess Your Starting Point: Determine your current knowledge level in each pillar.
  2. Craft a project and goal for yourself: Define what you want to achieve and design / pick projects that you find deeply interesting.
  3. Learn by Doing: Build your projects end-to-end in an iterative manner. Go back to learning resources when you don’t know something. Create concrete deliverables out of projects; document your process, challenges, and ideas (trust me, you’ll get a lot of them) that you might want to pursue at a later time.
  4. Engage with the Community: While building something complex, you might find yourself alone at times; that’s where communities help. Join AI-focused forums and Discord servers, attend conferences or meetups, and share your learning journey.
  5. Write / Teach: As the last step, don’t keep your learnings and findings to yourself. Write about them; this will expose gaps in your understanding and the shortcuts you took to get your project done. Go back to your resources to fill those gaps and simplify complex ideas.

Don’t rush or you’ll burn yourself out.

Lastly, don’t compare yourself to anyone else. You might be ahead of many and behind many.

Alright, here’s how I am starting this program.

Kickstarting this Program with LLM Research

Project: Building and Analyzing a Miniature Language Model from Scratch

Learning Objective: Grok LLM Research and Engineering

Time: I have given myself 3 months to finish this project, giving it 1–2 hours per day and more on weekends.

This project will involve creating a smaller-scale language model from the ground up, allowing you to deeply understand the architecture, training process, and key concepts behind LLMs.

As I progress, I’ll document my journey, insights, and findings here (make sure you’re subscribed) and on my YouTube channel.

High-level steps would include (I’ll keep updating this list):

  • PyTorch Fundamentals and Text Processing
    - Master PyTorch basics (tensors, autograd, nn.Module)
    - Implement text processing techniques (tokenization, encoding)
    - Understand and implement word embeddings
  • Model Architecture and Implementation
    - Study and implement a basic transformer model using PyTorch (a minimal attention-head sketch follows this list)
    - Analyze key components (self-attention, feed-forward networks)
  • Data Pipeline and Training
    - Develop efficient data loading and batching for text data
    - Implement and optimize the training process (including distributed training)
  • Coding an LLM
    - Visualize attention patterns and analyze learned embeddings
    - Experiment with model size, training data, and decoding strategies
  • Fine-tuning a Foundation Model
    - Implement fine-tuning for specific tasks and explore few-shot learning
    - Apply optimization techniques (quantization, pruning) for improved performance
  • Core Concepts: I’ll be diving deep into transformer architectures, embedding models, attention mechanisms, preparing the dataset for fine-tuning, coding the attention mechanism and the LLM from scratch using PyTorch, and lastly understanding the hardware / compute requirements for training.
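
Since the attention mechanism is the heart of this project, here is a minimal sketch of a single causal self-attention head in PyTorch, in the spirit of Karpathy’s from-scratch GPT walkthrough. All dimensions are illustrative.

```python
# One causal self-attention head from scratch (illustrative dimensions).
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttentionHead(nn.Module):
    def __init__(self, n_embd: int, head_dim: int, block_size: int):
        super().__init__()
        self.key = nn.Linear(n_embd, head_dim, bias=False)
        self.query = nn.Linear(n_embd, head_dim, bias=False)
        self.value = nn.Linear(n_embd, head_dim, bias=False)
        # Lower-triangular mask so position t only attends to positions <= t.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        att = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)       # (B, T, T)
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        return att @ v  # weighted sum of values, (B, T, head_dim)


# Quick smoke test with illustrative sizes.
head = CausalSelfAttentionHead(n_embd=64, head_dim=16, block_size=128)
out = head(torch.randn(4, 32, 64))
print(out.shape)  # torch.Size([4, 32, 16])
```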

By starting with LLM research, I’m positioning myself at the forefront of AI research while building a strong foundation in neural network architectures and NLP techniques.

Resources

https://github.com/dswh/ai-research-program/tree/main/resources

While I have a good number of resources that I always refer to, a large part of this is dynamic. I will keep adding high-value resources to the resources folder of the GitHub repository, and I will also write about them here and in my other newsletter, High Signal AI.

Books

To kick things off for my LLM research project, I am referring to 2 books:

  1. Understanding Deep Learning
  2. Build a Large Language Model (From Scratch)

Research Papers

I will share a more thorough list in the coming weeks, but for starters:

more to come…

Courses / YT videos

At the moment, I might refer to a bunch of courses to get better at writing PyTorch code:

  • PyTorch by fast.ai and their documentation
  • Andrej Karpathy’s video on training GPT
  • 3b1b’s video on transformers and GPT

There will also be a bunch of blogs, newsletters and sometimes podcasts which I will share in the coming weeks.

How to sustain this mindset for learning

All such learning endeavours come down to how long you can sustain the “motivation” to learn or keep at it.

  • Don’t over-exert out of excitement: My first advice (especially to myself) would be to not be too hard on yourself all the time. Cycle through: there will be days when you’ll need intensity, and there will be days when just a relaxed read or watch will help.
  • Committing in public: This is not something everyone likes doing, but it works a lot of the time. It keeps you accountable; you will have to get back to it every day. I am doing this for the first time, let’s see how it goes.
  • No Zero Days: Being consistent is key. You must have come across challenges like “100 Days of Learning” or “75 Hard”, but I personally prefer “No Zero Days”. Make some progress every day; it’s a long journey, and it will compound.
  • Find more people like yourself: Having people in the same boat as you is always reassuring, especially on the days when you doubt yourself. Do make friends in the process; the best way to do so is by focusing on what you can give instead of what you can get. It will come back to you. You can join my Discord Server to find more people like yourself.
  • Follow me :p


Gaps in this program

Even though I’ve spent a decent number of hours planning this, the program is not complete and definitely has gaps to fill.

One big gap that I have to fill in this research program is mentorship.

As you work on your projects, learn new topics, write something or teach something, getting actionable and high-quality feedback is also very important.

A mentor or coach is very important in this pursuit.

For now, I’d recommend leveraging communities, asking for peer reviews, publishing in public, cold emailing, and working in teams.

Follow for daily and weekly updates:

  • I’ll share more about my evolving process and learnings here; make sure you’re subscribed.
  • Videos on YouTube.
  • Daily updates on Instagram.
  • To connect with me or be part of discussions, make sure you’re part of our Discord Server.

Check out the video version of this post on YouTube


Written by Harshit Tyagi

Director Data Science & ML at Scaler | LinkedIn's Bestselling Instructor | YouTuber