This post is inspired by Harshit Tyagi’s amazing GitHub repo AI Research Program, where he lays out an independent AI research program for those interested in applying for research engineer / research scientist roles at companies like Meta, OpenAI, Anthropic, etc. I will be focusing on Pillar 5, i.e. NLP / LLM Research.
Key Projects in NLP and LLM Research
To gain a deep understanding of these topics, I will follow Karpathy’s advice: start with a project and learn things around that project “on demand”. The table below lists the projects and topics I will focus on while building them out over the next few posts.
| Projects | Learning Objectives | Evaluation Criteria |
|---|---|---|
| - Implement word2vec from scratch<br>- Build a custom tokenizer for a specific language or domain<br>- Create a data preprocessing pipeline for a large text corpus | - Word Vectors/Embeddings<br>- Tokenization<br>- Preprocessing<br>- Data Sampling | - Accuracy of word embeddings on analogy tasks<br>- Efficiency and coverage of tokenization<br>- Quality and cleanliness of preprocessed data |
| - Implement a part-of-speech tagger using HMMs<br>- Build a spam classifier using Naive Bayes<br>- Develop a named entity recognition system using CRFs | - Hidden Markov Models<br>- Naive Bayes<br>- Maximum Entropy Markov Models<br>- Conditional Random Fields | - Accuracy on standard POS tagging datasets<br>- Precision, recall, and F1 score for spam classification<br>- F1 score on CoNLL 2003 NER dataset |
| - Implement a sentiment analysis model using CNNs<br>- Build a language model using LSTMs<br>- Create a machine translation system using Transformers | - Feed-forward Neural Networks<br>- Recurrent Neural Networks<br>- Convolutional Neural Networks<br>- Attention Mechanisms<br>- Transformers | - Accuracy on sentiment analysis benchmarks (e.g., IMDb)<br>- Perplexity of language model on test set<br>- BLEU score for machine translation |
| - Fine-tune GPT-2 for text generation<br>- Implement a BERT-based question answering system<br>- Create a multimodal model for image captioning | - N-gram Models<br>- Neural Language Models<br>- Autoregressive vs. Autoencoder Models<br>- Large Language Models (LLMs)<br>- Vision-Language Models (VLMs) | - Perplexity and cross-entropy loss<br>- F1 and Exact Match scores for QA<br>- BLEU, METEOR, and CIDEr scores for image captioning |
| - Implement different decoding strategies (greedy, beam search, top-k, top-p)<br>- Develop a method to extend context length of a pre-trained LLM<br>- Create a personalized language model using adapters | - LLM Alignment<br>- Token Sampling Methods<br>- Context Length Extension<br>- Personalization | - Human evaluation of model alignment<br>- Quality and diversity of generated text<br>- Perplexity on long-context tasks<br>- Personalization accuracy on user-specific tasks |
| - Build an end-to-end neural machine translation system<br>- Develop a RAG system for question answering<br>- Create a document understanding system for invoice processing | - Machine Translation<br>- Named Entity Recognition<br>- Textual Entailment<br>- Retrieval Augmented Generation (RAG)<br>- Document Intelligence | - BLEU, METEOR scores for MT<br>- F1 score for NER<br>- Accuracy on textual entailment datasets (e.g., SNLI)<br>- Relevance and accuracy of RAG responses |
| - Construct a knowledge graph from a text corpus<br>- Develop a question answering system using a knowledge graph | - Knowledge Graphs<br>- Semantic Networks | - Coverage and accuracy of extracted knowledge<br>- Precision and recall of graph-based QA system |
| - Develop a fact-checking system for LLM outputs<br>- Create an AI text detector<br>- Implement bias mitigation techniques in word embeddings | - Hallucination Mitigation<br>- AI Text Detection<br>- Bias Detection and Mitigation | - Reduction in hallucination rate<br>- Accuracy of AI text detection<br>- Reduction in bias measures (e.g., WEAT score) |
| - Evaluate an LLM on multiple NLP tasks using the GLUE benchmark<br>- Implement and compare different evaluation metrics for a specific NLP task | - LLM/VLM Benchmarks<br>- Task-specific Metrics | - Performance across multiple benchmarks<br>- Inter-annotator agreement for human evaluation |
| - Design and implement an LLMOps pipeline<br>- Conduct an ethical audit of an NLP system | - Large Language Model Ops (LLMOps)<br>- Ethical Considerations | - Efficiency and reliability of deployment pipeline<br>- Compliance with ethical AI principles |
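To make the first project group concrete, here is a minimal sketch of skip-gram word2vec trained with a full softmax on a toy corpus, using only NumPy. The corpus, embedding size, window, and learning rate are illustrative choices for this sketch, not part of the program above.

```python
# Minimal skip-gram word2vec sketch (full softmax, toy corpus) - illustrative only.
import numpy as np

corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
word2id = {w: i for i, w in enumerate(vocab)}
V, D, window, lr = len(vocab), 16, 2, 0.05

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # input (center-word) embeddings
W_out = rng.normal(scale=0.1, size=(V, D))  # output (context-word) embeddings

# Build (center, context) training pairs from a sliding window.
pairs = []
for i in range(len(corpus)):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            pairs.append((word2id[corpus[i]], word2id[corpus[j]]))

for epoch in range(200):
    loss = 0.0
    for c, o in pairs:
        v = W_in[c]                        # center embedding, shape (D,)
        scores = W_out @ v                 # unnormalised scores over the vocab
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()               # softmax
        loss -= np.log(probs[o] + 1e-12)

        grad = probs.copy()
        grad[o] -= 1.0                     # dL/dscores for cross-entropy
        grad_in = W_out.T @ grad           # gradient w.r.t. center embedding
        grad_out = np.outer(grad, v)       # gradient w.r.t. output embeddings
        W_in[c] -= lr * grad_in
        W_out -= lr * grad_out
    if epoch % 50 == 0:
        print(f"epoch {epoch:3d}  loss {loss / len(pairs):.3f}")

# Nearest neighbours by cosine similarity as a quick sanity check.
def nearest(word, k=3):
    q = W_in[word2id[word]]
    sims = (W_in @ q) / (np.linalg.norm(W_in, axis=1) * np.linalg.norm(q) + 1e-9)
    return [vocab[i] for i in np.argsort(-sims)[1:k + 1]]

print("neighbours of 'quick':", nearest("quick"))
```

A real implementation would add negative sampling or hierarchical softmax to scale past a toy vocabulary, which is exactly the kind of detail to learn "on demand" while building the project.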
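The decoding-strategies project is also easy to prototype in isolation. Below is a sketch of greedy, top-k, and top-p (nucleus) selection applied to a made-up next-token distribution rather than a real LLM head; the vocabulary and probabilities are assumptions for illustration.

```python
# Greedy, top-k, and top-p (nucleus) selection over an assumed next-token distribution.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
probs = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])  # made-up probabilities

def greedy(p):
    # Always pick the single most likely token.
    return int(np.argmax(p))

def top_k(p, k=3):
    # Keep the k most likely tokens, renormalise, then sample.
    idx = np.argsort(-p)[:k]
    kept = p[idx] / p[idx].sum()
    return int(rng.choice(idx, p=kept))

def top_p(p, threshold=0.9):
    # Keep the smallest set of tokens whose cumulative probability >= threshold.
    order = np.argsort(-p)
    cum = np.cumsum(p[order])
    cutoff = int(np.searchsorted(cum, threshold)) + 1
    idx = order[:cutoff]
    kept = p[idx] / p[idx].sum()
    return int(rng.choice(idx, p=kept))

print("greedy:", vocab[greedy(probs)])
print("top-k :", vocab[top_k(probs, k=3)])
print("top-p :", vocab[top_p(probs, threshold=0.9)])
```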
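Finally, here is a rough sketch of the RAG-for-question-answering project reduced to its two moving parts: a retriever (TF-IDF with scikit-learn here, purely for simplicity) and an answer-generation step. The `generate_answer` function is a hypothetical placeholder that only assembles the prompt, since wiring in an actual LLM is out of scope for this sketch.

```python
# Minimal RAG sketch: TF-IDF retrieval + a stubbed-out generation step.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Word2vec learns dense word embeddings from co-occurrence statistics.",
    "BLEU compares machine translation output against reference translations.",
    "Retrieval augmented generation grounds LLM answers in retrieved documents.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

def retrieve(question, k=2):
    # Rank documents by cosine similarity to the question in TF-IDF space.
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, doc_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

def generate_answer(question, context):
    # Hypothetical placeholder: a real system would send this prompt to an LLM.
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"
    return prompt  # return the prompt so the sketch stays runnable end to end

question = "What does retrieval augmented generation do?"
print(generate_answer(question, retrieve(question)))
```

In the full project, the TF-IDF retriever would be swapped for dense embeddings plus a vector index, and the evaluation would focus on the relevance and accuracy criteria listed in the table.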