The AI Blog

My notes on AI research, reinforcement learning, and software engineering

Understanding GPT-OSS architecture

August 6, 2025

An in-depth look at the architecture of GPT-OSS, an open-source large language model released by OpenAI on August 5, 2025. It is a Mixture-of-Experts (MoE) Transformer in the GPT-2 style, released in two sizes: the 120B model has 36 layers and 128 experts, and the 20B model has 24 layers and 32 experts. Both use top-4 expert routing, RMSNorm, Grouped Query Attention with RoPE, SwiGLU activation, and a 131K-token context via YaRN. With 4-bit MXFP4 quantization, the 120B model fits in 80 GB of memory and the 20B model in 16 GB.

Normalization Techniques in Transformer-Based LLMs: LayerNorm, RMSNorm, and Beyond

July 26, 2025

Deep dive into the evolution of normalization techniques in transformer-based LLMs, from the trusty LayerNorm to newer variants like RMSNorm, and even experimental tweaks.

Model Context Protocol (MCP): The USB-C of AI Integrations

June 16, 2025

A deep dive into the Model Context Protocol (MCP) - the open standard that lets large-language-model agents securely plug into any tool, database, or codebase.

RAG Techniques - OpenAI API + Qdrant

May 6, 2025

A guide to Retrieval-Augmented Generation, from simple to advanced techniques, using just the OpenAI API and Qdrant.

Training LLMs on GPUs

November 30, 2024

How GPUs are used to train Large Language Models (LLMs) and why they are so effective. How does TFLOPs

How AI monitors calls of 5,000+ sales representatives for actionable insights

November 23, 2024

A deep dive into the engineering behind scaling a production-ready, AI-powered sales monitoring system that uses a complex pipeline of LLMs, ASR, and diarization to monitor the calls of more than 5,000 salespeople.

The Path to the Ultimate Prize

November 22, 2024

A critique of the current AI narrative and its limitations, and a call for a more comprehensive approach to understanding and replicating human intelligence.

Vision Transformers

October 14, 2024

An in-depth introduction to Vision Transformers and their potential to revolutionize the field of computer vision.

PaliGemma: Versatile VLM - Vision Language Models

October 11, 2024

A deep dive into PaliGemma, Google's vision language model. Learn about its architecture, capabilities, and potential applications in AI and machine learning.

SigLIP: Sigmoid Loss in Language Image Pretraining

October 6, 2024

How SigLIP works and how it addresses the shortcomings of the contrastive loss used in Language-Image Pretraining (LIP).

Whisper: Transformer for Speech Recognition

May 7, 2024

Understand the Transformer model and its architecture for speech recognition. Learn how Transformers are used in ASR systems and how they compare to traditional RNNs and LSTMs

Phi 3: Highly Capable Language Model on Phone

May 2, 2024

Phi 3, developed by Microsoft AI, is a highly capable language model that can run on a phone

LLaVA: Large Multimodal Model

April 30, 2024

LLaVA is a multimodal large language model that combines a vision encoder with a language model. It is trained on a large-scale multimodal dataset and can generate text conditioned on images.

Llama 3: SOTA open-weights LLM

April 23, 2024

Llama 3 by Meta AI is trained on more than 15T tokens of data. It is currently the SOTA open-weights LLM available beating out the competition by a fair margin. It will be released in three different variants: 8B, 70B and 400B.

Grouped Query Attention

April 16, 2024

Grouped Query Attention is an attention variant in which groups of query heads share a single key/value head, shrinking the KV cache and speeding up inference in transformer models with little loss in quality.

RoPE: Rotary Positional Embedding

April 16, 2024

An in-depth look at RoPE, a positional encoding method that rotates query and key vectors by position-dependent angles so that attention scores capture the relative positions of tokens in a sequence.

Multi Query Attention

April 13, 2024

Understand the Multi Query Attention mechanism in the context of Transformers and how sharing a single key/value head across all query heads reduces memory use and speeds up decoding.

Gemma: Google's family of Open LLMs

April 12, 2024

Gemma is a state-of-the-art family of open models developed by Google. The model weights are openly available and can be used for a variety of tasks.

How Stable Diffusion works

April 11, 2024

A deep dive into the architecture and inner workings of Stable Diffusion, the generative image model developed by Stability AI.

CLIP: Bridging Vision and Language in AI

April 10, 2024

An in-depth, practical guide to how AI models bridge the gap between vision and language using contrastive learning, as seen in CLIP, released by OpenAI in 2021.

Mixtral of Experts

April 9, 2024

An in-depth guide to the architecture of Mistral AI's state-of-the-art Sparse Mixture of Experts (SMoE) model, Mixtral of Experts.

MoE: Mixture of Experts

April 8, 2024

An introduction to the Mixture of Experts (MoE) architecture for Large Language Models (LLMs). We explore how MoE improves LLM performance by routing inputs to specialized expert networks, which makes language processing more efficient and accurate.

RAG Triad - Evaluating the Quality of Responses from LLMs

April 8, 2024

RAG has become the standard architecture for providing LLMs with relevant context to reduce hallucinations and improve performance. In this blog post, we look at the RAG triad and how it can be used to evaluate the quality of responses from LLMs.

DDPM: Denoising Diffusion Probabilistic Models

March 23, 2024

A deep dive into the basic building blocks of Denoising Diffusion Probabilistic Models, with a code implementation in PyTorch.

GPT-2: A Deep Dive

March 22, 2024

An in-depth look at GPT-2, a large language model developed by OpenAI, and its architecture, training, and applications.

BERT: Bidirectional Encoder Representations from Transformers

March 21, 2024

Understand how the widely used BERT model works and how its architecture relates to the Transformer model. Also, learn how BERT is pre-trained and fine-tuned for various NLP tasks.

Transformers: Attention is All You Need

March 20, 2024

Transformers have become the building blocks of many state-of-the-art AI models. This post provides a detailed explanation of the Transformer architecture which was introduced in the paper 'Attention is All You Need'

Tokenization in Large Language Models

March 19, 2024

A detailed guide to how tokenization works in large language models. This post covers the basics of tokenization, the different tokenization strategies, and how they are implemented in large language models.

VAE: Variational Autoencoder

March 18, 2024

Gentle introduction to Variational Autoencoder (VAE) - a type of generative model used to learn efficient representations of data by learning the parameters of a probability distribution representing the data. We explore the architecture, training process, and the reparameterization trick used in VAEs