The AI Blog
My notes on AI research, reinforcement learning, and software engineering
Vision Transformers
An in-depth introduction to Vision Transformers and their potential to revolutionize the field of computer vision.
Paligemma: A Versatile Vision Language Model (VLM)
A deep dive into Paligemma, Google's vision language model. Learn about its architecture, capabilities, and potential applications in AI and machine learning.
SigLIP: Sigmoid Loss in Language Image Pretraining
How SigLIP works and how its sigmoid loss addresses the shortcomings of the contrastive loss used in language-image pretraining
Whisper: Transformer for Speech Recognition
Understand how Whisper applies the Transformer architecture to speech recognition. Learn how Transformers are used in ASR systems and how they compare to traditional RNNs and LSTMs
Phi 3: Highly Capable Language Model on Phone
Phi 3, developed by Microsoft AI, is a highly capable language model that can run on a phone
LLaVA: Large Multimodal Model
LLaVA is a large multimodal model that connects a vision encoder to a large language model. It is trained on a large-scale multimodal dataset and can generate text conditioned on images
Llama 3: SOTA open-weights LLM
Llama 3 by Meta AI is trained on more than 15T tokens of data. It is currently the SOTA open-weights LLM, beating the competition by a fair margin. It will be released in three variants: 8B, 70B, and 400B.
Grouped Query Attention
Grouped Query Attention is an attention mechanism in which groups of query heads share a single key and value head, reducing memory bandwidth during inference while retaining most of the quality of full multi-head attention.
RoPE: Rotary Positional Embedding
An in-depth look at RoPE, a positional encoding method that rotates query and key vectors by position-dependent sinusoidal angles so that attention scores capture the relative positions of tokens in a sequence.
Multi Query Attention
Understand the Multi Query Attention mechanism in the context of Transformers and how sharing a single key and value head across all query heads shrinks the KV cache and speeds up inference
Gemma: Google's family of Open LLMs
Gemma is a state-of-the-art family of open models developed by Google. Gemma models are released with open weights and can be used for a variety of tasks.
How Stable Diffusion works
A deep dive into the architecture and inner workings of Stable Diffusion, the revolutionary generative model developed by Stability AI
CLIP: Bridging Vision and Language in AI
An in-depth and practical guide to how generative AI models bridge the gap between vision and language using contrastive learning, as seen in CLIP, released by OpenAI in 2021.
Mixtral of Experts
An in-depth guide to the architecture of Mixtral of Experts, Mistral AI's state-of-the-art Sparse Mixture of Experts (SMoE) model
MoE: Mixture of Experts
An introduction to the Mixture of Experts (MoE) architecture, a cutting-edge approach that is reshaping Large Language Models (LLMs). We explore how MoE improves LLM performance by routing tokens to specialized expert networks, leading to more efficient and accurate language processing
RAG Triad - Evaluating Quality of Response from LLMs
RAG has become the standard architecture for providing LLMs with relevant context to reduce hallucinations and improve performance. In this blog post, we look at the RAG triad and how it can be used to evaluate the quality of responses from LLMs
DDPM: Denoising Diffusion Probabilistic Models
A deep dive into the basic building blocks of Denoising Diffusion Probabilistic Models, with a code implementation in PyTorch.
GPT-2: A Deep Dive
An in-depth look at GPT-2, a large language model developed by OpenAI, and its architecture, training, and applications.
BERT: Bidirectional Encoder Representations from Transformers
Understand how the widely used BERT model works and how its architecture relates to the Transformer. Also, learn how BERT is pre-trained and fine-tuned for various NLP tasks
Transformers: Attention is All You Need
Transformers have become the building blocks of many state-of-the-art AI models. This post provides a detailed explanation of the Transformer architecture, which was introduced in the paper 'Attention is All You Need'
Tokenization in Large Language Models
A detailed guide to how tokenization works in large language models. This post covers the basics of tokenization, the different tokenization strategies, and how they are implemented in practice
VAE: Variational Autoencoder
A gentle introduction to the Variational Autoencoder (VAE), a generative model that learns efficient representations of data by learning the parameters of a probability distribution over the data. We explore the architecture, the training process, and the reparameterization trick used in VAEs