Phi-3: Highly Capable Language Model on Phone

Technical Specifications

Microsoft is releasing Phi-3 in two versions: phi-3-mini and phi-3-small. Their technical specifications are as follows:

	phi-3-mini	phi-3-small
Model Parameters	3.8 billion	7 billion
Vocabulary Size	32,064	100,352
Hidden dimension	3072	4096
Layers [Transformer Blocks]	32	32
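
As a sanity check on the table, the phi-3-mini parameter count can be roughly reconstructed from the vocabulary size, hidden dimension and layer count. The sketch below assumes a Llama-style block (full multi-head attention plus a gated MLP), untied input and output embeddings, and an MLP inner width of 8192; none of these are stated above, and the function name is hypothetical. Norms and biases are ignored.

```python
def estimate_params(vocab, hidden, layers, mlp_inner, tied_embeddings=False):
    """Rough decoder-only transformer parameter count (ignores norms and biases)."""
    attention = 4 * hidden * hidden      # Q, K, V and output projections
    mlp = 3 * hidden * mlp_inner         # gated MLP: gate, up and down projections
    embeddings = vocab * hidden          # input token embedding table
    output_head = 0 if tied_embeddings else vocab * hidden
    return layers * (attention + mlp) + embeddings + output_head


# vocab, hidden and layers come from the table; mlp_inner=8192 is an assumption.
mini = estimate_params(vocab=32_064, hidden=3072, layers=32, mlp_inner=8192)
print(f"phi-3-mini estimate: {mini / 1e9:.2f}B parameters")  # prints 3.82B
```

Under these assumptions the estimate lands close to the 3.8 billion in the table. The same formula would not carry over directly to phi-3-small if its attention or MLP layout differs.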

Training

Phi-3 is trained on a large-scale dataset of 3.3T tokens. It continues the line of work started by the Phi-1 paper, “Textbooks Are All You Need” [4], which uses high-quality training data to improve the performance of small language models and deviate from standard scaling laws.

These methods allow phi-3-mini to reach the level of highly capable models such as GPT-3.5 or Mixtral with only 3.8B total parameters (Mixtral, for example, has 45B total parameters). The training data consists of heavily filtered web data (filtered according to “educational level”) from various open internet sources, as well as synthetic LLM-generated data.

Pre-training is performed in two disjoint, sequential phases. Phase-1 consists mostly of web sources aimed at teaching the model general knowledge and language understanding.

Phase-2 merges even more heavily filtered web data (a subset of the data used in Phase-1) with some synthetic data that teaches the model logical reasoning and various niche skills.

Model Architecture

Context length: The phi-3-mini model uses a transformer decoder-only architecture [Vaswani et al., 2], with a default context length of 4K. A long-context version, called phi-3-mini-128K, extends the context length to 128K via LongRoPE [3].
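
To give a feel for what the jump from 4K to 128K means at inference time, here is a rough estimate of the key-value cache a single sequence would need. It is only a sketch: it assumes standard multi-head attention (one key and one value vector of full hidden width per layer per token), 16-bit cache entries, and that 4K and 128K mean 4,096 and 131,072 tokens. The helper name is hypothetical; the layer count and hidden dimension come from the specifications table.

```python
def kv_cache_bytes(context_len, layers, hidden, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence.

    Assumes one key and one value vector of width `hidden` per layer per
    token, stored at `bytes_per_value` bytes each (2 bytes = 16-bit).
    """
    return 2 * layers * hidden * bytes_per_value * context_len


GIB = 1024 ** 3
for name, context_len in [("4K", 4 * 1024), ("128K", 128 * 1024)]:
    size = kv_cache_bytes(context_len, layers=32, hidden=3072)
    print(f"{name} context: {size / GIB:.1f} GiB of KV cache")
# prints 1.5 GiB for 4K and 48.0 GiB for 128K
```

Under these assumptions the cache grows linearly with context length, so a full 128K window costs 32 times as much memory as a 4K one. A cache stored at lower precision would shrink both figures proportionally.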

Inference on iPhone

Quantized phi-3-mini can generate more than 12 tokens per second on an iPhone 14 with the A16 Bionic chip.

Thanks to its small size, phi-3-mini can be quantized to 4 bits so that it occupies only ≈ 1.8GB of memory. When tested on an iPhone 14 with the A16 Bionic chip, running natively on-device and fully offline, the quantized model achieved more than 12 tokens per second.
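
The memory figure follows from simple arithmetic on the parameter count, as the sketch below shows. It counts weights only and assumes every weight is stored at exactly the stated bit width; real quantization formats also store scales and may keep some tensors at higher precision. The 300-token response length is an arbitrary illustration, and both function names are hypothetical.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Memory for the weights alone, in decimal gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def generation_seconds(num_tokens, tokens_per_second):
    """Time to emit num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second


params = 3.8e9  # phi-3-mini parameter count from the specifications table
print(f"16-bit weights: {weight_memory_gb(params, 16):.1f} GB")   # 7.6 GB
print(f" 4-bit weights: {weight_memory_gb(params, 4):.1f} GB")    # 1.9 GB
print(f"300 tokens at 12 tok/s: {generation_seconds(300, 12):.0f} s")  # 25 s
```

The 1.9 GB result is in decimal gigabytes, which is about 1.77 GiB and so consistent with the ≈ 1.8GB quoted above. At 12 tokens per second, a 300-token answer would take roughly 25 seconds to generate.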

References

[1] Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahmoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Masahiro Tanaka, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, Ziyi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. arXiv:2404.14219
[2] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Attention is All You Need. arXiv:1706.03762
[3] Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, Mao Yang. LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens. arXiv:2402.13753
[4] Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li. Textbooks Are All You Need. arXiv:2306.11644