What is Microsoft’s Phi-2? Even as 2023 draws to a close and the customary quiet of the winter holiday season sets in, the flow of news and announcements in generative AI remains unabated.
A noteworthy instance is the latest announcement from Microsoft Research, the forward-thinking division of the software giant. Today, it introduced the Phi-2 small language model (SLM), a text-to-text AI model explicitly designed to be “compact enough to run on a laptop or mobile device,” as outlined in a post on X.
At the same time, Phi-2, with its 2.7 billion parameters (the connections between artificial neurons), stands toe-to-toe with far larger models, including Meta’s Llama 2-7B and Mistral-7B, each of which has 7 billion parameters.
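Microsoft’s announcement does not include sample code, but a model of this size can typically be run locally with standard open-source tooling. The snippet below is a minimal sketch assuming the weights are published under the microsoft/phi-2 identifier on the Hugging Face Hub; the prompt, generation settings, and hardware handling are illustrative assumptions, not part of the announcement.

```python
# Minimal sketch: running a ~2.7B-parameter model locally.
# Assumes the weights are published as "microsoft/phi-2" on the Hugging Face Hub
# and that the `torch`, `transformers`, and `accelerate` packages are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"  # assumed checkpoint identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",  # place weights on a GPU if one is available, otherwise CPU
)

prompt = "Explain, step by step, why 17 is a prime number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In half precision, a 2.7-billion-parameter model needs roughly 5–6 GB of memory, which is what makes laptop-class hardware plausible for a model of this size.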
The Model Showcases Exemplary AI Performance
According to the blog post accompanying the Phi-2 release, Microsoft’s researchers found that the model outperforms Google’s recently announced Gemini Nano 2, a model roughly half a billion parameters larger. Notably, Phi-2 also shows lower levels of “toxicity” and bias in its responses than Llama 2.
Despite these promising results, there is a significant constraint on Phi-2, at least for now. It is licensed for research purposes only under a custom Microsoft Research License, which restricts it to “non-commercial, non-revenue generating, research purposes.” That means businesses hoping to build products on Phi-2 are currently unable to do so.
The exceptional performance of Phi-2, according to the company, is attributed to its training on meticulously curated, high-quality data designed to impart reasoning, knowledge, and common sense. This focus enables Phi-2 to extract meaningful insights from a smaller amount of information. Notably, Microsoft’s researchers implemented techniques allowing the incorporation of knowledge from smaller models.
What Sets Phi-2 Apart?
What sets Phi-2 apart is that it achieves this performance without relying on commonly employed alignment techniques such as reinforcement learning from human feedback (RLHF) or instruction fine-tuning, which are typically used to shape a model’s behaviour. Even without these methods, Phi-2 showed better results on bias and toxicity than other open-source models that do use them. Microsoft attributes this to the careful curation of its training data.
Phi-2 is the latest installment in a series of what Microsoft’s researchers call “small language models” (SLMs). The series began with Phi-1 earlier in the year, a 1.3-billion-parameter model fine-tuned for basic Python coding tasks. In September, the company followed up with Phi-1.5, also at 1.3 billion parameters but trained on new data sources, including synthetic texts generated with natural language processing.
Microsoft emphasises that Phi-2’s efficiency makes it an ideal platform for researchers exploring areas such as AI safety, interpretability, and the ethical development of language models.
Impressively Built Despite Its Tiny Size

Microsoft attributes the impressive performance of Phi-2 at its smaller scale to two key insights:
- Training Data Quality: The quality of training data significantly influences the capabilities of the model. Phi-2’s proficiency is a result of being trained on high-quality “textbook” data intentionally designed to teach reasoning, knowledge, and common sense. This focus allows Phi-2 to glean more meaningful insights from a smaller amount of data.
- Knowledge Embedding Techniques: Microsoft employed techniques such as embedding knowledge from smaller models to efficiently scale the insights of Phi-2. Building on the foundation of the 1.3 billion parameter Phi-1.5, methods like knowledge transfer were utilised to unlock the surprisingly robust abilities of the 2.7 billion parameter Phi-2 without requiring an exponential increase in the amount of data used for training. This approach highlights the efficiency gains achieved through strategic knowledge transfer within the model development process.
Taken together, these strategies let the 2.7-billion-parameter Phi-2 punch well above its size, building on the 1.3-billion-parameter Phi-1.5 and carefully curated “textbook” data rather than an exponential increase in training data (a generic illustration of the knowledge-transfer idea follows below).
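Microsoft has not published the exact knowledge-transfer recipe it used, but the broad idea of letting one model’s outputs guide another is often illustrated with teacher-student distillation. The sketch below is a generic PyTorch example under that assumption; the function name, hyperparameters, and the notion of a Phi-1.5-sized teacher are hypothetical and are not Phi-2’s actual training code.

```python
# Generic knowledge-distillation sketch (NOT Microsoft's actual Phi-2 recipe):
# one model's output distribution is used as a soft target to guide another.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft teacher targets with the usual hard-label cross-entropy loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale to keep gradient magnitudes comparable
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * soft + (1 - alpha) * hard

# Hypothetical usage inside a training step:
# with torch.no_grad():
#     teacher_logits = teacher(batch["input_ids"]).logits  # e.g. a Phi-1.5-sized model
# student_logits = student(batch["input_ids"]).logits
# loss = distillation_loss(student_logits, teacher_logits, batch["labels"])
# loss.backward()
```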