docs: add Mixtral-8x22B, Phi-3, Qwen-1.5-110B model (#58)

junewgl 2024-05-03 20:12:53 +08:00 committed by GitHub
parent 3905589b7a
commit cbe14f023e
2 changed files with 21 additions and 0 deletions


@@ -32,6 +32,7 @@ We warmly welcome contributions from everyone, whether you've found a typo, a bu
## 📜 Contents
- [**Awesome Text2SQL**🎉🎉🎉](#awesome-text2sql)
- [🌱 How to Contribute](#-how-to-contribute)
- [🔔 Leaderboard](#-leaderboard)
- [📜 Contents](#-contents)
- [👋 Introduction](#-introduction)
- [📖 Survey](#-survey)
@@ -348,10 +349,18 @@ for Text-to-SQL
[[model](https://huggingface.co/openbmb)]
- 2024/02, ModelBest Inc. and TsinghuaNLP proposed the open-source LLM MiniCPM, an end-side LLM with only 2.4B parameters excluding embeddings (2.7B in total). It is worth noting that after SFT, MiniCPM performs very close to Mistral-7B on open general benchmarks, with better ability in Chinese, mathematics, and coding. Its overall performance exceeds Llama2-13B, MPT-30B, Falcon-40B, etc.
- Mixtral-8x22B [[paper](https://mistral.ai/news/mixtral-8x22b/)] [[code](https://docs.mistral.ai/getting-started/open_weight_models/)] [[model](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)]
- 2024/04, Mistral AI proposed its latest open model, Mixtral 8x22B, which sets a new standard for performance and efficiency within the AI community. It is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size (a toy sketch of the sparse-routing idea follows this list).
- Phi-3 [[paper](https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/)] [[model](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)]
- 2024/04, Microsoft proposed the Phi-3 models, the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks. Phi-3-mini is available in two context-length variants, 4K and 128K tokens, and is the first model in its class to support a context window of up to 128K tokens with little impact on quality. Phi-3-small (7B) and Phi-3-medium (14B) will be available in the Azure AI model catalog and other model gardens shortly.
- Llama 3 [[paper](https://ai.meta.com/blog/meta-llama-3/)] [[code](https://github.com/meta-llama/llama3)] [[model](https://huggingface.co/meta-llama)]
- 2024/04, Meta AI proposed Llama 3, the third generation of its open-source Llama series. The model comes in two parameter sizes, 8B and 70B, each with base and instruct versions. Excitingly, the Llama 3 models are a major leap over Llama 2 and establish a new state of the art for LLMs at those scales.
- Qwen-1.5-110B [[paper](https://qwenlm.github.io/blog/qwen1.5-110b/)] [[code](https://github.com/QwenLM/Qwen1.5)] [[model](https://huggingface.co/Qwen/Qwen1.5-110B)]
- 2024/04, Alibaba Cloud proposed Qwen1.5-110B, the first 100B+ model of the Qwen1.5 series. It achieves performance comparable to Meta-Llama3-70B in base-model evaluations and outstanding performance in chat evaluations, including MT-Bench and AlpacaEval 2.0. Qwen1.5 is the beta version of Qwen2 and comes in 9 model sizes: 0.5B, 1.8B, 4B, 7B, 14B, 32B, 72B, and 110B dense models, plus a 14B MoE model with 2.7B activated parameters. A minimal loading sketch for these newly added checkpoints follows this list.
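The Mixtral 8x22B entry above hinges on sparse Mixture-of-Experts routing: only a small subset of expert parameters runs for each token, which is why 39B of 141B parameters are active. The toy PyTorch sketch below illustrates that idea only; the layer sizes, expert count, and top-2 routing are assumptions for the example, not Mixtral's actual configuration.

```python
# Toy top-2 sparse MoE layer: each token is routed to 2 of 8 experts, so the
# parameters touched per token are far fewer than the layer's total parameters.
# Illustrative sketch only, not Mixtral's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySparseMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.router(x)                 # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):          # only the selected experts run
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

moe = ToySparseMoE()
print(moe(torch.randn(4, 64)).shape)            # torch.Size([4, 64])
```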
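The checkpoints added in this commit are all published on the Hugging Face Hub, so a minimal loading sketch with the transformers library looks the same for each of them. The model IDs below come from the links above; the dtype/device settings, the trust_remote_code flag, and the prompt are assumptions for the example (Mixtral-8x22B and Qwen1.5-110B additionally need multi-GPU memory or offloading).

```python
# Minimal sketch: load one of the newly listed checkpoints and generate text.
# Assumes transformers (and accelerate for device_map="auto") are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"   # or "mistral-community/Mixtral-8x22B-v0.1",
                                                  #    "Qwen/Qwen1.5-110B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the checkpoint's native precision
    device_map="auto",       # spread layers across available devices
    trust_remote_code=True,  # Phi-3 ships custom modeling code on the Hub
)

prompt = "Write a SQL query that lists all users older than 30."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```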
## 💡 Fine-tuning
- P-Tuning [[paper](https://arxiv.org/pdf/2103.10385.pdf)] [[code](https://github.com/THUDM/P-tuning)]


@@ -35,6 +35,7 @@
## 📜 Contents
- [**Awesome Text2SQL**🎉🎉🎉](#awesome-text2sql)
- [🌱 How to Contribute](#-如何贡献)
- [🔔 Leaderboard](#-排行榜)
- [📜 Contents](#-目录)
- [👋 Introduction](#-简介)
- [📖 Survey](#-综述)
@@ -361,9 +362,20 @@ for Text-to-SQL
[[model](https://huggingface.co/openbmb)]
- 2024/02, ModelBest Inc. and the TsinghuaNLP lab open-sourced the MiniCPM series of end-side LLMs. The main language model, MiniCPM-2B, has only 2.4B non-embedding parameters (2.7B in total). Notably, after SFT, MiniCPM performs close to Mistral-7B on public comprehensive benchmarks, with better Chinese, math, and coding ability, and its overall performance exceeds Llama2-13B, MPT-30B, Falcon-40B, and other models.
- Mixtral-8x22B [[paper](https://mistral.ai/news/mixtral-8x22b/)] [[code](https://docs.mistral.ai/getting-started/open_weight_models/)] [[model](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)]
- 2024/04, Mistral AI proposed its latest open model, Mixtral 8x22B, which sets a new standard for performance and efficiency within the AI community. It is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size.
- Phi-3 [[paper](https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/)] [[model](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)]
- 2024/04, Microsoft proposed the Phi-3 models, the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks. Phi-3-mini is available in two context-length variants, 4K and 128K tokens, and is the first model in its class to support a context window of up to 128K tokens with little impact on quality. Phi-3-small (7B) and Phi-3-medium (14B) will be available in the Azure AI model catalog and other model gardens shortly.
- Llama 3 [[paper](https://ai.meta.com/blog/meta-llama-3/)] [[code](https://github.com/meta-llama/llama3)] [[model](https://huggingface.co/meta-llama)]
- 2024/04, Meta AI proposed Llama 3, the third generation of its open-source Llama series. The model comes in two parameter sizes, 8B and 70B, each with base and instruct versions. Excitingly, the Llama 3 models are a major leap over Llama 2 and establish a new state of the art for LLMs at those scales.
- Qwen-1.5-110B [[paper](https://qwenlm.github.io/blog/qwen1.5-110b/)] [[code](https://github.com/QwenLM/Qwen1.5)] [[model](https://huggingface.co/Qwen/Qwen1.5-110B)]
- 2024/04, Alibaba Cloud proposed Qwen1.5-110B, the first 100B+ model of the Qwen1.5 series. It achieves performance comparable to Meta-Llama3-70B in base-model evaluations and performs well in chat evaluations, including MT-Bench and AlpacaEval 2.0. Qwen1.5 is the beta version of Qwen2 and comes in 9 model sizes: 0.5B, 1.8B, 4B, 7B, 14B, 32B, 72B, and 110B dense models, plus a 14B MoE model with 2.7B activated parameters.
## 💡 Fine-tuning
- P-Tuning [[paper](https://arxiv.org/pdf/2103.10385.pdf)] [[code](https://github.com/THUDM/P-tuning)]
- 2021/03, Tsinghua University and collaborators proposed P-Tuning, a fine-tuning method for large models that uses trainable continuous prompt embeddings to reduce the cost of fine-tuning (a minimal sketch follows this entry).
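The sketch below illustrates the continuous-prompt idea behind P-Tuning using the Hugging Face peft library's prompt-encoder support; the base model (gpt2) and the hyperparameters are illustrative choices for the example, not values from the paper.

```python
# Sketch: P-Tuning with peft. A small prompt encoder produces trainable
# continuous prompt embeddings that are prepended to the input, while the
# base model's weights stay frozen. Base model and settings are illustrative.
from transformers import AutoModelForCausalLM
from peft import PromptEncoderConfig, get_peft_model

base_id = "gpt2"  # small stand-in base model for the sketch
model = AutoModelForCausalLM.from_pretrained(base_id)

config = PromptEncoderConfig(
    task_type="CAUSAL_LM",
    num_virtual_tokens=20,    # length of the trainable continuous prompt
    encoder_hidden_size=128,  # hidden size of the prompt encoder
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the prompt-encoder parameters train
```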