docs: add Mixtral-8x22B, Phi-3, Qwen-1.5-110B model (#58)

junewgl 2024-05-03 20:12:53 +08:00 committed by GitHub
parent 3905589b7a
commit cbe14f023e
2 changed files with 21 additions and 0 deletions


@@ -32,6 +32,7 @@ We warmly welcome contributions from everyone, whether you've found a typo, a bu
## 📜 Contents
- [**Awesome Text2SQL**🎉🎉🎉](#awesome-text2sql)
- [🌱 How to Contribute](#-how-to-contribute)
- [🔔 Leaderboard](#-leaderboard)
- [📜 Contents](#-contents)
- [👋 Introduction](#-introduction)
- [📖 Survey](#-survey)
@@ -348,10 +349,18 @@ for Text-to-SQL
[[model](https://huggingface.co/openbmb)]
- 2024/02, ModelBest Inc. and TsinghuaNLP proposed the open-source LLM MiniCPM, an end-side LLM with only 2.4B parameters excluding embeddings (2.7B in total). It is worth noting that after SFT, MiniCPM performs very close to Mistral-7B on open general benchmarks, with better ability in Chinese, mathematics, and coding. Its overall performance exceeds Llama2-13B, MPT-30B, Falcon-40B, etc.
- Mixtral-8x22B [[paper](https://mistral.ai/news/mixtral-8x22b/)] [[code](https://docs.mistral.ai/getting-started/open_weight_models/)] [[model](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)]
- 2024/04, Mistral AI proposed its latest open model, Mixtral 8x22B, which sets a new standard for performance and efficiency within the AI community. It is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size (a toy sketch of the sparse-routing idea follows this list).
- Phi-3 [[paper](https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/)] [[model](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)]
- 2024/04, Microsoft proposed the Phi-3 models, the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks. Phi-3-mini is available in two context-length variants, 4K and 128K tokens, and is the first model in its class to support a context window of up to 128K tokens with little impact on quality. Phi-3-small (7B) and Phi-3-medium (14B) will be available in the Azure AI model catalog and other model gardens shortly.
- Llama 3 [[paper](https://ai.meta.com/blog/meta-llama-3/)] [[code](https://github.com/meta-llama/llama3)] [[model](https://huggingface.co/meta-llama)]
- 2024/04, Meta AI proposed Llama 3, the third generation of its open-source Llama series. The model comes in two parameter sizes, 8B and 70B, each with base and instruct versions. Excitingly, the Llama 3 models are a major leap over Llama 2 and establish a new state of the art for LLMs at those scales.
- Qwen-1.5-110B [[paper](https://qwenlm.github.io/blog/qwen1.5-110b/)] [[code](https://github.com/QwenLM/Qwen1.5)] [[model](https://huggingface.co/Qwen/Qwen1.5-110B)]
- 2024/04, Alibaba Cloud proposed Qwen1.5-110B, the first 100B+ model of the Qwen1.5 series. It achieves performance comparable to Meta-Llama3-70B in base-model evaluations and outstanding performance in chat evaluations, including MT-Bench and AlpacaEval 2.0. Qwen1.5 is the beta version of Qwen2 and comes in 9 model sizes: 0.5B, 1.8B, 4B, 7B, 14B, 32B, 72B, and 110B dense models, plus a 14B MoE model with 2.7B activated parameters. A minimal loading sketch for these newly added checkpoints follows this list.
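The Mixtral 8x22B entry above hinges on sparse Mixture-of-Experts routing: only a small subset of expert parameters runs for each token, which is why 39B of 141B parameters are active. The toy PyTorch sketch below illustrates that idea only; the layer sizes, expert count, and top-2 routing are assumptions for the example, not Mixtral's actual configuration.

```python
# Toy top-2 sparse MoE layer: each token is routed to 2 of 8 experts, so the
# parameters touched per token are far fewer than the layer's total parameters.
# Illustrative sketch only, not Mixtral's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySparseMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.router(x)                 # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):          # only the selected experts run
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

moe = ToySparseMoE()
print(moe(torch.randn(4, 64)).shape)            # torch.Size([4, 64])
```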
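The checkpoints added in this commit are all published on the Hugging Face Hub, so a minimal loading sketch with the transformers library looks the same for each of them. The model IDs below come from the links above; the dtype/device settings, the trust_remote_code flag, and the prompt are assumptions for the example (Mixtral-8x22B and Qwen1.5-110B additionally need multi-GPU memory or offloading).

```python
# Minimal sketch: load one of the newly listed checkpoints and generate text.
# Assumes transformers (and accelerate for device_map="auto") are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"   # or "mistral-community/Mixtral-8x22B-v0.1",
                                                  #    "Qwen/Qwen1.5-110B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the checkpoint's native precision
    device_map="auto",       # spread layers across available devices
    trust_remote_code=True,  # Phi-3 ships custom modeling code on the Hub
)

prompt = "Write a SQL query that lists all users older than 30."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```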
## 💡 Fine-tuning
- P-Tuning [[paper](https://arxiv.org/pdf/2103.10385.pdf)] [[code](https://github.com/THUDM/P-tuning)]


@@ -35,6 +35,7 @@
## 📜 Contents
- [**Awesome Text2SQL**🎉🎉🎉](#awesome-text2sql)
- [🌱 How to Contribute](#-如何贡献)
- [🔔 Leaderboard](#-排行榜)
- [📜 Contents](#-目录)
- [👋 Introduction](#-简介)
- [📖 Survey](#-综述)
@@ -361,9 +362,20 @@ for Text-to-SQL
[[model](https://huggingface.co/openbmb)]
- 2024/02, ModelBest Inc. and the TsinghuaNLP lab open-sourced the MiniCPM series of end-side LLMs. The main language model, MiniCPM-2B, has only 2.4B non-embedding parameters (2.7B in total). Notably, after SFT, MiniCPM performs close to Mistral-7B on public comprehensive benchmarks, with better Chinese, math, and coding ability, and its overall performance exceeds Llama2-13B, MPT-30B, Falcon-40B, and other models.
- Mixtral-8x22B [[paper](https://mistral.ai/news/mixtral-8x22b/)] [[code](https://docs.mistral.ai/getting-started/open_weight_models/)] [[model](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)]
- 2024/04, Mistral AI proposed its latest open model, Mixtral 8x22B, which sets a new standard for performance and efficiency within the AI community. It is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size.
- Phi-3 [[paper](https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/)] [[model](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)]
- 2024/04, Microsoft proposed the Phi-3 models, the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks. Phi-3-mini is available in two context-length variants, 4K and 128K tokens, and is the first model in its class to support a context window of up to 128K tokens with little impact on quality. Phi-3-small (7B) and Phi-3-medium (14B) will be available in the Azure AI model catalog and other model gardens shortly.
- Llama 3 [[paper](https://ai.meta.com/blog/meta-llama-3/)] [[code](https://github.com/meta-llama/llama3)] [[model](https://huggingface.co/meta-llama)]
- 2024/04, Meta AI proposed Llama 3, the third generation of its open-source Llama series. The model comes in two parameter sizes, 8B and 70B, each with base and instruct versions. Excitingly, the Llama 3 models are a major leap over Llama 2 and establish a new state of the art for LLMs at those scales.
- Qwen-1.5-110B [[paper](https://qwenlm.github.io/blog/qwen1.5-110b/)] [[code](https://github.com/QwenLM/Qwen1.5)] [[model](https://huggingface.co/Qwen/Qwen1.5-110B)]
- 2024/04, Alibaba Cloud proposed Qwen1.5-110B, the first 100B+ model of the Qwen1.5 series. It achieves performance comparable to Meta-Llama3-70B in base-model evaluations and performs well in chat evaluations, including MT-Bench and AlpacaEval 2.0. Qwen1.5 is the beta version of Qwen2 and comes in 9 model sizes: 0.5B, 1.8B, 4B, 7B, 14B, 32B, 72B, and 110B dense models, plus a 14B MoE model with 2.7B activated parameters.
## 💡 Fine-tuning
- P-Tuning [[paper](https://arxiv.org/pdf/2103.10385.pdf)] [[code](https://github.com/THUDM/P-tuning)]
- 2021/03, Tsinghua University and collaborators proposed P-Tuning, a fine-tuning method for large models that uses trainable continuous prompt embeddings to reduce the cost of fine-tuning (a minimal sketch follows this entry).
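The sketch below illustrates the continuous-prompt idea behind P-Tuning using the Hugging Face peft library's prompt-encoder support; the base model (gpt2) and the hyperparameters are illustrative choices for the example, not values from the paper.

```python
# Sketch: P-Tuning with peft. A small prompt encoder produces trainable
# continuous prompt embeddings that are prepended to the input, while the
# base model's weights stay frozen. Base model and settings are illustrative.
from transformers import AutoModelForCausalLM
from peft import PromptEncoderConfig, get_peft_model

base_id = "gpt2"  # small stand-in base model for the sketch
model = AutoModelForCausalLM.from_pretrained(base_id)

config = PromptEncoderConfig(
    task_type="CAUSAL_LM",
    num_virtual_tokens=20,    # length of the trainable continuous prompt
    encoder_hidden_size=128,  # hidden size of the prompt encoder
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the prompt-encoder parameters train
```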