Qwen-72B

Maintainer: Qwen

Total Score

324

Last updated 5/28/2024

Property     Value
Model Link   View on HuggingFace
API Spec     View on HuggingFace
Github Link  No Github link provided
Paper Link   No paper link provided

Model overview

Qwen-72B is the 72B-parameter model in Qwen (abbr. Tongyi Qianwen), the large language model series proposed by Alibaba Cloud. It is a Transformer-based large language model pretrained on a large volume of data, including web text, books, and code. Building on the pretrained Qwen-72B, the Qwen team also released Qwen-72B-Chat, an AI assistant trained with alignment techniques.

Key features of Qwen-72B include:

  1. Large-scale, high-quality training corpora: It is pretrained on over 3 trillion tokens of Chinese, English, and multilingual text, code, and mathematics, covering general and professional fields.
  2. Competitive performance: According to the maintainers, it significantly outperforms existing open-source models on multiple Chinese and English downstream evaluation tasks.
  3. More comprehensive vocabulary coverage: Qwen-72B uses a vocabulary of over 150K tokens, larger than that of many comparable models, which allows more efficient encoding of multiple languages.
  4. Longer context support: Qwen-72B supports a context length of up to 32k tokens.

Model inputs and outputs

Inputs

  • Text: Qwen-72B accepts text inputs in a variety of languages, including Chinese, English, and others.

Outputs

  • Text: Qwen-72B generates fluent and coherent text outputs in response to the input, drawing upon its broad knowledge base.
  • Code: In addition to natural language, Qwen-72B can also generate code in various programming languages.
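
The text-in, text-out interface described above can be sketched with the Hugging Face `transformers` library. This is a minimal sketch under stated assumptions, not the maintainers' reference code: the repository name `Qwen/Qwen-72B`, the helper name `complete`, and the default of 64 new tokens are chosen here for illustration. Running it requires `transformers`, `torch`, and `accelerate`, plus enough accelerator memory to hold a 72B-parameter model.

```python
MODEL_ID = "Qwen/Qwen-72B"  # assumed Hugging Face repository name


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Return the model's continuation of `prompt` (hypothetical helper)."""
    # Imported lazily so that defining this function does not require the
    # heavy dependencies to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # trust_remote_code is needed when a repository ships its own model code.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        device_map="auto",  # let accelerate place the weights on devices
        trust_remote_code=True,
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Because Qwen-72B is a pretrained (base) model, a call such as `complete("The capital of France is")` continues the text rather than answering in a chat format.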

Capabilities

Qwen-72B performs well on a wide range of tasks, including commonsense reasoning, language understanding, mathematical problem-solving, and code generation. For example, the maintainers report strong results on benchmarks such as MMLU, C-Eval, and HumanEval, where it outperforms many other large language models of similar or even larger scale.

What can I use it for?

With its broad capabilities, Qwen-72B can be leveraged for a variety of applications, such as:

  • Content creation: Generating high-quality text, articles, stories, and dialogues in multiple languages.
  • Conversational AI: Powering intelligent chatbots and virtual assistants with advanced language understanding and generation abilities.
  • Code generation and programming: Assisting developers with tasks like code completion, refactoring, and even full-fledged program generation.
  • Multilingual applications: Building applications that handle, and translate between, multiple languages.

Things to try

One interesting aspect of Qwen-72B is its ability to handle long-form text and extended context. You could try generating coherent and relevant output based on lengthy prompts or multi-turn dialogues, exploring how the model maintains context and produces consistent responses over time.
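
When experimenting with long prompts or multi-turn dialogues, it helps to keep the conversation inside the context window. The sketch below keeps the most recent turns that fit a token budget. It is an illustration only: the limit of 32 * 1024 tokens is an assumed reading of "32k", `fit_turns` is a hypothetical helper, and the whitespace counter is a crude stand-in for the model's real tokenizer.

```python
from typing import Callable, List

CONTEXT_LIMIT = 32 * 1024  # assumed interpretation of "up to 32k tokens"


def whitespace_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_turns(
    turns: List[str],
    reserve_for_output: int = 1024,
    limit: int = CONTEXT_LIMIT,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[str]:
    """Keep the most recent turns whose total size fits the budget."""
    budget = limit - reserve_for_output
    kept: List[str] = []
    used = 0
    # Walk backwards so the newest turns are kept first.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

In practice, `count_tokens` would be replaced with a function that measures length using the model's own tokenizer, since word counts underestimate token counts for most text.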

Another area to experiment with is the model's code generation capabilities. You could provide Qwen-72B with programming prompts or partially completed code snippets and observe how it can extend and refine the code to solve specific tasks or implement desired functionalities.
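
One way to set up such a code-completion experiment is to hand the model an unfinished function and cut its continuation off once it starts writing unrelated code. The sketch below shows that pattern. The helper names and the list of stop markers are assumptions for illustration, not part of any Qwen tooling.

```python
from typing import Sequence

# Markers that usually signal the model has moved past the function body.
DEFAULT_STOPS = ("\ndef ", "\nclass ", "\nif __name__", "\nprint(")


def completion_prompt(signature: str, docstring: str) -> str:
    """Build a prompt that ends where the function body should begin."""
    return f'{signature}\n    """{docstring}"""\n'


def truncate_at_stop(completion: str, stops: Sequence[str] = DEFAULT_STOPS) -> str:
    """Cut `completion` at the earliest stop marker, if any is present."""
    cut = len(completion)
    for stop in stops:
        index = completion.find(stop)
        if index != -1:
            cut = min(cut, index)
    return completion[:cut]
```

The prompt from `completion_prompt("def add(a, b):", "Return a + b.")` would be sent to the model, and `truncate_at_stop` applied to whatever text comes back before the result is appended to the prompt and executed or reviewed.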



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models

Qwen-72B-Chat

Qwen

Total Score

138

The Qwen-72B-Chat model is a 72B-parameter version of the large language model series called Qwen, proposed by Alibaba Cloud. It is a Transformer-based large language model that has been pre-trained on a vast amount of data, including web texts, books, code, and more. Based on the pre-trained Qwen-72B model, the Qwen-72B-Chat model has been further trained using alignment techniques to create a large-model-based AI assistant.

The Qwen-72B model features a large-scale, high-quality training corpus of over 3 trillion tokens, covering Chinese, English, multilingual texts, code, and mathematics. It demonstrates competitive performance, surpassing existing open-source models on a variety of Chinese and English downstream evaluation tasks. The model also has more comprehensive vocabulary coverage, using over 150K tokens, which makes it friendlier to multiple languages. Additionally, it supports a longer context length of up to 32k tokens.

Model inputs and outputs

Inputs

  • Text: The Qwen-72B-Chat model takes text input, such as prompts or conversations, and generates relevant responses.

Outputs

  • Text: The model generates text output in response to the input, which can be used for a variety of language-related tasks, such as chatbots, content generation, and question answering.

Capabilities

The Qwen-72B-Chat model demonstrates strong performance on a wide range of tasks, including commonsense reasoning, mathematical problem-solving, and code generation. It can also handle long-context understanding and tool usage, such as calling plugins and APIs through ReAct Prompting.

What can I use it for?

The Qwen-72B-Chat model can be used for a variety of natural language processing tasks, such as building chatbots, generating content, and assisting with research and analysis. Its large-scale training and strong capabilities make it a powerful tool for developers and researchers working on language-related projects.

Things to try

One interesting aspect of the Qwen-72B-Chat model is its ability to handle long-context understanding. The summary credits techniques like NTK-aware interpolation and LogN attention scaling for its long-context support (up to 32k tokens, as noted above), which makes it suitable for tasks that require processing and generating long-form text. Developers can explore how to use this capability to build more sophisticated language applications.

Another area to experiment with is the model's tool usage capabilities, which allow it to call external plugins and APIs through ReAct Prompting. Developers can create custom plugins and integrate them with the Qwen-72B-Chat model to expand its functionality and enable it to perform a wider range of tasks.
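
The two long-context techniques named above can be illustrated numerically. The sketch below follows one common formulation of each idea and is not taken from the Qwen source code: LogN scaling multiplies attention queries by log(position) / log(training length) once the position exceeds the training length, and NTK-aware interpolation enlarges the rotary embedding base by a factor alpha^(d / (d - 2)). The function names and all example values are assumptions for illustration.

```python
import math
from typing import List


def logn_scale(position: int, train_len: int) -> float:
    """LogN attention scale for a query at 1-based `position`."""
    if position <= train_len:
        return 1.0  # no scaling inside the training length
    return math.log(position) / math.log(train_len)


def ntk_scaled_base(base: float, alpha: float, head_dim: int) -> float:
    """NTK-aware interpolation: enlarge the rotary base by alpha^(d/(d-2))."""
    return base * alpha ** (head_dim / (head_dim - 2))


def rotary_inv_freq(base: float, head_dim: int) -> List[float]:
    """Inverse frequencies of rotary position embeddings for one head."""
    return [base ** (-i / head_dim) for i in range(0, head_dim, 2)]
```

With `alpha = 1.0` the base is unchanged. Larger values of `alpha` lower the slowest rotary frequencies, so positions beyond the training length map to rotation angles closer to those seen during training.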

Qwen-7B

Qwen

Total Score

349

Qwen-7B is the 7B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, and code. Additionally, based on the pretrained Qwen-7B, the maintainers release Qwen-7B-Chat, a large-model-based AI assistant, which is trained with alignment techniques.

Qwen-7B significantly surpasses existing open-source models of similar scale on multiple Chinese and English downstream evaluation tasks, and even outperforms some larger-scale models in several benchmarks. Compared to other open-source models, Qwen-7B uses a more comprehensive vocabulary of over 150K tokens, which is friendlier to multiple languages.

Model inputs and outputs

Inputs

  • Text prompt: Qwen-7B accepts text prompts as input to generate output text.

Outputs

  • Generated text: Qwen-7B generates relevant text based on the input prompt.

Capabilities

Qwen-7B demonstrates strong performance on a variety of benchmarks, including commonsense reasoning, coding, mathematics, and more. The model is also capable of engaging in open-ended conversation through the Qwen-7B-Chat version.

What can I use it for?

Qwen-7B and Qwen-7B-Chat can be used for a wide range of natural language processing tasks, such as text generation, question answering, and language understanding. The large-scale pretraining and strong performance make these models suitable for tasks like content creation, customer service chatbots, and even code generation. The maintainers also provide an API for users to integrate the models into their applications.

Things to try

Given Qwen-7B's strong performance on benchmarks, users can experiment with fine-tuning the model on specialized datasets to further enhance its capabilities for specific domains or tasks. The maintainers also provide intermediate checkpoints from the pretraining process, which can be used to study the model's learning dynamics. Additionally, the quantized versions of Qwen-7B-Chat offer improved inference speed and memory usage, making them suitable for deployment in resource-constrained environments.

Qwen-7B-Chat

Qwen

Total Score

742

Qwen-7B-Chat is a large language model developed by Qwen, a team from Alibaba Cloud. It is a transformer-based model that has been pretrained on a large volume of data including web texts, books, and code. Qwen-7B-Chat is an aligned version of the Qwen-7B model, trained using techniques to improve the model's conversational abilities.

Compared to similar models like Baichuan-7B, Qwen-7B-Chat leverages the Qwen model series which has been optimized for both Chinese and English. The model achieves strong performance on standard benchmarks like C-EVAL and MMLU. Unlike LLaMA, which prohibits commercial use, Qwen-7B-Chat has a more permissive open-source license that allows for commercial applications.

Model inputs and outputs

Inputs

  • Text prompts: Qwen-7B-Chat accepts text prompts as input, which can be used to initiate conversations or provide instructions for the model.

Outputs

  • Text responses: The model generates coherent and contextually relevant text responses based on the input prompts. The responses aim to be informative, engaging, and helpful for the user.

Capabilities

Qwen-7B-Chat demonstrates strong performance across a variety of natural language tasks, including open-ended conversations, question answering, summarization, and even code generation. The model can engage in multi-turn dialogues, maintain context, and provide detailed and thoughtful responses.

For example, when prompted with "Tell me about the history of the internet", Qwen-7B-Chat is able to provide a comprehensive overview covering the key developments and milestones in the history of the internet, drawing upon its broad knowledge base.

What can I use it for?

Qwen-7B-Chat can be a valuable tool for a wide range of applications, including:

  • Conversational AI assistants: The model's strong conversational abilities make it well-suited for building engaging and intelligent virtual assistants that can help with a variety of tasks.
  • Content generation: Qwen-7B-Chat can be used to generate high-quality text content, such as articles, stories, or even marketing copy, by providing relevant prompts.
  • Chatbots and customer service: The model's ability to understand and respond to natural language queries makes it a good fit for building chatbots and virtual customer service agents.
  • Educational applications: Qwen-7B-Chat can be used to create interactive learning experiences, answer questions, and provide explanations on a variety of topics.

Things to try

One interesting aspect of Qwen-7B-Chat is its ability to engage in open-ended conversations and provide detailed, contextually relevant responses. For example, try prompting the model with a more abstract or philosophical question, such as "What is the meaning of life?" or "How can we achieve true happiness?" The model's responses can provide interesting insights and perspectives, showcasing its depth of understanding and reasoning capabilities.

Another area to explore is the model's ability to handle complex tasks, such as providing step-by-step instructions for a multi-part process or generating coherent and logical code snippets. By testing the model's capabilities in these more challenging areas, you can gain a better understanding of its strengths and limitations.

Qwen-14B

Qwen

Total Score

197

Qwen-14B is the 14B-parameter version of the large language model series, Qwen (abbr. Tongyi Qianwen), proposed by Alibaba Cloud. Qwen-14B is a Transformer-based large language model, which is pretrained on a large volume of data, including web texts, books, and code. Additionally, based on the pretrained Qwen-14B, the maintainers release Qwen-14B-Chat, a large-model-based AI assistant, which is trained with alignment techniques.

Qwen-14B features a large-scale, high-quality training corpus of over 3 trillion tokens, covering Chinese, English, multilingual texts, code, and mathematics. It significantly surpasses existing open-source models of similar scale on multiple Chinese and English downstream evaluation tasks. Qwen-14B also uses a more comprehensive vocabulary of over 150K tokens, enabling users to directly enhance capabilities for certain languages without expanding the vocabulary.

Model inputs and outputs

Inputs

  • Text: Qwen-14B accepts text input of up to 2048 tokens.

Outputs

  • Text: Qwen-14B generates text output in response to the input.

Capabilities

Qwen-14B demonstrates competitive performance across a range of benchmarks. On the C-Eval Chinese evaluation, it achieves 69.8% zero-shot and 71.7% 5-shot accuracy, outperforming similarly-sized models. On MMLU, its zero-shot and 5-shot English evaluation accuracy reaches 64.6% and 66.5% respectively. Qwen-14B also performs well on coding tasks, scoring 43.9% on the HumanEval zero-shot benchmark, and 60.1% on the zero-shot GSM8K mathematics evaluation.

What can I use it for?

The large scale and broad capabilities of Qwen-14B make it suitable for a variety of natural language processing tasks. Potential use cases include:

  • Content generation: Qwen-14B can be used to generate high-quality text on a wide range of topics, from creative writing to technical documentation.
  • Conversational AI: Building on the Qwen-14B-Chat model, developers can create advanced chatbots and virtual assistants.
  • Multilingual support: The model's comprehensive vocabulary allows it to handle multiple languages, enabling cross-lingual applications.
  • Code generation and reasoning: Qwen-14B's strong performance on coding and math tasks makes it useful for programming-related applications.

Things to try

One interesting aspect of Qwen-14B is its ability to handle long-form text. By incorporating techniques like NTK-aware interpolation and LogN attention scaling, the model can maintain strong performance even on sequences up to 32,768 tokens long. Developers could explore using this capability for tasks like long-form summarization or knowledge-intensive QA.

Another intriguing area to experiment with is Qwen-14B's tool usage capabilities. The model supports ReAct prompting, allowing it to interact with external plugins and APIs. This could enable the development of intelligent assistants that integrate diverse functionalities.
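
ReAct prompting asks the model to alternate between written reasoning and tool calls in a fixed text format, which the calling program then parses. The sketch below builds a generic ReAct-style prompt and extracts the requested tool call from the model's reply. The prompt wording, the helper names, and the `search` tool used in the usage note are hypothetical and may differ from the template the Qwen maintainers use.

```python
import re
from typing import Dict, Optional, Tuple


def react_prompt(question: str, tools: Dict[str, str]) -> str:
    """Build a generic ReAct-style prompt listing the available tools."""
    tool_lines = "\n".join(f"{name}: {desc}" for name, desc in tools.items())
    tool_names = ", ".join(tools)
    return (
        "Answer the following question as best you can. "
        "You have access to the following tools:\n"
        f"{tool_lines}\n\n"
        "Use the following format:\n"
        "Question: the input question\n"
        "Thought: what to do next\n"
        f"Action: one of [{tool_names}]\n"
        "Action Input: the input to the action\n"
        "Observation: the result of the action\n"
        "Final Answer: the answer to the question\n\n"
        f"Question: {question}\n"
        "Thought:"
    )


# Matches an "Action:" line immediately followed by an "Action Input:" line.
_ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.*)")


def parse_action(model_output: str) -> Optional[Tuple[str, str]]:
    """Return (tool name, tool input) if the model requested a tool call."""
    match = _ACTION_RE.search(model_output)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()
```

A driver loop would send the prompt to the model, call `parse_action` on the reply, run the named tool (for example a hypothetical `search` function), append the result as an `Observation:` line, and repeat until the reply contains `Final Answer:`.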
