chatglm3-6b

Maintainer: nomagick

Total Score: 14

Last updated: 6/29/2024
Model Link: View on Replicate
API Spec: View on Replicate
Github Link: View on Github
Paper Link: View on Arxiv


Model overview

ChatGLM3-6B is an open-source bilingual conversational language model co-developed by Zhipu AI and Tsinghua University's KEG Lab. Building upon the strengths of previous ChatGLM models in conversational fluency and low deployment barriers, ChatGLM3-6B introduces several key enhancements:

  • Stronger base model: ChatGLM3-6B is built on ChatGLM3-6B-Base, which is trained on a more diverse dataset with more training steps and a more robust training strategy. Evaluations across a range of datasets show that ChatGLM3-6B-Base has the strongest performance among base models under 10B parameters.
  • Expanded functionality: ChatGLM3-6B adopts a new prompt format that supports not only normal multi-turn conversations, but also native tool invocation (Function Call), code execution (Code Interpreter), and agent tasks.
  • Comprehensive open-source series: In addition to the conversational model ChatGLM3-6B, the team has also open-sourced the base model ChatGLM3-6B-Base, the long-text conversation model ChatGLM3-6B-32K, and the ChatGLM3-6B-128K model with further enhanced long-text understanding capabilities. All these weights are completely open for academic research and free commercial use after registration.

Model inputs and outputs

Inputs

  • Prompt: The input prompt for the model to generate a response. The prompt should be organized using the specific format described in the prompt guide.
  • Max Tokens: The maximum number of new tokens to generate in the response.
  • Temperature: A value controlling the randomness of the model's output. Higher values lead to more diverse but less coherent outputs.
  • Top P: A value controlling the diversity of the model's output. Lower values lead to more focused and less diverse outputs.

Outputs

The model will generate a response as a sequence of text tokens. The response can be used as the next part of a conversational exchange.
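As a concrete sketch of how these inputs fit together, the snippet below calls the model through the Replicate Python client. The snake_case parameter names mirror the inputs listed above, and the <|system|>/<|user|>/<|assistant|> role tokens follow the format described in ChatGLM3's prompt guide; treat both as assumptions to verify against the model page, and pin an exact model version in real use.

```python
# A minimal calling sketch via the Replicate Python client
# (pip install replicate; set REPLICATE_API_TOKEN). The role tokens follow
# ChatGLM3's documented prompt format, and the parameter names are assumed
# snake_case forms of the inputs listed above -- verify both, and pin an
# exact model version rather than the bare name used here.
import replicate

prompt = (
    "<|system|>\n"
    "You are a helpful bilingual assistant.\n"
    "<|user|>\n"
    "Explain what a function call is, in one sentence.\n"
    "<|assistant|>"
)

output = replicate.run(
    "nomagick/chatglm3-6b",
    input={
        "prompt": prompt,
        "max_tokens": 256,    # cap on new tokens generated
        "temperature": 0.75,  # higher -> more diverse, less coherent
        "top_p": 0.8,         # nucleus sampling threshold
    },
)

# The response streams back as a sequence of text tokens.
print("".join(output))
```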

Capabilities

ChatGLM3-6B has demonstrated strong performance across a wide range of tasks, including language understanding, reasoning, math, coding, and knowledge-intensive applications. Compared to previous ChatGLM models, it exhibits significant improvements in areas like long-form text generation, task-oriented dialogue, and multi-turn reasoning.

What can I use it for?

ChatGLM3-6B is a versatile model that can be applied to a variety of natural language processing tasks, such as:

  • Conversational AI: The model can be used to build intelligent conversational assistants that can engage in fluent and contextual dialogues.
  • Content Generation: The model can generate high-quality text for applications like creative writing, summarization, and question-answering.
  • Code Generation and Interpretation: The model can be used to generate, explain, and debug code across multiple programming languages.
  • Knowledge-Intensive Tasks: The model can be fine-tuned for tasks that require deep understanding of a specific domain, such as financial analysis, scientific research, or legal reasoning.

Things to try

Some key things to try with ChatGLM3-6B include:

  • Exploring the model's multi-turn dialogue capabilities: Engage the model in a back-and-forth conversation and see how it maintains context and coherence (see the sketch after this list).
  • Testing the model's reasoning and problem-solving skills: Prompt the model with math problems, logical puzzles, or open-ended questions that require thoughtful analysis.
  • Evaluating the model's code generation and interpretation abilities: Ask the model to write, explain, or debug code in various programming languages.
  • Experimenting with different prompting strategies: Try different prompt formats, styles, and tones to see how the model's outputs vary.
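For the multi-turn experiment in the first item, a minimal sketch follows, under the same assumptions as the earlier example (unpinned nomagick/chatglm3-6b reference and ChatGLM3's documented role tokens): each completed exchange is appended to the prompt so the model keeps the whole conversation in context.

```python
# A minimal multi-turn sketch: each completed exchange is appended to the
# prompt so the model sees the full conversation. Role-token format and
# the unpinned model reference are assumptions carried over from the
# earlier example.
import replicate

history = "<|system|>\nYou are a helpful assistant.\n"

def chat(user_msg: str) -> str:
    global history
    history += f"<|user|>\n{user_msg}\n<|assistant|>"
    reply = "".join(replicate.run(
        "nomagick/chatglm3-6b",
        input={"prompt": history, "max_tokens": 256},
    ))
    history += f"\n{reply}\n"  # keep the reply in context for later turns
    return reply

print(chat("Name a classic logic puzzle."))
print(chat("Now walk me through solving it step by step."))
```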

By pushing the boundaries of what the model can do, you can uncover its strengths, limitations, and unique capabilities, and develop innovative applications that leverage the power of large language models.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models


chatglm2-6b

Maintainer: nomagick

Total Score: 10

chatglm2-6b is an open-source bilingual (Chinese-English) chat model developed by the Tsinghua Natural Language Processing Group (THUDM). It is the second-generation version of the chatglm-6b model, building on the smooth conversation flow and low deployment threshold of the first-generation model while introducing several key improvements. Compared to chatglm-6b, chatglm2-6b has significantly stronger performance, with a 23% improvement on MMLU, 33% on CEval, 571% on GSM8K, and 60% on BBH. This is achieved by upgrading the base model to use the hybrid objective function of GLM, along with 1.4T tokens of bilingual pre-training and human preference alignment.

The model also features a longer context length, expanded from 2K in chatglm-6b to 32K, with an 8K context used during dialogue training, allowing for more coherent and contextual conversations. Additionally, chatglm2-6b incorporates Multi-Query Attention for more efficient inference, with a 42% speed increase over the first-generation model. Under INT4 quantization, the model can now support dialogue lengths of up to 8K tokens on 6GB GPUs, a significant improvement over the 1K limit of chatglm-6b.

Importantly, the chatglm2-6b model weights are completely open for academic research, and free commercial use is also allowed after completing a questionnaire. This aligns with the project's goal of driving the development of large language models in an open and collaborative manner.

Model inputs and outputs

Inputs

  • Prompt: The input text the model will use to generate a response. This can be a question, statement, or any other text the user wants the model to continue or generate.
  • Max Tokens: The maximum number of new tokens the model will generate in response to the prompt, up to a maximum of 32,768.
  • Temperature: A value between 0 and 1 that controls the randomness of the model's output. Lower temperatures produce more deterministic, coherent responses, while higher temperatures introduce more diversity and unpredictability.
  • Top P: A value between 0 and 1 that controls the amount of probability mass the model considers when generating text. Lower values produce more focused, deterministic responses, while higher values introduce more variation.

Outputs

  • Response: The text generated by the model in response to the input prompt. The model continues generating text until it reaches the specified max tokens or encounters an end-of-sequence token.

Capabilities

chatglm2-6b has demonstrated strong performance across a variety of benchmarks, with significant improvements over the first-generation chatglm-6b model. Key capabilities include:

  • Multilingual understanding and generation: The model understands and generates both Chinese and English text, making it useful for cross-lingual communication and applications.
  • Improved reasoning and task completion: Notable gains on MMLU, CEval, GSM8K, and BBH indicate better reasoning and task-completion abilities than chatglm-6b.
  • Longer context understanding: The expanded context length allows more coherent, contextual conversations that maintain relevant information over longer exchanges.
  • Efficient inference: Multi-Query Attention provides faster inference speeds and lower GPU memory usage, making the model more practical for real-world applications.

What can I use it for?
The open-source and multilingual nature of chatglm2-6b makes it a versatile model for a wide range of applications, including:

  • Chatbots and conversational agents: The model's strong dialogue capabilities make it well-suited for building chatbots and conversational AI assistants in both Chinese and English.
  • Content generation: chatglm2-6b can generate various types of text, such as articles, stories, or creative writing, in both languages.
  • Language understanding and translation: The model's bilingual capabilities can be leveraged for cross-lingual tasks like language understanding, translation, and multilingual information retrieval.
  • Task-oriented applications: The model's improved reasoning and task-completion abilities can be applied to domains such as question answering, problem-solving, and summarization.

Things to try

Some interesting things to explore with chatglm2-6b include:

  • Multilingual dialogues: Test the model's ability to switch seamlessly between Chinese and English, maintaining context and coherence across language boundaries.
  • Long-form text generation: Experiment with the expanded context length to see how the model handles generating coherent, cohesive multi-paragraph text.
  • Creative writing: Prompt the model with open-ended creative writing tasks, such as story starters or worldbuilding prompts, and see what unique ideas it generates.
  • Task-oriented prompts: Challenge the model with complex, multi-step tasks that require reasoning and problem-solving, and observe its performance.
  • Fine-tuning and adaptation: Explore ways to fine-tune or adapt chatglm2-6b for specific domains or use cases, leveraging its strong foundation to create more specialized AI assistants.

By experimenting with chatglm2-6b and exploring its capabilities, you can gain insight into the evolving landscape of large language models and find new and innovative ways to apply this powerful tool.
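For readers who want to try the INT4 setup described above locally, here is a rough sketch using Hugging Face transformers. The chat and quantize methods come from THUDM's trust_remote_code model code rather than the standard transformers API, so treat the exact call chain as an assumption to verify against the current chatglm2-6b README.

```python
# A minimal local-inference sketch, assuming the custom `chat`/`quantize`
# methods exposed by THUDM's remote code (verify against the model README).
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)

# INT4 quantization: the card above reports ~8K-token dialogues on 6GB GPUs.
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = model.quantize(4).cuda().eval()

# Multi-turn chat: `history` carries prior (question, answer) pairs.
response, history = model.chat(tokenizer, "What is Multi-Query Attention?", history=[])
print(response)

# A follow-up turn reuses the accumulated history for context.
response, history = model.chat(tokenizer, "Why does it speed up inference?", history=history)
print(response)
```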



qwen-14b-chat

Maintainer: nomagick

Total Score: 4

qwen-14b-chat is a Transformer-based large language model developed by Alibaba Cloud and published on Replicate by nomagick. It is the 14-billion-parameter chat version of the Qwen series, which also includes qwen-1.8b, qwen-7b, and qwen-72b. Like other models in the Qwen series, qwen-14b-chat was pretrained on a large corpus of web texts, books, and code.

qwen-14b-chat is an AI assistant model: it was further trained using alignment techniques like supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to make it better at open-ended dialogue and task completion. Similar to models like chatglm3-6b and chatglm2-6b, qwen-14b-chat can engage in natural conversations, answer questions, and help with a variety of tasks.

The qwen-14b base model was trained on over 3 trillion tokens of multilingual data, giving it broad knowledge and capabilities. qwen-14b-chat builds on this to become a versatile AI assistant, able to chat, create content, extract information, summarize, translate, code, solve math problems, and more. It can also use tools, act as an agent, and even function as a code interpreter.

Model inputs and outputs

Inputs

  • Prompt: The text prompt to be completed by the model. This should be formatted in the "chatml" format used by the Qwen models, which uses special tokens such as <|im_start|> and <|im_end|> to delineate conversational turns.
  • Top P: The top-p sampling parameter, which controls the amount of diversity in the generated text.
  • Max Tokens: The maximum number of new tokens to generate.
  • Temperature: The temperature parameter, which controls the randomness of the generated text.

Outputs

The model outputs a list of strings, where each string represents a continuation of the input prompt. The output is generated in a streaming fashion, so the full response can be observed incrementally.

Capabilities

qwen-14b-chat can engage in open-ended dialogue, answer questions, and assist with tasks like content creation, information extraction, summarization, translation, coding, and math problem solving. It also has the ability to use external tools, act as an agent, and function as a code interpreter. In benchmarks, qwen-14b-chat has demonstrated strong performance on MMLU, C-Eval, GSM8K, HumanEval, and long-context understanding, often outperforming other large language models of comparable size, and it has shown impressive capabilities in tool usage and code generation.

What can I use it for?

qwen-14b-chat is a versatile AI assistant that can be used for a wide range of applications, including:

  • AI-powered chatbots and virtual assistants: Build conversational AI agents that engage in natural dialogue, answer questions, and assist with tasks.
  • Content creation: Generate articles, stories, scripts, and other types of written content.
  • Language understanding and translation: Use the model's multilingual capabilities for tasks like text classification, sentiment analysis, and language translation.
  • Code generation and programming assistance: Integrate the model into your development workflow to generate code, explain programming concepts, and debug issues.
  • Research and education: Explore language models, test new AI techniques, and teach students about large language models.
Things to try

Some interesting things to try with qwen-14b-chat include:

  • Exploring the model's ability to follow different system prompts: qwen-14b-chat has been trained on a diverse set of system prompts, allowing it to roleplay, change its language style, and adjust its behavior to suit different tasks.
  • Integrating the model with external tools and APIs: Take advantage of its strong tool-usage capabilities by connecting it to various APIs and services through the ReAct prompting approach.
  • Pushing the model's limits on long-context understanding: The techniques used to extend its context length, such as NTK-aware interpolation and LogN attention scaling, make it well-suited for tasks that require processing long passages of text.
  • Experimenting with the quantized versions of the model: The Int4 and Int8 quantized variants of qwen-14b-chat offer improved inference speed and reduced memory usage while maintaining near-lossless performance.
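To make the chatml convention concrete, the sketch below assembles a multi-turn prompt in that format. The build_chatml_prompt helper is hypothetical, and the exact token layout should be verified against the Qwen prompt guide before relying on it.

```python
# A minimal chatml prompt-builder sketch (illustrative; verify the exact
# layout against the Qwen prompt guide before use).
def build_chatml_prompt(system: str, turns: list[tuple[str, str]], user_msg: str) -> str:
    """Assemble a chatml prompt from a system message, prior (user, assistant)
    turns, and a new user message, ending with an open assistant turn."""
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for user, assistant in turns:
        parts.append(f"<|im_start|>user\n{user}<|im_end|>")
        parts.append(f"<|im_start|>assistant\n{assistant}<|im_end|>")
    parts.append(f"<|im_start|>user\n{user_msg}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # the model completes from here
    return "\n".join(parts)

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    [("What is 2 + 2?", "2 + 2 = 4.")],
    "Now multiply that by 10.",
)
print(prompt)
```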



openchat_3.5-awq

Maintainer: nateraw

Total Score: 91

openchat_3.5-awq is an AWQ-quantized build of the open-source OpenChat 3.5 language model, maintained on Replicate by nateraw. It is part of the OpenChat library, a series of high-performing models fine-tuned with a strategy called C-RLFT, which allows the models to learn from mixed-quality data without explicit preference labels and to deliver performance on par with ChatGPT despite being relatively compact 7B models. The OpenChat models outperform other open-source alternatives like OpenHermes 2.5, OpenOrca Mistral, and Zephyr-β on various benchmarks, including reasoning, coding, and mathematical tasks. The latest version, openchat_3.5-0106, even surpasses ChatGPT (March) and Grok-1 on several key metrics.

Model inputs and outputs

Inputs

  • prompt: The input text prompt for the model to generate a response.
  • max_new_tokens: The maximum number of tokens the model should generate as output.
  • temperature: The value used to modulate the next-token probabilities.
  • top_p: A probability threshold for generating the output. If set below 1.0, only the smallest set of most probable tokens whose cumulative probability reaches top_p is kept for generation (nucleus filtering).
  • top_k: The number of highest-probability tokens to consider for generating the output. If greater than 0, only the top k tokens with the highest probability are kept (top-k filtering).
  • prompt_template: The template used to format the prompt. The input prompt is inserted into the template using the {prompt} placeholder.
  • presence_penalty: The penalty applied to tokens based on their presence in the generated text.
  • frequency_penalty: The penalty applied to tokens based on their frequency in the generated text.

Outputs

The model generates a sequence of tokens, which can be concatenated to form its response.

Capabilities

openchat_3.5-awq demonstrates strong performance in a variety of tasks, including:

  • Reasoning and coding: The model outperforms ChatGPT (March) and other open-source alternatives on coding and reasoning benchmarks like HumanEval, BBH MC, and AGIEval.
  • Mathematical reasoning: The model achieves state-of-the-art results on mathematical reasoning tasks like GSM8K, showcasing its ability to tackle complex numerical problems.
  • General language understanding: The model performs well on MMLU, a broad benchmark for general language understanding, indicating its versatility across diverse language tasks.

What can I use it for?

openchat_3.5-awq can be leveraged for a wide range of applications, such as:

  • Conversational AI: Deploy the model as a conversational agent that engages users in natural language interactions and provides helpful responses.
  • Content generation: Generate high-quality text, such as articles, stories, or creative writing, by fine-tuning on specific domains or datasets.
  • Task-oriented dialogue: Fine-tune the model for task-oriented dialogues, such as customer service, technical support, or virtual assistance.
  • Code generation: The model's strong performance on coding tasks makes it a valuable tool for automating code generation, programming assistance, or code synthesis.

Things to try

Here are some ideas for what you can try with openchat_3.5-awq:

  • Explore the model's capabilities: Test the model on a variety of tasks, such as open-ended conversations, coding challenges, or mathematical problems, to understand its strengths and limitations.
  • Fine-tune the model: Leverage the model's strong foundation by fine-tuning it on your specific dataset or domain to create a customized language model for your applications.
  • Combine it with other technologies: Integrate the model with other AI or automation tools, such as voice interfaces or robotic systems, to create more comprehensive and intelligent solutions.
  • Contribute to the open-source ecosystem: As an open-source model, you can explore ways to improve or extend the OpenChat library, such as by contributing to the codebase, providing feedback, or collaborating on research and development.
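To see how the sampling inputs listed above shape output, one approach is to run the same prompt under two settings through the Replicate Python client. The sketch below uses the model name as it appears on this page; in practice, pin an exact model version.

```python
# A sketch comparing sampling settings via the Replicate Python client
# (pip install replicate; set REPLICATE_API_TOKEN). The bare model name
# from this page is used as an assumption; pin an exact version in practice.
import replicate

base_input = {
    "prompt": "List three uses for a paperclip.",
    "max_new_tokens": 128,
}

# Conservative: low temperature, tight nucleus -> focused, repeatable output.
conservative = replicate.run(
    "nateraw/openchat_3.5-awq",
    input={**base_input, "temperature": 0.2, "top_p": 0.5, "top_k": 20},
)

# Exploratory: higher temperature, wide nucleus, mild penalties -> more variety.
exploratory = replicate.run(
    "nateraw/openchat_3.5-awq",
    input={**base_input, "temperature": 1.0, "top_p": 0.95,
           "presence_penalty": 0.5, "frequency_penalty": 0.5},
)

print("".join(conservative))
print("".join(exploratory))
```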



meta-llama-3-70b-instruct

Maintainer: meta

Total Score: 51.6K

meta-llama-3-70b-instruct is a 70 billion parameter language model from Meta that has been fine-tuned for chat completions. It is part of Meta's Llama series of language models, which also includes the meta-llama-3-8b-instruct, codellama-70b-instruct, meta-llama-3-70b, codellama-13b-instruct, and codellama-7b-instruct models.

Model inputs and outputs

meta-llama-3-70b-instruct is a text-based model, taking a prompt as input and generating text as output. The model has been specifically fine-tuned for chat completions, so it is well-suited to engaging in open-ended dialogue and responding to prompts in a conversational manner.

Inputs

  • Prompt: The text provided as input to the model, which it uses to generate a response.

Outputs

  • Generated text: The text the model outputs in response to the input prompt.

Capabilities

meta-llama-3-70b-instruct can engage in a wide range of conversational tasks, from open-ended discussion to task-oriented dialogue. It has been trained on a vast amount of text data, allowing it to draw on a deep knowledge base to provide informative and coherent responses. The model can also generate creative and imaginative text, making it well-suited to applications such as story writing and idea generation.

What can I use it for?

With its strong conversational abilities, meta-llama-3-70b-instruct can be used for a variety of applications, such as building chatbots, virtual assistants, and interactive educational tools. Businesses could leverage the model for customer service, while writers and content creators could use it to generate new ideas and narrative content. Researchers may also find the model useful for exploring topics in natural language processing and the capabilities of large language models.

Things to try

One interesting aspect of meta-llama-3-70b-instruct is its ability to engage in multi-turn dialogues and maintain context over the course of a conversation. Try prompting the model with an initial query and then continuing the dialogue, observing how it builds upon the previous context. Another interesting experiment is to provide the model with prompts that require reasoning or problem-solving and see how it responds.
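As a starting point for the multi-turn experiment described above, here is a sketch using the Replicate Python client's streaming interface. It assumes a recent client version that provides replicate.stream and Replicate's public meta/meta-llama-3-70b-instruct listing.

```python
# A sketch of the multi-turn experiment using the Replicate Python client
# (assumes a client version with replicate.stream and the public
# "meta/meta-llama-3-70b-instruct" listing).
import replicate

def ask(prompt: str) -> str:
    """Stream a completion, printing tokens as they arrive."""
    chunks = []
    for event in replicate.stream("meta/meta-llama-3-70b-instruct",
                                  input={"prompt": prompt}):
        text = str(event)
        print(text, end="", flush=True)
        chunks.append(text)
    print()
    return "".join(chunks)

# First turn, then a follow-up that folds the reply back into the prompt
# so the model can build on its own prior context.
first = ask("Suggest a setting for a short mystery story.")
ask(f"Earlier you suggested: {first}\nNow introduce the detective in that setting.")
```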
