MiniCPM-Embedding

Maintainer: openbmb

Total Score

207

Last updated 10/4/2024

🖼️

  • Run this model: Run on HuggingFace
  • API spec: View on HuggingFace
  • Github link: No Github link provided
  • Paper link: No paper link provided


Model overview

MiniCPM-Embedding is a bilingual and cross-lingual text embedding model developed by ModelBest Inc. and THUNLP. It is trained based on MiniCPM-2B-sft-bf16 and incorporates bidirectional attention and Weighted Mean Pooling. The model was trained on approximately 6 million examples, including open-source, synthetic, and proprietary data, to achieve exceptional Chinese and English retrieval capabilities as well as outstanding cross-lingual retrieval between the two languages.
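
The Weighted Mean Pooling mentioned above can be sketched as follows. The position-proportional weighting shown here is a common formulation of the technique; it is an assumption for illustration, not taken from the model's released code:

```python
import numpy as np

def weighted_mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Pool per-token embeddings into one sentence vector, weighting later tokens more.

    hidden_states: (seq_len, dim) token embeddings
    attention_mask: (seq_len,) 1 for real tokens, 0 for padding
    """
    # Linearly increasing weights (1, 2, ..., seq_len), zeroed out on padding.
    weights = np.arange(1, hidden_states.shape[0] + 1) * attention_mask
    weights = weights / weights.sum()
    return weights @ hidden_states  # shape (dim,)
```

With causal-to-bidirectional attention plus a pooling step like this, a decoder-style LLM can produce a single fixed-size embedding per input text.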

Model inputs and outputs

Inputs

  • Instruction: {{ instruction }} Query: {{ query }} - MiniCPM-Embedding supports query-side instructions in this format.
  • Query: {{ query }} - MiniCPM-Embedding also works in instruction-free mode.

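The two input templates above can be sketched as a small helper. This is a minimal illustration of the string formats listed; the actual embedding call is omitted:

```python
def format_query(query: str, instruction: str = "") -> str:
    """Build the query-side input for MiniCPM-Embedding.

    With an instruction, use the instructed template; with an empty
    instruction, fall back to instruction-free mode.
    """
    if instruction:
        return f"Instruction: {instruction} Query: {query}"
    return f"Query: {query}"
```

Document-side inputs typically omit the instruction, so only the query side changes between the two modes.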
Outputs

  • The model produces a dense embedding vector for each input text; these vectors can be compared (e.g. via cosine similarity) for retrieval and ranking.

Capabilities

MiniCPM-Embedding features exceptional capabilities in Chinese and English text retrieval, as well as outstanding cross-lingual retrieval between the two languages. This makes it a powerful tool for tasks that require understanding and retrieving information across multiple languages.

What can I use it for?

With its strong bilingual and cross-lingual text embedding abilities, MiniCPM-Embedding can be useful for a variety of applications, such as:

  • Cross-lingual information retrieval
  • Multilingual question answering
  • Bilingual document classification and clustering
  • Multilingual text summarization

Things to try

Explore the other models in the RAG toolkit series, such as MiniCPM-Reranker and MiniCPM3-RAG-LoRA, to see how they can be used in conjunction with MiniCPM-Embedding for more advanced retrieval and ranking tasks.
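
A retrieve-then-rerank pipeline of that kind can be sketched as below. The `embed` and `rerank_score` callables are stand-ins for the real MiniCPM-Embedding and MiniCPM-Reranker models (hypothetical interfaces, for illustration only):

```python
import numpy as np

def retrieve_then_rerank(query, docs, embed, rerank_score, k=3):
    """First stage: cosine similarity over embeddings selects top-k docs.
    Second stage: a cross-encoder-style scorer reorders those candidates."""
    q = embed(query)
    doc_vecs = np.stack([embed(d) for d in docs])
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    top = np.argsort(-sims)[:k]  # candidate indices from the embedding stage
    order = sorted(top, key=lambda i: -rerank_score(query, docs[i]))
    return [docs[i] for i in order]
```

The embedding stage keeps the expensive reranker off most of the corpus; only the top-k candidates are rescored.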



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models

👀

MiniCPM-2B-sft-fp32

openbmb

Total Score

296

MiniCPM-2B-sft-fp32 is an end-side large language model (LLM) developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. It is built upon the MiniCPM architecture and has achieved impressive performance, outperforming larger models such as Llama2-13B, MPT-30B, and Falcon-40B on various benchmarks, especially in Chinese, mathematics, and coding tasks. The model has also been fine-tuned using both SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization), further enhancing its capabilities.

Model inputs and outputs

Inputs

  • Natural language text: The model accepts natural language input for text generation tasks.

Outputs

  • Natural language text: The model generates coherent and contextually relevant text outputs.

Capabilities

MiniCPM-2B-sft-fp32 has demonstrated strong performance across a variety of tasks, including language understanding, generation, and reasoning. After SFT, the model performs very close to the larger Mistral-7B on open-source general benchmarks, with better abilities in Chinese, mathematics, and coding. Through DPO, it goes further, outperforming larger models such as Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1, and Zephyr-7B-alpha on the MTBench benchmark.

What can I use it for?

MiniCPM-2B-sft-fp32 can be used for a wide range of natural language processing tasks, such as text generation, language understanding, and even coding- and mathematics-related tasks. The model's compact size and high efficiency make it a suitable choice for deployment on mobile devices and in resource-constrained environments. Potential use cases include chatbots, virtual assistants, content generation, and task-oriented language models.

Things to try

One interesting aspect of MiniCPM-2B-sft-fp32 is its strong performance on Chinese, mathematics, and coding tasks. Developers could explore using the model for applications that require these specialized capabilities, such as AI-powered programming assistants or language models tailored for scientific and technical domains. Additionally, the model's efficient design and the availability of quantized versions, such as MiniCPM-2B-SFT/DPO-Int4, make it worth investigating for deployment on low-power devices or in edge computing scenarios.

Read more


🔮

MiniCPM-2B-dpo-bf16

openbmb

Total Score

43

MiniCPM-2B-dpo-bf16 is an end-side large language model (LLM) developed by ModelBest Inc. and TsinghuaNLP. It has only 2.4 billion parameters, excluding embeddings, making it an efficient model for deployment. Compared to larger models like Mistral-7B, Llama2-13B, MPT-30B, and Falcon-40B, MiniCPM-2B-dpo-bf16 achieves very close performance on open-source general benchmarks, with better abilities in Chinese, mathematics, and coding. After further training with DPO (Direct Preference Optimization), it outperforms larger models like Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1, and Zephyr-7B-alpha on the MTBench benchmark.

Model inputs and outputs

Inputs

  • Text: MiniCPM-2B-dpo-bf16 accepts text input for various natural language processing tasks.

Outputs

  • Text: The model generates human-like text responses based on the provided input.

(Image input and multimodal output are features of the related MiniCPM-V models, not of this text-only model.)

Capabilities

MiniCPM-2B-dpo-bf16 exhibits strong performance on a range of tasks, including open-domain question answering, textual entailment, sentiment analysis, and language generation. The model can also handle more specialized tasks like mathematical reasoning and coding problems.

What can I use it for?

MiniCPM-2B-dpo-bf16 can be used for a variety of applications, such as chatbots, virtual assistants, and content generation.

Things to try

One interesting aspect of MiniCPM-2B-dpo-bf16 is its ability to be deployed and run on smartphones, with a relatively high streaming output speed compared to human speech. This makes it a promising candidate for mobile applications that require real-time language understanding and generation. Additionally, the model's efficient training process, which needs only a single 1080/2080 GPU for parameter-efficient fine-tuning or a 3090/4090 GPU for full parameter fine-tuning, makes it an attractive option for teams and researchers with limited computational resources.

Read more


📶

MiniCPM3-4B

openbmb

Total Score

350

MiniCPM3-4B is the 3rd generation of the MiniCPM series of AI models developed by OpenBMB. Compared to earlier versions, MiniCPM3-4B has a more powerful and versatile skill set, supporting features like function calling and a code interpreter. Its overall performance surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, and is comparable to many recent 7B~9B models. MiniCPM3-4B has a 32k context window and is equipped with LLMxMapReduce, allowing it to theoretically handle unlimited context without requiring huge amounts of memory. This makes it a more capable and flexible language model than its predecessors.

Model inputs and outputs

Inputs

  • Text prompts: MiniCPM3-4B accepts text prompts as input to generate responses.
  • Chat messages: The model supports a chat format with user messages and previous responses.

Outputs

  • Text generations: The primary output of MiniCPM3-4B is generated text, ranging from short responses to longer, multi-sentence outputs.
  • Code output: In addition to text, the model can execute and return the results of code snippets included in the input.

Capabilities

MiniCPM3-4B has a diverse set of capabilities that make it a versatile language model. It can engage in open-ended conversations, answer questions, summarize information, and even complete coding tasks. The model's strong benchmark performance suggests it is comparable to many larger language models in general language understanding and generation.

What can I use it for?

With its broad capabilities, MiniCPM3-4B can be used for a variety of applications, including:

  • Chatbots and virtual assistants: The model's conversational abilities make it a good fit for chatbots and virtual assistants that engage in natural language interactions.
  • Content generation: MiniCPM3-4B can generate text for articles, stories, or creative writing.
  • Task automation: The model's ability to understand and execute code snippets allows it to automate various programming and coding tasks.
  • Question answering: The model can power systems that answer questions on a wide range of topics.

Things to try

One interesting thing to explore with MiniCPM3-4B is its ability to handle unlimited context through LLMxMapReduce. This could allow you to build applications that maintain coherent state and memory over long conversations or tasks. Additionally, the model's code interpretation capabilities open up the possibility of building systems that understand and respond to code-related prompts, making it a useful tool for developers and programmers.

Read more


🌀

MiniCPM-2B-sft-bf16

openbmb

Total Score

113

MiniCPM-2B-sft-bf16 is a large language model developed by OpenBMB and TsinghuaNLP, with only 2.4 billion parameters excluding embeddings. It is an "end-side" LLM, meaning it is designed for efficient deployment even on resource-constrained devices like smartphones. Compared to larger models like Mistral-7B, Llama2-13B, MPT-30B, and Falcon-40B, MiniCPM-2B-sft-bf16 achieves very close performance on open-source benchmarks after supervised fine-tuning (SFT), with better abilities in Chinese, mathematics, and coding. After further training with DPO (Direct Preference Optimization), the MiniCPM-2B model outperforms even larger models like Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1, and Zephyr-7B-alpha on the MTBench evaluation. The MiniCPM-V variant, based on the MiniCPM-2B architecture, achieves the best overall performance among multimodal models of a similar scale, surpassing existing large multimodal models like Phi-2 and even matching the performance of the 9.6B Qwen-VL-Chat model on some tasks.

Model inputs and outputs

Inputs

  • Text input for language understanding and generation tasks

Outputs

  • Generated text based on the input
  • Multimodal outputs (e.g. image captions, VQA) for the MiniCPM-V variant

Capabilities

MiniCPM-2B-sft-bf16 demonstrates strong performance across a variety of benchmarks, including open-domain language understanding, mathematics, coding, and Chinese language tasks. The MiniCPM-V variant extends these capabilities to multimodal tasks like image captioning and visual question answering.

One key advantage of the MiniCPM models is efficient deployment: they can run on devices as small as smartphones, with MiniCPM-V being the first multimodal model deployable on mobile phones. The models also have a low development cost, requiring only a single 1080/2080 GPU for parameter-efficient fine-tuning and a 3090/4090 GPU for full parameter fine-tuning.

What can I use it for?

The MiniCPM models are well-suited for a variety of natural language processing and multimodal applications, such as:

  • General language understanding and generation
  • Domain-specific applications (e.g. legal, medical, mathematical)
  • Multimodal tasks like image captioning and visual question answering
  • Conversational AI and virtual assistants
  • Mobile and edge computing applications

Thanks to their efficient design and deployment, the MiniCPM models can be particularly useful in resource-constrained environments or for applications that require low latency, such as on-device inference.

Things to try

One interesting aspect of the MiniCPM models is their strong performance on Chinese language tasks in addition to English, which makes them a compelling choice for multilingual applications. The MiniCPM-V variant's strong multimodal performance, combined with its efficient deployment, opens up opportunities for applications that integrate vision and language, such as mobile visual question answering or image-guided dialogue systems. Researchers and developers may also want to explore the technical details of the MiniCPM models, such as the use of supervised fine-tuning and DPO, to better understand how to build performant and efficient large language models.

Read more
