bge-en-icl

Maintainer: BAAI

Total Score

53

Last updated 9/19/2024

🤔

  • Run this model: Run on HuggingFace
  • API spec: View on HuggingFace
  • Github link: No Github link provided
  • Paper link: No paper link provided


Model overview

The bge-en-icl model, developed by BAAI, demonstrates impressive in-context learning abilities. It can significantly enhance its performance on new tasks by incorporating few-shot examples provided in the query. The model has also achieved state-of-the-art results on both the BEIR and AIR-Bench benchmarks.

This model is part of the BAAI General Embedding (BGE) family, which includes a range of embedding models for both English and Chinese. The BAAI/bge-small-en and BAAI/bge-base-en models provide competitive performance, while the BAAI/bge-large-en model ranks 1st on the MTEB leaderboard. The Chinese counterparts, such as BAAI/bge-large-zh, also perform exceptionally well on the C-MTEB benchmark.

Model inputs and outputs

Inputs

  • Text: The model accepts text as input, which can be a query, a passage, or a query and passage pair.

Outputs

  • Embeddings: The model produces dense vector representations (embeddings) of the input text, which can be used for tasks like retrieval, classification, and semantic search.
  • Similarity scores: When provided with a query and a passage, the model can output a relevance score indicating how well the passage matches the query.
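
The similarity score described above is typically the cosine between two embedding vectors. Here is a minimal, self-contained sketch with toy four-dimensional vectors standing in for real model output (actual bge-en-icl embeddings are much higher-dimensional):

```python
import math

def cosine_similarity(a, b):
    # Relevance between a query and a passage embedding is typically
    # measured as the cosine of the angle between the two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for real model output
query_emb = [0.1, 0.3, 0.5, 0.2]
passage_emb = [0.1, 0.25, 0.55, 0.15]

score = cosine_similarity(query_emb, passage_emb)
```

Many embedding models, including the BGE family, return L2-normalized vectors, in which case the cosine reduces to a plain dot product.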

Capabilities

The defining capability of the bge-en-icl model is in-context learning: by incorporating few-shot examples in the query, it can adapt to new tasks with significantly improved performance. This makes it a versatile tool for natural language processing applications where the task or domain may change dynamically.
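
As a rough illustration of what "few-shot examples in the query" means, the sketch below packs example query/passage pairs and the new query into a single prompt string. The delimiters here are hypothetical; the actual template bge-en-icl expects is documented on its HuggingFace model card and may differ.

```python
def build_icl_query(task, examples, query):
    # Hypothetical few-shot prompt layout: each example shows the task,
    # a sample query, and a relevant passage, followed by the new query.
    parts = []
    for ex_query, ex_passage in examples:
        parts.append(
            f"Task: {task}\nQuery: {ex_query}\nRelevant passage: {ex_passage}"
        )
    parts.append(f"Task: {task}\nQuery: {query}")
    return "\n\n".join(parts)

prompt = build_icl_query(
    "Retrieve passages that answer the question",
    [("What is the capital of France?", "Paris is the capital of France.")],
    "What is the tallest mountain on Earth?",
)
```

The resulting string would then be embedded as the query, letting the model condition its representation on the demonstrated task.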

What can I use it for?

The bge-en-icl model can be utilized in various applications that require text understanding and retrieval. Some examples include:

  • Retrieval-based Question Answering: Use the model to retrieve relevant passages that can answer a given query, and then leverage the in-context learning capability to refine the results based on provided examples.
  • Semantic Search: Leverage the model's ability to generate high-quality text embeddings to build semantic search engines that can find relevant content based on the meaning of the query, rather than just the keywords.
  • Personalized Recommendation Systems: Fine-tune the model on user preferences and behavior to create personalized recommendations for products, content, or services.

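For the semantic-search use case above, retrieval reduces to ranking passages by embedding similarity. A self-contained sketch, with toy three-dimensional vectors standing in for real model output:

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (
        math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    )

def top_k(query_emb, corpus, k=2):
    # corpus: list of (doc_id, embedding) pairs; rank by similarity
    # to the query embedding and keep the k best.
    ranked = sorted(corpus, key=lambda item: cosine(query_emb, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

corpus = [
    ("doc_a", [0.9, 0.1, 0.0]),
    ("doc_b", [0.1, 0.9, 0.1]),
    ("doc_c", [0.0, 0.2, 0.9]),
]
results = top_k([0.85, 0.15, 0.05], corpus, k=2)
```

In production this linear scan would be replaced by an approximate nearest-neighbor index in a vector database, but the ranking principle is the same.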
Things to try

One interesting aspect of the bge-en-icl model is its ability to adapt to new tasks through few-shot examples. You can experiment with providing different types of examples in the query and observe how the model's performance changes on your specific application. Additionally, you can explore fine-tuning the model on your own data to further improve its capabilities for your use case.



This summary was produced with help from an AI and may contain inaccuracies; check the links to read the original source documents.

Related Models

🌀

bge-base-zh-v1.5

BAAI

Total Score

51

The bge-base-zh-v1.5 model is a text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence). It is part of the BAAI General Embedding (BGE) family of models, which can map any text to a low-dimensional dense vector. This can be used for tasks like retrieval, classification, clustering, or semantic search. The bge-base-zh-v1.5 model is the Chinese version of the base-scale BGE model, updated to version 1.5 to have a more reasonable similarity distribution than previous versions. It is similar in capability to the BAAI/bge-large-zh-v1.5 model, the large-scale Chinese BGE model, but has a smaller embedding size. The BAAI/bge-small-zh-v1.5 model is an even smaller-scale Chinese BGE model, with a further reduced embedding size but still competitive performance.

Model inputs and outputs

Inputs

  • Text: The model can take any text as input, such as short queries or long passages.

Outputs

  • Embeddings: The model outputs a low-dimensional dense vector representation (embedding) of the input text.

Capabilities

The bge-base-zh-v1.5 model can effectively map Chinese text to a semantic embedding space. It achieves state-of-the-art performance on the Chinese Massive Text Embedding Benchmark (C-MTEB), ranking 1st in multiple evaluation tasks.

What can I use it for?

The bge-base-zh-v1.5 embedding model can be used in a variety of natural language processing applications that require semantic understanding of text, such as:

  • Retrieval: Use the embeddings to find the most relevant passages or documents for a given query.
  • Classification: Train a classifier on top of the embeddings to categorize text into different classes.
  • Clustering: Group similar texts together based on the proximity of their embeddings.
  • Semantic search: Find documents or passages that are semantically similar to a given query.

The model can also be integrated into vector databases to support retrieval-augmented large language models (LLMs).

Things to try

One interesting aspect of the bge-base-zh-v1.5 model is that it has improved retrieval performance without using any instruction in the query, whereas previous versions required an instruction. This makes it more convenient to use in many applications. You can experiment with using the model with and without instructions to see which setting works best for your specific task. Additionally, you can try fine-tuning the bge-base-zh-v1.5 model on your own data using the provided examples. This can help improve the model's performance on your domain-specific tasks.
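
Comparing the with- and without-instruction settings only requires varying the query text passed to the model. The instruction string below is the retrieval instruction documented for the Chinese BGE models; treat the rest as an illustrative sketch:

```python
# Retrieval instruction documented for the Chinese BGE models. v1.5 is
# designed to work well without it, so both settings are worth testing.
INSTRUCTION = "为这个句子生成表示以用于检索相关文章："

def prepare_query(query: str, use_instruction: bool) -> str:
    # Only queries get the prefix; passages are always embedded as-is.
    return INSTRUCTION + query if use_instruction else query

with_prefix = prepare_query("北京的天气怎么样？", True)
without_prefix = prepare_query("北京的天气怎么样？", False)
```

Embedding both variants and comparing retrieval quality on a small held-out set of queries is a quick way to pick the better setting for your task.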


🌀

bge-small-zh-v1.5

BAAI

Total Score

43

The bge-small-zh-v1.5 model from BAAI is a small-scale version of the BAAI General Embedding (BGE) model, which can map any text to a low-dimensional dense vector. Unlike previous BGE models, version 1.5 has a more reasonable similarity distribution, enhancing its retrieval ability without the need for instruction. The bge-small-zh-v1.5 model is competitive in performance with larger models, making it a good option for projects with computational constraints.

Model inputs and outputs

The bge-small-zh-v1.5 model takes in text as input and outputs a fixed-size embedding vector. This embedding can then be used for tasks like retrieval, classification, clustering, or semantic search. The model supports both Chinese and English text.

Inputs

  • Text: The model can accept any Chinese or English text as input.

Outputs

  • Embedding vector: The model outputs a fixed-size vector representation of the input text, which can be used for downstream tasks.

Capabilities

The bge-small-zh-v1.5 model generates high-quality text embeddings that can be used for a variety of natural language processing tasks. Its performance is competitive with larger BGE models, making it a good choice for projects with limited computational resources. The model's improved similarity distribution helps to better differentiate between similar and dissimilar text.

What can I use it for?

The bge-small-zh-v1.5 embedding can be used in a wide range of applications, such as:

  • Semantic search: Use the embeddings to find relevant passages or documents for a given query.
  • Text classification: Train a classifier on top of the embeddings to categorize text into different classes.
  • Clustering: Group similar texts together based on their embeddings.
  • Recommendation systems: Use the embeddings to find similar items or content to recommend.

Things to try

One interesting thing to try with the bge-small-zh-v1.5 model is to fine-tune it on your specific data and task. The examples provided by the maintainers show how to prepare data and fine-tune the model to improve performance on your use case. Additionally, you can experiment with using the model in conjunction with the provided reranker models to further enhance retrieval performance.
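
Fine-tuning data for the BGE models is usually prepared as JSON lines of query/positive/negative triples. The sketch below mirrors that commonly documented layout; the authoritative schema and field names are in the FlagEmbedding fine-tuning examples:

```python
import json

# Triplet-style training rows: each row pairs a query with passages
# that should score high ("pos") and passages that should score low ("neg").
examples = [
    {
        "query": "how to renew a passport",
        "pos": ["Passport renewal instructions and required documents."],
        "neg": ["Opening hours of the local library."],
    }
]

# Serialize one JSON object per line, as expected by JSONL training files.
lines = [json.dumps(ex, ensure_ascii=False) for ex in examples]
train_jsonl = "\n".join(lines)
```

Hard negatives (passages that look relevant but are not) generally improve the fine-tuned model more than random negatives.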


📈

bge-small-en

BAAI

Total Score

65

The bge-small-en model is a small-scale English text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence) as part of their FlagEmbedding project. It is one of several bge (BAAI General Embedding) models that achieve state-of-the-art performance on text embedding benchmarks like MTEB and C-MTEB. The bge-small-en model is a smaller version of the BAAI/bge-large-en-v1.5 and BAAI/bge-base-en-v1.5 models, with 384 embedding dimensions compared to 1024 and 768 respectively. Despite its smaller size, the bge-small-en model still provides competitive performance, making it a good choice when computation resources are limited.

Model inputs and outputs

Inputs

  • Text sentences: The model can take a list of text sentences as input.

Outputs

  • Sentence embeddings: The model outputs a numpy array of sentence embeddings, where each row corresponds to the embedding of the corresponding input sentence.

Capabilities

The bge-small-en model can be used for a variety of natural language processing tasks that benefit from semantic text representations, such as:

  • Information retrieval: Use the embeddings to find relevant passages or documents for a given query by computing similarity scores between the query and the passages or documents.
  • Text classification: Use the embeddings as features for training classification models on text data.
  • Clustering: Use the embeddings to group similar text documents into clusters.
  • Semantic search: Use the embeddings to find semantically similar text based on meaning, rather than just lexical matching.

What can I use it for?

The bge-small-en model can be a useful tool for a variety of applications that involve working with English text data. For example, you could use it to build a semantic search engine for your company's knowledge base, or to improve the text classification capabilities of your customer support chatbot. Since the model is smaller and more efficient than the larger bge models, it may be particularly well suited for deployment on edge devices or in resource-constrained environments. You could also fine-tune the model on your specific text data to further improve its performance for your use case.

Things to try

One interesting thing to try with the bge-small-en model is to compare its performance to the larger bge models, such as BAAI/bge-large-en-v1.5 and BAAI/bge-base-en-v1.5, on your specific tasks. You may find that the smaller model provides nearly the same performance as the larger models while being more efficient and easier to deploy. Another thing to try is to fine-tune the bge-small-en model on your own text data, using the techniques described in the FlagEmbedding documentation. This can help the model better capture the semantics of your domain-specific text, potentially leading to improved performance on your tasks.
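
The efficiency difference is easy to quantify from the embedding dimensions alone. A back-of-envelope estimate of index size for storing one million float32 vectors at the three dimensions cited above (384 for small, 768 for base, 1024 for large):

```python
def index_size_mb(num_vectors, dim, bytes_per_float=4):
    # float32 embeddings: vectors x dimensions x 4 bytes, converted to MiB.
    return num_vectors * dim * bytes_per_float / (1024 ** 2)

# One million embeddings at each model's output dimension.
sizes = {dim: index_size_mb(1_000_000, dim) for dim in (384, 768, 1024)}
```

The small model's index is well under half the size of the large model's, which also translates into proportionally faster similarity search over the same corpus.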


📈

bge-base-en

BAAI

Total Score

53

The bge-base-en is a text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence) that can map any text to a low-dimensional dense vector. It is part of the BAAI/bge-base-en model series, which also includes larger and smaller versions such as BAAI/bge-large-en and BAAI/bge-small-en. These models were trained using contrastive learning on a massive text corpus and demonstrate state-of-the-art performance on text embedding benchmarks like MTEB and C-MTEB. The bge-base-en model is a base-scale version that achieves performance close to the larger bge-large-en model, making it a good option for applications with limited compute resources. For all models in the BAAI/bge series, the newest v1.5 versions are recommended, as they have an improved similarity distribution compared to earlier versions.

Model inputs and outputs

Inputs

  • Text: The bge-base-en model can take any text as input, such as a sentence, paragraph, or document.
  • Instruction (optional): For text retrieval tasks, the input text can optionally be prefixed with an instruction to improve performance, such as "Represent this sentence for searching relevant passages:".

Outputs

  • Embedding vector: The model outputs a fixed-size dense vector representation of the input text, which can be used for downstream tasks like retrieval, classification, clustering, or semantic search.

Capabilities

The bge-base-en model captures the semantic meaning of input text in a compact vector representation and has achieved top performance on the MTEB and C-MTEB benchmarks. Some key capabilities of the model include:

  • Retrieval: The embedding vectors can be used to efficiently search large text corpora to find relevant documents or passages for a given query.
  • Classification: The embeddings can be leveraged as features for training classifiers on text data.
  • Clustering: The vector representations allow for effective grouping of similar text items.
  • Semantic search: The model can identify semantically related texts based on the proximity of their embedding vectors.

What can I use it for?

The bge-base-en model is a versatile tool that can be applied to a wide range of NLP applications. Some potential use cases include:

  • Intelligent search: Integrate the model into search engines or knowledge bases to enable more accurate, semantically aware retrieval of information.
  • Recommender systems: Use the text embeddings to identify related content or products for recommendation.
  • Content analysis: Leverage the model's ability to capture semantic meaning for tasks like topic modeling, sentiment analysis, or text summarization.
  • Multimodal applications: Combine the text embeddings with visual or audio representations for applications like image/video captioning or multimedia search.

Things to try

One interesting aspect of the bge-base-en model is its ability to generate high-quality embeddings without requiring an instruction prefix, while still maintaining strong retrieval performance. This makes the model convenient to use in scenarios where adding an instruction may not be practical. Another thing to explore is fine-tuning the model on your own data using the provided examples. By incorporating domain-specific knowledge, you can further improve the model's performance on tasks relevant to your application. The FlagEmbedding library provides guidance on how to effectively fine-tune the bge models. Finally, you can experiment with using the bge-base-en model in combination with the larger bge-large-en model or the bge-reranker models to further enhance retrieval performance. The reranker models can be used to re-rank the top results from the embedding model, providing a more accurate relevance score.
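
The retrieve-then-rerank pattern can be sketched independently of any particular model. Below, `embed_score` and `rerank_score` are toy stand-ins (word overlap and an exact-phrase bonus) for the real embedding-model and reranker-model calls:

```python
def retrieve_then_rerank(query, passages, embed_score, rerank_score, first_pass_k=3):
    # Stage 1: cheap embedding-based ranking keeps only the top candidates.
    candidates = sorted(
        passages, key=lambda p: embed_score(query, p), reverse=True
    )[:first_pass_k]
    # Stage 2: the more expensive reranker rescores just those candidates.
    return sorted(candidates, key=lambda p: rerank_score(query, p), reverse=True)

# Toy scorers: word overlap for the first pass, exact-phrase bonus for reranking.
def embed_score(q, p):
    return len(set(q.lower().split()) & set(p.lower().split()))

def rerank_score(q, p):
    return embed_score(q, p) + (5 if q.lower() in p.lower() else 0)

passages = [
    "the cat sat on the mat",
    "where is the cat",
    "dogs chase cats in the park",
]
ranked = retrieve_then_rerank("where is the cat", passages, embed_score, rerank_score)
```

The design point is cost: the embedding model scores the whole corpus cheaply, while the reranker, which is slower but more accurate, only ever sees the short candidate list.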
