Model overview

mDeBERTa-v3-base is the multilingual version of the DeBERTa language model developed by Microsoft. DeBERTa improves on the BERT and RoBERTa models by using disentangled attention and an enhanced mask decoder; trained on 80GB of data, it outperforms RoBERTa on a majority of natural language understanding (NLU) tasks.

The multilingual mDeBERTa-v3-base model was trained on the CC100 multilingual dataset. It has 12 layers and a hidden size of 768, which gives 86M backbone parameters; its vocabulary of 250K tokens adds roughly 190M parameters in the embedding layer. Compared with the original DeBERTa model, the V3 version significantly improves performance on downstream tasks by using ELECTRA-style pre-training with gradient-disentangled embedding sharing.
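The sizes quoted above can be sanity-checked with a little arithmetic. The sketch below assumes the token embedding table is simply vocabulary size times hidden size and uses the rounded 250K figure from the text, so the result is an estimate and not the exact count for the released checkpoint.

```python
# Rough size estimate from the rounded figures quoted in the text.
VOCAB_SIZE = 250_000          # "250K tokens" (rounded; an assumption)
HIDDEN_SIZE = 768             # hidden size of the base model
BACKBONE_PARAMS = 86_000_000  # "86M backbone parameters"

# The embedding table stores one HIDDEN_SIZE-dimensional vector per token.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE
total_params = BACKBONE_PARAMS + embedding_params
embedding_share = embedding_params / total_params

print(f"embedding parameters: {embedding_params / 1e6:.0f}M")
print(f"estimated total:      {total_params / 1e6:.0f}M")
print(f"embedding share:      {embedding_share:.0%}")
```

Under these assumptions the embedding table is more than twice the size of the backbone, which is worth keeping in mind when estimating memory use or download size.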

Model inputs and outputs


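Inputs

  • Natural language text in any of the more than 100 languages covered by the multilingual training data.
  • Input sequences are typically limited to 512 tokens, the `max_position_embeddings` value in the published model configuration; longer texts must be truncated or split into chunks.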


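Outputs

  • Contextual token embeddings that can be used for a variety of natural language processing tasks.
  • After fine-tuning with a classification head, class predictions such as the entailment labels evaluated in zero-shot cross-lingual transfer on the XNLI dataset.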
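As a minimal sketch of how the token embeddings might be obtained, the example below loads the checkpoint through the Hugging Face `transformers` library. It assumes `transformers`, `torch`, and `sentencepiece` are installed; the helper names `embed` and `cosine`, the mean-pooling step, and `max_length=256` are illustrative choices, not part of the model.

```python
import math
from typing import List, Sequence

MODEL_NAME = "microsoft/mdeberta-v3-base"  # checkpoint id on the Hugging Face Hub


def embed(texts: List[str], max_length: int = 256):
    """Return (token_embeddings, sentence_embeddings) as torch tensors.

    max_length=256 is an illustrative choice, not a model requirement.
    """
    # Imported lazily: both packages are heavy and only needed when encoding.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.eval()  # disable dropout so repeated calls give the same output

    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        # Shape: (batch, sequence_length, hidden_size)
        token_embeddings = model(**batch).last_hidden_state

    # Mean-pool over real tokens only; padded positions have mask 0.
    mask = batch["attention_mask"].unsqueeze(-1).to(token_embeddings.dtype)
    sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
    return token_embeddings, sentence_embeddings


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Calling `embed(["The cat sleeps.", "Le chat dort."])` downloads the checkpoint on first use and returns both per-token and pooled vectors; `cosine(s[0].tolist(), s[1].tolist())` then compares two pooled vectors. Embeddings from the pre-trained checkpoint are not tuned for similarity, so such scores are a starting point for fine-tuning, not a finished metric.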


Capabilities

mDeBERTa-v3-base excels at multilingual natural language understanding and shows strong zero-shot cross-lingual transfer on the XNLI dataset, a setting in which the model is fine-tuned on English data only and then tested on other languages. It reaches an average accuracy of 79.8% across 15 languages, compared with 76.2% for the XLM-RoBERTa base model, a gap of more than 3 percentage points.
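An average accuracy across languages, like the one quoted above, is commonly computed as the unweighted mean of per-language accuracies. The sketch below assumes that convention; the function names and the toy predictions are made up for illustration and are not XNLI results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_language_accuracy(
    examples: Iterable[Tuple[str, int, int]]
) -> Dict[str, float]:
    """Accuracy per language from (language, gold_label, predicted_label) triples."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for lang, gold, pred in examples:
        total[lang] += 1
        correct[lang] += int(gold == pred)
    return {lang: correct[lang] / total[lang] for lang in total}


def average_accuracy(per_lang: Dict[str, float]) -> float:
    """Unweighted mean over languages, so every language counts equally."""
    return sum(per_lang.values()) / len(per_lang)


# Toy data only: three languages with different numbers of examples.
toy = [
    ("en", 0, 0), ("en", 1, 1),
    ("fr", 0, 0), ("fr", 2, 1),
    ("sw", 1, 1), ("sw", 0, 2), ("sw", 2, 2), ("sw", 1, 0),
]
scores = per_language_accuracy(toy)
print(scores)
print(f"average: {average_accuracy(scores):.3f}")
```

Reporting the per-language scores alongside the average makes it easier to see whether a gain comes from a few high-resource languages or is spread across all of them.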

What can I use it for?

The multilingual capabilities of mDeBERTa-v3-base make it well-suited for a variety of NLP tasks that require understanding text in multiple languages, such as:

  • Zero-shot cross-lingual classification: Because mDeBERTa-v3-base transfers well across languages, you can fine-tune a classifier on labeled data in one language (typically English) and apply it to other languages without annotating data in each target language.

  • Multilingual question answering and information retrieval: Once fine-tuned as an encoder for retrieval or extractive question answering, the model's ability to encode text in over 100 languages allows it to power cross-lingual search and question answering applications.

  • Multilingual data augmentation: mDeBERTa-v3-base is an encoder-only model and does not generate text on its own, but its broad language coverage makes it useful for scoring, filtering, or labeling synthetic text in multiple languages before that text is added to training data.

Things to try

One practical constraint of mDeBERTa-v3-base is its sequence length: the published configuration sets 512 position embeddings, so long-form text, such as documents for retrieval or extractive summarization, has to be split into chunks before encoding. You could experiment with the chunk size, the overlap between chunks, and the way per-chunk outputs are aggregated to improve the performance of your long document understanding applications.
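Whatever sequence limit applies to the checkpoint you load, documents longer than that limit have to be split before encoding. The sketch below shows one common approach, overlapping windows over a list of token ids. The function name and the `max_len` and `overlap` parameters are illustrative, and the code does not reserve room for special tokens.

```python
from typing import List


def sliding_windows(token_ids: List[int], max_len: int, overlap: int) -> List[List[int]]:
    """Split token_ids into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens, so text cut at one
    window boundary still appears intact in the neighbouring window.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    if not token_ids:
        return []

    step = max_len - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the document
        start += step
    return windows


# Toy example: ten "tokens", windows of four with an overlap of two.
for window in sliding_windows(list(range(10)), max_len=4, overlap=2):
    print(window)
```

Each window can then be encoded separately and the per-window outputs combined, for example by taking the maximum or the mean of the scores. Fast tokenizers in the `transformers` library can produce similar windows directly through the `stride` and `return_overflowing_tokens` arguments.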

Additionally, dense embeddings from a fine-tuned mDeBERTa-v3-base encoder can be paired with a separate sparse, lexical retriever such as BM25 in a hybrid retrieval setup; the model itself produces only dense representations. Exploring ways to combine these complementary embedding-based and lexical matching signals effectively could lead to performance gains in your information retrieval workflows.
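One simple way to combine dense and lexical signals is a weighted sum of normalized scores. The sketch below assumes per-document scores from each retriever are already available as dictionaries. The min-max normalization, the `alpha` weight, and the function names are illustrative choices, not a method prescribed by the model.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(dense: Dict[str, float], sparse: Dict[str, float],
                alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Rank documents by alpha * dense + (1 - alpha) * sparse.

    A document missing from one retriever contributes 0 for that signal.
    Ties are broken by document id so the order is deterministic.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    d, s = min_max(dense), min_max(sparse)
    fused = {
        doc: alpha * d.get(doc, 0.0) + (1.0 - alpha) * s.get(doc, 0.0)
        for doc in set(d) | set(s)
    }
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Toy scores: cosine-like dense scores and BM25-like lexical scores.
dense_scores = {"a": 1.0, "b": 0.5, "c": 0.0}
sparse_scores = {"a": 2.0, "c": 10.0, "d": 6.0}
for doc, score in hybrid_rank(dense_scores, sparse_scores, alpha=0.7):
    print(doc, round(score, 3))
```

Sweeping `alpha` on a held-out set of queries is a straightforward way to find out how much weight each signal deserves for a given collection.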

