Almanach

Models by this creator

camembert-base

CamemBERT is a state-of-the-art language model for French based on the RoBERTa architecture. It is available in 6 versions that differ in parameter count, amount of pretraining data, and pretraining data source domain. The camembert-base model has 110M parameters and was trained on 138GB of text from the OSCAR corpus.

Model inputs and outputs

Inputs
- French text to be processed

Outputs
- Contextualized token-level representations
- Predictions for masked tokens in the input text

Like RoBERTa, CamemBERT is pretrained with masked language modeling only, so it does not produce next sentence prediction scores.

Capabilities

CamemBERT can be fine-tuned for a variety of French NLP tasks, such as text classification, named entity recognition, and question answering. Out of the box, it can predict missing words in a French sentence. For example, given "Le camembert est un fromage de <mask>!", its top predicted completions are "chèvre", "brebis", and "montagne", all plausible completions (goat's, sheep's, and mountain cheese). Note that, following RoBERTa, CamemBERT's mask token is <mask>, not BERT's [MASK].

What can I use it for?

CamemBERT can be fine-tuned on French datasets to build task-specific models. For instance, camembert-ner, fine-tuned on the wikiner-fr named entity recognition dataset, is reported to achieve state-of-the-art performance on that task, which makes it useful for information extraction from French text. Similarly, sentence-camembert-large provides high-quality French sentence embeddings for semantic search and text similarity.

Things to try

Because CamemBERT is a masked (encoder-only) language model, it is not designed for open-ended, left-to-right text generation. A more natural experiment is prompt-based infilling: write a French prompt containing one or more <mask> tokens and let the model propose completions. This could support applications such as cloze-style exercises in language learning tools.
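As a sanity check on the 110M figure, the parameter count can be reproduced from the model's shape. This sketch assumes the standard RoBERTa-base layout and the configuration values commonly listed for camembert-base (vocabulary of 32,005 tokens, 514 position embeddings, one token-type embedding, hidden size 768, 12 layers, feed-forward size 3,072). These values are assumptions to check against the checkpoint's config.json, not figures taken from this page.

```python
def dense(n_in: int, n_out: int) -> int:
    """Parameters of a fully connected layer: weight matrix plus bias."""
    return n_in * n_out + n_out


def layer_norm(hidden: int) -> int:
    """LayerNorm has one scale and one shift vector."""
    return 2 * hidden


def roberta_base_params(
    vocab: int = 32005,   # assumed camembert-base vocab_size
    max_pos: int = 514,   # assumed max_position_embeddings
    type_vocab: int = 1,  # assumed type_vocab_size
    hidden: int = 768,
    layers: int = 12,
    ffn: int = 3072,
) -> int:
    # Embeddings: token, position and token-type tables, then a LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm(hidden)
    # One Transformer layer: Q, K, V and output projections, a LayerNorm,
    # the two feed-forward projections, and a second LayerNorm.
    per_layer = (
        4 * dense(hidden, hidden)
        + layer_norm(hidden)
        + dense(hidden, ffn)
        + dense(ffn, hidden)
        + layer_norm(hidden)
    )
    # Pooler: a single dense layer applied to the first token's representation.
    pooler = dense(hidden, hidden)
    return embeddings + layers * per_layer + pooler


total = roberta_base_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the count comes to 110,621,952 (about 110.6M), consistent with the 110M quoted above. It covers the encoder and pooler only, not the masked-LM head used during pretraining.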

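The masked-word prediction described under Capabilities can be tried with the Hugging Face transformers fill-mask pipeline. This is a minimal sketch, assuming transformers and a PyTorch backend are installed and that the checkpoint is available under the Hub ID almanach/camembert-base (older code refers to it as camembert-base). The helper top_tokens is hypothetical, written for this example. The prompt is built from the tokenizer's own mask token, so the <mask> versus [MASK] difference cannot cause a silent failure.

```python
from typing import Dict, List


def top_tokens(predictions: List[Dict], k: int = 3) -> List[str]:
    """Return the k highest-scoring completions from fill-mask pipeline output.

    Each prediction is a dict containing (among others) "score" and "token_str".
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p["token_str"].strip() for p in ranked[:k]]


def main() -> None:
    # Imported here so the helper above works without transformers installed.
    from transformers import pipeline

    # Assumed Hub ID; the checkpoint is downloaded on first use.
    fill_mask = pipeline("fill-mask", model="almanach/camembert-base")
    # Use the tokenizer's mask token ("<mask>" for CamemBERT) instead of hard-coding it.
    mask = fill_mask.tokenizer.mask_token
    predictions = fill_mask(f"Le camembert est un fromage de {mask}!", top_k=5)
    print(top_tokens(predictions))
```

Calling main() prints the highest-scoring completions. The exact tokens and scores depend on the checkpoint and library version, so they may not match the list quoted above.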

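For the semantic-search use case mentioned under "What can I use it for?", a common pattern is to embed a query and a set of documents, then rank the documents by cosine similarity. This sketch assumes the sentence-transformers library and the Hub ID dangvantuan/sentence-camembert-large. The functions cosine and rank_by_cosine are hypothetical helpers written in plain Python so the ranking logic is visible, and the example sentences are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_cosine(
    query: Sequence[float], docs: List[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    scores = [(i, cosine(query, d)) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def main() -> None:
    # Imported here so the ranking helpers work without sentence-transformers.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("dangvantuan/sentence-camembert-large")  # assumed Hub ID
    docs = [
        "Le camembert est un fromage normand.",
        "La tour Eiffel se trouve à Paris.",
        "Le brie est un fromage à pâte molle.",
    ]
    doc_vecs = model.encode(docs)  # 2-D NumPy array, one row per document
    query_vec = model.encode("Quels fromages viennent de France ?")
    for i, score in rank_by_cosine(query_vec.tolist(), doc_vecs.tolist()):
        print(f"{score:.3f}  {docs[i]}")
```

sentence-transformers also provides its own helper (sentence_transformers.util.cos_sim) for batched similarity. The plain-Python version here only makes the computation explicit.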
Updated 5/28/2024