camembert-ner

Maintainer: Jean-Baptiste

Total Score

97

Last updated 5/27/2024



Model overview

The camembert-ner model is a French Named Entity Recognition (NER) model fine-tuned from the camemBERT model. It was trained on the wikiner-fr dataset, which contains around 170,634 sentences. Compared to other models, the camembert-ner model performs particularly well on entities that do not start with an uppercase letter, such as in email or chat data. This model was created by Jean-Baptiste, whose profile can be found at https://aimodels.fyi/creators/huggingFace/Jean-Baptiste.

Similar models include the roberta-large-ner-english model, which is a fine-tuned RoBERTa-large model for English NER, and the bert-base-NER and bert-large-NER models, which are fine-tuned BERT models for English NER.

Model inputs and outputs

Inputs

  • Text: The camembert-ner model takes in French text as input and predicts named entities within that text.

Outputs

  • Named entities: The model outputs a list of named entities found in the input text, along with their start and end positions, entity types (e.g. Person, Organization, Location), and confidence scores.
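
For a concrete picture of these inputs and outputs, here is a minimal sketch using the Hugging Face transformers pipeline. The repo id Jean-Baptiste/camembert-ner and the example sentence are assumptions for illustration, not taken from this page.

```python
from transformers import pipeline

# Token-classification pipeline; aggregation_strategy="simple" merges
# sub-word pieces into whole entities (Hub repo id assumed).
ner = pipeline(
    "token-classification",
    model="Jean-Baptiste/camembert-ner",
    aggregation_strategy="simple",
)

text = "Steve Jobs a fondé Apple en 1976 à Los Altos, en Californie."
for entity in ner(text):
    # Each prediction carries the entity type, confidence score, surface
    # text, and character offsets, matching the outputs described above.
    print(entity["entity_group"], round(float(entity["score"]), 3),
          entity["word"], entity["start"], entity["end"])
```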

Capabilities

The camembert-ner model is capable of accurately detecting a variety of named entities in French text, including person names, organizations, locations, and more. It performs particularly well on entities that do not start with an uppercase letter, making it a valuable tool for processing informal text such as emails or chat messages.

What can I use it for?

The camembert-ner model could be useful for a variety of French NLP applications, such as:

  • Extracting named entities from text for search, recommendation, or knowledge base construction
  • Anonymizing sensitive information in documents by detecting and removing personal names, organizations, etc. (see the sketch after this list)
  • Enriching existing French language datasets with named entity annotations
  • Developing chatbots or virtual assistants that can understand and respond to French conversations
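
To make the anonymization idea concrete, here is a minimal sketch built on the character offsets the model returns. The repo id Jean-Baptiste/camembert-ner, the PER/ORG tag names, and the anonymize helper are assumptions for illustration, not part of this page.

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="Jean-Baptiste/camembert-ner",  # assumed Hub repo id
    aggregation_strategy="simple",
)

def anonymize(text: str, tags=("PER", "ORG")) -> str:
    """Hypothetical helper: mask entities of the given types with placeholders."""
    redacted = text
    # Walk entities from right to left so earlier character offsets stay valid.
    for ent in sorted(ner(text), key=lambda e: e["start"], reverse=True):
        if ent["entity_group"] in tags:  # tag names assumed (PER, ORG, LOC, MISC)
            redacted = redacted[:ent["start"]] + f"[{ent['entity_group']}]" + redacted[ent["end"]:]
    return redacted

print(anonymize("Bonjour, je m'appelle Marie Dupont et je travaille chez Orange."))
# Expected shape of output: "Bonjour, je m'appelle [PER] et je travaille chez [ORG]."
```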

Things to try

One interesting thing to try with the camembert-ner model is to compare its performance on formal and informal French text. The model's strength in handling lowercase entities could make it particularly useful for processing real-world conversational data, such as customer support logs or social media posts. Researchers and developers could experiment with the model on a variety of French language tasks and datasets to further explore its capabilities and potential use cases.
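
A quick way to run this comparison is to feed the model the same sentence in its original and lowercased form and compare the extracted entities. A minimal sketch, assuming the Jean-Baptiste/camembert-ner repo id:

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="Jean-Baptiste/camembert-ner",  # assumed Hub repo id
    aggregation_strategy="simple",
)

formal = "Emmanuel Macron a rencontré Angela Merkel à Berlin."
informal = formal.lower()  # simulate chat-style text without capitalization

for label, text in (("formal", formal), ("informal", informal)):
    entities = [(e["entity_group"], e["word"]) for e in ner(text)]
    print(label, entities)
```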



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models

🎯

camembert-ner-with-dates

Jean-Baptiste

Total Score

40

CamemBERT-NER-with-dates is an extension of the French camembert-ner model, adding an additional date tag to the named entity recognition capabilities. The model was fine-tuned from the camemBERT language model and trained on an enriched version of the French WikiNER dataset, containing around 170,634 sentences. Compared to the dateparser library, this model achieved an F1 score of approximately 83% on a test set of chat and email data.

Model inputs and outputs

Inputs

  • Text: The model takes in French language text as input, such as sentences or paragraphs.

Outputs

  • Named entities: The model outputs a list of recognized named entities, including organization, person, location, and date. For each entity, the output includes the entity type, the score (confidence), the text of the entity, and the start/end character positions.

Capabilities

CamemBERT-NER-with-dates is capable of accurately identifying a variety of named entities in French text, including dates. Compared to the base camembert-ner model, this model performs better on chat and email data, likely due to the additional date entity tag it was trained on.

What can I use it for?

This model could be useful for a variety of French language processing tasks, such as information extraction, content analysis, and data structuring. For example, you could use it to automatically extract key entities (people, organizations, locations, dates) from customer support conversations, news articles, or social media posts. The ability to recognize dates could be particularly valuable for applications like schedule management or event tracking.

Things to try

One interesting aspect of this model is its strong performance on informal text like chat and email data, compared to more formal text. This suggests it may be useful for processing user-generated content in French, where entities are not always capitalized or formatted consistently. You could experiment with using this model to extract structured data from conversational interfaces, social media, or other consumer-facing applications.
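
As a sketch of how the date tag could be used, the snippet below keeps only the date predictions. The repo id Jean-Baptiste/camembert-ner-with-dates and the DATE tag name are assumptions based on the description above.

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="Jean-Baptiste/camembert-ner-with-dates",  # assumed Hub repo id
    aggregation_strategy="simple",
)

text = "Rendez-vous avec Paul Martin le 3 mars 2023 dans les bureaux de la BNP à Lyon."
# Keep only date entities (tag name assumed) for schedule/event-tracking use cases.
dates = [e["word"] for e in ner(text) if e["entity_group"] == "DATE"]
print(dates)
```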


📊

roberta-large-ner-english

Jean-Baptiste

Total Score

66

roberta-large-ner-english is an English named entity recognition (NER) model that was fine-tuned from the RoBERTa large model on the CoNLL-2003 dataset. The model was developed by Jean-Baptiste and is capable of identifying entities such as persons, organizations, locations, and miscellaneous. It was validated on email and chat data, and outperforms other models on this type of data, particularly for entities that do not start with an uppercase letter.

Model inputs and outputs

Inputs

  • Raw text to be processed for named entity recognition

Outputs

  • A list of identified entities, with the entity type (PER, ORG, LOC, MISC), the start and end positions in the input text, the text of the entity, and the confidence score.

Capabilities

The roberta-large-ner-english model can accurately identify a variety of named entities in English text, including people, organizations, locations, and miscellaneous entities. It has been shown to perform particularly well on informal text like emails and chat messages, where entities may not always start with an uppercase letter.

What can I use it for?

You can use the roberta-large-ner-english model for a variety of natural language processing tasks that require named entity recognition, such as information extraction, question answering, and content analysis. For example, you could use it to automatically extract the key people, organizations, and locations mentioned in a set of business documents or news articles.

Things to try

One interesting thing to try with the roberta-large-ner-english model is to see how it performs on your own custom text data, especially if it is in a more informal or conversational style. You could also experiment with combining the model's output with other natural language processing techniques, such as relation extraction or sentiment analysis, to gain deeper insights from your text data.
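
For readers who prefer loading the tokenizer and model explicitly rather than by name alone, here is a minimal sketch, assuming the Jean-Baptiste/roberta-large-ner-english repo id; the example message is an illustration only.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_id = "Jean-Baptiste/roberta-large-ner-english"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

ner = pipeline("token-classification", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")

# An informal, lowercase chat message: the kind of input this model is
# described as handling better than models trained only on well-cased text.
print(ner("hey, can you send the contract from acme corp to sarah before friday?"))
```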


↗️

camembert-base

almanach

Total Score

54

CamemBERT is a state-of-the-art language model for French based on the RoBERTa model. It is available in 6 different versions with varying numbers of parameters, amounts of pretraining data, and pretraining data source domains. The camembert-base model has 110M parameters and was trained on 138GB of text from the OSCAR dataset.

Model inputs and outputs

Inputs

  • French text to be processed

Outputs

  • Contextualized token-level representations
  • Predictions for masked tokens in the input text

Capabilities

CamemBERT can be used for a variety of French NLP tasks, such as text classification, named entity recognition, question answering, and text generation. For example, the model can accurately predict missing words in a French sentence, as shown by filling in the mask token <mask> in the sentence "Le camembert est un fromage de <mask> !". The top predicted completions are "chèvre", "brebis", and "montagne", which are all plausible ways to finish the phrase.

What can I use it for?

CamemBERT can be fine-tuned on various French language datasets to create powerful task-specific models. For instance, the camembert-ner model, fine-tuned on the wikiner-fr named entity recognition dataset, achieves state-of-the-art performance on this task. This could be useful for applications like information extraction from French text. Additionally, the sentence-camembert-large model provides high-quality sentence embeddings for French, enabling semantic search and text similarity tasks.

Things to try

Beyond the standard text classification and generation tasks, one interesting application of CamemBERT could be to generate French text conditioned on a given prompt. The model's strong language understanding capabilities, combined with its ability to generate coherent text, could lead to novel creative applications in areas like automated content generation or language learning tools.
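
The fill-mask example above can be reproduced with a short snippet. Note that CamemBERT uses "<mask>" rather than the BERT-style "[MASK]" token; the camembert-base repo id is an assumption.

```python
from transformers import pipeline

# camembert-base (assumed Hub repo id) uses "<mask>" as its mask token.
fill_mask = pipeline("fill-mask", model="camembert-base")

for pred in fill_mask("Le camembert est un fromage de <mask> !"):
    print(round(float(pred["score"]), 3), pred["token_str"], "->", pred["sequence"])
```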


🎯

bert-base-NER

dslim

Total Score

415

The bert-base-NER model is a fine-tuned BERT model that is ready to use for Named Entity Recognition (NER) and achieves state-of-the-art performance for the NER task. It has been trained to recognize four types of entities: location (LOC), organization (ORG), person (PER), and miscellaneous (MISC). Specifically, this model is a bert-base-cased model that was fine-tuned on the English version of the standard CoNLL-2003 Named Entity Recognition dataset. If you'd like to use a larger model fine-tuned on the same dataset, a bert-large-NER version is also available. The maintainer, dslim, has also provided several other NER models, including distilbert-NER, bert-large-NER, and both cased and uncased versions of bert-base-NER.

Model inputs and outputs

Inputs

  • Text: The model takes a text sequence as input and predicts the named entities within that text.

Outputs

  • Named entities: The model outputs the recognized named entities, along with their type (LOC, ORG, PER, MISC) and the start/end position within the input text.

Capabilities

The bert-base-NER model is capable of accurately identifying a variety of named entities within text, including locations, organizations, persons, and miscellaneous entities. This can be useful for applications such as information extraction, content analysis, and knowledge graph construction.

What can I use it for?

The bert-base-NER model can be used for a variety of text processing tasks that involve identifying and extracting named entities. For example, you could use it to build a search engine that allows users to find information about specific people, organizations, or locations mentioned in a large corpus of text. You could also use it to automatically extract key entities from customer service logs or social media posts, which could be valuable for market research or customer sentiment analysis.

Things to try

One interesting thing to try with the bert-base-NER model is to experiment with incorporating it into a larger natural language processing pipeline. For example, you could use it to first identify the named entities in a piece of text, and then use a different model to classify the sentiment or topic of the text, focusing on the identified entities. This could lead to more accurate and nuanced text analysis. Another idea is to fine-tune the model further on a domain-specific dataset, which could help it perform better on specialized text. For instance, if you're working with legal documents, you could fine-tune the model on a corpus of legal text to improve its ability to recognize legal entities and terminology.
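
As a small step toward the knowledge-graph style extraction mentioned above, the sketch below groups recognized entities by type. The dslim/bert-base-NER repo id is an assumption, and the example sentence is an illustration only.

```python
from collections import defaultdict
from transformers import pipeline

ner = pipeline("token-classification", model="dslim/bert-base-NER",  # assumed Hub repo id
               aggregation_strategy="simple")

text = ("Angela Merkel met executives from Siemens and Volkswagen "
        "in Berlin on behalf of the German government.")

# Group recognized entities by type (PER, ORG, LOC, MISC).
by_type = defaultdict(list)
for ent in ner(text):
    by_type[ent["entity_group"]].append(ent["word"])
print(dict(by_type))
```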
