piiranha-v1-detect-personal-information

Maintainer: iiiorg

Total Score: 97

Last updated: 9/18/2024

  • Run this model: Run on HuggingFace
  • API spec: View on HuggingFace
  • Github link: No Github link provided
  • Paper link: No paper link provided


Model overview

piiranha-v1-detect-personal-information is a fine-tuned model developed by iiiorg, trained to detect 17 types of personally identifiable information (PII) across six languages. It achieves an overall classification accuracy of 99.44% and catches 98.27% of PII tokens. It is especially accurate at detecting passwords, email addresses (catching 100% of them), phone numbers, and usernames.

Similar models include StarPII, which is an NER model trained to detect PII in code datasets, and GLiNER PII, a NER model that can recognize various types of PII entities.

Model inputs and outputs

Inputs

  • Text data containing personally identifiable information

Outputs

  • Detected PII entities with their corresponding labels, such as:
    • Account Number
    • Email
    • Phone Number
    • Password
    • Social Security Number
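
A minimal sketch of this interface using the Hugging Face transformers token-classification pipeline; it assumes the model is published under the ID iiiorg/piiranha-v1-detect-personal-information and exposes the standard token-classification API.

```python
# A minimal sketch, assuming the model is hosted on Hugging Face as
# "iiiorg/piiranha-v1-detect-personal-information" with the standard
# token-classification head.
from transformers import pipeline

detector = pipeline(
    "token-classification",
    model="iiiorg/piiranha-v1-detect-personal-information",
    aggregation_strategy="simple",  # merge sub-tokens into whole entities
)

text = "Contact Jane at jane.doe@example.com or call 555-0123."
for entity in detector(text):
    # Each entity carries the span text, its PII label, and a confidence score.
    print(f"{entity['entity_group']:>15}  {entity['word']!r}  ({entity['score']:.2f})")
```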

Capabilities

piiranha-v1-detect-personal-information is highly accurate at identifying a wide range of PII entities, including sensitive information like passwords, credit card numbers, and social security numbers. This makes it a valuable tool for privacy protection and data anonymization use cases.

What can I use it for?

The piiranha-v1-detect-personal-information model can be used to automatically detect and redact or remove personally identifiable information from text data, such as customer records, support tickets, or user-generated content. This can help organizations comply with data privacy regulations and protect sensitive user information.
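
Building on the detection sketch above, redaction can be done by replacing detected character spans; the helper below and the "[REDACTED]" placeholder are illustrative choices, not part of the model.

```python
# Hypothetical redaction helper reusing the `detector` pipeline defined
# earlier; "[REDACTED]" is an illustrative placeholder, not a model output.
def redact(text: str, entities: list) -> str:
    # Replace spans right-to-left so earlier character offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: entity["start"]] + "[REDACTED]" + text[entity["end"] :]
    return text

record = "SSN 123-45-6789, email jane.doe@example.com"
print(redact(record, detector(record)))
# e.g. "SSN [REDACTED], email [REDACTED]"
```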

Things to try

You could try using the piiranha-v1-detect-personal-information model to analyze text data from your own organization and identify any PII that may need to be removed or protected. You could also experiment with fine-tuning the model on your own dataset to improve its performance for your specific use case.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Models

👀 starpii

Maintainer: bigcode

Total Score: 104

The starpii model is a Named Entity Recognition (NER) model trained to detect Personal Identifiable Information (PII) in code datasets. It was fine-tuned by bigcode on a PII dataset they annotated, which is available with gated access. The model was initially trained on a pseudo-labeled dataset to enhance its performance on rare PII entities like keys. The model fine-tuned on the annotated dataset can detect six target classes: Names, Emails, Keys, Passwords, IP addresses, and Usernames. It uses the bigcode-encoder as its base encoder model, which was pre-trained on 88 programming languages from The Stack dataset.

Model inputs and outputs

Inputs

  • Raw text containing code snippets or documents

Outputs

  • Annotated text with PII entities highlighted and classified into one of the six target classes

Capabilities

The starpii model demonstrates strong performance in detecting various types of PII entities within code, including rare ones like keys and passwords. This can be useful for privacy-preserving applications that need to automatically identify and redact sensitive information.

What can I use it for?

The starpii model can be applied to a variety of use cases where identifying PII in code is important, such as:

  • Anonymizing code datasets before sharing or publishing
  • Detecting sensitive information in internal code repositories
  • Regulatory compliance by finding PII in financial or legal documents

Things to try

One interesting aspect of the starpii model is its use of a pseudo-labeled dataset for initial training. This technique can be helpful for improving model performance on rare entities that are difficult to obtain labeled data for. You could experiment with applying similar approaches to other domain-specific NER tasks.
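
Since starpii is described above as a standard NER model, a usage sketch can mirror the piiranha example; the Hugging Face model ID bigcode/starpii and the gated-access login are assumptions based on the description, not confirmed details.

```python
# A minimal sketch, assuming the checkpoint is published (gated) on
# Hugging Face as "bigcode/starpii"; accept its terms and authenticate
# with `huggingface-cli login` before running.
from transformers import pipeline

code_pii = pipeline(
    "token-classification",
    model="bigcode/starpii",
    aggregation_strategy="simple",  # merge sub-tokens into whole entities
)

snippet = 'DB_PASSWORD = "hunter2"  # maintainer: jane.doe@example.com'
for entity in code_pii(snippet):
    print(entity["entity_group"], "->", entity["word"])
```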


⚙️ gliner_multi_pii-v1

Maintainer: urchade

Total Score: 47

The gliner_multi_pii-v1 model is a named entity recognition (NER) model developed by Urchade Zaratiana that can identify a wide range of personally identifiable information (PII) in text. This model is based on the GLiNER architecture, which uses a bidirectional transformer encoder (similar to BERT) to provide a practical alternative to traditional NER models and large language models. Compared to other GLiNER models, the gliner_multi_pii-v1 version has been fine-tuned on the urchade/synthetic-pii-ner-mistral-v1 dataset to specialize in detecting PII entities. This includes common entity types like person, organization, phone number, and email address, as well as more specialized PII like passport numbers, credit card information, social security numbers, and medical data.

Model inputs and outputs

Inputs

  • Text: The input to the model is plain text.

Outputs

  • Entities: The model outputs a list of detected entities, with each entity containing the following information:
    • text: The text span of the detected entity.
    • label: The entity type, such as "person", "phone number", "email", etc.

Capabilities

The gliner_multi_pii-v1 model excels at identifying a wide range of personally identifiable information within text. It can detect common PII like names, contact details, and identification numbers, as well as more specialized PII such as medical conditions, insurance information, and financial data. This capability makes the model useful for a variety of applications that require sensitive data extraction.

What can I use it for?

The gliner_multi_pii-v1 model is well-suited for any project that involves identifying and extracting personally identifiable information from text. Some potential use cases include:

  • Compliance and Regulatory Monitoring: Use the model to scan documents and communications for PII that may need to be protected or redacted to meet regulatory requirements.
  • Customer Onboarding and Identity Verification: Leverage the model to automatically extract relevant PII from customer documents and forms, streamlining the onboarding process.
  • Data Anonymization and Redaction: Apply the model to identify sensitive information that should be removed or obfuscated before sharing or publishing data.
  • Fraud Detection and Prevention: Integrate the model into fraud detection systems to identify suspicious patterns or anomalies in PII.

Things to try

One interesting aspect of the gliner_multi_pii-v1 model is its ability to recognize a diverse range of PII entity types. Instead of being limited to a predefined set of entities, the model can dynamically identify and classify a wide variety of personally identifiable information. To explore this capability, you could try providing the model with text that contains a mix of different PII elements, such as a resume or a customer support ticket. Observe how the model is able to accurately locate and categorize the various PII entities, ranging from names and contact details to more specialized information like medical conditions or financial data.

Another interesting experiment would be to compare the performance of the gliner_multi_pii-v1 model to traditional, rule-based PII detection approaches. By testing the model on a diverse set of real-world data, you can assess its robustness and flexibility compared to more rigid, predefined systems.
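
GLiNER models are typically used through the standalone gliner Python package rather than a transformers pipeline. The sketch below assumes that package (pip install gliner) and uses an illustrative label list; GLiNER accepts arbitrary entity labels at inference time.

```python
# A minimal sketch using the gliner package; the label set is an
# illustrative choice -- GLiNER matches whatever entity types you request.
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_multi_pii-v1")

text = "Jane Doe, passport X1234567, reachable at jane@example.com."
labels = ["person", "passport number", "email"]

for entity in model.predict_entities(text, labels):
    # Each entity is a dict with the matched text span and its label.
    print(f"{entity['label']:>15}: {entity['text']}")
```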


🛠️ Phi-Hermes-1.3B

Maintainer: teknium

Total Score: 42

The Phi-Hermes-1.3B model is an AI model created by teknium. It is a fine-tuned version of the Phi-1.5 model that was trained on the OpenHermes Dataset, a collection of over 240,000 synthetic data points primarily generated by GPT-4. The OpenHermes-13B model is a 13B parameter version of the Hermes model that was trained on a similar dataset, including data from sources like the GPTeacher, WizardLM, and Camel-AI datasets. It demonstrates improved performance on a variety of benchmarks compared to the original Hermes model.

Model Inputs and Outputs

The Phi-Hermes-1.3B model is a text-to-text transformer model that can take in natural language prompts and generate relevant responses.

Inputs

  • Natural language prompts or instructions

Outputs

  • Generated text responses to the input prompts

Capabilities

The Phi-Hermes-1.3B model demonstrates strong performance on a variety of natural language tasks, including question answering, reading comprehension, and commonsense reasoning. It is capable of engaging in coherent, multi-turn conversations and can provide detailed, thoughtful responses.

What Can I Use It For?

The Phi-Hermes-1.3B model could be useful for a wide range of applications, such as:

  • Developing intelligent virtual assistants or chatbots
  • Generating creative or persuasive written content
  • Enhancing language learning and education applications
  • Powering interactive storytelling or worldbuilding experiences

The model's strong performance on benchmark tasks and ability to engage in open-ended dialogue make it a versatile tool for building AI-powered applications across many domains.

Things to Try

One interesting aspect of the Phi-Hermes-1.3B model is its ability to provide structured outputs in JSON format when prompted to do so. This could enable the model to be used as a conversational interface for querying and retrieving data from external APIs or knowledge bases. Researchers and developers could also explore fine-tuning or further training the model on specialized datasets to enhance its capabilities in specific domains or tasks. The model's strong foundation makes it well-suited for continued learning and refinement.
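
Since the section above describes a plain text-to-text interface, a generation sketch with transformers is straightforward. The instruction/response prompt template below is an assumption, not a confirmed format; check the model card for the template Phi-Hermes-1.3B was actually tuned on.

```python
# A minimal sketch, assuming a standard causal-LM interface; the
# instruction/response prompt template is a guess, not a confirmed format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "teknium/Phi-Hermes-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True
)

prompt = "### Instruction:\nList three uses for a 1.3B-parameter language model.\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```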


🚀 TinyDolphin-2.8-1.1b

Maintainer: cognitivecomputations

Total Score: 52

The TinyDolphin-2.8-1.1b is an experimental AI model trained by Kearm on the new Dolphin 2.8 dataset by Eric Hartford. This model is part of the Dolphin series of AI assistants developed by Cognitive Computations. Similar Dolphin models include Dolphin-2.8-Mistral-7b-v02, Dolphin-2.2-Yi-34b, and MegaDolphin-120b.

Model inputs and outputs

The TinyDolphin-2.8-1.1b model is designed to take text prompts as input and generate text responses. It can handle a wide range of tasks, from creative writing to answering questions.

Inputs

  • Text prompts: The model accepts free-form text prompts provided by the user.

Outputs

  • Text responses: The model generates relevant and coherent text responses based on the input prompts.

Capabilities

The TinyDolphin-2.8-1.1b model is capable of a variety of tasks, such as generating creative stories, answering questions, and providing instructions. It can engage in open-ended conversations and demonstrate good understanding of context and nuance.

What can I use it for?

The TinyDolphin-2.8-1.1b model could be used for a range of applications, such as:

  • Creative writing: Generate unique and imaginative stories, poems, or other creative content.
  • Conversational AI: Develop chatbots or virtual assistants that can engage in natural language conversations.
  • Question answering: Create AI-powered question answering systems to help users find information.
  • Task assistance: Provide step-by-step instructions or guidance for completing various tasks.

Things to try

One interesting thing to try with the TinyDolphin-2.8-1.1b model is to experiment with different types of prompts and see how it responds. For example, you could try giving it open-ended prompts, such as "Write a story about a talking dolphin," or more specific prompts, like "Explain the process of training dolphins for military purposes." Observe how the model handles these varying types of inputs and the quality of the responses it generates.
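
To experiment with prompts as suggested above, a minimal generation sketch follows; it assumes the checkpoint ships a chat template (Dolphin models are typically ChatML-tuned), so adjust the prompt formatting if the tokenizer lacks one.

```python
# A minimal sketch, assuming the tokenizer provides a chat template;
# Dolphin models are usually ChatML-tuned, but verify on the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/TinyDolphin-2.8-1.1b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

messages = [{"role": "user", "content": "Write a two-sentence story about a talking dolphin."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=80, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```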
