Toxic Prompt RoBERTa
A text classification model that can be used as a guardrail to protect against toxic prompts and responses in conversational AI systems.
Features
- AI-Powered
Tags
- safeguarding
- ai-safety
- safety
- safety-management
- ai-guardrails
- huggingface
Toxic Prompt RoBERTa News & Activities
Recent activities
- POX added Toxic Prompt RoBERTa as alternative to Llama Guard, WildGuard and ShieldGemma
- POX added Toxic Prompt RoBERTa
Toxic Prompt RoBERTa information
What is Toxic Prompt RoBERTa?
Toxic Prompt RoBERTa 1.0 is a text classification model that can be used as a guardrail against toxic prompts and responses in conversational AI systems. The model is based on RoBERTa and was fine-tuned on the ToxicChat and Jigsaw Unintended Bias datasets. Fine-tuning was performed on a single Gaudi 2 card using Optimum Habana's GaudiTrainer.
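
To illustrate the guardrail use described above, here is a minimal sketch of screening both the user prompt and the generated response with a text classifier. It is not the model authors' code. The label name "toxic", the 0.5 threshold, the refusal message, and the helper names (is_blocked, guarded_reply, keyword_stub) are all assumptions made for illustration. The stub classifier only stands in for the real model so the example runs without downloads; it returns results in the list-of-dicts shape that a Hugging Face text-classification pipeline produces.

```python
from typing import Callable, Dict, List

# A classifier maps a text to [{"label": ..., "score": ...}], the shape a
# Hugging Face text-classification pipeline returns for a single string.
Classifier = Callable[[str], List[Dict[str, object]]]

# Assumption: the label name is illustrative; check the model's config.
TOXIC_LABEL = "toxic"
REFUSAL = "Sorry, I can't help with that."


def is_blocked(text: str, classify: Classifier, threshold: float = 0.5) -> bool:
    """Return True when the top prediction is the toxic label at or above threshold."""
    top = classify(text)[0]
    return top["label"] == TOXIC_LABEL and float(top["score"]) >= threshold


def guarded_reply(
    prompt: str,
    generate: Callable[[str], str],
    classify: Classifier,
    refusal: str = REFUSAL,
) -> str:
    """Screen the prompt first, then the generated response, before returning it."""
    if is_blocked(prompt, classify):
        return refusal
    response = generate(prompt)
    if is_blocked(response, classify):
        return refusal
    return response


def keyword_stub(text: str) -> List[Dict[str, object]]:
    """Hypothetical stand-in classifier for demonstration only, not the real model."""
    if "idiot" in text.lower():
        return [{"label": "toxic", "score": 0.99}]
    return [{"label": "not toxic", "score": 0.99}]


if __name__ == "__main__":
    def echo(prompt: str) -> str:
        return f"You said: {prompt}"

    print(guarded_reply("Hello there", echo, keyword_stub))
    print(guarded_reply("You idiot", echo, keyword_stub))
```

To try this with the actual model, keyword_stub could be swapped for a pipeline created with the transformers library, for example pipeline("text-classification", model=...), using the model's Hugging Face ID. Since the label names and a sensible threshold depend on the published model configuration, verify both before relying on the result.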


