
OpenAI releases Privacy Filter, a local open-weight model built for personal data masking
OpenAI has launched Privacy Filter, an open-weight model designed to detect and redact personally identifiable information (PII) in text. Where traditional tools often rely on strict formatting rules, Privacy Filter combines advanced language understanding with a privacy-focused labeling scheme, allowing it to identify subtler, context-dependent PII and to distinguish public from private data within unstructured text.
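To make the contrast concrete, here is a minimal sketch of the rule-based approach the article describes as traditional. Everything in it (the patterns, the labels, the `redact_rule_based` helper) is illustrative, not part of Privacy Filter itself:

```python
import re

# A minimal rule-based redactor of the kind the article contrasts with
# Privacy Filter: it only catches PII that matches strict formats.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_rule_based(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact_rule_based("Reach me at jane.doe@example.com or 555-123-4567."))
# → Reach me at [EMAIL] or [PHONE].
```

Formatted PII like emails and phone numbers is caught, but context-dependent PII such as "the CFO who lives on Elm Street" passes through untouched, which is exactly the gap a context-aware model is meant to close.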
Privacy Filter can run entirely in local environments, ensuring that sensitive information never leaves the machine. For settings requiring high-throughput processing, it handles long inputs efficiently, performing redaction in a single, quick pass. Developers can fine-tune the model for their specific needs and integrate it across training, indexing, logging, and review pipelines, strengthening privacy controls at every step.
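One way such a model could slot into a logging pipeline is as a `logging.Filter` that masks every record before it reaches any handler. This is a hypothetical integration sketch: the `mask` function below is a simple regex stand-in for a local Privacy Filter call, since the model's actual API is not described in the article:

```python
import io
import logging
import re

def mask(text: str) -> str:
    # Stand-in for a local Privacy Filter inference call; a simple
    # email regex plays the model's role so the sketch is runnable.
    return re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b", "[EMAIL]", text)

class RedactingFilter(logging.Filter):
    """Masks PII in each record before it reaches any handler."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = mask(record.getMessage())
        record.args = ()  # args are already folded into msg
        return True

stream = io.StringIO()
logger = logging.getLogger("app")
logger.addHandler(logging.StreamHandler(stream))
logger.addFilter(RedactingFilter())

logger.warning("password reset requested by %s", "jane.doe@example.com")
print(stream.getvalue().strip())  # → password reset requested by [EMAIL]
```

Attaching the filter at the logger level means every downstream handler (files, consoles, log shippers) only ever sees redacted text.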
On the PII-Masking-300k benchmark, Privacy Filter achieves a 96% F1 score, which increases to 97.43% on a corrected version of the dataset. The model is now available under the Apache 2.0 license and can be accessed on platforms such as Hugging Face and GitHub.
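For reference, the F1 score quoted above is the harmonic mean of precision and recall over detected PII spans. The counts below are purely illustrative (the benchmark's actual confusion numbers are not given in the article):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Span-level F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)   # correct spans / spans predicted
    recall = tp / (tp + fn)      # correct spans / spans in gold data
    return 2 * precision * recall / (precision + recall)

# Illustrative only: 96 spans found correctly, 4 spurious,
# 4 missed gives precision = recall = 0.96, hence F1 = 0.96.
print(round(f1_score(96, 4, 4), 2))  # → 0.96
```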

Comments
I guess Microsoft is behind it, pushing its Recall functionality again, which no enterprise is actually using and which hasn't actually boosted Copilot PC sales, with a "Recall 2: 21% faster and 272% more secure" announcement coming soon.