On-device BERT · 17 PII types · English + Hinglish

Ask AI without
exposing people.

Maskara is a 4.37M-parameter token-classification model that finds names, Aadhaar, PAN, UPI IDs, passports and 12 more PII types — then masks them locally before a single token reaches the cloud.

Maskara · BERT-NER

Wrap a prompt, call any LLM, restore the reply. 99.98% F1 across 17 entity types, fine-tuned on 200K Indian and global PII samples — raw values never leave the device.

See it mask a prompt 99.98% F1

One local pass.
No raw PII leaves.

Pick a realistic prompt — Hinglish included — and watch Maskara detect sensitive spans, swap them for safe twins, and restore the originals after the AI replies.

Prompt composer Type or pick a sample with sensitive data
0/3,000
tiny browser demo
Privacy pass Ready.
Detect Find sensitive spans locally. 0 found
Replace Swap raw values with safe twins. local vault
Restore Put originals back after the reply. 0 spans
Cloud seessafe prompt

            
App receivesrestored reply

            

Small, local,
India-first.

A continued-training BERT checkpoint tuned for Indian PII formats and bilingual Hinglish text. Public on Hugging Face, MIT-licensed, and built to drop in front of any LLM call.

4.37Mparameters · CPU-friendly
99.98%F1 on held-out test set
17PII entity types
EN + Hinglishbilingual / code-mixed

Two ways to run it

Detect tokens straight from Transformers, or wrap the prompt path with the masking SDK.

1 · ProtectDetect & mask PII locally.
2 · CallSend only safe context.
3 · RestorePut originals back locally.
from maskara import Maskara

maskara = Maskara()                  # loads the 4.37M-param NER head
safe, ctx = maskara.protect(prompt)  # PII swapped for safe twins
reply = llm.complete(safe)           # only masked text hits the cloud
final = maskara.restore(reply, ctx)  # originals re-inserted locally

Hugging Face

somukandula/maskara ↗

BERT token classification · Safetensors · F32 · MIT license.

Built for code-mixed text

Trained on Hinglish and English-mixed prompts, so it catches PII even when sentences switch languages mid-stream.

Drop-in middleware

Wrap a prompt, call your provider, restore the response. No new app architecture, no data sent out to detect.

17 entity types

Five Indian-specific formats added in this checkpoint, on top of twelve global identifiers.

AADHAARIN PAN_CARDIN PASSPORTIN UPI_IDIN VEHICLE_REGIN PERSON_NAME ADDRESS PHONE EMAIL DATE_OF_BIRTH CREDIT_CARD PASSWORD API_KEY IP_ADDRESS USERNAME SSN DRIVER_LICENSE
Indian-specific (new) Global identifier
200K rows · maskara-indian-pii-200k 3 epochs · A100 2e-5 learning rate 0.9997 precision · 0.9998 recall