
about nerfdetector

You know when a model just feels off? Dumber than last week, slower to pick up context, more likely to botch a tool call?

nerfdetector crowdsources subjective model performance so you know when your model is actually nerfed.

Developers report how models are performing in real coding sessions. nerfdetector aggregates those reports into a live quality monitor for every major model.

Status

Each model’s score comes from two signals. Sentiment is the share of developer votes that say the model is working well. Telemetry is the average session health, computed from error rates and tool failure rates reported by the CLI agent. The final score weights sentiment at 70% and telemetry at 30%.
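The weighting above can be sketched as a small function. This is an illustrative reconstruction from the description, not nerfdetector's actual internals; the type and function names are assumptions.

```typescript
// Sketch of the scoring described above. Both signals are fractions in 0..1.
type Signals = {
  sentiment: number; // share of developer votes saying the model is fine
  telemetry: number; // average session health from error/tool-failure rates
};

function score({ sentiment, telemetry }: Signals): number {
  // 70% sentiment, 30% telemetry, expressed as a percentage
  return 100 * (0.7 * sentiment + 0.3 * telemetry);
}

function status(pct: number): "FINE" | "STRUGGLING" | "NERFED" {
  if (pct >= 65) return "FINE";
  if (pct >= 40) return "STRUGGLING";
  return "NERFED";
}
```

For example, 80% positive votes with mediocre telemetry (0.5) still lands at 71, comfortably in FINE, because sentiment dominates the blend.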

🟒 FINE: β‰₯ 65%
🟑 STRUGGLING: 40–65%
πŸ”΄ NERFED: < 40%

Scores use a 15-minute rolling window. The sparkline covers the last hour in 5-minute buckets.
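The hour-long sparkline amounts to counting reports into twelve 5-minute buckets. A minimal sketch, assuming reports carry millisecond timestamps (the function name and layout are illustrative):

```typescript
const BUCKET_MS = 5 * 60 * 1000; // 5-minute buckets
const WINDOW_MS = 60 * 60 * 1000; // last hour

// Count report timestamps into 12 buckets, oldest bucket first.
function sparklineBuckets(timestamps: number[], now: number): number[] {
  const counts = new Array(WINDOW_MS / BUCKET_MS).fill(0);
  for (const t of timestamps) {
    const age = now - t;
    if (age < 0 || age >= WINDOW_MS) continue; // outside the last hour
    const i = counts.length - 1 - Math.floor(age / BUCKET_MS);
    counts[i]++;
  }
  return counts;
}
```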

The agent

npm i -g nerfdetector && nerfdetector init

The CLI installs a background hook in your coding agent that tracks which models you use and for how long. When a session ends, it prompts you to vote on whether the session was fine or nerfed. If you used multiple models, the vote splits proportionally by time spent on each. If you skip the vote, the agent still reports session health as telemetry.
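Splitting one vote across models by time spent is a simple proportional weighting. A sketch under that assumption (function and field names are hypothetical, not nerfdetector's API):

```typescript
// Split a single session vote across models, weighted by seconds spent.
function splitVote(
  secondsByModel: Record<string, number>,
  vote: "fine" | "nerfed",
): Record<string, { vote: "fine" | "nerfed"; weight: number }> {
  const total = Object.values(secondsByModel).reduce((a, b) => a + b, 0);
  const out: Record<string, { vote: "fine" | "nerfed"; weight: number }> = {};
  for (const [model, secs] of Object.entries(secondsByModel)) {
    out[model] = { vote, weight: secs / total }; // fractional vote weight
  }
  return out;
}
```

So a "nerfed" vote after 30 minutes on one model and 10 on another counts 0.75 against the first and 0.25 against the second.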

Privacy

The agent only sends metadata: which models you used, how many calls, how long they took, and whether they errored. None of your prompts, responses, code, or file paths ever leave your machine. There are no accounts or tracking. Run nerfdetector export to see exactly what gets stored locally.
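To make the metadata-only claim concrete, here is a plausible shape for a report, reconstructed from the description. The field names are assumptions; the point is what is present (model names and counters) versus absent (prompts, code, file paths):

```typescript
// Hypothetical report shape: counters and a model name, nothing else.
interface SessionReport {
  model: string; // e.g. a model identifier string
  calls: number; // model calls made during the session
  totalLatencyMs: number; // cumulative call duration
  errors: number; // failed calls, including tool failures
}

// Metadata-only invariant: every value is a number or a short string,
// so free-form content like prompts or code can't fit in a report.
function isMetadataOnly(report: Record<string, unknown>): boolean {
  return Object.values(report).every(
    (v) => typeof v === "number" || (typeof v === "string" && v.length < 100),
  );
}
```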