ZDCSlab's Collections
Rubrics as an Attack Surface (RIPD)

Updated 1 day ago

This collection releases the official artifacts accompanying “Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges.”

  • Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges

    Paper • 2602.13576 • Published 7 days ago • 1

  • ZDCSlab/ripd-dataset

    Preview • Updated 1 day ago • 16

  • ZDCSlab/ripd-ultra-real-llama3-8b-instruct-biased-bt

    Text Generation • Updated 1 day ago

  • ZDCSlab/ripd-ultra-real-llama3-8b-instruct-seed-bt

    Text Generation • Updated 1 day ago

  • ZDCSlab/ripd-anthropic-saferlhf-dolphin3-llama31-8b-biased-bt

    Text Generation • Updated 1 day ago

  • ZDCSlab/ripd-anthropic-saferlhf-dolphin3-llama31-8b-seed-bt

    Text Generation • Updated 1 day ago

  • ZDCSlab/ripd-ultra-real-gemma2-2b-it-biased-bt

    Text Generation • Updated 1 day ago

  • ZDCSlab/ripd-ultra-real-gemma2-2b-it-seed-bt

    Text Generation • Updated 1 day ago

  • ZDCSlab/ripd-anthropic-saferlhf-gemma-2b-uncensored-v1-biased-bt

    Text Generation • Updated 1 day ago

  • ZDCSlab/ripd-anthropic-saferlhf-gemma-2b-uncensored-v1-seed-bt

    Text Generation • Updated 1 day ago
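The eight checkpoints listed above appear to come in pairs: each dataset/base-model combination has a `-seed-bt` repository and a `-biased-bt` repository. The sketch below groups the listed repository ids into such pairs, for example to line up a seed checkpoint with its biased counterpart before comparing them. It assumes the naming pattern `ripd-<stem>-<seed|biased>-bt` holds for every repository. The `PATTERN` regex and the `pair_checkpoints` helper are hypothetical, the sketch does not try to split `<stem>` into dataset and base model, and it assigns no meaning to the `-bt` suffix.

```python
import re
from collections import defaultdict

# Repository ids copied verbatim from the collection listing.
REPO_IDS = [
    "ZDCSlab/ripd-ultra-real-llama3-8b-instruct-biased-bt",
    "ZDCSlab/ripd-ultra-real-llama3-8b-instruct-seed-bt",
    "ZDCSlab/ripd-anthropic-saferlhf-dolphin3-llama31-8b-biased-bt",
    "ZDCSlab/ripd-anthropic-saferlhf-dolphin3-llama31-8b-seed-bt",
    "ZDCSlab/ripd-ultra-real-gemma2-2b-it-biased-bt",
    "ZDCSlab/ripd-ultra-real-gemma2-2b-it-seed-bt",
    "ZDCSlab/ripd-anthropic-saferlhf-gemma-2b-uncensored-v1-biased-bt",
    "ZDCSlab/ripd-anthropic-saferlhf-gemma-2b-uncensored-v1-seed-bt",
]

# Hypothetical pattern inferred from the ids above (an assumption, not a
# documented convention): everything between "ripd-" and the trailing
# "-seed-bt" / "-biased-bt" is treated as one opaque stem.
PATTERN = re.compile(r"^ZDCSlab/ripd-(?P<stem>.+)-(?P<variant>seed|biased)-bt$")


def pair_checkpoints(repo_ids):
    """Group repo ids into {stem: {"seed": repo_id, "biased": repo_id}}."""
    pairs = defaultdict(dict)
    for repo_id in repo_ids:
        match = PATTERN.match(repo_id)
        if match is None:
            raise ValueError(f"unexpected repository id: {repo_id}")
        pairs[match.group("stem")][match.group("variant")] = repo_id
    return dict(pairs)


if __name__ == "__main__":
    # Print each stem with its seed and biased repositories side by side.
    for stem, variants in sorted(pair_checkpoints(REPO_IDS).items()):
        print(f"{stem}: seed={variants['seed']} biased={variants['biased']}")
```

Raising on an id that does not match keeps the pairing strict, so a repository added later under a different naming scheme is surfaced rather than silently dropped.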