reward hacking

English dictionary entry

Meanings

noun
  1. In machine learning, the exploitation of a reward function by an agent to maximize reward in unintended or undesirable ways, often by finding loopholes that subvert the true goal of the task.
  2. By extension, any manipulation or exploitation of a reward or incentive system, typically by maximizing measurable outcomes in ways that undermine the system’s actual goals.

Word forms

reward hacking

This entry uses open data from Wiktionary (CC BY-SA/GFDL).