
Reward Modeling

AI/ML Fundamentals

Learning a reward function from human feedback

What is Reward Modeling?

Reward modeling is the task of training a model to predict a scalar score reflecting how much humans prefer a given output, typically learned from labeled comparisons between candidate responses. It is used in RLHF (Reinforcement Learning from Human Feedback) to align generative models with human preferences: the learned reward model scores candidate outputs so that a policy can then be optimized against it.
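A minimal sketch of this idea, assuming PyTorch and a toy linear "reward model" over fixed-size features rather than a full language-model backbone. The names RewardModel and pairwise_loss are illustrative, not taken from the source; the loss shown is the standard Bradley-Terry pairwise objective used to fit preference comparisons.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a response representation to a single scalar reward."""
    def __init__(self, feature_dim: int = 16):
        super().__init__()
        self.score = nn.Linear(feature_dim, 1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Returns one scalar reward per example in the batch.
        return self.score(features).squeeze(-1)

def pairwise_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: push the human-preferred ("chosen") response to
    # score higher than the rejected one, i.e. minimize -log sigmoid(r_chosen - r_rejected).
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

In practice the features would come from a pretrained transformer over the prompt and response, but the scalar-head-plus-pairwise-loss structure is the same.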

Real-World Examples

  • Training a reward model from preference labels, i.e., pairwise comparisons in which annotators pick the better of two candidate responses (see the sketch below)
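A hedged end-to-end sketch of that training setup, reusing the RewardModel and pairwise_loss definitions above. It uses random feature vectors as stand-in preference data; a real system would instead encode tokenized (prompt, response) pairs with a transformer backbone.

```python
# Toy preference dataset: 64 pairs of 16-dim feature vectors, where each pair
# represents a human-preferred ("chosen") and a dispreferred ("rejected") response.
chosen_feats = torch.randn(64, 16)
rejected_feats = torch.randn(64, 16)

model = RewardModel(feature_dim=16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    optimizer.zero_grad()
    loss = pairwise_loss(model(chosen_feats), model(rejected_feats))
    loss.backward()
    optimizer.step()

# After training, model(features) yields a scalar reward that an RLHF policy
# optimizer (e.g. PPO) can maximize.
```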