Model Alignment

Ethics & Explainability

Ensuring model goals match human intent

The effort to make an AI system's objectives and behavior consistent with human values and safety constraints.

Concepts related to Model Alignment

RLHF

Reinforcement Learning from Human Feedback

AI Safety

Designing AI systems to avoid harmful behavior