Many top language models now err on the side of caution, refusing harmless prompts that merely sound risky – an ‘over-refusal’ behavior that affects their usefulness in real-world scenarios. A new dataset called ‘FalseReject’ targets the…
Many top language models now err on the side of caution, refusing harmless prompts that merely sound risky – an ‘over-refusal’ behavior that affects their usefulness in real-world scenarios. A new dataset called ‘FalseReject’ targets the…