OpenAI has released HealthBench, an open-source evaluation framework designed to measure the performance and safety of large language models (LLMs) in realistic healthcare scenarios. Developed in collaboration with 262 physicians…
OpenAI has released HealthBench, an open-source evaluation framework designed to measure the performance and safety of large language models (LLMs) in realistic healthcare scenarios. Developed in collaboration with 262 physicians…