DoD to Develop Scalable GenAI Testing Datasets

In an announcement Thursday, DoD said the CAIRT program's most recent red-team test involved more than 200 agency clinical providers and healthcare analysts, who compared three LLMs for two prospective use cases: clinical note summarization and a medical advisory chatbot. The participants found more than 800 potential vulnerabilities and biases in LLMs being tested to enhance military medical care.

CAIRT aimed to build a community of practice around algorithmic evaluations in collaboration with the Defense Health Agency and the Program Executive Office, Defense Healthcare Management Systems. In 2024, the program also offered a financial AI bias bounty focused on unknown risks in LLMs, beginning with open-source chatbots.

Medigy Insights

"Since applying GenAI for such purposes within the DoD is in earlier stages of piloting and experimentation, this program acts as an essential pathfinder for generating a mass of testing data, surfacing areas for consideration and validating mitigation options that will shape future research, development and assurance of GenAI systems that may be deployed in the future," said Dr. Matthew Johnson, CAIRT program lead, in a Jan. 2 statement about the initiative.

