Blog

Why Some Entities Can Market AI/ML-Based Clinical Decision Support Systems Without IRB or FDA Oversight

I was recently asked for my thoughts on why some public entities are able to push unsafe medical devices onto the market without IRB or FDA oversight and without ethical considerations. Here was my response:

FDA and IRB regulations are very nuanced, so we’ll first have to unpack a few things:

When is IRB Review and/or FDA oversight required?

A) Both IRB review and FDA oversight are required if there is a clinical investigation/clinical evaluation of a “medical device” or drug, regardless of whether the institution receives federal funding.

B) If investigational drugs or devices are NOT involved, only the IRB review requirements apply, and only when the institution is conducting “human subjects research” and has generally been a recipient of federal funds (academia, for example).

C) Commercial entities (Facebook, OpenAI, Apple, etc.) typically do not require IRB oversight unless they are involved in either A or B above.

When is Informed Consent required?

A) Informed Consent is always required (under 45 CFR 46.111 and 46.116) unless the IRB determines that the research meets waiver criteria. Unfortunately, IRBs are notorious for giving waivers out like candy. This is not great, but it does explain why so little consenting is done for this type of research.

B) The US DHHS Secretary’s Advisory Committee on Human Research Protections (SACHRP) states that informed consent MUST be obtained if data collection is being done “as part of the research”. This means that while IRBs often consider retrospective chart review to be “waiver worthy”, prospective data collection, or data collection done directly from patients/participants, would require informed consent and is not eligible for a waiver.

C) HIPAA always applies to any covered entity, research or not. HIPAA authorization is separate from Informed Consent but follows similar waiver criteria, so when an IRB is involved, IRBs have also notoriously granted HIPAA waivers. I’m not a fan of that practice.

In other words, so long as a “medical device” or drug is being investigated, these requirements apply to both federally funded and non-federally funded institutions. This is outlined in 21 CFR Parts 50 and 56.

However, I think there are a lot of possible reasons why companies that test and use LLMs (or any other AI/ML-based medical devices/AI SaMD) in clinical workflows “slip through the cracks”, essentially avoiding all of these things.

  1. The most common reason is that they’ve convinced themselves their product does not constitute a “medical device” under the FDA definitions. They may or may not be right. It depends.
    • For example, general health/wellness products like a Fitbit aren’t medical devices if used as indicated (measuring daily steps, heart rate, etc.).
    • However, the Apple Watch’s “Irregular Rhythm Notification” feature, which is intended to detect irregular rhythms, is considered a Class II (not an insignificant risk) medical device regulated by the FDA.
    • Some hospitals (and many vendors that provide Clinical Decision Support Systems) often misunderstand AI/ML-based platforms to be just like any other software, and throw them into clinical workflows without question. The Epic sepsis prediction tool is a perfect example of that (and of how it can go horribly wrong). See here for an interesting article.
    • Another example: a CE-marked, web-based platform (CanRisk) that a US hospital was using for its own breast cancer risk predictions. The hospital later found out it was not supposed to be using the tool, because it was considered a “medical device” that was not FDA-approved (we can’t accept a foreign country’s “approval” as equivalent to an FDA approval).
  2. Some AI/ML-based clinical tools would not be considered “medical devices” based on their purpose.
    • For example, an NLP (natural language processing) model used to identify a certain complaint in medical notes to see how frequently it occurs in a given condition, or to search through the EHR to identify underlying conditions.
      • This isn’t to say there isn’t risk involved (there’s plenty of risk when outside code is introduced into the EHR, behind the hospital firewall, leading to breaches and malware that end up costing hospitals millions of dollars).
      • Nevertheless, this type of application wouldn’t necessarily require IRB or FDA oversight or informed consent if the institution isn’t trying to use it to “treat, diagnose, mitigate, prevent, or cure a condition or disease”.
    • However, if they used that NLP model to identify episodes that would then send alerts to physicians to act on (in order to diagnose, treat, mitigate, etc.), then it may be considered a medical device (see the sketch just after this list for a toy illustration of that intended-use distinction).
  3. LLMs used in clinical settings, for example as a chatbot that “supports” clinicians, may or may not be FDA-regulated, depending on their “intended function”.
    • Let’s say a chatbot asks you to upload a picture of your lesion and then assesses whether the lesion is cancerous. This would be FDA-regulated. Therefore, its development would require all of those things you mentioned (IRB review, FDA oversight, ethical considerations).
    • However, if a chatbot asks you about your symptoms and then provides a list of possible things that might be the problem, this would not be FDA-regulated (an exception made under the Cures Act).
  4. Another thing to consider is that while any medical device (exempt or not) must undergo design controls (under 21 CFR 820.30), not all are required to have clinical evidence to support their marketing authorization. For example, if a manufacturer can show its device is substantially equivalent to a predicate device, the extensive testing and validation that would normally be required is essentially gone. I worry about this in regard to AI SaMD.
  5. Many hospitals fail to understand the difference between QA/QI and research. The FDA considers any medical device that has not been tested for safety and/or effectiveness to be “investigational”. However, many hospitals try to incorporate AI/ML tools into their systems for “medical purposes” thinking it is “quality improvement” (and, as mentioned earlier, sometimes misunderstand the tool to be just like any other “software” program for “business use”), when in fact it is being used in an “investigational” manner.
    1. For example, running a predictive model for colon cancer risk that has not been FDA-cleared or approved, so that the hospital can reprioritize its long list of patients scheduled for annual exams.
      1. The nature of issuing a “risk score” for an individual would make this a “medical device”, and
      2. The untested nature of the device makes it “investigational”, which would therefore require IRB oversight.
      3. Ultimately, if institutions choose to forego this requirement (or simply neglect it), it remains an institutional risk. For these reasons, I believe institutions should always consult their IRBs first.
  6. Lastly, if you look at the list of currently FDA-approved or cleared AI/ML SaMD, you’ll notice that almost ALL of them failed to go through what, today, we would consider the “correct” pathway.
    1. Most devices failed to undergo external validation (i.e., they were trained and tested on retrospective chart reviews, and oftentimes at only 1 or 2 institutions).
      1. In fact, if you review their summaries, you’ll see they’ve only been tested on a few hundred charts (this introduces risk in real-world settings, so adopters of this technology should re-train and re-validate within their own institutions prior to deployment).
    2. Many incorrectly went through the FDA’s 510(k) pathway. The FDA has acknowledged that this was a mistake, but at the time there wasn’t a lot of information about AI/ML, so some imaging products, for example, were passed through as “equivalent to an MRI”. I think there’s more scrutiny in the assessment process now (or at least there will be), but that’s how most are getting through FDA clearance processes.
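
To make the intended-use distinction in item 2 concrete, here’s a minimal, purely hypothetical sketch in Python. The keyword pattern, function names, and example notes are mine for illustration only (this is not a validated clinical NLP model): the same matching logic reads as descriptive analytics in one function, and as a patient-specific alert that starts to look like device territory in the other.

```python
import re

# Hypothetical complaint pattern -- illustrative only, not a validated clinical NLP model.
COMPLAINT_PATTERN = re.compile(r"\b(shortness of breath|dyspnea)\b", re.IGNORECASE)


def count_complaint_frequency(notes: list[str]) -> int:
    """Descriptive analytics: how often does the complaint appear across a note corpus?

    Used this way (retrospective frequency counts, no patient-specific action),
    the tool is generally not functioning as a medical device.
    """
    return sum(1 for note in notes if COMPLAINT_PATTERN.search(note))


def flag_note_for_clinician_alert(note: str) -> bool:
    """Patient-specific alerting: the same matcher, now driving a clinical action.

    Once the output is used to prompt diagnosis or treatment for an individual
    patient, the intended use looks much more like a regulated medical device.
    """
    return bool(COMPLAINT_PATTERN.search(note))


if __name__ == "__main__":
    corpus = [
        "Patient reports shortness of breath on exertion.",
        "No acute complaints today.",
    ]
    print("Complaint frequency across corpus:", count_complaint_frequency(corpus))
    print("Alert for note 0:", flag_note_for_clinician_alert(corpus[0]))
```

The point isn’t the code; nothing in the code changes between the two uses. What changes is the intended use, and that is what determines whether the FDA’s device definition applies.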

That said, FDA (and IRB) oversight is vulnerable and imperfect:

  1. The term “medical device” is extremely broad; it is often subjectively interpreted by manufacturers and often misinterpreted by many IRBs.
  2. Even if an AI product is FDA-regulated, the FDA has limited resources, so many products do enter the market without the FDA’s knowledge, and the only way for the agency to find out is through whistleblowers/tips. For example, if one company did “the right thing” and saw that a competitor did not, then it has an incentive to tip off the FDA. Only then would the FDA be triggered to investigate, and it would likely issue a cease and desist letter.

On top of all of these issues, my larger concern is physician investigators who create their own AI/ML SaMD. There are lots of legal issues that the IRB and hospital have to think about.

  1. Adaptive AI/ML SaMD. Continuously learning algorithms have to go through a predetermined change control plan, and guidance on that isn’t fully hashed out yet, but this is essentially the largest harm introduced into healthcare by AI today.
  2. Hospitals that hire these physicians can be considered “manufacturers”/“sponsors”. This introduces institutional conflicts of interest, FDA registration requirements, and a blurred line between the “practice of medicine” by a corporation as opposed to a physician practicing medicine. Some states prohibit this, while others have vague interpretations on the matter.
  3. Physician investigators having conflicts of interest that may influence how a model is validated.
  4. Limited resources for hospitals to bear the ongoing cost of LLMs and other AI/ML-based medical products.

All in all, I remember in the early 2000s when everyone was trying to make an app for anything and everything. I think we’re in that stage now, where everyone is trying to make an “AI” for anything and everything, and I don’t think AI belongs “anywhere and everywhere”. I firmly believe that physicians must maintain autonomy and ensure their decisions are based on medical practice and not on trained AIs that may have been carelessly developed (as is the case now). AI may be useful as an augmentation to clinical decision making, but I’m not comfortable with the idea of it being used to drive clinical decision making at this point.

When Are Clinical Decision Support Tools FDA-Regulated (Considered Medical Devices)?

I created this decision tree based on the FDA’s recent guidance on Clinical Decision Support Tools. The actual link to that guidance is here:

https://www.fda.gov/medical-devices/software-medical-device-samd/your-clinical-decision-support-software-it-medical-device

My hope was to make a simplified decision tree to aid IRBs in determining whether their clinical decision support tools are considered “medical devices” and therefore require adherence to the FDA regulations.
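
For readers who think in code rather than flowcharts, here is a rough sketch of how I read the four Cures Act criteria (section 520(o)(1)(E) of the FD&C Act) that the guidance walks through. The function name and the yes/no framing are my own simplification, not the FDA’s wording, and this is no substitute for the tree or the guidance itself.

```python
def cds_is_likely_fda_regulated(
    analyzes_medical_image_or_signal: bool,
    displays_or_analyzes_medical_information: bool,
    recommends_to_health_care_professional: bool,
    hcp_can_independently_review_basis: bool,
) -> bool:
    """Rough sketch of the four Cures Act CDS criteria (my simplification).

    Software escapes the device definition only if it meets ALL four criteria:
    (1) it does NOT acquire, process, or analyze a medical image or signal,
    (2) it displays or analyzes medical information,
    (3) it provides recommendations to a health care professional (not a patient),
    (4) the professional can independently review the basis for those
        recommendations rather than relying on them primarily.
    """
    meets_all_four = (
        not analyzes_medical_image_or_signal
        and displays_or_analyzes_medical_information
        and recommends_to_health_care_professional
        and hcp_can_independently_review_basis
    )
    return not meets_all_four


# Example: the lesion-photo chatbot from earlier analyzes a medical image,
# so it fails criterion (1) and is likely a regulated device.
print(cds_is_likely_fda_regulated(True, True, False, False))  # True
```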

And here is the actual FDA version (which I also absolutely LOVE!)

Let me know what you think!

What Level of IRB Review is Required for AI/ML Research Involving Human Subjects?

Here is a decision tree for IRBs to use, primarily for AI/ML Human Subjects Research that involves medical devices; it won’t do you any good if you aren’t dealing with medical devices. The tree will help you determine whether an IDE is required, whether the IRB can make a Non-Significant Risk determination, and which projects can be expedited, or possibly even “exempt” from IRB review. The “Exempt” decision tree can be found embedded in the AI HSR Checklist.
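
If it helps, the top of the tree boils down to something like the following sketch. The function name and one-line summaries are mine; the underlying logic is just the high-level structure of 21 CFR 812 (Significant Risk vs. Non-Significant Risk vs. no IDE), and the tree itself covers far more nuance.

```python
def ide_pathway(device_is_investigational: bool, study_is_significant_risk: bool) -> str:
    """High-level sketch of the 21 CFR 812 branching; a simplification, not the full tree."""
    if not device_is_investigational:
        return "No IDE needed; standard IRB review requirements still apply."
    if study_is_significant_risk:
        return "Significant Risk study: FDA-approved IDE plus IRB approval required before enrollment."
    return ("Non-Significant Risk study: abbreviated IDE requirements (21 CFR 812.2(b)) apply, "
            "with the IRB effectively acting as the FDA's surrogate.")


print(ide_pathway(device_is_investigational=True, study_is_significant_risk=False))
```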

Let me know what you think!

How to Conduct an Effective IRB Review of Artificial Intelligence Human Subjects Research (AI HSR)

Institutional Review Boards (IRBs) are formally designated independent groups charged with the review and ethical oversight of research involving human subjects. The IRB is composed of knowledgeable experts in various fields who provide guidance to researchers to minimize risks and maximize benefits for research participants. Moreover, the IRB is in place to protect the rights and welfare of human subjects in research projects. IRBs base their decisions on the principles of the Belmont Report and on established regulations and policies from the Code of Federal Regulations and the Food and Drug Administration (FDA), if applicable.

IRB oversight has been required for human subjects research dating back to 1974¹; however, the terms research and human subjects are often misunderstood and inconsistently applied today. Federal guidelines were revised in 2018 to define human subjects to include “information about [not just physical interventions and interactions with] a living individual”. Artificial intelligence and machine learning (AI and ML) research involving human data challenges the federal human subjects guidelines because of the difficulty in defining “about whom” the data is being collected.

This White Paper is intended to be used as a basis for further discussion. We seek feedback on it to inform future iterations of the recommendations it contains. Our aim is to help IRBs build their capacity as regulatory bodies responsible for protecting human subjects in research. We provide recommendations on how AI HSR can be reviewed and adequately overseen within the current regulatory framework until a more thorough regulatory framework can be developed. We also include a decision tree for human subjects and exempt category four (4) (secondary use) determinations, based on current guidance from the Office for Human Research Protections (OHRP)².

For IRB professionals, the questions arise in two realms: Is the activity “human subjects research” and, if yes, does it meet the Exempt criteria?
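
For the exempt category four (4) question specifically, the shape of the determination looks roughly like this sketch of 45 CFR 46.104(d)(4). The parameter names are my paraphrase of the regulatory conditions; the White Paper’s decision tree is the actual tool, and an IRB (or designated reviewer) still has to make the determination.

```python
def likely_exempt_category_4(
    is_secondary_use_of_identifiable_info_or_biospecimens: bool,
    publicly_available: bool,
    identity_not_readily_ascertainable_and_no_reidentification: bool,
    use_regulated_under_hipaa: bool,
    conducted_for_federal_agency_using_government_collected_info: bool,
) -> bool:
    """Paraphrase of 45 CFR 46.104(d)(4): the exemption applies only to secondary
    research use of identifiable private information or identifiable
    biospecimens, and only if at least one of the listed conditions is met."""
    if not is_secondary_use_of_identifiable_info_or_biospecimens:
        return False
    return any([
        publicly_available,
        identity_not_readily_ascertainable_and_no_reidentification,
        use_regulated_under_hipaa,
        conducted_for_federal_agency_using_government_collected_info,
    ])
```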

Check out our new White Paper that discusses how to make AI HSR determinations.

Resources

  1. Protection of Human Subjects, 39 Fed. Reg. 105, 18914-18920 (1974) (to be codified at 45 C.F.R. § 46).
  2. United States Department of Health and Human Services (DHHS). “Human Subject Regulations Decision Charts”.
    https://www.hhs.gov/ohrp/regulations-and-policy/decision-charts/index.html. Accessed 30 December 2021.