Artificial Intelligence Human Subjects Research (AI HSR) IRB Reviewer Checklist (with AI HSR and Exempt Decision Tree)

IRBs tread lightly when it comes to the oversight of AI human subject research (AI HSR). This may be due to insufficient understanding of when AI research involves human subjects. It may also be out of fear of committing scope creep (whose role is it to ensure responsible and ethical AI in human subjects research?). In response, some have admirably proposed the establishment of commercial AI Ethics Committees, while others try to fit AI ethics review into an ancillary review process. Ancillary AI ethics committees either take on the look and feel of a scientific review committee or treat the process like an IBC or SCRO committee. I argue that IRBs can (and should) fit AI HSR within their current IRB framework in many significant and meaningful ways without committing scope creep.

Admittedly, the current framework has limitations, whether the research is AI HSR or any other type. However, moving AI HSR oversight to an ancillary committee is not an efficient solution for researchers, who will still have to navigate their way through the IRB for their projects in addition to these extra bureaucratic hoops. Ancillary AI HSR committees only delay the path to approval and disincentivize compliance. Rather than build a new AI HSR IRB or ancillary review committee, we need to provide and require AI HSR education and training for IRB administration, and remind the IRB of its duty to ensure that relevant experts sit on the Board when reviewing specific research.

While it may be ideal for institutions with no IRB to outsource their reviews, for institutions with a home IRB, there are multiple downsides to outsourcing AIHSR oversight. Below are a few that come to mind:

1)    Cost: The study team may need to plan for additional funding if the review isn’t free (i.e., when it isn’t done in-house). Additional reviews for modifications or annual renewals may be required, which would add to that cost.

2)    Duplication of Effort: An AI Research Review Committee (AIRC) typically acts as an ancillary review to IRB review. However, many if not all of the issues reviewed would parallel IRB review and cause duplication of effort, time and money.

3)    No binding regulatory power: If an AIRC (or any AI ancillary review) has recommended changes to the protocol, the committee likely won’t have any regulatory “teeth”. This means that the researchers will not be required or inclined to comply with their “suggestions”. Additionally, these suggestions may or may not make their way to the IRB unless there is infrastructure established that keeps the two committees “talking to each other”. 

4)    Sustainability: Need to develop a sustainable administrative process for the committee in regard to.

The key to AI HSR ethical review and research compliance oversight is the need to focus on the data. AI/ML largely depends on the model, but depends even more on the data. Therefore, the IRB's focus should be weighted more heavily on the data used to train the model than on the algorithm/model itself. IRBs are better suited to address data concerns than technology (though the technology may require additional risk assessment by the IT department). These issues can be addressed by using a quality AI HSR checklist, providing adequate board member training, and adding an AI and data expert to the review board. Ancillary and commercial AI HSR IRB committees are innovative and helpful in their own unique ways, but none of them address the fundamental point at the forefront of AI HSR oversight, which is that we already have the tools and protections in place. We simply need to better understand and utilize them.

We have a lot of work to do! I’ve created an Artificial Intelligence Human Subjects Research (AI HSR) IRB Reviewer Checklist to get this dialogue started.

You can find this in the Creative Commons under an Attribution-NonCommercial-ShareAlike license. Please feel free to distribute, remix, adapt, and build upon the material for noncommercial purposes only (modified material must be under identical terms).

Artificial Intelligence Human Subjects Research IRB Reviewer Checklist (with AI HSR and Exempt Decision Tree) © 2021 by Tamiko Eto is licensed under CC BY-NC-SA 4.0. To view a copy of this license, visit

What is Artificial Intelligence Human Subject Research (AIHSR)? Defining “human subject” and “generalizable knowledge” in AIHSR Projects

The current regulatory challenge IRBs face when reviewing novel technologies, specifically AI, is identifying when the use of AI in research constitutes human subject research. Fitting AI as we understand it into the federal definitions of “human subject” and “research” feels like handling a shape sorting toy where, instead of putting a square block into a square hole, we’re trying to shove a misshapen block into a toy that doesn’t even have the shape we’re holding. Before we jump to the conclusion that AI doesn’t fit the current regulatory framework, however, let’s take a look at how it does.

Defining Human Subject: First and foremost, when we think of AI, we might be thinking “complicated technology” or “algorithms”, but what we need to be thinking is simply “data”. Next, we must understand the difference between human-focused datasets and not human-focused datasets. Identifying these differences from the beginning of the project should help IRBs and researchers identify which projects fall under their oversight and which do not.

Human-focused datasets are just what they say: they are datasets used or created to understand humans, human behavior, or the human condition. Not human-focused datasets, on the other hand, might involve human data. However, the difference is this type of AI research is not meant to help us understand humans, human behavior, or human conditions, and would not generally be considered AI HSR as these usually focus more on products and processes. This is in general alignment with the current framework but differs in that the line isn’t always clear. The reason for that is, oftentimes, the datasets are intended to serve both purposes. In that case, the project should still be considered human-focused. 

Take, for example, the datasets collected on social media compared to patient healthcare datasets. Both could technically fall under human-focused or not human-focused depending on the intended purpose of the data and/or the AI's role (i.e., what the AI is intended to accomplish). If the AI is used to help us understand human behavior or health conditions, then we would call it a human-focused dataset. If the focus or role of the AI is solely to improve a platform, product, or service, then the project is likely not human subject research.

Using the current definition provided in the Revised Common Rule, we then need to identify whether the project meets the federal definition of “human subjects”. In other words, does the research involve a living individual about whom the investigator obtains information through intervention or interaction, and uses, studies, or analyzes that information?

Often, IRBs are presented with applications that claim the study does not involve human subjects, or that the data is collected from humans but is not “about them”. Rather than take that claim at face value, we need to start with two questions:

1) Is it human-focused data?

2) Is the study intended to contribute generalizable knowledge?

As the vast majority of AI studies are intended to learn and model human behavior, putting these questions at the forefront is key. If the AI is intended to model human behavior, these studies generally meet the first part of the federal definition of human subject.

Once we get that squared away, we want to remember the second part of the definition of human subject. As recently introduced through the Revised Common Rule, for a project to involve human subjects, the PI has to either conduct an intervention or interact with the participant, or they can simply obtain, use, analyze, or generate identifiable private information. One might argue that if the data is neither private nor identifiable, it is not human subject data. But what we are seeing now in many AI studies is that AI depends on large datasets and on linking datasets to other datasets (both private and public), which opens up the possibility of “generating” identifiable information. We also see the extensive use of biometric data such as recorded face or voice prints, ocular scans, and even gait, which are all considered identifiable information. Taking these things into consideration will help IRBs make HSR determinations.

The next question we must ask is if the project meets the federal definition of research. We define research as:

“a systematic investigation including research development, testing, and evaluation designed to develop or contribute to generalizable knowledge.”

What most IRBs are challenged with these days is fitting algorithm development, validation, and evaluation, and its role in the larger study, within this definition. Here lies the most challenging aspect of making Human Subject Research determinations: it requires a common understanding of what constitutes “generalizable knowledge”.

For now, we as IRB professionals, understand generalizable knowledge to be:

“information where the intended use of the research findings can be applied to situations and populations beyond the current project.”

With this definition, IRBs can determine, based on the study aims, and role of the AI in achieving those aims, if the project is “research” per the federal definition. However, currently there is no federal definition of “generalizable knowledge” so the determination is made inconsistently and subjectively as a result.
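To make the order of these questions concrete, here is a minimal Python sketch of the screening logic described above. It is an illustration only, not the checklist itself and not a regulatory tool: the field names, the `ProjectFacts` class, and the wording of the returned strings are all hypothetical, and a real determination still rests on reviewer judgment.

```python
from dataclasses import dataclass


@dataclass
class ProjectFacts:
    """Answers a reviewer might record for one protocol (hypothetical fields)."""
    human_focused_data: bool              # data used to understand humans, behavior, or conditions
    living_individuals: bool              # the data is about living individuals
    intervention_or_interaction: bool     # investigator intervenes or interacts with participants
    identifiable_private_info: bool       # obtains, uses, or analyzes identifiable private information
    may_generate_identifiable_info: bool  # e.g. dataset linkage, or biometric data such as face or voice prints
    systematic_investigation: bool        # development, testing, or evaluation done systematically
    findings_apply_beyond_project: bool   # working definition of "generalizable knowledge"


def is_human_subject(p: ProjectFacts) -> bool:
    # Assumption: a project that is not human-focused is screened out first,
    # following the two screening questions in the text.
    if not (p.human_focused_data and p.living_individuals):
        return False
    return (p.intervention_or_interaction
            or p.identifiable_private_info
            or p.may_generate_identifiable_info)


def is_research(p: ProjectFacts) -> bool:
    return p.systematic_investigation and p.findings_apply_beyond_project


def determination(p: ProjectFacts) -> str:
    if not is_human_subject(p):
        return "likely not HSR: no human subject"
    if not is_research(p):
        return "likely not HSR: not research"
    return "likely AI HSR: route to IRB review"


# A model of user behavior built by linking public and private datasets.
behavior_model = ProjectFacts(True, True, False, False, True, True, True)
# An algorithm used solely to improve a platform, with no wider application intended.
platform_tuning = ProjectFacts(False, True, False, False, False, True, False)
```

Calling `determination(behavior_model)` and `determination(platform_tuning)` shows how the linkage flag alone can bring a project under review even when no one interacts with a participant.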

So Now What?
In contrast to the current Common Rule guidelines, the FDA and other regulatory bodies have published quite a bit of guidance around where algorithms fit within their larger framework of Software as a Medical Device and have numerous resources available for IRBs, sponsor investigators, and manufacturers.

So, until a definitive policy or guidance is set for AI HSR under the Common Rule, institutions may want to incorporate into their review processes some of the FDA considerations available now, even if the projects aren’t always FDA regulated. This encourages review consistency across projects and helps ensure that various requirements, such as the General Data Protection Regulation (GDPR) or 21 CFR Part 11, if applicable, are being met. Note: flexibility is encouraged, depending on the project, as the protocol may not call for some, or any, of these additional protections.

The current regulatory framework that IRBs use in the oversight of human subject research, including guidance from the FDA, has been in place for decades and is updated regularly as society and research evolve. Most recently, for example, the Revised Common Rule brought about several changes that were intended to streamline processes and reduce regulatory burdens. While AI as a technology is not new, its use in human subject research is expanding at such a rate that we, as oversight bodies, can no longer take the “wait and see” approach. We are called to take action and are challenged with keeping up with this rapidly changing technology as study designs begin to implement it for investigational and non-investigational purposes. Just like we’ve always done, we are being called to look at what we have and how to improve upon it to meet the changing field. I argue that if we start with shared definitions of “human subject” and “generalizable knowledge”, our mission will be much less challenging.

Ethics and Clinical Research

A Quick Reference to, and Brief Summary of Ethical Resources:

The Beecher Article, “Ethics and clinical research” (Henry K. Beecher)
A brief summary of how unethical research and research conducted without any oversight eventually came to fall under regulatory oversight with ethical guidelines.
A summary of important documents in the field of research ethics.

Nuremberg Code
This is a set of research ethics principles for human experimentation set as a result of the subsequent Nuremberg trials at the end of the Second World War.
Basically, in the 1930s and 1940s the Nazi regime tried to create a “master race” by exterminating all those it considered to be inferior, targeting Jewish people in particular. A large portion of German physicians were Nazi Party members and supported these experiments. There were several arguments against the experiments, primarily that they had no therapeutic purpose, there was no informed consent, and they lacked beneficence and appeared maleficent instead.
The US led what became known as the Nuremberg Trial. The trial resulted in the creation of 10 points that currently make up the “Nuremberg Code”.
The Nuremberg Code covers principles of informed consent, absence of coercion, scientific research validity, beneficence and other important aspects of ethical human subject research.
**Interesting Point** Because the Code was based on extreme acts that seemed almost barbaric, many physicians felt that the Code didn’t apply to them, as they didn’t see it as relevant to their research, nor their practices as anything but ethical. Regardless, the Code is considered to be one of the most important and influential documents in human subject research ethics and highlights the importance of global human rights. Because of that, the Nuremberg Code and the Declaration of Helsinki are the basis for the Code of Federal Regulations (45 CFR Part 46), which the Department of Health and Human Services and IRBs refer to.

Declaration of Helsinki
While the trials were being held, six points defining legitimate medical research were submitted to the Counsel for War Crimes. Three judges, in response to expert medical advisers for the prosecution, adopted these points and added four additional points. The 10 points constituted the “Nuremberg Code,” which includes such principles as informed consent and absence of coercion; properly formulated scientific experimentation; and beneficence towards experiment participants. (The Declaration of Helsinki has undergone 7 revisions since it was initially published.)
“The Declaration is morally binding on physicians, and that obligation overrides any national or local laws or regulations, if the Declaration provides for a higher standard of protection of humans than the latter. Investigators still have to abide by local legislation but will be held to the higher standard.”
The Declaration originally adopted the 10 principles in the Nuremberg Code, combined them with the Declaration of Geneva, and introduced the concept of requiring an independent committee (e.g., the IRB, which came into effect in the US in 1981). The requirement to disclose conflicts of interest wasn’t addressed until the 5th revision in 2000. By 2013, the 7th revision was published and required dissemination of research (even if it was inconclusive) and included the requirement for compensation for research related injuries. However, the US FDA rejected all versions after the 3rd and developed the Good Clinical Practices as its replacement.

Good Clinical Practices (GCP)
Different from the Nuremberg Code and the Declaration of Helsinki, the GCP is not so much focused on morals but more on procedures. In summary, international clinical trials varied because of inconsistent guidelines and standards, which led to investigators having to repeat their studies. So in the 1990s, Japan, the U.S., and the European Union came together to establish the ICH (International Conference on Harmonization) with the intention of harmonizing the technical guidelines and requirements for drug marketing, which resulted in the GCP (ICH E6 Good Clinical Practices). The GCP is now the international standard for the design, conduct, monitoring, and reporting of clinical research of investigational drugs. It has since been revised and is now called the ICH E6(R2) GCP.
The U.S. FDA has opted out of adopting these as law, however, and only uses them as guidance. Of note, it is said that the ICH guidances and FDA regulations do not contradict each other, rather, the ICH guidances tend to go beyond the regulations. Sponsors generally prefer to abide by these “guidances” though, as it assures their studies are in compliance with the regulations at an international level.

The National Research Act:
In response to the growing concern over research ethics, specifically the prisoner studies, the Willowbrook study, and the PHS Syphilis Study, Congress passed the National Research Act in 1974. This resulted in the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research and in the requirement, still in force, for IRBs to oversee any PHS-funded study. “Ethical Principles and Guidelines for the Protection of Human Subjects of Research” was thus developed for the regulation of human subject research and is now commonly known as The Belmont Report.
The Belmont Report covers three main principles: Respect for Persons, Justice, and Beneficence.

Why We Even Bother…

If we knew what we were doing, it wouldn’t be called research, would it? — Albert Einstein


One of the main reasons why research oversight is necessary is because it is not always possible for everyone interested in research to understand or consider the possible risks and harms that can and do occur as a consequence of the research. While in most cases, investigators have good intentions and great hearts, let’s face it: it’s not easy to anticipate all the potential risks. I don’t have enough fingers to count how many times I thought I had a great idea, only later to find out it wasn’t such a great idea.

When consulting with researchers, I like to ask them, “If you were the participant, what kind of questions would you ask about the research before signing up? Would you be comfortable with yourself, or someone in your family, becoming a research participant with all aspects exactly as they are at this point?”

The need for human subject research protections was brought to light over time, but a few studies from the past that really stand out are the Nazi experiments (the subject of the Nuremberg Trials, where someone thought it would be a great idea to try out a bunch of medical experiments on unwilling and uninterested concentration camp prisoners), and the Tuskegee Study (where someone thought it was perfectly reasonable to lie about treating syphilis and thus prevent the participants from actually getting treatment, all “for greater research purposes”).

While we could assume the worst, we could also assume the best of intentions, but in the end, people still get hurt. This is why, after several experiments in which too many people were getting hurt and killed (intentional or not), people finally caught on and thought it best to establish some basic research ethics.

A National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research was created and made a code of research ethics, issuing the “Belmont Report”, which is what we pretty much base the majority of our regulations on today and how the U.S. Department of Health and Human Services (HHS) developed its Code of Federal Regulations (45 CFR 46). Since then, 16 federal agencies have also adopted this. We call it the “Common Rule”.

The Belmont Report contains three basic ethical principles:

  • Respect for Persons
  • Beneficence
  • Justice

Respect for Persons, in a nutshell, says all people have the right to make decisions for themselves (i.e., they’re autonomous), and that if they are unable to make decisions for themselves (kids, for example) then they must have extra protections. This also means that they must have plenty of information to make those decisions. Without enough information, we may end up signing up for something we would not have otherwise.

In other words, participants should understand what the study is about, feel no pressure to say yes or no, and if they do join, they should feel free to quit at any point without consequence. This is where the potential for undue influence becomes a big deal. Let’s say a boss wants to do a study on her subordinates. They may be afraid to say “no” at the risk of getting on the boss’ bad side.

Beneficence: This one I always confuse with “benefits”, which it isn’t. In research, beneficence means (1) don’t harm anyone, and (2) make sure you are maximizing any possible benefits while minimizing all possible harms. In Social Behavioral Research this can be hard because it’s extremely rare that an individual gets any personal benefit out of their participation. This is why it is important to be able to anticipate various risks, so that you can outweigh them with benefits or, at minimum, do your due diligence and protect participants from harm, such as any potential for their data being lost.

Justice: Another confusing term. Justice in the Belmont Report builds off of the last two by looking at how we select participants. In other words, who is ultimately benefiting from this? The Tuskegee study is a great example of injustice. The participants were all black. We call this “inequitable selection”, which basically means you can’t target certain groups if the benefits can be generalized to other groups. Not only did the researchers limit their study to black men, but that meant that in the course of the research, it was only those black men who shouldered the burden and risks of the study while the larger population benefited. The experiments examined in the Nuremberg trials serve as a great example as well. Would the researchers mind serving as research participants in their studies to the extent their selected participants were? Had the tables been turned, it is highly likely that the researchers would not have wanted to be in either of those studies as a participant.

In summary, research protections are not in place as bureaucratic nonsense or stupid rules made up to keep important research from getting done. Protections are in place because throughout history, we’ve grown. We’ve learned a lot, and in the course of making mistakes, we have found ways to avoid making them in the future, with basic guiding principles.

Fortunately, not every study falls into the same cup. Research is not black and white, and regulations also provide flexibility. By working closely with your IRB, from planning to implementation, you can be confident that you’re getting what you need done in the most practical, efficient, ethical, and compliant way possible. And if that’s not a big enough incentive, it also keeps you funded.