Courts and Regulators Diverge on AI Training as Fair Use Debate Intensifies
Artificial intelligence regulation is entering a new phase as policymakers, technology companies, courts, and content creators continue to debate whether AI developers should be regulated based on the data used to train models or the outputs those models generate.
A new policy paper released by Google argues that regulators should focus on preventing harmful AI-generated outcomes rather than imposing broad restrictions on how AI models are trained. The proposal comes as U.S. states, the European Union, and federal courts continue to take different approaches to AI governance and copyright law.
The debate has taken on greater significance as lawsuits over AI training practices and alleged copyright infringement move through U.S. courts.
Google Advocates an Output-Based Approach
In a 21-page AI governance paper published on June 25, Google called for regulations that address the real-world impacts of artificial intelligence rather than the methods developers use to build AI models.
The paper argues that regulators should focus on preventing measurable harms instead of “micromanaging the science behind these new tools.”
Google also addressed the ongoing copyright debate surrounding AI training. The company maintained that training AI models using publicly available web content constitutes a transformative use under U.S. copyright law. Google compared the process to an art student drawing inspiration from works displayed in a gallery.
According to the paper, copyright concerns should center on whether AI-generated content unlawfully reproduces protected works rather than how the underlying model was trained.
Google further suggested that publishers who do not want their publicly available content used for AI training should rely on machine-readable opt-out tools, such as robots.txt. For proprietary or restricted content, the company said paid licensing agreements remain the appropriate approach.
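For readers unfamiliar with how a robots.txt opt-out works in practice, the following is a minimal sketch using Python's standard-library robots.txt parser. The rules, the example.com URL, and the choice of crawler tokens are assumptions made for illustration; the paper as described here does not specify any of them. The sketch shows only how a well-behaved crawler could check a publisher's rules before fetching a page. It does not show how any particular company's crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve. The first group blocks one
# crawler token (used here as an illustrative AI-training token); the second
# group leaves the site open to every other crawler.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, may_fetch(ROBOTS_TXT, agent, page))
```

Under these assumed rules, the blocked token is refused while other crawlers are still allowed. The mechanism is advisory: it works only if the crawler chooses to read and honor the file, which is part of what publishers object to.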
Regulators Continue to Emphasize Training Data
Google’s proposal differs from regulatory initiatives already adopted in several jurisdictions.
California’s Training Data Transparency Act, which took effect on Jan. 1, 2026, requires AI developers to disclose information about the datasets used to train their models, including data sources, collection periods, and whether copyrighted or personal information was included.
Similarly, the European Union’s AI Act requires certain AI providers to disclose details about their training data, including data sources and processing methods, using standardized reporting templates. Both frameworks are designed to increase transparency around AI development by focusing on the information used to build models rather than solely evaluating their outputs.
Colorado Takes a Different Direction
Colorado has adopted a regulatory approach that more closely aligns with Google’s recommendations.
Earlier this year, the state revised its AI Act to place greater emphasis on the outcomes produced by AI systems rather than the technical details of model development.
Under the revised framework, businesses must identify situations in which AI materially influences consequential decisions. If an AI-assisted decision results in an adverse outcome, affected consumers must receive notice explaining the role AI played in that decision.
The revisions shift regulatory attention toward accountability for real-world impacts instead of requiring extensive oversight of AI training processes.
Courts Remain Divided on Fair Use
The legal landscape surrounding AI training remains unsettled.
In June, U.S. District Judge William Alsup ruled that AI training can constitute a “quintessentially transformative” use under copyright law, suggesting that using existing works to train AI models may qualify as fair use under certain circumstances.
Just days later, U.S. District Judge Vince Chhabria expressed a different view, warning that unrestricted AI training could weaken the economic incentives that encourage creators to produce original works.
Additional copyright lawsuits involving companies including Anthropic, Google, and Stability AI remain pending, meaning courts are likely to continue shaping the legal boundaries of AI training over the coming years.
Publishers Continue to Push Back
Content publishers have challenged Google’s proposed reliance on opt-out mechanisms.
Digital Content Next recently sent a cease-and-desist letter to the Common Crawl Foundation. The organization argued that copyright law is not an opt-out system and that content creators should not be responsible for preventing their work from being used in AI training.
The disagreement reflects a broader debate over whether AI developers should obtain permission before using copyrighted materials or whether publicly available content may generally be used unless publishers expressly object.
Concentration of AI Development Raises Additional Questions
Beyond copyright, some legal scholars have pointed to broader concerns about the concentration of AI development among a relatively small number of technology companies.
Daryl Lim, H. Laddie Montague Jr. Chair in Law at Penn State Dickinson Law School, noted that only a limited number of organizations currently possess the computing power, data resources, cloud infrastructure, and distribution capabilities necessary to train frontier AI models at scale.
According to Lim, training at that scale requires ingesting vast amounts of data, including materials that may contain copyrighted works, which raises broader questions about market concentration and competition.
An Evolving Regulatory Landscape
The differing positions adopted by regulators, courts, technology companies, and publishers illustrate that no single framework has yet emerged for governing AI training.
While some policymakers continue to emphasize transparency around training data, others are focusing on the measurable effects AI systems have on consumers and businesses. At the same time, courts are beginning to define how existing copyright law applies to rapidly evolving AI technologies.
As additional litigation proceeds and regulatory proposals continue to develop, the balance between innovation, copyright protection, and AI accountability is likely to remain one of the defining policy debates surrounding artificial intelligence.