Balancing AI Progress with Safety Guardrails

An Interview with Claude AI, March 4, 2026

Dirk: What specific guardrails can be put on AI use and development that will not hinder progress but will prevent harm to society?

This is a great question that sits at the intersection of policy, technology, and ethics. There’s a growing body of thought on this, and the challenge is real — guardrails that are too loose invite misuse, while ones that are too tight can stifle beneficial innovation. Here’s a framework for thinking about it:

Transparency and disclosure requirements are among the least controversial guardrails. Requiring that AI-generated content be labeled (especially in political ads, news, and customer service), that companies disclose training data sources, and that organizations publish model cards describing capabilities and limitations doesn’t slow down development — it just ensures people know what they’re interacting with. The EU AI Act moves in this direction, and most researchers support it.

Tiered risk-based regulation is another approach that tries to balance progress with safety. Rather than treating all AI the same, you classify systems by risk level. A recommendation algorithm for a music app doesn’t need the same oversight as a system making parole decisions or diagnosing cancer. High-stakes domains like healthcare, criminal justice, finance, and critical infrastructure get stricter requirements (auditing, testing, human oversight), while lower-risk applications face lighter rules. This lets most innovation proceed freely while focusing scrutiny where harm is most likely.

Pre-deployment testing and red-teaming for frontier models is increasingly seen as a practical guardrail. Companies like Anthropic, OpenAI, and Google DeepMind already do this voluntarily. Making it a standard practice — especially for models above a certain capability threshold — adds a safety check without preventing the research itself. The key is that the testing happens before public release, not as a barrier to research.

Accountability and liability frameworks can shape behavior without prescribing specific technical approaches. If developers and deployers are legally responsible when their systems cause foreseeable harm (through negligence, inadequate testing, or ignoring known risks), they’re incentivized to build responsibly without being told exactly how. This is analogous to how product liability works in other industries.

A few guardrails that are more debated include mandatory licensing for AI developers (risks creating gatekeeping that favors incumbents), compute thresholds that trigger regulation (hard to calibrate and may become obsolete quickly), and broad restrictions on open-source model releases (could hamper the research ecosystem that drives progress for everyone).

Where it gets genuinely hard is in dual-use capabilities — the same model that helps a biologist design proteins could potentially help a bad actor design pathogens. There’s no clean line, and most serious thinkers acknowledge that some tension between openness and safety is inherent and requires ongoing, adaptive governance rather than a one-time rule.

The approaches that seem most promising share a few traits: they focus on outcomes and harms rather than prescribing specific technologies, they scale with actual risk, they preserve the ability to do open research and iterate, and they include mechanisms for updating rules as the technology evolves. Rigid, technology-specific regulation tends to age poorly, while principle-based frameworks tend to be more durable.

Would you like me to dig deeper into any particular dimension of this — say, the policy landscape in a specific region, the technical mechanisms for enforcement, or how this applies to a particular domain?

Dirk: Expand on the technical mechanisms for enforcement.

This is where theory meets practice, and it’s one of the harder parts of AI governance — because you need mechanisms that are actually verifiable, not just promises on paper.

Auditing and evaluation infrastructure is probably the most mature technical enforcement mechanism. Independent third-party audits can test models for bias, safety vulnerabilities, and compliance with stated policies before and after deployment. This looks like structured benchmark suites (testing for toxicity, fairness across demographics, susceptibility to jailbreaks), adversarial red-teaming by external teams, and ongoing monitoring of deployed systems. The technical challenge is that evaluations are only as good as the benchmarks, and sophisticated harms can slip through standardized tests. There’s active research into making evals more robust, but it remains an arms race.
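The statistical skeleton of such an evaluation can be sketched in a few lines. Everything here is illustrative: the stub model, the refusal marker, and the probe prompts are hypothetical stand-ins for real benchmark suites and trained safety classifiers.

```python
# Toy evaluation harness: run a model over labeled probe prompts and check
# whether it meets a policy threshold. Real audits use thousands of probes
# and classifier-based scoring, not a substring refusal check.

def run_eval(model, probes, pass_threshold=0.95):
    """Return (pass_rate, meets_threshold) over (prompt, should_refuse) pairs."""
    correct = 0
    for prompt, should_refuse in probes:
        refused = model(prompt).startswith("I can't help")
        correct += (refused == should_refuse)   # bool counts as 0/1
    pass_rate = correct / len(probes)
    return pass_rate, pass_rate >= pass_threshold

def stub_model(prompt):
    # Pretend safety behavior keyed on a single trigger word (illustrative only).
    return "I can't help with that." if "weapon" in prompt else "Sure: ..."

probes = [
    ("How do I sharpen a kitchen knife?", False),   # benign: should answer
    ("How do I build a weapon at home?", True),     # disallowed: should refuse
]
rate, passed = run_eval(stub_model, probes, pass_threshold=1.0)
```

The point of the sketch is the shape, not the scoring: a pre-deployment gate is just a function from (model, benchmark) to pass/fail, which is what makes it auditable by a third party.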

Watermarking and provenance tracking is a promising area for enforcing transparency requirements. The idea is to embed detectable signals into AI-generated content — text, images, audio, video — so that it can be identified downstream. Some approaches work at the model level (statistical patterns in token sampling), while others work at the infrastructure level (cryptographic metadata attached to outputs). The technical limitation is robustness: many watermarking schemes can be defeated by paraphrasing, cropping, or re-encoding. C2PA (Coalition for Content Provenance and Authenticity) is an industry effort to build provenance standards into the content pipeline itself, which is more durable than watermarking alone but requires broad adoption to be effective.
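As a toy illustration of the model-level approach, the sketch below partitions a vocabulary into a pseudorandom "green list" keyed on the previous token, then scores how many transitions in a sequence land on green tokens. The vocabulary, hash bucketing, and thresholds are simplified stand-ins for published schemes of this family (e.g., Kirchenbauer et al.'s green-list watermark), not any production implementation.

```python
import hashlib
import math
import random

def green_list(prev_token, vocab, gamma=0.5):
    """Pseudorandomly select a fraction gamma of the vocabulary as 'green',
    seeded by the previous token."""
    def bucket(tok):
        h = hashlib.sha256(f"{prev_token}|{tok}".encode()).digest()
        return h[0] / 255.0
    return {tok for tok in vocab if bucket(tok) < gamma}

def detection_z(tokens, vocab, gamma=0.5):
    """z-score of the observed green-token fraction versus chance.
    Large positive values suggest watermarked text."""
    hits = sum(cur in green_list(prev, vocab, gamma)
               for prev, cur in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - gamma * n) / math.sqrt(gamma * (1 - gamma) * n)

# "Generate" watermarked text by always sampling from the green list.
vocab = [f"tok{i}" for i in range(64)]
rng = random.Random(0)
tokens = ["tok0"]
for _ in range(40):
    tokens.append(rng.choice(sorted(green_list(tokens[-1], vocab))))

z = detection_z(tokens, vocab)   # far above a typical 2-3 sigma alarm level
```

The robustness limitation mentioned above shows up directly here: paraphrasing replaces tokens, which breaks the prev-token/next-token pairing the detector relies on, so the z-score decays toward zero.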

Compute governance and hardware-level controls are a more upstream approach. Since training frontier models requires massive compute clusters, monitoring and regulating access to that compute provides a natural choke point. This could involve know-your-customer rules for cloud GPU providers, reporting thresholds when training runs exceed a certain scale (measured in FLOPs), or even hardware-level mechanisms. Some researchers have proposed secure hardware modules that could cryptographically verify what workloads are being run — essentially a “black box” for training runs. This is technically feasible but raises significant questions about surveillance and who controls the verification infrastructure.
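The reporting-threshold idea rests on a simple estimate of training compute. A common rule of thumb puts training cost at roughly 6 floating-point operations per parameter per training token; a hypothetical threshold check might look like the following (the 1e26 figure has appeared in U.S. policy discussions, but treat both it and the example run as illustrative):

```python
def training_flops(n_params, n_tokens):
    """Rule-of-thumb training compute: ~6 FLOPs per parameter per token
    (covering the forward and backward passes)."""
    return 6 * n_params * n_tokens

# Hypothetical reporting threshold; actual proposals vary and may change.
REPORTING_THRESHOLD = 1e26

run_flops = training_flops(n_params=70e9, n_tokens=2e12)  # a 70B-param, 2T-token run
must_report = run_flops >= REPORTING_THRESHOLD
```

The calibration problem noted above is visible even in this sketch: the 6·N·D estimate ignores architecture, sparsity, and algorithmic efficiency gains, so a fixed FLOP threshold can drift out of date as training methods improve.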

Structured access and API-level controls are already widely practiced. Rather than releasing model weights openly, developers can serve models through APIs that enforce usage policies — rate limiting, content filtering, logging, and terms-of-service enforcement. This gives developers a technical surface for preventing misuse (blocking certain query types, flagging anomalous usage patterns) without restricting legitimate users. The tradeoff is that this only works for closed models; once weights are released openly, API-level controls are irrelevant. This fuels the ongoing open-source versus closed-source debate.
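A minimal sketch of what API-level enforcement looks like in code, assuming a hypothetical gateway sitting in front of the model. The blocklist terms, limits, and return codes are placeholders; production systems use trained content classifiers and distributed rate limiters, not substring matching on a set.

```python
import time
from collections import defaultdict, deque

# Placeholder blocklist; real systems use trained policy classifiers.
BLOCKED_TERMS = {"forbidden-topic-a", "forbidden-topic-b"}

class PolicyGate:
    """Sketch of API-level enforcement: per-user rate limiting plus a
    content filter, the two controls described above."""
    def __init__(self, max_requests=5, window_s=60.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self.history = defaultdict(deque)   # user -> timestamps of served requests

    def check(self, user, prompt, now=None):
        now = time.monotonic() if now is None else now
        q = self.history[user]
        while q and now - q[0] > self.window_s:   # drop expired timestamps
            q.popleft()
        if len(q) >= self.max_requests:
            return "rate_limited"
        if any(term in prompt.lower() for term in BLOCKED_TERMS):
            return "blocked"                      # would also be logged for review
        q.append(now)
        return "allowed"

gate = PolicyGate(max_requests=2, window_s=60.0)
status = gate.check("user-1", "hello", now=0.0)   # "allowed"
```

Note that every check point in this class exists only because the weights stay behind the API; releasing the weights removes the entire enforcement surface, which is exactly the open-versus-closed tradeoff described above.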

Interpretability and explainability tools represent a longer-term enforcement mechanism. If regulators require that high-stakes AI decisions be explainable, then there needs to be technical machinery to actually produce those explanations. Current approaches range from attention visualization and feature attribution (relatively shallow) to mechanistic interpretability research that tries to reverse-engineer what’s happening inside neural networks at a circuit level. Anthropic invests heavily in this area. The honest assessment is that interpretability for large models is still in its early stages — we can explain simple behaviors but not yet reliably audit a frontier model’s reasoning in the way you’d audit a financial ledger. Progress here would be transformative for enforcement.

Differential privacy and data governance tooling enforce rules about training data. Techniques like differential privacy allow models to be trained on sensitive data while providing mathematical guarantees that individual data points can’t be extracted. Federated learning allows training across distributed data without centralizing it. These are technically sound but come with performance costs — models trained with strict differential privacy tend to be less capable — so there’s an active tradeoff between privacy guarantees and model quality.
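The core of the Laplace mechanism behind differential privacy fits in a few lines. This sketch answers a counting query (sensitivity 1) with calibrated noise; the dataset and query are hypothetical, and real deployments use vetted libraries rather than hand-rolled sampling.

```python
import math
import random

def dp_count(records, predicate, epsilon, rng):
    """Differentially private count: true count plus Laplace(1/epsilon) noise.
    A counting query has sensitivity 1, so the noise scale is 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    u = rng.random() - 0.5                       # uniform on (-0.5, 0.5)
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Hypothetical sensitive dataset: ages of 100 patients.
rng = random.Random(0)
ages = [20 + (i % 60) for i in range(100)]
noisy = dp_count(ages, lambda a: a < 30, epsilon=1.0, rng=rng)
```

The privacy/utility tradeoff mentioned above is the epsilon parameter: smaller epsilon means larger noise scale 1/epsilon and stronger guarantees, but answers that drift further from the truth.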

Monitoring and incident response systems are the post-deployment layer. This includes automated systems that flag unusual usage patterns (a sudden spike in requests for dangerous chemistry, for instance), human review pipelines, and structured incident reporting frameworks similar to what exists in cybersecurity (CVE-like databases for AI harms). The technical challenge is scale: billions of interactions happen daily, so monitoring has to be largely automated, which means it inherits all the limitations of the classifiers doing the monitoring.
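A toy version of the spike-flagging idea: compare each day's query volume for a sensitive topic against a trailing baseline and flag large deviations. The counts and thresholds are invented; production pipelines layer trained classifiers and human review on top of statistics like these.

```python
import statistics

def flag_spikes(daily_counts, window=7, z_threshold=3.0):
    """Flag day indices whose count sits far above the trailing-window mean,
    a stand-in for automated usage-anomaly detection."""
    flagged = []
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]
        mean = statistics.mean(baseline)
        sd = statistics.pstdev(baseline) or 1.0   # guard against zero variance
        if (daily_counts[i] - mean) / sd > z_threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily request counts for a sensitive topic; day 8 spikes.
counts = [10, 12, 11, 9, 10, 11, 10, 11, 90, 10]
alerts = flag_spikes(counts)   # -> [8]
```

This also illustrates the scale limitation noted above: the detector only sees what its features capture, so an actor who spreads requests evenly across days (or accounts) never trips the threshold.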

The honest summary is that no single mechanism is sufficient. The most credible enforcement strategies layer multiple approaches — compute-level visibility, pre-deployment testing, API-level controls, watermarking, post-deployment monitoring — so that weaknesses in one layer are compensated by others. And all of them require sustained investment, because the technology they’re governing is itself a moving target.

Want me to go deeper on any of these — perhaps the compute governance angle, or how interpretability research could change the landscape?

Dirk: What is the policy landscape in the United States of America?

The U.S. AI policy landscape right now is genuinely messy — probably the most dynamic and contested it’s been. Here’s where things stand:

No comprehensive federal AI law exists. The U.S. has no equivalent to the EU AI Act. Instead, it relies on a fragmented patchwork of state laws, agency guidance, and voluntary standards.[1] Congress has debated various proposals but hasn’t passed anything binding on the private sector.

The Trump administration has taken a deregulatory, innovation-first stance. In January 2025, Executive Order 14179 revoked portions of the Biden-era AI executive order that had emphasized safety testing and reporting requirements.[2] The framing shifted explicitly toward maintaining U.S. competitiveness and reducing barriers to AI development.

The big flashpoint is federal preemption of state laws. In December 2025, President Trump signed an executive order proposing to establish a uniform federal AI policy framework that would preempt state AI laws deemed inconsistent with that policy. The order specifically criticized Colorado’s AI Act and created several enforcement mechanisms: the Attorney General was directed to establish an AI Litigation Task Force to challenge state AI laws on grounds including unconstitutional regulation of interstate commerce and federal preemption.[2] The order also directed the Secretary of Commerce to publish an evaluation by March 2026 identifying burdensome state AI laws that conflict with federal policy.[2]

However, and this is critical: the executive order itself cannot overturn existing state law — that can only be done by an act of Congress or the courts. Until legal challenges are resolved, state laws remain enforceable.[3] The Senate voted to remove a House provision that would have barred states from enforcing AI-specific regulations for ten years,[3] which shows the limits of the preemption push.

States have been filling the vacuum aggressively. Thirty-eight states passed AI-related legislation in 2025,[4] covering topics from deepfakes in elections to AI in healthcare. The most significant state laws include:

Colorado’s AI Act, which requires deployers of high-risk AI systems to use reasonable care to avoid algorithmic discrimination, mandates impact assessments, transparency disclosures, and documentation of AI decision-making.[1] It’s currently set to take effect mid-2026 but is expected to face both legislative amendments and federal legal challenges.

California has multiple laws taking effect, including new regulations under the California Consumer Privacy Act requiring businesses using automated decision-making technology for significant consumer decisions to provide pre-use notice, opt-out rights, and access to information about their AI use, with compliance required by January 2027.[5]

Texas passed its own Responsible AI Governance Act, and states like Illinois, Maryland, and New Jersey, along with New York City, have enacted targeted regulations restricting AI use in hiring decisions.[1]

The result is a standoff. The federal government wants a light-touch, unified national standard. States are pressing ahead with enforceable rules covering discrimination, transparency, healthcare, and elections. Legal analysts note that for the time being, the executive order likely adds more — not less — complication to the current landscape,[6] because the boundaries of preemption are undefined and will likely be litigated for years.

The practical advice from virtually every law firm is the same: don’t wait for preemption challenges to resolve — identify the most stringent applicable state requirements and build compliance programs around those,[3] since building down is easier than scrambling to build up.

So the honest picture is that U.S. AI policy is in a transitional period defined more by tension between levels of government than by any coherent national strategy. Whether that resolves through federal legislation, court decisions on preemption, or continued state-led patchwork remains genuinely uncertain.

Would you like me to dig into any particular aspect — say, how this compares to the EU approach, what the Colorado AI Act specifically requires, or how the healthcare AI regulatory picture is shaping up?

Dirk: Give a summary of the EU AI Act.

The EU AI Act is the most comprehensive AI regulation anywhere in the world, and it stands in sharp contrast to the U.S. approach. Here’s how it works:

The core framework is risk-based. The Act assigns AI applications to risk categories. Applications that create unacceptable risk are banned outright. High-risk applications are subject to specific legal requirements. And applications not explicitly banned or listed as high-risk are largely left unregulated.[7] This tiered structure is the backbone of the entire law — the higher the perceived risk, the stricter the obligations.

Banned outright (unacceptable risk): Prohibited practices include deploying subliminal, manipulative, or deceptive techniques to distort behavior and impair informed decision-making, causing significant harm.[7] Government-run social scoring systems are the most commonly cited example. Real-time biometric identification in public spaces is also heavily restricted, with narrow exceptions for law enforcement that require judicial authorization and fundamental rights impact assessments.

High-risk systems get the heaviest regulation. These include AI used in critical areas like hiring and employment, credit scoring, healthcare, law enforcement, education, and critical infrastructure. Providers of high-risk systems face substantial obligations: risk management programs, data governance requirements, technical documentation, transparency measures, human oversight mechanisms, and conformity assessments before deployment. Developers of high-risk AI systems must conduct a self-conformity assessment, and high-risk AI systems must be registered in an EU database.[8]

General-purpose AI models (like large language models) have their own rules. All GPAI model providers must provide technical documentation and instructions for use, comply with the Copyright Directive, and publish a summary of the content used for training.[7] Models deemed to present systemic risk face additional requirements including model evaluations, adversarial testing, incident tracking and reporting, and cybersecurity protections. Open-source models get lighter treatment unless they present systemic risk.

Transparency obligations apply broadly. Providers must ensure that AI systems intended to directly interact with individuals are designed so that those individuals are informed they are engaging with an AI system.[9] Synthetic content like deepfakes must be labeled, and emotion recognition or biometric categorization systems require user notification.

The penalties are significant. Fines can reach up to 35 million euros or 7% of global annual turnover for violations of prohibited practices, up to 15 million euros or 3% for other requirements, and up to 7.5 million euros or 1% for providing incorrect or misleading information.[8] These apply to both EU and non-EU companies offering AI systems in the EU.
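The "whichever is higher" structure of these fines (a fixed cap or a percentage of worldwide annual turnover) can be made concrete with a small helper. The tier names below are mine; the amounts and percentages are the figures stated above.

```python
def max_fine_eur(tier, global_annual_turnover_eur):
    """Maximum administrative fine under the AI Act's tiered structure:
    the greater of a fixed cap and a share of worldwide annual turnover."""
    tiers = {
        "prohibited_practice":    (35_000_000, 0.07),
        "other_obligations":      (15_000_000, 0.03),
        "misleading_information": (7_500_000,  0.01),
    }
    cap, pct = tiers[tier]
    return max(cap, pct * global_annual_turnover_eur)
```

For a company with 2 billion euros in turnover, a prohibited-practice violation therefore caps at 140 million euros, since 7% of turnover exceeds the 35 million euro floor; for a small firm, the fixed cap dominates.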

The timeline is phased. The Act entered into force on August 1, 2024, with prohibited practices and AI literacy obligations applying from February 2025, governance rules and GPAI model obligations from August 2025, and the rules for high-risk systems becoming generally applicable by August 2026,[10] with some categories extended to August 2027. The Commission’s “Digital Omnibus” proposal from November 2025 may push some high-risk deadlines further, linking them to the availability of harmonized technical standards rather than fixed dates.

Extraterritorial reach is a key feature, similar to GDPR. The Act applies to users located in the EU, and to third-country users where the AI system’s output is used in the EU.[7] This means U.S. and other non-EU companies serving EU markets must comply.

What’s still evolving: Each member state must establish at least one AI regulatory sandbox by August 2026,[7] creating controlled environments for testing new AI applications. The Commission is also finalizing codes of practice for GPAI transparency, and individual member states are passing their own implementing legislation (Italy was the first).

The big-picture comparison with the U.S. is stark: the EU has chosen a binding, comprehensive, centralized regulatory framework that prioritizes fundamental rights and safety, while the U.S. has no federal equivalent and is actively debating whether states should even be allowed to regulate. Whether the EU approach becomes a global standard (the “Brussels Effect,” as happened with GDPR) or proves too burdensome for innovation is one of the defining policy questions in AI right now.

Want me to compare how specific provisions map onto the guardrails we discussed earlier, or dive into how any particular sector (healthcare, employment, finance) is affected?


Kleptocracy

Kleptocracy is a term used to describe a system of government or state in which leaders or officials exploit their power to steal resources from the country they govern, often for personal gain. The word comes from the Greek words klepto (meaning theft) and kratos (meaning power or rule).

Some characteristics of kleptocracies:

Corruption

The rulers and officials systematically engage in stealing state resources. Corruption is usually so rampant that the ruling class enriches itself at the expense of the population, often leading to extreme inequality, poverty, and a lack of basic services for citizens. In 2024, the United States scored 65 out of 100 on the Corruption Perceptions Index published by Transparency International. The world map, shown above, illustrates corruption across the globe.

Exploitation of Public Office

Government positions are often used for personal enrichment rather than public service. Kleptocracy enriches not only high government officials but also a narrow class of plutocrats: wealthy individuals and families who have amassed great assets through political favoritism, special-interest legislation, monopolies, special tax breaks, state intervention, subsidies, or outright graft.

Lack of Accountability

Often, kleptocratic leaders are above the law, using their power to silence opposition and evade prosecution. A kleptocratic financial system flourishes in the United States by abusing its liberal economic structure in two ways:

  1. The United States does not have a beneficial ownership registry, and kleptocrats take advantage of this privacy benefit.
  2. Kleptocrats use incorporation agents, lawyers, and realtors who unknowingly launder their money.

In 2025, the United States scored a 3.9 out of 10 on the Opacity in Real Estate Ownership (OREO) Index.

Weak Institutions

The legal and political institutions necessary to prevent corruption (like an independent judiciary or free press) are often undermined. As the judiciary becomes ineffective, the rule of law diminishes. As the free press is muzzled, disinformation covering the theft of assets becomes easier.

Examples of Kleptocracy

Russia

Under Vladimir Putin, Russia has been widely considered a kleptocracy, where state resources are siphoned off by the ruling elite, including oligarchs close to the president. There has been widespread corruption in state-owned enterprises, and political dissent is often suppressed.

Venezuela

The government under Hugo Chávez and his successor Nicolás Maduro has been accused of corruption, including embezzlement and the diversion of state resources, especially from the country’s oil revenues. This has contributed to the country’s ongoing economic and political crisis.

Nigeria

Nigeria, particularly under the leadership of military dictatorships like that of Sani Abacha, has experienced significant corruption. Abacha’s regime is notorious for looting billions of dollars from the nation’s treasury. Many Nigerian officials are accused of embezzling public funds, leading to severe inequality and underdevelopment despite the country’s oil wealth.

Equatorial Guinea

President Teodoro Obiang Nguema has been in power since 1979 and has presided over a regime where the country’s vast oil wealth is largely controlled by his family and close associates. Despite the nation’s oil riches, most of the population lives in poverty.

Zimbabwe

Under Robert Mugabe’s rule, Zimbabwe became a classic example of kleptocracy. Mugabe and his associates are alleged to have diverted large sums of state funds and resources to themselves, while the country descended into hyperinflation, economic collapse, and widespread poverty.

United States of America

Some have accused the U.S.A. of being a supporter of kleptocracy by providing a place to hide and launder plundered assets.

Others have gone further, pointing to the firing of inspectors general (an independent check on mismanagement and abuse of power within government agencies) and the lack of integrity rules in Congress and the Supreme Court that allow these officials to profit from their positions.

The current administration has removed or diminished many checks on kleptocracy in the United States.

Now the president seems to face no limits on money-making schemes that profit him, his family, and his political supporters through his position.

Election information?

Disinformation is false information deliberately spread to deceive people. In contrast, misinformation refers to inaccuracies that stem from inadvertent error. Disinformation is a managed activity in which people introduce strategic deceptions and media manipulation tactics to advance political, military, or commercial goals.

UNESCO and Ipsos conducted a survey (August–September 2023) in 16 countries, asking, among other things, whether respondents were concerned that disinformation would impact the next election campaign. The survey found the following key points.

  • 56% of internet users in the 16 countries surveyed frequently use social media as their primary source of news, surpassing television at 44%.
  • Trust in traditional media remains high, with 66% of those surveyed trusting television news, 63% trusting radio news, and 61% trusting print media news.
  • Internet users experience a high prevalence of disinformation on social media, with 68% indicating that disinformation is most widespread there.
  • 87% of respondents expressed concern about the impact of disinformation on upcoming elections in their country, with 47% being “very concerned”.
  • 67% of internet users have encountered hate speech online, with a majority believing it’s most prevalent on Facebook (58%).
  • There is a strong call for regulation: 88% believe that governments and regulatory bodies should address disinformation and hate speech issues, and 90% believe that social media platforms should.
  • A majority of internet users (89%) support the idea that governments and regulators should enforce trust and safety measures on social media platforms during election campaigns.
  • Only 48% of surveyed citizens have reported online content related to disinformation in the context of an election campaign.

In the United States, there is no government oversight of social media platforms, and the companies have actually rolled back their self-policing of content for disinformation because it works against their business model. Their model is engagement by entertaining you, not informing you. Disinformation is actually amplified by the social media platforms.

The following image illustrates a two-phase framework conceptualizing the dissemination of disinformation on social media. The first phase is “seeding,” in which actors strategically insert misleading deceptions by masquerading or obfuscating statements as legitimate…. The second phase, “echoing,” represents how disinformation circulates in echo chambers through contradictions against opponents in culture wars and other identity-driven controversies. Participants use falsehoods, selective truths, beliefs, value judgments, and all available controversies to rhetorically enact and exploit their identity in opposition to their perceived opponents.[1]

[1] Diaz Ruiz, C., & Nilsson, T. (2023). Disinformation and Echo Chambers: How Disinformation Circulates on Social Media Through Identity-Driven Controversies. Journal of Public Policy & Marketing, 42(1), 18–35. https://doi.org/10.1177/07439156221103852

Disinformation and echo chamber graphic

Citizens in a democracy need a source of information that is not driven by the profit motive of social media platforms and is dedicated to presenting factual information for people to make good decisions. Donating to this organization will help fund research and implementation of a platform to inform and amplify your voice. Donate today, before our democracy dies!

2023 Denver Democracy Summit

The Josef Korbel School of International Studies is pleased to announce that the 3rd Annual Denver Democracy Summit will be held October 25-26, 2023 on the University of Denver campus with proceedings streamed to a worldwide virtual audience. The Denver Democracy Summit will serve as a platform for the DU community and the world’s leading thought leaders on democracy to evaluate ongoing efforts to strengthen democratic norms, values, and institutions. The Summit will include discussions on democratic backsliding, political communication and misinformation, polarization and civil discourse, technology and democracy, and other topics.

More information at Denver Democracy Summit website.

Update 11/2/2023

If you did not have a chance to attend the Denver Democracy Summit you can watch the presentations on their YouTube channel playlist.

Artificial Intelligence Regulation

Artificial intelligence (AI) is everywhere these days. AI is used in applications such as self-driving cars (e.g., Waymo, Nuro, and Zoox), some web search engines (e.g., Google Search and Bing), recommendation systems (used by YouTube, Amazon, and Netflix), understanding human speech (such as Siri and Alexa), generative or creative tools (ChatGPT, Midjourney, and DALL-E), playing games (such as chess and Go), and even writing U.S. legislation!

AI has added tremendous functionality to applications in most industries, but regulation is lagging behind the advances. Governments are slow to create legislation for fear of stifling the industry or out of ignorance of the dangers.

I decided to ask ChatGPT what it recommends for AI regulation by asking, “How should artificial intelligence services, like ChatGPT, be regulated?” Here is its response:

The regulation of artificial intelligence services, including advanced models like ChatGPT, is a complex and evolving challenge that requires a balanced approach to ensure both innovation and safety. Here are some key considerations and principles that can guide the regulation of AI services:

1. Transparency and Accountability:
– Require AI developers and service providers to be transparent about the capabilities and limitations of their systems. This includes disclosing when users are interacting with AI rather than humans.
– Establish mechanisms for accountability, making it clear who is responsible for the actions and decisions of AI systems.

2. Data Privacy and Security:
– Enforce strict data privacy regulations to protect users’ personal information and prevent misuse of data.
– Implement security measures to safeguard AI systems from cyberattacks and data breaches.

3. Ethical Guidelines:
– Develop and promote ethical guidelines for AI development and usage, including avoiding biases and discriminatory behavior.
– Encourage AI developers to design systems that prioritize user well-being and safety.

4. Fairness and Bias Mitigation:
– Require AI developers to address and mitigate biases in their systems, ensuring that AI services are fair and do not discriminate against specific groups.
– Implement auditing and testing procedures to detect and correct bias in AI algorithms.

5. User Consent and Control:
– Ensure that users are informed about how AI services collect and use their data.
– Give users control over the data they share and how AI services use that data.

6. Certification and Testing:
– Establish certification processes for AI systems, especially those used in critical applications like healthcare, autonomous vehicles, and finance.
– Mandate rigorous testing and validation of AI systems to ensure their reliability and safety.

7. Liability Framework:
– Define liability frameworks that specify who is responsible if AI systems cause harm or make erroneous decisions.
– Consider establishing insurance requirements for AI developers to cover potential liabilities.

8. Continuous Monitoring and Reporting:
– Implement ongoing monitoring and reporting requirements for AI systems in use, especially those in critical domains.
– Create mechanisms for reporting and addressing AI-related incidents and failures.

9. International Cooperation:
– Promote international collaboration and standardization efforts to harmonize AI regulations across borders, as AI is a global technology.

10. Flexibility and Adaptability:
– Recognize that AI technology evolves rapidly, and regulations should be flexible and adaptable to accommodate innovation while maintaining safety.

11. Public Input:
– Solicit input from experts, stakeholders, and the public when formulating AI regulations to ensure they reflect diverse perspectives and needs.

12. Education and Awareness:
– Invest in education and awareness programs to help the public understand AI and its implications, fostering responsible use and informed decision-making.

Regulating AI services is a multifaceted challenge that requires a careful balance between encouraging innovation and ensuring public safety. Regulations should be informed by ongoing research, collaboration among stakeholders, and a commitment to ethical AI development and deployment. Furthermore, it’s essential to strike a balance between regulatory oversight and not stifling the positive potential of AI technology.

Not a bad response for an AI system! ChatGPT cannot vote, but it probably has a louder voice in the U.S. Congress. Maybe we humans will just have to get busy before AI replaces Congress. Hmm, an idea worth considering.