Balancing AI Progress with Safety Guardrails

An Interview with Claude AI, March 4, 2026

Dirk: What specific guardrails can be put on AI use and development that will not hinder progress but will prevent harm to society?

This is a great question that sits at the intersection of policy, technology, and ethics. There’s a growing body of thought on this, and the challenge is real — guardrails that are too loose invite misuse, while ones that are too tight can stifle beneficial innovation. Here’s a framework for thinking about it:

Transparency and disclosure requirements are among the least controversial guardrails. Requiring that AI-generated content be labeled (especially in political ads, news, and customer service), that companies disclose training data sources, and that organizations publish model cards describing capabilities and limitations doesn’t slow down development — it just ensures people know what they’re interacting with. The EU AI Act moves in this direction, and most researchers support it.

Tiered risk-based regulation is another approach that tries to balance progress with safety. Rather than treating all AI the same, you classify systems by risk level. A recommendation algorithm for a music app doesn’t need the same oversight as a system making parole decisions or diagnosing cancer. High-stakes domains like healthcare, criminal justice, finance, and critical infrastructure get stricter requirements (auditing, testing, human oversight), while lower-risk applications face lighter rules. This lets most innovation proceed freely while focusing scrutiny where harm is most likely.

Pre-deployment testing and red-teaming for frontier models is increasingly seen as a practical guardrail. Companies like Anthropic, OpenAI, and Google DeepMind already do this voluntarily. Making it a standard practice — especially for models above a certain capability threshold — adds a safety check without preventing the research itself. The key is that the testing happens before public release, not as a barrier to research.

Accountability and liability frameworks can shape behavior without prescribing specific technical approaches. If developers and deployers are legally responsible when their systems cause foreseeable harm (through negligence, inadequate testing, or ignoring known risks), they’re incentivized to build responsibly without being told exactly how. This is analogous to how product liability works in other industries.

A few guardrails that are more debated include mandatory licensing for AI developers (risks creating gatekeeping that favors incumbents), compute thresholds that trigger regulation (hard to calibrate and may become obsolete quickly), and broad restrictions on open-source model releases (could hamper the research ecosystem that drives progress for everyone).

Where it gets genuinely hard is in dual-use capabilities — the same model that helps a biologist design proteins could potentially help a bad actor design pathogens. There’s no clean line, and most serious thinkers acknowledge that some tension between openness and safety is inherent and requires ongoing, adaptive governance rather than a one-time rule.

The approaches that seem most promising share a few traits: they focus on outcomes and harms rather than prescribing specific technologies, they scale with actual risk, they preserve the ability to do open research and iterate, and they include mechanisms for updating rules as the technology evolves. Rigid, technology-specific regulation tends to age poorly, while principle-based frameworks tend to be more durable.

Would you like me to dig deeper into any particular dimension of this — say, the policy landscape in a specific region, the technical mechanisms for enforcement, or how this applies to a particular domain?

Dirk: Expand on the technical mechanisms for enforcement.

This is where theory meets practice, and it’s one of the harder parts of AI governance — because you need mechanisms that are actually verifiable, not just promises on paper.

Auditing and evaluation infrastructure is probably the most mature technical enforcement mechanism. Independent third-party audits can test models for bias, safety vulnerabilities, and compliance with stated policies before and after deployment. This looks like structured benchmark suites (testing for toxicity, fairness across demographics, susceptibility to jailbreaks), adversarial red-teaming by external teams, and ongoing monitoring of deployed systems. The technical challenge is that evaluations are only as good as the benchmarks, and sophisticated harms can slip through standardized tests. There’s active research into making evals more robust, but it remains an arms race.

Watermarking and provenance tracking is a promising area for enforcing transparency requirements. The idea is to embed detectable signals into AI-generated content — text, images, audio, video — so that it can be identified downstream. Some approaches work at the model level (statistical patterns in token sampling), while others work at the infrastructure level (cryptographic metadata attached to outputs). The technical limitation is robustness: many watermarking schemes can be defeated by paraphrasing, cropping, or re-encoding. C2PA (Coalition for Content Provenance and Authenticity) is an industry effort to build provenance standards into the content pipeline itself, which is more durable than watermarking alone but requires broad adoption to be effective.
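
As an illustration of the model-level idea, here is a minimal sketch of one commonly described family of text watermarks: at each step a keyed hash splits the vocabulary into "green" and "red" tokens, generation favors green tokens, and a detector checks whether a text contains more green tokens than chance would predict. The key, the 50% green share, and all function names are assumptions made for this example, not any vendor's actual scheme.

```python
import hashlib
import math

GREEN_FRACTION = 0.5  # assumed share of the vocabulary treated as "green" at each step


def is_green(prev_token: str, token: str, key: str = "demo-key") -> bool:
    """Pseudo-randomly place `token` on the green list, seeded by the key and previous token."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] / 256 < GREEN_FRACTION


def generate_watermarked(vocab: list[str], length: int, key: str = "demo-key") -> list[str]:
    """Toy generator: always emit the first green token (a real model would only bias sampling)."""
    tokens = [vocab[0]]
    for _ in range(length):
        prev = tokens[-1]
        tokens.append(next((t for t in vocab if is_green(prev, t, key)), vocab[0]))
    return tokens


def watermark_z_score(tokens: list[str], key: str = "demo-key") -> float:
    """z-score of the observed green-token count against the no-watermark expectation."""
    n = len(tokens) - 1  # number of (previous token, token) pairs
    if n <= 0:
        return 0.0
    green = sum(is_green(a, b, key) for a, b in zip(tokens, tokens[1:]))
    expected = GREEN_FRACTION * n
    spread = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (green - expected) / spread
```

A detector would flag text whose z-score exceeds some chosen cutoff. Paraphrasing replaces tokens, which pulls the green count back toward chance, and that is the robustness weakness described above.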

Compute governance and hardware-level controls are a more upstream approach. Since training frontier models requires massive compute clusters, monitoring and regulating access to that compute provides a natural choke point. This could involve know-your-customer rules for cloud GPU providers, reporting thresholds when training runs exceed a certain scale (measured in FLOPs), or even hardware-level mechanisms. Some researchers have proposed secure hardware modules that could cryptographically verify what workloads are being run — essentially a “black box” for training runs. This is technically feasible but raises significant questions about surveillance and who controls the verification infrastructure.
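
To make the reporting-threshold idea concrete, here is a rough sketch that estimates training compute with the widely used rule of thumb of about 6 floating-point operations per parameter per training token, then compares it with a threshold. The rule of thumb, the default threshold of 10^25, and the function names are assumptions for illustration; an actual rule would define its own measurement method and cutoff.

```python
def estimate_training_flops(n_params: int, n_tokens: int) -> int:
    """Rule-of-thumb estimate: roughly 6 operations per parameter per training token."""
    return 6 * n_params * n_tokens


def exceeds_reporting_threshold(n_params: int, n_tokens: int,
                                threshold_flops: int = 10**25) -> bool:
    """True when the estimated training run meets or exceeds the (placeholder) threshold."""
    return estimate_training_flops(n_params, n_tokens) >= threshold_flops
```

Under these assumptions, a 70-billion-parameter model trained on 2 trillion tokens comes to about 8.4 x 10^23 operations, below the placeholder threshold, while a 1-trillion-parameter model trained on 20 trillion tokens comes to about 1.2 x 10^26, above it.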

Structured access and API-level controls are already widely practiced. Rather than releasing model weights openly, developers can serve models through APIs that enforce usage policies — rate limiting, content filtering, logging, and terms-of-service enforcement. This gives developers a technical surface for preventing misuse (blocking certain query types, flagging anomalous usage patterns) without restricting legitimate users. The tradeoff is that this only works for closed models; once weights are released openly, API-level controls are irrelevant. This fuels the ongoing open-source versus closed-source debate.
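
A minimal sketch of what such an API-level gate might look like follows: a token-bucket rate limiter, a simple phrase filter, and an audit log. The class, the blocked phrase, and the limits are hypothetical; production systems use trained classifiers and far richer policies than a substring match.

```python
class TokenBucket:
    """Rate limiter: each request spends one token; tokens refill steadily over time."""

    def __init__(self, capacity: float, refill_per_second: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity
        self.last_seen = 0.0

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.last_seen)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_seen = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


BLOCKED_PHRASES = {"pathogen synthesis"}  # hypothetical usage-policy entry


def handle_request(bucket: TokenBucket, prompt: str, now: float,
                   audit_log: list[tuple[float, str]]) -> str:
    """Apply the rate limit, then the content filter, and log every decision."""
    if not bucket.allow(now):
        decision = "rate_limited"
    elif any(phrase in prompt.lower() for phrase in BLOCKED_PHRASES):
        decision = "blocked"
    else:
        decision = "allowed"
    audit_log.append((now, decision))
    return decision
```

Every one of these checks lives on the server, which is why none of them carries over once model weights are downloadable.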

Interpretability and explainability tools represent a longer-term enforcement mechanism. If regulators require that high-stakes AI decisions be explainable, then there needs to be technical machinery to actually produce those explanations. Current approaches range from attention visualization and feature attribution (relatively shallow) to mechanistic interpretability research that tries to reverse-engineer what’s happening inside neural networks at a circuit level. Anthropic invests heavily in this area. The honest assessment is that interpretability for large models is still in its early stages — we can explain simple behaviors but not yet reliably audit a frontier model’s reasoning in the way you’d audit a financial ledger. Progress here would be transformative for enforcement.

Differential privacy and data governance tooling enforce rules about training data. Techniques like differential privacy allow models to be trained on sensitive data while providing mathematical guarantees that individual data points can’t be extracted. Federated learning allows training across distributed data without centralizing it. These are technically sound but come with performance costs — models trained with strict differential privacy tend to be less capable — so there’s an active tradeoff between privacy guarantees and model quality.
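
The guarantee is easiest to see on a simple count query rather than on model training, so this sketch uses the classic Laplace mechanism: noise scaled to sensitivity divided by epsilon is added to the true answer. This is a swapped-in, simpler setting than private training (which typically adds noise to gradients instead), and the function names are assumptions for the example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records: list[bool], epsilon: float, rng: random.Random) -> float:
    """Return a noisy count; smaller epsilon means stronger privacy and more noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    return sum(records) + laplace_noise(sensitivity / epsilon, rng)
```

The privacy-versus-quality tradeoff shows up directly in the scale of the noise: halving epsilon doubles the typical error in the reported count.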

Monitoring and incident response systems are the post-deployment layer. This includes automated systems that flag unusual usage patterns (a sudden spike in requests for dangerous chemistry, for instance), human review pipelines, and structured incident reporting frameworks similar to what exists in cybersecurity (CVE-like databases for AI harms). The technical challenge is scale: billions of interactions happen daily, so monitoring has to be largely automated, which means it inherits all the limitations of the classifiers doing the monitoring.
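
As a rough sketch of the automated flagging layer, the function below compares each hour's count of requests in some sensitive category with the trailing window and flags sharp upward spikes for human review. The window size, threshold, and the floor on the spread are illustrative assumptions; real monitoring would rely on classifiers to produce the counts in the first place.

```python
import statistics


def flag_spikes(hourly_counts: list[int], window: int = 24,
                threshold: float = 3.0) -> list[int]:
    """Return indexes of hours whose count is far above the trailing-window average."""
    flagged = []
    for i in range(window, len(hourly_counts)):
        history = hourly_counts[i - window:i]
        mean = statistics.fmean(history)
        # Floor the spread at 1.0 so a perfectly flat history cannot divide by zero
        # or turn a tiny wobble into an alert.
        spread = max(statistics.pstdev(history), 1.0)
        if (hourly_counts[i] - mean) / spread > threshold:
            flagged.append(i)
    return flagged
```

For example, a day of counts hovering around 10 followed by an hour with 60 requests would flag only that hour, which would then go to the human review pipeline.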

The honest summary is that no single mechanism is sufficient. The most credible enforcement strategies layer multiple approaches — compute-level visibility, pre-deployment testing, API-level controls, watermarking, post-deployment monitoring — so that weaknesses in one layer are compensated by others. And all of them require sustained investment, because the technology they’re governing is itself a moving target.

Want me to go deeper on any of these — perhaps the compute governance angle, or how interpretability research could change the landscape?

Dirk: What is the policy landscape in the United States of America?

The U.S. AI policy landscape right now is genuinely messy — probably the most dynamic and contested it’s been. Here’s where things stand:

No comprehensive federal AI law exists. The U.S. has no equivalent to the EU AI Act. Instead, it relies on a fragmented patchwork of state laws, agency guidance, and voluntary standards.[1] Congress has debated various proposals but hasn’t passed anything binding on the private sector.

The Trump administration has taken a deregulatory, innovation-first stance. In January 2025, Executive Order 14179 revoked portions of the Biden-era AI executive order that had emphasized safety testing and reporting requirements.[2] The framing shifted explicitly toward maintaining U.S. competitiveness and reducing barriers to AI development.

The big flashpoint is federal preemption of state laws. In December 2025, President Trump signed an executive order proposing to establish a uniform federal AI policy framework that would preempt state AI laws deemed inconsistent with that policy. The order specifically criticized Colorado’s AI Act and created several enforcement mechanisms: the Attorney General was directed to establish an AI Litigation Task Force to challenge state AI laws on grounds including unconstitutional regulation of interstate commerce and federal preemption.[2] The order also directed the Secretary of Commerce to publish an evaluation by March 2026 identifying burdensome state AI laws that conflict with federal policy.[2]

However, and this is critical: the executive order itself cannot overturn existing state law; that can only be done by an act of Congress or the courts. Until legal challenges are resolved, state laws remain enforceable.[3] The Senate voted to remove a House provision that would have barred states from enforcing AI-specific regulations for ten years[3], which shows the limits of the preemption push.

States have been filling the vacuum aggressively. Thirty-eight states passed AI-related legislation in 2025[4], covering topics from deepfakes in elections to AI in healthcare. The most significant state laws include:

Colorado’s AI Act, which requires deployers of high-risk AI systems to use reasonable care to avoid algorithmic discrimination, mandates impact assessments, transparency disclosures, and documentation of AI decision-making.[1] It’s currently set to take effect mid-2026 but is expected to face both legislative amendments and federal legal challenges.

California has multiple laws taking effect, including new regulations under the California Consumer Privacy Act requiring businesses using automated decision-making technology for significant consumer decisions to provide pre-use notice, opt-out rights, and access to information about their AI use, with compliance required by January 2027.[5]

Texas passed its own Responsible AI Governance Act, and jurisdictions such as Illinois, New York City, Maryland, and New Jersey have enacted targeted regulations restricting AI use in hiring decisions.[1]

The result is a standoff. The federal government wants a light-touch, unified national standard. States are pressing ahead with enforceable rules covering discrimination, transparency, healthcare, and elections. Legal analysts note that for the time being, the executive order likely adds more, not less, complication to the current landscape[6], because the boundaries of preemption are undefined and will likely be litigated for years.

The practical advice from virtually every law firm is the same: don’t wait for preemption challenges to resolve; identify the most stringent applicable state requirements and build compliance programs around those[3], since building down is easier than scrambling to build up.

So the honest picture is that U.S. AI policy is in a transitional period defined more by tension between levels of government than by any coherent national strategy. Whether that resolves through federal legislation, court decisions on preemption, or continued state-led patchwork remains genuinely uncertain.

Would you like me to dig into any particular aspect — say, how this compares to the EU approach, what the Colorado AI Act specifically requires, or how the healthcare AI regulatory picture is shaping up?

Dirk: Give a summary of the EU AI Act.

The EU AI Act is the most comprehensive AI regulation anywhere in the world, and it stands in sharp contrast to the U.S. approach. Here’s how it works:

The core framework is risk-based. The Act assigns AI applications to risk categories. Applications that create unacceptable risk are banned outright. High-risk applications are subject to specific legal requirements. And applications not explicitly banned or listed as high-risk are largely left unregulated.[7] This tiered structure is the backbone of the entire law: the higher the perceived risk, the stricter the obligations.

Banned outright (unacceptable risk): Prohibited practices include deploying subliminal, manipulative, or deceptive techniques to distort behavior and impair informed decision-making, causing significant harm.[7] Government-run social scoring systems are the most commonly cited example. Real-time biometric identification in public spaces is also heavily restricted, with narrow exceptions for law enforcement that require judicial authorization and fundamental rights impact assessments.

High-risk systems get the heaviest regulation. These include AI used in critical areas like hiring and employment, credit scoring, healthcare, law enforcement, education, and critical infrastructure. Providers of high-risk systems face substantial obligations: risk management programs, data governance requirements, technical documentation, transparency measures, human oversight mechanisms, and conformity assessments before deployment. Developers of high-risk AI systems must conduct a self-conformity assessment, and high-risk AI systems must be registered in an EU database.[8]

General-purpose AI models (like large language models) have their own rules. All GPAI model providers must provide technical documentation, instructions for use, comply with the Copyright Directive, and publish a summary about the content used for training.[7] Models deemed to present systemic risk face additional requirements including model evaluations, adversarial testing, incident tracking and reporting, and cybersecurity protections. Open-source models get lighter treatment unless they present systemic risk.

Transparency obligations apply broadly. Providers must ensure that AI systems intended to directly interact with individuals are designed so that those individuals are informed they are engaging with an AI system.[9] Synthetic content like deepfakes must be labeled, and emotion recognition or biometric categorization systems require user notification.

The penalties are significant. Fines can reach up to 35 million euros or 7% of global annual turnover for violations of prohibited practices, up to 15 million euros or 3% for other requirements, and up to 7.5 million euros or 1% for providing incorrect or misleading information.[8] These apply to both EU and non-EU companies offering AI systems in the EU.
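
The fine ceilings can be worked through as simple arithmetic. This sketch assumes the cap is whichever is higher of the fixed amount and the turnover percentage, which is how the ceiling is usually described for larger companies; smaller firms may be treated differently, and the tier names below are labels invented for this example.

```python
# (fixed cap in euros, percentage of global annual turnover) per violation type
FINE_TIERS = {
    "prohibited_practice": (35_000_000, 7),
    "other_requirement": (15_000_000, 3),
    "misleading_information": (7_500_000, 1),
}


def max_fine_eur(violation: str, global_turnover_eur: int) -> int:
    """Upper bound on the fine, assuming the higher of the two figures applies."""
    fixed_cap, percent = FINE_TIERS[violation]
    return max(fixed_cap, global_turnover_eur * percent // 100)
```

Under that assumption, a company with 2 billion euros in global turnover faces a ceiling of 140 million euros for a prohibited practice, while a company with 100 million euros in turnover faces the fixed 35 million euro ceiling.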

The timeline is phased. The Act entered into force on August 1, 2024, with prohibited practices and AI literacy obligations applying from February 2025, governance rules and GPAI model obligations from August 2025, and the rules for high-risk systems becoming generally applicable by August 2026[10], with some categories extended to August 2027. The Commission’s “Digital Omnibus” proposal from November 2025 may push some high-risk deadlines further, linking them to the availability of harmonized technical standards rather than fixed dates.

Extraterritorial reach is a key feature, similar to GDPR. The Act applies to users located in the EU, and to third-country users where the AI system’s output is used in the EU.[7] This means U.S. and other non-EU companies serving EU markets must comply.

What’s still evolving: Each member state must establish at least one AI regulatory sandbox by August 2026[7], creating controlled environments for testing new AI applications. The Commission is also finalizing codes of practice for GPAI transparency, and individual member states are passing their own implementing legislation (Italy was the first).

The big-picture comparison with the U.S. is stark: the EU has chosen a binding, comprehensive, centralized regulatory framework that prioritizes fundamental rights and safety, while the U.S. has no federal equivalent and is actively debating whether states should even be allowed to regulate. Whether the EU approach becomes a global standard (the “Brussels Effect,” as happened with GDPR) or proves too burdensome for innovation is one of the defining policy questions in AI right now.

Want me to compare how specific provisions map onto the guardrails we discussed earlier, or dive into how any particular sector (healthcare, employment, finance) is affected?

We need a true public space online!

The Internet holds a lot of promise for connecting people around the world.  The problem is that it has been taken over by corporations for their benefit, not necessarily ours.  What is needed is a social platform that is owned by the people that use it, without algorithms pushing content that is negative and often untrue.

A report released in 2021 by the Pew Research Center (https://www.pewresearch.org/internet/2021/11/22/the-future-of-digital-spaces-and-their-role-in-democracy/) stated that, of the 862 expert respondents:

61% said they either hope or expect that by 2035 digital spaces and people’s uses of them WILL change in ways that significantly serve the public good. However, some wrote that this is merely their hope, and others listed one or more extremely difficult hurdles to overcome before that outcome can be achieved.

Some of the hurdles are:

  • Social media algorithms are the first thing to fix: Many of these experts said the key underlying problem is that social media platforms are designed for profit maximization and – in order to accelerate user engagement – these algorithms favor extreme and hateful speech.
  • The general public’s digital literacy needs to improve through information and education.  People who better understand the impact of the emerging negatives in the digital sphere will become more involved and work to influence and motivate business and government leaders to upgrade public spaces.
  • Humans are self-centered and shortsighted, making them easy to manipulate: People’s attention and engagement in public online spaces are drawn by stimulating their emotions, playing to their survival instincts and stoking their fears.  Any new spaces that might be innovated and introduced can still be flooded with innuendo, accusation, fraud, lies and toxic divisiveness.

With these things in mind, I think there are many solutions already developed that can be used to create a new “Public Square”.  There will probably not be one solution, but several applications or platforms for the public to interact to exchange information, discuss solutions, build consensus, and support implementation of the best solutions.  My opinion is these platforms must be:

  1. Owned by the public: meaning non-profit organizations that implement, maintain, and support the platforms.  Organizations must be answerable to the public.  This eliminates the profit motive driving current social platforms to negative results.  Financial support must come from a broad base to reduce concentrations of power and influence.
  2. Free and open-source: meaning that software would be developed in an open and transparent way, to improve trust, and available to any group of people to use for their benefit at low cost.  This will also reduce lock-in that prevents people from changing platforms easily.
  3. Transparent about algorithms: meaning that any algorithms used to select or promote content are documented and, ideally, selectable by the user.

To these ends, I am implementing a discussion forum to help accomplish these and other goals to improve democracy and society.

Improving Digital Literacy

Digital literacy refers to the ability to find, evaluate, create, and communicate information through digital technologies. Digital literacy is increasingly crucial for protecting democracy in our interconnected world. In the context of protecting democracy, it’s particularly focused on:

  1. Information evaluation:
  2. Spotting misinformation and disinformation:
  3. Understanding data privacy and security:
  4. Media creation and sharing:
  5. Critical thinking in the digital space:
    • Questioning the motivations behind online content.
    • Recognizing emotional manipulation in digital media.
    • Understanding filter bubbles and echo chambers.

To improve digital literacy:

  1. Education: Support digital literacy programs in schools and communities.
  2. Fact-checking: Familiarize yourself with fact-checking websites and teach others how to use them.
  3. Media diversity: Encourage consumption of diverse media sources to get a balanced perspective.
  4. Tech awareness: Stay informed about emerging technologies and their potential impacts on democracy.
  5. Online etiquette: Promote responsible online behavior and civil discourse.
  6. Platform knowledge: Understand how different social media platforms work, including their algorithms and content moderation policies.
  7. Data rights: Learn about data protection laws and advocate for strong privacy regulations.
  8. Cybersecurity practices: Adopt and promote good cybersecurity habits, like using a password manager, strong passwords and two-factor authentication.

By improving digital literacy, individuals can better navigate the complex online information landscape, make informed decisions, and contribute to a healthier democratic discourse.

2023 Denver Democracy Summit

The Josef Korbel School of International Studies is pleased to announce that the 3rd Annual Denver Democracy Summit will be held October 25-26, 2023 on the University of Denver campus with proceedings streamed to a worldwide virtual audience. The Denver Democracy Summit will serve as a platform for the DU community and the world’s leading thought leaders on democracy to evaluate ongoing efforts to strengthen democratic norms, values, and institutions. The Summit will include discussions on democratic backsliding, political communication and misinformation, polarization and civil discourse, technology and democracy, and other topics.

More information at Denver Democracy Summit website.

Update 11/2/2023

If you did not have a chance to attend the Denver Democracy Summit you can watch the presentations on their YouTube channel playlist.

What is Liquid Democracy?

Liquid democracy is a hybrid form of democracy that combines elements of direct and representative democracy. It allows citizens to have a more dynamic and flexible role in the decision-making process. In a traditional representative democracy, people vote for elected officials who make decisions on their behalf. In a direct democracy, individuals vote directly on specific issues or policies. Liquid democracy seeks to bridge the gap between these two approaches.

In a liquid democracy, citizens have the option to either vote on issues directly or delegate their votes to someone they trust. Delegating votes means that an individual can choose another person, often referred to as a proxy or delegate, to vote on their behalf. The key feature of liquid democracy is that delegation can be temporary and can be reassigned at any time. This means that individuals can delegate their votes on specific issues to different people based on their expertise, trust, or personal preferences. For example, if you trust someone’s judgment on environmental issues, you can delegate your vote to them for those matters, while delegating your vote to someone else for economic policy decisions.
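
The delegation rule can be sketched as a small tally routine for a single issue: follow each voter's chain of delegations until it reaches someone who voted directly, and drop chains that loop or dead-end. The function name and the choice to discard unresolved chains are assumptions for this example; real systems such as those named later in this post may handle these cases differently.

```python
def tally(direct_votes: dict[str, str], delegations: dict[str, str]) -> dict[str, int]:
    """Count votes on one issue, resolving chains of delegation.

    direct_votes maps voter -> choice; delegations maps voter -> trusted delegate.
    A direct vote always overrides that voter's own delegation.
    """
    counts: dict[str, int] = {}
    for voter in set(direct_votes) | set(delegations):
        seen: set[str] = set()
        current: str | None = voter
        while current is not None and current not in direct_votes:
            if current in seen or current not in delegations:
                current = None  # delegation loop or dead end: this vote is not counted
            else:
                seen.add(current)
                current = delegations[current]
        if current is not None:
            choice = direct_votes[current]
            counts[choice] = counts.get(choice, 0) + 1
    return counts
```

Because delegations are just a lookup table, reassigning a vote is a matter of changing one entry and tallying again, and running a separate table per topic gives the per-issue delegation described above.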

The goal of liquid democracy is to increase citizen participation and engagement in the decision-making process, while also benefiting from the expertise and knowledge of individuals who are well-versed in certain areas. It offers more flexibility and adaptability compared to traditional democratic models. Digital technology and online platforms have made it easier to implement liquid democracy by allowing for secure and efficient voting and delegation.

Liquid democracy is still a relatively new concept and has been experimented with in various contexts, such as political parties (German Pirate Party, Partido de la Red), organizations, and even some governments (Argentina). Its success and practicality can vary depending on the specific implementation and the cultural and political context in which it is used. Some current open-source software implementations are LiquidFeedback and DemocracyOS.

Artificial Intelligence Regulation

Artificial intelligence (AI) is everywhere these days.  AI is used in applications such as self-driving cars (e.g., Waymo, Nuro, and Zoox), some web search engines (e.g., Google Search and Bing), recommendation systems (used by YouTube, Amazon, and Netflix), understanding human speech (such as Siri and Alexa), generative or creative tools (ChatGPT,  Midjourney, and DALL-E),  playing games (such as chess and Go) and even writing U.S. legislation!

AI has added tremendous functionality to applications in most industries, but the regulation is lagging behind the advances.  Governments are slow to create legislation for fear of stifling the industry or out of ignorance of the dangers.

I decided to ask ChatGPT what it recommends for AI regulation by asking “How should artificial intelligence services, like ChatGPT, be regulated?” Here is its response:

The regulation of artificial intelligence services, including advanced models like ChatGPT, is a complex and evolving challenge that requires a balanced approach to ensure both innovation and safety. Here are some key considerations and principles that can guide the regulation of AI services:

1. Transparency and Accountability:
– Require AI developers and service providers to be transparent about the capabilities and limitations of their systems. This includes disclosing when users are interacting with AI rather than humans.
– Establish mechanisms for accountability, making it clear who is responsible for the actions and decisions of AI systems.

2. Data Privacy and Security:
– Enforce strict data privacy regulations to protect users’ personal information and prevent misuse of data.
– Implement security measures to safeguard AI systems from cyberattacks and data breaches.

3. Ethical Guidelines:
– Develop and promote ethical guidelines for AI development and usage, including avoiding biases and discriminatory behavior.
– Encourage AI developers to design systems that prioritize user well-being and safety.

4. Fairness and Bias Mitigation:
– Require AI developers to address and mitigate biases in their systems, ensuring that AI services are fair and do not discriminate against specific groups.
– Implement auditing and testing procedures to detect and correct bias in AI algorithms.

5. User Consent and Control:
– Ensure that users are informed about how AI services collect and use their data.
– Give users control over the data they share and how AI services use that data.

6. Certification and Testing:
– Establish certification processes for AI systems, especially those used in critical applications like healthcare, autonomous vehicles, and finance.
– Mandate rigorous testing and validation of AI systems to ensure their reliability and safety.

7. Liability Framework:
– Define liability frameworks that specify who is responsible if AI systems cause harm or make erroneous decisions.
– Consider establishing insurance requirements for AI developers to cover potential liabilities.

8. Continuous Monitoring and Reporting:
– Implement ongoing monitoring and reporting requirements for AI systems in use, especially those in critical domains.
– Create mechanisms for reporting and addressing AI-related incidents and failures.

9. International Cooperation:
– Promote international collaboration and standardization efforts to harmonize AI regulations across borders, as AI is a global technology.

10. Flexibility and Adaptability:
– Recognize that AI technology evolves rapidly, and regulations should be flexible and adaptable to accommodate innovation while maintaining safety.

11. Public Input:
– Solicit input from experts, stakeholders, and the public when formulating AI regulations to ensure they reflect diverse perspectives and needs.

12. Education and Awareness:
– Invest in education and awareness programs to help the public understand AI and its implications, fostering responsible use and informed decision-making.

Regulating AI services is a multifaceted challenge that requires a careful balance between encouraging innovation and ensuring public safety. Regulations should be informed by ongoing research, collaboration among stakeholders, and a commitment to ethical AI development and deployment. Furthermore, it’s essential to strike a balance between regulatory oversight and not stifling the positive potential of AI technology.

Not a bad response for an AI system!  ChatGPT cannot vote, but it probably has a louder voice in the U.S. Congress.  Maybe we humans will just have to get busy, before AI replaces Congress.  Hmm, an idea worth considering.