All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 3 results for this tag.
Why AI Alignment Failure Is Structural: Learned Human Interaction Structures and AGI as an Endogenous Evolutionary Shock
This paper argues that perceived AI alignment failures are not due to emergent malign agency but rather reflect AI models statistically internalizing the full spectrum of human social interactions, including coercive ones. It redefines AGI risk as an amplification of existing human contradictions, necessitating structural governance rather than attempting to instill a single, universal morality.
The Social Responsibility Stack: A Control-Theoretic Architecture for Governing Socio-Technical AI
This paper introduces the Social Responsibility Stack (SRS), a six-layer architectural framework designed to embed societal values into AI systems through explicit constraints, safeguards, and governance processes. It reframes responsible AI as a closed-loop supervisory control problem, aiming to bridge the gap between ethical principles and actionable engineering mechanisms for socio-technical AI.
Training-Free Policy Violation Detection via Activation-Space Whitening in LLMs
This paper introduces a training-free method for detecting policy violations in Large Language Models by framing the task as out-of-distribution detection in activation space. The approach whitens the model's internal activations and uses the Euclidean norm in the whitened space as a compliance score, outperforming existing guardrails and fine-tuned models while offering high interpretability and efficiency.
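The core idea in the abstract above can be sketched in a few lines: estimate the mean and covariance of activations from policy-compliant inputs, whiten new activations against those statistics, and score them by Euclidean norm (equivalent to Mahalanobis distance). This is an illustrative toy with synthetic vectors, not the paper's implementation; the function names and the threshold-free comparison are assumptions.

```python
import numpy as np

def fit_whitening(acts, eps=1e-6):
    """acts: (n, d) activations collected from policy-compliant prompts."""
    mu = acts.mean(axis=0)
    cov = np.cov(acts, rowvar=False) + eps * np.eye(acts.shape[1])
    # Inverse matrix square root of the covariance via eigendecomposition.
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return mu, W

def compliance_score(x, mu, W):
    # Euclidean norm in whitened space = Mahalanobis distance to the
    # compliant distribution; larger means more out-of-distribution.
    return np.linalg.norm(W @ (x - mu))

rng = np.random.default_rng(0)
compliant = rng.normal(0.0, 1.0, size=(500, 8))   # stand-in activations
mu, W = fit_whitening(compliant)

in_dist = rng.normal(0.0, 1.0, size=8)
out_dist = in_dist + 6.0  # large shift, mimicking a violating input
print(compliance_score(in_dist, mu, W) < compliance_score(out_dist, mu, W))
```

In practice the statistics would be estimated from hidden states at a chosen layer, and a score threshold calibrated on held-out compliant data would replace the direct comparison shown here.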