AI Summary • Published on Mar 25, 2026
Modern artificial intelligence systems are increasingly used in critical applications like medical decision-making and autonomous platforms, demanding reliability beyond accurate predictions alone. These systems must also quantify uncertainty, explain decisions, and protect sensitive information, leading to a growing reliance on probabilistic computation and stochastic sampling. This shift fundamentally alters the nature of computation, requiring continuous generation, transport, and consumption of stochastic information. Consequently, traditional hardware, particularly memory systems designed primarily for deterministic data, faces new bottlenecks. As stochastic demand increases, existing systems struggle to sustain the high throughput required for random number generation and entropy delivery, creating an "entropy wall" that limits overall system performance.
The authors propose a unified data-access perspective in which deterministic memory access is treated as a limiting case of stochastic sampling, allowing both modes to be analyzed within a single framework. They introduce a probabilistic data ratio (α), the fraction of accesses that are stochastic, and derive a system-level performance model in terms of compute throughput (π) and unified data-access throughput (β). This model reveals that as α increases, systems transition from being memory-bound to entropy-bound. On this basis, they define memory-level evaluation criteria: unified operation, distribution programmability, efficiency, robustness to hardware non-idealities, and parallel compatibility.

They then analyze the limitations of conventional von Neumann architectures, which separate random number generation (RNG) from memory and thereby run into the "entropy wall", and examine emerging probabilistic compute-in-memory (p-CIM) approaches that integrate sampling directly with memory access to overcome these limitations. Two main p-CIM approaches are discussed: coupled p-CIM, which integrates parameter storage and entropy generation in the same device for high efficiency but limited programmability; and decoupled p-CIM, which separates them for better programmability and statistical fidelity at the cost of potential overheads in entropy delivery.
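To make the memory-bound-to-entropy-bound transition concrete, here is a minimal sketch of one plausible instantiation of such a model. The harmonic mix of deterministic bandwidth and entropy-delivery throughput, the function names, and all numeric values below are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a unified data-access performance model (assumed form).
# A fraction alpha of accesses are stochastic samples served at rate beta_s;
# the remaining (1 - alpha) are deterministic reads served at rate beta_d.

def unified_throughput(alpha, beta_d, beta_s):
    """Effective data-access throughput (accesses/s) under a harmonic
    mix of deterministic and stochastic access costs (an assumption)."""
    return 1.0 / ((1.0 - alpha) / beta_d + alpha / beta_s)

def system_throughput(alpha, pi, beta_d, beta_s):
    """System performance is capped by compute (pi) or by data access,
    whichever is the tighter constraint."""
    return min(pi, unified_throughput(alpha, beta_d, beta_s))

# Illustrative numbers (assumptions): deterministic bandwidth 1e11
# accesses/s, entropy delivery 1e8 samples/s, compute 5e10 ops/s.
beta_d, beta_s, pi = 1e11, 1e8, 5e10
for alpha in (0.0, 0.01, 0.1, 1.0):
    t = system_throughput(alpha, pi, beta_d, beta_s)
    print(f"alpha={alpha:4.2f}  system throughput={t:.3g}")
```

With these assumed numbers the system is compute-bound at α = 0, but even α = 0.01 drags the unified data-access throughput well below π, reproducing the entropy-limited behavior the model predicts.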
The unified memory perspective and performance model reveal a critical scaling mismatch: entropy generation throughput in conventional systems lags significantly behind compute and deterministic memory bandwidth. As a result, even a small probabilistic data ratio (e.g., α ≈ 1%) can push systems into an entropy-limited regime, creating an "entropy wall" that effectively reduces overall data-access throughput and performance. Bayesian neural networks, for instance, with their high stochastic demand (α ≈ 1), become strongly entropy-limited. Probabilistic compute-in-memory (p-CIM) architectures, by integrating entropy generation directly into memory, show promise in alleviating this bottleneck by co-scaling entropy generation with memory bandwidth. Coupled p-CIM designs offer high efficiency for entropy-dominated workloads, while decoupled designs provide improved distribution programmability and statistical fidelity, illustrating a trade-off between efficiency and flexibility.
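The claim that a small α suffices to hit the entropy wall can be sketched with a closed-form crossover estimate. This derivation assumes the same illustrative harmonic-mix access model described above (deterministic accesses at rate beta_d, stochastic samples at rate beta_s, compute at rate pi); the function name and all numbers are hypothetical, not taken from the paper.

```python
# Hedged sketch: the stochastic-access fraction alpha* at which the
# unified data-access throughput 1 / ((1-alpha)/beta_d + alpha/beta_s)
# drops to the compute throughput pi, i.e. the system stops being
# compute-bound and becomes entropy-limited (assumed model).

def entropy_wall_alpha(pi, beta_d, beta_s):
    """Smallest alpha at which data access (dominated by entropy
    delivery) becomes the binding constraint on performance."""
    if beta_d <= pi:  # already memory-bound at alpha = 0
        return 0.0
    # Solve (1 - alpha)/beta_d + alpha/beta_s = 1/pi for alpha.
    return (1.0 / pi - 1.0 / beta_d) / (1.0 / beta_s - 1.0 / beta_d)

# Illustrative numbers (assumptions): compute 5e10 ops/s, deterministic
# bandwidth 1e11 accesses/s, entropy delivery 1e8 samples/s.
alpha_star = entropy_wall_alpha(pi=5e10, beta_d=1e11, beta_s=1e8)
print(f"entropy wall crossover: alpha* = {alpha_star:.4f}")
```

Under these assumed rates the crossover lands at a sub-percent α, consistent with the observation that even α ≈ 1% pushes conventional systems into the entropy-limited regime.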
The findings underscore the necessity for a paradigm shift in hardware design for trustworthy AI, moving beyond treating randomness as an auxiliary function to considering stochastic sampling as a first-class data-access primitive. Future memory architectures must balance efficiency, programmability, statistical fidelity, and system-level usability. This requires cross-layer co-design across devices, circuits, architectures, and software. Emerging technologies and scaling trends in CMOS offer increased intrinsic stochasticity, which can be harnessed as "distributed entropy reservoirs." However, this raw noise needs "entropy shaping" at the circuit level to match required distributions, balancing flexibility with throughput. Workload-aware evaluation frameworks are also needed to connect device-level stochastic properties to algorithm-level performance. Ultimately, developing "entropy-native" memory systems, supported by new architectural interfaces and programming abstractions, is crucial for enabling scalable and trustworthy AI by transforming intrinsic variability into a usable computational resource.