Research Guy

Problem

The growing computational demands of artificial intelligence (AI), especially large neural networks, are pushing conventional electronic accelerators to their limits in energy efficiency, memory bandwidth, and latency. Photonic computing is a promising alternative, but current photonic architectures face fundamental scalability and reliability challenges and are typically limited to small arrays. A key bottleneck is the high optical loss of traditional 2D planar photonic layouts, caused by numerous waveguide crossings and routing congestion. Existing signal-accumulation methods each have significant drawbacks: coherent accumulation is phase-unstable, mode-division multiplexing has a large footprint and suffers from crosstalk, microring resonator (MRR)-based wavelength-division multiplexing (WDM) is thermally sensitive, and purely photocurrent-based accumulation gives up optical parallelism. Furthermore, electrically programmed phase-change material (PCM) cells suffer from stochastic nucleation, thermal crosstalk, and complex electrical routing, which hinder multi-level precision and scalability. Scaling photonic processors to hundreds of channels therefore requires architectural innovation rather than incremental improvement.

Method

SKYLIGHT addresses the scalability and reliability challenges of photonic computing by co-designing topology, wavelength routing, accumulation, and programming within a 3D stack. Its 3D Si/SiN crossbar topology distributes computation across vertically stacked silicon (Si) and silicon nitride (SiN) layers, eliminating cascaded crossings and enabling low-loss scaling to arrays as large as 144x256. For robust wavelength routing, SKYLIGHT replaces thermally sensitive MRR-based systems with a non-resonant WDM datapath built from compact, dispersion-engineered Mach–Zehnder modulators (MZMs) and Bragg-grating-assisted wavelength-selective couplers (WSCs), which keeps operation stable over wide temperature fluctuations. Hierarchical accumulation combines multi-port photodetectors with photocurrent summation, preserving optical parallelism while accumulating partial results at a high signal-to-noise ratio (SNR). Non-volatile PCM weight banks are programmed optically by a heterogeneously integrated vertical-cavity surface-emitting laser (VCSEL) array, which reduces programming energy, minimizes thermal crosstalk, and allows precise weight updates. In addition, III–V semiconductor optical amplifiers (SOAs) are integrated for power equalization to maintain signal integrity across the large fabric. Together, these design choices let SKYLIGHT perform large-scale convolution and matrix-vector multiplication efficiently.
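The two-level accumulation described above can be illustrated numerically. The following is only a minimal sketch, not the paper's implementation: it assumes the 144x256 core maps to 144 outputs and 256 inputs, and it uses a hypothetical group size of 16 products per photodetector. Neither detail is stated in the summary. It shows that summing per-group partial results and then adding those partial results gives the same matrix-vector product as a flat sum.

```python
import random


def hierarchical_dot(weights_row, inputs, group_size):
    """Dot product computed as a sum of per-group partial sums."""
    partial_sums = []
    for start in range(0, len(inputs), group_size):
        w_group = weights_row[start:start + group_size]
        x_group = inputs[start:start + group_size]
        # Level 1: products in one group are accumulated together
        # (standing in for one multi-port photodetector; an assumption).
        partial_sums.append(sum(w * x for w, x in zip(w_group, x_group)))
    # Level 2: partial results are added (standing in for photocurrent summation).
    return sum(partial_sums), partial_sums


def hierarchical_mvm(weights, inputs, group_size):
    """Matrix-vector product built from hierarchical dot products."""
    return [hierarchical_dot(row, inputs, group_size)[0] for row in weights]


random.seed(0)
ROWS, COLS = 144, 256  # core size from the text; the row/column mapping is assumed
GROUP = 16             # hypothetical number of products per detector

W = [[random.uniform(0.0, 1.0) for _ in range(COLS)] for _ in range(ROWS)]
x = [random.uniform(0.0, 1.0) for _ in range(COLS)]

y_hier = hierarchical_mvm(W, x, GROUP)
y_ref = [sum(w * v for w, v in zip(row, x)) for row in W]  # flat reference sum

max_err = max(abs(a - b) for a, b in zip(y_hier, y_ref))
print(f"outputs: {len(y_hier)}, groups per output: {COLS // GROUP}")
print(f"max deviation from flat sum: {max_err:.2e}")
```

The sketch is purely arithmetic. It does not model optical loss, detector noise, or wavelength assignment, all of which the real accumulation path has to handle.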

Results

System-level modeling with SimPhony confirms that a single 144x256 SKYLIGHT core fits within a standard reticle (24.3x28.9 mm²). The architecture achieves a throughput of 342.1 TOPS at an energy efficiency of 23.7 TOPS/W. For ResNet-50 inference, SKYLIGHT delivers 1212 frames per second (FPS) at approximately 27 mJ per image, with an end-to-end efficiency of 84.17 FPS/W, 1.61 times that of an NVIDIA RTX PRO 6000 Blackwell GPU under the same workload. Ablation studies quantify the contribution of each innovation: the 3D topology reduces insertion loss from 89.8 dB for a 2D planar design to 30 dB, non-volatile PCM weights cut static power consumption from 248.9 W for Mach–Zehnder interferometer (MZI) weights to 14.4 W, and hierarchical accumulation avoids the severe limitations of MRR-based or purely electrical summation. With noise-aware training, SKYLIGHT is also robust to hardware non-idealities such as low-bit quantization (INT6 inputs, INT7 weights, INT8 outputs) and signal-proportional analog noise. It maintains high task accuracy across diverse machine learning workloads: RF signal classification, ImageNet-1K inference with self-supervised pretraining, unsupervised local self-learning on CIFAR-10 (0.773 accuracy), and remote-sensing flood mapping.
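The reported figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this summary. The derived quantities (implied power, implied energy per image, linear loss ratio) are back-of-the-envelope values computed here, not results stated by the authors.

```python
# Figures quoted in the Results summary.
THROUGHPUT_TOPS = 342.1
EFFICIENCY_TOPS_PER_W = 23.7
RESNET50_FPS = 1212.0
RESNET50_FPS_PER_W = 84.17
GPU_ADVANTAGE = 1.61                      # SKYLIGHT FPS/W divided by GPU FPS/W
LOSS_2D_DB, LOSS_3D_DB = 89.8, 30.0       # insertion loss, 2D planar vs 3D
STATIC_MZI_W, STATIC_PCM_W = 248.9, 14.4  # static power, MZI vs PCM weights


def db_to_power_ratio(db):
    """Convert a decibel difference to a linear optical power ratio."""
    return 10.0 ** (db / 10.0)


# Power implied by throughput / efficiency.
core_power_w = THROUGHPUT_TOPS / EFFICIENCY_TOPS_PER_W
# Power implied by the ResNet-50 frame rate and FPS/W.
resnet_power_w = RESNET50_FPS / RESNET50_FPS_PER_W
# Energy per image implied by FPS/W alone (1 J/W-s = 1000 mJ).
energy_per_image_mj = 1000.0 / RESNET50_FPS_PER_W
# GPU efficiency implied by the 1.61x advantage.
gpu_fps_per_w = RESNET50_FPS_PER_W / GPU_ADVANTAGE
# Loss saved by the 3D topology, in dB and as a linear power ratio.
loss_saving_db = LOSS_2D_DB - LOSS_3D_DB
loss_power_ratio = db_to_power_ratio(loss_saving_db)
# Static power reduction from non-volatile weights.
static_power_ratio = STATIC_MZI_W / STATIC_PCM_W

print(f"implied core power:        {core_power_w:.2f} W")
print(f"implied ResNet-50 power:   {resnet_power_w:.2f} W")
print(f"implied energy per image:  {energy_per_image_mj:.2f} mJ")
print(f"implied GPU efficiency:    {gpu_fps_per_w:.2f} FPS/W")
print(f"loss saving:               {loss_saving_db:.1f} dB (~{loss_power_ratio:.2e}x)")
print(f"static power reduction:    {static_power_ratio:.1f}x")
```

Both power estimates come out near the 14.4 W static-power figure, and the roughly 60 dB loss saving corresponds to close to a million-fold difference in delivered optical power. The energy per image implied by 84.17 FPS/W is about 12 mJ, lower than the approximately 27 mJ quoted above, so the two figures presumably use different accounting boundaries; the summary does not say which.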

Implications

SKYLIGHT overcomes the loss and power barriers that have restricted existing photonic processors to small-scale arrays. By enabling scalable, hundred-channel photonic in-memory tensor cores, it delivers the throughput and energy efficiency needed for real-time AI inference in demanding applications. Sustained large-scale single-core operation turns optical parallelism into high-performance, low-energy perception and decision-making, which matters for latency-critical and endurance-constrained deployments. SKYLIGHT's support for unsupervised local learning, such as forward-forward local updates, also offers a mechanism for adaptive machine vision where labeled data or global backpropagation is impractical, as on autonomous edge platforms. The design extends to multi-core systems and chiplets, opening a path to tiled photonic compute fabrics with thousands of aggregate channels that support larger models and higher end-to-end throughput while retaining the core architectural benefits.


SKYLIGHT: A Scalable Hundred-Channel 3D Photonic In-Memory Tensor Core Architecture for Real-time AI Inference
