Today, I want to share another story (after this similar one shared earlier) about bridging machine learning with circuit simulation. Recently, I succeeded in teaching neural networks to speak the language of circuits again; this time, however, I solved the spectral bias problem! You can read all about it in my research paper „Ψ-xLSTM: Automated Behavioral Verilog-A Generation from Distilled Physics-Informed xLSTM Networks for High-Frequency Device Modeling“, published today in the journal IEEE Access. Also, check out the code behind it in my GitHub repository.
Imagine you’re trying to teach a musician to play Beethoven, but they can only hear bass notes. They’d miss all the beautiful high-frequency melodies. This is exactly what happens with Physics-Informed Neural Networks (PINNs). PINNs are brilliant because they learn physical laws like Maxwell’s equations or circuit dynamics by embedding them directly into their training. No more „black box“ neural networks that violate conservation of energy or produce physically impossible predictions. But PINNs suffer from spectral bias: they pick up smooth, low-frequency behavior quickly, while sharp, high-frequency components converge slowly, if at all.
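To make the PINN idea concrete, here is a minimal sketch (my own toy example, not code from the paper) for a simple RC discharge in normalized units: the loss combines a data term with a physics residual that penalizes violations of the governing ODE dv/dt + v/τ = 0. A real PINN would use automatic differentiation; a finite-difference surrogate stands in for it here.

```python
import numpy as np

def pinn_loss(v_pred, t, v_data, tau=1.0, lam=1.0):
    """Composite PINN loss for a toy RC discharge: dv/dt + v/tau = 0.

    v_pred : candidate network output sampled on the time grid t
    v_data : measured voltages on the same grid
    lam    : weight of the physics residual term
    """
    data_loss = np.mean((v_pred - v_data) ** 2)
    dvdt = np.gradient(v_pred, t)       # finite-difference stand-in for autograd
    residual = dvdt + v_pred / tau      # how badly the ODE is violated
    physics_loss = np.mean(residual ** 2)
    return data_loss + lam * physics_loss

t = np.linspace(0.0, 5.0, 200)
v_exact = np.exp(-t)                    # true RC discharge (tau = 1)
v_wrong = np.full_like(t, 0.5)          # a physically inconsistent "prediction"

loss_exact = pinn_loss(v_exact, t, v_exact)   # near zero: data and physics agree
loss_wrong = pinn_loss(v_wrong, t, v_exact)   # large: the ODE is violated
```

The key point: even if `v_wrong` happened to match the data at a few points, the physics residual would still flag it, which is what keeps PINN predictions physically plausible.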
For modern electronics operating at GHz speeds, this is a dealbreaker. A PINN might correctly predict the DC characteristics of a memristor but completely miss the fast switching transients that actually matter in real circuits. You can read here about how I modeled printed memristors with a PINN-inspired approach.
This is where Physics Structure-Informed Neural Networks (Ψ-NN) changed the game; in my view, they will eventually reach the magnitude and importance of PINNs. Instead of just training a network and hoping for the best, the Ψ-NN authors discovered that neural networks naturally learn physical symmetries. Think of it like this: if you train a network on data that follows even-function symmetry (like f(-x) = f(x)), the network’s weights will spontaneously organize into patterns reflecting that symmetry. Ψ-NN exploits this by clustering weights, revealing that many neurons learn redundant features that can be compressed into shared parameters.
For high-frequency dynamics up to 150 kHz (with MHz-GHz as future work), a critical regime for modern circuits, the spectral bias problem reared its head: the models would fail to capture the sharp edges that dominate power consumption and signal integrity.
In May 2024, Sepp Hochreiter and his colleagues (yes, the LSTM legend) dropped a bombshell: Extended LSTM (xLSTM). They completely modernized the 1997 LSTM architecture with two game-changing innovations. First, exponential gating replaces the sigmoid gates with exponential functions. Second, matrix memory upgrades the scalar cell state to a full matrix (like Transformers, but recurrent). The AI community exploded, and the paper went viral (667 citations as of March 2026).
Then, in November 2025, a paper called xLSTM-PINN showed that xLSTM’s exponential gating acts like a high-pass filter, naturally capturing the sharp transients that standard PINNs miss. The spectral bias problem was solved! But here’s the catch: xLSTM-PINN works too well. The matrix memory has O(N²) complexity; for a 64-dimensional state, that’s 4,096 parameters to update per timestep. When I saw Prof. Hochreiter share the xLSTM-PINN paper on LinkedIn, I thought: „This is beautiful for offline simulation, but you could never deploy this in a circuit simulator“. Or maybe you could, but only if you are aware of the recent Ψ-NN paper, as I am 😀
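A tiny numerical illustration of why exponential gating helps (my simplification; the real xLSTM gate includes a stabilizer term): a sigmoid gate saturates, so a mild input and a sharp transient produce nearly the same gate value, while an exponential gate keeps their magnitudes far apart.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Gate pre-activations for a mild input vs a sharp transient
x_mild, x_sharp = 2.0, 8.0

sig_ratio = sigmoid(x_sharp) / sigmoid(x_mild)  # ~1.1: sigmoid saturates, spike flattened
exp_ratio = np.exp(x_sharp) / np.exp(x_mild)    # e^6 ~ 400: spike magnitude preserved
```

That preserved dynamic range is, loosely speaking, what lets the recurrent state react strongly to fast edges instead of smoothing them away.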
SPICE solvers evaluate compact models billions of times during a single transient analysis. They need nanosecond-level evaluation. An xLSTM with 46,000 parameters doing matrix multiplications at every timestep? Completely impractical. It was like building a Formula 1 car that’s too heavy to race.
Then a spark of Nikola Tesla’s wisdom struck me, as it did earlier for Ψ-HDL: physical systems aren’t arbitrary. A memristor doesn’t need 64 independent time constants, because it has maybe 3 or 4 fundamental processes. Fast ionic drift happens in microseconds, medium thermal relaxation takes milliseconds, and slow trap-state dynamics occur over seconds.
That’s how Ψ-xLSTM was born:
First, I realized xLSTM’s exponential forget gates are actually learning physical relaxation times τ = 1/exp(gate_value). Instead of letting 64 neurons each learn their own time constant, I force them to cluster into K=3 groups using k-means. The result? The model discovers τ₁ = 0.34ms, τ₂ = 1.2ms, τ₃ = 8.7ms, which exactly matches known memristor physics! It’s like the network is rediscovering the periodic table.
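Here is a minimal sketch of this step with made-up gate values (the real gates come from a trained xLSTM, and my pipeline is more involved; a plain Lloyd’s k-means on log-scale time constants stands in for it here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trained forget-gate pre-activations for 64 hidden units;
# tau = 1/exp(gate) maps each gate to a relaxation time (arbitrary units).
gates = np.concatenate([
    rng.normal(1.0, 0.05, 30),    # units that learned a fast process
    rng.normal(0.0, 0.05, 24),    # a medium process
    rng.normal(-2.0, 0.05, 10),   # a slow process
])
taus = 1.0 / np.exp(gates)

def kmeans_1d(x, k=3, iters=50):
    """Plain Lloyd's k-means on log(tau), since time constants are log-spaced."""
    z = np.log(x)
    centers = np.quantile(z, np.linspace(0.1, 0.9, k))   # spread-out init
    for _ in range(iters):
        labels = np.argmin(np.abs(z[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = z[labels == j].mean()
    return np.exp(centers), labels

shared_taus, labels = kmeans_1d(taus, k=3)
# 64 per-unit time constants collapse to 3 shared ones
```

After clustering, every gate in a group is tied to its shared τ, which is what turns 64 free parameters into 3 physically meaningful ones.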
Second, the 64×64 matrix memory gets factorized using SVD and I found that rank-4 captures 92% of the variance. The dense matrix collapses into 4 dominant eigen-modes. Why? Because memristor dynamics aren’t chaotic but are governed by a few fundamental processes like drift, diffusion, and trapping. The matrix was just storing redundant information.
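A small sketch of the idea with a synthetic matrix (the actual memory comes from the trained model; NumPy’s SVD stands in for the procedure in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 64x64 matrix memory that is secretly near rank-4:
# a few dominant modes (drift, diffusion, trapping, ...) plus small noise.
U = rng.normal(size=(64, 4))
V = rng.normal(size=(4, 64))
C = U @ V + 0.01 * rng.normal(size=(64, 64))

# Truncated SVD: keep only the r largest singular values
Uf, s, Vt = np.linalg.svd(C)
r = 4
C_r = (Uf[:, :r] * s[:r]) @ Vt[:r, :]

explained = (s[:r] ** 2).sum() / (s ** 2).sum()  # variance captured by rank r
params_full = 64 * 64                            # 4,096 entries in the dense matrix
params_lowrank = 64 * r + r + r * 64             # two thin factors plus r singular values
```

The dense 4,096-entry matrix is replaced by two thin factors; at rank 4 that is a few hundred parameters, and the truncation error stays small whenever the dynamics really live in a handful of modes.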
Third, I developed Recurrent Relation-Aware Distillation (RRAD). Standard knowledge distillation is like a student only seeing the final answer on a teacher’s chalkboard; it doesn’t work for recurrent networks like xLSTM because you miss the process.
Think of it this way: if you want to model a memristor’s switching, you can’t just match the current at time t; you have to match how the internal ionic state is moving. By supervising these gradients, RRAD preserves the „long-term temporal dependencies“ that make xLSTM so powerful, preventing the student from drifting into unphysical territory during a long simulation. It’s not just about being right at the finish line; it’s about following the exact same path the physics takes.
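To illustrate the flavor of RRAD (this is my toy formulation for the blog, not the exact objective from the paper): besides matching outputs, the student is penalized when its state trajectory moves differently from the teacher’s, via the discrete temporal differences of the hidden states.

```python
import numpy as np

def rrad_loss(y_s, y_t, h_s, h_t, alpha=1.0, beta=1.0):
    """Sketch of Recurrent Relation-Aware Distillation.

    y_s, y_t : output sequences of student / teacher, shape (T,)
    h_s, h_t : hidden-state trajectories, shape (T, d)
    """
    out_loss = np.mean((y_s - y_t) ** 2)            # match the answer
    state_loss = np.mean((h_s - h_t) ** 2)          # match the internal state
    dh_s = np.diff(h_s, axis=0)                     # how the student's state moves
    dh_t = np.diff(h_t, axis=0)                     # how the teacher's state moves
    relation_loss = np.mean((dh_s - dh_t) ** 2)     # match the *path*, not just points
    return out_loss + alpha * state_loss + beta * relation_loss

# Toy trajectories: a teacher relaxing exponentially, and a student that
# wiggles around it (close pointwise, but moving differently in between).
T, d = 50, 4
t = np.linspace(0.0, 1.0, T)
h_teacher = np.exp(-3.0 * t)[:, None] * np.ones((1, d))
y_teacher = h_teacher.sum(axis=1)
h_student = h_teacher + 0.05 * np.sin(40.0 * t)[:, None]
y_student = h_student.sum(axis=1)

loss_same = rrad_loss(y_teacher, y_teacher, h_teacher, h_teacher)
loss_drift = rrad_loss(y_student, y_teacher, h_student, h_teacher)
```

The `relation_loss` term is the part standard distillation lacks: it is what punishes a student whose state drifts along an unphysical path even when the outputs look fine at individual timesteps.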
Fourth, the compressed model gets automatically converted to Verilog-A code (clustering reduces 64 independent gates to just 3 discovered time constants, and low-rank factorization shrinks the model from 46K to just 7K parameters!). The discovered time constants become behavioral descriptions that SPICE can simulate directly.
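To give a feel for this step, here is a toy emitter (emphatically not the Ψ-xLSTM code generator; the module structure and naming are invented for illustration) that templates discovered time constants into a Verilog-A skeleton, one first-order relaxation per τ:

```python
def emit_veriloga(taus, module_name="psi_xlstm_dev"):
    """Toy emitter: turn discovered time constants into a Verilog-A skeleton.

    Each tau becomes a first-order relaxation on an internal node, via the
    implicit contribution  V(s_i) <+ V(p,n) - tau_i * ddt(V(s_i)),
    i.e.  tau_i * ds_i/dt + s_i = V(p,n).
    """
    lines = [
        "`include \"disciplines.vams\"",
        f"module {module_name}(p, n);",
        "  inout p, n;",
        "  electrical p, n;",
    ]
    for i, tau in enumerate(taus, start=1):
        lines += [
            f"  electrical s{i};",
            f"  parameter real tau{i} = {tau:.4e};",
        ]
    lines.append("  analog begin")
    for i in range(1, len(taus) + 1):
        lines.append(f"    V(s{i}) <+ V(p, n) - tau{i} * ddt(V(s{i}));")
    lines.append("  end")
    lines.append("endmodule")
    return "\n".join(lines)

# Emit a skeleton for the three time constants discovered above
code = emit_veriloga([3.4e-4, 1.2e-3, 8.7e-3])
```

The real generator also wires in the low-rank readout and the nonlinearities, but the principle is the same: learned parameters become plain behavioral code that ngspice can compile and run natively.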
Here’s what blew my mind when I ran the experiments on my desktop PC. The accuracy is nearly identical to the full xLSTM teacher (only a 1.5% MSE increase). The speed is 7.6× faster in Python, with projected 100× gains in compiled SPICE. The compression achieves 84% fewer parameters (46K down to 7K). And for interpretability, the discovered time constants match published physics literature. But the real validation came from SPICE. I generated Verilog-A code, simulated it in ngspice, and compared it to PyTorch inference on the same waveforms. Mean absolute error: 0.40 mA (that’s less than 0.05% error!). The network-to-Verilog-A-to-SPICE pipeline worked flawlessly. In the final published study, I also pushed the validation beyond the original memristor benchmark to active-device cases like MOSFETs and BJTs, and even included a stress test with unexpected transient disturbances.
You might ask: „Sorin, does this really matter that much?!“ Well, dear Earth visitor, it matters for several reasons. For researchers: you can now train high-capacity xLSTM-PINNs offline to capture complex physics, then automatically compress them into deployable models without manual equation derivation. For circuit designers: you get physics-based compact models that run at SPICE-compatible speeds while capturing high-frequency dynamics that standard PINNs miss. For AI: this proves that even „uninterpretable“ deep learning models have hidden structure; we just need to look for it.
Richard Feynman (I have all his books in my library) once said: „What I cannot create, I do not understand“.
The network learns, then we compress by asking: „What physical laws would produce this behavior?“ It’s the reverse engineering of nature, expressed in circuit simulation code.
I’m already dreaming about extensions inspired by Ψ-xLSTM. Can we discover structures in multi-physics coupled electromagnetic-thermal problems? Do quantum tunnel junctions have low-rank dynamics? Can we directly map discovered structures to analog AI accelerators for hardware implementation? One could write an ERC or MSCA proposal on this.
When Sepp Hochreiter shared that xLSTM-PINN paper on LinkedIn, he probably didn’t know it would inspire me to extend my Ψ-HDL framework to recurrent architectures: not just inheriting xLSTM-PINN’s solution to spectral bias, but compressing it into deployable compact models that can actually run inside a SPICE simulator (thanks, Sepp!). That’s the beauty of open science, where ideas build on ideas. Cheers.