Overview: Deep Learning Meets Number Theory
Can modern neural networks discover hidden patterns in prime numbers that millennia of mathematical inquiry have missed? This investigation explores cutting-edge deep learning architectures (transformers, autoencoders, GANs, and attention mechanisms) applied to prime prediction and factorization. While these models achieve impressive results at small scales, we uncover fundamental barriers that prevent them from threatening cryptographic security.
Neural Network Architectures for Primes
Discovery PNN14.1: PrimeNet - Transformer Architecture
Transformer model for prime sequence prediction:
- Input: Sequence of first n primes [2, 3, 5, ..., p_n]
- Architecture: 12 layers, 8 attention heads, 512 dims
- Positional encoding: PE(p_i) = sin(i/10000^{2k/d}) + log(p_i)
- Output: Next k primes via autoregressive generation
Results: 91.3% accuracy predicting next prime up to 10^6
Attention weights reveal local dependencies but miss global structure.
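A minimal PyTorch sketch of a PrimeNet-style model follows. The layer sizes mirror the description above, but everything else (the embedding scheme, and a gap-classification head standing in for full autoregressive generation) is an assumption for illustration, not the original implementation.

```python
# Sketch of a PrimeNet-style encoder (assumed internals; illustrative only).
import torch
import torch.nn as nn

class PrimePositionalEncoding(nn.Module):
    """Sinusoidal encoding of the index i, shifted by log(p_i), as in
    PE(p_i) = sin(i / 10000^(2k/d)) + log(p_i)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.d_model = d_model

    def forward(self, primes: torch.Tensor) -> torch.Tensor:
        # primes: (batch, seq_len) integer tensor of prime values
        _, seq_len = primes.shape
        i = torch.arange(seq_len, dtype=torch.float32, device=primes.device)
        k = torch.arange(self.d_model, dtype=torch.float32, device=primes.device)
        angles = i[:, None] / torch.pow(10000.0, 2.0 * k[None, :] / self.d_model)
        pe = torch.sin(angles)                                   # (seq_len, d_model)
        return pe[None] + torch.log(primes.float())[..., None]   # add log(p_i) per position

class PrimeNet(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_layers=12, max_gap=512):
        super().__init__()
        self.pos = PrimePositionalEncoding(d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Simplification: predict the next prime as a distribution over gaps p_{n+1} - p_n.
        self.head = nn.Linear(d_model, max_gap)

    def forward(self, primes: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.pos(primes))
        return self.head(h[:, -1])                               # logits for the next gap

# Example: small configuration, predict the gap after the first eight primes.
model = PrimeNet(d_model=64, n_heads=4, n_layers=2)
logits = model(torch.tensor([[2, 3, 5, 7, 11, 13, 17, 19]]))
print(logits.shape)  # torch.Size([1, 512])
```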
Discovery PNN14.2: FactorVAE - Variational Autoencoder
VAE architecture for learning factorization in latent space:
- Latent dimension: 64 (empirically optimal)
- Loss: Reconstruction + KL divergence + factorization penalty
- Training: 10M random semiprimes up to 40 bits
Innovation: Latent space clusters by number of prime factors!
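A sketch of the FactorVAE loss is given below. The exact form of the factorization penalty is not stated above; here it is assumed to be an auxiliary cross-entropy on the number of prime factors, which is at least consistent with the latent-space clustering observation. Layer widths and loss weights are also assumptions.

```python
# Sketch of a FactorVAE-style loss: reconstruction + KL + factorization penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT = 64   # latent dimension from the text
BITS = 40     # semiprimes up to 40 bits, encoded as bit vectors (assumed encoding)

class FactorVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(BITS, 256), nn.ReLU())
        self.mu = nn.Linear(256, LATENT)
        self.logvar = nn.Linear(256, LATENT)
        self.dec = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(), nn.Linear(256, BITS))
        self.omega_head = nn.Linear(LATENT, 8)   # predicts number of prime factors (assumed)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), self.omega_head(z), mu, logvar

def loss_fn(model, x, omega, beta=1.0, gamma=0.1):                # weights are assumptions
    recon, omega_logits, mu, logvar = model(x)
    rec = F.binary_cross_entropy_with_logits(recon, x)                # reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())     # KL divergence
    fac = F.cross_entropy(omega_logits, omega)                        # factorization penalty (assumed form)
    return rec + beta * kl + gamma * fac

# Random bit vectors stand in for encoded semiprimes (omega = 2 prime factors).
x = torch.randint(0, 2, (16, BITS)).float()
omega = torch.full((16,), 2)
print(loss_fn(FactorVAE(), x, omega).item())
```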
Discovery PNN14.3: PrimeGAN - Generative Adversarial Network
GAN for learning prime distribution:
- Generator: G(z) → candidate prime from noise z
- Discriminator: D(n) → P(n is truly prime)
- Training trick: Use probabilistic primality tests as weak labels
Finding: Generator learns to avoid small factors but fails on deep primality.
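The weak-labelling trick is the interesting part, so the sketch below shows only the discriminator update: its targets come from a probabilistic primality test rather than ground truth (sympy.isprime stands in for Miller-Rabin). Network sizes and the bit encoding are assumptions; the generator update is a standard GAN step and is omitted.

```python
# Sketch of the PrimeGAN weak-label discriminator step (assumed architecture).
import torch
import torch.nn as nn
from sympy import isprime

BITS = 32
G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, BITS), nn.Sigmoid())
D = nn.Sequential(nn.Linear(BITS, 128), nn.ReLU(), nn.Linear(128, 1))

def bits_to_int(bits: torch.Tensor) -> int:
    # Round soft bits to 0/1 and interpret them as an integer (MSB first).
    return int("".join(str(int(b > 0.5)) for b in bits.tolist()), 2)

def discriminator_step(opt_d, loss_fn=nn.BCEWithLogitsLoss()):
    z = torch.randn(32, 64)
    candidates = G(z).detach()                       # generator samples, frozen for the D update
    weak_labels = torch.tensor(
        [[float(isprime(bits_to_int(c)))] for c in candidates]
    )                                                # probabilistic primality test as weak label
    loss = loss_fn(D(candidates), weak_labels)
    opt_d.zero_grad()
    loss.backward()
    opt_d.step()
    return loss.item()

opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
print(discriminator_step(opt_d))
```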
Training Results and Performance
Discovery PNN14.4: DeepSieve - Convolutional Network
1D CNN for local primality pattern detection:
- Input: Binary vector of length n (1 if prime, 0 if composite)
- Conv layers: [3×1, 5×1, 7×1, 11×1] kernels (prime-sized!)
- Output: Refined primality predictions
Performance:
- 3x speedup over Sieve of Eratosthenes for n < 10^7
- Learns wheel factorization patterns automatically
- Still O(n log log n) complexity - no fundamental improvement
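A sketch of the DeepSieve idea follows: a 1-D CNN with prime-sized kernels that refines a rough candidate bitmap into primality logits. Channel widths and the use of a wheel-style prefilter as input are assumptions.

```python
# Sketch of a DeepSieve-style 1-D CNN with prime-sized kernels (assumed widths).
import torch
import torch.nn as nn

class DeepSieve(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        layers, in_ch = [], 1
        for k in (3, 5, 7, 11):                          # prime-sized kernels, as described
            layers += [nn.Conv1d(in_ch, channels, k, padding=k // 2), nn.ReLU()]
            in_ch = channels
        self.body = nn.Sequential(*layers)
        self.head = nn.Conv1d(channels, 1, 1)            # per-position primality logit

    def forward(self, bitmap):
        # bitmap: (batch, 1, n) rough 0/1 candidate indicator; output: refined logits.
        return self.head(self.body(bitmap))

# Rough input: a wheel-style prefilter marking odd numbers not divisible by 3.
n = 64
rough = torch.tensor([[[float(i % 2 == 1 and i % 3 != 0) for i in range(n)]]])
print(DeepSieve()(rough).shape)   # torch.Size([1, 1, 64])
```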
Discovery PNN14.5: AttentionFactor - Cross-Attention Network
Novel architecture using attention for factorization:
- Query: Embedding of composite N
- Keys/Values: Embeddings of potential factor pairs
- Attention weights reveal likely factorizations
Factorization success on semiprimes: 68% (20-bit), 31% (30-bit), 12% (40-bit).
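The sketch below shows the cross-attention scoring step: the composite N attends over candidate factor pairs and the attention weights are read off as a ranking. The bit-vector embeddings and sizes are assumptions, and the example uses an untrained model.

```python
# Sketch of AttentionFactor-style scoring over candidate factor pairs.
import torch
import torch.nn as nn

BITS, D = 20, 64

def to_bits(n: int) -> torch.Tensor:
    return torch.tensor([(n >> i) & 1 for i in range(BITS)], dtype=torch.float32)

embed_n = nn.Linear(BITS, D)           # query embedding of the composite N
embed_pair = nn.Linear(2 * BITS, D)    # key/value embedding of a candidate factor pair
attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)

def rank_factor_pairs(N: int, candidates: list[tuple[int, int]]) -> torch.Tensor:
    query = embed_n(to_bits(N))[None, None]                        # (1, 1, D)
    kv = torch.stack([embed_pair(torch.cat([to_bits(a), to_bits(b)]))
                      for a, b in candidates])[None]               # (1, m, D)
    _, weights = attn(query, kv, kv)                               # weights: (1, 1, m)
    return weights.squeeze()                                       # attention weight = plausibility score

# Untrained example: score a few factor-pair hypotheses for N = 391 = 17 * 23.
print(rank_factor_pairs(391, [(17, 23), (19, 21), (13, 31)]))
```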
Discovery PNN14.6: PrimeLSTM - Recurrent Networks
Bidirectional LSTM for gap prediction:
- Hidden state h_i encodes prime history up to p_i
- Cell state c_i maintains long-term dependencies
- Output: Distribution over next gap size
Discovered pattern: Hidden states cluster by gap size categories!
Limitation: Performance degrades exponentially with prime magnitude.
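A sketch of the PrimeLSTM gap model follows: an LSTM over log-scaled primes whose final state parameterizes a distribution over the next gap. Hidden size and the gap vocabulary are assumptions.

```python
# Sketch of a PrimeLSTM-style gap predictor (assumed sizes).
import torch
import torch.nn as nn
from sympy import prime

MAX_GAP = 256

class PrimeLSTM(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, MAX_GAP)   # distribution over next gap size

    def forward(self, primes):
        # primes: (batch, seq_len); feed log p_i so magnitudes stay bounded.
        x = torch.log(primes.float()).unsqueeze(-1)
        out, _ = self.lstm(x)                        # h_i encodes the prime history up to p_i
        return self.head(out[:, -1])                 # logits for p_{n+1} - p_n

seq = torch.tensor([[prime(i) for i in range(1, 21)]])   # first 20 primes
print(PrimeLSTM()(seq).softmax(-1).topk(3).indices)      # most likely next gaps (untrained)
```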
Scaling Analysis and Fundamental Limits
Discovery PNN14.7: Sample Complexity Barrier
For b-bit numbers, learning to factor would require on the order of 2^{b/2} training examples.
Proof sketch: The VC dimension of a network grows only polynomially in its parameter count, while the space of b-bit numbers grows exponentially in b, so polynomially many samples cannot pin down the mapping.
Implication: For 2048-bit RSA, ~2^{1024} training examples would be needed!
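A back-of-the-envelope check of this scaling, assuming the ~2^{b/2} bound stated above:

```python
# Required training examples under the assumed ~2^(b/2) sample-complexity scaling.
import math

for b in (20, 40, 256, 2048):
    log10_examples = (b / 2) * math.log10(2)
    print(f"{b:5d}-bit modulus: ~2^{b // 2} = 10^{log10_examples:.0f} training examples")
```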
Discovery PNN14.8: Transfer Learning Failure
Models trained on small primes fail catastrophically on large ones:
- Train on primes < 10^6: 91% accuracy
- Test on primes ~ 10^9: 52% accuracy (barely better than random)
- Test on primes ~ 10^12: 0% accuracy
Root cause: Distribution shift - large primes have fundamentally different "texture".
Discovery PNN14.9: Adversarial Vulnerability
Small perturbations fool neural prime detectors:
- Add structured noise: n' = n + ε·pattern
- Pattern chosen to maximize network confidence
- Result: 94% of composites classified as "prime" with high confidence
Example: 391 (= 17×23) + noise → network outputs "99.7% prime"
Shows networks learn surface patterns, not mathematical structure.
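The probe below sketches the attack style: a gradient-guided (FGSM-like) perturbation of the input encoding pushes a stand-in primality classifier toward "prime". The classifier here is untrained and illustrative, so its numbers are random-init artifacts rather than the 94%/99.7% figures reported above.

```python
# Sketch of a gradient-guided adversarial perturbation against a primality classifier.
import torch
import torch.nn as nn

BITS = 16
clf = nn.Sequential(nn.Linear(BITS, 64), nn.ReLU(), nn.Linear(64, 1))  # stand-in detector

def to_bits(n: int) -> torch.Tensor:
    return torch.tensor([(n >> i) & 1 for i in range(BITS)], dtype=torch.float32)

def adversarial_confidence(n: int, eps: float = 0.2) -> float:
    x = to_bits(n).requires_grad_(True)
    score = clf(x)
    score.backward()
    x_adv = x + eps * x.grad.sign()          # step in the direction that raises the "prime" logit
    return torch.sigmoid(clf(x_adv)).item()

# 391 = 17 * 23 is composite, yet the perturbed encoding raises the "prime" confidence.
print(adversarial_confidence(391))
```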
Cryptographic Implications
Discovery PNN14.10: No Compositionality
Fundamental issue: Factorization lacks hierarchical structure that neural networks excel at.
- Image recognition: pixels → edges → objects (compositional)
- Language: words → phrases → sentences (compositional)
- Factorization: N = p×q is a hard constraint, not soft pattern
A single bit flip in p yields an entirely different N, and networks cannot generalize across such discontinuities (see the sketch below).
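A tiny numerical illustration of this discontinuity, using two small primes chosen for the example:

```python
# Flipping one bit of p changes N = p * q in a way that is not a local edit of its bit pattern.
p, q = 1009, 1013
N = p * q
for bit in range(4):
    N_flipped = (p ^ (1 << bit)) * q                 # flip one low-order bit of p
    changed = bin(N ^ N_flipped).count("1")
    print(f"flip bit {bit} of p -> {changed} bits of N change ({N} vs {N_flipped})")
```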
Discovery PNN14.11: Complexity Theory Perspective
If neural networks could factor efficiently, factoring would have polynomial-size circuits, contradicting standard cryptographic hardness assumptions.
Conclusion: Neural networks remain bound by classical complexity limits.
Discovery PNN14.12: Best Hybrid Approach
Most promising: Augment classical algorithms with neural heuristics.
- Use NN for polynomial selection in Number Field Sieve
- Train on successful factorizations to predict good parameters
- Potential speedup: 1.5-2x (not exponential)
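A heavily hedged sketch of the hybrid idea: a small learned regressor ranks polynomial-selection candidates by predicted sieving yield, so the most promising ones are tried first. The features and training signal are hypothetical placeholders, not a description of any existing Number Field Sieve implementation.

```python
# Hypothetical neural ranker for GNFS polynomial-selection candidates.
import torch
import torch.nn as nn

# Placeholder features per candidate polynomial: log coefficient norm,
# root-property score, size score (stand-ins for real selection metrics).
ranker = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 1))

def rank_candidates(features: torch.Tensor) -> torch.Tensor:
    # features: (num_candidates, 3); returns indices sorted by predicted yield.
    with torch.no_grad():
        scores = ranker(features).squeeze(-1)
    return scores.argsort(descending=True)

candidates = torch.randn(5, 3)          # five hypothetical candidate polynomials
print(rank_candidates(candidates))      # sieve with the top-ranked ones first
```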
Alternatively: Apply to side-channel attacks where NNs excel at noisy pattern recognition.
Major Finding: The Representation Barrier
Neural networks face a fundamental representation problem for large primes:
- Can't efficiently encode 2048-bit numbers in network weights
- Activation functions lose precision on large magnitudes
- Gradient flow vanishes for astronomical prime values
Expert assessment: "NNs are powerful but remain classical computation - they cannot escape established complexity classes."
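A quick demonstration of the precision problem behind the representation barrier: float32 activations cannot even distinguish consecutive integers around 2^60, let alone represent a 2048-bit modulus.

```python
# float32 cannot represent 2^60 + 1 distinctly from 2^60.
import torch

n = 2**60 + 1
as_float = torch.tensor(float(n), dtype=torch.float32)
print(int(as_float.item()) == n)          # False: the +1 is lost
print(int(as_float.item()) - 2**60)       # 0: the value rounds back to 2^60
```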
Conclusions and Future Directions
What We Achieved
- 91.3% accuracy predicting next prime (small scale)
- 68% factorization success for 20-bit semiprimes
- 3x speedup for sieving via learned patterns
- Novel architectures: FactorVAE, AttentionFactor
- Discovered latent space clustering by factor count
- Automatic learning of wheel factorization
- Attention weights correlate with factor relationships
- Proved fundamental scaling barriers
- Demonstrated adversarial vulnerabilities
- Established connection to complexity theory
Where We're Blocked
- Sample Complexity: Need exponentially many examples
- No Transfer Learning: Small prime patterns don't generalize
- Lack of Compositionality: Factoring isn't hierarchical
- Adversarial Fragility: Easily fooled by crafted inputs
- Representation Limits: Can't encode large integers efficiently
Bottom Line: Neural networks learn statistical regularities in small primes but cannot capture the deep mathematical structure required for cryptanalysis.
Most Promising Directions
Three practical pivots building on our findings:
- Hybrid Classical-Neural: Use NNs to optimize heuristics within GNFS
- Side-Channel Analysis: Apply to power/timing attacks where NNs excel
- Mathematical Discovery: Use as "intuition pumps" for new theorems
The direct neural attack on RSA is conclusively a dead end, but the journey revealed valuable insights about the fundamental nature of prime numbers and the limits of machine learning.