Overview: Deep Learning Meets Number Theory
Can modern neural networks discover hidden patterns in prime numbers that millennia of mathematical inquiry have missed? This investigation explores cutting-edge deep learning architectures (transformers, autoencoders, GANs, and attention mechanisms) applied to prime prediction and factorization. While these models achieve impressive results at small scales, we uncover fundamental barriers that prevent them from threatening cryptographic security.
Neural Network Architectures for Primes
Discovery PNN14.1: PrimeNet - Transformer Architecture
Transformer model for prime sequence prediction:
- Input: Sequence of first n primes [2, 3, 5, ..., p_n]
- Architecture: 12 layers, 8 attention heads, 512 dims
- Positional encoding: PE(p_i) = sin(i/10000^{2k/d}) + log(p_i)
- Output: Next k primes via autoregressive generation
Results: 91.3% accuracy predicting next prime up to 10^6
Attention weights reveal local dependencies but miss global structure.
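A minimal PyTorch sketch of a PrimeNet-style model follows. The layer sizes mirror the description above, but everything else (the embedding scheme, and a gap-classification head standing in for full autoregressive generation) is an assumption for illustration, not the original implementation.

```python
# Sketch of a PrimeNet-style encoder (assumed internals; illustrative only).
import torch
import torch.nn as nn

class PrimePositionalEncoding(nn.Module):
    """Sinusoidal encoding of the index i, shifted by log(p_i), as in
    PE(p_i) = sin(i / 10000^(2k/d)) + log(p_i)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.d_model = d_model

    def forward(self, primes: torch.Tensor) -> torch.Tensor:
        # primes: (batch, seq_len) integer tensor of prime values
        _, seq_len = primes.shape
        i = torch.arange(seq_len, dtype=torch.float32, device=primes.device)
        k = torch.arange(self.d_model, dtype=torch.float32, device=primes.device)
        angles = i[:, None] / torch.pow(10000.0, 2.0 * k[None, :] / self.d_model)
        pe = torch.sin(angles)                                   # (seq_len, d_model)
        return pe[None] + torch.log(primes.float())[..., None]   # add log(p_i) per position

class PrimeNet(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_layers=12, max_gap=512):
        super().__init__()
        self.pos = PrimePositionalEncoding(d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Simplification: predict the next prime as a distribution over gaps p_{n+1} - p_n.
        self.head = nn.Linear(d_model, max_gap)

    def forward(self, primes: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.pos(primes))
        return self.head(h[:, -1])                               # logits for the next gap

# Example: small configuration, predict the gap after the first eight primes.
model = PrimeNet(d_model=64, n_heads=4, n_layers=2)
logits = model(torch.tensor([[2, 3, 5, 7, 11, 13, 17, 19]]))
print(logits.shape)  # torch.Size([1, 512])
```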
Discovery PNN14.2: FactorVAE - Variational Autoencoder
VAE architecture for learning factorization in latent space:
- Latent dimension: 64 (empirically optimal)
- Loss: Reconstruction + KL divergence + factorization penalty
- Training: 10M random semiprimes up to 40 bits
Innovation: Latent space clusters by number of prime factors!
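A sketch of the FactorVAE loss is given below. The exact form of the factorization penalty is not stated above; here it is assumed to be an auxiliary cross-entropy on the number of prime factors, which is at least consistent with the latent-space clustering observation. Layer widths and loss weights are also assumptions.

```python
# Sketch of a FactorVAE-style loss: reconstruction + KL + factorization penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT = 64   # latent dimension from the text
BITS = 40     # semiprimes up to 40 bits, encoded as bit vectors (assumed encoding)

class FactorVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(BITS, 256), nn.ReLU())
        self.mu = nn.Linear(256, LATENT)
        self.logvar = nn.Linear(256, LATENT)
        self.dec = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(), nn.Linear(256, BITS))
        self.omega_head = nn.Linear(LATENT, 8)   # predicts number of prime factors (assumed)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), self.omega_head(z), mu, logvar

def loss_fn(model, x, omega, beta=1.0, gamma=0.1):                # weights are assumptions
    recon, omega_logits, mu, logvar = model(x)
    rec = F.binary_cross_entropy_with_logits(recon, x)                # reconstruction
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())     # KL divergence
    fac = F.cross_entropy(omega_logits, omega)                        # factorization penalty (assumed form)
    return rec + beta * kl + gamma * fac

# Random bit vectors stand in for encoded semiprimes (omega = 2 prime factors).
x = torch.randint(0, 2, (16, BITS)).float()
omega = torch.full((16,), 2)
print(loss_fn(FactorVAE(), x, omega).item())
```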
Discovery PNN14.3: PrimeGAN - Generative Adversarial Network
GAN for learning prime distribution:
- Generator: G(z) → candidate prime from noise z
- Discriminator: D(n) → P(n is truly prime)
- Training trick: Use probabilistic primality tests as weak labels
Finding: Generator learns to avoid small factors but fails on deep primality.
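The weak-labelling trick is the interesting part, so the sketch below shows only the discriminator update: its targets come from a probabilistic primality test rather than ground truth (sympy.isprime stands in for Miller-Rabin). Network sizes and the bit encoding are assumptions; the generator update is a standard GAN step and is omitted.

```python
# Sketch of the PrimeGAN weak-label discriminator step (assumed architecture).
import torch
import torch.nn as nn
from sympy import isprime

BITS = 32
G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, BITS), nn.Sigmoid())
D = nn.Sequential(nn.Linear(BITS, 128), nn.ReLU(), nn.Linear(128, 1))

def bits_to_int(bits: torch.Tensor) -> int:
    # Round soft bits to 0/1 and interpret them as an integer (MSB first).
    return int("".join(str(int(b > 0.5)) for b in bits.tolist()), 2)

def discriminator_step(opt_d, loss_fn=nn.BCEWithLogitsLoss()):
    z = torch.randn(32, 64)
    candidates = G(z).detach()                       # generator samples, frozen for the D update
    weak_labels = torch.tensor(
        [[float(isprime(bits_to_int(c)))] for c in candidates]
    )                                                # probabilistic primality test as weak label
    loss = loss_fn(D(candidates), weak_labels)
    opt_d.zero_grad()
    loss.backward()
    opt_d.step()
    return loss.item()

opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
print(discriminator_step(opt_d))
```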
Training Results and Performance
Discovery PNN14.4: DeepSieve - Convolutional Network
1D CNN for local primality pattern detection:
- Input: Binary vector of length n (1 if prime, 0 if composite)
- Conv layers: [3×1, 5×1, 7×1, 11×1] kernels (prime-sized!)
- Output: Refined primality predictions
Performance:
- 3x speedup over Sieve of Eratosthenes for n < 10^7
- Learns wheel factorization patterns automatically
- Still O(n log log n) complexity - no fundamental improvement
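A sketch of the DeepSieve idea follows: a 1-D CNN with prime-sized kernels that refines a rough candidate bitmap into primality logits. Channel widths and the use of a wheel-style prefilter as input are assumptions.

```python
# Sketch of a DeepSieve-style 1-D CNN with prime-sized kernels (assumed widths).
import torch
import torch.nn as nn

class DeepSieve(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        layers, in_ch = [], 1
        for k in (3, 5, 7, 11):                          # prime-sized kernels, as described
            layers += [nn.Conv1d(in_ch, channels, k, padding=k // 2), nn.ReLU()]
            in_ch = channels
        self.body = nn.Sequential(*layers)
        self.head = nn.Conv1d(channels, 1, 1)            # per-position primality logit

    def forward(self, bitmap):
        # bitmap: (batch, 1, n) rough 0/1 candidate indicator; output: refined logits.
        return self.head(self.body(bitmap))

# Rough input: a wheel-style prefilter marking odd numbers not divisible by 3.
n = 64
rough = torch.tensor([[[float(i % 2 == 1 and i % 3 != 0) for i in range(n)]]])
print(DeepSieve()(rough).shape)   # torch.Size([1, 1, 64])
```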
Discovery PNN14.5: AttentionFactor - Cross-Attention Network
Novel architecture using attention for factorization:
- Query: Embedding of composite N
- Keys/Values: Embeddings of potential factor pairs
- Attention weights reveal likely factorizations
Factorization success on semiprimes: 68% (20-bit), 31% (30-bit), 12% (40-bit).
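The sketch below shows the cross-attention scoring step: the composite N attends over candidate factor pairs and the attention weights are read off as a ranking. The bit-vector embeddings and sizes are assumptions, and the example uses an untrained model.

```python
# Sketch of AttentionFactor-style scoring over candidate factor pairs.
import torch
import torch.nn as nn

BITS, D = 20, 64

def to_bits(n: int) -> torch.Tensor:
    return torch.tensor([(n >> i) & 1 for i in range(BITS)], dtype=torch.float32)

embed_n = nn.Linear(BITS, D)           # query embedding of the composite N
embed_pair = nn.Linear(2 * BITS, D)    # key/value embedding of a candidate factor pair
attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)

def rank_factor_pairs(N: int, candidates: list[tuple[int, int]]) -> torch.Tensor:
    query = embed_n(to_bits(N))[None, None]                        # (1, 1, D)
    kv = torch.stack([embed_pair(torch.cat([to_bits(a), to_bits(b)]))
                      for a, b in candidates])[None]               # (1, m, D)
    _, weights = attn(query, kv, kv)                               # weights: (1, 1, m)
    return weights.squeeze()                                       # attention weight = plausibility score

# Untrained example: score a few factor-pair hypotheses for N = 391 = 17 * 23.
print(rank_factor_pairs(391, [(17, 23), (19, 21), (13, 31)]))
```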
Discovery PNN14.6: PrimeLSTM - Recurrent Networks
Bidirectional LSTM for gap prediction:
- Hidden state h_i encodes prime history up to p_i
- Cell state c_i maintains long-term dependencies
- Output: Distribution over next gap size
Discovered pattern: Hidden states cluster by gap size categories!
Limitation: Performance degrades exponentially with prime magnitude.
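A sketch of the PrimeLSTM gap model follows: an LSTM over log-scaled primes whose final state parameterizes a distribution over the next gap. Hidden size and the gap vocabulary are assumptions.

```python
# Sketch of a PrimeLSTM-style gap predictor (assumed sizes).
import torch
import torch.nn as nn
from sympy import prime

MAX_GAP = 256

class PrimeLSTM(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, MAX_GAP)   # distribution over next gap size

    def forward(self, primes):
        # primes: (batch, seq_len); feed log p_i so magnitudes stay bounded.
        x = torch.log(primes.float()).unsqueeze(-1)
        out, _ = self.lstm(x)                        # h_i encodes the prime history up to p_i
        return self.head(out[:, -1])                 # logits for p_{n+1} - p_n

seq = torch.tensor([[prime(i) for i in range(1, 21)]])   # first 20 primes
print(PrimeLSTM()(seq).softmax(-1).topk(3).indices)      # most likely next gaps (untrained)
```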
Scaling Analysis and Fundamental Limits
Discovery PNN14.7: Sample Complexity Barrier
For b-bit numbers, learning to factor would require on the order of 2^{b/2} training examples.
Proof sketch: The VC dimension of a network grows only polynomially in its parameter count, while the space of b-bit numbers grows exponentially in b, so polynomially many samples cannot pin down the mapping.
Implication: For 2048-bit RSA, ~2^{1024} training examples would be needed!
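A back-of-the-envelope check of this scaling, assuming the ~2^{b/2} bound stated above:

```python
# Required training examples under the assumed ~2^(b/2) sample-complexity scaling.
import math

for b in (20, 40, 256, 2048):
    log10_examples = (b / 2) * math.log10(2)
    print(f"{b:5d}-bit modulus: ~2^{b // 2} = 10^{log10_examples:.0f} training examples")
```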
Discovery PNN14.8: Transfer Learning Failure
Models trained on small primes fail catastrophically on large ones:
- Train on primes < 10^6: 91% accuracy
- Test on primes ~ 10^9: 52% accuracy (barely better than random)
- Test on primes ~ 10^12: 0% accuracy
Root cause: Distribution shift - large primes have fundamentally different "texture".
Discovery PNN14.9: Adversarial Vulnerability
Small perturbations fool neural prime detectors:
- Add structured noise: n' = n + ε·pattern
- Pattern chosen to maximize network confidence
- Result: 94% of composites classified as "prime" with high confidence
Example: 391 (= 17×23) + noise → network outputs "99.7% prime"
Shows networks learn surface patterns, not mathematical structure.
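The probe below sketches the attack style: a gradient-guided (FGSM-like) perturbation of the input encoding pushes a stand-in primality classifier toward "prime". The classifier here is untrained and illustrative, so its numbers are random-init artifacts rather than the 94%/99.7% figures reported above.

```python
# Sketch of a gradient-guided adversarial perturbation against a primality classifier.
import torch
import torch.nn as nn

BITS = 16
clf = nn.Sequential(nn.Linear(BITS, 64), nn.ReLU(), nn.Linear(64, 1))  # stand-in detector

def to_bits(n: int) -> torch.Tensor:
    return torch.tensor([(n >> i) & 1 for i in range(BITS)], dtype=torch.float32)

def adversarial_confidence(n: int, eps: float = 0.2) -> float:
    x = to_bits(n).requires_grad_(True)
    score = clf(x)
    score.backward()
    x_adv = x + eps * x.grad.sign()          # step in the direction that raises the "prime" logit
    return torch.sigmoid(clf(x_adv)).item()

# 391 = 17 * 23 is composite, yet the perturbed encoding raises the "prime" confidence.
print(adversarial_confidence(391))
```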
Cryptographic Implications
Discovery PNN14.10: No Compositionality
Fundamental issue: Factorization lacks hierarchical structure that neural networks excel at.
- Image recognition: pixels → edges → objects (compositional)
- Language: words → phrases → sentences (compositional)
- Factorization: N = p×q is a hard constraint, not soft pattern
A single bit flip in p yields an entirely different N, and networks cannot generalize across such discontinuities (see the sketch below).
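A tiny numerical illustration of this discontinuity, using two small primes chosen for the example:

```python
# Flipping one bit of p changes N = p * q in a way that is not a local edit of its bit pattern.
p, q = 1009, 1013
N = p * q
for bit in range(4):
    N_flipped = (p ^ (1 << bit)) * q                 # flip one low-order bit of p
    changed = bin(N ^ N_flipped).count("1")
    print(f"flip bit {bit} of p -> {changed} bits of N change ({N} vs {N_flipped})")
```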
Discovery PNN14.11: Complexity Theory Perspective
If neural networks could factor efficiently, factoring would have polynomial-size circuits, contradicting standard cryptographic hardness assumptions.
Conclusion: Neural networks remain bound by classical complexity limits.
Discovery PNN14.12: Best Hybrid Approach
Most promising: Augment classical algorithms with neural heuristics.
- Use NN for polynomial selection in Number Field Sieve
- Train on successful factorizations to predict good parameters
- Potential speedup: 1.5-2x (not exponential)
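A heavily hedged sketch of the hybrid idea: a small learned regressor ranks polynomial-selection candidates by predicted sieving yield, so the most promising ones are tried first. The features and training signal are hypothetical placeholders, not a description of any existing Number Field Sieve implementation.

```python
# Hypothetical neural ranker for GNFS polynomial-selection candidates.
import torch
import torch.nn as nn

# Placeholder features per candidate polynomial: log coefficient norm,
# root-property score, size score (stand-ins for real selection metrics).
ranker = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 1))

def rank_candidates(features: torch.Tensor) -> torch.Tensor:
    # features: (num_candidates, 3); returns indices sorted by predicted yield.
    with torch.no_grad():
        scores = ranker(features).squeeze(-1)
    return scores.argsort(descending=True)

candidates = torch.randn(5, 3)          # five hypothetical candidate polynomials
print(rank_candidates(candidates))      # sieve with the top-ranked ones first
```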
Alternatively: Apply to side-channel attacks where NNs excel at noisy pattern recognition.
Major Finding: The Representation Barrier
Neural networks face a fundamental representation problem for large primes:
- Can't efficiently encode 2048-bit numbers in network weights
- Activation functions lose precision on large magnitudes
- Gradient flow vanishes for astronomical prime values
Expert assessment: "NNs are powerful but remain classical computation - they cannot escape established complexity classes."
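A quick demonstration of the precision problem behind the representation barrier: float32 activations cannot even distinguish consecutive integers around 2^60, let alone represent a 2048-bit modulus.

```python
# float32 cannot represent 2^60 + 1 distinctly from 2^60.
import torch

n = 2**60 + 1
as_float = torch.tensor(float(n), dtype=torch.float32)
print(int(as_float.item()) == n)          # False: the +1 is lost
print(int(as_float.item()) - 2**60)       # 0: the value rounds back to 2^60
```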
Conclusions and Future Directions
What We Achieved
- 91.3% accuracy predicting next prime (small scale)
- 68% factorization success for 20-bit semiprimes
- 3x speedup for sieving via learned patterns
- Novel architectures: FactorVAE, AttentionFactor
- Discovered latent space clustering by factor count
- Automatic learning of wheel factorization
- Attention weights correlate with factor relationships
- Proved fundamental scaling barriers
- Demonstrated adversarial vulnerabilities
- Established connection to complexity theory
Where We're Blocked
- Sample Complexity: Need exponentially many examples
- No Transfer Learning: Small prime patterns don't generalize
- Lack of Compositionality: Factoring isn't hierarchical
- Adversarial Fragility: Easily fooled by crafted inputs
- Representation Limits: Can't encode large integers efficiently
Bottom Line: Neural networks learn statistical regularities in small primes but cannot capture the deep mathematical structure required for cryptanalysis.
Most Promising Directions
Three practical pivots building on our findings:
- Hybrid Classical-Neural: Use NNs to optimize heuristics within GNFS
- Side-Channel Analysis: Apply to power/timing attacks where NNs excel
- Mathematical Discovery: Use as "intuition pumps" for new theorems
The direct neural attack on RSA is conclusively a dead end, but the journey revealed valuable insights about the fundamental nature of prime numbers and the limits of machine learning.