Hallucinating Certificates: Differential Testing of TLS Certificate Validation Using Generative Language Models

Authors

Talha Paracha, Kyle Posluns, Kevin Borgolte, Martina Lindorfer, David Choffnes

Publication

Proceedings of the 48th IEEE/ACM International Conference on Software Engineering (ICSE), April 2026

Abstract

Certificate validation is a crucial step in Transport Layer Security (TLS), the de facto standard network security protocol. Prior research has shown that differentially testing TLS implementations with synthetic certificates can reveal critical security issues, such as accidentally accepting untrusted certificates. Leveraging known techniques, such as random input mutations and program coverage guidance, prior work created corpora of synthetic certificates. By testing the certificates with multiple TLS libraries and comparing the validation outcomes, they discovered new bugs. However, these techniques either cannot generate the corresponding inputs efficiently, or they require modeling the programs and their inputs in ways that scale poorly.

In this paper, we introduce MLCerts, a new approach that leverages generative language models to generate synthetic certificates for differential testing, enabling more extensive testing of software implementations. Recently, these models have become (in)famous for their applications in generating content, writing code, and conversing with users, as well as for “hallucinating” syntactically correct yet semantically nonsensical output. We provide and leverage two novel insights: (a) TLS certificates can be expressed in a natural-like language, namely the X.509 standard, which aids human readability, and (b) differential testing can benefit from hallucinated malformed test cases.
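
To make insight (a) concrete, the following minimal sketch renders a PEM-encoded certificate's fields in human-readable text using the Python cryptography package. The file path is a placeholder and this is only an illustration; the exact textual representation that MLCerts trains on and generates is described in the paper.

    # Illustration only: render a certificate's fields in human-readable text.
    # The file path is a placeholder; the exact representation used by MLCerts
    # is described in the paper, not defined here.
    from cryptography import x509

    with open("example-cert.pem", "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())

    print("Subject:   ", cert.subject.rfc4514_string())
    print("Issuer:    ", cert.issuer.rfc4514_string())
    print("Not before:", cert.not_valid_before)
    print("Not after: ", cert.not_valid_after)
    for ext in cert.extensions:
        print("Extension: ", ext.oid.dotted_string, ext.value)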

Using our approach MLCerts, we find significantly more distinct discrepancies between the five TLS implementations OpenSSL, LibreSSL, GnuTLS, MbedTLS, and MatrixSSL than the state-of-the-art benchmark Transcert (+30%; 20 vs 26, out of a maximum possible of 30) and an order of magnitude more than the seminal work Frankencerts (+1,200%; 2 vs 26). Finally, we show that the diversity of MLCerts-generated certificates reveals a range of previously unobserved and interesting behavior with security implications.
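
To give intuition for the "maximum possible of 30": with five libraries, each certificate yields a vector of five accept/reject verdicts, and any non-unanimous vector constitutes a discrepancy, so there are 2^5 - 2 = 30 possible distinct patterns. The sketch below counts such patterns; the data layout and helper are illustrative, not the paper's implementation.

    # Minimal sketch: counting distinct discrepancy patterns across TLS libraries.
    # `verdicts` maps certificate IDs to {library: accepted?}; this layout is
    # illustrative, not the structure used by the MLCerts harness.
    LIBRARIES = ["openssl", "libressl", "gnutls", "mbedtls", "matrixssl"]

    def distinct_discrepancies(verdicts):
        patterns = set()
        for per_lib in verdicts.values():
            vector = tuple(per_lib[lib] for lib in LIBRARIES)
            if len(set(vector)) > 1:  # not unanimous -> a discrepancy
                patterns.add(vector)
        return patterns

    # With 5 libraries there are 2**5 - 2 = 30 possible non-unanimous vectors,
    # hence the maximum of 30 distinct discrepancies.
    example = {
        "cert-0001": {"openssl": True, "libressl": True, "gnutls": False,
                      "mbedtls": True, "matrixssl": True},
        "cert-0002": {lib: True for lib in LIBRARIES},  # unanimous: no discrepancy
    }
    print(len(distinct_discrepancies(example)))  # -> 1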

Source Code and Data: github.com/rub-softsec/MLCerts

Supplementary Material

aux.pdf provides supplementary material for the paper.

Certificate Datasets

Raw PEM certificates used in differential testing can be downloaded via Zenodo:

  • 12 synthetic certificate datasets.
  • MLCerts 1M dataset.
  • Frankencerts 8M dataset.
  • Transcert 30K dataset.
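
The datasets contain raw PEM-encoded certificates; the sketch below splits one downloaded file into individual certificates. The file name is a placeholder, and the actual layout of each dataset is documented in the Zenodo records.

    # Minimal sketch: split a downloaded corpus file into individual PEM certificates.
    # "mlcerts-corpus.pem" is a placeholder name; see the Zenodo records for the
    # actual file layout of each dataset.
    import re

    PEM_BLOCK = re.compile(
        r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
        re.DOTALL,
    )

    def iter_pem_certificates(path):
        with open(path, "r", encoding="ascii", errors="ignore") as f:
            yield from PEM_BLOCK.findall(f.read())

    certs = list(iter_pem_certificates("mlcerts-corpus.pem"))
    print(f"loaded {len(certs)} certificates")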

Differential Testing Framework

Please refer to the /differential-testing/ directory of our artifact for code and instructions on accessing a Docker container that provides a running environment for the framework and a patched Transcert implementation.
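
For a quick sense of what the harness does before pulling the container: each certificate is validated by several libraries and the verdicts are compared. The sketch below shows this loop; only the `openssl verify` invocation is a real command, the second validator is a placeholder, and the artifact's Docker-based harness differs in detail.

    # Minimal sketch of a differential-testing loop over PEM certificates.
    # Only `openssl verify` is a real command here; "other-validator" stands in
    # for however another library's validation is invoked inside the container.
    import subprocess

    VALIDATORS = {
        "openssl":   ["openssl", "verify", "-CAfile", "ca.pem"],
        "other-lib": ["./other-validator", "--ca", "ca.pem"],  # placeholder
    }

    def verdicts_for(cert_path):
        results = {}
        for name, cmd in VALIDATORS.items():
            proc = subprocess.run(cmd + [cert_path], capture_output=True)
            results[name] = (proc.returncode == 0)  # exit code 0 ~ accepted
        return results

    verdicts = verdicts_for("cert-0001.pem")
    if len(set(verdicts.values())) > 1:
        print("discrepancy:", verdicts)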

Language Models

The code, documentation, and saved models are available for download together via Zenodo.
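
As background for how certificate text is sampled from a model (cf. the temperature analysis in the paper), the following sketch shows temperature-controlled sampling with the Hugging Face transformers API. The checkpoint path, prompt, and generation parameters are placeholders; the actual models, their input format, and the generation settings are those in the Zenodo archive and the paper.

    # Minimal sketch: temperature-controlled sampling from a trained causal LM.
    # Checkpoint path and prompt are placeholders, not the artifact's defaults,
    # and the artifact's model/framework may differ from this illustration.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    checkpoint = "./saved-model"  # placeholder path to a downloaded checkpoint
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    inputs = tokenizer("Certificate:", return_tensors="pt")  # placeholder prompt
    outputs = model.generate(
        **inputs,
        do_sample=True,      # sampling (not greedy decoding) allows "hallucinated" variation
        temperature=1.0,     # higher temperature -> more diverse, more malformed outputs
        max_new_tokens=512,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))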

Code for Paper Results

Please refer to the /paper-code/ directory of our artifact.

We use IPython notebooks to produce the paper's results. The notebooks and the results they correspond to are the following:

  1. Certificate corpus analysis.ipynb: Tables 1 and 3
  2. Graphs/Graph-DiscrepanciesVSCoverage.ipynb: Figure 2
  3. Graphs/Graph-Matrix.ipynb: Figure 3
  4. Graphs/Graph-Temperature.ipynb: Figure 4
  5. Graphs/Graph-CertCountVSDiscrepanvies.ipynb: Figure 5
  6. Zlint for 00000s.ipynb + Diversity Comparison Zlint.ipynb: Figure 6 and the analysis in Section 5.3.2
  7. Library checkpointing.ipynb: Figure 7
  8. Discrepancy analysis.ipynb + Library logs analysis.ipynb: Section 5.3.1

For all notebooks, the cell outputs are preserved so that results can be inspected without re-running the code.

Acknowledgements

This work is based on research supported by the National Science Foundation (NSF Grant 1955227), the Internet Society Foundation, the Vienna Science and Technology Fund (WWTF) and the City of Vienna [Grant ID: 10.47379/ICT19056 and 10.47379/ICT22060], the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC 2092 CASA - 390781972, the ICANN Grant Program, and SBA Research (SBA-K1 NGC). Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

BibTeX

@inproceedings{icse2026-hallucinating-certificates,
  title     = {{Hallucinating Certificates: Differential Testing of TLS Certificate Validation Using Generative Language Models}},
  author    = {Paracha, Talha and Posluns, Kyle and Borgolte, Kevin and Lindorfer, Martina and Choffnes, David},
  booktitle = {Proceedings of the 48th IEEE/ACM International Conference on Software Engineering (ICSE)},
  code      = {https://github.com/rub-softsec/mlcerts},
  data      = {https://zenodo.org/records/15971208},
  date      = {2026-04},
  doi       = {10.1145/3744916.3773203},
  editor    = {Mezini, Mira and Zimmermann, Thomas},
  location  = {Rio de Janeiro, Brazil},
  publisher = {Association for Computing Machinery (ACM)/Institute of Electrical and Electronics Engineers (IEEE)},
  volume    = {48}
}