memorization and generative ai

published: November 14, 2025

i’m keeping a running list of papers and resources on memorization, recital, membership inference, and training data extraction in generative AI. this is by no means exhaustive, but it’s a starting point for my own research and for anyone else interested.

if you see something missing, please let me know.

papers

| Year | Authors | Title | Publication | Link |
|---|---|---|---|---|
| 2016 | Shokri, R., et al. | Membership Inference Attacks against Machine Learning Models | arXiv:1610.05820 | arXiv |
| 2020 | Feldman, V. | Does Learning Require Memorization? A Short Tale about a Long Tail | STOC 2020 | arXiv |
| 2020 | Brown, T., et al. | Language Models are Few-Shot Learners | arXiv:2005.14165 | arXiv |
| 2020 | Feldman, V., & Zhang, C. | What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation | NeurIPS 2020 | arXiv |
| 2020 | Khandelwal, U., et al. | Generalization through Memorization: Nearest Neighbor Language Models | ICLR 2020 | arXiv |
| 2021 | Carlini, N., et al. | Extracting Training Data from Large Language Models | 30th USENIX Security Symposium | USENIX |
| 2021 | Jagannatha, A., et al. | Membership Inference Attack Susceptibility of Clinical Language Models | arXiv:2104.08305 | arXiv |
| 2021 | Lee, K., et al. | Deduplicating Training Data Makes Language Models Better | arXiv:2107.06499 | arXiv |
| 2023 | Biderman, S., et al. | Emergent and Predictable Memorization in Large Language Models | arXiv:2304.11158 | arXiv |
| 2023 | Carlini, N., et al. | Extracting Training Data from Diffusion Models | USENIX Security 2023 | arXiv |
| 2023 | Diera, A., et al. | Memorization of Named Entities in Fine-tuned BERT Models | CD-MAKE 2023 | arXiv |
| 2023 | Nasr, M., et al. | Scalable Extraction of Training Data from (Production) Language Models | arXiv:2311.17035 | arXiv |
| 2023 | Webster, R. | A Reproducible Extraction of Training Images from Diffusion Models | arXiv:2305.08694 | arXiv |
| 2023 | Yeticstiren, B., et al. | Evaluating the Code Quality of AI-Assisted Code Generation Tools | arXiv:2302.06590 | arXiv |
| 2023 | Nguyen, N., & Nadi, S. | An Empirical Evaluation of GitHub Copilot’s Code Suggestions | arXiv:2302.04728 | arXiv |
| 2024 | Bharucha, F. G., et al. | Generation or Replication: Auscultating Audio Latent Diffusion Models | ICASSP 2024 | IEEE |
| 2024 | Dana, L., et al. | Memorization in Attention-only Transformers | arXiv:2411.10115 | arXiv |
| 2024 | Epple, P., et al. | Watermarking Training Data of Music Generation Models | arXiv:2412.08549 | arXiv |
| 2024 | Mahdavi, S., et al. | Memorization Capacity of Multi-Head Attention in Transformers | ICLR 2024 | arXiv |
| 2024 | Meeus, M., et al. | Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon | arXiv:2406.17746 | arXiv |
| 2024 | Meeus, M., et al. | Copyright Traps for Large Language Models | ICML 2024 | arXiv |
| 2024 | Patronus AI | Introducing CopyrightCatcher, the first Copyright Detection API for LLMs | Patronus AI | Announcement |
| 2024 | Qu, X., et al. | Automatic Jailbreaking of the Text-to-Image Generative AI Systems | arXiv:2405.16567 | arXiv |
| 2024 | Shilov, I., et al. | Mosaic Memory: Fuzzy Duplication in Copyright Traps for Large Language Models | arXiv:2405.15523 | arXiv |
| 2024 | Su, E., et al. | Extracting Memorized Training Data via Decomposition | arXiv:2409.12367 | arXiv |
| 2024 | Wang, W., et al. | Image Copy Detection for Diffusion Models | NeurIPS 2024 | arXiv |
| 2024 | Wang, Z., et al. | Could It Be Generated? Towards Practical Analysis of Memorization in Text-To-Image Diffusion Models | arXiv:2405.05846 | arXiv |
| 2024 | Wei, J., et al. | Memorization in deep learning: A survey | arXiv:2406.03880 | arXiv |
| 2024 | Chen, Y., et al. | Extracting Training Data from Unconditional Diffusion Models | arXiv:2406.12752 | arXiv |
| 2025 | Chen, C., et al. | Exploring Local Memorization in Diffusion Models via Bright Ending Attention | ICLR 2025 Spotlight | arXiv |
| 2025 | Cooper, A. F., et al. | Extracting memorized pieces of (copyrighted) books from open-weight language models | arXiv:2505.12546 | arXiv |
| 2025 | Gupta, T., & Pruthi, D. | All That Glitters is Not Novel: Plagiarism in AI Generated Research | ACL 2025 | arXiv |
| 2025 | Messina, F., et al. | Mitigating data replication in text-to-audio generative diffusion models through anti-memorization guidance | arXiv:2509.14934 | arXiv |
| 2025 | Morris, J. X., et al. | How much do language models memorize? | arXiv:2505.24832 | arXiv |
| 2025 | Ruan, Z., et al. | Unveiling Over-Memorization in Finetuning LLMs for Reasoning Tasks | arXiv:2508.04117 | arXiv |
