Abstract

This paper provides a comprehensive summary of our research program on applying Genetic Improvement to code generated by Large Language Models (LLMs), consolidating findings from two published studies (EuroGP 2024 and SN Computer Science 2025). Across both works, we demonstrate that neural and evolutionary approaches are fundamentally complementary: LLMs excel at rapidly generating structurally plausible code, while Genetic Improvement refines it toward precise specifications through grammar-based evolutionary search. A consistent finding is the "capability amplifier" effect: smaller open-source models benefit disproportionately from GI, narrowing the gap with larger proprietary models. We also discuss key limitations, including oracle dependency, limited scalability to multi-file projects, bias propagation from LLM-generated grammars, and the stochastic nature of evolutionary algorithms.

Presented at Ital-IA 2025, the 5th National Conference on Artificial Intelligence, Rome, Italy.

Introduction

The intersection of Large Language Models and evolutionary computation represents one of the most promising frontiers in automated software engineering. Over the past two years, our research group has developed and refined a methodology for systematically improving code generated by LLMs using Genetic Improvement (GI) techniques. This paper, presented at Ital-IA 2025 (the 5th National Conference on Artificial Intelligence, organized by CINI), provides a comprehensive summary of this research program and its key findings.
...