Making the LLM-Plus-Evolution Pipeline Actually Smart

TL;DR: Our EuroGP 2024 work showed that Genetic Improvement (GI) can rescue LLM-generated code. This follow-up makes the GI part itself smarter, with three upgrades: lexicase selection to keep specialists alive, 10% down-sampling to cut compute, and a refined fitness function (F_E) that gives partial credit instead of pass/fail. Across four LLMs (GPT-4, ChatGPT, Code Llama 7B, LLaMA 3 8B) and three PSB2 problems, we improved 11 of 12 model-problem combinations. Smaller models gain the most: GI is, increasingly, a capability amplifier for cheap models.

What we left on the table last time

The EuroGP 2024 paper proved the basic idea: take an LLM's buggy first draft, hand it to Grammatical Evolution, and get back better code, with statistically significant gains on every model. ...
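To make the two selection upgrades concrete, here is a minimal sketch of down-sampled lexicase selection. This is an illustrative reconstruction, not the paper's implementation: the function name, error-matrix layout, and tie-breaking by random choice are assumptions; only the 10% down-sampling rate and the lexicase idea come from the text.

```python
import random

def lexicase_select(population, errors, downsample_rate=0.1, rng=random):
    """Pick one parent via down-sampled lexicase selection (illustrative sketch).

    population      -- list of candidate programs.
    errors          -- errors[i][j] = error of population[i] on test case j
                       (lower is better; 0 means the case passed).
    downsample_rate -- fraction of test cases sampled per selection event
                       (10% in the post).
    """
    n_cases = len(errors[0])

    # Down-sampling: evaluate on a random subset of the test cases,
    # cutting fitness-evaluation cost roughly in proportion.
    k = max(1, int(n_cases * downsample_rate))
    cases = rng.sample(range(n_cases), k)
    rng.shuffle(cases)

    # Lexicase: filter the pool case by case, keeping only candidates
    # with the best (lowest) error on each case in turn. Specialists
    # that excel on a few cases survive even if their average is poor.
    pool = list(range(len(population)))
    for c in cases:
        best = min(errors[i][c] for i in pool)
        pool = [i for i in pool if errors[i][c] == best]
        if len(pool) == 1:
            break
    return population[rng.choice(pool)]
```

Because cases are considered one at a time rather than averaged, a candidate that solves a rare edge case can win a selection event outright, which is why lexicase "keeps specialists alive".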

July 1, 2025 · 5 min · Giovanni Pinna