<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>LLM on Giovanni Pinna</title>
    <link>https://giovannipinna.net/tags/llm/</link>
    <description>Recent content in LLM on Giovanni Pinna</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Wed, 02 Jul 2025 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://giovannipinna.net/tags/llm/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>The Text-to-SQL Field Has a Measurement Problem</title>
      <link>https://giovannipinna.net/posts/scireports2025-text-to-sql-metrics/</link>
      <pubDate>Wed, 02 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://giovannipinna.net/posts/scireports2025-text-to-sql-metrics/</guid>
      <description>Every text-to-SQL benchmark today scores queries as either perfect or wrong. That's a coin flip dressed up as a metric. We built one that actually tells you how close you got.</description>
    </item>
    <item>
      <title>Making the LLM-Plus-Evolution Pipeline Actually Smart</title>
      <link>https://giovannipinna.net/posts/sncs2025-exploring-gi-effect/</link>
      <pubDate>Tue, 01 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://giovannipinna.net/posts/sncs2025-exploring-gi-effect/</guid>
      <description>Last year we showed evolution can fix LLM code. This year we made the evolution itself smarter — better selection, partial credit, fewer cycles — and got improvements in 11 of 12 cases.</description>
    </item>
    <item>
      <title>Improving LLM-Generated Code via Genetic Improvement: A Summary of Recent Advances</title>
      <link>https://giovannipinna.net/posts/italia2025-gi-summary/</link>
      <pubDate>Mon, 23 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://giovannipinna.net/posts/italia2025-gi-summary/</guid>
      <description>A comprehensive summary of our research program on applying Genetic Improvement to LLM-generated code, presented at the Italian national AI conference Ital-IA 2025.</description>
    </item>
    <item>
      <title>GPT-4 Can Make Court Rulings Easier to Read. It Can Also Lie to You About Them, Confidently.</title>
      <link>https://giovannipinna.net/posts/wiat2024-courts-to-comprehension/</link>
      <pubDate>Tue, 10 Dec 2024 00:00:00 +0000</pubDate>
      <guid>https://giovannipinna.net/posts/wiat2024-courts-to-comprehension/</guid>
      <description>We asked 75 people to read summaries of Italian Constitutional Court rulings — written by experts, by GPT-4o, by a fine-tuned LLaMA, and the raw judgments themselves. The results say more about LLMs than about courts.</description>
    </item>
    <item>
      <title>What If We Stopped Asking ChatGPT to Fix Its Own Code?</title>
      <link>https://giovannipinna.net/posts/eurogp2024-gi-for-llm-code/</link>
      <pubDate>Wed, 03 Apr 2024 00:00:00 +0000</pubDate>
      <guid>https://giovannipinna.net/posts/eurogp2024-gi-for-llm-code/</guid>
      <description>Self-correction is the default fix for buggy LLM code, but it has a ceiling. We tried something stranger — evolving the code instead — and it worked across every model we tested.</description>
    </item>
  </channel>
</rss>
