Blog on Giovanni Pinna

https://giovannipinna.net/posts/
Recent content in Blog on Giovanni Pinna
Tue, 14 Apr 2026 00:00:00 +0000

There Is No "Best" AI Coding Agent — And That's the Whole Point
https://giovannipinna.net/posts/msr2026-comparing-ai-agents/
Tue, 14 Apr 2026 00:00:00 +0000
We looked at 7,156 pull requests from five AI coding agents on real open-source projects. The agent matters less than you'd think. The kind of work matters far more.

When AI Agents Lie About Their Own Code (Without Meaning To)
https://giovannipinna.net/posts/msr2026-message-code-inconsistency/
Tue, 14 Apr 2026 00:00:00 +0000
Only 1.7% of AI-authored pull requests have descriptions that don't match their code. Those PRs get accepted 51.7% less often and take 3.5× longer to merge. Trust is the bottleneck nobody is measuring.

Sometimes the Best Feature Engineering Is Throwing Features Away
https://giovannipinna.net/posts/ssbse2025-hotcat/
Mon, 13 Oct 2025 00:00:00 +0000
Classifying urgent software hotfixes is hard: tiny dataset, brutal class imbalance, expensive LLM features. We let evolution pick which features to keep, and discovered some were actively making things worse.

Sometimes Your AI Agent Burns More Energy Optimizing Code Than the Code Will Ever Save
https://giovannipinna.net/posts/ssbse2025-ga4gc/
Mon, 13 Oct 2025 00:00:00 +0000
AI coding agents that 'optimize' your code can cost more energy than they save, for hundreds of thousands of runs. We tuned the agent itself, and got 37.7% faster runs and better code at the same time.

The Text-to-SQL Field Has a Measurement Problem
https://giovannipinna.net/posts/scireports2025-text-to-sql-metrics/
Wed, 02 Jul 2025 00:00:00 +0000
Every text-to-SQL benchmark today scores queries as either perfect or wrong. That's a coin flip dressed up as a metric. We built one that actually tells you how close you got.

Making the LLM-Plus-Evolution Pipeline Actually Smart
https://giovannipinna.net/posts/sncs2025-exploring-gi-effect/
Tue, 01 Jul 2025 00:00:00 +0000
Last year we showed evolution can fix LLM code. This year we made the evolution itself smarter (better selection, partial credit, fewer cycles) and got improvements in 11 of 12 cases.

Improving LLM-Generated Code via Genetic Improvement: A Summary of Recent Advances
https://giovannipinna.net/posts/italia2025-gi-summary/
Mon, 23 Jun 2025 00:00:00 +0000
A comprehensive summary of our research program on applying Genetic Improvement to LLM-generated code, presented at the Italian national AI conference Ital-IA 2025.

GPT-4 Can Make Court Rulings Easier to Read. It Can Also Lie to You About Them, Confidently.
https://giovannipinna.net/posts/wiat2024-courts-to-comprehension/
Tue, 10 Dec 2024 00:00:00 +0000
We asked 75 people to read summaries of Italian Constitutional Court rulings, written by experts, by GPT-4o, by a fine-tuned LLaMA, and the raw judgments themselves. The results say more about LLMs than about courts.

What If We Stopped Asking ChatGPT to Fix Its Own Code?
https://giovannipinna.net/posts/eurogp2024-gi-for-llm-code/
Wed, 03 Apr 2024 00:00:00 +0000
Self-correction is the default fix for buggy LLM code, but it has a ceiling. We tried something stranger, evolving the code instead, and it worked across every model we tested.

Influence: Where Marketing Meets Artificial Intelligence
https://giovannipinna.net/posts/influence-project/
Wed, 13 Jan 2021 00:00:00 +0000
A project exploring the intersection of marketing and artificial intelligence, using data analysis to generate targeted social media content.

Book Review: Thinking, Fast and Slow by Daniel Kahneman
https://giovannipinna.net/posts/pensieri-lenti-e-veloci/
Sun, 10 Jan 2021 10:00:00 +0100
A review of Daniel Kahneman's Nobel Prize-winning work on how we make decisions and the cognitive biases that influence our thinking.

Book Review: Don't Make Me Think by Steve Krug
https://giovannipinna.net/posts/dont-make-me-think/
Sun, 10 Jan 2021 09:00:00 +0100
A review of Steve Krug's classic guide to web usability and human-computer interaction.