Sometimes the Best Feature Engineering Is Throwing Features Away

TL;DR: Classifying software hotfixes — the panic-mode patches you ship to fix something that’s broken in production right now — is hard for ML: tiny dataset (88 entries, 17 categories), brutal class imbalance, and expensive LLM features. HotCat reframes feature engineering as a search problem: NSGA-II evolves binary masks over 18 features, optimizing accuracy, NMI, and runtime simultaneously. A two-stage data augmentation lifts generalization from 55% → 72%. The Pareto front gives a balanced config: 59% accuracy, 0.58 NMI, 129 seconds. Most surprising: some features actively hurt — pruning them is both faster and more accurate.

Hotfixes are not normal bugs

In any normal software project, bugs queue up. They get triaged, prioritized, scheduled into sprints. Some sit there for months. ...
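The mask-search idea can be sketched in a few lines. This is not HotCat's actual code: the three objectives are mocked with a toy surrogate (the real pipeline would train and evaluate a classifier per mask), and full NSGA-II is reduced to its core ingredient, a non-dominated (Pareto) filter over candidate binary masks.

```python
import random

random.seed(0)
N_FEATURES = 18  # HotCat searches over 18 candidate features

def evaluate(mask):
    """Mock objectives standing in for the real pipeline:
    accuracy and NMI to maximize, runtime (seconds) to minimize."""
    k = sum(mask)
    # toy surrogate: more features cost more time; quality peaks mid-size
    accuracy = 0.4 + 0.02 * k - 0.001 * k * k + random.uniform(-0.02, 0.02)
    nmi = accuracy - 0.05 + random.uniform(-0.02, 0.02)
    runtime = 20 + 8 * k
    return accuracy, nmi, runtime

def dominates(a, b):
    """a dominates b: no worse in every objective, strictly better in one
    (maximize accuracy and NMI, minimize runtime)."""
    no_worse = a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] > b[1] or a[2] < b[2]
    return no_worse and better

# random population of binary feature masks
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(40)]
scores = [evaluate(m) for m in pop]

# Pareto front: masks not dominated by any other mask in the population
front = [m for m, s in zip(pop, scores)
         if not any(dominates(t, s) for t in scores if t is not s)]
print(f"{len(front)} non-dominated masks out of {len(pop)}")
```

NSGA-II adds non-dominated sorting, crowding distance, and genetic operators (crossover and mutation of masks) on top of this dominance test, so the population converges toward a front of accuracy/NMI/runtime trade-offs like the balanced config quoted above.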

October 13, 2025 · 5 min · Giovanni Pinna