
GUIrilla: A Scalable Framework for Automated Desktop UI Exploration

  • Machine Learning

Autonomous agents capable of operating complex graphical user interfaces (GUIs) have the potential to transform desktop automation. While recent advances in large language models (LLMs) have significantly improved UI understanding, navigating full-window, multi-application desktop environments remains a major challenge. Progress is limited by reliance on costly manual data curation or on synthetic pipelines that typically provide shallow coverage across narrow application domains. The macOS ecosystem is particularly underrepresented in existing UI datasets. To address both gaps, the lack of autonomous data collection and the underrepresentation of macOS, we introduce GUIrilla, an automated, scalable framework that systematically explores applications via the native accessibility API. Our approach organizes discovered interface elements and crawler actions into hierarchical GUI graphs and employs specialized interaction handlers to broaden coverage. From these graphs, we synthesize and release GUIrilla-Task (27,171 tasks across 1,108 applications), paired with full-desktop and window-level screenshots and detailed accessibility metadata. This focus on deep application coverage allows our synthetic approach to outperform other synthetic methods on the ScreenSpot Pro benchmark while using 97% less data. We release macapptree, an open-source library for reproducible collection of structured accessibility metadata, together with the GUIrilla-Task dataset, the manually verified GUIrilla-Gold benchmark, and the framework code for use by the community.

@misc{garkot2025guirillascalableframeworkautomated,
      title={GUIrilla: A Scalable Framework for Automated Desktop UI Exploration}, 
      author={Sofiya Garkot and Maksym Shamrai and Ivan Synytsia and Mariya Hirna},
      year={2025},
      eprint={2510.16051},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.16051}, 
}
