
Parsing macOS Application UI: Techniques and Tools for Effective Analysis and Automation

    Tech Note
  • Software Analysis

Introduction

Parsing macOS application UI using code tools is a technique that allows developers to extract information from the graphical user interface (GUI) of different apps, such as the layout and interface elements (buttons, labels, text fields, images). This operation can be helpful for various purposes:

  1. Testing:
  • Workflow automation to find regressions
  • Input/output data validation
  2. Debugging:
  • Inspect UI layout issues in real time, including off-screen rendering and component frames
  • Detect localization errors
  3. Automation:
  • Validate UI consistency and scale across multiple platforms
  • Batch processing of multiple elements
  4. Analysis:
  • Data extraction for external tools
  • UI analysis for UX experts and researchers

In this research, we explore how to parse macOS application UI using code tools and compare the advantages and disadvantages of different methods. We also show examples of using various tools to parse macOS application UI and extract useful information. By the end of this article, you will better understand how to perform parsing and have an example of applying this technique to generic scenarios.

Possible ways to perform parsing

If the source code is available, working with an app's window structure is fairly simple, and there are techniques for parsing XIB or SwiftUI files. Exploring a distributed GUI app, where only the binary is available, requires more sophisticated approaches.

Appium framework

Appium is a well-known open-source test automation tool that supports UI automation on all modern platforms (Web, macOS, Windows, iOS, and Android). It also accepts test code written in different languages (JavaScript, Java, Python, etc.). The tool provides a cross-platform API for interacting with platform-specific drivers and performing tests. For macOS and iOS testing, Appium uses the XCUITest framework from Apple, which can fetch the UI layout and perform the actions that the components make available.

Appium can be installed from the Terminal app. The fastest way is to use the Node package manager (npm) and perform the following installation steps:

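  1. Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  2. Install Node.js
brew install node
  3. Install Appium
npm i -g appium
  4. Install the Mac2 driver (Appium 2 and later do not bundle drivers)
appium driver install mac2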

Configure and use Appium

Appium provides a macOS driver that can launch an application by bundle ID, listen on a specific IP address and port, and handle timeouts. Before any tests are launched, the driver must be configured, and the Appium server (CLI or GUI version) must be running to handle requests. The following code retrieves the entire application structure:

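import pytest
from appium import webdriver
from appium.options.mac import Mac2Options

# Bundle ID of the application under test (TextEdit, as in the output below)
application_bundle_id = "com.apple.TextEdit"

@pytest.fixture()
def driver():
    options = Mac2Options()
    options.bundle_id = application_bundle_id
    # Connect to a locally running Appium server
    drv = webdriver.Remote("http://127.0.0.1:4723", options=options)
    yield drv
    drv.quit()

def test_edit_text(driver):
    # page_source returns the UI hierarchy as an XML string
    source = driver.page_source
    print(source)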

As a result, print() outputs the parsed UI components in XML format:

<?xml version="1.0" encoding="UTF-8"?>
<XCUIElementTypeApplication elementType="2" identifier="" label="" title="TextEdit">
  <XCUIElementTypeWindow elementType="4" identifier="_NS:34" label="" title="Untitled">
    <XCUIElementTypeScrollView elementType="46" identifier="_NS:8" label="" >
    </XCUIElementTypeScrollView>
    <XCUIElementTypeStaticText elementType="48" identifier="" value="—" label=""/>
    <XCUIElementTypeButton elementType="9" identifier="_XCUI:CloseWindow" label="" />
    <XCUIElementTypeButton elementType="9" identifier="_XCUI:FullScreenWindow" label="">
    </XCUIElementTypeButton>
    <XCUIElementTypeButton elementType="9" identifier="_XCUI:MinimizeWindow" label="" title="" />
    <XCUIElementTypeMenuButton elementType="16" identifier="" label="document actions" title="Edited" />
    <XCUIElementTypeImage elementType="43" identifier="" label="" title="Untitled.txt" />
    <XCUIElementTypeStaticText elementType="48" identifier="" value="Untitled" label="" />
  </XCUIElementTypeWindow>
  <XCUIElementTypeMenuBar elementType="55" identifier="_NS:762" label="" title="">
  </XCUIElementTypeMenuBar>
  <XCUIElementTypeTouchBar elementType="81" identifier="" label="" title="">
  </XCUIElementTypeTouchBar>
</XCUIElementTypeApplication>
TextEdit Window UI

The TextEdit app has a window titled "Untitled" that contains a content scroll view and close, full-screen, and minimize buttons; the application also exposes a menu bar and Touch Bar items. To perform the analysis, we can recursively visit all child elements of the window and fetch essential data such as type, title, position, and size. These attributes allow us to rebuild the window structure in code. Appium can also provide a screenshot of the parsed components in base64 or binary format. A sample script that runs Appium tests is included in the attached archive.
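The recursive traversal described above can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration, not the script from the attached archive: the names SAMPLE_SOURCE, element_to_dict, and parse_page_source are hypothetical, and the sample string is a shortened copy of the page source shown earlier. In a real test, the string returned by driver.page_source would be passed instead.

```python
import json
import xml.etree.ElementTree as ET

# Shortened sample in the shape of the Appium page source shown above.
SAMPLE_SOURCE = """<?xml version="1.0" encoding="UTF-8"?>
<XCUIElementTypeApplication elementType="2" identifier="" label="" title="TextEdit">
  <XCUIElementTypeWindow elementType="4" identifier="_NS:34" label="" title="Untitled">
    <XCUIElementTypeScrollView elementType="46" identifier="_NS:8" label=""/>
    <XCUIElementTypeButton elementType="9" identifier="_XCUI:CloseWindow" label=""/>
  </XCUIElementTypeWindow>
  <XCUIElementTypeMenuBar elementType="55" identifier="_NS:762" label="" title=""/>
</XCUIElementTypeApplication>"""


def element_to_dict(node):
    """Recursively convert one XML element into a plain dictionary."""
    return {
        # Drop the common tag prefix to get a short type name.
        "type": node.tag.replace("XCUIElementType", "", 1),
        # Keep only attributes that carry a non-empty value.
        "attributes": {k: v for k, v in node.attrib.items() if v},
        "children": [element_to_dict(child) for child in node],
    }


def parse_page_source(source):
    """Parse a page-source string into a nested dictionary."""
    # Encoding to bytes lets the parser honor the XML encoding declaration.
    root = ET.fromstring(source.encode("utf-8"))
    return element_to_dict(root)


tree = parse_page_source(SAMPLE_SOURCE)
print(json.dumps(tree, indent=2))
```

The nested dictionary keeps the same parent/child relationships as the XML, so it can be filtered by type or written to a JSON file for later processing.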

XCTest framework

Appium builds on XCTest, a framework provided by Apple that can create and run unit, performance, and UI tests from Xcode or the terminal. The easiest way to run tests is to install Xcode from the Mac App Store or the Apple Developer Portal.

Installation and usage

The framework ships with Xcode, so no additional installation steps are required. It is used by creating a test target in the Xcode project and adding test cases for each task. Below is an example of getting a complete window structure in text format. Further element processing can be done by simply iterating through the child elements.

import XCTest

final class ApplicationParserUITests: XCTestCase {
    func testAppStructure() throws {
        let app = XCUIApplication(bundleIdentifier: "com.apple.TextEdit")
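        app.launch()

        // Dump the debug description of every window of the app
        let windows = app.windows
        for i in 0..<windows.count {
            let window = windows.element(boundBy: i)
            // writeToFile is a helper that is not shown here
            writeToFile(window.debugDescription)
        }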
    }
}

The result of the call helps identify each element's type, frame, and place in the hierarchy. The structure matches the one obtained with the Appium framework.

 →Window (Main), 0x10c015750, {{185.0, 90.0}, {586.0, 476.0}}, identifier: '_NS:34', title: 'Untitled.txt'
    ScrollView, 0x10c014880, {{185.0, 118.0}, {586.0, 448.0}}, identifier: '_NS:8', Disabled
    Button, 0x10c015b80, {{192.0, 96.0}, {14.0, 16.0}}, identifier: '_XCUI:CloseWindow'
    Button, 0x10c015cc0, {{232.0, 96.0}, {14.0, 16.0}}, identifier: '_XCUI:FullScreenWindow'
    Button, 0x10c0160a0, {{212.0, 96.0}, {14.0, 16.0}}, identifier: '_XCUI:MinimizeWindow'
    MenuButton, 0x10c0161e0, {{516.0, 95.0}, {51.0, 16.0}}, title: 'Edited', label: 'document actions'
    Image, 0x10c016320, {{402.0, 95.0}, {16.0, 16.0}}, title: 'Untitled.txt'
    StaticText, 0x10c0167a0, {{420.0, 95.0}, {79.0, 16.0}}, value: Untitled.txt
Path to element:
 →Application, 0x10c013d40, pid: 3247, title: 'TextEdit', Disabled
  ↳Window (Main), 0x10c015750, {{185.0, 90.0}, {586.0, 476.0}}, identifier: '_NS:34', title: 'Untitled.txt'
Query chain:
 →Find: Application 'com.apple.TextEdit'
  Output: {
    Application, 0x10c014ed0, pid: 3247, title: 'TextEdit', Disabled
  }
  ↪︎Find: Descendants matching type Window
    Output: {
      Window (Main), 0x10c015010, {{185.0, 90.0}, {586.0, 476.0}}, identifier: '_NS:34', title: 'Untitled.txt'
    }
    ↪︎Find: Element at index 0
      Output: {
        Window (Main), 0x10c015010, {{185.0, 90.0}, {586.0, 476.0}}, identifier: '_NS:34', title: 'Untitled.txt'
      }

Writing JSON to a local file significantly simplifies the output and parsing of the test results.
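As an illustration of that conversion, the text dump can be turned into JSON in a post-processing step. The sketch below is an assumption-laden example rather than part of the original project: the regular expressions are derived only from the sample output above (the debugDescription layout is not a documented, stable format), the names LINE_RE, ATTR_RE, and parse_debug_description are hypothetical, and only quoted attributes such as identifier, title, and label are captured.

```python
import json
import re

# Patterns derived from the sample dump above; not a documented format.
LINE_RE = re.compile(
    r"^(?P<prefix>[\s→]*)(?P<type>\w+)(?: \(\w+\))?, 0x[0-9a-fA-F]+, "
    r"\{\{(?P<x>[-\d.]+), (?P<y>[-\d.]+)\}, "
    r"\{(?P<w>[-\d.]+), (?P<h>[-\d.]+)\}\}(?P<rest>.*)$"
)
ATTR_RE = re.compile(r"(\w+): '([^']*)'")

SAMPLE_DUMP = """\
 →Window (Main), 0x10c015750, {{185.0, 90.0}, {586.0, 476.0}}, identifier: '_NS:34', title: 'Untitled.txt'
    ScrollView, 0x10c014880, {{185.0, 118.0}, {586.0, 448.0}}, identifier: '_NS:8', Disabled
    Button, 0x10c015b80, {{192.0, 96.0}, {14.0, 16.0}}, identifier: '_XCUI:CloseWindow'
Path to element:
 →Application, 0x10c013d40, pid: 3247, title: 'TextEdit', Disabled
"""


def parse_debug_description(text):
    """Build a nested element list from the indented part of the dump."""
    roots = []
    stack = []  # (indent width, node) pairs for the current branch
    for line in text.splitlines():
        if line.startswith("Path to element:"):
            break  # the element tree ends here
        match = LINE_RE.match(line)
        if not match:
            continue
        node = {
            "type": match.group("type"),
            "frame": {
                "x": float(match.group("x")),
                "y": float(match.group("y")),
                "width": float(match.group("w")),
                "height": float(match.group("h")),
            },
            "attributes": dict(ATTR_RE.findall(match.group("rest"))),
            "children": [],
        }
        indent = len(match.group("prefix"))
        # Close branches that are at the same or a deeper indentation.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            stack[-1][1]["children"].append(node)
        else:
            roots.append(node)
        stack.append((indent, node))
    return roots


print(json.dumps(parse_debug_description(SAMPLE_DUMP), indent=2, ensure_ascii=False))
```

Relying on indentation keeps the parser independent of the exact number of spaces per level, which is not confirmed by the source.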

Accessibility API

Accessibility in macOS is a set of assistive technologies that help users interact with the OS and applications through spoken commands, onscreen keyboards, assistive devices, and other alternative methods of controlling the pointer and performing actions. The Accessibility API has access to the UI element structure, the elements' actions, and their visual representation.

Under the hood, Appium and XCTest use the Accessibility API to perform predefined scenarios and generate reports on state and actions. Using the API directly is the most flexible and powerful way to inspect the UI structure.

Installation and usage

Installation is not required because this technology is integrated into macOS and can be used out of the box. For a basic inspection of the UI structure, Xcode bundles a tool called Accessibility Inspector. It can capture an application window or some of its components, inspect their structure and properties, and invoke actions where they are supported.

Important: Code or scripts that use the Accessibility API work only after the application that runs them has been granted permission:

System Settings -> Privacy & Security -> Accessibility -> "An App"

The Accessibility API can be used from Xcode or from Python through the ApplicationServices package (part of PyObjC), which bridges to the system calls and provides the same functionality as Xcode. The API is simple and straightforward.

A general algorithm for extracting UI element structure is displayed below:

import ApplicationServices
import AppKit

# Helper functions (search_for_running_app, windows_for_application,
# store_elements_in_json, write_json_file) are not shown here.
def parse_window():
    # get the shared workspace
    workspace = AppKit.NSWorkspace.sharedWorkspace()

    # search for launched app
    app_bundle_id = "com.apple.TextEdit"
    running_app = search_for_running_app(workspace, app_bundle_id)

    # search for all the windows
    windows = windows_for_application(running_app)

    # get window children
    for window in windows:
        err, children = ApplicationServices.AXUIElementCopyAttributeValue(
            window, ApplicationServices.kAXChildrenAttribute, None
        )

        # serialize UI structure in JSON
        json_string = store_elements_in_json(children)

        # write JSON to file
        write_json_file(json_string)

The complete example of parsing the application UI and exporting it to a file is included in the attached prototype.zip archive.
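The serialization step of the algorithm can be sketched as a recursive walk over the AXChildren attribute. This is a hedged illustration and not the code from prototype.zip: ax_element_to_dict and fake_copy_attribute are hypothetical names, and the attribute lookup is passed in as a callable so that the sketch also runs without macOS. On a real system, the callable could wrap the PyObjC call, for example lambda el, name: ApplicationServices.AXUIElementCopyAttributeValue(el, name, None).

```python
import json


def ax_element_to_dict(element, copy_attribute, max_depth=20):
    """Serialize one accessibility element and its descendants.

    copy_attribute(element, name) must return (error_code, value),
    where error code 0 means success (kAXErrorSuccess).
    """
    node = {}
    # "AXRole" and "AXTitle" are the string values of
    # kAXRoleAttribute and kAXTitleAttribute.
    for key, attribute in (("role", "AXRole"), ("title", "AXTitle")):
        err, value = copy_attribute(element, attribute)
        if err == 0 and value is not None:
            node[key] = str(value)
    node["children"] = []
    if max_depth > 0:
        # "AXChildren" is the string value of kAXChildrenAttribute.
        err, children = copy_attribute(element, "AXChildren")
        if err == 0 and children:
            node["children"] = [
                ax_element_to_dict(child, copy_attribute, max_depth - 1)
                for child in children
            ]
    return node


def fake_copy_attribute(element, name):
    """Stand-in for the real API: elements are plain dictionaries here."""
    if name in element:
        return 0, element[name]
    return -25205, None  # kAXErrorAttributeUnsupported


# Hypothetical window used only to demonstrate the traversal.
sample_window = {
    "AXRole": "AXWindow",
    "AXTitle": "Untitled",
    "AXChildren": [
        {"AXRole": "AXButton"},
        {"AXRole": "AXScrollArea", "AXChildren": [{"AXRole": "AXTextArea"}]},
    ],
}

print(json.dumps(ax_element_to_dict(sample_window, fake_copy_attribute), indent=2))
```

The max_depth limit is a defensive assumption that guards against unexpectedly deep or cyclic hierarchies.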

Tricky case

On macOS, applications are built with AppKit, SwiftUI, Electron, Flutter, and other frameworks. These implementations can affect the structure of user-interface elements, and sometimes the Accessibility API does not return a correct hierarchy when children are fetched through the kAXChildrenAttribute attribute. This behavior can be reproduced with all of the tools described above: Appium, XCTest, and the Accessibility subsystem. Let us check the ClearVPN app UI structure:

UI Structure

In the image, NSHostingView adds an extra layer that is not reflected in the structure fetched for the window. The other child elements are placed on this hidden layer, and none of the mentioned tools can track it out of the box. If fetching children from the root element does not help, we can instead find any child element and then walk back up to the root. The hit-test technique was used to resolve the issue: the Accessibility API can return the element located under a given point, and from that element we can return to the root element and find the window container.

One of the top elements can be retrieved with the following Accessibility API calls:

  component = ApplicationServices.AXUIElementCreateSystemWide()
  err, value = ApplicationServices.AXUIElementCopyElementAtPosition(component, 0, 50, None)

Here, we create a system-wide accessibility object, which provides access to system attributes, and use it to get the single top element at the given coordinates. After retrieving the element, we must find its root window or AXHostingView container. This container holds all the children required for parsing the structure.
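The walk from the hit-tested element back to its container can be sketched as a loop over the AXParent attribute. As before, this is a minimal sketch under stated assumptions: find_container and fake_copy_attribute are hypothetical names, the lookup callable stands in for AXUIElementCopyAttributeValue so the example runs without macOS, and the default container role is AXWindow (the AXHostingView name from the text can be passed through container_roles if the target app reports it as a role).

```python
def find_container(element, copy_attribute,
                   container_roles=("AXWindow",), max_steps=50):
    """Follow AXParent links from an element up to its container.

    copy_attribute(element, name) must return (error_code, value),
    where error code 0 means success (kAXErrorSuccess).
    Returns the container element, or None if none is reached.
    """
    current = element
    for _ in range(max_steps):
        err, role = copy_attribute(current, "AXRole")
        if err == 0 and role in container_roles:
            return current
        # "AXParent" is the string value of kAXParentAttribute.
        err, parent = copy_attribute(current, "AXParent")
        if err != 0 or parent is None:
            return None  # reached the top without finding a container
        current = parent
    return None


def fake_copy_attribute(element, name):
    """Stand-in for the real API: elements are plain dictionaries here."""
    if name in element:
        return 0, element[name]
    return -25205, None  # kAXErrorAttributeUnsupported


# Hypothetical chain: a button inside a group inside a window.
window = {"AXRole": "AXWindow", "AXTitle": "Main"}
group = {"AXRole": "AXGroup", "AXParent": window}
button = {"AXRole": "AXButton", "AXParent": group}

container = find_container(button, fake_copy_attribute)
print(container["AXTitle"])
```

Once the container is found, its children can be traversed in the usual way, starting from a node that the regular window query did not expose.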

To complete a parsed structure, we can combine regular and hit-test parsing into one flow and then export all the data to a suitable format like JSON or XML.

Conclusion

Application window parsing may look like a routine task, but practice shows that it is far from trivial because of the variety of technologies used to implement apps: there are multiple frameworks, and the UI may be built with native or web technologies. There are also multiple tools for analyzing UI structure, and the most powerful ones are provided by Apple, bundled with Xcode and the operating system. The output of the solution can be used to understand which actions an app offers, based on its UI representation and the interactions its components support. These results can be consumed by third-party applications, neural networks, or automation tools to simplify flows for users and to perform complex automation.
