
Collecting a Dataset of macOS Apps: Pains, Gains, Lessons Learned. Part 1

    Tech Note
  • Software Analysis

Introduction

The macOS share of the desktop operating system market keeps increasing steadily, approaching 1/5 according to statcounter.com data:

Desktop & Tablet Operating System Market Share Worldwide, Jan 2009 - Jan 2024. Image: gs.statcounter.com

At the same time, academic research on this part of the Apple ecosystem remains scarce, as noted by researchers in the macOS forensics [12] and malware analysis [13] areas. More specifically, prior work involving app dataset collection is limited to the security domain: malware detection and analysis techniques based on machine learning, and vulnerability analyses.

This contrasts sharply with security research for iOS [10] or Android [1], where large-scale app datasets are available and reused. To address this gap, we collect an open and diverse dataset of macOS application bundles from self-hosted distribution platforms (i.e., outside the Mac App Store (MAS), and excluding system binaries and default pre-installed apps).

Fantastic apps and where to find them

In malware research, the datasets include a mix of malware and benignware binaries, and the sources of benign apps vary greatly:

  • in [5], the authors collect 1000 macOS app binaries from OS X Mavericks and open-source software (OSS).
  • in [11], the authors collect 460 app binaries from the Mac App Store (MAS).
  • in [6], the authors work with 461 app binaries from MAS and the Homebrew package manager.
  • in [3], the authors use 853 binaries from the /usr/bin/ system directory, MAS, and Softonic.

The body of work on vulnerability research considers larger app datasets, and in contrast with malware studies, authors collect not just the binaries but the whole application bundles:

  • in [4], the authors collect 1612 apps from MAS, and
  • in [2], the authors collect 13038 apps from MAS and the MacUpdate platform.

To the best of our knowledge, only the dataset of [11] is used in subsequent work [7, 8, 9].

Factors that affect dataset scope (which both we and previous studies have faced):

  • regional accessibility, as some apps are available only in specific country markets
  • free vs. paid app versions, as some paid apps are not accessible without purchase

Collecting the data

In our work, we consider self-hosted distribution platforms explored in previous work or listed at alternativeto.net, and exclude Steam (game-specific), Setapp (only has paid apps and has <300 apps), MacApps (has <300 apps), Homebrew/MacPorts/Nix (only have CLI binaries), and AppAgg (only has links to MAS). Furthermore, we exclude Softonic, as it is similar to MacUpdate but has fewer apps. We select GitHub over SourceForge to get macOS bundles, as GitHub repositories provide compiled applications as ready-to-download assets.

We finally choose a curated OSS app list on GitHub, Homebrew Cask, and MacUpdate as app source platforms.

See our article Feature Extraction of AppleScript Actions for the experience of getting macOS apps from the Mac App Store.

We implement a Python script for automated app retrieval via API access (GitHub, Homebrew Cask) or website crawling (MacUpdate), and we store the full dataset in a Google Cloud Storage bucket. We collect 12095 free/freemium/trial app archives (ZIP or DMG), totaling over 800 GB of storage. In the case of GitHub, we also fetch TAR archives.
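As an illustration of the API-based retrieval step, here is a minimal sketch of how a download URL might be chosen from a Homebrew Cask metadata record and filtered by archive type. It assumes records shaped like the Homebrew Cask sample in the "Peek into the data" section. The helper names `archive_kind` and `pick_cask_url` and the suffix lists are hypothetical, and this is not the retrieval script itself.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Archive types named in the text: ZIP and DMG for all sources, TAR for GitHub.
# The exact suffix lists are an assumption made for this sketch.
BASE_SUFFIXES = (".zip", ".dmg")
TAR_SUFFIXES = (".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz")


def archive_kind(url: str, allow_tar: bool = False) -> Optional[str]:
    """Return 'zip', 'dmg' or 'tar' for a supported archive URL, else None."""
    name = PurePosixPath(urlparse(url).path).name.lower()
    for suffix in BASE_SUFFIXES:
        if name.endswith(suffix):
            return suffix.lstrip(".")
    if allow_tar and name.endswith(TAR_SUFFIXES):
        return "tar"
    return None


def pick_cask_url(cask: dict, variation: Optional[str] = None) -> Optional[str]:
    """Pick the download URL of a cask record, optionally for one variation.

    Falls back to the top-level "url" when the variation is absent and
    returns None when the URL does not point to a ZIP or DMG archive.
    """
    url = cask.get("url")
    if variation is not None:
        url = cask.get("variations", {}).get(variation, {}).get("url", url)
    return url if url and archive_kind(url) else None
```

For the 1Password sample record, `pick_cask_url(record, "arm64_sonoma")` would select the aarch64 ZIP, while a cask whose URL points to some other file type is skipped.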

App categories

In some cases (Table 1, row “Bundle N/A”), we fail to extract and analyze the app bundle because the APP files are broken or missing, or because the DMG archives are protected or require explicit license approval from the user.

| | GitHub | Homebrew Cask | MacUpdate |
|---|---|---|---|
| Size | 393 | 4270 | 7432 |
| Bundle N/A | 12 (2.44%) | 717 (19.1%) | 2616 (42.49%) |
| Top-3 Categories | dev-tools (25%), utilities (21.75%), productivity (14.43%) | utilities (14.73%), dev-tools (14.7%), productivity (10.12%) | utilities (11.99%), productivity (6.82%), dev-tools (4.97%) |
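The "Bundle N/A" check for ZIP archives can be sketched as follows. This is an illustration under assumptions, not the extraction code used for the dataset: an archive counts as usable here if it contains at least one `*.app/Contents/Info.plist` entry, and the function names are hypothetical. DMG images are not covered, since mounting them requires macOS tooling such as `hdiutil`.

```python
import zipfile
from pathlib import PurePosixPath


def app_bundles_in_zip(zip_path: str) -> list:
    """List .app bundles in a ZIP archive that contain Contents/Info.plist."""
    bundles = set()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            parts = PurePosixPath(name).parts
            # Match "<name>.app/Contents/Info.plist" at any depth, so helper
            # apps nested inside another bundle are listed as well.
            if (
                len(parts) >= 3
                and parts[-2:] == ("Contents", "Info.plist")
                and parts[-3].endswith(".app")
            ):
                bundles.add("/".join(parts[:-2]))
    return sorted(bundles)


def bundle_status(zip_path: str) -> str:
    """Classify an archive as 'ok' or 'bundle N/A' (broken or missing APP)."""
    try:
        return "ok" if app_bundles_in_zip(zip_path) else "bundle N/A"
    except zipfile.BadZipFile:
        # A corrupted or mislabeled archive is treated like a missing bundle.
        return "bundle N/A"
```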

We observe that apps in the dev-tools, utilities, and productivity categories together constitute roughly 1/4 (MacUpdate) to 3/5 (GitHub) of each dataset, a significant proportion not previously reported in the literature.

Bundle sizes

For bundles extracted successfully, we report their size in MB:

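| Dataset | 0–5 | 5–50 | 50–100 | 100–500 | 500+ | N/A |
|---|---|---|---|---|---|---|
| GitHub | 11.99% | 38.82% | 5.89% | 38.62% | 2.24% | 2.44% |
| Brew Cask | 9.08% | 25.15% | 8.82% | 31.67% | 6.18% | 19.1% |
| MacUpdate | 11.4% | 24.87% | 6.63% | 12.61% | 2.0% | 42.49% |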

At least 1/3 of the bundles in each dataset are smaller than 50 MB, suggesting a fair share of native low-resource apps with little framework use.
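The size bins above can be reproduced for an extracted bundle with a sketch like the one below. Several details are assumptions, since the text does not specify them: sizes are summed over regular files only, 1 MB is taken as 10**6 bytes, and each bin includes its lower edge and excludes its upper edge. The function names are hypothetical.

```python
import os

# Upper bin edges in MB, following the table header. Edge inclusivity and
# the MB definition (10**6 bytes) are assumptions of this sketch.
SIZE_BINS_MB = [(5, "0–5"), (50, "5–50"), (100, "50–100"), (500, "100–500")]


def bundle_size_bytes(bundle_path: str) -> int:
    """Sum the sizes of all regular files under a bundle directory."""
    total = 0
    for root, _dirs, files in os.walk(bundle_path):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):  # symlinks are not counted
                total += os.path.getsize(path)
    return total


def size_bin(size_bytes: int) -> str:
    """Map a size in bytes to one of the table's bin labels."""
    size_mb = size_bytes / 10**6
    for upper, label in SIZE_BINS_MB:
        if size_mb < upper:
            return label
    return "500+"
```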

Update frequency

For bundles extracted successfully, we report the time since their last update, based on the last update date from the Info.plist file.

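| Dataset | 30 days | 6 months | 1 year | 3+ years | N/A |
|---|---|---|---|---|---|
| GitHub | 59.55% | 10.98% | 5.89% | 21.14% | 2.44% |
| Brew Cask | 41.08% | 14.94% | 6.39% | 18.49% | 19.1% |
| MacUpdate | 25.79% | 5.7% | 3.35% | 22.67% | 42.49% |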

While at least 1/3 of the apps are maintained up-to-date across all datasets, almost 1/5 of the apps in two of our datasets might represent abandonware.
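A minimal sketch of reading a bundle's Info.plist with Python's standard `plistlib` module is given below. The text does not state which field or timestamp serves as the last update date, so this sketch assumes the modification time of the Info.plist file as a proxy. The function names and the returned dictionary layout are hypothetical.

```python
import os
import plistlib
from datetime import datetime, timezone


def read_bundle_info(app_path: str) -> dict:
    """Read identifier and version from Contents/Info.plist, plus its mtime."""
    plist_path = os.path.join(app_path, "Contents", "Info.plist")
    with open(plist_path, "rb") as fh:
        info = plistlib.load(fh)  # auto-detects XML and binary plists
    modified = datetime.fromtimestamp(os.path.getmtime(plist_path), tz=timezone.utc)
    return {
        "identifier": info.get("CFBundleIdentifier"),
        "version": info.get("CFBundleShortVersionString"),
        # Assumption: the file modification time stands in for the update date.
        "last_update": modified,
    }


def age_in_days(last_update: datetime, now: datetime) -> int:
    """Whole days elapsed between the last update and a reference time."""
    return (now - last_update).days
```

The resulting age can then be grouped into the table's columns once the exact bin boundaries are fixed.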

Peek into the data

GitHub

{
  "short_description": "App for macOS with a minimalistic UI which lets you quickly throttle down the CPU usage of any running process. ",
  "categories": [
    "system"
  ],
  "repo_url": "https://github.com/AppPolice/AppPolice",
  "title": "AppPolice",
  "icon_url": "",
  "screenshots": [
    "https://cloud.githubusercontent.com/assets/1557716/12860558/11908a78-cc66-11e5-9998-b4bec11dbfeb.png",
    "https://cloud.githubusercontent.com/assets/1557716/12860551/ffff72d8-cc65-11e5-9304-4f1341657b5a.png",
    "https://cloud.githubusercontent.com/assets/1557716/12860559/1193fe42-cc66-11e5-9d4f-8b8af842ea72.png",
    "https://cloud.githubusercontent.com/assets/1557716/12860549/fdffd054-cc65-11e5-8405-cc224ea4ab3b.png",
    "https://cloud.githubusercontent.com/assets/1557716/12860557/118f5fcc-cc66-11e5-8822-dc85cbe7bbb9.png"
  ],
  "official_site": "",
  "languages": [
    "objective_c"
  ]
}

Homebrew Cask

{
  "token": "1password",
  "full_token": "1password",
  "old_tokens": [],
  "tap": "homebrew/cask",
  "name": [
    "1Password"
  ],
  "desc": "Password manager that keeps all passwords secure behind one password",
  "homepage": "https://1password.com/",
  "url": "https://downloads.1password.com/mac/1Password-8.10.20-x86_64.zip",
  "url_specs": {},
  "appcast": null,
  "version": "8.10.20",
  "installed": null,
  "installed_time": null,
  "outdated": false,
  "sha256": "4fe7f5f50fe9cec0ba961f0e17fc3fa7344b05f01d964d60c10cfc7ee9e4c7c4",
  "artifacts": [
    {
      "app": [
        "1Password.app"
      ]
    },
    {
      "zap": [
        {
          "trash": [
            "~/Library/Application Scripts/2BUA8C4S2C.com.1password*",
            "~/Library/Application Scripts/2BUA8C4S2C.com.agilebits",
            "~/Library/Application Scripts/com.1password.1password-launcher",
            "~/Library/Application Scripts/com.1password.browser-support",
            "~/Library/Application Support/1Password",
            "~/Library/Application Support/Arc/User Data/NativeMessagingHosts/com.1password.1password.json",
            "~/Library/Application Support/com.apple.sharedfilelist/com.apple.LSSharedFileList.ApplicationRecentDocuments/com.1password.1password.sfl*",
            "~/Library/Application Support/CrashReporter/1Password*",
            "~/Library/Application Support/Google/Chrome Beta/NativeMessagingHosts/com.1password.1password.json",
            "~/Library/Application Support/Google/Chrome Canary/NativeMessagingHosts/com.1password.1password.json",
            "~/Library/Application Support/Google/Chrome Dev/NativeMessagingHosts/com.1password.1password.json",
            "~/Library/Application Support/Google/Chrome/NativeMessagingHosts/com.1password.1password.json",
            "~/Library/Application Support/Microsoft Edge Beta/NativeMessagingHosts/com.1password.1password.json",
            "~/Library/Application Support/Microsoft Edge Canary/NativeMessagingHosts/com.1password.1password.json",
            "~/Library/Application Support/Microsoft Edge Dev/NativeMessagingHosts/com.1password.1password.json",
            "~/Library/Application Support/Microsoft Edge/NativeMessagingHosts/com.1password.1password.json",
            "~/Library/Application Support/Mozilla/NativeMessagingHosts/com.1password.1password.json",
            "~/Library/Application Support/Vivaldi/NativeMessagingHosts/com.1password.1password.json",
            "~/Library/Containers/2BUA8C4S2C.com.1password.browser-helper",
            "~/Library/Containers/com.1password.1password*",
            "~/Library/Containers/com.1password.browser-support",
            "~/Library/Group Containers/2BUA8C4S2C.com.1password",
            "~/Library/Group Containers/2BUA8C4S2C.com.agilebits",
            "~/Library/Logs/1Password",
            "~/Library/Preferences/com.1password.1password.plist",
            "~/Library/Preferences/group.com.1password.plist",
            "~/Library/Saved Application State/com.1password.1password.savedState"
          ]
        }
      ]
    }
  ],
  "caveats": null,
  "depends_on": {
    "macos": {
      ">=": [
        "10.15"
      ]
    }
  },
  "conflicts_with": {
    "cask": [
      "homebrew/cask-versions/1password-beta",
      "homebrew/cask-versions/1password-nightly"
    ]
  },
  "container": null,
  "auto_updates": true,
  "tap_git_head": "c256ad3460b28b2b41e46045fb7871406c46b0c5",
  "languages": [],
  "ruby_source_path": "Casks/1/1password.rb",
  "ruby_source_checksum": {
    "sha256": "f761848b5482f70c4df09158ac756c31f13bce989aa0f490956d738878521a3a"
  },
  "variations": {
    "arm64_sonoma": {
      "url": "https://downloads.1password.com/mac/1Password-8.10.20-aarch64.zip",
      "sha256": "9266a5e707ea08a02583b73ec551ccdfdaabb9cafe29ac6be2934163029fb398"
    },
    "arm64_ventura": {
      "url": "https://downloads.1password.com/mac/1Password-8.10.20-aarch64.zip",
      "sha256": "9266a5e707ea08a02583b73ec551ccdfdaabb9cafe29ac6be2934163029fb398"
    },
    "arm64_monterey": {
      "url": "https://downloads.1password.com/mac/1Password-8.10.20-aarch64.zip",
      "sha256": "9266a5e707ea08a02583b73ec551ccdfdaabb9cafe29ac6be2934163029fb398"
    },
    "arm64_big_sur": {
      "url": "https://downloads.1password.com/mac/1Password-8.10.20-aarch64.zip",
      "sha256": "9266a5e707ea08a02583b73ec551ccdfdaabb9cafe29ac6be2934163029fb398"
    }
  }
}

MacUpdate

{
  "id": 27559,
  "group_id": 27559,
  "title": "123 Flash Chat",
  "title_slug": "123-flash-chat",
  "seo_title": null,
  "description": "<p><em><strong>Note:</strong> 123 Flash Chat is no longer under development, and it is no longer available for download.</em></p>\r\n<p><strong>123 Flash Chat</strong> can add a chat room to website in minutes. It is a live chat solution for networking with multiple skins & chat clients,such as iPhone Chat,Avatar Chat. Other features: New Post Notifier; enhanced HTML Chat supports iPad.</p>",
  "short_description": "Flash chat client.",
  "version": "10.0",
  "date": {
    "date": "2013-11-05T13:40:13+00:00",
    "timestamp": 1383658813
  },
  "requirements": {
    "minimum_os": "Mac OS X 10.1.5",
    "supports_macos": true,
    "supports_ios": false,
    "is_64bit": true,
    "is_32bit": true,
    "architectures": [
      "Intel 64",
      "Intel 32",
      "PPC 64",
      "PPC 32"
    ],
    "other_list": [],
    "other": null
  },
  "file_size": 127590,
  "release_notes": "Version 10.0: \r\n<br> \r\n<br><strong>Note:</strong> Please be aware that all the changes are on the HTML client. \r\n<ul> \r\n\t<li>Add hand-raising function</li> \r\n<ul> \r\n\t<li>Regular users can apply to queue for broadcasting video once it is enabled</li> \r\n\t<li>Admins can approve/disapprove users' request to broadcast video with hand-raising mode</li> \r\n</ul> \r\n\t<li>Push to talk mode is enabled in Audio/Video settings</li> \r\n\t<li>Media Player is enabled</li> \r\n<ul> \r\n\t<li>Mp3/mp4/ogg/webm,etc. are supported</li> \r\n\t<li>Youtube is supported</li> \r\n\t<li>Pause/Play function is realized</li> \r\n\t<li>Volume control is realized</li> \r\n\t<li>Add full screen toggle to the media player, and each video-window can be minimized or maximized.</li> \r\n\t<li>Repeating/previous/next is realized for the playlist</li> \r\n\t<li>Auto resizing is realized, media player will be minimized when height is dwindled to a set value</li> \r\n</ul> \r\n\t<li>YouTube message is supported, media player window will be displayed in the chat area which can be popped up or folded</li> \r\n\t<li>New Web version design</li> \r\n<ul> \r\n\t<li>UI is changed</li> \r\n\t<li>Current Skins are polished</li> \r\n\t<li>A new light red skin is added</li> \r\n</ul> \r\n\t<li>Access configuration is supported for moderated-chat mode. Admin's access to enable/disable moderated-chat mode is supported</li> \r\n\t<li>Access configuration is supported for silent mute function. Admin's access to enable/disable silent mute function is supported</li> \r\n\t<li>IP Range Ban function</li> \r\n<ul> \r\n\t<li>Admin can ban a specific IP range</li> \r\n\t<li>User will still stay in the current room even if his IP range is banned</li> \r\n\t<li>User cannot come back once he has exited the room by himself after his IP range was banned</li> \r\n</ul> \r\n\t<li>Clear screen for all users function. 
Admin can clear the screen for all users</li> \r\n\t<li>User list position can be changed in the admin panel, right or left</li> \r\n</ul>",
  "bundle_identifiers": [],
  "price": {
    "value": 19900,
    "currency": "USD"
  },
  "download_count": 5046,
  "download_url": {
    "type": "download_file",
    "url": "http://www.123flashchat.com/dl/v100/123flashchat_s.tar.gz"
  },
  "purchase_url": "",
  "mudesktop_url": "mudesktop://download/27559/123flashchat-s-tar-gz/10.0/123flashchat_s.tar.gz/1383658813/",
  "video": [],
  "logo": {
    "id": 131927,
    "source": "https://dl2.macupdate.com/images/icons256/27559.png",
    "s_webp": null,
    "s_png": null,
    "m_webp": null,
    "m_png": null,
    "l_webp": null,
    "l_png": null,
    "created": {
      "date": "2019-09-12 15:28:14.000000",
      "timezone_type": 1,
      "timezone": "+00:00"
    },
    "updated": {
      "date": "2019-09-12 15:28:14.000000",
      "timezone_type": 1,
      "timezone": "+00:00"
    }
  },
  "screenshots": [
    {
      "id": 1174505,
      "source": "https://screenshots.macupdate.com/JPG/27559/27559_1569584753_scr.jpg",
      "s_webp": null,
      "s_png": null,
      "m_webp": "https://static.macupdate.com/screenshots/221310/m/123-flash-chat-screenshot.webp?v=1571060627",
      "m_png": "https://static.macupdate.com/screenshots/221310/m/123-flash-chat-screenshot.png?v=1571060628",
      "l_webp": null,
      "l_png": null,
      "created": {
        "date": "2019-10-14 13:43:48.000000",
        "timezone_type": 1,
        "timezone": "+00:00"
      },
      "updated": {
        "date": "2019-10-14 13:43:48.000000",
        "timezone_type": 1,
        "timezone": "+00:00"
      }
    }
  ],
  "rating": 0.5,
  "review_count": 3,
  "rate_count": 1,
  "license": "Demo",
  "vendor": null,
  "computed_rank": 748,
  "category": {
    "id": 10,
    "parent_id": null,
    "parent_name": null,
    "slug": "internet-utilities",
    "name": "Internet Utilities",
    "description": "Browse our curated collection of messengers, email clients, file managers, and other apps to simplify your online experience.",
    "url": "https://www.macupdate.com/explore/categories/internet-utilities",
    "children": null
  },
  "subcategory": {
    "id": 160,
    "parent_id": 10,
    "parent_name": "Internet Utilities",
    "slug": "internet-utilities/messengers",
    "name": "Messengers",
    "description": "Send instant messages or make video calls with fast and secure chat apps.",
    "url": "https://www.macupdate.com/explore/categories/internet-utilities/messengers",
    "children": null
  },
  "unsupported": true,
  "redirect": null,
  "faq": null,
  "learn_more": null,
  "is_following": null,
  "is_mud_install_available": false,
  "versions": null,
  "last_scan": null,
  "monetization": null,
  "member_rating": null,
  "member_review": null,
  "review_report": {
    "rating": [],
    "label_positive": [],
    "label_negative": []
  },
  "developer": {
    "name": "Topcmm Computing Inc.",
    "url": "http://www.123flashchat.com/",
    "support": "",
    "email": "[email protected]"
  },
  "nofollow": [],
  "is_hidden": 0
}
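Since the three sources expose different metadata schemas, a common record shape helps when analyzing them together. The sketch below maps each sample above to the same four fields, using only keys visible in those samples. The target field names (`source`, `title`, `summary`, `category`, `url`) and the function names are hypothetical choices for illustration.

```python
def first_or_none(items):
    """Return the first element of a list, or None if it is empty or missing."""
    return items[0] if items else None


def normalize_github(record: dict) -> dict:
    return {
        "source": "github",
        "title": record.get("title"),
        "summary": (record.get("short_description") or "").strip(),
        "category": first_or_none(record.get("categories")),
        "url": record.get("repo_url"),
    }


def normalize_cask(record: dict) -> dict:
    return {
        "source": "homebrew-cask",
        "title": first_or_none(record.get("name")) or record.get("token"),
        "summary": (record.get("desc") or "").strip(),
        "category": None,  # the Cask sample carries no category field
        "url": record.get("url"),
    }


def normalize_macupdate(record: dict) -> dict:
    return {
        "source": "macupdate",
        "title": record.get("title"),
        "summary": (record.get("short_description") or "").strip(),
        "category": (record.get("category") or {}).get("slug"),
        "url": (record.get("download_url") or {}).get("url"),
    }
```

Note that the `url` field means different things per source: a repository page for GitHub and a direct download link for the other two.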

Conclusions

Our method of collecting macOS apps does not suffer from regional restrictions imposed by app stores and results in about 12k items. There are several ways to improve it further, starting with a broader selection of OSS macOS apps to reduce possible biases of the curated GitHub collection currently used. We would also like to complement our dataset with MAS data and, when studying the dataset overlap, verify whether we get results similar to those of [2].

  • [1] Allix, Kevin et al. "AndroZoo: collecting millions of Android apps for the research community." Proceedings of the 13th International Conference on Mining Software Repositories, 2016, pp. 468-471. ACM, https://doi.org/10.1145/2901739.2903508

  • [2] Blochberger, Maximilian et al. "State of the Sandbox: Investigating macOS Application Security." Proceedings of the 18th ACM Workshop on Privacy in the Electronic Society, 2019, pp. 150-161. ACM, https://doi.org/10.1145/3338498.3358654

  • [3] Bumanglag, Kimo. "An Application of Machine Learning to Analysis of Packed Mac Malware." Ph.D. Dissertation, Dakota State University, 2022, https://scholar.dsu.edu/cgi/viewcontent.cgi?article=1382&context=theses

  • [4] Xing, Luyi et al. "Cracking App Isolation on Apple: Unauthorized Cross-App Resource Access on MAC OS." Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015, pp. 31-43. ACM, https://doi.org/10.1145/2810103.2813609

  • [5] Walkup, Elizabeth. "Mac Malware Detection via Static File Structure Analysis." Stanford University, 2014, https://cs229.stanford.edu/proj2014/Elizabeth%20Walkup,%20MacMalware.pdf

  • [6] Burgardt, Caio Augusto Pereira. "Malware detection in macOS using supervised learning." M.Sc. Thesis, Universidade Federal de Pernambuco, 2022, https://repositorio.ufpe.br/handle/123456789/46235

  • [7] Chen, Alex Chenxingyu and Wulff, Kenneth. "Machine learning for OSX malware detection." Handbook of Big Data Analytics and Forensics, pp. 209-222, 2022. Springer

  • [8] Gharghasheh, Samira Eisaloo and Hadayeghparast, Shahrzad. "Mac OS X Malware Detection with Supervised Machine Learning Algorithms." Handbook of Big Data Analytics and Forensics, pp. 193-208, 2022. Springer

  • [9] Sahoo, Dilip and Dhawan, Yash. "Evaluation of supervised and unsupervised machine learning classifiers for Mac OS malware detection." Handbook of Big Data Analytics and Forensics, pp. 159-175, 2022. Springer

  • [10] Orikogbo, Damilola et al. "CRiOS: Toward Large-Scale iOS Application Analysis." Proceedings of the 6th Workshop on Security and Privacy in Smartphones and Mobile Devices, 2016, pp. 33-42. ACM, http://dl.acm.org/citation.cfm?id=2994473

  • [11] Pajouh, Hamed Haddad et al. "Intelligent OS X malware threat detection with code inspection." Journal of Computer Virology and Hacking Techniques, vol. 14, no. 3, 2018, pp. 213-223. https://doi.org/10.1007/s11416-017-0307-5

  • [12] Manna, Modhuparna, et al. "Modern macOS userland runtime analysis." Forensic Science International: Digital Investigation 38 (2021): 301221. https://www.sciencedirect.com/science/article/abs/pii/S2666281721001293

  • [13] Pham, DP., et al. "Mac-A-Mal: macOS malware analysis framework resistant to anti evasion techniques." J Comput Virol Hack Tech 15, 249–257 (2019). https://doi.org/10.1007/s11416-019-00335-w

Mar 27, 2024

macOS Applications Bundles and Metadata

24000+ macOS applications bundles collected from public sources. Contains archived applications bundles and public metadata
