GLiNER2 for Swift: Unified Schema-Based Information Extraction on Apple Silicon

    Tech Note
  • Artificial Intelligence

Introduction

We present GLiNER2Swift, a native Swift/MLX implementation of GLiNER2, a unified schema-based information extraction framework. GLiNER was originally introduced by Zaratiana et al. as a generalist and lightweight Named Entity Recognition (NER) model. GLiNER2 subsequently extended this paradigm toward a broader schema-driven information extraction framework.

Our work brings this unified, schema-based extraction approach to the Apple ecosystem with first-class Apple Silicon support. Designed as a CPU-first, production-ready library for macOS, GLiNER2Swift enables developers and researchers to deploy advanced information extraction pipelines directly in Swift applications without relying on Python runtimes or GPU acceleration.

Background

The GLiNER framework was introduced as a generalist, lightweight alternative to traditional NER systems, allowing entity extraction for arbitrary label sets without task-specific retraining. The original GLiNER paper:

Zaratiana, U., Tomeh, N., Holat, P., & Charnois, T. (2023). GLiNER: Generalist and Lightweight Model for Named Entity Recognition. arXiv preprint arXiv:2311.08526. https://arxiv.org/abs/2311.08526

proposes a model that performs NER using label descriptions rather than fixed classification heads. As stated in the paper:

"GLiNER is a generalist model for Named Entity Recognition that can generalize to arbitrary entity types defined at inference time without task-specific fine-tuning." — Zaratiana et al., 2023

GLiNER2 extends this idea into a broader schema-based information extraction framework, supporting not only NER but also classification and structured extraction under a unified architecture. The reference Python implementation is available at: https://github.com/fastino-ai/gliner2

Motivation

Despite rapid progress in NLP research, most modern frameworks remain Python-centric and GPU-dependent. This creates friction for macOS-native development, especially in contexts where:

  • On-device inference is required for privacy or latency
  • Deployment targets Apple Silicon hardware
  • Swift is the primary programming language
  • Tight integration with native macOS applications is needed

GLiNER2Swift addresses this gap by providing a numerically faithful port of GLiNER2 implemented fully in Swift and optimized for macOS 14+ on Apple Silicon (M1/M2/M3).

Unified Schema-Based Extraction in Swift

GLiNER2Swift preserves the schema-driven paradigm introduced by GLiNER. Instead of training separate models for NER, classification, and structured extraction, developers define extraction tasks dynamically via label schemas.

The library currently supports:

  • Named Entity Recognition (NER)
  • Text Classification
  • Structured Data Extraction

Relation extraction support is currently in progress.

Example usage:

let model = try await GLiNER2.fromPretrained("macpaw-research/gliner2-base-v1")

let entities = try model.extractEntities(
    from: "Tim Cook is CEO of Apple in Cupertino.",
    labels: ["person", "company", "location"]
)

This design enables rapid adaptation to new domains without retraining, aligning with the original GLiNER philosophy of inference-time label flexibility.
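The same schema-driven interface extends to the other supported tasks. The following is a hypothetical sketch: the `classify` and `extractStructured` method names, their parameters, and the schema shape are assumptions for illustration, not confirmed library API.

```swift
// Hypothetical sketch; `classify` and `extractStructured` are assumed
// method names based on the supported task list, not confirmed API.
let sentiment = try model.classify(
    "The keynote was fantastic.",
    labels: ["positive", "negative", "neutral"]
)

let record = try model.extractStructured(
    from: "iPhone 15 Pro, 256 GB, $1199",
    schema: [
        "product": ["name", "storage", "price"]
    ]
)
```

As with entity extraction, both tasks take their schema at inference time, so no retraining is needed to target a new domain.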

Design Principles

GLiNER2Swift follows several core principles:

  1. Native-first: pure Swift implementation without Python bridges
  2. On-device by default: privacy-preserving local inference
  3. Reproducibility: architectural fidelity to the reference implementation
  4. Developer ergonomics: Swift Package Manager integration
  5. Extensibility: foundation for future fine-tuning and adapter support
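Principle 4 implies a standard Swift Package Manager workflow. A manifest might look like the following sketch; the package URL and version are placeholders, since the repository location is not given here.

```swift
// swift-tools-version:5.9
// Package.swift (illustrative; the URL and version are placeholders).
import PackageDescription

let package = Package(
    name: "MyExtractor",
    platforms: [.macOS(.v14)],
    dependencies: [
        // Placeholder URL; substitute the actual GLiNER2Swift repository.
        .package(url: "https://github.com/macpaw-research/GLiNER2Swift.git", from: "1.0.0")
    ],
    targets: [
        .executableTarget(
            name: "MyExtractor",
            dependencies: ["GLiNER2Swift"]
        )
    ]
)
```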

Architecture and Implementation

GLiNER2Swift is a direct architectural port of the Python GLiNER2 implementation and aims to achieve numerical parity with it. The model architecture includes the following modules:

  • Encoder: DeBERTa v3 with disentangled attention
  • Span Marker: MLP-based span scoring
  • Count LSTM: entity count prediction
  • Downscaled Transformer: schema embedding
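To illustrate the span-marker idea (a toy sketch, not the library's actual internals), the snippet below enumerates every candidate span up to a maximum width and scores it from its start- and end-token representations; a simple dot product stands in for the MLP scorer.

```swift
// Minimal sketch of span enumeration and scoring.
// `tokenVectors` stands in for encoder outputs; the real model
// scores each span with an MLP over start/end representations.
struct Span {
    let start: Int
    let end: Int
    let score: Double
}

func scoreSpans(tokenVectors: [[Double]], maxWidth: Int) -> [Span] {
    var spans: [Span] = []
    for start in 0..<tokenVectors.count {
        for end in start..<min(start + maxWidth, tokenVectors.count) {
            // Toy scorer: dot product of the start and end vectors.
            let score = zip(tokenVectors[start], tokenVectors[end])
                .map(*)
                .reduce(0, +)
            spans.append(Span(start: start, end: end, score: score))
        }
    }
    return spans.sorted { $0.score > $1.score }
}

let vectors: [[Double]] = [[1, 0], [0.9, 0.1], [0, 1]]
let ranked = scoreSpans(tokenVectors: vectors, maxWidth: 2)
print(ranked.first!.score)
```

In the real architecture the ranked spans are then matched against the schema's label embeddings rather than returned directly.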

The implementation leverages MLX, Apple's array framework for machine learning on Apple Silicon, for tensor computation. The system is CPU-first and does not require GPU acceleration.

Currently supported models:

  • fastino/gliner2-base-v1 (205M parameters)
  • macpaw-research/gliner2-base-v1_mlx (FP16 weights instead of FP32 to reduce model size)

Planned features include training loops, relation extraction, and additional model variants.

Performance

We benchmarked the model on an Apple M3 Pro (macOS 14+, CPU-only inference) using macpaw-research/gliner2-base-v1_mlx.

Task                     Mean        Min         Max
Entity Extraction        334.8 ms    324.9 ms    339.6 ms
Classification           59.6 ms     58.9 ms     60.1 ms
Structured Extraction    288.3 ms    284.5 ms    291.0 ms
Combined (all 3 tasks)   351.9 ms    344.0 ms    358.3 ms
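Timings like those above can be gathered with Swift's `ContinuousClock`. Below is a minimal mean/min/max harness; the workload closure is a stand-in for a model call such as `extractEntities`.

```swift
// Minimal benchmarking harness using ContinuousClock (macOS 13+).
// Replace `workload` with e.g. a model.extractEntities(...) call.
func benchmark(iterations: Int, workload: () -> Void) -> (mean: Double, min: Double, max: Double) {
    let clock = ContinuousClock()
    var samplesMs: [Double] = []
    for _ in 0..<iterations {
        let elapsed = clock.measure(workload)
        // Convert Duration components to milliseconds.
        let (seconds, attoseconds) = elapsed.components
        samplesMs.append(Double(seconds) * 1000 + Double(attoseconds) / 1e15)
    }
    let mean = samplesMs.reduce(0, +) / Double(samplesMs.count)
    return (mean, samplesMs.min()!, samplesMs.max()!)
}

let result = benchmark(iterations: 5) {
    _ = (0..<100_000).reduce(0, +)
}
print(result.mean, result.min, result.max)
```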

Applications and Impact

GLiNER2Swift enables:

  • Intelligent document parsing
  • Flexible, on-device zero-shot NER
  • Structured extraction from various document types
  • Real-time zero-shot classification pipelines

By bringing GLiNER2 to Swift, we bridge modern NLP research and native Apple platform development. This work demonstrates that transformer-based schema-driven extraction can run efficiently on-device, without GPU dependency or server infrastructure.

GLiNER2Swift represents a step toward privacy-preserving, local-first AI tooling for macOS — bringing unified information extraction directly into Swift applications while staying faithful to the original GLiNER research vision.
