At Embedded World 2026, Gateworks and NXP unveiled a new AI acceleration card, promising modular performance gains through a decoupled architecture for industrial edge computing.
Gateworks, in collaboration with NXP Semiconductors, has unveiled a new M.2 AI acceleration card, the GW16168, at Embedded World 2026 in Nuremberg, Germany.
Designed, tested and assembled in the US, the card introduces a “decoupled AI architecture” approach that aims to simplify the deployment of high-performance AI workloads without requiring full system redesigns.
The GW16168 integrates NXP’s Ara240 discrete neural processing unit (DNPU) and is passively cooled to meet industrial-grade standards.
With 16GB of LPDDR4 memory, the card lets inference tasks be offloaded from the host central processing unit (CPU), reducing host utilisation and avoiding the out-of-memory errors common when running large models such as vision transformers or language models.
Traditional AI hardware often forces developers to choose between repurposed GPUs with high thermal demands or embedded CPUs and NPUs limited by latency and heat.
The companies noted that earlier modular accelerators offered flexibility but lacked sufficient compute capacity. Gateworks’ new card revives modularity while delivering up to 40 eTOPS of performance, described as ‘GPU-class’ capability within a 6.6W power envelope.
Thermal management is a key feature, with the design enabling reliable operation in sealed, fanless environments. Gateworks reports a decade-long lifespan for the modules, which it attributes to the card’s thermal design.
The GW16168 is backed by NXP’s Ara240 SDK ecosystem, which provides compiler toolchains, support for TensorFlow, PyTorch and ONNX, and integrated model conversion utilities. This framework simplifies the transition of existing AI models to edge deployment.
Ravi Annavajjhala, Vice President and General Manager of Neural Processing Units at NXP, said the card demonstrates why decoupled AI architectures are the future of edge computing, offering scalability without full hardware redesign.
“This brings flexibility, longevity and cost efficiency to real-world AI deployments,” he said.
The GW16168 and its development kit will be available through distributors including DigiKey, Braemac, RoundSolutions and Avnet, with shipping scheduled to begin in late May.