Software Engineer


Overview

CUDA Developer | High-Performance Computing | Applied AI
Location: UK-based, Remote
Type: Contract, Outside IR35, Remote
Sector: Advanced Computing / Applied AI

We're partnering with a company building next-generation GPU-accelerated software for scientific and AI applications. We are recruiting a CUDA Developer who is passionate about extracting every ounce of performance from modern hardware: someone who loves tuning kernels, benchmarking workloads, and finding elegant solutions to complex computational problems. This is an opportunity to work with a small, expert team where your technical decisions will shape the foundation of an emerging AI technology.

Why this role? You'll be part of a technically elite, low-ego team solving problems at the cutting edge of performance engineering. Your work will be deeply visible: the difference between "it works" and "it flies". If you love performance, parallelism, and precision, please apply with a current CV for more information.

Responsibilities

  • Design and optimise CUDA kernels for high-performance workloads.
  • Translate advanced algorithms into production-ready GPU-accelerated code.
  • Profile performance and reduce bottlenecks using Nsight, CUPTI, and custom tooling.
  • Collaborate with C++ engineers and ML researchers to develop scalable AI computation pipelines.
  • Contribute to architecture decisions regarding parallelisation, data transfer, and memory efficiency.

Requirements

  • Deep experience with CUDA C/C++ and modern C++ (C++17/20).
  • Strong understanding of GPU architecture, memory management, and parallelism.
  • Familiarity with OpenMP, MPI, or other HPC frameworks.
  • Exposure to AI/ML workloads or scientific computing is a plus.
  • Pragmatic and collaborative approach in fast-paced, high-impact environments.
Skills: C, C++
Location: United Kingdom
Type: On-site
Source: LinkedIn
Posted: 23/10/25