Software Engineer


Overview

CUDA Developer | High-Performance Computing | Applied AI
Location: UK-based, Remote
Type: Contract, Outside IR35, Remote
Sector: Advanced Computing / Applied AI

We're partnering with a company building next-generation GPU-accelerated software for scientific and AI applications. We are recruiting a CUDA Developer who is passionate about extracting every ounce of performance from modern hardware: someone who loves tuning kernels, benchmarking workloads, and finding elegant solutions to complex computational problems. This is an opportunity to work with a small, expert team where your technical decisions will shape the foundation of an emerging AI technology.

Why this role? You'll be part of a technically elite, low-ego team solving problems at the cutting edge of performance engineering. Your work will be deeply visible: the difference between "it works" and "it flies". If you love performance, parallelism, and precision, please apply with a current CV for more information.

Responsibilities

  • Design and optimise CUDA kernels for high-performance workloads.
  • Translate advanced algorithms into production-ready GPU-accelerated code.
  • Profile performance and reduce bottlenecks using Nsight, CUPTI, and custom tooling.
  • Collaborate with C++ engineers and ML researchers to develop scalable AI computation pipelines.
  • Contribute to architecture decisions regarding parallelisation, data transfer, and memory efficiency.

Requirements

  • Deep experience with CUDA C/C++ and modern C++ (C++17/20).
  • Strong understanding of GPU architecture, memory management, and parallelism.
  • Familiarity with OpenMP, MPI, or other HPC frameworks.
  • Exposure to AI/ML workloads or scientific computing is a plus.
  • Pragmatic and collaborative approach in fast-paced, high-impact environments.
Skills: C, C++
Location: United Kingdom
Type: On-site
Source: LinkedIn
Posted: 23/10/25