Role overview

JOB TITLE
Senior Applied Machine Learning Engineer (audio / music generation)

ABOUT THE ROLE
We’re building an AI-powered music system focused on commercial-ready audio generation. Our initial priority is getting the music generation quality right: structure, musicality, consistency, and production readiness.

We are looking for a Senior Applied ML Engineer to own the end-to-end audio generation pipeline for our MVP. This role is hands-on and pragmatic: you’ll fine-tune open-source music models, integrate inference pipelines, and work closely with audio and backend engineers to deliver usable results quickly and efficiently. The role starts as a contract engagement (details below), with a path to a full-time position for the right fit.

ROLE DETAIL

Terms: Fixed-term (5 months) | Potential full-time conversion
Compensation: $30,000 (full 5-month term)
Location: On-site (Monrovia, CA) with the possibility of remote work

WHAT YOU’LL WORK ON

Fine-tuning open-source music generation models.
Implementing conditioning controls (beats per minute, key, mood, section, density).
Training and deploying parameter-efficient fine-tunes (LoRA / adapters).
Building reference-conditioned generation.
Supporting long-form generation via chunking and continuation.
Integrating with backend inference pipelines and APIs.
Collaborating with audio DSP engineers to ensure outputs are production-ready.

REQUIRED QUALIFICATIONS

Strong experience with Python and PyTorch.
Hands-on experience with audio or speech generation models.
Familiarity with diffusion or autoregressive generative models.
Experience using or fine-tuning open-source ML models, including familiarity with Hugging Face (HF) interfaces.
Understanding of audio representations.
Experience deploying ML models to production or API environments.

NICE-TO-HAVE SKILLS

Familiarity with CLAP / audio embeddings or retrieval-assisted generation.
Experience working with LoRA / PEFT methods.
Basic understanding of audio production workflows (tempo, key, stems, loudness).
Experience optimizing inference cost and latency.

ROLE GOALS & OBJECTIVES

Reliably generate musically coherent, commercial-friendly cues (30 to 120 seconds).
The model responds correctly to conditioning inputs such as tempo, key, and mood.
Outputs are stable, repeatable, and usable downstream by post-production tools.
The system is modular and ready to be integrated with downstream models.

Seniority level

Mid-Senior level

Employment type

Contract

Job function

Engineering and Information Technology

Industries

IT System Custom Software Development

Tags & focus areas

Used for matching and alerts on DevFound

Machine Learning AI

Machine Learning Engineer
