Masters Program

Intern / Student, Full-time · London

Your mission
At Papercup we’re on a mission to make the world’s videos watchable in any language. We’ve invented a patented AI system that generates humanlike synthetic voices across languages, allowing people to watch video content in the language of their choice. Our translated and dubbed content has allowed the likes of Insider, Discovery, Sky News, and Canva to reach over 300 million people globally in just the last year. Having just completed a $20 million Series A round, we're on the hunt for top people to join our ambitious mission.
We’re backed by some of the industry’s heaviest hitters: venture funds like Octopus Ventures, world-renowned angel investors including Des Traynor (co-founder of Intercom) and John Collison (co-founder of Stripe), and global media groups like Sky and Guardian Media Group.

We are driven, curious and passionate. Our company culture is central to who we are, and we set a high bar for those who join the team. We're also fun to be around (at least that's what people tell us).
Your profile
About the role:

At Papercup, you will be part of a great team pushing the boundaries of neural text-to-speech and speech-to-speech translation systems. Our team works closely with leading speech processing academics Mark Gales and Simon King, who serve as advisors, and regularly publishes at top speech conferences. You will apply modern machine learning techniques to model the way people speak (prosody): where they place intonation, how they convey emotion, and so on. The exact direction of the project will depend on the student's interests, but we see two main areas of focus:

  • Applying self-supervised learning and foundation models to prosody modelling
    • Our aim is to leverage self-supervised learning and foundation models to aid our prosody modelling
    • We have a very large, human-enhanced synthetic training set that we can use to train a very large prosody model
  • Audio production using machine learning
    • For a synthetic voice to sound realistic, it must sound like it is in the correct acoustic environment, similar to rendering the correct lighting on an object in image synthesis
    • Here we want to apply machine learning to solve this audio production task automatically
  • And much more. Please get in touch for more details.
Related papers

Must haves:
  • This internship is for a Master's student in Machine Learning
  • Experience developing machine learning models using PyTorch or TensorFlow
  • Theoretical understanding of deep learning
  • Desire to lead your own research
Nice to haves:
  • Experience with generative modelling
  • Experience working with ASR and/or TTS systems
  • Good knowledge of audio and signal processing fundamentals
  • Familiarity with AWS, GCP, Kubernetes, Azure

About us
Papercup is a machine learning startup making the world's video content watchable in any language. Having recently raised a $20 million Series A round, we are backed by leading venture capital funds and two media companies. Using cutting-edge AI, we automatically translate and voice over videos in multiple languages, helping enterprises, brands, and media companies reach a global audience with their existing content.

We are looking forward to hearing from you!
Thank you for your interest in Papercup. Please fill out the following short form. If you have difficulty uploading your data, please send an email to

Please upload your CV and any cover letter you wish to add (max. 20 MB in total).
