[Hiring] Machine Learning Researcher, Audio @Bland

Remote, USA | Full-time | Posted Recently

Job Description

This is a summary of the job description as we understand it. Click the 'Apply' button to find out more.

Role Description

    As a Machine Learning Researcher at Bland, you'll be working on foundational research and development across the core components of our voice stack: speech-to-text, large language models, neural audio codecs, and text-to-speech. Your work will define how our agents understand, reason, and speak in real time at enterprise scale.
  • Build and Scale Next-Generation TTS Systems
      • Design and train large-scale text-to-speech models capable of expressive, controllable, human-sounding output.
      • Develop neural audio codec-based TTS architectures for efficient, high-fidelity generation.
      • Improve prosody modeling, question inflection, emotional expression, and multi-speaker robustness.
      • Optimize for real-time, low-latency inference in production.
  • Advance Speech-to-Text Modeling
      • Build and fine-tune large-scale ASR systems robust to accents, noise, telephony artifacts, and code-switching.
      • Leverage self-supervised pretraining and large-scale weak supervision.
      • Improve transcription accuracy for real-world enterprise scenarios, including structured extraction and conversational nuance.
  • Pioneer Neural Audio Codecs
      • Research and implement neural audio codecs that achieve extreme compression with minimal perceptual loss.
      • Explore discrete and continuous latent representations for scalable speech modeling.
      • Design codec architectures that enable downstream generative modeling and controllable synthesis.
  • Develop Scalable Training Pipelines
      • Curate and process massive audio datasets across languages, speakers, and environments.
      • Design staged training curricula and data filtering strategies.
      • Scale training across distributed GPU clusters, with a focus on cost, throughput, and reliability.
  • Run Rigorous Experiments
      • Design ablation studies that isolate the impact of architectural changes.
      • Measure improvements using both objective metrics and perceptual evaluations.
      • Validate ideas quickly through focused experiments that confirm or eliminate hypotheses.
    Qualifications
  • Experience with self-supervised learning, multimodal modeling, or generative modeling.
  • Hands-on experience building or scaling TTS, STT, or neural audio codec systems.
  • Familiarity with large-scale speech datasets and real-world audio variability.
  • Experience training and serving large models on modern accelerators.
  • Track record of designing controlled experiments and meaningful ablations.
  • Comfortable in fast-moving startup environments.
    Requirements
  • Ability to derive new formulations and implement them efficiently.
  • Strong intuition for audio quality, prosody, and conversational dynamics.
  • Knowledge of inference optimization techniques, including quantization, kernel optimization, and memory efficiency.
  • Understanding of real-time constraints in telephony or streaming environments.
  • Ability to move quickly from hypothesis to validation.
  • Strong ownership mindset from research through deployment.
  • Excited by ambiguous, unsolved problems.
    Benefits
  • Healthcare, dental, vision, all the good stuff
  • Meaningful equity in a fast-growing company
  • Every tool you need to succeed
  • Beautiful office in Jackson Square, SF with rooftop views
  • Competitive salary: $160,000 to $250,000

Apply to this job
