Meta
Research Scientist – Acoustic and Multi-Modal Scene Understanding
Job Description
Research Scientist – Acoustic and Multi-Modal Scene Understanding Responsibilities:
- Design innovative solutions for challenging multi-modal egocentric recognition problems with resource constraints
- Communicate research results internally and externally in the form of technical reports and scientific publications
- Consistently work under your own initiative to implement state-of-the-art models and techniques in PyTorch, TensorFlow, or other frameworks, seeking feedback and input where appropriate
- Identify, motivate, and execute on reasonable medium-to-large hypotheses (each with many tasks) for model improvements through data analysis and domain knowledge, and communicate learnings effectively
- Design, perform, and analyze online and offline experiments with specific, well-thought-out hypotheses in mind
- Generate reliable, correct training data with great attention to detail.
- Identify and debug common issues in training machine learning models, such as overfitting/underfitting, leakage, and offline/online inconsistency
- Factor common systems considerations and modeling issues into modeling choices
- Design acoustic or audio-visual models with a small computational footprint, suitable for mobile devices and wearables such as smart glasses
Minimum Qualifications:
- Currently holds a PhD or a postdoctoral position in deep learning, machine learning, computer vision, computer science, computer engineering, statistics, or a related field
- 4+ years of experience developing and implementing signal processing and deep learning algorithms for acoustic and multi-modal detection, recognition, and/or tracking problems
- 4+ years of experience with scientific programming languages such as Python, C++, or similar
- 3+ years of experience with research-oriented software engineering, including fluency with machine learning libraries (e.g., PyTorch, TensorFlow, scikit-learn, Pandas) and scientific computing libraries (e.g., the SciPy ecosystem)
- Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.
- Demonstrated experience of implementing and evaluating end-to-end prototypical learning systems
- Ability to independently resolve most online and offline issues that affect hypothesis testing
- Understanding of the model architecture used and its consequences for the different hypotheses tested; in general, a good applied understanding of computer vision, even if not fully up to date with the state of the art
- Experience communicating effectively with a broad range of stakeholders and collaborators at different levels
Preferred Qualifications:
- Experience with audio-visual learning, computer vision, source localization and tracking, audio and visual SLAM systems, egocentric multimodal learning, etc.
- Experience building low-complexity models for acoustic and multi-modal problems targeting low-power mobile devices and wearables
- Experience integrating development models onto real-time mobile platforms with different levels of compute (on-sensor computation, system-on-chip, low-power island, etc.)
- Experience with acoustic localization or visual multi-object tracking problems
- Proven track record of achieving significant results and innovation, as demonstrated by first-authored publications and patents
About Meta:
Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today—beyond the constraints of screens, the limits of distance, and even the rules of physics.
Individual compensation is determined by skills, qualifications, experience, and location. Compensation details listed in this posting reflect the base hourly rate, monthly rate, or annual salary only, and do not include bonus, equity or sales incentives, if applicable. In addition to base compensation, Meta offers benefits. Learn more about benefits at Meta.