Ubicoustics: Plug-and-Play Acoustic Activity Recognition
Despite sound being a rich source of information, computing devices with microphones do not leverage audio to glean useful insights about their physical and social context. For example, a smart speaker sitting on a kitchen countertop cannot figure out if it is in a kitchen, let alone know what a user is doing in a kitchen – a missed opportunity. In this work, we describe a novel, real-time, sound-based activity recognition system. We start by taking an existing, state-of-the-art sound labeling model, which we then tune to classes of interest by drawing data from professional sound effect libraries traditionally used in the entertainment industry. These well-labeled and high-quality sounds are the perfect atomic unit for data augmentation, including amplitude, reverb, and mixing, allowing us to exponentially grow our tuning data in realistic ways. We quantify the performance of our approach across a range of environments and device categories and show that microphone-equipped computing devices already have the requisite capability to unlock real-time activity recognition comparable to human accuracy.