Imagine if you could settle/rekindle domestic arguments by asking your smart speaker when the room last got cleaned or whether the bins have already been taken out. Or, for a healthier use case, what if you could ask your speaker to keep count of reps as you do squats and bench presses? Or switch it into full-on ‘personal trainer’ mode, barking orders to pedal faster as you spin (who needs a Peloton!).
And what if the speaker was smart enough to know you’re eating dinner and took care of slipping on some mood music? Imagine if all those activity-tracking smarts were on tap without any connected cameras in your home.
Another bit of fascinating research from Carnegie Mellon University’s Future Interfaces Group opens up these possibilities, demonstrating a novel approach to activity tracking that does not rely on cameras as the sensing tool.
Installing connected cameras inside your home is, of course, a horrible privacy risk. This is why the CMU researchers set about investigating the potential of using millimeter wave (mmWave) Doppler radar as a medium for detecting different types of human activity.
The challenge they had to overcome is that while mmWave offers a “signal richness approaching that of microphones and cameras”, as they put it, data sets to train AI models to recognize different human activities as RF noise are not readily available (as visual data for training other types of AI models is).
Not to be deterred, they set about synthesizing Doppler data to feed a human activity-tracking model — devising a software pipeline for training privacy-preserving activity-tracking AI models.
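To make the idea of synthesizing Doppler data from video more concrete, here is a minimal sketch of the underlying physics, not the researchers' actual pipeline. It assumes that 3D positions of a few body points have already been estimated from video frames (that step is not shown), and it turns their motion relative to a hypothetical sensor position into a crude per-frame histogram of radial velocities. The function names, the bin layout and the 60 GHz carrier frequency are all illustrative assumptions.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def radial_velocities(track, sensor, dt):
    """Radial speed (m/s) of one tracked point; positive means approaching the sensor."""
    dists = [math.dist(p, sensor) for p in track]
    # Finite difference of range over time; a shrinking range is an approach.
    return [-(d1 - d0) / dt for d0, d1 in zip(dists, dists[1:])]


def doppler_profile(tracks, sensor, dt, v_max=2.0, bins=8):
    """Histogram the radial velocities of all points for each frame step.

    This is a toy stand-in for one Doppler frame: it counts points per
    velocity bin and ignores signal strength, occlusion and noise.
    """
    per_point = [radial_velocities(t, sensor, dt) for t in tracks]
    width = 2 * v_max / bins
    frames = []
    for f in range(len(per_point[0])):
        hist = [0] * bins
        for vels in per_point:
            # Clamp into [-v_max, v_max) so every value lands in a bin.
            v = max(-v_max, min(v_max - 1e-9, vels[f]))
            hist[int((v + v_max) / width)] += 1
        frames.append(hist)
    return frames


def doppler_shift_hz(v, carrier_hz=60e9):
    """Two-way Doppler shift for radial speed v; the 60 GHz carrier is an assumed value."""
    return 2 * v * carrier_hz / SPEED_OF_LIGHT


# Hypothetical input: a hand moving toward the sensor at 1 m/s, and a static torso.
hand = [(4.0, 0.0, 0.0), (3.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
torso = [(0.0, 2.0, 0.0)] * 3
frames = doppler_profile([hand, torso], sensor=(0.0, 0.0, 0.0), dt=0.5)
```

Each entry in `frames` is one velocity histogram. A sequence of such frames is the kind of input an activity classifier could be trained on, although a realistic synthesis would also need to model reflection strength and sensor noise.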
The results can be seen in a demonstration where the model correctly identifies several different activities, including cycling, clapping, waving, and squats, purely from its ability to interpret the mmWave signal the movements generate, and having been trained purely on public video data.
“We show how this cross-domain translation can be successful through a series of experimental results,” they write. “Overall, our approach is an important stepping stone towards significantly reducing the burden of training such human sensing systems and could help bootstrap uses in human-computer interaction.”
Researcher Chris Harrison confirms that mmWave Doppler radar-based sensing doesn’t work for “very subtle stuff” (like spotting different facial expressions). But he says it is sensitive enough to detect less vigorous activity, like eating or reading a book.
A need for line-of-sight between the subject and the sensing hardware also limits the motion detection ability of Doppler radar. (Aka: “It can’t reach around corners yet.” Which, for those concerned about detection, will indeed sound slightly reassuring.)
Detection does require special sensing hardware, of course. But things are already moving on that front: Google has been dipping its toe in, adding a radar sensor to the Pixel 4, for example. Its Nest Hub also integrates the same radar sensors to track sleep quality.
“One of the reasons we haven’t seen more adoption of radar sensors in phones is a lack of compelling use cases (sort of a chicken and egg problem),” Harrison tells TechCrunch. “Our research into radar-based activity detection helps to open more applications (e.g., smarter Siris, who know when you are eating, making dinner, cleaning, or working out, etc.).”
Asked whether he sees more significant potential in mobile or fixed applications, Harrison reckons there are interesting use cases for both. “I see use cases in both mobile and nonmobile,” he says. “Returning to the Nest Hub… the sensor is already in the room, so why not use that to bootstrap more advanced functionality in a Google smart speaker (like rep counting your exercises).
“There are a bunch of radar sensors already used in buildings to detect occupancy (but now they can detect the last time the room was cleaned, for example).”
“Overall, the cost of these sensors is going to drop to a few dollars very soon (some on eBay are already around $1), so you can include them in everything,” he adds. “And as that goes in your bedroom, the threat of a ‘surveillance society’ is much less worrisome than with camera sensors.”
Startups like VergeSense are already using sensor hardware and computer vision technology to power real-time analytics of indoor space and activity for the b2b market (such as measuring office occupancy). But even with local processing of low-resolution image data, there could still be a perception of privacy risk around using vision sensors, certainly in consumer environments.
Radar offers an alternative to visual surveillance that could be a better fit for privacy-risking consumer-connected devices such as ‘smart mirrors’. “If it is processed locally, would you put a camera in your bedroom? Bathroom? Maybe I’m prudish, but I wouldn’t personally,” says Harrison.
He also points to earlier work that underlines the value of incorporating more types of sensing hardware: “The more sensors, the longer tail of interesting applications you can support. Cameras can’t capture everything, nor do they work in the dark.”
“Cameras are pretty cheap these days, so hard to compete there, even if radar is a bit cheaper. I do believe the strongest advantage is privacy preservation,” he adds.
Of course, having any sensing hardware, visual or otherwise, raises potential privacy issues.
For example, a sensor that tells you when a child’s bedroom is occupied may be good or bad depending on who has access to the information. (I mean, do you want your smart speaker to know when you’re having sex?) And all sorts of human activity can generate sensitive information, depending on what’s going on.
So while radar-based tracking may be less invasive than other types of sensors, it doesn’t mean there are no potential privacy concerns. It depends on where and how the sensing hardware is being used. Albeit, it is reasonable to argue that the data radar generates is likely less sensitive than comparable visual data were it to be exposed via a breach.
“Any sensor should naturally raise the question of privacy — it is a spectrum rather than a yes/no question,” agrees Harrison. “Radar sensors are usually rich in detail but highly anonymizing, unlike cameras. If your Doppler radar data leaked online, it’d be hard to be embarrassed about it. No one would recognize you. If cameras from inside your house leaked online, well….”
Given the lack of immediately available Doppler signal data, what about the computing costs of synthesizing the training data? “It isn’t turnkey, but there are many large video datasets to draw from (including YouTube-8M),” he says. “It is orders of magnitude faster to download video data and create synthetic radar data than having to recruit people to come into your lab to capture motion data.
“One is inherently 1 hour spent for 1 hour of data. At the same time, you can download hundreds of hours of footage pretty easily from many excellently curated video databases. Every hour of video takes us about 2 hours to process, but that is just on one desktop machine we have here in the lab. The key is that you can parallelize this, using Amazon AWS or equivalent, and process 100 videos at once, so the throughput can be extremely high.”
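The arithmetic behind that throughput claim is simple enough to sketch. The snippet below uses the two figures Harrison quotes (about 2 hours of processing per hour of video, and 100 videos processed at once) and assumes, as an idealization, that the work splits evenly across machines with no download, scheduling or startup overhead. The function name and the 100-hour example workload are hypothetical.

```python
def wall_clock_hours(video_hours, hours_per_video_hour=2.0, workers=1):
    """Idealized wall-clock time to process a batch of video.

    Assumes the work divides evenly across identical workers and that
    there is no overhead for downloading, scheduling or startup.
    """
    return video_hours * hours_per_video_hour / workers


# Hypothetical workload: 100 hours of footage.
single_machine = wall_clock_hours(100)             # one desktop machine
parallel = wall_clock_hours(100, workers=100)      # 100 videos at once
throughput = 100 / parallel                        # video-hours per wall-clock hour

print(single_machine, parallel, throughput)
```

Under these assumptions, a workload that would occupy a single desktop for 200 hours finishes in 2 hours on 100 workers, which works out to 50 hours of video per wall-clock hour. Real cloud runs would come in somewhat below this because of the overheads ignored here.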
And while RF signals do reflect, and do so to different degrees off different surfaces (aka “multi-path interference”), Harrison says the reflection from the user “is by far the dominant signal”. That means they did not need to model other reflections to get their demo model working. (Though he notes that could be done to further hone capabilities “by extracting big surfaces like walls/ceiling/floor/furniture with computer vision and adding that into the synthesis stage”.)
“The [Doppler] signal is actually very high level and abstract, and so it’s not particularly hard to process in real-time (much less ‘pixels’ than a camera),” he adds. “Embedded car processors use radar data for things like collision braking and blind-spot monitoring, and those are low-end CPUs (no [...] or anything).”
The research is being presented at the ACM CHI conference alongside another project, Pose-on-the-Go, which uses smartphone sensors to approximate the user’s full-body pose without needing wearable sensors.
CMU researchers from the Group have also previously demonstrated a method for indoor ‘smart home’ sensing on the cheap (also without the need for cameras), as well as showing how smartphone cameras could be used to give an on-device AI assistant more contextual savvy.
In recent years they’ve also investigated using laser vibrometry and electromagnetic noise to enable better environmental awareness and contextual functionality. Other exciting research out of the Group includes using conductive spray paint to turn anything into a touchscreen.
And various methods to extend the interactive potential of wearables — such as using lasers to project virtual buttons onto the arm of a device user or incorporating another wearable (a ring) into the mix. The future of human-computer interaction looks sure to be much more contextually savvy — even if current-gen ‘smart’ devices can still stumble on the basics and seem more than a little dumb.