Integration
Edge AI has come a long way, but sensory perception is not enough. What will it take to build systems that can act?
I sometimes have to pinch myself—but among the many things I’m paid to do, one of them is think about the future. Sure, it’s only a small sliver of the future, a tiny crack through which we try to catch a glimpse, but it’s my small sliver, and I think about it every day.
This week I’ve been working on some research planning. As part of it, I’ve surveyed a set of use cases across a few areas our company cares about: industrial, manufacturing, transportation, et cetera. It’s a wide enough net to catch a lot of interesting stuff: diverse needs and applications for on-device intelligence.
Compared to five years ago, we’re in a pretty great place for on-device sensory perception. There’s some very capable hardware that can run fairly large models for tasks like computer vision, and we have software that makes them a lot easier to train.
It’s now pretty straightforward to build a program that transforms raw camera pixels or audio samples into a stream of deep learning inference results: probabilities that the input represents one state or another. These can be wrapped in application logic and deployed in the field to solve problems. It’s useful, simple, and fun.
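To make that concrete, here's a minimal sketch of that loop in Python. The `capture_frame` and `classify` functions are placeholders of my own invention, standing in for whatever camera driver and on-device model a real deployment would use.

```python
# A minimal sketch of the perception loop: raw input -> probability -> hard-coded rule.
# Both functions below are stubs; a real system would call a camera driver and a
# quantized on-device model instead.
import numpy as np

THRESHOLD = 0.8  # a hand-picked confidence threshold

def capture_frame() -> np.ndarray:
    """Stand-in for a camera driver: returns a fake 96x96 grayscale frame."""
    return np.random.rand(96, 96).astype(np.float32)

def classify(frame: np.ndarray) -> float:
    """Stand-in for an on-device model: returns a 'person present' probability.
    This stub just reads one pixel so the demo occasionally fires."""
    return float(frame[0, 0])

def main(n_frames: int = 20) -> None:
    for _ in range(n_frames):
        score = classify(capture_frame())
        if score > THRESHOLD:  # the application logic wrapped around the model
            print(f"person detected (confidence {score:.2f})")

if __name__ == "__main__":
    main()
```

A real version would swap in a camera and a quantized model, but the shape is the same: numbers in, a probability out, a hard-coded rule at the end.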
But in reviewing my survey of use cases, something stands out that has been haunting me for a while. While transforming input into probabilities is useful, many of the most interesting applications require something more.
I’m fond of the quick-and-dirty definition of intelligence that we included in AI at the Edge:
Intelligence means knowing the right thing to do at the right time.
If you’re a slime mold, this means routing yourself towards food while expending minimal resources. If you’re a cat, it means deploying the appropriate yowl to encourage your owner to rise for breakfast. And if you’re a research manager, it means deciding when a bit of writing may be just the thing to help condense your thoughts.
Our on-device sensory perception—transforming input into probabilities, applying a few filters, and doing something with the result—is pretty cool. But at its heart it’s signal processing: we turn one stream of numbers into another. Signal processing is arguably a form of intelligence, but it’s not the one described in our definition.
Most edge AI today is signal processing. The intelligence part is neglected, shoved into application logic, tended by hackers, denied the empiricism and mathematical rigor that we insist on for our signals. We agonize over model metrics, burning dollars to squeeze another half percent of accuracy from our image classifier—and then feed the results into systems with no oversight: spaghetti application code where the impact of changes on outcomes is not measured or recorded.
The simplest case is picking thresholds for a classifier. What’s the right confidence threshold for deciding we’ve heard the keyword, spotted the intruder, or determined that you’re jogging? And once wrapped in an application, how do we know it’s working well?
I’ve seen this done so strangely: production hardware, sold by the millions, tested qualitatively by a single engineer. Tapping a few dozen tries into a spreadsheet. Hacker stuff. Luckily we’re now building tools for this, so the performance of an application can be proven with numbers, not one guy’s opinion that it’s working “pretty well”.
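As a hedged sketch of what “proven with numbers” can look like, here is about the simplest version I can think of: sweep candidate thresholds over a labeled validation set and record precision and recall at each one. The scores and labels below are synthetic stand-ins; in practice they would come from field data run through the deployed model.

```python
# Sweep candidate confidence thresholds over a labeled validation set and
# report precision/recall, instead of eyeballing a few tries in a spreadsheet.
# The labels and scores here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)                                   # 1 = keyword present
scores = np.clip(labels * 0.6 + rng.normal(0.3, 0.2, size=1000), 0, 1)   # fake model confidences

for threshold in np.arange(0.5, 1.0, 0.05):
    predicted = scores >= threshold
    tp = np.sum(predicted & (labels == 1))   # true positives
    fp = np.sum(predicted & (labels == 0))   # false positives
    fn = np.sum(~predicted & (labels == 1))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

Nothing sophisticated, but the chosen threshold now comes with evidence attached, and the experiment can be rerun whenever the model or the environment changes.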
But what if your system is more complex? True intelligence—the right thing at the right moment—requires integrating inputs over time, and maybe space. A robot vacuum crawls the floor. It has a plan, and multiple sensors, and various ways it should respond to new conditions as it tries to clean the room. A self-driving car is the same. A drone that monitors crops while buffeted by wind. A production line that takes action to stay well maintained. A medical device that keeps a patient alive.
For these systems, the application goes far beyond thresholding. They integrate data from many different sources, build internal models of the world, act upon it, and feed back the results. This stuff is hard, and it has to be tuned well, and it needs to be tested extensively. But if we struggle so hard with testing and tuning even the simple case of thresholding, how can we possibly succeed with anything more?
Sure, there are big companies developing products that work. But the promise of technology is not monolithic artifacts handed down by megacorps—it’s the idea that anybody with a bit of knowledge can build their own tools to solve their own problems. We need truly intelligent systems—those that integrate data over time—to be within reach of ordinary engineers, not just companies with teams of thousands.
This is the next frontier for edge AI: the ability to build systems that live in context, sensing their environments and adapting their behavior to get things done. We need to push beyond deep learning and reclaim the old weapons of AI: state machines, probabilistic programming, rule-based systems, planning, reasoning. We can build crazy new things with foundation models and end-to-end learning, too.
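To pick one of those old weapons as a toy example: below is a small state machine (states, counts, and thresholds all invented for illustration) that integrates a stream of per-frame confidences over time, requiring a run of confident frames before it raises an alert and a run of quiet frames before it stands down, rather than reacting to every score in isolation.

```python
# A toy state machine that integrates classifier confidences over time.
# The states, thresholds, and counts are invented for this illustration.
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    SUSPECTED = auto()
    CONFIRMED = auto()

class Detector:
    """Requires n_on consecutive high-confidence frames before confirming,
    and n_off consecutive low-confidence frames before standing down."""
    def __init__(self, on: float = 0.8, off: float = 0.4, n_on: int = 3, n_off: int = 5):
        self.on, self.off, self.n_on, self.n_off = on, off, n_on, n_off
        self.state, self.streak = State.IDLE, 0

    def step(self, score: float) -> State:
        if self.state in (State.IDLE, State.SUSPECTED):
            # count consecutive confident frames on the way up
            self.streak = self.streak + 1 if score >= self.on else 0
            self.state = (State.CONFIRMED if self.streak >= self.n_on
                          else State.SUSPECTED if self.streak else State.IDLE)
            if self.state is State.CONFIRMED:
                self.streak = 0
        else:
            # CONFIRMED: wait for a run of quiet frames before resetting
            self.streak = self.streak + 1 if score <= self.off else 0
            if self.streak >= self.n_off:
                self.state, self.streak = State.IDLE, 0
        return self.state

detector = Detector()
for score in [0.2, 0.9, 0.85, 0.95, 0.3, 0.1, 0.2, 0.3, 0.2, 0.1]:
    print(f"{score:.2f} -> {detector.step(score).name}")
```

Even something this small is explicit enough to test, tune, and reason about, which is exactly the property that gets lost when the same logic is scattered through application code.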
But most importantly, we need to make these tools available to the average developer, without forcing them to earn a PhD. We’ve done this for sensory perception: today, any engineer can fine-tune a deep CNN, technology that did not exist when I started my career. It’s time to take the next step, from perception into action.
If we want to build a world where our surroundings are intelligent and helpful, an enchanted forest of machines of loving grace, we need a million developers—a worldwide movement of human beings who know all the corners of our civilization that could do with a fresh spark of life.
I think we have a ton to learn about this from video game developers, who have been crafting intelligent agents under computational constraints for decades.
If you’re interested in helping to do this, drop me an email. I’m hiring researchers and engineers, and the company is hiring across the board.
Appreciate the piece Dan! Tinkering around with the Edge Impulse platform, I notice how it has made the entire process of collecting data, extracting features, and tuning 'click.' Unless you studied this (or wrote a book about it!) it's not that obvious, so I think that low barrier to entry, and making it intuitive for non-PhDs and specialists, is key (and the only way to get to those million developers).
I get your point about how these systems have to work over time to really get better and realize their value. It stretches the mind to think that far ahead and consider potential applications. It's possible that the 'thing' which 'needs to get done at the right time' might not be obvious at the outset, but that ideas might emerge when the engineer looks at what patterns the system they deployed has sensed and collected over time. One more reason why making it easy to get Version 1.0 out and just getting started is a big deal!