Furby is the future of AI
Hasbro's cute companion shows the power of embodied intelligence. Let's explore the principles of embodied AI—and how it might shape our future.
When Hasbro announced their latest Furby toy, updated after six years, it sold out within days. The appeal is obvious: Furby, designed to be a companion and entertainer, is cute, silly, and fun. It waggles its ears, blinks its eyes, and makes endearing, affectionate, and comical statements. Furby can respond to basic speech commands, play a few simple games, and react to some environmental cues—dancing to music, or speaking when spoken to.
Furby is purpose-built for fun. There’s no veneer of education, no attempt to proclaim a grander purpose. It’s a small, cheap, friendly object that is robust enough to withstand the affections of a youngster. It’s simple to use, but has enough depth of complexity to appeal to inquisitive minds. It has buckets of personality, along with just the right amount of brains: enough to be surprising and adorable, but not enough to feel creepy, or inspire unrealistic expectations.
Furby’s key feature is embodiment. Unlike a virtual pet, or a video game NPC, Furby is part of the world we inhabit. It has a form that follows function: soft, colorful, sturdy, and cute. Even without batteries, Furby is a lovely object—it’s a multisensory experience that appeals to our bodies and minds. The fur feels nice to pet, the weight is pleasing to hold, and the big eyes evoke caring, like a baby or a pup.
Furby will sing, dance, and mimic your voice. It will play a simple game, and react happily when its fur is stroked. Furby does not provide a calculator, or a weather forecast. It won’t remind you of appointments, or search the Internet for useful facts. It is not a serious tool for serious tasks. And this is unlikely to come as a surprise.
The outward appearance of Furby provides powerful cues to its capabilities and purpose. Furby is unambiguously a toy: a tool for play. From your first encounter, it’s easy to understand what it does, how it behaves, and who might be intended to use it. Furby’s physical form reflects its inner software: a set of whimsical interactive experiences that bring the silly, colorful body to life.
In the design of a tool, embodiment allows us to communicate its capabilities and limitations without resorting to documentation. Furby has huge ears, indicating that it may respond to sounds. Furby has no arms or legs, making it obvious that it can’t pick things up or move around.
Embodiment also produces constraints. As a small, lightweight object with limited movement, Furby can’t easily cause an injury. With a battery compartment that requires a screwdriver, it can be disabled by parents if it starts to get annoying. And with no networking hardware, it can’t connect to the Internet and create privacy issues.
Embodiment allows direct interaction with the physical world. Furby comes with a comb, which encourages brushing—triggering an interactive response. Lay Furby on its back and it will go to sleep, like a baby, or a pet. And when excited, Furby does a little dance.
Inside our minds, embodiment changes the way we relate to an object. Furby looks baby-like, so we wish to take good care of it. Furby has bright colors, so we’re drawn to it on the shelf. Furby’s voice sounds funny, so we don’t expect it to give serious answers.
Humans are experts at embodiment. Our evolution and cultural knowledge give us a framework for understanding the world in terms of the way things are shaped. We know trees are unlikely to move, or speak. We know snakes can be dangerous, and that brightly colored fruit is sweet. We’ve learned heavier tools are more durable, and sleek-looking cars go fast. Furby’s embodiment carefully communicates exactly what Furby is.
Furby and the Curse of Generality
Technology can be deeply frustrating, and a lot of that irritation is a failure of embodiment. When our expectations are unmet, disappointment is immediate. A sleek-looking car should accelerate quickly. A heavy metal tool should be durable and tough. A brightly colored drink should taste sweet. A television should show moving pictures when switched on, and a music player should have an obvious volume control. When these assumptions fail, we become frustrated and confused.
We interact with most software in a disembodied form, the shapes of our general devices offering no suggestion of what lies within. This is bound to create friction. Denied any physical cues to the nature of an experience, our only insight comes from what we’ve been told. Smart speakers, designed to channel the Google Assistant, Alexa, and Siri, are simple objects with few contextual clues. The things they most resemble are speakerphones—a suggestion reinforced by the human-like voices of their text-to-speech engines.
This speakerphone disguise encourages conversational queries. This sounds ideal—except that the assistants themselves are far too limited. They can only answer specific prompts, and only in specific ways.
There’s nothing inherently wrong with these limitations, but when presented as a human being—via lifelike text-to-speech, and through a speakerphone chassis—the results can feel frustrating and embarrassing. It feels like talking to someone who refuses to understand; who can speak eloquently and at length, but who doesn’t fully hear what we’re saying, and cannot think.
A similar issue affects ChatGPT. The conversational AI, vastly more capable than the Google Assistant, is presented via a professional-looking instant messenger and accessed through a keyboard or phone. We’ve all learned what to expect from this type of interface: we send a message to a friend or colleague, they text back their reply, and we have a conversation. The instant messaging format creates an expectation of personhood, of adherence to social norms, and of the mental capabilities of a human being.
It’s no surprise, then, that ChatGPT tends to violate our expectations. It lacks contextual understanding, or knowledge of current affairs. It will lie, brazenly, when it doesn’t know the facts. It refuses to discuss certain topics. And it can spew text forth at an inhuman rate. All of this is normal for a text prediction model, and doesn’t harm its usefulness for many important tasks. But it can feel surprising and frustrating, because the outer form suggests a human mind.
I’d call this the Curse of Generality. A system designed to do everything is unlikely to do all of it equally well—which is bound to create disappointment. It also guarantees a disconnect between form and function: what shape is appropriate for an omnipotent AI?1
Furby does not lie to you. Furby presents as a toy: unrealistic, exaggerated, cutesy, and dumb. The feats of AI that power Furby—embedded deep learning for audio classification, impossible until the past few years—don’t outshine its nature as a simple, friendly thing. Furby’s embodiment as a fluffy little monster makes it fun, safe, and lovable. When Furby misunderstands you, it’s charming and amusing, like a pet that gets the wrong end of the stick.
By avoiding the Curse of Generality, and by choosing an embodiment that fits, Furby’s designers create a simple and delightful experience.
The Rules of Embodiment
It’s clear there are benefits to embodied intelligence. But how do we build products that use embodiment well? It’s still early days in figuring this out, but I’d like to suggest some rules. I’m sure these will be updated as our collective experience grows.
Rule 1: Pick one job and do it well
Humans evolved to use tools, and tools are rarely general. A tool typically has a few core functions for which it is suited. For example, a screwdriver is useful for turning screws. It may have a “long tail” of possible applications (opening cans, cleaning under fingernails) but the tool is unlikely to excel at these, and attempting to use it in these ways will lead to a poor experience.
The same is true of AI tools. Furby is a tool for play, and it excels at this function. Its embodied form reflects its intended use and sets expectations clearly. From a technology point of view it would have been trivial to add a calculator, stock price updates, or a security camera to Furby’s list of functions. While ostensibly useful, these functions would have diluted Furby’s focus on play and made it far less intuitive to use.
Picture the Google Home device, designed as a general-purpose interface for interacting with the Google Assistant. The Assistant (which I worked on during my time at Google) can do a lot of things very well. There’s also a long tail of things that it does poorly. The Google Home was built generically, to reflect them all. Unfortunately, there was nothing about the form that helped communicate the Assistant’s capabilities, let alone which of them actually worked well.
This led to major challenges around discoverability. The Assistant could do many things, but typical users were aware of only a few of them—and it was extremely difficult to teach them more. Nothing about the Assistant’s embodiment suggested what it could be used for: the Google Home looks like a speakerphone, and sounds like a person.
As a result, product managers forced the Assistant to attach verbal advertisements for its capabilities to the ends of other responses. If you asked to set a kitchen timer, the Assistant would do so—and then notify you that it could also play music, answer questions, and set reminders. This felt clumsy, frustrating, and tedious to users. It didn’t help much with discoverability, either.
The Assistant is highly capable, but it was presented as universally capable. Its Google Home embodiment suggested this, too. A smarter design might have reduced its capabilities, making them easier to remember (the list is far too long), and embodied them in a form that was contextually relevant.
For example, the Assistant has many features to help with cooking and grocery shopping. What if these were separately embodied as a kitchen gadget, or inside an existing appliance? Contextualizing its features would constrain the user’s expectations and lead to a better experience.
Rule 2: Use familiar cues for form and function
With no outward indication of its capabilities, the Google Home was bound to be difficult to use. A later iteration of the Home added a large display, providing visual feedback and contextual suggestions about which commands to try. While an improvement, this was still not ideal. It resulted in an unfamiliar, Frankenstein device—a smart speaker with a screen—that resembled a tablet but wasn’t nearly as capable.
Every person develops expectations around objects and their behavior. A tree behaves a certain way, as does a cat, a screwdriver, a camera, or a passenger car. Violating these expectations leads to surprise and confusion. For example, a driverless car would be an alarming sight in most parts of the world, where self-driving is not yet permitted.
Since driverless cars behave differently from those piloted by humans, it’s a good idea to make them visually distinct. While likely unintentional, the arrays of sensors on the roofs of most self-driving vehicles provide a helpful cue to their unusual nature—assisting the public in distinguishing them.
Modern cameras have many AI features. For example, some use deep learning to identify the subject of a scene and ensure that it’s clearly in focus. This is incredibly powerful—but being embodied in a camera, it remains intuitive to use. Anyone familiar with a camera can immediately benefit.
Contrast this with a failed Google product, Google Clips. The idea was nice: an AI-capable camera that could take automatic pictures of family life, capturing candid, unposed moments. The device would snap a shot or video clip when it detected something interesting, saving the results in an album for later review.
The designers chose to eschew the familiar form of a camera. Instead, Clips was a mysterious white box: something between a GoPro and a home security sensor. The design gave few cues to its proper usage, leading inevitably to frustration. With no mental model of how to use it, where to place it, or what types of image it might capture, Clips was unpopular with users and received poor reviews.
By leaning on familiar cues, designers might have made a better product. People know how to use cameras: they’ve existed for nearly two hundred years. A camera-like shape and behavior could have been more intuitive. Subsequent designs could step closer to the original vision, once users are accustomed to the novel idea.2
If something looks like a camera, people will assume it behaves like one. If it doesn’t look familiar at all, nobody will know what to expect. This effect is very powerful, and it’s key to getting embodiment right.
Rule 3: The form should support and constrain the application
Embodiment is the most effective way to control the capabilities of an AI product. The physical form of a device dictates everything from its processing power and battery life to its communication capabilities and sensory awareness.
Jeff Bier coined the acronym BLERP to describe the key benefits of on-device AI. This framework can also provide a guide to some of the ways embodiment can support or constrain a product:
Bandwidth: Connectivity, or lack of it, dictates how a product functions and what information can enter or leave it.
Latency: Compute capabilities and connectivity dictate how fast a system can react.
Economics: The cost of a device determines who can use it, and in which applications.
Reliability: Physical and software robustness determine how a device can be used.
Privacy: The environment in which a product is used determines which capabilities are safe to grant.
Embodiment can grant an AI system additional capabilities. For example, with a fast processor and sufficient RAM, deep learning models can run fully on-device. This may remove the need for an Internet connection, enabling many potential applications in areas of low connectivity, or where energy availability is limited.
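To make this concrete, here is a minimal sketch of what fully local inference can look like, written in Python against the tflite-runtime interpreter. The model file, label list, and preprocessing are hypothetical placeholders rather than details from any real product; the point is simply that nothing in the loop requires a network connection.

```python
# Minimal sketch: on-device audio classification with tflite-runtime.
# "sound_classifier.tflite" and the label list are hypothetical stand-ins;
# any small classifier follows the same pattern. No step here needs
# an Internet connection.
import numpy as np
from tflite_runtime.interpreter import Interpreter

LABELS = ["background", "music", "speech"]  # example classes only

# Load the model from local storage and allocate its tensors once.
interpreter = Interpreter(model_path="sound_classifier.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

def classify(spectrogram: np.ndarray) -> str:
    """Run a single inference on a preprocessed audio spectrogram."""
    # Match the model's expected shape and dtype, then run the interpreter.
    tensor = spectrogram.astype(input_details["dtype"]).reshape(input_details["shape"])
    interpreter.set_tensor(input_details["index"], tensor)
    interpreter.invoke()
    scores = interpreter.get_tensor(output_details["index"])[0]
    return LABELS[int(np.argmax(scores))]
```

The same pattern scales down to microcontroller-class frameworks; the embodiment question is how much processor, memory, and battery the product’s form can afford.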
Even basic physical properties can support an application. For example, a small, light device may be easy to carry around, while a larger, heavier device may discourage theft. The former works well for an AI fitness tracker, while the latter works great for an AI self-checkout kiosk.
Embodiment can also provide useful constraints. For example, if a device is designed without communication equipment, it is less likely to create privacy risks, or be used as an attack vector.
An earlier iteration of Furby, released in 2016, demonstrates this quite well. Furby Connect was designed with a Bluetooth radio, allowing it to communicate with a mobile phone app. Attackers discovered how to compromise the connection, snooping on a home with the device’s microphone. The design of the latest Furby does not include one, constraining the device and making it safer.
The safety benefits of embodiment may go beyond effective product design. Some hypothesize that a hyper-intelligent future AI may pose an existential threat.3 Embodiment is the best line of defense, since it allows us to limit a system’s compute and communications. As embodied intelligences ourselves, we are deeply familiar with the constraints imposed by the limits of our physical bodies.
Ubiquitous, embodied AI
The key tech vibe of 2023 is that we’re entering a new era of human-computer interaction—but debate remains around how this will look.
Much of the hype has centered around large language models, suggesting natural language AI as the interface of the future. But as Varun Shenoy states in his excellent Substack post, “natural language is an unnatural interface”. With no situational awareness, chatbot-style AI relies on us to communicate our precise mental model by crafting the perfect prompt. But writing is hard, and nobody really wants to be a prompt engineer.
Some visions of the future solve this problem by placing human beings in a virtual (or augmented) reality, creating a shared context inhabited by both people and software. This could make it easier for digital systems to understand what you’re trying to accomplish.
I feel some affection for the retro-futuristic promise of VR, but I personally remain unconvinced. While technology is finally catching up, the truth is that the most exciting use case for Apple’s incredible Vision Pro headset is to create a simulated desktop with gigantic virtual monitors. This isn’t really a new type of human-computer interaction, just a different way to implement a desk.
Forcing humans into a virtual world seems the opposite of what we should do. This past decade we’ve interacted with software through the proxy of a touchscreen: a world of symbols, trapped behind glass. But humans evolved for physical reality. We love tools, objects, and people.4 Our fingertips are the most sensitive parts of our bodies. Our brains evolved to see faces in the clouds. We even prefer our cars to have angry expressions.
I would argue—quite strongly—that our next technological world will be an embodied one. Embodiment allows us to build AI tools that fit into our world, mixing digital intelligence with physical reality in an entirely literal way. Rather than plug into a virtual world, we will fill our real surroundings with ubiquitous, embodied AI.
This isn’t some far-off vision of the future: it’s a process that has already begun. First, we’ve started to escape the Curse of Generality by building task-specific tools that fit our exact needs. Think wearables, like Oura’s ring, whose elegant simplicity keeps track of our bodies without overloading our minds. Smart appliances, like Roomba, which is an ordinary tool with extraordinary powers. Toys, like Furby, which create new ways to entertain. And industrial sensors, like the RAM-1, which can endure conditions that few computers have before.
A new generation of processors and software has brought on-device AI within reach. Sophisticated deep learning models can now run on tiny embedded processors, barely consuming any power.
These new capabilities allow product engineers to “cut the cord” with the cloud, no longer depending on back-end servers to run their algorithms. The relatively limited compute drives products to be task-specific: engineers must design around the minimum viable intelligence required for a given job.
Ironically, this reduction in generality leads to far better experiences. Instead of the overwhelmingly capable Google Home we get a dozen small devices, each designed to fit a certain niche. They’ll understand what we need from them, since we use each of them for specific tasks. They’ll tend to work well, since it’s easier to design for a single purpose. They’re cheap, and private, since there’s no cloud infrastructure required—removing the need for subscription business models, or the collection and sale of private data.5 And they can be replaced and upgraded independently, like any other tool.
In the near future, our homes, our workplaces, and our built environments will host a modular mixture of intelligent objects. They’ll help with specific tasks, then stay out of our way. They won’t monopolize attention, like smartphones and watches, and they won’t collect our data, unless we ask them to. They may communicate together, but they’ll function independently, and they won’t need much configuration to serve our needs.
Furby is a prime example—and some other products are leading the way, too. Roomba, for instance, will work entirely offline. It has a big round button on top, and when you tap it, it will vacuum the floor. All of its smarts are on board, so it won’t demand an Internet connection—but if you offer one, the accompanying app provides some handy extra features. It makes no pretense to greater things: it’s a wonderful vacuum cleaner, and absolutely nothing else.6
As these products get more common, it won’t take long to reach ubiquity. Through my work I know of dozens of major companies that are currently building edge AI products. These are big name brands, with products in every home, along with major players in manufacturing and industry. There’s a wave of invention under way, and it’s still just the early days.
Thanks to smartphones with wake word detection, you’re already never more than a foot or two from a microcontroller running a deep learning model. But the next few years will see a Cambrian explosion of products. We will be surrounded by intelligent devices. Modest intelligence, for sure, but increasingly useful and capable.
The future of embodied intelligence
Technology rarely stands still, and the forces bringing us faster processors and more efficient algorithms are just getting started. Every embedded silicon company is investing furiously in new architectures designed to accelerate deep learning models. The current crop of chips is basically the first generation: there’s a wave of accelerators coming that will put them all to shame. And the hottest things in deep learning, like transformers and LLMs, have barely been deployed on-device—so there is a pipeline of intelligence on the way.
The tooling is also improving. Platforms for algorithm development and on-device deployment, like Edge Impulse (which I help build), are reducing the need for specialist skills. Any engineer can now build AI for the edge, so the barrier to embodied AI is much lower than it previously was. And custom creative tools like Disney Research’s spectacular robot character pipeline show how designers will be able to guide training to create the embodied experiences they want.
The big risk of all this capability, however, is that we’ll fall straight into the Curse of Generality—and build experiences that set huge expectations, yet fail to deliver. One project of note is Humane’s AI Pin. A sleek and mysterious panel that is attached to your clothing, it’s described as a "screenless, standalone device and software platform built from the ground up for AI."
The ambition is laudable: with contextual awareness, the AI Pin could help augment your daily experiences—for example, by translating conversations across two different languages. Its wearable embodiment means it’s present during crucial moments, and it can see and hear what you can, while being supposedly designed with privacy in mind.
However, like the Google Home, the shape of the AI Pin gives few clues to its function. It runs the risk of being confusing—both for owners, who must remember its many features and how to use them, and for those around it, who may feel uncomfortable if they don’t know what it does.
It won’t be long before sophisticated foundation models (or their smaller, more efficient offspring) can run in real-time on low power devices. There are even rumors of a device from OpenAI and Jony Ive. This is a breathtaking opportunity: we can take a broad set of capabilities we now associate with the cloud and deploy them down to edge devices, where privacy is protected and contextual information abounds.
A tiny wearable, or a household or workplace object, could potentially understand what we say to it, or respond intelligently to its environment. Our homes might end up feeling like enchanted kingdoms, where clocks and candlesticks can intuit our needs, automate our chores, and fill our days with surprise and delight.
But these astonishing capabilities must be handled with care. With the Curse of Generality, a language-based interface can be a minefield of frustration. When we speak with human adults, we know what to expect. The same is true of other language users: namely, children and pets. Nobody expects their dog to comment on the news, or a child to offer help with filing taxes. And we don’t currently expect our appliances to listen.
So if we grant our built environment some ability to think, how do we constrain capability, and manage expectations? A surplus of intelligence might feel creepy, or dangerous. But an unexpected lack of it can be just as unsettling. If I tell my alarm clock my heart is in pain, should it call an ambulance? If a microwave overhears my private troubles, should it interject with a reassuring quip?
It’s clear to me that embodiment will be a vital tool in shaping this new world. It has powers to both constrain capabilities and to set expectations. It works with our biological and cultural instincts to deliver understanding: it produces objects that make sense in our world. As Furby well knows, the embodiment of intelligence will be the design language of the next human era.
We have opened the frontier of embodied AI. It’s a thrilling place to be. As a child of the 80s, I grew up dreaming of robot pals and enchanted objects. They are almost here. But the world we choose to build could be a place of joy, or of horrors. It will take a new set of design skills to shape it the way we want. We can feel grateful for the opportunity to be a part of that process—while deeply aware of the responsibility that it implies.
Huge thanks to Andrew Brentano for reading my draft and suggesting improvements.
1. There’s some interesting discussion of this in Benedict Evans’ Unbundling AI, though he doesn’t quite get to a concrete answer.
2. There are parallels with the skeuomorphic app designs of the early iPhone, which helped train a generation of users on the use of a touchscreen smartphone.
3. I’m not personally concerned about an imminent existential threat from AI, due to a combination of inherent constraints (even “cloud” AI is embodied in a data center) and the limitations of even advanced AI systems. I’m far more concerned about issues resulting from poor design, misapplication of technology, and lack of ethical analysis. This could be a good topic for a future post.
4. There are countless examples of the benefits of interacting with physical tools, from the cognitive benefits of pen and paper to the way haptic feedback reduces mental load for prosthetics users. And the field of pragmatics studies the importance of context to language and meaning, including context provided by the physical world.
5. It’s possible that embodied AI (and other offline-capable technology) could help harmonize the interests of consumers and businesses, which are today misaligned due to the need for recurring revenue to support cloud infrastructure. Disconnecting usage from costs and revenue seems healthy for both sides of the market. Furby doesn’t need a subscription.
6. It remains to be seen whether this will survive the Amazon acquisition.