A legendary piano instructor has a pricing scheme that confounds many clients. He charges $100 an hour if the student has never played the instrument, but $250 an hour if the student has taken lessons elsewhere. The two different rates seem counterintuitive, but the instructor has a perfect explanation: It takes less effort to teach an untrained student to play the piano the right way, and a lot more effort to untrain someone who’s been playing it the wrong way.
This anecdote, most likely apocryphal, illustrates the importance of starting off on the right note when learning a new skill, and the price one must pay for failing to do so. In the story of human-machine interaction, we may be the students who have been playing the piano the wrong way for nearly two decades, and must now pay a heavy toll to do it right.
3DiVi developed Nuidroid, a gesture-recognition middleware that works with Android OS.
How Computers See
Before the emergence of webcams, computers were essentially blind. Humans have eyes to perceive and discern figures and objects in their line of sight. Desktops and laptops had no such function. With the emergence of integrated cameras and detachable webcams, however, things started to look up, quite literally. For somewhere around $100 to $350, these devices allow our PCs to scan, record and process the outside world in pixels.
So far, computer vision is primarily 2D, a flat collage assembled from dots in an RGB color scheme. Depth cams are the next step in this evolution. The technology, as exemplified by the Microsoft Kinect-Xbox combo, uses infrared pulses to build a three-dimensional view of what it “sees.” It’s similar to how submarine sonar systems use sound waves to detect obstacles underwater. With this development, you could program game consoles to trigger certain actions based on the players’ hand and body movements. For example, in Star Wars-inspired titles, players could use swashbuckling gestures to simulate lightsaber combat; and in a Michael Jackson tribute title, a player’s dance moves could be scored based on how closely they match the late pop star’s classic moonwalk.
The general consumer market’s rapid adoption of gesture-based input is paving the way for the professional sector. The possibilities are tantalizing. For instance, you might use natural motions to test-drive a virtual car or inspect a complex CAD assembly. But CAD software development is deeply invested in the mouse-and-keyboard computing paradigm. Switching to a new input mechanism is nothing short of turning an oil tanker from its preset course. It would have to be an industry-wide effort, supported by vendors as well as users.
In his home office in Northern California, Brian Pene is teaching a C++ program to recognize human motion. As part of his job as a research strategist for design software maker Autodesk, Pene has to stay on top of cutting-edge technologies: stereoscopic displays, virtual reality goggles and augmented reality installations, to name but a few. His research (which must seem like play to many) could lead to commercial applications incorporated into the Autodesk portfolio.
“Gesture input won’t be the be-all and end-all,” Pene observes. “I don’t think it’ll totally replace the mouse. But as we move forward, it’ll let us do more interesting things.”
A program like Autodesk Mudbox, which generates geometry in response to simulated finger pokes and pinches, is a better fit for gestures than, say, Autodesk Inventor. Currently, most Mudbox users employ a mouse or a multi-touch surface (like an iPad) to sculpt geometry. Gesture input would mimic real sculpting motions more closely. The only thing missing from the simulated interaction would be haptic feedback, the sensation one gets when sculpting with mud in the real world.
Pene built a prototype version of gesture-driven Autodesk Showcase, powered by the Xbox Kinect. With the Kinect activated, Pene could rotate and tumble ray-traced 3D automotive models in Autodesk Showcase with a wave of his hand. Pene’s work also led to the Leap Motion plug-in for Autodesk Maya 2014, a free plug-in that lets users incorporate gesture-triggered sequences from the Leap Motion controller. The device is about the size of a pack of gum, available online or at Best Buy for $80. The plug-in lets you associate raw data received from the Leap Motion device with certain commands.
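To give a sense of what “associating raw data with commands” might look like, here is a minimal, hypothetical sketch: a depth sensor such as the Leap Motion reports palm positions in millimeters, and a plug-in translates frame-to-frame movement into view-rotation commands. The function name, frame format and scaling factor are all assumptions for illustration, not part of any vendor’s actual API.

```python
# Hypothetical sketch: mapping raw palm positions (in millimeters, as a
# Leap-style controller might report them) to view-rotation commands.
# The function, frame format and command names are illustrative assumptions.

def palm_delta_to_rotation(prev_palm, curr_palm, degrees_per_mm=0.5):
    """Translate horizontal/vertical palm movement into yaw/pitch angles."""
    dx = curr_palm[0] - prev_palm[0]  # left-right movement (mm)
    dy = curr_palm[1] - prev_palm[1]  # up-down movement (mm)
    # Moving the hand right yaws the model right; raising it pitches it up.
    return {"yaw": dx * degrees_per_mm, "pitch": -dy * degrees_per_mm}

# A wave of the hand 40 mm to the right and 10 mm down:
cmd = palm_delta_to_rotation((0.0, 200.0), (40.0, 190.0))
print(cmd)  # {'yaw': 20.0, 'pitch': 5.0}
```

In a real plug-in, the rotation dictionary would be handed off to the host application’s camera or transform commands each frame; the mapping itself stays this simple.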
Do Androids Dream of Human Gestures?
The title of sci-fi writer Philip K. Dick’s novel, which eventually became the basis for the film “Blade Runner,” poses a thought-provoking question on artificial intelligence: Do Androids Dream of Electric Sheep? A team of developers in Russia, on the other hand, is convinced that there are enough Android users dreaming of the day their devices can understand human gestures, from hand movements and body motion to facial expressions. Accordingly, they came up with a middleware called Nuidroid, which promises “Kinect-style motion controlled applications to ARM/Android and other embedded platforms.” The company, dubbed 3DiVi, states, “With a depth sensor, it adds real-time skeletal tracking to the next generation of smart TVs and game consoles.”
The Nuidroid middleware comes in several modules:
- Nuidroid Body, for segmentation and skeletal tracking from a depth map;
- Nuidroid Fusion, for integration of the depth sensor and a smartphone (the software can locate a smartphone and track its orientation when in use in front of the depth sensor);
- Nuidroid Face, for tracking of the face orientation and optional biometric identification; and
- Nuidroid Hand, for hand tracking to simulate point, click and scroll functions (still in development).
The Leap Motion controller (a) is smaller than a pack of gum. When deployed (b), it becomes a motion detector for a variety of applications. The company has signed partnership agreements with hardware makers HP and Asus.
One of the reasons 3DiVi is devoting its energy to the Android platform, according to Dmitry Morozov, the company’s business development director, is low power consumption. Enabling gesture recognition within a smaller power envelope is a much greater challenge than accomplishing it with more generous desktop hardware.
The Nuidroid Face module is capable of both face recognition and face-orientation tracking. The latter could be a useful method for conducting virtual walkthroughs and assembly rotations without a mouse. Imagine, for example, a ray-traced 3D scene that responds to a user’s head tilts and shifting line of sight. Morozov admits, however, that there are hurdles 3DiVi is still trying to overcome: “The technology doesn’t have enough resolution for biometric applications yet,” he says.
In this experiment, Autodesk research strategist Brian Pene demonstrated how to use natural gestures to inspect and control a virtual assembly model. The prototype was powered by the Leap Motion controller, priced at about $80.
If two players stand in front of a depth sensor, the current version of the Nuidroid middleware can distinguish and track the two as different entities. That’s quite sufficient for games and entertainment. However, if you’d like to use biometric data captured by a depth sensor for security (for instance, programming design software to grant a user read-write-edit privileges based on facial recognition), the technology needs to be much more accurate and robust.
Technology sophisticated enough for biometric security applications is not cheap, and certainly not within the price range in which 3DiVi wants to market its products, Morozov points out.
“Our software works with a majority of depth sensor devices, so it’s not device-dependent,” he explains. “We’re currently focusing on Android OS, but our roadmap includes other operating systems.”
Kinematic Input with SoftKinetic
The six-year-old company SoftKinetic offers depth-sensing hardware (both as portable cameras and as modules for embedding into devices) and gesture-recognition middleware. The desk-mountable DS-series cameras are priced between $249 and $299. Tim Droz, vice president and general manager of SoftKinetic North America, uses the terms depth sensing and 3D sensing interchangeably, but refers to his company’s technology as “time of flight.”
According to the company, its time-of-flight sensor “measures how long it takes for infrared light to make the trip from the camera and back — the time of flight — and gives the 3D DepthSense camera power to turn raw data into real-time 3D images, as well as grayscale confidence and depth-map images.”
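The underlying arithmetic is simple: light travels out to the object and back, so the depth is the speed of light times the measured round-trip time, divided by two. A back-of-the-envelope sketch (the function name is mine, not SoftKinetic’s):

```python
# Back-of-the-envelope time-of-flight depth calculation:
# depth = (speed of light * round-trip time) / 2

C = 299_792_458.0  # speed of light in m/s

def tof_depth_m(round_trip_seconds):
    """Depth implied by the measured round-trip time of an IR pulse."""
    return C * round_trip_seconds / 2.0

# A pulse returning after 20 nanoseconds implies an object about 3 m away:
print(round(tof_depth_m(20e-9), 2))  # 3.0
```

The nanosecond scale of these round trips is why time-of-flight sensing demands dedicated silicon rather than an ordinary camera: the sensor must resolve timing differences on the order of picoseconds to achieve centimeter-level depth precision.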
The resolution of the current gesture-sensing technology has often been cited as a roadblock in adopting it for professional design and engineering apps, but Droz says he doesn’t believe that to be a major hurdle.
“SoftKinetic sells two sensors today,” he points out. “One is designed for long range, used for gaming. The accuracy is about 1% of the distance. So at 3 meters, your margin of error in the relative positioning is about 3 cm. So at that distance, you can’t get finger-level resolution. We also have a close-interaction camera designed to work at 15 cm to 1 meter away.”
The close-range device, SoftKinetic DS325, is marketed under the Creative Labs brand, as part of the Intel Perceptual Computing SDK. “We are providing the sensor, camera, drivers and the gesture-recognition middleware,” says Droz. “That technology allows you to do finger, eye and head tracking, along with gestures and poses.”
SoftKinetic isn’t ready to publicly discuss its initiatives and partnerships aimed at the professional design market, but “we are definitely interested in that space,” Droz says. The company is also exploring 3D printing through data captured from depth sensors.
Space for Depth Sensors
zSpace, makers of an eponymous system, captured the imagination of 3D design software users with its clever integration of stereoscopic display, special eyewear, and movement tracking into a virtual reality workspace. The technology translates the position of the stylus (tracked by the system) and projects it into virtual space.
Its chief technology officer, Dave Chavez, says that while the current zSpace system doesn’t use depth sensors, this could change as the company explores new ways to track movements or incorporate gesture recognitions.
“If it fits into the user experience and we feel that’s where the customer wants to go, we’ll add that,” he adds. “Our system is architected in such a way so we can drop in new pieces.”
Just about any laptop or monitor you buy today comes with an integrated webcam. But in the future, a depth cam or a motion detector may be a standard feature. In April, Leap Motion inked an agreement with HP to embed its technology in select HP products. This partnership was preceded by a bundling deal with Asus, another PC maker, in January. These alliances are all the more startling if you consider that, at the time they were announced, Leap Motion hadn’t even launched its product yet. The company officially came online in July. By then, consumer electronics retailer Best Buy had already taken considerable advance orders for the device.
SoftKinetic has struck partnerships with both electronics component supplier Texas Instruments and automotive sensor and electronics supplier Melexis. SoftKinetic’s sensor technology is expected to power Texas Instruments’ time-of-flight chipsets and Melexis’ in-car depth-sense cameras.
Mitch Markow, engineering technologist for end-user computing solutions at Dell, predicts that “gestures, along with voice and other input modalities, will play an important role in how users interact more naturally with their devices in the near future.”
Markow notes that his company is actively investigating these technologies for upcoming products. “Our ultimate goal is not to simply integrate technology, but to create gesture-based experiences using the technology that customers will value,” he says.
Rethinking How We Build Geometry
The way we currently create geometry in 3D mechanical design software is inseparably linked to the default input method: the use of a mouse and a keyboard. But a careful study of the steps used to generate shapes — from simple extrusion of 2D profiles to complex operations like revolving profiles along a spline — will show that the method is anything but optimal. Gesture input can significantly improve the modeling paradigm; it may also reduce injuries associated with repetitive motion required by the mouse and keyboard.
“It’s a natural motion to reach out into the 3D space to touch and manipulate objects,” says Droz, noting that one of SoftKinetic’s demonstrations shows gesture-based navigation of the solar system.
The possibilities are nearly endless. What if, instead of dragging a 2D plane with a mouse pointer to extrude a surface, you could just raise your palm from the virtual ground to the desired height? What if you could indicate the lofting or revolving direction of a profile with the simple trace of a fingertip? The ability to define directions and perspectives in the air with complete freedom also offers the chance to do away with drawing planes and construction planes inside virtual 3D space (a method currently used to draw on surfaces from various angles in today’s 3D modeling programs).
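The palm-driven extrusion idea can be sketched in a few lines. In this hypothetical mapping (all names and values are mine, for illustration only), the height of the user’s palm above a virtual ground plane, as reported by a depth sensor in millimeters, sets the extrusion distance of a 2D profile; snapping to a grid stands in for the precision a mouse-driven dialog box normally provides.

```python
# Hypothetical sketch of palm-height extrusion: sensor-space palm height
# (mm above a virtual ground plane) sets model-space extrusion depth.
# Scale factor and snap increment are illustrative assumptions.

def palm_height_to_extrusion(palm_height_mm, scale=0.1, snap_mm=1.0):
    """Map sensor-space palm height to a snapped model-space extrusion depth."""
    depth = max(palm_height_mm, 0.0) * scale   # clamp below ground, then scale
    return round(depth / snap_mm) * snap_mm    # snap to the nearest grid step

# A palm held 253 mm above the virtual ground extrudes the profile 25 units:
print(palm_height_to_extrusion(253.0))  # 25.0
```

The snapping step hints at the real design problem: hands are imprecise, so a gesture-driven modeler would need constraints and snapping to deliver the exact dimensions engineering work demands.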
zSpace’s Chavez says he doesn’t believe gestures will replace mouse interactions completely. “The use of tools for fine, comfortable, accurate actions is essential,” he explains. “The mouse, for example, is very accurate, highly responsive, quick, comfortable, and has two or three buttons. While a finger on a touchscreen is ideal for a wide range of input and control tasks, there are many cases where tools are much more effective than hands alone.”
In addition, Autodesk’s Pene discovered something while experimenting with gesture input in front of a depth sensor. “Since you tend to make large gestures in front of the depth cam, you might not be able to do it for a long time,” he observes. “You could get tired.”
This revelation suggests the most straightforward adoption of gesture input may be for design inspection, such as rotating virtual objects and navigating virtual scenes. Adopting it for geometry construction may require more effort, both in creative programming and user re-education. After all, programmers and users have been using — and developing software for — a mouse and a keyboard for decades. It’ll take much more than a couple of years to untrain us.