SciCafe: Teaming Up with Robots with Julie Shah
Julie Shah (Assistant Professor in the Department of Aeronautics and Astronautics, MIT): I’m Julie Shah. I lead the Interactive Robotics Group at MIT and our research is at the intersection of artificial intelligence and cognitive science. And we’re really working to reverse engineer the human mind to make robots that are better teammates for people.
You know, to a large extent, robots work separately from people today. Our goal is to do more than just make robots that are safe enough to work with us; it’s to make robots that are smart enough to work with us as easily and naturally as we work with other people, so that we can ultimately harness the relative strengths of people and machines working together. So, my aim in this talk is to convince you of what an incredible world we have ahead of us as our machines become intelligent enough to augment what we do.
So, I’ve spent much of my career working in robotics and manufacturing, in factories building planes and building cars, and in these and many other sectors, we’re actually still quite limited in how we use robots today. So, I’m showing you one picture here from Amazon Robotics. This is a picture from one of their warehouses. When you order products online from Amazon, there’s actually a fleet of robots that are delivering those products to you. But the thing to note is that the warehouse is full of robots, and there are people lining the edges of this warehouse. The robots bring your product to a person, who then boxes it and ships it to you. So people and robots are working together in teams, but they’re working in spaces that are physically separate.
Similarly, on automotive assembly lines, we have robots that are working near people. So this is a Universal Robots arm. It’s what we call an inherently safe robot, which means it can work right next to a person and, if it bumps into you or hits you, it won’t permanently harm you. Which is good, right? That’s key for effective collaboration.
But ultimately the system is really working next to you, it’s not working interdependently. These are systems that co-exist with us. They don’t truly collaborate.
Now, we think about the automotive industry in particular as an industry that’s been incredibly successful at introducing industrial robots. But half the build process of a car, half the factory footprint and half the build schedule is actually still done by people. And it’s not that much of that work couldn’t be done by robots. Some of the work is truly too difficult for robots today. It’s very dexterous work, and there’s actually a fair amount of cleverness and judgement and innovation in how to assemble the car.
But there are little pieces of almost every aspect of this manual work that can be done by a robot today. Our problem is an integration problem. We can’t take those pieces of work, physically separate them from the manual work, and have it still make sense for the flow of the factory.
So a robot that can more intricately collaborate with a person, that can almost like dance with a person, to offer just the right instrument at just the right time, can substantially improve the productivity of the line.
Now, the question is, how do we enable a robot to work that seamlessly, that fluidly with a person? And it’s an interesting question because it makes you take a step back and say, well, how is it that people work so effectively together? If you think about sports teams, if we think about nurses and surgeons in the operating room—really, in every setting in our lives, we are able to collaborate with other people because we have the ability to do three things.
For me to work with you, I need to be able to know what you’re thinking. I need to be able to anticipate what you’re going to do next. And then I need to be able to make fast adjustments when things don’t go according to plan.
And it’s this ability that allows a surgeon to put her hand out in the operating room and have a surgical assistant place the right instrument there without even a glance, without a word, without a command.
So, our lab has worked for many years on enabling this type of human-machine and human-robot collaboration. We’ve worked on developing algorithms and models that enable robots to infer our cognitive state, infer our human mental state, and use that information to anticipate what we’ll do next. And then we’ve developed fast algorithms for scheduling and planning, so the robot can use its predictions of what we’ll need and what we’ll do to very quickly adjust its own plan and provide just the right materials at just the right time.
So, how do we break this down? Well, we need a robot that can infer what we’re thinking, anticipate what we’ll do next, and execute. Now, in order to make this real, we need the robot to do three things, and to do them in sequence. I call these three sequential system capabilities.
To work together as a team, we need the person and the robot, or the multiple people and the multiple robots, to come together and have almost a conversation, have a negotiation about how it is they’re going to work together, who’s going to do what, when and how. We need to form some shared understanding of how it is we’re going to collaborate.
Now, given that, think about an emergency response team. They come into a conference room, they plan their deployment. Is that the end? Does the team work effectively together? No. And the reason is that it’s unreasonable to ask any team to plan all of the details of how they’re going to work together, all of the contingencies and all of the potential responses, in advance. So, beyond being able to plan together, we then need the robot to be able to work with us, to observe us and interact with us, and to learn how to refine its plan, how to adapt and modify its plan based on all of the situations that could arise.
And, finally, once the robot understands how it is we work together flexibly based on all the situations that can unfold, again, we need the robot to be able to use that information to physically work with us—to anticipate and provide the right information or the right materials at the right time.
So, today I’m going to highlight some of our recent work, specifically in systems two and three: refine and execute. So, diving a little deeper into “refine”: how is it that we can develop this ability to somehow know how we trade work, how we’re going to respond to any potential circumstance that can arise in the future? This is a puzzle in and of itself. We spent a few years in my lab working with a local Boston hospital, studying the work of nurses and doctors on a labor and delivery unit.
And the reason this is an interesting setting—well, there’s a few reasons. One is that this is a very complex workflow. In the labor and delivery unit, you never know who’s going to come through the door, what’s going to happen, how the future’s going to unfold. You need to be able to deal with unpredicted situations.
But beyond that, we have nurses and doctors in training that are actually relatively quickly able to learn the flow of the hospital, to be able to lend support and be at the right place at the right time. And nurses and doctors in training, they do not require millions upon millions upon millions of training examples to figure it out, right? Sort of a week, a few weeks in the hospital and they understand how to be a supportive teammate.
It’s also an interesting environment because we can’t create a simulator for it. We can’t give an AI the simulated world where it can practice and see, if it takes an action, this is what will unfold. All we really have, just as much as the nurses and doctors in training, is the ability to observe what happens in a day in the life over and over and be able to try to learn efficiently from it the way a human team member does.
So, this nurse manager here does a very interesting job. So, the reason that we study this role is because she’s essentially doing the job, this one person, the job of an air traffic controller, on the hospital floor. This one person is deciding which patients go to which rooms, which nurses are assigned to which patients. They control aspects of the OR schedule and many other decisions. And the way they learn to do it is not a codified training process. So, it’s very hard to reverse engineer how and when a nurse will make a particular decision, using the information we have today.
So, the question was: can we deploy machine-learning algorithms to watch a nurse’s decision-making process and predict the decision a nurse might make under some set of circumstances? And can that AI or robot system then potentially offload some of the work of that nurse?
Why is this so important? Well, if anybody’s been in the hospital lately, you may have seen these types of robot systems wandering the corridors. Has anybody seen these robots? No? No? Well, okay. So, that’s a good point. Why haven’t you seen these robots? They’ve been around for 10, 15 years. There’s actually relatively few of them. Okay.
Here is why they are not well-adopted. Now, this nurse here has maybe 10 to 15 direct reports, right? Of human beings. And now, these machines come in and they’re meant to deliver medications, meant to deliver linens. But how do they know where to go and when to be there? Someone has to tell them. And that schedule of their work is very dynamic, depending on what’s coming through the door. So it is just really unreasonable to ask this person, who is essentially an air traffic controller, to now also task and schedule a large fleet of robots. That’s not ultimately very helpful to them.
But a robot that’s able to learn like a human apprentice, and even just suggest its own right next step, can significantly reduce the cognitive load for that nurse and can ultimately make these systems a viable solution.
So, how do we make a robot or a machine learn how to make some of these decisions that the nurse makes? Well, we work to understand how it is that humans learn so efficiently in these settings. And one of the things we know is that people learn very well through examples. And we also know that one of the foundations of human multi-criteria decision-making is our ability to learn through paired comparison.
So, that nurse-in-training or that doctor-in-training, as they’re watching what unfolds in the hospital, they’re constantly thinking, ‘Okay, this is what happened in this situation.’ You’re learning this way every day. You’re thinking, ‘This is what happened in this situation. What’s different about that other situation, that other day? What was different that would result in one person making a decision A here and a decision B there?’ We implicitly compute these paired comparisons and we use them to learn more efficiently.
So what we did was take this insight and provide this type of foundation as what we call a scaffolding for the machine-learning algorithm. We gave it the base of how we know humans learn so efficiently. And ultimately we were able to show that, with just even a few dozen demonstrations of a day in the life of the hospital, the machine or the robot was able to make suggestions for what to do next that nurses and doctors agreed with up to 90% of the time. That is a huge success and gets us a step forward to deploying these systems to help us day-in and day-out.
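To make the paired-comparison idea a bit more concrete, here is a minimal sketch, assuming toy candidate features (for example, room availability and nurse workload) and an off-the-shelf logistic regression; it is illustrative only, not the lab’s actual apprenticeship-learning algorithm.

```python
# Minimal sketch of learning a decision policy from paired comparisons.
# The feature names, data shapes, and use of scikit-learn's LogisticRegression
# are illustrative assumptions, not the lab's actual implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def make_pairs(demonstrations):
    """Turn each observed decision into (chosen - alternative) feature pairs."""
    X, y = [], []
    for chosen, alternatives in demonstrations:
        for alt in alternatives:
            X.append(chosen - alt)   # the demonstrator preferred `chosen` over `alt`
            y.append(1)
            X.append(alt - chosen)   # and the symmetric negative example
            y.append(0)
    return np.array(X), np.array(y)

def recommend(model, candidates):
    """Score each candidate decision with the learned linear scorer; highest wins."""
    scores = model.decision_function(candidates)
    return int(np.argmax(scores))

# Toy demo: each candidate decision is a 3-feature vector, e.g.
# [room availability, nurse workload, patient acuity] (hypothetical features).
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])          # hidden preference the "nurse" uses
demos = []
for _ in range(40):                          # "a few dozen demonstrations"
    cands = rng.normal(size=(5, 3))
    best = cands[np.argmax(cands @ true_w)]
    others = [c for c in cands if not np.array_equal(c, best)]
    demos.append((best, others))

X, y = make_pairs(demos)
model = LogisticRegression().fit(X, y)
print("recommended candidate:", recommend(model, rng.normal(size=(5, 3))))
```

Because the learned scorer is linear, scoring the raw candidates directly ranks them consistently with the pairwise preferences used in training.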
So, I’m going to show you some videos of our robot in the hospital now. This is a robot—it’s not a service robot. This is a research robot. But what this robot was doing was reading the current state of the labor and delivery floor, you know, the status of patients and then making a suggestion to the nurse or doctor.
RESOURCE NURSE: What is a good decision?
ROBOT: I recommend placing a scheduled cesarean section patient in room 5. Nurse Meridith can take care of her.
SHAH: You can see the nurse responds positively.
So we deployed these systems both in controlled experimental environments, in which we created simulations of the labor and delivery floor, and on the live labor and delivery floor, to show proof of concept that this system could be helpful in the real environment.
LABOR NURSE: What is a good decision?
ROBOT: I recommend placing a new patient in triage bed T5.
LABOR NURSE: What is a bad decision?
ROBOT: A bad decision would be to place a scheduled cesarean section patient in room 14 and have nurse Kristen take care of her.
PHYSICIAN: Ginger, what’s a good decision?
ROBOT: I recommend placing a scheduled induction patient in room 10. Nurse Meridith can take care of her.
PHYSICIAN: Ginger, what’s a bad decision?
ROBOT: A bad decision would be to place a scheduled cesarean section patient in room 14 and have nurse Amy take care of her.
[LAUGHTER]
SHAH: So, ultimately, there’s no ground truth, right? But we’re really encouraged that, with really relatively few training examples for this type of machine-learning algorithm, the system is able to provide recommendations that the nurses and doctors agree with 90% of the time.
Now, the goal of the system in this safety-critical scenario is never to replace the human judgment and decision-making that comes from the years of training of the nurses and physicians working in these environments. But these nurses are doing the job of an air traffic controller without any decision support, without any aid that a typical air traffic controller would have. So the ability to offload even some of that work through this additional support frees them up to put more of their effort into making the decisions where we really need their judgment, and enhances the safety and well-being of all of us.
Okay, so that’s a little bit about how we designed robots to be able to take some skeleton of how to work with us and then efficiently learn from observing us and interacting with us in real, difficult, messy environments. Next, we want to be able to deploy robots that can use that information to physically help us. So this is the execute system.
And much of this work we’ve done in assembly manufacturing. And this should be the easiest of all possible scenarios: we have task procedures, we know what’s being built, we place the employees in the factory to build the parts. But it’s quite tricky, actually, because space is tight, and a robot that makes even just the wrong maneuver at the wrong time disrupts that line. The line slows, the line stops. And that’s big money, ultimately. So it’s not enough to know, in the abstract, here are the three steps that will happen in the future. We need fine-grained predictions. We need to know exactly where a person is going to be in space and time. We need to know exactly when an activity starts, and exactly when an activity ends. And that’s the core of the challenge that we address.
Now, I’m showing you here an associate in an automotive test factory with one of our industry collaborators. You can see there’s multiple possible paths that person may take through space in doing their job and there’s a robot here. It’s a fairly restricted robot, a robot on a rail. And what we need to do is predict where in space and time that person will be. And what we discovered in our work was that a machine-learning algorithm can actually predict where a person will walk or where a person will reach on a table bizarrely well.
So, what a machine-learning algorithm can do is predict two steps in advance whether you’ll turn left or right. And it does this by tracking your medial-lateral velocity and your head turn. Now, I spent a lot of time walking through the corridors of MIT wondering if I could tell two steps in advance whether someone would turn left or right and I’m not sure. But we can do it here.
Similarly, we have the ability to track the biomechanical model of a human arm and, with about 300 or 400 milliseconds of motion, so just about this much motion, predict with 75% accuracy where a person will reach on a table within four quadrants. That is very powerful. Those are very early predictions that a robot can use to maneuver around us or to synchronize and help us.
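As a rough illustration of how this kind of early-motion prediction can work, here is a minimal sketch that trains a multi-class classifier to guess the target quadrant from roughly the first 350 milliseconds of a simulated reach; the synthetic straight-line reach model and the choice of classifier are assumptions, not the published method.

```python
# Minimal sketch of early reach-target prediction from a short motion window.
# The 350 ms window, the synthetic straight-line reach model, and the use of a
# multi-class logistic regression are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

QUADRANT_CENTERS = np.array([[0.3, 0.3], [-0.3, 0.3], [-0.3, -0.3], [0.3, -0.3]])

def simulate_reach(target, n_samples=12, noise=0.05, rng=None):
    """Hand positions over ~350 ms of a noisy reach toward `target` (meters)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(0.0, 0.35, n_samples)        # early fraction of a ~1 s reach
    pos = np.outer(t, target)                    # straight-line toy motion model
    return (pos + rng.normal(scale=noise, size=pos.shape)).ravel()

rng = np.random.default_rng(1)
X, y = [], []
for _ in range(2000):
    q = rng.integers(4)
    X.append(simulate_reach(QUADRANT_CENTERS[q], rng=rng))
    y.append(q)

# Train on the first 1500 simulated reaches, report accuracy on the rest
# (a toy number, not the 75% figure quoted in the talk).
clf = LogisticRegression(max_iter=1000).fit(X[:1500], y[:1500])
print("held-out accuracy:", clf.score(X[1500:], y[1500:]))
```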
Now, the challenge is that in some cases you need to track medial-lateral velocity and head turn. In some cases you need to track the hands, the feet. In some cases you need to track objects in the environment. And it takes a machine-learning Ph.D. to tailor your system for every new situation. And if your product changes a little bit, you have to tailor it again.
So we worked in the lab to develop data-driven approaches to be able to automatically stitch the most appropriate predictors together. So we take the machine-learning expert out of it. We feed the system data of people performing their tasks. And it weights the various classifiers and stitches them together in time to develop a very, very accurate prediction in space and time of where a person will be. So you can imagine that this is useful in a number of scenarios. We’ve deployed it in factories.
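Here is a minimal sketch of that kind of data-driven stitching, assuming a simple scheme in which each feature group gets its own classifier per observation time and the blend weights come from each classifier’s measured accuracy; the feature groups and the weighting rule are illustrative assumptions, not the lab’s system.

```python
# Minimal sketch of "stitching" several predictors together with data-driven,
# time-indexed weights. Base predictors, feature groups, and the accuracy-based
# weighting rule are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

class StitchedPredictor:
    """Blend simple predictors, each watching a different feature group,
    with weights learned separately for each observation time."""
    def __init__(self, feature_groups):
        self.feature_groups = feature_groups   # dict: name -> column indices
        self.models = {}                       # (name, t) -> fitted model
        self.weights = {}                      # t -> {name: weight}

    def fit(self, X_by_time, y):
        """X_by_time maps time index t to (n_samples, n_features) observations
        available by time t; y holds the eventual outcomes."""
        for t, X in X_by_time.items():
            accs = {}
            for name, cols in self.feature_groups.items():
                m = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
                self.models[(name, t)] = m
                accs[name] = m.score(X[:, cols], y)   # ideally a held-out score
            total = sum(accs.values())
            self.weights[t] = {n: a / total for n, a in accs.items()}
        return self

    def predict_proba(self, x, t):
        """Blend each predictor's class probabilities using the weights for time t."""
        blended = 0.0
        for name, cols in self.feature_groups.items():
            model = self.models[(name, t)]
            blended = blended + self.weights[t][name] * model.predict_proba(x[None, cols])[0]
        return blended

# Toy demo: two hypothetical feature groups ("gait", "head") at two time indices.
rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=200)
X0 = np.column_stack([y + rng.normal(0, 1.0, 200), rng.normal(size=(200, 2))])
X1 = np.column_stack([y + rng.normal(0, 0.3, 200), y + rng.normal(0, 0.5, 200),
                      rng.normal(size=200)])
sp = StitchedPredictor({"gait": [0], "head": [1, 2]}).fit({0: X0, 1: X1}, y)
print(sp.predict_proba(X1[0], t=1))
```

At run time, the robot would call predict_proba with whatever it has observed so far and the matching time index.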
So I want to show you first a video from this scenario in the automotive test factory. And I’m going to show you the current state of the art and then what the system looks like when it uses our technique for prediction and planning.
VIDEO VOICE-OVER: First, we examine the scenario using a baseline method, which emulates the standard safety systems in factories today. As the human enters the shared region, the robot is stopped. The human associate arrives at the depot with the robot nearby. Once the human leaves the shared region, the robot resumes its task. While safe, due to the large amount of time the robot is stopped, this mode does not allow for efficient space sharing. Next, we examine the same scenario, but using our approach of combining prediction and planning in time. Here, we see that as we receive predictions that the human will enter the depot, the planner commands the robot to move away from the human. Now, the human arrives at the depot with the robot waiting at a safe distance. Once the system receives predictions that the human is leaving the shared workspace, the planner commands the robot to resume its task. This anticipatory behavior was derived automatically from applying our approach.
SHAH: So the robot didn’t need to be scripted to move out of the way of the person. By integrating where the person would be in space and time and enabling the robot to plan in space and time around the person, we see these behaviors where the robot makes way for the person and sometimes you’ll see behaviors where the robot will quickly try to scoot through and get to what it’s doing. But it’s very human-like. And these are applications where every second matters and I’ll come back to that at the end of the talk.
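One simple way to picture planning in space and time around a predicted person is a search over (x, y, t) states that treats the cell the person is predicted to occupy at each timestep as an obstacle. The grid world and breadth-first search below are an illustrative toy under those assumptions, not the factory planner.

```python
# Minimal sketch of planning in space and time around a predicted human path.
# The grid world, unit-speed motions, and BFS search are illustrative assumptions;
# a factory system would use a richer motion planner.
from collections import deque

def plan(start, goal, predicted_human, grid_w, grid_h, max_t=50):
    """BFS over (x, y, t) states, avoiding the cell the human is predicted
    to occupy at each timestep. predicted_human: list of (x, y), one per t."""
    moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]   # wait in place or step
    frontier = deque([(start[0], start[1], 0, [start])])
    seen = {(start[0], start[1], 0)}
    while frontier:
        x, y, t, path = frontier.popleft()
        if (x, y) == goal:
            return path
        if t + 1 >= max_t:
            continue
        human = predicted_human[min(t + 1, len(predicted_human) - 1)]
        for dx, dy in moves:
            nx, ny = x + dx, y + dy
            if 0 <= nx < grid_w and 0 <= ny < grid_h and (nx, ny) != human:
                state = (nx, ny, t + 1)
                if state not in seen:
                    seen.add(state)
                    frontier.append((nx, ny, t + 1, path + [(nx, ny)]))
    return None

# The human is predicted to walk left to right along row 2; the robot must cross
# that row, so the shortest plan waits or sidesteps for one timestep to avoid them.
human_path = [(i, 2) for i in range(2, 10)]
print(plan(start=(4, 0), goal=(4, 4), predicted_human=human_path, grid_w=10, grid_h=5))
```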
Now, predicting in space and time where you’ll be is useful. Predicting at a high level what will happen in the future is useful. But, ultimately, to work in real time with a person, we need to do something even more. We need to be able to understand, in the moment, what activity the person is doing, what the meaningful activity is that they’re doing. We need to know exactly when it starts and exactly when it ends. And only with that information can we really predict what will happen a few steps down the road.
And so we take a similar approach. We developed data-driven methods for providing information about the task. We take the machine-learning expert out of it. And the system learns over time what are the key features to track. What parts to track in building the car? Does it track the hands? Does it track the head of the person? And we apply a similar approach to being able to monitor in a very fine-grained way when activities start and end.
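As a rough sketch of online activity recognition with explicit starts and ends, one can run a per-frame classifier and smooth its output, reporting a start or end whenever the smoothed label changes; the features, window size, and classifier below are illustrative assumptions, not the deployed system.

```python
# Minimal sketch of online activity recognition with start/end detection:
# a per-frame classifier plus majority-vote smoothing over a short window.
import numpy as np
from collections import deque, Counter
from sklearn.ensemble import RandomForestClassifier

class OnlineActivityMonitor:
    """Reports the frames at which each activity starts and ends."""
    def __init__(self, model, window=5):
        self.model = model
        self.recent = deque(maxlen=window)   # recent per-frame predictions
        self.current = None                  # currently active label
        self.frame = 0

    def update(self, features):
        """Feed one frame of features; print start/end events as detected."""
        label = self.model.predict(features[None, :])[0]
        self.recent.append(label)
        smoothed = Counter(self.recent).most_common(1)[0][0]
        if smoothed != self.current:
            if self.current is not None:
                print(f"frame {self.frame}: '{self.current}' ended")
            print(f"frame {self.frame}: '{smoothed}' started")
            self.current = smoothed
        self.frame += 1
        return smoothed

# Toy demo: two hypothetical activities distinguished by synthetic "hand position"
# features; in a real system the features would come from perception.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.1, (200, 3)), rng.normal(1.0, 0.1, (200, 3))])
y = np.array(["reach for meter"] * 200 + ["install meter"] * 200)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

monitor = OnlineActivityMonitor(clf)
stream = np.vstack([rng.normal(0.0, 0.1, (30, 3)), rng.normal(1.0, 0.1, (30, 3))])
for frame_features in stream:
    monitor.update(frame_features)
```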
So, this can now be used for deploying a robot that’s truly collaborative, one that does more than avoid us and can provide the right material at the right time. So you’ll see on the top here, we have a timeline of the various activities involved in assembling a part of a dashboard. The person here, the human, is going to work with this robot right next to them; it’s a mobile robot. The dashboard is here, circled in orange. And there’s a meter to install into that dashboard from one of those blue boxes, and there’s a NAV unit to install into that dashboard from one of the other blue boxes.
And the way this would work today is the associate would walk back and forth from those blue boxes to pick up the next piece and install it. And so you’ll see the robot using its online activity recognition system and prediction system to offer the right materials at the right time.
And so one thing to note, and I always like to point this out because this is important for roboticists, is that this video is sped up two times. Our activity recognition system works just about within milliseconds, and the robot can move very, very fast. But we ultimately need approval for conducting these experiments, and so, as we’re building up the capability of our system, these robots move more slowly. So you’ll see the video is sped up and there is waiting time on the person’s side as the robot offers the material. But this is primarily for us to build confidence in the collaboration and to slowly increase the speed of this robot to its full capability.
You can see the timeline up top. The robot’s identified that the person moved to the meter. It’s collecting the meter for the person. The robot then needs to identify where the person is, generate a handover to that person. And that handover process is very subtle. There’s actually an enormous amount of study in the robotics community in how you design a robot to recognize the subtle signals of when to let go to avoid dropping a part.
And so the robot predicts the piece, the instrument, that the person will require next, goes to retrieve it from the other bin, and offers it to the person.
Okay, so this is a slow video. But our experiments, both in this real setting and in experiments that we do with elbow-to-elbow collaboration between people and industrial robots like you see in this picture, bear out that a robot that can do these small things, anticipating where you’ll be and what you need and using that information to re-plan, can reduce the amount of time it takes to perform the task, increase concurrent motion between the person and the robot, reduce human idle time, and reduce robot idle time.
And there’s a sense in safety-critical domains that you often have to trade something for efficiency. And, in this case, you might think you need to trade safety for efficiency: the robot needs to move faster around you, it needs to move closer to you. But we gain all of those benefits in efficiency or productivity with an increased average separation distance between the person and the robot. So we’re not trading safety for efficiency. By making a robot that’s smarter, that understands us better, we can achieve the best of both worlds: a more efficient collaboration and improved safety.
Our studies show that, based on these measures across these domains, it’s possible that we could, say in an automotive factory, save something like three minutes out of every hour in building a car. Okay, what does that mean? Those three minutes are worth something like $80K. So what does that mean over a two-shift day? That’s about a million dollars. What does that mean over the course of a month? About $30 million. That’s big money for basically saving seconds here and there. But this is the promise of a robot that can collaborate with us seamlessly.
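For reference, here is the back-of-the-envelope arithmetic behind those figures, assuming the three minutes saved per hour are worth roughly $80K, a two-shift (about 16-hour) day, and about 24 working days a month; these inputs are taken from the talk’s rough numbers, not audited data.

```python
# Back-of-the-envelope check of the savings figures quoted above.
# All inputs are illustrative assumptions drawn from the talk's rough numbers.
value_per_hour = 80_000          # dollars, for 3 minutes saved each hour of line time
hours_per_day = 16               # two-shift day
working_days_per_month = 24

per_day = value_per_hour * hours_per_day
per_month = per_day * working_days_per_month
print(f"per day:   ${per_day:,}")     # ~ $1.3M, "about a million dollars"
print(f"per month: ${per_month:,}")   # ~ $30M
```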
Now, if you were looking at the trivia at the beginning, does anybody remember how many industrial robots are in use worldwide today? Go for it: 1.8 million, nicely done. Okay, how many robots are in our homes, domestic and service robots? Thirty million. Okay, and that’s not including the robots that are in our office complexes, in our work environments. It’s not including connected systems like Alexa or Google Home. We have enormous potential for these systems to understand us better and optimize our lives the way they can optimize factories. And this is what we’re working on in the lab. The days in which robots are separated from us, behind cages, are over. And we’re working towards a future in which we all want to hug our potentially dangerous industrial robots, because they make us the most effective that we can possibly be.
And with that, I can wrap up. Thank you.
[APPLAUSE]
Machine learning uses algorithms to create computers capable of learning tasks on their own rather than simply following their programming. These new technologies mimic the way human beings learn on the job, and they’re the future of robotics. Julie Shah of MIT explores how the next generation of robots will be able to work side by side with humans naturally, efficiently, and safely.
Hear the full talk on the future of robotics on the Science@AMNH podcast.