The DMV is proposing to let the companies that are developing self-driving cars use California's streets and highways as a testbed for vehicles not only with no backup driver but with no steering wheel or similar controls. Any problems requiring human intervention would be handled remotely, from the companies' offices.(foot#2) There are 27 companies known to be working on such vehicles--21 actively testing in California--and conflicting behaviors should be expected. The "Rules of the Road" are incomplete guidelines that are often ambiguous to human drivers, and they lack the specificity needed by automated systems. Consequently, each company is likely to develop its own version of what the right thing to do is, and each is virtually certain to have incomplete coverage.(foot#3) The problem in the Ethernet specification is a cautionary example:(foot#1) The number of situations there was minuscule compared to driving, and it had been subjected to close examination by a wide range of people for over a decade.
I will not be discussing the technology itself, but rather focusing on the issues of being such a testbed. Nor will I be advocating for specific requirements related to the testbed, but rather providing some background to help you understand some of the tradeoffs between risks and benefits. The public input period is announced to begin in April, with rules being announced later this year.
I see three basic issues with the DMV's current approach/outline. First, it fails to take into account that there will be failures in remote control. Second, the companies will "self-certify" that their vehicles are safe-enough, and cities will not have the ability to control access to their streets, either in total or in specific areas (such as near elementary schools): The DMV removed the requirement for an ordinance or formal resolution by the city and currently only requires the company to have a "statement of support" in the application. Notice that this does not enable the city to withdraw permission, for example, if there are too many dangerous events. Third, there is no provision for compensation/benefits to the cities and citizenry for being used as guinea pigs. The basic purpose of a testbed is to find out what works and what doesn't, and there inevitably will be big surprises (= failures).
----1. Remote Control----
There are three basic components to remote control:
- the remote human operators who can intervene as needed,
- the network between that remote operator and the car, and
- the on-board (software) agent that enables remote monitoring and human intervention.
The human monitor: Part of the push for fully automated vehicles is that expecting a human in the car to intervene when needed is highly problematic: Such drivers are lulled into inattention, it takes many seconds for them to become engaged, and they make errors that a driver in a non-automated vehicle probably wouldn't. This is a well-known and studied problem for commercial airliners. However, there are significant differences between the demands for airliners and those for cars. Presumably, this is part of what is to be studied and refined as part of the testbed.
Network: Although this is not specified, the presumption is that remote control will be through the cellular network. Most of us probably have experienced problems when driving on major arteries during peak hours: There are times and places where the breaking up of voice conversations makes them annoying to impossible. Remember that remote control is likely to require substantial bandwidth to get enough sensor data to the remote operator.
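To give a feel for "substantial bandwidth", here is a back-of-envelope calculation. Every number in it is an assumption chosen for illustration (camera count, resolution, frame rate, compression), not a measurement from any company's system:

```python
# Back-of-envelope estimate of the uplink bandwidth a remote operator
# might need from one vehicle. All figures are illustrative assumptions.

CAMERAS = 4                 # assumed forward/rear/side views for the operator
RESOLUTION = (1280, 720)    # assumed pixels per camera
FPS = 15                    # assumed frame rate, already reduced for teleoperation
BITS_PER_PIXEL = 0.1        # assumed after aggressive video compression

video_bps = CAMERAS * RESOLUTION[0] * RESOLUTION[1] * FPS * BITS_PER_PIXEL
telemetry_bps = 500_000     # assumed lidar/radar summaries, state, diagnostics

total_mbps = (video_bps + telemetry_bps) / 1e6
print(f"Estimated uplink: {total_mbps:.1f} Mbps per vehicle")  # ~6.0 Mbps
```

Even at these deliberately modest settings, a single vehicle would need several Mbps of sustained uplink on a cell sector that, during peak hours, is already shared with hundreds of phones.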
Step this up one level to intense congestion, such as the gridlock (aka "Carmageddon") that occurred on December 1.(foot#4) The cellular network is additionally overloaded during such events by drivers calling people to say they will be late and downloading maps and directions to try to find alternate routes. The chaos of such events may well create situations unanticipated by the onboard systems and thus requiring remote intervention.
The next step up is a situation analogous to the Oakland Hills Fire. Some of the evacuees had to drive through dense smoke, with fire on both sides of the narrow roads, and there were many narrow escapes. Think how many more could have died if a self-driving car had become "confused" and blocked a key escape route. Recognize that the proposal is to allow cars with no steering wheel or accelerator, leaving humans no way to take over in an emergency. One of the constants of over six decades of work on Artificial Intelligence has been the frequency with which unanticipated situations fatally "confuse" the system, for example because the training and testing had unrecognized assumptions and biases.
Finally, recognize that the cellular network is not robust in the event of electrical power failures. Most cell towers have backups--generators and batteries--good for only 4-6 hours, with no plan for refueling or recharging on an emergency basis. Palo Alto experienced this during the February 2010 plane crash that knocked out electric power(foot#5): Cell service declined throughout the day as the towers shut down. However, many in surrounding cities were unaware of this. In a similar situation, a self-driving car could enter the blacked-out zone, and the remote monitor would first become aware of the problem only after it lost contact with the vehicle.
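A minimal sketch of what loss-of-contact detection might look like on the monitoring side (the class, heartbeat interval and grace period are my illustrative assumptions, not anyone's actual design):

```python
import time

# The vehicle sends periodic heartbeats; the monitor can only infer a dead
# zone *after* heartbeats stop -- it gets no advance warning of a blackout.

HEARTBEAT_INTERVAL = 1.0   # seconds; assumed
GRACE_PERIOD = 5.0         # missed-heartbeat window before declaring loss; assumed

class VehicleLink:
    def __init__(self, vehicle_id):
        self.vehicle_id = vehicle_id
        self.last_heartbeat = time.monotonic()

    def on_heartbeat(self):
        self.last_heartbeat = time.monotonic()

    def contact_lost(self):
        return time.monotonic() - self.last_heartbeat > GRACE_PERIOD

def monitor(links):
    for link in links:
        if link.contact_lost():
            # By now the vehicle may already be deep inside the dead zone,
            # operating with no possibility of remote intervention.
            print(f"ALERT: lost contact with {link.vehicle_id}")
```

The point of the sketch: the monitor's knowledge is inherently after-the-fact. By the time contact_lost() returns true, the car is already beyond the reach of remote intervention.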
On-board agent: As a system matures, the remaining bugs increasingly arise from situations that are unanticipated, surprising and even bizarre. Some of these simultaneously trigger bugs in the monitoring software, either blinding it to the problem (the opening example) or locking it out of the system (for example, in Windows the CTRL-ALT-DEL key combination sometimes fails to get the attention of the OS).
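The standard defense is an independent watchdog: the control loop must "pet" the watchdog regularly, and a hang in that loop (which would also hang any monitoring code running inside it) still gets caught because the watchdog runs separately. A minimal sketch, with names and timeouts as illustrative assumptions:

```python
import threading, time

class Watchdog:
    """Fires if the monitored loop stops petting it within the timeout."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_pet = time.monotonic()
        threading.Thread(target=self._watch, daemon=True).start()

    def pet(self):
        self.last_pet = time.monotonic()

    def _watch(self):
        while True:
            time.sleep(self.timeout / 4)
            if time.monotonic() - self.last_pet > self.timeout:
                # e.g., command a controlled stop via an independent channel
                print("WATCHDOG: control loop unresponsive, failing safe")
                return

def control_loop(watchdog):
    while True:
        watchdog.pet()
        # ... perception, planning, actuation ...
        time.sleep(0.1)
```

Note that even this pattern shares the CTRL-ALT-DEL failure mode: if the fault also takes down the watchdog or its independent channel, the vehicle is on its own.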
Background: The self-driving cars from different companies require different rates of human intervention, and this affects the risk assessment of problems with remote monitoring. Google/Waymo reports one intervention/disengagement per 5000 miles,(foot#6) whereas Uber reportedly requires more than one intervention per mile in a very unchallenging environment.(foot#7) An older report (for 2015) provides an additional feel for the different levels of development at some of the companies.(foot#8)
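Simple arithmetic shows why these rates matter to remote monitoring. The intervention rates are the reported figures above; the fleet size and daily mileage are assumptions for illustration:

```python
# Rough arithmetic on the reported disengagement rates cited above.

MILES_PER_DAY = 100          # assumed per-vehicle daily mileage
FLEET = 50                   # assumed number of deployed vehicles

rates = {
    "Waymo (reported)": 1 / 5000,   # interventions per mile
    "Uber (reported)":  1 / 1,      # "more than one per mile"; use 1
}

for name, per_mile in rates.items():
    per_day = per_mile * MILES_PER_DAY * FLEET
    print(f"{name}: ~{per_day:.0f} remote interventions per fleet-day")
# Waymo: ~1 per day across the whole fleet; Uber: ~5000 per day --
# a qualitatively different remote-operations problem.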
----2. Self-Certification----
There are many potential sources of problems with self-certification. Some come from cutting corners to meet the schedule, or even outright cheating. Some come from a poor corporate culture. Some come from companies trying to do a proper assessment, but with inadequate checks on optimism and enthusiasm. And some are the timeless problems of developing complex systems.
Corporate Schedules: In the first Space Shuttle disaster (Challenger, 1986), one of the contributing factors in the decision not to delay the launch was that it was to be cited in the State of the Union Address that night. An old joke in the software industry is "What do you call a bug we don't have time to fix? A feature." Other pressures include investor events and bonus/stock option grants.
The current high-profile exemplar of a company with a reputation for substantially sacrificing quality to meet deadlines is Uber, which places high priority on being first to release.(foot#9)
Corporate Culture: Companies like Uber are but extreme examples of a trend that emphasizes pushing out products to the market with a consequent de-emphasis on testing and other measures to find bugs. For most of its existence--through 2014--Facebook had the guiding philosophy of "Move fast and break things" and this was notable only because Facebook was more open about this than many similar companies. These priorities became dominant during the Dot-Com boom, and thus many of today's developers not only came up under this culture, but had teachers and managers whose experiences were influenced by this culture.(foot#10)(foot#11)(foot#12)
Underappreciated messiness of the real world: As part of their stories of making progress, several companies have recounted similar versions of their first attempt to get onto a highway: It required human intervention because their car wasn't allowed to go faster than the speed limit, but the traffic was.(foot#13) However, my response was to wonder why, oh why, they hadn't considered allowing extra speed if necessary to avoid a collision (when and how to allow a robot to break the rules has been actively debated in robotics for many years, and this particular situation is part of that ongoing debate). I was flabbergasted when one of the companies said that it would avoid this in the future by allowing its vehicles to exceed the speed limit by 10 mph. First question: What if the traffic is going 15 mph faster? Second question: Suppose your map incorrectly lists the speed limit as 55 mph, but it is actually 65 mph and traffic is doing only slightly over that limit?
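The brittleness of the "10 mph over" rule is easy to see in code. This sketch is mine, not any company's actual logic; the function names and the 20 mph cap in the alternative are illustrative assumptions:

```python
def target_speed_fixed_margin(map_speed_limit_mph):
    # The reported fix: never exceed the map's limit by more than 10 mph.
    # Fails if traffic is 15 mph faster, or if the map's limit is wrong.
    return map_speed_limit_mph + 10

def target_speed_flow_aware(map_speed_limit_mph, observed_traffic_mph,
                            hard_cap_over_limit=20):
    # Alternative: track the surrounding traffic, with an absolute cap.
    # Note that this still inherits any error in the map's speed limit.
    return min(observed_traffic_mph,
               map_speed_limit_mph + hard_cap_over_limit)

# The failure case from the text: map says 55, actual limit is 65,
# traffic is doing 68.
print(target_speed_fixed_margin(55))    # 65 -- still below traffic flow
print(target_speed_flow_aware(55, 68))  # 68 -- keeps pace, but only because
                                        # the cap happens to allow it
```

Neither rule is safe against bad map data; the difference is only in how gracefully each degrades, which is exactly the kind of thing a testbed should be probing.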
There are many different approaches within Artificial Intelligence research. One involves developing smarter and smarter algorithms. Another regards such universal tools as too expensive and complex, and instead develops a toolbox where the individual tools are specialized for portions of the problem. This transfers the difficulty, complexity and intelligence to selecting the appropriate tools and to determining when the selected tool isn't working and you need to switch to a different one. Most sophisticated AI systems are a combination of the two approaches, but as the individual tools become smarter and more capable, determining when to switch away from them becomes harder. Because the individual tools can handle so many (normal) situations themselves, it becomes increasingly difficult to find situations that necessitate a switch and thereby allow testing the transition.
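A minimal sketch of the toolbox approach, with handler names and confidence scores invented for illustration. The interesting line is not the tool selection but the confidence floor: recognizing that no tool applies is the genuinely hard problem.

```python
# Specialized handlers plus the hard part: deciding when the current
# tool is no longer working. All names and scores are illustrative.

def highway_driver(scene):
    return ("highway_plan", scene.get("lane_confidence", 0.0))

def parking_lot_driver(scene):
    return ("parking_plan", scene.get("lot_confidence", 0.0))

def fallback_stop(scene):
    return ("controlled_stop", 1.0)

TOOLS = [highway_driver, parking_lot_driver]
CONFIDENCE_FLOOR = 0.6   # assumed threshold for "this tool is working"

def select_plan(scene):
    best_plan, best_conf = None, 0.0
    for tool in TOOLS:
        plan, conf = tool(scene)
        if conf > best_conf:
            best_plan, best_conf = plan, conf
    if best_conf < CONFIDENCE_FLOOR:
        # The genuinely hard problem: recognizing that *no* tool applies.
        return fallback_stop(scene)[0]
    return best_plan

print(select_plan({"lane_confidence": 0.9}))   # highway_plan
print(select_plan({}))                         # controlled_stop
```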
In addition to a plethora of exceptions, systems working in the real world have to deal with bad data, unavailable data, and especially the unexpected unavailability of important data.(foot#14) "A man's got to know his limitations" is a key line from the "Dirty Harry" sequel "Magnum Force". The same applies to AI systems.
Unrecognized or unavoidable bias in data: Most real-world situations have so many variables that identifying them all, much less controlling them, results in some biases sneaking into even the most carefully constructed training data. An example from the earliest days of machine learning involved attempting to flag aerial photos that contained tanks. The system did fantastically well on the first test and failed miserably on all the subsequent tests. The problem was that the system had trained itself to look for an irrelevant variable. The training data came from a random sample of two sets of photos of the exact same area, one with tanks and the other without. The first test came from a sample of the remainder of those photos (proper procedure). The variable not accounted for was the weather: One set was taken when it was overcast and the other when it was sunny. And that was what the system taught itself to look for.
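Here is a toy reconstruction of that failure in code. Everything in it is a deliberately simplified stand-in (two numeric "features", a simple threshold in place of a real learner), but it shows the mechanism: when the confound is removed, accuracy collapses from near-perfect to near-zero.

```python
import numpy as np

# The label (tank vs. no tank) is perfectly confounded with brightness
# (sunny vs. overcast) in the training photos, so the "learned" rule keys
# on the weather instead of the tanks.
rng = np.random.default_rng(0)

def make_photos(n, tank, sunny):
    brightness = rng.normal(0.8 if sunny else 0.3, 0.05, n)
    tank_signal = rng.normal(0.1 if tank else 0.0, 0.2, n)  # weak, noisy
    return np.column_stack([brightness, tank_signal]), np.full(n, tank)

# Collection: ALL tank photos taken sunny, ALL non-tank photos overcast.
X_tank, y_tank = make_photos(500, tank=True,  sunny=True)
X_none, y_none = make_photos(500, tank=False, sunny=False)
X = np.vstack([X_tank, X_none]); y = np.concatenate([y_tank, y_none])

# Stand-in for training: a threshold on the feature that separates best,
# which here is brightness -- i.e., the weather.
threshold = X[:, 0].mean()
predict = lambda X: X[:, 0] > threshold

print("Same-weather test accuracy:", (predict(X) == y).mean())   # ~1.0

# New photos where weather no longer tracks tanks: accuracy collapses.
X2_tank, y2_tank = make_photos(500, tank=True,  sunny=False)
X2_none, y2_none = make_photos(500, tank=False, sunny=True)
X2 = np.vstack([X2_tank, X2_none]); y2 = np.concatenate([y2_tank, y2_none])
print("New-weather test accuracy:", (predict(X2) == y2).mean())  # ~0.0
```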
The problem of a relevant variable not being adequately represented can also occur. Most of the testing of self-driving cars in this area has occurred during drought years, and this year's rains were not of the intensity of the storms we experienced in the 1980s and 1990s. Those storms produced rainfall rates that might interfere with the sensors. Street flooding also occurred when the storm drains shut down because there was no room in the creeks for that water (better to have shallow water in many streets than cause a flood from a creek).
Another example of bias that is difficult to avoid: Google/Waymo's cars in this area are easily spotted and other drivers give them attention and a cushion that they don't give to human-driven cars. This could be a significant bias: When I lived in Michigan (1970s), a news tidbit was that police officers paid substantially more for auto insurance. The explanation was that they had a much higher rate of fender-benders in their personal vehicles because they were accustomed to other drivers giving their patrol cars a wide berth.
Mistaking self-driving cars being better at some aspects of driving for their being better overall: Some of the self-driving cars are far better than humans at responding to certain developing situations: Their sensors provide better coverage, and their computing power allows them to more quickly anticipate and respond to developing problems.(foot#15) However, they are bad at quickly inferring the rules for certain situations, such as the customized choreography for drop-off at an elementary school or the chaos of curbside drop-off and pick-up at an airport.
There is a range of other data that humans use that is currently beyond the capabilities of self-driving cars. For example, seeing where other drivers are looking: Do they see you? What is their likely move? ... Or profiling the drivers around you: on a cell phone, an aggressive lane-changer, ...
Fixing problems: The problem of Google/Waymo cars being "road boulders" on El Camino and similar arterials is a long-standing one. I can find published complaints of these cars doing under 25 mph in a 35 mph zone going back over a year. Despite this, the problem persists--I encountered it (again) just last month. This seems to be a problem with a trivial fix--correcting data in the map database--and it should have generated reports from the engineers monitoring those cars: Having a constant stream of cars changing lanes to get around you is not something that is easily missed. My suspicion: This wasn't "sexy" and urgent enough to ever get to the front of the queue. It would be troubling if this was hard to fix, because that would imply that places with reduced speed limits would also not be part of the database. Having a self-driving car speed through such an area would be dangerous, not just inconvenient.
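A sanity check that would flag the "road boulder" symptom is not hard to sketch. Waymo's actual pipeline is proprietary, so the class, thresholds and window size below are purely illustrative:

```python
from collections import deque

# If observed traffic consistently runs far above the map's speed limit,
# the map entry is probably stale and should generate a report.

WINDOW = 200            # recent speed observations to keep; assumed
SUSPICION_MARGIN = 8    # mph above map limit suggesting a bad entry; assumed

class SpeedLimitAuditor:
    def __init__(self, map_limit_mph):
        self.map_limit = map_limit_mph
        self.observed = deque(maxlen=WINDOW)

    def record_passing_traffic(self, speed_mph):
        self.observed.append(speed_mph)

    def map_entry_suspect(self):
        if len(self.observed) < WINDOW:
            return False
        median = sorted(self.observed)[WINDOW // 2]
        return median > self.map_limit + SUSPICION_MARGIN

auditor = SpeedLimitAuditor(map_limit_mph=25)   # map says 25 in a 35 zone
for _ in range(WINDOW):
    auditor.record_passing_traffic(35)
print(auditor.map_entry_suspect())   # True -> file a map-correction report
```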
When to fix a bug can be a very complicated decision: You have to worry that fixing one problem might trigger other problems. Because of the cost and time required for testing, you can be forced to accumulate a number of bug fixes and test them as a group.
Undue optimism: Numerous organizational psychology experiments have found that optimistic people are much more likely to be promoted (than those who are concerned about the risk/reward balance). This may be a feature of all humans, not just organizations: Psychology experiments have found that clinically depressed people tend to be more accurate in predicting outcomes than normal people. Yet other studies find that while the leadership of successful groups tends to be dominated by optimism, there are some influential people who are skeptical and cautious.
Note: All this should be treated skeptically because there have been many unsuccessful attempts to repeat the results of prominent psychology experiments.
The problem of benefits becoming exaggerated and risks/problems being minimized has been so common for so long that there are numerous pithy renderings, for example, "Information Flow in the Software Industry". Examples: Microsoft's widely reviled Windows Vista (nagging) and Windows 8 (it presumed hardware that virtually no one had). Although companies try to cover up what happened ("blame the innocent; reward the guilty"), some detailed analyses have become public. One such addressed Release 5.1 of the Silicon Graphics (SGI) operating system ("Stress Analysis of a Software Project" in The Risks Digest, Volume 15 Issue 80, 1994-05-28).(foot#16)
Sometimes the problem is the reverse: junior developers who have too much enthusiasm for--and faith in--their technology and who appreciate neither the messiness of the real world nor the escalating complexity and problems that come with larger and larger systems. It is not just that this lack of awareness causes them to not program and test for such situations; some dismiss, or even actively resist, considering them (someone who is old enough to have experience is dismissed as "obsolete" or out-of-touch with current technology).
----3. Compensation to the community----
One of the reasons for the proliferation of "public beta tests" (declared or not) is that software companies had increasing difficulty finding potential customers who were willing to do a serious beta test, that is, anything beyond seeing if the software installed. But the companies themselves were responsible for the dearth of volunteers: They didn't recognize the often substantial costs of a beta test and rarely provided any real benefits for participation. It was just better to wait for the x.1 release.
Being a testbed for driverless self-driving cars is being promoted as an unalloyed benefit for the communities. Unsurprisingly, I have seen no attempt to identify the costs and risks as a step in determining the appropriate level of benefits to be conferred on the communities serving as testbeds.
As a method for determining what you regard as a reasonable balance between risks/costs and benefits, you might start with a question such as "Would I be comfortable with driverless cars from 20 different companies, including scandal-plagued Uber, delivering children to my child's elementary school?" If not, work back to find a lesser situation that you might be comfortable with.
The problem that the public faces in advocating for rules to protect our interests is that we know too little about the details of the technology, partly because it is rapidly changing and partly because it is proprietary to the companies. There is the technology itself, the evolving implementation, its performance (strengths and weaknesses), and the level of its testing for the various situations in which it will be used.
And please remember that this is a discussion about being a testbed for near-term versions of the technology, and not the potential of a perfected technology.
1. Ethernet problem: Summary: "Performance problems on high utilization Ethernets", Usenet posting by Wes Irish, 1993-10-16.
Full paper: "Investigations into observed performance problems on high utilization Ethernet networks" (it was announced, but a quick look didn't find where/if it was published).
2. "DMV signals for self-driving cars to go driverless", Palo Alto Online, 2017-03-15.
DMV announcement of 2017-03-10: "Deployment of Autonomous Vehicles for Public Operation"
3. Interaction of autonomous vehicles from different groups: I don't know whether the following is a parable or "inspired by actual events". A group tasked with developing a device to sample deep subsoils on a distant planet decided that the most efficient approach would be to have two devices: one to dig the hole and collect the samples, and one to move the soil away from the hole to minimize the energy requirement for lifting the dirt. The development of the second device also focused on minimizing energy use, trading off the distance it traveled against the energy expended to dispose of the soil, for example, being able to slide the soil into a depression instead of lifting it onto a pile. When the time came to test the devices together, the test site was similar to pictures of Mars: a very flat, wind-swept plain with protruding rocks. So what did the dirt-mover do? It looked around, found only one depression--which was conveniently close by--and proceeded to dump the dirt into it. Problem: That "depression" was the hole being dug.
4. "Gridlock frustrates local drivers and residents", Palo Alto Online, 2016-12-16.
5. "Fatal plane crash causes major outage", Palo Alto Online, 2010-02-17.
6. "Waymo's self-driving cars need less driver intervention", USA Today, 2017-01-13. Appears to be a modest rewrite of a Waymo press release -- one that I couldn't easily find.
7. "Internal Metrics Show How Often Uber's Self-Driving Cars Need Human Help", BuzzFeed, 2017-03-16. This is admittedly not a high quality source, but is cited to show the vast differences in the capabilities of the various companies.
8. "Google's self-driving cars would've hit something 13 times if not for humans" - The Verge, 2016-01-13.
9. A recent article on this much-covered topic is "Uber's 'hustle-oriented' culture becomes a black mark on employees' resumes", The Guardian, 2017-03-07.
10. Example of de-emphasis of testing: I was in a company where my software utilized a package written by someone who was regarded as a "genius". That software was supposed to handle tens of thousands of data points, but my tests involving only hundreds found that it produced blindingly obvious garbage, and its performance was painfully slow. In debugging that package--the author couldn't/wouldn't find the time--I found that it started producing garbage when the dataset size hit 128 (overflowing a 7-bit number). Obviously he had tested it with no more than a toy dataset.
11. Example: A company I worked for was close to releasing a product. One of the QA engineers was nervous about the test data being used and approached me for access to a huge dataset that for some time I had been offering to get. The QA engineer told me that in just the first hour of testing with this dataset, five "show-stopper" bugs were found.
12. The problems related to "geniuses" in the software industry rarely make it into print except in relation to their negative impact on diversity (recently this was part of "Why is Silicon Valley so awful to women?", The Atlantic, 2017-04). These problems include undue deference and forbearance of shortcuts and serious deficiencies in their code.
- Example: While looking for performance problems in a package widely used for automated reasoning, I found the comment "Too bad about the general case".
- Example: In a system, the sole comment on the section meant to keep the user from being locked out was "You are not supposed to understand this". Two problems. First, how is this crucial section to be subject to code review if the author won't explain it? Second, that the author hasn't put in the effort to be able to explain it is a big red flag that there may well be problems in the logic.
- Example: In looking through the code of another very widely used software system, I found so many obvious basic errors that it was clear that there had been little or no code review, even by the author himself. These included errors, such as off-by-one, that would have gotten a "Programming 101" assignment marked down.
13. Example coverage of merging onto a highway: "Humans Are Slamming Into Driverless Cars and Exposing a Key Flaw" - Bloomberg, 2015-12-18.
14. Unexpected unavailability of important data: In the mid 1990s at SRI's Artificial Intelligence Center, my office was close to those of the robot group and I was routinely drafted to provide an obstacle or clutter for tests of the robot. One scenario involved the robot taking a shortcut between two hallways by going through a conference room. This demonstrated the robot's ability to navigate around the changeable clutter in the room (chairs, tables, carts and sometimes people) and the robot's ability to locate the door on the opposite side of the room and go directly to it rather than hugging the wall. During the official demo, the conference room was packed with people, so many that they blocked the robot's sensors from getting a fix on the far walls, and thus it couldn't locate the door with confidence. It dropped back into hugging the walls. Unfortunately, its map of the room didn't include a coffee hutch that had recently been built into the corner of the room. In testing, the robot had regarded this as a normal obstacle, such as a box, and gone around it. But again, the mass of people prevented the robot from building itself enough of a map to make this maneuver. Thus stymied, it dropped back into asking that the obstacle--the coffee hutch--be moved out of its way.
These sorts of cautionary stories should be part of the education of developers, but are too often crowded out by a focus on what works. This is not new: In the 1960s and 1970s I heard many stories of electrical and mechanical engineers who used mathematical formulas ignoring the constraints. One story was of an airplane auto-pilot that under certain conditions would fly in a spiral instead of a straight line.
Aside: This robot--"Flakey"--was the subject of Scientific American Frontiers with Alan Alda, Season 5, Episode 1, "Life's Big Questions", starting at 40:05.
15. Better at anticipating a collision:
- "Tesla Autopilot Predicts crash and avoids before it happens" - YouTube.
- (simulation) "2017 Subaru Impreza Commercial" - YouTube.
16. Undue optimism/inadequate attention to warnings: The body of the cited posting is a memo entitled "Software Usability II" of 1993-10-05 by Tom Davis.
An abbreviated index by topic and chronologically is available.
----Boilerplate on Commenting----
The Guidelines for comments on this blog are different from those on Town Square Forums. I am attempting to foster more civility and substantive comments by deleting violations of the guidelines.
I am particularly strict about misrepresenting what others have said (me or other commenters). If I judge your comment as likely to provoke a response of "That is not what was said", do not be surprised to have it deleted. My primary goal is to avoid unnecessary and undesirable back-and-forth, but such misrepresentations also indicate that the author is unwilling/unable to participate in a meaningful, respectful conversation on the topic.
If you behave like a Troll, do not waste your time protesting when you get treated like one.