What Is Computer Vision and How Does Law Enforcement Use It?
Computer vision is the art of training computers to understand visual data like photographs and videos. It is part of familiar technologies, such as UPC barcode readers, and exotic ones, like self-driving cars. Government agencies and law enforcement are adapting computer vision technologies for their own purposes: automated license plate readers, social media dragnets, and mass surveillance operations all use computer vision. Integrating computer vision output with private and government databases allows an unprecedented level of surveillance. Although law enforcement uses other forms of computer vision for forensic purposes, such as AFIS or NIBIN, this article focuses on deep-learning computer vision software used to detect and identify people or objects in videos and images.
How do computers recognize images and video?1
Computers require software to perform any useful task. Traditionally, developers explicitly programmed software to interpret input, which meant programmers had to understand the important relationships within the data. Machine learning is a powerful set of artificial intelligence tools that help computers uncover relationships in data without explicit instructions. A real estate software developer knows that a house’s sale price relates to its zip code, square footage, schools, and year of construction, but might not know exactly how those factors combine to determine value. Machine learning can work out the relationship between the relevant factors and predict a house’s value using only historical sales data.
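To make this concrete, the sketch below uses the open-source scikit-learn library to learn a pricing rule from a handful of historical sales. The feature values, prices, and the choice of a simple linear model are illustrative assumptions, not any vendor’s actual method.

```python
# A minimal sketch of machine learning on housing data.
# Requires scikit-learn: pip install scikit-learn
from sklearn.linear_model import LinearRegression

# Historical sales: [square footage, year built, school rating 1-10]
features = [
    [1500, 1995, 6],
    [2200, 2005, 8],
    [1100, 1970, 4],
    [2800, 2015, 9],
]
sale_prices = [210_000, 340_000, 140_000, 465_000]

# The software works out how the features relate to price on its own;
# no programmer writes an explicit pricing rule.
model = LinearRegression().fit(features, sale_prices)

# Predict the value of an unseen house from its features alone.
print(model.predict([[1800, 2000, 7]]))
```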
Machine Learning and Deep Learning
Machine learning traditionally requires a person to identify the important factors (features) needed to understand the relationships in a set of data. For example, a medical software developer who wanted to predict which patients were at the greatest risk of a heart attack would first need to know which features increase that risk. Deep learning is a refinement of machine learning that allows the computer not only to learn the relationships between features but also to determine which features matter in the first place. A person might understand that a stop sign is a red octagon with a white border and white lettering that says “STOP,” but computers do not understand language or images the way a human being does. A deep learning system can train itself on many pre-categorized images, developing a method to recognize a stop sign by comparing images that contain stop signs to images that do not. By repeating this training process, the system continues to improve its ability to detect stop signs without any human input.
Figure 1: How the terms Artificial Intelligence, Machine Learning, Deep Learning, Neural Network, and Convolutional Neural Network relate.
How Information Flows Through a Neural Network
To design software capable of unassisted learning, researchers took nerve cells (neurons) as their inspiration. A neural network is an interconnected network of individual units called neurons or perceptrons (see Figure 2). Neurons receive input, produce output, and communicate with other neurons. Neurons are organized into distinct layers. Each input, whether it comes from the outside world or from another neuron, can carry its own weight; a neuron does not treat all inputs equally and may ignore some inputs entirely. The output of each neuron is determined by an activation function. The input connects to the first layer of neurons, the first layer sends its output to the second layer, and data typically flows from layer to layer in the same direction through the network until the last layer produces the final output.
Figure 2: A single neuron/perceptron.
Figure 3: A simple neural network.
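In code, a single neuron is little more than a weighted sum passed through an activation function. The following minimal sketch implements the neuron of Figure 2 in plain Python; the particular weights and the sigmoid activation are illustrative assumptions.

```python
import math

def neuron(inputs, weights, bias):
    """One neuron/perceptron: weight each input, add the results,
    then pass the total through an activation function (here, a
    sigmoid that squashes any number into the range 0..1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

# The neuron does not treat all inputs equally: the second input is
# weighted heavily and the third is ignored entirely (weight 0.0).
output = neuron(inputs=[0.5, 0.9, 0.2], weights=[0.1, 2.0, 0.0], bias=-0.5)
print(output)

# A layer is simply several neurons reading the same inputs; their
# outputs become the next layer's inputs, as in Figure 3.
```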
This process works something like a Presidential election. Each voter hears the same news (the input) but gives it a different weight. Each voter adds up the pros and cons and decides how to vote (an activation function). Voters are pooled by State (the next layer’s input), each vote weighted equally, and each State allots its electoral votes according to its popular vote (that layer’s activation function). The electoral votes (the final layer’s input) are weighted by the size of each State’s congressional delegation, and the President is selected by the total Electoral College vote (a final activation function). Just as it is impossible to predict the exact outcome of an election from the news or even the popular vote, a neural network’s behavior depends on its many weights and on what happens at each layer.
How Neural Networks Learn
Neural networks are greedy: they require massive amounts of pre-classified training data, for instance, pictures known to contain stop signs and pictures known not to. For any given set of weights, it is possible to measure how well the network interprets the data by calculating its error, and early in training many images are misclassified. To reduce that error, the software adjusts each weight using an optimization algorithm. One common method for optimizing the weights is called backpropagation. Backpropagation works like a game of “hot and cold”: the network moves toward what it calculates to be the “hot” direction (minimum error) and away from the “cold” direction, changing course when it receives feedback like “you are getting colder!” The process usually stops after enough cycles have run or when further cycles produce only tiny changes in error.
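A toy example makes the “hot and cold game” visible. The sketch below uses gradient descent to nudge a single weight toward minimum error; real backpropagation applies the same idea to millions of weights at once, and the training data and learning rate here are invented for illustration.

```python
# A toy "hot and cold game": gradient descent adjusting one weight.
# The hidden rule in the training data happens to be y = 3 * x.
data = [(1, 3), (2, 6), (3, 9)]

weight = 0.0          # the network's initial (bad) guess
learning_rate = 0.05  # how far to step each cycle

for cycle in range(10_000):  # stop after enough cycles...
    # Slope of the mean squared error with respect to the weight:
    # it points "colder," so we step the opposite way, toward "hot."
    gradient = sum(2 * x * (weight * x - y) for x, y in data) / len(data)
    step = learning_rate * gradient
    weight -= step
    if abs(step) < 1e-9:     # ...or when changes become tiny
        break

print(round(weight, 3))  # settles near 3.0, the minimum-error weight
```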
Convolutional Neural Networks and Computer Vision
Neural networks are powerful but can grow unwieldy when too many neurons and connections are involved. Because a single image or frame of video often contains tens of thousands or millions of pixels, traditional neural networks do not perform well on most computer vision tasks. Convolutional neural networks transform (convolve) image data into smaller, more workable chunks while retaining the spatial relationships in the picture. The software sweeps through the image or video (using a sliding window) and transforms it (using a filter, or kernel) into a form suited to a particular task. There may be several layers of transformations, and at the end of the process the data may be fed into a traditional neural network. A convolutional neural network goes through the same deep learning and training process as a traditional neural network, teaching itself how to perform its task better.
The process is akin to how a person might solve a difficult “spot the differences” puzzle. The person moves methodically through the image, comparing one small segment at a time, eventually sweeping the entire picture. The person marks the differences with stars on a sheet of transparent film. To a second person looking only at the film and its stars, the marks may make no sense; to the person who solved the puzzle, the marks contain all the important information about the solution.
Figure 4: The architecture of one type of convolutional neural network.
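For readers who want to see the pieces assembled, the following sketch defines a small convolutional network of the general kind shown in Figure 4 using the open-source PyTorch library. The layer sizes and the two-class output are arbitrary illustrative choices, not the architecture of any actual law enforcement tool.

```python
# A minimal convolutional neural network in PyTorch.
# Requires: pip install torch
import torch
import torch.nn as nn

model = nn.Sequential(
    # Convolution layers sweep small filters/kernels across the image,
    # producing compact feature maps that keep spatial relationships.
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3 channels in (RGB)
    nn.ReLU(),
    nn.MaxPool2d(2),  # shrink the feature maps into workable chunks
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    # Finally, feed the transformed data into a traditional
    # fully connected neural network for the decision.
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 2),  # two scores: stop sign / no stop sign
)

# One 32x32-pixel color image in, two class scores out.
scores = model(torch.randn(1, 3, 32, 32))
print(scores.shape)  # torch.Size([1, 2])
```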
Once a computer vision system can “recognize” an image, it can process that image and compare it against other relevant images or data. A license plate reader, for example, detects and identifies the letters and numbers on a plate. A facial recognition system may locate a person’s pupils and measure standardized points of interest (nodal points) relative to them, correcting for the picture’s orientation. By reducing an image of a license plate to letters and numbers, or a face to simple measurements, the system makes it possible to search databases and share information across many different information systems quickly and efficiently.
Figure 5: Segmenting a license plate into different letters and numbers.
Figure 6: Using nodal points for facial recognition.
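A greatly simplified sketch of that measurement step appears below: given a few landmark coordinates (nodal points), the code computes distances normalized by the pupil-to-pupil distance so they can be compared across photos. The landmark names and coordinates are invented, and real systems use many more points and proprietary measurements.

```python
# Simplified face-measurement sketch (coordinates are invented).
import math

# Hypothetical landmark (nodal point) coordinates from one photo, in pixels.
landmarks = {
    "left_pupil": (120, 140),
    "right_pupil": (180, 142),
    "nose_tip": (150, 180),
    "chin": (151, 240),
}

# Dividing every measurement by the pupil-to-pupil distance corrects
# for how large the face happens to appear in this picture, so the
# numbers can be compared across photos and databases.
scale = math.dist(landmarks["left_pupil"], landmarks["right_pupil"])
template = {
    "nose_to_chin": math.dist(landmarks["nose_tip"], landmarks["chin"]) / scale,
    "pupil_to_nose": math.dist(landmarks["left_pupil"], landmarks["nose_tip"]) / scale,
}
print(template)  # a compact, searchable numeric description of the face
```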
Limitations of Computer Vision
A major limitation of computer vision and other types of deep learning is that the “reasoning” the computer uses is opaque: the computations between input and output are unreadable, even by an expert. Convolutional neural networks are fundamentally different from human vision, and computer vision will fail to recognize images that are unambiguous to a human. For example, placing a small sticker on a stop sign may cause a convolutional neural network to interpret the image incorrectly.2 It may be impossible to predict what conditions will cause a computer vision system to fail.3 The term for this potential failure to cope with scenarios outside the training data is brittleness. Artificial intelligence is also shallow, meaning it has no robust understanding of what a stop sign is. A human being understands that leaves, stickers, paint, or damage may obscure parts of a stop sign; a computer only “understands” what it has been trained to “know.”
Unsuitable training data can degrade computer vision even under normal conditions. If a facial recognition system’s training data does not include enough examples from different types of people, the system may perform poorly on underrepresented groups. The training data might lack nighttime photos or photos from different angles, so the system may perform poorly when fed data from untested video cameras. The training data may simply contain too few images to support reliable face recognition. The term for these weaknesses in training data is bias.
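One way an expert can probe for this kind of bias is to compute error rates separately for each group represented in the validation data, as in the hypothetical sketch below; the groups and results are invented.

```python
# Hypothetical check for bias: compare per-group error rates.
from collections import defaultdict

# Each record: (demographic group, was the system's answer correct?)
validation_results = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

totals, errors = defaultdict(int), defaultdict(int)
for group, correct in validation_results:
    totals[group] += 1
    if not correct:
        errors[group] += 1

for group in sorted(totals):
    print(group, f"error rate: {errors[group] / totals[group]:.0%}")
# group_a error rate: 25% vs. group_b error rate: 75% -- a gap this
# large suggests the training data underrepresented group_b.
```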
Total Surveillance
Agencies link facial recognition and vehicle license plate identification tools to public and private databases and large-scale surveillance networks, enabling mass surveillance. Computer vision applications include virtual “gateways” that monitor who enters and leaves an area, facial recognition searches against drivers’ license databases to identify people in bodycam video, and computer-vision-aided surveillance cameras that monitor protests.4 Subject matter experts predict that in the near future, computer-vision-monitored surveillance will be ubiquitous.5
Computer Vision and the Fourth Amendment
Agencies defend mass surveillance as equivalent to traditional surveillance and as capturing only public information. However, traditional surveillance is limited in both scope and duration – no agency has the resources to track each citizen every day. Unlike traditional surveillance, computer vision surveillance programs allow agencies to monitor all traffic through a city at once, tracking every person’s movement from sensor to sensor.
The Fourth Amendment imposes limitations on the use of electronic surveillance to track movement through public spaces. First, the Fourth Amendment prohibits “permeating” police surveillance,6 though it is unclear how that term applies to modern technologies. Second, the Fourth Amendment affords some measure of protection to what a person seeks to preserve as private, even in an area accessible to the public.7 Third, the Supreme Court has rejected lengthy, warrantless location tracking schemes (through GPS and historical cell site data) in both Jones8 and Carpenter9. The legality of law enforcement computer vision tools may depend on the scope of the surveillance, the data integration capabilities of the tool, how much historical data is retained, whether law enforcement obtains a warrant, and public sentiment regarding mass surveillance. There is not much caselaw in this area, and the Fourth Amendment may offer a viable attack on mass surveillance, even surveillance supported by a warrant.
Computer Vision and Equal Protection
Equal protection provides limited protection against actions that discriminate against protected classes. Computer vision systems that are poorly trained – for example, trained mostly on Caucasian faces or on mugshots – may perform differently across protected classes.10 Abroad, computer vision systems have been used to target and harm minorities.11 And when combined with other information systems that reflect systemic differences between protected classes, computer vision systems may inherit those underlying biases.12
Unfortunately, equal protection standards are unfavorable to litigants. A facially neutral state action violates equal protection rights only when the action has a racially discriminatory purpose.13 It is insufficient to demonstrate disparate impact, and poorly chosen training data is unlikely to satisfy this demanding legal standard. However, if the purpose of a computer vision surveillance system is to keep out protected classes that law enforcement deems “undesirable,” it may be possible to show an equal protection violation.
Computer Vision and the First Amendment
Law enforcement often infiltrates, surveils, and disrupts disfavored groups and protestors.14 Agencies have coupled traditional surveillance with computer vision tools, such as Clearview AI, to arrest protestors,15 and agencies are suspected of using facial recognition to surveil protests.16 Computer-vision-aided surveillance gives law enforcement the ability to track and identify protestors in real time and to associate images with social media accounts and driver’s license photos. Through purpose or effect, powerful surveillance systems may chill free speech and assembly, violating the First Amendment. There is not much caselaw in this area either, but where mass surveillance is used to intimidate or discourage public protest, it is worth challenging its constitutionality.
Some suggestions for your cases
1. Routinely ask whether undisclosed video surveillance, including facial recognition and automated license plate readers, was used during the investigation, and request any associated warrants.
Do not assume the State will be forthcoming about the use of mass surveillance. In major cases, cases involving special task forces, and cases with major gaps in the offense report, investigators likely used some undisclosed means to find your client.
2. Obtain the resume/curriculum vitae of the State’s proposed expert and determine whether the expert satisfies Rule 702 and/or the Confrontation Clause.
If your “expert” is an officer, it is unlikely that the officer understands computer vision. Cross-examination may focus on the limitations of deep learning, and a law enforcement agent is unlikely to understand how the software actually works.
3. Obtain expert assistance when computer vision is important to the case. You may need an expert on computer vision and/or the ethical issues and limitations of computer vision.
An expert can analyze the computer vision technology used in the case and can serve as a stark comparison to an underqualified officer.
4. Request any data used to train or evaluate computer vision software. Request a breakdown and summary of the variables accounted for in the training and evaluation data. Request any validation studies used to support the use of the software on casework.
The training and validation process establishes the limits of computer vision’s expected performance and is an area where poor design can have a massive negative impact.
5. Request the source code for the computer vision system so your expert can review the code.
An expert can determine whether there are obvious flaws in the software used in your case. In some cases, the prosecution will dismiss a case rather than disclose the capabilities of advanced surveillance technology.
6. Request any peer-reviewed studies documenting the methodology used in the computer vision software.
The proponent of computer vision software output must demonstrate that the software applies a reliable methodology. The designated “expert” may not understand the methodology, may be unwilling to disclose it, or may have no unbiased validation data, any of which can provide grounds for a Rule 702 challenge.
7. Request the data the investigators collected on your client and information about the data systems integrated with the computer vision system.
In addition to having your expert review the data for correctness, the data may contain evidence that the surveillance program collects an intrusive, unconstitutional amount of data.
8. Do not reinvent the wheel.
Request help from civil liberties organizations like the Electronic Frontier Foundation (eff.org) or the American Civil Liberties Union. These organizations collect information about law enforcement technologies and can help you understand and litigate against their use.
9. Attack the weaknesses: brittleness, greediness, shallowness, and opacity.
Any qualified expert should understand that computer vision can behave unpredictably and that it is not error-free. Interview the opposing expert to see whether the expert will acknowledge the limitations of computer vision, and consider challenging the expert’s qualifications and/or credibility if the expert denies those known limitations.
Footnotes
1. For a more in-depth, mathematical, and still approachable explanation of neural networks and computer vision, see Michael Nielsen, Neural Networks and Deep Learning, Determination Press (2015).
2. Kevin Eykholt et al., Robust Physical-World Attacks on Deep Learning Visual Classification, Computer Vision and Pattern Recognition (2018).
3. Jason Pontin, “Brittle, Greedy, Opaque, and Shallow”: The Promise and Limits of Today’s Artificial Intelligence, Flagship Pioneering (Sept. 26, 2019), available at https://www.flagshippioneering.com/stories/brittle-greedy-opaque-and-shallow-the-promise-and-limits-of-todays-artificial-intelligence.
4. Anthony P. Picadio, Privacy in the Age of Surveillance: Technological Surveillance and the Fourth Amendment, 90 Pa. B.A.Q. 162, 178 (2019).
5. Josef F. Koller, The Future of Ubiquitous, Realtime Intelligence: A GEOINT Singularity, The Aerospace Corporation (2019).
6. United States v. Di Re, 332 U.S. 581, 595 (1948).
7. Katz v. United States, 389 U.S. 347, 351–52 (1967).
8. United States v. Jones, 565 U.S. 400, 419 (2012).
9. Carpenter v. United States, 138 S. Ct. 2206, 2217 (2018).
10. Joy Buolamwini and Timnit Gebru, Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, 81 Proceedings of the 1st Conference on Fairness, Accountability and Transparency 77–91 (2018).
11. Richard Van Noorden, The ethical questions that haunt facial-recognition research, Nature (Nov. 18, 2020).
12. Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner, Machine Bias, ProPublica (May 23, 2016), available at https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
13. Washington v. Davis, 426 U.S. 229, 239 (1976).
14. See, e.g., Socialist Workers Party v. Attorney Gen. of U.S., 642 F. Supp. 1357, 1420 (S.D.N.Y. 1986).
15. Kate Cox, Cops in Miami, NYC arrest protesters from facial recognition matches, Ars Technica (2020).
16. See, e.g., Joshua Ceballos, UM Used Surveillance to Track Student Protesters, Miami New Times (Oct. 15, 2020); Elise Schmelzer, How Colorado law enforcement quietly expanded its use of facial recognition, Denver Post (Sept. 27, 2020).