FBI search warrants for Google Location Service and Geofence data in the U.S. Capitol Hill siege investigation have attracted significant legal and media attention in recent months1. Less well known, however, is the fact that U.S. Government agency subpoenas and search warrants issued to Google have increased more than ten-fold in the past decade2. See Exhibit 1 below.
Exhibit 1. Requests by U.S. Government agencies to Google
In this article, we discuss the use of data from Google Location Services and other proprietary location services in court. The geographical location data provided by these services can be based on cell towers, Wi-Fi, global positioning systems (GPS), or any combination of these sources. Each of these technologies provides a valid source of location data; however, as with any data, it is important to understand the theory and technology upon which the ultimate conclusions are based, including model assumptions and the way the data is stored and subsequently retrieved. Furthermore, when such data is used in legal proceedings, we must adhere to existing standards for the admissibility of scientific evidence.
In 1993, the Supreme Court set the standard for expert testimony in the seminal case (Daubert v. Merrell Dow Pharmaceuticals, Inc., 1993)3. Under the Daubert standard, factors considered in determining whether scientific evidence is admissible now include4:
- whether the theory or technique in question can be and has been tested
- whether it has been subjected to peer review and publication
- its known or potential error rate
- the existence and maintenance of standards controlling its operation, and
- whether it has attracted widespread acceptance within a relevant scientific community
A key issue when using location services data is that the technology and algorithms involved are often proprietary, so clear and transparent responses to Daubert standards are not forthcoming. When a Government Agency obtains location history records from Google in response to a subpoena, the first line of the received document states “Google Confidential & Proprietary”. Right underneath, it describes the “Display Radius” value as “depend[ing] on a great many factors and is an approximation sufficient for its intended product uses.” The “Display Radius” is a critical parameter when assessing the potential error in Google location data: it is the uncertainty of the Blue Dot, a circular area that expands with uncertainty. Because of the proprietary nature of the software, it is not clear how this number is calculated, nor what the “intended product uses” are.
Proprietary issues relating to location services data are not confined to Google. Another location data source that is frequently subpoenaed by government agencies is the Network Event Location System (NELOS) from AT&T, which goes beyond standard fixed cell tower records. Standard fixed cell tower records show the cell tower that a particular cellular phone connected to at a particular time. In most circumstances, this will be the cell tower with the strongest signal, which is not necessarily the closest tower. Such data can be used to determine the general direction a cellular device is moving using a standard process that is well-known, well-tested, and has been subject to extensive peer-review.5
NELOS, in comparison, draws on additional sources of information to explicitly estimate the location of the cellular device at a single moment. “Possible sources of information relied upon to create NELOS reports include GPS, Wi-Fi signals, single cell towers, or cell tower triangulation. But there is no indication within a NELOS record which of these sources provided any one location, or what database it was derived from and who controls that database. And even if the specific database and the locational source were known, it is not possible to verify the location of the source on the date in question over one year ago. Thus, it is impossible to verify the reliability, accuracy, or authenticity of the data”.6 Recognizing reliability issues in its own NELOS records, AT&T adds the following disclaimer to the top of its document: “The results provided are AT&T’s best estimate of the location of the target number. Please exercise caution in using these records for investigative purposes as location data is sourced from various databases which may cause location results to be less than exact”. Some of these NELOS issues were raised in a recent Daubert hearing7, where the proprietary nature of the estimation algorithm was addressed more explicitly, with particular focus on the black box nature of the location estimation and the absence of scholarly review and evaluation of the technology.
Similar legal concerns should be raised with Google Location Services data. Many of the pertinent issues involved with Google location data are discussed in the academic paper “Google timeline accuracy assessment and error prediction”8, which is, to our knowledge, the only peer-reviewed journal paper to investigate Google Location Services data. The authors discuss the proprietary nature of the data: “Google withholds information about its algorithms on how the location estimation is computed, and which variables and parameters influence it.” This, like the NELOS algorithm, warrants careful consideration under the Daubert standard.
The researchers tested Google Location Services data because “In order to be able to harness this information in court or for justice purposes, an assessment of the accuracy has to be performed”. Unfortunately, any testing process is complicated by the fact that Google does not reveal its estimates of your location in real time on your device. This is a subtle but important point – we should be able to check that Google’s estimates on a specific device, at a specific time, match the data that is stored and subsequently retrieved from Google Location Services. By contrast, we can do independent real time field tests on cell tower, Wi-Fi, and GPS data. The researchers overcame this issue by time-stamping real time measurements from dedicated GPS devices, their “ground truth” measurements, and comparing them with Google Timeline data that they downloaded later. This absence of real time Google location data complicates testing, limits peer review and publication, and raises questions over the existence and maintenance of standards controlling its operation.
The Dutch authors provide a very detailed and statistical account of their experiments, concluding that “Google locations and their accuracies should not be used in a definite way to determine the location of a mobile device”. With GPS, the device was within Google’s “Display Radius” 52% of the time, and with Wi-Fi only, that figure fell dramatically to 7%. As discussed earlier, we don’t know exactly how the “Display Radius” is calculated, but the Dutch research indicates the device was outside of Google’s own radius of uncertainty in 93% of measurements when Wi-Fi was the location source. This isn’t just a speculative concern: in a recent case, the defendant’s location history was sourced from Wi-Fi approximately half the time, which raises serious concerns about its reliability.
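The comparison described above, ground-truth GPS fixes matched by timestamp against Google's reported position and "Display Radius", can be illustrated with a minimal Python sketch. It is not the researchers' code: the tuple layouts, the function names, and the 5-second matching tolerance are assumptions made for illustration, and the haversine formula is a standard spherical approximation rather than anything Google is known to use.

```python
import math
from bisect import bisect_left

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_truth(truth, t, max_gap_s=5):
    """Return the ground-truth fix closest in time to t, or None if none is close enough.

    truth: list of (unix_time, lat, lon) tuples sorted by time (assumed layout).
    max_gap_s: assumed matching tolerance in seconds.
    """
    times = [row[0] for row in truth]
    i = bisect_left(times, t)
    candidates = truth[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda row: abs(row[0] - t))
    return best if abs(best[0] - t) <= max_gap_s else None


def hit_rate_by_source(google, truth, max_gap_s=5):
    """Fraction of Google fixes, per source, whose ground truth lies inside the radius.

    google: list of (unix_time, lat, lon, display_radius_m, source) tuples (assumed layout).
    """
    hits, totals = {}, {}
    for t, lat, lon, radius_m, source in google:
        ref = nearest_truth(truth, t, max_gap_s)
        if ref is None:
            continue  # no ground-truth fix close enough in time to compare
        totals[source] = totals.get(source, 0) + 1
        if haversine_m(lat, lon, ref[1], ref[2]) <= radius_m:
            hits[source] = hits.get(source, 0) + 1
    return {source: hits.get(source, 0) / n for source, n in totals.items()}
```

A per-source hit rate computed this way is the kind of figure reported above (52% for GPS, 7% for Wi-Fi only); the result depends on the matching tolerance chosen and on how accurate the ground-truth device itself is.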
Google responds to Government Agency subpoenas with a location history spreadsheet that contains a “Source”, which can switch between “GPS”, “CELL” or “WIFI” at different times, as well as a “Timestamp”, “Latitude”, “Longitude”, “Display Radius”, “Device Tag” and “Platform”. However, it is not clear if the “Source” is a raw value, e.g., GPS only, or a blend of cellular, Wi-Fi, and GPS data that is most heavily weighted toward GPS. Ideally, we would have separate data for each of the raw location sources, as well as the final Google estimate. Without specific details it is hard to know how the “Latitude”, “Longitude”, and “Display Radius” data are calculated, as well as why and how the data switches between “Sources”. These are all black box functions of the Google Location Services algorithm.
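As a rough illustration of how such a spreadsheet might be examined, the Python sketch below tallies records by "Source" and counts how often the source changes between consecutive rows. The column names are taken from the description above, but the CSV layout, the function name, and the assumption that rows are in chronological order are ours; an actual production from Google may be formatted differently.

```python
import csv
import io
from collections import Counter


def summarize_sources(csv_text):
    """Summarize the "Source" column of a location history export.

    Assumes a CSV with a header row containing a "Source" column and rows
    already sorted by time. Returns (counts, shares, switches).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(row["Source"] for row in rows)
    # Share of all records attributed to each source, e.g. {"WIFI": 0.5, ...}
    shares = {source: n / len(rows) for source, n in counts.items()} if rows else {}
    # Number of times the reported source differs from the previous row's source
    switches = sum(1 for prev, cur in zip(rows, rows[1:]) if prev["Source"] != cur["Source"])
    return counts, shares, switches
```

A summary like this shows what share of a location history rests on Wi-Fi or cell data, but it cannot reveal whether a given "Source" label reflects a raw measurement or a blended estimate.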
In addition to issues relating to the proprietary nature of the technology and algorithms used to estimate location, there are also unknowns relating to data storage and retrieval. Google Location Services data is based on a Google account username rather than internationally recognized identifiers for electronic equipment. A Google user could log into Google Location Services on multiple devices and have multiple location histories in their account. By contrast, call detail records are based on a unique MSISDN, or cell number, and a unique IMEI, or international mobile equipment identifier, so at each timestamp we can be confident we know a specific device connected to a cell tower. In Google location data the latitude and longitude estimates are associated with a “Device Tag”, which appears to be Google’s proprietary device identifier. Mapping a “Device Tag” to a more conventional IMEI should return the device data, though it is not clear why such tags are used or whether mapping errors can occur.
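Because one account can contain several location histories, a sensible first step is to separate records by "Device Tag" before plotting them. The Python sketch below groups records by tag and flags tags whose active periods overlap in time, which would suggest more than one device reporting under the same account. The record layout and function names are hypothetical, and the sketch does not address how a tag maps to an IMEI.

```python
from collections import defaultdict
from itertools import combinations


def spans_by_device_tag(records):
    """Group records by device tag.

    records: iterable of (unix_time, device_tag) pairs (assumed layout).
    Returns {device_tag: (first_time, last_time, record_count)}.
    """
    times = defaultdict(list)
    for t, tag in records:
        times[tag].append(t)
    return {tag: (min(ts), max(ts), len(ts)) for tag, ts in times.items()}


def overlapping_tags(spans):
    """Return pairs of device tags whose first-to-last time spans overlap."""
    pairs = []
    for a, b in combinations(sorted(spans), 2):
        a_first, a_last, _ = spans[a]
        b_first, b_last, _ = spans[b]
        if a_first <= b_last and b_first <= a_last:
            pairs.append((a, b))
    return pairs
```

Overlapping spans do not by themselves indicate an error; they only signal that the account's records should not be treated as the track of a single device.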
After being subpoenaed by a Government Agency, location data is typically plotted in mapping software and used as evidence in court proceedings – a blue dot that marks the spot can be very compelling to a jury. As such, a proprietary spreadsheet from Google or AT&T can become pivotal evidence in legal proceedings. Careful consideration of the data, its limitations, and whether it even meets existing standards for the admissibility of scientific evidence is required. A common theme amongst all these technologies is that they are proprietary and have not been subject to extensive independent scholarly review and evaluation. We believe that cell tower, Wi-Fi, and GPS are very important sources of location data, and we welcome more transparent details from the location service providers, as well as further independent research and evaluation. We believe that the Daubert standard provides a logical starting point for discussions of location services data in legal cases – it is established, methodical, and scientific.