Data processing documentation

We produce four datasets that are minimally processed versions of the original ICE data to simplify analysis: arrests, detentions at both the stay and stint level, and detainer requests. The stay-level detention dataset is the most heavily processed of these, while the others involve only adding a variable to flag duplicates and renaming variables for clarity.

We provide descriptions of data processing steps and links to the R code that constructed each dataset below.

Arrests, detainer requests, and detention stints

We post data with minimal changes to facilitate analysis using standard tools, including R, Python, Stata, SPSS, and Excel. We do not add or drop any rows.

For these minimally processed datasets, we make the following changes:

  • Add a flag for likely duplicates. We add an indicator for rows that are possibly or likely duplicates. For arrests, we flag rows for the same noncitizen that fall within 24 hours of each other. For detainers, which have no time information, we flag rows with the same noncitizen ID and request date. Most flagged rows reflect duplicates recorded for administrative reasons, such as when a record is corrected after initial entry; in rare cases, they may reflect multiple enforcement actions within a 24-hour period. Analysts can use this flag to drop likely duplicates if desired.
  • Drop blank or fully redacted columns. The column names can be found in the raw data, also available on the ICE data page.
  • Convert date-time fields to date when there is no time information. In some cases, date fields appear to have time information but every time is recorded as “00:00:00”. For ease of analysis, we convert these to date-only fields, dropping the time information.
  • Add an arrest date variable. The column is simply the date portion of the arrest date-time field, to facilitate analysis that does not require time information.
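
The duplicate flag described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's R code: the field names (unique_id, arrest_datetime, request_date) are hypothetical, a gap of exactly 24 hours is treated as a match, and every row in a matching pair is flagged rather than only the later one.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def flag_arrest_duplicates(rows, window=timedelta(hours=24)):
    """Return one boolean per row: True if another arrest of the same
    person falls within `window` of this one. No rows are dropped."""
    # Collect row positions for each person (hypothetical "unique_id" field).
    by_person = {}
    for i, row in enumerate(rows):
        by_person.setdefault(row["unique_id"], []).append(i)

    flags = [False] * len(rows)
    for idxs in by_person.values():
        # After sorting by time, checking neighbours is enough to find
        # every row that has another row within the window.
        idxs.sort(key=lambda i: rows[i]["arrest_datetime"])
        for a, b in zip(idxs, idxs[1:]):
            gap = rows[b]["arrest_datetime"] - rows[a]["arrest_datetime"]
            if gap <= window:  # assumption: exactly 24 hours counts
                flags[a] = flags[b] = True
    return flags


def flag_detainer_duplicates(rows):
    """Flag rows that share a person ID and request date with another row."""
    counts = Counter((r["unique_id"], r["request_date"]) for r in rows)
    return [counts[(r["unique_id"], r["request_date"])] > 1 for r in rows]


# Made-up example rows, for illustration only.
arrests = [
    {"unique_id": "a1", "arrest_datetime": datetime(2025, 1, 1, 9, 0)},
    {"unique_id": "a1", "arrest_datetime": datetime(2025, 1, 1, 15, 30)},
    {"unique_id": "a1", "arrest_datetime": datetime(2025, 3, 1, 8, 0)},
    {"unique_id": "b2", "arrest_datetime": datetime(2025, 1, 1, 9, 0)},
]
detainers = [
    {"unique_id": "a1", "request_date": date(2025, 2, 1)},
    {"unique_id": "a1", "request_date": date(2025, 2, 1)},
    {"unique_id": "a1", "request_date": date(2025, 2, 2)},
]
arrest_flags = flag_arrest_duplicates(arrests)
detainer_flags = flag_detainer_duplicates(detainers)
```

Because the flag is stored alongside the data rather than used to filter it, an analyst can choose between keeping every row and keeping only unflagged rows.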

In a few cases, variable names are shortened to enable saving in Stata or SPSS format.
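
The date handling described in the list above might look like the following Python sketch. It assumes a column is held as a list of datetime values with None for missing entries, and the field name arrest_datetime is hypothetical; the project's R code may implement this differently.

```python
from datetime import datetime, time


def to_date_if_no_time(values):
    """Convert a column to dates only if every non-missing value is at
    midnight; otherwise return the values unchanged."""
    present = [v for v in values if v is not None]
    if present and all(v.time() == time(0, 0) for v in present):
        return [None if v is None else v.date() for v in values]
    return list(values)


def add_arrest_date(rows):
    """Return copies of the rows with an added date-only arrest field."""
    return [dict(r, arrest_date=r["arrest_datetime"].date()) for r in rows]


# Made-up example columns, for illustration only.
midnight_only = [datetime(2025, 1, 5), None, datetime(2025, 1, 6)]
with_times = [datetime(2025, 1, 5, 14, 30), datetime(2025, 1, 6)]

converted = to_date_if_no_time(midnight_only)  # becomes date-only
unchanged = to_date_if_no_time(with_times)     # keeps its time information
```

Checking the whole column before converting keeps a field with even one real time value from silently losing information.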

Inspect code

Detention stays

The detentions data are provided by ICE in a more complex format than the other tables. Our goal in posting this simplified dataset is to make analysis of ICE detention data more straightforward. In the original ICE data, there is a row for every book-in to a particular detention center, but most questions about detention concern what ICE calls an individual’s “stay” in detention — a single period of detention for a single person that often includes transfers between detention centers.

We create a dataset that has a single row for each stay in detention, preserving most (but not all) of the information in the original dataset. Note that individuals can also have more than one stay in detention if they are released and later detained again; in that case, this dataset includes a row for each stay, and those repeated stays are (anonymously) identifiable by the unique IDs in the data, which correspond to individuals’ A-numbers.

We describe the processing steps in general terms here:

  • Drop duplicate stints. We first identify duplicate detention stints in the data: stints with the same book-in date/time at the same detention center. There are a few thousand of these, a small number relative to the full dataset of over 1.3 million records. Nearly all of these reflect duplicated stints where an individual’s “initial bond set amount” changed. To eliminate these duplicates, we create a new variable called “lowest initial bond set amount” that reflects the lowest initial bond amount associated with a given stint.
  • Keep data from last stint. Then we preserve only the last stint in each stay. For most fields, this does not cause any loss of information because values in these fields (e.g. final order date, most serious conviction, and citizenship country) typically do not change within detention stays.
  • Join data from first, last, and longest stint. Finally, we add back in limited information from the stint level: for the first stint, the last stint, and the longest stint, we include the book-in date, book-out date, and detention facility. If a stay includes only one stint, then these are all identical.
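
The three steps above can be sketched in Python as follows. This is a simplified illustration, not the project's R code. The field names are hypothetical, including a stay_id key that is assumed to identify which stints belong to the same stay. The sketch also assumes every stint has a book-out time and that duplicates are matched within the same individual.

```python
from datetime import datetime

STINT_FIELDS = ("book_in", "book_out", "facility")


def dedupe_stints(stints):
    """Collapse stints sharing person, facility, and book-in time, keeping
    the lowest initial bond amount seen among the copies."""
    merged = {}
    for s in stints:
        key = (s["unique_id"], s["facility"], s["book_in"])
        bond = s["initial_bond"]
        if key not in merged:
            row = dict(s)
            row["lowest_initial_bond"] = row.pop("initial_bond")
            merged[key] = row
        elif bond is not None:
            current = merged[key]["lowest_initial_bond"]
            if current is None or bond < current:
                merged[key]["lowest_initial_bond"] = bond
    return list(merged.values())


def build_stays(stints):
    """Return one row per stay, based on the last stint, plus book-in,
    book-out, and facility for the first, last, and longest stints."""
    by_stay = {}
    for s in dedupe_stints(stints):
        by_stay.setdefault((s["unique_id"], s["stay_id"]), []).append(s)

    stays = []
    for group in by_stay.values():
        group.sort(key=lambda s: s["book_in"])
        picks = {
            "first": group[0],
            "last": group[-1],
            "longest": max(group, key=lambda s: s["book_out"] - s["book_in"]),
        }
        # Stay-level fields come from the last stint.
        stay = {k: v for k, v in group[-1].items() if k not in STINT_FIELDS}
        for prefix, stint in picks.items():
            for field in STINT_FIELDS:
                stay[f"{prefix}_{field}"] = stint[field]
        stays.append(stay)
    return stays


def stint(facility, book_in, book_out, bond=None):
    """Build a made-up stint row for one hypothetical person and stay."""
    return {"unique_id": "p1", "stay_id": "s1", "facility": facility,
            "book_in": book_in, "book_out": book_out, "initial_bond": bond}


example = [
    stint("A", datetime(2025, 1, 1, 10), datetime(2025, 1, 3, 10), 5000),
    stint("A", datetime(2025, 1, 1, 10), datetime(2025, 1, 3, 10), 2500),
    stint("B", datetime(2025, 1, 3, 12), datetime(2025, 1, 20, 9)),
    stint("C", datetime(2025, 1, 20, 11), datetime(2025, 1, 22, 8)),
]
stays = build_stays(example)
```

In this made-up example, the two copies of the stint at facility A collapse into one, and the single resulting stay records A as the first facility, B as the longest, and C as the last.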

Inspect code

Field offices and areas of responsibility

Data on ICE field offices and sub-offices was obtained from the ICE web site’s list of field offices and list of check-in locations as of September 2025. The boundaries of the areas of responsibility were largely sourced from those two lists, which describe areas of responsibility for each office and sub-office by state and/or county; ICE’s areas of responsibility map (see Arrests by Area of Responsibility dashboard) provided details where the lists were ambiguous. The boundaries of states and counties were drawn from the 2024 US Census TIGER/Line shapefiles.
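
As a rough illustration of how descriptions by state and/or county can be turned into a lookup, the Python sketch below resolves a county to an office. The office and place names are placeholders, not real assignments. The rule that a county-level entry takes precedence over a state-level entry is an assumption made for this example rather than something stated in ICE's lists.

```python
def office_for(state, county, county_rules, state_rules):
    """Return the office responsible for a county, preferring a
    county-level rule over a state-level one (assumed precedence)."""
    return county_rules.get((state, county), state_rules.get(state))


# Placeholder rules, for illustration only.
state_rules = {"State X": "Office 1", "State Y": "Office 2"}
county_rules = {("State X", "County Q"): "Office 2"}

default_office = office_for("State X", "County P", county_rules, state_rules)
override_office = office_for("State X", "County Q", county_rules, state_rules)
uncovered = office_for("State Z", "County R", county_rules, state_rules)
```

A place with no matching rule resolves to None, which is one way to represent an area not covered by any field office.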

Notes:

  • There is a discrepancy in the definition of the area of responsibility of the Houston Field Office. In some sources, it is described as covering 54 counties, while in others, it is listed as covering 57 counties. Recent ICE statements indicate that the Houston Field Office covers 57 counties, which is consistent with the ICE areas of responsibility map and with the counts of the other Texas field office areas of responsibility reported by ICE. We have used the 57-county definition in this table.
  • There is also a discrepancy for the area of responsibility of the San Francisco Field Office. The list of field offices describes the San Francisco Field Office as covering the island of Saipan, while the list of check-in locations describes the sub-office for Saipan as covering all of the Commonwealth of the Northern Mariana Islands. The areas of responsibility map does not show the Northern Mariana Islands. We have used the more expansive definition in the check-in locations list in this table.
  • We do not include the headquarters (“HQ”) office used in ICE’s individual-level data in the table above, because it is not included in ICE’s list of field offices or the areas of responsibility map. When an arrest or removal is attributed to “HQ” in ICE’s data, we do not know what, if anything, this indicates about where the arrest took place.
  • The Harlingen field office was created in FY2022 from parts of the San Antonio and Houston areas (see ICE Statistics).
  • The only permanently populated US territory not covered by an ICE ERO field office is American Samoa, which has its own immigration system, independent of the rest of the US.
  • There are typos and inconsistencies in the descriptions of the areas of responsibility in ICE’s lists of field offices and check-in locations. We retained the original text.

Inspect code