When Data Ages Out: Transit-Oriented Data Lifecycle Management
Welcome to my latest post devoted to developing the Transit-Oriented Discoveries dataset (launching in October or November). Every other week, I identify a data quality challenge and discuss how I’ve wrestled with it. Problems have included data mismatches, inaccurate geographic coordinates, incomplete land use data, different ways of counting stations, and choosing whether to include or exclude stations from the data. This post addresses a future concern: data obsolescence.
Most data loses some relevance as events or circumstances change. How quickly data depreciates can depend on the underlying pace of change (for example, my hair gets longer by the day but grayer by the year), along with the amount of time elapsed, and whether newer, more accurate data is available. TOD data is no exception. The last National TOD Database, published in 2014, included 4,417 stations and 1,583 proposed stations based on data generated in 2011. It also used Census data that was gathered in the mid-2000s. Since then, many of the proposed stations have become operational, others have been newly proposed or closed, and more recent Census data is available. On the other hand, the vast majority of station locations, the modes that they serve, and the agencies that operate them are the same today as they were ten years ago.
The story of Washington DC’s NoMa-Gallaudet station neighborhood, one of the most dynamic transit-oriented communities in America, illustrates which aspects of our built environment change over time and which often stay the same.
NoMa-Gallaudet U and Neighborhood Change
Washington DC had plenty of inviting places to visit back in the 1990s, but the area around New York and Florida Avenues Northeast was not one of them. A tourist who wandered a few blocks behind Union Station would find a landscape of warehouses, rail tracks, vacant fields, parking lots, and a methadone clinic.
But in the mid-1990s the District’s population was beginning to increase after decades of decline. Developers and city planners saw potential for growth around New York and Florida Avenues and believed that a new Metro station would enhance this potential. In February 1999, property owners in the vicinity of a proposed station agreed, in principle, to pay $25 million for a new facility. The Federal government contributed $25 million and the remaining cost was split between the District of Columbia and a special tax assessment on surrounding communities. The New York Avenue station (later renamed NoMa-Gallaudet U) opened in November 2004, but not much changed right away. The photo below from 2005 shows a construction site on what was once an empty field, but the rest of the image appears similar to the one taken in 2000.
As of 2010, new development had come to the western side of the railroad tracks, including the headquarters of the Bureau of Alcohol, Tobacco, and Firearms and several high-rise apartment buildings.
By 2014, a report commissioned by the Urban Land Institute estimated that 3.8 million square feet of office space, 183,000 square feet of retail, 3,057 residential units, and 622 hotel rooms had been constructed within the report’s NoMa station study area. The image below, taken in 2015, shows additional development west of the railroad tracks, with the neighborhoods east of the tracks relatively unchanged.
Fast-forward five years to 2020, and more housing and offices have sprung up on the west side of the tracks. And, for the first time, new development and construction are taking place on the east side.
As the 20th anniversary of the NoMa-Gallaudet U station approaches, the neighborhood continues to develop, with the buildings circled in red constructed in the past four years. The NoMa station is Metro’s 9th busiest, the area is a busy, mixed-use neighborhood, and planners are considering an additional entrance to improve station access.
And yet, even with so much change, much of the landscape has stayed the same. The neighborhood of row houses in the southeast corner of the images appears unchanged from year to year, and the street grid and rail infrastructure are constant over time.
Since some elements of our built environment change more quickly than others, it makes sense to categorize the features in the Transit-Oriented Discoveries dataset, contemplate their rate of change, and plan updates accordingly.
Station Data
The chart below identifies data being collected on transit stations. New transit stations become operational and close on an ongoing basis. The names of stations also change from time to time. Recent examples include stadium stations renamed for a new corporate sponsor, stations that have shed the names of Confederate generals, and stations whose names were updated to honor local luminaries. It makes sense to perform a data update annually to capture new, discontinued, and renamed stations. A station’s size could also change as a result of an expansion or renovation, and its street address may change as street names change, but these updates are less frequent.
On the other hand, stations are, well, stationary, so their location shouldn’t need to be updated. They rarely serve different transit lines or routes, absent a rail infrastructure redevelopment, and it’s hard to imagine an underground station ever rising to street level, or vice versa. Checking in on these features once every ten years seems reasonable.
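The annual station update described above can be sketched as a comparison of two snapshots. This is only an illustration: it assumes each station carries a stable identifier that survives a renaming (the `station_id` field here is hypothetical), and the station names and IDs are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Station:
    station_id: str  # hypothetical stable ID that does not change on renaming
    name: str


def diff_snapshots(old, new):
    """Compare two annual snapshots and report opened, closed, and renamed stations."""
    old_by_id = {s.station_id: s for s in old}
    new_by_id = {s.station_id: s for s in new}

    # IDs present only in the new snapshot are newly operational stations.
    opened = sorted(new_by_id.keys() - old_by_id.keys())
    # IDs present only in the old snapshot have been discontinued.
    closed = sorted(old_by_id.keys() - new_by_id.keys())
    # IDs present in both snapshots with different names have been renamed.
    renamed = sorted(
        sid
        for sid in old_by_id.keys() & new_by_id.keys()
        if old_by_id[sid].name != new_by_id[sid].name
    )
    return {"opened": opened, "closed": closed, "renamed": renamed}


# Made-up example data, not taken from the dataset.
last_year = [
    Station("A1", "Acme Field"),
    Station("B2", "Main Street"),
    Station("C3", "Old Depot"),
]
this_year = [
    Station("A1", "Globex Field"),
    Station("B2", "Main Street"),
    Station("D4", "Harbor"),
]

changes = diff_snapshots(last_year, this_year)
print(changes)
```

Matching on an identifier rather than on the name is what lets a renamed station show up as one station with a new name instead of one closure plus one opening.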
Land Cover and Land Uses
Over the next ten years, the Transit-Oriented Discoveries database can be used to track changes to station areas over time (ideally showing shifts toward denser, mixed-use development, as occurred around the NoMa station). But here, as with station data, different land uses evolve at different speeds. It’s prudent to update data on the number of buildings, building footprint, and building height annually to take into account new development and additional data plotted on OpenStreetMap. (As my prior blog post noted, some stations, primarily in suburban and exurban areas, are “undermapped,” but this, hopefully, will improve over time.) It also makes sense to update data on parking lots near transit, as these may be prime development opportunities, and to update the presence of construction sites annually.
Other land uses change more gradually, if at all. Parks, recreation, woodlands, and farmland may be protected from development. The boundaries of rivers, lakes, and oceans are relatively stable, global warming notwithstanding. Industrial areas may be resistant to redevelopment. Educational and healthcare campuses don’t typically pack up and leave, and the commercial, residential, and retail land use designations on OpenStreetMap are both more complete than building data and less likely to change from year to year.
The Transportation Network
OpenStreetMap has robust transportation data and, as with land uses, different elements have different lifespans and capacity for change. It’s not too difficult, in the grand scheme of things, to paint intersection crosswalks and add bike lanes, so this data can be updated annually. Sidewalk additions may take more time, but it’s worth checking in once a year to account for new sidewalks as well as additional tags in OpenStreetMap.
The Transit-Oriented Discoveries transportation dataset also includes businesses that primarily serve automobiles (such as gas stations, car washes, mechanic shops, and drive-through restaurants) in order to understand the extent to which neighborhoods around stations are oriented towards cars and drivers. Given that businesses can change rapidly, it makes sense to update this data annually.
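As a rough illustration of how automobile-serving businesses might be flagged, the sketch below checks a feature’s OpenStreetMap tags against a short rule list. The tag pairs (`amenity=fuel`, `amenity=car_wash`, `shop=car_repair`, `drive_through=yes`) are standard OpenStreetMap tagging conventions, but which tags the dataset actually uses is an assumption on my part, and the sample features are made up.

```python
# Assumed rule list: real OpenStreetMap tag conventions, but the selection
# is hypothetical and may not match the dataset's actual query.
AUTO_ORIENTED_TAGS = {
    ("amenity", "fuel"),        # gas stations
    ("amenity", "car_wash"),    # car washes
    ("shop", "car_repair"),     # mechanic shops
    ("drive_through", "yes"),   # drive-through restaurants and similar
}


def is_auto_oriented(tags):
    """Return True if any of the feature's tags matches an auto-oriented rule."""
    return any(tags.get(key) == value for key, value in AUTO_ORIENTED_TAGS)


def auto_oriented_share(features):
    """Share of features near a station that primarily serve automobiles."""
    if not features:
        return 0.0
    flagged = sum(1 for tags in features if is_auto_oriented(tags))
    return flagged / len(features)


# Made-up tag dictionaries for businesses around one station.
station_area = [
    {"amenity": "fuel"},
    {"amenity": "fast_food", "drive_through": "yes"},
    {"amenity": "cafe"},
    {"shop": "bakery"},
]

share = auto_oriented_share(station_area)
print(f"{share:.0%} of businesses are auto-oriented")
```

Rerunning a calculation like this each year and comparing the share over time is one way to see whether a station area is becoming more or less oriented toward cars.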
Other transportation features, such as the street grid configuration and street size, are more permanent. Highways, rail lines, and power lines last for generations. But nothing lasts forever, so it’s prudent to run a query for updates once every ten years.
Amenities and Civic Infrastructure
TOD planners hope that station areas will become vibrant, bustling places and Transit-Oriented Discoveries can be used to track changes in commerce, the arts, and civic and health infrastructure. Amenities and points of interest also fall along a spectrum with places such as cafes, restaurants, bars, nightclubs and (sadly) child care centers having frequent turnover. Other destinations such as grocery stores, libraries, museums, and places of worship tend to be more stable, but a data refresh occurring every five years would likely uncover new venues and closed destinations, at least in some communities.
Demographic Change
Ultimately, TOD should improve the quality of life for people who live and work near transit stations. Tracking change in the number of people who live near a station along with jobs available near stations is one way to measure the impact of new development. Monitoring changes to home prices, rents, household incomes, race and ethnicity, and other demographic data can also help us better understand whether TOD is associated with gentrification.
It is possible to collect year-over-year changes on a wide range of demographic data via the Census Bureau’s American Community Survey, which annually publishes one-year estimates and rolling five-year estimates. However, most demographic change in small areas such as census blocks located near stations occurs gradually, and the results of an annual update may not be worth the effort needed to collect and synthesize the data. A population update once every five years should suffice.
Until the Future…
As I organize, analyze, and wrangle transit station data, I’ve been thinking through an approach to data governance with these blog posts serving as Federalist Papers. Data governance involves developing and implementing policies and processes to make sure data is managed effectively throughout its lifecycle. It involves establishing standards for data quality, to ensure data is accurate, consistent, and available for decision-making. Data governance will help end-users use Transit-Oriented Discoveries data responsibly and strategically. Data Lifecycle Management, ensuring data is regularly updated, archived, or deleted based on its relevance, is an important part of data governance. Planning for actions that need to be taken ten years, five years, or even one year from now is also an aspirational gesture that this project will be available and useful in the years to come.
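The one-, five-, and ten-year refresh intervals proposed in this post could be collected into a single schedule. The sketch below is a minimal version under stated assumptions: the category names are my own shorthand, only categories with an explicit interval in the post are included, and the dates are made up.

```python
from datetime import date

# Refresh intervals in years, as proposed in this post.
# Category names are hypothetical shorthand, not dataset field names.
REFRESH_INTERVAL_YEARS = {
    "station_openings_closures_names": 1,
    "station_location_lines_grade": 10,
    "buildings_parking_construction": 1,
    "crosswalks_bike_lanes_sidewalks": 1,
    "auto_oriented_businesses": 1,
    "street_grid_highways_rail_power": 10,
    "stable_amenities": 5,
    "population_demographics": 5,
}


def add_years(d, years):
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def categories_due(last_refreshed, today):
    """List categories whose refresh interval has elapsed, or that were never refreshed."""
    due = []
    for category, interval in REFRESH_INTERVAL_YEARS.items():
        last = last_refreshed.get(category)
        if last is None or today >= add_years(last, interval):
            due.append(category)
    return due


# Made-up dates: every category last refreshed on the same day.
last_refreshed = {c: date(2024, 11, 1) for c in REFRESH_INTERVAL_YEARS}
due_now = categories_due(last_refreshed, today=date(2026, 1, 15))
print(due_now)
```

With these example dates, only the annual categories come due; the five- and ten-year categories wait their turn.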