National Transit Gazetteer and Atlas Technical Documentation

Overview

This document provides information on the National Transit Gazetteer and Atlas, including project background, data sources and methodology, and additional context to help people use the platform with confidence. Additional discussion of the methodology used in the project is available on the Transit-Oriented Discoveries Blog.

Project Timeframe

The National Transit Gazetteer and Atlas was developed between July 1, 2024 and April 15, 2025. Transit agency and station data included in the platform represents fixed guideway modes (heavy rail, light rail, commuter rail, bus rapid transit, ferry, streetcar rail, monorail, inclined plane, and aerial tramway) in operation as of September 30, 2024. We anticipate updating the platform in the fall of 2025 to incorporate stations that are newly opened or newly closed as of September 2025.

If you believe the platform is missing modes or stations, please contact connect@transitdiscoveries.com.

Transit Station Data Elements and Sources

The project’s database relies on General Transit Feed Specification (GTFS) data for station locations and associated transit routes, and on data from the National Transit Database (NTD) Facility Inventory for station age, size, and configuration. Data from Wikipedia articles and transit agency websites was used to confirm information from both sources and, when necessary, to selectively fill in gaps. Each station’s street address, city, county, and state were generated using Google Maps and reverse geocoding. The chart below summarizes the data used in the Gazetteer’s station summaries:

A Note About Bus Rapid Transit (BRT)

The Bus Rapid Transit systems included in the Gazetteer and Atlas are documented in submissions to the National Transit Map hosted by the Bureau of Transportation Statistics or in the 2023 National Transit Database Facility Inventory. In a few instances (such as the Madison Rapid Line A and the King County G Line, both of which opened in the fall of 2024), data that had not yet been reported to either of these databases was included in this platform. Some users of the Gazetteer may not find the BRT in their community included on the map. This is likely because the mode has no standardized definition, which leads to inconsistencies in what qualifies as “BRT”. If you believe a BRT system is missing from the Gazetteer and Atlas, please contact connect@transitdiscoveries.com.

Use of Artificial Intelligence (AI) for Station Summaries

This project uses OpenAI's GPT-3.5 model to craft a conversational, easy-to-understand summary of each station being queried. These summaries are grounded in the GTFS and NTD data used in the project, along with the Wikipedia entries retrieved by separate code. AI summaries may also include information about station usage or characteristics of transit agencies and surrounding areas not included in the GTFS and NTD datasets. Much of the information provided by the AI can be verified by examining the corresponding station area map, the Wikipedia sources returned, or the data in the database. This platform uses AI as an interpretive layer, not an independent source of information. If you have concerns that any of the information in the summaries is incorrect, please contact connect@transitdiscoveries.com.

Use of Wikipedia

The platform uses the Wikipedia Application Programming Interface (API) along with the name of the transit station selected, the name of the agency selected, the agency’s location, and the term “transit station” to search Wikipedia for up to three relevant articles. This function includes a scoring system that prioritizes articles that mention both the station name and location in their title or early content. The articles are incorporated into the AI prompt to provide additional context and returned alongside the generated summary for users to explore further.
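The scoring idea described above can be sketched as follows. This is a minimal illustration, not the platform's actual code: the function names, the point weights, and the 300-character window used for "early content" are all assumptions chosen for the example.

```python
def score_article(title, extract, station, location, early_chars=300):
    """Score a candidate article; a higher score suggests a better match.

    The weights and the early_chars window are assumed for illustration.
    """
    title_l = title.lower()
    early_l = extract[:early_chars].lower()
    station_l = station.lower()
    location_l = location.lower()

    score = 0
    # A match in the title counts for more than a match in the opening text.
    if station_l in title_l:
        score += 3
    elif station_l in early_l:
        score += 1
    if location_l in title_l:
        score += 2
    elif location_l in early_l:
        score += 1
    return score


def top_articles(candidates, station, location, limit=3):
    """Return up to `limit` candidates with a positive score, best first."""
    scored = [
        (score_article(c["title"], c["extract"], station, location), c)
        for c in candidates
    ]
    scored = [pair for pair in scored if pair[0] > 0]
    # Sort on the score only; the sort is stable, so ties keep search order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:limit]]
```

Each candidate here is a dictionary with `title` and `extract` keys, a hypothetical shape standing in for whatever the Wikipedia search returns. Articles that mention neither the station nor the location score zero and are dropped, which is one way to limit irrelevant results.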

Users may find that Wikipedia returns irrelevant articles for some station searches. Reasons for this include non-unique station names (for example, a “Convention Center Station” could exist in multiple transit systems) and the limitations of the search algorithm.

In addition, the platform is more likely to return Wikipedia articles for larger stations in big cities, such as subway stations, than for smaller stations, such as bus rapid transit stops in smaller communities. Larger metropolitan areas tend to have more comprehensive Wikipedia coverage from volunteer contributors. Regions with more active local historians, transit enthusiasts, or Wikipedia editors tend to have more comprehensive articles. Stations with historical importance or unique architectural features are more likely to have dedicated Wikipedia pages.

Use of OpenStreetMap Points of Interest

The platform connects with the Overpass API, which is part of the OpenStreetMap (OSM) ecosystem, to fetch nearby points of interest (POIs) dynamically. The query searches for amenities (such as public facilities), shops, and tourist attractions within 800 meters of the station's latitude and longitude coordinates. The code then sorts the POIs by proximity and returns the closest three.
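A rough sketch of this lookup is shown below. It builds an Overpass QL query and sorts the returned elements by great-circle distance, without making a network call. The function names are hypothetical, and the OSM keys `amenity`, `shop`, and `tourism` are an assumed mapping for "amenities, shops, and tourist attractions"; the platform's actual query may differ.

```python
import math


def build_overpass_query(lat, lon, radius_m=800):
    """Build an Overpass QL query for nodes tagged amenity, shop, or tourism."""
    around = f"(around:{radius_m},{lat},{lon})"
    return (
        "[out:json];("
        f'node["amenity"]{around};'
        f'node["shop"]{around};'
        f'node["tourism"]{around};'
        ");out;"
    )


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def closest_pois(elements, lat, lon, limit=3):
    """Return the `limit` elements nearest to the station, closest first."""
    located = [e for e in elements if "lat" in e and "lon" in e]
    located.sort(key=lambda e: haversine_m(lat, lon, e["lat"], e["lon"]))
    return located[:limit]
```

Here `elements` stands for the `elements` array of an Overpass JSON response, where each node carries `lat` and `lon` fields. Elements without coordinates are skipped rather than guessed at.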

OSM points of interest are updated continuously by contributors, so the three closest points of interest to a particular station today may not be the same a month from now.

Frequent users of the National Transit Gazetteer and Atlas may notice that some stations are associated with somewhat trivial or quirky points of interest, such as “a mailbox” or “a statue”. This is a direct result of how OpenStreetMap is crowd-sourced and how different contributors document their local environments. Some contributors are extremely detailed, mapping even very minor features like individual mailboxes, street furniture, or small landmarks, while others focus on major landmarks. There are very few strict rules about what can or cannot be mapped.

In addition, OSM tends to have a documentation bias where affluent areas tend to have more detailed POI documentation and regions with active tech communities or university campuses often have more comprehensive mapping.

Nevertheless, we hope you find the OSM points of interest valuable insofar as they add “local color” to station area summaries and enhance the information included in the GTFS or NTD data. 

Use of Concentric Circle Overlays

The Gazetteer and Atlas allows users to visualize areas within a 200-meter, 400-meter, and 800-meter radius of each transit station's latitude and longitude coordinates. These radii follow common practice in transit-oriented analysis: the 200-meter radius represents the station and its immediate surroundings, the 800-meter radius (roughly ½ mile) approximates the farthest distance most people are willing to walk to a station, and the 400-meter radius (roughly ¼ mile) provides an intermediate threshold. Future enhancements may provide more sophisticated information about walking distance to a station that takes into account local barriers to access (such as rivers or highways).

Some platform users may notice that the dots representing stations, along with their concentric circles, are not precisely aligned with the station location on the OpenStreetMap base layer. This typically reflects a difference between how station coordinates are captured in GTFS stops data and how OSM volunteers chose to render the station on the map. These differences are usually less than 100 meters. If you spot larger discrepancies, please contact connect@transitdiscoveries.com.

Built Environment Data

The National Transit Station Gazetteer and Atlas contains a thumbnail sketch of the urban form and development patterns around each station. The data underlying these summaries was queried from OpenStreetMap using the Overpass API. The table below shows the data used in the summaries. This data is available to Transit-Oriented Discoveries subscribers.

Data on the number of buildings, building footprint, and building heights are also organized into three bands based on their distance from the transit station. Subscribers can analyze data on buildings within 200 meters, 400 meters, and 800 meters of a station to better understand how (if at all) urban form changes with distance from the transit hub.
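The banding can be illustrated with a short sketch. It assumes each building record already carries its distance from the station and its footprint area, and it treats the bands as cumulative ("within 200 meters", "within 400 meters", "within 800 meters"). The field names and the cumulative reading are assumptions made for this example.

```python
BAND_RADII_M = (200, 400, 800)


def summarize_by_band(buildings):
    """Count buildings and sum footprints within each radius.

    `buildings` is a list of dicts with hypothetical keys
    'distance_m' and 'footprint_m2'.
    """
    summary = {}
    for radius in BAND_RADII_M:
        inside = [b for b in buildings if b["distance_m"] <= radius]
        summary[radius] = {
            "count": len(inside),
            "footprint_m2": sum(b["footprint_m2"] for b in inside),
        }
    return summary
```

Comparing the 200-meter figures with the 800-meter figures gives a quick sense of whether development is concentrated at the station or spread across the wider walkshed.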

The parking data (the number of lots and garages and their square footage) also identifies the lots and garages that are 2,000 square feet or greater. This subset of the total parking data is used to estimate the number of buildings that could be built within 1/2 mile of a transit station.

Use of AI for Built Environment Profile

The Transit-Oriented Discoveries platform creates a built environment profile for each station area in the dataset. This profile includes total buildings and weighted average height; building density per hectare; coverage ratio and floor area ratio (FAR); building typology based on footprint size; an urban form description (e.g., dense, tall near station); a development pattern (e.g., clustered high-rises, town center, suburban); and a height classification (high-rise, mid-rise, or low-rise). After all the building and development statistics are calculated, a prompt is created and passed to OpenAI’s API to generate a concise summary of the data. The AI is prompted to act as a data-literate urban planner and is instructed to:

  • Report building counts, heights, densities

  • Describe how urban form changes with distance from the station

  • Identify whether the area is more urban core, town center, or suburban

  • Classify the diversity of building types

  • Use only verified numbers—no guessing or speculation
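For readers who want to see how statistics of this kind are typically derived, the sketch below computes building density, coverage ratio, FAR, and a weighted average height for a circular station area. It uses standard planning definitions, but the choice of the full circle as the denominator, the weighting of height by footprint, and all field names are assumptions; the platform's own calculations may differ.

```python
import math


def built_environment_metrics(buildings, radius_m=800):
    """Compute summary metrics for buildings inside a circular station area.

    Each building is a dict with hypothetical keys
    'footprint_m2', 'levels', and 'height_m'.
    """
    area_m2 = math.pi * radius_m ** 2
    total_footprint = sum(b["footprint_m2"] for b in buildings)
    total_floor_area = sum(b["footprint_m2"] * b["levels"] for b in buildings)
    # Weight heights by footprint so large buildings count for more (assumed).
    weighted_height = (
        sum(b["height_m"] * b["footprint_m2"] for b in buildings) / total_footprint
        if total_footprint
        else 0.0
    )
    return {
        "total_buildings": len(buildings),
        "density_per_ha": len(buildings) / (area_m2 / 10_000),  # 1 ha = 10,000 m2
        "coverage_ratio": total_footprint / area_m2,
        "far": total_floor_area / area_m2,
        "weighted_avg_height_m": weighted_height,
    }
```

Figures like these would be computed before any prompt is built, so the language model only has to describe numbers that already exist.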

If you have concerns that any of the information in the paragraphs is incorrect, please contact connect@transitdiscoveries.com.

Built Environment Renderings

This web application visualizes transit stations and their surrounding urban environments in an interactive 3D interface. When a user selects a transit agency and station, the system makes API calls to a Flask backend, which retrieves geographic data from OpenStreetMap's Overpass API. This data includes detailed information on buildings, parking structures, and other urban elements within 800 meters of the selected station. The application processes this data to extract critical attributes like building heights (calculated from explicit height tags or estimated from building levels) and parking designations. The processed data is then rendered in the browser using the deck.gl visualization framework, which creates a three-dimensional map with color-coded layers: red for the station location, gray for buildings, and light red for parking structures. Distance rings at 200, 400, and 800 meters help users understand spatial relationships.
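The height step can be sketched as below. OSM buildings may carry a `height` tag (in meters, sometimes written with a trailing "m") or a `building:levels` tag. The 3 meters per level used here is an assumed conversion factor for illustration, not a value stated by the platform, and the function name is hypothetical.

```python
METERS_PER_LEVEL = 3.0  # assumed storey height; the platform's value is not documented here


def estimate_height_m(tags):
    """Return a building height in meters from OSM tags, or None if unknown."""
    raw = tags.get("height")
    if raw:
        try:
            # Accept values such as "12", "12.5", or "12 m".
            return float(raw.strip().lower().rstrip("m").strip())
        except ValueError:
            pass  # unparseable height; fall back to the level count
    levels = tags.get("building:levels")
    if levels:
        try:
            return float(levels) * METERS_PER_LEVEL
        except ValueError:
            pass
    return None
```

Returning `None` instead of a default keeps untagged buildings distinguishable from buildings with a real measurement, which matters when reporting how many buildings have height data.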

Under-mapped Station Areas

The building information in the platform comes from OpenStreetMap data, which is created and maintained by a global community of volunteers. OpenStreetMap (OSM) completeness varies by location, and areas with fewer active contributors often have incomplete or outdated building data (especially heights, footprints, or even presence of buildings). This can impact both the raw statistics and the AI-generated summaries derived from them. As a result, the statistics and summaries may underestimate the actual amount of development near some transit stations.

The Transit-Oriented Discoveries dataset includes a flag identifying station areas that are “under-mapped”, meaning that building data has not yet been added to the OSM database. The land use summaries for these stations contain a sentence noting that building information is incomplete. Under-mapped stations were identified through a visual inspection of all 4,900 station areas. About 15% of the stations in the database are under-mapped, and data gaps are more frequent at stations serving suburban and exurban areas (such as commuter rail). In addition, a column in the “urban form” dataset reports the number of buildings tagged with height data, which can help users gauge the extent to which development profiles lack complete building height information.

Housing Estimates

The Gazetteer and Atlas provide estimates of the number of housing units that could be built on surface parking and parking garages within 1/2 mile of a station. These estimates are zoning-agnostic in that they do not take into account existing regulations on building height, setbacks, or land uses. They also assume no minimum parking requirements within 1/2 mile of a station. The housing estimates limit development to surface parking lots and parking garages of 2,000 square feet or more, under the assumption that building on larger lots may be more straightforward. The estimates also restrict the height of new housing to no higher than the tallest building located within 1/2 mile of the station (i.e., if the tallest building is 2 stories, any new development would not exceed 2 stories, but if the tallest building is 44 stories, a new apartment building could rise up to that height).

The Gazetteer and Atlas also provide “conservative” and “aggressive” housing scenarios based on building height, lot coverage, space for public and commercial use, and housing unit size. The conservative estimate assumes that a new building's height would not exceed 60% of the tallest nearby building and that a new development occupies 75% of the lot, which accounts for setbacks, landscaping, walkways, and driveways. This estimate also assumes that 20% of a building is reserved for commercial or public use, such as ground-floor commercial and retail space, lobbies, stairwells, elevators, and community facilities (e.g., daycares, offices, civic space). Finally, the conservative scenario assumes each housing unit is 1,000 square feet.

In contrast, the aggressive scenario assumes building heights at 100% of the tallest nearby building, 85% lot coverage, 10% of space reserved for public use, and each housing unit is 800 square feet.

These estimates can translate into hundreds of thousands of new housing units around stations, especially those surrounded by large amounts of surface parking and existing high-rise buildings. It is unlikely that any station area will be entirely redeveloped to the parameters used here. Rather, the scenarios illustrate the potential for context-sensitive development near transit and how future development may vary depending on the surrounding built environment.

Please contact connect@transitdiscoveries.com for additional questions about this methodology or to discuss modeling different scenarios based on additional or new parameters.

Estimating Housing Across Multiple Stations

Transit-Oriented Discoveries subscribers also have access to data that estimates the total number of housing units that can be built across multiple stations, such as those on a transit line or in a transit system, zip code, city, county, or state. The methodology for these housing estimates is the same as described above, except that the calculations pro-rate the amount of parking over 2,000 square feet associated with stations located within 1/2 mile of one another, to avoid double counting. Approximately 2,700 of the 4,900 stations are located within 1/2 mile of another station. In these instances, the Transit-Oriented Discoveries algorithm counts the other stations within 1/2 mile of the reference station and divides the reference station's parking by the total number of stations in that group (the reference station plus the others). For example, if a reference station is located within 1/2 mile of two other stations, that station’s total parking square footage is assigned a coefficient of 0.333, and the amount of housing is based on the new (smaller) parking area.
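The pro-rating rule can be sketched as follows, using the coefficient implied by the example above: 1 divided by one plus the number of other stations within 1/2 mile. The record layout, the function names, and the use of straight-line (great-circle) distance are assumptions for illustration.

```python
import math

HALF_MILE_M = 804.672  # 1/2 mile in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def prorated_parking(stations):
    """Return {station id: pro-rated parking square footage}.

    Each station is a dict with hypothetical keys
    'id', 'lat', 'lon', and 'parking_sqft'.
    """
    result = {}
    for s in stations:
        others = sum(
            1
            for t in stations
            if t["id"] != s["id"]
            and haversine_m(s["lat"], s["lon"], t["lat"], t["lon"]) <= HALF_MILE_M
        )
        # Two other stations nearby gives 1 / (2 + 1), the 0.333 coefficient.
        result[s["id"]] = s["parking_sqft"] / (others + 1)
    return result
```

This pairwise comparison is quadratic in the number of stations, which is workable for roughly 4,900 stations; a spatial index would be the usual choice for much larger datasets.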

Accessing the Full Platform and Underlying Data

The publicly available version of the National Transit Station Gazetteer and Atlas contains data and visuals of 1 out of 5 stations in the United States. Click here to access the full platform and underlying data.

Future Updates

Transit-Oriented Discoveries is a work in progress, and we plan to add functionality based on user experience and feedback. Planned features include transportation network data and analytics to determine the extent to which station areas are safe and attractive for pedestrians and cyclists; information on additional land uses, such as residential, commercial, retail, and industrial development; data on points of interest and civic infrastructure near stations, such as schools, hospitals, and entertainment venues; and demographic information. Feel free to reach out to connect@transitdiscoveries.com with suggestions for new features.