Ensuring Accurate Station Location is my Latest Vocation

In my last blog post, I wrote about creating an inventory of transit stations in the United States, including the challenge of reconciling station data in different spreadsheets. Although I was able to overcome this obstacle, I also know that a complete list of stations would not be very useful if some of the key information, in particular the station’s location, is incorrect. This post describes the critical work of verifying station locations. I briefly explain longitude and latitude coordinates, summarize challenges encountered in my past work, and describe a tool I built to quickly visualize and verify station locations. I’ll also wax poetic a bit about the benefits of close observation and give you the chance to try out a prototype of my station selector.

Longitudes and Latitudes

The precise location of transit stations (along with other buildings) is captured by a pair of longitude/latitude coordinates. (Latitude measures north-south position on the Earth's surface, with the equator at 0° and the poles at 90° north and south. Longitude measures east-west position, with the Prime Meridian, which passes through Greenwich, England, at 0° and values increasing up to 180° east and west.) Transit agencies collect and validate coordinate data on their stations via field surveys, hand-held devices, aerial surveys, satellite imagery, existing public data sets, or Geographic Information System (GIS) software tools. Once agencies collect coordinate points, they can verify them for accuracy, format the data according to the General Transit Feed Specification (GTFS), and make the data available to developers, researchers, and the public. Transit-Oriented Discoveries uses station data that originated in GTFS stops files, which include station names, coordinates, and other information.

Once Bitten…

The table below displays coordinates for four Long Island Rail Road (LIRR) stations. The information is precise: the fifth digit to the right of the decimal point pins down the location of an object to within about 3 feet 7 inches. But do the coordinates represent the actual location of each station? Can you spot which coordinates are off? Unlike the columns of station names shown in my prior post, where someone could identify stray characters, words in different orders, or abbreviations, it’s virtually impossible to spot coordinates that do not accurately represent a geographic location just by reading them.
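To see where a figure like "3 feet 7 inches" comes from, here is a rough back-of-the-envelope sketch. It assumes a spherical Earth with a mean radius of 6,371 km and uses 40.8° as an approximate latitude for Long Island; both values are my assumptions for illustration, not numbers from the agencies' data.

```python
import math

EARTH_RADIUS_M = 6_371_000   # assumed mean Earth radius (spherical approximation)
FEET_PER_METER = 3.28084

def fifth_decimal_size(lat_deg):
    """Return (north_south_m, east_west_m) covered by 0.00001 degrees at a latitude."""
    meters_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    north_south = 1e-5 * meters_per_degree
    # Meridians converge toward the poles, so a step in longitude shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west

ns, ew = fifth_decimal_size(40.8)  # 40.8 is an assumed, approximate latitude
print(f"north-south: {ns:.2f} m ({ns * FEET_PER_METER:.2f} ft)")
print(f"east-west:   {ew:.2f} m ({ew * FEET_PER_METER:.2f} ft)")
```

The north-south step comes out a little over one meter, in line with the figure quoted above, while the east-west step is somewhat smaller away from the equator. Either way, the coordinates carry far more precision than is needed to notice a station plotted a block or more from where it should be.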

In prior work on this project, I used lon/lat coordinates that transit agencies reported to the 2022 NTD Facility Inventory. Agencies entered the data, which may or may not have been the same as the station’s GTFS coordinates, via text boxes in an electronic form. (Some agencies did not report coordinates at all, so I geocoded them from the station address.) I initially took the data at face value until I began plotting the coordinates against station location information shown on OpenStreetMap. The images below show some of the results.

Transit agency coordinates of four LIRR stations plotted against the station locations in OpenStreetMap. The blue circle represents a 1/4 mile radius around the station location.
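The same comparison can be expressed as a number. The sketch below is not the method I used (I compared the points visually); it is a minimal illustration that computes the great-circle distance between two coordinate pairs with the haversine formula and flags stations where the two sources disagree by more than a quarter mile. The station names and coordinates are made-up placeholders, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000        # assumed mean Earth radius
QUARTER_MILE_M = 0.25 * 1609.344  # same radius as the blue circle in the images

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def flag_mismatches(pairs, threshold_m=QUARTER_MILE_M):
    """Return (name, distance_m) for stations whose two sources differ by more than threshold_m."""
    flagged = []
    for name, agency, osm in pairs:
        distance = haversine_m(*agency, *osm)
        if distance > threshold_m:
            flagged.append((name, round(distance)))
    return flagged

# Placeholder data only: (name, (agency_lat, agency_lon), (osm_lat, osm_lon))
sample = [
    ("Station A", (40.70000, -73.50000), (40.70010, -73.50010)),
    ("Station B", (40.70000, -73.50000), (40.71000, -73.50000)),
]
print(flag_mismatches(sample))
```

A check like this could help decide which stations to look at first, but it only measures disagreement between two sources. It cannot say which of the two is right.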

I suspect that coordinate data from the transit agencies’ GTFS files are more accurate than the data submitted in the free text fields of the NTD Facility Inventory, but I need to be sure because accurate station location data is critical. Most of the data in Transit-Oriented Discoveries will come from OpenStreetMap (OSM) queries of land uses, transportation features, and civic infrastructure within various distances (200 meters, 400 meters, 800 meters) of each station. If the station coordinates are off, then the OSM data in the catchment areas and whatever conclusions are drawn from them will be suspect.
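To make the dependency concrete, here is a sketch of how a catchment query might be built. I have not settled on a query method in this post, so treat this as one possible approach: it assumes the Overpass query language and its "around" filter, and the function name, default tag, and coordinates are hypothetical. The code only builds query text and does not contact any server.

```python
def catchment_queries(lat, lon, radii_m=(200, 400, 800), tag="building"):
    """Build one Overpass QL count query per radius, centered on the station coordinates."""
    queries = {}
    for radius in radii_m:
        queries[radius] = (
            "[out:json][timeout:25];\n"
            f'way["{tag}"](around:{radius},{lat},{lon});\n'
            "out count;"
        )
    return queries

# Placeholder coordinates, not a real station
for radius, query in catchment_queries(40.7, -73.5).items():
    print(f"--- {radius} m ---")
    print(query)
```

The station's latitude and longitude appear directly inside every query, which is why a misplaced coordinate pair shifts the entire catchment area and everything counted within it.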

My risk mitigation approach is to visually inspect every station in my dataset to confirm that the lon/lat coordinates are consistent with the location of the station on the OSM base map. This is a time-consuming but feasible approach. There are around 5,000 stations in the Transit-Oriented Discoveries dataset, and it will take a few weeks to get through them all.

Building a Station Selector

Although I do not want to rush through the process of station verification, I need a way to overlay station coordinates onto a base map so that loading and displaying them happens almost instantaneously. To do this, I built a station selector using Python code.

I started by importing several libraries and tools into my coding notebook. ipywidgets is used to create interactive widgets in Jupyter notebooks. IPython.display provides tools to display objects in IPython environments. Folium is a Python wrapper for Leaflet.js, which allows you to create Leaflet maps directly from Python code. (Leaflet is an open-source JavaScript library for mobile-friendly interactive maps.) Each of these tools contributes to the final product.

Next, I wrote a function to identify station names in a dataset and extract the longitude and latitude coordinates of the station. If someone enters a station name that is not found in the database (using an incorrect spelling, for example), the function triggers a message to enter a valid station name. If a person enters no name, the code returns a message that the station name is not found.
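A lookup function along these lines might look like the sketch below. It is a simplified illustration rather than the code in my notebook: it assumes the stations are held in a dictionary that maps each name to a (latitude, longitude) pair, and the function name, message wording, and sample entry are all placeholders.

```python
def find_station(name, stations):
    """Return ((lat, lon), None) when the name is found, or (None, message) when it is not."""
    cleaned = (name or "").strip()
    if not cleaned:
        # Nothing was typed into the box
        return None, "Please enter a station name."
    coords = stations.get(cleaned)
    if coords is None:
        # Typed name does not match any station exactly (for example, a misspelling)
        return None, f"Station '{cleaned}' not found. Please enter a valid station name."
    return coords, None

# Placeholder data, not real station coordinates
stations = {"Station A": (40.7, -73.5)}

print(find_station("Station A", stations))
print(find_station("Staton A", stations))
print(find_station("", stations))
```

Returning a message instead of raising an error keeps the calling code simple: it can either draw the map or show the message, without extra error handling.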

The next lines of code add and display a base map centered on the station coordinates. (The default base map is the OpenStreetMap Carto template, the same source that I’ll be using for future data queries.) I set the zoom level to 16 to allow a user to see both the station coordinates and a circle with a 1/2 mile radius (approximately 805 meters) around the station. I also added a small, bright red dot to mark the lon/lat coordinates on the map.
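Here is a minimal sketch of that map-building step using Folium's Map, Circle, and CircleMarker classes. The function names, circle color, and dot size are my choices for illustration. The small helper at the top uses the standard Web Mercator resolution formula for 256-pixel tiles to estimate how wide the half-mile circle appears on screen, which is one way to sanity-check the choice of zoom level 16.

```python
import math

HALF_MILE_M = 0.5 * 1609.344  # 804.672 meters

def meters_per_pixel(lat_deg, zoom):
    """Approximate ground resolution of a Web Mercator map with 256-pixel tiles."""
    return 156543.03392 * math.cos(math.radians(lat_deg)) / (2 ** zoom)

def build_station_map(lat, lon, zoom=16):
    """Return a folium.Map with a half-mile circle and a red dot at the station coordinates."""
    import folium  # imported here so the helper above still works where folium is not installed

    station_map = folium.Map(location=[lat, lon], zoom_start=zoom, tiles="OpenStreetMap")
    # Circle radius is in meters, so it scales with the map
    folium.Circle(location=[lat, lon], radius=HALF_MILE_M, color="blue", fill=False).add_to(station_map)
    # CircleMarker radius is in pixels, so the dot stays the same size at every zoom level
    folium.CircleMarker(
        location=[lat, lon], radius=4, color="red",
        fill=True, fill_color="red", fill_opacity=1.0,
    ).add_to(station_map)
    return station_map

# 40.8 is an assumed, approximate latitude for Long Island
diameter_px = 2 * HALF_MILE_M / meters_per_pixel(40.8, 16)
print(f"Circle diameter at zoom 16: about {diameter_px:.0f} pixels")
```

At this latitude the circle is a bit under 900 pixels across at zoom 16, so it fits on a typical laptop screen while leaving streets and buildings legible. One zoom level closer would double that width and push parts of the circle out of view.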

Finally, I wrote code to upload a dataset and created features that let someone interact with the data and map: a text box where someone can type the name of a station; a button labeled “Show Map” that displays the map of the area around the specified train station; code that clears any old map and then uses the name entered in the box to find and display the map of the specified station; and, finally, code that connects the button to its defined functionality, so that when someone clicks it, the map is displayed.
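The pieces described above could be wired together roughly as follows. This is a sketch, not my exact notebook code. It assumes the dataset is CSV text with columns named stop_name, stop_lat, and stop_lon (the field names used in GTFS stops files; your file may differ), and it takes the map-drawing function as a parameter called make_map, a hypothetical name for any function that turns a latitude and longitude into a displayable map.

```python
import csv
import io

def load_stations(csv_text):
    """Parse CSV text into {station name: (lat, lon)}. Column names are assumed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["stop_name"]: (float(row["stop_lat"]), float(row["stop_lon"]))
        for row in reader
    }

def build_selector(stations, make_map):
    """Return a widget with a text box, a "Show Map" button, and an output area."""
    import ipywidgets as widgets  # imported here because widgets only work inside a notebook
    from IPython.display import clear_output, display

    name_box = widgets.Text(description="Station:", placeholder="Type a station name")
    show_button = widgets.Button(description="Show Map")
    output_area = widgets.Output()

    def on_click(_button):
        with output_area:
            clear_output(wait=True)  # remove any old map before drawing the new one
            coords = stations.get(name_box.value.strip())
            if coords is None:
                print("Station not found. Please enter a valid station name.")
            else:
                display(make_map(*coords))

    show_button.on_click(on_click)  # connect the button to its behavior
    return widgets.VBox([name_box, show_button, output_area])

# Placeholder data, not a real station
stations = load_stations("stop_name,stop_lat,stop_lon\nStation A,40.7,-73.5\n")
print(stations)
```

In a notebook cell, calling display(build_selector(stations, make_map)) with a map-building function of your own would show the text box and button. Passing the map function in as an argument keeps the interface code separate from the mapping code.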

Putting it all together, the code provides a “front end” for me to quickly enter a station name, hit “Show Map,” and have the station coordinates and the circle representing the 1/2 mile radius display within a second. (The plus and minus buttons allow me to zoom in for more details on the station location and surrounding areas or to zoom out to see the wider area where the station is located.)

Try it For Yourself

I’ve developed a simple web application using the code I described here along with Streamlit, an open-source framework used to create web applications in Python. It is a good tool for rapid prototyping and sharing because it doesn’t require any website building experience. I’ve included some modifications, including a drop-down list of stations, some code to run the Station Selector in Streamlit, and a feature to provide a summary sentence or two about each station based on the combined NTA/NTD database. You can find the complete code and station .csv file on my GitHub page.

And Now for a Few Caveats

When it comes to station location accuracy, how close is close enough? Consider the two images below of the St. James station on the LIRR Port Jefferson Branch. The image on the left shows the GTFS coordinates about a block away from the OSM station location. (I am assuming that the OSM coordinates are correct. Ultimately we need some “ground truth,” pun intended.) The image on the right shows the station coordinates and OSM location after I made a manual correction. When it comes time to query OSM data on the land cover within the 1/2 mile circle, a query run on the image on the left would pick up some of the sports pitch and park at the bottom left, and a query run on the image on the right would add a bit more of the forest along North Country Road. Would the difference matter? Probably not, but I like to make adjustments so that users viewing station coordinates will be confident that the images are “on point.”

That said, there’s nothing sacred about the 1/2 mile circle around a station. This radius corresponds to approximately a 10-minute walk and is based on the idea that people are willing to walk this distance to access public transportation. It’s become an informal standard in urban planning, but people’s interest in and ability to walk to transit can vary.
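The arithmetic behind the 10-minute figure is easy to check. The sketch below assumes a walking speed of 3 miles per hour, which is my assumption for illustration. It also measures straight-line distance, so an actual walk along streets would take longer.

```python
METERS_PER_MILE = 1609.344

def walk_minutes(distance_m, speed_mph=3.0):
    """Minutes needed to walk a straight-line distance at an assumed constant speed."""
    meters_per_minute = speed_mph * METERS_PER_MILE / 60
    return distance_m / meters_per_minute

for radius_m in (200, 400, 800, 0.5 * METERS_PER_MILE):
    print(f"{radius_m:7.1f} m -> {walk_minutes(radius_m):4.1f} min")
```

At that pace a half mile takes 10 minutes, and the 800-meter query radius is nearly the same walk. A slower walker at 2 miles per hour would need 15 minutes to cover the same half mile, which is one reason the circle is a convention rather than a rule.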

In addition, the station lon/lat coordinates mark the center of a station, not necessarily the location of the station entrance(s). Consider this schematic of the LIRR Atlantic Avenue station, which also serves New York City Transit subway lines. A circle that covers 1/2 mile from the station’s Hanson Place exit (center) would be different from the 1/2 mile radius around the Pacific Street and 4th Avenue exit (bottom left) or the Barclays Center exit (bottom right). However, plotting the coordinates of each station exit would get complicated fast, and a single lon/lat coordinate pair for the entire station will suffice.

Image from Project Subway NYC

Visual Intelligence

In her book, Visual Intelligence: Sharpen Your Perception, Change Your Life, art history professor Amy Herman writes about interacting with works of art: “the longer and more attentively we look, the more we will discover… Invention is less about creation than it is about discovery, and discovery is made possible by simply opening our eyes, turning on our brains, turning in, and paying attention.” (p. 22).

I’ve attempted to take Herman’s words to heart by taking 30 seconds or so to simply look at each station image after I’ve verified the coordinates. I note the street grid layout, the land uses and land forms, whether the surrounding area appears to be densely or sparsely populated, and any land uses that are unusual for a typically urban area, such as a nearby vineyard or rock quarry.

I’ve also paid close attention to apparent gaps in OpenStreetMap data in the 1/2 mile catchment area. Here is a closeup of the St. James station. Notice the area above Moriches Road that includes Applewood Road and Floral Lane. It’s likely that there is a residential development here, but it has yet to be mapped by OSM’s corps of volunteer mappers. As a result, a query of the number of buildings and average building footprint around the station would fall short of reality.

Fortunately, OpenStreetMap can be updated by anyone with motivation and a brief tutorial. Why wait for someone to come along and make a change when I can do it myself? I’ll try my hand at adding to the map and will report back in my next update.
