Address geocoding
linking "Topologically Integrated Geographic Encoding and Referencing"; quick cleanup per WP:NOPIPE
| ← Previous revision | Revision as of 12:19, 19 April 2026 | ||
| Line 4: | Line 4: | ||
{{Use dmy dates|date=January 2020}} |
||
'''Address geocoding''', or simply '''geocoding''', is the process of taking a text-based description of a location, such as an [[address]] or the name of a [[location (geography)|place]], and returning [[geographic coordinates]] (typically a latitude/longitude pair) to identify a location on the Earth's surface.{{cite book |author1=Leidner, J.L. |title=International Encyclopedia of Geography |chapter=Georeferencing: From Texts to Maps |volume=vi|year=2017|pages=2897–2907|doi=10.1002/9781118786352.wbieg0160 |isbn=9780470659632}} ''[[Reverse geocoding]]'', on the other hand, converts [[geographic coordinates]] to the description of a location, usually the name of a place or an addressable location. Geocoding relies on a computer representation of address points and the street/road network, together with postal and administrative boundaries.
||
* Geocode (''verb''):"Geocode" term as a verb, as defined by Oxford English Dictionary at https://en.oxforddictionaries.com/definition/geocode {{Webarchive|url=https://web.archive.org/web/20180426213141/https://en.oxforddictionaries.com/definition/geocode |date=26 April 2018 }} provide geographical coordinates corresponding to (a location). |
||
| Line 32: | Line 32: | ||
The end of the 20th century had seen geocoding become more user-oriented, especially via open-source GIS software. Mapping applications and [[Geospatial analysis|geospatial data]] had become more accessible over the Internet. |
||
Because the mail-out/mail-back technique was so successful in the [[1980 census]], the U.S. Census Bureau was able to put together a large geospatial database, using [[interpolated]] street geocoding.{{Cite web|url=https://www.ncjrs.gov/html/nij/mapping/ch4_3.html|title=Spatially enabling the data: What is geocoding?|website=National Criminal Justice Reference Service|access-date=2016-06-22}} This database – along with the Census's nationwide coverage of households – allowed for the birth of [[Topologically Integrated Geographic Encoding and Referencing]] (TIGER).
||
Containing address ranges instead of individual addresses, TIGER has since been implemented in nearly all geocoding software platforms used today. By the end of the [[1990 United States census|1990 census]], TIGER "contained a [[Latitude and longitude|latitude/longitude]]-coordinate for more than 30 million feature intersections and endpoints and nearly 145 million feature 'shape' points that defined the more than 42 million feature segments that outlined more than 12 million polygons."{{Cite web|url=http://census.maps.arcgis.com/apps/MapJournal/index.html?appid=2b9a7b6923a940db84172d6de138eb7e|title=25th Anniversary of TIGER|website=census.maps.arcgis.com|access-date=2016-06-22}} |
||
| Line 39: | Line 39: | ||
=== 2000s === |
||
The early 2000s saw the rise of [[Coding Accuracy Support System]] (CASS) address standardization. CASS certification is offered to all software [[vendor]]s and advertising mailers who want the [[United States Postal Service]] (USPS) to assess the quality of their address-standardizing software. The annually renewed certification is based on [[delivery point]] codes, ZIP codes, and ZIP+4 codes. Adopting CASS-certified software allows vendors to receive discounts on [[bulk mailing]] and shipping costs, and a certified database can increase the accuracy and efficiency of those bulk mailings. In the early 2000s, geocoding platforms also became able to support multiple datasets.
||
In 2003, geocoding platforms were capable of merging postal codes with street data, updated monthly. This process became known as "conflation". |
||
| Line 45: | Line 45: | ||
Beginning in 2005, geocoding platforms included parcel-centroid geocoding, which allowed for high precision in geocoding an address. For example, it allowed a geocoder to determine the centroid of a specific building or lot of land. Platforms were now also able to determine the elevation of specific [[parcel (package)|parcels]].
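
As an illustration of the parcel-centroid idea, the following is a minimal sketch that computes the area-weighted centroid of a parcel outline with the shoelace formula. It assumes the parcel is a simple (non-self-intersecting) polygon given in planar coordinates; the function name <code>polygon_centroid</code> and the sample rectangle are hypothetical and do not come from any particular geocoding platform.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(vertices: List[Point]) -> Point:
    """Return the area-weighted centroid of a simple polygon.

    Assumes planar (projected) coordinates and vertices listed in order
    around the outline, either clockwise or counter-clockwise.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")

    twice_area = 0.0  # twice the signed area (shoelace sum)
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")

    # Centroid = sum / (6 * area) = sum / (3 * twice_area).
    return cx / (3 * twice_area), cy / (3 * twice_area)


# Hypothetical rectangular lot, 4 units wide and 2 units deep.
lot = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(polygon_centroid(lot))
```

For an irregular lot the centroid can fall outside the outline, so a production geocoder might prefer a point guaranteed to lie inside the parcel; that refinement is outside the scope of this sketch.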
||
2005 also saw the introduction of the [[Assessor's Parcel Number]] (APN). A jurisdiction's [[tax assessor]] was able to assign this number to parcels of real estate, which allowed for proper identification and record-keeping. An APN is important for geocoding an area which is covered by a gas or oil lease, and for indexing property tax information provided to the public.
||
In 2006, reverse geocoding and reverse APN lookup were introduced to geocoding platforms. Reverse geocoding converts a numerical point location, given as a [[longitude and latitude]], to a textual, readable address.
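
One simple way to picture the reverse lookup is as a nearest-neighbour search over address points. The sketch below is only an illustration under that assumption: the reference records, the addresses in them, and the function names <code>haversine_m</code> and <code>reverse_geocode</code> are hypothetical, and real platforms typically use spatial indexes rather than a linear scan.

```python
import math
from typing import List, Tuple

# Hypothetical reference data: (address, latitude, longitude).
REFERENCE: List[Tuple[str, float, float]] = [
    ("1 Example Street", 40.0000, -75.0000),
    ("2 Example Street", 40.0010, -75.0000),
    ("9 Sample Avenue", 40.0100, -75.0200),
]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical Earth (radius 6,371 km)."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def reverse_geocode(lat: float, lon: float,
                    reference: List[Tuple[str, float, float]] = REFERENCE) -> str:
    """Return the address of the reference point closest to (lat, lon)."""
    nearest = min(reference, key=lambda rec: haversine_m(lat, lon, rec[1], rec[2]))
    return nearest[0]


print(reverse_geocode(40.0009, -75.0001))
```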
||
2008 and 2009 saw the growth of interactive, user-oriented geocoding platforms – namely MapQuest, Google Maps, Bing Maps, and Global Positioning Systems (GPS). These platforms were made even more accessible to the public with the simultaneous growth of the mobile industry, specifically smartphones. |
||
| Line 75: | Line 75: | ||
The third component is software that matches each geocode in the input dataset to the attributes of a corresponding feature in the reference dataset. Once a match is made, the location of the reference feature can be attached to the input row. These algorithms are of two types: |
||
; Direct match |
||
: The geocoder expects each input item to correspond directly to a single entire feature in the reference dataset. Examples include matching a country or ZIP code, or matching street addresses to building point reference data. This kind of match is similar to a relational [[table join]], except that geocoder algorithms usually incorporate some kind of uncertainty handling to recognize approximate matches (e.g., different capitalization or slight misspellings).
||
; Interpolated match |
||
: The geocode specifies not only a feature, but some location within that feature. The most common (and oldest) example is matching street addresses to street line data. First the geocoder parses the street address into its component parts (street name, number, directional prefix/suffix). The geocoder then matches these components to a corresponding street segment with a number range that includes the input value. Finally, it calculates where the given number falls within the segment's range to estimate a location along the segment. As with the direct match, these algorithms usually include uncertainty handling for approximate matches (especially abbreviations such as "E" for "East" and "Dr" for "Drive").
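
The interpolated match described above can be sketched as follows. This is a minimal illustration, not the algorithm of any specific geocoder: the <code>Segment</code> record, the small abbreviation table, and the function names are assumptions made for the example, coordinates are treated as planar, and odd/even side-of-street handling is omitted.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical abbreviation table used for approximate matching.
ABBREVIATIONS = {
    "e": "east", "w": "west", "n": "north", "s": "south",
    "dr": "drive", "st": "street", "ave": "avenue",
}


def normalize(text: str) -> str:
    """Lower-case, strip punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


@dataclass
class Segment:
    street: str                  # street name as stored in the reference data
    low: int                     # lowest house number on the segment
    high: int                    # highest house number on the segment
    start: Tuple[float, float]   # coordinates where the range begins
    end: Tuple[float, float]     # coordinates where the range ends


def parse_address(address: str) -> Tuple[int, str]:
    """Split an address into its house number and normalized street name."""
    match = re.match(r"\s*(\d+)\s+(.+)", address)
    if match is None:
        raise ValueError(f"cannot parse address: {address!r}")
    return int(match.group(1)), normalize(match.group(2))


def geocode(address: str, segments: List[Segment]) -> Optional[Tuple[float, float]]:
    """Estimate a location by linear interpolation along a matching segment."""
    number, street = parse_address(address)
    for seg in segments:
        if normalize(seg.street) == street and seg.low <= number <= seg.high:
            span = seg.high - seg.low
            # Fraction of the way along the segment; use the midpoint
            # when the range contains a single number.
            t = (number - seg.low) / span if span else 0.5
            x = seg.start[0] + t * (seg.end[0] - seg.start[0])
            y = seg.start[1] + t * (seg.end[1] - seg.start[1])
            return (x, y)
    return None  # no segment matched


segments = [Segment("East Main Drive", 100, 200, (0.0, 0.0), (10.0, 0.0))]
print(geocode("150 E Main Dr", segments))
```

In this sketch the input "150 E Main Dr" matches the segment stored as "East Main Drive" because both sides pass through the same normalization step, and house number 150 falls halfway through the 100–200 range.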
||
| Line 105: | Line 105: | ||
* Most interpolation implementations produce a point as the resulting address location. In reality, the physical address is distributed along the length of the segment. For example, consider geocoding the address of a [[shopping mall]]: the physical lot may run a distance along the street segment (or could be thought of as a two-dimensional space-filling polygon which may front on several different streets or, worse, for cities with multi-level streets, a three-dimensional shape that meets different streets at several different levels), but the interpolation treats it as a single point.
||
A very common error is to believe the accuracy ratings of a given map's geocodable attributes. Such accuracy as quoted by vendors has no bearing on an address being attributed to the correct segment or to the correct side of the segment, nor on its being placed accurately along that correct segment. With the geocoding process used for [[U.S. census]] TIGER datasets, 5–7.5% of the addresses may be allocated to a different [[census tract]], while a study of Australia's TIGER-like system found that 50% of the geocoded points were mapped to the wrong property parcel.{{cite journal |author=Ratcliffe, Jerry H.
||
|title=On the accuracy of TIGER-type geocoded address data in relation to cadastral and census areal units |
||
|journal=International Journal of Geographical Information Science |
||
| Line 128: | Line 128: | ||
==Other techniques== |
||
In rural areas or other places lacking high-quality street network data and addressing, [[GPS]] is useful for mapping a location. For traffic accidents, geocoding to a street intersection or midpoint along a street centerline is a suitable technique. Most highways in developed countries have [[Milestone|mile markers]] to aid in emergency response, maintenance, and navigation. It is also possible to combine these geocoding techniques, using a particular technique for certain cases and situations and other techniques for other cases.
||
In contrast to geocoding of structured postal address records, [[toponym resolution]] maps place names in unstructured document collections to their corresponding spatial footprints. |
||
* [[Place code]]s offer a way to create digitally generated addresses where no information exists using satellite imagery and machine learning, e.g., [https://www.technologyreview.com/s/612492/four-billion-people-lack-an-address-machine-learning-could-change-that/ Robocodes] |
||