TerraFly GeoCoder -- technical details

Our geocoder is based on the S-tree High Performance Spatial Data Structure, a technology that we developed with seed funding from the National Science Foundation. Our service uses our proprietary database, compiled from a number of standard data sources for street address ranges that we intelligently merge together. Among them are a dataset from Navigational Technologies (NavStreets by NavTeq) and the Tiger data sets. Our database is updated when new versions of the input data are released. We are also planning to use the data from the "MAF/TIGER Accuracy Improvement Project" once it becomes publicly available next year (see http://www.census.gov/geo/mod/maftiger.html).

The data from each source is first converted into our own proprietary, highly compressed format, in which the street names, feature types, and abbreviations are converted into a single compact form. For example, the Tiger data is compressed from over 30GB to 0.6GB (a factor of 50) without any loss of relevant information.

Then we merge the data sources together. In this merging process we remove redundant data, giving priority to the data sources that have higher mean accuracy. For example, NavTeq data has much higher mean accuracy than Tiger, so we add from Tiger only the street addresses that are missing from NavTeq. We also use comprehensive databases of city names, zip codes, and street feature abbreviations from USPS databases. The resulting database achieves the most accurate and complete street address coverage nationwide.

The compactness of our database allows us to store most of the data in memory. This lets us perform address translation with extremely high performance: we can translate more than 1000 addresses per second. With this high performance we can afford to resolve ambiguous addresses by performing multiple searches on the input.
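The idea of running several searches on one input can be pictured as a cascade of lookups, tried from the narrowest scope to wider ones. The sketch below is only an illustration under assumptions: the function names, the toy tables, and the coordinates are hypothetical and are not part of our service.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Coordinates = Tuple[float, float]
Lookup = Callable[[str], Optional[Coordinates]]

# Toy in-memory tables standing in for a real database (made-up data).
BY_ZIP: Dict[Tuple[str, str], Coordinates] = {
    ("33172", "MAIN ST"): (25.77, -80.36),
}
BY_CITY: Dict[Tuple[str, str], Coordinates] = {
    ("MIAMI", "FONTAINEBLEAU BLVD"): (25.77, -80.35),
}


def in_zip(zip_code: str) -> Lookup:
    """Build a lookup restricted to one zip code (hypothetical helper)."""
    return lambda street: BY_ZIP.get((zip_code, street))


def in_city(city: str) -> Lookup:
    """Build a lookup restricted to one city (hypothetical helper)."""
    return lambda street: BY_CITY.get((city, street))


def first_match(street: str, lookups: Iterable[Lookup]) -> Optional[Coordinates]:
    """Try each lookup in order and return the first hit, or None."""
    for lookup in lookups:
        hit = lookup(street)
        if hit is not None:
            return hit
    return None


# The zip-scoped search misses, so the city-scoped search is tried next.
result = first_match("FONTAINEBLEAU BLVD", [in_zip("33172"), in_city("MIAMI")])
```

Because each lookup is cheap when the data is held in memory, trying several of them per request remains affordable.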
If the street cannot be found within the specified zip code, for example, the program automatically expands the search range to the specified city, or to the cities to which the specified zip code belongs. Or, if the street address contains garbage characters at the start or the end, the program attempts to clean the input string and tries different combinations until it finds a match in the database. The program can even tolerate small misspellings in street names. For example, it can find FontaineblEAU Blvd even if the user misspelled the street name as FontaineblUE:

http://stree.cs.fiu.edu/street?street=7880%20%20Fontaineblue&city=miami%20fl

If you add a debug=1 option to the command line, the program will show you the search progress. We will be adding more comprehensive output to the service that will allow you to find out exactly which street was found and where. We will also be adding a reverse lookup feature that will allow you to find the street ranges nearest to a point that you supply.

The data is represented by address ranges that correspond to the physical street ranges stored in our database. When an exact house number is given, our program interpolates the coordinates of the house from the nearest street range found in the database. We also add a small offset from the street to indicate on which side of the street the house is located. The constants used in this interpolation process were tuned using a large database of county property parcel shapefiles. We found that the mean accuracy of our service is about 30 meters from the property centroids in the test county, and we continue to work on improving it.

To assure hardware and system fault tolerance, our geocoder service is provided by an automatically routed cluster of interchangeable servers, each containing a replica of the entire system and database.
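The interpolation of a house position from an address range, with a side-of-street offset, can be sketched roughly as follows. This is a minimal illustration and not our production code: it assumes planar coordinates in meters, a simple even/odd rule for the side of the street, and a placeholder offset of 10 meters rather than the tuned constants. The names StreetRange, interpolate, and left_parity are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass
class StreetRange:
    low: int          # house number at the start of the segment
    high: int         # house number at the end of the segment
    start: Point      # (x, y) in meters at the `low` end (assumed planar)
    end: Point        # (x, y) in meters at the `high` end
    left_parity: int  # 0 if even numbers are on the left side, 1 if odd


def interpolate(house: int, rng: StreetRange, offset_m: float = 10.0) -> Point:
    """Estimate a house position along a street segment.

    The house number is mapped linearly onto the segment, then shifted
    perpendicular to the street by `offset_m` toward its side.
    """
    span = rng.high - rng.low
    # Fraction of the way along the segment, clamped to [0, 1].
    t = 0.5 if span == 0 else (house - rng.low) / span
    t = max(0.0, min(1.0, t))

    dx = rng.end[0] - rng.start[0]
    dy = rng.end[1] - rng.start[1]
    x = rng.start[0] + t * dx
    y = rng.start[1] + t * dy

    length = math.hypot(dx, dy)
    if length == 0:
        return (x, y)  # degenerate segment: no direction to offset from

    # Unit normal pointing to the left of the start-to-end direction.
    nx, ny = -dy / length, dx / length
    side = 1 if house % 2 == rng.left_parity else -1
    return (x + side * offset_m * nx, y + side * offset_m * ny)


# A 100 m segment running east, numbered 100 to 200, even numbers on the left.
segment = StreetRange(low=100, high=200, start=(0.0, 0.0), end=(100.0, 0.0), left_parity=0)
position = interpolate(150, segment)
```

In this toy segment, house 150 falls halfway along the street and is shifted to the left side, while an odd number such as 151 would be shifted to the right.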