I have been using a commercial solution for route distances and travel times for North America and Western/Central Europe. I am considering expanding the project to cover other countries, and perhaps the entire world. A very limited budget and patchy regional coverage from individual commercial providers probably make locally hosted OpenStreetMap the only viable option. Before someone suggests an online solution: my application requires a lot of intensive route calculation, which would be expensive or very impolite (and would probably get me banned) if performed against a web service. The results of the calculations are put back in the public domain, so crediting OpenStreetMap is not a problem.
My problem is this: how do I assess the routing data coverage for individual countries in the OpenStreetMap database? Such an assessment could determine whether the project is viable, and suggest a suitable order for processing (i.e. do the countries with the best coverage first).
High-end commercial data providers can typically supply statistical descriptions, as well as regional descriptions of surveyed coverage. OpenStreetMap is much more patchy: an area typically includes some roads, but not all roads. Individual location errors of a few metres, or even 10-20 m, will not be a problem for my application (I'm looking at city-to-city distances), but route graph connectivity is, i.e. the road vectors must logically meet at a junction.
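To make the connectivity concern concrete, here is a minimal sketch of one possible check. It assumes the road ways have already been extracted as lists of node ids, and it relies on the fact that in OSM two ways only form a junction if they share a node id. The function name `largest_component_share` and the toy data are hypothetical, and the share of ways in the largest connected component is just one candidate statistic, not an established coverage measure.

```python
from collections import Counter


def largest_component_share(ways):
    """Return the fraction of ways that sit in the largest connected component.

    `ways` is a list of node-id lists. Two ways are treated as joined only
    if they share a node id (assumption: input was pre-extracted from OSM).
    """
    parent = {}

    def find(n):
        parent.setdefault(n, n)
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Join consecutive nodes of every way; shared node ids merge ways.
    for nodes in ways:
        for a, b in zip(nodes, nodes[1:]):
            union(a, b)

    # Count ways per component, identified by the root of their first node.
    sizes = Counter(find(nodes[0]) for nodes in ways if nodes)
    total = sum(sizes.values())
    return max(sizes.values()) / total if total else 0.0


# Hypothetical toy data: the first two ways share node 3, the third way
# shares no node with the others and is therefore disconnected.
toy_ways = [[1, 2, 3], [3, 4, 5], [10, 11]]
print(f"{largest_component_share(toy_ways):.2f}")
```

A low share for a country would suggest that many roads are drawn but not joined up, which matters more for routing than small positional errors.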
Has anyone attempted to create statistics describing data coverage of the OpenStreetMap database?
If not, how would you go about it?
The best I can think of is to take a random sample of places (e.g. cities), and then attempt to calculate routes between them. This would have to assume that major roads tend to be added before minor roads. Where coverage is good, a route between two distant cities should then use the logical major road; a route that is forced onto minor roads (typically longer and slower) would suggest that the major road is missing.
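The sampling idea could be sketched roughly as follows. This is only an illustration under stated assumptions: `route_km` is a hypothetical callable standing in for whatever local routing engine is used (it returns a road distance in km, or `None` when no route is found), and the two statistics (share of routable pairs, median ratio of road distance to great-circle distance) are my own guesses at useful indicators.

```python
import math
import random
from statistics import median


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def coverage_sample(cities, route_km, n_pairs=100, seed=0):
    """Sample random city pairs and summarise how well they can be routed.

    `cities` maps name -> (lat, lon); `route_km(a, b)` is a hypothetical
    router wrapper returning km or None.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    names = sorted(cities)
    failed = 0
    ratios = []
    for _ in range(n_pairs):
        a, b = rng.sample(names, 2)
        road = route_km(a, b)
        straight = haversine_km(*cities[a], *cities[b])
        if road is None:
            failed += 1
        elif straight > 0:
            ratios.append(road / straight)  # detour ratio, 1.0 = dead straight
    return {
        "routable_share": 1 - failed / n_pairs,
        "median_detour": median(ratios) if ratios else None,
    }
```

A high median detour ratio together with a high routable share would hint that routes exist but are being pushed onto minor roads; a low routable share points at missing links or, as discussed next, at places that genuinely cannot be reached by road.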
Another problem is that it is physically impossible to drive between many towns. Often this is due to the presence of islands (where ferries could be used), but often there is no surface route at all (e.g. settlements in Nunavut). So how would such statistics be used when comparing (say) Tonga and Afghanistan? Afghanistan probably has very low data coverage. Tonga is probably better, but its settlements are spread out across an archipelago.
Some details about my application: all start and end points are towns and cities with locations taken from the Geonames database. Typically I am looking at the 1000 largest cities in a country that also have a population of at least 1000. Routes are currently calculated in duplicate, as both fastest routes and shortest routes. Reasonable road speeds vary according to broad road categories. Estimated travel times are computed alongside road distances. These details are preferences for consistency; they are not set in stone.
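For illustration, the travel time estimate from broad road categories could look something like this. The category names are standard OSM `highway` tag values, but the speeds in `SPEED_KMH` and the fallback `DEFAULT_KMH` are placeholder assumptions, not the values I actually use.

```python
# Assumed speeds in km/h per broad road category (placeholder values).
SPEED_KMH = {
    "motorway": 100.0,
    "trunk": 80.0,
    "primary": 65.0,
    "secondary": 50.0,
    "tertiary": 40.0,
}
DEFAULT_KMH = 30.0  # assumed fallback for any other road category


def travel_time_hours(segments):
    """Sum the travel time over (length_km, highway_category) segments."""
    return sum(
        length_km / SPEED_KMH.get(highway, DEFAULT_KMH)
        for length_km, highway in segments
    )


def distance_km(segments):
    """Total road distance of the same route."""
    return sum(length_km for length_km, _ in segments)


# Hypothetical route: mostly motorway, then a primary road into town.
route = [(50.0, "motorway"), (13.0, "primary"), (3.0, "residential")]
print(f"{distance_km(route):.1f} km, {travel_time_hours(route):.2f} h")
```

The fastest route minimises the first quantity and the shortest route minimises the second, which is why missing major roads would distort the fastest routes more than the shortest ones.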