Spatial SQL for the Geographer – STDistance Example

When I put together the Spatial SQL for the Geographer posts I never wrote anything about measuring distance.  Geographers often need to know the distance between locations, as distance decay is one of those things we find important.  Fortunately for us, measuring distance between points is pretty easy using Spatial SQL (and pretty fast).  The following query demonstrates an example of measuring distance between a set of defined points from the populated places data available in the sample data.

select 
t1.Id as ID, 
t1.NAMEASCII as t1_Name, 
t1.SOV0NAME as t1_SOV0NAME, 
t1.Latitude as t1_Latitude, 
t1.Longitude as t1_Longitude, 
t2.ID as t2_ID, t2.NAMEASCII as t2_Name, 
t2.SOV0NAME as t2_SOV0NAME, 
t2.LATITUDE as t2_Latitide, 
t2.LONGITUDE as t2_Longitude,
Geography::Point(t1.Latitude, t1.Longitude, 4326).STDistance(Geography::Point(t2.LATITUDE, t2.LONGITUDE, 4326)) as Distance_Meters
from dbo.Populated_Places t1 cross join dbo.Populated_Places t2
where t1.SOV0NAME = 'United States' and  t2.SOV0NAME = 'United States'

You will notice that in the query a few different things are happening:

  • A self cross join is used to take each record and join it to every other record.  This is done to get the lat/long combo for each city and matched up with every other city.  There is probably a better way to do this, but for this example it works.
  • In order to calculate distance two points are generated on the fly from the columns by calculating the point with Geography::Point(LAT, LONG, SRID).  Setting the SRID is important, as that will impact the measurement.  More often than not the user will be using an SRID of 4326 (WGS84), unless they are using projected data.  For more info on SRIDs check out this handy Wikipedia article.
  • You could replace Geography with Geometry when generating the points.  Doing so would lead to vastly different measurement results because of the way distance is calculated on the plane or on the spheroid.  When using the Geography data type the distance results are returned in meters while with the Geometry data type the results are reported in decimal degrees.
  • Once the points are created on either side of the STDistance method the query can be run.  The documentation provides a couple notes about using STDistance. The SRIDs need to match between the points and there is some error involved when measuring distance.  The distance measurement error is important to understand if one has tight tolerance they need to adhere to.

The results of this query will return a set of columns with the measurement in meters from every record to every other record for US cities.  With some additional coding you can generate a distance matrix, find the nearest locations, or create line features between points pretty quickly.  This query can also go pretty fast, if the proper indexes are generated and called in the query it should take less than a minute.  Not to bad when returning over 600k records.