The concept of topology isn’t something that every spatially enabled person fully understands. That is OK, because I too had to learn (and relearn) how spatial topology works over the years, especially early on back in the ArcView 3.X days. I think this experience is fairly typical of someone who uses GIS. If one is taking a GIS course or a course that uses GIS it is not very often that the concept of spatial topology is covered in-depth or at all. Spatial topology also may not be something that people are overly concerned about during their day-to-day workflow, meaning they may let their geospatial topology skills slide from time to time. As a public service here is a basic overview of geospatial topology.
First question: What is topology?
You have probably heard the term topology before, whether it was in a GIS course where the instruction lightly glazed over the topic, or in a geometry/mathematics course.
Technically speaking, topology is a field of mathematics/geometry/graph theory, that studies how the properties of a shape remain under a number of different transformations, like bending, stretching, or twisting. The field of topology is well established within mathematics and far more complicated than I wish to get in this post.
Second question: How does topology relate to GIS and spatial analysis?
Spatial analysis is at its core an analysis of shapes in space. Geospatial topology is used to determine and preserve the relationships between shapes in the vector data model.
The GISsoftware we use for analysis and data storage incorporates a set of “topological rules” to define how vector objects are stored and how they can interact with each other. These rules can dictate how nodes interact within a network, how the edges or faces of polygons coexist, or how points are organized across space.
Back in the “olden-days” (which was before “my time”) GIS users, particularly ArcInfo users, were well versed in geospatial topology because of the coverage. The coverage data model, a precursor to today’s ubiquitous shapefile format, was unique in that topology was stored within the file. This data format allowed users a certain set of controls to the spatial relationships within the dataset that later went away with the shapefile. The shapefile is not a topologically valid dataset, as geometric relationships are not enforced. For example, how may of you have downloaded (or bought) a shapefile from a data provider and it was FULL of slivers? In the Esri world geospatial topology came back with the geodatabase, and has been incorporated into a number of other geospatial data formats including spatial databases supported by Oracle, PostGIS (2.0) and SpatiaLite.
Today, topology is important in geodatabase design (for those who pay attention to it!), and data creation/editing. By understanding the set of geospatial topology rules and creating topologically sound data, the user can have a level of trust in their data during analysis.
The first three parts of the guide give some background information on SQL Server, covers some basic spatial and database topics, and provides links to download some software and data. Once the user has completed the first three part of this guide they will be able to understand this tutorial, which will provide some additional basic spatial SQL scripts.
Tutorial
In the last tutorial several scripts were introduced and examples were given. In the last tutorial I covered the process to validate shapes, calculating centroids, and generate buffers. In this tutorial several additional scripts will be covered, including generating the bounding box/extent of a feature, intersection and union operations, and difference operators.
Calculate Extent
Calculating the extent of a feature is something I use from time to time. To find these values, such as the northern bounding latitude or eastern bounding longitude, can be calculated through Spatial SQL. Spatial SQL has a built in method that will calculate the bounding box of a record, STEnvelope(). In its simplest context the user can generate the bounding box for any feature in a spatial table:
use Spatial_Database
go
select
geom.STEnvelope() as boundingbox
from dbo.States_Provinces
where NAME_1 = 'Massachusetts'
This example will return a field called “boundingbox” for the record where NAME_1 equals ‘Massachusetts’ that contains the minimum area rectangle that encompasses the shape. The data in the “boundingbox” field is relatively useless the user adds some additional code to the query. Building on the example provided by Microsoft the user can add the ToString() method to the query:
use Spatial_Database
go
select
geom.STEnvelope() as BoundingBox,
geom.STEnvelope().ToString() as BoundingBoxString
from dbo.States_Provinces
where NAME_1 = 'Massachusetts'
Here the user will see a new column, “BoundingBoxString” that returns a polygon with the five points that represent it. You may ask yourself, why are there five points when you only need four to create the bounding box? Well, for SQL Server to create a polygon the first point needs to be represented twice, so that the polygon closes itself properly.
Intersect
The intersect table operator is one of the most popular tools in the GIS work belt. Intersect is an overlay operation, where the area of the input feature is used as the basis to select features from the second feature, the intersect feature. The following illustration will explain what an intersect is better than I can with words…
In spatial SQL the STIntersects() tool is straight forward and easy to use. However, it does not return a shape as other overlay methods do. When run, STIntersects() will return a value of one if the two records spatially intersect and a zero if they do not. Also, this method only tests the relationship between a single record in one table against a single record in another table. The following provides an example of intersecting a single line object against a polygon object:
use Spatial_Database
go
DECLARE @poly_geom geometry
select @poly_geom = geom.STAsText()
From dbo.States_Provinces
where dbo.States_Provinces.name_1 like '%Massachusetts%'
DECLARE @line_geom geometry
select @line_geom = geom.STAsText()
from dbo.Roads
where ID = 2974
SELECT @poly_geom.STIntersects(@line_geom) as IntersectValue, @poly_geom as input1, @line_geom as input2
The Massachusetts record form the State and Provinces table is set as the polygon of interest and a single line object from the roads table is also selected. Now, I know that the line object is in Massachusetts so the value returned in IntersectValue field should be a value of one.
Contains
Another spatial overlay method available in spatial SQL is STContains(). This is a yes/no operation. If a geometry instance is completely within another geometry a value of one is returned. If the geometry instance is not wholly inside the other, a value of zero is returned. The following provides a simple example of STContains():
use Spatial_Database
go
DECLARE @poly_geom geometry
select @poly_geom = geom.STAsText()
From dbo.States_Provinces
where dbo.States_Provinces.name_1 = 'Massachusetts'
DECLARE @pointgeom geometry
select @pointgeom = geom.STAsText()
from Spatial_Database.dbo.Populated_Places
where NAMEASCII = 'Boston'
SELECT @poly_geom.STContains(@pointgeom) as ContainValue, @pointgeom as input1, @poly_geom as input2
In this query I am simply testing if the point that represents the city of Boston is within the polygon that represents the state of Massachusetts. After all the variables are declared the select statement tests the point against the polygon. The results, in the ContainValue field returns a one, meaning that the selected point is within the selected polygon.
Union
The union operator is an overlay analysis, like the intersect, however the results of a union are the combination of the two features, with overlapping areas obtaining the attributes of both. Again, let’s turn to Esri for a nice graphic:
Like the intersect operation, the STUnion() operation analyzes the spatial relationship between two different records. The basic union query will return the area represented in the union. In the following example the query will select two countries from the world countries table to union. The results of the query return the unioned geomerty of the two selected polygons.
use Spatial_Database
go
DECLARE @poly_geom geometry
select @poly_geom = geom.STAsText()
from dbo.world_countries
where dbo.world_countries.ADMIN = 'Germany'
DECLARE @poly_geom2 geometry
select @poly_geom2 = geom.STAsText()
from dbo.world_countries
where dbo.world_countries.ADMIN = 'Switzerland'
SELECT @poly_geom2.STUnion(@poly_geom) as UnionGeo
The initial input are two distinct polygon records, one representing Germany and the other, Switzerland. If the user were to query these two records from the table they would see the following spatial result:
However, during the union the geometry of these two records are morphed into a single feature, as seen below:
Using the example script the user could then add additional fields to the select query or perform any number of additional operations on the unioned record.
Difference
The final spatial operator that will be covered will be STDifference(). The difference overlay operation uses two layers with the area of one being removed from the overlapping area of the other.
In spatial SQL the difference method, STDifference(), returns the geometry of the difference. In the following example the user will find the shape that is left behind between the difference of the United States and Massachusetts:
use Spatial_Database
go
DECLARE @poly_geom geometry
select @poly_geom = geom.STAsText()
From dbo.States_Provinces
where dbo.States_Provinces.name_1 = 'Massachusetts'
DECLARE @USA geometry
select @USA = geom.STAsText()
from dbo.world_countries
where dbo.world_countries.ID = 236
SELECT @USA.STDifference(@poly_geom) as Part1andPart2
The results, as seen below, displays the area that Massachusetts represents in the world_countries table.
Conclusion
Well, that is it for this tutorial. If I have some time over the next month I’d like to explore spatial indexes within spatial SQL and perhaps cover some more in-depth spatial queries. All these queries will work within 2008 R2 and the user can, if desired, use tables from ArcSDE within the queries.
The tutorial and the examples are very simple and I do not profess to be an expert in spatial SQL, so, if there is a way to represent any of these queries more efficiently let me know.
The first two parts of the guide give some background information on SQL Server, covers some basic concepts, provides some links to download some software and data. Once the user has down completed the first two part of this guide they will be able to understand this section of the guide, which will provide some basic spatial SQL scripts.
Tutorial
Here the user will learn about a few basic Spatial SQL scripts. Nothing too complicated, but enough to get the user started. Hopefully the user will start to think about ways to expand the scripts into their own ideas.
This tutorial will cover the MakeValid, calculating centroids, and buffering a polygon.
Checking the Validity of Shapes
Before the user starts to calculate buffers, generate centroids, union or intersect tables or perform any other Spatial SQL methods the user should first run the MakeValid command. MakeValid will check the shape to see if the data set breaks any of SQL’s rules for storing the geomerty of the shape. As mentioned in the first part of this guide the geometry datatype is very flexible, but the user should run MakeValid first before doing any analysis because there still might be self intersecting polygons, bad ring order or any other number of topoligical errors within the dataset.
Many GIS programs won’t worry about these problems on the surface, but SQL will and users should perform these queries on all spatial datasets, especially polygons data, as it is usually full of errors.
This first example will update the State_Provinces table, checking the geometry of each record in the table. This query will update the spatial data type column in a table so that the shape is valid, meaning that the shape meets SQL’s requirements for the datatype
In this example the State_Provinces table is being checked. The user will need to
run the MakeValid command against the column that stores the spatial data type, in this case, the geom column.
use Spatial_Database
go update dbo.States_Provinces
set geom = geom.MakeValid()
This query make not catch all errors in the data set. The following scripts will check the data for validity as well. The third script will check to see if any of the records still contain any invalid geometries. These scripts were modified for the geometry datatype from Beginningspatial.com, a great resource the user should check out.
--Fix invalid shapesupdate dbo.States_Provinces
set geom = geometry::STGeomFromWKB(geom.MakeValid().STAsBinary()
,geom.STSrid)
--Correct the order of the points in the polygon ringupdate dbo.States_Provinces
set geom = geometry::STGeomFromWKB
(geom.STUnion(geom.STStartPoint()).STAsBinary(),geom.STSrid)
--Check to see if any records still have any invalid geometries. --The results should return an empty table.select ID, geom.STIsValid() as ValidCheck,
geom, geom.ToString() as Spatial_Text_String
from dbo.States_Provinces
where geom.STIsValid() != 1
When these queries are run the geometry in the table be updated. From there the user will be able to repeat this process for each of the tables loaded in the first tutorial. If the user does not run the MakeValid query some of the examples in this tutorial may not work correctly.
Calculating Centroids
The next set of queries will take a polygon layer, calculate the centroids, and convert the centroids into latitude and longitude. Using the world_countries table in the Spatial_Database database the user can run this query and return the name of the feature and the X and Y of the polygon’s centroid. This query uses a subquery where the centroid is first calculated. Once this is done the outer query converts the centroid into latitude (Centroid.STX) and longitude (Centroid.STY).
select U.Region_Name, Centroid.STY as Longitude,
Centroid.STX as Latidude
from
( select geom.STCentroid() as Centroid, NAME as Region_Name
from dbo.world_countries
)U
The following will expand on the previous query. In the next example the user will calculate the centroid for a specific polygon and then buffer the centroid. In the subquery a where statement is set to select a specific record that meets a certain requirement. In this example the user will select records from the name_1 column that equal ‘Massachusetts’. Also, when the STCentroid() method is called no parameters are needed, unlike other spatial methods which can require several methods.
In the outer query the centroid is buffered by 0.05 decimal degrees, set in the STBuffer method. Why are decimal degrees used here? Well, it is because the data were loaded in as WGS84. There will be a future tutorial on how to tackle this problem.
The user can run this query and return a buffer around the centroid for the state of Massachusetts.
select U.Centroid, Centroid.STBuffer(0.05) as buffer, U.StateBound
from
( select geom.STCentroid() as Centroid, geom as StateBound
from dbo.states_Provinces
where name_1 = 'Massachusetts'
) U
Calculating a Buffer
The last query that will be demonstrated will introduce the buffer, even though I just gave an example. The buffer operation is a common operation within GIS tools. The following script will give a basic example of how to generate a buffer in Spatial SQL.
The user will notice that the shape column, geom, is set and the buffer method, STBuffer, is called. STBuffer accepts a parameter, the size of the buffer. In this example the buffer size will be 0.05 decimal degrees. Also, in this example a where statement is set to query a specific record from the table. When this is done only that record would be buffered. Again, the States_Provinces table is used.
select geom.STBuffer(0.05) as buffer
from dbo.States_Provinces
where name_1 = 'Massachusetts'
What’s Next
That is it for this section of the tutorial. As I said earlier, pretty basic stuff. In the next tutorial I will get into some other topics including unions, intersects, spatial indexes, joins and other methods and operations will be covered. And, it looks like I may end up creating several more tutorials than I originally planned!
I get to do fun things with GIS, like develop models that handle and process millions of records. This model is used to clean a number of spatial variables for input into another analytical model. This awesome monstrosity contains:
205 million input xy points read from SQL tables that are processed into over 500,000 polygons
500,000 polygons merged and manipulated into about 200,000 polygons
Over 20 different spatial processes (merge, clip, field calculations, projections, spatial stats, etc…)
Database driven model, using a mix of SQL tables, SDE datasets, and File Geodatabases
Four custom python scripts, some of which are repeated several times
The output? Two text files, which are generated from python scripts.
Output used in another set of spatial models
The model took about a week to build and calibrate. The model itself runs in about 30 hours on a four core, 64 bit machine and produces close 20GBs of data from about 4GBs of input data.
Anyone can build a mapping website (including me!), but a true GIScience geek lives on this stuff. I love spatial analysis.