Spatial SQL for the Geographer – Part 1 – Welcome to Spatial SQL

Before we begin I would like to make it clear that learning all the capabilities and limitations of SQL Server and Spatial SQL cannot be covered in three short lessons.  If you believe you might need some more training please visit the Microsoft SQL Transact reference guide.

What is Microsoft SQL Server?

Microsoft SQL Server is a relational database system that stores data and provides several mechanisms to retrieve stored data.  One of the beauties of relational database systems is that they allow you to effectively store a variety of data sets and they are both scalable and securable, and in my experience, much more trustworthy than any flat file.

There are a number of good references out there explaining SQL Server in much more detail than I would attempt.  I recommend that you check a couple of them out:

Explanation of Spatial Data Types

As you probably know by now a column in a table or view can be assigned one of many different data types.  Most are familiar with the traditional data types such as varchars, integers, bits, floats, or decimals.  With SQL Server 2008 two new spatial data types were introduced.  These data types support the storage of spatial data, and are broken up into two different types.  Their characteristics are important to understand:

  • Geometry:  Primary characteristic – spatial data stored in a planar system.
  • Geography: Primary characteristic – spatial data stored using the ellipsoid.
  • Here is a great blog post that describes the spatial data types in more detail.

Tutorial

Step One: Getting Started

The user first needs to download some software and data to their local machine.

Downloading Microsoft SQL Server 2008 R2 may take a few minutes.  The user needs to make sure that they read the download and installation instructions before they proceed. Also, when installing the software the users needs to pay particular attention to the installation instructions.

Note:  This tutorial was completed on a 64 bit machine running Windows 7.  I cannot guarantee that any of this will work on anything different.

Notes about the Data.  The populated places, countries, and states and provinces data sets were downloaded from Natural Earth and the roads data set came from Data.gov. The data represent the three different vector data types, points, lines and polygons. They also provide a variety of query options to work with during for the tutorials. Also, I renamed the data from the original names so for clarity during the tutorials.

Once you download the data I recommend first viewing it in a GIS program to make sure you downloaded it correctly.  Here the user will see the four data sets in Quantum:

Tutorial Data

Step Two: Create a Database

Now that the software has been downloaded the user can open Microsoft SQL Server Management Studio.  To do this the user will select to Start–> All Programs–> Microsoft SQL Server 2008 R2–>SQL Server Management Studio.  Since this will probably be the first time the user opens MSSQL it may take a couple minutes to configure and load.

Next, the user will be prompted to connect to a server.  In this case the server is the local machine.   In this example I connect to my local machine instance and use Windows Authentication.  Unless you configured your SQL instance differently you will connect the same way.  Once these options are selected click connect.

connect to server

Now that the user is connected to the server they can create a new database.  First, the user will expand the database instance.  Then the user will right click on the database folder and select New Database from the menu.

Create Database

The new database window will now open.  Here the user can configure a number of settings before the database is created.  In this tutorial we want to name the database Spatial_Database which is set in the Database Name text box.  Another option I like to configure is the location of the data and log files.  This is done by clicking the … button under the Path column.  I recommend storing the database in a location that has plenty of room to grow, as the default location on the C drive may not have space to expand on your local machine.

Change Location

Once the settings are all set the user will click OK to create the database.  The user can refresh the Databases folder to see the new, empty database.

Spatial Database

The user is the system admin on this sever and this database be default.  This means the user has permissions to create databases, delete databases, create tables and load data into them.  With these privileges already in place the admin user can load data into the database without any further configuration.

Step Three: Load Data into the Database using Shape2SQL

Using Shape2SQL is really easy.  The user will first open Shape2SQL and add a shapefile that will be loaded into the database as an individual table.  To do this the user will first navigate to shapefile stored on the local machine by clicking on the Select Button.

Select  Shapefile

The next step is to select the server and database where the data will live.  The user will click on the Configure Button to do this.  In the Configure window the user enter the name of the server in the Server Name window.  The user will then select Use Windows Authentication and then select the appropriate database.  In this tutorial the database has been named Spatial_Database.  Once this information has been configured the user will click OK.

Configure Server

Now, back in the uploader window the user has several additional options to configure before loading the data into the database.  First, the user needs to select the data type. The user will select either the geometry or geography data type.  In this tutorial we load the data as the geometry data, as it is more flexible, in terms of what it will accept, than the geography data type.

The second setting that needs to be set is the name of the geometry column.  The default for this setting is ‘geom’, but the user can change this if desired.  Remember, all the tutorials will use the ‘geom’ column.

The user will then make sure the Create Spatial Index option is selected.  This will ensure that a spatial index is built, which will improve the performance of the table in SQL Server.  Finally the user will set the name of the table in the Table Name window.

There are a few settings that do not need to be selected or altered.  Since this tutorial is using the geometry data type the user does not need to set the SRID.  The user can also change which columns are loaded into the table.  By default all columns are taken from the shapefile and loaded into the table. If a user does not want certain columns they will simply uncheck the columns they don’t want.  Once all the settings are selected the user will select Upload to Database.  Again, I may use all the columns in future tutorials, so you are best off keeping all of them.

Once the user clicks Upload to Database the shapefile will be loaded into the database as a table and the spatial index will be created.  Depending on the size of the shapefile and the power of the user’s machine it may take several minutes to load the table into the database.

The user will repeat these steps and load all four shapefiles into their database.  Once that is completed the user can proceed to the next section of the tutorial.

Step Four: Perform a Basic Query

Now that the shapefiles have been loaded into SQL Server the user can start to perform some queries.  Now, if the user does not have any SQL experience at all it may make sense to read through this tutorial from Microsoft.

The first example query is probably the most simple SQL query one could perform. In this first example I will use the dbo.States_Provinces table.  The user will open SQL Server, connect to their server, and then click the New Query button.  The user will then paste the following into the query window:

use spatial_database
go

select top 100 * from dbo.States_Provinces

The results will return the first 100 records from the dbo.States_Provinces table.  In SQL Server the user will notice there is a Results tab and a Spatial results tab.  The results tab will display the tabular data and the spatial results will display the spatial data.  The Spatial results tab will only appear if there is spatial data returned in the query.

Step Five: Set a Where Statement to Query Specific Data

The user can narrow down what is selected by incorporating some basic SQL statements, such as a where statement.  The basic premise of a where clause is to limit the data returned from a query. The user is setting a condition in regards to what they want returned.  For example, let’s say we only want the return records from the States_Provinces table that are for the United States.  If the user explores the table they will set that this information is stored in the Name_0 column. However, we are unsure of the exact syntax of “United States” so we can use a like operator in the where clause. For example:

use spatial_database
go

select * from dbo.States_Provinces
where NAME_0 like '%United States%'

The user will see only those records that have ‘United States’ in the Name_0 column. If the user clicks on the spatial results tab they will see the spatial representation of those records.

Spatial Results

Alright, that is it! We have completed the first part of this tutorial. The user has downloaded the appropriate software and data, installed the software, loaded a number of shapefiles using Shape2SQL into SQL Server, and have completed two basic queries.

Before the next tutorial I recommend the user, if they are light in their SQL experience, read up on some basic queries from either the W3 Schools or from Microsoft.

The next part to this tutorial will cover some more Spatial SQL queries.  I’ll go into more detail about some spatial operations and provide a number of examples.  If there is anything the you want to see let me know!

10 Replies to “Spatial SQL for the Geographer – Part 1 – Welcome to Spatial SQL”

  1. Can you fix the tutorial download links. Also can your provide the urls to the actual pages you used from Natural Earth and Data.gov sites.

    1. Direct Data Links

      Natural Earth Data

      World Countries – http://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-0-countries/

      States and Provinces – http://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-1-states-provinces/

      Populated Places -http://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-populated-places/

      Data.gov

      Roads -http://geo.data.gov/geoportal/catalog/search/resource/details.page?uuid=%7B560D9498-E794-2133-17E6-C6A72622DA6D%7D

  2. Is it required the shapefile’s coordinate system be 4326 to upload to SQL?
    I was able to upload the tutorial data but when I tried uploading other shapefiles (GCS_North_American_1983) it uploads without error but I cannot view them in the database.

    Any thoughts?

    1. If you are using the Geometry data type I would use 4326. The tutorials are all based using 4326. If you want to use other projects you need to qualify them in the scripts. If I get some time I’ll try to provide an example.

  3. Hey can I use shapefiles from other parts of the world in this? Say from Europe, as in should I invest time in this and try to bring to to my organisation to use here in Europe?

    1. Sure, all my examples are using the geometry data type, so you should have no problem using data from any part of the world. If you run into any problems let me know and I can help try to debug issues.

Comments are closed.