Working with spatial data in South Africa


Quick reference

If you’ve done basic spreadsheet editing and used Google Maps then you can create exciting spatial data visualisations on a variety of topics, from land to health service delivery.

For any point on a map the location is defined by:

  • latitude - tells you the y-axis/ North-South position, and is always a negative number in South Africa.

  • longitude - tells you the x-axis/ East-West position, and is always a positive number in South Africa.

In the example image below, you will see that Zingisa No 1 Primary School has a longitude of 24.7334 and latitude of -28.7203.

You can associate any general information with a point location in a spreadsheet row (such as school name/ size/ type or land use/ owner/ value) and this can be imported into a spatial mapping and visualisation tool as a new layer - as long as there is a longitude and latitude column and you have sourced and prepared the data correctly. In the image of Sol Plaatje, a layer of school point information has been placed on top of other layers as yellow circles.

There are some great new free and open source online tools that make it easy to import new layers of information on top of a base map and to create visual stories, such as kepler.gl (built on Mapbox and OpenStreetMap data) and uMap (built on OpenStreetMap).

Point information (like the school name and location) can be imported from a spreadsheet, but you will need to clean the data and save it in csv format first (NB: comma-separated, not semi-colon-separated); otherwise the mapping tool will not import the file. See detailed instructions in the mini-guide.

Line or region information (like suburb populations) needs to be imported in Shapefile, GeoJSON, KML or a similar file format. These formats describe a line or bounded shape rather than a single point.

Key concepts and words

Longitude and latitude: If you want to show any information (such as a school name) on a map, you need the location of the school in the form of longitude and latitude coordinates, and you tag the name onto that location pin. Latitude tells you the y-axis/ North-South position; longitude tells you the x-axis/ East-West position. In South Africa the longitude will always be a positive number between 15 and 35, and the latitude will always be a negative number between -20 and -35. If you only have the address of the school, you need to convert it to longitude and latitude. In the map of Sol Plaatje below, you will see that Zingisa No 1 Primary School has a longitude of 24.7334 and latitude of -28.7203.
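A minimal Python sketch of how you might apply these ranges to every row before importing, to catch swapped columns or a dropped minus sign. The bounds (longitude 15 to 35, latitude -35 to -20) are rough assumptions based on the ranges above; they cover mainland South Africa, not offshore islands, and the function names are hypothetical.

```python
# Rough bounds based on the ranges above (assumption: mainland South Africa only).
LON_MIN, LON_MAX = 15.0, 35.0
LAT_MIN, LAT_MAX = -35.0, -20.0


def in_bounds(lon, lat):
    """True if the pair falls inside the rough South African box."""
    return LON_MIN <= lon <= LON_MAX and LAT_MIN <= lat <= LAT_MAX


def check_coordinate(lon, lat):
    """Classify a (longitude, latitude) pair (hypothetical helper)."""
    if in_bounds(lon, lat):
        return "ok"
    if in_bounds(lat, lon):  # longitude and latitude columns swapped
        return "swapped"
    if in_bounds(lon, -lat):  # minus sign dropped from the latitude
        return "missing minus sign"
    return "outside South Africa"


print(check_coordinate(24.7334, -28.7203))  # ok
print(check_coordinate(-28.7203, 24.7334))  # swapped
print(check_coordinate(24.7334, 28.7203))   # missing minus sign
```

Rows flagged this way are usually easier to fix in the spreadsheet than to debug after the mapping tool has placed a school in the ocean.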

Layers of information: Every map is built up of layers of information starting with a group of ‘base’ layers that typically show a satellite image (or ‘raster’), roads, some key points and possibly municipality or ward boundaries. People often give you 'Shapefiles' (see below) for this information. You then have to insert additional layers of information on top of the base layers, e.g. the name and location of schools in Sol Plaatje or the Northern Cape. A layer is built from one of three types of information:

  • Point: single location on a map defined by longitude and latitude (e.g. location of a school)

  • Line: connections between points (e.g. sections of road)

  • Polygon: connection of multiple lines into a shape (e.g. boundary of a municipality or ward)
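In GeoJSON (one of the open formats described below), these three building blocks appear as Point, LineString and Polygon geometries. The sketch below writes each one as a Python dictionary; the Zingisa school coordinates come from the example above, while the road and ward coordinates are made up purely for illustration. Note that GeoJSON lists longitude before latitude, and that each polygon ring must end on the same position it starts with.

```python
import json

# Point: one position, written as [longitude, latitude].
school = {"type": "Point", "coordinates": [24.7334, -28.7203]}

# Line: an ordered list of positions (illustrative coordinates, not a real road).
road = {"type": "LineString", "coordinates": [[24.70, -28.72], [24.75, -28.74]]}

# Polygon: a list of rings; each ring is closed, so the last position repeats
# the first (illustrative coordinates, not a real ward boundary).
ward = {
    "type": "Polygon",
    "coordinates": [[[24.70, -28.70], [24.76, -28.70], [24.76, -28.75],
                     [24.70, -28.75], [24.70, -28.70]]],
}


def is_closed_ring(ring):
    """A GeoJSON linear ring needs at least 4 positions and must end where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


print(is_closed_ring(ward["coordinates"][0]))  # True
print(json.dumps(school))
```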

Spatial data file formats: Location information is usually combined with other non-spatial information into spatial data files that you can show on a map (e.g. the names of schools are linked to their longitude and latitude and shown as yellow points on the map of Sol Plaatje).

  • Table (xls(x), ods and csv): each row contains information about a point - such as the name of school, number of teachers, number of learners and telephone number; as well as location information in the form of longitude and latitude or the address, which needs to be converted to longitude and latitude. This data must be sourced and prepared carefully.

  • Shapefile: a widely used spatial data format, usually distributed as a folder or .zip containing at least .shp, .shx and .dbf files. It can store point information (like a csv) but also line and polygon information.

  • GeoJSON and other open formats: increasingly used as an alternative to Shapefiles, and can store similar information.
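To see how a table row becomes spatial data, here is a minimal sketch, using only Python's standard library, that converts csv rows with longitude and latitude columns into a GeoJSON FeatureCollection. It assumes the columns are named exactly longitude and latitude and that every value is already numeric; dedicated tools such as QGIS handle this more robustly.

```python
import csv
import io
import json


def csv_to_geojson(csv_text):
    """Turn csv rows into GeoJSON point features; other columns become properties."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lon = float(row.pop("longitude"))
        lat = float(row.pop("latitude"))
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},  # longitude first
            "properties": row,  # remaining columns, e.g. the school name
        })
    return {"type": "FeatureCollection", "features": features}


sample = "name,longitude,latitude\nZingisa No 1 Primary School,24.7334,-28.7203\n"
print(json.dumps(csv_to_geojson(sample), indent=2))
```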

Suggested tools

Mapping and visualisation: Uber built kepler.gl, an open source interface on top of Mapbox (also see this example), which this mini-guide uses. It is probably the easiest tool to get started with, as it doesn't require a sign-up or download. HERE Technologies has a similar tool, which you need to register to use. There are very lean cloud-based alternatives like Datawrapper, or more advanced options like the base version of Mapbox, and Carto, which is a good proprietary starting tool with a 30-day trial (then $149/ month!). Also try uMap, which lets you create maps with OpenStreetMap layers. The most widely used open source tool among GIS practitioners is QGIS, which involves a download of about 700MB over a few steps. QGIS also has a great web mapping plugin called qgis2web, which makes it very easy to export an interactive map for users to explore. See this tutorial and an example output using South African elections data. See Mapzen for supported open data projects, including street coverage in OpenStreetMap, addresses in OpenAddresses, transit schedules in Transitland, and global names and places in the Who's On First gazetteer.

Conversion between various spatial data formats: You can import and export different formats using QGIS, but a quick and easy online alternative is Mapshaper, which converts between formats such as Shapefile and GeoJSON. Other open, closed and cloud-based tools such as toGeoJSON and MyGeodata (free only up to 5MB) can do other conversions if you need them. Shapefiles (SHP) are increasingly being replaced by Geodatabases (GDB), so you may need a tool like GeoConverter to convert them.

Converting addresses to longitude and latitude: This is called geocoding. You can do this manually in Mapbox, Google Maps or OpenStreetMap by searching for the address and reading off the longitude and latitude of the pin. You can do it automatically using Nominatim (which uses OpenStreetMap), OpenRefine or the Google Maps API, or through the Mapbox API and OpenStreetMap. More broadly, you can also geotag points and draw polygons on maps/ satellite images using the mapping tools above.
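As a sketch of automatic geocoding, the Python below queries Nominatim's public search endpoint, which returns a JSON list whose entries carry lat and lon as strings. The countrycodes=za filter, the helper names and the default User-Agent string are assumptions to adapt. The public server's usage policy asks for an identifying User-Agent and no more than one request per second, so the loop pauses between addresses; check that policy before geocoding large batches.

```python
import json
import time
import urllib.parse
import urllib.request

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"


def build_search_url(address):
    """Build a Nominatim search URL restricted to South Africa."""
    params = {"q": address, "format": "json", "limit": 1, "countrycodes": "za"}
    return NOMINATIM_URL + "?" + urllib.parse.urlencode(params)


def parse_first_result(body):
    """Return (longitude, latitude) from a Nominatim JSON response, or None."""
    results = json.loads(body)
    if not results:
        return None
    # Nominatim returns lat and lon as strings.
    return float(results[0]["lon"]), float(results[0]["lat"])


def geocode(addresses, user_agent="my-school-mapping-project"):
    """Geocode addresses one at a time (hypothetical helper; makes network calls)."""
    coords = {}
    for address in addresses:
        request = urllib.request.Request(build_search_url(address),
                                         headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request) as response:
            coords[address] = parse_first_result(response.read().decode("utf-8"))
        time.sleep(1)  # stay within one request per second
    return coords
```

Addresses that return None need manual geocoding, for example by searching for them in OpenStreetMap and reading off the pin's coordinates.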

OpenStreetMap and uMap: OpenStreetMap is a collaborative project to create an open map of the world, and uMap lets you create your own maps on top of it.

See afrimapr, which is creating R building blocks and learning resources to make it easier to create data-driven maps in Africa.

Leaflet is an open-source JavaScript library for mobile-friendly interactive maps, widely used by app developers.

Citizen-driven open mapping tools: Open Map Kit (OMK) and iD

Open spatial data formats: Geography Markup Language (GML) and GeoPackage (GPKG)

Private spatial mapping service providers in South Africa usually supply comprehensive spatial data covering most of the above (and more): Mapable, PlanetGIS, 1map, Kartoza.

Sourcing and preparing spreadsheet data for mapping tools

To upload your data file into a mapping tool like kepler.gl, Mapbox, uMap or Carto, prepare the file as follows; otherwise the tool may not load it:

  1. Sourcing spatial data: In some cases you don't have an xls(x), ods or csv file and need to scrape the data tables from a pdf or website. Have a look at this easy intro to using Tabula and crawlers to scrape data, and see the note in the Suggested tools section above about using a mapping tool to geotag points, draw polygons or geocode address data into latitude and longitude.

  2. Location columns: Once you have your xls(x), ods or csv ensure that it includes two separate columns, one with longitude and one with latitude.

  3. Location column names: Ensure that longitude and latitude columns are named exactly that. If the column name is GIS_latitude or some other variation then the mapping tool may not identify the correct column.

  4. Location column format: This is not strictly required here, but for future use convert the longitude and latitude columns to number format. Also increase the number of decimal places to at least 5 (roughly 1 m on the ground); otherwise you will reduce the accuracy of the location information.

  5. Non-number characters: Remove all non-number characters from the longitude and latitude columns. Even if there is only one row with non-number characters, the mapping tool may not load the entire dataset. You can find the problem rows by sorting the column Z → A and then manually editing or deleting problematic cell contents.

  6. Delimiters: Ensure your operating system is set to save the csv using comma delimiters (,), not semi-colon delimiters (;). This is usually changed in the system region settings or similar.

  7. File format: Use “Save As” to save in CSV UTF-8 format. To be safe, you can import your sheet into a Google Sheet and then download it as csv from there.
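If you have many rows, steps 3, 5 and 6 can also be scripted. The following is a minimal Python sketch, assuming the source file uses columns named GIS_Longitude and GIS_Latitude (hypothetical names; change the RENAME mapping to match your file). It sets aside rows whose coordinates are not plain numbers, such as blanks, "n/a" or values with decimal commas, so you can fix them by hand, and writes the remaining rows as comma-delimited csv.

```python
import csv
import io
import math

# Hypothetical source column names; change them to match your own file.
RENAME = {"GIS_Longitude": "longitude", "GIS_Latitude": "latitude"}


def is_number(text):
    """True if text parses as a finite number (rejects '', 'n/a', '24,7334', 'nan')."""
    try:
        return math.isfinite(float(text))
    except (TypeError, ValueError):
        return False


def clean_rows(csv_text, delimiter=","):
    """Rename the location columns (step 3) and separate rows whose
    longitude/latitude are not plain numbers (step 5)."""
    good, bad = [], []
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=delimiter):
        row = {RENAME.get(k, k): (v or "").strip() for k, v in row.items() if k is not None}
        if is_number(row.get("longitude")) and is_number(row.get("latitude")):
            good.append(row)
        else:
            bad.append(row)  # fix these by hand, or delete them
    return good, bad


def write_csv(rows, out):
    """Write comma-delimited csv (step 6) to an open text file."""
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


# Made-up placeholder rows: a semicolon-separated export with one decimal comma.
sample = "Name;GIS_Longitude;GIS_Latitude\nSchool A;24.7334;-28.7203\nSchool B;24,7334;-28.7203\n"
good, bad = clean_rows(sample, delimiter=";")
print(len(good), len(bad))  # 1 1
```

To run it on a real file, read the text with open("schools.csv", encoding="utf-8-sig") (the -sig variant strips a byte-order mark if the file has one), pass delimiter=";" if the file is semicolon-separated, and open the output file with newline="" and encoding="utf-8" before calling write_csv. The file name is a placeholder.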

Now open the csv file using a text editor like Notepad or TextEdit and check that:

  • There is a header row in the first line with the names of the columns

  • The longitude and latitude columns are named correctly in the header row

  • Names and values are separated by commas, not semi-colons
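The second and third of these checks can be automated with a few lines of Python. This is a minimal sketch with hypothetical function names; it only inspects the header line, so it cannot tell whether that line really contains column names rather than data.

```python
import csv


def check_header(first_line):
    """List problems with the header line of a csv file."""
    problems = []
    if ";" in first_line and "," not in first_line:
        problems.append("semicolon-separated: re-save with comma delimiters")
    columns = [c.strip() for c in next(csv.reader([first_line]))]
    for name in ("longitude", "latitude"):
        if name not in columns:
            problems.append(f"no column named exactly '{name}'")
    return problems


def check_file(path):
    """Check only the first line; 'utf-8-sig' also removes a byte-order mark if present."""
    with open(path, encoding="utf-8-sig") as f:
        return check_header(f.readline().rstrip("\r\n"))


print(check_header("name,longitude,latitude"))  # []
print(check_header("name,longitude,GIS_latitude"))
```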

If your csv is still not loading and you are using Mapbox or kepler.gl as your mapping tool, have a look at this csv troubleshooting guide.

Example Mapping of Schools in Sol Plaatje

Location of schools (yellow circles), number of learners (size of yellow circle) & population of different suburbs (purple shading) in Sol Plaatje, Northern Cape. Built on kepler.gl.

For this example data story we wanted to check whether schools are situated near areas with higher populations (at ward or sub-place level). In the image of Sol Plaatje above, each yellow circle shows where a school is located, and the size of the circle is proportional to the number of learners in the school. The colour of each suburb corresponds to its number of residents: lighter colours = larger population.

This example uses kepler.gl, but any of the suggested tools would follow a similar approach. Note that kepler.gl runs as a client-side app and does not require a login, which means you can't save your map to an online account. However, you can export your map as a JSON file to your computer and quickly import it into kepler.gl again at any time.

1. Prepare a base layer of administrative boundary and/or infrastructure information

As noted above, your school information needs to be shown on top of a ‘base’ map of Sol Plaatje that shows roads and possibly municipality or ward boundaries. Luckily, most online mapping tools, including kepler.gl, already have a base layer of information for South Africa, so you probably don’t need to do anything in this step.

However, if you do want to add specific base data you can source it from various locations, e.g.:

2. Add your first variable layer [basic: csv]: School location and size

In this case we first download the updated school master list for Northern Cape schools from the EMIS website, under the section “Quarter 4 of 2016: March 2017”. Source and prepare your csv file as described above before trying to import it into kepler.gl or another mapping tool.

On kepler.gl click on “+ Add Data” and upload your csv file. This will automatically create a new layer on top of the base layers, and the map should zoom to the region where your points are located. If you hover over one of the points, it should pop up information about that school.

Add Northern Cape schools data to kepler.gl as a csv

After creating your map you can now tweak the settings on the layer to enhance the visualisation. For example, you can make the ‘Radius’ of the school circle proportional to the number of learners in the school. By zooming into Sol Plaatje, you can quickly see where the big schools are in the municipality.
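kepler.gl handles circle sizing through its own Radius settings. If you ever size circles yourself (for example in another tool), one common design choice is to scale the radius by the square root of the value, so that a circle's area, rather than its radius, is proportional to the number of learners; with a plain linear radius, a school with twice the learners gets a circle four times the area. A minimal sketch, where the function name and the 20-pixel maximum radius are arbitrary assumptions:

```python
import math


def radius_for(learners, max_learners, max_radius=20.0):
    """Radius such that circle area is proportional to the number of learners."""
    return max_radius * math.sqrt(learners / max_learners)


print(radius_for(1200, 1200))  # 20.0
print(radius_for(300, 1200))   # 10.0
```

A school with a quarter of the learners gets half the radius, and therefore a quarter of the area.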

Zoom into Sol Plaatje to see school information for the city

3. Add your second variable layer [advanced: shape]: Population in different suburbs

You can now add multiple additional layers as you did with the school information csv. If you are using csv files just make sure there are longitude and latitude columns, and click “+ Add Data” again.

If you would like to add more complex spatial information, such as average household income for sub-place/ suburb/ ward polygons in Sol Plaatje, then you will need to import GeoJSON or Shapefiles with this content.

The main source of population or demographic information is the StatsSA census and community surveys. You can access Census 2011 data down to ward level on Wazimap, including for Sol Plaatje (click on Download and select GeoJSON). To do it yourself, location-based population information is accessible in two ways:

  • Municipality and higher levels: You can use QGIS to (inner) join the municipality boundary demarcation Shapefile (which has location information) mentioned above to municipality census information you can download from StatsSA Nesstar or SuperWEB2.

  • Sub-municipality level: For sub-place and small area (smaller than ward level) census information, you will need to contact StatsSA directly to request their 2011 Community Profile Database DVD set (see e.g. the set for the 2001 census), which will then need to be processed into Shapefiles or similar. Alternatively, you can buy processed data from a commercial provider like AfriGIS.
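QGIS performs the (inner) join mentioned above through its join tools. To make clear what an inner join does, here is a minimal Python sketch over GeoJSON-style features: each boundary feature gets the census columns that share its key, and boundaries without a matching census row are dropped. The key name MUN_CODE, the codes and the population values are placeholders, not real StatsSA fields or figures.

```python
def inner_join(features, census_rows, key="MUN_CODE"):
    """Attach census columns to each boundary feature with a matching key.
    Features without a match, and census rows without a boundary, are dropped."""
    census_by_key = {row[key]: row for row in census_rows}
    joined = []
    for feature in features:
        match = census_by_key.get(feature["properties"].get(key))
        if match is not None:
            merged = {**feature["properties"], **match}
            joined.append({**feature, "properties": merged})
    return joined


# Placeholder codes and numbers for illustration only (not real census figures).
boundaries = [
    {"type": "Feature", "geometry": None, "properties": {"MUN_CODE": "X01", "name": "Area 1"}},
    {"type": "Feature", "geometry": None, "properties": {"MUN_CODE": "X02", "name": "Area 2"}},
]
census = [{"MUN_CODE": "X01", "population": 100}]
result = inner_join(boundaries, census)
print(len(result), result[0]["properties"]["population"])  # 1 100
```

If the joined layer comes out with fewer areas than expected, the usual cause is codes that differ in spelling or formatting between the two files.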

An extract of processed Census 2011 (extrapolated to 2014) Shapefiles for Sol Plaatje can be found here, which you can import directly into kepler.gl in a similar way to the csv. An example of this imported into kepler.gl, as a new layer below the schools, is shown below. The colour of the polygons shows the population for the area: dark colours = lower population, light colours = higher population.

Ideally, more and larger schools (yellow bubbles) are located in the areas with higher population numbers (lighter colours), which seems to be the case for Sol Plaatje.

Sol Plaatje schools information on top of suburb population (shades of purple)