GIS595/MEA792: UAV/lidar Data Analytics

This is an unmaintained course material, please see current material at:

Using Python and GRASS GIS

Outline:

Import lidar points.
Interpolate with parallelization.
Process multiple maps at once and generate an HTML report.
Use matplotlib to show terrain profiles.

Data:

Lake Wheeler GRASS Location; be sure to use separate Mapset for this assignment
selected LAS tiles from here (see below for the tiles we will need; you can use the provided Python code to download them)

Tools:

Your GRASS GIS 7 installation should also include libLAS library which is used by GRASS modules v.in.lidar and r.in.lidar (standalone GRASS GIS for MS Windows, OSGeo4W and Ubuntu packages contain libLAS).
libLAS installation should include command line tool las2las.
You will need to edit files in a text editor with Python support, e.g. Gedit or Notepad++, or a Python editor or IDE such as Spyder.

This assignment expects at least basic knowledge of Python, GRASS GIS and how they are used together. Review the introduction when in doubts.

Running the code

First, start GRASS GIS in the Mapset you will be using for the assignment. To run the code, put relevant code snippets from each section into a file (a script) using your text editor and save it with .py extension. Note that sometimes not all the snippets in a section should be executed together, always read the surrounding text to find out.

On Linux run the script in the GRASS GIS command line (the system one with GRASS GIS written in ASCII) using python command and full path to the .py file, for example:

python /path/to/file.py

On Windows run the script through the GRASS GIS GUI. In Layer Manager main menu, go to File -> Launch script. This will ask you to find the file on your disk. It will ask you a question about paths for addon. If you are not sure, answer No. Then it will run it in the Console in the GUI.

To run the script again, use arrow up key and press enter. Note that the script might have created some maps or files which you might need to remove before running it again. See the end of this assignment for more information on executing Python code.

Downloading the data

In case you don't have the lidar tiles, download them using Python, otherwise skip this step. First, create a directory where you want to have the files. Then set it as current directory in Python:

# use path which applies to you
# on Windows use something like 'D:\\path\\to\\files'
os.chdir('/path/to/las/files')

Then, download the files using:

import urllib

base_url = 'http://fatra.cnr.ncsu.edu/lidar/'

urllib.urlretrieve(base_url + '0791_005.las', '0791_005.las')
urllib.urlretrieve(base_url + '0791_011.las', '0791_011.las')
urllib.urlretrieve(base_url + '0792_017.las', '0792_017.las')
urllib.urlretrieve(base_url + '0782_020.las', '0782_020.las')
urllib.urlretrieve(base_url + '0781_008.las', '0781_008.las')

The same could be done in much smarter way using a for loop and a list of files to download, so you can try that as well.

Working with command line tools

The LAS files are in different coordinate system, so we need to change it.

First, we check if we are in the directory with the files:

print os.getcwd()

If not, we change directory to where the files are:

# on Windows use 'D:\\path\\to\\files'
os.chdir('/path/to/las/files')

In GUI the current working directory is changed in Settings -> GRASS working environment -> Change working directory. In command line, it is done using the cd command. Note that directory is changed just for the current process and processes started from that process afterwards. So for example, changing directory in your script will not influence GUI but when the directory is changed in the GUI and then a script is launched, it will have the new working directory.

Now change to coordinate system for all files in the current directory:

import os
from subprocess import call

for f in os.listdir('.'):
    if f.endswith('.las'):
        name, ext = os.path.splitext(f)
        call(['las2las', '--a_srs=EPSG:2264', '--t_srs=EPSG:3358', '-i',
              f, '-o', name + '_spm' + ext])

In the same step, we could do also some other conversions which are supported by las2las, for example format conversion.

Import multiple LAS files to GRASS GIS and merging them

We are looping through the lidar files and importing points only in our region of interest. First set the region in command line:

g.region n=219637 s=219254 w=636730 e=637193 -p

Now run this script (assuming you are in the directory with the las files):

import os
import grass.script as gscript

region = gscript.region()

vectors = []
for lidar_file in os.listdir('.'):
    if lidar_file.endswith('_spm.las'):
        bbox = gscript.read_command('r.in.lidar', input=lidar_file,
                                    output='foo', flags='g').strip()
        bbox = gscript.parse_key_val(bbox, vsep=' ', val_type=float)
        if (bbox['n'] < region['s'] or bbox['s'] > region['n']
          or bbox['e'] < region['w'] or bbox['w'] > region['e']):
            gscript.info('Skipping tile %s' % lidar_file)
            continue
        name = 'tile_' + lidar_file.rsplit('.', 1)[0]
        vectors.append(name)
        gscript.run_command('v.in.lidar', input=lidar_file, output=name,
                            flags='rt', class_filter=2)
gscript.run_command('v.patch', input=vectors, output='merged_points',
                    flags='b', overwrite=True)
gscript.run_command('g.remove', type='vector', name=vectors, flags='f')

In case you are having problems with projections but you know that the projections in fact match, add o flag to r.in.lidar and v.in.lidar calls.

As you can see the code above goes over all files in a directory but it creates vector maps only for the files where the data are actually in the current region. The same approach can be used to import data in different format with different modules just for the area of interest (specified using g.region).

DSM interpolation with parallelization

Note that this parallelization doesn't currently work on Windows.

We will interpolate DEM using GridModule which allows to run v.surf.rst in parallel by splitting the region into tiles and interpolating each part and then patching them back. First we set the region of our area (in this example, we use very low resolution to make the computation fast):

g.region n=219637 s=219254 w=636730 e=637193 res=1 -p

Note: You need to change the visible Mapsets to something else than the default (current Mapset and PERMANENT Mapset). Set the visible Mapsets (SEARCH_PATH) by going to Settings -> GRASS working environment -> Mapset access and check one of the unchecked Mapsets. Otherwise, you will get an error "AttributeError: 'VisibleMapset' object has no attribute 'write'", see ticket #2791.

We import GridModule and compute the size of the tiles if we want to run it on 4 cores at once:

import grass.script as gscript
from grass.pygrass.modules.grid import GridModule

region = gscript.region()
width = region['cols'] / 2 + 1
height = region['rows'] / 2 + 1

First try to run it in not parallel mode (debug=True) to see if it is working.

grd = GridModule('v.surf.rst', debug=True, width=width, height=height,
                 overlap=10, processes=4, npmin=100,
                 input='merged_points', elevation='mid_pines_small_dem_1m',
                 tension=30, smooth=1, overwrite=True)
grd.run()

Then run it in parallel (debug=False) and we can see the time needed reduced significantly:

grd = GridModule('v.surf.rst', debug=False, width=width, height=height,
                 overlap=10, processes=4, npmin=100,
                 input='merged_points', elevation='mid_pines_small_dem_1m',
                 tension=30, smooth=1, overwrite=True)
grd.run()

Note that parallelization is not always straightforward. Here we call v.surf.rst to interpolate raster from vector points in different parts of the area. These different calls run at once at different processor cores and they don't communicate with each other. Some modules, such as r.sim.water where the water needs to continuously flow through the whole landscape, cannot be called in parallel in this way. Also note that parallelization often comes with some cost. Here the different tiles must be patched together at the end. However, we actually get the speed improvement because running the interpolating at multiple cores at once is much greater improvement than the slow down cause by patching.

Working with large amount of maps

Set the region to fit one of the raster maps:

g.region raster=2015_06_20_pix4d_11GCP_dsm

Now we will compute difference between raster maps in a list using nested for loops.

import grass.script as gscript

rasters = ['2015_06_20_pix4d_11GCP_dsm', '2015_06_20_DSM_Trimble_11GCP',
           '2015_06_20_DSM_agi_11GCP']
for raster_a in rasters:
    for raster_b in rasters:
        # the two for loops create combination of each map with each map
        # including the map itself and we don't want to create a difference
        # of map with itself, so we skip these combinations
        if raster_a == raster_b:
            continue
        # create the name for the difference raster
        difference = "diff_{a}_{b}".format(a=raster_a.replace('@', '_'),
                                           b=raster_b.replace('@', '_'))
        # compute difference between the rasters using r.mapcalc
        gscript.mapcalc('{diff} = {a} - {b}'.format(
            a=raster_a, b=raster_b, diff=difference))
        # set color table to the resulting map
        gscript.run_command('r.colors', map=difference, color='differences')

Before running the code again, we need to remove the created raster maps (in the command line):

g.remove type=raster pattern="diff_*" -f

Generating a report

Using HTML and generated images, we can create a report using Python for the analysis we have made.

First, we get a list of raster maps we are interested in. We use g.list wrapper with pattern option. Then we compute univariate statistics and we also compute shaded relief for each raster map. Finally, we render the raster map together with shaded relief using d.* modules to a PNG file. Note that we are using d.shade to avoid creating another raster which would combine the original raster and its shaded relief.

import grass.script as gscript

# get list of rasters we are interested in using a search pattern
rasters = gscript.list_strings('raster', pattern='2015_06_*')
# initialize a dictionary to hold statistics for each raster
stats = {}
# iterate through the list of rasters
for raster in rasters:
    # save univariate statistics for raster to dictionary
    stats[raster] = gscript.parse_command('r.univar', map=raster, flags='g')
    # compute shaded relief, use name of the original raster including mapset
    shaded_relief = raster.replace('@', '_') + '_shade'
    gscript.run_command('r.relief', input=raster, output=shaded_relief,
                        overwrite=True)
    gscript.run_command('r.colors', map=raster, color='elevation')
    # render the raster (geographical extent follows current region)
    gscript.run_command('d.mon', start='cairo', output=raster + '.png',
                        width=400, height=400, overwrite=True)
    gscript.run_command('d.shade', shade=shaded_relief, color=raster)
    gscript.run_command('d.mon', stop='cairo')

Now, we create an HTML report from the data collected above. We also link images created above to the HTML.

# a template for an HTML report for one raster
template = """<h2>Raster map {name}</h2>
<h3>Statistics</h3>
<table>
<tr><td>Mean</td><td>{mean}</td>
<tr><td>Variance</td><td>{var}</td>
</table>
<h3>Image</h3>
<img src="{name}.png">
"""

# write to a file using a template
with open('report.html', 'w') as output:
        for raster in sorted(rasters):
            stat = stats[raster]
            output.write(template.format(
                name=raster, mean=stat['mean'], var=stat['variance']))

To remove the created raster maps, we can use (in the command line):

g.remove type=raster pattern="*_shade" -f

Terrain profiles

In this example, we will compare profiles from two raster maps, 2015_10_06_DSM_agi_8GCPs which is in PERMANENT Mapset and mid_pines_lidar2013_dem which in DEM created from Mid Pines lidar tile. You should have this raster map in some of the Mapsets you used before. You need to make map from this Mapset accessible by adding it to search path (SEARCH_PATH). This is done in GUI by going to Settings -> GRASS working environment -> Mapset access and by checking the Mapset there. In command line, use g.mapsets. Alternatively, add at-sign and Mapset name to the name of the map (mapname@mapsetname).

Compute a profile for multiple rasters and plot them in matplotlib:

import grass.script as gscript
import matplotlib.pyplot as plt

rasters = ['2015_10_06_DSM_agi_8GCPs', 'mid_pines_lidar2013_dem']
coordinates = [637160.446919, 219373.236976,
               637105.693795, 219392.416036,
               637072.439779, 219400.613538]
profiles = {}
distance = []
for raster in rasters:
    profile = gscript.read_command('r.profile', input=raster,
                                   coordinates=coordinates, quiet=True).strip()
    # parse output
    if not distance:
        for line in profile.splitlines():
            distance.append(float(line.split()[0]))
    profile_elev = []
    for line in profile.splitlines():
        profile_elev.append(float(line.split()[-1]))
    profiles[raster] = profile_elev

for raster in profiles:
    plt.plot(distance, profiles[raster], label=raster)
    plt.legend(loc=0)
plt.show()

Compute profile statistics using NumPy:

import numpy as np

stats = {}
for raster in profiles:
    profile = np.array(profiles[raster])
    maxim = np.max(profile)
    minim = np.min(profile)
    mean = np.mean(profile)
    stddev = np.std(profile)
    median = np.median(profile)
    roughness = np.std(np.diff(profile))
    stats[raster] = (minim, maxim, mean, stddev, median, roughness)

And write them into a CSV file:

with open('profile_stats.csv', 'w') as f:
    f.write(','.join(['name', 'minim', 'maxim', 'mean',
                      'stddev', 'median', 'roughness']))
    f.write('\n')
    for raster in profiles:
        f.write(raster + ',')
        f.write(','.join([str(i) for i in stats[raster]]))
        f.write('\n')

Tasks and deliverables

Try the provided code snippets. Modify them as you wish.
Run the script for creating profiles several times with different coordinates.
Extent the generated report by computing viewshed from a selected point.
Deliver the generated report as PDF.
- To get the PDF, print the web page as PDF in a web browser (it should be also possible to copy the HTML through the clipboard to a word processor like LibreOffice Writer).
- Alternatively, generate report in LaTeX, or another markup, and deliver generated PDF.
- In any case, include also the source text file (but you don't have to include the individual images).
Alternatively, you can run your own code or use your own data.

The following parts contain additional material. Trying the things out is optional.

Downloading all files linked on a website

import urllib
import re

# note the slash at the end
base_url = 'http://fatra.cnr.ncsu.edu/lidar/'

tile_file = 'tile_list.html'
# download the file (in one step)
urllib.urlretrieve(base_url, filename=tile_file)

# open the file and read it line by line
# we expect to get HTML with a list of filenames in the given directory
with open(tile_file) as html:
    for line in html:
        # search for the links (a tags with href attribute)
        # we expect one link per line in the HTML source code
        match = re.search(r'<a href="(.*?)">', line)
        if not match:
            continue
        # get the content of the href attribute captured before
        filename = match.group(1)
        # download only files we are interested in
        if not filename.endswith('.las'):
            continue
        # we expect the link to be relative to our base URL
        urllib.urlretrieve(base_url + filename, filename=filename)

Script with parameters

Putting filenames, paths, map names etc. to a script might see as an easy option at the beginning, however later this gets hard to manage as one needs to search for them and change them for different data especially when one script is used with different data at the same time. Here, we create a script which adds two numbers and prints the whole expression including the result. It should be easy to imagine that the parameters would be raster names and the expression would be r.mapcalc expression which we could execute.

import sys

if len(sys.argv) != 3:
    sys.exit("Please, provide two parameters")
a = float(sys.argv[1])
b = float(sys.argv[2])
c = a + b
print "{a} + {b} = {c}".format(a=a, b=b, c=c)

When checking the number of parameters and getting the individual parameters, remember that the first (zeroth) parameter is the name of the script and the first parameter provided in the command line is at the position 1 in the list.

GRASS GIS provides its own way of handling parameters which is specialized for geospatial data and standardized across GRASS modules. Interface defined like this can be used to create command line help, documentation or GUI. However, even when used just in a script, first advantage being the usage of key-value syntax in the command line as opposed to relying on the order of parameters. In comparison to Python packages for command line parsing, GRASS GIS provides predefined types related to various values and data types used in GRASS GIS. Read more about it in the documentation.

A proper structure of a Python script

A proper Python script starts with a shebang line which is used on Linux (see below), then we do all the imports and define all the functions. One function, often named main, is called in a if statement at the end of the file. The if statement is using condition __name__ == "__main__" which says that this particular file is being executed (this will be always true for our scripts).

#!/usr/bin/env python

import sys

def main():
    a = 3
    b = 5
    print a + b

if __name__ == "__main__":
    main()

There are different variations of this structure but the main idea is that the code is not placed at the top level (without any indentation) but within some functions. In general, examples do not show this as it is simple to just indent to code more and put it to a function.

More ways of executing Python code with GRASS GIS

If you change the current working directory to the directory where the script is, you can just type python and name of the script. This might be a good option for the assignment or on MS Windows but otherwise, it is not recommend.

On Linux (and Mac), all scripts should contain a shebang line which tell the system which interpreter to use. The shebang is a first line in a file and starts with #!. For your Python scripts, use:

#!/usr/bin/env python

Besides providing shebang line, you need to make your file executable using, for example:

chmod u+x /path/to/file.py

This is a best practice which most people use. Using this you can call the script using:

/path/to/file.py

Note that there is not python and that the extension of the file can be arbitrary.

If you want to run the code which uses GRASS GIS packages from your Python editor you can start the editor from the system command line which opens with GRASS GIS. Then the editor is part of the GRASS GIS environment and GRASS Python packages will be accessible and GRASS modules will recognize the current Location and Mapset.

If you want to start GRASS GIS without GUI to just run your Python scripts, your can run the grass command in terminal on Linux or using the GRASS GIS Command Line application on Windows.

If you want to use GRASS Python packages and GRASS modules without starting the GRASS GIS application at all, you can do so by setting up the path to GRASS Python packages and doing an initialization procedure to setup runtime environment for modules and also connect to a given GRASS Database, Location and Mapset. Please refer to the documentation for details.

Alternatively, you can set your system environment to be the proper environment GRASS modules will work in without a need to actually start GRASS GIS. To run something you would need to just set GRASS Database, Location and Mapset. There is a reason why GRASS GIS is not doing that, although there is a lot of applications which are invading your system in this way or force you to do it, especially on Windows. If you wish GRASS GIS would be the same, use the documentation or the wiki as a guide. This solution should be fine in some isolated environment such as virtual machine. Less invasive version of this is to set just Python packages path (PYTHONPATH) and GRASS installation directory (GISBASE) so that you can import GRASS Python packages and leave the rest on the GRASS setup functions. Note this can still result in problems with different versions of Python on your system (not generally an issue on Linux).

Note that last two ways are more intended for application developers using Python and GRASS GIS. GIS professionals generally more benefit from running the scripts from within GRASS GIS as they can review the results right-away and such scripts can be easily turned into real GRASS modules with a consistent user interface.