If you ever find yourself needing part of a large (>2GB) raster dataset from the CSIRO Data Access Portal, you may need to use their WebDAV environment to access it. This generally means downloading the entire raster (many of which are >50GB) in order to clip out a small chunk of it. This is a bit crap if you’re not on AARNet, tbh.

Here’s an alternative - you can clip out portions of the target raster without downloading the whole thing first, using gdal_translate.

The following assumes you are working in Windows and have GDAL 1.11 - get it from GIS Internals. See the end of the post for an explanation of why I’m not using the latest version.

I got a little help nailing this workflow down from the good folks at gis.stackexchange.com.

  1. In the CSIRO DAP, pick the dataset you want and click on the ‘Files’ tab. Fill out the data request stuff on the bottom-right of he page (email address, then pick ‘Download via WebDAV’ and the nearest location to you. Since your options are Canberra or elbourne, you may as well flip a coin if you’re more than 500km from either).

  2. When you receive your confirmation email, click the link and log in with the credentials provided.

  3. Navigate to the exact file you want to extract. Most of the big datasets in the DAP contain .tif mosaics, so let’s focus on hose. Find the .tif you want, right click on it, and copy the full path to an open text file.

  4. Check the metadata, then pick a set of bounding box coordinates in the dataset’s CRS (almost always WGS84, EPSG:4326). Write them ut in decimal degrees, ulx, uly, rlx, rly

  5. in your text file, create a gdal_translate command like:
    gdal_translate --config CPL_TMPDIR C:\path\to\local\folder -projwin_srs EPSG:4283 -projwin ulx uly lrx lry /vsicurl/http://USER:PASSWORD@webdav-bm.data.csiro.au/path/to/tif/sweet_sweet_data.tif C:\path\to\local\folder\sweet_sweet_data_clip.tif

Where it says USER:PASSWORD above, substitute in the email address and password you received in your access email. Replace the @ in your email address with its UTF-8 code, %40.

Once you’ve worked out the whole command, open the OSGeo4W shell and paste it in, then hit enter. If all goes well, you should see a line reporting the number of rows and columns in the target dataset, and then a progress bar. Go make a coffee.

The output .tif will contain any cells whose lower-right corners fell within the bounding box. No alignment shifts or resampling will occur.

Update 01-Aug-2016: Turns out there’s a bit of a bug in GDAL 2.1 - a change was made that meant the -projwin tag did some unexpected pixel reshaping and realignment. The behaviour should be reverted in v.2.1.2, so stick to using v1.x for this kind of operation until then. I’ve tested the method successfully with GDAL 1.9 and 1.11. Note also that the QGIS >= 2.14 Clipper tool relies on GDAL 2.1.0, so it may not perform as you might expect.

NB: If you’re behind a firewall, you’l have to set up your proxy stuff first. In the OSGeo4w shell, the commands are:

  • set GDAL_HTTP_PROXY=http://proxy_url:port
  • set GDAL_HTTP_PROXYUSRPASS=USER:PASSWORD (in this case, sub in your proxy logon stuff, not the WebDAV credentials. For Win users, this is usually your network logon and password. Don’t save those in any text files, of course)

This has been working pretty well for me over the past few days, although transfer times got really slow once the work week started; about 2 hours for 140MB, ugh. Not sure whether to blame general usage levels or the PokemonGo launch…

Ok, so if you’re an ESRI user and you want a bounding box for a single dataset, or a feature/set of features within a single dataset, there’s the Minimum Bounding Geometry tool for vector data (assuming you have 10.3 or above). Raster bounding boxes require more work, but not for any good reason.

There’s not much around for generating bounding boxes over multiple datasets, unfortunately, which is stupid because extent parameters are so easily extracted from any spatial dataset. Sure, you could use MBG after mucking around for an hour creating individual bounding polygons for each of your datasets and then running Dissolve or something, but ugh. Ugh.

My problem today was that I had four (roughly) adjacent soil survey boundaries, and I wanted to run extracts from other overlapping datasets for Science Reasons. I wanted a simple clipping box that encompassed all of the survey areas, and I didn’t need to be picky about restricting the clip to the actual, irregular, outlines. Below is the quick-and-dirty method I came up with, starting with this guy’s code (hat tip!) and branching out.

The idea is that it will draw a box around anything you’ve added to the map window. Caveats: I haven’t accounted for meridian crosses, haven’t tested it on multiple data types, and haven’t tested it on northern hemisphere data, because I don’t need to care about any of that ¯\_(ツ)_/¯


import arcpy  
from arcpy import env  
from arcpy import mapping  

# list layers  
mxd = mapping.MapDocument("CURRENT") # will work even if your project is unsaved  
layer_list = mapping.ListLayers(mxd) # lists everything in the ToC window  
sr = arcpy.SpatialReference(4283) # GDA_94  

# coordinates from each layer's bounding box will be placed in these lists  
N_list = []  
E_list = []  
S_list = []  
W_list = []  

for layer in layer_list:  
    extent = arcpy.Describe(layer).extent
    N_list.append(extent.YMax)  
    E_list.append(extent.XMax)  
    S_list.append(extent.YMin)  
    W_list.append(extent.XMin)  

# get the biggest/smallest (...extentiest?) values for each direction  
bound_N = max(N_list)  
bound_E = max(E_list)  
bound_S = min(S_list)  
bound_W = min(W_list)  
# create point objects for new BB  
point_UR = arcpy.Point(bound_E, bound_N)  
point_LR = arcpy.Point(bound_E, bound_S)  
point_LL = arcpy.Point(bound_W, bound_S)  
point_UL = arcpy.Point(bound_W, bound_N)  

# Create the bounding box, it will be saved in your default gdb  
boundPoly = "{}\\extent".format(env.scratchWorkspace)  
array = arcpy.Array()  
array.add(point_UR)  
array.add(point_LR)  
array.add(point_LL)  
array.add(point_UL)  
# ensure the polygon is closed  
array.add(point_UR)  
# Create the polygon object  
polygon = arcpy.Polygon(array, sr)  
array.removeAll()  
# save to disk  
arcpy.CopyFeatures_management(polygon, boundPoly)  
del polygon

Usage instructions for noobs: Open everything you want to box up, click Geoprocessing > Python Console, paste in the above, hit enter, profit.

After I was done I buffered the bounding polygon by 250m or so because I’ll be running some raster analyses that have wacky edge effects. You should probably do something similar.