---
title: "4. Splitting large queries"
author:
  - "Mark Padgham"
  - "Martin Machyna"
date: "`r Sys.Date()`"
bibliography: osmdata-refs.bib
output:
    html_document:
        toc: true
        toc_float: true
        number_sections: false
        theme: flatly
vignette: >
  %\VignetteIndexEntry{4. query-split}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## 1. Introduction

The `osmdata` package retrieves data from the [`overpass`
server](https://overpass-api.de) which is primarily designed to deliver small
subsets of the full OpenStreetMap (OSM) data set, determined both by specific
bounding coordinates and specific OSM key-value pairs. The server has internal
routines to limit delivery rates on queries for excessively large data sets,
and may ultimately fail for large queries. This vignette describes one approach
for breaking overly large queries into a set of smaller queries, and for
re-combining the resulting data sets into a single `osmdata` object reflecting
the desired, large query.


## 2. Query splitting

Complex or data-heavy queries may exhaust the time or memory limits of the
`overpass` server. One way to get around this problem is to split the bounding
box (bbox) of a query into several smaller fragments, and then to re-combine
the data and remove duplicate objects. This section demonstrates how that may
be done, starting with a large bounding box.


```{r get-bbox, eval = FALSE}
library (osmdata)

bb <- getbb ("Southeastern Connecticut COG", featuretype = "boundary")
bb
```
```{r out1, eval = FALSE}
#>         min       max
#> x -72.46677 -71.79315
#> y  41.27591  41.75617
```

The following lines then divide that bounding box into two smaller areas:

```{r bbox-split, eval = FALSE}
dx <- (bb ["x", "max"] - bb ["x", "min"]) / 2

bbs <- list (bb, bb)

bbs [[1]] ["x", "max"] <- bb ["x", "max"] - dx
bbs [[2]] ["x", "min"] <- bb ["x", "min"] + dx

bbs
```
```{r out2, eval = FALSE}
#> [[1]]
#>         min       max
#> x -72.46677 -72.12996
#> y  41.27591  41.75617
#>
#> [[2]]
#>         min       max
#> x -72.12996 -71.79315
#> y  41.27591  41.75617
```

These two bounding boxes can then be used to submit two separate overpass
queries:

```{r opq-2x, eval = FALSE}
res <- list ()

res [[1]] <- opq (bbox = bbs [[1]]) |>
    add_osm_feature (key = "admin_level", value = "8") |>
    osmdata_sf ()
res [[2]] <- opq (bbox = bbs [[2]]) |>
    add_osm_feature (key = "admin_level", value = "8") |>
    osmdata_sf ()
```

The retrieved `osmdata` objects can then be merged using the `c(...)` function,
which automatically removes duplicate objects.

```{r opq-merge, eval = FALSE}
res <- c (res [[1]], res [[2]])
```


## 3. Automatic bbox splitting

The previous code demonstrated how to divide a bounding box into two smaller
regions. It will generally not be possible to know in advance how small a
bounding box should be for a query to work, and so we need a more general
version of that functionality to divide a bounding box into an arbitrary number
of sub-regions.

We can automate this process by monitoring the exit status of
`opq() |> osmdata_sf()` and, in case of a failed query, recursively splitting
the current bounding box into increasingly smaller fragments until the overpass
server returns a result.
The following function demonstrates splitting a
bounding box into a list of four equal-sized bounding boxes in a 2-by-2 grid,
each box having a specified degree of overlap (`eps=0.05`, or 5%) with the
neighbouring box.

```{r bbox-auto-split, eval = FALSE}
split_bbox <- function (bbox, grid = 2, eps = 0.05) {
    xmin <- bbox ["x", "min"]
    ymin <- bbox ["y", "min"]
    dx <- (bbox ["x", "max"] - bbox ["x", "min"]) / grid
    dy <- (bbox ["y", "max"] - bbox ["y", "min"]) / grid

    bboxl <- list ()

    for (i in 1:grid) {
        for (j in 1:grid) {
            b <- matrix (c (
                xmin + ((i - 1 - eps) * dx),
                ymin + ((j - 1 - eps) * dy),
                xmin + ((i + eps) * dx),
                ymin + ((j + eps) * dy)
            ),
            nrow = 2,
            dimnames = dimnames (bbox)
            )

            bboxl <- append (bboxl, list (b))
        }
    }
    bboxl
}
```

We pre-split our area and create a queue of bounding boxes that we will use for
submitting queries.

```{r bbox-pre-split, eval = FALSE}
bb <- getbb ("Connecticut", featuretype = NULL)
queue <- split_bbox (bb)
result <- list ()
```

Now we can create a loop that will monitor the exit status of our query and in
case of success remove the bounding box from the queue. If our query fails for
some reason, we split the failed bounding box into four smaller fragments and
add them to our queue, repeating until all results have been successfully
delivered.


```{r auto-query, eval = FALSE}
while (length (queue) > 0) {

    # Show which bounding box is currently being queried
    print (queue [[1]])

    # try() returns an object of class "try-error" if the query fails
    opres <- try ({
        opq (bbox = queue [[1]], timeout = 25) |>
            add_osm_feature (key = "natural", value = "tree") |>
            osmdata_sf ()
    })

    if (!inherits (opres, "try-error")) {
        # Success: store the result and drop this box from the queue
        result <- append (result, list (opres))
        queue <- queue [-1]
    } else {
        # Failure: replace this box with four smaller ones
        bboxnew <- split_bbox (queue [[1]])
        queue <- append (bboxnew, queue [-1])
    }
}
```

All retrieved `osmdata` objects stored in the `result` list can then be
combined using the `c(...)` function. Note that for large datasets this process
can be quite time-consuming.

```{r merge-result-list, eval = FALSE}
final <- do.call (c, result)
```
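
The queue-based loop splits a bounding box on any failure, so a failure that is unrelated to query size (for example, an unreachable server) would keep it splitting indefinitely. The following is a minimal sketch of one possible safeguard, written in base R so that it runs without any call to the overpass server. It is an illustration under stated assumptions, not part of `osmdata`: `mock_query()`, `max_area`, `max_depth` and `n_failed` are hypothetical names introduced here, and the stand-in query simply fails whenever a box exceeds an arbitrary area. `split_bbox()` is repeated from above so that the sketch is self-contained.

```r
# Same splitting scheme as `split_bbox()` in this vignette, repeated here so
# that the sketch runs on its own.
split_bbox <- function (bbox, grid = 2, eps = 0.05) {
    xmin <- bbox ["x", "min"]
    ymin <- bbox ["y", "min"]
    dx <- (bbox ["x", "max"] - bbox ["x", "min"]) / grid
    dy <- (bbox ["y", "max"] - bbox ["y", "min"]) / grid

    bboxl <- list ()
    for (i in seq_len (grid)) {
        for (j in seq_len (grid)) {
            b <- matrix (
                c (
                    xmin + ((i - 1 - eps) * dx),
                    ymin + ((j - 1 - eps) * dy),
                    xmin + ((i + eps) * dx),
                    ymin + ((j + eps) * dy)
                ),
                nrow = 2,
                dimnames = dimnames (bbox)
            )
            bboxl <- append (bboxl, list (b))
        }
    }
    bboxl
}

# Hypothetical stand-in for an overpass query (NOT an osmdata function): it
# fails whenever the box covers more than `max_area` square degrees, and
# otherwise returns the box itself in place of an `osmdata` object.
mock_query <- function (bbox, max_area = 0.05) {
    area <- (bbox ["x", "max"] - bbox ["x", "min"]) *
        (bbox ["y", "max"] - bbox ["y", "min"])
    if (area > max_area) {
        stop ("simulated failure: bounding box too large")
    }
    bbox
}

# Bounding box built by hand from the coordinates printed in Section 2
bb <- matrix (
    c (-72.46677, 41.27591, -71.79315, 41.75617),
    nrow = 2,
    dimnames = list (c ("x", "y"), c ("min", "max"))
)

max_depth <- 4 # assumed limit on how many times a box may be re-split
n_failed <- 0  # number of failed (simulated) queries
result <- list ()

# Each queue entry carries its bounding box and its splitting depth
queue <- lapply (split_bbox (bb), function (b) list (bbox = b, depth = 1))

while (length (queue) > 0) {
    item <- queue [[1]]
    queue <- queue [-1]

    opres <- try (mock_query (item$bbox), silent = TRUE)

    if (!inherits (opres, "try-error")) {
        result <- append (result, list (opres))
    } else if (item$depth < max_depth) {
        # Failed, but still allowed to split: queue four smaller boxes
        n_failed <- n_failed + 1
        pieces <- lapply (
            split_bbox (item$bbox),
            function (b) list (bbox = b, depth = item$depth + 1)
        )
        queue <- append (pieces, queue)
    } else {
        # Splitting further is unlikely to help, so stop instead of looping
        stop ("query still failing after ", max_depth, " levels of splitting")
    }
}

length (result)
```

With these illustrative settings, each of the four initial boxes fails once and is split again, so `result` ends up holding 16 stand-in results. In real use, `mock_query()` would be replaced by the `opq() |> add_osm_feature() |> osmdata_sf()` pipeline, and a suitable value for `max_depth` would depend on the region and the feature being queried.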