---
title: "4. Splitting large queries"
author:
  - "Mark Padgham"
  - "Martin Machyna"
date: "`r Sys.Date()`"
bibliography: osmdata-refs.bib
output:
    html_document:
        toc: true
        toc_float: true
        number_sections: false
        theme: flatly
vignette: >
  %\VignetteIndexEntry{4. query-split}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## 1. Introduction

The `osmdata` package retrieves data from the [`overpass`
server](https://overpass-api.de), which is primarily designed to deliver small
subsets of the full OpenStreetMap (OSM) data set, determined both by specific
bounding coordinates and by specific OSM key-value pairs. The server has
internal routines to limit delivery rates on queries for excessively large data
sets, and may ultimately fail for large queries. This vignette describes one
approach for breaking overly large queries into a set of smaller queries, and
for re-combining the resulting data sets into a single `osmdata` object
reflecting the desired, large query.


## 2. Query splitting

Complex or data-heavy queries may exhaust the time or memory limits of the
`overpass` server. One way around this problem is to split the bounding box
(bbox) of a query into several smaller fragments, submit one query per
fragment, and then re-combine the data and remove duplicate objects. This
section demonstrates how that may be done, starting with a large bounding box.

```{r get-bbox, eval = FALSE}
library (osmdata)

# getbb() looks up the bounding box of the named area via Nominatim
bb <- getbb ("Southeastern Connecticut COG", featuretype = "boundary")
bb
```
```{r out1, eval = FALSE}
#>         min       max
#> x -72.46677 -71.79315
#> y  41.27591  41.75617
```
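Because `getbb()` needs an online lookup, it can be convenient when
experimenting to build the same bounding box by hand. The sketch below assumes
only the structure printed above: a 2-by-2 numeric matrix with rows `x` and `y`
and columns `min` and `max`, filled with the values from that output. The name
`bb_manual` is illustrative. The splitting code that follows only indexes the
bbox by these row and column names.

```r
# Rebuild the bounding box printed above as a plain matrix, without any
# network access. Values fill the "min" column (x, y) first, then the
# "max" column (x, y).
bb_manual <- matrix (
    c (-72.46677, 41.27591, -71.79315, 41.75617),
    nrow = 2,
    dimnames = list (c ("x", "y"), c ("min", "max"))
)
bb_manual
```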

The following lines then divide that bounding box into two smaller areas of
equal width, split at the midpoint of its x (longitude) range:

```{r bbox-split, eval = FALSE}
# Half of the east-west (x) extent of the bbox
dx <- (bb ["x", "max"] - bb ["x", "min"]) / 2

# Start with two copies of the original bbox ...
bbs <- list (bb, bb)

# ... then move the eastern edge of the first copy and the western edge of
# the second copy to the midpoint of the x range
bbs [[1]] ["x", "max"] <- bb ["x", "max"] - dx
bbs [[2]] ["x", "min"] <- bb ["x", "min"] + dx

bbs
```
```{r out2, eval = FALSE}
#> [[1]]
#>         min       max
#> x -72.46677 -72.12996
#> y  41.27591  41.75617
#>
#> [[2]]
#>         min       max
#> x -72.12996 -71.79315
#> y  41.27591  41.75617
```
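The same idea extends to any number of strips. The helper `split_bbox_x()`
below is a hypothetical sketch, not part of `osmdata`: it divides a bbox into
`n` strips of equal width along the x axis and leaves the y range unchanged.
With `n = 2` it gives the same two boxes as above, up to floating-point
rounding. The bbox values are those printed earlier.

```r
# Hypothetical helper (not part of osmdata): split a bbox matrix with rows
# "x", "y" and columns "min", "max" into `n` strips of equal width along x.
split_bbox_x <- function (bbox, n = 2) {
    dx <- (bbox ["x", "max"] - bbox ["x", "min"]) / n
    lapply (seq_len (n), function (i) {
        b <- bbox # each strip keeps the full y range
        b ["x", "min"] <- bbox ["x", "min"] + (i - 1) * dx
        b ["x", "max"] <- bbox ["x", "min"] + i * dx
        b
    })
}

bb <- matrix (
    c (-72.46677, 41.27591, -71.79315, 41.75617),
    nrow = 2,
    dimnames = list (c ("x", "y"), c ("min", "max"))
)
strips <- split_bbox_x (bb, n = 3)
strips
```

Neighbouring strips share an edge exactly, so any feature crossing that edge
can be returned by both adjacent queries.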

These two bounding boxes can then be used to submit two separate overpass
queries, here for all features tagged `admin_level=8`:

```{r opq-2x, eval = FALSE}
res <- list ()

# Submit the same query once for each half of the original bbox
res [[1]] <- opq (bbox = bbs [[1]]) |>
    add_osm_feature (key = "admin_level", value = "8") |>
    osmdata_sf ()
res [[2]] <- opq (bbox = bbs [[2]]) |>
    add_osm_feature (key = "admin_level", value = "8") |>
    osmdata_sf ()
```

The retrieved `osmdata` objects can then be merged using the `c(...)`
function, which automatically removes duplicate objects, such as features that
straddle the boundary between the two boxes and are therefore returned by both
queries.

```{r opq-merge, eval = FALSE}
res <- c (res [[1]], res [[2]])
```
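What "removing duplicates" means can be illustrated with plain data frames.
This is a conceptual sketch only, not the actual implementation of the `c()`
method for `osmdata` objects. It assumes that each result is a table with an
`osm_id` column, as the `sf` components of an `osmdata` object have, stacks two
such tables, and keeps the first row for each ID. All IDs and names are made up.

```r
# Two made-up query results; the object with osm_id "2" straddles the split
# line and was therefore returned by both queries.
west <- data.frame (osm_id = c ("1", "2"), name = c ("A", "B"))
east <- data.frame (osm_id = c ("2", "3"), name = c ("B", "C"))

# Stack the results, then keep only the first row for each OSM ID
combined <- rbind (west, east)
combined <- combined [!duplicated (combined$osm_id), ]
rownames (combined) <- NULL
combined
```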


## 3. Automatic bbox splitting

The previous code demonstrated how to divide a bounding box into two smaller
regions. It will generally not be possible to know in advance how small a
bounding box must be for a query to work, so we need a more general version of
that functionality, able to divide a bounding box into an arbitrary number of
sub-regions.

We can automate this process by monitoring whether each call to
`opq() |> osmdata_sf()` succeeds. Whenever a query fails, the current bounding
box is split into smaller fragments, and this is repeated recursively until the
overpass server returns a result for every fragment. The following function
splits a bounding box into a list of `grid * grid` equal-sized bounding boxes
(a 2-by-2 grid by default). Each box is enlarged on every side by a fraction
`eps` (5% by default) of the width or height of one grid cell, so that
neighbouring boxes overlap.

```{r bbox-auto-split, eval = FALSE}
split_bbox <- function (bbox, grid = 2, eps = 0.05) {
    xmin <- bbox ["x", "min"]
    ymin <- bbox ["y", "min"]
    # Width and height of each grid cell
    dx <- (bbox ["x", "max"] - bbox ["x", "min"]) / grid
    dy <- (bbox ["y", "max"] - bbox ["y", "min"]) / grid

    bboxl <- list ()

    for (i in 1:grid) {
        for (j in 1:grid) {
            # Cell (i, j), extended by `eps * dx` and `eps * dy` on each side;
            # values fill the "min" column (x, y), then the "max" column (x, y)
            b <- matrix (c (
                xmin + ((i - 1 - eps) * dx),
                ymin + ((j - 1 - eps) * dy),
                xmin + ((i + eps) * dx),
                ymin + ((j + eps) * dy)
            ),
            nrow = 2,
            dimnames = dimnames (bbox)
            )

            bboxl <- append (bboxl, list (b))
        }
    }
    bboxl
}
```
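Written out explicitly, the box that `split_bbox()` generates for grid cell
$(i, j)$ spans the following ranges, where $\Delta x$ and $\Delta y$ denote
`dx` and `dy` in the code, and $\varepsilon$ denotes `eps`:

$$
\begin{aligned}
x &\in \left[x_{\min} + (i - 1 - \varepsilon)\,\Delta x,\; x_{\min} + (i + \varepsilon)\,\Delta x\right],\\
y &\in \left[y_{\min} + (j - 1 - \varepsilon)\,\Delta y,\; y_{\min} + (j + \varepsilon)\,\Delta y\right].
\end{aligned}
$$

Each box is therefore $(1 + 2\varepsilon)\,\Delta x$ wide, and horizontally
adjacent boxes $i$ and $i + 1$ overlap by
$(i + \varepsilon)\,\Delta x - (i - \varepsilon)\,\Delta x = 2\varepsilon\,\Delta x$,
which is 10% of a cell width for the default `eps = 0.05` (and likewise in
$y$). The outermost boxes also extend $\varepsilon\,\Delta x$ beyond the
original bounding box. Features lying in these overlap zones may be returned
by more than one query.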

We first split the whole area once, and use the resulting bounding boxes as the
initial queue of areas for which queries are to be submitted.

```{r bbox-pre-split, eval = FALSE}
bb <- getbb ("Connecticut", featuretype = NULL)
queue <- split_bbox (bb) # initial queue of four overlapping bboxes
result <- list () # successful query results are collected here
```

Now we can create a loop that submits a query for the first bounding box in
the queue. If the query succeeds, the result is stored and that bounding box
is removed from the queue. If the query fails for any reason, the failed
bounding box is split into four smaller fragments, which are placed at the
front of the queue. This is repeated until all results have been successfully
delivered and the queue is empty.

```{r auto-query, eval = FALSE}
while (length (queue) > 0) {

    print (queue [[1]]) # show progress: the bbox currently being queried

    # try() catches query errors so that a failure does not stop the loop;
    # the short timeout (in seconds) makes oversized queries fail quickly
    opres <- try ({
        opq (bbox = queue [[1]], timeout = 25) |>
            add_osm_feature (key = "natural", value = "tree") |>
            osmdata_sf ()
    })

    if (!inherits (opres, "try-error")) {
        # Success: store the result and drop this bbox from the queue
        result <- append (result, list (opres))
        queue <- queue [-1]
    } else {
        # Failure: replace this bbox with four smaller, overlapping ones
        bboxnew <- split_bbox (queue [[1]])
        queue <- append (bboxnew, queue [-1])
    }
}
```
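Because every failure triggers a further split, a loop of this kind never
terminates if a query keeps failing for a reason unrelated to its size, such as
a lost network connection. One possible safeguard, sketched below, is to record
how many times each box has been split and to stop once a maximum depth is
reached. The function `query_with_splits()` and its arguments are hypothetical,
not part of `osmdata`: `fetch` is assumed to be any function that takes a bbox
and either returns a result or throws an error, and `split` any function that
returns a list of smaller bboxes. Passing both in as arguments lets the logic
be exercised offline, here with made-up stand-ins `halve_x()` and `fake_fetch()`.

```r
# Hypothetical helper (not part of osmdata): process a queue of bboxes,
# splitting any box whose query fails, up to `max_depth` levels of splitting.
query_with_splits <- function (bbox, fetch, split, max_depth = 5) {
    # Each queue entry holds a bbox and the number of splits that produced it
    queue <- list (list (bbox = bbox, depth = 0))
    result <- list ()

    while (length (queue) > 0) {
        item <- queue [[1]]
        queue <- queue [-1]
        res <- tryCatch (fetch (item$bbox), error = function (e) e)

        if (!inherits (res, "error")) {
            result <- append (result, list (res))
        } else if (item$depth < max_depth) {
            # Put the fragments of the failed box at the front of the queue
            children <- lapply (split (item$bbox), function (b) {
                list (bbox = b, depth = item$depth + 1)
            })
            queue <- append (children, queue)
        } else {
            stop ("Query still failing after ", max_depth, " splits: ",
                conditionMessage (res))
        }
    }
    result
}

# Made-up stand-ins: `halve_x` splits a bbox into two halves along x, and
# `fake_fetch` simulates a failure for any box wider than 0.3 units.
halve_x <- function (b) {
    mid <- mean (b ["x", ])
    west <- b
    east <- b
    west ["x", "max"] <- mid
    east ["x", "min"] <- mid
    list (west, east)
}
fake_fetch <- function (b) {
    if (diff (b ["x", ]) > 0.3) stop ("simulated timeout")
    b
}

bb0 <- matrix (c (0, 0, 1, 1), nrow = 2,
    dimnames = list (c ("x", "y"), c ("min", "max")))
res <- query_with_splits (bb0, fake_fetch, halve_x)
length (res) # 4 boxes, each 0.25 units wide
```

In the setting of this vignette, `split` could be `split_bbox`, and `fetch` a
function wrapping the `opq() |> add_osm_feature() |> osmdata_sf()` pipeline
used above.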

All retrieved `osmdata` objects stored in the `result` list can then be
combined with a single call to the `c(...)` function, which again removes
duplicate objects, including those returned by more than one of the
overlapping bounding boxes. Note that for large data sets this process can be
quite time consuming.

```{r merge-result-list, eval = FALSE}
final <- do.call (c, result)
```