pacman::p_load(
"curl",
"dplyr",
"fontawesome",
"htmltools",
"kableExtra", "knitr",
"lidR",
"raster", "reproj",
"sf",
"terra", "tmap",
"useful")Point Clouds
Data Preprocessing and DTM Generation
1.0 Overview
Raw LiDAR data arrives as massive unclassified point clouds (.xyz, .laz format) containing ground, vegetation, buildings, and noise. This chapter documents the preprocessing pipeline: catalog management, ground classification, noise removal, DTM generation, and height normalization. Output is analysis-ready normalized point clouds for tree detection.
The working example in this and subsequent chapters uses the airborne lidar tile 10SGH1587 from the USGS 3DEP Lidar Point Cloud collection flown over the Carr Hirz Delta Fires burn scar in Shasta County, northern California. The data are Quality Level 1 linear-mode lidar, accepted for the 3DEP program with ≤ 9.8 cm absolute vertical accuracy at 95 % CI in open terrain. The tile covers 1 km × 1 km at ~78 pts/m² (78.3 million points) and is downloadable from the USGS Staged Elevation FTP directory. Figures below are built from a 1 ha clip centred on the densest-canopy cell in the tile, which keeps every downstream chunk under a million points.
Note: Ground classification and DTM generation on large catalogs can require full-day processing times. Tiling configuration and indexing quality directly impact performance.
Environment Setup
1.1 Catalog Management
LAScatalog handles datasets too large for memory by partitioning the data into chunks and processing them sequentially or in parallel. The Carr Fire tile is a single 1 km × 1 km LAZ; we index it, read a thinned scan of the whole tile, pick the 25 m cell with the tallest 95th-percentile canopy, and clip a 100 m × 100 m AOI around that centroid. This keeps every chunk below one million points without losing the tree-rich patch.
src <- "../assets/data/USGS_LPC_CA_SierraNevada_B22_10SGH1587.laz"
ctg <- readLAScatalog(src, select = "xyzcr",
filter = "-thin_with_grid 0.5")
opt_chunk_size(ctg) <- 500
opt_chunk_buffer(ctg) <- 10
is.indexed(ctg)
las_check(ctg)
# Coarse scan: 95p Z on a 25 m grid to locate the densest canopy
scan <- readLAS(src, select = "xyz",
filter = "-keep_every_nth 20")
z95_grid <- pixel_metrics(scan, ~quantile(Z, .95, na.rm = TRUE),
res = 25)
pick <- which.max(terra::values(z95_grid))
xy <- terra::xyFromCell(z95_grid, pick)
cx <- xy[1, "x"]; cy <- xy[1, "y"]
las_raw <- clip_rectangle(ctg,
cx - 50, cy - 50, cx + 50, cy + 50
)Key parameters:
- Buffer = 10 m minimises edge effects
- Chunk size = 500 m works for the 1 km Carr tile
-
-thin_with_grid 0.5halves in-memory density for scans; the final clip is read at full density - For multi-tile catalogs add
opt_filter('-drop_class 19')to remove overlapping points



1.2 Ground Classification
In this section we compare the Cloth Simulation Filter (csf()) with the Progressive Morphological Filter (pmf()). The CSF algorithm tends to produce a smoother DTM across variable topography (Zhang et al., 2016). For the Carr tile — steep post-fire slopes with roughly 60 m of relief across the 1 ha AOI — CSF is visibly cleaner and is carried downstream.


1.3 Noise Removal
Once ground points are classified, any remaining noise is identified and removed to ensure data integrity. This process was carried out using the Statistical Outlier Removal (sor) algorithm with a default neighborhood sample of k=10 and a multiplier of m=3.
The workflow considered four potential methods for noise screening:
- Using internal algorithms with the
filter_poifunction and aLASNOISEstring. - Filtering with
opt_filterand the-drop class 19string. - Screening by a Z-axis threshold (e.g.,
Z > 40 & Z < 0). - Screening by the 95th percentile.
Using filter_poi with an internal algorithm was found to be slower in the long run due to loss of spatial indexing after a new catalog object was assigned.
las_csf_so <- classify_noise(las_csf, sor(k = 10, m = 3))
las_csf_sor <- filter_poi(las_csf_so, Classification != LASNOISE)
las_pmf_so <- classify_noise(las_pmf, sor(k = 10, m = 3))
las_pmf_sor <- filter_poi(las_pmf_so, Classification != LASNOISE)

1.4 Digital Terrain Model
Rasterisation is paramount to terrain analysis because it shapes every subsequent height-based estimate. The Inverse Distance Weighting (knnidw) algorithm is used here for its efficiency and robustness (Tu et al., 2020). Parameters:
- Default maximum radius of 50 m
- Neighborhood of
10 - Inverse distance weighting power of
2 - Resolution of
0.5 m(matches the 78 pts/m² density)
dtm_csf <- rasterize_terrain(las_csf_sor, res = 0.5,
knnidw(k = 10, p = 2, rmax = 50))
dtm_pmf <- rasterize_terrain(las_pmf_sor, res = 0.5,
knnidw(k = 10, p = 2, rmax = 50))
# Hillshade visualization
slope <- terra::terrain(dtm_csf, "slope", unit = "radians")
aspect <- terra::terrain(dtm_csf, "aspect", unit = "radians")
hs <- terra::shade(slope, aspect, angle = 40, direction = 270)
terra::plot(hs, col = grey(0:100/100), legend = FALSE)

Result: CSF produces a visibly smoother DTM across the Carr relief gradient.
Compare the Carr tile DTM with the classic lidR Topography sample for reference — the mesh render below comes from lidR::extdata/Topography.laz, a frequently reused teaching tile.

1.5 Height Normalization
Transform Z values from absolute elevation to height above ground. Essential input for all canopy analyses.


Validation: the ground-point histogram should centre near zero. Deviations indicate DTM issues or algorithm failure in complex terrain.
The lidR::extdata/MixedConifer.laz sample provides a dramatic illustration of what a normalised canopy surface looks like once pit-free triangulation is applied — the kernel-density “net” below shows individual crowns rising above the ground plane.

Runtime Notes
- 311 tiles processed in ~24 hours (high-performance system) on the author’s original BC demonstration; the Carr 1 ha AOI completes in under three minutes on a laptop.
- Bottlenecks: ground classification (60%), DTM interpolation (30%), I/O overhead (10%).
- Process chunks sequentially if RAM < 32 GB
- Increase chunk buffer if edge artifacts visible
- Use
opt_independent_files()for parallel tasks
Common bugs:
- Double-counting or insufficient buffering producing edge artifacts in DTM
- Poor indexing leads to 10x slowdown in catalog operations
Runtime Log
devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────
## setting value
## version R version 4.4.1 (2024-06-14)
## os macOS 15.7.5
## system aarch64, darwin20
## ui X11
## language (EN)
## collate en_CA.UTF-8
## ctype en_CA.UTF-8
## tz America/Vancouver
## date 2026-04-23
## pandoc 3.9.0.2 @ /opt/local/bin/ (via rmarkdown)
## quarto 1.7.33 @ /usr/local/bin/quarto
##
## ─ Packages ───────────────────────────────────────────────────────
## ! package * version date (UTC) lib source
## abind 1.4-8 2024-09-12 [1] CRAN (R 4.4.1)
## base64enc 0.1-6 2026-02-02 [1] CRAN (R 4.4.3)
## cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.1)
## P class 7.3-22 2023-05-03 [?] CRAN (R 4.4.1)
## classInt 0.4-11 2025-01-08 [1] CRAN (R 4.4.1)
## cli 3.6.5 2025-04-23 [1] CRAN (R 4.4.1)
## P codetools 0.2-20 2024-03-31 [?] CRAN (R 4.4.1)
## colorspace 2.1-2 2025-09-22 [1] CRAN (R 4.4.1)
## cols4all 0.10 2025-10-27 [1] CRAN (R 4.4.1)
## crosstalk 1.2.2 2025-08-26 [1] CRAN (R 4.4.1)
## curl * 7.0.0 2025-08-19 [1] CRAN (R 4.4.1)
## data.table 1.18.2.1 2026-01-27 [1] CRAN (R 4.4.3)
## DBI 1.3.0 2026-02-25 [1] CRAN (R 4.4.3)
## devtools 2.5.0 2026-03-14 [1] CRAN (R 4.4.3)
## digest 0.6.39 2025-11-19 [1] CRAN (R 4.4.3)
## dplyr * 1.2.0 2026-02-03 [1] CRAN (R 4.4.3)
## e1071 1.7-17 2025-12-18 [1] CRAN (R 4.4.3)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.1)
## evaluate 1.0.5 2025-08-27 [1] CRAN (R 4.4.1)
## farver 2.1.2 2024-05-13 [1] CRAN (R 4.4.1)
## fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.1)
## fontawesome * 0.5.3 2024-11-16 [1] CRAN (R 4.4.1)
## fs 1.6.7 2026-03-06 [1] CRAN (R 4.4.3)
## generics 0.1.4 2025-05-09 [1] CRAN (R 4.4.1)
## ggplot2 * 4.0.2 2026-02-03 [1] CRAN (R 4.4.3)
## glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.1)
## gtable 0.3.6 2024-10-25 [1] CRAN (R 4.4.1)
## htmltools * 0.5.9 2025-12-04 [1] CRAN (R 4.4.3)
## htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.0)
## jsonlite 2.0.0 2025-03-27 [1] CRAN (R 4.4.1)
## kableExtra * 1.4.0 2024-01-24 [1] CRAN (R 4.4.0)
## P KernSmooth 2.23-24 2024-05-17 [?] CRAN (R 4.4.1)
## knitr * 1.51 2025-12-20 [1] CRAN (R 4.4.3)
## P lattice 0.22-6 2024-03-20 [?] CRAN (R 4.4.1)
## leafem 0.2.5 2025-08-28 [1] CRAN (R 4.4.1)
## leaflegend 1.2.1 2024-05-09 [1] CRAN (R 4.4.0)
## leaflet 2.2.3 2025-09-04 [1] CRAN (R 4.4.1)
## leafsync 0.1.0 2019-03-05 [1] CRAN (R 4.4.0)
## lidR * 4.2.3 2026-01-08 [1] CRAN (R 4.4.3)
## lifecycle 1.0.5 2026-01-08 [1] CRAN (R 4.4.3)
## logger 0.4.1 2025-09-11 [1] CRAN (R 4.4.1)
## lwgeom 0.2-15 2026-01-12 [1] CRAN (R 4.4.3)
## magrittr 2.0.4 2025-09-12 [1] CRAN (R 4.4.1)
## maptiles 0.11.0 2025-12-12 [1] CRAN (R 4.4.3)
## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0)
## otel 0.2.0 2025-08-29 [1] CRAN (R 4.4.1)
## P pacman 0.5.1 2019-03-11 [?] CRAN (R 4.4.0)
## pillar 1.11.1 2025-09-17 [1] CRAN (R 4.4.1)
## pkgbuild 1.4.8 2025-05-26 [1] CRAN (R 4.4.1)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.1)
## pkgload 1.5.0 2026-02-03 [1] CRAN (R 4.4.3)
## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.4.1)
## png 0.1-9 2026-03-15 [1] CRAN (R 4.4.3)
## proj4 1.0-15 2025-03-21 [1] CRAN (R 4.4.1)
## proxy 0.4-29 2025-12-29 [1] CRAN (R 4.4.3)
## purrr 1.2.1 2026-01-09 [1] CRAN (R 4.4.3)
## R6 2.6.1 2025-02-15 [1] CRAN (R 4.4.1)
## raster * 3.6-32 2025-03-28 [1] CRAN (R 4.4.1)
## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.4.1)
## Rcpp 1.1.1 2026-01-10 [1] CRAN (R 4.4.3)
## renv 1.1.5 2025-07-24 [1] CRAN (R 4.4.1)
## reproj * 0.7.0 2024-06-11 [1] CRAN (R 4.4.0)
## rlang 1.1.7 2026-01-09 [1] CRAN (R 4.4.3)
## rmarkdown 2.30 2025-09-28 [1] CRAN (R 4.4.1)
## rstudioapi 0.18.0 2026-01-16 [1] CRAN (R 4.4.3)
## s2 1.1.9 2025-05-23 [1] CRAN (R 4.4.1)
## S7 0.2.1 2025-11-14 [1] CRAN (R 4.4.3)
## scales 1.4.0 2025-04-24 [1] CRAN (R 4.4.1)
## sessioninfo 1.2.3 2025-02-05 [1] CRAN (R 4.4.1)
## sf * 1.1-0 2026-02-24 [1] CRAN (R 4.4.3)
## sp * 2.2-1 2026-02-13 [1] CRAN (R 4.4.3)
## spacesXYZ 1.6-0 2025-06-06 [1] CRAN (R 4.4.1)
## stars 0.7-1 2026-02-13 [1] CRAN (R 4.4.3)
## stringi 1.8.7 2025-03-27 [1] CRAN (R 4.4.1)
## stringr 1.6.0 2025-11-04 [1] CRAN (R 4.4.1)
## svglite 2.2.2 2025-10-21 [1] CRAN (R 4.4.1)
## systemfonts 1.3.2 2026-03-05 [1] CRAN (R 4.4.3)
## terra * 1.9-1 2026-03-08 [1] CRAN (R 4.4.3)
## textshaping 1.0.5 2026-03-06 [1] CRAN (R 4.4.3)
## tibble 3.3.1 2026-01-11 [1] CRAN (R 4.4.3)
## tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
## tmap * 4.2 2025-09-10 [1] CRAN (R 4.4.1)
## tmaptools 3.3 2025-07-24 [1] CRAN (R 4.4.1)
## units 1.0-1 2026-03-11 [1] CRAN (R 4.4.3)
## useful * 1.2.7 2026-02-26 [1] CRAN (R 4.4.3)
## usethis 3.2.1 2025-09-06 [1] CRAN (R 4.4.1)
## vctrs 0.7.2 2026-03-21 [1] CRAN (R 4.4.3)
## viridisLite 0.4.3 2026-02-04 [1] CRAN (R 4.4.3)
## withr 3.0.2 2024-10-28 [1] CRAN (R 4.4.1)
## wk 0.9.5 2025-12-18 [1] CRAN (R 4.4.3)
## xfun 0.57 2026-03-20 [1] CRAN (R 4.4.3)
## XML 3.99-0.23 2026-03-20 [1] CRAN (R 4.4.3)
## xml2 1.5.2 2026-01-17 [1] CRAN (R 4.4.3)
## yaml 2.3.12 2025-12-10 [1] CRAN (R 4.4.3)
##
## [1] /Users/seamus/repos/lidar-forestry/renv/library/macos/R-4.4/aarch64-apple-darwin20
## [2] /Users/seamus/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.4/aarch64-apple-darwin20/93405863
##
## * ── Packages attached to the search path.
## P ── Loaded and on-disk path mismatch.
##
## ──────────────────────────────────────────────────────────────────