Point Clouds

Data Preprocessing and DTM Generation

Author

Seamus Murphy

Published

December 5, 2021

1.0 Overview

Raw LiDAR data arrives as massive unclassified point clouds (.xyz, .laz format) containing ground, vegetation, buildings, and noise. This chapter documents the preprocessing pipeline: catalog management, ground classification, noise removal, DTM generation, and height normalization. Output is analysis-ready normalized point clouds for tree detection.

The working example in this and subsequent chapters uses the airborne lidar tile 10SGH1587 from the USGS 3DEP Lidar Point Cloud collection flown over the Carr Hirz Delta Fires burn scar in Shasta County, northern California. The data are Quality Level 1 linear-mode lidar, accepted for the 3DEP program with ≤ 9.8 cm absolute vertical accuracy at 95 % CI in open terrain. The tile covers 1 km × 1 km at ~78 pts/m² (78.3 million points) and is downloadable from the USGS Staged Elevation FTP directory. Figures below are built from a 1 ha clip centred on the densest-canopy cell in the tile, which keeps every downstream chunk under a million points.

Note: Ground classification and DTM generation on large catalogs can require full-day processing times. Tiling configuration and indexing quality directly impact performance.

Environment Setup

pacman::p_load(
  "curl",
  "dplyr",
  "fontawesome",
  "htmltools",
  "kableExtra", "knitr",
  "lidR",
  "raster", "reproj",
  "sf",
  "terra", "tmap",
  "useful")

1.1 Catalog Management

LAScatalog handles datasets too large for memory by partitioning the data into chunks and processing them sequentially or in parallel. The Carr Fire tile is a single 1 km × 1 km LAZ; we index it, read a thinned scan of the whole tile, pick the 25 m cell with the tallest 95th-percentile canopy, and clip a 100 m × 100 m AOI around that centroid. This keeps every chunk below one million points without losing the tree-rich patch.

src <- "../assets/data/USGS_LPC_CA_SierraNevada_B22_10SGH1587.laz"

ctg <- readLAScatalog(src, select = "xyzcr",
                     filter = "-thin_with_grid 0.5")
opt_chunk_size(ctg)   <- 500
opt_chunk_buffer(ctg) <- 10
is.indexed(ctg)
las_check(ctg)

# Coarse scan: 95p Z on a 25 m grid to locate the densest canopy
scan    <- readLAS(src, select = "xyz",
                   filter = "-keep_every_nth 20")
z95_grid <- pixel_metrics(scan, ~quantile(Z, .95, na.rm = TRUE),
                          res = 25)
pick <- which.max(terra::values(z95_grid))
xy   <- terra::xyFromCell(z95_grid, pick)
cx   <- xy[1, "x"]; cy <- xy[1, "y"]

las_raw <- clip_rectangle(ctg,
  cx - 50, cy - 50, cx + 50, cy + 50
)

Key parameters:

  • Buffer = 10 m minimises edge effects
  • Chunk size = 500 m works for the 1 km Carr tile
  • -thin_with_grid 0.5 halves in-memory density for scans; the final clip is read at full density
  • For multi-tile catalogs add opt_filter('-drop_class 19') to remove overlapping points

Raw Z histogramCatalog chunks


1.2 Ground Classification

In this section we compare the Cloth Simulation Filter (csf()) with the Progressive Morphological Filter (pmf()). The CSF algorithm tends to produce a smoother DTM across variable topography (Zhang et al., 2016). For the Carr tile — steep post-fire slopes with roughly 60 m of relief across the 1 ha AOI — CSF is visibly cleaner and is carried downstream.

library(RCSF)

# Apply CSF
las_csf <- classify_ground(las_raw, csf(sloop_smooth = TRUE, 0.5, 1))

# Apply PMF for comparison
las_pmf <- classify_ground(
  las_raw,
  pmf(ws = c(3, 6, 9), th = c(0.5, 1, 1.5))
)

CSF groundPMF ground


1.3 Noise Removal

Once ground points are classified, any remaining noise is identified and removed to ensure data integrity. This process was carried out using the Statistical Outlier Removal (sor) algorithm with a default neighborhood sample of k=10 and a multiplier of m=3.

The workflow considered four potential methods for noise screening:

  1. Using internal algorithms with the filter_poi function and a LASNOISE string.
  2. Filtering with opt_filter and the -drop class 19 string.
  3. Screening by a Z-axis threshold (e.g., Z > 40 & Z < 0).
  4. Screening by the 95th percentile.

Using filter_poi with an internal algorithm was found to be slower in the long run due to loss of spatial indexing after a new catalog object was assigned.

las_csf_so  <- classify_noise(las_csf, sor(k = 10, m = 3))
las_csf_sor <- filter_poi(las_csf_so, Classification != LASNOISE)

las_pmf_so  <- classify_noise(las_pmf, sor(k = 10, m = 3))
las_pmf_sor <- filter_poi(las_pmf_so, Classification != LASNOISE)

Noise removal CSFNoise removal PMF


1.4 Digital Terrain Model

Rasterisation is paramount to terrain analysis because it shapes every subsequent height-based estimate. The Inverse Distance Weighting (knnidw) algorithm is used here for its efficiency and robustness (Tu et al., 2020). Parameters:

  • Default maximum radius of 50 m
  • Neighborhood of 10
  • Inverse distance weighting power of 2
  • Resolution of 0.5 m (matches the 78 pts/m² density)
dtm_csf <- rasterize_terrain(las_csf_sor, res = 0.5,
                             knnidw(k = 10, p = 2, rmax = 50))
dtm_pmf <- rasterize_terrain(las_pmf_sor, res = 0.5,
                             knnidw(k = 10, p = 2, rmax = 50))

# Hillshade visualization
slope   <- terra::terrain(dtm_csf, "slope",  unit = "radians")
aspect  <- terra::terrain(dtm_csf, "aspect", unit = "radians")
hs      <- terra::shade(slope, aspect, angle = 40, direction = 270)
terra::plot(hs, col = grey(0:100/100), legend = FALSE)

DTM CSFDTM PMF

Result: CSF produces a visibly smoother DTM across the Carr relief gradient.

Compare the Carr tile DTM with the classic lidR Topography sample for reference — the mesh render below comes from lidR::extdata/Topography.laz, a frequently reused teaching tile.

lidR Topography sample — 3D DTM mesh

1.5 Height Normalization

Transform Z values from absolute elevation to height above ground. Essential input for all canopy analyses.

las_norm <- normalize_height(las_csf_sor, knnidw())

# Validate normalization
hist(filter_ground(las_norm)$Z,
     breaks = seq(-0.6, 0.6, 0.01),
     main = "Ground Point Heights",
     xlab = "Elevation (m)")

lidR::writeLAS(las_norm, "../data/las_1ha.laz")

Normalized point cloudGround height distribution

Validation: the ground-point histogram should centre near zero. Deviations indicate DTM issues or algorithm failure in complex terrain.

The lidR::extdata/MixedConifer.laz sample provides a dramatic illustration of what a normalised canopy surface looks like once pit-free triangulation is applied — the kernel-density “net” below shows individual crowns rising above the ground plane.

MixedConifer CHM net (lidR sample)

Runtime Notes

  • 311 tiles processed in ~24 hours (high-performance system) on the author’s original BC demonstration; the Carr 1 ha AOI completes in under three minutes on a laptop.
  • Bottlenecks: ground classification (60%), DTM interpolation (30%), I/O overhead (10%).
  • Process chunks sequentially if RAM < 32 GB
  • Increase chunk buffer if edge artifacts visible
  • Use opt_independent_files() for parallel tasks

Common bugs:

  • Double-counting or insufficient buffering producing edge artifacts in DTM
  • Poor indexing leads to 10x slowdown in catalog operations

Runtime Log

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.1 (2024-06-14)
##  os       macOS 15.7.5
##  system   aarch64, darwin20
##  ui       X11
##  language (EN)
##  collate  en_CA.UTF-8
##  ctype    en_CA.UTF-8
##  tz       America/Vancouver
##  date     2026-04-23
##  pandoc   3.9.0.2 @ /opt/local/bin/ (via rmarkdown)
##  quarto   1.7.33 @ /usr/local/bin/quarto
## 
## ─ Packages ───────────────────────────────────────────────────────
##  ! package      * version   date (UTC) lib source
##    abind          1.4-8     2024-09-12 [1] CRAN (R 4.4.1)
##    base64enc      0.1-6     2026-02-02 [1] CRAN (R 4.4.3)
##    cachem         1.1.0     2024-05-16 [1] CRAN (R 4.4.1)
##  P class          7.3-22    2023-05-03 [?] CRAN (R 4.4.1)
##    classInt       0.4-11    2025-01-08 [1] CRAN (R 4.4.1)
##    cli            3.6.5     2025-04-23 [1] CRAN (R 4.4.1)
##  P codetools      0.2-20    2024-03-31 [?] CRAN (R 4.4.1)
##    colorspace     2.1-2     2025-09-22 [1] CRAN (R 4.4.1)
##    cols4all       0.10      2025-10-27 [1] CRAN (R 4.4.1)
##    crosstalk      1.2.2     2025-08-26 [1] CRAN (R 4.4.1)
##    curl         * 7.0.0     2025-08-19 [1] CRAN (R 4.4.1)
##    data.table     1.18.2.1  2026-01-27 [1] CRAN (R 4.4.3)
##    DBI            1.3.0     2026-02-25 [1] CRAN (R 4.4.3)
##    devtools       2.5.0     2026-03-14 [1] CRAN (R 4.4.3)
##    digest         0.6.39    2025-11-19 [1] CRAN (R 4.4.3)
##    dplyr        * 1.2.0     2026-02-03 [1] CRAN (R 4.4.3)
##    e1071          1.7-17    2025-12-18 [1] CRAN (R 4.4.3)
##    ellipsis       0.3.2     2021-04-29 [1] CRAN (R 4.4.1)
##    evaluate       1.0.5     2025-08-27 [1] CRAN (R 4.4.1)
##    farver         2.1.2     2024-05-13 [1] CRAN (R 4.4.1)
##    fastmap        1.2.0     2024-05-15 [1] CRAN (R 4.4.1)
##    fontawesome  * 0.5.3     2024-11-16 [1] CRAN (R 4.4.1)
##    fs             1.6.7     2026-03-06 [1] CRAN (R 4.4.3)
##    generics       0.1.4     2025-05-09 [1] CRAN (R 4.4.1)
##    ggplot2      * 4.0.2     2026-02-03 [1] CRAN (R 4.4.3)
##    glue           1.8.0     2024-09-30 [1] CRAN (R 4.4.1)
##    gtable         0.3.6     2024-10-25 [1] CRAN (R 4.4.1)
##    htmltools    * 0.5.9     2025-12-04 [1] CRAN (R 4.4.3)
##    htmlwidgets    1.6.4     2023-12-06 [1] CRAN (R 4.4.0)
##    jsonlite       2.0.0     2025-03-27 [1] CRAN (R 4.4.1)
##    kableExtra   * 1.4.0     2024-01-24 [1] CRAN (R 4.4.0)
##  P KernSmooth     2.23-24   2024-05-17 [?] CRAN (R 4.4.1)
##    knitr        * 1.51      2025-12-20 [1] CRAN (R 4.4.3)
##  P lattice        0.22-6    2024-03-20 [?] CRAN (R 4.4.1)
##    leafem         0.2.5     2025-08-28 [1] CRAN (R 4.4.1)
##    leaflegend     1.2.1     2024-05-09 [1] CRAN (R 4.4.0)
##    leaflet        2.2.3     2025-09-04 [1] CRAN (R 4.4.1)
##    leafsync       0.1.0     2019-03-05 [1] CRAN (R 4.4.0)
##    lidR         * 4.2.3     2026-01-08 [1] CRAN (R 4.4.3)
##    lifecycle      1.0.5     2026-01-08 [1] CRAN (R 4.4.3)
##    logger         0.4.1     2025-09-11 [1] CRAN (R 4.4.1)
##    lwgeom         0.2-15    2026-01-12 [1] CRAN (R 4.4.3)
##    magrittr       2.0.4     2025-09-12 [1] CRAN (R 4.4.1)
##    maptiles       0.11.0    2025-12-12 [1] CRAN (R 4.4.3)
##    memoise        2.0.1     2021-11-26 [1] CRAN (R 4.4.0)
##    otel           0.2.0     2025-08-29 [1] CRAN (R 4.4.1)
##  P pacman         0.5.1     2019-03-11 [?] CRAN (R 4.4.0)
##    pillar         1.11.1    2025-09-17 [1] CRAN (R 4.4.1)
##    pkgbuild       1.4.8     2025-05-26 [1] CRAN (R 4.4.1)
##    pkgconfig      2.0.3     2019-09-22 [1] CRAN (R 4.4.1)
##    pkgload        1.5.0     2026-02-03 [1] CRAN (R 4.4.3)
##    plyr           1.8.9     2023-10-02 [1] CRAN (R 4.4.1)
##    png            0.1-9     2026-03-15 [1] CRAN (R 4.4.3)
##    proj4          1.0-15    2025-03-21 [1] CRAN (R 4.4.1)
##    proxy          0.4-29    2025-12-29 [1] CRAN (R 4.4.3)
##    purrr          1.2.1     2026-01-09 [1] CRAN (R 4.4.3)
##    R6             2.6.1     2025-02-15 [1] CRAN (R 4.4.1)
##    raster       * 3.6-32    2025-03-28 [1] CRAN (R 4.4.1)
##    RColorBrewer   1.1-3     2022-04-03 [1] CRAN (R 4.4.1)
##    Rcpp           1.1.1     2026-01-10 [1] CRAN (R 4.4.3)
##    renv           1.1.5     2025-07-24 [1] CRAN (R 4.4.1)
##    reproj       * 0.7.0     2024-06-11 [1] CRAN (R 4.4.0)
##    rlang          1.1.7     2026-01-09 [1] CRAN (R 4.4.3)
##    rmarkdown      2.30      2025-09-28 [1] CRAN (R 4.4.1)
##    rstudioapi     0.18.0    2026-01-16 [1] CRAN (R 4.4.3)
##    s2             1.1.9     2025-05-23 [1] CRAN (R 4.4.1)
##    S7             0.2.1     2025-11-14 [1] CRAN (R 4.4.3)
##    scales         1.4.0     2025-04-24 [1] CRAN (R 4.4.1)
##    sessioninfo    1.2.3     2025-02-05 [1] CRAN (R 4.4.1)
##    sf           * 1.1-0     2026-02-24 [1] CRAN (R 4.4.3)
##    sp           * 2.2-1     2026-02-13 [1] CRAN (R 4.4.3)
##    spacesXYZ      1.6-0     2025-06-06 [1] CRAN (R 4.4.1)
##    stars          0.7-1     2026-02-13 [1] CRAN (R 4.4.3)
##    stringi        1.8.7     2025-03-27 [1] CRAN (R 4.4.1)
##    stringr        1.6.0     2025-11-04 [1] CRAN (R 4.4.1)
##    svglite        2.2.2     2025-10-21 [1] CRAN (R 4.4.1)
##    systemfonts    1.3.2     2026-03-05 [1] CRAN (R 4.4.3)
##    terra        * 1.9-1     2026-03-08 [1] CRAN (R 4.4.3)
##    textshaping    1.0.5     2026-03-06 [1] CRAN (R 4.4.3)
##    tibble         3.3.1     2026-01-11 [1] CRAN (R 4.4.3)
##    tidyselect     1.2.1     2024-03-11 [1] CRAN (R 4.4.0)
##    tmap         * 4.2       2025-09-10 [1] CRAN (R 4.4.1)
##    tmaptools      3.3       2025-07-24 [1] CRAN (R 4.4.1)
##    units          1.0-1     2026-03-11 [1] CRAN (R 4.4.3)
##    useful       * 1.2.7     2026-02-26 [1] CRAN (R 4.4.3)
##    usethis        3.2.1     2025-09-06 [1] CRAN (R 4.4.1)
##    vctrs          0.7.2     2026-03-21 [1] CRAN (R 4.4.3)
##    viridisLite    0.4.3     2026-02-04 [1] CRAN (R 4.4.3)
##    withr          3.0.2     2024-10-28 [1] CRAN (R 4.4.1)
##    wk             0.9.5     2025-12-18 [1] CRAN (R 4.4.3)
##    xfun           0.57      2026-03-20 [1] CRAN (R 4.4.3)
##    XML            3.99-0.23 2026-03-20 [1] CRAN (R 4.4.3)
##    xml2           1.5.2     2026-01-17 [1] CRAN (R 4.4.3)
##    yaml           2.3.12    2025-12-10 [1] CRAN (R 4.4.3)
## 
##  [1] /Users/seamus/repos/lidar-forestry/renv/library/macos/R-4.4/aarch64-apple-darwin20
##  [2] /Users/seamus/Library/Caches/org.R-project.R/R/renv/sandbox/macos/R-4.4/aarch64-apple-darwin20/93405863
## 
##  * ── Packages attached to the search path.
##  P ── Loaded and on-disk path mismatch.
## 
## ──────────────────────────────────────────────────────────────────