IN43A-1721 – Scalable algorithms for unsupervised classification and anomaly detection in large geospatiotemporal data sets (Invited)

Authors

Richard Mills
Intel Corporation
Forrest Hoffman (forrest at climatemodeling dot org)
Oak Ridge National Laboratory
Jitendra Kumar
Oak Ridge National Laboratory

Session

Big Data in the Geosciences: New Analytics Methods and Parallel Algorithms Posters
Thursday, December 17, 2015 13:40–18:00
Moscone South Poster Hall

Abstract

The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery and mining of ecological data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of parallelism available in state-of-the-art high-performance computing platforms can enable such analysis. We describe some unsupervised knowledge discovery and anomaly detection approaches based on highly scalable parallel algorithms for k-means clustering and singular value decomposition, consider a few practical applications of these approaches to the analysis of climatic and remotely sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.
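As a rough illustration of how k-means clustering can support anomaly detection, the following is a minimal serial sketch in Python. It is not the parallel implementation described in the poster. The function names, the toy observations, the hand-picked initial centroids, and the use of distance to the nearest centroid as an anomaly score are all assumptions made for this example.

```python
import math


def kmeans(points, initial_centroids, iterations=20):
    """Plain serial Lloyd's k-means on a list of equal-length tuples.

    initial_centroids is supplied by the caller (an assumption made here
    so that the example is deterministic).
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def anomaly_scores(points, centroids):
    """Score each point by its distance to the nearest centroid."""
    return [min(math.dist(p, c) for c in centroids) for p in points]


# Hypothetical toy data: two tight groups plus one outlying observation.
observations = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1),
                (5.0, 5.1), (5.1, 4.9), (4.9, 5.0),
                (2.5, 9.0)]

# Seed one centroid in each tight group (hand-picked for illustration).
centroids = kmeans(observations, [observations[0], observations[3]])
scores = anomaly_scores(observations, centroids)

# The observation farthest from every centroid is flagged as most anomalous.
most_anomalous = observations[scores.index(max(scores))]
print(most_anomalous)
```

In this toy case the outlying observation is absorbed into the second cluster and pulls its centroid away from the tight group, yet it still has the largest score. At the data volumes the abstract describes, both the assignment and update steps would need to be distributed across many processors, which this sketch does not attempt.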

