Anirudh, R., Lohit, S., and Turaga, P. (2021). “Generative Patch Priors for Practical Compressive Image Recovery.” 2021 IEEE Winter Conference on Applications of Computer Vision. [abstract]
In this paper, we propose the generative patch prior (GPP) that defines a generative prior for compressive image recovery, based on patch-manifold models. Unlike learned, image-level priors that are restricted to the range space of a pre-trained generator, GPP can recover a wide variety of natural images using a pre-trained patch generator. Additionally, GPP retains the benefits of generative priors like high reconstruction quality at extremely low sensing rates, while also being much more generally applicable. We show that GPP outperforms several unsupervised and supervised techniques on three different sensing models—linear compressive sensing with known and unknown calibration settings, and the non-linear phase retrieval problem. Finally, we propose an alternating optimization strategy using GPP for joint calibration-and-reconstruction which performs favorably against several baselines on a real-world, uncalibrated compressive sensing dataset.
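For readers who want a concrete picture of the recovery step, here is a minimal sketch of patch-based generative-prior recovery in PyTorch. It is not the authors' released code: the patch generator is an untrained stand-in, and the sensing matrix and measurements are placeholders chosen so the snippet runs end to end.

```python
# Minimal sketch of patch-based generative-prior recovery (not the authors' code).
# A pretrained patch generator is assumed; an untrained net stands in so this runs.
import torch
import torch.nn as nn

patch, n_patches, latent_dim, m = 8, 16, 20, 256        # 4x4 grid of 8x8 patches -> 32x32 image
patch_G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, patch * patch))
A = torch.randn(m, 32 * 32) / m ** 0.5                   # compressive sensing matrix (stand-in)
y = torch.randn(m)                                       # measurements (stand-in)

z = torch.zeros(n_patches, latent_dim, requires_grad=True)
opt = torch.optim.Adam([z], lr=1e-2)
for _ in range(200):
    patches = patch_G(z).view(4, 4, patch, patch)
    x = patches.permute(0, 2, 1, 3).reshape(32 * 32)     # stitch patches into one image
    loss = ((A @ x - y) ** 2).mean()                     # data-fidelity term
    opt.zero_grad(); loss.backward(); opt.step()
```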
Shanthamallu, U. S., Thiagarajan, J. J., and Spanias, A. (2021). “Uncertainty-Matching Graph Neural Networks to Defend Against Poisoning Attacks.” 35th AAAI Conference on Artificial Intelligence. [abstract]
Graph Neural Networks (GNNs), a generalization of neural networks to graph-structured data, are often implemented using message passes between entities of a graph. While GNNs are effective for node classification, link prediction and graph classification, they are vulnerable to adversarial attacks, i.e., a small perturbation to the structure can lead to a non-trivial performance degradation. In this work, we propose Uncertainty Matching GNN (UM-GNN), that is aimed at improving the robustness of GNN models, particularly against poisoning attacks to the graph structure, by leveraging epistemic uncertainties from the message passing framework. More specifically, we propose to build a surrogate predictor that does not directly access the graph structure, but systematically extracts reliable knowledge from a standard GNN through a novel uncertainty-matching strategy. Interestingly, this uncoupling makes UM-GNN immune to evasion attacks by design, and achieves significantly improved robustness against poisoning attacks. Using empirical studies with standard benchmarks and a suite of global and target attacks, we demonstrate the effectiveness of UM-GNN, when compared to existing baselines including the state-of-the-art robust GCN.
Gokhale, T., Anirudh, R., Kailkhura, B., Thiagarajan, J. J., Baral, C., and Yang, Y. (2021). “Attribute-Guided Adversarial Training for Robustness to Natural Perturbations.” 35th AAAI Conference on Artificial Intelligence. [abstract]
While existing work in robust deep learning has focused on small pixel-level ℓp norm-based perturbations, this may not account for perturbations encountered in several real-world settings. In many such cases, although test data might not be available, broad specifications about the types of perturbations (such as an unknown degree of rotation) may be known. We consider a setup where robustness is expected over an unseen test domain that is not i.i.d. but deviates from the training domain. While this deviation may not be exactly known, its broad characterization is specified a priori, in terms of attributes. We propose an adversarial training approach which learns to generate new samples so as to maximize exposure of the classifier to the attribute space, without having access to the data from the test domain. Our adversarial training solves a min-max optimization problem, with the inner maximization generating adversarial perturbations, and the outer minimization finding model parameters by optimizing the loss on adversarial perturbations generated from the inner maximization. We demonstrate the applicability of our approach on three types of naturally occurring perturbations—object-related shifts, geometric transformations, and common image corruptions. Our approach enables deep neural networks to be robust against a wide range of naturally occurring perturbations. We demonstrate the usefulness of the proposed approach by showing the robustness gains of deep neural networks trained using our adversarial training on MNIST, CIFAR-10, and a new variant of the CLEVR dataset.
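The min-max structure can be illustrated with a small PyTorch sketch (not the authors' implementation): the inner loop ascends the loss with respect to a single differentiable attribute (a rotation angle here), and the outer loop updates the model on the resulting perturbed batch. The model, data, and step sizes are stand-ins.

```python
# Minimal attribute-space adversarial training sketch with a rotation attribute.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(8, 1, 28, 28), torch.randint(0, 10, (8,))

def rotate(imgs, angle):
    """Differentiable rotation so gradients reach the attribute (angle)."""
    c, s, z = torch.cos(angle), torch.sin(angle), torch.zeros_like(angle)
    mat = torch.stack([torch.stack([c, -s, z]), torch.stack([s, c, z])])
    grid = F.affine_grid(mat.unsqueeze(0).expand(imgs.size(0), -1, -1),
                         imgs.size(), align_corners=False)
    return F.grid_sample(imgs, grid, align_corners=False)

for _ in range(10):                                   # outer minimization over model weights
    angle = torch.tensor(0.0, requires_grad=True)
    for _ in range(5):                                # inner maximization over the attribute
        loss = F.cross_entropy(model(rotate(x, angle)), y)
        grad = torch.autograd.grad(loss, angle)[0]
        angle = (angle + 0.1 * grad.sign()).detach().requires_grad_()
    loss = F.cross_entropy(model(rotate(x, angle.detach())), y)
    opt.zero_grad(); loss.backward(); opt.step()
```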
Thiagarajan, J. J., Narayanaswamy, V., Anirudh, R., Bremer, P.-T., Spanias, A. (2021). “Accurate and Robust Feature Importance Estimation under Distribution Shifts.” 35th AAAI Conference on Artificial Intelligence. [abstract]
With increasing reliance on the outcomes of black-box models in critical applications, post-hoc explainability tools that do not require access to the model internals are often used to enable humans to understand and trust these models. In particular, we focus on the class of methods that can reveal the influence of input features on the predicted outputs. Despite their widespread adoption, existing methods are known to suffer from one or more of the following challenges: computational complexity, large uncertainties, and, most importantly, the inability to handle real-world domain shifts. In this paper, we propose PRoFILE, a novel feature importance estimation method that addresses all these challenges. Through the use of a loss estimator jointly trained with the predictive model and a causal objective, PRoFILE can accurately estimate the feature importance scores even under complex distribution shifts, without any additional re-training. To this end, we also develop learning strategies for training the loss estimator, namely contrastive and dropout calibration, and find that it can effectively detect distribution shifts. Using empirical studies on several benchmark image and non-image data, we show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
Thiagarajan, J. J., Bremer, P.-T., Anirudh, R., Germann, T. C., Del Valle, S. Y., Streitz, F. H. (2020). “Machine Learning-Powered Mitigation Policy Optimization in Epidemiological Models.” Preprint. [abstract]
A crucial aspect of managing a public health crisis is to effectively balance prevention and mitigation strategies, while taking their socio-economic impact into account. In particular, determining the influence of different non-pharmaceutical interventions (NPIs) on the effective use of public resources is an important problem, given the uncertainties on when a vaccine will be made available. In this paper, we propose a new approach for obtaining optimal policy recommendations based on epidemiological models, which can characterize the disease progression under different interventions, and a look-ahead reward optimization strategy to choose the suitable NPI at different stages of an epidemic. Given the time delay inherent in any epidemiological model and the exponential nature of an epidemic, especially an unmanaged one, we find that such a look-ahead strategy infers non-trivial policies that adhere well to the constraints specified. Using two different epidemiological models, namely SEIR and EpiCast, we evaluate the proposed algorithm to determine the optimal NPI policy, under a constraint on the number of daily new cases and with the primary reward being the absence of restrictions.
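The look-ahead idea can be illustrated on a toy SEIR model (not EpiCast) with made-up parameters: at each stage, the least restrictive NPI whose projected daily new cases stay below a cap over the planning horizon is selected.

```python
# Illustrative look-ahead NPI selection on a toy SEIR model with hypothetical rates.
import numpy as np

beta = {"none": 0.5, "distancing": 0.3, "lockdown": 0.15}   # hypothetical transmission rates
sigma, gamma, N, cap, horizon = 1 / 5, 1 / 10, 1e6, 2000, 14

def simulate(state, b, days):
    S, E, I, R = state
    peak_new = 0.0
    for _ in range(days):
        new_inf = b * S * I / N
        S, E, I, R = S - new_inf, E + new_inf - sigma * E, I + sigma * E - gamma * I, R + gamma * I
        peak_new = max(peak_new, sigma * E)                  # daily new symptomatic cases
    return (S, E, I, R), peak_new

state, policy = (N - 100, 50, 50, 0), []
for stage in range(10):                                      # choose an NPI every two weeks
    for npi in ["none", "distancing", "lockdown"]:           # ordered from least restrictive
        _, peak = simulate(state, beta[npi], horizon)
        if peak < cap or npi == "lockdown":
            break
    state, _ = simulate(state, beta[npi], horizon)
    policy.append(npi)
print(policy)
```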
Anirudh, R., Thiagarajan, J. J., Bremer, P.-T., Germann, T. C., Del Valle, S. Y., Streitz, F. H. (2020). “Accurate Calibration of Agent-Based Epidemiological Models with Neural Network Surrogates.” Preprint. [abstract]
Calibrating complex epidemiological models to observed data is a crucial step to provide both insights into the current disease dynamics, i.e., by estimating a reproductive number, and reliable forecasts and scenario explorations. Here we present a new approach to calibrate an agent-based model—EpiCast—using a large set of simulation ensembles for different major metropolitan areas of the United States. In particular, we propose: a new neural network based surrogate model able to simultaneously emulate all different locations; and a novel posterior estimation that provides not only more accurate posterior estimates of all parameters but also enables the joint fitting of global parameters across regions.
Tipnis, U., Abbas, K., Tran, E., Amico, E., Shen, L., Kaplan, A. D., Goñi, J. (2020). “Functional Connectome Fingerprint Gradients in Young Adults.” Preprint. [abstract]
The assessment of brain fingerprints has emerged in recent years as an important tool to study individual differences and to infer quality of neuroimaging datasets. Studies so far have mainly focused on connectivity fingerprints between different brain scans of the same individual. Here, we extend the concept of brain connectivity fingerprints beyond test/retest and assess fingerprint gradients in young adults by developing an extension of the differential identifiability framework. To do so, we look at the similarity not only between the multiple scans of an individual (subject fingerprint), but also between the scans of monozygotic and dizygotic twins (twin fingerprint). We have carried out this analysis on the 8 fMRI conditions present in the Human Connectome Project–Young Adult dataset, which we processed into functional connectomes (FCs) and timeseries parcellated according to the Schaefer Atlas scheme, which has multiple levels of resolution. Our differential identifiability results show that the fingerprint gradients based on genetic and environmental similarities are indeed present when comparing FCs for all parcellations and fMRI conditions. Importantly, only when assessing optimally reconstructed FCs do we fully uncover the fingerprints present in higher-resolution atlases. We also study the effect of scanning length and parcellation on the subject fingerprint of resting-state FCs. In the pursuit of open science, we have also made available to the scientific community the processed and parcellated FCs and timeseries for all conditions for the ~1200 subjects in the HCP-YA dataset.
Pallotta, G., and Santer, B. (2020). “Multi-Frequency Analysis of Simulated Versus Observed Variability in Tropospheric Temperature.” Journal of Climate 33(23), pp. 10383–10402. [abstract]
Studies seeking to identify a human-caused global warming signal generally rely on climate model estimates of the “noise” of intrinsic natural variability. Assessing the reliability of these noise estimates is of critical importance. We evaluate here the statistical significance of differences between climate model and observational natural variability spectra for global-mean mid- to upper-tropospheric temperature (TMT). We use TMT information from satellites and large multimodel ensembles of forced and unforced simulations. Our main goal is to explore the sensitivity of model-versus-data spectral comparisons to a wide range of subjective decisions. These include the choice of satellite and climate model TMT datasets, the method for separating signal and noise, the frequency range considered, and the statistical model used to represent observed natural variability. Of particular interest is the amplitude of the interdecadal noise against which an anthropogenic tropospheric warming signal must be detected. We find that on time scales of 5–20 years, observed TMT variability is (on average) overestimated by the last two generations of climate models participating in the Coupled Model Intercomparison Project. This result is relatively insensitive to different plausible analyst choices, enhancing confidence in previous claims of detectable anthropogenic warming of the troposphere and indicating that these claims may be conservative. A further key finding is that two commonly used statistical models of short-term and long-term memory have deficiencies in their ability to capture the complex shape of observed TMT spectra.
Thiagarajan, J. J., Venkatesh, B., Anirudh, R., Bremer, P.-T., Gaffney, J., Anderson, G., and Spears, B. (2020). “Designing Accurate Emulators for Scientific Processes Using Calibration-Driven Deep Models.” Nat Commun 11. [abstract]
Predictive models that accurately emulate complex scientific processes can achieve speed-ups over numerical simulators or experiments and at the same time provide surrogates for improving the subsequent analysis. Consequently, there is a recent surge in utilizing modern machine learning methods to build data-driven emulators. In this work, we study an often overlooked, yet important, problem of choosing loss functions while designing such emulators. Popular choices such as the mean squared error or the mean absolute error are based on a symmetric noise assumption and can be unsuitable for heterogeneous data or asymmetric noise distributions. We propose Learn-by-Calibrating, a novel deep learning approach based on interval calibration for designing emulators that can effectively recover the inherent noise structure without any explicit priors. Using a large suite of use-cases, we demonstrate the efficacy of our approach in providing high-quality emulators, when compared to widely-adopted loss function choices, even in small-data regimes.
Liu, S., Anirudh, R., Thiagarajan, J. J., and Bremer, P.-T. (2020). “Uncovering Interpretable Relationships in High-Dimensional Scientific Data Through Function Preserving Projections.” Mach. Learn.: Sci. Technol. [abstract]
In many fields of science and engineering, we frequently encounter experimental or simulation datasets that describe the behavior of complex systems, and uncovering human-interpretable patterns between their inputs and outputs via exploratory data analysis is essential for building intuition and facilitating discovery. Often, we resort to 2D embeddings for examining these high-dimensional relationships (e.g. dimensionality reduction). However, most existing embedding methods treat the dimensions as coordinates for samples in a high-dimensional space, which fail to capture the potential functional relationships, and the few methods that do take function into consideration either only focus on linear patterns or produce non-linear embeddings that are hard to interpret. To address these challenges, we propose function preserving projections (FPP), which construct 2D linear embeddings optimized to reveal interpretable yet potentially non-linear patterns between the domain and the range of a high-dimensional function. The intuition here is that humans are good at understanding potentially non-linear patterns in 2D but unable to interpret a non-linear mapping from a high-dimensional space to 2D. Therefore, we restrict the projection to be linear but not the pattern we are seeking. Using FPP on real-world datasets, one can obtain fundamentally new insights about high-dimensional relationships in extremely large datasets that could not be processed with existing dimension reduction methods.
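A rough sketch of the underlying optimization (not the released FPP code) is to learn the linear 2D projection jointly with a small nonlinear regressor, so that the projected coordinates remain predictive of the function values; stand-in data are used below.

```python
# Learn a linear 2D projection P jointly with a small regressor g so that g(X @ P)
# predicts the function values; the 2D scatter of X @ P is then the embedding.
import torch
import torch.nn as nn

X = torch.randn(500, 10)                         # high-dimensional inputs (stand-in data)
y = torch.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2      # scalar function of the inputs

P = nn.Linear(10, 2, bias=False)                 # the linear projection to 2D
g = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(list(P.parameters()) + list(g.parameters()), lr=1e-2)

for _ in range(500):
    loss = ((g(P(X)).squeeze(-1) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

embedding = P(X).detach()                        # 2D coordinates to visualize against y
```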
Soper, B. C., Nygård, M., Abdulla, G., Meng, R., Nygård, J. F. (2020). “A Hidden Markov Model for Population‐Level Cervical Cancer Screening Data.” Statistics in Medicine 39(25). [abstract]
The Cancer Registry of Norway has been administering a national cervical cancer screening program since 1992 by coordinating triennial cytology exam screenings for the female population between 25 and 69 years of age. Up to 80% of cancers are prevented through mass screening, but this comes at the expense of considerable screening activity and leads to overtreatment of clinically asymptomatic precancers. In this article, we present a continuous-time, time-inhomogeneous hidden Markov model which was developed to understand the screening process and cervical cancer carcinogenesis in detail. By leveraging 1.7 million individuals' multivariate time-series of medical exams performed over a 25-year period, we simultaneously estimate all model parameters. We show that an age-dependent model reflects the Norwegian screening program by comparing empirical survival curves from observed registry data and data simulated from the proposed model. The model can be generalized to include more detailed individual-level covariates as well as new types of screening exams. By utilizing individual screening histories and covariate data, the proposed model shows potential for improving strategies for cancer screening programs by personalizing recommended screening intervals.
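For intuition about the continuous-time Markov machinery, the toy example below (not the paper's model) shows how transition probabilities over a screening interval follow from a hypothetical intensity matrix via the matrix exponential.

```python
# Toy continuous-time Markov illustration: with a generator/intensity matrix Q over
# states, transition probabilities over an interval t are expm(Q * t).
import numpy as np
from scipy.linalg import expm

# hypothetical yearly transition intensities between three health states
Q = np.array([[-0.10, 0.09, 0.01],
              [ 0.05, -0.15, 0.10],
              [ 0.00, 0.00, 0.00]])          # last state treated as absorbing here

for t in (1.0, 3.0):                          # e.g. annual vs. triennial screening interval
    P_t = expm(Q * t)                         # P_t[i, j] = Pr(state j at time t | state i now)
    print(f"t = {t} years:\n{P_t.round(3)}")
```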
Thiagarajan, J. J., Rajan, D., Katoch, S., and Spanias, A. (2020). “DDxNet: A Deep Learning Model for Automatic Interpretation of Electronic Health Records, Electrocardiograms and Electroencephalograms.” Scientific Reports 10. [abstract]
Effective patient care mandates rapid, yet accurate, diagnosis. With the abundance of non-invasive diagnostic measurements and electronic health records (EHR), manual interpretation for differential diagnosis has become time-consuming and challenging. This has led to wide-spread adoption of AI-powered tools, in pursuit of improving accuracy and efficiency of this process. While the unique challenges presented by each modality and clinical task demand customized tools, the cumbersome process of making problem-specific choices has triggered the critical need for a generic solution to enable rapid development of models in practice. In this spirit, we develop DDxNet, a deep architecture for time-varying clinical data, which we demonstrate to be well-suited for diagnostic tasks involving different modalities (ECG/EEG/EHR), required level of characterization (abnormality detection/phenotyping) and data fidelity (single-lead ECG/22-channel EEG). Using multiple benchmark problems, we show that DDxNet produces high-fidelity predictive models, and sometimes even provides significant performance gains over problem-specific solutions.
Kim, H., Han, J., and Han, T. Y.-J. (2020). “Machine Vision-Driven Automatic Recognition of Particle Size and Morphology in SEM Images.” Nanoscale 12, pp. 19461–19469. [abstract]
Scanning Electron Microscopy (SEM) images provide a variety of structural and morphological information of nanomaterials. In the material informatics domain, automatic recognition and quantitative analysis of SEM images in a high-throughput manner are critical, but challenges still remain due to the complexity and the diversity of image configurations in both shape and size. In this paper, we present a generally applicable approach using computer vision and machine learning techniques to quantitatively extract particle size, size distribution and morphology information in SEM images. The proposed pipeline offers automatic, high-throughput measurements even when overlapping nanoparticles, rod shapes, and core–shell nanostructures are present. We demonstrate the effectiveness of the proposed approach by performing experiments on SEM images of nanoscale materials and structures with different shapes and sizes. The proposed approach shows promising results (Spearman coefficients of 0.91 and 0.99 using fully automated and semi-automated processes, respectively) when compared with manually measured sizes. The code is made available as open source software at https://github.com/LLNL/LIST.
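As a generic illustration of the kind of measurement involved (the released LIST code at the repository above is far more capable), the sketch below segments synthetic blobs and reports per-particle sizes.

```python
# Generic particle-sizing sketch; synthetic blobs stand in for SEM data.
import numpy as np
from skimage import filters, measure, draw

img = np.zeros((256, 256))
for r, c, rad in [(60, 60, 20), (150, 100, 12), (200, 200, 25)]:   # fake "particles"
    rr, cc = draw.disk((r, c), rad)
    img[rr, cc] = 1.0
img += 0.1 * np.random.rand(*img.shape)                            # mild noise

mask = img > filters.threshold_otsu(img)                           # segment particles
labels = measure.label(mask)
for region in measure.regionprops(labels):
    d = 2.0 * (region.area / np.pi) ** 0.5                         # equivalent diameter
    print(f"particle {region.label}: area = {region.area} px, eq. diameter = {d:.1f} px")
```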
Lee, X. Y., Saha, S. K., Sarkar, S., and Giera, B. (2020). “Two Photon Lithography Additive Manufacturing: Video Dataset of Parameter Sweep of Light Dosages, Photo-Curable Resins, and Structures.” Data in Brief 32. [abstract]
This document describes the collection and organization of a dataset that consists of raw videos and extracted sub-images from video frames of a promising additive manufacturing technique called Two-Photon Lithography (TPL). Four unprocessed videos were collected, with each video capturing the printing process of different combinations of 3D parts on different photoresists at varying light dosages. These videos were further trimmed to obtain short clips that are organized by experimental parameters. Additionally, this dataset also contains a python script to reproduce an organized directory of cropped video frames extracted from the trimmed videos. These cropped frames focus on a region of interest around the parts being printed. We envision that the raw videos and cropped frames provided in this dataset will be used to train various computer vision and machine learning algorithms for applications such as object segmentation and localization. The cropped video frames were manually labeled by an expert to determine the quality of the printed part and for printing parameter optimization as presented in “Automated Detection of Part Quality during Two-Photon Lithography via Deep Learning.”
Narayanaswamy, V., Thiagarajan, J. J., Anirudh, R., and Spanias, A. (2020). “Unsupervised Audio Source Separation using Generative Priors.” Preprint. [abstract]
State-of-the-art under-determined audio source separation systems rely on supervised end-to-end training of carefully tailored neural network architectures operating either in the time or the spectral domain. However, these methods are severely challenged in terms of requiring access to expensive source level labeled data and being specific to a given set of sources and the mixing process, which demands complete re-training when those assumptions change. This strongly emphasizes the need for unsupervised methods that can leverage the recent advances in data-driven modeling, and compensate for the lack of labeled data through meaningful priors. To this end, we propose a novel approach for audio source separation based on generative priors trained on individual sources. Through the use of projected gradient descent optimization, our approach simultaneously searches in the source-specific latent spaces to effectively recover the constituent sources. Though the generative priors can be defined in the time domain directly, e.g., WaveGAN, we find that using spectral domain loss functions for our optimization leads to good-quality source estimates. Our empirical studies on standard spoken digit and instrument datasets clearly demonstrate the effectiveness of our approach over classical as well as state-of-the-art unsupervised baselines.
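A bare-bones sketch of the separation-by-priors idea (not the paper's code): search the latent spaces of two per-source generators, here untrained stand-ins, so that the sum of the generated sources matches the mixture under a spectral-magnitude loss.

```python
# Latent-space search over two per-source generators with a spectral-domain loss.
import torch
import torch.nn as nn

T, latent_dim = 4096, 32
gen_a = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, T))
gen_b = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, T))
mixture = torch.randn(T)                                  # stand-in for an observed mixture

def spec(x):                                              # magnitude spectrogram
    return torch.stft(x, n_fft=512, return_complex=True).abs()

za = torch.zeros(1, latent_dim, requires_grad=True)
zb = torch.zeros(1, latent_dim, requires_grad=True)
opt = torch.optim.Adam([za, zb], lr=1e-2)
for _ in range(300):
    est = gen_a(za).squeeze(0) + gen_b(zb).squeeze(0)
    loss = (spec(est) - spec(mixture)).abs().mean()       # spectral-domain data fidelity
    opt.zero_grad(); loss.backward(); opt.step()

source_a, source_b = gen_a(za).detach(), gen_b(zb).detach()
```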
Feiger, B., Gounley, J., Adler, D., et al. (2020). “Accelerating Massively Parallel Hemodynamic Models of Coarctation of the Aorta using Neural Networks.” Scientific Reports 10. [abstract]
Comorbidities such as anemia or hypertension and physiological factors related to exertion can influence a patient’s hemodynamics and increase the severity of many cardiovascular diseases. Observing and quantifying associations between these factors and hemodynamics can be difficult due to the multitude of co-existing conditions and blood flow parameters in real patient data. Machine learning-driven, physics-based simulations provide a means to understand how potentially correlated conditions may affect a particular patient. Here, we use a combination of machine learning and massively parallel computing to predict the effects of physiological factors on hemodynamics in patients with coarctation of the aorta. We first validated blood flow simulations against in vitro measurements in 3D-printed phantoms representing the patient’s vasculature. We then investigated the effects of varying the degree of stenosis, blood flow rate, and viscosity on two diagnostic metrics—pressure gradient across the stenosis (ΔP) and wall shear stress (WSS)—by performing the largest simulation study to date of coarctation of the aorta (over 70 million compute hours). Using machine learning models trained on data from the simulations and validated on two independent datasets, we developed a framework to identify the minimal training set required to build a predictive model on a per-patient basis. We then used this model to accurately predict ΔP (mean absolute error within 1.18 mmHg) and WSS (mean absolute error within 0.99 Pa) for patients with this disease.
Cadena, J., Sales, A. P., Lam, D., et al. (2020). “Modeling the Temporal Network Dynamics of Neuronal Cultures.” PLOS Computational Biology. [abstract]
Neurons form complex networks that evolve over multiple time scales. In order to thoroughly characterize these networks, time dependencies must be explicitly modeled. Here, we present a statistical model that captures both the underlying structural and temporal dynamics of neuronal networks. Our model combines the class of Stochastic Block Models for community formation with Gaussian processes to model changes in the community structure as a smooth function of time. We validate our model on synthetic data and demonstrate its utility on three different studies using in vitro cultures of dissociated neurons.
Anirudh, R., Thiagarajan, J. J., Bremer, P.-T., and Spears, B. K. (2020). “Improved Surrogates in Inertial Confinement Fusion with Manifold and Cycle Consistencies.” PNAS 117(18), pp. 9741–9746. [abstract]
Neural networks have become the method of choice in surrogate modeling because of their ability to characterize arbitrary, high-dimensional functions in a data-driven fashion. This paper advocates for the training of surrogates that are 1) consistent with the physical manifold, resulting in physically meaningful predictions, and 2) cyclically consistent with a jointly trained inverse model; i.e., backmapping predictions through the inverse results in the original input parameters. We find that these two consistencies lead to surrogates that are superior in terms of predictive performance, are more resilient to sampling artifacts, and tend to be more data efficient. Using inertial confinement fusion (ICF) as a test-bed problem, we model a one-dimensional semianalytic numerical simulator and demonstrate the effectiveness of our approach.
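The cycle-consistency idea can be sketched in a few lines (this is not the paper's architecture; data are random stand-ins for simulator inputs and outputs): the forward surrogate is trained jointly with an inverse network that must map predictions back to the original parameters.

```python
# Jointly training a forward surrogate with a cycle-consistent inverse model.
import torch
import torch.nn as nn

params = torch.rand(256, 5)                    # simulator inputs (stand-in)
outputs = torch.rand(256, 12)                  # corresponding simulator outputs (stand-in)

fwd = nn.Sequential(nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, 12))
inv = nn.Sequential(nn.Linear(12, 64), nn.ReLU(), nn.Linear(64, 5))
opt = torch.optim.Adam(list(fwd.parameters()) + list(inv.parameters()), lr=1e-3)

for _ in range(500):
    pred = fwd(params)
    loss_fit = ((pred - outputs) ** 2).mean()              # surrogate accuracy
    loss_cycle = ((inv(pred) - params) ** 2).mean()        # backmapping recovers the inputs
    loss = loss_fit + 0.1 * loss_cycle
    opt.zero_grad(); loss.backward(); opt.step()
```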
Maiti, A., Venkat, A., Kosiba, G. D., et al. (2020). “Topological Analysis of X-Ray CT Data for the Recognition and Trending of Subtle Changes in Microstructure under Material Aging.” Computational Materials Science 182. [abstract]
X-ray computed tomography (CT) is an established non-destructive tool for 3D imaging of multiphasic composites. Numerous applications of X-ray CT in medical diagnosis and materials characterization have been reported, many involving field-specific innovations in the imaging technology itself. Yet, quantitative summarization to link image features to properties of interest has been rare. We address this issue by employing state-of-the-art techniques in scalar field topology for the summarization of X-ray CT images of an example biphasic system. By varying processing parameters, we create different microstructures, evolve them through accelerated thermal aging, CT-image them pre- and post-aged, and demonstrate the ability of our image summarization method to systematically track process- and age-related changes, which can often be very subtle. A novel aspect of the algorithm involves recognition over multiple resolution levels, which provides deeper insight into the pattern relationship between grain-like features and their neighbors. The method is general, adaptable to diverse image reconstruction methods and materials systems, and particularly useful in applications where practical constraints on sample size limit the reliable use of more complex models, e.g., convolutional neural networks.
Kim, Y., Choi, Y., Widemann, D., and Zohdi, T. (2020). “A Fast and Accurate Physics-Informed Neural Network Reduced Order Model with Shallow Masked Autoencoder.” JCP 2020. [abstract]
Traditional linear subspace reduced order models (LS-ROMs) are able to accelerate physical simulations, in which the intrinsic solution space falls into a subspace with a small dimension, i.e., the solution space has a small Kolmogorov n-width. However, for physical phenomena not of this type, e.g., any advection-dominated flow phenomena, such as in traffic flow, atmospheric flows, and air flow over vehicles, a low-dimensional linear subspace poorly approximates the solution. To address cases such as these, we have developed a fast and accurate physics-informed neural network ROM, namely nonlinear manifold ROM (NM-ROM), which can better approximate high-fidelity model solutions with a smaller latent space dimension than the LS-ROMs. Our method takes advantage of the existing numerical methods that are used to solve the corresponding full order models. The efficiency is achieved by developing a hyper-reduction technique in the context of the NM-ROM. Numerical results show that neural networks can learn a more efficient latent space representation on advection-dominated data from 1D and 2D Burgers’ equations. A speedup of up to 2.6 for 1D Burgers’ and a speedup of 11.7 for 2D Burgers’ equations are achieved with an appropriate treatment of the nonlinear terms through a hyper-reduction technique. Finally, a posteriori error bounds for the NM-ROMs are derived that take account of the hyper-reduced operators.
Choi, Y., Brown, P., Arrighi, W., Anderson, R., and Huynh, K. (2020). “Space–Time Reduced Order Model for Large-Scale Linear Dynamical Systems with Application to Boltzmann Transport Problems.” JCP 2020. [abstract]
A classical reduced order model for dynamical problems involves spatial reduction of the problem size. However, temporal reduction accompanied by the spatial reduction can further reduce the problem size without losing much accuracy, which results in a considerably greater speed-up than spatial reduction alone. Recently, a novel space–time reduced order model for dynamical problems has been developed by Choi and Carlberg [SIAM J. Sci. Comput., 41 (2019), pp. A26–A58], where the space–time reduced order model shows an order-of-a-hundred speed-up with a small relative error for small academic problems. However, in order for the method to be applicable to a large-scale problem, an efficient space–time reduced basis construction algorithm needs to be developed. We present an incremental space–time reduced basis construction algorithm. The incremental algorithm is fully parallel and scalable. Additionally, the block structure in the space–time reduced basis is exploited, which makes it possible to avoid explicitly constructing the reduced space–time basis. These novel techniques are applied to a large-scale particle transport simulation with millions and billions of degrees of freedom. The numerical example shows that the algorithm is scalable and practical. Also, it achieves a tremendous speed-up while maintaining good accuracy. Finally, error bounds for space-only and space–time reduced order models are derived.
Hoang, C., Choi, Y., and Carlberg, K. (2020). “Domain-Decomposition Least-Squares Petrov-Galerkin (DD-LSPG) Nonlinear Model Reduction.” CMAME 2020. [abstract]
While reduced-order models (ROMs) have demonstrated success in many applications across computational science, challenges remain when applied both to extreme-scale models due to the prohibitive cost of generating requisite training data, and to decomposable systems due to many-query problems often requiring repeated reconfigurations of system components. Therefore, we propose the domain-decomposition least-squares Petrov-Galerkin (DD-LSPG) model-reduction method applicable to parameterized systems of nonlinear algebraic equations. In contrast with previous works, we adopt an algebraically non-overlapping decomposition strategy rather than a spatial-decomposition strategy, which facilitates application to different spatial-discretization schemes. Rather than constructing a low-dimensional subspace for the entire state space in a monolithic fashion (which would be infeasible for extreme-scale systems and decomposable models), the methodology constructs separate subspaces for the different subdomains/components characterizing the original model. In the offline stage, the method constructs low-dimensional bases for the interior and interface of components. In the online stage, the approach constructs an LSPG ROM for each component and enforces strong or weak compatibility on the ‘ports’ connecting them. We propose four different ways to construct reduced bases on the interface/ports of subdomains and several ways to enforce compatibility across connecting ports. We derive a posteriori and a priori error bounds for the DD-LSPG solutions. Numerical results performed on nonlinear benchmark problems in heat transfer and fluid dynamics demonstrate that the proposed method performs well in terms of both accuracy and computational cost, with different choices of basis and compatibility constraints yielding different performance profiles.
Choi, Y., Coombs, D., and Anderson, R. (2020). “SNS: A Solution-Based Nonlinear Subspace Method for Time-Dependent Model Order Reduction.” SISC 2020. [abstract]
Several reduced order models have been successfully developed for nonlinear dynamical systems. To achieve a considerable speed-up, a hyper-reduction step is needed to reduce the computational complexity due to nonlinear terms. Many hyper-reduction techniques require the construction of a nonlinear term basis, which introduces a computationally expensive offline phase. A novel way of constructing the nonlinear term basis within the hyper-reduction process is introduced. In contrast to the traditional hyper-reduction techniques where the collection of nonlinear term snapshots is required, the SNS method avoids collecting the nonlinear term snapshots. Instead, it uses the solution snapshots that are used for building a solution basis, which enables avoiding an extra data compression of nonlinear term snapshots. As a result, the SNS method provides a more efficient offline strategy than the traditional model order reduction techniques, such as the DEIM, GNAT, and ST-GNAT methods. The SNS method is theoretically justified by the conforming subspace condition and the subspace inclusion relation. It is useful for model order reduction of large-scale nonlinear dynamical problems to reduce the offline cost. It is especially useful for ST-GNAT, which has shown promising results, such as good accuracy with a considerable online speed-up for hyperbolic problems in a recent paper by Choi and Carlberg [SIAM J. Sci. Comput., 41 (2019), pp. A26–A58], because ST-GNAT involves an expensive offline cost related to collecting nonlinear term snapshots. Error analysis for the SNS method is presented. Numerical results show that the accuracy of the solution from the SNS method is comparable to the traditional methods and a considerable speed-up (i.e., a factor of two to a hundred) is achieved in the offline phase.
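For context, the snippet below shows the standard solution-snapshot basis construction (POD via the SVD) on stand-in data; the SNS idea is to reuse such solution-based subspaces for the nonlinear term instead of compressing a separate set of nonlinear term snapshots.

```python
# POD basis from solution snapshots; the reduced basis Phi spans the solution subspace.
import numpy as np

snapshots = np.random.rand(1000, 50)          # columns = solution snapshots (stand-in data)
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)

energy = np.cumsum(s ** 2) / np.sum(s ** 2)
r = int(np.searchsorted(energy, 0.9999)) + 1  # keep enough modes for 99.99% of the energy
Phi = U[:, :r]                                # reduced solution basis
print(f"reduced dimension r = {r}")
```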
Choi, Y., Boncoraglio, G., Anderson, S., Amsallem, D., and Farhat, C. (2020). “Gradient-Based Constrained Optimization Using a Database of Linear Reduced-Order Models.” JCP 2020. [abstract]
A methodology grounded in model reduction is presented for accelerating the gradient-based solution of a family of linear or nonlinear constrained optimization problems where the constraints include at least one linear Partial Differential Equation (PDE). A key component of this methodology is the construction, during an offline phase, of a database of pointwise, linear, Projection-based Reduced-Order Models (PROM)s associated with a design parameter space and the linear PDE(s). A parameter sampling procedure based on an appropriate saturation assumption is proposed to maximize the efficiency of such a database of PROMs. A real-time method is also presented for interpolating at any queried but unsampled parameter vector in the design parameter space the relevant sensitivities of a PROM. The practical feasibility, computational advantages, and performance of the proposed methodology are demonstrated for several realistic, nonlinear, aerodynamic shape optimization problems governed by linear aeroelastic constraints.
Zhang, J., Kailkhura, B., and Han, T. Y-J. (2020). “Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning.” ICML 2020. [abstract]
This paper studies the problem of post-hoc calibration of machine learning classifiers. We introduce the following desiderata for uncertainty calibration: (a) accuracy-preserving, (b) data-efficient, and (c) high expressive power. We show that none of the existing methods satisfy all three requirements, and demonstrate how Mix-n-Match calibration strategies (i.e., ensemble and composition) can help achieve remarkably better data efficiency and expressive power while provably preserving classification accuracy of the original classifier. We also show that existing calibration error estimators (e.g., histogram-based ECE) are unreliable, especially in the small-data regime. Therefore, we propose an alternative data-efficient kernel density-based estimator for a reliable evaluation of the calibration performance and prove its asymptotic unbiasedness and consistency.
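As one example of an accuracy-preserving parametric calibration map of the kind Mix-n-Match ensembles and composes, the sketch below fits a temperature-scaling parameter on held-out logits (stand-in data; not the paper's code).

```python
# Temperature scaling: a single parameter rescales logits, preserving the argmax (accuracy).
import torch
import torch.nn.functional as F

logits = torch.randn(1000, 10)                 # held-out logits from a trained classifier
labels = torch.randint(0, 10, (1000,))

log_T = torch.zeros(1, requires_grad=True)     # optimize log-temperature for stability
opt = torch.optim.LBFGS([log_T], lr=0.1, max_iter=50)

def closure():
    opt.zero_grad()
    loss = F.cross_entropy(logits / log_T.exp(), labels)   # NLL on the held-out set
    loss.backward()
    return loss

opt.step(closure)
probs = F.softmax(logits / log_T.exp().detach(), dim=1)    # calibrated probabilities
```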
Shanthamallu, U., Thiagarajan, J. J., and Spanias, A. (2020). “Regularized Attention Mechanism for Graph Attention Networks.” IEEE ICASSP 2020. [abstract]
Machine learning models that can exploit the inherent structure in data have gained prominence. In particular, there is a surge in deep learning solutions for graph-structured data, due to its widespread applicability in several fields. Graph attention networks (GAT), a recent addition to the broad class of feature learning models in graphs, utilizes the attention mechanism to efficiently learn continuous vector representations for semi-supervised learning problems. In this paper, we perform a detailed analysis of GAT models, and present interesting insights into their behavior. In particular, we show that the models are vulnerable to heterogeneous rogue nodes and hence propose novel regularization strategies to improve the robustness of GAT models. Using benchmark datasets, we demonstrate performance improvements on semi-supervised learning, using the proposed robust variant of GAT.
Thiagarajan, J. J., Venkatesh, B., and Rajan, D. (2020). “Learn-by-Calibrating: Using Calibration as a Training Objective.” IEEE ICASSP 2020. [abstract]
With rapid adoption of deep learning in critical applications, the question of when and how much to trust these models often arises, which drives the need to quantify the inherent uncertainties. While identifying all sources that account for the stochasticity of models is challenging, it is common to augment predictions with confidence intervals to convey the expected variations in a model's behavior. We require prediction intervals to be well-calibrated, reflect the true uncertainties, and to be sharp. However, existing techniques for obtaining prediction intervals are known to produce unsatisfactory results in at least one of these criteria. To address this challenge, we develop a novel approach for building calibrated estimators. More specifically, we use separate models for prediction and interval estimation, and pose a bi-level optimization problem that allows the former to leverage estimates from the latter through an uncertainty matching strategy. Using experiments in regression, time-series forecasting, and object localization, we show that our approach achieves significant improvements over existing uncertainty quantification methods, both in terms of model fidelity and calibration error.
Thiagarajan, J. J., Venkatesh, B., Sattigeri, P., Bremer, P-T. (2020). “Building Calibrated Deep Models via Uncertainty Matching with Auxiliary Interval Predictors.” 34th AAAI Conference on Artificial Intelligence 2020. [abstract]
With rapid adoption of deep learning in critical applications, the question of when and how much to trust these models often arises, which drives the need to quantify the inherent uncertainties. While identifying all sources that account for the stochasticity of models is challenging, it is common to augment predictions with confidence intervals to convey the expected variations in a model's behavior. We require prediction intervals to be well-calibrated, reflect the true uncertainties, and to be sharp. However, existing techniques for obtaining prediction intervals are known to produce unsatisfactory results in at least one of these criteria. To address this challenge, we develop a novel approach for building calibrated estimators. More specifically, we use separate models for prediction and interval estimation, and pose a bi-level optimization problem that allows the former to leverage estimates from the latter through an uncertainty matching strategy. Using experiments in regression, time-series forecasting, and object localization, we show that our approach achieves significant improvements over existing uncertainty quantification methods, both in terms of model fidelity and calibration error.
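A simplified, single-level sketch of the mean-predictor/interval-estimator pairing described above (not the authors' exact bi-level objective): the interval network predicts a half-width, and an uncertainty-matching term ties the mean model's residuals to that width. Data and weights are stand-ins.

```python
# Mean predictor and interval (half-width) predictor trained with an uncertainty-matching term.
import torch
import torch.nn as nn

x = torch.rand(512, 4)
y = x.sum(dim=1, keepdim=True) + 0.1 * torch.randn(512, 1)   # stand-in regression data

mean_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))
width_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1), nn.Softplus())
opt = torch.optim.Adam(list(mean_net.parameters()) + list(width_net.parameters()), lr=1e-3)

for _ in range(500):
    mu, w = mean_net(x), width_net(x)
    resid = (y - mu).abs()
    loss_cov = torch.relu(resid.detach() - w).mean() + 0.01 * w.mean()   # cover residuals, stay sharp
    loss_match = ((resid - w.detach()) ** 2).mean()                      # match residuals to intervals
    loss = (resid ** 2).mean() + loss_cov + loss_match
    opt.zero_grad(); loss.backward(); opt.step()
```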
Thiagarajan, J. J., Kashyap, S., and Karagyris, A. (2020). “Distill-to-Label: Weakly Supervised Instance Labeling Using Knowledge Distillation.” IEEE International Conference on Machine Learning and Applications 2019. [abstract]
Weakly supervised instance labeling using only image-level labels, in lieu of expensive fine-grained pixel annotations, is crucial in several applications including medical image analysis. In contrast to conventional instance segmentation scenarios in computer vision, the problems that we consider are characterized by a small number of training images and non-local patterns that lead to the diagnosis. In this paper, we explore the use of multiple instance learning (MIL) to design an instance label generator under this weakly supervised setting. Motivated by the observation that an MIL model can handle bags of varying sizes, we propose to repurpose an MIL model originally trained for bag-level classification to produce reliable predictions for single instances, i.e., bags of size 1. To this end, we introduce a novel regularization strategy based on virtual adversarial training for improving MIL training, and subsequently develop a knowledge distillation technique for repurposing the trained MIL model. Using empirical studies on colon cancer and breast cancer detection from histopathological images, we show that the proposed approach produces high-quality instance-level predictions and significantly outperforms state-of-the-art MIL methods.
Anirudh, R., Kim, H., Thiagarajan, J. J., Mohan, A. K., and Champley, K. (2020). “Improving Limited Angle CT Reconstruction with a Robust GAN Prior.” NeurIPS 2019: Solving Inverse Problems with Deep Learning Workshop. [abstract]
Limited angle CT reconstruction is an under-determined linear inverse problem that requires appropriate regularization techniques to be solved. In this work we study how pre-trained generative adversarial networks (GANs) can be used to clean noisy, highly artifact laden reconstructions from conventional techniques, by effectively projecting onto the inferred image manifold. In particular, we use a robust version of the popularly used GAN prior for inverse problems, based on a recent technique called corruption mimicking, that significantly improves the reconstruction quality. The proposed approach operates in the image space directly, as a result of which it does not need to be trained or require access to the measurement model, is scanner agnostic, and can work over a wide range of sensing scenarios.
Anirudh, R., Thiagarajan, J. J., Liu, S., Bremer, P-T., and Spears, B. (2020). “Exploring Generative Physics Models with Scientific Priors in Inertial Confinement Fusion.” NeurIPS 2019: Machine Learning and the Physical Sciences Workshop. [abstract]
There is significant interest in using modern neural networks for scientific applications due to their effectiveness in modeling highly complex, non-linear problems in a data-driven fashion. However, a common challenge is to verify the scientific plausibility or validity of outputs predicted by a neural network. This work advocates the use of known scientific constraints as a lens into evaluating, exploring, and understanding such predictions for the problem of inertial confinement fusion.
Narayanaswamy, V., Thiagarajan, J. J., Anirudh, R., Forouzanfar, F., Bremer, P-T., and Wu, X-H. (2020). “Designing Deep Inverse Models for History Matching in Reservoir Simulations.” NeurIPS 2019: Machine Learning and the Physical Sciences Workshop. [abstract]
In a wide range of applications in science and engineering, one often faces the need to learn complex mappings between independent parameters and dependent/measured quantities, i.e. the forward and inverse mappings. Building reliable inverse maps characterizing the conditional posteriors is challenging since in practice the mappings are seldom bijective. Moreover, it is challenging to incorporate scientific priors into the learning process. In this paper, we argue that enforcing self-consistency between forward and inverse models is an effective regularizer for learning predictive models in scientific applications. In particular, we develop two different strategies to enforce self-consistency, namely cyclical and coupled training methods. Using data from a reservoir model simulator, we apply the proposed approaches for history matching, which is the process of identifying the distribution of parameters that best explains the observed data. Our results show that self-consistency is highly beneficial, and both training strategies produce well calibrated inverse models.
Shanthamallu, U., Li, Q., Thiagarajan, J. J., Anirudh, R., Kaplan, A., and Bremer, P-T. (2020). “Modeling Human Brain Connectomes using Structured Neural Networks.” NeurIPS 2019: Graph Representation Learning Workshop. [abstract]
Generalizations of neural network architectures to arbitrarily structured data, e.g. graphs, have opened new opportunities for applying data-driven learning to novel scientific domains, in particular brain network analysis. While classical approaches have relied on hand-engineering statistical descriptors from structural or functional connectomes of human brains to build predictive models, there is growing interest in leveraging deep learning techniques. Though the human connectome is often viewed as a graph defined with each node corresponding to a brain region, and the edges representing neural connections, we argue that existing graph neural network solutions, that are built on the assumption of information diffusion, are not directly applicable. Consequently, we develop a structured network architecture that uses the connectome to constrain the message passing between two network layers representing edges and nodes, respectively. Using connectomes from the Human Connectome Project (HCP), we show that the proposed approach can effectively predict meta-information such as age and gender, and accurately recover the volumes of different brain regions, which are known to be encoded in the connectomes.
Song, H., and Thiagarajan, J. J. (2020). “Improved Deep Embeddings for Inferencing with Multi-Layered Graphs.” Deep Graph Learning: Methodologies and Applications, IEEE Big Data 2019. [abstract]
Inferencing with network data necessitates the mapping of its nodes into a vector space, where the relationships are preserved. However, with multi-layered networks, where multiple types of relationships exist for the same set of nodes, it is crucial to exploit the information shared between layers, in addition to the distinct aspects of each layer. In this paper, we propose a novel approach that first obtains node embeddings in all layers jointly via DeepWalk on a supra graph, which allows interactions between layers, and then fine-tunes the embeddings to encourage cohesive structure in the latent space. With empirical studies in node classification, link prediction and multi-layered community detection, we show that the proposed approach outperforms existing single- and multi-layered network embedding algorithms on several benchmarks. In addition to effectively scaling to a large number of layers (tested up to 37), our approach consistently produces highly modular community structure, even when compared to methods that directly optimize for the modularity function.
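The supra-graph construction mentioned above is a standard multiplex-network device; a minimal sketch (not the authors' exact code) stacks each layer's adjacency on the block diagonal and couples the copies of every node across layers before running DeepWalk on the result.

```python
# Build a supra-graph adjacency from per-layer adjacencies over the same node set.
import numpy as np

def supra_adjacency(layers, coupling=1.0):
    """layers: list of (n x n) adjacency matrices over the same n nodes."""
    L, n = len(layers), layers[0].shape[0]
    supra = np.zeros((L * n, L * n))
    for i, A in enumerate(layers):
        supra[i * n:(i + 1) * n, i * n:(i + 1) * n] = A        # intra-layer edges
    for i in range(L):
        for j in range(L):
            if i != j:                                          # inter-layer self-couplings
                supra[i * n:(i + 1) * n, j * n:(j + 1) * n] = coupling * np.eye(n)
    return supra

A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # toy 3-node layers
A2 = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]], dtype=float)
print(supra_adjacency([A1, A2]).shape)                           # (6, 6)
```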
Patki, T., Thiagarajan, J. J., Ayala, A., and Islam, T. (2020). “Performance Optimality or Reproducibility: That Is the Question.” Supercomputing 2019. [abstract]
The era of extremely heterogeneous supercomputing brings with it the devil of increased performance variation and reduced reproducibility. There is a lack of understanding in the HPC community on how the simultaneous consideration of network traffic, power limits, concurrency tuning, and interference from other jobs impacts application performance.
Shanthamallu, U., Thiagarajan, J. J., Song, H., and Spanias, A. (2020). “GrAMME: Semi-Supervised Learning using Multi-Layered Graph Attention Models.” IEEE Transactions on Neural Networks and Learning Systems. [abstract]
Modern data analysis pipelines are becoming increasingly complex due to the presence of multi-view information sources. While graphs are effective in modeling complex relationships, in many scenarios, a single graph is rarely sufficient to succinctly represent all interactions, and hence, multilayered graphs have become popular. Though this leads to richer representations, extending solutions from the single-graph case is not straightforward. Consequently, there is a strong need for novel solutions to solve classical problems, such as node classification, in the multilayered case. In this article, we consider the problem of semi-supervised learning with multilayered graphs. Though deep network embeddings, e.g., DeepWalk, are widely adopted for community discovery, we argue that feature learning with random node attributes, using graph neural networks, can be more effective. To this end, we propose to use attention models for effective feature learning and develop two novel architectures, GrAMME-SG and GrAMME-Fusion, that exploit the interlayer dependences for building multilayered graph embeddings. Using empirical studies on several benchmark data sets, we evaluate the proposed approaches and demonstrate significant performance improvements in comparison with the state-of-the-art network embedding strategies. The results also show that using simple random features is an effective choice, even in cases where explicit node attributes are not available.
Kamath, C., and Fan, Y. J. (2020). “Compressing Unstructured Mesh Data Using Spline Fits, Compressed Sensing, and Regression Methods.” Proceedings of the 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP 2018), pp. 316–320. [abstract]
Compressing unstructured mesh data from computer simulations poses several challenges that are not encountered in the compression of images or videos. Since the spatial locations of the points are not on a regular grid, as in an image, it is difficult to identify near neighbors of a point whose values can be exploited for compression. In this paper, we investigate how three very different methods—spline fits, compressed sensing, and kernel regression—compare in terms of the reconstruction accuracy and reduction in data size when applied to a practical problem from a plasma physics simulation.
Druzgalski, C., Lapointe, S., Whitesides, R., and McNenly, M. (2020). “Predicting Octane Number from Microscale Flame Dynamics.” Combustion and Flame. [abstract]
Microflow reactors have shown promise as a potential alternative to testing fuel samples in a Cooperative Fuel Research (CFR) engine. Collecting large quantities of experimental data from a microflow reactor for many different fuels would be prohibitively expensive. Therefore, we use numerical simulations to predict combustion ignition characteristics in a microflow reactor. A neural network is used to demonstrate that the simulated ignition data can provide valuable input for accurately predicting experimentally obtained octane numbers. This paper describes the methodology of creating and using simulated physics data in a neural network to predict experimentally obtained measurements. This work is of interest to researchers who work on machine learning with simulated and experimental data, biofuel development, computational combustion, and chemical mechanisms.
Alom, Z., Taha, T., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, S., Hasan, M., Van Essen, B., Awwal, A., and Asari, V. (2019). “A State-of-the-Art Survey on Deep Learning Theory and Architectures.” Electronics. [abstract]
In recent years, deep learning has garnered tremendous success in a variety of application domains. This new field of machine learning has been growing rapidly and has been applied to most traditional application domains, as well as some new areas that present more opportunities. Different methods have been proposed based on different categories of learning, including supervised, semi-supervised, and unsupervised learning. Experimental results show state-of-the-art performance using deep learning when compared to traditional machine learning approaches in the fields of image processing, computer vision, speech recognition, machine translation, art, medical imaging, medical information processing, robotics and control, bioinformatics, natural language processing, cybersecurity, and many others. This paper presents a brief survey of the advances that have occurred in the area of Deep Learning (DL), starting with the Deep Neural Network (DNN). The survey goes on to cover Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), Auto-Encoder (AE), Deep Belief Network (DBN), Generative Adversarial Network (GAN), and Deep Reinforcement Learning (DRL). Additionally, we have discussed recent developments, such as advanced variant DL techniques based on these DL approaches. This work considers most of the papers published after 2012, when the history of deep learning began. Furthermore, DL approaches that have been explored and evaluated in different application domains are also included in this survey. We also included recently developed frameworks, SDKs, and benchmark datasets that are used for implementing and evaluating deep learning approaches. There are some surveys that have been published on DL using neural networks and a survey on Reinforcement Learning (RL). However, those papers have not discussed individual advanced techniques for training large-scale deep learning models and the recently developed method of generative models.
Anirudh, R. and Thiagarajan, J.J. (2019). “Bootstrapping Graph Convolutional Neural Networks for Autism Spectrum Disorder Classification.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. [abstract]
Using predictive models to identify patterns that can act as biomarkers for different neuropathological conditions is becoming highly prevalent. In this paper, we consider the problem of Autism Spectrum Disorder (ASD) classification where previous work has shown that it can be beneficial to incorporate a wide variety of meta features, such as socio-cultural traits, into predictive modeling. A graph-based approach naturally suits these scenarios, where a contextual graph captures traits that characterize a population, while the specific brain activity patterns are utilized as a multivariate signal at the nodes. Graph neural networks have shown improvements in inferencing with graph-structured data. Though the underlying graph strongly dictates the overall performance, there exists no systematic way of choosing an appropriate graph in practice, thus making predictive models non-robust. To address this, we propose a bootstrapped version of graph convolutional neural networks (G-CNNs) that utilizes an ensemble of weakly trained G-CNNs to reduce the sensitivity of models to the choice of graph construction. We demonstrate its effectiveness on the challenging Autism Brain Imaging Data Exchange (ABIDE) dataset and show that our approach improves upon recently proposed graph-based neural networks. We also show that our method remains more robust to noisy graphs.
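A schematic sketch of the bootstrapping idea (not the paper's model): several weak GCNs are each trained on a differently constructed population graph, and their predictions are averaged so that no single graph choice dominates. Graphs, features, and labels below are random stand-ins.

```python
# Ensemble of weakly trained two-layer GCNs over different random graph constructions.
import torch
import torch.nn as nn
import torch.nn.functional as F

n, d, classes, n_graphs = 100, 16, 2, 5
X = torch.randn(n, d)
y = torch.randint(0, classes, (n,))

def norm_adj(A):
    A = A + torch.eye(n)                                   # add self-loops
    dinv = A.sum(1).pow(-0.5)
    return dinv.unsqueeze(1) * A * dinv.unsqueeze(0)       # symmetric normalization

probs = torch.zeros(n, classes)
for _ in range(n_graphs):                                   # one weakly trained GCN per graph
    A_hat = norm_adj((torch.rand(n, n) < 0.05).float())
    W1, W2 = nn.Linear(d, 32), nn.Linear(32, classes)
    opt = torch.optim.Adam(list(W1.parameters()) + list(W2.parameters()), lr=1e-2)
    for _ in range(30):                                     # deliberately few epochs ("weak")
        out = A_hat @ W2(F.relu(A_hat @ W1(X)))
        loss = F.cross_entropy(out, y)
        opt.zero_grad(); loss.backward(); opt.step()
    probs += F.softmax(out, dim=1).detach() / n_graphs      # ensemble-average the predictions

pred = probs.argmax(dim=1)
```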
Chu, A., Nguyen, D., Talathi, S.S., (…), Stolaroff, J.K., and Giera, B. (2019). “Automated Detection and Sorting of Microencapsulation via Machine Learning.” Lab on a Chip. [abstract]
Microfluidic-based microencapsulation requires significant oversight to prevent material and quality loss due to sporadic disruptions in fluid flow that routinely arise. State-of-the-art microcapsule production is laborious and relies on experts to monitor the process, e.g. through a microscope. Unnoticed defects diminish the quality of collected material and/or may cause irreversible clogging. To address these issues, we developed an automated monitoring and sorting system that operates on consumer-grade hardware in real-time. Using human-labeled microscope images acquired during typical operation, we train a convolutional neural network that assesses microencapsulation. Based on output from the machine learning algorithm, an integrated valving system collects desirable microcapsules or diverts waste material accordingly. Although the system notifies operators to make necessary adjustments to restore microencapsulation, we can extend the system to automate corrections. Since microfluidic-based production platforms customarily collect image and sensor data, machine learning can help to scale up and improve microfluidic techniques beyond microencapsulation.
Cong, G., Domeniconi, G., Shapiro, J., Zhou, F., and Chen, B. (2019). “Accelerating Deep Neural Network Training for Action Recognition on a Cluster of GPUs.” Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing. [abstract]
Due to the additional temporal dimension, large-scale video action recognition is even more challenging than image recognition and typically takes days to train on modern GPUs even for modest-sized datasets. We propose algorithms and techniques to accelerate training of deep neural networks for action recognition on a cluster of GPUs. In terms of convergence and scaling, our distributed training algorithm with adaptive batch size is provably superior to popular asynchronous stochastic gradient descent algorithms. The convergence analysis of our algorithm shows it is possible to reduce communication cost and at the same time minimize the number of iterations needed for convergence. We customize the Adam optimizer for our distributed algorithm to improve efficiency. In addition, we employ transfer-learning to further reduce training time while improving validation accuracy. Compared with the baseline single-GPU stochastic gradient descent implementation of the two-stream training approach, our implementation achieves super-linear speedups on 16 GPUs while improving validation accuracy. For the UCF101 and HMDB51 datasets, the validation accuracies achieved are 93.1% and 67.9% respectively. As far as we know, these are the highest accuracies achieved with the two-stream approach that does not involve computationally expensive 3D convolutions or pretraining on much larger datasets.
Cong, G., Domeniconi, G., Yang, C.-C., Shapiro, J., and Chen, B. (2019). “Video Action Recognition with an Additional End-To-End Trained Temporal Stream.” 2019 IEEE Winter Conference on Applications of Computer Vision. [abstract]
Detecting actions in videos requires understanding the temporal relationships among frames. Typical action recognition approaches rely on optical flow estimation methods to convey temporal information to a CNN. Recent studies employ 3D convolutions in addition to optical flow to process the temporal information. While these models achieve slightly better results than two-stream 2D convolutional approaches, they are significantly more complex, requiring more data and time to be trained. We propose an efficient, adaptive batch size distributed training algorithm with customized optimizations for training the two 2D streams. We introduce a new 2D convolutional temporal stream that is trained end-to-end with a neural network. The flexibility to freeze some network layers from training in this temporal stream brings the possibility of ensemble learning with more than one temporal stream. Our architecture, which combines three streams, achieves the highest accuracies we are aware of on UCF101 and HMDB51 among systems that do not pretrain on much larger datasets (e.g., Kinetics). We achieve these results while keeping our spatial and temporal streams 4.67x faster to train than the 3D convolution approaches.
Deelman, E., Mandal, A., Jiang, M., and Sakellariou, R. (2019). “The Role of Machine Learning in Scientific Workflows.” International Journal of High Performance Computing Applications. [abstract]
Machine learning (ML) is being applied in a number of everyday contexts from image recognition, to natural language processing, to autonomous vehicles, to product recommendation. In the science realm, ML is being used for medical diagnosis, new materials development, smart agriculture, DNA classification, and many others. In this article, we describe the opportunities of using ML in the area of scientific workflow management. Scientific workflows are key to today’s computational science, enabling the definition and execution of complex applications in heterogeneous and often distributed environments. We describe the challenges of composing and executing scientific workflows and identify opportunities for applying ML techniques to meet these challenges by enhancing the current workflow management system capabilities. We foresee that as the ML field progresses, the automation provided by workflow management systems will greatly increase and result in significant improvements in scientific productivity.
Dryden, N., Maruyama, N., Benson, T., Moon, T., Snir, M., and Van Essen, B. (2019). “Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism.” International Parallel and Distributed Processing Symposium. [abstract]
Scaling CNN training is necessary to keep up with growing datasets and reduce training time. We also see an emerging need to handle datasets with very large samples, where memory requirements for training are large. Existing training frameworks use a data-parallel approach that partitions samples within a mini-batch, but limits on scaling the mini-batch size and memory consumption make this untenable for large samples. We describe and implement new approaches to convolution, which parallelize using spatial decomposition or a combination of sample and spatial decomposition. This introduces many performance knobs for a network, so we develop a performance model for CNNs and present a method for using it to automatically determine efficient parallelization strategies. We evaluate our algorithms with microbenchmarks and image classification with ResNet-50. Our algorithms allow us to prototype a model for a mesh-tangling dataset, where sample sizes are very large. We show that our parallelization achieves excellent strong and weak scaling and enables training for previously unreachable datasets.
Dryden, N., Maruyama, N., Moon, T., (…), Snir, M., and Van Essen, B. (2019). “Aluminum: An Asynchronous, GPU-Aware Communication Library Optimized for Large-Scale Training of Deep Neural Networks on HPC Systems.” Proceedings of Machine Learning in HPC Environments and the International Conference for High Performance Computing, Networking, Storage and Analysis. [abstract]
We identify communication as a major bottleneck for training deep neural networks on large-scale GPU clusters, taking over 10x as long as computation. To reduce this overhead, we discuss techniques to overlap communication and computation as much as possible. This leads to much of the communication being latency-bound instead of bandwidth-bound, and we find that using a combination of latency- and bandwidth-optimized allreduce algorithms significantly reduces communication costs. We also discuss a semantic mismatch between MPI and CUDA that increases overheads and limits asynchrony, and propose a solution that enables communication to be aware of CUDA streams. We implement these optimizations in the open-source Aluminum communication library, enabling optimized, asynchronous, GPU-aware communication. Aluminum demonstrates improved performance in benchmarks and end-to-end training of deep networks, for both strong and weak scaling.
Endrei, M., Jin, C., Dinh, M.N., (…), DeRose, L., and de Supinski, B.R. (2019). “Statistical and Machine Learning Models for Optimizing Energy in Parallel Applications.” International Journal of High Performance Computing Applications. [abstract]
Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), Large-scale Atomic Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
Fan, Y.J. (2019). “Autoencoder Node Saliency: Selecting Relevant Latent Representations.” Pattern Recognition. [abstract]
The autoencoder is an artificial neural network that performs nonlinear dimension reduction and learns hidden representations of unlabeled data. With a linear transfer function it is similar to the principal component analysis (PCA). While both methods use weight vectors for linear transformations, the autoencoder does not come with any indication similar to the eigenvalues in PCA that are paired with eigenvectors. We propose a novel autoencoder node saliency method that examines whether the features constructed by autoencoders exhibit properties related to known class labels. The supervised node saliency ranks the nodes based on their capability of performing a learning task. It is coupled with the normalized entropy difference (NED). We establish a property for NED values to verify classifying behaviors among the top ranked nodes. By applying our methods to real datasets, we demonstrate their ability to provide indications on the performing nodes and explain the learned tasks in autoencoders.
Humbird, K.D., Peterson, J.L., and Mcclarren, R.G. (2019). “Deep Neural Network Initialization with Decision Trees.” IEEE Transactions on Neural Networks and Learning Systems. [abstract]
In this paper, a novel, automated process for constructing and initializing deep feedforward neural networks based on decision trees is presented. The proposed algorithm maps a collection of decision trees trained on the data into a collection of initialized neural networks with the structures of the networks determined by the structures of the trees. The tree-informed initialization acts as a warm-start to the neural network training process, resulting in efficiently trained, accurate networks. These models, referred to as 'deep jointly informed neural networks' (DJINN), demonstrate high predictive performance for a variety of regression and classification data sets and display comparable performance to Bayesian hyperparameter optimization at a lower computational cost. By combining the user-friendly features of decision tree models with the flexibility and scalability of deep neural networks, DJINN is an attractive algorithm for training predictive models on a wide range of complex data sets.
Kafle, S., Gupta, V., Kailkhura, B., Wimalajeewa, T., and Varshney, P.K. (2019). “Joint Sparsity Pattern Recovery with 1-b Compressive Sensing in Distributed Sensor Networks.” IEEE Transactions on Signal and Information Processing over Networks. [abstract]
In this paper, we study the problem of joint sparse support recovery with 1-bit quantized compressive measurements in a distributed sensor network. Multiple nodes in the network are assumed to observe sparse signals having the same but unknown sparse support. Each node quantizes its measurement vector element-wise to 1 bit. First, we consider that all the quantized measurements are available at a central fusion center. We derive performance bounds for sparsity pattern recovery using 1-bit quantized measurements from multiple sensors when the maximum likelihood decoder is employed. We further develop two computationally tractable algorithms for joint sparse support recovery in the centralized setting. One algorithm minimizes a cost function defined as the sum of the likelihood function and the ℓ1,∞ quasi-norm, while the other algorithm extends the binary iterative hard thresholding algorithm to the multiple measurement vector case. Second, we consider a decentralized setting where each node transmits 1-bit measurements to its one-hop neighbors. The basic idea behind the algorithms developed in the decentralized setting is to embed collaboration among nodes and fusion strategies. We show that even with noisy 1-bit compressed measurements, joint support recovery can be carried out accurately in both centralized and decentralized settings. We further show that the performance of the proposed 1-bit compressive sensing-based algorithms is very close to that of their real-valued counterparts except when the signal-to-noise ratio is very small.
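For readers unfamiliar with the binary iterative hard thresholding (BIHT) idea that this paper extends, the following numpy sketch shows a single-sensor version under generic assumptions (random Gaussian sensing matrix, unit-norm sparse signal); it is an illustrative stand-in, not the authors' multi-sensor or decentralized algorithms.

```python
import numpy as np

def biht(y, A, sparsity, step=1.0, iters=100):
    """Recover a unit-norm sparse signal from 1-bit measurements y = sign(A x)."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        # Gradient step on the sign-consistency objective.
        a = x + (step / m) * (A.T @ (y - np.sign(A @ x)))
        # Hard threshold: keep only the `sparsity` largest-magnitude entries.
        a[np.argsort(np.abs(a))[:-sparsity]] = 0.0
        norm = np.linalg.norm(a)
        x = a / norm if norm > 0 else a
    return x

# Toy usage with a hypothetical 5-sparse signal and 300 random 1-bit measurements.
rng = np.random.default_rng(0)
n, m, s = 100, 300, 5
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
x_true /= np.linalg.norm(x_true)
A = rng.standard_normal((m, n))
x_hat = biht(np.sign(A @ x_true), A, sparsity=s)
```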
Kailkhura, B., Gallagher, B., Kim, S., Hiszpanski, A., and Han, T. Y-J. (2019). “Reliable and Explainable Machine-Learning Methods for Accelerated Material Discovery.” npj Computational Materials. [abstract]
Despite ML’s impressive performance in commercial applications, several unique challenges exist when applying ML in materials science applications. In such a context, the contributions of this work are twofold. First, we identify common pitfalls of existing ML techniques when learning from underrepresented/imbalanced material data. Specifically, we show that with imbalanced data, standard methods for assessing quality of ML models break down and lead to misleading conclusions. Furthermore, we find that the model’s own confidence score cannot be trusted and model introspection methods (using simpler models) do not help as they result in loss of predictive performance (reliability-explainability trade-off). Second, to overcome these challenges, we propose a general-purpose explainable and reliable machine-learning framework. Specifically, we propose a generic pipeline that employs an ensemble of simpler models to reliably predict material properties. We also propose a transfer learning technique and show that the performance loss due to models’ simplicity can be overcome by exploiting correlations among different material properties. A new evaluation metric and a trust score to better quantify the confidence in the predictions are also proposed. To improve the interpretability, we add a rationale generator component to our framework which provides both model-level and decision-level explanations. Finally, we demonstrate the versatility of our technique on two applications: (1) predicting properties of crystalline compounds and (2) identifying potentially stable solar cell materials. We also point to some outstanding issues yet to be resolved for a successful application of ML in material science.
Kim, S., Kim, H., Yoon, S., Lee, J., Kahou, S., Kashinath, K., and Prabhat, M. (2019). “Deep-Hurricane-Tracker: Tracking and Forecasting Extreme Climate Events using ConvLSTM.” 2019 IEEE Winter Conference on Applications of Computer Vision. [abstract]
Tracking and predicting extreme events in large-scale spatio-temporal climate data are long standing challenges in climate science. In this paper, we propose Convolutional LSTM (ConvLSTM)-based spatio-temporal models to track and predict hurricane trajectories from large-scale climate data; namely, pixel-level spatio-temporal history of tropical cyclones. To address the tracking problem, we model time sequential density maps of hurricane trajectories, enabling us to capture not only the temporal dynamics but also the spatial distribution of the trajectories. Furthermore, we introduce a new trajectory prediction approach as a problem of sequential forecasting from past to future hurricane density map sequences. Extensive experiments on an actual 20-year record show that our ConvLSTM-based tracking model significantly outperforms existing approaches, and that the proposed forecasting model achieves a successful mapping from the predicted density maps to the ground truth.
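The tracking model builds on convolutional LSTMs over spatio-temporal grids. The tf.keras sketch below shows a generic ConvLSTM sequence-to-frame model with assumed grid and sequence sizes; it is not the authors' exact architecture or data.

```python
import tensorflow as tf

T, H, W = 8, 64, 64  # assumed sequence length and grid size, for illustration only
model = tf.keras.Sequential([
    tf.keras.Input(shape=(T, H, W, 1)),
    tf.keras.layers.ConvLSTM2D(32, kernel_size=3, padding="same",
                               return_sequences=True, activation="tanh"),
    tf.keras.layers.ConvLSTM2D(32, kernel_size=3, padding="same",
                               return_sequences=False, activation="tanh"),
    # Map the summarized spatio-temporal state to the next density map.
    tf.keras.layers.Conv2D(1, kernel_size=1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(past_sequences, next_density_maps, ...) with arrays shaped (N, T, H, W, 1).
```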
Leach, W., Henrikson, J., Hatarik, R., (…), Palmer, N., and Rever, M. (2019). “Using Convolutional Neural Networks to Classify Static X-ray Imager Diagnostic Data at the National Ignition Facility.” Proceedings of the International Society for Optical Engineering. [abstract]
Hohlraums convert the laser energy at the National Ignition Facility (NIF) into X-ray energy to compress and implode a fusion capsule, creating fusion. The Static X-ray Imager (SXI) diagnostic collects time-integrated images of hohlraum wall X-ray illumination patterns viewed through the laser entrance hole (LEH). NIF image processing algorithms calculate the size and location of the LEH opening from the SXI images. Images obtained come from different experimental categories and camera setups and occasionally do not contain applicable or usable information. Unexpected experimental noise in the data can also occur, where affected images should be removed and not run through the processing algorithms. Current approaches to identifying these types of images are manual and case-by-case, which can be prohibitively time-consuming. In addition, the diagnostic image data can be sparse (missing segments or pieces) and may lead to false analysis results. There exists, however, an abundant variety of image examples in the NIF database. Convolutional Neural Networks (CNNs) have been shown to work well with this type of data and under these conditions. The objective of this work was to apply transfer learning and fine-tune a pre-trained CNN using a relatively small-scale dataset (∼1500 images) and determine which instances contained useful image data. Experimental results are presented that show that CNNs can readily identify useful image data while filtering out undesirable images. The CNN filter is currently being used in production at the NIF.
Maiti, A. (2019). “Second-Order Statistical Bootstrap for the Uncertainty Quantification of Time-temperature-superposition Analysis.” Rheologica Acta. [abstract]
Time-temperature superposition (TTS), which for decades has been a powerful method for long-term prediction from accelerated aging data, involves rigid-shifting isotherms in logarithmic time to produce a single master prediction curve. For simple thermo-rheological properties that accurately follow the TTS principle, the shifts can be easily determined, even manually by eye. However, for many properties of interest, where the principle is obeyed only approximately, or the data is noisy, it is imperative to develop objective shifting techniques along with reliable uncertainty bounds. This work analyzes in detail the method of arclength minimization as an unsupervised algorithm for determining optimum shifts and demonstrates that the method is nearly unbiased for all practical datasets with a variety of noise distributions. Moreover, if averaged over with-replacement (bootstrap) resamples, the predicted shifts follow a normal distribution, a fact that can be used to construct confidence intervals for the master curve through a second-order bootstrap procedure.
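A simplified sketch of the arclength-minimization and with-replacement (bootstrap) resampling idea is given below; the objective and data handling are deliberately generic and do not reproduce the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import minimize

def arc_length(shifts, isotherms):
    """Arc length of the merged master curve after shifting isotherms in log-time."""
    shifted = []
    for s, (logt, y) in zip(np.concatenate(([0.0], shifts)), isotherms):
        shifted.append(np.column_stack((logt + s, y)))
    pts = np.vstack(shifted)
    pts = pts[np.argsort(pts[:, 0])]
    return np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))

def fit_shifts(isotherms):
    # The first isotherm is the unshifted reference.
    x0 = np.zeros(len(isotherms) - 1)
    return minimize(arc_length, x0, args=(isotherms,), method="Nelder-Mead").x

def bootstrap_shifts(isotherms, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    draws = []
    for _ in range(n_boot):
        resampled = []
        for logt, y in isotherms:
            idx = np.sort(rng.integers(0, len(logt), len(logt)))  # with replacement
            resampled.append((logt[idx], y[idx]))
        draws.append(fit_shifts(resampled))
    draws = np.array(draws)
    # If the shift estimates are approximately normal, mean +/- z * std gives a CI.
    return draws.mean(axis=0), draws.std(axis=0)
```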
Narayanaswamy, V.S., Thiagarajan, J.J., Song, H., and Spanias, A. (2019). “Designing an Effective Metric Learning Pipeline for Speaker Diarization.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. [abstract]
State-of-the-art speaker diarization systems utilize knowledge from external data, in the form of a pre-trained distance metric, to effectively determine relative speaker identities to unseen data. However, much of recent focus has been on choosing the appropriate feature extractor, ranging from pre-trained i-vectors to representations learned via different sequence modeling architectures (e.g. 1D-CNNs, LSTMs, attention models), while adopting off-the-shelf metric learning solutions. In this paper, we argue that, regardless of the feature extractor, it is crucial to carefully design a metric learning pipeline, namely the loss function, the sampling strategy and the discriminative margin parameter, for building robust diarization systems. Furthermore, we propose to adopt a fine-grained validation process to obtain a comprehensive evaluation of the generalization power of metric learning pipelines. To this end, we measure diarization performance across different language speakers, and variations in the number of speakers in a recording. Using empirical studies, we provide interesting insights into the effectiveness of different design choices and make recommendations.
Nathan, E., Sanders, G., and Henson, V.E. (2019). “Personalized Ranking in Dynamic Graphs Using Nonbacktracking Walks.” Lecture Notes in Computer Science, including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. [abstract]
Centrality has long been studied as a method of identifying node importance in networks. In this paper we study a variant of several walk-based centrality metrics based on the notion of a nonbacktracking walk, where the pattern i → j → i is forbidden in the walk. Specifically, we focus our analysis on dynamic graphs, where the underlying data stream the network is drawn from is constantly changing. Efficient algorithms for calculating nonbacktracking walk centrality scores in static and dynamic graphs are provided and experiments on graphs with several million vertices and edges are conducted. For the static algorithm, comparisons to a traditional linear algebraic method of calculating scores show that our algorithm produces scores of high accuracy within a theoretically guaranteed bound. Comparisons of our dynamic algorithm to the static one show speedups of several orders of magnitude as well as a significant reduction in the space required.
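To make the non-backtracking notion concrete, the sketch below builds the directed-edge (Hashimoto) matrix for a small graph and computes a Katz-style score over non-backtracking walks; it is a dense, pedagogical illustration and not the paper's scalable static or dynamic algorithms.

```python
import numpy as np
import networkx as nx

def nonbacktracking_centrality(G, alpha=0.05):
    # Directed-edge (Hashimoto) matrix: B[(u,v),(v,w)] = 1 whenever w != u.
    edges = [(u, v) for u, v in G.edges()] + [(v, u) for u, v in G.edges()]
    index = {e: i for i, e in enumerate(edges)}
    B = np.zeros((len(edges), len(edges)))
    for (u, v) in edges:
        for w in G.neighbors(v):
            if w != u:  # forbid the backtracking pattern u -> v -> u
                B[index[(u, v)], index[(v, w)]] = 1.0
    # Katz-style resolvent: weighted count of non-backtracking walks per starting edge.
    z = np.linalg.solve(np.eye(len(edges)) - alpha * B, np.ones(len(edges)))
    scores = {n: 0.0 for n in G.nodes()}
    for (u, v), i in index.items():
        scores[u] += z[i]  # credit walks that start by leaving node u along (u, v)
    return scores

scores = nonbacktracking_centrality(nx.karate_club_graph())
```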
Petersen, B.K., Yang, J., Grathwohl, W.S., (…), An, G., and Faissol, D.M. (2019). “Deep Reinforcement Learning and Simulation as a Path toward Precision Medicine.” Journal of Computational Biology. [abstract]
Traditionally, precision medicine involves classifying patients to identify subpopulations that respond favorably to specific therapeutics. We pose precision medicine as a dynamic feedback control problem, where treatment administered to a patient is guided by measurements taken during the course of treatment. We consider sepsis, a life-threatening condition in which dysregulation of the immune system causes tissue damage. We leverage an existing simulation of the innate immune response to infection and apply deep reinforcement learning (DRL) to discover an adaptive personalized treatment policy that specifies effective multicytokine therapy to simulated sepsis patients based on systemic measurements. The learned policy achieves a dramatic reduction in mortality rate over a set of 500 simulated patients relative to standalone antibiotic therapy. Advantages of our approach are threefold: (1) the use of simulation allows exploring therapeutic strategies beyond clinical practice and available data, (2) advances in DRL accommodate learning complex therapeutic strategies for complex biological systems, and (3) optimized treatments respond to a patient's individual disease progression over time, therefore, capturing both differences across patients and the inherent randomness of disease progression within a single patient. We hope that this work motivates both considering adaptive personalized multicytokine mediation therapy for sepsis and exploiting simulation with DRL for precision medicine more broadly.
Reza, T., Ripeanu, M., Tripoul, N., Sanders, G., and Pearce, R. (2019). “PruneJuice: Pruning Trillion-edge Graphs to a Precise Pattern-matching Solution.” Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. [abstract]
Pattern matching is a powerful graph analysis tool. Unfortunately, existing solutions have limited scalability, support only a limited set of search patterns, and/or focus on only a subset of the real-world problems associated with pattern matching. This paper presents a new algorithmic pipeline that: (i) enables highly scalable pattern matching on labeled graphs, (ii) supports arbitrary patterns, (iii) enables trade-offs between precision and time-to-solution (while always selecting all vertices and edges that participate in matches, thus offering 100% recall), and (iv) supports a set of popular data analytics scenarios. We implement our approach on top of HavoqGT and demonstrate its advantages through strong and weak scaling experiments on massive-scale real-world (up to 257 billion edges) and synthetic (up to 4.4 trillion edges) graphs, respectively, and at scales (1,024 nodes / 36,864 cores) orders of magnitude larger than used in the past for similar problems.
Roberts, R. S., Goforth, J.W., Weinert, G.F., (…), Stinson, B.J., and Duncan, A.M. (2019). “Automated Annotation of Satellite Imagery using Model-based Projections.” Proceedings of the Applied Imagery Pattern Recognition Workshop. [abstract]
GeoVisipedia is a novel approach to annotating satellite imagery. It uses wiki pages to annotate objects rather than simple labels. The use of wiki pages to contain annotations is particularly useful for annotating objects in imagery of complex geospatial configurations such as industrial facilities. GeoVisipedia uses the PRISM algorithm to project annotations applied to one image onto other imagery, hence enabling ubiquitous annotation. This paper derives the PRISM algorithm, which uses image metadata and a 3D facility model to create a view matrix unique to each image. The view matrix is used to project model components onto a mask which aligns the components with the objects in the scene that they represent. Wiki pages are linked to model components, which are in turn linked to the image via the component mask. An illustration of the efficacy of the PRISM algorithm is provided, demonstrating the projection of model components onto an effluent stack. We conclude with a discussion of the efficiencies of GeoVisipedia over manual annotation, and the use of PRISM for creating training sets for machine learning algorithms.
Shukla, R., Lipasti, M., Van Essen, B., Moody, A., and Maruyama, N. (2019). “Remodel: Rethinking Deep CNN Models to Detect and Count on a Neurosynaptic System.” Frontiers in Neuroscience. [abstract]
In this work, we perform analysis of detection and counting of cars using a low-power IBM TrueNorth Neurosynaptic System. For our evaluation we looked at a publicly-available dataset that has overhead imagery of cars with context present in the image. The trained neural network for image analysis was deployed on the NS16e system using IBM's EEDN training framework. Through multiple experiments we identify the architectural bottlenecks present in the TrueNorth system that do not let us deploy large neural network structures. Following these experiments we propose changes to the CNN model to circumvent these architectural bottlenecks. The results of these evaluations have been compared with Caffe-based implementations of standard neural networks that were deployed on a Titan-X GPU. Results showed that TrueNorth can detect cars from the dataset with 97.60% accuracy and can be used to count the number of cars in the image with 69.04% accuracy. The car detection accuracy and car count (±2 error margin) accuracy are comparable to high-precision neural networks like AlexNet, GoogLeNet, and ResCeption, but show a manifold improvement in power consumption.
Thiagarajan, J.J., Anirudh, R., Sridhar, R., and Bremer, P.-T. (2019). “Unsupervised Dimension Selection Using a Blue Noise Graph Spectrum.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. [abstract]
Unsupervised dimension selection is an important problem that seeks to reduce dimensionality of data, while preserving the most useful characteristics. While dimensionality reduction is commonly utilized to construct low-dimensional embeddings, they produce feature spaces that are hard to interpret. Further, in applications such as sensor design, one needs to perform reduction directly in the input domain, instead of constructing transformed spaces. Consequently, dimension selection (DS) aims to solve the combinatorial problem of identifying the top-k dimensions, which is required for effective experiment design, reducing data while keeping it interpretable, and designing better sensing mechanisms. In this paper, we develop a novel approach for DS based on graph signal analysis to measure feature influence. By analyzing synthetic graph signals with a blue noise spectrum, we show that we can measure the importance of each dimension. Using experiments in supervised learning and image masking, we demonstrate the superiority of the proposed approach over existing techniques in capturing crucial characteristics of high dimensional spaces, using only a small subset of the original features.
Thiagarajan, J.J., Kim, I., Anirudh, R., and Bremer, P.-T. (2019). “Understanding Deep Neural Networks through Input Uncertainties.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. [abstract]
Techniques for understanding the functioning of complex machine learning models are becoming increasingly popular, not only to improve the validation process, but also to extract new insights about the data via exploratory analysis. Though a large class of such tools currently exists, most assume that predictions are point estimates and use a sensitivity analysis of these estimates to interpret the model. Using lightweight probabilistic networks we show how including prediction uncertainties in the sensitivity analysis leads to: (i) more robust and generalizable models; and (ii) a new approach for model interpretation through uncertainty decomposition. In particular, we introduce a new regularization that takes both the mean and variance of a prediction into account and demonstrate that the resulting networks provide improved generalization to unseen data. Furthermore, we propose a new technique to explain prediction uncertainties through uncertainties in the input domain, thus providing new ways to validate and interpret deep learning models.
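As background for the lightweight probabilistic networks referenced above, the following PyTorch sketch trains a network that outputs a per-sample mean and variance using a Gaussian negative log-likelihood; the paper's specific mean-variance regularizer and uncertainty decomposition are not reproduced, and the network and data here are placeholders.

```python
import torch
import torch.nn as nn

class MeanVarianceNet(nn.Module):
    """Predicts a mean and a positive variance for each input."""
    def __init__(self, d_in, d_hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.mean = nn.Linear(d_hidden, 1)
        self.logvar = nn.Linear(d_hidden, 1)

    def forward(self, x):
        h = self.body(x)
        return self.mean(h), torch.exp(self.logvar(h))

net = MeanVarianceNet(d_in=8)
nll = nn.GaussianNLLLoss()
x, y = torch.randn(64, 8), torch.randn(64, 1)   # placeholder data
mu, var = net(x)
loss = nll(mu, y, var)   # penalizes error and over/under-confident variance estimates
loss.backward()
```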
Thiagarajan, J., Rajan, D., and Sattigeri, P. (2019). “Understanding Behavior of Clinical Models under Domain Shifts.” 2019 KDD Workshop on Applied Data Science for Healthcare. [abstract]
The hypothesis that computational models can be reliable enough to be adopted in prognosis and patient care is revolutionizing healthcare. Deep learning, in particular, has been a game changer in building predictive models, thus leading to community-wide data curation efforts. However, due to inherent variabilities in population characteristics and biological systems, these models are often biased to the training datasets. This can be limiting when models are deployed in new environments, where there are systematic domain shifts not known a priori. In this paper, we propose to emulate a large class of domain shifts that can occur in clinical settings with a given dataset, and argue that evaluating the behavior of predictive models in light of those shifts is an effective way to quantify their reliability. More specifically, we develop an approach for building realistic scenarios, based on an analysis of “disease landscapes” in multi-label classification. Using the openly available MIMIC-III EHR dataset for phenotyping, for the first time, our work sheds light on data regimes where deep clinical models can fail to generalize. This work emphasizes the need for novel validation mechanisms driven by real-world domain shifts in AI for healthcare.
Thopalli, K., Anirudh, R., Thiagarajan, J.J., and Turaga, P. (2019). “Multiple Subspace Alignment Improves Domain Adaptation.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. [abstract]
We present a novel unsupervised domain adaptation (DA) method for cross-domain visual recognition. Though subspace methods have found success in DA, their performance is often limited due to the assumption of approximating an entire dataset using a single low-dimensional subspace. Instead, we develop a method to effectively represent the source and target datasets via a collection of low-dimensional subspaces, and subsequently align them by exploiting the natural geometry of the space of subspaces, on the Grassmann manifold. We demonstrate the effectiveness of this approach, using empirical studies on two widely used benchmarks, with performance on par or better than the performance of the state-of-the-art domain adaptation methods.
Tran, K., Panahi, A., Adiga, A., Sakla, W., and Krim, H. (2019). “Nonlinear Multi-Scale Super-resolution Using Deep Learning.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. [abstract]
We propose a deep learning architecture capable of performing up to 8× single image super-resolution. Our architecture incorporates an adversarial component from the super-resolution generative adversarial networks (SRGANs) and a multi-scale learning component from the multiple scale super-resolution network (MSSRNet), which only in combination can recover the smaller structures inherent in satellite images. To further enhance our performance, we integrate progressive growing and training into our network. This, aided by feed-forward connections in the network to move along and enrich information from previous inputs, produces super-resolved images at scaling factors of 2, 4, and 8. To ensure and enhance the stability of GANs, we employ Wasserstein GANs (WGANs) during training. Experimentally, we find that our architecture can recover small objects in satellite images during super-resolution whereas previous methods cannot.
Tripoul, N., Halawa, H., Reza, T., (…), Pearce, R., and Ripeanu, M. (2019). “There Are Trillions of Little Forks in the Road. Choose Wisely! Estimating the Cost and Likelihood of Success of Constrained Walks to Optimize a Graph Pruning Pipeline.” Proceedings of IA3 2018: 8th Workshop on Irregular Applications: Architectures and Algorithms, and the International Conference for High Performance Computing, Networking, Storage and Analysis. [abstract]
We have developed [Reza et al. SC'18] a highly scalable algorithmic pipeline for pattern matching in labeled graphs and demonstrated it on trillion-edge graphs. This pipeline: (i) supports arbitrary search patterns, (ii) identifies all the vertices and edges that participate in matches - offering 100% precision and recall, and (iii) supports realistic data analytics scenarios. This pipeline is based on graph pruning: it decomposes the search template into individual constraints and uses them to repeatedly prune the graph to a final solution. Our current solution, however, makes a number of ad hoc, intuition-based decisions that impact performance. In a nutshell, these relate to (i) constraint selection - which constraints to generate? (ii) constraint ordering - in which order to use them? and (iii) individual constraint generation - how to best verify them? This position paper makes the observation that by estimating the runtime cost and likelihood of success of a constrained walk in a labeled graph one can inform these optimization decisions. We propose a preliminary solution to make these estimates, and demonstrate - using a prototype shared-memory implementation - that this: (i) is feasible with low overheads, and (ii) offers accurate enough information to optimize our pruning pipeline by a significant margin.
Veldt, N., Klymko, C., and Gleich, D.F. (2019). “Flow-Based Local Graph Clustering with Better Seed Set Inclusion.” SIAM International Conference on Data Mining. [abstract]
Flow-based methods for local graph clustering have received significant recent attention for their theoretical cut improvement and runtime guarantees. In this work we present two improvements for using flow-based methods in real-world semi-supervised clustering problems. Our first contribution is a generalized objective function that allows practitioners to place strict and soft penalties on excluding specific seed nodes from the output set. This feature allows us to avoid the tendency, often exhibited by previous flow-based methods, to contract a large seed set into a small set of nodes that does not contain all or even most of the seed nodes. Our second contribution is a fast algorithm for minimizing our generalized objective function, based on a variant of the push-relabel algorithm for computing preflows. We make our approach very fast in practice by implementing a global relabeling heuristic and employing a warm-start procedure to quickly solve related cut problems. In practice our algorithm is faster than previous related flow-based methods, and is also more robust in detecting ground truth target regions in a graph thanks to its ability to better incorporate semi-supervised information about target clusters.
White, D.A., Arrighi, W.J., Kudo, J., and Watts, S.E. (2019). “Multiscale Topology Optimization Using Neural Network Surrogate Models.” Computer Methods in Applied Mechanics and Engineering. [abstract]
We are concerned with optimization of macroscale elastic structures that are designed utilizing spatially varying microscale metamaterials. The macroscale optimization is accomplished using gradient-based nonlinear topological optimization. But instead of using density as the optimization decision variable, the decision variables are the multiple parameters that define the local microscale metamaterial. This is accomplished using single layer feedforward Gaussian basis function networks as surrogate models of the elastic response of the microscale metamaterial. The surrogate models are trained using highly resolved continuum finite element simulations of the microscale metamaterials and hence are significantly more accurate than analytical models, e.g., classical beam theory. Because the derivative of the surrogate model is important for sensitivity analysis of the macroscale topology optimization, a neural network training procedure based on the Sobolev norm is described. Since the SIMP method is not appropriate for spatially varying lattices, an alternative method is developed to enable creation of void regions. The efficacy of this approach is demonstrated via several examples in which the optimal graded metamaterial outperforms a traditional solid structure.
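The Sobolev-norm training idea, matching both the surrogate's outputs and its derivatives with respect to the design parameters, can be sketched as follows. This PyTorch example uses a generic multilayer perceptron and synthetic data as stand-ins for the paper's Gaussian basis function networks and finite element training sets.

```python
import torch
import torch.nn as nn

def sobolev_loss(model, x, y, dy_dx, lam=1.0):
    """Penalize mismatch in both the response and its gradient w.r.t. the inputs."""
    x = x.clone().requires_grad_(True)
    pred = model(x)
    grad_pred = torch.autograd.grad(pred.sum(), x, create_graph=True)[0]
    return torch.mean((pred - y) ** 2) + lam * torch.mean((grad_pred - dy_dx) ** 2)

# Toy surrogate: 4 design parameters -> 1 elastic response, with synthetic targets.
model = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y, dy = torch.randn(128, 4), torch.randn(128, 1), torch.randn(128, 4)
for _ in range(200):
    opt.zero_grad()
    loss = sobolev_loss(model, x, y, dy)
    loss.backward()
    opt.step()
```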
Yuan, B., Giera, B., Guss, G., Matthews, M., and McMains, S. (2019). “Semi-Supervised Convolutional Neural Networks for in-situ Video Monitoring of Selective Laser Melting.” IEEE Winter Conference on Applications of Computer Vision. [abstract]
Selective Laser Melting (SLM) is a metal additive manufacturing technique. The lack of SLM process repeatability is a barrier for industrial progression. SLM product quality is hard to control, even when using fixed system settings. Thus SLM could benefit from a monitoring system that provides quality assessments in real-time. Since there is no publicly available SLM dataset, we ran experiments to collect over one thousand SLM videos, measured the physical output via height map images, and applied a proposed image processing algorithm to them to produce a dataset for semi-supervised learning. Then we trained convolutional neural networks (CNNs) to recognize desired quality metrics from videos. Experimental results demonstrate the effectiveness of our proposed monitoring approach and also show that the semi-supervised model can mitigate the time and expense of labeling an entire SLM dataset.
Anirudh, R., Kim, H., Thiagarajan, J. J., Mohan, K. A., Champley, K. and Bremer, P.T. (2018). “Lose the Views: Limited Angle CT Reconstruction via Implicit Sinogram Completion.” Conference on Computer Vision and Pattern Recognition. [abstract]
Computed Tomography (CT) reconstruction is a fundamental component to a wide variety of applications ranging from security, to healthcare. The classical techniques require measuring projections, called sinograms, from a full 180° view of the object. This is impractical in a limited angle scenario, when the viewing angle is less than 180°, which can occur due to different factors including restrictions on scanning time, limited flexibility of scanner rotation, etc. The sinograms obtained as a result, cause existing techniques to produce highly artifact-laden reconstructions. In this paper, we propose to address this problem through implicit sinogram completion, on a challenging real world dataset containing scans of common checked-in luggage. We propose a system, consisting of 1D and 2D convolutional neural networks, that operates on a limited angle sinogram to directly produce the best estimate of a reconstruction. Next, we use the x-ray transform on this reconstruction to obtain a “completed” sinogram, as if it came from a full 180° measurement. We feed this to standard analytical and iterative reconstruction techniques to obtain the final reconstruction. We show with extensive experimentation that this combined strategy outperforms many competitive baselines. We also propose a measure of confidence for the reconstruction that enables a practitioner to gauge the reliability of a prediction made by our network. We show that this measure is a strong indicator of quality as measured by the PSNR, while not requiring ground truth at test time. Finally, using a segmentation experiment, we show that our reconstruction preserves the 3D structure of objects effectively.
Kamath, C. and Fan, Y.J. (2018). “Compressing Unstructured Mesh Data Using Spline Fits, Compressed Sensing, and Regression Methods.” IEEE Global Conference on Signal and Information Processing. [abstract]
Compressing unstructured mesh data from computer simulations poses several challenges that are not encountered in the compression of images or videos. Since the spatial locations of the points are not on a regular grid, as in an image, it is difficult to identify near neighbors of a point whose values can be exploited for compression. In this paper, we investigate how three very different methods—spline fits, compressed sensing, and kernel regression—compare in terms of the reconstruction accuracy and reduction in data size when applied to a practical problem from a plasma physics simulation.
Kamath, C. and Fan, Y. (2018). "Regression with small data sets: A case study using code surrogates in additive manufacturing." Knowledge and Information Systems: An International Journal. [abstract]
There has been an increasing interest in recent years in the mining of massive data sets whose sizes are measured in terabytes. However, there are some problems where collecting even a single data point is very expensive, resulting in data sets with only tens or hundreds of samples. One such problem is that of building code surrogates, where a computer simulation is run using many different values of the input parameters and a regression model is built to relate the outputs of the simulation to the inputs. A good surrogate can be very useful in sensitivity analysis, uncertainty analysis, and in designing experiments, but the cost of running expensive simulations at many sample points can be high. In this paper, we use a problem from the domain of additive manufacturing to show that even with small data sets, we can build good quality surrogates by appropriately selecting the input samples and the regression algorithm. Our work is broadly applicable to simulations in other domains and the ideas proposed can be used in time-constrained machine learning tasks, such as hyper-parameter optimization.
Lin, Y., Wang, S., Thiagarajan, J. J., Guthrie, G. and Coblentz, D. (2018). "Efficient Data-Driven Geologic Feature Characterization from Pre-stack Seismic Measurements using Randomized Machine-Learning Algorithm." Geophysical Journal International. [abstract]
Conventional seismic techniques for detecting the subsurface geologic features are challenged by limited data coverage, computational inefficiency, and subjective human factors. We developed a novel data-driven geological feature characterization approach based on pre-stack seismic measurements. Our characterization method employs an efficient and accurate machine-learning method to extract useful subsurface geologic features automatically. Specifically, our method is based on the kernel ridge regression model. The conventional kernel ridge regression can be computationally prohibitive because of the large volume of seismic measurements. We employ a data reduction technique in combination with the conventional kernel ridge regression method to improve the computational efficiency and reduce memory usage. In particular, we utilize a randomized numerical linear algebra technique, named the Nyström method, to effectively reduce the dimensionality of the feature space without compromising the information content required for accurate characterization. We provide thorough computational cost analysis to show the efficiency of our new geological feature characterization methods. We further validate the performance of our new subsurface geologic feature characterization method using synthetic surface seismic data for 2D acoustic and elastic velocity models. Our numerical examples demonstrate that our new characterization method significantly improves the computational efficiency while maintaining comparable accuracy. Interestingly, we show that our method yields a speed-up ratio on the order of ∼10² to ∼10³ in a multi-core computational environment.
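A minimal numpy sketch of Nyström-accelerated kernel ridge regression (the "subset of regressors" form, with an RBF kernel and randomly chosen landmarks) is given below as a generic illustration of the reported speed-up mechanism; it is not the seismic characterization pipeline itself, and the kernel, landmark count, and regularization are assumptions.

```python
import numpy as np

def rbf(A, B, gamma=0.1):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def nystrom_krr_fit(X, y, n_landmarks=200, lam=1e-3, gamma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    L = X[rng.choice(len(X), n_landmarks, replace=False)]  # landmark points
    C = rbf(X, L, gamma)                                   # n x m cross kernel
    W = rbf(L, L, gamma)                                   # m x m landmark kernel
    # Ridge solution restricted to the span of landmark kernel functions:
    # alpha = (C^T C + lam * W)^{-1} C^T y, so that f(x) = k(x, L) @ alpha.
    alpha = np.linalg.solve(C.T @ C + lam * W, C.T @ y)
    return L, alpha, gamma

def nystrom_krr_predict(model, X_new):
    L, alpha, gamma = model
    return rbf(X_new, L, gamma) @ alpha
```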
Liu, S., Bremer, P.T., Thiagarajan, J. J., Srikumar, V., Wang, B., Livnat, Y. and Pascucci, V. (2018). "Visual Exploration of Semantic Relationships in Neural Word Embeddings." IEEE Transactions on Visualization and Computer Graphics. [abstract]
Constructing distributed representations for words through neural language models and using the resulting vector spaces for analysis has become a crucial component of natural language processing (NLP). However, despite their widespread application, little is known about the structure and properties of these spaces. To gain insights into the relationship between words, the NLP community has begun to adapt high-dimensional visualization techniques. In particular, researchers commonly use t-distributed stochastic neighbor embeddings (t-SNE) and principal component analysis (PCA) to create two-dimensional embeddings for assessing the overall structure and exploring linear relationships (e.g., word analogies), respectively. Unfortunately, these techniques often produce mediocre or even misleading results and cannot address domain-specific visualization challenges that are crucial for understanding semantic relationships in word embeddings. Here, we introduce new embedding techniques for visualizing semantic and syntactic analogies, and the corresponding tests to determine whether the resulting views capture salient structures. Additionally, we introduce two novel views for a comprehensive study of analogy relationships. Finally, we augment t-SNE embeddings to convey uncertainty information in order to allow a reliable interpretation. Combined, the different views address a number of domain-specific tasks difficult to solve with existing tools.
Mundhenk, T. N., Ho, D., and Chen, B. Y. (2018). "Improvements to context based self-supervised learning." Conference on Computer Vision and Pattern Recognition. [abstract]
We develop a set of methods to improve on the results of self-supervised learning using context. We start with a baseline of patch based arrangement context learning and go from there. Our methods address some overt problems such as chromatic aberration as well as other potential problems such as spatial skew and mid-level feature neglect. We prevent problems with testing generalization on common self-supervised benchmark tests by using different datasets during our development. The results of our methods combined yield top scores on all standard self-supervised benchmarks, including classification and detection on PASCAL VOC 2007, segmentation on PASCAL VOC 2012, and "linear tests" on the ImageNet and CSAIL Places datasets. We obtain an improvement over our baseline method of between 4.0 to 7.1 percentage points on transfer learning classification tests. We also show results on different standard network architectures to demonstrate generalization as well as portability.
Rajan, D., and Thiagarajan, J.J. (2018). “A Generative Modeling Approach to Limited Channel ECG Classification.” Conference proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society. [abstract]
Processing temporal sequences is central to a variety of applications in health care, and in particular multi-channel Electrocardiogram (ECG) is a highly prevalent diagnostic modality that relies on robust sequence modeling. While Recurrent Neural Networks (RNNs) have led to significant advances in automated diagnosis with time-series data, they perform poorly when models are trained using a limited set of channels. A crucial limitation of existing solutions is that they rely solely on discriminative models, which tend to generalize poorly in such scenarios. In order to combat this limitation, we develop a generative modeling approach to limited channel ECG classification. This approach first uses a Seq2Seq model to implicitly generate the missing channel information, and then uses the latent representation to perform the actual supervisory task. This decoupling enables the use of unsupervised data and also provides highly robust metric spaces for subsequent discriminative learning. Our experiments with the Physionet dataset clearly evidence the effectiveness of our approach over standard RNNs in disease prediction.
Song, H., Rajan, D., Thiagarajan, J. J., and Spanias, A. (2018). "Attend and Diagnose: Clinical Time Series Analysis using Attention Models." AAAI Conference on Artificial Intelligence. [abstract]
With widespread adoption of electronic health records, there is an increased emphasis on predictive models that can effectively deal with clinical time-series data. Powered by Recurrent Neural Network (RNN) architectures with Long Short-Term Memory (LSTM) units, deep neural networks have achieved state-of-the-art results in several clinical prediction tasks. Despite the success of RNNs, their sequential nature prohibits parallelized computing, thus making them inefficient particularly when processing long sequences. Recently, architectures which are based solely on attention mechanisms have shown remarkable success in transduction tasks in NLP, while being computationally superior. In this paper, for the first time, we utilize attention models for clinical time-series modeling, thereby dispensing with recurrence entirely. We develop the SAnD (Simply Attend and Diagnose) architecture, which employs a masked, self-attention mechanism, and uses positional encoding and dense interpolation strategies for incorporating temporal order. Furthermore, we develop a multi-task variant of SAnD to jointly infer models with multiple diagnosis tasks. Using the recent MIMIC-III benchmark datasets, we demonstrate that the proposed approach achieves state-of-the-art performance in all tasks, outperforming LSTM models and classical baselines with hand-engineered features.
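Two ingredients assumed by attention-only sequence models, sinusoidal positional encoding and masked scaled dot-product self-attention, are sketched below in numpy; the SAnD-specific masking window, dense interpolation, and multi-task heads are not reproduced, and the sizes are illustrative.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Standard sinusoidal encoding: sin/cos at geometrically spaced frequencies."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

def masked_self_attention(X, mask):
    """Single-head self-attention; mask[i, j] is True if position i may attend to j."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)            # queries = keys = values = X, for brevity
    scores = np.where(mask, scores, -1e9)    # disallowed positions get a huge penalty
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X

T, d = 48, 32
X = np.random.randn(T, d) + positional_encoding(T, d)
causal = np.tril(np.ones((T, T), dtype=bool))  # attend only to current and past steps
out = masked_self_attention(X, causal)
```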
Song, H., Thiagarajan, J.J., Sattigeri, P. and Spanias, A. (2018). "Optimizing Kernel Machines using Deep Learning." IEEE Transactions on Neural Networks and Learning Systems. [abstract]
Building highly non-linear and non-parametric models is central to several state-of-the-art machine learning systems. Kernel methods form an important class of techniques that induce a reproducing kernel Hilbert space (RKHS) for inferring non-linear models through the construction of similarity functions from data. These methods are particularly preferred in cases where the training data sizes are limited and when prior knowledge of the data similarities is available. Despite their usefulness, they are limited by the computational complexity and their inability to support end-to-end learning with a task-specific objective. On the other hand, deep neural networks have become the de facto solution for end-to-end inference in several learning paradigms. In this article, we explore the idea of using deep architectures to perform kernel machine optimization, for both computational efficiency and end-to-end inferencing. To this end, we develop the DKMO (Deep Kernel Machine Optimization) framework, that creates an ensemble of dense embeddings using Nystrom kernel approximations and utilizes deep learning to generate task-specific representations through the fusion of the embeddings. Intuitively, the filters of the network are trained to fuse information from an ensemble of linear subspaces in the RKHS. Furthermore, we introduce the kernel dropout regularization to enable improved training convergence. Finally, we extend this framework to the multiple kernel case, by coupling a global fusion layer with pre-trained deep kernel machines for each of the constituent kernels. Using case studies with limited training data, and lack of explicit feature sources, we demonstrate the effectiveness of our framework over conventional model inferencing techniques.
Song, H., Willi, M., Thiagarajan, J.J., Berisha, V., and Spanias, A. (2018). “Triplet Network with Attention for Speaker Diarization.” Proceedings of the Annual Conference of the International Speech Communication Association. [abstract]
In automatic speech processing systems, speaker diarization is a crucial front-end component to separate segments from different speakers. Inspired by the recent success of deep neural networks (DNNs) in semantic inferencing, triplet loss-based architectures have been successfully used for this problem. However, existing work utilizes conventional i-vectors as the input representation and builds simple fully connected networks for metric learning, thus not fully leveraging the modeling power of DNN architectures. This paper investigates the importance of learning effective representations from the sequences directly in metric learning pipelines for speaker diarization. More specifically, we propose to employ attention models to learn embeddings and the metric jointly in an end-to-end fashion. Experiments are conducted on the CALLHOME conversational speech corpus. The diarization results demonstrate that, besides providing a unified model, the proposed approach achieves improved performance when compared against existing approaches.
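A minimal PyTorch sketch of the triplet-loss metric learning setup is given below, with a placeholder feed-forward encoder and random tensors standing in for speaker segments; the attention-based encoder of the paper is not reproduced.

```python
import torch
import torch.nn as nn

# Placeholder encoder: 40-dim acoustic features (e.g. MFCCs) -> 64-dim embedding.
encoder = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 64))
triplet = nn.TripletMarginLoss(margin=0.2)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# Anchor and positive come from the same speaker, negative from a different speaker.
anchor, positive, negative = (torch.randn(32, 40) for _ in range(3))  # random stand-ins
opt.zero_grad()
loss = triplet(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
opt.step()
```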
Thiagarajan, J. J., Anirudh, R., Kailkhura, B., Jain, N., Islam, T., Bhatele, A., Yeom, J.S. and Gamblin, T. (2018). "PADDLE: Performance Analysis using a Data-driven Learning Environment." IEEE International Parallel and Distributed Processing Symposium. [abstract]
The use of machine learning techniques to model execution time and power consumption, and, more generally, to characterize performance data is gaining traction in the HPC community. Although this signifies huge potential for automating complex inference tasks, a typical analytics pipeline requires selecting and extensively tuning multiple components ranging from feature learning to statistical inferencing to visualization. Further, the algorithmic solutions often do not generalize between problems, thereby making it cumbersome to design and validate machine learning techniques in practice. In order to address these challenges, we propose a unified machine learning framework, PADDLE, which is specifically designed for problems encountered during analysis of HPC data. The proposed framework uses an information-theoretic approach for hierarchical feature learning and can produce highly robust and interpretable models. We present user-centric workflows for using PADDLE and demonstrate its effectiveness in different scenarios: (a) identifying causes of network congestion; (b) determining the best performing linear solver for sparse matrices; and (c) comparing performance characteristics of parent and proxy application pairs.
Thiagarajan, J.J., Jain, N., Anirudh, R., Giménez, A., Sridhar, R., Marathe, A., Wang, T., Emani, M., Bhatele, A., and Gamblin, T. (2018). “Bootstrapping Parameter Space Exploration for Fast Tuning.” Association for Computing Machinery. [abstract]
The task of tuning parameters for optimizing performance or other metrics of interest such as energy, variability, etc. can be resource and time consuming. Presence of a large parameter space makes a comprehensive exploration infeasible. In this paper, we propose a novel bootstrap scheme, called GEIST, for parameter space exploration to find performance optimizing configurations quickly. Our scheme represents the parameter space as a graph whose connectivity guides information propagation from known configurations. Guided by the predictions of a semi-supervised learning method over the parameter graph, GEIST is able to adaptively sample and find desirable configurations using limited results from experiments. We show the effectiveness of GEIST for selecting application input options, compiler flags, and runtime/system settings for several parallel codes including LULESH, Kripke, Hypre, and OpenAtom.
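The core idea of propagating sparse experimental results over a graph of configurations can be illustrated with scikit-learn's LabelSpreading, as in the sketch below; the parameter space, labels, and neighborhood structure are hypothetical, and this is a generic stand-in rather than the GEIST algorithm itself.

```python
import numpy as np
from itertools import product
from sklearn.semi_supervised import LabelSpreading

# A small hypothetical parameter space (e.g. thread count x tile size x a binary flag).
configs = np.array(list(product([1, 2, 4, 8], [16, 32, 64, 128], [0, 1])), dtype=float)

rng = np.random.default_rng(0)
labels = -np.ones(len(configs), dtype=int)            # -1 marks unmeasured configurations
measured = rng.choice(len(configs), 8, replace=False)
runtimes = rng.random(8)                              # placeholder experiment results
labels[measured] = (runtimes < runtimes.mean()).astype(int)  # 1 = faster than average

model = LabelSpreading(kernel="knn", n_neighbors=5).fit(configs, labels)
scores = model.label_distributions_[:, 1]             # estimated odds a config is desirable
next_to_run = np.argsort(-scores)[:5]                 # most promising configurations to try next
```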
Thiagarajan, J. J., Liu, S., Ramamurthy, K. and Bremer, P.T. (2018). "Exploring High-Dimensional Structure via Axis-Aligned Decomposition of Linear Projections." Conference on Visualization. [abstract]
Two-dimensional embeddings remain the dominant approach to visualize high dimensional data. The choice of embeddings ranges from highly non-linear ones, which can capture complex relationships but are difficult to interpret quantitatively, to axis-aligned projections, which are easy to interpret but are limited to bivariate relationships. Linear projections can be considered a compromise between complexity and interpretability, as they allow explicit axes labels, yet provide significantly more degrees of freedom compared to axis-aligned projections. Nevertheless, interpreting the axes directions, which are linear combinations often with many non-trivial components, remains difficult. To address this problem we introduce a structure-aware decomposition of (multiple) linear projections into sparse sets of axis-aligned projections, which jointly capture all information of the original linear ones. In particular, we use tools from Dempster-Shafer theory to formally define how relevant a given axis-aligned projection is to explain the neighborhood relations displayed in some linear projection. Furthermore, we introduce a new approach to discover a diverse set of high quality linear projections and show that in practice the information of k linear projections is often jointly encoded in ∼k axis-aligned plots. We have integrated these ideas into an interactive visualization system that allows users to jointly browse both linear projections and their axis-aligned representatives. Using a number of case studies we show how the resulting plots lead to more intuitive visualizations and new insights.
Anirudh, R., Kailkhura, B., Thiagarajan, J.J. and Bremer, P. T. (2017). "Poisson Disk Sampling on the Grassmannian: Applications in Subspace Optimization." Conference on Computer Vision and Pattern Recognition. [abstract]
To develop accurate inference algorithms on embedded manifolds such as the Grassmannian, we often employ several optimization tools and incorporate the characteristics of known manifolds as additional constraints. However, a direct analysis of the nature of functions on manifolds is rarely performed. In this paper, we propose an alternative approach to this inference by adopting a statistical pipeline that first generates an initial sampling of the manifold, and then performs subsequent analysis based on these samples. First, we introduce a better sampling technique based on dart throwing, called Poisson disk sampling (PDS), to effectively sample the Grassmannian. Next, using Grassmannian sparse coding, we demonstrate the improved coverage achieved by PDS. Finally, we develop a consensus approach, with Grassmann samples, to infer the optimal embeddings for linear dimensionality reduction, and show that the resulting solutions are nearly optimal.
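A minimal sketch of dart-throwing Poisson disk sampling on the Grassmannian is given below, assuming the chordal distance between subspaces and an illustrative rejection radius; the paper's actual algorithm and parameter choices may differ.

```python
import numpy as np

def random_subspace(n, k, rng):
    """Random point on the Grassmannian Gr(k, n): orthonormal basis via QR."""
    q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return q

def chordal_distance(u, v):
    """Chordal distance between k-dim subspaces: sqrt(k - ||U^T V||_F^2)."""
    k = u.shape[1]
    return np.sqrt(max(k - np.linalg.norm(u.T @ v, "fro") ** 2, 0.0))

def poisson_disk_grassmann(n, k, radius, n_darts=5000, seed=0):
    """Dart throwing: accept a random subspace only if it is at least `radius`
    away (in chordal distance) from every previously accepted sample."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_darts):
        cand = random_subspace(n, k, rng)
        if all(chordal_distance(cand, s) >= radius for s in samples):
            samples.append(cand)
    return samples

# Illustrative run on Gr(2, 10); the radius controls sample spacing and coverage.
pts = poisson_disk_grassmann(n=10, k=2, radius=0.8)
print(f"accepted {len(pts)} Poisson disk samples")
```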
Kim, S., Ames, S., Lee, J., Zhang, C., Wilson, A. C. and Williams, D. (2017). "Massive Scale Deep Learning for Detecting Extreme Climate Events." International Workshop on Climate Informatics. [abstract]
Conventional extreme climate event detection relies on high spatial resolution climate model output for improved accuracy, which often poses significant computational challenges due to its tremendous iteration cost. As a cost-efficient alternative, we developed a system to detect and locate extreme climate events using deep learning. Our system can capture the pattern of extreme climate events from pre-existing coarse reanalysis data, corresponding to only 16 thousand grid points, without an expensive downscaling process, requiring less than 5 hours to train on our dataset and less than 5 seconds to evaluate our test set using 5-layer Convolutional Neural Networks (CNNs). As a use case of our framework, we tested tropical cyclone detection with labeled reanalysis data; our cross-validation results show 99.98% detection accuracy, and the localization accuracy is within 4.5 degrees of longitude/latitude (around 500 km, or 3 times the data resolution).
Kim, S., Ames, S., Lee, J., Zhang, C., Wilson, A. C. and Williams, D. (2017). "Resolution Reconstruction of Climate Data with Pixel Recursive Model." IEEE International Conference on Data Mining. [abstract]
Deep learning techniques have been successfully applied to solve many problems in climate and geoscience using massive-scale observed and modeled data. For extreme climate event detection, several models based on deep neural networks have recently been proposed and attain superior performance that overshadows all previous handcrafted, expert-based methods. The issue, though, is that accurate localization of events requires high quality climate data. In this work, we propose a framework capable of detecting and localizing extreme climate events in very coarse climate data. Our framework is based on two models using deep neural networks: (1) Convolutional Neural Networks (CNNs) to detect and localize extreme climate events, and (2) a pixel recursive super resolution model to reconstruct high resolution climate data from low resolution climate data. Building on our preliminary work, we present two CNNs in our framework for different purposes, detection and localization. Our results using CNNs for extreme climate event detection show that simple neural nets can capture the pattern of extreme climate events with high accuracy from very coarse reanalysis data. However, localization accuracy is relatively low due to the coarse resolution. To resolve this issue, the pixel recursive super resolution model reconstructs the resolution of the input to the localization CNNs. We present the best network using the pixel recursive super resolution model, which synthesizes details of tropical cyclones in ground truth data while enhancing their resolution. This approach not only dramatically reduces the human effort, but also suggests the possibility of reducing the computing cost required for the downscaling process used to increase data resolution.
Lennox, K. P., Rosenfield, P., Blair, B., Kaplan, A., Ruz, J., Glenn, A. and Wurtz, R. (2017). "Assessing and Minimizing Contamination in Time of Flight Based Validation Data." Nuclear Instruments and Methods in Physics Research. [abstract]
Time of flight experiments are the gold standard method for generating labeled training and testing data for the neutron/gamma pulse shape discrimination problem. As the popularity of supervised classification methods increases in this field, there will also be increasing reliance on time of flight data for algorithm development and evaluation. However, time of flight experiments are subject to various sources of contamination that lead to neutron and gamma pulses being mislabeled. Such labeling errors have a detrimental effect on classification algorithm training and testing, and should therefore be minimized. This paper presents a method for identifying minimally contaminated data sets from time of flight experiments and estimating the residual contamination rate. This method leverages statistical models describing neutron and gamma travel time distributions and is easily implemented using existing statistical software. The method produces a set of optimal intervals that balance the trade-off between interval size and nuisance particle contamination, and its use is demonstrated on a time of flight data set for Cf-252. The particular properties of the optimal intervals for the demonstration data are explored in detail.
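To make the size-versus-contamination trade-off concrete, the sketch below scores candidate neutron-labeling windows under assumed travel-time distributions. The normal distributions, relative abundances, and window edges are made up for illustration only; the paper's models are detector- and geometry-specific.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical travel-time models (nanoseconds): gammas arrive early and tightly
# clustered, neutrons later and more spread out. These numbers are illustrative.
gamma_t = norm(loc=3.0, scale=0.5)
neutron_t = norm(loc=30.0, scale=8.0)
gamma_rate, neutron_rate = 0.7, 0.3          # assumed relative abundances

def interval_stats(lo, hi):
    """Expected neutron yield and gamma contamination rate for a candidate
    neutron-labeling window [lo, hi]."""
    n = neutron_rate * (neutron_t.cdf(hi) - neutron_t.cdf(lo))
    g = gamma_rate * (gamma_t.cdf(hi) - gamma_t.cdf(lo))
    contamination = g / (n + g) if (n + g) > 0 else 0.0
    return n, contamination

# Sweep the lower edge to see the trade-off between window size and contamination.
for lo in (5.0, 10.0, 15.0, 20.0):
    yield_, contam = interval_stats(lo, 60.0)
    print(f"window [{lo:5.1f}, 60.0] ns: yield={yield_:.3f}, contamination={contam:.2e}")
```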
Li, Q., Kailkhura, B., Thiagarajan, J. J. and Varshney, P. K. (2017). "Influential Node Detection in Implicit Social Networks using Multi-task Gaussian Copula Models." Conference on Neural Information Processing Systems. [abstract]
Influential node detection is a central research topic in social network analysis. Many existing methods rely on the assumption that the network structure is completely known a priori. However, in many applications, network structure is unavailable to explain the underlying information diffusion phenomenon. To address the challenge of information diffusion analysis with incomplete knowledge of network structure, we develop a multi-task low rank linear influence model. By exploiting the relationships between contagions, our approach can simultaneously predict the volume (i.e. time series prediction) for each contagion (or topic) and automatically identify the most influential nodes for each contagion. The proposed model is validated using synthetic data and an ISIS twitter dataset. In addition to improving the volume prediction performance significantly, we show that the proposed approach can reliably infer the most influential users for specific contagions.
Lin, Y., Wang, S., Thiagarajan, J. J., Guthrie, G. and Coblentz, D. (2017). "Towards Real-Time Geologic Feature Detection from Seismic Measurements Using a Randomized Machine-Learning Algorithm." SEG Annual Conference. [abstract]
Conventional seismic techniques for detecting the subsurface geologic features are challenged by limited data coverage, computational inefficiency, and subjective human factors. We propose to employ an efficient and accurate machine-learning detection approach to extract useful subsurface geologic features automatically. We employ a data reduction technique in combination with the conventional kernel ridge regression method to improve the computational efficiency and reduce the memory usage. Specifically, we utilize a randomized numerical linear algebra technique to effectively reduce the dimensionality of the feature space without compromising the information content required for accurate detection. We validate the performance of our new subsurface geologic feature detection method using synthetic surface seismic data for a 2D geophysical model. Our numerical examples demonstrate that our new detection method significantly improves the computational efficiency while maintaining comparable accuracy. Interestingly, we show that our method yields a speed-up ratio on the order of ~10² to ~10³ in a multi-core computational environment.
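One common way to realize this kind of randomized reduction with standard libraries is to approximate the kernel with random Fourier features and then solve an ordinary ridge regression, as sketched below. This is a generic recipe rather than the authors' implementation, and the data, kernel bandwidth, and feature dimension are synthetic placeholders.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for seismic-derived features and a geologic target attribute.
X = rng.standard_normal((20000, 200))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20000)

# Randomized approximation of the RBF kernel: map into a low-dimensional feature
# space, then solve an ordinary ridge regression instead of the full kernel system.
model = make_pipeline(
    RBFSampler(gamma=0.05, n_components=300, random_state=0),
    Ridge(alpha=1.0),
)
model.fit(X, y)
print("train R^2:", model.score(X, y))
```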
Marathe, A., Anirudh, R., Jain, N., Bhatele, A., Thiagarajan, J. J., Kailkhura, B., Yeom, J. S., Rountree, B. and Gamblin, T. (2017). "Performance Modeling Under Resource Constraints Using Deep Transfer Learning." Supercomputing Conference. [abstract]
Tuning application parameters for optimal performance is a challenging combinatorial problem. Hence, techniques for modeling the functional relationships between various input features in the parameter space and application performance are important. We show that simple statistical inference techniques are inadequate to capture these relationships. Even with more complex ensembles of models, the minimum coverage of the parameter space required via experimental observations is still quite large. We propose a deep learning based approach that can combine information from exhaustive observations collected at a smaller scale with limited observations collected at a larger target scale. The proposed approach is able to accurately predict performance in the regimes of interest to performance analysts while outperforming many traditional techniques. In particular, our approach can identify the best performing configurations even when trained using as few as 1% of observations at the target scale.
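A minimal sketch of the cross-scale transfer idea is shown below: pretrain a surrogate on plentiful small-scale observations, then freeze the shared layers and fine-tune only the output layer on a handful of target-scale observations. The network, data, and training schedule are illustrative and do not reflect the paper's architecture.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim=8):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, 1))

def fit(model, x, y, params, epochs=200, lr=1e-3):
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

# Stage 1: exhaustive (synthetic) observations at the small scale.
x_small, y_small = torch.randn(2000, 8), torch.randn(2000, 1)
model = make_mlp()
fit(model, x_small, y_small, model.parameters())

# Stage 2: very few observations at the target scale -- freeze the shared feature
# layers and fine-tune only the final output layer.
x_large, y_large = torch.randn(40, 8), torch.randn(40, 1)
for p in model[:-1].parameters():
    p.requires_grad_(False)
fit(model, x_large, y_large, model[-1].parameters(), epochs=100)
```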
Mudigonda, M., Kim, S., Mahesh, A., Kahou, S., Kashinath, K., Williams, D., Michalski, V., O’Brien, T. and Prabhat, M. (2017). "Segmenting and Tracking Extreme Climate Events using Neural Networks." Conference on Neural Information Processing Systems. [abstract]
Predicting extreme weather events in a warming world is one of the most pressing and challenging problems that humanity faces today. Deep learning and advances in the field of computer vision provide a novel and powerful set of tools to tackle this demanding task. However, unlike images employed in computer vision, climate datasets present unique challenges. The channels (or physical variables) in a climate dataset are manifold, and unlike pixel information in computer vision data, these channels have physical properties. We present preliminary work using a convolutional neural network and a recurrent neural network for tracking cyclonic storms. We also show how state-of-the-art segmentation algorithms can be used to segment atmospheric rivers and tropical cyclones in global climate model simulations. We show how the latest advances in machine learning and computer vision can provide solutions to important problems in weather and climate sciences, and we highlight unique challenges and limitations.
Mundhenk, N. T., Kegelmeyer, L. M. and Trummer, S.K. (2017). "Deep learning for evaluating difficult-to-detect incomplete repairs of high fluence laser optics at the National Ignition Facility." Thirteenth International Conference on Quality Control by Artificial Vision. [abstract]
Two machine-learning methods were evaluated to help automate the quality control process for mitigating damage sites on laser optics. The mitigation is a cone-like structure etched into locations on large optics that have been chipped by the high fluence (energy per unit area) laser light. Sometimes the repair leaves a difficult-to-detect remnant of the damage that needs to be addressed before the optic can be placed back on the beam line. We would like to be able to automatically detect these remnants. We try deep learning (convolutional neural networks using features autogenerated from large stores of labeled data, like ImageNet) and find that it outperforms ensembles of decision trees (using custom-built features) in finding these subtle, rare, incomplete repairs of damage. We also implemented an unsupervised method for helping operators visualize where the network has spotted problems. This is done by projecting the credit for the result backwards onto the input image, which shows the regions of an image most responsible for the network's decision. This can also be used to help understand the black-box decisions the network is making and potentially improve the training process.
Pallotta, G., Konjevod, G., Cadena, J. and Nguyen, P. (2017). "Context-aided Analysis of Community Evolution in Networks." Statistical Analysis and Data Mining: The ASA Data Science Journal. [abstract]
We are interested in detecting and analyzing global changes in dynamic networks (networks that evolve with time). More precisely, we consider changes in the activity distribution within the network, in terms of density (i.e., edge existence) and intensity (i.e., edge weight). Detecting change in local properties, as well as individual measurements or metrics, has been well studied and often reduces to traditional statistical process control. In contrast, detecting change in larger scale structure of the network is more challenging and not as well understood. We address this problem by proposing a framework for detecting change in network structure based on separate pieces: a probabilistic model for partitioning nodes by their behavior, a label-unswitching heuristic, and an approach to change detection for sequences of complex objects. We examine the performance of one instantiation of such a framework using mostly previously available pieces. The dataset we use for these investigations is the publicly available New York City Taxi and Limousine Commission dataset covering all taxi trips in New York City since 2009. Using it, we investigate the evolution of an ensemble of networks under different spatiotemporal resolutions. We identify the community structure by fitting a weighted stochastic block model. We offer insights on different node ranking and clustering methods, their ability to capture the rhythm of life in the Big Apple, and their potential usefulness in highlighting changes in the underlying network structure.
Sakla, W., Konjevod, G. and Mundhenk, N.T. (2017). "Deep Multi-modal Vehicle Detection in Aerial ISR Imagery." IEEE Winter Conference on Applications of Computer Vision. [abstract]
Since the introduction of deep convolutional neural networks (CNNs), object detection in imagery has witnessed substantial breakthroughs in state-of-the-art performance. The defense community utilizes overhead image sensors that acquire large field-of-view aerial imagery in various bands of the electromagnetic spectrum, which is then exploited for various applications, including the detection and localization of human-made objects. In this work, we utilize a recent state-of-the-art object detection algorithm, Faster R-CNN, to train a deep CNN for vehicle detection in multimodal imagery. We utilize the vehicle detection in aerial imagery (VEDAI) dataset, which contains overhead imagery that is representative of an ISR setting. Our contribution includes modification of key parameters in the Faster R-CNN algorithm for this setting, where the objects of interest are spatially small, occupying less than 1.5×10⁻³ of the total image pixels. Our experiments show that (1) an appropriately trained deep CNN leads to average precision rates above 93% on vehicle detection, and (2) transfer learning between imagery modalities is possible, yielding average precision rates above 90% in the absence of fine-tuning.
Song, H., Thiagarajan, J. J., Sattigeri, P. and Spanias, A. (2017). "A Deep Learning Approach to Multiple Kernel Learning." IEEE International Conference on Acoustics, Speech and Signal Processing. [abstract]
Kernel fusion is a popular and effective approach for combining multiple features that characterize different aspects of data. Traditional approaches for Multiple Kernel Learning (MKL) attempt to learn the parameters for combining the kernels through sophisticated optimization procedures. In this paper, we propose an alternative approach that creates dense embeddings for data using the kernel similarities and adopts a deep neural network architecture for fusing the embeddings. In order to improve the effectiveness of this network, we introduce the kernel dropout regularization strategy coupled with the use of an expanded set of composition kernels. Experimental results on a real-world activity recognition dataset show that the proposed architecture is effective in fusing kernels and achieves state-of-the-art performance.
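The sketch below shows the general recipe of fusing kernels through dense similarity embeddings and a neural network, using hypothetical two-view data and landmark-based kernel embeddings; the paper's kernel dropout regularization and composition kernels are not reproduced here.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Two hypothetical feature views of the same samples (e.g., different sensors).
n = 600
view_a = rng.standard_normal((n, 20))
view_b = rng.standard_normal((n, 5))
y = (view_a[:, 0] + view_b[:, 0] > 0).astype(int)

# Dense embeddings from kernel similarities to a small landmark set, one per kernel.
landmarks = rng.choice(n, size=50, replace=False)
emb = np.hstack([
    rbf_kernel(view_a, view_a[landmarks], gamma=0.1),
    polynomial_kernel(view_b, view_b[landmarks], degree=2),
])

# Fuse the kernel embeddings with a small neural network (a plain MLP here; the
# paper's kernel dropout strategy is not reproduced).
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(emb, y)
print("training accuracy:", clf.score(emb, y))
```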
Zheng, P., Aravkin, A. Y., Ramamurthy, K. and Thiagarajan, J.J. (2017). "Learning Robust Representations for Computer Vision." IEEE International Conference on Computer Vision Workshops. [abstract]
Unsupervised learning techniques in computer vision often require learning latent representations, such as low-dimensional linear and non-linear subspaces. Noise and outliers in the data can frustrate these approaches by obscuring the latent spaces. Our main goal is deeper understanding and new development of robust approaches for representation learning. We provide a new interpretation for existing robust approaches and present two specific contributions: a new robust PCA approach, which can separate foreground features from dynamic background, and a novel robust spectral clustering method, that can cluster facial images with high accuracy. Both contributions show superior performance to standard methods on real-world test sets.