Alom, Z., Taha, T., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, S., Hasan, M., Van Essen, B., Awwal, A., and Asari, V. (2019). “A State-of-the-Art Survey on Deep Learning Theory and Architectures.” Electronics. []

In recent years, deep learning has garnered tremendous success in a variety of application domains. This new field of machine learning has been growing rapidly and has been applied to most traditional application domains, as well as some new areas that present more opportunities. Different methods have been proposed based on different categories of learning, including supervised, semi-supervised, and unsupervised learning. Experimental results show state-of-the-art performance using deep learning compared to traditional machine learning approaches in the fields of image processing, computer vision, speech recognition, machine translation, art, medical imaging, medical information processing, robotics and control, bioinformatics, natural language processing, cybersecurity, and many others. This survey presents a brief overview of the advances that have occurred in the area of Deep Learning (DL), starting with the Deep Neural Network (DNN). The survey goes on to cover the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), the Auto-Encoder (AE), the Deep Belief Network (DBN), the Generative Adversarial Network (GAN), and Deep Reinforcement Learning (DRL). Additionally, we discuss recent developments, such as advanced variant DL techniques based on these DL approaches. This work considers most of the papers published since 2012, when the modern era of deep learning began. DL approaches that have been explored and evaluated in different application domains are also included in this survey, along with recently developed frameworks, SDKs, and benchmark datasets used for implementing and evaluating deep learning approaches. Some surveys have been published on DL using neural networks, along with a survey on Reinforcement Learning (RL); however, those papers have not discussed individual advanced techniques for training large-scale deep learning models or recently developed generative models.

Anirudh, R. and Thiagarajan, J.J. (2019). “Bootstrapping Graph Convolutional Neural Networks for Autism Spectrum Disorder Classification.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. []

Using predictive models to identify patterns that can act as biomarkers for different neuropathological conditions is becoming highly prevalent. In this paper, we consider the problem of Autism Spectrum Disorder (ASD) classification, where previous work has shown that it can be beneficial to incorporate a wide variety of meta features, such as socio-cultural traits, into predictive modeling. A graph-based approach naturally suits these scenarios, where a contextual graph captures traits that characterize a population, while the specific brain activity patterns are utilized as a multivariate signal at the nodes. Graph neural networks have shown improvements in inference on graph-structured data. Though the underlying graph strongly dictates the overall performance, there exists no systematic way of choosing an appropriate graph in practice, making predictive models non-robust. To address this, we propose a bootstrapped version of graph convolutional neural networks (G-CNNs) that utilizes an ensemble of weakly trained G-CNNs and reduces the sensitivity of models to the choice of graph construction. We demonstrate its effectiveness on the challenging Autism Brain Imaging Data Exchange (ABIDE) dataset and show that our approach improves upon recently proposed graph-based neural networks. We also show that our method remains more robust to noisy graphs.
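The robustness mechanism described above can be sketched in a few lines: rather than committing to a single population graph, average the class probabilities produced by several weakly trained models, each tied to its own randomly constructed graph. The toy "models" below are illustrative stand-ins; the paper's ensemble members are G-CNNs trained on bootstrapped graphs.

```python
# Hedged sketch of ensemble averaging over weakly trained models. The
# lambdas stand in for G-CNNs, each trained with a different randomly
# constructed contextual graph; names here are illustrative only.

def ensemble_predict(models, x):
    """Average class-probability outputs across an ensemble of models."""
    probs = [m(x) for m in models]
    n_classes = len(probs[0])
    return [sum(p[i] for p in probs) / len(probs) for i in range(n_classes)]

# three weak classifiers that disagree individually but agree on average
models = [lambda x: [0.6, 0.4], lambda x: [0.8, 0.2], lambda x: [0.3, 0.7]]
avg = ensemble_predict(models, x=None)
print(avg)
```

Averaging dilutes the influence of any single (possibly poorly chosen) graph, which is the source of the reduced sensitivity the paper reports.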

Chu, A., Nguyen, D., Talathi, S.S., (…), Stolaroff, J.K., and Giera, B. (2019). “Automated Detection and Sorting of Microencapsulation via Machine Learning.” Lab on a Chip. []

Microfluidic-based microencapsulation requires significant oversight to prevent material and quality loss due to sporadic disruptions in fluid flow that routinely arise. State-of-the-art microcapsule production is laborious and relies on experts to monitor the process, e.g. through a microscope. Unnoticed defects diminish the quality of collected material and/or may cause irreversible clogging. To address these issues, we developed an automated monitoring and sorting system that operates on consumer-grade hardware in real-time. Using human-labeled microscope images acquired during typical operation, we train a convolutional neural network that assesses microencapsulation. Based on output from the machine learning algorithm, an integrated valving system collects desirable microcapsules or diverts waste material accordingly. Although the system notifies operators to make necessary adjustments to restore microencapsulation, we can extend the system to automate corrections. Since microfluidic-based production platforms customarily collect image and sensor data, machine learning can help to scale up and improve microfluidic techniques beyond microencapsulation.

Cong, G., Domeniconi, G., Shapiro, J., Zhou, F., and Chen, B. (2019). “Accelerating Deep Neural Network Training for Action Recognition on a Cluster of GPUs.” Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing. []

Due to the additional temporal dimension, large-scale video action recognition is even more challenging than image recognition and typically takes days to train on modern GPUs, even for modest-sized datasets. We propose algorithms and techniques to accelerate training of deep neural networks for action recognition on a cluster of GPUs. In terms of convergence and scaling, our distributed training algorithm with adaptive batch size is provably superior to popular asynchronous stochastic gradient descent algorithms. The convergence analysis of our algorithm shows it is possible to reduce communication cost and at the same time minimize the number of iterations needed for convergence. We customize the Adam optimizer for our distributed algorithm to improve efficiency. In addition, we employ transfer learning to further reduce training time while improving validation accuracy. Compared with the baseline single-GPU stochastic gradient descent implementation of the two-stream training approach, our implementation achieves super-linear speedups on 16 GPUs while improving validation accuracy. For the UCF101 and HMDB51 datasets, the validation accuracies achieved are 93.1% and 67.9%, respectively. As far as we know, these are the highest accuracies achieved with the two-stream approach without computationally expensive 3D convolutions or pretraining on much larger datasets.
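The abstract does not spell out the batch-size rule itself, so the following is a purely hypothetical sketch of what an adaptive batch-size schedule can look like: the global batch grows as training progresses, reducing the number of communication rounds per epoch, while a linear learning-rate scaling rule keeps update magnitudes comparable. The doubling rule and all constants are assumptions for illustration, not the paper's algorithm.

```python
# Hypothetical adaptive batch-size schedule (illustrative, not the
# paper's method): double the global batch every `growth_interval`
# epochs, cap it, and scale the learning rate linearly with batch size.

def adaptive_schedule(base_batch, base_lr, epoch, growth_interval=10, max_batch=4096):
    """Return (batch_size, learning_rate) for a given epoch."""
    factor = 2 ** (epoch // growth_interval)
    batch = min(base_batch * factor, max_batch)
    lr = base_lr * (batch / base_batch)
    return batch, lr

for epoch in (0, 10, 25, 50):
    print(epoch, adaptive_schedule(64, 0.1, epoch))
```

Larger batches late in training amortize each allreduce over more samples, which is one way communication cost and iteration count can be reduced simultaneously.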

Cong, G., Domeniconi, G., Yang, C.-C., Shapiro, J., and Chen, B. (2019). “Video Action Recognition with an Additional End-To-End Trained Temporal Stream.” 2019 IEEE Winter Conference on Applications of Computer Vision. []

Detecting actions in videos requires understanding the temporal relationships among frames. Typical action recognition approaches rely on optical flow estimation methods to convey temporal information to a CNN. Recent studies employ 3D convolutions in addition to optical flow to process the temporal information. While these models achieve slightly better results than two-stream 2D convolutional approaches, they are significantly more complex, requiring more data and time to train. We propose an efficient, adaptive batch size distributed training algorithm with customized optimizations for training the two 2D streams. We introduce a new 2D convolutional temporal stream that is trained end-to-end with a neural network. The flexibility to freeze some network layers from training in this temporal stream opens the possibility of ensemble learning with more than one temporal stream. Our architecture, which combines three streams, achieves the highest accuracies we are aware of on UCF101 and HMDB51 among systems that do not pretrain on much larger datasets (e.g., Kinetics). We achieve these results while keeping our spatial and temporal streams 4.67x faster to train than the 3D convolution approaches.

Deelman, E., Mandal, A., Jiang, M., and Sakellariou, R. (2019). “The Role of Machine Learning in Scientific Workflows.” International Journal of High Performance Computing Applications. []

Machine learning (ML) is being applied in a number of everyday contexts from image recognition, to natural language processing, to autonomous vehicles, to product recommendation. In the science realm, ML is being used for medical diagnosis, new materials development, smart agriculture, DNA classification, and many others. In this article, we describe the opportunities of using ML in the area of scientific workflow management. Scientific workflows are key to today’s computational science, enabling the definition and execution of complex applications in heterogeneous and often distributed environments. We describe the challenges of composing and executing scientific workflows and identify opportunities for applying ML techniques to meet these challenges by enhancing the current workflow management system capabilities. We foresee that as the ML field progresses, the automation provided by workflow management systems will greatly increase and result in significant improvements in scientific productivity.

Dryden, N., Maruyama, N., Benson, T., Moon, T., Snir, M., and Van Essen, B. (2019). “Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism.” International Parallel and Distributed Processing Symposium. []

Scaling CNN training is necessary to keep up with growing datasets and reduce training time. We also see an emerging need to handle datasets with very large samples, where memory requirements for training are large. Existing training frameworks use a data-parallel approach that partitions samples within a mini-batch, but limits on scaling the mini-batch size, together with memory consumption, make this untenable for large samples. We describe and implement new approaches to convolution that parallelize using spatial decomposition or a combination of sample and spatial decomposition. This introduces many performance knobs for a network, so we develop a performance model for CNNs and present a method for using it to automatically determine efficient parallelization strategies. We evaluate our algorithms with microbenchmarks and image classification with ResNet-50. Our algorithms allow us to prototype a model for a mesh-tangling dataset, where sample sizes are very large. We show that our parallelization achieves excellent strong and weak scaling and enables training for previously unreachable datasets.
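The spatial-decomposition idea can be illustrated in one dimension: each rank owns a slice of the convolution output and must read its slice of the input plus a "halo" of kernel_size − 1 extra elements contributed by neighboring ranks. This is a single-channel 1-D simplification for intuition only, not the paper's multi-GPU 2-D implementation.

```python
# 1-D, single-channel sketch of spatial decomposition for convolution.
# Each "rank" computes a contiguous block of outputs from its input
# slice extended by a halo region; concatenating the blocks reproduces
# the undecomposed result exactly.

def conv1d_valid(x, k):
    """Plain valid 1-D convolution (correlation) of signal x with kernel k."""
    K = len(k)
    return [sum(x[i + j] * k[j] for j in range(K)) for i in range(len(x) - K + 1)]

def spatial_decomp_conv(x, k, parts):
    """Same result, with outputs split across `parts` ranks; each rank
    reads its owned input slice plus K - 1 halo elements."""
    K = len(k)
    n_out = len(x) - K + 1
    step = (n_out + parts - 1) // parts
    out = []
    for p in range(parts):
        lo, hi = p * step, min((p + 1) * step, n_out)
        if lo >= hi:
            continue
        local = x[lo: hi + K - 1]        # owned slice + halo from neighbors
        out.extend(conv1d_valid(local, k))
    return out

x = list(range(12))
k = [1, 2, 1]
print(spatial_decomp_conv(x, k, 3) == conv1d_valid(x, k))
```

In 2-D the halo becomes a border of rows and columns exchanged between neighboring GPUs, which is the communication cost the paper's performance model weighs against the memory savings.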

Dryden, N., Maruyama, N., Moon, T., (…), Snir, M., and Van Essen, B. (2019). “Aluminum: An Asynchronous, GPU-Aware Communication Library Optimized for Large-Scale Training of Deep Neural Networks on HPC Systems.” Proceedings of Machine Learning in HPC Environments and the International Conference for High Performance Computing, Networking, Storage and Analysis. []

We identify communication as a major bottleneck for training deep neural networks on large-scale GPU clusters, taking over 10x as long as computation. To reduce this overhead, we discuss techniques to overlap communication and computation as much as possible. This leads to much of the communication being latency-bound instead of bandwidth-bound, and we find that using a combination of latency- and bandwidth-optimized allreduce algorithms significantly reduces communication costs. We also discuss a semantic mismatch between MPI and CUDA that increases overheads and limits asynchrony, and propose a solution that enables communication to be aware of CUDA streams. We implement these optimizations in the open-source Aluminum communication library, enabling optimized, asynchronous, GPU-aware communication. Aluminum demonstrates improved performance in benchmarks and end-to-end training of deep networks, for both strong and weak scaling.
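The motivation for mixing latency- and bandwidth-optimized allreduce algorithms can be seen with the standard alpha-beta cost model (alpha = per-message latency, beta = per-byte transfer cost): recursive doubling sends the full buffer log2(p) times, while a ring pipelines 2(p − 1) small messages. The constants below are illustrative, not measurements from Aluminum.

```python
# Alpha-beta cost model comparing two classic allreduce algorithms.
# Recursive doubling is latency-optimized (few steps, full buffer);
# the ring is bandwidth-optimized (many steps, small messages).
import math

def recursive_doubling_cost(n_bytes, p, alpha, beta):
    steps = math.log2(p)
    return steps * alpha + steps * n_bytes * beta

def ring_cost(n_bytes, p, alpha, beta):
    return 2 * (p - 1) * alpha + 2 * n_bytes * beta * (p - 1) / p

def pick_allreduce(n_bytes, p, alpha=1e-6, beta=1e-9):
    """Choose the cheaper algorithm under the model (constants are made up)."""
    rd = recursive_doubling_cost(n_bytes, p, alpha, beta)
    ring = ring_cost(n_bytes, p, alpha, beta)
    return "recursive_doubling" if rd < ring else "ring"

print(pick_allreduce(1024, 64))        # small, latency-bound message
print(pick_allreduce(64 * 2**20, 64))  # large, bandwidth-bound message
```

Once computation hides most of the communication, many remaining messages are small and latency-bound, which is why switching algorithms by message size pays off.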

Endrei, M., Jin, C., Dinh, M.N., (…), DeRose, L., and de Supinski, B.R. (2019). “Statistical and Machine Learning Models for Optimizing Energy in Parallel Applications.” International Journal of High Performance Computing Applications. []

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
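The trade-off structure the models predict can be made concrete with a small sketch: given a handful of sampled runs, each measured as (time, energy), the Pareto-optimal set contains exactly the configurations for which no other run is at least as good on both axes and strictly better on one. The run numbers below are made up for illustration.

```python
# Extract the Pareto front from sampled (time_seconds, energy_joules)
# runs; both axes are minimized. Sample values are illustrative only.

def pareto_front(runs):
    """Return the runs not dominated by any other run, sorted by time."""
    front = []
    for r in runs:
        dominated = any(o != r and o[0] <= r[0] and o[1] <= r[1]
                        and (o[0] < r[0] or o[1] < r[1]) for o in runs)
        if not dominated:
            front.append(r)
    return sorted(front)

runs = [(100, 500), (110, 430), (120, 420), (112, 450), (130, 425)]
print(pareto_front(runs))
```

A user then picks a point on this front, e.g. accepting a ~10% slowdown for a large energy saving, which is the kind of option the paper's AMG results quantify.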

Fan, Y.J. (2019). “Autoencoder Node Saliency: Selecting Relevant Latent Representations.” Pattern Recognition. []

The autoencoder is an artificial neural network that performs nonlinear dimension reduction and learns hidden representations of unlabeled data. With a linear transfer function it is similar to principal component analysis (PCA). While both methods use weight vectors for linear transformations, the autoencoder does not come with any indication analogous to the eigenvalues that PCA pairs with its eigenvectors. We propose a novel autoencoder node saliency method that examines whether the features constructed by autoencoders exhibit properties related to known class labels. The supervised node saliency ranks the nodes based on their capability of performing a learning task, and is coupled with the normalized entropy difference (NED). We establish a property of NED values to verify classifying behaviors among the top-ranked nodes. By applying our methods to real datasets, we demonstrate their ability to provide indications of the best-performing nodes and to explain the tasks learned by the autoencoder.
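A simplified sketch of the entropy-based ranking idea: bin one latent node's activations, measure the label entropy inside each bin, and score the node by how much that entropy falls below the maximum. This is not the paper's exact NED formula, just the spirit of it; a node whose activations cleanly separate the classes scores high.

```python
# Hedged, simplified node-saliency score in the spirit of NED: 1 minus
# the (normalized) weighted average label entropy across activation bins.
# Higher means the node's activations split the classes more cleanly.
import math

def node_saliency(activations, labels, n_bins=4):
    lo, hi = min(activations), max(activations)
    width = (hi - lo) / n_bins or 1.0
    bins = {}
    for a, y in zip(activations, labels):
        b = min(int((a - lo) / width), n_bins - 1)
        bins.setdefault(b, []).append(y)
    classes = set(labels)
    h_max = math.log2(len(classes)) if len(classes) > 1 else 1.0
    total = len(labels)
    avg_h = 0.0
    for members in bins.values():
        probs = [members.count(c) / len(members) for c in classes]
        h = -sum(p * math.log2(p) for p in probs if p > 0)
        avg_h += (len(members) / total) * h
    return 1.0 - avg_h / h_max

# a node that cleanly separates the two classes vs. a shuffled node
clean = node_saliency([0.1, 0.2, 0.15, 0.9, 0.95, 0.85], [0, 0, 0, 1, 1, 1])
mixed = node_saliency([0.1, 0.9, 0.15, 0.2, 0.95, 0.85], [0, 1, 0, 1, 1, 0])
print(clean, mixed)
```

Ranking all latent nodes by such a score singles out the ones doing the classifying work, which is what makes the learned representation explainable.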

Humbird, K.D., Peterson, J.L., and McClarren, R.G. (2019). “Deep Neural Network Initialization with Decision Trees.” IEEE Transactions on Neural Networks and Learning Systems. []

In this paper, a novel, automated process for constructing and initializing deep feedforward neural networks based on decision trees is presented. The proposed algorithm maps a collection of decision trees trained on the data into a collection of initialized neural networks with the structures of the networks determined by the structures of the trees. The tree-informed initialization acts as a warm-start to the neural network training process, resulting in efficiently trained, accurate networks. These models, referred to as 'deep jointly informed neural networks' (DJINN), demonstrate high predictive performance for a variety of regression and classification data sets and display comparable performance to Bayesian hyperparameter optimization at a lower computational cost. By combining the user-friendly features of decision tree models with the flexibility and scalability of deep neural networks, DJINN is an attractive algorithm for training predictive models on a wide range of complex data sets.
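The structural part of the mapping can be sketched simply: the depth of a trained tree sets the number of hidden layers, and the node counts per depth set the layer widths, with input features carried forward through each layer. This is a simplification of DJINN's actual algorithm (which also maps split features to non-zero initial weights); the width rule below is an illustrative DJINN-like choice, not the paper's exact construction.

```python
# Hedged sketch of a DJINN-style tree -> network structure mapping:
# hidden layer l gets one unit per tree node at depth l, plus
# pass-through units carrying the input features forward. The root
# (depth 0) corresponds to the input layer.

def tree_to_layer_widths(n_features, tree_depth, nodes_per_depth):
    """Return [input_width, hidden_1, hidden_2, ...] for a single tree."""
    widths = [n_features]
    for d in range(1, tree_depth):
        widths.append(widths[-1] + nodes_per_depth[d])
    return widths

# e.g. a depth-4 tree over 3 features with 1, 2, 4, 3 nodes per depth
print(tree_to_layer_widths(3, 4, [1, 2, 4, 3]))
```

Initializing such a network from the tree's splits gives the warm start; ordinary backpropagation then refines it.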

Kafle, S., Gupta, V., Kailkhura, B., Wimalajeewa, T., and Varshney, P.K. (2019). “Joint Sparsity Pattern Recovery with 1-b Compressive Sensing in Distributed Sensor Networks.” IEEE Transactions on Signal and Information Processing over Networks. []

In this paper, we study the problem of joint sparse support recovery with 1-bit quantized compressive measurements in a distributed sensor network. Multiple nodes in the network are assumed to observe sparse signals having the same but unknown sparse support. Each node quantizes its measurement vector element-wise to 1 bit. First, we consider that all the quantized measurements are available at a central fusion center. We derive performance bounds for sparsity pattern recovery using 1-bit quantized measurements from multiple sensors when the maximum likelihood decoder is employed. We further develop two computationally tractable algorithms for joint sparse support recovery in the centralized setting. One algorithm minimizes a cost function defined as the sum of the likelihood function and the ℓ1,∞ quasi-norm, while the other extends the binary iterative hard thresholding algorithm to the multiple measurement vector case. Second, we consider a decentralized setting where each node transmits 1-bit measurements to its one-hop neighbors. The basic idea behind the algorithms developed in the decentralized setting is to embed collaboration among nodes and fusion strategies. We show that even with noisy 1-bit compressed measurements, joint support recovery can be carried out accurately in both centralized and decentralized settings. We further show that the performance of the proposed 1-bit compressive sensing-based algorithms is very close to that of their real-valued counterparts, except when the signal-to-noise ratio is very small.
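The building block the paper extends, binary iterative hard thresholding (BIHT), can be sketched for a single sensor: take a gradient step on the 1-bit sign-consistency residual, then keep only the k largest-magnitude entries. Problem sizes and the step normalization below are illustrative.

```python
# Single-sensor BIHT sketch in plain Python: y = sign(A x), iterate a
# gradient step on the residual y - sign(A x_hat), then hard-threshold
# to the k largest-magnitude entries.
import random

def sign(v):
    return [1.0 if x >= 0 else -1.0 for x in v]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def biht(A, y, k, iters=50):
    m, n = len(A), len(A[0])
    cols = list(zip(*A))
    x = [0.0] * n
    for _ in range(iters):
        r = [yi - si for yi, si in zip(y, sign(matvec(A, x)))]
        g = [sum(a * b for a, b in zip(col, r)) / m for col in cols]
        x = [xi + gi for xi, gi in zip(x, g)]
        keep = set(sorted(range(n), key=lambda i: -abs(x[i]))[:k])
        x = [xi if i in keep else 0.0 for i, xi in enumerate(x)]
    return x

random.seed(0)
n, m, k = 20, 60, 2
x_true = [0.0] * n
x_true[3], x_true[11] = 1.0, -0.7
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = sign(matvec(A, x_true))          # 1-bit measurements
x_hat = biht(A, y, k)
support = sorted(i for i, v in enumerate(x_hat) if v != 0.0)
print(support)
```

With enough measurements the recovered support typically matches the true one (here indices 3 and 11); magnitudes are only identifiable up to scale, since 1-bit measurements discard amplitude. The paper's MMV extension aggregates such evidence across sensors sharing one support.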

Kailkhura, B., Gallagher, B., Kim, S., Hiszpanski, A., and Han, T.Y.-J. (2019). “Reliable and Explainable Machine-Learning Methods for Accelerated Material Discovery.” npj Computational Materials. []

Despite ML’s impressive performance in commercial applications, several unique challenges exist when applying ML in materials science applications. In such a context, the contributions of this work are twofold. First, we identify common pitfalls of existing ML techniques when learning from underrepresented/imbalanced material data. Specifically, we show that with imbalanced data, standard methods for assessing quality of ML models break down and lead to misleading conclusions. Furthermore, we find that the model’s own confidence score cannot be trusted and model introspection methods (using simpler models) do not help as they result in loss of predictive performance (reliability-explainability trade-off). Second, to overcome these challenges, we propose a general-purpose explainable and reliable machine-learning framework. Specifically, we propose a generic pipeline that employs an ensemble of simpler models to reliably predict material properties. We also propose a transfer learning technique and show that the performance loss due to models’ simplicity can be overcome by exploiting correlations among different material properties. A new evaluation metric and a trust score to better quantify the confidence in the predictions are also proposed. To improve the interpretability, we add a rationale generator component to our framework which provides both model-level and decision-level explanations. Finally, we demonstrate the versatility of our technique on two applications: (1) predicting properties of crystalline compounds and (2) identifying potentially stable solar cell materials. We also point to some outstanding issues yet to be resolved for a successful application of ML in materials science.

Kim, S., Kim, H., Yoon, S., Lee, J., Kahou, S., Kashinath, K., and Prabhat, M. (2019). “Deep-Hurricane-Tracker: Tracking and Predicting Extreme Climate Events using ConvLSTM.” 2019 IEEE Winter Conference on Applications of Computer Vision. []

Tracking and predicting extreme events in large-scale spatio-temporal climate data are long-standing challenges in climate science. In this paper, we propose Convolutional LSTM (ConvLSTM)-based spatio-temporal models to track and predict hurricane trajectories from large-scale climate data, namely the pixel-level spatio-temporal history of tropical cyclones. To address the tracking problem, we model time-sequential density maps of hurricane trajectories, capturing not only the temporal dynamics but also the spatial distribution of the trajectories. Furthermore, we introduce a new trajectory prediction approach that casts prediction as sequential forecasting from past to future hurricane density map sequences. Extensive experiments on an actual 20-year record show that our ConvLSTM-based tracking model significantly outperforms existing approaches, and that the proposed forecasting model achieves a successful mapping from predicted density maps to the ground truth.
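The density-map representation the models consume can be sketched directly: rasterize track points for one time step onto a coarse grid and normalize, so a ConvLSTM sees a spatial distribution per step and learns its temporal evolution. Grid size and track points here are toy values, not the paper's data pipeline.

```python
# Build one time step of a normalized trajectory density map on a
# coarse grid; a sequence of such maps is what a ConvLSTM would ingest.

def density_map(points, height, width):
    """Rasterize (row, col) track points and normalize to unit mass."""
    grid = [[0.0] * width for _ in range(height)]
    for (r, c) in points:
        if 0 <= r < height and 0 <= c < width:
            grid[r][c] += 1.0
    total = sum(map(sum, grid))
    if total:
        grid = [[v / total for v in row] for row in grid]
    return grid

# one time step of a toy trajectory
step = density_map([(2, 3), (2, 4), (3, 4)], height=6, width=8)
print(sum(map(sum, step)))
```

Forecasting then becomes a sequence-to-sequence problem: given past density maps, predict the next ones.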

Leach, W., Henrikson, J., Hatarik, R., (…), Palmer, N., and Rever, M. (2019). “Using Convolutional Neural Networks to Classify Static X-ray Imager Diagnostic Data at the National Ignition Facility.” Proceedings of the International Society for Optical Engineering. []

Hohlraums convert the laser energy at the National Ignition Facility (NIF) into X-ray energy to compress and implode a fusion capsule, creating fusion. The Static X-ray Imager (SXI) diagnostic collects time-integrated images of hohlraum wall X-ray illumination patterns viewed through the laser entrance hole (LEH). NIF image processing algorithms calculate the size and location of the LEH opening from the SXI images. Images obtained come from different experimental categories and camera setups and occasionally do not contain applicable or usable information. Unexpected experimental noise in the data can also occur, in which case affected images should be removed and not run through the processing algorithms. Current approaches to identifying these types of images are manual and case-by-case, which can be prohibitively time-consuming. In addition, the diagnostic image data can be sparse (missing segments or pieces) and may lead to false analysis results. There exists, however, an abundant variety of image examples in the NIF database. Convolutional Neural Networks (CNNs) have been shown to work well with this type of data and under these conditions. The objective of this work was to apply transfer learning and fine-tune a pre-trained CNN using a relatively small-scale dataset (∼1500 images) to determine which instances contained useful image data. Experimental results show that CNNs can readily identify useful image data while filtering out undesirable images. The CNN filter is currently being used in production at the NIF.

Maiti, A. (2019). “Second-order Statistical Bootstrap for the Uncertainty Quantification of Time-temperature-superposition Analysis.” Rheologica Acta. []

Time-temperature superposition (TTS), which for decades has been a powerful method for long-term prediction from accelerated aging data, involves rigid-shifting isotherms in logarithmic time to produce a single master prediction curve. For simple thermo-rheological properties that accurately follow the TTS principle, the shifts can be easily determined, even manually by eye. However, for many properties of interest, where the principle is obeyed only approximately or the data are noisy, it is imperative to develop objective shifting techniques along with reliable uncertainty bounds. This work analyzes in detail the method of arclength minimization as an unsupervised algorithm for determining optimum shifts and demonstrates that the method is nearly unbiased for all practical datasets with a variety of noise distributions. Moreover, if averaged over with-replacement (bootstrap) resamples, the predicted shifts follow a normal distribution, a fact that can be used to construct confidence intervals for the master curve through a second-order bootstrap procedure.
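The arclength-minimization idea can be sketched for two isotherms: slide the second isotherm along log-time and pick the shift that minimizes the arclength of the merged point set, since a correctly shifted isotherm extends the master curve smoothly while a wrong shift makes the merged polyline zigzag. A coarse grid search stands in for a proper optimizer; the data are synthetic, with a known true shift of 1.0 decades on a toy sigmoidal master curve.

```python
# Toy arclength-minimizing shift selection for TTS. Two synthetic
# isotherms sampled from the same master curve, offset by 1.0 decades;
# the correct shift minimizes the merged polyline's arclength.

def arclength(points):
    pts = sorted(points)
    return sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

def best_shift(iso_ref, iso_move, candidates):
    def merged_length(s):
        shifted = [(logt + s, y) for (logt, y) in iso_move]
        return arclength(iso_ref + shifted)
    return min(candidates, key=merged_length)

master = lambda logt: 1.0 / (1.0 + 10 ** (-logt))      # toy master curve
iso1 = [(i / 10, master(i / 10)) for i in range(-20, 1)]
iso2 = [(i / 10, master(i / 10 + 1.0)) for i in range(-10, 11)]  # true shift 1.0
print(best_shift(iso1, iso2, [0.0, 0.5, 1.0, 1.5, 2.0]))
```

The paper's second-order bootstrap repeats such a fit over with-replacement resamples of the data to obtain a distribution of shifts and, from it, confidence intervals on the master curve.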

Narayanaswamy, V.S., Thiagarajan, J.J., Song, H., and Spanias, A. (2019). “Designing an Effective Metric Learning Pipeline for Speaker Diarization.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. []

State-of-the-art speaker diarization systems utilize knowledge from external data, in the form of a pre-trained distance metric, to effectively determine relative speaker identities for unseen data. However, much of the recent focus has been on choosing the appropriate feature extractor, ranging from pre-trained i-vectors to representations learned via different sequence modeling architectures (e.g., 1D-CNNs, LSTMs, attention models), while adopting off-the-shelf metric learning solutions. In this paper, we argue that, regardless of the feature extractor, it is crucial to carefully design a metric learning pipeline, namely the loss function, the sampling strategy, and the discriminative margin parameter, for building robust diarization systems. Furthermore, we propose to adopt a fine-grained validation process to obtain a comprehensive evaluation of the generalization power of metric learning pipelines. To this end, we measure diarization performance across different language speakers and variations in the number of speakers in a recording. Using empirical studies, we provide interesting insights into the effectiveness of different design choices and make recommendations.

Nathan, E., Sanders, G., and Henson, V.E. (2019). “Personalized Ranking in Dynamic Graphs Using Nonbacktracking Walks.” Lecture Notes in Computer Science, including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. []

Centrality has long been studied as a method of identifying node importance in networks. In this paper we study a variant of several walk-based centrality metrics based on the notion of a nonbacktracking walk, where the pattern i → j → i is forbidden. Specifically, we focus our analysis on dynamic graphs, where the underlying data stream from which the network is drawn is constantly changing. Efficient algorithms for calculating nonbacktracking walk centrality scores in static and dynamic graphs are provided, and experiments on graphs with several million vertices and edges are conducted. For the static algorithm, comparisons to a traditional linear-algebraic method of calculating scores show that our algorithm produces scores of high accuracy within a theoretically guaranteed bound. Comparisons of our dynamic algorithm to the static one show speedups of several orders of magnitude as well as a significant reduction in the space required.
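The forbidden pattern can be enforced directly in the directed-edge ("Hashimoto") view of the graph: a walk may step from edge (u, v) to edge (v, w) only if w ≠ u. Summing length-L nonbacktracking walk counts through each vertex gives a simple walk-based score; this brute-force sketch is for intuition only, while the paper's algorithms compute related quantities far more efficiently at scale.

```python
# Count nonbacktracking walks of a given length ending at each vertex,
# by dynamic programming over directed edges: edge (u, v) may extend to
# (v, w) only when w != u, which forbids the pattern i -> j -> i.
from collections import defaultdict

def nonbacktracking_walk_counts(edges, length):
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    counts = {e: 1 for e in directed}        # length-1 walks per directed edge
    for _ in range(length - 1):
        nxt = defaultdict(int)
        for (u, v), c in counts.items():
            for (a, b) in directed:
                if a == v and b != u:        # extend without backtracking
                    nxt[(a, b)] += c
        counts = nxt
    per_vertex = defaultdict(int)
    for (u, v), c in counts.items():
        per_vertex[v] += c
    return dict(per_vertex)

# triangle 0-1-2 plus a pendant vertex 3 attached to 2
print(nonbacktracking_walk_counts([(0, 1), (1, 2), (2, 0), (2, 3)], 3))
```

Note that the pendant vertex contributes fewer nonbacktracking walks than it would ordinary walks, since a walk entering a degree-one vertex cannot continue, which is part of why nonbacktracking scores behave differently from classic walk centralities.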

Petersen, B.K., Yang, J., Grathwohl, W.S., (…), An, G., and Faissol, D.M. (2019). “Deep Reinforcement Learning and Simulation as a Path toward Precision Medicine.” Journal of Computational Biology. []

Traditionally, precision medicine involves classifying patients to identify subpopulations that respond favorably to specific therapeutics. We pose precision medicine as a dynamic feedback control problem, where treatment administered to a patient is guided by measurements taken during the course of treatment. We consider sepsis, a life-threatening condition in which dysregulation of the immune system causes tissue damage. We leverage an existing simulation of the innate immune response to infection and apply deep reinforcement learning (DRL) to discover an adaptive personalized treatment policy that specifies effective multicytokine therapy to simulated sepsis patients based on systemic measurements. The learned policy achieves a dramatic reduction in mortality rate over a set of 500 simulated patients relative to standalone antibiotic therapy. Advantages of our approach are threefold: (1) the use of simulation allows exploring therapeutic strategies beyond clinical practice and available data, (2) advances in DRL accommodate learning complex therapeutic strategies for complex biological systems, and (3) optimized treatments respond to a patient's individual disease progression over time, therefore, capturing both differences across patients and the inherent randomness of disease progression within a single patient. We hope that this work motivates both considering adaptive personalized multicytokine mediation therapy for sepsis and exploiting simulation with DRL for precision medicine more broadly.

Reza, T., Ripeanu, M., Tripoul, N., Sanders, G., and Pearce, R. (2019). “PruneJuice: Pruning Trillion-edge Graphs to a Precise Pattern-matching Solution.” Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. []

Pattern matching is a powerful graph analysis tool. Unfortunately, existing solutions have limited scalability, support only a limited set of search patterns, and/or focus on only a subset of the real-world problems associated with pattern matching. This paper presents a new algorithmic pipeline that: (i) enables highly scalable pattern matching on labeled graphs, (ii) supports arbitrary patterns, (iii) enables trade-offs between precision and time-to-solution (while always selecting all vertices and edges that participate in matches, thus offering 100% recall), and (iv) supports a set of popular data analytics scenarios. We implement our approach on top of HavoqGT and demonstrate its advantages through strong and weak scaling experiments on massive-scale real-world (up to 257 billion edges) and synthetic (up to 4.4 trillion edges) graphs, respectively, and at scales (1,024 nodes / 36,864 cores) orders of magnitude larger than used in the past for similar problems.

Roberts, R. S., Goforth, J.W., Weinert, G.F., (…), Stinson, B.J., and Duncan, A.M. (2019). “Automated Annotation of Satellite Imagery using Model-based Projections.” Proceedings of the Applied Imagery Pattern Recognition Workshop. []

GeoVisipedia is a novel approach to annotating satellite imagery. It uses wiki pages to annotate objects rather than simple labels. The use of wiki pages to contain annotations is particularly useful for annotating objects in imagery of complex geospatial configurations such as industrial facilities. GeoVisipedia uses the PRISM algorithm to project annotations applied to one image to other imagery, hence enabling ubiquitous annotation. This paper derives the PRISM algorithm, which uses image metadata and a 3D facility model to create a view matrix unique to each image. The view matrix is used to project model components onto a mask which aligns the components with the objects in the scene that they represent. Wiki pages are linked to model components, which are in turn linked to the image via the component mask. An illustration of the efficacy of the PRISM algorithm is provided, demonstrating the projection of model components onto an effluent stack. We conclude with a discussion of the efficiencies of GeoVisipedia over manual annotation, and the use of PRISM for creating training sets for machine learning algorithms.

Shukla, R., Lipasti, M., Van Essen, B., Moody, A., and Maruyama, N. (2019). “Remodel: Rethinking Deep CNN Models to Detect and Count on a Neurosynaptic System.” Frontiers in Neuroscience. []

In this work, we analyze the detection and counting of cars using a low-power IBM TrueNorth Neurosynaptic System. For our evaluation we looked at a publicly available dataset that has overhead imagery of cars with context present in the image. The trained neural network for image analysis was deployed on the NS16e system using IBM's EEDN training framework. Through multiple experiments we identify the architectural bottlenecks in the TrueNorth system that prevent the deployment of large neural network structures. Following these experiments, we propose changes to the CNN model to circumvent these bottlenecks. The results of these evaluations were compared with Caffe-based implementations of standard neural networks deployed on a Titan-X GPU. Results showed that TrueNorth can detect cars from the dataset with 97.60% accuracy and can count the number of cars in the image with 69.04% accuracy. The car detection accuracy and car count (±2 error margin) accuracy are comparable to high-precision neural networks like AlexNet, GoogLeNet, and ResCeption, but show a manifold improvement in power consumption.

Thiagarajan, J.J., Anirudh, R., Sridhar, R., and Bremer, P.-T. (2019). “Unsupervised Dimension Selection Using a Blue Noise Graph Spectrum.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. []

Unsupervised dimension selection is an important problem that seeks to reduce the dimensionality of data while preserving its most useful characteristics. While dimensionality reduction is commonly utilized to construct low-dimensional embeddings, the resulting feature spaces are hard to interpret. Further, in applications such as sensor design, one needs to perform reduction directly in the input domain, instead of constructing transformed spaces. Consequently, dimension selection (DS) aims to solve the combinatorial problem of identifying the top-k dimensions, which is required for effective experiment design, reducing data while keeping it interpretable, and designing better sensing mechanisms. In this paper, we develop a novel approach for DS based on graph signal analysis to measure feature influence. By analyzing synthetic graph signals with a blue noise spectrum, we show that we can measure the importance of each dimension. Using experiments in supervised learning and image masking, we demonstrate the superiority of the proposed approach over existing techniques in capturing crucial characteristics of high-dimensional spaces, using only a small subset of the original features.

Thiagarajan, J.J., Kim, I., Anirudh, R., and Bremer, P.-T. (2019). “Understanding Deep Neural Networks through Input Uncertainties.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. []

Techniques for understanding the functioning of complex machine learning models are becoming increasingly popular, not only to improve the validation process, but also to extract new insights about the data via exploratory analysis. Though a large class of such tools currently exists, most assume that predictions are point estimates and use a sensitivity analysis of these estimates to interpret the model. Using lightweight probabilistic networks we show how including prediction uncertainties in the sensitivity analysis leads to: (i) more robust and generalizable models; and (ii) a new approach for model interpretation through uncertainty decomposition. In particular, we introduce a new regularization that takes both the mean and variance of a prediction into account and demonstrate that the resulting networks provide improved generalization to unseen data. Furthermore, we propose a new technique to explain prediction uncertainties through uncertainties in the input domain, thus providing new ways to validate and interpret deep learning models.

Thiagarajan, J., Rajan, D., and Sattigeri, P. (2019). “Understanding Behavior of Clinical Models under Domain Shifts.” 2019 KDD Workshop on Applied Data Science for Healthcare. []

The hypothesis that computational models can be reliable enough to be adopted in prognosis and patient care is revolutionizing healthcare. Deep learning, in particular, has been a game changer in building predictive models, thus leading to community-wide data curation efforts. However, due to inherent variabilities in population characteristics and biological systems, these models are often biased to the training datasets. This can be limiting when models are deployed in new environments, where there are systematic domain shifts not known a priori. In this paper, we propose to emulate a large class of domain shifts that can occur in clinical settings with a given dataset, and argue that evaluating the behavior of predictive models in light of those shifts is an effective way to quantify their reliability. More specifically, we develop an approach for building realistic scenarios, based on an analysis of disease landscapes in multi-label classification. Using the openly available MIMIC-III EHR dataset for phenotyping, our work, for the first time, sheds light on data regimes where deep clinical models can fail to generalize. This work emphasizes the need for novel validation mechanisms driven by real-world domain shifts in AI for healthcare.

Thopalli, K., Anirudh, R., Thiagarajan, J.J., and Turaga, P. (2019). “Multiple Subspace Alignment Improves Domain Adaptation.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. []

We present a novel unsupervised domain adaptation (DA) method for cross-domain visual recognition. Though subspace methods have found success in DA, their performance is often limited due to the assumption of approximating an entire dataset using a single low-dimensional subspace. Instead, we develop a method to effectively represent the source and target datasets via a collection of low-dimensional subspaces, and subsequently align them by exploiting the natural geometry of the space of subspaces, on the Grassmann manifold. We demonstrate the effectiveness of this approach using empirical studies on two widely used benchmarks, with performance on par with or better than that of state-of-the-art domain adaptation methods.
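
The single-subspace baseline that this work generalizes can be sketched in a few lines. Below is a minimal, hypothetical NumPy illustration of classical subspace alignment (PCA bases plus a closed-form alignment matrix), not the authors' multi-subspace method; all data shapes and dimensions are made up.

```python
import numpy as np

def pca_basis(X, d):
    """Top-d principal directions (columns) of the centered data X (n x p)."""
    Xc = X - X.mean(axis=0)
    # SVD of centered data; rows of Vt are principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T  # p x d orthonormal basis

def align_subspaces(Xs, Xt, d):
    """Classical subspace alignment: map the source basis onto the target basis."""
    Bs, Bt = pca_basis(Xs, d), pca_basis(Xt, d)
    M = Bs.T @ Bt                      # d x d alignment matrix (closed form)
    Zs = (Xs - Xs.mean(0)) @ Bs @ M    # source data in aligned coordinates
    Zt = (Xt - Xt.mean(0)) @ Bt        # target data in its own subspace
    return Zs, Zt

rng = np.random.default_rng(0)
Xs = rng.normal(size=(100, 5))                                  # toy "source" data
Xt = rng.normal(size=(80, 5)) @ np.diag([3, 1, 1, 0.5, 0.2])    # shifted "target" data
Zs, Zt = align_subspaces(Xs, Xt, d=2)
print(Zs.shape, Zt.shape)  # (100, 2) (80, 2)
```

The paper's contribution replaces the single pair (Bs, Bt) with collections of local subspaces aligned on the Grassmann manifold.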

Tran, K., Panahi, A., Adiga, A., Sakla, W., and Krim, H. (2019). “Nonlinear Multi-scale Super-resolution Using Deep Learning.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. []

We propose a deep learning architecture capable of performing up to 8× single image super-resolution. Our architecture incorporates an adversarial component from the super-resolution generative adversarial networks (SRGANs) and a multi-scale learning component from the multiple scale super-resolution network (MSSRNet), which together can recover the smaller structures inherent in satellite images. To further enhance our performance, we integrate progressive growing and training into our network. This, aided by feed-forward connections in the network that move along and enrich information from previous inputs, produces super-resolved images at scaling factors of 2, 4, and 8. To ensure and enhance the stability of GAN training, we employ Wasserstein GANs (WGANs). Experimentally, we find that our architecture can recover small objects in satellite images during super-resolution, whereas previous methods cannot.

Tripoul, N., Halawa, H., Reza, T., (…), Pearce, R., and Ripeanu, M. (2019). “There Are Trillions of Little Forks in the Road. Choose Wisely! Estimating the Cost and Likelihood of Success of Constrained Walks to Optimize a Graph Pruning Pipeline.” Proceedings of IA3 2018: 8th Workshop on Irregular Applications: Architectures and Algorithms, and the International Conference for High Performance Computing, Networking, Storage and Analysis. []

We have developed [Reza et al. SC'18] a highly scalable algorithmic pipeline for pattern matching in labeled graphs and demonstrated it on trillion-edge graphs. This pipeline: (i) supports arbitrary search patterns, (ii) identifies all the vertices and edges that participate in matches - offering 100% precision and recall, and (iii) supports realistic data analytics scenarios. The pipeline is based on graph pruning: it decomposes the search template into individual constraints and uses them to repeatedly prune the graph to a final solution. Our current solution, however, makes a number of ad hoc, intuition-based decisions that impact performance. In a nutshell, these relate to: (i) constraint selection - which constraints to generate? (ii) constraint ordering - in which order to use them? and (iii) individual constraint generation - how to best verify them? This position paper makes the observation that by estimating the runtime cost and likelihood of success of a constrained walk in a labeled graph, one can inform these optimization decisions. We propose a preliminary solution to make these estimates, and demonstrate - using a prototype shared-memory implementation - that this: (i) is feasible with low overheads, and (ii) offers accurate enough information to optimize our pruning pipeline by a significant margin.

Veldt, N., Klymko, C., and Gleich, D.F. (2019). “Flow-based Local Graph Clustering with Better Seed Set Inclusion.” SIAM International Conference on Data Mining. []

Flow-based methods for local graph clustering have received significant recent attention for their theoretical cut improvement and runtime guarantees. In this work we present two improvements for using flow-based methods in real-world semi-supervised clustering problems. Our first contribution is a generalized objective function that allows practitioners to place strict and soft penalties on excluding specific seed nodes from the output set. This feature allows us to avoid the tendency, often exhibited by previous flow-based methods, to contract a large seed set into a small set of nodes that does not contain all or even most of the seed nodes. Our second contribution is a fast algorithm for minimizing our generalized objective function, based on a variant of the push-relabel algorithm for computing preflows. We make our approach very fast in practice by implementing a global relabeling heuristic and employing a warm-start procedure to quickly solve related cut problems. In practice our algorithm is faster than previous related flow-based methods, and is also more robust in detecting ground truth target regions in a graph thanks to its ability to better incorporate semi-supervised information about target clusters.
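
The seed-penalty construction can be illustrated with a toy minimum cut. The sketch below is hypothetical and uses a plain Edmonds-Karp max-flow rather than the paper's push-relabel variant: a super-source is wired to seed nodes with large capacities (a strict penalty against excluding them) and every other node pays a small capacity to the sink (a soft penalty for including it), so the source side of the min cut is the local cluster.

```python
from collections import deque

def min_cut_source_side(capacity, s, t):
    """Edmonds-Karp max-flow; returns the nodes on the source side of a min s-t cut.
    capacity: dict {u: {v: cap}} (directed; add both directions for undirected edges)."""
    res = {u: dict(nbrs) for u, nbrs in capacity.items()}   # residual capacities
    for u in capacity:
        for v in capacity[u]:
            res.setdefault(v, {}).setdefault(u, 0)          # reverse residual edges
    while True:
        parent = {s: None}                                  # BFS for an augmenting path
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            break
        path, v = [], t                                     # walk the path back to s
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(res[u][v] for u, v in path)               # bottleneck capacity
        for u, v in path:
            res[u][v] -= aug
            res[v][u] += aug
    seen, q = {s}, deque([s])                               # residual-reachable = source side
    while q:
        u = q.popleft()
        for v, c in res[u].items():
            if c > 0 and v not in seen:
                seen.add(v)
                q.append(v)
    return seen

# toy graph: two triangles joined by one weak edge; seed set {0, 1}
edges = [(0, 1, 3), (1, 2, 3), (0, 2, 3), (2, 3, 1), (3, 4, 3), (4, 5, 3), (3, 5, 3)]
cap = {}
for u, v, w in edges:
    cap.setdefault(u, {})[v] = w
    cap.setdefault(v, {})[u] = w
S, T, seeds = "s", "t", {0, 1}
for n in range(6):
    if n in seeds:
        cap.setdefault(S, {})[n] = 100   # strict penalty: seeds stay inside
    else:
        cap.setdefault(n, {})[T] = 1     # soft penalty for including non-seeds
cluster = min_cut_source_side(cap, S, T) - {S}
print(sorted(cluster))  # → [0, 1, 2]
```

The weak 2-3 bridge is cut, so the cluster keeps both seeds plus their tightly connected neighbor.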

White, D.A., Arrighi, W.J., Kudo, J., and Watts, S.E. (2019). “Multiscale Topology Optimization Using Neural Network Surrogate Models.” Computer Methods in Applied Mechanics and Engineering. []

We are concerned with the optimization of macroscale elastic structures that are designed utilizing spatially varying microscale metamaterials. The macroscale optimization is accomplished using gradient-based nonlinear topology optimization. But instead of using density as the optimization decision variable, the decision variables are the multiple parameters that define the local microscale metamaterial. This is accomplished using single-layer feedforward Gaussian basis function networks as surrogate models of the elastic response of the microscale metamaterial. The surrogate models are trained using highly resolved continuum finite element simulations of the microscale metamaterials and hence are significantly more accurate than analytical models, e.g., classical beam theory. Because the derivative of the surrogate model is important for sensitivity analysis in the macroscale topology optimization, a neural network training procedure based on the Sobolev norm is described. Since the SIMP method is not appropriate for spatially varying lattices, an alternative method is developed to enable the creation of void regions. The efficacy of this approach is demonstrated via several examples in which the optimal graded metamaterial outperforms a traditional solid structure.
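
A single-layer Gaussian basis function surrogate, and the analytic gradient that makes it useful for sensitivity analysis, can be sketched directly. The example below is a minimal NumPy illustration with made-up data and ordinary least-squares training; the paper's Sobolev-norm training, which also fits derivatives, is not reproduced here.

```python
import numpy as np

def gaussian_features(X, centers, width):
    """Phi[i, j] = exp(-||x_i - c_j||^2 / (2 * width^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_surrogate(X, y, centers, width, reg=1e-8):
    """Least-squares output weights of a single-layer Gaussian basis function network."""
    Phi = gaussian_features(X, centers, width)
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(len(centers)), Phi.T @ y)

def predict(X, w, centers, width):
    return gaussian_features(X, centers, width) @ w

def surrogate_grad(x, w, centers, width):
    """Analytic gradient at a point x: the sensitivity the macroscale optimizer needs."""
    phi = gaussian_features(x[None, :], centers, width)[0]
    return ((centers - x) / width ** 2 * (w * phi)[:, None]).sum(axis=0)

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))      # stand-in for metamaterial parameters
y = np.sin(3 * X[:, 0]) * X[:, 1]          # stand-in for the microscale elastic response
centers = rng.uniform(-1, 1, size=(40, 2))
w = fit_surrogate(X, y, centers, width=0.4)

# verify the analytic gradient against central finite differences
x0, h = np.array([0.1, -0.2]), 1e-5
fd = np.array([(predict((x0 + h * e)[None, :], w, centers, 0.4)
                - predict((x0 - h * e)[None, :], w, centers, 0.4))[0] / (2 * h)
               for e in np.eye(2)])
print(np.allclose(surrogate_grad(x0, w, centers, 0.4), fd, atol=1e-5))  # → True
```

Having the gradient in closed form is what lets the surrogate plug into gradient-based topology optimization without finite differencing the expensive finite element model.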

Yuan, B., Giera, B., Guss, G., Matthews, M., and McMains, S. (2019). “Semi-supervised Convolutional Neural Networks for in-situ Video Monitoring of Selective Laser Melting.” IEEE Winter Conference on Applications of Computer Vision. []

Selective Laser Melting (SLM) is a metal additive manufacturing technique. The lack of SLM process repeatability is a barrier for industrial progression. SLM product quality is hard to control, even when using fixed system settings. Thus SLM could benefit from a monitoring system that provides quality assessments in real-time. Since there is no publicly available SLM dataset, we ran experiments to collect over one thousand SLM videos, measured the physical output via height map images, and applied a proposed image processing algorithm to them to produce a dataset for semi-supervised learning. Then we trained convolutional neural networks (CNNs) to recognize desired quality metrics from videos. Experimental results demonstrate the effectiveness of our proposed monitoring approach and also show that the semi-supervised model can mitigate the time and expense of labeling an entire SLM dataset.


Anirudh, R., Kim, H., Thiagarajan, J. J., Mohan, K. A., Champley, K. and Bremer, P.T. (2018). “Lose the Views: Limited Angle CT Reconstruction via Implicit Sinogram Completion.” Conference on Computer Vision and Pattern Recognition. []

Computed Tomography (CT) reconstruction is a fundamental component of a wide variety of applications, ranging from security to healthcare. The classical techniques require measuring projections, called sinograms, from a full 180° view of the object. This is impractical in a limited-angle scenario, when the viewing angle is less than 180°, which can occur due to different factors, including restrictions on scanning time, limited flexibility of scanner rotation, etc. The sinograms obtained as a result cause existing techniques to produce highly artifact-laden reconstructions. In this paper, we propose to address this problem through implicit sinogram completion, on a challenging real-world dataset containing scans of common checked-in luggage. We propose a system, consisting of 1D and 2D convolutional neural networks, that operates on a limited-angle sinogram to directly produce the best estimate of a reconstruction. Next, we use the x-ray transform on this reconstruction to obtain a “completed” sinogram, as if it came from a full 180° measurement. We feed this to standard analytical and iterative reconstruction techniques to obtain the final reconstruction. We show with extensive experimentation that this combined strategy outperforms many competitive baselines. We also propose a measure of confidence for the reconstruction that enables a practitioner to gauge the reliability of a prediction made by our network. We show that this measure is a strong indicator of quality as measured by the PSNR, while not requiring ground truth at test time. Finally, using a segmentation experiment, we show that our reconstruction preserves the 3D structure of objects effectively.

Kamath, C. and Fan, Y.J. (2018). “Compressing Unstructured Mesh Data Using Spline Fits, Compressed Sensing, and Regression Methods.” IEEE Global Conference on Signal and Information Processing. []

Compressing unstructured mesh data from computer simulations poses several challenges that are not encountered in the compression of images or videos. Since the spatial locations of the points are not on a regular grid, as in an image, it is difficult to identify near neighbors of a point whose values can be exploited for compression. In this paper, we investigate how three very different methods—spline fits, compressed sensing, and kernel regression—compare in terms of the reconstruction accuracy and reduction in data size when applied to a practical problem from a plasma physics simulation.
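
For the spline-fit branch, the storage trade-off is easy to demonstrate in one dimension: a smoothing spline stores only knots and coefficients instead of every sample. The following sketch uses SciPy's smoothing spline on made-up data; the smoothing factor s is a hypothetical choice, not a setting from the paper.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# 1-D toy "simulation field": 2000 samples of a smooth signal plus mild noise
x = np.linspace(0.0, 1.0, 2000)
rng = np.random.default_rng(2)
y = np.sin(6 * np.pi * x) * np.exp(-2 * x) + 0.001 * rng.normal(size=x.size)

# The smoothing factor s trades reconstruction error against the number of
# knots kept; s ~ n * sigma^2 targets residuals at the noise level.
spl = UnivariateSpline(x, y, k=3, s=x.size * 0.001 ** 2)
stored = spl.get_knots().size + spl.get_coeffs().size
err = np.abs(spl(x) - y).max()
print(f"store {stored} values instead of {x.size}, max error {err:.4f}")
```

Unstructured meshes lack this natural 1-D ordering, which is exactly the difficulty the paper addresses when comparing spline fits against compressed sensing and kernel regression.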

Kamath, C. and Fan, Y. (2018). "Regression with small data sets: A case study using code surrogates in additive manufacturing." Knowledge and Information Systems: An International Journal. []

There has been an increasing interest in recent years in the mining of massive data sets whose sizes are measured in terabytes. However, there are some problems where collecting even a single data point is very expensive, resulting in data sets with only tens or hundreds of samples. One such problem is that of building code surrogates, where a computer simulation is run using many different values of the input parameters and a regression model is built to relate the outputs of the simulation to the inputs. A good surrogate can be very useful in sensitivity analysis, uncertainty analysis, and in designing experiments, but the cost of running expensive simulations at many sample points can be high. In this paper, we use a problem from the domain of additive manufacturing to show that even with small data sets, we can build good-quality surrogates by appropriately selecting the input samples and the regression algorithm. Our work is broadly applicable to simulations in other domains, and the ideas proposed can be used in time-constrained machine learning tasks, such as hyper-parameter optimization.
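
Selecting a regression algorithm with only tens of samples typically leans on leave-one-out cross-validation, which kernel ridge regression admits in closed form. The sketch below is a hypothetical NumPy illustration on made-up data, not the paper's additive manufacturing surrogate.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def loo_error(K, y, lam):
    """Closed-form leave-one-out error for kernel ridge regression:
    e_i = (y_i - yhat_i) / (1 - H_ii), with hat matrix H = K (K + lam I)^-1."""
    n = len(y)
    H = K @ np.linalg.inv(K + lam * np.eye(n))
    resid = y - H @ y
    return np.mean((resid / (1 - np.diag(H))) ** 2)

rng = np.random.default_rng(3)
X = rng.uniform(size=(40, 2))            # only 40 "simulation runs"
y = np.sin(4 * X[:, 0]) + X[:, 1] ** 2   # stand-in for a simulation output
K = rbf_kernel(X, X, gamma=5.0)
lams = [1e-6, 1e-4, 1e-2, 1.0]
best = min(lams, key=lambda l: loo_error(K, y, l))
print("selected lambda:", best)
```

Because the LOO score comes from a single matrix inverse rather than 40 refits, the whole model-selection loop stays cheap even when every data point was expensive to acquire.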

Lin, Y., Wang, S., Thiagarajan, J. J., Guthrie, G. and Coblentz, D. (2018). "Efficient Data-Driven Geologic Feature Characterization from Pre-stack Seismic Measurements using Randomized Machine-Learning Algorithm." Geophysical Journal International. []

Conventional seismic techniques for detecting subsurface geologic features are challenged by limited data coverage, computational inefficiency, and subjective human factors. We developed a novel data-driven geological feature characterization approach based on pre-stack seismic measurements. Our characterization method employs an efficient and accurate machine-learning method to extract useful subsurface geologic features automatically. Specifically, our method is based on the kernel ridge regression model. Conventional kernel ridge regression can be computationally prohibitive because of the large volume of seismic measurements. We employ a data reduction technique in combination with the conventional kernel ridge regression method to improve the computational efficiency and reduce memory usage. In particular, we utilize a randomized numerical linear algebra technique, known as the Nyström method, to effectively reduce the dimensionality of the feature space without compromising the information content required for accurate characterization. We provide a thorough computational cost analysis to show the efficiency of our new geological feature characterization methods. We further validate the performance of our new subsurface geologic feature characterization method using synthetic surface seismic data for 2D acoustic and elastic velocity models. Our numerical examples demonstrate that our new characterization method significantly improves the computational efficiency while maintaining comparable accuracy. Interestingly, we show that our method yields a speed-up ratio on the order of ∼10² to ∼10³ in a multi-core computational environment.
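
The Nyström idea, approximating a large kernel matrix from a small set of landmark points so that ridge regression runs in a reduced feature space, can be sketched compactly. The NumPy example below uses made-up 1-D data and hypothetical parameter choices, not the paper's seismic pipeline.

```python
import numpy as np

def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_features(X, landmarks, gamma):
    """Features whose inner products approximate the full RBF kernel matrix."""
    W = rbf(landmarks, landmarks, gamma)        # small m x m landmark kernel
    evals, evecs = np.linalg.eigh(W)
    keep = evals > 1e-8 * evals.max()           # discard near-null directions
    U = evecs[:, keep] * evals[keep] ** -0.5    # columns scaled by 1/sqrt(eigenvalue)
    return rbf(X, landmarks, gamma) @ U         # n x m' with m' <= m << n

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(1000, 1))          # stand-in for seismic measurements
y = np.sin(3 * X[:, 0])                         # stand-in for a geologic feature label
landmarks = X[rng.choice(len(X), size=50, replace=False)]

# Ridge regression in the reduced Nystrom space: O(n m^2) instead of O(n^3)
Phi = nystrom_features(X, landmarks, gamma=2.0)
w = np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(Phi.shape[1]), Phi.T @ y)
err = np.abs(Phi @ w - y).max()
print(f"features: {Phi.shape}, max training error {err:.4f}")
```

The cost shift from cubic in the number of samples to quadratic in the (much smaller) number of landmarks is the source of the reported speed-ups.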

Liu, S., Bremer, P.T., Thiagarajan, J. J., Srikumar, V., Wang, B., Livnat, Y. and Pascucci, V. (2018). "Visual Exploration of Semantic Relationships in Neural Word Embeddings." IEEE Transactions on Visualization and Computer Graphics. []

Constructing distributed representations for words through neural language models and using the resulting vector spaces for analysis has become a crucial component of natural language processing (NLP). However, despite their widespread application, little is known about the structure and properties of these spaces. To gain insights into the relationship between words, the NLP community has begun to adapt high-dimensional visualization techniques. In particular, researchers commonly use t-distributed stochastic neighbor embeddings (t-SNE) and principal component analysis (PCA) to create two-dimensional embeddings for assessing the overall structure and exploring linear relationships (e.g., word analogies), respectively. Unfortunately, these techniques often produce mediocre or even misleading results and cannot address domain-specific visualization challenges that are crucial for understanding semantic relationships in word embeddings. Here, we introduce new embedding techniques for visualizing semantic and syntactic analogies, and the corresponding tests to determine whether the resulting views capture salient structures. Additionally, we introduce two novel views for a comprehensive study of analogy relationships. Finally, we augment t-SNE embeddings to convey uncertainty information in order to allow a reliable interpretation. Combined, the different views address a number of domain-specific tasks difficult to solve with existing tools.

Mundhenk, T. N., Ho, D., Chen, B. Y. (2018). "Improvements to context based self-supervised learning." Conference on Computer Vision and Pattern Recognition. []

We develop a set of methods to improve on the results of self-supervised learning using context, starting from a baseline of patch-based arrangement context learning. Our methods address some overt problems, such as chromatic aberration, as well as other potential problems, such as spatial skew and mid-level feature neglect. We prevent problems with testing generalization on common self-supervised benchmark tests by using different datasets during our development. The results of our methods combined yield top scores on all standard self-supervised benchmarks, including classification and detection on PASCAL VOC 2007, segmentation on PASCAL VOC 2012, and "linear tests" on the ImageNet and CSAIL Places datasets. We obtain an improvement over our baseline method of between 4.0 and 7.1 percentage points on transfer learning classification tests. We also show results on different standard network architectures to demonstrate generalization as well as portability.

Rajan, D., and Thiagarajan, J.J. (2018). “A Generative Modeling Approach to Limited Channel ECG Classification.” Conference proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society. []

Processing temporal sequences is central to a variety of applications in health care, and in particular multi-channel Electrocardiogram (ECG) is a highly prevalent diagnostic modality that relies on robust sequence modeling. While Recurrent Neural Networks (RNNs) have led to significant advances in automated diagnosis with time-series data, they perform poorly when models are trained using a limited set of channels. A crucial limitation of existing solutions is that they rely solely on discriminative models, which tend to generalize poorly in such scenarios. In order to combat this limitation, we develop a generative modeling approach to limited channel ECG classification. This approach first uses a Seq2Seq model to implicitly generate the missing channel information, and then uses the latent representation to perform the actual supervisory task. This decoupling enables the use of unsupervised data and also provides highly robust metric spaces for subsequent discriminative learning. Our experiments with the Physionet dataset clearly evidence the effectiveness of our approach over standard RNNs in disease prediction.

Song H., Rajan D., Thiagarajan, J. J. and Spanias, A. (2018). "Attend and Diagnose: Clinical Time Series Analysis using Attention Models." AAAI Conference. []

With the widespread adoption of electronic health records, there is an increased emphasis on predictive models that can effectively deal with clinical time-series data. Powered by Recurrent Neural Network (RNN) architectures with Long Short-Term Memory (LSTM) units, deep neural networks have achieved state-of-the-art results in several clinical prediction tasks. Despite the success of RNNs, their sequential nature prohibits parallelized computation, making them inefficient, particularly when processing long sequences. Recently, architectures based solely on attention mechanisms have shown remarkable success in transduction tasks in NLP, while being computationally superior. In this paper, for the first time, we utilize attention models for clinical time-series modeling, thereby dispensing with recurrence entirely. We develop the SAnD (Simply Attend and Diagnose) architecture, which employs a masked self-attention mechanism and uses positional encoding and dense interpolation strategies for incorporating temporal order. Furthermore, we develop a multi-task variant of SAnD to jointly infer models with multiple diagnosis tasks. Using the recent MIMIC-III benchmark datasets, we demonstrate that the proposed approach achieves state-of-the-art performance in all tasks, outperforming LSTM models and classical baselines with hand-engineered features.
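
The core computation, masked scaled dot-product self-attention over a positionally encoded sequence, can be sketched without any learned weights. The NumPy example below is a hypothetical single-head illustration; the actual SAnD model learns query/key/value projections and adds dense interpolation, which are omitted here.

```python
import numpy as np

def positional_encoding(T, d):
    """Sinusoidal positions, as in the original Transformer."""
    pos = np.arange(T)[:, None]
    i = np.arange(d)[None, :]
    angles = pos / (10000 ** (2 * (i // 2) / d))
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def self_attention(X, mask=None):
    """Single-head scaled dot-product self-attention (no learned weights)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)   # masked positions get ~zero weight
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)               # row-wise softmax
    return w @ X

T, d = 48, 16                                   # 48 time steps of a 16-channel series
rng = np.random.default_rng(5)
X = rng.normal(size=(T, d)) + positional_encoding(T, d)
causal = np.tril(np.ones((T, T), dtype=bool))   # each step attends to the past only
out = self_attention(X, mask=causal)
print(out.shape)  # (48, 16)
```

Because every output position is a weighted sum computed in one matrix product, all time steps are processed in parallel, which is the efficiency argument against recurrence.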

Song, H., Thiagarajan, J.J., Sattigeri, P. and Spanias, A. (2018). "Optimizing Kernel Machines using Deep Learning." IEEE Transactions on Neural Networks and Learning Systems. []

Building highly non-linear and non-parametric models is central to several state-of-the-art machine learning systems. Kernel methods form an important class of techniques that induce a reproducing kernel Hilbert space (RKHS) for inferring non-linear models through the construction of similarity functions from data. These methods are particularly preferred in cases where the training data sizes are limited and when prior knowledge of the data similarities is available. Despite their usefulness, they are limited by the computational complexity and their inability to support end-to-end learning with a task-specific objective. On the other hand, deep neural networks have become the de facto solution for end-to-end inference in several learning paradigms. In this article, we explore the idea of using deep architectures to perform kernel machine optimization, for both computational efficiency and end-to-end inferencing. To this end, we develop the DKMO (Deep Kernel Machine Optimization) framework, which creates an ensemble of dense embeddings using Nyström kernel approximations and utilizes deep learning to generate task-specific representations through the fusion of the embeddings. Intuitively, the filters of the network are trained to fuse information from an ensemble of linear subspaces in the RKHS. Furthermore, we introduce kernel dropout regularization to enable improved training convergence. Finally, we extend this framework to the multiple kernel case by coupling a global fusion layer with pre-trained deep kernel machines for each of the constituent kernels. Using case studies with limited training data and a lack of explicit feature sources, we demonstrate the effectiveness of our framework over conventional model inferencing techniques.

Song, H., Willi, M., Thiagarajan, J.J., Berisha, V., and Spanias, A. (2018). “Triplet Network with Attention for Speaker Diarization.” Proceedings of the Annual Conference of the International Speech Communication Association. []

In automatic speech processing systems, speaker diarization is a crucial front-end component to separate segments from different speakers. Inspired by the recent success of deep neural networks (DNNs) in semantic inferencing, triplet loss-based architectures have been successfully used for this problem. However, existing work utilizes conventional i-vectors as the input representation and builds simple fully connected networks for metric learning, thus not fully leveraging the modeling power of DNN architectures. This paper investigates the importance of learning effective representations from the sequences directly in metric learning pipelines for speaker diarization. More specifically, we propose to employ attention models to learn embeddings and the metric jointly in an end-to-end fashion. Experiments are conducted on the CALLHOME conversational speech corpus. The diarization results demonstrate that, besides providing a unified model, the proposed approach achieves improved performance when compared against existing approaches.
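
The triplet objective itself is compact: pull an anchor toward a same-speaker embedding and push it away from a different-speaker embedding by a margin. Below is a minimal NumPy sketch with made-up embeddings, not the paper's attention-based encoder.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on the gap between anchor-positive and anchor-negative distances."""
    d_ap = np.sum((anchor - positive) ** 2, axis=-1)   # same-speaker distance
    d_an = np.sum((anchor - negative) ** 2, axis=-1)   # different-speaker distance
    return np.maximum(0.0, d_ap - d_an + margin).mean()

rng = np.random.default_rng(6)
emb = rng.normal(size=(3, 64))                       # stand-ins for speaker embeddings
a = emb[0]
p = emb[0] + 0.01 * rng.normal(size=64)              # same speaker, slight variation
n = emb[1]                                           # different speaker
loss = triplet_loss(a, p, n)
print(loss)  # 0.0: this triplet is already satisfied by more than the margin
```

In the paper, the embeddings fed to this loss are produced jointly with the metric by an attention model trained end-to-end, rather than fixed i-vectors.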

Thiagarajan, J. J., Anirudh, R., Kailkhura, B., Jain, N., Islam, T., Bhatele, A., Yeom, J.S. and Gamblin, T. (2018). "PADDLE: Performance Analysis using a Data-driven Learning Environment." IEEE International Parallel and Distributed Processing Symposium. []

The use of machine learning techniques to model execution time and power consumption, and, more generally, to characterize performance data is gaining traction in the HPC community. Although this signifies huge potential for automating complex inference tasks, a typical analytics pipeline requires selecting and extensively tuning multiple components ranging from feature learning to statistical inferencing to visualization. Further, the algorithmic solutions often do not generalize between problems, thereby making it cumbersome to design and validate machine learning techniques in practice. In order to address these challenges, we propose a unified machine learning framework, PADDLE, which is specifically designed for problems encountered during analysis of HPC data. The proposed framework uses an information-theoretic approach for hierarchical feature learning and can produce highly robust and interpretable models. We present user-centric workflows for using PADDLE and demonstrate its effectiveness in different scenarios: (a) identifying causes of network congestion; (b) determining the best performing linear solver for sparse matrices; and (c) comparing performance characteristics of parent and proxy application pairs.

Thiagarajan, J.J., Jain, N., Anirudh, R., Giménez, A., Sridhar, R., Marathe, A., Wang, T., Emani, M., Bhatele, A., and Gamblin, T. (2018). “Bootstrapping Parameter Space Exploration for Fast Tuning.” Association for Computing Machinery. []

The task of tuning parameters to optimize performance or other metrics of interest, such as energy or variability, can be resource- and time-consuming. The presence of a large parameter space makes a comprehensive exploration infeasible. In this paper, we propose a novel bootstrap scheme, called GEIST, for parameter space exploration to find performance-optimizing configurations quickly. Our scheme represents the parameter space as a graph whose connectivity guides information propagation from known configurations. Guided by the predictions of a semi-supervised learning method over the parameter graph, GEIST is able to adaptively sample and find desirable configurations using limited results from experiments. We show the effectiveness of GEIST for selecting application input options, compiler flags, and runtime/system settings for several parallel codes, including LULESH, Kripke, Hypre, and OpenAtom.
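
The graph-guided propagation idea can be illustrated on a toy one-dimensional parameter sweep: measured configurations keep their observed scores, and unmeasured ones repeatedly average their neighbors (a harmonic-function estimate). This is a hypothetical sketch, not GEIST's actual semi-supervised learner or sampling policy.

```python
import numpy as np

# Hypothetical 1-D parameter sweep: 20 settings, 4 of them measured.
n = 20
measured = {0: 0.9, 6: 0.4, 13: 0.7, 19: 0.2}   # config index -> observed performance
scores = np.full(n, 0.5)
for i, v in measured.items():
    scores[i] = v

# Fixed-point iteration: unmeasured settings take the mean of their neighbors.
for _ in range(500):
    new = scores.copy()
    for i in range(n):
        if i in measured:
            continue
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        new[i] = np.mean(scores[nbrs])
    scores = new

best = int(np.argmax(scores))
print(best, round(scores[best], 3))  # the predicted-best configuration to try next
```

On a path graph this converges to piecewise-linear interpolation between measurements; GEIST applies the same principle on a richer parameter graph and uses the predictions to choose which configurations to run next.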

Thiagarajan, J. J., Liu, S., Ramamurthy, K. and Bremer, P.T. (2018). "Exploring High-Dimensional Structure via Axis-Aligned Decomposition of Linear Projections." Conference on Visualization. []

Two-dimensional embeddings remain the dominant approach to visualize high-dimensional data. The choice of embeddings ranges from highly non-linear ones, which can capture complex relationships but are difficult to interpret quantitatively, to axis-aligned projections, which are easy to interpret but are limited to bivariate relationships. Linear projections can be considered a compromise between complexity and interpretability, as they allow explicit axis labels, yet provide significantly more degrees of freedom compared to axis-aligned projections. Nevertheless, interpreting the axis directions, which are linear combinations often with many non-trivial components, remains difficult. To address this problem we introduce a structure-aware decomposition of (multiple) linear projections into sparse sets of axis-aligned projections, which jointly capture all information of the original linear ones. In particular, we use tools from Dempster-Shafer theory to formally define how relevant a given axis-aligned projection is to explain the neighborhood relations displayed in some linear projection. Furthermore, we introduce a new approach to discover a diverse set of high-quality linear projections and show that in practice the information of k linear projections is often jointly encoded in ∼k axis-aligned plots. We have integrated these ideas into an interactive visualization system that allows users to jointly browse both linear projections and their axis-aligned representatives. Using a number of case studies we show how the resulting plots lead to more intuitive visualizations and new insight.

Zheng, P., Aravkin, A. Y., Ramamurthy, K. and Thiagarajan, J. J. (2018). "Learning Robust Representations for Computer Vision." IEEE International Conference on Computer Vision Workshops. []

Unsupervised learning techniques in computer vision often require learning latent representations, such as low-dimensional linear and non-linear subspaces. Noise and outliers in the data can frustrate these approaches by obscuring the latent spaces. Our main goal is deeper understanding and new development of robust approaches for representation learning. We provide a new interpretation for existing robust approaches and present two specific contributions: a new robust PCA approach, which can separate foreground features from dynamic background, and a novel robust spectral clustering method, that can cluster facial images with high accuracy. Both contributions show superior performance to standard methods on real-world test sets.
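
The robust PCA contribution builds on principal component pursuit, which splits a data matrix into a low-rank part plus a sparse part. The sketch below is a generic inexact augmented Lagrangian solver on synthetic data, a standard formulation rather than the paper's specific approach; all sizes and settings are made up.

```python
import numpy as np

def shrink(M, tau):
    """Elementwise soft threshold."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def svt(M, tau):
    """Singular value thresholding (soft threshold on the spectrum)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca(X, iters=100, tol=1e-7):
    """Principal component pursuit via inexact ALM: X = L (low rank) + S (sparse)."""
    m, n = X.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = 1.25 / np.linalg.norm(X, 2)
    S, Y = np.zeros_like(X), np.zeros_like(X)
    normX = np.linalg.norm(X)
    for _ in range(iters):
        L = svt(X - S + Y / mu, 1.0 / mu)        # low-rank update
        S = shrink(X - L + Y / mu, lam / mu)     # sparse update
        R = X - L - S                            # feasibility residual
        Y += mu * R
        mu *= 1.5
        if np.linalg.norm(R) / normX < tol:
            break
    return L, S

rng = np.random.default_rng(7)
L0 = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 40))   # rank-2 "background"
S0 = np.zeros((60, 40))
mask = rng.random(S0.shape) < 0.05
S0[mask] = rng.normal(scale=10.0, size=mask.sum())         # sparse "foreground" outliers
L, S = rpca(L0 + S0)
rel = np.linalg.norm(L - L0) / np.linalg.norm(L0)
print(f"relative recovery error {rel:.2e}")
```

Separating a low-rank background from sparse gross corruptions in this way is the mechanism behind the foreground/background separation the abstract describes.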


Anirudh, R., Kailkhura, B., Thiagarajan, J.J. and Bremer, P. T. (2017). "Poisson Disk Sampling on the Grassmannian: Applications in Subspace Optimization." Conference on Computer Vision and Pattern Recognition. []

To develop accurate inference algorithms on embedded manifolds such as the Grassmannian, we often employ several optimization tools and incorporate the characteristics of known manifolds as additional constraints. However, a direct analysis of the nature of functions on manifolds is rarely performed. In this paper, we propose an alternative approach to this inference by adopting a statistical pipeline that first generates an initial sampling of the manifold and then performs subsequent analysis based on these samples. First, we introduce a better sampling technique based on dart throwing, called Poisson disk sampling (PDS), to effectively sample the Grassmannian. Next, using Grassmannian sparse coding, we demonstrate the improved coverage achieved by PDS. Finally, we develop a consensus approach, with Grassmann samples, to infer the optimal embeddings for linear dimensionality reduction, and show that the resulting solutions are nearly optimal.
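A minimal sketch of dart-throwing Poisson disk sampling on the Grassmannian, assuming the standard chordal distance between subspaces; the helper names and parameters are illustrative, not the authors' implementation:

```python
import numpy as np

def random_subspace(n, k, rng):
    """Random point on the Grassmannian Gr(n, k): orthonormal basis via QR."""
    Q, _ = np.linalg.qr(rng.normal(size=(n, k)))
    return Q

def chordal_dist(U, V):
    """Chordal distance between subspaces given orthonormal bases."""
    k = U.shape[1]
    return np.sqrt(max(k - np.linalg.norm(U.T @ V, 'fro') ** 2, 0.0))

def poisson_disk_grassmann(n, k, radius, n_darts=2000, seed=0):
    """Dart throwing: accept a candidate only if it is at least `radius`
    away from every previously accepted sample."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_darts):
        cand = random_subspace(n, k, rng)
        if all(chordal_dist(cand, s) >= radius for s in samples):
            samples.append(cand)
    return samples

S = poisson_disk_grassmann(n=6, k=2, radius=0.8)
print(len(S), "well-separated subspaces")
```

Dart throwing is the simplest (and slowest) way to enforce the minimum-distance property; the accepted set provides an even cover of the manifold for downstream analysis.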

Kim, S., Ames, S., Lee, J., Zhang, C., Wilson, A. C. and Williams, D. (2017). "Massive Scale Deep Learning for Detecting Extreme Climate Events." International Workshop on Climate Informatics. []

Conventional extreme climate event detection relies on high-spatial-resolution climate model output for improved accuracy, which often poses significant computational challenges due to its tremendous iteration cost. As a cost-efficient alternative, we developed a system to detect and locate extreme climate events using deep learning. Our system can capture the patterns of extreme climate events from pre-existing coarse reanalysis data, corresponding to only 16 thousand grid points, without an expensive downscaling process; training on our dataset takes less than 5 hours and testing on our test set takes less than 5 seconds using 5-layered Convolutional Neural Networks (CNNs). As a use case of our framework, we tested tropical cyclone detection with labeled reanalysis data; our cross-validation results show 99.98% detection accuracy, and the localization accuracy is within 4.5 degrees of longitude/latitude (around 500 km, or 3 times the data resolution).

Kim, S., Ames, S., Lee, J., Zhang, C., Wilson, A. C. and Williams, D. (2017). "Resolution Reconstruction of Climate Data with Pixel Recursive Model." IEEE International Conference on Data Mining. []

Deep learning techniques have been successfully applied to solve many problems in climate and geoscience using massive-scale observed and modeled data. For extreme climate event detection, several models based on deep neural networks have recently been proposed and attain superior performance that overshadows all previous handcrafted, expert-based methods. The issue arising, though, is that accurate localization of events requires high-quality climate data. In this work, we propose a framework capable of detecting and localizing extreme climate events in very coarse climate data. Our framework is based on two models using deep neural networks: (1) Convolutional Neural Networks (CNNs) to detect and localize extreme climate events, and (2) a pixel recursive super resolution model to reconstruct high-resolution climate data from low-resolution climate data. Based on our preliminary work, we present two CNNs in our framework for different purposes: detection and localization. Our results using CNNs for extreme climate event detection show that simple neural nets can capture the patterns of extreme climate events with high accuracy from very coarse reanalysis data. However, localization accuracy is relatively low due to the coarse resolution. To resolve this issue, the pixel recursive super resolution model reconstructs the resolution of the input to the localization CNNs. We present the best networks using the pixel recursive super resolution model, which synthesize details of tropical cyclones in ground truth data while enhancing their resolution. This approach not only dramatically reduces human effort, but also suggests the possibility of reducing the computing cost required for the downscaling process used to increase data resolution.

Lennox, K. P., Rosenfield, P., Blair, B., Kaplan, A., Ruz, J., Glenn, A. and Wurtz, R. (2017). "Assessing and Minimizing Contamination in Time of Flight Based Validation Data." Nuclear Instruments and Methods in Physics Research. []

Time of flight experiments are the gold standard method for generating labeled training and testing data for the neutron/gamma pulse shape discrimination problem. As the popularity of supervised classification methods increases in this field, there will also be increasing reliance on time of flight data for algorithm development and evaluation. However, time of flight experiments are subject to various sources of contamination that lead to neutron and gamma pulses being mislabeled. Such labeling errors have a detrimental effect on classification algorithm training and testing, and should therefore be minimized. This paper presents a method for identifying minimally contaminated data sets from time of flight experiments and estimating the residual contamination rate. This method leverages statistical models describing neutron and gamma travel time distributions and is easily implemented using existing statistical software. The method produces a set of optimal intervals that balance the trade-off between interval size and nuisance particle contamination, and its use is demonstrated on a time of flight data set for Cf-252. The particular properties of the optimal intervals for the demonstration data are explored in detail.
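The interval-selection trade-off described above can be sketched with hypothetical Gaussian travel-time models (the paper's actual statistical models and Cf-252 parameters are not reproduced here): scan candidate intervals, discard those whose expected gamma contamination exceeds a bound, and keep the one capturing the most neutron mass.

```python
import math

def norm_cdf(x, mu, sigma):
    """CDF of a normal distribution via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Hypothetical travel-time models (ns): gammas arrive early and tight,
# neutrons later with more spread. Numbers are illustrative only.
GAMMA = (5.0, 0.5)
NEUTRON = (40.0, 10.0)

def best_interval(max_contam=0.001, grid=None):
    """Scan candidate intervals; among those whose expected gamma
    contamination is below max_contam, return the one capturing the
    most neutron probability mass."""
    grid = grid or [i * 0.5 for i in range(0, 241)]  # 0..120 ns
    best = (0.0, None)
    for i, a in enumerate(grid):
        for b in grid[i + 1:]:
            contam = norm_cdf(b, *GAMMA) - norm_cdf(a, *GAMMA)
            if contam > max_contam:
                continue
            captured = norm_cdf(b, *NEUTRON) - norm_cdf(a, *NEUTRON)
            if captured > best[0]:
                best = (captured, (a, b))
    return best

frac, (a, b) = best_interval()
print(f"interval [{a}, {b}] ns captures {frac:.3f} of neutrons")
```

Tightening `max_contam` forces the interval away from the gamma peak at the cost of discarding some neutrons, which is exactly the size-versus-contamination trade-off the paper formalizes.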

Li, Q., Kailkhura, B., Thiagarajan, J. J. and Varshney, P. K. (2017). "Influential Node Detection in Implicit Social Networks using Multi-task Gaussian Copula Models." Conference on Neural Information Processing Systems. []

Influential node detection is a central research topic in social network analysis. Many existing methods rely on the assumption that the network structure is completely known a priori. However, in many applications, network structure is unavailable to explain the underlying information diffusion phenomenon. To address the challenge of information diffusion analysis with incomplete knowledge of network structure, we develop a multi-task low rank linear influence model. By exploiting the relationships between contagions, our approach can simultaneously predict the volume (i.e. time series prediction) for each contagion (or topic) and automatically identify the most influential nodes for each contagion. The proposed model is validated using synthetic data and an ISIS twitter dataset. In addition to improving the volume prediction performance significantly, we show that the proposed approach can reliably infer the most influential users for specific contagions.

Lin, Y., Wang, S., Thiagarajan, J. J., Guthrie, G. and Coblentz, D. (2017). "Towards Real-Time Geologic Feature Detection from Seismic Measurements Using a Randomized Machine-Learning Algorithm." SEG Annual Conference. []

Conventional seismic techniques for detecting subsurface geologic features are challenged by limited data coverage, computational inefficiency, and subjective human factors. We propose to employ an efficient and accurate machine-learning detection approach to extract useful subsurface geologic features automatically. We employ a data reduction technique in combination with the conventional kernel ridge regression method to improve the computational efficiency and reduce the memory usage. Specifically, we utilize a randomized numerical linear algebra technique to effectively reduce the dimensionality of the feature space without compromising the information content required for accurate detection. We validate the performance of our new subsurface geologic feature detection method using synthetic surface seismic data for a 2D geophysical model. Our numerical examples demonstrate that our new detection method significantly improves the computational efficiency while maintaining comparable accuracy. Interestingly, we show that our method yields a speed-up ratio on the order of ~10^2 to ~10^3 in a multi-core computational environment.
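A minimal sketch of the pipeline's two ingredients, assuming a Gaussian random sketch for the dimensionality reduction and a standard RBF kernel ridge regression; the data and parameters are synthetic stand-ins, not the paper's seismic setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d_red = 300, 500, 20

# Synthetic stand-in for high-dimensional seismic features.
X = rng.normal(size=(n, d))
w = rng.normal(size=d)
y = np.tanh(X @ w / np.sqrt(d)) + 0.05 * rng.normal(size=n)

# Randomized dimensionality reduction: a Gaussian sketch of the features.
S = rng.normal(size=(d, d_red)) / np.sqrt(d_red)
Xs = X @ S

def krr_fit_predict(Xtr, ytr, Xte, lam=0.1, gamma=0.05):
    """Kernel ridge regression with an RBF kernel (closed form)."""
    def rbf(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    K = rbf(Xtr, Xtr)
    alpha = np.linalg.solve(K + lam * np.eye(len(Xtr)), ytr)
    return rbf(Xte, Xtr) @ alpha

pred = krr_fit_predict(Xs[:200], y[:200], Xs[200:])
mse = np.mean((pred - y[200:]) ** 2)
print(f"test MSE on sketched features: {mse:.3f}")
```

The kernel solve now costs O(n^3) in the number of samples but operates on 20 rather than 500 feature dimensions, which is the source of the memory and speed savings the abstract describes.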

Marathe, A., Anirudh, R., Jain, N., Bhatele, A., Thiagarajan, J. J., Kailkhura, B., Yeom, J. S., Rountree, B. and Gamblin, T. (2017). "Performance Modeling Under Resource Constraints Using Deep Transfer Learning." Supercomputing Conference. []

Tuning application parameters for optimal performance is a challenging combinatorial problem. Hence, techniques for modeling the functional relationships between various input features in the parameter space and application performance are important. We show that simple statistical inference techniques are inadequate to capture these relationships. Even with more complex ensembles of models, the minimum coverage of the parameter space required via experimental observations is still quite large. We propose a deep learning based approach that can combine information from exhaustive observations collected at a smaller scale with limited observations collected at a larger target scale. The proposed approach is able to accurately predict performance in the regimes of interest to performance analysts while outperforming many traditional techniques. In particular, our approach can identify the best performing configurations even when trained using as few as 1% of observations at the target scale.

Mudigonda, M., Kim, S., Mahesh, A., Kahou, S., Kashinath, K., Williams, D., Michalski, V., O’Brien, T. and Prabhat, M. (2017). "Segmenting and Tracking Extreme Climate Events using Neural Networks." Conference on Neural Information Processing Systems. []

Predicting extreme weather events in a warming world is one of the most pressing and challenging problems that humanity faces today. Deep learning and advances in the field of computer vision provide a novel and powerful set of tools to tackle this demanding task. However, unlike images employed in computer vision, climate datasets present unique challenges. The channels (or physical variables) in a climate dataset are manifold, and unlike pixel information in computer vision data, these channels have physical properties. We present preliminary work using a convolutional neural network and a recurrent neural network for tracking cyclonic storms. We also show how state-of-the-art segmentation algorithms can be used to segment atmospheric rivers and tropical cyclones in global climate model simulations. We show how the latest advances in machine learning and computer vision can provide solutions to important problems in weather and climate sciences, and we highlight unique challenges and limitations.

Mundhenk, N. T., Kegelmeyer, L. M. and Trummer, S.K. (2017). "Deep learning for evaluating difficult-to-detect incomplete repairs of high fluence laser optics at the National Ignition Facility." Thirteenth International Conference on Quality Control by Artificial Vision. []

Two machine-learning methods were evaluated to help automate the quality control process for mitigating damage sites on laser optics. The mitigation is a cone-like structure etched into locations on large optics that have been chipped by the high-fluence (energy per unit area) laser light. Sometimes the repair leaves a difficult-to-detect remnant of the damage that needs to be addressed before the optic can be placed back on the beam line. We would like to be able to automatically detect these remnants. We try deep learning (convolutional neural networks using features autogenerated from large stores of labeled data, like ImageNet) and find it outperforms ensembles of decision trees (using custom-built features) in finding these subtle, rare, incomplete repairs of damage. We also implemented an unsupervised method for helping operators visualize where the network has spotted problems. This is done by projecting the credit for the result backwards onto the input image, which shows the regions in an image most responsible for the network's decision. This can also be used to help understand the black-box decisions the network is making and potentially improve the training process.
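The credit-backprojection idea can be sketched as an input-gradient (saliency) computation; a tiny random two-layer network stands in for the trained model here, so this illustrates only the mechanics, not the paper's exact method:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer network with fixed random weights stands in for the
# trained damage-detection model; the point is the credit backprojection.
W1 = rng.normal(size=(64, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1));  b2 = np.zeros(1)

def forward(x):
    h_pre = x @ W1 + b1
    h = np.maximum(h_pre, 0)          # ReLU
    return h_pre, h, (h @ W2 + b2)[0]

def input_saliency(x):
    """Backproject the score onto the input: d(score)/d(pixel)."""
    h_pre, h, _ = forward(x)
    grad_h = W2[:, 0] * (h_pre > 0)   # gradient through the ReLU
    return grad_h @ W1.T              # back to the flattened "image"

x = rng.normal(size=64)               # flattened 8x8 patch
sal = np.abs(input_saliency(x)).reshape(8, 8)
hot = np.unravel_index(sal.argmax(), sal.shape)
print("most influential pixel:", hot)
```

Displaying `sal` as a heatmap over the input patch gives an operator a visual cue for where the network found evidence of an incomplete repair.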

Pallotta, G., Konjevod, G., Cadena, J. and Nguyen, P. (2017). "Context-aided Analysis of Community Evolution in Networks." Statistical Analysis and Data Mining: The ASA Data Science Journal. []

We are interested in detecting and analyzing global changes in dynamic networks (networks that evolve with time). More precisely, we consider changes in the activity distribution within the network, in terms of density (i.e., edge existence) and intensity (i.e., edge weight). Detecting change in local properties, as well as individual measurements or metrics, has been well studied and often reduces to traditional statistical process control. In contrast, detecting change in larger-scale structure of the network is more challenging and not as well understood. We address this problem by proposing a framework for detecting change in network structure based on separate pieces: a probabilistic model for partitioning nodes by their behavior, a label-unswitching heuristic, and an approach to change detection for sequences of complex objects. We examine the performance of one instantiation of such a framework using mostly previously available pieces. The dataset we use for these investigations is the publicly available New York City Taxi and Limousine Commission dataset covering all taxi trips in New York City since 2009. Using it, we investigate the evolution of an ensemble of networks under different spatiotemporal resolutions. We identify the community structure by fitting a weighted stochastic block model. We offer insights on different node ranking and clustering methods, their ability to capture the rhythm of life in the Big Apple, and their potential usefulness in highlighting changes in the underlying network structure.

Sakla, W., Konjevod, G. and Mundhenk, N.T. (2017). "Deep Multi-modal Vehicle Detection in Aerial ISR Imagery." IEEE Winter Conference on Applications of Computer Vision. []

Since the introduction of deep convolutional neural networks (CNNs), object detection in imagery has witnessed substantial breakthroughs in state-of-the-art performance. The defense community utilizes overhead image sensors that acquire large field-of-view aerial imagery in various bands of the electromagnetic spectrum, which is then exploited for various applications, including the detection and localization of human-made objects. In this work, we utilize a recent state-of-the-art object detection algorithm, faster R-CNN, to train a deep CNN for vehicle detection in multimodal imagery. We utilize the vehicle detection in aerial imagery (VEDAI) dataset, which contains overhead imagery that is representative of an ISR setting. Our contribution includes modification of key parameters in the faster R-CNN algorithm for this setting, where the objects of interest are spatially small, occupying less than 1.5×10^-3 of the total image pixels. Our experiments show that (1) an appropriately trained deep CNN leads to average precision rates above 93% on vehicle detection, and (2) transfer learning between imagery modalities is possible, yielding average precision rates above 90% in the absence of fine-tuning.

Song, H., Thiagarajan, J. J., Sattigeri, P. and Spanias, A. (2017). "A Deep Learning Approach to Multiple Kernel Learning." IEEE International Conference on Acoustics, Speech and Signal Processing. []

Kernel fusion is a popular and effective approach for combining multiple features that characterize different aspects of data. Traditional approaches for Multiple Kernel Learning (MKL) attempt to learn the parameters for combining the kernels through sophisticated optimization procedures. In this paper, we propose an alternative approach that creates dense embeddings for data using the kernel similarities and adopts a deep neural network architecture for fusing the embeddings. In order to improve the effectiveness of this network, we introduce the kernel dropout regularization strategy coupled with the use of an expanded set of composition kernels. Experimental results on a real-world activity recognition dataset show that the proposed architecture is effective in fusing kernels and achieves state-of-the-art performance.
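One plausible reading of the embedding and kernel-dropout steps, sketched with assumed details (the kernel choices, row normalization, and rescaling below are illustrative, not the authors' specification):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

def rbf_kernel(X, gamma):
    """RBF kernel matrix over all pairs of samples."""
    sq = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Multiple kernels capturing different aspects of the data.
kernels = [rbf_kernel(X, 0.1), rbf_kernel(X, 1.0), X @ X.T]

def dense_embeddings(kernels):
    """Each sample's embedding is its (normalized) row of kernel similarities."""
    return [K / np.linalg.norm(K, axis=1, keepdims=True) for K in kernels]

def kernel_dropout(embs, p=0.5, rng=rng):
    """Drop entire kernels at random during training, then rescale,
    so the fusion network cannot over-rely on any single kernel."""
    keep = rng.random(len(embs)) > p
    if not keep.any():
        keep[rng.integers(len(embs))] = True   # keep at least one kernel
    scale = len(embs) / keep.sum()
    return np.concatenate(
        [e * scale if k else np.zeros_like(e) for e, k in zip(embs, keep)],
        axis=1)

fused = kernel_dropout(dense_embeddings(kernels))
print("fused training input shape:", fused.shape)
```

The fused matrix would then be fed to a standard feed-forward network; at test time all kernels are kept and no rescaling is applied, mirroring ordinary dropout.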

Zheng, P., Aravkin, A. Y., Ramamurthy, K. and Thiagarajan, J.J. (2017). "Learning Robust Representations for Computer Vision." IEEE International Conference on Computer Vision Workshops. []

Unsupervised learning techniques in computer vision often require learning latent representations, such as low-dimensional linear and non-linear subspaces. Noise and outliers in the data can frustrate these approaches by obscuring the latent spaces. Our main goal is deeper understanding and new development of robust approaches for representation learning. We provide a new interpretation for existing robust approaches and present two specific contributions: a new robust PCA approach, which can separate foreground features from dynamic background, and a novel robust spectral clustering method, that can cluster facial images with high accuracy. Both contributions show superior performance to standard methods on real-world test sets.
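The robust PCA contribution is in the spirit of principal component pursuit; a minimal ADMM-style sketch (not the authors' specific formulation) shows the low-rank/sparse split that separates a static background from sparse foreground activity:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def shrink(M, tau):
    """Soft thresholding: prox of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0)

def robust_pca(M, lam=None, mu=None, n_iter=200):
    """Principal component pursuit via a simple ADMM scheme:
    decompose M into low-rank L (background) + sparse S (foreground)."""
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = mu or (m * n) / (4.0 * np.abs(M).sum())
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        Y += mu * (M - L - S)
    return L, S

# Synthetic test: rank-2 "background" plus sparse "foreground" spikes.
rng = np.random.default_rng(0)
L0 = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 40))
S0 = np.zeros((60, 40))
mask = rng.random(S0.shape) < 0.05
S0[mask] = rng.normal(scale=10, size=mask.sum())
L, S = robust_pca(L0 + S0)
print("recovery error:", np.linalg.norm(L - L0) / np.linalg.norm(L0))
```

In the video-background setting, each column of `M` would be a vectorized frame, `L` the static background, and `S` the moving foreground.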