Publications

Alom, Z., Taha, T., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, S., Hasan, M., Van Essen, B., Awwal, A., and Asari, V. (2019). “A State-of-the-Art Survey on Deep Learning Theory and Architectures.” Electronics. []

In recent years, deep learning has garnered tremendous success in a variety of application domains. This new field of machine learning has been growing rapidly and has been applied to most traditional application domains, as well as some new areas that present more opportunities. Different methods have been proposed based on different categories of learning, including supervised, semi-supervised, and unsupervised learning. Experimental results show state-of-the-art performance using deep learning when compared to traditional machine learning approaches in the fields of image processing, computer vision, speech recognition, machine translation, art, medical imaging, medical information processing, robotics and control, bioinformatics, natural language processing, cybersecurity, and many others. This paper presents a brief survey of the advances that have occurred in the area of Deep Learning (DL), starting with the Deep Neural Network (DNN). The survey goes on to cover the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), the Auto-Encoder (AE), the Deep Belief Network (DBN), the Generative Adversarial Network (GAN), and Deep Reinforcement Learning (DRL). Additionally, we discuss recent developments, such as advanced variant DL techniques based on these DL approaches. This work considers most of the papers published since 2012, when the modern history of deep learning began. Furthermore, DL approaches that have been explored and evaluated in different application domains are also included in this survey. We also include recently developed frameworks, SDKs, and benchmark datasets that are used for implementing and evaluating deep learning approaches. Some surveys have been published on DL using neural networks, as has a survey on Reinforcement Learning (RL). However, those papers do not discuss individual advanced techniques for training large-scale deep learning models or recently developed methods for generative models.

Anirudh, R. and Thiagarajan, J.J. (2019). “Bootstrapping Graph Convolutional Neural Networks for Autism Spectrum Disorder Classification.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. []

Using predictive models to identify patterns that can act as biomarkers for different neuropathological conditions is becoming highly prevalent. In this paper, we consider the problem of Autism Spectrum Disorder (ASD) classification, where previous work has shown that it can be beneficial to incorporate a wide variety of meta features, such as socio-cultural traits, into predictive modeling. A graph-based approach naturally suits these scenarios, where a contextual graph captures traits that characterize a population, while the specific brain activity patterns are utilized as a multivariate signal at the nodes. Graph neural networks have shown improvements in inference with graph-structured data. Though the underlying graph strongly dictates the overall performance, there exists no systematic way of choosing an appropriate graph in practice, thus making predictive models non-robust. To address this, we propose a bootstrapped version of graph convolutional neural networks (G-CNNs) that utilizes an ensemble of weakly trained G-CNNs to reduce the sensitivity of the models to the choice of graph construction. We demonstrate its effectiveness on the challenging Autism Brain Imaging Data Exchange (ABIDE) dataset and show that our approach improves upon recently proposed graph-based neural networks. We also show that our method remains more robust to noisy graphs.
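
The consensus idea can be sketched in a few lines: average the predictions of several graph convolutional networks, each built on a randomly perturbed population graph, so that no single graph choice dominates the result. The sketch below uses NumPy with synthetic data and random weights standing in for weakly trained parameters; it illustrates the ensemble/consensus idea, not the authors' implementation.

```python
# Illustration of the bootstrapped-ensemble idea (not the authors' code): average
# predictions of several G-CNNs, each built on a randomly perturbed population graph.
import numpy as np

rng = np.random.default_rng(0)

def normalize_adjacency(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCN layers."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_norm, X, W1, W2):
    """Two-layer graph convolution: row-wise softmax(A ReLU(A X W1) W2)."""
    H = np.maximum(A_norm @ X @ W1, 0.0)
    logits = A_norm @ H @ W2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy population graph from meta-feature similarity (hypothetical data).
n, d, k, C = 50, 16, 8, 2
X = rng.normal(size=(n, d))                      # node features (per-subject signals)
A_full = (rng.random((n, n)) < 0.2).astype(float)
A_full = np.triu(A_full, 1); A_full += A_full.T  # symmetric, no self-loops

ensemble_probs = []
for _ in range(10):                              # ensemble of G-CNNs
    drop = rng.random(A_full.shape) < 0.3        # randomly perturb the graph (bootstrap)
    A_b = A_full * np.triu(~drop, 1); A_b += A_b.T
    # random weights stand in here for each member's weakly trained parameters
    W1, W2 = rng.normal(scale=0.1, size=(d, k)), rng.normal(scale=0.1, size=(k, C))
    ensemble_probs.append(gcn_forward(normalize_adjacency(A_b), X, W1, W2))

consensus = np.mean(ensemble_probs, axis=0)      # averaged class probabilities per subject
print(consensus.argmax(axis=1))
```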

Chu, A., Nguyen, D., Talathi, S.S., (…), Stolaroff, J.K., and Giera, B. (2019). “Automated Detection and Sorting of Microencapsulation via Machine Learning.” Lab on a Chip. []

Microfluidic-based microencapsulation requires significant oversight to prevent material and quality loss due to sporadic disruptions in fluid flow that routinely arise. State-of-the-art microcapsule production is laborious and relies on experts to monitor the process, e.g., through a microscope. Unnoticed defects diminish the quality of collected material and/or may cause irreversible clogging. To address these issues, we developed an automated monitoring and sorting system that operates on consumer-grade hardware in real time. Using human-labeled microscope images acquired during typical operation, we train a convolutional neural network that assesses microencapsulation. Based on output from the machine learning algorithm, an integrated valving system collects desirable microcapsules or diverts waste material accordingly. The system currently notifies operators to make the adjustments necessary to restore microencapsulation, and it can be extended to automate corrections. Since microfluidic-based production platforms customarily collect image and sensor data, machine learning can help to scale up and improve microfluidic techniques beyond microencapsulation.

Cong, G., Domeniconi, G., Shapiro, J., Zhou, F., and Chen, B. (2019). “Accelerating Deep Neural Network Training for Action Recognition on a Cluster of GPUs.” Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing. []

Due to the additional temporal dimension, large-scale video action recognition is even more challenging than image recognition and typically takes days to train on modern GPUs, even for modest-sized datasets. We propose algorithms and techniques to accelerate training of deep neural networks for action recognition on a cluster of GPUs. In terms of convergence and scaling, our distributed training algorithm with adaptive batch size is provably superior to popular asynchronous stochastic gradient descent algorithms. The convergence analysis of our algorithm shows it is possible to reduce communication cost and at the same time minimize the number of iterations needed for convergence. We customize the Adam optimizer for our distributed algorithm to improve efficiency. In addition, we employ transfer learning to further reduce training time while improving validation accuracy. Compared with the baseline single-GPU stochastic gradient descent implementation of the two-stream training approach, our implementation achieves super-linear speedups on 16 GPUs while improving validation accuracy. For the UCF101 and HMDB51 datasets, the validation accuracies achieved are 93.1% and 67.9%, respectively. As far as we know, these are the highest accuracies achieved with the two-stream approach without computationally expensive 3D convolutions or pretraining on much larger datasets.

Cong, G., Domeniconi, G., Yang, C.-C., Shapiro, J., and Chen, B. (2019). “Video Action Recognition with an Additional End-To-End Trained Temporal Stream.” 2019 IEEE Winter Conference on Applications of Computer Vision. []

Detecting actions in videos requires understanding the temporal relationships among frames. Typical action recognition approaches rely on optical flow estimation methods to convey temporal information to a CNN. Recent studies employ 3D convolutions in addition to optical flow to process the temporal information. While these models achieve slightly better results than two-stream 2D convolutional approaches, they are significantly more complex, requiring more data and time to be trained. We propose an efficient, adaptive batch size distributed training algorithm with customized optimizations for training the two 2D streams. We introduce a new 2D convolutional temporal stream that is trained end-to-end with a neural network. The flexibility to freeze some network layers from training in this temporal stream opens the possibility of ensemble learning with more than one temporal stream. Our architecture, which combines three streams, achieves the highest accuracies we know of on UCF101 and HMDB51 among systems that do not pretrain on much larger datasets (e.g., Kinetics). We achieve these results while keeping our spatial and temporal streams 4.67x faster to train than the 3D convolution approaches.

Deelman, E., Mandal, A., Jiang, M., and Sakellariou, R. (2019). “The Role of Machine Learning in Scientific Workflows.” International Journal of High Performance Computing Applications. []

Machine learning (ML) is being applied in a number of everyday contexts from image recognition, to natural language processing, to autonomous vehicles, to product recommendation. In the science realm, ML is being used for medical diagnosis, new materials development, smart agriculture, DNA classification, and many others. In this article, we describe the opportunities of using ML in the area of scientific workflow management. Scientific workflows are key to today’s computational science, enabling the definition and execution of complex applications in heterogeneous and often distributed environments. We describe the challenges of composing and executing scientific workflows and identify opportunities for applying ML techniques to meet these challenges by enhancing the current workflow management system capabilities. We foresee that as the ML field progresses, the automation provided by workflow management systems will greatly increase and result in significant improvements in scientific productivity.

Dryden, N., Maruyama, N., Benson, T., Moon, T., Snir, M., and Van Essen, B. (2019). “Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism.” International Parallel and Distributed Processing Symposium. []

Scaling CNN training is necessary to keep up with growing datasets and reduce training time. We also see an emerging need to handle datasets with very large samples, where memory requirements for training are large. Existing training frameworks use a data-parallel approach that partitions samples within a mini-batch, but limits on scaling the mini-batch size and on memory consumption make this untenable for large samples. We describe and implement new approaches to convolution that parallelize using spatial decomposition or a combination of sample and spatial decomposition. This introduces many performance knobs for a network, so we develop a performance model for CNNs and present a method for using it to automatically determine efficient parallelization strategies. We evaluate our algorithms with microbenchmarks and image classification with ResNet-50. Our algorithms allow us to prototype a model for a mesh-tangling dataset, where sample sizes are very large. We show that our parallelization achieves excellent strong and weak scaling and enables training for previously unreachable datasets.
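
The spatial-decomposition idea can be illustrated with a single 2D convolution: split the image into row strips, give each strip a halo copied from its neighbors, convolve each strip independently, and stitch the results. The NumPy/SciPy sketch below (serial, with hypothetical sizes) shows why the halo width equals half the kernel size; the paper's distributed implementation and performance model are not reproduced here.

```python
# Illustrative sketch (not the paper's implementation): spatially decompose one
# 2D convolution by splitting rows and attaching halo rows so each partition can
# compute its slice of the output independently.
import numpy as np
from scipy.signal import convolve2d

def spatial_conv(image, kernel, num_parts):
    h = kernel.shape[0] // 2                       # halo width for a (2h+1)x(2h+1) kernel
    padded = np.pad(image, h)                      # zero padding for a 'same'-sized output
    bounds = np.linspace(0, image.shape[0], num_parts + 1, dtype=int)
    pieces = []
    for r0, r1 in zip(bounds[:-1], bounds[1:]):
        tile = padded[r0:r1 + 2 * h, :]            # local rows plus halo from neighbors
        pieces.append(convolve2d(tile, kernel, mode='valid'))
    return np.vstack(pieces)                       # stitched output, same rows as input

img = np.random.rand(64, 64)
ker = np.random.rand(3, 3)
reference = convolve2d(img, ker, mode='same')
assert np.allclose(spatial_conv(img, ker, num_parts=4), reference)
```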

Dryden, N., Maruyama, N., Moon, T., (…), Snir, M., and Van Essen, B. (2019). “Aluminum: An Asynchronous, GPU-Aware Communication Library Optimized for Large-Scale Training of Deep Neural Networks on HPC Systems.” Proceedings of Machine Learning in HPC Environments and the International Conference for High Performance Computing, Networking, Storage and Analysis. []

We identify communication as a major bottleneck for training deep neural networks on large-scale GPU clusters, taking over 10x as long as computation. To reduce this overhead, we discuss techniques to overlap communication and computation as much as possible. This leads to much of the communication being latency-bound instead of bandwidth-bound, and we find that using a combination of latency- and bandwidth-optimized allreduce algorithms significantly reduces communication costs. We also discuss a semantic mismatch between MPI and CUDA that increases overheads and limits asynchrony, and propose a solution that enables communication to be aware of CUDA streams. We implement these optimizations in the open-source Aluminum communication library, enabling optimized, asynchronous, GPU-aware communication. Aluminum demonstrates improved performance in benchmarks and end-to-end training of deep networks, for both strong and weak scaling.
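
The overlap pattern can be sketched with mpi4py's non-blocking collectives: launch an allreduce for each layer's gradient as soon as backpropagation produces it, keep computing, and wait for all reductions before the optimizer step. This is a rough illustration of the idea, not the Aluminum API; the gradient arrays and layer count below are placeholders.

```python
# Rough sketch of overlapping communication with computation using mpi4py
# (not the Aluminum library itself).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Stand-in gradients, ordered output layer -> input layer as backprop would produce them.
layer_grads = [np.random.rand(1024).astype(np.float32) for _ in range(8)]

requests = []
for grad in layer_grads:
    # Launch the reduction immediately; messages like these are often latency-bound,
    # which is why latency-optimized allreduce algorithms matter at this scale.
    req = comm.Iallreduce(MPI.IN_PLACE, grad, op=MPI.SUM)
    requests.append(req)
    # ... backprop for the next (earlier) layer would run here, overlapping with communication ...

MPI.Request.Waitall(requests)                               # gradients summed across ranks
layer_grads = [g / comm.Get_size() for g in layer_grads]    # average before the optimizer step
```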

Endrei, M., Jin, C., Dinh, M.N., (…), DeRose, L., and de Supinski, B.R. (2019). “Statistical and Machine Learning Models for Optimizing Energy in Parallel Applications.” International Journal of High Performance Computing Applications. []

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload, and their effect on performance and energy efficiency, are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to a 45% improvement in energy efficiency for around a 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
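
Once a model predicts runtime and energy for each candidate configuration, the trade-off options are simply the non-dominated points. A minimal sketch, with hypothetical predicted values, is shown below; it is not the paper's modeling pipeline.

```python
# Minimal sketch of extracting Pareto-optimal trade-off options from model
# predictions: keep the configurations no other configuration beats on both objectives.
import numpy as np

def pareto_front(points):
    """Return indices of non-dominated rows of an (n, 2) array to be minimized."""
    keep = []
    for i, p in enumerate(points):
        dominated = np.any(np.all(points <= p, axis=1) & np.any(points < p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

# hypothetical predicted (runtime in s, energy in J) for candidate thread/frequency settings
preds = np.array([[10.0, 500.0], [12.0, 420.0], [11.0, 600.0], [15.0, 400.0]])
print(pareto_front(preds))   # -> [0, 1, 3]; [11, 600] is dominated by [10, 500]
```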

Fan, Y.J. (2019). “Autoencoder Node Saliency: Selecting Relevant Latent Representations.” Pattern Recognition. []

The autoencoder is an artificial neural network that performs nonlinear dimension reduction and learns hidden representations of unlabeled data. With a linear transfer function it is similar to principal component analysis (PCA). While both methods use weight vectors for linear transformations, the autoencoder does not come with any indication similar to the eigenvalues in PCA that are paired with eigenvectors. We propose a novel autoencoder node saliency method that examines whether the features constructed by autoencoders exhibit properties related to known class labels. The supervised node saliency ranks the nodes based on their capability of performing a learning task and is coupled with the normalized entropy difference (NED). We establish a property of NED values to verify classifying behaviors among the top-ranked nodes. By applying our methods to real datasets, we demonstrate their ability to indicate which nodes perform well and to explain the learned tasks in autoencoders.

Humbird, K.D., Peterson, J.L., and Mcclarren, R.G. (2019). “Deep Neural Network Initialization with Decision Trees.” IEEE Transactions on Neural Networks and Learning Systems. []

In this paper, a novel, automated process for constructing and initializing deep feedforward neural networks based on decision trees is presented. The proposed algorithm maps a collection of decision trees trained on the data into a collection of initialized neural networks with the structures of the networks determined by the structures of the trees. The tree-informed initialization acts as a warm-start to the neural network training process, resulting in efficiently trained, accurate networks. These models, referred to as 'deep jointly informed neural networks' (DJINN), demonstrate high predictive performance for a variety of regression and classification data sets and display comparable performance to Bayesian hyperparameter optimization at a lower computational cost. By combining the user-friendly features of decision tree models with the flexibility and scalability of deep neural networks, DJINN is an attractive algorithm for training predictive models on a wide range of complex data sets.

Kafle, S., Gupta, V., Kailkhura, B., Wimalajeewa, T., and Varshney, P.K. (2019). “Joint Sparsity Pattern Recovery with 1-b Compressive Sensing in Distributed Sensor Networks.” IEEE Transactions on Signal and Information Processing over Networks. []

In this paper, we study the problem of joint sparse support recovery with 1-bit quantized compressive measurements in a distributed sensor network. Multiple nodes in the network are assumed to observe sparse signals having the same but unknown sparse support. Each node quantizes its measurement vector element-wise to 1 bit. First, we consider that all the quantized measurements are available at a central fusion center. We derive performance bounds for sparsity pattern recovery using 1-bit quantized measurements from multiple sensors when the maximum likelihood decoder is employed. We further develop two computationally tractable algorithms for joint sparse support recovery in the centralized setting. One algorithm minimizes a cost function defined as the sum of the likelihood function and the ℓ1,∞ quasi-norm, while the other algorithm extends the binary iterative hard thresholding algorithm to the multiple measurement vector case. Second, we consider a decentralized setting where each node transmits 1-bit measurements to its one-hop neighbors. The basic idea behind the algorithms developed in the decentralized setting is to embed collaboration among nodes and fusion strategies. We show that even with noisy 1-bit compressed measurements, joint support recovery can be carried out accurately in both centralized and decentralized settings. We further show that the performance of the proposed 1-bit compressive sensing-based algorithms is very close to that of their real-valued counterparts except when the signal-to-noise ratio is very small.
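
A hedged sketch of the second centralized algorithm, a binary iterative hard thresholding (BIHT) update extended to multiple measurement vectors with a shared support, is given below in NumPy. The step size, iteration count, and normalization are illustrative choices and may differ from the paper's algorithm.

```python
# Sketch of a BIHT-style update extended to multiple measurement vectors (MMV)
# with a jointly sparse support; illustrative, not the paper's exact algorithm.
import numpy as np

def biht_mmv(Y, A, k, step=1.0, iters=200):
    """Y: (m, L) matrix of 1-bit measurements (+/-1), A: (m, n) sensing matrix,
    k: assumed joint sparsity level. Returns an (n, L) estimate with k nonzero rows."""
    m, n = A.shape
    X = np.zeros((n, Y.shape[1]))
    for _ in range(iters):
        # gradient-like step that pushes sign(AX) toward the observed signs Y
        X = X + (step / m) * A.T @ (Y - np.sign(A @ X))
        # joint hard threshold: keep the k rows with largest L2 norm (shared support)
        row_norms = np.linalg.norm(X, axis=1)
        X[np.argsort(row_norms)[:-k], :] = 0.0
    # 1-bit measurements lose amplitude information, so normalize each column
    norms = np.linalg.norm(X, axis=0)
    return X / np.where(norms > 0, norms, 1.0)

# toy usage with a random jointly sparse signal
rng = np.random.default_rng(1)
n, m, L, k = 100, 60, 4, 5
A = rng.normal(size=(m, n))
X_true = np.zeros((n, L))
X_true[rng.choice(n, k, replace=False)] = rng.normal(size=(k, L))
X_hat = biht_mmv(np.sign(A @ X_true), A, k)
```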

Kailkhura, B., Gallagher, B., Kim, S., Hiszpanski, A., and Yong-Jin Han, T. (2019). “Reliable and Explainable Machine-Learning Methods for Accelerated Material Discovery.” npj Computational Materials. []

Despite ML’s impressive performance in commercial applications, several unique challenges exist when applying ML in materials science applications. In such a context, the contributions of this work are twofold. First, we identify common pitfalls of existing ML techniques when learning from underrepresented/imbalanced material data. Specifically, we show that with imbalanced data, standard methods for assessing quality of ML models break down and lead to misleading conclusions. Furthermore, we find that the model’s own confidence score cannot be trusted and model introspection methods (using simpler models) do not help as they result in loss of predictive performance (reliability-explainability trade-off). Second, to overcome these challenges, we propose a general-purpose explainable and reliable machine-learning framework. Specifically, we propose a generic pipeline that employs an ensemble of simpler models to reliably predict material properties. We also propose a transfer learning technique and show that the performance loss due to models’ simplicity can be overcome by exploiting correlations among different material properties. A new evaluation metric and a trust score to better quantify the confidence in the predictions are also proposed. To improve the interpretability, we add a rationale generator component to our framework which provides both model-level and decision-level explanations. Finally, we demonstrate the versatility of our technique on two applications: (1) predicting properties of crystalline compounds and (2) identifying potentially stable solar cell materials. We also point to some outstanding issues yet to be resolved for a successful application of ML in material science.

Kim, S., Kim, H., Yoon, S., Lee, J., Kahou, S., Kashinath, K., and Prabhat, M. (2019). “Deep-Hurricane-Tracker: Tracking and Predicting Extreme Climate Events using ConvLSTM.” 2019 IEEE Winter Conference on Applications of Computer Vision. []

Tracking and predicting extreme events in large-scale spatio-temporal climate data are long-standing challenges in climate science. In this paper, we propose Convolutional LSTM (ConvLSTM)-based spatio-temporal models to track and predict hurricane trajectories from large-scale climate data, namely, the pixel-level spatio-temporal history of tropical cyclones. To address the tracking problem, we model time-sequential density maps of hurricane trajectories, capturing not only the temporal dynamics but also the spatial distribution of the trajectories. Furthermore, we introduce a new trajectory prediction approach that poses prediction as sequential forecasting from past to future hurricane density map sequences. Extensive experiments on an actual 20-year record show that our ConvLSTM-based tracking model significantly outperforms existing approaches, and that the proposed forecasting model achieves successful mapping from predicted density maps to ground truth.
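
A ConvLSTM cell replaces the matrix multiplications of an LSTM with convolutions, so the hidden and cell states keep their spatial layout. The PyTorch sketch below shows one such cell acting on a batch of density-map frames; the paper's full architecture, depth, and training setup are not reproduced.

```python
# Minimal ConvLSTM cell in PyTorch for illustration only.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        self.hid_ch = hid_ch
        # one convolution produces all four gates at once
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel, padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)              # convolutional cell update
        h = o * torch.tanh(c)                      # hidden state keeps its spatial layout
        return h, c

# one step on a batch of density-map frames (B, C, H, W)
cell = ConvLSTMCell(in_ch=1, hid_ch=16)
x = torch.randn(8, 1, 64, 64)
h = c = torch.zeros(8, 16, 64, 64)
h, c = cell(x, (h, c))
```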

Leach, W., Henrikson, J., Hatarik, R., (…), Palmer, N., and Rever, M. (2019). “Using Convolutional Neural Networks to Classify Static X-ray Imager Diagnostic Data at the National Ignition Facility.” Proceedings of the International Society for Optical Engineering. []

Hohlraums convert the laser energy at the National Ignition Facility (NIF) into X-ray energy to compress and implode a fusion capsule, creating fusion. The Static X-ray Imager (SXI) diagnostic collects time-integrated images of hohlraum wall X-ray illumination patterns viewed through the laser entrance hole (LEH). NIF image processing algorithms calculate the size and location of the LEH opening from the SXI images. Images obtained come from different experimental categories and camera setups and occasionally do not contain applicable or usable information. Unexpected experimental noise in the data can also occur, in which case affected images should be removed and not run through the processing algorithms. Current approaches to identifying these types of images are manual and case-by-case, which can be prohibitively time-consuming. In addition, the diagnostic image data can be sparse (missing segments or pieces) and may lead to false analysis results. There exists, however, an abundant variety of image examples in the NIF database. Convolutional Neural Networks (CNNs) have been shown to work well with this type of data and under these conditions. The objective of this work was to apply transfer learning to fine-tune a pre-trained CNN using a relatively small-scale dataset (∼1500 images) and determine which instances contained useful image data. Experimental results show that CNNs can readily identify useful image data while filtering out undesirable images. The CNN filter is currently being used in production at the NIF.
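
The fine-tuning recipe is standard transfer learning: freeze a pretrained backbone and retrain a small classification head on the labeled images. The PyTorch sketch below uses a ResNet-18 backbone and a two-class head purely as placeholders; the production NIF model and its hyperparameters are not reproduced here.

```python
# Generic transfer-learning sketch: fine-tune a pretrained CNN as a binary
# "usable vs. not usable" image filter. Backbone and hyperparameters are placeholders.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)           # ImageNet-pretrained backbone
for p in model.parameters():
    p.requires_grad = False                        # freeze the pretrained features
model.fc = nn.Linear(model.fc.in_features, 2)      # new head: usable / not usable

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)

def train_step(images, labels):
    """images: (B, 3, 224, 224) tensor; labels: (B,) tensor of {0, 1}."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```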

Maiti, A. (2019). “Second-order Statistical Bootstrap for the Uncertainty Quantification of Time-temperature-superposition Analysis.” Rheologica Acta. []

Time-temperature superposition (TTS), which for decades has been a powerful method for long-term prediction from accelerated aging data, involves rigid-shifting isotherms in logarithmic time to produce a single master prediction curve. For simple thermo-rheological properties that accurately follow the TTS principle, the shifts can be easily determined, even manually by eye. However, for many properties of interest, where the principle is obeyed only approximately or the data is noisy, it is imperative to develop objective shifting techniques along with reliable uncertainty bounds. This work analyzes in detail the method of arclength minimization as an unsupervised algorithm for determining optimum shifts and demonstrates that the method is nearly unbiased for all practical datasets with a variety of noise distributions. Moreover, when averaged over with-replacement (bootstrap) resamples, the predicted shifts follow a normal distribution, a fact that can be used to construct confidence intervals for the master curve through a second-order bootstrap procedure.
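
The shifting-and-bootstrap procedure can be sketched directly: shift each isotherm in log-time to minimize the total arclength of the merged point cloud, then repeat on with-replacement resamples to get a distribution of shifts. The NumPy/SciPy sketch below uses a Nelder-Mead optimizer and assumes comparably scaled axes; the paper's numerical details may differ.

```python
# Sketch of arclength-minimization shifting with a bootstrap, under simplifying assumptions.
import numpy as np
from scipy.optimize import minimize

def arclength(shifts, isotherms):
    """Total arclength of all points after shifting each isotherm in log-time.
    isotherms: list of (log_t, y) array pairs; the first isotherm is the reference."""
    all_shifts = np.concatenate([[0.0], shifts])
    pts = np.vstack([np.column_stack([lt + s, y])
                     for (lt, y), s in zip(isotherms, all_shifts)])
    pts = pts[np.argsort(pts[:, 0])]
    return np.sum(np.hypot(np.diff(pts[:, 0]), np.diff(pts[:, 1])))

def fit_shifts(isotherms):
    x0 = np.zeros(len(isotherms) - 1)
    return minimize(arclength, x0, args=(isotherms,), method='Nelder-Mead').x

def bootstrap_shifts(isotherms, n_boot=200, rng=np.random.default_rng(0)):
    """With-replacement resampling of each isotherm; returns (n_boot, n_iso - 1) shifts,
    whose per-column mean and spread give normal confidence intervals for the master curve."""
    samples = []
    for _ in range(n_boot):
        resampled = []
        for lt, y in isotherms:
            idx = rng.choice(len(lt), size=len(lt), replace=True)
            resampled.append((lt[idx], y[idx]))
        samples.append(fit_shifts(resampled))
    return np.array(samples)
```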

Narayanaswamy, V.S., Thiagarajan, J.J., Song, H., and Spanias, A. (2019). “Designing an Effective Metric Learning Pipeline for Speaker Diarization.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. []

State-of-the-art speaker diarization systems utilize knowledge from external data, in the form of a pre-trained distance metric, to effectively determine relative speaker identities for unseen data. However, much of the recent focus has been on choosing the appropriate feature extractor, ranging from pre-trained i-vectors to representations learned via different sequence modeling architectures (e.g., 1D-CNNs, LSTMs, attention models), while adopting off-the-shelf metric learning solutions. In this paper, we argue that, regardless of the feature extractor, it is crucial to carefully design a metric learning pipeline, namely the loss function, the sampling strategy, and the discriminative margin parameter, for building robust diarization systems. Furthermore, we propose to adopt a fine-grained validation process to obtain a comprehensive evaluation of the generalization power of metric learning pipelines. To this end, we measure diarization performance across different language speakers and variations in the number of speakers in a recording. Using empirical studies, we provide interesting insights into the effectiveness of different design choices and make recommendations.
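
The three pipeline choices the paper highlights (loss function, sampling strategy, and margin) can be made concrete with a generic PyTorch setup. In the sketch below, the embedding network, feature dimensions, and random triplet sampling are placeholders, not a diarization front-end or the authors' recommended configuration.

```python
# Sketch of a metric learning pipeline: a loss function, a sampling strategy,
# and a margin parameter. The encoder and features are placeholders.
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 64))  # placeholder encoder
margin = 0.2                                          # the discriminative margin parameter
loss_fn = nn.TripletMarginLoss(margin=margin)

def sample_triplets(features, speaker_ids, rng=torch.Generator().manual_seed(0)):
    """Random (anchor, positive, negative) sampling within a labeled batch."""
    anchors, positives, negatives = [], [], []
    for i, spk in enumerate(speaker_ids):
        pos = [j for j, s in enumerate(speaker_ids) if s == spk and j != i]
        neg = [j for j, s in enumerate(speaker_ids) if s != spk]
        if pos and neg:
            anchors.append(i)
            positives.append(pos[torch.randint(len(pos), (1,), generator=rng).item()])
            negatives.append(neg[torch.randint(len(neg), (1,), generator=rng).item()])
    return features[anchors], features[positives], features[negatives]

feats = torch.randn(32, 40)                           # e.g. per-segment acoustic features
spks = torch.randint(0, 4, (32,)).tolist()
a, p, n = sample_triplets(feats, spks)
loss = loss_fn(embed(a), embed(p), embed(n))
loss.backward()
```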

Nathan, E., Sanders, G., and Henson, V.E. (2019). “Personalized Ranking in Dynamic Graphs Using Nonbacktracking Walks.” Lecture Notes in Computer Science, including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. []

Centrality has long been studied as a method of identifying node importance in networks. In this paper we study a variant of several walk-based centrality metrics based on the notion of a nonbacktracking walk, where the pattern i → j → i is forbidden in the walk. Specifically, we focus our analysis on dynamic graphs, where the underlying data stream from which the network is drawn is constantly changing. Efficient algorithms for calculating nonbacktracking walk centrality scores in static and dynamic graphs are provided, and experiments on graphs with several million vertices and edges are conducted. For the static algorithm, comparisons to a traditional linear algebraic method of calculating scores show that our algorithm produces scores of high accuracy within a theoretically guaranteed bound. Comparisons of our dynamic algorithm to the static algorithm show speedups of several orders of magnitude as well as a significant reduction in the space required.
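
For context, one standard linear-algebraic formulation of nonbacktracking walk centrality uses the deformed graph Laplacian M(t) = I - tA + t^2(D - I) and solves M(t) x = (1 - t^2) 1. The NumPy sketch below shows only this static baseline; the paper's efficient static and incrementally updated dynamic algorithms are not reproduced.

```python
# One standard linear-algebraic formulation of nonbacktracking (NBT) walk
# centrality via the deformed graph Laplacian; shown only as a static baseline.
import numpy as np

def nbt_centrality(A, t):
    """A: symmetric 0/1 adjacency matrix; t: damping parameter chosen small enough
    that M(t) is nonsingular. Solves M(t) x = (1 - t^2) 1."""
    n = A.shape[0]
    D = np.diag(A.sum(axis=1))
    M = np.eye(n) - t * A + t * t * (D - np.eye(n))
    return np.linalg.solve(M, (1.0 - t * t) * np.ones(n))

# small example: a triangle with a pendant vertex
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(nbt_centrality(A, t=0.2))
```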

Petersen, B.K., Yang, J., Grathwohl, W.S., (…), An, G., and Faissol, D.M. (2019). “Deep Reinforcement Learning and Simulation as a Path toward Precision Medicine.” Journal of Computational Biology. []

Traditionally, precision medicine involves classifying patients to identify subpopulations that respond favorably to specific therapeutics. We pose precision medicine as a dynamic feedback control problem, where treatment administered to a patient is guided by measurements taken during the course of treatment. We consider sepsis, a life-threatening condition in which dysregulation of the immune system causes tissue damage. We leverage an existing simulation of the innate immune response to infection and apply deep reinforcement learning (DRL) to discover an adaptive personalized treatment policy that specifies effective multicytokine therapy to simulated sepsis patients based on systemic measurements. The learned policy achieves a dramatic reduction in mortality rate over a set of 500 simulated patients relative to standalone antibiotic therapy. Advantages of our approach are threefold: (1) the use of simulation allows exploring therapeutic strategies beyond clinical practice and available data, (2) advances in DRL accommodate learning complex therapeutic strategies for complex biological systems, and (3) optimized treatments respond to a patient's individual disease progression over time, therefore, capturing both differences across patients and the inherent randomness of disease progression within a single patient. We hope that this work motivates both considering adaptive personalized multicytokine mediation therapy for sepsis and exploiting simulation with DRL for precision medicine more broadly.

Reza, T., Ripeanu, M., Tripoul, N., Sanders, G., and Pearce, R. (2019). “PruneJuice: Pruning Trillion-edge Graphs to a Precise Pattern-matching Solution.” Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis. []

Pattern matching is a powerful graph analysis tool. Unfortunately, existing solutions have limited scalability, support only a limited set of search patterns, and/or focus on only a subset of the real-world problems associated with pattern matching. This paper presents a new algorithmic pipeline that: (i) enables highly scalable pattern matching on labeled graphs, (ii) supports arbitrary patterns, (iii) enables trade-offs between precision and time-to-solution (while always selecting all vertices and edges that participate in matches, thus offering 100% recall), and (iv) supports a set of popular data analytics scenarios. We implement our approach on top of HavoqGT and demonstrate its advantages through strong and weak scaling experiments on massive-scale real-world (up to 257 billion edges) and synthetic (up to 4.4 trillion edges) graphs, respectively, and at scales (1,024 nodes / 36,864 cores) orders of magnitude larger than used in the past for similar problems.

Roberts, R. S., Goforth, J.W., Weinert, G.F., (…), Stinson, B.J., and Duncan, A.M. (2019). “Automated Annotation of Satellite Imagery using Model-based Projections.” Proceedings of the Applied Imagery Pattern Recognition Workshop. []

GeoVisipedia is a novel approach to annotating satellite imagery. It uses wiki pages to annotate objects rather than simple labels. The use of wiki pages to contain annotations is particularly useful for annotating objects in imagery of complex geospatial configurations such as industrial facilities. GeoVisipedia uses the PRISM algorithm to project annotations applied to one image onto other imagery, hence enabling ubiquitous annotation. This paper derives the PRISM algorithm, which uses image metadata and a 3D facility model to create a view matrix unique to each image. The view matrix is used to project model components onto a mask that aligns the components with the objects in the scene that they represent. Wiki pages are linked to model components, which are in turn linked to the image via the component mask. An illustration of the efficacy of the PRISM algorithm is provided, demonstrating the projection of model components onto an effluent stack. We conclude with a discussion of the efficiencies of GeoVisipedia over manual annotation, and the use of PRISM for creating training sets for machine learning algorithms.
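
The projection step can be illustrated with a generic pinhole camera model: given a per-image view (intrinsics, rotation, translation), project the 3D points of a model component into pixel coordinates and rasterize them into a component mask. The NumPy sketch below uses hypothetical camera parameters and a toy vertical "stack" of points; it is not the PRISM derivation itself.

```python
# Generic pinhole projection sketch (not the PRISM derivation): project 3D model
# component points into an image and mark the pixels they land on as a mask.
import numpy as np

def project_points(points_w, K, R, trans):
    """points_w: (N, 3) world coordinates of a model component."""
    cam = R @ points_w.T + trans.reshape(3, 1)       # world -> camera frame
    uvw = K @ cam                                    # homogeneous pixel coordinates
    return (uvw[:2] / uvw[2]).T                      # (N, 2) pixel coordinates

def component_mask(points_w, K, R, trans, shape):
    """Mark the pixels a component projects onto; wiki pages can then be linked
    to image regions through this mask."""
    mask = np.zeros(shape, dtype=np.uint8)
    px = np.round(project_points(points_w, K, R, trans)).astype(int)
    inside = (px[:, 0] >= 0) & (px[:, 0] < shape[1]) & (px[:, 1] >= 0) & (px[:, 1] < shape[0])
    mask[px[inside, 1], px[inside, 0]] = 1
    return mask

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])   # hypothetical intrinsics
R, trans = np.eye(3), np.array([0.0, 0.0, 10.0])              # camera 10 units from the origin
stack = np.column_stack([np.zeros(50), np.linspace(-1, 1, 50), np.zeros(50)])  # toy vertical stack
mask = component_mask(stack, K, R, trans, shape=(480, 640))
```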

Shukla, R., Lipasti, M., Van Essen, B., Moody, A., and Maruyama, N. (2019). “Remodel: Rethinking Deep CNN Models to Detect and Count on a Neurosynaptic System.” Frontiers in Neuroscience. []

In this work, we analyze detection and counting of cars using the low-power IBM TrueNorth Neurosynaptic System. For our evaluation we used a publicly available dataset that has overhead imagery of cars with context present in the image. The trained neural network for image analysis was deployed on the NS16e system using IBM's EEDN training framework. Through multiple experiments we identify the architectural bottlenecks present in the TrueNorth system that do not let us deploy large neural network structures. Following these experiments, we propose changes to the CNN model to circumvent these architectural bottlenecks. The results of these evaluations have been compared with Caffe-based implementations of standard neural networks deployed on a Titan-X GPU. Results showed that TrueNorth can detect cars from the dataset with 97.60% accuracy and can be used to count the number of cars in the image with 69.04% accuracy. The car detection accuracy and car count (±2 error margin) accuracy are comparable to those of high-precision neural networks like AlexNet, GoogLeNet, and ResCeption, but show a many-fold improvement in power consumption.

Thiagarajan, J.J., Anirudh, R., Sridhar, R., and Bremer, P.-T. (2019). “Unsupervised Dimension Selection Using a Blue Noise Graph Spectrum.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. []

Unsupervised dimension selection is an important problem that seeks to reduce the dimensionality of data while preserving the most useful characteristics. While dimensionality reduction methods are commonly utilized to construct low-dimensional embeddings, they produce feature spaces that are hard to interpret. Further, in applications such as sensor design, one needs to perform reduction directly in the input domain, instead of constructing transformed spaces. Consequently, dimension selection (DS) aims to solve the combinatorial problem of identifying the top-k dimensions, which is required for effective experiment design, reducing data while keeping it interpretable, and designing better sensing mechanisms. In this paper, we develop a novel approach for DS based on graph signal analysis to measure feature influence. By analyzing synthetic graph signals with a blue noise spectrum, we show that we can measure the importance of each dimension. Using experiments in supervised learning and image masking, we demonstrate the superiority of the proposed approach over existing techniques in capturing crucial characteristics of high dimensional spaces, using only a small subset of the original features.

Thiagarajan, J.J., Kim, I., Anirudh, R., and Bremer, P.-T. (2019). “Understanding Deep Neural Networks through Input Uncertainties.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. []

Techniques for understanding the functioning of complex machine learning models are becoming increasingly popular, not only to improve the validation process, but also to extract new insights about the data via exploratory analysis. Though a large class of such tools currently exists, most assume that predictions are point estimates and use a sensitivity analysis of these estimates to interpret the model. Using lightweight probabilistic networks we show how including prediction uncertainties in the sensitivity analysis leads to: (i) more robust and generalizable models; and (ii) a new approach for model interpretation through uncertainty decomposition. In particular, we introduce a new regularization that takes both the mean and variance of a prediction into account and demonstrate that the resulting networks provide improved generalization to unseen data. Furthermore, we propose a new technique to explain prediction uncertainties through uncertainties in the input domain, thus providing new ways to validate and interpret deep learning models.
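
A common way to let a network report both a mean and a variance, and to penalize both in the loss, is the Gaussian negative log-likelihood shown below in PyTorch. It is a generic sketch of a lightweight probabilistic network, not necessarily the exact regularization proposed in the paper.

```python
# Generic sketch: a network that predicts mean and log-variance, trained with a
# Gaussian negative log-likelihood that takes both into account.
import torch
import torch.nn as nn

class ProbabilisticHead(nn.Module):
    """Lightweight probabilistic network: shared trunk, separate mean / log-variance heads."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, 1)
        self.log_var = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.trunk(x)
        return self.mu(h), self.log_var(h)

def gaussian_nll(mu, log_var, y):
    # squared error scaled by the predicted precision plus a variance term, so the
    # model cannot shrink the loss simply by inflating its uncertainty
    return 0.5 * (torch.exp(-log_var) * (y - mu) ** 2 + log_var).mean()

net = ProbabilisticHead(in_dim=10)
x, y = torch.randn(32, 10), torch.randn(32, 1)
mu, log_var = net(x)
loss = gaussian_nll(mu, log_var, y)
loss.backward()
```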

Thiagarajan, J., Rajan, D., and Sattigeri, P. (2019). “Understanding Behavior of Clinical Models under Domain Shifts.” 2019 KDD Workshop on Applied Data Science for Healthcare. []

The hypothesis that computational models can be reliable enough to be adopted in prognosis and patient care is revolutionizing healthcare. Deep learning, in particular, has been a game changer in building predictive models, thus leading to community-wide data curation efforts. However, due to inherent variabilities in population characteristics and biological systems, these models are often biased to the training datasets. This can be limiting when models are deployed in new environments, where there are systematic domain shifts not known a priori. In this paper, we propose to emulate a large class of domain shifts that can occur in clinical settings with a given dataset, and argue that evaluating the behavior of predictive models in light of those shifts is an effective way to quantify their reliability. More specifically, we develop an approach for building realistic scenarios based on analysis of disease landscapes in multi-label classification. Using the openly available MIMIC-III EHR dataset for phenotyping, our work sheds light, for the first time, on data regimes where deep clinical models can fail to generalize. This work emphasizes the need for novel validation mechanisms driven by real-world domain shifts in AI for healthcare.

Thopalli, K., Anirudh, R., Thiagarajan, J.J., and Turaga, P. (2019). “Multiple Subspace Alignment Improves Domain Adaptation.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. []

We present a novel unsupervised domain adaptation (DA) method for cross-domain visual recognition. Though subspace methods have found success in DA, their performance is often limited due to the assumption of approximating an entire dataset using a single low-dimensional subspace. Instead, we develop a method to effectively represent the source and target datasets via a collection of low-dimensional subspaces, and subsequently align them by exploiting the natural geometry of the space of subspaces, the Grassmann manifold. We demonstrate the effectiveness of this approach using empirical studies on two widely used benchmarks, with performance on par with or better than state-of-the-art domain adaptation methods.
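
The basic subspace-alignment step that the paper generalizes to collections of subspaces can be sketched in NumPy: compute a PCA basis for each domain, align the source basis to the target basis, and project both datasets into comparable coordinates. The clustering into multiple subspaces and the Grassmannian machinery are omitted from this sketch.

```python
# Sketch of the single-subspace alignment step only; the paper's multi-subspace
# extension and Grassmannian alignment are not reproduced here.
import numpy as np

def top_subspace(X, k):
    """Rows of X are samples; returns an orthonormal basis of the top-k PCA subspace."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                                   # (d, k) basis

def align_and_project(Xs, Xt, k):
    """Project source data through the aligned source basis and target data through
    its own basis, so both live in comparable k-dimensional coordinates."""
    Bs, Bt = top_subspace(Xs, k), top_subspace(Xt, k)
    M = Bs.T @ Bt                                     # alignment matrix between the bases
    return (Xs - Xs.mean(axis=0)) @ Bs @ M, (Xt - Xt.mean(axis=0)) @ Bt

rng = np.random.default_rng(0)
Xs = rng.normal(size=(200, 50))                       # source-domain features
Xt = rng.normal(size=(150, 50)) + 0.5                 # shifted target-domain features
Zs, Zt = align_and_project(Xs, Xt, k=10)              # a classifier is then trained on Zs
```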

Tran, K., Panahi, A., Adiga, A., Sakla, W., and Krim, H. (2019). “Nonlinear Multi-scale Super-resolution Using Deep Learning.” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. []

We propose a deep learning architecture capable of performing up to 8× single-image super-resolution. Our architecture incorporates an adversarial component from super-resolution generative adversarial networks (SRGANs) and a multi-scale learning component from the multiple scale super-resolution network (MSSRNet); only together can these components recover the smaller structures inherent in satellite images. To further enhance our performance, we integrate progressive growing and training into our network. This, aided by feed-forward connections in the network that carry forward and enrich information from previous inputs, produces super-resolved images at scaling factors of 2, 4, and 8. To ensure and enhance the stability of GANs, we employ Wasserstein GANs (WGANs) during training. Experimentally, we find that our architecture can recover small objects in satellite images during super-resolution whereas previous methods cannot.

Tripoul, N., Halawa, H., Reza, T., (…), Pearce, R., and Ripeanu, M. (2019). “There Are Trillions of Little Forks in the Road. Choose Wisely! Estimating the Cost and Likelihood of Success of Constrained Walks to Optimize a Graph Pruning Pipeline.” Proceedings of IA3 2018: 8th Workshop on Irregular Applications: Architectures and Algorithms, and the International Conference for High Performance Computing, Networking, Storage and Analysis. []

We have developed [Reza et al. SC'18] a highly scalable algorithmic pipeline for pattern matching in labeled graphs and demonstrated it on trillion-edge graphs. This pipeline: (i) supports arbitrary search patterns, (ii) identifies all the vertices and edges that participate in matches - offering 100% precision and recall, and (iii) supports realistic data analytics scenarios. The pipeline is based on graph pruning: it decomposes the search template into individual constraints and uses them to repeatedly prune the graph to a final solution. Our current solution, however, makes a number of ad hoc, intuition-based decisions that impact performance. In a nutshell, these relate to (i) constraint selection - which constraints to generate? (ii) constraint ordering - in which order to use them? and (iii) individual constraint generation - how to best verify them? This position paper makes the observation that by estimating the runtime cost and likelihood of success of a constrained walk in a labeled graph, one can inform these optimization decisions. We propose a preliminary solution for making these estimates and demonstrate - using a prototype shared-memory implementation - that it: (i) is feasible with low overheads, and (ii) offers accurate enough information to optimize our pruning pipeline by a significant margin.

Veldt, N., Klymko, C., and Gleich, D.F. (2019). “Flow-based Local Graph Clustering with Better Seed Set Inclusion.” SIAM International Conference on Data Mining. []

Flow-based methods for local graph clustering have received significant recent attention for their theoretical cut improvement and runtime guarantees. In this work we present two improvements for using flow-based methods in real-world semi-supervised clustering problems. Our first contribution is a generalized objective function that allows practitioners to place strict and soft penalties on excluding specific seed nodes from the output set. This feature allows us to avoid the tendency, often exhibited by previous flow-based methods, to contract a large seed set into a small set of nodes that does not contain all or even most of the seed nodes. Our second contribution is a fast algorithm for minimizing our generalized objective function, based on a variant of the push-relabel algorithm for computing preflows. We make our approach very fast in practice by implementing a global relabeling heuristic and employing a warm-start procedure to quickly solve related cut problems. In practice our algorithm is faster than previous related flow-based methods, and is also more robust in detecting ground truth target regions in a graph thanks to its ability to better incorporate semi-supervised information about target clusters.
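
The seed-inclusion idea can be illustrated with a toy source/sink cut construction: strict seeds get uncuttable (infinite-capacity) source edges, soft seeds get a finite exclusion penalty, and every node pays a small degree-weighted cost to be included. The networkx sketch below solves this with an off-the-shelf max-flow routine and is only in the spirit of the generalized objective; the paper's push-relabel variant, warm starts, and exact penalty terms are not reproduced.

```python
# Toy min-cut construction in the spirit of the generalized objective: strict seeds
# cannot be excluded, soft seeds cost a penalty to exclude. Not the paper's algorithm.
import networkx as nx

def local_cluster(G, strict_seeds, soft_seeds, soft_penalty=2.0, locality=0.5):
    H = nx.DiGraph()
    for u, v in G.edges():
        H.add_edge(u, v, capacity=1.0)               # cutting a graph edge costs 1
        H.add_edge(v, u, capacity=1.0)
    for v in G.nodes():
        H.add_edge(v, 't', capacity=locality * G.degree(v))   # discourages huge clusters
    for v in strict_seeds:
        H.add_edge('s', v)                           # no capacity attribute = infinite capacity:
                                                     # a strict seed can never be cut away
    for v in soft_seeds:
        H.add_edge('s', v, capacity=soft_penalty)    # excluding a soft seed costs a penalty
    _, (source_side, _) = nx.minimum_cut(H, 's', 't')
    return source_side - {'s'}                       # the returned local cluster

G = nx.karate_club_graph()
print(sorted(local_cluster(G, strict_seeds=[0], soft_seeds=[1, 2, 3])))
```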

White, D.A., Arrighi, W.J., Kudo, J., and Watts, S.E. (2019). “Multiscale Topology Optimization Using Neural Network Surrogate Models.” Computer Methods in Applied Mechanics and Engineering. []

We are concerned with the optimization of macroscale elastic structures that are designed utilizing spatially varying microscale metamaterials. The macroscale optimization is accomplished using gradient-based nonlinear topology optimization. But instead of using density as the optimization decision variable, the decision variables are the multiple parameters that define the local microscale metamaterial. This is accomplished using single-layer feedforward Gaussian basis function networks as surrogate models of the elastic response of the microscale metamaterial. The surrogate models are trained using highly resolved continuum finite element simulations of the microscale metamaterials and hence are significantly more accurate than analytical models, e.g., classical beam theory. Because the derivative of the surrogate model is important for sensitivity analysis of the macroscale topology optimization, a neural network training procedure based on the Sobolev norm is described. Since the SIMP method is not appropriate for spatially varying lattices, an alternative method is developed to enable the creation of void regions. The efficacy of this approach is demonstrated via several examples in which the optimal graded metamaterial outperforms a traditional solid structure.
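
Training on the Sobolev norm means the loss penalizes errors in both the surrogate's outputs and its input derivatives, so the gradients used by the sensitivity analysis are also accurate. The PyTorch sketch below illustrates this with a generic small network standing in for the paper's Gaussian basis function surrogate; data and sizes are placeholders.

```python
# Minimal sketch of Sobolev-norm training: match both the surrogate's outputs and
# its input gradients. A generic MLP stands in for the Gaussian basis function network.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))   # surrogate of microscale response
mse = nn.MSELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

def sobolev_step(x, y_true, dy_true, lam=1.0):
    """x: (B, 4) metamaterial parameters; y_true: (B, 1) response from the FE model;
    dy_true: (B, 4) response derivatives from the FE model (or finite differences)."""
    x = x.clone().requires_grad_(True)
    y_pred = net(x)
    # gradient of the scalar output w.r.t. the inputs, kept in the graph for training
    dy_pred = torch.autograd.grad(y_pred.sum(), x, create_graph=True)[0]
    loss = mse(y_pred, y_true) + lam * mse(dy_pred, dy_true)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

x = torch.randn(16, 4)
sobolev_step(x, torch.randn(16, 1), torch.randn(16, 4))
```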

Yuan, B., Giera, B., Guss, G., Matthews, M., and McMains, S. (2019). “Semi-supervised Convolutional Neural Networks for in-situ Video Monitoring of Selective Laser Melting.” IEEE Winter Conference on Applications of Computer Vision. []

Selective Laser Melting (SLM) is a metal additive manufacturing technique. The lack of SLM process repeatability is a barrier for industrial progression. SLM product quality is hard to control, even when using fixed system settings. Thus SLM could benefit from a monitoring system that provides quality assessments in real-time. Since there is no publicly available SLM dataset, we ran experiments to collect over one thousand SLM videos, measured the physical output via height map images, and applied a proposed image processing algorithm to them to produce a dataset for semi-supervised learning. Then we trained convolutional neural networks (CNNs) to recognize desired quality metrics from videos. Experimental results demonstrate the effectiveness of our proposed monitoring approach and also show that the semi-supervised model can mitigate the time and expense of labeling an entire SLM dataset.