Kamath, C. and Fan, Y. (2018). "Regression with small data sets: A case study using code surrogates in additive manufacturing." Knowledge and Information Systems: An International Journal.

There has been increasing interest in recent years in the mining of massive data sets whose sizes are measured in terabytes. However, there are some problems where collecting even a single data point is very expensive, resulting in data sets with only tens or hundreds of samples. One such problem is that of building code surrogates, where a computer simulation is run using many different values of the input parameters and a regression model is built to relate the outputs of the simulation to the inputs. A good surrogate can be very useful in sensitivity analysis, uncertainty analysis, and in designing experiments, but the cost of running expensive simulations at many sample points can be high. In this paper, we use a problem from the domain of additive manufacturing to show that even with small data sets, we can build good quality surrogates by appropriately selecting the input samples and the regression algorithm. Our work is broadly applicable to simulations in other domains and the ideas proposed can be used in time-constrained machine learning tasks, such as hyper-parameter optimization.

Mundhenk, T. N., Ho, D., Chen, B. Y. (2018). "Improvements to context based self-supervised learning." Conference on Computer Vision and Pattern Recognition.

We develop a set of methods to improve on the results of self-supervised learning using context. We start with a baseline of patch based arrangement context learning and go from there. Our methods address some overt problems such as chromatic aberration as well as other potential problems such as spatial skew and mid-level feature neglect. We prevent problems with testing generalization on common self-supervised benchmark tests by using different datasets during our development. The results of our methods combined yield top scores on all standard self-supervised benchmarks, including classification and detection on PASCAL VOC 2007, segmentation on PASCAL VOC 2012, and "linear tests" on the ImageNet and CSAIL Places datasets. We obtain an improvement over our baseline method of between 4.0 and 7.1 percentage points on transfer learning classification tests. We also show results on different standard network architectures to demonstrate generalization as well as portability.

Anirudh, R., Kim, H., Thiagarajan, J. J., Mohan, K. A., Champley, K. and Bremer, P.T. (2018). "Lose The Views: Limited Angle CT Reconstruction via Implicit Sinogram Completion." Conference on Computer Vision and Pattern Recognition.

Computed Tomography (CT) reconstruction is a fundamental component of a wide variety of applications ranging from security to healthcare. The classical techniques require measuring projections, called sinograms, from a full 180° view of the object. This is impractical in a limited angle scenario, when the viewing angle is less than 180°, which can occur due to factors such as restrictions on scanning time or limited flexibility of scanner rotation. The sinograms obtained as a result cause existing techniques to produce highly artifact-laden reconstructions. In this paper, we propose to address this problem through implicit sinogram completion, on a challenging real world dataset containing scans of common checked-in luggage. We propose a system, consisting of 1D and 2D convolutional neural networks, that operates on a limited angle sinogram to directly produce the best estimate of a reconstruction. Next, we use the x-ray transform on this reconstruction to obtain a "completed" sinogram, as if it came from a full 180° measurement. We feed this to standard analytical and iterative reconstruction techniques to obtain the final reconstruction. We show with extensive experimentation that this combined strategy outperforms many competitive baselines. We also propose a measure of confidence for the reconstruction that enables a practitioner to gauge the reliability of a prediction made by our network. We show that this measure is a strong indicator of quality as measured by the PSNR, while not requiring ground truth at test time. Finally, using a segmentation experiment, we show that our reconstruction preserves the 3D structure of objects effectively.

Song, H., Rajan, D., Thiagarajan, J. J. and Spanias, A. (2018). "Attend and Diagnose: Clinical Time Series Analysis using Attention Models." AAAI Conference.

With widespread adoption of electronic health records, there is an increased emphasis on predictive models that can effectively deal with clinical time-series data. Powered by Recurrent Neural Network (RNN) architectures with Long Short-Term Memory (LSTM) units, deep neural networks have achieved state-of-the-art results in several clinical prediction tasks. Despite the success of RNNs, their sequential nature prohibits parallelized computing, making them inefficient, particularly when processing long sequences. Recently, architectures which are based solely on attention mechanisms have shown remarkable success in transduction tasks in NLP, while being computationally superior. In this paper, for the first time, we utilize attention models for clinical time-series modeling, thereby dispensing with recurrence entirely. We develop the SAnD (Simply Attend and Diagnose) architecture, which employs a masked, self-attention mechanism, and uses positional encoding and dense interpolation strategies for incorporating temporal order. Furthermore, we develop a multi-task variant of SAnD to jointly infer models with multiple diagnosis tasks. Using the recent MIMIC-III benchmark datasets, we demonstrate that the proposed approach achieves state-of-the-art performance in all tasks, outperforming LSTM models and classical baselines with hand-engineered features.
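
As a rough illustration of the two ingredients named in this abstract, the sketch below combines a positional encoding with masked self-attention in NumPy. It is not the SAnD implementation: the sinusoidal encoding, the purely causal mask, the single attention head, and the names `positional_encoding` and `masked_self_attention` are all assumptions made for this example, and dense interpolation is omitted.

```python
import numpy as np

def positional_encoding(T, d):
    # Sinusoidal encoding (assumed form): even columns use sin, odd columns use cos.
    pos = np.arange(T)[:, None]
    i = np.arange(d)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d)
    pe = np.zeros((T, d))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

def masked_self_attention(X, Wq, Wk, Wv):
    # X has shape (T, d). Each time step may attend only to itself and earlier steps.
    T = X.shape[0]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ V, weights

# Toy usage on a random "time series" with 6 steps and 8 features.
rng = np.random.default_rng(0)
T, d = 6, 8
X = rng.standard_normal((T, d)) + positional_encoding(T, d)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out, weights = masked_self_attention(X, Wq, Wk, Wv)
```

Because every row of the attention matrix is computed independently, all time steps can be processed in parallel, which is the efficiency argument the abstract makes against recurrent models.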

Thiagarajan, J. J., Liu, S., Ramamurthy, K. and Bremer, P.T. (2018). "Exploring High-Dimensional Structure via Axis-Aligned Decomposition of Linear Projections." Conference on Visualization.

Two-dimensional embeddings remain the dominant approach to visualize high dimensional data. The choice of embeddings ranges from highly non-linear ones, which can capture complex relationships but are difficult to interpret quantitatively, to axis-aligned projections, which are easy to interpret but are limited to bivariate relationships. Linear projections can be considered a compromise between complexity and interpretability, as they allow explicit axes labels, yet provide significantly more degrees of freedom compared to axis-aligned projections. Nevertheless, interpreting the axes directions, which are linear combinations often with many non-trivial components, remains difficult. To address this problem we introduce a structure aware decomposition of (multiple) linear projections into sparse sets of axis aligned projections, which jointly capture all information of the original linear ones. In particular, we use tools from Dempster-Shafer theory to formally define how relevant a given axis aligned projection is to explain the neighborhood relations displayed in some linear projection. Furthermore, we introduce a new approach to discover a diverse set of high quality linear projections and show that in practice the information of k linear projections is often jointly encoded in ∼k axis aligned plots. We have integrated these ideas into an interactive visualization system that allows users to jointly browse both linear projections and their axis aligned representatives. Using a number of case studies we show how the resulting plots lead to more intuitive visualizations and new insight.

Song, H., Thiagarajan, J.J., Sattigeri, P. and Spanias, A. (2018). "Optimizing Kernel Machines using Deep Learning." IEEE Transactions on Neural Networks and Learning Systems.

Building highly non-linear and non-parametric models is central to several state-of-the-art machine learning systems. Kernel methods form an important class of techniques that induce a reproducing kernel Hilbert space (RKHS) for inferring non-linear models through the construction of similarity functions from data. These methods are particularly preferred in cases where the training data sizes are limited and when prior knowledge of the data similarities is available. Despite their usefulness, they are limited by the computational complexity and their inability to support end-to-end learning with a task-specific objective. On the other hand, deep neural networks have become the de facto solution for end-to-end inference in several learning paradigms. In this article, we explore the idea of using deep architectures to perform kernel machine optimization, for both computational efficiency and end-to-end inferencing. To this end, we develop the DKMO (Deep Kernel Machine Optimization) framework, that creates an ensemble of dense embeddings using Nystrom kernel approximations and utilizes deep learning to generate task-specific representations through the fusion of the embeddings. Intuitively, the filters of the network are trained to fuse information from an ensemble of linear subspaces in the RKHS. Furthermore, we introduce the kernel dropout regularization to enable improved training convergence. Finally, we extend this framework to the multiple kernel case, by coupling a global fusion layer with pre-trained deep kernel machines for each of the constituent kernels. Using case studies with limited training data, and lack of explicit feature sources, we demonstrate the effectiveness of our framework over conventional model inferencing techniques.

Thiagarajan, J. J., Anirudh, R., Kailkhura, B., Jain, N., Islam, T., Bhatele, A., Yeom, J.S. and Gamblin, T. (2018). "PADDLE: Performance Analysis using a Data-driven Learning Environment." IEEE International Parallel and Distributed Processing Symposium.

The use of machine learning techniques to model execution time and power consumption, and, more generally, to characterize performance data is gaining traction in the HPC community. Although this signifies huge potential for automating complex inference tasks, a typical analytics pipeline requires selecting and extensively tuning multiple components ranging from feature learning to statistical inferencing to visualization. Further, the algorithmic solutions often do not generalize between problems, thereby making it cumbersome to design and validate machine learning techniques in practice. In order to address these challenges, we propose a unified machine learning framework, PADDLE, which is specifically designed for problems encountered during analysis of HPC data. The proposed framework uses an information-theoretic approach for hierarchical feature learning and can produce highly robust and interpretable models. We present user-centric workflows for using PADDLE and demonstrate its effectiveness in different scenarios: (a) identifying causes of network congestion; (b) determining the best performing linear solver for sparse matrices; and (c) comparing performance characteristics of parent and proxy application pairs.

Liu, S., Bremer, P.T., Thiagarajan, J. J., Srikumar, V., Wang, B., Livnat, Y. and Pascucci, V. (2018). "Visual Exploration of Semantic Relationships in Neural Word Embeddings." IEEE Transactions on Visualization and Computer Graphics.

Constructing distributed representations for words through neural language models and using the resulting vector spaces for analysis has become a crucial component of natural language processing (NLP). However, despite their widespread application, little is known about the structure and properties of these spaces. To gain insights into the relationship between words, the NLP community has begun to adapt high-dimensional visualization techniques. In particular, researchers commonly use t-distributed stochastic neighbor embeddings (t-SNE) and principal component analysis (PCA) to create two-dimensional embeddings for assessing the overall structure and exploring linear relationships (e.g., word analogies), respectively. Unfortunately, these techniques often produce mediocre or even misleading results and cannot address domain-specific visualization challenges that are crucial for understanding semantic relationships in word embeddings. Here, we introduce new embedding techniques for visualizing semantic and syntactic analogies, and the corresponding tests to determine whether the resulting views capture salient structures. Additionally, we introduce two novel views for a comprehensive study of analogy relationships. Finally, we augment t-SNE embeddings to convey uncertainty information in order to allow a reliable interpretation. Combined, the different views address a number of domain-specific tasks difficult to solve with existing tools.

Lin, Y., Wang, S., Thiagarajan, J. J., Guthrie, G. and Coblentz, D. (2018). "Efficient Data-Driven Geologic Feature Characterization from Pre-stack Seismic Measurements using Randomized Machine-Learning Algorithm." Geophysical Journal International.

Conventional seismic techniques for detecting the subsurface geologic features are challenged by limited data coverage, computational inefficiency, and subjective human factors. We developed a novel data-driven geological feature characterization approach based on pre-stack seismic measurements. Our characterization method employs an efficient and accurate machine-learning method to extract useful subsurface geologic features automatically. Specifically, our method is based on the kernel ridge regression model. The conventional kernel ridge regression can be computationally prohibitive because of the large volume of seismic measurements. We employ a data reduction technique in combination with the conventional kernel ridge regression method to improve the computational efficiency and reduce memory usage. In particular, we utilize a randomized numerical linear algebra technique, named the Nyström method, to effectively reduce the dimensionality of the feature space without compromising the information content required for accurate characterization. We provide thorough computational cost analysis to show the efficiency of our new geological feature characterization methods. We further validate the performance of our new subsurface geologic feature characterization method using synthetic surface seismic data for 2D acoustic and elastic velocity models. Our numerical examples demonstrate that our new characterization method significantly improves the computational efficiency while maintaining comparable accuracy. Interestingly, we show that our method yields a speed-up ratio on the order of ∼10² to ∼10³ in a multi-core computational environment.
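
The combination described in this abstract, kernel ridge regression made cheaper by a Nyström approximation, can be sketched generically as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the RBF kernel, uniform random landmark selection, the toy sine data, and the function names (`rbf_kernel`, `nystrom_features`, `fit_nystrom_krr`, `predict`) are all choices made for this example.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise squared distances between rows of A and rows of B.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_features(X, landmarks, gamma, eps=1e-10):
    # Feature map Z = K_nm K_mm^{-1/2}, so that Z Z^T approximates the full kernel matrix.
    K_mm = rbf_kernel(landmarks, landmarks, gamma)
    K_nm = rbf_kernel(X, landmarks, gamma)
    w, V = np.linalg.eigh(K_mm)
    w = np.maximum(w, eps)  # guard against tiny or negative eigenvalues
    return K_nm @ (V / np.sqrt(w))

def fit_nystrom_krr(X, y, m, gamma, lam, rng):
    # Pick m landmark rows at random, then solve an m-by-m ridge system
    # instead of the n-by-n system of exact kernel ridge regression.
    idx = rng.choice(len(X), size=m, replace=False)
    landmarks = X[idx]
    Z = nystrom_features(X, landmarks, gamma)
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ y)
    return landmarks, beta

def predict(X_new, landmarks, beta, gamma):
    return nystrom_features(X_new, landmarks, gamma) @ beta

# Toy usage: recover a noisy sine curve from 2000 samples using 50 landmarks.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(2000)
landmarks, beta = fit_nystrom_krr(X, y, m=50, gamma=0.5, lam=1e-3, rng=rng)
X_test = np.linspace(-3, 3, 200)[:, None]
rmse = np.sqrt(np.mean((predict(X_test, landmarks, beta, 0.5) - np.sin(X_test[:, 0]))**2))
```

With n samples and m landmarks, the dominant cost drops from solving an n-by-n system to forming and solving an m-by-m one, which is the kind of saving the abstract attributes to the data reduction step.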


Lennox, K. P., Rosenfield, P., Blair, B., Kaplan, A., Ruz, J., Glenn, A. and Wurtz, R. (2017). "Assessing and Minimizing Contamination in Time of Flight Based Validation Data." Nuclear Instruments and Methods in Physics Research.

Time of flight experiments are the gold standard method for generating labeled training and testing data for the neutron/gamma pulse shape discrimination problem. As the popularity of supervised classification methods increases in this field, there will also be increasing reliance on time of flight data for algorithm development and evaluation. However, time of flight experiments are subject to various sources of contamination that lead to neutron and gamma pulses being mislabeled. Such labeling errors have a detrimental effect on classification algorithm training and testing, and should therefore be minimized. This paper presents a method for identifying minimally contaminated data sets from time of flight experiments and estimating the residual contamination rate. This method leverages statistical models describing neutron and gamma travel time distributions and is easily implemented using existing statistical software. The method produces a set of optimal intervals that balance the trade-off between interval size and nuisance particle contamination, and its use is demonstrated on a time of flight data set for Cf-252. The particular properties of the optimal intervals for the demonstration data are explored in detail.

Mundhenk, T. N., Kegelmeyer, L. M. and Trummer, S.K. (2017). "Deep learning for evaluating difficult-to-detect incomplete repairs of high fluence laser optics at the National Ignition Facility." Thirteenth International Conference on Quality Control by Artificial Vision.

Two machine-learning methods were evaluated to help automate the quality control process for mitigating damage sites on laser optics. The mitigation is a cone-like structure etched into locations on large optics that have been chipped by the high fluence (energy per unit area) laser light. Sometimes the repair leaves a difficult-to-detect remnant of the damage that needs to be addressed before the optic can be placed back on the beam line. We would like to be able to automatically detect these remnants. We try Deep Learning (convolutional neural networks using features autogenerated from large stores of labeled data, like ImageNet) and find it outperforms ensembles of decision trees (using custom-built features) in finding these subtle, rare, incomplete repairs of damage. We also implemented an unsupervised method for helping operators visualize where the network has spotted problems. This is done by projecting the credit for the result backwards onto the input image, which shows the regions of an image most responsible for the network's decision. This can also be used to help understand the black box decisions the network is making and potentially improve the training process.

Pallotta, G., Konjevod, G., Cadena, J. and Nguyen, P. (2017). "Context-aided Analysis of Community Evolution in Networks." Statistical Analysis and Data Mining: The ASA Data Science Journal.

We are interested in detecting and analyzing global changes in dynamic networks (networks that evolve with time). More precisely, we consider changes in the activity distribution within the network, in terms of density (i.e., edge existence) and intensity (i.e., edge weight). Detecting change in local properties, as well as individual measurements or metrics, has been well studied and often reduces to traditional statistical process control. In contrast, detecting change in larger scale structure of the network is more challenging and not as well understood. We address this problem by proposing a framework for detecting change in network structure based on separate pieces: a probabilistic model for partitioning nodes by their behavior, a label-unswitching heuristic, and an approach to change detection for sequences of complex objects. We examine the performance of one instantiation of such a framework using mostly previously available pieces. The dataset we use for these investigations is the publicly available New York City Taxi and Limousine Commission dataset covering all taxi trips in New York City since 2009. Using it, we investigate the evolution of an ensemble of networks under different spatiotemporal resolutions. We identify the community structure by fitting a weighted stochastic block model. We offer insights on different node ranking and clustering methods, their ability to capture the rhythm of life in the Big Apple, and their potential usefulness in highlighting changes in the underlying network structure.

Sakla, W., Konjevod, G. and Mundhenk, T. N. (2017). "Deep Multi-modal Vehicle Detection in Aerial ISR Imagery." IEEE Winter Conference on Applications of Computer Vision.

Since the introduction of deep convolutional neural networks (CNNs), object detection in imagery has witnessed substantial breakthroughs in state-of-the-art performance. The defense community utilizes overhead image sensors that acquire large field-of-view aerial imagery in various bands of the electromagnetic spectrum, which is then exploited for various applications, including the detection and localization of man-made objects. In this work, we utilize a recent state-of-the-art object detection algorithm, Faster R-CNN, to train a deep CNN for vehicle detection in multimodal imagery. We utilize the vehicle detection in aerial imagery (VEDAI) dataset, which contains overhead imagery that is representative of an ISR setting. Our contribution includes modification of key parameters in the Faster R-CNN algorithm for this setting where the objects of interest are spatially small, occupying less than 1.5×10⁻³ of the total image pixels. Our experiments show that (1) an appropriately trained deep CNN leads to average precision rates above 93% on vehicle detection, and (2) transfer learning between imagery modalities is possible, yielding average precision rates above 90% in the absence of fine-tuning.

Zheng, P., Aravkin, A. Y., Ramamurthy, K. and Thiagarajan, J.J. (2017). "Learning Robust Representations for Computer Vision." IEEE International Conference on Computer Vision Workshops.

Unsupervised learning techniques in computer vision often require learning latent representations, such as low-dimensional linear and non-linear subspaces. Noise and outliers in the data can frustrate these approaches by obscuring the latent spaces. Our main goal is deeper understanding and new development of robust approaches for representation learning. We provide a new interpretation for existing robust approaches and present two specific contributions: a new robust PCA approach, which can separate foreground features from dynamic background, and a novel robust spectral clustering method, that can cluster facial images with high accuracy. Both contributions show superior performance to standard methods on real-world test sets.

Marathe, A., Anirudh, R., Jain, N., Bhatele, A., Thiagarajan, J. J., Kailkhura, B., Yeom, J. S., Rountree, B. and Gamblin, T. (2017). "Performance Modeling Under Resource Constraints Using Deep Transfer Learning." Supercomputing Conference.

Tuning application parameters for optimal performance is a challenging combinatorial problem. Hence, techniques for modeling the functional relationships between various input features in the parameter space and application performance are important. We show that simple statistical inference techniques are inadequate to capture these relationships. Even with more complex ensembles of models, the minimum coverage of the parameter space required via experimental observations is still quite large. We propose a deep learning based approach that can combine information from exhaustive observations collected at a smaller scale with limited observations collected at a larger target scale. The proposed approach is able to accurately predict performance in the regimes of interest to performance analysts while outperforming many traditional techniques. In particular, our approach can identify the best performing configurations even when trained using as few as 1% of observations at the target scale.

Lin, Y., Wang, S., Thiagarajan, J. J., Guthrie, G. and Coblentz, D. (2017). "Towards Real-Time Geologic Feature Detection from Seismic Measurements Using a Randomized Machine-Learning Algorithm." SEG Annual Conference.

Conventional seismic techniques for detecting the subsurface geologic features are challenged by limited data coverage, computational inefficiency, and subjective human factors. We propose to employ an efficient and accurate machine-learning detection approach to extract useful subsurface geologic features automatically. We employ a data reduction technique in combination with the conventional kernel ridge regression method to improve the computational efficiency and reduce the memory usage. Specifically, we utilize a randomized numerical linear algebra technique to effectively reduce the dimensionality of the feature space without compromising the information content required for accurate detection. We validate the performance of our new subsurface geologic feature detection method using synthetic surface seismic data for a 2D geophysical model. Our numerical examples demonstrate that our new detection method significantly improves the computational efficiency while maintaining comparable accuracy. Interestingly, we show that our method yields a speed-up ratio on the order of ~10² to ~10³ in a multi-core computational environment.

Anirudh, R., Kailkhura, B., Thiagarajan, J.J. and Bremer, P. T. (2017). "Poisson Disk Sampling on the Grassmannian: Applications in Subspace Optimization." Conference on Computer Vision and Pattern Recognition.

To develop accurate inference algorithms on embedded manifolds such as the Grassmannian, we often employ several optimization tools and incorporate the characteristics of known manifolds as additional constraints. However, a direct analysis of the nature of functions on manifolds is rarely performed. In this paper, we propose an alternative approach to this inference by adopting a statistical pipeline that first generates an initial sampling of the manifold, and then performs subsequent analysis based on these samples. First, we introduce a better sampling technique based on dart throwing (called the Poisson disk sampling (PDS)) to effectively sample the Grassmannian. Next, using Grassmannian sparse coding, we demonstrate the improved coverage achieved by PDS. Finally, we develop a consensus approach, with Grassmann samples, to infer the optimal embeddings for linear dimensionality reduction, and show that the resulting solutions are nearly optimal.
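
The dart-throwing idea mentioned in this abstract can be sketched as follows: draw random subspaces and keep a candidate only if it lies at least a minimum distance from every sample accepted so far. This is a minimal NumPy illustration, not the paper's algorithm. The chordal (projection) distance, the stopping rule based on consecutive rejections, and the names `random_subspace`, `chordal_distance`, and `poisson_disk_grassmann` are assumptions made for this example.

```python
import numpy as np

def random_subspace(n, k, rng):
    # Orthonormal basis of a random k-dimensional subspace of R^n,
    # obtained from the QR factorization of a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    return Q

def chordal_distance(U, V):
    # Projection (chordal) distance between span(U) and span(V):
    # ||U U^T - V V^T||_F / sqrt(2) = sqrt(k - ||U^T V||_F^2).
    k = U.shape[1]
    return np.sqrt(max(k - np.linalg.norm(U.T @ V, "fro") ** 2, 0.0))

def poisson_disk_grassmann(n, k, r_min, max_samples, max_attempts, rng):
    # Dart throwing: accept a candidate only if it is at least r_min away
    # from all accepted samples; stop after max_attempts consecutive rejections.
    samples = []
    attempts = 0
    while len(samples) < max_samples and attempts < max_attempts:
        cand = random_subspace(n, k, rng)
        if all(chordal_distance(cand, S) >= r_min for S in samples):
            samples.append(cand)
            attempts = 0
        else:
            attempts += 1
    return samples

# Toy usage: up to 20 well-separated 2-dimensional subspaces of R^5.
rng = np.random.default_rng(0)
samples = poisson_disk_grassmann(n=5, k=2, r_min=0.8, max_samples=20, max_attempts=500, rng=rng)
```

Compared with plain random sampling, the rejection step prevents samples from clustering, which is the coverage property the abstract refers to.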

Song, H., Thiagarajan, J. J., Sattigeri, P. and Spanias, A. (2017). "A Deep Learning Approach to Multiple Kernel Learning." IEEE International Conference on Acoustics, Speech and Signal Processing.

Kernel fusion is a popular and effective approach for combining multiple features that characterize different aspects of data. Traditional approaches for Multiple Kernel Learning (MKL) attempt to learn the parameters for combining the kernels through sophisticated optimization procedures. In this paper, we propose an alternative approach that creates dense embeddings for data using the kernel similarities and adopts a deep neural network architecture for fusing the embeddings. In order to improve the effectiveness of this network, we introduce the kernel dropout regularization strategy coupled with the use of an expanded set of composition kernels. Experiment results on a real-world activity recognition dataset show that the proposed architecture is effective in fusing kernels and achieves state-of-the-art performance.

Li, Q., Kailkhura, B., Thiagarajan, J. J. and Varshney, P. K. (2017). "Influential Node Detection in Implicit Social Networks using Multi-task Gaussian Copula Models." Conference on Neural Information Processing Systems.

Influential node detection is a central research topic in social network analysis. Many existing methods rely on the assumption that the network structure is completely known a priori. However, in many applications, network structure is unavailable to explain the underlying information diffusion phenomenon. To address the challenge of information diffusion analysis with incomplete knowledge of network structure, we develop a multi-task low rank linear influence model. By exploiting the relationships between contagions, our approach can simultaneously predict the volume (i.e., time series prediction) for each contagion (or topic) and automatically identify the most influential nodes for each contagion. The proposed model is validated using synthetic data and an ISIS Twitter dataset. In addition to improving the volume prediction performance significantly, we show that the proposed approach can reliably infer the most influential users for specific contagions.

Mudigonda, M., Kim, S., Mahesh, A., Kahou, S., Kashinath, K., Williams, D., Michalski, V., O’Brien, T. and Prabhat, M. (2017). "Segmenting and Tracking Extreme Climate Events using Neural Networks." Conference on Neural Information Processing Systems.

Predicting extreme weather events in a warming world is one of the most pressing and challenging problems that humanity faces today. Deep learning and advances in the field of computer vision provide a novel and powerful set of tools to tackle this demanding task. However, unlike images employed in computer vision, climate datasets present unique challenges. The channels (or physical variables) in a climate dataset are manifold, and unlike pixel information in computer vision data, these channels have physical properties. We present preliminary work using a convolutional neural network and a recurrent neural network for tracking cyclonic storms. We also show how state-of-the-art segmentation algorithms can be used to segment atmospheric rivers and tropical cyclones in global climate model simulations. We show how the latest advances in machine learning and computer vision can provide solutions to important problems in weather and climate sciences, and we highlight unique challenges and limitations.

Kim, S., Ames, S., Lee, J., Zhang, C., Wilson, A. C. and Williams, D. (2017). "Resolution Reconstruction of Climate Data with Pixel Recursive Model." IEEE International Conference on Data Mining.

Deep learning techniques have been successfully applied to solve many problems in climate and geoscience using massive-scale observed and modeled data. For extreme climate event detection, several models based on deep neural networks have recently been proposed and attain superior performance that overshadows all previous handcrafted, expert-based methods. The issue, though, is that accurate localization of events requires high-quality climate data. In this work, we propose a framework capable of detecting and localizing extreme climate events in very coarse climate data. Our framework is based on two models using deep neural networks: (1) Convolutional Neural Networks (CNNs) to detect and localize extreme climate events, and (2) a pixel recursive super resolution model to reconstruct high resolution climate data from low resolution climate data. Based on our preliminary work, we present two CNNs in our framework for different purposes, detection and localization. Our results using CNNs for extreme climate event detection show that simple neural nets can capture the pattern of extreme climate events with high accuracy from very coarse reanalysis data. However, localization accuracy is relatively low due to the coarse resolution. To resolve this issue, the pixel recursive super resolution model enhances the resolution of the input to the localization CNNs. We present our best network using the pixel recursive super resolution model, which synthesizes details of tropical cyclones in ground truth data while enhancing their resolution. Therefore, this approach not only dramatically reduces human effort, but also suggests the possibility of reducing the computing cost required for the downscaling process used to increase data resolution.

Kim, S., Ames, S., Lee, J., Zhang, C., Wilson, A. C. and Williams, D. (2017). "Massive Scale Deep Learning for Detecting Extreme Climate Events." International Workshop on Climate Informatics.

Conventional extreme climate event detection relies on high spatial resolution climate model output for improved accuracy. This often poses significant computational challenges due to its tremendous iteration cost. As a cost-efficient alternative, we developed a system to detect and locate extreme climate events by deep learning. Our system can capture the pattern of extreme climate events from pre-existing coarse reanalysis data, corresponding to only 16 thousand grid points, without an expensive downscaling process. Using 5-layered Convolutional Neural Networks (CNNs), it takes less than 5 hours to train on our dataset and less than 5 seconds to evaluate our test set. As a use case of our framework, we tested tropical cyclone detection with labeled reanalysis data; our cross-validation results show a detection accuracy of 99.98%, and the localization accuracy is within 4.5 degrees of longitude/latitude (around 500 km, or 3 times the data resolution).