Feb. 27, 2024
Register for WiDS Livermore by March 1
You can still sign up for the Lab’s Women in Data Science (WiDS) conference taking place on Wednesday, March 13. This hybrid event is free and open to everyone, inside or outside the Lab, at any career level and any level of data science experience. The registration link and other details are posted at data-science.llnl.gov/wids.
- Register by March 1
- Hosted at the University of California Livermore Collaboration Center (UCLCC) and virtually
- Sponsored by LLNL’s Data Science Institute; Computing Principal Directorate; and Office of Inclusion, Diversity, Equity, and Accountability
- This regional conference will include a tie-in with the LLNL datathon as well as keynote speakers, technical talks, career-focused panel discussions, speed mentoring, and a poster session.
In lieu of a DSI-hosted seminar in March, we invite everyone to attend all or a portion of this WiDS event. See the web page linked above for more information about the speakers and panelists.
This is the seventh year for WiDS Livermore, which is independently organized by LLNL as part of the mission to increase the participation of women in data science and to feature outstanding women doing outstanding work. Contact WiDS-Committee [at] llnl.gov with any questions.
Machine Learning Tool Fills in the Blanks for Satellite Light Curves
When viewed from Earth, objects in space are seen at a specific brightness, called apparent magnitude. Over time, ground-based telescopes can track a specific object’s change in brightness. This time-dependent magnitude variation is known as an object’s light curve, and it can allow astronomers to infer the object’s size, shape, material, location, and more. Monitoring the light curves of satellites or debris orbiting Earth can help identify changes or anomalies in these bodies. However, light curves are often missing many data points. Weather, season, dust accumulation, time of day, and eclipses all affect not only the quality of the data but also whether it can be collected at all.
Livermore researchers have developed a machine learning (ML) process for light curve modeling and prediction. Called MuyGPs, the process drastically reduces the size of a conventional Gaussian process problem (a type of statistical process that often does not scale well) by limiting the correlation of predictions to their nearest neighboring data points. This reduces one large linear algebra problem to many smaller, parallelizable problems. This type of ML enables training on more sensitive parameters, making prediction of the missing data more efficient. “We want to be able to establish patterns of light so that we can in near-real time project where we expect things to be and how they should look, and detect when they deviate from that,” says Min Priest, one of the computer scientists on the project.
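The nearest-neighbor idea described above can be illustrated with a minimal NumPy sketch. This is not the MuyGPs implementation or its API. The function names, the squared-exponential kernel, and the values of `k`, `length_scale`, and `noise` are all assumptions chosen for illustration, and the synthetic sine-wave "light curve" stands in for real telescope data. Hyperparameter training is left out.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two arrays of 1D time stamps."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def nn_gp_predict(t_train, y_train, t_test, k=10, length_scale=1.0, noise=1e-4):
    """Predict each test point from only its k nearest training points.

    Hypothetical helper for illustration; hyperparameters are fixed, not trained.
    """
    means = np.empty(len(t_test))
    for i, t in enumerate(t_test):
        # Indices of the k training times closest to this test time.
        nn = np.argsort(np.abs(t_train - t))[:k]
        t_nn, y_nn = t_train[nn], y_train[nn]
        # Solve a small k-by-k system instead of one n-by-n system.
        K = rbf_kernel(t_nn, t_nn, length_scale) + noise * np.eye(k)
        k_star = rbf_kernel(np.array([t]), t_nn, length_scale)[0]
        means[i] = k_star @ np.linalg.solve(K, y_nn)
    return means

# Synthetic stand-in for a light curve with a gap of missing observations.
rng = np.random.default_rng(0)
t_all = np.linspace(0.0, 10.0, 400)
observed = (t_all < 4.0) | (t_all > 4.5)
t_train = t_all[observed]
y_train = np.sin(t_train) + 0.01 * rng.standard_normal(t_train.size)

# Fill in the gap and compare against the known underlying signal.
t_gap = t_all[~observed]
y_gap = nn_gp_predict(t_train, y_train, t_gap, k=20)
max_error = np.max(np.abs(y_gap - np.sin(t_gap)))
```

Each test point needs only a small solve that is independent of every other test point, so the loop could be batched or run in parallel. That independence is the property that lets this kind of approach scale where a single full-covariance solve would not.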
MuyGPs is a general-use ML technique, not limited to applications in astronomy. Watch a video explaining how MuyGPs works, or download the open-source Python implementation of MuyGPs. (Image at left: The light curve of a single object over the course of four years, where the color represents the brightness. The black stripes and hole shapes are regions of missing data due to weather and the eclipse, respectively. Data courtesy of Dave Monet and the public catalog www.space-track.org.)
Applied Statistics Seminar Series Turns 10
The field of statistics is more than just crunching numbers. At a national security–focused research organization like LLNL, statisticians apply their expertise to a range of scientific problems, where analyzing, interpreting, and drawing conclusions from rich and unique datasets are crucial tasks in many projects. “Statistics as a discipline covers a wide range of subject areas, and it’s helpful to be exposed to new ways of thinking about a problem you wouldn’t have considered before,” notes Kevin Quinlan, a member of the Lab’s Applied Statistics Group (ASG) and coordinator of the group’s long-running seminar series.
Since November 2013, the ASG has invited guest speakers to describe their work and meet with Lab staff. The series, which began several years before the DSI launched its own seminar series, has hosted nearly 80 speakers. The speakers’ affiliations are split down the middle: half from LLNL and half from external organizations such as universities or other national labs. Seminar topics have ranged from climate modeling and remote sensing to microbiology and autonomous driving.
A unique aspect of the series is the inclusion of talks from researchers outside of data science, with a goal of informing data scientists of research problems that intersect with their area of expertise. “The ASG Seminar Series is a great way to hear about new and exciting research from talented statisticians and to help foster collaboration both externally and internally,” Quinlan points out. The seminar team welcomes presenter suggestions at quinlan5 [at] llnl.gov. ASG seminars usually run one hour, can cover any statistics-related topic, and are held virtually or in person.
Consulting Service Success Story: Radioactivity Measurements
Talking with a data scientist before collecting experimental or simulation data can enrich a project’s statistical analysis, prevent costly errors, and improve modeling effectiveness. LLNL researchers often contact the DSI’s Consulting Service (DSICS) for advice on experimental design, sensitivity analysis, sample size calculations, data visualization, and more. The DSICS’s short-term consulting engagements have given many projects a much-needed nudge in the right direction, as in this recent example.
Mike Firpo, an LLNL radiation spectroscopist, performed an experiment to characterize the variability of radioactivity measurements in a set of lung phantoms, based on measurements of emitted gamma rays. The lung phantoms were embedded in a realistic torso phantom, a 3D model designed to mimic human physiology (see image at left). Three factors were varied in the experiment, including the setup and the operator, and multiple runs were obtained in each experimental configuration. However, when Firpo performed an analysis of variance (ANOVA) to determine whether counts differed significantly across the experimental combinations, a numerical issue prevented the results from being computed. Firpo states, “I kept encountering this problem when I ran the ANOVA in Excel, so I reached out to the DSICS for help.”
Boya Zhang, a DSICS consultant and staff scientist in LLNL’s Applied Statistics Group, helped Firpo fix the issue by identifying the proper ANOVA test to perform. She also advised on how to interpret the experiment’s results. “It was a lot of fun helping Mike. I was able to use my statistics knowledge to help ensure the health and safety of employees at LLNL. I enjoy working on these consulting projects because I meet different people around the Lab,” she explains. Firpo adds, “Boya was very helpful. I have a much better understanding now, and I have a clear path forward.”
Recent Research
Preprints, full text, PDFs, or conference abstracts are linked where available.
- Computer Methods in Applied Mechanics and Engineering: GPLaSDI: Gaussian process-based interpretable latent space dynamics identification through deep autoencoder – Youngsoo Choi, Debojyoti Ghosh, and Jonathan Belof with a Cornell University colleague
- IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023): Single-shot domain adaptation via target-aware generative augmentations – Jayaraman Thiagarajan and colleagues from Arizona State University
- Medical Imaging with Deep Learning Conference (MIDL 2023): Know your space: inlier and outlier construction for calibrating medical OOD detectors – Yamen Mubarka, Thiagarajan, and colleagues from Arizona State University and Microsoft
- Preprint: Accurate and scalable estimation of epistemic uncertainty for graph neural networks – Mark Heimann, Thiagarajan, and colleagues from the University of Michigan
- Preprint: AVA: towards autonomous visualization agents through visual perception-driven decision-making – Shusen Liu, Haichao Miao, Matthew Olson, and Peer-Timo Bremer with colleagues from University of Utah (image at left: hyperparameter optimization results for t-SNE and UMAP methods, which are used to visualize high-dimensional datasets)
- Preprint: TrustLLM: trustworthiness in large language models – Bhavya Kailkhura and colleagues from 39 academic and commercial organizations
Seminar Explores AI/ML in Cardiovascular Medicine
The DSI’s January seminar was presented by Geoffrey H. Tison, Associate Professor of Medicine and Cardiology at the University of California, San Francisco. In “Using AI to Expand What Is Possible in Cardiovascular Medicine,” Dr. Tison discussed the application of ML/AI approaches in medicine, focusing on his prior work spanning several cardiovascular diagnostic modalities including electrocardiograms, echocardiograms, photoplethysmography, and angiography. Medicine has unique characteristics that can make medical data more complex and in some respects harder to analyze than data from other fields. These characteristics include the complicated clinical workflow and the many human stakeholders and decision makers who all contribute at various time points to any given patient’s medical data record.
Speakers’ biographies and abstracts are available on the seminar series web page, and many recordings are posted to the YouTube playlist. To become or recommend a speaker for a future seminar, or to request a WebEx link for an upcoming seminar if you’re outside LLNL, contact DSI-Seminars [at] llnl.gov. (See the WiDS story above; we invite you to hear those speakers and panelists in place of a March seminar.)
Meet an LLNL Data Scientist
Research support engineer Robert Cerda is new to the Lab, but as a graduate of UC Berkeley’s Air Force ROTC program and a newly commissioned second lieutenant in the United States Space Force, he’s no stranger to an ambitious challenge. Cerda joined LLNL as an intern in 2022, transitioned to staff after graduation, picked up his commission in 2023, and is now automating polymer-related projects with the Materials Engineering Division. His current focus is the automation of database pipelines, postprocessing scripts, and physical processes through software. He recently presented a poster at the Artificial Intelligence for Robust Engineering & Science Conference at Oak Ridge National Lab on direct-ink write automation and database-pipeline refinement for leveraging big data. “Basically, I do all things computer and data science as they relate to additive manufacturing,” says Cerda. He enjoys helping interns from his ROTC/MARA program and welcomes opportunities to mentor. “I feel it’s especially important for me to impart what I’ve learned to those who are where I was not long ago.” He also enjoys working at the Lab and flexing his creativity to solve a problem. “I like working on cutting-edge projects of national importance,” says Cerda. “I also just think working at the Lab is fun.”