March 24, 2021

Winter hackathon highlights data science talks and tutorial

Holly Auten/LLNL

The Data Science Institute (DSI) sponsored LLNL’s 27th hackathon on February 11–12. Held four times a year, these seasonal events bring the computing community together for a 24-hour period where anything goes: Participants can focus on special projects, learn new programming languages, develop skills, dig into challenging tasks, and more. The winter hackathon was the DSI’s second such sponsorship. Organizers were data scientist Ryan Dana, postdoctoral researcher Sarah Mackay, and DSI administrator Jennifer Bellig. DSI director Michael Goldman opened the event by noting, “Hackathons are great opportunities to explore new ideas and make connections with other staff, and to both innovate and learn.”

In a new twist to the typical hackathon schedule, organizers offered four optional presentations showcasing data science techniques in COVID-19 drug discovery, inertial confinement fusion, central nervous system modeling, and querying of massive graphs. Participants could also choose to attend an introductory tutorial on deep learning (DL) for image classification. Goldman noted, “Almost every program area at the Lab has some type of data science element. The hackathon is one way to help build that community.”

Team and individual presentations at the end of the 24-hour period featured a range of projects. Lisa Hughey, a data analytics applications developer, used the time to learn R Shiny and build an interactive web application. Former hackathon organizer Geoff Cleary experimented with packaging Python applications, while Enterprise Application Services developers Brinda Jana and Yunki Paik continued a previous hackathon project to track radio hazardous waste material.

Tutorial Teamwork

Data scientists Cindy Gonzales and Luke Jaffe ran the two-hour DL tutorial, which explained how to perform multi-class image classification in Python using the PyTorch library. Image classification is a computer vision problem in which a model takes an image as input and outputs a label for it. This process can play an important role in a variety of mission-relevant scenarios such as chemical detection, remote sensing, optics inspections, and disease diagnosis.

“We designed the material so participants wouldn’t need to know anything about deep learning, machine learning in general, or computer vision,” said Jaffe, who works in LLNL’s Global Security Computing Applications Division (GS-CAD). “We expected some level of comfort with Python, and provided links where participants could learn more about the machine learning theory we covered.”

The team provided sample code in a Jupyter notebook and first walked attendees through importing packages and setting up constants and image display utility functions. Next, the tutorial explored working with images as arrays and tensors (i.e., how a computer “sees” an image in order to classify it) using the CIFAR-10 dataset, which contains images of airplanes, cars, birds, cats, and other vehicles and animals.
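To make the "images as arrays" idea concrete, here is a minimal sketch that is not taken from the tutorial notebook. It assumes a made-up 2x2 RGB image (real CIFAR-10 images are 32x32 color images) and uses plain Python lists so it runs without any libraries. The helper names `shape` and `to_chw_float` are hypothetical. The sketch reorders the image from height x width x channel layout to the channel-first layout that PyTorch image tensors conventionally use, and rescales 8-bit intensities to the range 0.0 to 1.0.

```python
# A hypothetical 2x2 RGB image stored as height x width x channel (HWC),
# with 8-bit intensities in 0..255. The pixel values are made up.
image_hwc = [
    [[255, 0, 0], [0, 255, 0]],      # top row: red, green
    [[0, 0, 255], [255, 255, 255]],  # bottom row: blue, white
]


def shape(nested):
    """Return the dimensions of a regularly nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def to_chw_float(img):
    """Reorder HWC -> CHW and scale 0..255 integers to 0.0..1.0 floats."""
    height, width, channels = shape(img)
    return [
        [[img[r][c][ch] / 255.0 for c in range(width)] for r in range(height)]
        for ch in range(channels)
    ]


image_chw = to_chw_float(image_hwc)
print(shape(image_hwc))  # (2, 2, 3)
print(shape(image_chw))  # (3, 2, 2)
print(image_chw[0])      # red channel: [[1.0, 0.0], [0.0, 1.0]]
```

In practice a library would do this conversion; the point here is only that an image is a grid of numbers whose layout and scale a model depends on.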

Gonzales and Jaffe went on to describe the concepts behind neural networks, logistic regression to optimize classification accuracy, and different types of gradient descent algorithms. The tutorial included step-by-step instructions for using PyTorch to load data and create, train, and test the DL model.
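The link between logistic regression and gradient descent can be illustrated with a small, dependency-free sketch. This is not the tutorial's PyTorch code. It assumes a hypothetical toy dataset of nine 2-D points in three clusters (standing in for image features), a multi-class (softmax) form of logistic regression, and full-batch gradient descent with an arbitrarily chosen learning rate and epoch count. All function and variable names are illustrative.

```python
import math

# Toy 2-D points in three well-separated clusters (hypothetical data,
# standing in for flattened image features).
POINTS = [
    (0.0, 0.1), (0.1, 0.0), (0.1, 0.1),   # class 0
    (1.0, 0.1), (0.9, 0.0), (1.0, 0.0),   # class 1
    (0.0, 1.0), (0.1, 0.9), (0.0, 0.9),   # class 2
]
LABELS = [0, 0, 0, 1, 1, 1, 2, 2, 2]
NUM_CLASSES = 3


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, point):
    """Class probabilities for one point under a linear model."""
    logits = [
        sum(w * x for w, x in zip(weights[k], point)) + biases[k]
        for k in range(NUM_CLASSES)
    ]
    return softmax(logits)


def mean_loss(weights, biases):
    """Average cross-entropy over the whole toy dataset."""
    return -sum(
        math.log(predict_proba(weights, biases, p)[y])
        for p, y in zip(POINTS, LABELS)
    ) / len(POINTS)


def train(epochs=1000, learning_rate=0.5):
    """Full-batch gradient descent: one update per pass over the data."""
    weights = [[0.0, 0.0] for _ in range(NUM_CLASSES)]
    biases = [0.0] * NUM_CLASSES
    n = len(POINTS)
    for _ in range(epochs):
        grad_w = [[0.0, 0.0] for _ in range(NUM_CLASSES)]
        grad_b = [0.0] * NUM_CLASSES
        for point, label in zip(POINTS, LABELS):
            probs = predict_proba(weights, biases, point)
            for k in range(NUM_CLASSES):
                # Gradient of cross-entropy w.r.t. logit k: p_k - 1[k == label]
                error = probs[k] - (1.0 if k == label else 0.0)
                grad_b[k] += error
                for j, x in enumerate(point):
                    grad_w[k][j] += error * x
        # Step against the averaged gradient.
        for k in range(NUM_CLASSES):
            biases[k] -= learning_rate * grad_b[k] / n
            for j in range(2):
                weights[k][j] -= learning_rate * grad_w[k][j] / n
    return weights, biases


def accuracy(weights, biases):
    """Fraction of points whose most probable class matches the label."""
    correct = 0
    for point, label in zip(POINTS, LABELS):
        probs = predict_proba(weights, biases, point)
        correct += probs.index(max(probs)) == label
    return correct / len(POINTS)


initial_loss = mean_loss([[0.0, 0.0]] * NUM_CLASSES, [0.0] * NUM_CLASSES)
weights, biases = train()
print(f"loss before training: {initial_loss:.3f}")
print(f"loss after training:  {mean_loss(weights, biases):.3f}")
print(f"training accuracy:    {accuracy(weights, biases):.2f}")
```

Stochastic and mini-batch variants differ only in how many examples feed each update. A framework such as PyTorch computes the gradients automatically and supplies optimizers (for example `torch.optim.SGD`), so the hand-written loops above are for illustration only.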

Both tutorial leaders are expanding their data science skills on the job. Gonzales came to the Lab as an administrator in 2016 and later changed careers with the help of LLNL’s Education Assistance Program (EAP) and Data Science Immersion Program. Now a GS-CAD data scientist, she is pursuing a Master’s in Data Science via a Johns Hopkins distance-learning program. Jaffe was a Lab intern who was hired full time in 2016 after earning undergraduate and graduate degrees in Computer Engineering from Northeastern University. He is now using the EAP to fund PhD studies in Computer Vision at UC Berkeley. The team hopes to present their tutorial again to the Lab’s incoming summer interns.

Continuity Is Crucial

The Lab has held four hackathons virtually since the COVID-19 pandemic began, and Goldman emphasized the importance of continuing the event. “We’ve been out of the office for almost a year. Several new staff haven’t been onsite or met colleagues in person, so virtual events are crucial,” he stated.

Although online attendance has not been as high as with in-person hackathons, this winter event saw a steady participation of 30–35 hackers throughout. Bellig said, “As much as I missed the energy of an in-person hackathon, I was quite impressed with all the people who participated virtually and, once again, with the presentations and hacking accomplishments of my fellow employees.”

These circumstances haven’t dampened enthusiasm for the event. Dana, who joined GS-CAD in January 2020, volunteered to help organize the event even though he had not attended a previous hackathon. “I wanted to learn more about how data science is applied throughout the Lab, and network with some of [the] incredible talent and research that is being done,” he said. Gonzales added, “I am definitely interested in participating in future hackathons.”