LSU Computer Science, IBM Studying More Effective Data Collection, Analysis, and Sharing Methods

December 19, 2023

BATON ROUGE, LA – As various industry sectors continue their adoption of big data analytics, privacy concerns have grown accordingly. Traditional centralized data analytics require data collection and sharing, which opens the door to leaks and hacks. Federated Analytics (FA), on the other hand, presents a more attractive option for collaborative data science, as it does not gather raw data on a centralized server.
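As a rough illustration of the federated idea, the sketch below computes a global average without moving raw records off each client. It is a toy example, not the system described in this release: the function names (`local_summary`, `federated_mean`) and the sample readings are assumptions made up for illustration, and a real FA deployment would layer privacy protections on top of the shared summaries.

```python
# Illustrative only: a toy federated mean, not the project's actual system.
from typing import List, Tuple


def local_summary(values: List[float]) -> Tuple[float, int]:
    """Runs on a client: return only the sum and count, never the raw values."""
    return sum(values), len(values)


def federated_mean(summaries: List[Tuple[float, int]]) -> float:
    """Runs on the coordinator: combine per-client summaries into one mean."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    if count == 0:
        raise ValueError("no data reported by any client")
    return total / count


# Hypothetical clients holding made-up readings of very different sizes.
clients = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
summaries = [local_summary(c) for c in clients]
global_mean = federated_mean(summaries)
print(f"global mean from summaries only: {global_mean:.3f}")
```

Weighting each client by its record count keeps the result equal to the mean that a centralized analysis of the same records would produce, even though the coordinator only ever sees one sum and one count per client.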

Even so, FA systems are not without their own issues. Because of their distributed nature and the fact that they do not share data, the accuracy and efficiency of results can be problematic. This is the conundrum that LSU Computer Science Assistant Professor Hao Wang is looking to solve through a nearly $300,000 National Science Foundation grant.

“Skewed data distribution across participating clients in FA leads to severe bias and inconsistency in analytic results compared with traditional analytics on centralized data,” Wang said. “In applications such as healthcare, skewed data is a common issue, leading to unrepresentative analysis results for the overall patient population, as well as unfair and potentially harmful decisions.

“In addition, FA systems apply privacy-preserving techniques to the entire dataset and analysis results to prevent leakage of raw data, leaving poor data utility and analysis efficiency. The noise added to the data…extensively reduces the analysis results’ usefulness and accuracy.”

In short, the goal of Wang’s project is to maximize data utility without degrading data protection. For example, existing FA methods protect an image in its entirety. Wang instead wants to understand the image at a granular level by learning about the background, foreground, main objects, etc. With that approach, he said, it is not necessary to obscure the entire image with noise, which allows for more accurate data analytics while still protecting the data.
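The trade-off described above can be sketched with a toy comparison. The example below adds Gaussian noise either to every value of a made-up flattened "image" or only to a region flagged as sensitive, then measures how far each result drifts from the original. The release does not describe how the project implements this, so everything here is an assumption for illustration: the `add_noise` helper, the choice of Gaussian noise, the noise level, and the rule that the first 10 percent of pixels are sensitive.

```python
# Illustrative only: selective vs. blanket noise on a toy flattened "image".
import random


def add_noise(pixels, mask, sigma, rng):
    """Add Gaussian noise only where mask is True; other pixels pass through."""
    return [p + rng.gauss(0.0, sigma) if m else p for p, m in zip(pixels, mask)]


def mean_abs_error(original, noisy):
    """Average absolute distortion introduced by the noise."""
    return sum(abs(a - b) for a, b in zip(original, noisy)) / len(original)


rng = random.Random(0)  # fixed seed so the run is repeatable
image = [rng.uniform(0, 255) for _ in range(1000)]

# Hypothetical assumption: only the first 10% of pixels are sensitive.
sensitive = [i < 100 for i in range(len(image))]
everything = [True] * len(image)

sigma = 25.0  # assumed noise level
blanket = add_noise(image, everything, sigma, rng)
selective = add_noise(image, sensitive, sigma, rng)

err_blanket = mean_abs_error(image, blanket)
err_selective = mean_abs_error(image, selective)
print(f"blanket noise error:   {err_blanket:.2f}")
print(f"selective noise error: {err_selective:.2f}")
```

Because the non-sensitive pixels are left untouched, the selective version distorts the data far less overall. Deciding which regions are actually sensitive is the hard part, and this sketch does not attempt it.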

To assist with the project, Wang will collaborate with researchers at the IBM T. J. Watson Research Center, the headquarters of IBM Research, in Yorktown Heights, NY. The team there is led by Shiqiang Wang, whose background is in data analytics, edge-based artificial intelligence (Edge AI), the Internet of Things, and future wireless systems beyond 5G and 6G.

The proposed FA systems will be extensively evaluated using public datasets on realistic large-scale testbeds at LSU and IBM. These datasets include various data types and content, such as text-based datasets and image-based datasets. IBM Research will share access to internal domain-specific datasets and benchmarks from the manufacturing, retail, and finance industries.

One final aspect of the project, Wang said, will be sharing it with the next generation of data scientists.

“We plan to host summer camps at LSU introducing privacy-preserving data science tutorials to undergraduate and graduate students not only at LSU but throughout Louisiana,” Wang said. “Engineers and researchers from IBM will also be invited to give guest talks and seminars about privacy-preserving data science from the perspective of industry. This will be helpful to close the gap between in-classroom training and industrial expectations for Louisiana students.

“We will break down the research problems in our project and reshape some of the questions to be challenges in the [LSU Geaux Hack] Hackathon. This will attract more students to develop privacy-preserving data science techniques. We plan to further package some modules of the proposed FA system as a playground for K-12 students. For example…we could open an interface for students to adjust the trade-off between the sensitivity and accuracy of privacy-preserving data analytics. This will raise K-12 students’ awareness and interest in data privacy, eventually preparing them to become part of the privacy-preserving data science workforce in Louisiana.”


###

Contact: Joshua Duplechain
Director of Communications
225-578-5706
josh@lsu.edu