Abstract
Big data is widely considered one of the most essential and promising technology areas, and it has attracted attention from many communities. As big data has evolved, the exponential growth of data has introduced new challenges and problems. Efficient data processing is crucial for reducing learning curves, simplifying maintenance, and decreasing operational complexity. This talk addresses the challenges of managing separate batch and streaming data pipelines, including the complexity of maintaining multiple codebases and the associated implementation effort. It introduces a unified data pipeline architecture that efficiently handles both real-time streaming and periodic batch backfilling within a single codebase. The talk will also cover some tips for getting hands-on experience to land a big data job.
Biography
Xiang Liu is a visiting assistant professor of practice at NYU Shanghai. She received her PhD in computer science from NYU Tandon. Her research interests include big data, machine learning, and recommender systems. Prior to joining NYU Shanghai, Xiang worked as a staff machine learning engineer at Block and as a senior data engineer at Spotify.
