[Internship at INRIA] Robust PCA for Traffic Prediction
Dates: July 2021 - Aug. 2021
During summer 2021, I did an internship at INRIA (French Institute for Research in Computer Science), supervised by Jean-Marc Lasgouttes (RITS team) and Cyril Furtlehner (LRI team). They had developed a machine learning algorithm for traffic prediction [Furtlehner 2022], and they wanted to know whether it could be improved by preprocessing the data with a robust PCA algorithm. The input data comes from hundreds of sensors that are subject to errors and failures. However, cleaning the data was not straightforward: unpredictable events are frequent in traffic, so an unusual measurement is not necessarily a sensor error.
Concretely, I was in charge of implementing a Robust PCA (Principal Component Analysis) inspired by [Hu 2021]. In simple terms, PCA is a method to “smooth” data: it removes the noise caused by randomness and errors and keeps only the important information. The robust version is more flexible because it gives much finer control over how the data is smoothed, and it works with missing data. In particular, it makes better use of correlations between sensors, between days and between weeks. Another advantage is that, by carefully designing the regularization terms, I was able to separate abnormal events into two categories: those observed by all the sensors (traffic jams) and those measured by only one sensor (bugs). We then had to decide what to keep: above all, we wanted to remove all unpredictable events. Since Robust PCA not only detects these anomalies but also proposes a correction, there are several ways to use its output in the traffic prediction algorithm.
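To make the idea concrete, here is a minimal sketch of the classical matrix form of Robust PCA (principal component pursuit, solved with a standard ADMM scheme). It is not the tensor method of [Hu 2021] nor the code written during the internship: it does not handle missing data and does not separate the two kinds of anomalies. It only illustrates how a sensors-by-time matrix M can be split into a smooth low-rank part L and a sparse anomaly part S by minimizing ||L||_* + lam * ||S||_1 subject to L + S = M. The function names, the default parameters and the synthetic data are all assumptions made for this illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrink every entry of x toward zero by tau (proximal map of the l1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Shrink the singular values of x by tau (proximal map of the nuclear norm)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def robust_pca(m, lam=None, mu=None, n_iter=1000, tol=1e-7):
    """Split m into low_rank + sparse with ADMM (hypothetical helper)."""
    m = np.asarray(m, dtype=float)
    n1, n2 = m.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))         # common default weight
    if mu is None:
        mu = n1 * n2 / (4.0 * np.abs(m).sum())   # common default step size
    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    norm_m = np.linalg.norm(m)
    for _ in range(n_iter):
        # Update the smooth part, then the anomalies, then the multiplier.
        low_rank = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * norm_m:
            break
    return low_rank, sparse


# Synthetic stand-in for traffic data: 40 "sensors" x 200 "time steps".
rng = np.random.default_rng(0)
true_low_rank = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 200))
spike_mask = rng.random((40, 200)) < 0.05        # about 5% faulty readings
spikes = rng.choice([-10.0, 10.0], size=(40, 200))
true_sparse = np.where(spike_mask, spikes, 0.0)
observed = true_low_rank + true_sparse

low_rank, sparse = robust_pca(observed)
rel_err = np.linalg.norm(low_rank - true_low_rank) / np.linalg.norm(true_low_rank)
print(f"relative error on the low-rank part: {rel_err:.2e}")
```

In this sketch the entrywise l1 penalty plays the role of the "one sensor" anomalies. One plausible way to also capture events seen by all sensors at once would be a group penalty on whole time steps, but that extension is an assumption here, not something the sketch implements.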
References
[1] Cyril Furtlehner, Jean-Marc Lasgouttes, Alessandro Attanasi, Marco Pezzulla, and Guido Gentile. Short-term forecasting of urban traffic using spatio-temporal Markov field. IEEE Transactions on Intelligent Transportation Systems, 23(8):10858–10867, 2022. DOI:10.1109/TITS.2021.3096798
[2] Yue Hu and Daniel B. Work. 2021. Robust Tensor Recovery with Fiber Outliers for Traffic Events. ACM Trans. Knowl. Discov. Data 15, 1 (December 2021). DOI:10.1145/3417337