Detection, analysis and prediction of traffic anomalies due to special events
Abstract
The transport system consists of complex microscopic and macroscopic interactions that affect our transport choices, spatial planning, economic activities, safety, CO2 emissions and much more. This thesis focuses on unexpected and unwanted demand fluctuations that are often observed in the network and that lead to system failures and additional costs. Significantly low speeds or excessively low flows at an unusual time are only some of the phenomena that may confuse drivers and transport authorities, since they are entirely unexpected and frequently have no obvious explanation. The term “anomalies” refers to patterns that do not conform to a well-defined notion of normal behavior; in the literature, similar phenomena are also described as outliers, exceptions or discordant observations. Recognizing the need to understand traffic anomalies, to locate historical occurrences in space and time, and eventually to predict them, we first introduce a methodology that identifies anomalies in traffic networks and correlates them with special events using internet data. We investigate why traffic congestion occurred, as well as why demand fluctuated on days when there was no apparent reason for such phenomena. The system is evaluated on Google’s public dataset of New York City taxi trips. A “normality” baseline is first defined and then compared with the demand patterns of individual days to detect outliers. This approach makes it possible to detect demand fluctuations and to correlate them with disruptive event scenarios such as extreme weather conditions, public holidays, religious festivities, and parades. Kernel density analysis is used to depict the affected areas, as well as the significance of the observed differences compared with the average day.
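As an illustration of the baseline idea, the sketch below builds a "normal" profile of trip counts per (weekday, hour) slot from ordinary days and flags hours of a studied day whose count deviates strongly from it. It is a minimal sketch under stated assumptions, not the thesis's method: the z-score rule, the threshold of 3, the `(weekday, hour, trip_count)` input format and the helper names `build_baseline` and `flag_anomalies` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(history):
    """Hypothetical helper. history: list of (weekday, hour, trip_count) from ordinary days."""
    groups = defaultdict(list)
    for weekday, hour, count in history:
        groups[(weekday, hour)].append(count)
    # Mean and sample standard deviation per (weekday, hour) slot;
    # a slot needs at least two observations for a standard deviation.
    return {key: (mean(vals), stdev(vals)) for key, vals in groups.items() if len(vals) >= 2}


def flag_anomalies(day, baseline, threshold=3.0):
    """Hypothetical helper. day: list of (weekday, hour, trip_count) for the studied day.

    Assumption: an hour is anomalous when |z| exceeds the threshold.
    Returns a list of (hour, count, z-score rounded to 2 decimals).
    """
    flagged = []
    for weekday, hour, count in day:
        if (weekday, hour) not in baseline:
            continue  # no reference behavior for this slot
        mu, sigma = baseline[(weekday, hour)]
        if sigma == 0:
            continue  # constant history: z-score undefined
        z = (count - mu) / sigma
        if abs(z) > threshold:
            flagged.append((hour, count, round(z, 2)))
    return flagged


if __name__ == "__main__":
    history = [(0, 8, c) for c in [100, 110, 90, 105, 95]]
    history += [(0, 9, c) for c in [200, 210, 190, 205, 195]]
    baseline = build_baseline(history)
    print(flag_anomalies([(0, 8, 150), (0, 9, 202)], baseline))
```

With the toy history above, 150 trips at 08:00 sits about 6.3 standard deviations above the Monday mean and is flagged, while 202 trips at 09:00 is not. A per-slot baseline is used here so that ordinary daily and weekly cycles are not mistaken for anomalies.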
The search for possible explanations of the observed anomalies in the road network highlighted the vast amount of information available on the internet, as well as the difficulty of retrieving documents that closely match the details of the examined events (location, time of day, etc.). In this context, we develop a framework that predicts transport demand with a supervised topic modeling algorithm, using information about social events retrieved through several strategies based on search aggregation, natural language processing, and query expansion. A two-step process is found to yield the highest accuracy for transport demand prediction: different (but related) queries first retrieve an initial set of documents, and a final query constructed from these documents then retrieves the set of predictive documents. These documents are then used to model the topics that best discriminate transport demand. Having verified that the internet can offer further insight into the prediction of demand hotspots, we explore combinations of time-series data and semantic information with machine learning and deep learning techniques, with the aim of building a prediction model that can anticipate, in real time, future stressful situations in the studied transportation system. We apply the proposed approaches to event areas in New York using publicly available taxi data, and show empirically that the proposed models significantly reduce forecast error. The importance of semantic information is evident in all presented methods. Beyond investigating which types of data improve the accuracy of our forecasts, we also study in depth which model structures make the best use of the available information.
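The two-step retrieval process can be sketched as query expansion by pseudo-relevance feedback. In the minimal sketch below, a toy in-memory corpus and a term-overlap score stand in for a real search engine, and the expansion terms are simply the most frequent non-seed terms of the initially retrieved documents. The corpus, the stopword list, the scoring rule and the helper names (`search`, `two_step_retrieval`) are hypothetical; the thesis's actual search aggregation and NLP steps are not reproduced here.

```python
from collections import Counter

# Hypothetical stopword list for the toy example.
STOPWORDS = {"the", "a", "at", "in", "on", "of", "and", "to", "for"}


def tokenize(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]


def search(query_terms, corpus, k):
    """Toy stand-in for a search engine: rank documents by the number of query terms they contain."""
    scored = []
    for doc_id, text in corpus.items():
        tokens = set(tokenize(text))
        score = sum(1 for t in query_terms if t in tokens)
        if score > 0:
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by id
    return [doc_id for _, doc_id in scored[:k]]


def two_step_retrieval(seed_queries, corpus, k_initial=3, n_expansion=2, k_final=3):
    seed_terms = list(dict.fromkeys(t for q in seed_queries for t in tokenize(q)))
    # Step 1: several related queries retrieve an initial document set.
    initial = []
    for q in seed_queries:
        for doc_id in search(tokenize(q), corpus, k_initial):
            if doc_id not in initial:
                initial.append(doc_id)
    # Step 2: one final query = seed terms + most frequent new terms of the initial set.
    counts = Counter()
    for doc_id in initial:
        counts.update(tokenize(corpus[doc_id]))
    expansion = [t for t, _ in counts.most_common() if t not in seed_terms][:n_expansion]
    final_query = seed_terms + expansion
    return final_query, search(final_query, corpus, k_final)


if __name__ == "__main__":
    corpus = {
        "d1": "Madison Square Garden concert tonight",
        "d2": "Knicks game at Madison Square Garden",
        "d3": "weather forecast rain in Brooklyn",
        "d4": "concert tickets Garden arena Knicks",
    }
    print(two_step_retrieval(["Madison Square Garden event", "concert Midtown"], corpus))
```

In this toy run the expanded query picks up "knicks", a term absent from both seed queries, which is the kind of event detail that a single hand-written query can miss.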
We then focus on the analysis, evaluation, and forecasting of a prediction model’s residuals within a real-time taxi demand forecasting framework. The framework comprises a deep learning architecture based on fully connected (dense) layers. The analysis focuses on areas where popular venues cause significant fluctuations in demand. Including the residual forecasts in the proposed two-stage process improves performance considerably. Overall, the models proposed in this thesis highlight the value of fusing text and time-series data, as well as the capabilities of information retrieval with query expansion methods. They can be of great value to a broad range of traffic incident management frameworks.
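The two-stage idea of forecasting a base model's residuals can be illustrated as follows: a first-stage model produces a forecast, its in-sample residuals are modeled, and the predicted residual is added back as a correction. The thesis uses dense neural layers; purely to keep the example small and dependency-free, this sketch swaps in a seasonal-naive first stage and a least-squares AR(1) model of the residuals (no intercept). All function names and the toy series are hypothetical.

```python
def seasonal_naive(series, period):
    """Stage 1 (stand-in): forecast each step with the value observed one period earlier."""
    return [series[t - period] for t in range(period, len(series))]


def fit_ar1(residuals):
    """Least-squares phi for r[t] ~ phi * r[t-1] (no intercept); 0.0 if undefined."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(r * r for r in residuals[:-1])
    return num / den if den else 0.0


def two_stage_forecast(series, period):
    """Return (base forecast, residual-corrected forecast) for the step after the series.

    Assumes len(series) >= period + 2 so that at least two residuals exist.
    """
    base_in_sample = seasonal_naive(series, period)
    actual = series[period:]
    residuals = [a - b for a, b in zip(actual, base_in_sample)]
    # Stage 2 (stand-in): forecast the next residual from the last one.
    phi = fit_ar1(residuals)
    base_next = series[len(series) - period]
    residual_next = phi * residuals[-1]
    return base_next, base_next + residual_next


if __name__ == "__main__":
    # Toy demand with a two-step cycle and a slow upward drift.
    demand = [10, 20, 11, 21, 12, 22, 13, 23]
    print(two_stage_forecast(demand, period=2))
```

On the toy series the seasonal-naive forecast for the next step is 13, and it misses the drift; the residual model learns that every residual has been about +1 and corrects the forecast to 14.0, which continues the drift. The same structure applies when either stage is replaced by a neural network.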