Volume 23, Issue 4, pp. 92-104

A new information criterion combined with cross-validation method to estimate generalization capability

Yasuhiro Wada

ATR Auditory and Visual Perception Research Laboratories, Kyoto, Japan 619-02

Yasuhiro Wada received a B.E. and an M.E. in Engineering from Tokyo Institute of Technology in 1980 and 1982, respectively. In 1982, he joined Kawasaki Steel Co., Ltd. Since 1989, he has been on loan to ATR Auditory and Visual Perception Research Laboratories. His research interests include neural network models and motor learning control.

Mitsuo Kawato

ATR Auditory and Visual Perception Research Laboratories, Kyoto, Japan 619-02

Mitsuo Kawato received a B.S. in Physics from Tokyo University in 1976, and an M.E. and a Ph.D. in Biophysical Engineering from Osaka University in 1978 and 1981, respectively. From 1981 to 1987, he was a member of the faculty of Osaka University. In 1987, he became a University Lecturer in Biophysical Engineering at the Faculty of Engineering Science, Osaka University. In 1988, he moved to ATR Auditory and Visual Perception Research Laboratories. His research interests include computational neuroscience and its application to engineering problems.

First published: 1992

Abstract

Neural network learning uses only a limited number of examples of a given problem, so in general there is no theoretical guarantee that the trained network will give correct answers for unseen examples. A new method of selecting the optimal neural network structure, the one with maximum generalization capability, is proposed. In statistical mathematics, several information criteria, such as AIC (Akaike's information criterion), BIC (Bayesian information criterion), and MDL (minimum description length), are widely used to select a suitable model. These criteria have been applied quite successfully, especially to linear models. However, they assume that the model parameters are estimated by the maximum likelihood method. This assumption does not hold for conventional iterative learning procedures such as backpropagation in multilayer perceptrons or Boltzmann machine learning. Thus, AIC should not be applied directly to the selection of the optimal neural network structure.
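For reference, the standard forms of the first two criteria are shown below in generic textbook notation (the symbols here are conventional ones, not necessarily those used in the paper):

    \mathrm{AIC} = -2\ln\hat{L} + 2k, \qquad
    \mathrm{BIC} = -2\ln\hat{L} + k\ln n

where \hat{L} is the maximized likelihood, k is the number of free parameters, and n is the number of training examples; the candidate model minimizing the criterion is selected, and MDL leads to a penalty of the same k ln n form. The penalty terms charge for model complexity, which is why both criteria presuppose that \hat{L} really is the maximum of the likelihood.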

In this paper, by extending AIC, a new information criterion is proposed that can estimate generalization capability without the maximum likelihood estimator of the synaptic weights. The cross-validation method is used to compute the new criterion. Computer simulations show that the proposed criterion accurately predicts the generalization capability of multilayer perceptrons, so that the optimal number of hidden units can be determined.
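To make the role of cross-validation concrete, the sketch below selects the number of hidden units of a one-hidden-layer perceptron by plain k-fold cross-validation. This is only the generic machinery the paper builds on, not the proposed criterion itself; the scikit-learn estimator, the toy sine-wave data, the candidate sizes, and the five folds are all illustrative assumptions:

    # A minimal sketch: choosing the number of hidden units of a
    # one-hidden-layer perceptron by plain 5-fold cross-validation.
    # This illustrates generic CV machinery, not the paper's criterion.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(200, 1))                    # toy inputs
    y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(200)   # noisy target

    best_h, best_score = None, -np.inf
    for h in (2, 4, 8, 16, 32):                    # candidate hidden-unit counts
        net = MLPRegressor(hidden_layer_sizes=(h,), max_iter=2000,
                           random_state=0)
        # Mean 5-fold score (negative MSE) estimates generalization capability.
        score = cross_val_score(net, X, y, cv=5,
                                scoring="neg_mean_squared_error").mean()
        if score > best_score:
            best_h, best_score = h, score

    print(f"selected hidden units: {best_h} (estimated MSE {-best_score:.4f})")

Plain cross-validation of this kind requires retraining for every fold and every candidate size; the point of the abstract is that an AIC-style criterion can supply a comparable estimate of generalization capability even when the trained weights are not maximum likelihood estimates.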
