A text mining approach to assist the general public in the retrieval of legal documents
Yen-Liang Chen
Department of Information Management, National Central University, Chung-Li, No. 300, Jhongda Road, Jhongli City, Taoyuan County, 32001 Taiwan (R.O.C.)
Yi-Hung Liu
Department of Information Management, National Central University, Chung-Li, No. 300, Jhongda Road, Jhongli City, Taoyuan County, 32001 Taiwan (R.O.C.)
Wu-Liang Ho
Department of Legal Service, Straits Exchange Foundation, No. 536, Beian Road, Zhongshan District, Taipei City, 10465 Taiwan (R.O.C.)
Abstract
Applying text mining techniques to legal issues has been an emerging research topic in recent years. Some previous studies have focused on helping professionals retrieve related legal documents, but they have not considered the general public, who often find it difficult to describe legal problems in professional legal terms. To address this gap, this study designs a text-mining-based method that lets the general public search for and retrieve criminal judgments using everyday vocabulary. The experimental results indicate that our method helps members of the general public who are unfamiliar with professional legal terms retrieve relevant criminal judgments more accurately and effectively.
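The abstract does not describe how the method works internally. As a rough illustration of the general problem only, here is a minimal sketch of one familiar way to bridge everyday and professional vocabulary. It expands a lay query with a hand-built dictionary and ranks judgments by TF-IDF cosine similarity in a vector space model. The dictionary `LAY_TO_LEGAL`, every function name and the sample judgments are hypothetical and invented for this example, and the whitespace tokenizer is a simplification (real Chinese judgments would first need word segmentation). This is not the authors' method.

```python
import math
from collections import Counter

# HYPOTHETICAL lay-to-legal dictionary, invented for illustration only.
LAY_TO_LEGAL = {
    "steal": ["theft", "larceny"],
    "hit": ["assault", "injury"],
    "drunk": ["intoxicated"],
}


def tokenize(text):
    """Lowercase whitespace tokenizer (Chinese text would need a segmenter)."""
    return text.lower().split()


def expand_query(tokens):
    """Append the legal terms mapped from any everyday words in the query."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(LAY_TO_LEGAL.get(token, []))
    return expanded


def build_index(docs):
    """Return TF-IDF vectors (as dicts) for tokenized docs plus the IDF table."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    # Smoothed IDF: log(n / df) + 1 keeps terms found in every doc non-zero.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, judgments):
    """Return (index, score) pairs for the judgments, best match first."""
    vectors, idf = build_index([tokenize(j) for j in judgments])
    q_tf = Counter(expand_query(tokenize(query)))
    # Query terms that never occur in the corpus carry no weight and are dropped.
    q_vec = {t: tf * idf[t] for t, tf in q_tf.items() if t in idf}
    scores = [(i, cosine(q_vec, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    judgments = [
        "defendant convicted of theft of a bicycle",
        "defendant convicted of assault causing injury",
        "defendant convicted of driving while intoxicated",
    ]
    # "steal" never appears in the judgments; expansion adds "theft".
    print(rank("my neighbor tried to steal my bike", judgments)[0][0])  # -> 0
```

In this sketch the judgments stay in their professional vocabulary, and only the short lay query is mapped toward it. The abstract does not say how the paper itself relates everyday words to legal terms.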