FNLP-ONT: A feasible ontology for improving NLP tasks in Persian
Zahra Hosseini Pozveh
Department of Computer, Science and Research branch, Islamic Azad University, Tehran, Iran
Corresponding Author
Amirhassan Monadjemi
Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran
Correspondence
Amirhassan Monadjemi, Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran.
Ali Ahmadi
School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
Faculty of Computer Engineering, K.N.Toosi University of Technology, Tehran, Iran
Abstract
Natural language processing comprises several challenging, error-prone tasks, including part-of-speech (POS) tagging, word sense disambiguation (WSD), named entity recognition (NER), and compound verb detection. Studying intrasentence relations and roles is essential to improving these subtasks. Semi-automatic schemes such as ontologies can be applied to clarify the dependencies between words. This paper presents an ontology that aims to improve POS tagging, WSD, NER, and compound verb detection in Persian, with extra properties that may also benefit machine translation. The ontology is tested in combination with several state-of-the-art algorithms on the Dadegan corpus. The results show that coupling semantic analysis with machine learning methods enhances relation detection and, consequently, the precision of these subtasks, an approach that has not been widely explored for Persian. Furthermore, the experimental results show that accuracy increases by between 4.5 and 23%, depending on the task.
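The idea of letting ontology relations complement a statistical model can be illustrated with compound verb detection. The sketch below is not the FNLP-ONT implementation. It assumes a hypothetical toy relation table (`LIGHT_VERB_RELATIONS`) linking nonverbal elements to the light verbs they combine with, tokens that have already been lemmatized and tagged by some upstream tagger, and a nonverbal element that directly precedes its light verb, which real Persian text does not always satisfy. A pair is accepted only when both the tagger and the ontology agree.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical toy "ontology": maps a nonverbal element (noun or adjective) to
# the light verbs it may combine with to form a Persian compound verb.
# Entries are illustrative transliterations, not taken from FNLP-ONT.
LIGHT_VERB_RELATIONS: dict[str, set[str]] = {
    "kar": {"kardan"},        # kar kardan: to work
    "tasmim": {"gereftan"},   # tasmim gereftan: to decide
    "zamin": {"khordan"},     # zamin khordan: to fall down
}


@dataclass
class Token:
    lemma: str  # lemma of the token (verbs given as infinitives)
    pos: str    # tag assigned by an upstream statistical tagger, e.g. "N", "V"


def detect_compound_verbs(tokens: list[Token]) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of nonverbal element + light verb spans.

    A span is accepted only if the tagger labels the first token as a noun or
    adjective, labels the second as a verb, and the relation table licenses
    the pair. Only adjacent tokens are considered (a simplifying assumption).
    """
    spans: list[tuple[int, int]] = []
    for i in range(len(tokens) - 1):
        left, right = tokens[i], tokens[i + 1]
        if left.pos in {"N", "ADJ"} and right.pos == "V":
            if right.lemma in LIGHT_VERB_RELATIONS.get(left.lemma, set()):
                spans.append((i, i + 1))
    return spans
```

For example, the tagged sentence *man tasmim gereftam* ("I decided"), lemmatized as man/PRO, tasmim/N, gereftan/V, yields `[(1, 2)]`, whereas *u ketab gereft* ("he took a book") yields no span, because the toy table does not license *ketab* + *gereftan*: the noun there is an ordinary object, not part of a compound verb.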
CONFLICTS OF INTEREST
None.