Volume 35, Issue 4 e12282

ORIGINAL ARTICLE

FNLP-ONT: A feasible ontology for improving NLP tasks in Persian

Zahra Hosseini Pozveh

orcid.org/0000-0002-5081-2571

Department of Computer, Science and Research branch, Islamic Azad University, Tehran, Iran

Corresponding Author

Amirhassan Monadjemi

[email protected]

Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran

Correspondence

Amirhassan Monadjemi, Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran.

Email: [email protected]

Ali Ahmadi

School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran

Faculty of Computer Engineering, K.N.Toosi University of Technology, Tehran, Iran

First published: 10 May 2018

https://doi.org/10.1111/exsy.12282

Abstract

Natural language processing is a composition of several error-prone and challenging tasks, including part-of-speech (POS) tagging, word sense disambiguation (WSD), named entity recognition (NER), and compound verb detection. Studying intrasentence relations and roles is essential to improving these subtasks. Semi-automatic schemes such as ontologies can be applied to clarify word dependencies. This paper presents an ontology that aims to improve POS tagging, WSD, NER, and compound verb detection in Persian, with extra properties that may also benefit machine translation. The ontology is tested in combination with several state-of-the-art algorithms on the Dadegan corpus. The results show that coupling semantic analysis with machine learning methods enhances relation detection and, consequently, the precision of these subtasks, an approach that has not been widely addressed for Persian. Furthermore, the experimental results show that accuracy increases by between 4.5% and 23%, depending on the task.

CONFLICTS OF INTEREST

None.

Published in Volume 35, Issue 4 (August 2018), Fourth special issue on knowledge discovery and business intelligence, e12282.
