Development of Punjabi-English (PunEng) Parallel Corpus for Machine Translation System
DOI:
https://doi.org/10.14419/ijet.v7i2.10762
Keywords:
English, Machine Translation, Parallel Corpus, Punjabi, PunEng Corpus
Abstract
This paper describes the creation process and statistics of the Punjabi-English (PunEng) parallel corpus. A parallel corpus is the main requirement for developing statistical as well as neural machine translation systems. Until now, no PunEng parallel corpus has been available. The paper shows the difficulties and the intensive labor involved in developing a parallel corpus. It discusses the methods used for collecting data and the results, and describes the errors encountered during data collection and how to handle them.
References
[1] M. Post, C. Callison-Burch, and M. Osborne, “Constructing parallel corpora for six Indian languages via crowdsourcing,” WMT-2012, pp. 401–409, 2012.
[2] A. Kunchukuttan, P. Mehta, and P. Bhattacharyya, “The IIT Bombay English-Hindi Parallel Corpus,” pp. 2–5, 2017.
[3] V. Goyal and G. S. Lehal, “Hindi to Punjabi machine translation system,” Commun. Comput. Inf. Sci., vol. 139 CCIS, no. 1, pp. 236–241, 2011.
Webliography
[W1] https://en.wikipedia.org/wiki/Punjabi_language
[W2] http://www.lancaster.ac.uk/fass/projects/corpus/emille/
[W3] http://www.lancaster.ac.uk/fass/projects/corpus/emille/MAUAL.htm
[W4] http://tdildc.in/index.php?option=com_download&task=fsearch&lang=en&limitstart=15&limit=5
[W5] http://www.statmt.org/wmt16/translation-task.html
[W6] https://translate.google.com/
[W7] https://www.wikipedia.org/
[W8] http://tdildc.in/index.php?option=com_download&task=showresourceDetails&toolid=281&lang=en
Received date: March 28, 2018
Accepted date: April 6, 2018
Published date: May 10, 2018