Karpagam JCS ISSN: 2582-8525 (Print), 2583-3669 (Online)

Automatic Sentence Alignment Algorithm for Bilingual Hindi-Punjabi Parallel Text

Abstract
This paper describes a project for automatically aligning the sentences of bilingual Hindi-Punjabi parallel texts. The bilingual Hindi-Punjabi corpus was collected from sources such as CDAC Noida, book publishers, and others. Automatic sentence alignment of a bilingual corpus is valuable for developing machine translation systems. The alignment algorithm is based on a simple statistical model of sentence length, measured as the number of words in a sentence, and aligns the texts first at the paragraph level and then at the sentence level. It relies on the observation that longer sentences in one language tend to be translated into longer sentences in the other language, and shorter sentences into shorter ones. An evaluation on a parallel corpus drawn from different fields found that almost all sentences were correctly aligned; for a very large corpus, the accuracy achieved was 74.6%. The proposed algorithm can also be applied to other closely related language pairs.
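The length-based idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the bead types (1-1, 1-0, 0-1, 2-1, 1-2), the penalty values, and the normalized word-count mismatch used as a cost are all assumptions chosen for illustration, standing in for the statistical model the paper refers to. The sketch covers only the sentence-level step and assumes the paragraphs have already been paired.

```python
from typing import Dict, List, Tuple

Bead = Tuple[int, int]  # (source sentences consumed, target sentences consumed)

# Hypothetical penalties per bead type; these values are NOT from the paper.
BEAD_PENALTY: Dict[Bead, float] = {
    (1, 1): 0.0,  # one sentence translated as one sentence
    (1, 0): 3.0,  # source sentence with no counterpart
    (0, 1): 3.0,  # target sentence with no counterpart
    (2, 1): 1.0,  # two source sentences merged into one target sentence
    (1, 2): 1.0,  # one source sentence split into two target sentences
}


def word_count(sentences: List[str]) -> int:
    """Total number of whitespace-separated words in a group of sentences."""
    return sum(len(s.split()) for s in sentences)


def bead_cost(src: List[str], tgt: List[str], bead: Bead) -> float:
    """Assumed cost: normalized word-count mismatch plus a bead-type penalty."""
    ls, lt = word_count(src), word_count(tgt)
    mismatch = abs(ls - lt) / max(ls + lt, 1)
    return mismatch + BEAD_PENALTY[bead]


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return the minimum-cost alignment as pairs of sentence-index groups."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Bead]] = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0

    # Forward dynamic programming over prefixes of both sentence lists.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for a, b in BEAD_PENALTY:
                ni, nj = i + a, j + b
                if ni > n or nj > m:
                    continue
                c = best[i][j] + bead_cost(src[i:ni], tgt[j:nj], (a, b))
                if c < best[ni][nj]:
                    best[ni][nj] = c
                    back[ni][nj] = (a, b)

    # Trace back from the end to recover the chosen beads.
    pairs: List[Tuple[List[int], List[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        a, b = back[i][j]
        pairs.append((list(range(i - a, i)), list(range(j - b, j))))
        i, j = i - a, j - b
    pairs.reverse()
    return pairs
```

With placeholder sentences, `align(["w w w w w w w w w w", "w w w"], ["w w w w w", "w w w w w", "w w w"])` pairs the long source sentence with the first two target sentences and the short one with the third, because that grouping has the smallest word-count mismatch. A real system for Hindi and Punjabi would also need sentence segmentation and a cost function estimated from data, neither of which is shown here.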
