| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"> |
| <title>Indian Languages Parallel Corpora</title> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <meta name="description" content=""> |
| <meta name="author" content=""> |
| |
| <!-- Le styles --> |
| <link href="/bootstrap/css/bootstrap.css" rel="stylesheet" /> |
| <link href="/joshua.css" rel="stylesheet" /> |
| <style> |
| body { |
| padding-top: 60px; /* 60px to make the container go all the way to the bottom of the topbar */ |
| } |
| </style> |
| <link href="bootstrap/css/bootstrap-responsive.css" rel="stylesheet"> |
| |
| <!-- HTML5 shim, for IE6-8 support of HTML5 elements --> |
| <!--[if lt IE 9]> |
| <script src="bootstrap/js/html5shiv.js"></script> |
| <![endif]--> |
| |
| <!-- Fav and touch icons --> |
| <link rel="apple-touch-icon-precomposed" sizes="144x144" href="bootstrap/ico/apple-touch-icon-144-precomposed.png"> |
| <link rel="apple-touch-icon-precomposed" sizes="114x114" href="bootstrap/ico/apple-touch-icon-114-precomposed.png"> |
| <link rel="apple-touch-icon-precomposed" sizes="72x72" href="bootstrap/ico/apple-touch-icon-72-precomposed.png"> |
| <link rel="apple-touch-icon-precomposed" href="bootstrap/ico/apple-touch-icon-57-precomposed.png"> |
| <link rel="shortcut icon" href="bootstrap/ico/favicon.png"> |
| </head> |
| |
| <body> |
| |
| <div class="navbar navbar-inverse navbar-fixed-top"> |
| <div class="navbar-inner"> |
| <div class="container"> |
| <button type="button" class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse"> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| <a class="brand" href="/">Joshua</a> |
| <div class="nav-collapse collapse"> |
| <ul class="nav"> |
| <li class="active"><a href="index.html">Indian Languages</a></li> |
| </ul> |
| </div><!--/.nav-collapse --> |
| </div> |
| </div> |
| </div> |
| |
| <div class="container"> |
| |
| <div class="row"> |
| <div class="span8"> |
| <h1>Datasets</h1> |
| <h2>Indian Languages Parallel Corpora</h2> |
| <span id="download"> |
| <a href="https://github.com/joshua-decoder/indian-parallel-corpora/zipball/master">Download</a> |
| </span> |
| </div> |
| </div> |
| |
| <hr /> |
| |
| <div class="row"> |
| <div class="span8"> |
| |
| <p> |
| This page describes six parallel corpora created by translating popular Wikipedia |
| documents from six languages of the Indian subcontinent into English. The languages |
| are: |
| </p> |
| |
| <ul> |
| <li>Bengali</li> |
| <li>Hindi</li> |
| <li>Malayalam</li> |
| <li>Tamil</li> |
| <li>Telugu</li> |
| <li>Urdu</li> |
| </ul> |
| |
| <p> |
| The collection and release of this data is described in the following paper: |
| </p> |
| |
| <blockquote> |
| <i>Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing</i> <br/> |
| <a href="http://cs.jhu.edu/~post">Matt Post</a>, <a href="http://cs.jhu.edu/~ccb">Chris |
| Callison-Burch</a>, and <a href="http://homepages.inf.ed.ac.uk/miles/">Miles |
| Osborne</a> <br/> |
| <a href="http://statmt.org/wmt12">WMT 2012</a> <br/> |
| <a class="pdf" href="http://aclweb.org/anthology-new/W/W12/W12-3152.pdf">PDF</a> |
| <a class="bibtex" href="http://aclweb.org/anthology-new/W/W12/W12-3152.bib">BIB</a> |
| </blockquote> |
| |
| <h2>Download &amp; License</h2> |
| |
| <p> |
| The Indian parallel corpora dataset |
| is <a href="https://github.com/joshua-decoder/indian-parallel-corpora">hosted on |
| GitHub</a>. You can clone the repository, or download a zip archive of the current |
| master branch by clicking the Download button above. The corpus is licensed under |
| the <a href="http://creativecommons.org/">Creative |
| Commons</a> <a href="http://creativecommons.org/licenses/by-sa/3.0/">Attribution-ShareAlike |
| 3.0 Unported License</a> (CC BY-SA 3.0). |
| </p> |
| |
| <h2>Scores</h2> |
| |
| <p> |
| Below are the best translation scores (case-insensitive BLEU-4) that have been reported |
| on the provided test sets. The Google results were recorded in the fall of 2011 and |
| are described in Post et al. (2012). Google did not have a Malayalam system at that time. |
| </p> |
| |
| <div> |
| <table> |
| <tr> |
| <th style="width:150px">System</th> |
| <th>BN</th> |
| <th>HI</th> |
| <th>ML</th> |
| <th>TA</th> |
| <th>TE</th> |
| <th>UR</th> |
| </tr> |
| <tr> |
| <td class="system">Google</td> |
| <td>20.01</td> |
| <td>25.21</td> |
| <td>–</td> |
| <td>13.51</td> |
| <td>16.03</td> |
| <td>23.09</td> |
| </tr> |
| <tr> |
| <td class="system"><a href="http://aclweb.org/anthology/W/W12/W12-3152.pdf">Post et al. (2012)</a></td> |
| <td>13.53</td> |
| <td>17.29</td> |
| <td>13.72</td> |
| <td>9.81</td> |
| <td>12.46</td> |
| <td>19.53</td> |
| </tr> |
| </table> |
| </div> |
| </div> |
| |
| <div class="span4"> |
| <div> |
| <img width="250" src="images/map1.png" alt="Map of the Indo-Aryan languages"/> |
| <p style="text-align: center"><a href="http://en.wikipedia.org/wiki/Indo-Aryan_languages">Indo-Aryan languages</a></p> |
| |
| <img width="250" src="images/map2.png" alt="Map of the Dravidian languages"/> |
| <p style="text-align: center"><a href="http://en.wikipedia.org/wiki/Dravidian_languages">Dravidian languages</a></p> |
| </div> |
| </div> |
| </div> |
| </div> <!-- /container --> |
| |
| <!-- Le javascript |
| ================================================== --> |
| <!-- Placed at the end of the document so the pages load faster --> |
| <script src="bootstrap/js/jquery.js"></script> |
| <script src="bootstrap/js/bootstrap-transition.js"></script> |
| <script src="bootstrap/js/bootstrap-alert.js"></script> |
| <script src="bootstrap/js/bootstrap-modal.js"></script> |
| <script src="bootstrap/js/bootstrap-dropdown.js"></script> |
| <script src="bootstrap/js/bootstrap-scrollspy.js"></script> |
| <script src="bootstrap/js/bootstrap-tab.js"></script> |
| <script src="bootstrap/js/bootstrap-tooltip.js"></script> |
| <script src="bootstrap/js/bootstrap-popover.js"></script> |
| <script src="bootstrap/js/bootstrap-button.js"></script> |
| <script src="bootstrap/js/bootstrap-collapse.js"></script> |
| <script src="bootstrap/js/bootstrap-carousel.js"></script> |
| <script src="bootstrap/js/bootstrap-typeahead.js"></script> |
| |
| </body> |
| </html> |
| |