---
layout: documentation
title: Fisher and CALLHOME Spanish English Speech Translation Corpus
---
<div class="container">
<div class="row">
<div class="span8">
<h1>Datasets</h1>
<h2>Fisher and CALLHOME Spanish&ndash;English Speech Translation Corpus</h2>
<span id="download">
<a href="https://github.com/joshua-decoder/fisher-callhome-corpus/zipball/master">Download</a>
</span>
</div>
</div>
<hr />
<div class="row">
<div class="span8">
<p>
This page describes the release of a set of English translations (obtained
on <a href="http://mturk.com">Amazon's Mechanical Turk</a>) and ASR lattice output
(produced with <a href="http://kaldi.sf.net">Kaldi</a>). Together, these data supplement
the existing LDC datasets (audio and Spanish transcriptions), yielding a
four-way parallel corpus for research in Spanish&ndash;English spoken language
translation.
</p>
<p>
The LDC datasets that this dataset extends are as follows:
</p>
<p style="text-align: center"><center>
<table style="border: 1px solid lightgray">
<tr>
<th></th>
<th>Audio</th>
<th>Transcripts</th>
</tr>
<tr>
<td>Fisher Spanish</td>
<td><a href="http://catalog.ldc.upenn.edu/LDC2010S01">LDC2010S01</a></td>
<td><a href="http://catalog.ldc.upenn.edu/LDC2010T04">LDC2010T04</a></td>
</tr>
<tr>
<td>CALLHOME Spanish</td>
<td><a href="http://catalog.ldc.upenn.edu/LDC96S35">LDC96S35</a></td>
<td><a href="http://catalog.ldc.upenn.edu/LDC96T17">LDC96T17</a></td>
</tr>
</table>
</center></p>
<p>
If you use this dataset, please cite the following paper, which also contains a number
of experiments to compare against:
</p>
<blockquote>
<i>Improved Speech-to-Text Translation with the Fisher and Callhome Spanish&ndash;English
Speech Translation Corpus</i> <br/>
Matt Post, Gaurav Kumar, Adam Lopez, Damianos Karakos, Chris Callison-Burch and Sanjeev
Khudanpur <br/>
<a href="http://www.iwslt2013.org">IWSLT 2013</a> <br/>
<a class="pdf" href="http://cs.jhu.edu/~post/papers/post2013improved.pdf">PDF</a>
<a class="bibtex" href="http://cs.jhu.edu/~post/papers/post2013improved.bib">BIB</a>
</blockquote>
<h2>Download &amp; License</h2>
<p>
The Fisher / CALLHOME corpus
is <a href="https://github.com/joshua-decoder/fisher-callhome-corpus">hosted on
GitHub</a>. You can clone that repository, or download a zip archive of the current
master branch by clicking the Download button at the top of this page. The corpus is
licensed under the <a href="http://creativecommons.org/">Creative
Commons</a> <a href="http://creativecommons.org/licenses/by-sa/3.0/">Attribution-ShareAlike
3.0 Unported License</a> (CC BY-SA 3.0).
</p>
</div>
<div class="span4">
<div style="border: 1px solid lightgray">
<p style="text-align: center">
<img width="250" alt="Example ASR lattice from the dataset" src="images/lattice.png"/><br/>
</p>
<p style="text-align: center">
An example lattice from the dataset
</p>
</div>
</div>
</div>
</div>