Automatic Ranking of MT Systems
Martin Rajman (EPFL, Lausanne, Switzerland)
Anthony Hartley (Centre for Translation Studies, Leeds, UK)
EO4: MT Evaluation
In earlier work, we succeeded in automatically predicting the relative rankings of MT systems, as derived from human judgments of the Fluency, Adequacy or Informativeness of their output. In this paper, we present an experiment, using human evaluators and additional data, designed to test the robustness of those earlier results. They had yielded two promising, automatically computable predictors: the D-score, based on semantic features of the MT output, and the X-score, based on syntactic features. We conclude that the X-score is indeed a robust and reliable predictor, even on new data for which it has not been specifically tuned.
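The abstract does not say how agreement between an automatic ranking and a human-derived ranking is quantified, and it does not define the D-score or X-score. The sketch below therefore only illustrates, under stated assumptions, one common way to check whether an automatic score ranks MT systems the same way human judgments do: Spearman's rank correlation and the fraction of system pairs ordered identically. The function names `rank`, `spearman_rho` and `pairwise_agreement`, the system names and all scores are hypothetical placeholders, not the authors' method or data.

```python
from itertools import combinations


def rank(scores):
    """Map each system to its rank (1 = highest score).

    Assumption: scores contain no ties.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {system: i + 1 for i, system in enumerate(ordered)}


def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings over the same systems."""
    n = len(rank_a)
    d_squared = sum((rank_a[s] - rank_b[s]) ** 2 for s in rank_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


def pairwise_agreement(scores_a, scores_b):
    """Fraction of system pairs that both score sets order the same way."""
    pairs = list(combinations(scores_a, 2))
    same = sum(
        (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y]) > 0
        for x, y in pairs
    )
    return same / len(pairs)


# Hypothetical scores for four hypothetical systems (not data from the paper).
human = {"A": 0.81, "B": 0.64, "C": 0.72, "D": 0.55}
automatic = {"A": 3.9, "B": 3.1, "C": 2.8, "D": 2.2}

print(spearman_rho(rank(human), rank(automatic)))
print(pairwise_agreement(human, automatic))
```

With these made-up numbers the two rankings differ only in the order of systems B and C, giving a rho of 0.8 and agreement on 5 of the 6 system pairs. Pairwise agreement is shown alongside rho because it counts directly how often a predictor would pick the better of two systems.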
Keywords: Machine translation evaluation, Automated scoring and ranking, X-Scores