MT quality – Automatic metrics or manual evaluation?

Automatic metrics
Automatic metrics compare machine-translated output against human reference translations by measuring similarity, most commonly with the BLEU score.
BLEU ignores differences in meaning between words and works at a very local level (it is typically computed sentence by sentence).
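To make the idea concrete, here is a minimal sketch (not a production implementation; real tools such as sacreBLEU add proper smoothing and corpus-level aggregation) of sentence-level BLEU: modified n-gram precision combined with a brevity penalty.

```python
import math
from collections import Counter

def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Toy sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Clipped overlap: each n-gram counts at most as often as in the reference
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        # Crude smoothing so a single missing n-gram order does not zero the score
        prec = max(overlap, 1e-9) / total
        log_prec_sum += math.log(prec)
    # Brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum / max_n)

print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))
```

Note how the metric only rewards exact surface overlap: a valid synonym scores zero, which is exactly the weakness mentioned above.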
Manual evaluation
Done by asking human judges, sometimes through crowd-sourcing platforms such as Amazon Mechanical Turk.
Research is ongoing in both directions.

You can read the full article here.

2 comments on “MT quality – Automatic metrics or manual evaluation?”

  1. Hi, I received a very interesting comment on LinkedIn…copy/paste the links to know more.

    Kirti Vashee • There is more discussion on this at:


    And comments on MT quality in terms of productivity implications

  2. I have to post another very interesting link received through LinkedIn from Sandra Williams; I will just copy/paste the message:

    Linda, There was a paper on human evaluation of MT output at the 2012 NAACL workshop on Predicting and improving text readability for target reader populations (PITR2012):

    Tucker Maney, Linda Sibert, Dennis Perzanowski, Kalyan Gupta and Astrid Schmidt-Nielsen (2012) Toward Determining the Comprehensibility of Machine Translations.

