Machine-translated content has been compared with human translations by measuring their similarity with the BLEU score metric.
BLEU ignores the differences between words and works at a very local level (I guess it is sentence-based).
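To make the idea concrete, here is a minimal sketch of a sentence-level BLEU computation. It assumes whitespace tokenization, a single reference translation, and no smoothing; real toolkits handle these details differently. The function names `ngrams` and `sentence_bleu` are made up for this illustration and are not taken from the article.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU for one candidate against one reference.

    Assumptions (illustrative only): whitespace tokenization, a single
    reference, uniform weights over n-gram orders 1..max_n.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: an n-gram is credited at most as often
        # as it appears in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            # Without smoothing, one empty n-gram order zeroes the score.
            return 0.0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: punish candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

With `max_n=1`, the candidates "the cat is big" and "the cat is green" get the same score against the reference "the cat is large", even though one is a near-synonym and the other is simply wrong. The metric only checks whether the surface words match.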
Evaluation is also done by asking humans, sometimes through crowd-sourcing on the Amazon Mechanical Turk platform.
The research is going on in both directions.
You can read the full article here.