Discussion:
[Moses-support] Fwd: Different translations are obtained from the same decoder without alignment information
Ergun Bicici
7 years ago
Dear Moses maintainers,

I discovered that the translations obtained differ when the alignment
flags (--mark-unknown --unknown-word-prefix UNK --print-alignment-info)
are used. A comparison table is attached (en-ru and ru-en are being
recomputed). We expect them to be the same, since the alignment flags only
print additional information and are not supposed to alter decoding. In
both cases, the same EMS system was re-run, with and without the alignment
information flags.

- Average of the absolute difference is 0.0094 BLEU (about 1 BLEU
point).
- Average of the signed difference is 0.0051 BLEU (about 0.5 BLEU points;
results are better with the alignment flags).
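For concreteness, here is how the two averages quoted above are computed from per-direction BLEU pairs; the scores below are illustrative placeholders, not the values from the attached comparison table:

```python
# Sketch: the two averages from the bullet list, computed over per-direction
# (BLEU with alignment flags, BLEU without) pairs. Placeholder scores only.
pairs = [
    (0.231, 0.225),
    (0.188, 0.197),
    (0.342, 0.339),
]
diffs = [with_flags - without for with_flags, without in pairs]
avg_diff = sum(diffs) / len(diffs)                       # signed: > 0 means flags helped
avg_abs_diff = sum(abs(d) for d in diffs) / len(diffs)   # magnitude of the change
print(f"avg signed diff:   {avg_diff:+.4f} BLEU")
print(f"avg absolute diff: {avg_abs_diff:.4f} BLEU")
```

The signed average can be near zero even when the absolute average is not, which is why both numbers are reported.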


/opt/Programs/SMT/moses/mosesdecoder/bin/moses --version

Moses code version (git tag or commit hash):
mmt-mvp-v0.12.1-2775-g65c75ff07-dirty
Libraries used:
Boost version 1.62.0

git status
On branch RELEASE-4.0
Your branch is up to date with 'origin/RELEASE-4.0'.


Note: Using alignment information to recase tokens was tried in [1] for
en-fi and en-tr, with positive results reported. We tried this method in
all translation directions we considered; as can be seen in the align row,
it only improves the performance for tr-en and en-tr, and even for tr-en,
Moses provides better translations without the alignment flags altogether.
[1]The JHU Machine Translation Systems for WMT 2016
Shuoyang Ding, Kevin Duh, Huda Khayrallah, Philipp Koehn and Matt Post
http://www.statmt.org/wmt16/pdf/W16-2310.pdf


Best Regards,
Ergun

Ergun Biçici
http://bicici.github.com/
Hieu Hoang
7 years ago
are you rerunning tuning for each case? Or are you using exactly the same
moses.ini file for the with- and without-alignment experiments?

Hieu Hoang
http://statmt.org/hieu
...
Ergun Bicici
7 years ago
only the evaluation decoding steps are repeated, which are steps 10, 9,
and 7 in the following EMS output:
48 TRAINING:consolidate -> re-using (1)
47 TRAINING:prepare-data -> re-using (1)
46 TRAINING:run-giza -> re-using (1)
45 TRAINING:run-giza-inverse -> re-using (1)
44 TRAINING:symmetrize-giza -> re-using (1)
43 TRAINING:build-lex-trans -> re-using (1)
40 TRAINING:build-osm -> re-using (1)
39 TRAINING:extract-phrases -> re-using (1)
38 TRAINING:build-reordering -> re-using (1)
37 TRAINING:build-ttable -> re-using (1)
34 TRAINING:create-config -> re-using (1)
28 TUNING:truecase-input -> re-using (1)
24 TUNING:truecase-reference -> re-using (1)
21 TUNING:filter -> re-using (1)
20 TUNING:apply-filter -> re-using (1)
19 TUNING:tune -> re-using (1)
18 TUNING:apply-weights -> re-using (1)
15 EVALUATION:test:truecase-input -> re-using (1)
12 EVALUATION:test:filter -> re-using (1)
11 EVALUATION:test:apply-filter -> re-using (1)



10 EVALUATION:test:decode -> run
9 EVALUATION:test:remove-markup -> run
7 EVALUATION:test:detruecase-output -> run
3 EVALUATION:test:multi-bleu-c -> run
2 EVALUATION:test:analysis-coverage -> re-using (1)
1 EVALUATION:test:analysis-precision -> run
...
--
Regards,
Ergun
Hieu Hoang
7 years ago
that would be a bug.

could you please make the model and input files available for download.
I'll check it out

Hieu Hoang
http://statmt.org/hieu
...
Hieu Hoang
7 years ago
could you run with alignments, but WITHOUT -unknown-word-prefix UNK.

alignments shouldn't change the translation, but the OOV prefix may.

Hieu Hoang
http://statmt.org/hieu
...
Ergun Bicici
7 years ago
ok.
...
--
Regards,
Ergun
Ergun Bicici
7 years ago
I am still waiting for the new results.

Ergun
...
--
Regards,
Ergun
Ergun Bicici
7 years ago
Hi Hieu,

Thank you very much. One issue is that using "--mark-unknown
--unknown-word-prefix UNK" changes the casing of the text. Example:

1) input: UNK_" the greatest treasure we are the people who work in
agriculture , and worry about how they continue to bring their business ,
UNK_" said the mayor .
output: UNK_" the greatest treasure we are the people who work in
agriculture , and worry about how they continue to bring their business ,
UNK_" said the mayor .
2) input: " the greatest treasure we are the people who work in
agriculture , and worry about how they continue to bring their business , "
said the mayor .
output: " The greatest treasure we are the people who work in
agriculture , and worry about how they continue to bring their business , "
said the mayor .
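The casing difference above is consistent with detruecase.perl uppercasing the first alphabetic token of each line: when the line starts with UNK_", that token already begins with an uppercase letter, so "the" is left untouched. A rough Python approximation of that first-token rule (the real Perl script handles more cases, e.g. XML markup; this is only an illustration):

```python
# Rough approximation of the sentence-initial uppercasing applied by
# detruecase.perl: uppercase the first token that begins with a letter,
# skipping leading punctuation.
def detruecase_first_word(line: str) -> str:
    tokens = line.split()
    for i, tok in enumerate(tokens):
        if tok[0].isalpha():
            tokens[i] = tok[0].upper() + tok[1:]
            break
    return " ".join(tokens)

print(detruecase_first_word('" the greatest treasure'))      # " The greatest treasure
print(detruecase_first_word('UNK_" the greatest treasure'))  # UNK_" the greatest treasure
```

In the second call the first alphabetic token is UNK_" itself, which is already uppercase, so the heuristic never reaches "the" and the line is left as-is.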

I also found out that for de-en, I was using a different language model,
which was decreasing the scores. I had used EMS for all experiments but
made the system skip some parts; apparently a change in the data paths
caused the language model files from another experiment to be used.

I obtained all translations again and now the scores match. The gain from
the additional truecasing step also disappeared. I am checking the results
further.

Thank you very much for your help.

Regards,
Ergun
...
--
Regards,
Ergun
Ergun Bicici
7 years ago
the tuning step is not repeated. Decoding uses the same moses.ini and the
same input but different parameters:
moses/mosesdecoder/65c75ff/bin/moses -search-algorithm 1
-cube-pruning-pop-limit 5000 -s 5000 -threads 8 -text-type "test" -v 0 -f
wmt18_en-de/evaluation/test.filtered.ini.7 <
wmt18_en-de/evaluation/test.input.tc.1 >
wmt18_en-de/evaluation/test.output.7

vs. with alignment:
moses/mosesdecoder/65c75ff/bin/moses -search-algorithm 1
-cube-pruning-pop-limit 5000 -s 5000 -threads 8 --mark-unknown
--unknown-word-prefix UNK_ --print-alignment-info -text-type "test" -v 0 -f
wmt18_en-de/evaluation/test.filtered.ini.7 <
wmt18_en-de/evaluation/test.input.tc.1 >
wmt18_en-de/evaluation/test.output.9

both are followed by the following steps:
moses/mosesdecoder/scripts/ems/support/remove-segmentation-markup.perl <
wmt18_en-de/evaluation/test.output.7 > wmt18_en-de/evaluation/test.cleaned.7
moses/mosesdecoder/scripts/recaser/detruecase.perl <
wmt18_en-de/evaluation/test.cleaned.7 >
wmt18_en-de/evaluation/test.truecased.7
and equivalently with:
moses/mosesdecoder/scripts/ems/support/remove-segmentation-markup.perl <
wmt18_en-de/evaluation/test.output.9 > wmt18_en-de/evaluation/test.cleaned.9
moses/mosesdecoder/scripts/recaser/detruecase.perl <
wmt18_en-de/evaluation/test.cleaned.9 >
wmt18_en-de/evaluation/test.truecased.9

the scoring step uses test.truecased.7 and test.truecased.9.
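To locate where the two final outputs diverge, a small comparison helper can be used. The sample lines below are made up; in practice one would pass the contents of the test.truecased.7 and test.truecased.9 files from the paths above:

```python
from itertools import zip_longest

# Sketch: report the first segment at which two decoder outputs differ.
def first_divergence(lines_a, lines_b):
    for n, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a != b:
            return n, a, b
    return None  # outputs are identical

# Illustrative lines only; real inputs would be the two truecased files,
# e.g. first_divergence(open("test.truecased.7"), open("test.truecased.9")).
without_flags = ["das Haus ist klein", "er geht nach Hause"]
with_flags = ["das Haus ist klein", "er geht UNK_heim"]
print(first_divergence(without_flags, with_flags))
```

zip_longest (rather than zip) makes dropped segments visible as well: a blank or missing line in one output shows up as a divergence instead of being silently skipped.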

Ergun
...
--
Regards,
Ergun
Tom Hoar
7 years ago
I remember that 3 years ago I reported a similar (same?) problem with the
--print-alignment-info flag, without EMS. At the time, I was using the
legacy binarized translation and reordering tables and everything was
fine. Then I started testing the compact binarized format. The flag
caused translations to change and some were even lost (blank lines). No
one on the support list knew of a reason and I didn't have the bandwidth
to troubleshoot, so I continued using the legacy binarized files.
Maybe try switching back to the legacy binarized files and see if the
problem disappears. This could help you narrow down where to look.


Best regards,
Tom Hoar
*Slate Rocks, LLC*
Web: https://www.slate.rocks
Thailand Mobile: +66 87 345-1875
Skype: tahoar
...
Ergun Bicici
7 years ago
Dear Tom,

Thank you for sharing your finding. This does not apply in my case, since
I re-compiled the code to build the initial Moses 4.0 model; the moses
binary has not changed since then, and even though I am observing
different scores, they are better when the alignment flags are included.
I am waiting for the de-en results with the "-print-alignment-info" flag.

I previously tried to debug a decentralized Moses server-client setup that
showed similar symptoms, where the error could stem from additional
sources such as network interruptions, issues with the syncing of buffers,
etc. With a binarized version you get a translation, but the translation
options are somewhat fixed. Could Moses provide a better translation? It
turns out that truecasing before detruecasing improves the scores by
0.002 BLEU on average over the 8 translation directions in WMT18.

Regards,
Ergun
bicici.github.com
...
--
Regards,
Ergun