Discussion:
[Moses-support] Fwd: Different translations are obtained from the same decoder without alignment information
Ergun Bicici
7 years ago
Dear Moses maintainers,

I discovered that the translations obtained differ when the alignment
flags (--mark-unknown --unknown-word-prefix UNK --print-alignment-info)
are used. A comparison table is attached (en-ru and ru-en are being
recomputed). We expect them to be the same, since the alignment flags only
print additional information and are not supposed to alter decoding. In
both cases, the same EMS system was re-run, with and without the alignment
information flags.

- Average of the absolute difference is 0.0094 BLEU (about 1 BLEU
point).
- Average of the signed difference is 0.0051 BLEU (about 0.5 BLEU points;
results are better with the alignment flags).
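For concreteness, here is how the two averages quoted above are computed from per-direction BLEU pairs; the scores below are illustrative placeholders, not the values from the attached comparison table:

```python
# Sketch: the two averages from the bullet list, computed over per-direction
# (BLEU with alignment flags, BLEU without) pairs. Placeholder scores only.
pairs = [
    (0.231, 0.225),
    (0.188, 0.197),
    (0.342, 0.339),
]
diffs = [with_flags - without for with_flags, without in pairs]
avg_diff = sum(diffs) / len(diffs)                       # signed: > 0 means flags helped
avg_abs_diff = sum(abs(d) for d in diffs) / len(diffs)   # magnitude of the change
print(f"avg signed diff:   {avg_diff:+.4f} BLEU")
print(f"avg absolute diff: {avg_abs_diff:.4f} BLEU")
```

The signed average can be near zero even when the absolute average is not, which is why both numbers are reported.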


/opt/Programs/SMT/moses/mosesdecoder/bin/moses --version

Moses code version (git tag or commit hash):
mmt-mvp-v0.12.1-2775-g65c75ff07-dirty
Libraries used:
Boost version 1.62.0

git status
On branch RELEASE-4.0
Your branch is up to date with 'origin/RELEASE-4.0'.


Note: Using alignment information to recase tokens was tried in [1] for
en-fi and en-tr, with positive results reported. We tried this method in
all translation directions we considered; as can be seen in the align row,
it only improves the performance for tr-en and en-tr, and even for tr-en,
Moses provides better translations without the alignment flags altogether.
[1]The JHU Machine Translation Systems for WMT 2016
Shuoyang Ding, Kevin Duh, Huda Khayrallah, Philipp Koehn and Matt Post
http://www.statmt.org/wmt16/pdf/W16-2310.pdf


Best Regards,
Ergun

Ergun Biçici
http://bicici.github.com/
Hieu Hoang
7 years ago
are you rerunning tuning for each case? Or are you using exactly the same
moses.ini file for the with- and without-alignment experiments?

Hieu Hoang
http://statmt.org/hieu
...
Ergun Bicici
7 years ago
only the evaluation decoding steps are repeated, which are steps 10, 9,
and 7 in the following EMS output:
48 TRAINING:consolidate -> re-using (1)
47 TRAINING:prepare-data -> re-using (1)
46 TRAINING:run-giza -> re-using (1)
45 TRAINING:run-giza-inverse -> re-using (1)
44 TRAINING:symmetrize-giza -> re-using (1)
43 TRAINING:build-lex-trans -> re-using (1)
40 TRAINING:build-osm -> re-using (1)
39 TRAINING:extract-phrases -> re-using (1)
38 TRAINING:build-reordering -> re-using (1)
37 TRAINING:build-ttable -> re-using (1)
34 TRAINING:create-config -> re-using (1)
28 TUNING:truecase-input -> re-using (1)
24 TUNING:truecase-reference -> re-using (1)
21 TUNING:filter -> re-using (1)
20 TUNING:apply-filter -> re-using (1)
19 TUNING:tune -> re-using (1)
18 TUNING:apply-weights -> re-using (1)
15 EVALUATION:test:truecase-input -> re-using (1)
12 EVALUATION:test:filter -> re-using (1)
11 EVALUATION:test:apply-filter -> re-using (1)



10 EVALUATION:test:decode -> run
9 EVALUATION:test:remove-markup -> run
7 EVALUATION:test:detruecase-output -> run
3 EVALUATION:test:multi-bleu-c -> run
2 EVALUATION:test:analysis-coverage -> re-using (1)
1 EVALUATION:test:analysis-precision -> run
...
--
Regards,
Ergun
Hieu Hoang
7 years ago
that would be a bug.

could you please make the model and input files available for download.
I'll check it out

Hieu Hoang
http://statmt.org/hieu
...
Hieu Hoang
7 years ago
could you run with alignments, but WITHOUT -unknown-word-prefix UNK.

alignments shouldn't change the translation, but the OOV prefix may.

Hieu Hoang
http://statmt.org/hieu
...
Ergun Bicici
7 years ago
ok.
...
--
Regards,
Ergun
Ergun Bicici
7 years ago
I am still waiting for the new results.

Ergun
...
--
Regards,
Ergun
Ergun Bicici
7 years ago
Hi Hieu,

Thank you very much. One issue is that using "--mark-unknown
--unknown-word-prefix UNK" changes the casing of the text. Example:

1) input: UNK_" the greatest treasure we are the people who work in
agriculture , and worry about how they continue to bring their business ,
UNK_" said the mayor .
output: UNK_" the greatest treasure we are the people who work in
agriculture , and worry about how they continue to bring their business ,
UNK_" said the mayor .
2) input: " the greatest treasure we are the people who work in
agriculture , and worry about how they continue to bring their business , "
said the mayor .
output: " The greatest treasure we are the people who work in
agriculture , and worry about how they continue to bring their business , "
said the mayor .
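The casing difference above is consistent with detruecase.perl uppercasing the first alphabetic token of each line: when the line starts with UNK_", that token already begins with an uppercase letter, so "the" is left untouched. A rough Python approximation of that first-token rule (the real Perl script handles more cases, e.g. XML markup; this is only an illustration):

```python
# Rough approximation of the sentence-initial uppercasing applied by
# detruecase.perl: uppercase the first token that begins with a letter,
# skipping leading punctuation.
def detruecase_first_word(line: str) -> str:
    tokens = line.split()
    for i, tok in enumerate(tokens):
        if tok[0].isalpha():
            tokens[i] = tok[0].upper() + tok[1:]
            break
    return " ".join(tokens)

print(detruecase_first_word('" the greatest treasure'))      # " The greatest treasure
print(detruecase_first_word('UNK_" the greatest treasure'))  # UNK_" the greatest treasure
```

In the second call the first alphabetic token is UNK_" itself, which is already uppercase, so the heuristic never reaches "the" and the line is left as-is.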

I also found out that for de-en, I was using a different language model,
which was decreasing the scores. I had used EMS for all experiments but
made the system skip some parts; apparently a change in the data paths
caused the language model files from another experiment to be used.

I obtained all translations again and now the scores match. The gain from
the additional truecasing step also disappeared. I am checking the results
further.

Thank you very much for your help.

Regards,
Ergun
...
--
Regards,
Ergun
Ergun Bicici
7 years ago
the tuning step is not repeated. Decoding uses the same moses.ini and the
same input but different parameters:
moses/mosesdecoder/65c75ff/bin/moses -search-algorithm 1
-cube-pruning-pop-limit 5000 -s 5000 -threads 8 -text-type "test" -v 0 -f
wmt18_en-de/evaluation/test.filtered.ini.7 <
wmt18_en-de/evaluation/test.input.tc.1 >
wmt18_en-de/evaluation/test.output.7

vs. with alignment:
moses/mosesdecoder/65c75ff/bin/moses -search-algorithm 1
-cube-pruning-pop-limit 5000 -s 5000 -threads 8 --mark-unknown
--unknown-word-prefix UNK_ --print-alignment-info -text-type "test" -v 0 -f
wmt18_en-de/evaluation/test.filtered.ini.7 <
wmt18_en-de/evaluation/test.input.tc.1 >
wmt18_en-de/evaluation/test.output.9

both are followed by the following steps:
moses/mosesdecoder/scripts/ems/support/remove-segmentation-markup.perl <
wmt18_en-de/evaluation/test.output.7 > wmt18_en-de/evaluation/test.cleaned.7
moses/mosesdecoder/scripts/recaser/detruecase.perl <
wmt18_en-de/evaluation/test.cleaned.7 >
wmt18_en-de/evaluation/test.truecased.7
and equivalently with:
moses/mosesdecoder/scripts/ems/support/remove-segmentation-markup.perl <
wmt18_en-de/evaluation/test.output.9 > wmt18_en-de/evaluation/test.cleaned.9
moses/mosesdecoder/scripts/recaser/detruecase.perl <
wmt18_en-de/evaluation/test.cleaned.9 >
wmt18_en-de/evaluation/test.truecased.9

the scoring step uses test.truecased.7 and test.truecased.9.
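To locate where the two final outputs diverge, a small comparison helper can be used. The sample lines below are made up; in practice one would pass the contents of the test.truecased.7 and test.truecased.9 files from the paths above:

```python
from itertools import zip_longest

# Sketch: report the first segment at which two decoder outputs differ.
def first_divergence(lines_a, lines_b):
    for n, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a != b:
            return n, a, b
    return None  # outputs are identical

# Illustrative lines only; real inputs would be the two truecased files,
# e.g. first_divergence(open("test.truecased.7"), open("test.truecased.9")).
without_flags = ["das Haus ist klein", "er geht nach Hause"]
with_flags = ["das Haus ist klein", "er geht UNK_heim"]
print(first_divergence(without_flags, with_flags))
```

zip_longest (rather than zip) makes dropped segments visible as well: a blank or missing line in one output shows up as a divergence instead of being silently skipped.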

Ergun
...
--
Regards,
Ergun
Tom Hoar
7 years ago
I remember that 3 years ago I reported a similar (same?) problem with the
--print-alignment-info flag, without EMS. At the time, I was using the
legacy binarized translation and reordering tables and everything was
fine. Then I started testing the compact binarized format. The flag
caused translations to change and some were even lost (blank lines). No
one on the support list knew of a reason and I didn't have the bandwidth
to troubleshoot, so I continued using the legacy binarized files.
Maybe try switching back to the legacy binarized files and see if the
problem disappears. This could help you narrow down where to look.


Best regards,
Tom Hoar
*Slate Rocks, LLC*
Web: https://www.slate.rocks
Thailand Mobile: +66 87 345-1875
Skype: tahoar
...
Ergun Bicici
7 years ago
Dear Tom,

Thank you for sharing your finding. This does not apply in my case, since
I re-compiled the code to build the initial Moses 4.0 model; the moses
binary has not changed since then, and even though I am observing
different scores, they are better when the alignment flags are included.
I am waiting for the de-en results with the "-print-alignment-info" flag.

I previously tried to debug a decentralized Moses server-client setup that
showed similar symptoms, where the error could stem from additional
sources such as network interruptions, issues with the syncing of buffers,
etc. With a binarized version you get a translation, but the translation
options are somewhat fixed. Could Moses provide a better translation? It
turns out that truecasing before detruecasing improves the scores by
0.002 BLEU on average over the 8 translation directions in WMT18.

Regards,
Ergun
bicici.github.com
...
--
Regards,
Ergun