Are there advantages or disadvantages in changing the format for translation?

#10

by BigDeeper - opened Jul 23, 2024

Discussion

BigDeeper

Jul 23, 2024

Translate to X: text text text.

vs.

Translate from Y: text text text to X.

etc...

Is the example in the card the best approach?

Also, if I want to do tuning, prompt tuning for example, what is the optimal format of the training data?

Translate from English: text text text to Spanish: spanishtext spanishtext spanishtext?

How should I format the training data?

Muennighoff

BigScience Workshop org Jul 23, 2024

> Is the example in the card the best approach?

No, most likely not. You can very likely find a better approach via prompt engineering.

> what is the optimal format of the training data? How should I format the training data?

I think using a variety of formats, not a single one, will likely yield the best model.
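As a rough illustration of what "a variety of formats" could look like in practice, here is a minimal sketch that renders each sentence pair with a randomly chosen prompt template. The template strings, the `make_example` helper, and the `inputs`/`targets` field names are all assumptions made for this example, not a format confirmed for this model.

```python
import random

# Hypothetical prompt templates. The first two mirror the formats asked about
# in the question; the third is just one more possible variation.
TEMPLATES = [
    "Translate to {tgt}: {text}",
    "Translate from {src}: {text} to {tgt}.",
    "{text}\n\nTranslate the {src} text above into {tgt}.",
]

def make_example(text, translation, src, tgt, rng):
    """Render one sentence pair with a randomly chosen template."""
    template = rng.choice(TEMPLATES)
    return {
        "inputs": template.format(src=src, tgt=tgt, text=text),
        "targets": translation,
    }

rng = random.Random(0)  # seeded so the generated set is reproducible
pairs = [("Good morning.", "Buenos días."), ("Thank you.", "Gracias.")]
examples = [make_example(s, t, "English", "Spanish", rng) for s, t in pairs]
```

Each entry in `examples` keeps the prompt and the reference translation separate, so the same pairs can be re-rendered later with different templates.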

BigDeeper

Jul 23, 2024

> > Is the example in the card the best approach?
>
> No, most likely not. You can very likely find a better approach via prompt engineering.
>
> > what is the optimal format of the training data? How should I format the training data?
>
> I think using a variety of formats, not a single one, will likely yield the best model.

Do you have any insight into how training data should be structured for this specific model?

I started with the idea that the source-language texts would be the input, and the corresponding target-language texts would be the labels.

But as I look at the model, maybe that's not possible.

I probably have to fashion a single text sample that contains both the source and the target translation.

Do you have any insight into how it needs to be structured?

Or if my original thought was correct, then where exactly do I stick the target translation tokens?

Muennighoff

BigScience Workshop org Jul 23, 2024

I recommend taking a look at the data that was used to train this model: https://huggingface.co/datasets/bigscience/xP3
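On the question of where the target tokens go: if the model is decoder-only (an assumption, since the thread does not say), a common approach is to concatenate prompt and target into one sequence and mask the prompt positions in the labels so that only the translation contributes to the loss. The sketch below uses a hypothetical whitespace tokenizer (`toy_tokenize`) and a made-up `eos_id` purely to keep it self-contained; a real setup would use the model's own tokenizer.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

def toy_tokenize(text, vocab):
    """Hypothetical whitespace tokenizer standing in for the model's real one."""
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]

def build_causal_example(prompt, target, vocab, eos_id=0):
    """Join prompt and target into one sequence; mask the prompt in the labels."""
    prompt_ids = toy_tokenize(prompt, vocab)
    target_ids = toy_tokenize(target, vocab) + [eos_id]
    input_ids = prompt_ids + target_ids
    # Prompt positions get IGNORE_INDEX, so only target tokens are scored.
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids
    return {"input_ids": input_ids, "labels": labels}

vocab = {"<eos>": 0}
example = build_causal_example(
    "Translate to Spanish: Good morning.", "Buenos días.", vocab
)
```

Here `labels` has the same length as `input_ids`. Hugging Face causal LM classes typically shift the labels internally, so no manual offset should be needed. For an encoder-decoder model, the original idea of source as input and target as labels would apply more directly.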
