Hello! I’m trying to figure out how to fine-tune bert-base-uncased to perform zero-shot classification. I managed to get it working but I’m not sure I’m preparing the data in the right way:
I initialize my model with the `problem_type="multi_label_classification"` setting so it uses a sigmoid loss function as opposed to softmax.
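In code, the initialization looks roughly like this (a simplified sketch; the `num_labels` value here is just a placeholder, not my real label count):

```python
from transformers import AutoModelForSequenceClassification

# problem_type="multi_label_classification" switches the loss to
# BCEWithLogitsLoss (per-label sigmoid) instead of softmax cross-entropy
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    problem_type="multi_label_classification",
    num_labels=3,  # placeholder: the size of my label set
)
```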
Then I prepare my data in the following way:
- I tokenize the input string and the label together using `tokenizer(sentence, label, truncation=True)` and save the result in the `input_ids` field of my dataset.
- Then I also convert the label into a one-hot vector and keep it in the `labels` field of my dataset (see the sketch after this list).
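Here is a simplified sketch of that preprocessing; the `label_list` and the example sentence are made up for illustration:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# made-up label set, just for illustration
label_list = ["sports", "politics", "science"]

def preprocess(sentence, label):
    # Tokenize the sentence and the label as a pair, so the encoding
    # becomes [CLS] sentence [SEP] label [SEP]
    encoding = tokenizer(sentence, label, truncation=True)
    # One-hot encode the label as floats, since BCEWithLogitsLoss
    # expects float targets
    encoding["labels"] = [1.0 if l == label else 0.0 for l in label_list]
    return encoding

example = preprocess("The match went to penalties.", "sports")
```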
Finally, I ran the training with about 10k lines of annotated data, but the results were kind of nonsense, no better than the untrained model.
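The training run itself is a plain `Trainer` loop (again a rough sketch; the hyperparameter values are placeholders, and `train_dataset` stands for my ~10k examples after the preprocessing above):

```python
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="bert-zero-shot",     # placeholder values, not my
    num_train_epochs=3,              # exact hyperparameters
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,                     # the model from the first snippet
    args=args,
    train_dataset=train_dataset,     # my ~10k preprocessed examples
    data_collator=DataCollatorWithPadding(tokenizer),  # pad dynamically per batch
)
trainer.train()
```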
Am I on the right path? Should I keep treating this as a self-supervised fine-tuning task?
Thanks a lot for your help!