Use original key for label in DataCollatorForTokenClassification (#13057)

* Use original key for label in DataCollatorForTokenClassification

DataCollatorForTokenClassification accepts either `label` or `labels` as the key for labels in its input. However, after padding the labels it always assigns the padded labels to the key `labels`. If `label` was originally used as the key, the original unpadded labels still remain in the batch. Then at line 192, when we try to convert the batch elements to torch tensors, these original unpadded labels cannot be converted because the labels of different samples have different lengths.
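A minimal pure-Python sketch of the failure mode and the fix (the function name is illustrative, not the transformers API): pad under whichever key the labels arrived with, so no unpadded copy with ragged lengths is left behind in the batch.

```python
# Hypothetical simplified version of the collator's padding step.
def pad_label_lists(batch, label_pad_token_id=-100, padding_side="right"):
    # Accept either key, mirroring the collator's behavior.
    label_name = "label" if "label" in batch else "labels"
    labels = batch[label_name]
    sequence_length = len(batch["input_ids"][0])
    if padding_side == "right":
        batch[label_name] = [
            label + [label_pad_token_id] * (sequence_length - len(label)) for label in labels
        ]
    else:
        batch[label_name] = [
            [label_pad_token_id] * (sequence_length - len(label)) + label for label in labels
        ]
    # Writing back under the original key means every value in the batch
    # now has a uniform length, so tensor conversion succeeds. Before the
    # fix, padding was stored under "labels" while the ragged lists stayed
    # under "label", and the later tensor conversion failed on them.
    return batch

batch = {
    "input_ids": [[101, 7592, 102, 0], [101, 2088, 2003, 102]],
    "label": [[0, 1], [0, 1, 2]],  # note: `label`, not `labels`
}
out = pad_label_lists(batch)
print(out["label"])  # [[0, 1, -100, -100], [0, 1, 2, -100]]
```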

* Fixed style.
Ibraheem Moosa 2021-08-10 22:39:48 +06:00 committed by GitHub
parent 95e2e14f9d
commit 29dada00c4
1 changed file with 6 additions and 2 deletions


```diff
@@ -185,9 +185,13 @@ class DataCollatorForTokenClassification:
         sequence_length = torch.tensor(batch["input_ids"]).shape[1]
         padding_side = self.tokenizer.padding_side
         if padding_side == "right":
-            batch["labels"] = [label + [self.label_pad_token_id] * (sequence_length - len(label)) for label in labels]
+            batch[label_name] = [
+                label + [self.label_pad_token_id] * (sequence_length - len(label)) for label in labels
+            ]
         else:
-            batch["labels"] = [[self.label_pad_token_id] * (sequence_length - len(label)) + label for label in labels]
+            batch[label_name] = [
+                [self.label_pad_token_id] * (sequence_length - len(label)) + label for label in labels
+            ]
         batch = {k: torch.tensor(v, dtype=torch.int64) for k, v in batch.items()}
         return batch
```