Use original key for label in DataCollatorForTokenClassification (#13057)
* Use original key for label in DataCollatorForTokenClassification

  DataCollatorForTokenClassification accepts either `label` or `labels` as the key for labels in its input. However, after padding the labels it always assigns the padded labels to the key `labels`. If `label` was the original key, the original unpadded labels remain in the batch under that key. Then at line 192, when we try to convert the batch elements to torch tensors, these unpadded labels cannot be converted because the labels of different samples have different lengths.

* Fixed style.
This commit is contained in:
parent
95e2e14f9d
commit
29dada00c4
@@ -185,9 +185,13 @@ class DataCollatorForTokenClassification:
         sequence_length = torch.tensor(batch["input_ids"]).shape[1]
         padding_side = self.tokenizer.padding_side
         if padding_side == "right":
-            batch["labels"] = [label + [self.label_pad_token_id] * (sequence_length - len(label)) for label in labels]
+            batch[label_name] = [
+                label + [self.label_pad_token_id] * (sequence_length - len(label)) for label in labels
+            ]
         else:
-            batch["labels"] = [[self.label_pad_token_id] * (sequence_length - len(label)) + label for label in labels]
+            batch[label_name] = [
+                [self.label_pad_token_id] * (sequence_length - len(label)) + label for label in labels
+            ]

         batch = {k: torch.tensor(v, dtype=torch.int64) for k, v in batch.items()}
         return batch
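The essence of the fix can be illustrated with a minimal, self-contained sketch. This is not the actual transformers implementation: `pad_labels`, the standalone `LABEL_PAD_TOKEN_ID` constant, and the plain-dict return value are simplifications introduced here for illustration. The key point matches the diff above: padded labels are written back under whichever key (`label` or `labels`) the input used, so no unpadded copy lingers in the batch to break the later tensor conversion.

```python
LABEL_PAD_TOKEN_ID = -100  # pad value chosen for illustration

def pad_labels(features, sequence_length, padding_side="right"):
    """Pad each feature's label list to `sequence_length` (illustrative sketch)."""
    # Accept either key, mirroring the collator's behavior.
    label_name = "label" if "label" in features[0] else "labels"
    labels = [feature[label_name] for feature in features]
    if padding_side == "right":
        padded = [
            label + [LABEL_PAD_TOKEN_ID] * (sequence_length - len(label)) for label in labels
        ]
    else:
        padded = [
            [LABEL_PAD_TOKEN_ID] * (sequence_length - len(label)) + label for label in labels
        ]
    # The fix: assign under the original key, not a hard-coded "labels",
    # so every list in the returned batch has the same length.
    return {label_name: padded}
```

Before the fix, an input keyed by `label` produced both the unpadded `label` lists and padded `labels` lists in the same batch, and `torch.tensor` then failed on the ragged `label` entries.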