<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# How 🤗 Transformers solve tasks

In [What 🤗 Transformers can do](task_summary), you learned about natural language processing (NLP), speech and audio, and computer vision tasks, along with some of their important applications. This page looks closely at how models solve these tasks and explains what happens inside the model. There are many ways to solve a given task, and some models implement particular techniques or approach the task from a new angle, but for Transformer models the general idea is the same. Thanks to its flexible architecture, most models are a variant of an encoder, a decoder, or an encoder-decoder structure. Besides Transformer models, our library also includes several convolutional neural networks (CNNs), which are still used today for computer vision tasks. We also explain how a modern CNN works.

To explain how tasks are solved, we walk through what happens inside the following models to output useful predictions:

- [Wav2Vec2](model_doc/wav2vec2) for audio classification and automatic speech recognition (ASR)
- [Vision Transformer (ViT)](model_doc/vit) and [ConvNeXT](model_doc/convnext) for image classification
- [DETR](model_doc/detr) for object detection
- [Mask2Former](model_doc/mask2former) for image segmentation
- [GLPN](model_doc/glpn) for depth estimation
- [BERT](model_doc/bert) for NLP tasks that use an encoder, such as text classification, token classification, and question answering
- [GPT2](model_doc/gpt2) for NLP tasks that use a decoder, such as text generation
- [BART](model_doc/bart) for NLP tasks that use an encoder-decoder, such as summarization and translation

<Tip>

Before you go further, it helps to have some basic knowledge of the original Transformer architecture. Knowing how encoders, decoders, and attention work will help you understand how the different Transformer models work. If you're just getting started or need a refresher, check out our [course](https://huggingface.co/course/chapter1/4?fw=pt) for more information!

</Tip>

## Speech and audio

[Wav2Vec2](model_doc/wav2vec2) is a self-supervised model that is pretrained on unlabeled speech data and finetuned on labeled data for audio classification and automatic speech recognition.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/wav2vec2_architecture.png"/>
</div>

This model has four main components:

1. A *feature encoder* takes the raw audio waveform, normalizes it to zero mean and unit variance, and converts it into a sequence of feature vectors, each covering 20ms.

2. Waveforms are continuous by nature, so they can't be divided into separate units the way a sequence of text can be split into words. That's why the feature vectors are passed to a *quantization module*, which tries to learn discrete speech units. A speech unit is chosen from a collection of codewords known as a *codebook* (you can think of it as the vocabulary). The vector or speech unit that best represents the continuous audio input (you can think of it as the target label) is chosen from the codebook and forwarded through the model.

3. About half of the feature vectors are randomly masked, and the masked feature vectors are fed to a *context network*, which is a Transformer encoder that also adds relative positional embeddings.

4. The pretraining objective of the context network is a *contrastive task*. The model has to pick the true quantized speech representation for a masked position out of a set of false ones, which encourages it to find the most similar context vector and quantized speech unit (the target label).

Now that Wav2Vec2 is pretrained, you can finetune it on your data for audio classification or automatic speech recognition!

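
The contrastive task in step 4 can be sketched in plain Python. This is a minimal illustration rather than the Wav2Vec2 implementation: the function names, the toy 2-dimensional vectors, and the `temperature` value are assumptions chosen for readability. It scores a context vector against one true quantized unit and two distractors with cosine similarity, then applies a softmax cross-entropy.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(context, candidates, true_index, temperature=0.1):
    """Cross-entropy over similarities between a context vector and candidates."""
    scores = [cosine(context, c) / temperature for c in candidates]
    max_score = max(scores)  # subtract the max for numerical stability
    log_sum = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_sum - scores[true_index]


# Hypothetical toy data: index 0 is the true quantized unit, the rest are distractors.
context = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
loss_true = contrastive_loss(context, candidates, true_index=0)
loss_wrong = contrastive_loss(context, candidates, true_index=1)
```

The loss is small when the context vector is closest to the true unit (`loss_true`) and large when the "true" unit is one the context vector does not resemble (`loss_wrong`), which is the pressure that pushes context vectors toward their quantized targets.
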
### Audio classification

To use the pretrained model for audio classification, add a sequence classification head on top of the base Wav2Vec2 model. The classification head is a linear layer that accepts the encoder's hidden states, which represent the features learned from each audio frame. Because these hidden states can vary in length, they are pooled first and then transformed into logits over the class labels. The cross-entropy loss is calculated between the logits and the target to find the most likely class.

Ready to try your hand at audio classification? Check out our complete [audio classification guide](tasks/audio_classification) to learn how to finetune Wav2Vec2 and use it for inference!

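
As a rough sketch of the head described above, the following code mean-pools hidden states of different lengths into one fixed-length vector and applies a linear layer to get class logits. Mean pooling, the helper names, and the toy weights are assumptions for illustration; the actual model may pool differently.

```python
import math


def mean_pool(hidden_states):
    """Average a list of frame vectors into one fixed-length vector."""
    n = len(hidden_states)
    return [sum(frame[i] for frame in hidden_states) / n
            for i in range(len(hidden_states[0]))]


def linear(x, weight, bias):
    """Compute weight @ x + bias; weight has one row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def cross_entropy(logits, target):
    """Negative log-probability of the target class under a softmax."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


# Hypothetical 2-class head with identity weights.
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]

short_clip = [[2.0, 0.0], [4.0, 0.0]]             # 2 frames
long_clip = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]  # 3 frames

logits_short = linear(mean_pool(short_clip), weight, bias)
logits_long = linear(mean_pool(long_clip), weight, bias)
```

Both clips end up with a logit vector of the same size even though they contain a different number of frames, which is the point of pooling before the linear layer.
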
### Automatic speech recognition

To use the pretrained model for automatic speech recognition, add a language modeling head on top of the base Wav2Vec2 model for [connectionist temporal classification (CTC)](glossary#connectionist-temporal-classification-ctc). The language modeling head is a linear layer that accepts the encoder's hidden states and transforms them into logits. Each logit represents a token class (the number of tokens comes from the task vocabulary). The CTC loss is calculated between the logits and the targets, and the logits are then decoded into a transcription.

Ready to try your hand at automatic speech recognition? Check out our complete [automatic speech recognition guide](tasks/asr) to learn how to finetune Wav2Vec2 and use it for inference!

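
To make the "decoded into a transcription" step concrete, here is a minimal sketch of greedy CTC decoding: take the most likely token for each frame, merge consecutive repeats, and drop the blank token. It covers decoding only, not the CTC loss. The function name, the tiny vocabulary, and the use of index `0` as the blank token are assumptions for this example.

```python
def ctc_greedy_decode(logits, vocab, blank=0):
    """Pick the best token per frame, merge repeats, then drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    decoded = []
    previous = None
    for index in best:
        if index != previous and index != blank:
            decoded.append(vocab[index])
        previous = index
    return "".join(decoded)


# Hypothetical character vocabulary; index 0 plays the role of the CTC blank.
vocab = ["<pad>", "c", "a", "t"]
logits = [
    [0.1, 2.0, 0.0, 0.0],  # c
    [0.1, 1.5, 0.0, 0.0],  # c (repeat, merged)
    [3.0, 0.0, 0.0, 0.0],  # blank (dropped)
    [0.0, 0.0, 2.5, 0.0],  # a
    [0.0, 0.0, 0.0, 1.0],  # t
    [0.0, 0.0, 0.0, 1.2],  # t (repeat, merged)
]
transcription = ctc_greedy_decode(logits, vocab)
```

Six audio frames collapse into the three-character string `cat`, which shows how CTC lets the model emit one prediction per frame without needing a frame-level alignment in the labels.
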
## Computer vision

There are two ways to approach computer vision tasks:

1. Split an image into a sequence of patches and process them in parallel with a Transformer.
2. Use a modern CNN, like [ConvNeXT](model_doc/convnext), which relies on convolutional layers but adopts modern network designs.

<Tip>

A third approach mixes Transformers with convolutions (for example, [Convolutional Vision Transformer](model_doc/cvt) or [LeViT](model_doc/levit)). We won't discuss those because they just combine the two approaches examined here.

</Tip>

ViT and ConvNeXT are commonly used for image classification, but for other vision tasks like object detection, segmentation, and depth estimation, models such as DETR, Mask2Former, and GLPN are better suited.

### Image classification

ViT and ConvNeXT can both be used for image classification. The main difference is that ViT uses an attention mechanism while ConvNeXT uses convolutions.

#### Transformer

[ViT](model_doc/vit) replaces convolutions entirely with a pure Transformer architecture. If you're familiar with the original Transformer, you're already most of the way toward understanding ViT.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/vit_architecture.jpg"/>
</div>

The main change ViT introduced is in how images are fed to a Transformer:

1. An image is split into a sequence of square, non-overlapping patches, each of which is turned into a vector or *patch embedding*. The patch embeddings are generated by a 2D convolutional layer that creates the proper input dimensions (for a base Transformer, each patch embedding has 768 values). If you have a 224x224 pixel image, you can split it into 196 image patches of 16x16 pixels each. Just like text is tokenized into words, an image is "tokenized" into a sequence of patches.

2. A *learnable embedding*, a special `[CLS]` token, is added to the beginning of the patch embeddings, just like in BERT. The final hidden state of the `[CLS]` token is used as the input to the attached classification head, and the other outputs are ignored. This token helps the model learn how to encode a representation of the image.

3. The last thing to add to the patch and learnable embeddings are the *position embeddings*, because the model doesn't know how the image patches are ordered. The position embeddings are also learnable and have the same size as the patch embeddings. Finally, all of the embeddings are passed to the Transformer encoder.

4. The output, specifically only the output of the `[CLS]` token, is passed to a multilayer perceptron (MLP) head. ViT's pretraining objective is simply classification. Like other classification heads, the MLP head converts the output into logits over the class labels and calculates the cross-entropy loss to find the most likely class.

Ready to try your hand at image classification? Check out our complete [image classification guide](tasks/image_classification) to learn how to finetune ViT and use it for inference!

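
The patch-splitting arithmetic from step 1 can be checked with a short sketch. It only splits a single-channel image into flattened patches; it does not reproduce the convolutional projection to 768 values that ViT applies. The function name is hypothetical, and the code assumes the image height and width are exact multiples of the patch size.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into flattened square patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[top + r][left + c]
                     for r in range(patch_size) for c in range(patch_size)]
            patches.append(patch)
    return patches


# A synthetic 224x224 single-channel image whose pixel value encodes its position.
image = [[row * 224 + col for col in range(224)] for row in range(224)]
patches = split_into_patches(image, patch_size=16)
sequence_length = len(patches) + 1  # +1 for the prepended [CLS] token
```

A 224x224 image yields (224 / 16)² = 196 patches of 256 pixels each, so the encoder sees a sequence of 197 embeddings once the `[CLS]` token is prepended.
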
#### CNN

<Tip>

This section briefly explains convolutions, but it helps to already understand how they change an image's shape and size. If you're unfamiliar with convolutions, check out the [Convolution Neural Networks chapter](https://github.com/fastai/fastbook/blob/master/13_convolutions.ipynb) from the fastai book!

</Tip>

[ConvNeXT](model_doc/convnext) is a CNN architecture that adopts new and modern network designs to improve performance. However, convolutions are still at the core of the model. From a high-level perspective, a [convolution](glossary#convolution) is an operation in which a small matrix (the *kernel*) is multiplied by a small window of the image pixels. It computes a feature from that window, such as a particular texture or the curvature of a line. It then slides over to the next window of pixels. The distance the convolution travels is known as the *stride*.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/convolution.gif"/>
</div>

<small>A basic convolution without padding or stride, taken from <a href="https://arxiv.org/abs/1603.07285">Convolution Arithmetic for Deep Learning.</a></small>

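
The sliding-window operation described above can be written out directly. This is a minimal sketch with no padding and a square kernel; the function name and the hand-made vertical-edge kernel are illustrative assumptions, and deep learning libraries compute the same thing far more efficiently.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D cross-correlation of an image with a square kernel."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            row.append(sum(image[top + a][left + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        output.append(row)
    return output


# A 4x4 image that is dark on the left and bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds where brightness increases left to right

features = conv2d(image, vertical_edge, stride=1)
strided = conv2d(image, vertical_edge, stride=2)
```

With a stride of 1 the kernel responds only in the middle column, where the edge is. With a stride of 2 the windows skip over the edge entirely and the output shrinks to 2x2, which shows how the stride controls both what is sampled and the output size.
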
You can feed this output to another convolutional layer, and with each successive layer the network learns more complex and abstract things, like hotdogs or rockets. Between convolutional layers, it is common to add a pooling layer to reduce the dimensionality of the features and make the model more robust to variations in a feature's position.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/convnext_architecture.png"/>
</div>

ConvNeXT modernizes a CNN in five ways:

1. It changes the number of blocks in each stage and *patchifies* an image with a larger stride and corresponding kernel size. The non-overlapping sliding window makes this patchifying strategy similar to how ViT splits an image into patches.

2. A *bottleneck* layer shrinks the number of channels and then restores it, because a 1x1 convolution is faster and lets you increase the depth. An inverted bottleneck does the opposite, expanding the number of channels and then shrinking them, which is more memory efficient.

3. It replaces the typical 3x3 convolutional layer in the bottleneck layer with a *depthwise convolution*, which applies a convolution to each input channel separately and then stacks the results back together at the end. This widens the network for improved performance.

4. ViT has a global receptive field, which means it can see more of an image at once thanks to its attention mechanism. ConvNeXT attempts to replicate this effect by increasing the kernel size to 7x7.

5. ConvNeXT also makes several layer design changes that imitate Transformer models. There are fewer activation and normalization layers, the activation function is switched from ReLU to GELU, and it uses LayerNorm instead of BatchNorm.

The output from the convolution blocks is passed to a classification head, which converts the outputs into logits and calculates the cross-entropy loss to find the most likely label.

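
One way to see why a depthwise convolution makes a larger 7x7 kernel affordable is to count weights. The sketch below compares a standard 3x3 convolution with a depthwise 7x7 convolution. The channel count of 96 and the helper names are assumptions for illustration, and bias terms are ignored.

```python
def standard_conv_params(in_channels, out_channels, kernel_size):
    """Weights in a standard convolution (bias terms ignored)."""
    return in_channels * out_channels * kernel_size * kernel_size


def depthwise_conv_params(channels, kernel_size):
    """Weights in a depthwise convolution: one kernel per channel."""
    return channels * kernel_size * kernel_size


channels = 96  # illustrative channel count
standard_3x3 = standard_conv_params(channels, channels, 3)
depthwise_7x7 = depthwise_conv_params(channels, 7)
```

Because each channel is convolved on its own, the depthwise layer needs 4,704 weights here, compared with 82,944 for the standard 3x3 layer, even though its kernel covers a much larger window.
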
### Object detection

[DETR](model_doc/detr), *DEtection TRansformer*, is an end-to-end object detection model that combines a CNN with a Transformer encoder-decoder.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/detr_architecture.png"/>
</div>

1. A pretrained CNN *backbone* takes an image, represented by its pixel values, and creates a low-resolution feature map of it. A 1x1 convolution is applied to the feature map to reduce dimensionality, creating a new feature map with a high-level image representation. Since the Transformer is a sequential model, the feature map is flattened into a sequence of feature vectors that are combined with positional embeddings.

2. The feature vectors are passed to the encoder, which learns the image representations using its attention layers. Next, the encoder hidden states are combined with *object queries* in the decoder. Object queries are learned embeddings that focus on different regions of an image, and they're updated as they progress through each attention layer. The decoder hidden states are passed to a feedforward network that predicts the bounding box coordinates and class label for each object query, or `no object` if there isn't one.

    DETR decodes each object query in parallel to output *N* final predictions, where *N* is the number of queries. Unlike a typical autoregressive model that predicts one element at a time, object detection is a set prediction task (`bounding box`, `class label`) that makes *N* predictions in a single pass.

3. During training, DETR uses a *bipartite matching loss* to compare a fixed number of predictions with a fixed set of ground truth labels. If there are fewer ground truth labels than the *N* predictions, they're padded with a `no object` class. This loss function encourages DETR to find a one-to-one assignment between the predictions and the ground truth labels. If either the bounding box or the class label is incorrect, a loss is incurred. Likewise, if DETR predicts an object that doesn't exist, it is penalized. This encourages DETR to find the other objects in an image instead of focusing on one very prominent object.

An object detection head is added on top of DETR to find the class label and the coordinates of the bounding box. The object detection head has two components: a linear layer that transforms the decoder hidden states into logits over the class labels, and an MLP that predicts the bounding box.

Ready to try your hand at object detection? Check out our complete [object detection guide](tasks/object_detection) to learn how to finetune DETR and use it for inference!

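
The one-to-one assignment behind the bipartite matching loss can be illustrated with a brute-force search over permutations. This is a simplified sketch, not DETR's matcher: the cost used here (a 0/1 class mismatch plus the L1 distance between boxes, with padded `no object` slots costing nothing) and all names are assumptions, and brute force is only feasible for a handful of predictions.

```python
from itertools import permutations

NO_OBJECT = "no object"


def match_cost(prediction, target):
    """Class mismatch cost plus L1 box distance; matching a padded slot is free."""
    if target["label"] == NO_OBJECT:
        return 0.0
    class_cost = 0.0 if prediction["label"] == target["label"] else 1.0
    box_cost = sum(abs(p - t) for p, t in zip(prediction["box"], target["box"]))
    return class_cost + box_cost


def bipartite_match(predictions, targets):
    """Brute-force the one-to-one assignment with the lowest total cost."""
    padding = [{"label": NO_OBJECT, "box": None}] * (len(predictions) - len(targets))
    padded = targets + padding  # pad the ground truth up to N entries
    best = min(
        permutations(range(len(padded))),
        key=lambda perm: sum(match_cost(p, padded[t]) for p, t in zip(predictions, perm)),
    )
    return [(i, padded[t]["label"]) for i, t in enumerate(best)]


# Hypothetical predictions (N = 3) and ground truth (2 objects), boxes as (cx, cy, w, h).
predictions = [
    {"label": "dog", "box": [0.5, 0.5, 0.2, 0.2]},
    {"label": "cat", "box": [0.1, 0.1, 0.2, 0.2]},
    {"label": "cat", "box": [0.8, 0.8, 0.1, 0.1]},
]
targets = [
    {"label": "cat", "box": [0.1, 0.1, 0.2, 0.2]},
    {"label": "dog", "box": [0.5, 0.5, 0.2, 0.2]},
]
assignment = bipartite_match(predictions, targets)
```

Each prediction is paired with exactly one ground truth entry, and the extra third prediction is matched to the padded `no object` slot, so the loss would then penalize it for predicting a cat that isn't there.
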
### Image segmentation

[Mask2Former](model_doc/mask2former) is a universal architecture for solving all types of image segmentation tasks. Traditional segmentation models are typically tailored to a particular subtask of image segmentation, such as instance, semantic, or panoptic segmentation. Mask2Former frames each of those tasks as a *mask classification* problem. Mask classification groups pixels into *N* segments and predicts *N* masks and their corresponding class labels for a given image. This section explains how Mask2Former works, and then you can try finetuning SegFormer at the end.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/mask2former_architecture.png"/>
</div>

There are three main components to Mask2Former:

1. A [Swin](model_doc/swin) backbone accepts an image and creates a low-resolution image feature map from 3 consecutive 3x3 convolutions.

2. The feature map is passed to a *pixel decoder*, which gradually upsamples the low-resolution features into high-resolution per-pixel embeddings. The pixel decoder actually generates multi-scale features (containing both low- and high-resolution features) at 1/32, 1/16, and 1/8 of the original image resolution.

3. Each of these feature maps of differing scales is fed successively to one Transformer decoder layer at a time in order to capture small objects from the high-resolution features. The key to Mask2Former is the *masked attention* mechanism in the decoder. Unlike cross-attention, which can attend to the entire image, masked attention focuses only on a certain area of the image. This is faster and leads to better performance because the local features of an image are enough for the model to learn from.

4. Like [DETR](tasks_explained#object-detection), Mask2Former also uses learned object queries and combines them with the image features to make a set prediction (`class label`, `mask prediction`). The decoder hidden states are passed to a linear layer and transformed into logits over the class labels. The cross-entropy loss is calculated between the logits and the ground truth class label to find the most likely one.

    The mask predictions are generated by combining the pixel embeddings with the final decoder hidden states. The sigmoid cross-entropy and dice losses are calculated between the logits and the ground truth mask to find the most likely mask.

Ready to try your hand at image segmentation? Check out our complete [image segmentation guide](tasks/semantic_segmentation) to learn how to finetune SegFormer and use it for inference!

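
As a rough sketch of how a mask can come from "combining the pixel embeddings with the final decoder hidden states", the code below takes the dot product between each per-pixel embedding and one query embedding, applies a sigmoid, and thresholds the result. The dot-product combination, the 0.5 threshold, the 2-dimensional embeddings, and the names are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_mask(pixel_embeddings, query_embedding, threshold=0.5):
    """Dot each pixel embedding with one query embedding, then threshold the sigmoid."""
    mask = []
    for row in pixel_embeddings:
        mask_row = []
        for pixel in row:
            logit = sum(p * q for p, q in zip(pixel, query_embedding))
            mask_row.append(1 if sigmoid(logit) > threshold else 0)
        mask.append(mask_row)
    return mask


# Hypothetical 2x3 grid of per-pixel embeddings and a single object query.
pixel_embeddings = [
    [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0]],
    [[2.0, 0.0], [0.0, 2.0], [0.0, 2.0]],
]
query = [1.0, -1.0]
mask = predict_mask(pixel_embeddings, query)
```

Pixels whose embedding aligns with the query end up inside the mask, so each of the *N* queries produces its own binary mask over the image.
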
### Depth estimation

[GLPN](model_doc/glpn), *Global-Local Path Network*, is suited to dense prediction tasks such as segmentation or depth estimation. It is a Transformer-based depth estimation model that combines a [SegFormer](model_doc/segformer) encoder with a lightweight decoder.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/glpn_architecture.jpg"/>
</div>

1. Like ViT, an image is split into a sequence of patches, except these image patches are smaller. This is better for dense prediction tasks like segmentation or depth estimation. The image patches are transformed into patch embeddings (see the [image classification](#image-classification) section for more details about how patch embeddings are created), which are fed to the encoder.

2. The encoder accepts the patch embeddings and passes them through several encoder blocks. Each block consists of attention and Mix-FFN layers. The purpose of the latter is to provide positional information. At the end of each encoder block is a *patch merging* layer for creating hierarchical representations. The features of each group of neighboring patches are concatenated, and a linear layer is applied to the concatenated features to reduce the number of patches to a resolution of 1/4. This becomes the input to the next encoder block, where the whole process is repeated until you have image features at 1/8, 1/16, and 1/32 of the original image resolution.

3. A lightweight decoder takes the last feature map (1/32 scale) from the encoder and upsamples it to 1/16 scale. From here, the feature is passed to a *Selective Feature Fusion (SFF)* module, which selects and combines local and global features from an attention map for each feature and then upsamples the result to 1/8 scale. This process is repeated until the decoded features are the same size as the original image.

4. The decoded features are fed to the final prediction for semantic segmentation, depth estimation, or another dense prediction task. For semantic segmentation, the features are converted into logits over the number of classes and optimized with the cross-entropy loss. For depth estimation, the features are converted into a depth map, and the mean absolute error (MAE) or mean squared error (MSE) loss is used.

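
The resolutions mentioned in steps 2 and 3 are easier to follow with concrete numbers. The sketch below computes the feature map size at each scale for a given input size. The 480x640 input and the function name are assumptions chosen for illustration, and the code assumes the input dimensions are divisible by 32.

```python
def feature_map_sizes(height, width, strides=(4, 8, 16, 32)):
    """Spatial size of the feature map at each downsampling factor."""
    return {f"1/{s}": (height // s, width // s) for s in strides}


# Hypothetical 480x640 input image.
sizes = feature_map_sizes(480, 640)
```

The encoder walks down this table from 1/4 to 1/32, and the decoder walks back up, upsampling step by step until the features match the original image size.
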
## Natural language processing

The Transformer was initially designed for machine translation, and since then it has become the default architecture for solving almost all NLP tasks. Some tasks lend themselves to the Transformer's encoder structure, while others are better suited to the decoder. Still other tasks make use of the Transformer's full encoder-decoder structure.

### Text classification

[BERT](model_doc/bert) is an encoder-only model and the first model to effectively implement deep bidirectionality, learning richer representations of the text by attending to the words on both sides.

1. BERT uses [WordPiece](tokenizer_summary#wordpiece) tokenization to generate a token embedding of the text. To tell a single sentence apart from a pair of sentences, a special `[SEP]` token is added. A special `[CLS]` token is added to the beginning of every sequence of text. The final output for the `[CLS]` token is used as the input to the classification head for classification tasks. BERT also adds a segment embedding to denote whether a token belongs to the first or second sentence in a pair of sentences.

2. BERT is pretrained with two objectives: masked language modeling and next-sentence prediction. In masked language modeling, some percentage of the input tokens are randomly masked, and the model needs to predict them. This solves the problem with bidirectionality, where the model could otherwise cheat by seeing all the words and "predicting" the next one. The final hidden states of the predicted mask tokens are passed to a feedforward network with a softmax over the vocabulary to predict the masked word.

    The second pretraining objective is next-sentence prediction. The model must predict whether sentence B follows sentence A. Half of the time sentence B is the next sentence, and the other half of the time sentence B is a random sentence. The prediction, whether it is the next sentence or not, is passed to a feedforward network with a softmax over the two classes (`IsNext` and `NotNext`).

3. The input embeddings are passed through multiple encoder layers to output the final hidden states.

To use the pretrained model for text classification, add a sequence classification head on top of the base BERT model. The sequence classification head is a linear layer that accepts the final hidden states and converts them into logits. The cross-entropy loss is calculated between the logits and the target to find the most likely label.

Ready to try your hand at text classification? Check out our complete [text classification guide](tasks/sequence_classification) to learn how to finetune DistilBERT and use it for inference!

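
The input layout from step 1 can be sketched without a real tokenizer. The helper below takes tokens that are already split (it does not perform WordPiece tokenization), adds the `[CLS]` and `[SEP]` tokens, and builds the segment ids that tell the two sentences apart. The function name and the example sentences are assumptions for illustration.

```python
def build_bert_inputs(tokens_a, tokens_b=None):
    """Add [CLS]/[SEP] and build segment ids for one sentence or a pair."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b is not None:
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


# Hypothetical pre-split tokens for a sentence pair and for a single sentence.
pair_tokens, pair_segments = build_bert_inputs(["how", "are", "you"], ["fine", "thanks"])
single_tokens, single_segments = build_bert_inputs(["hello", "world"])
```

For the sentence pair, the first sentence and its `[SEP]` get segment id 0 and the second sentence and its `[SEP]` get segment id 1. The hidden state at position 0, the `[CLS]` token, is what a classification head would read.
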
### Token classification

To use BERT for token classification tasks like named entity recognition (NER), add a token classification head on top of the base BERT model. The token classification head is a linear layer that accepts the final hidden states and converts them into logits. The cross-entropy loss is calculated between the logits and the label of each token to find the most likely label.

Ready to try your hand at token classification? Check out our complete [token classification guide](tasks/token_classification) to learn how to finetune DistilBERT and use it for inference!

### Question answering

To use BERT for question answering, add a span classification head on top of the base BERT model. This linear layer accepts the final hidden states and computes the `span` start and end logits for the text corresponding to the answer. The cross-entropy loss is calculated between the logits and the label positions to find the most likely span of text for the answer.

Ready to try your hand at question answering? Check out our complete [question answering guide](tasks/question_answering) to learn how to finetune DistilBERT and use it for inference!

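
At inference time, the start and end logits have to be turned into a single answer span. One common approach, sketched here under stated assumptions, is to pick the pair with the highest combined score such that the start does not come after the end. The function name, the `max_answer_length` parameter, and the toy logits are hypothetical.

```python
def best_span(start_logits, end_logits, max_answer_length=10):
    """Return (start, end) maximizing start_logit + end_logit with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for start, start_score in enumerate(start_logits):
        last = min(start + max_answer_length, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best


# Hypothetical context tokens and logits from a span classification head.
tokens = ["the", "capital", "of", "france", "is", "paris", "."]
start_logits = [0.1, 0.2, 0.0, 1.0, 0.1, 4.0, 0.0]
end_logits = [0.0, 0.1, 0.0, 3.5, 0.2, 3.0, 0.1]

start, end = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
```

Taking the highest start logit (position 5) and the highest end logit (position 3) independently would give an invalid span that ends before it starts. Scoring the pair jointly returns `paris` instead.
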
<Tip>

💡 Notice how easy it is to use BERT for different tasks once it has been pretrained. You only need to add a specific head to the pretrained model to turn the hidden states into your desired output!

</Tip>

### Text generation

[GPT-2](model_doc/gpt2) is a decoder-only model pretrained on a large amount of text. It can generate convincing text given a prompt and can complete other NLP tasks like question answering despite not being explicitly trained to.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/gpt2_architecture.png"/>
</div>

1. GPT-2 uses [byte pair encoding (BPE)](tokenizer_summary#bytepair-encoding-bpe) to tokenize words and generate a token embedding. Positional encodings are added to the token embeddings to indicate the position of each token in the sequence. The input embeddings are passed through multiple decoder blocks to output the final hidden states. Within each decoder block, GPT-2 uses a *masked self-attention* layer, which means GPT-2 can't attend to future tokens. It is only allowed to attend to the tokens on its left. This is different from BERT's [`mask`] token because, in masked self-attention, an attention mask is used to set the score to `0` for future tokens.

2. The output from the decoder is passed to a language modeling head, which performs a linear transformation to convert the final hidden states into logits. The labels are the next tokens in the sequence, which are created by shifting the logits to the right by one. The cross-entropy loss is calculated between the shifted logits and the labels to output the next most likely token.

GPT-2's pretraining objective is based entirely on [causal language modeling](glossary#causal-language-modeling), predicting the next word in a sequence. This makes GPT-2 especially good at tasks that involve generating text.

Ready to try your hand at text generation? Check out our complete [causal language modeling guide](tasks/language_modeling#causal-language-modeling) to learn how to finetune DistilGPT-2 and use it for inference!

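
Both ideas above, masked self-attention and next-token labels, can be shown in a few lines. In this sketch, future positions are simply left out of the softmax, so they receive an attention weight of exactly 0, and each input token is paired with the token that follows it. The function names and the toy scores are assumptions, and real implementations work on batched tensors.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax where position i may only attend to positions <= i."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]  # tokens on the left, plus the token itself
        m = max(visible)
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        future = [0.0] * (len(row) - i - 1)  # future tokens get zero weight
        weights.append([e / total for e in exps] + future)
    return weights


def next_token_pairs(token_ids):
    """Pair each position's input token with the token that follows it."""
    return list(zip(token_ids[:-1], token_ids[1:]))


# Hypothetical raw attention scores for a 3-token sequence.
scores = [[1.0, 2.0, 3.0], [1.0, 1.0, 5.0], [0.0, 0.0, 0.0]]
weights = causal_attention_weights(scores)
pairs = next_token_pairs([10, 11, 12, 13])
```

The large score of `5.0` in the second row never influences the result because it belongs to a future position. The `(input, label)` pairs show that the model at each position is trained to predict the token one step to its right.
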
<Tip>

For more information about text generation, check out the [text generation strategies](generation_strategies) guide!

</Tip>

### Summarization

Encoder-decoder models like [BART](model_doc/bart) and [T5](model_doc/t5) are designed for the sequence-to-sequence pattern of a summarization task. This section explains how BART works, and then you can try finetuning T5 at the end.

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bart_architecture.png"/>
</div>

1. BART's encoder architecture is very similar to BERT and accepts a token and positional embedding of the text. BART is pretrained by corrupting the input and then reconstructing it with the decoder. Unlike other encoders with specific corruption strategies, BART can apply any type of corruption. The *text infilling* corruption strategy works best, though. In text infilling, a number of text spans are each replaced with a **single** [`mask`] token. This is important because the model has to predict the masked tokens, and it teaches the model to predict the number of missing tokens. The input embeddings and masked spans are passed through the encoder to output the final hidden states, but unlike BERT, BART doesn't add a final feedforward network at the end to predict a word.

2. The encoder's output is passed to the decoder, which must predict the masked tokens and any uncorrupted tokens from the encoder's output. This gives the decoder additional context to help it restore the original text. The output from the decoder is passed to a language modeling head, which performs a linear transformation to convert the hidden states into logits. The cross-entropy loss is calculated between the logits and the label, which is just the token shifted to the right.

Ready to try your hand at summarization? Check out our complete [summarization guide](tasks/summarization) to learn how to finetune T5 and use it for inference!

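
Text infilling is simple to demonstrate on a list of tokens. In this minimal sketch the spans to corrupt are chosen by hand rather than sampled, the `<mask>` string stands in for the mask token, and the function name is hypothetical. It assumes the spans do not overlap.

```python
def text_infill(tokens, spans, mask_token="<mask>"):
    """Replace each (start, end) span, end exclusive, with one mask token."""
    corrupted = []
    position = 0
    for start, end in sorted(spans):
        corrupted.extend(tokens[position:start])  # keep the text before the span
        corrupted.append(mask_token)              # one mask, whatever the span length
        position = end
    corrupted.extend(tokens[position:])
    return corrupted


tokens = ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
corrupted = text_infill(tokens, spans=[(1, 3), (5, 8)])
```

A two-token span and a three-token span are each collapsed into a single mask, so the corrupted input no longer reveals how many tokens are missing. The decoder has to work that out while reconstructing the original nine tokens.
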
<Tip>

For more information about text generation, check out the [text generation strategies](generation_strategies) guide!

</Tip>

### Translation

Translation is another example of a sequence-to-sequence task, which means you can use an encoder-decoder model like [BART](model_doc/bart) or [T5](model_doc/t5) to do it. This section explains how BART works, and then you can try finetuning T5 at the end.

BART adapts to translation by adding a separate, randomly initialized encoder that maps the source language to an input that can be decoded into the target language. This new encoder's embeddings are passed to the pretrained encoder instead of the original word embeddings. The source encoder is trained by updating the source encoder, positional embeddings, and input embeddings with the cross-entropy loss from the model output. The model parameters are frozen in this first step, and all the model parameters are trained together in the second step.

BART has since been followed by a multilingual version, mBART, which is intended for translation and pretrained on many different languages.

Ready to try your hand at translation? Check out our complete [translation guide](tasks/translation) to learn how to finetune T5 and use it for inference!

<Tip>

For more information about text generation, check out the [text generation strategies](generation_strategies) guide!

</Tip>