[Time-Series] Autoformer model (#21891)

* ran `transformers-cli add-new-model-like`

* added `AutoformerLayernorm` and `AutoformerSeriesDecomposition`

* added `decomposition_layer` in `init` and `moving_avg` to config

* added `AutoformerAutoCorrelation` to encoder & decoder

* removed caninical self attention `AutoformerAttention`

* added arguments in config and model tester. Init works! 😁

* WIP autoformer attention with autocorrlation

* fixed `attn_weights` size

* wip time_delay_agg_training

* fixing sizes and debug time_delay_agg_training

* aggregation in training works! 😁

* `top_k_delays` -> `top_k_delays_index` and added `contiguous()`

* wip time_delay_agg_inference

* finish time_delay_agg_inference 😎

* added resize to autocorrelation

* bug fix: added the length of the output signal to `irfft`

* `attention_mask = None` in the decoder

* fixed test: changed attention expected size, `test_attention_outputs` works!

* removed unnecessary code

* apply AutoformerLayernorm in final norm in enc & dec

* added series decomposition to the encoder

* added series decomp to decoder, with inputs

* added trend todos

* added autoformer to README

* added to index

* added autoformer.mdx

* remove scaling and init attention_mask in the decoder

* make style

* fix copies

* make fix-copies

* inital fix-copies

* fix from https://github.com/huggingface/transformers/pull/22076

* make style

* fix class names

* added trend

* added d_model and projection layers

* added `trend_projection` source, and decomp layer init

* added trend & seasonal init for decoder input

* AutoformerModel cannot be copied as it has the decomp layer too

* encoder can be copied from time series transformer

* fixed generation and made distrb. out more robust

* use context window to calculate decomposition

* use the context_window for decomposition

* use output_params helper

* clean up AutoformerAttention

* subsequences_length off by 1

* make fix copies

* fix test

* added init for nn.Conv1d

* fix IGNORE_NON_TESTED

* added model_doc

* fix ruff

* ignore tests

* remove dup

* fix SPECIAL_CASES_TO_ALLOW

* do not copy due to conv1d weight init

* remove unused imports

* added short summary

* added label_length and made the model non-autoregressive

* added params docs

* better doc for `factor`

* fix tests

* renamed `moving_avg` to `moving_average`

* renamed `factor` to `autocorrelation_factor`

* make style

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fix configurations

* fix integration tests

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fixing `lags_sequence` doc

* Revert "fixing `lags_sequence` doc"

This reverts commit 21e34911e3.

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* model layers now take the config

* added `layer_norm_eps` to the config

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* added `config.layer_norm_eps` to AutoformerLayernorm

* added `config.layer_norm_eps` to all layernorm layers

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix variable names

* added inital pretrained model

* added use_cache docstring

* doc strings for trend and use_cache

* fix order of args

* imports on one line

* fixed get_lagged_subsequences docs

* add docstring for create_network_inputs

* get rid of layer_norm_eps config

* add back layernorm

* update fixture location

* fix signature

* use AutoformerModelOutput dataclass

* fix pretrain config

* no need as default exists

* subclass ModelOutput

* remove layer_norm_eps config

* fix test_model_outputs_equivalence test

* test hidden_states_output

* make fix-copies

* Update src/transformers/models/autoformer/configuration_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* removed unused attr

* Update tests/models/autoformer/test_modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/autoformer/modeling_autoformer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* use AutoFormerDecoderOutput

* fix formatting

* fix formatting

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
This commit is contained in:
Eli Simhayev 2023-05-30 13:53:32 +05:30 committed by GitHub
parent 17a55534f5
commit 4b6a5a7caa
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
29 changed files with 3058 additions and 16 deletions

View File

@ -292,6 +292,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass.
1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.

View File

@ -267,6 +267,7 @@ Número actual de puntos de control: ![](https://img.shields.io/endpoint?url=htt
1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass.
1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.

View File

@ -239,6 +239,7 @@ conda install -c huggingface transformers
1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (Google Research से) Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. द्वाराअनुसंधान पत्र [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) के साथ जारी किया गया
1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass.
1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (फेसबुक) साथ थीसिस [बार्ट: प्राकृतिक भाषा निर्माण, अनुवाद के लिए अनुक्रम-से-अनुक्रम पूर्व प्रशिक्षण , और समझ] (https://arxiv.org/pdf/1910.13461.pdf) पर निर्भर माइक लुईस, यिनहान लियू, नमन गोयल, मार्जन ग़ज़विनिनेजाद, अब्देलरहमान मोहम्मद, ओमर लेवी, वेस स्टोयानोव और ल्यूक ज़ेटलमॉयर
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (से École polytechnique) साथ थीसिस [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) पर निर्भर Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis रिहाई।
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (VinAI Research से) साथ में पेपर [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701)गुयेन लुओंग ट्रान, डुओंग मिन्ह ले और डाट क्वोक गुयेन द्वारा पोस्ट किया गया।

View File

@ -301,6 +301,7 @@ Flax、PyTorch、TensorFlowをcondaでインストールする方法は、それ
1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (Google Research から) Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. から公開された研究論文 [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918)
1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (BAAI から) Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell から公開された研究論文: [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679)
1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (MIT から) Yuan Gong, Yu-An Chung, James Glass から公開された研究論文: [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778)
1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (Facebook から) Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer から公開された研究論文: [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461)
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (École polytechnique から) Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis から公開された研究論文: [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321)
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (VinAI Research から) Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen から公開された研究論文: [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701)

View File

@ -216,6 +216,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (Google Research 에서 제공)은 Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.의 [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918)논문과 함께 발표했습니다.
1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass.
1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.

View File

@ -240,6 +240,7 @@ conda install -c huggingface transformers
1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (来自 Google Research) 伴随论文 [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) 由 Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig 发布。
1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (来自 BAAI) 伴随论文 [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) 由 Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell 发布。
1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (来自 MIT) 伴随论文 [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) 由 Yuan Gong, Yu-An Chung, James Glass 发布。
1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (来自 Facebook) 伴随论文 [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) 由 Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer 发布。
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (来自 École polytechnique) 伴随论文 [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) 由 Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis 发布。
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (来自 VinAI Research) 伴随论文 [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) 由 Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen 发布。

View File

@ -252,6 +252,7 @@ conda install -c huggingface transformers
1. **[ALIGN](https://huggingface.co/docs/transformers/model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
1. **[AltCLIP](https://huggingface.co/docs/transformers/model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
1. **[Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass.
1. **[Autoformer](https://huggingface.co/docs/transformers/main/model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.

View File

@ -656,6 +656,8 @@
title: Reinforcement learning models
- isExpanded: false
sections:
- local: model_doc/autoformer
title: Autoformer
- local: model_doc/informer
title: Informer
- local: model_doc/time_series_transformer

View File

@ -53,6 +53,7 @@ The documentation is organized into five sections:
1. **[ALIGN](model_doc/align)** (from Google Research) released with the paper [Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision](https://arxiv.org/abs/2102.05918) by Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig.
1. **[AltCLIP](model_doc/altclip)** (from BAAI) released with the paper [AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities](https://arxiv.org/abs/2211.06679) by Chen, Zhongzhi and Liu, Guang and Zhang, Bo-Wen and Ye, Fulong and Yang, Qinghong and Wu, Ledell.
1. **[Audio Spectrogram Transformer](model_doc/audio-spectrogram-transformer)** (from MIT) released with the paper [AST: Audio Spectrogram Transformer](https://arxiv.org/abs/2104.01778) by Yuan Gong, Yu-An Chung, James Glass.
1. **[Autoformer](model_doc/autoformer)** (from Tsinghua University) released with the paper [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
1. **[BART](model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
@ -268,6 +269,7 @@ Flax), PyTorch, and/or TensorFlow.
| ALIGN | ❌ | ❌ | ✅ | ❌ | ❌ |
| AltCLIP | ❌ | ❌ | ✅ | ❌ | ❌ |
| Audio Spectrogram Transformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| Autoformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| BART | ✅ | ✅ | ✅ | ✅ | ✅ |
| BEiT | ❌ | ❌ | ✅ | ❌ | ✅ |
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ |

View File

@ -0,0 +1,42 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Autoformer
## Overview
The Autoformer model was proposed in [Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting](https://arxiv.org/abs/2106.13008) by Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long.
This model augments the Transformer as a deep decomposition architecture, which can progressively decompose the trend and seasonal components during the forecasting process.
The abstract from the paper is the following:
*Extending the forecasting time is a critical demand for real applications, such as extreme weather early warning and long-term energy consumption planning. This paper studies the long-term forecasting problem of time series. Prior Transformer-based models adopt various self-attention mechanisms to discover the long-range dependencies. However, intricate temporal patterns of the long-term future prohibit the model from finding reliable dependencies. Also, Transformers have to adopt the sparse versions of point-wise self-attentions for long series efficiency, resulting in the information utilization bottleneck. Going beyond Transformers, we design Autoformer as a novel decomposition architecture with an Auto-Correlation mechanism. We break with the pre-processing convention of series decomposition and renovate it as a basic inner block of deep models. This design empowers Autoformer with progressive decomposition capacities for complex time series. Further, inspired by the stochastic process theory, we design the Auto-Correlation mechanism based on the series periodicity, which conducts the dependencies discovery and representation aggregation at the sub-series level. Auto-Correlation outperforms self-attention in both efficiency and accuracy. In long-term forecasting, Autoformer yields state-of-the-art accuracy, with a 38% relative improvement on six benchmarks, covering five practical applications: energy, traffic, economics, weather and disease.*
This model was contributed by [elisim](https://huggingface.co/elisim) and [kashif](https://huggingface.co/kashif).
The original code can be found [here](https://github.com/thuml/Autoformer).
## AutoformerConfig
[[autodoc]] AutoformerConfig
## AutoformerModel
[[autodoc]] AutoformerModel
- forward
## AutoformerForPrediction
[[autodoc]] AutoformerForPrediction
- forward

View File

@ -155,6 +155,10 @@ _import_structure = {
"AutoProcessor",
"AutoTokenizer",
],
"models.autoformer": [
"AUTOFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP",
"AutoformerConfig",
],
"models.bart": ["BartConfig", "BartTokenizer"],
"models.barthez": [],
"models.bartpho": [],
@ -1082,6 +1086,14 @@ else:
"AutoModelWithLMHead",
]
)
_import_structure["models.autoformer"].extend(
[
"AUTOFORMER_PRETRAINED_MODEL_ARCHIVE_LIST",
"AutoformerForPrediction",
"AutoformerModel",
"AutoformerPreTrainedModel",
]
)
_import_structure["models.bart"].extend(
[
"BART_PRETRAINED_MODEL_ARCHIVE_LIST",
@ -3946,6 +3958,10 @@ if TYPE_CHECKING:
AutoProcessor,
AutoTokenizer,
)
from .models.autoformer import (
AUTOFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP,
AutoformerConfig,
)
from .models.bart import BartConfig, BartTokenizer
from .models.beit import BEIT_PRETRAINED_CONFIG_ARCHIVE_MAP, BeitConfig
from .models.bert import (
@ -4784,6 +4800,12 @@ if TYPE_CHECKING:
AutoModelForZeroShotObjectDetection,
AutoModelWithLMHead,
)
from .models.autoformer import (
AUTOFORMER_PRETRAINED_MODEL_ARCHIVE_LIST,
AutoformerForPrediction,
AutoformerModel,
AutoformerPreTrainedModel,
)
from .models.bart import (
BART_PRETRAINED_MODEL_ARCHIVE_LIST,
BartForCausalLM,

View File

@ -18,6 +18,7 @@ from . import (
altclip,
audio_spectrogram_transformer,
auto,
autoformer,
bart,
barthez,
bartpho,

View File

@ -33,6 +33,7 @@ CONFIG_MAPPING_NAMES = OrderedDict(
("align", "AlignConfig"),
("altclip", "AltCLIPConfig"),
("audio-spectrogram-transformer", "ASTConfig"),
("autoformer", "AutoformerConfig"),
("bart", "BartConfig"),
("beit", "BeitConfig"),
("bert", "BertConfig"),
@ -225,6 +226,7 @@ CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
("align", "ALIGN_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("altclip", "ALTCLIP_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("audio-spectrogram-transformer", "AUDIO_SPECTROGRAM_TRANSFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("autoformer", "AUTOFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("bart", "BART_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("beit", "BEIT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("bert", "BERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@ -399,6 +401,7 @@ MODEL_NAMES_MAPPING = OrderedDict(
("align", "ALIGN"),
("altclip", "AltCLIP"),
("audio-spectrogram-transformer", "Audio Spectrogram Transformer"),
("autoformer", "Autoformer"),
("bart", "BART"),
("barthez", "BARThez"),
("bartpho", "BARTpho"),

View File

@ -32,6 +32,7 @@ MODEL_MAPPING_NAMES = OrderedDict(
("align", "AlignModel"),
("altclip", "AltCLIPModel"),
("audio-spectrogram-transformer", "ASTModel"),
("autoformer", "AutoformerModel"),
("bart", "BartModel"),
("beit", "BeitModel"),
("bert", "BertModel"),

View File

@ -0,0 +1,63 @@
# Copyright 2023 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING
# rely on isort to merge the imports
from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available
_import_structure = {
"configuration_autoformer": [
"AUTOFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP",
"AutoformerConfig",
],
}
try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_autoformer"] = [
"AUTOFORMER_PRETRAINED_MODEL_ARCHIVE_LIST",
"AutoformerForPrediction",
"AutoformerModel",
"AutoformerPreTrainedModel",
]
if TYPE_CHECKING:
from .configuration_autoformer import (
AUTOFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP,
AutoformerConfig,
)
try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .modeling_autoformer import (
AUTOFORMER_PRETRAINED_MODEL_ARCHIVE_LIST,
AutoformerForPrediction,
AutoformerModel,
AutoformerPreTrainedModel,
)
else:
import sys
sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)

View File

@ -0,0 +1,245 @@
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Autoformer model configuration"""
from typing import List, Optional
from ...configuration_utils import PretrainedConfig
from ...utils import logging
logger = logging.get_logger(__name__)
AUTOFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP = {
"huggingface/autoformer-tourism-monthly": "https://huggingface.co/huggingface/autoformer-tourism-monthly/resolve/main/config.json",
}
class AutoformerConfig(PretrainedConfig):
r"""
This is the configuration class to store the configuration of an [`AutoformerModel`]. It is used to instantiate an
Autoformer model according to the specified arguments, defining the model architecture. Instantiating a
configuration with the defaults will yield a similar configuration to that of the Autoformer
[huggingface/autoformer-tourism-monthly](https://huggingface.co/huggingface/autoformer-tourism-monthly)
architecture.
Configuration objects inherit from [`PretrainedConfig`] can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.
Args:
prediction_length (`int`):
The prediction length for the decoder. In other words, the prediction horizon of the model.
context_length (`int`, *optional*, defaults to `prediction_length`):
The context length for the encoder. If unset, the context length will be the same as the
`prediction_length`.
distribution_output (`string`, *optional*, defaults to `"student_t"`):
The distribution emission head for the model. Could be either "student_t", "normal" or "negative_binomial".
loss (`string`, *optional*, defaults to `"nll"`):
The loss function for the model corresponding to the `distribution_output` head. For parametric
distributions it is the negative log likelihood (nll) - which currently is the only supported one.
input_size (`int`, *optional*, defaults to 1):
The size of the target variable which by default is 1 for univariate targets. Would be > 1 in case of
multivariate targets.
lags_sequence (`list[int]`, *optional*, defaults to `[1, 2, 3, 4, 5, 6, 7]`):
The lags of the input time series as covariates often dictated by the frequency. Default is `[1, 2, 3, 4,
5, 6, 7]`.
scaling (`bool`, *optional* defaults to `True`):
Whether to scale the input targets.
num_time_features (`int`, *optional*, defaults to 0):
The number of time features in the input time series.
num_dynamic_real_features (`int`, *optional*, defaults to 0):
The number of dynamic real valued features.
num_static_categorical_features (`int`, *optional*, defaults to 0):
The number of static categorical features.
num_static_real_features (`int`, *optional*, defaults to 0):
The number of static real valued features.
cardinality (`list[int]`, *optional*):
The cardinality (number of different values) for each of the static categorical features. Should be a list
of integers, having the same length as `num_static_categorical_features`. Cannot be `None` if
`num_static_categorical_features` is > 0.
embedding_dimension (`list[int]`, *optional*):
The dimension of the embedding for each of the static categorical features. Should be a list of integers,
having the same length as `num_static_categorical_features`. Cannot be `None` if
`num_static_categorical_features` is > 0.
d_model (`int`, *optional*, defaults to 64):
Dimensionality of the transformer layers.
encoder_layers (`int`, *optional*, defaults to 2):
Number of encoder layers.
decoder_layers (`int`, *optional*, defaults to 2):
Number of decoder layers.
encoder_attention_heads (`int`, *optional*, defaults to 2):
Number of attention heads for each attention layer in the Transformer encoder.
decoder_attention_heads (`int`, *optional*, defaults to 2):
Number of attention heads for each attention layer in the Transformer decoder.
encoder_ffn_dim (`int`, *optional*, defaults to 32):
Dimension of the "intermediate" (often named feed-forward) layer in encoder.
decoder_ffn_dim (`int`, *optional*, defaults to 32):
Dimension of the "intermediate" (often named feed-forward) layer in decoder.
activation_function (`str` or `function`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and decoder. If string, `"gelu"` and
`"relu"` are supported.
dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the encoder, and decoder.
encoder_layerdrop (`float`, *optional*, defaults to 0.1):
The dropout probability for the attention and fully connected layers for each encoder layer.
decoder_layerdrop (`float`, *optional*, defaults to 0.1):
The dropout probability for the attention and fully connected layers for each decoder layer.
attention_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for the attention probabilities.
activation_dropout (`float`, *optional*, defaults to 0.1):
The dropout probability used between the two layers of the feed-forward networks.
num_parallel_samples (`int`, *optional*, defaults to 100):
The number of samples to generate in parallel for each time step of inference.
init_std (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated normal weight initialization distribution.
use_cache (`bool`, *optional*, defaults to `True`):
Whether to use the past key/values attentions (if applicable to the model) to speed up decoding.
label_length (`int`, *optional*, defaults to 10):
Start token length of the Autoformer decoder, which is used for direct multi-step prediction (i.e.
non-autoregressive generation).
moving_average (`int`, defaults to 25):
The window size of the moving average. In practice, it's the kernel size in AvgPool1d of the Decomposition
Layer.
autocorrelation_factor (`int`, defaults to 3):
"Attention" (i.e. AutoCorrelation mechanism) factor which is used to find top k autocorrelations delays.
It's recommended in the paper to set it to a number between 1 and 5.
Example:
```python
>>> from transformers import AutoformerConfig, AutoformerModel
>>> # Initializing a default Autoformer configuration
>>> configuration = AutoformerConfig()
>>> # Randomly initializing a model (with random weights) from the configuration
>>> model = AutoformerModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "autoformer"
attribute_map = {
"hidden_size": "d_model",
"num_attention_heads": "encoder_attention_heads",
"num_hidden_layers": "encoder_layers",
}
def __init__(
self,
prediction_length: Optional[int] = None,
context_length: Optional[int] = None,
distribution_output: str = "student_t",
loss: str = "nll",
input_size: int = 1,
lags_sequence: List[int] = [1, 2, 3, 4, 5, 6, 7],
scaling: bool = True,
num_time_features: int = 0,
num_dynamic_real_features: int = 0,
num_static_categorical_features: int = 0,
num_static_real_features: int = 0,
cardinality: Optional[List[int]] = None,
embedding_dimension: Optional[List[int]] = None,
d_model: int = 64,
encoder_attention_heads: int = 2,
decoder_attention_heads: int = 2,
encoder_layers: int = 2,
decoder_layers: int = 2,
encoder_ffn_dim: int = 32,
decoder_ffn_dim: int = 32,
activation_function: str = "gelu",
dropout: float = 0.1,
encoder_layerdrop: float = 0.1,
decoder_layerdrop: float = 0.1,
attention_dropout: float = 0.1,
activation_dropout: float = 0.1,
num_parallel_samples: int = 100,
init_std: float = 0.02,
use_cache: bool = True,
is_encoder_decoder=True,
# Autoformer arguments
label_length: int = 10,
moving_average: int = 25,
autocorrelation_factor: int = 3,
**kwargs,
):
# time series specific configuration
self.prediction_length = prediction_length
self.context_length = context_length if context_length is not None else prediction_length
self.distribution_output = distribution_output
self.loss = loss
self.input_size = input_size
self.num_time_features = num_time_features
self.lags_sequence = lags_sequence
self.scaling = scaling
self.num_dynamic_real_features = num_dynamic_real_features
self.num_static_real_features = num_static_real_features
self.num_static_categorical_features = num_static_categorical_features
if cardinality is not None and num_static_categorical_features > 0:
if len(cardinality) != num_static_categorical_features:
raise ValueError(
"The cardinality should be a list of the same length as `num_static_categorical_features`"
)
self.cardinality = cardinality
else:
self.cardinality = [0]
if embedding_dimension is not None and num_static_categorical_features > 0:
if len(embedding_dimension) != num_static_categorical_features:
raise ValueError(
"The embedding dimension should be a list of the same length as `num_static_categorical_features`"
)
self.embedding_dimension = embedding_dimension
else:
self.embedding_dimension = [min(50, (cat + 1) // 2) for cat in self.cardinality]
self.num_parallel_samples = num_parallel_samples
# Transformer architecture configuration
self.feature_size = input_size * len(self.lags_sequence) + self._number_of_features
self.d_model = d_model
self.encoder_attention_heads = encoder_attention_heads
self.decoder_attention_heads = decoder_attention_heads
self.encoder_ffn_dim = encoder_ffn_dim
self.decoder_ffn_dim = decoder_ffn_dim
self.encoder_layers = encoder_layers
self.decoder_layers = decoder_layers
self.dropout = dropout
self.attention_dropout = attention_dropout
self.activation_dropout = activation_dropout
self.encoder_layerdrop = encoder_layerdrop
self.decoder_layerdrop = decoder_layerdrop
self.activation_function = activation_function
self.init_std = init_std
self.use_cache = use_cache
# Autoformer
self.label_length = label_length
self.moving_average = moving_average
self.autocorrelation_factor = autocorrelation_factor
super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs)
@property
def _number_of_features(self) -> int:
return (
sum(self.embedding_dimension)
+ self.num_dynamic_real_features
+ self.num_time_features
+ self.num_static_real_features
+ self.input_size * 2 # the log1p(abs(loc)) and log(scale) features
)

File diff suppressed because it is too large Load Diff

View File

@ -232,9 +232,6 @@ class InformerConfig(PretrainedConfig):
self.activation_function = activation_function
self.init_std = init_std
self.output_attentions = False
self.output_hidden_states = False
self.use_cache = use_cache
# Informer

View File

@ -142,7 +142,9 @@ class InformerMeanScaler(nn.Module):
self.default_scale = default_scale
@torch.no_grad()
def forward(self, data: torch.Tensor, observed_indicator: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
def forward(
self, data: torch.Tensor, observed_indicator: torch.Tensor
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
# shape: (N, [C], T=1)
ts_sum = (data * observed_indicator).abs().sum(self.dim, keepdim=True)
num_observed = observed_indicator.sum(self.dim, keepdim=True)
@ -1669,7 +1671,7 @@ class InformerModel(InformerPreTrainedModel):
>>> from transformers import InformerModel
>>> file = hf_hub_download(
... repo_id="kashif/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
... repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)
@ -1834,7 +1836,7 @@ class InformerForPrediction(InformerPreTrainedModel):
>>> from transformers import InformerForPrediction
>>> file = hf_hub_download(
... repo_id="kashif/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
... repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)

View File

@ -217,9 +217,6 @@ class TimeSeriesTransformerConfig(PretrainedConfig):
self.activation_function = activation_function
self.init_std = init_std
self.output_attentions = False
self.output_hidden_states = False
self.use_cache = use_cache
super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs)

View File

@ -140,7 +140,9 @@ class TimeSeriesMeanScaler(nn.Module):
self.default_scale = default_scale
@torch.no_grad()
def forward(self, data: torch.Tensor, observed_indicator: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
def forward(
self, data: torch.Tensor, observed_indicator: torch.Tensor
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
# shape: (N, [C], T=1)
ts_sum = (data * observed_indicator).abs().sum(self.dim, keepdim=True)
num_observed = observed_indicator.sum(self.dim, keepdim=True)
@ -1394,7 +1396,7 @@ class TimeSeriesTransformerModel(TimeSeriesTransformerPreTrainedModel):
>>> from transformers import TimeSeriesTransformerModel
>>> file = hf_hub_download(
... repo_id="kashif/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
... repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)
@ -1558,7 +1560,7 @@ class TimeSeriesTransformerForPrediction(TimeSeriesTransformerPreTrainedModel):
>>> from transformers import TimeSeriesTransformerForPrediction
>>> file = hf_hub_download(
... repo_id="kashif/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
... repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)

View File

@ -171,7 +171,7 @@ class StudentTOutput(DistributionOutput):
@classmethod
def domain_map(cls, df: torch.Tensor, loc: torch.Tensor, scale: torch.Tensor):
scale = cls.squareplus(scale)
scale = cls.squareplus(scale).clamp_min(torch.finfo(scale.dtype).eps)
df = 2.0 + cls.squareplus(df)
return df.squeeze(-1), loc.squeeze(-1), scale.squeeze(-1)
@ -186,7 +186,7 @@ class NormalOutput(DistributionOutput):
@classmethod
def domain_map(cls, loc: torch.Tensor, scale: torch.Tensor):
scale = cls.squareplus(scale)
scale = cls.squareplus(scale).clamp_min(torch.finfo(scale.dtype).eps)
return loc.squeeze(-1), scale.squeeze(-1)

View File

@ -772,6 +772,30 @@ class AutoModelWithLMHead(metaclass=DummyObject):
requires_backends(self, ["torch"])
AUTOFORMER_PRETRAINED_MODEL_ARCHIVE_LIST = None
class AutoformerForPrediction(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class AutoformerModel(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
class AutoformerPreTrainedModel(metaclass=DummyObject):
_backends = ["torch"]
def __init__(self, *args, **kwargs):
requires_backends(self, ["torch"])
BART_PRETRAINED_MODEL_ARCHIVE_LIST = None

View File

View File

@ -0,0 +1,449 @@
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Testing suite for the PyTorch Autoformer model. """
import inspect
import tempfile
import unittest
from huggingface_hub import hf_hub_download
from transformers import is_torch_available
from transformers.testing_utils import require_torch, slow, torch_device
from ...test_configuration_common import ConfigTester
from ...test_modeling_common import ModelTesterMixin, floats_tensor, ids_tensor
TOLERANCE = 1e-4
if is_torch_available():
import torch
from transformers import AutoformerConfig, AutoformerForPrediction, AutoformerModel
from transformers.models.autoformer.modeling_autoformer import AutoformerDecoder, AutoformerEncoder
@require_torch
class AutoformerModelTester:
def __init__(
self,
parent,
d_model=16,
batch_size=13,
prediction_length=7,
context_length=14,
label_length=10,
cardinality=19,
embedding_dimension=5,
num_time_features=4,
is_training=True,
hidden_size=16,
num_hidden_layers=2,
num_attention_heads=4,
intermediate_size=4,
hidden_act="gelu",
hidden_dropout_prob=0.1,
attention_probs_dropout_prob=0.1,
lags_sequence=[1, 2, 3, 4, 5],
moving_average=25,
autocorrelation_factor=5,
):
self.d_model = d_model
self.parent = parent
self.batch_size = batch_size
self.prediction_length = prediction_length
self.context_length = context_length
self.cardinality = cardinality
self.num_time_features = num_time_features
self.lags_sequence = lags_sequence
self.embedding_dimension = embedding_dimension
self.is_training = is_training
self.hidden_size = hidden_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.intermediate_size = intermediate_size
self.hidden_act = hidden_act
self.hidden_dropout_prob = hidden_dropout_prob
self.attention_probs_dropout_prob = attention_probs_dropout_prob
self.encoder_seq_length = context_length
self.decoder_seq_length = prediction_length + label_length
self.label_length = label_length
self.moving_average = moving_average
self.autocorrelation_factor = autocorrelation_factor
def get_config(self):
return AutoformerConfig(
d_model=self.d_model,
encoder_layers=self.num_hidden_layers,
decoder_layers=self.num_hidden_layers,
encoder_attention_heads=self.num_attention_heads,
decoder_attention_heads=self.num_attention_heads,
encoder_ffn_dim=self.intermediate_size,
decoder_ffn_dim=self.intermediate_size,
dropout=self.hidden_dropout_prob,
attention_dropout=self.attention_probs_dropout_prob,
prediction_length=self.prediction_length,
context_length=self.context_length,
label_length=self.label_length,
lags_sequence=self.lags_sequence,
num_time_features=self.num_time_features,
num_static_categorical_features=1,
cardinality=[self.cardinality],
embedding_dimension=[self.embedding_dimension],
moving_average=self.moving_average,
)
def prepare_autoformer_inputs_dict(self, config):
_past_length = config.context_length + max(config.lags_sequence)
static_categorical_features = ids_tensor([self.batch_size, 1], config.cardinality[0])
past_time_features = floats_tensor([self.batch_size, _past_length, config.num_time_features])
past_values = floats_tensor([self.batch_size, _past_length])
past_observed_mask = floats_tensor([self.batch_size, _past_length]) > 0.5
# decoder inputs
future_time_features = floats_tensor([self.batch_size, config.prediction_length, config.num_time_features])
future_values = floats_tensor([self.batch_size, config.prediction_length])
inputs_dict = {
"past_values": past_values,
"static_categorical_features": static_categorical_features,
"past_time_features": past_time_features,
"past_observed_mask": past_observed_mask,
"future_time_features": future_time_features,
"future_values": future_values,
}
return inputs_dict
def prepare_config_and_inputs(self):
config = self.get_config()
inputs_dict = self.prepare_autoformer_inputs_dict(config)
return config, inputs_dict
def prepare_config_and_inputs_for_common(self):
config, inputs_dict = self.prepare_config_and_inputs()
return config, inputs_dict
def check_encoder_decoder_model_standalone(self, config, inputs_dict):
model = AutoformerModel(config=config).to(torch_device).eval()
outputs = model(**inputs_dict)
encoder_last_hidden_state = outputs.encoder_last_hidden_state
last_hidden_state = outputs.last_hidden_state
with tempfile.TemporaryDirectory() as tmpdirname:
encoder = model.get_encoder()
encoder.save_pretrained(tmpdirname)
encoder = AutoformerEncoder.from_pretrained(tmpdirname).to(torch_device)
transformer_inputs, feature, _, _, _ = model.create_network_inputs(**inputs_dict)
seasonal_input, trend_input = model.decomposition_layer(transformer_inputs[:, : config.context_length, ...])
enc_input = torch.cat(
(transformer_inputs[:, : config.context_length, ...], feature[:, : config.context_length, ...]),
dim=-1,
)
encoder_last_hidden_state_2 = encoder(inputs_embeds=enc_input)[0]
self.parent.assertTrue((encoder_last_hidden_state_2 - encoder_last_hidden_state).abs().max().item() < 1e-3)
mean = (
torch.mean(transformer_inputs[:, : config.context_length, ...], dim=1)
.unsqueeze(1)
.repeat(1, config.prediction_length, 1)
)
zeros = torch.zeros(
[transformer_inputs.shape[0], config.prediction_length, transformer_inputs.shape[2]],
device=enc_input.device,
)
dec_input = torch.cat(
(
torch.cat((seasonal_input[:, -config.label_length :, ...], zeros), dim=1),
feature[:, config.context_length - config.label_length :, ...],
),
dim=-1,
)
trend_init = torch.cat(
(
torch.cat((trend_input[:, -config.label_length :, ...], mean), dim=1),
feature[:, config.context_length - config.label_length :, ...],
),
dim=-1,
)
with tempfile.TemporaryDirectory() as tmpdirname:
decoder = model.get_decoder()
decoder.save_pretrained(tmpdirname)
decoder = AutoformerDecoder.from_pretrained(tmpdirname).to(torch_device)
last_hidden_state_2 = decoder(
trend=trend_init,
inputs_embeds=dec_input,
encoder_hidden_states=encoder_last_hidden_state,
)[0]
self.parent.assertTrue((last_hidden_state_2 - last_hidden_state).abs().max().item() < 1e-3)
@require_torch
class AutoformerModelTest(ModelTesterMixin, unittest.TestCase):
all_model_classes = (AutoformerModel, AutoformerForPrediction) if is_torch_available() else ()
all_generative_model_classes = (AutoformerForPrediction,) if is_torch_available() else ()
test_pruning = False
test_head_masking = False
test_missing_keys = False
test_torchscript = False
test_inputs_embeds = False
test_model_common_attributes = False
def setUp(self):
self.model_tester = AutoformerModelTester(self)
self.config_tester = ConfigTester(self, config_class=AutoformerConfig, has_text_modality=False)
def test_config(self):
self.config_tester.run_common_tests()
def test_save_load_strict(self):
config, inputs_dict = self.model_tester.prepare_config_and_inputs()
for model_class in self.all_model_classes:
model = model_class(config)
with tempfile.TemporaryDirectory() as tmpdirname:
model.save_pretrained(tmpdirname)
model2, info = model_class.from_pretrained(tmpdirname, output_loading_info=True)
self.assertEqual(info["missing_keys"], [])
def test_encoder_decoder_model_standalone(self):
config_and_inputs = self.model_tester.prepare_config_and_inputs_for_common()
self.model_tester.check_encoder_decoder_model_standalone(*config_and_inputs)
@unittest.skip(reason="Model has no tokens embeddings")
def test_resize_tokens_embeddings(self):
pass
# # Input is 'static_categorical_features' not 'input_ids'
def test_model_main_input_name(self):
model_signature = inspect.signature(getattr(AutoformerModel, "forward"))
# The main input is the name of the argument after `self`
observed_main_input_name = list(model_signature.parameters.keys())[1]
self.assertEqual(AutoformerModel.main_input_name, observed_main_input_name)
def test_forward_signature(self):
config, _ = self.model_tester.prepare_config_and_inputs_for_common()
for model_class in self.all_model_classes:
model = model_class(config)
signature = inspect.signature(model.forward)
# signature.parameters is an OrderedDict => so arg_names order is deterministic
arg_names = [*signature.parameters.keys()]
expected_arg_names = [
"past_values",
"past_time_features",
"past_observed_mask",
"static_categorical_features",
"static_real_features",
"future_values",
"future_time_features",
]
if model.__class__.__name__ in ["AutoformerForPrediction"]:
expected_arg_names.append("future_observed_mask")
expected_arg_names.extend(
[
"decoder_attention_mask",
"head_mask",
"decoder_head_mask",
"cross_attn_head_mask",
"encoder_outputs",
"past_key_values",
"output_hidden_states",
"output_attentions",
"use_cache",
"return_dict",
]
)
self.assertListEqual(arg_names[: len(expected_arg_names)], expected_arg_names)
def test_attention_outputs(self):
config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
config.return_dict = True
seq_len = getattr(self.model_tester, "seq_length", None)
decoder_seq_length = getattr(self.model_tester, "decoder_seq_length", seq_len)
encoder_seq_length = getattr(self.model_tester, "encoder_seq_length", seq_len)
d_model = getattr(self.model_tester, "d_model", None)
num_attention_heads = getattr(self.model_tester, "num_attention_heads", None)
dim = d_model // num_attention_heads
for model_class in self.all_model_classes:
inputs_dict["output_attentions"] = True
inputs_dict["output_hidden_states"] = False
config.return_dict = True
model = model_class(config)
model.to(torch_device)
model.eval()
with torch.no_grad():
outputs = model(**self._prepare_for_class(inputs_dict, model_class))
attentions = outputs.encoder_attentions if config.is_encoder_decoder else outputs.attentions
self.assertEqual(len(attentions), self.model_tester.num_hidden_layers)
# check that output_attentions also work using config
del inputs_dict["output_attentions"]
config.output_attentions = True
model = model_class(config)
model.to(torch_device)
model.eval()
with torch.no_grad():
outputs = model(**self._prepare_for_class(inputs_dict, model_class))
attentions = outputs.encoder_attentions
self.assertEqual(len(attentions), self.model_tester.num_hidden_layers)
self.assertListEqual(
list(attentions[0].shape[-3:]),
[self.model_tester.num_attention_heads, encoder_seq_length, dim],
)
out_len = len(outputs)
correct_outlen = 7
if "last_hidden_state" in outputs:
correct_outlen += 1
if "trend" in outputs:
correct_outlen += 1
if "past_key_values" in outputs:
correct_outlen += 1 # past_key_values have been returned
if "loss" in outputs:
correct_outlen += 1
if "params" in outputs:
correct_outlen += 1
self.assertEqual(out_len, correct_outlen)
# decoder attentions
decoder_attentions = outputs.decoder_attentions
self.assertIsInstance(decoder_attentions, (list, tuple))
self.assertEqual(len(decoder_attentions), self.model_tester.num_hidden_layers)
self.assertListEqual(
list(decoder_attentions[0].shape[-3:]),
[self.model_tester.num_attention_heads, decoder_seq_length, dim],
)
# cross attentions
cross_attentions = outputs.cross_attentions
self.assertIsInstance(cross_attentions, (list, tuple))
self.assertEqual(len(cross_attentions), self.model_tester.num_hidden_layers)
self.assertListEqual(
list(cross_attentions[0].shape[-3:]),
[self.model_tester.num_attention_heads, decoder_seq_length, dim],
)
# Check attention is always last and order is fine
inputs_dict["output_attentions"] = True
inputs_dict["output_hidden_states"] = True
model = model_class(config)
model.to(torch_device)
model.eval()
with torch.no_grad():
outputs = model(**self._prepare_for_class(inputs_dict, model_class))
self.assertEqual(out_len + 2, len(outputs))
self_attentions = outputs.encoder_attentions if config.is_encoder_decoder else outputs.attentions
self.assertEqual(len(self_attentions), self.model_tester.num_hidden_layers)
self.assertListEqual(
list(self_attentions[0].shape[-3:]),
[self.model_tester.num_attention_heads, encoder_seq_length, dim],
)
def prepare_batch(filename="train-batch.pt"):
file = hf_hub_download(repo_id="hf-internal-testing/tourism-monthly-batch", filename=filename, repo_type="dataset")
batch = torch.load(file, map_location=torch_device)
return batch
@require_torch
@slow
class AutoformerModelIntegrationTests(unittest.TestCase):
def test_inference_no_head(self):
model = AutoformerModel.from_pretrained("huggingface/autoformer-tourism-monthly").to(torch_device)
batch = prepare_batch()
with torch.no_grad():
output = model(
past_values=batch["past_values"],
past_time_features=batch["past_time_features"],
past_observed_mask=batch["past_observed_mask"],
static_categorical_features=batch["static_categorical_features"],
future_values=batch["future_values"],
future_time_features=batch["future_time_features"],
)[0]
expected_shape = torch.Size(
(64, model.config.prediction_length + model.config.label_length, model.config.feature_size)
)
self.assertEqual(output.shape, expected_shape)
expected_slice = torch.tensor(
[[0.3593, -1.3398, 0.6330], [0.2279, 1.5396, -0.1792], [0.0450, 1.3225, -0.2335]], device=torch_device
)
self.assertTrue(torch.allclose(output[0, :3, :3], expected_slice, atol=TOLERANCE))
def test_inference_head(self):
model = AutoformerForPrediction.from_pretrained("huggingface/autoformer-tourism-monthly").to(torch_device)
batch = prepare_batch("val-batch.pt")
with torch.no_grad():
output = model(
past_values=batch["past_values"],
past_time_features=batch["past_time_features"],
past_observed_mask=batch["past_observed_mask"],
static_categorical_features=batch["static_categorical_features"],
).encoder_last_hidden_state
expected_shape = torch.Size((64, model.config.context_length, model.config.d_model))
self.assertEqual(output.shape, expected_shape)
expected_slice = torch.tensor(
[[-0.0734, -0.9036, 0.8358], [4.7186, 2.4113, 1.9581], [1.7953, 2.3558, 1.2970]], device=torch_device
)
self.assertTrue(torch.allclose(output[0, :3, :3], expected_slice, atol=TOLERANCE))
def test_seq_to_seq_generation(self):
model = AutoformerForPrediction.from_pretrained("huggingface/autoformer-tourism-monthly").to(torch_device)
batch = prepare_batch("val-batch.pt")
with torch.no_grad():
outputs = model.generate(
static_categorical_features=batch["static_categorical_features"],
past_time_features=batch["past_time_features"],
past_values=batch["past_values"],
future_time_features=batch["future_time_features"],
past_observed_mask=batch["past_observed_mask"],
)
expected_shape = torch.Size((64, model.config.num_parallel_samples, model.config.prediction_length))
self.assertEqual(outputs.sequences.shape, expected_shape)
expected_slice = torch.tensor([3130.6763, 4056.5293, 7053.0786], device=torch_device)
mean_prediction = outputs.sequences.mean(dim=1)
self.assertTrue(torch.allclose(mean_prediction[0, -3:], expected_slice, rtol=1e-1))

View File

@ -438,7 +438,7 @@ class InformerModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestCase
def prepare_batch(filename="train-batch.pt"):
file = hf_hub_download(repo_id="kashif/tourism-monthly-batch", filename=filename, repo_type="dataset")
file = hf_hub_download(repo_id="hf-internal-testing/tourism-monthly-batch", filename=filename, repo_type="dataset")
batch = torch.load(file, map_location=torch_device)
return batch

View File

@ -459,7 +459,7 @@ class TimeSeriesTransformerModelTest(ModelTesterMixin, PipelineTesterMixin, unit
def prepare_batch(filename="train-batch.pt"):
file = hf_hub_download(repo_id="kashif/tourism-monthly-batch", filename=filename, repo_type="dataset")
file = hf_hub_download(repo_id="hf-internal-testing/tourism-monthly-batch", filename=filename, repo_type="dataset")
batch = torch.load(file, map_location=torch_device)
return batch

View File

@ -73,6 +73,8 @@ SPECIAL_CASES_TO_ALLOW = {
"InformerConfig": ["num_static_real_features", "num_time_features"],
# used internally to calculate the feature size
"TimeSeriesTransformerConfig": ["num_static_real_features", "num_time_features"],
# used internally to calculate the feature size
"AutoformerConfig": ["num_static_real_features", "num_time_features"],
}
# TODO (ydshieh): Check the failing cases, try to fix them or move some cases to the above block once we are sure

View File

@ -73,6 +73,8 @@ IGNORE_NON_TESTED = PRIVATE_MODELS.copy() + [
"TimeSeriesTransformerDecoder", # Building part of bigger (tested) model.
"InformerEncoder", # Building part of bigger (tested) model.
"InformerDecoder", # Building part of bigger (tested) model.
"AutoformerEncoder", # Building part of bigger (tested) model.
"AutoformerDecoder", # Building part of bigger (tested) model.
"JukeboxVQVAE", # Building part of bigger (tested) model.
"JukeboxPrior", # Building part of bigger (tested) model.
"DeformableDetrEncoder", # Building part of bigger (tested) model.
@ -223,6 +225,7 @@ IGNORE_NON_AUTO_CONFIGURED = PRIVATE_MODELS.copy() + [
"GPTSanJapaneseModel",
"TimeSeriesTransformerForPrediction",
"InformerForPrediction",
"AutoformerForPrediction",
"JukeboxVQVAE",
"JukeboxPrior",
"PegasusXEncoder",